Disclosure of Invention
In order to solve the above problem, an embodiment of the present invention provides a sample classification and identification method based on a convolutional neural network.
In one aspect of the present invention, a sample classification and identification method based on a convolutional neural network is provided, where the method includes:
S1, collecting original training data
Collecting spectral data of a target substance by using laser-induced breakdown spectroscopy equipment, wherein the spectral data is used as original training data, and the target substance is a substance which belongs to the same class as a sample to be classified;
S2, carrying out a summation operation on the original training data to obtain the training data
Randomly extracting a plurality of pieces of data from the original training data and summing the data to obtain a piece of summation data, and repeating the steps of randomly extracting a plurality of pieces of data from the original training data and summing the data to obtain a piece of summation data until a preset number of pieces of summation data are obtained and used as the training data;
S3, randomly adjusting the wavelength represented by the abscissa of the training data obtained in S2
For the abscissa of each piece of training data obtained in S2, adding a random number to the wavelength value corresponding to each coordinate point, coordinate point by coordinate point, to obtain new training data;
S4, randomly adjusting the intensity represented by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, multiplying the intensity value corresponding to each coordinate point by a randomly generated third-order smooth curve to obtain new training data;
S5, performing mean filtering with a random window length on the intensity represented by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, performing mean filtering with a window length of 2W + 1 coordinate point by coordinate point to obtain new training data, wherein mean filtering with a window length of 2W + 1 takes the average of the intensity value corresponding to a coordinate point and the intensity values corresponding to the W coordinate points before it and the W coordinate points after it, and W is an integer;
S6, randomly superimposing white noise on the intensity represented by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, randomly superimposing white noise on each coordinate point to obtain new training data;
S7, training the convolutional neural network
Carrying out abscissa standardization on each piece of new training data obtained in S3, S4, S5 and S6 to obtain spectral data under a standard abscissa, using the spectral data as input spectral data, inputting each piece of input spectral data into an initial convolutional neural network to obtain an estimated category, calculating a loss value by using a loss function based on the obtained estimated category and an actual category, adjusting parameter values of each trainable parameter in the initial convolutional neural network by using a back propagation algorithm based on the calculated loss value under the condition that the calculated loss value is not converged, and using the current initial convolutional neural network as a final convolutional neural network for identifying a sample category to be classified under the condition that the loss value is converged, wherein the actual category is a category to which a target substance corresponding to the input spectral data actually belongs;
S8, identifying the category to which the sample to be classified belongs
Collecting the spectral data of the sample to be classified by using the laser-induced breakdown spectroscopy equipment, and inputting the collected spectral data of the sample to be classified into the final convolutional neural network obtained in S7 to obtain the category of the sample to be classified.
Compared with the prior art, the present invention has the following beneficial effects. During training of the convolutional neural network, the original training data collected from target substances with laser-induced breakdown spectroscopy equipment is first enlarged by random permutation and combination to obtain the training data. Then, based on the range over which spectral data measured on different equipment and in different environments deviates from the actual spectral data in practical applications, the wavelength represented by the abscissa and the intensity represented by the ordinate of the training data are adjusted to simulate spectral data measured on different equipment and in different environments. The convolutional neural network trained on the resulting data is therefore suitable for classifying and identifying spectral data collected on different equipment and in different environments, and no standard substance is needed for calibration. In addition, the identification effect of a convolutional neural network depends mainly on the volume of training data: hundreds of thousands of pieces of training data are usually needed to achieve the expected identification effect, and in practice collecting that much data with laser-induced breakdown spectroscopy equipment is difficult.
Optionally, the value of the random number in S3 lies in the range $(-\min(X_i - X_{i-1},\ X_{i+1} - X_i),\ \min(X_i - X_{i-1},\ X_{i+1} - X_i))$, where $X_i$ represents the wavelength value corresponding to the i-th coordinate point in the abscissa X of the training data, and i is an integer.
Optionally, the variance of the white noise in S6 is V × maxI, where maxI is the maximum intensity value among the intensities represented by the ordinate of the training data on which the white noise is currently being superimposed, and V is a preset value in the range (0, 1).
Optionally, the process of performing abscissa standardization in S7 to obtain the spectral data under the standard abscissa includes: applying cubic spline interpolation to each piece of new training data obtained in S3, S4, S5 and S6 to obtain the spectral data under the standard abscissa.
Optionally, the initial convolutional neural network in S7 includes an input layer, a first convolutional layer, a first activation layer, a first pooling layer, a second convolutional layer, a second activation layer, a second pooling layer, a third convolutional layer, a third activation layer, a third pooling layer, a global average pooling layer, a fully connected layer, and an output layer.
Optionally, the step of inputting each input spectrum data into the initial convolutional neural network to obtain the estimated category includes:
the input layer receives input spectral data;
the first convolutional layer performs a convolution operation on the input spectral data using 8 convolution kernels with a step length of 1 to obtain 8 feature maps, wherein the feature size of each of the 8 convolution kernels is 21 x 1;
the first activation layer sets negative values in the 8 feature maps to zero through a ReLU activation operation;
the first pooling layer performs a maximum pooling operation on the 8 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 8 pooled feature maps;
the second convolutional layer performs a convolution operation on the 8 feature maps pooled by the first pooling layer using 16 convolution kernels to obtain 16 feature maps, wherein the feature size of each of the 16 convolution kernels is 11 x 1;
the second activation layer sets negative values in the 16 feature maps to zero through a ReLU activation operation;
the second pooling layer performs a maximum pooling operation on the 16 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 16 pooled feature maps;
the third convolutional layer performs a convolution operation on the 16 feature maps pooled by the second pooling layer using 32 convolution kernels to obtain 32 feature maps, wherein the feature size of each of the 32 convolution kernels is 5 x 1;
the third activation layer sets negative values in the 32 feature maps to zero through a ReLU activation operation;
the third pooling layer performs a maximum pooling operation on the 32 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 32 pooled feature maps;
the global average pooling layer averages each of the 32 feature maps pooled by the third pooling layer to obtain a feature map with a size of 1 x 32;
the fully connected layer multiplies the element values of the elements in the feature map with a size of 1 x 32 by weights and adds biases to obtain the elements, and their element values, to be input to the output layer, wherein each element represents an estimated category and the magnitude of its element value represents the likelihood that the prediction result is that estimated category;
the output layer performs a normalized exponential function (softmax) operation on the element values of all the elements input by the fully connected layer, and takes the estimated category represented by the element with the maximum resulting value as the estimated category of the target substance corresponding to the input spectral data received by the input layer.
Optionally, the trainable parameters include parameters of convolution kernels in each convolution layer and weights and biases in fully connected layers.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Referring to fig. 1, a sample classification and identification method based on a convolutional neural network provided by an embodiment of the present invention includes:
S1, collecting original training data
Spectral data of a target substance is collected as original training data by using laser-induced breakdown spectroscopy equipment, wherein the target substance is a substance which belongs to the same class as the sample to be classified.
For example, the sample to be classified is iron ore, and correspondingly the target substance is also iron ore. In the process of acquiring the spectral data of the target substance, the spectral data of different positions can be acquired for one target substance, for example, the spectral data of 20 different positions of the iron ore can be acquired for one piece of iron ore.
In one implementation, to improve the recognition effect of the convolutional neural network, the spectral data can be collected on a binary classification basis. In this case, the target substances are divided into substances belonging to the same class as the sample to be classified and substances belonging to a different class from the sample to be classified, and the spectral data of these two groups can be respectively collected by two laser-induced breakdown spectroscopy devices. The convolutional neural network can then distinguish substances of the same class as the sample to be classified from substances of a different class, and thus better recognize the sample to be classified. For example, when the sample to be classified is iron ore, spectral data can be collected for iron ore and for solid waste samples that are not iron ore, such as iron sheet and scrap iron. 80% of the collected spectral data of the two classes is randomly extracted as original training data, and the remaining 20% is used as original verification data for verifying the classification effect of the network after training of the convolutional neural network is finished.
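As a non-limiting illustration, the random 80%/20% split described above might be sketched as follows. The function name `split_train_validation`, the placeholder records and the use of a seeded generator are assumptions made for this example only and do not appear in the embodiment.

```python
import random

def split_train_validation(spectra, train_fraction=0.8, seed=None):
    """Randomly split labelled spectra into training and verification subsets."""
    rng = random.Random(seed)
    shuffled = list(spectra)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical placeholders standing in for 20 collected spectra per class.
iron_ore = [("iron_ore", i) for i in range(20)]
solid_waste = [("solid_waste", i) for i in range(20)]
train, validation = split_train_validation(iron_ore + solid_waste, seed=0)
```

Each record keeps its class label, so the later summation step can still treat the two classes separately.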
S2, carrying out summation operation on the original training data to obtain the training data
Randomly extracting a plurality of pieces of data from the original training data and summing the data to obtain a piece of summation data, and repeating the steps of randomly extracting a plurality of pieces of data from the original training data and summing the data to obtain a piece of summation data until a preset number of pieces of summation data are obtained and used as the training data.
The original training data is summed on the basis of random permutation and combination, which enlarges the data volume. When the summation operation is performed on the two types of original training data collected in S1, it must be performed separately on the original training data collected from iron ore and on the original training data collected from solid waste samples; that is, the two types of original training data cannot be mixed and summed together.
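A minimal sketch of the summation step for a single class is given below, under stated assumptions: the number of spectra per sum (`n_per_sum`), drawing without replacement, and all spectra sharing the same abscissa are choices made for illustration, since the embodiment only says that "a plurality of pieces" are randomly extracted and summed.

```python
import random

def sum_augment(spectra, n_outputs, n_per_sum=3, seed=None):
    """Build n_outputs summed spectra from randomly chosen originals of one class."""
    rng = random.Random(seed)
    length = len(spectra[0])
    summed = []
    for _ in range(n_outputs):
        chosen = rng.sample(spectra, n_per_sum)   # assumption: no repeats within one sum
        summed.append([sum(s[k] for s in chosen) for k in range(length)])
    return summed

# Hypothetical intensities for four original spectra of the same class.
originals = [[1, 1], [2, 2], [4, 4], [8, 8]]
augmented = sum_augment(originals, n_outputs=5, n_per_sum=2, seed=1)
```

The function would be called once per class so that iron ore spectra and solid waste spectra are never summed together.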
S3, randomly adjusting the wavelength represented by the abscissa of the training data obtained in S2
For the abscissa of each piece of training data obtained in S2, a random number is added to the wavelength value corresponding to each coordinate point, coordinate point by coordinate point, to obtain new training data.
In practice, the random number takes a value in the range $(-\min(X_i - X_{i-1},\ X_{i+1} - X_i),\ \min(X_i - X_{i-1},\ X_{i+1} - X_i))$, where $X_i$ represents the wavelength value corresponding to the i-th coordinate point in the abscissa X of the training data, and i is an integer representing the serial number of the coordinate point.
Fig. 2 is a comparison graph of training data before and after wavelength adjustment, in which a dotted line represents a curve before adjustment, a solid line represents a curve after adjustment, and wavelength errors between spectral data collected by different devices are simulated by adjusting the wavelength.
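The wavelength adjustment of S3 can be sketched as follows. This is an illustrative sketch only: the uniform distribution, the handling of the first and last coordinate points (which have a single neighbour) and the function name `jitter_wavelengths` are assumptions not stated in the embodiment.

```python
import random

def jitter_wavelengths(x, seed=None):
    """Add to each wavelength a random offset bounded by the smaller neighbouring gap."""
    rng = random.Random(seed)
    jittered = []
    for i, xi in enumerate(x):
        gaps = []
        if i > 0:
            gaps.append(xi - x[i - 1])        # X_i - X_{i-1}
        if i < len(x) - 1:
            gaps.append(x[i + 1] - xi)        # X_{i+1} - X_i
        bound = min(gaps)                     # assumption: end points use their only gap
        jittered.append(xi + rng.uniform(-bound, bound))
    return jittered

# Hypothetical abscissa with uneven spacing (at least two points are required).
wavelengths = [200.0, 200.5, 201.5, 202.0]
shifted = jitter_wavelengths(wavelengths, seed=3)
```

With this bound two neighbouring points can each move by almost a full gap, so the adjusted abscissa is not guaranteed to remain sorted; whether a tighter bound is intended is not stated in the embodiment.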
S4, randomly adjusting the intensity indicated by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, the intensity value corresponding to each coordinate point is multiplied by a randomly generated third-order smooth curve to obtain new training data.
In practice, the expression of the third-order smooth curve is:
$$F(n) = a_3 n^3 + a_2 n^2 + a_1 n + a_0$$
wherein n represents the serial number of the coordinate point on the ordinate of the training data, that is, it indicates which coordinate point is meant; for example, if the total number of coordinate points on the ordinate is 2048, n takes the values 1, 2, 3, ..., 2048; and $a_3$, $a_2$, $a_1$ and $a_0$ are random decimals between 0 and 1. In the specific calculation process, the intensity value corresponding to the first coordinate point in the ordinate of the training data is multiplied by F(1), the intensity value corresponding to the second coordinate point is multiplied by F(2), and so on.
Fig. 3 is a comparison graph of the training data before and after the intensity is randomly adjusted, wherein the dotted line represents a curve before the adjustment, the solid line represents a curve after the adjustment, and the difference of signal response curves of different devices, that is, the intensity error between the spectral data collected by different devices is simulated by randomly adjusting the intensity.
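The intensity adjustment of S4 can be illustrated with the short sketch below, which applies the third-order curve literally as written, with n running from 1 and all four coefficients drawn from the interval between 0 and 1. Whether n or the resulting curve is rescaled in practice is not stated in the embodiment, and the function name `random_cubic_gain` is hypothetical.

```python
import random

def random_cubic_gain(intensities, seed=None):
    """Multiply the n-th intensity by F(n) = a3*n^3 + a2*n^2 + a1*n + a0."""
    rng = random.Random(seed)
    a3, a2, a1, a0 = (rng.random() for _ in range(4))   # random decimals in [0, 1)
    scaled = []
    for n, value in enumerate(intensities, start=1):    # n = 1, 2, 3, ...
        f_n = a3 * n**3 + a2 * n**2 + a1 * n + a0
        scaled.append(value * f_n)
    return scaled, (a3, a2, a1, a0)

# Hypothetical flat spectrum, so the output equals the curve F(n) itself.
scaled, coefficients = random_cubic_gain([1.0, 1.0, 1.0], seed=1)
```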
S5, performing mean filtering with a random window length on the intensity represented by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, mean filtering with a window length of 2W + 1 is performed coordinate point by coordinate point to obtain new training data, wherein mean filtering with a window length of 2W + 1 takes the average of the intensity value corresponding to a coordinate point and the intensity values corresponding to the W coordinate points before it and the W coordinate points after it, and W is an integer.
In implementation, W may be a random integer between 0 and 4, and in the case where the number of coordinate points before and after is less than W, the filtering process is performed according to the actual number of coordinate points.
Fig. 4 is a comparison graph of training data before and after the mean filtering processing, in which a dotted line represents a curve before adjustment and a solid line represents a curve after adjustment; the mean filtering simulates the differences between the spectral data of different devices caused by their different resolutions.
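A minimal sketch of the mean filtering of S5 follows, including the edge rule under which fewer than W neighbours are available and only the existing coordinate points are averaged. Treating the bounds 0 and 4 as inclusive and the function names are assumptions for illustration.

```python
import random

def mean_filter(intensities, w):
    """Average each point with up to w neighbours on each side (window 2w + 1)."""
    n = len(intensities)
    filtered = []
    for i in range(n):
        lo = max(0, i - w)            # truncate the window at the left edge
        hi = min(n, i + w + 1)        # truncate the window at the right edge
        window = intensities[lo:hi]
        filtered.append(sum(window) / len(window))
    return filtered

def random_mean_filter(intensities, max_w=4, seed=None):
    """Apply mean_filter with W drawn at random from 0..max_w (inclusive, assumed)."""
    w = random.Random(seed).randint(0, max_w)
    return mean_filter(intensities, w), w

smoothed = mean_filter([1.0, 2.0, 3.0, 4.0, 5.0], 1)   # -> [1.5, 2.0, 3.0, 4.0, 4.5]
```

With W = 0 the window contains only the point itself, so the data is returned unchanged.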
S6, randomly superposing white noise on the intensity represented by the ordinate of the training data obtained in S2
For the ordinate of each piece of training data obtained in S2, white noise is randomly superimposed on each coordinate point to obtain new training data.
In implementation, the variance of the white noise is V × maxI, where maxI is the maximum intensity value among the intensities represented by the ordinate of the training data on which the white noise is currently being superimposed, and V is a preset value in the range (0, 1).
Fig. 5 is a comparison graph of training data before and after white noise is randomly superimposed, in which a dotted line represents a curve before adjustment and a solid line represents a curve after adjustment; superimposing the random white noise simulates the influence of the device itself and of the environment on the collected spectral data.
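The noise superposition of S6 might be sketched as below. The embodiment specifies only the variance V × maxI; modelling the white noise as zero-mean Gaussian noise, the default value of `v` and the function name `add_white_noise` are assumptions made for this illustration.

```python
import math
import random

def add_white_noise(intensities, v=0.01, seed=None):
    """Add zero-mean noise whose variance is v * max(intensities) to every point."""
    rng = random.Random(seed)
    sigma = math.sqrt(v * max(intensities))   # standard deviation = sqrt(variance)
    return [y + rng.gauss(0.0, sigma) for y in intensities]

# Hypothetical flat spectrum: maxI = 100 and V = 0.01 give a noise variance of 1.
clean = [100.0] * 20000
noisy = add_white_noise(clean, v=0.01, seed=0)
```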
S7, training the convolutional neural network
Abscissa standardization is performed on each piece of new training data obtained in S3, S4, S5 and S6 to obtain spectral data under a standard abscissa, which is used as input spectral data. Each piece of input spectral data is input into an initial convolutional neural network to obtain an estimated category, and a loss value is calculated with a loss function based on the obtained estimated category and the actual category, wherein the actual category is the category to which the target substance corresponding to the input spectral data actually belongs. If the calculated loss value has not converged, the parameter values of the trainable parameters in the initial convolutional neural network are adjusted by a back propagation algorithm based on the calculated loss value; if the loss value has converged, the current initial convolutional neural network is used as the final convolutional neural network for identifying the category of the sample to be classified.
In implementation, because the minimum value, maximum value and step size of the wavelengths represented by the abscissa differ between pieces of spectral data, abscissa standardization uses interpolation to convert the wavelengths represented by the abscissas of the different spectral data into a preset standard abscissa. For example, if the preset standard abscissa represents wavelengths with a minimum value of 170, a maximum value of 800 and a step size of 1, that is, the wavelengths 170, 171, 172, ..., 800, then after abscissa standardization the abscissas of all spectral data have a minimum value of 170, a maximum value of 800 and a step size of 1. Specifically, each piece of new training data obtained in S3, S4, S5 and S6 may be subjected to abscissa standardization by cubic spline interpolation to obtain spectral data under the standard abscissa. The loss function may be the mean squared error (MSE) function.
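The abscissa standardization can be illustrated with the sketch below, which resamples one spectrum onto the example standard abscissa of 170 to 800 with a step of 1. The embodiment does not specify the boundary conditions of the spline, so natural boundary conditions (zero second derivative at both ends) are assumed; the sketch also assumes a strictly increasing input abscissa that covers the standard range, and both function names are hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through (x, y)."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)                       # second derivatives; ends stay 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):             # forward elimination (tridiagonal)
            factor = h[k] / diag[k - 1]
            diag[k] -= factor * h[k]
            rhs[k] -= factor * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]   # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        i = min(max(bisect_right(x, t) - 1, 0), n - 1)   # interval containing t
        a, b = x[i + 1] - t, t - x[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
                + (y[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

def resample_to_standard(x, y, start=170, stop=800, step=1):
    """Interpolate a spectrum onto the standard abscissa start, start+step, ..., stop."""
    spline = natural_cubic_spline(x, y)
    grid = [start + k * step for k in range(int((stop - start) / step) + 1)]
    return grid, [spline(t) for t in grid]

# Hypothetical spectrum on an uneven abscissa with intensities following 2x + 1.
raw_x = [169.5, 300.2, 455.0, 640.7, 800.5]
raw_y = [2.0 * v + 1.0 for v in raw_x]
grid, values = resample_to_standard(raw_x, raw_y)
```

With the example standard abscissa the resampled spectrum always has 631 points, which fixes the input length of the network regardless of the device that produced the raw data.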
Specifically, the initial convolutional neural network may include an input layer, a first convolutional layer, a first activation layer, a first pooling layer, a second convolutional layer, a second activation layer, a second pooling layer, a third convolutional layer, a third activation layer, a third pooling layer, a global average pooling layer, a fully connected layer, and an output layer; the trainable parameters include the parameters of the convolution kernels in each convolutional layer and the weights and biases in the fully connected layer.
After the input spectrum data is input into the initial convolutional neural network, the input layer receives the input spectrum data;
the first convolutional layer performs a convolution operation on the input spectral data using 8 convolution kernels with a step length of 1 to obtain 8 feature maps, wherein the feature size of each of the 8 convolution kernels is 21 x 1;
the first activation layer sets negative values in the 8 feature maps to zero through a ReLU activation operation;
the first pooling layer performs a maximum pooling operation on the 8 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 8 pooled feature maps;
the second convolutional layer performs a convolution operation on the 8 feature maps pooled by the first pooling layer using 16 convolution kernels to obtain 16 feature maps, wherein the feature size of each of the 16 convolution kernels is 11 x 1;
the second activation layer sets negative values in the 16 feature maps to zero through a ReLU activation operation;
the second pooling layer performs a maximum pooling operation on the 16 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 16 pooled feature maps;
the third convolutional layer performs a convolution operation on the 16 feature maps pooled by the second pooling layer using 32 convolution kernels to obtain 32 feature maps, wherein the feature size of each of the 32 convolution kernels is 5 x 1;
the third activation layer sets negative values in the 32 feature maps to zero through a ReLU activation operation;
the third pooling layer performs a maximum pooling operation on the 32 feature maps after the ReLU activation operation using a filter with a feature size of 2 x 1 and a step length of 2 to obtain 32 pooled feature maps;
the global average pooling layer averages each of the 32 feature maps pooled by the third pooling layer to obtain a feature map with a size of 1 x 32;
the fully connected layer multiplies the element values of the elements in the feature map with a size of 1 x 32 by weights and adds biases to obtain the elements, and their element values, to be input to the output layer, wherein each element represents an estimated category and the magnitude of its element value represents the likelihood that the prediction result is that estimated category;
the output layer performs a normalized exponential function (softmax) operation on the element values of all the elements input by the fully connected layer, and takes the estimated category represented by the element with the maximum resulting value as the estimated category of the target substance corresponding to the input spectral data received by the input layer.
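To make the layer sequence above concrete, the following sketch traces the size of the feature maps through the network. It assumes an input of 631 points (the example standard abscissa 170 to 800 with a step of 1), no padding in the convolutional and pooling layers, and a step length of 1 for the second and third convolutional layers; none of these three points is stated in the embodiment, and the function names are hypothetical.

```python
def conv_out(length, kernel, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1

def pool_out(length, window=2, stride=2):
    """Output length of an unpadded 1-D max pooling."""
    return (length - window) // stride + 1

def trace_shapes(input_length=631):
    """List (layer, number of feature maps, points per feature map) for each stage."""
    layers = [("conv1", 8, 21), ("conv2", 16, 11), ("conv3", 32, 5)]
    length = input_length
    shapes = [("input", 1, length)]
    for name, channels, kernel in layers:
        length = conv_out(length, kernel)
        shapes.append((name, channels, length))
        length = pool_out(length)
        shapes.append((name.replace("conv", "pool"), channels, length))
    shapes.append(("global_avg_pool", 32, 1))   # one average per feature map: 1 x 32
    return shapes

for layer, maps, points in trace_shapes():
    print(f"{layer:16s} {maps:3d} feature maps x {points} points")
```

Under these assumptions the feature maps shrink from 631 points to 611, 305, 295, 147, 143 and finally 71 points before global average pooling reduces each of the 32 maps to a single value.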
For example, if the normalized exponential function operation on the element values yields 0.8 for iron ore and 0.2 for solid waste samples, iron ore is taken as the estimated category. Correspondingly, the loss value can be calculated with the loss function based on the estimated value 0.8 for iron ore and the actual value 1.0 for iron ore. If the loss value has not converged, the parameters of the convolution kernels in each convolutional layer and the weights and biases in the fully connected layer are adjusted by the back propagation algorithm and training continues; once the calculated loss value converges, the current convolutional neural network is used as the final convolutional neural network for identifying the category of the sample to be classified.
After the final convolutional neural network is obtained, the original verification data from S1 may be used to verify its classification effect. Once the classification accuracy on the original verification data meets expectations, the final convolutional neural network can be used to classify spectral data acquired by different devices.
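The numerical example above can be reproduced with the following sketch. The fully connected layer outputs (`logits`) are hypothetical values chosen so that the normalized exponential function yields 0.8 and 0.2, and averaging the squared error over both category entries against a one-hot actual category is an assumption about how the mean squared error is applied.

```python
import math

def softmax(values):
    """Normalized exponential function, shifted by the maximum for stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mse(predicted, target):
    """Mean squared error between predicted and actual category vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

classes = ["iron ore", "solid waste"]
logits = [math.log(4.0), 0.0]            # hypothetical outputs of the fully connected layer
probabilities = softmax(logits)          # approximately [0.8, 0.2]
estimated = classes[probabilities.index(max(probabilities))]
loss = mse(probabilities, [1.0, 0.0])    # actual category: iron ore
```

Here the loss is ((0.8 - 1.0)^2 + (0.2 - 0.0)^2) / 2 = 0.04, and it approaches zero as the estimated value for the actual category approaches 1.0.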
S8, identifying the category to which the sample to be classified belongs
The spectral data of the sample to be classified is collected by using the laser-induced breakdown spectroscopy equipment, and the collected spectral data of the sample to be classified is input into the final convolutional neural network obtained in S7 to obtain the category of the sample to be classified.
In the scheme provided by the embodiment of the present invention, during training of the convolutional neural network, the original training data collected from target substances with laser-induced breakdown spectroscopy equipment is first enlarged by random permutation and combination to obtain the training data. Then, based on the range over which spectral data measured on different equipment and in different environments deviates from the actual spectral data in practical applications, the wavelength represented by the abscissa and the intensity represented by the ordinate of the training data are adjusted to simulate spectral data measured on different equipment and in different environments. The convolutional neural network trained on the resulting data is therefore suitable for classifying and identifying spectral data collected on different equipment and in different environments, and no standard substance is needed for calibration. In addition, the identification effect of a convolutional neural network depends mainly on the volume of training data: hundreds of thousands of pieces of training data are usually needed to achieve the expected identification effect, and in practice collecting that much data with laser-induced breakdown spectroscopy equipment is difficult.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.