CN111126333A - Garbage classification method based on light convolutional neural network


Info

Publication number
CN111126333A
CN111126333A
Authority
CN
China
Prior art keywords: convolution, layer, sub, convolution layer, neural network
Prior art date
Legal status
Granted
Application number
CN201911405696.5A
Other languages
Chinese (zh)
Other versions
CN111126333B (en)
Inventor
石翠萍
王涛
李静辉
靳展
王天毅
Current Assignee
Qiqihar University
Original Assignee
Qiqihar University
Priority date
Filing date
Publication date
Application filed by Qiqihar University
Priority to CN201911405696.5A priority Critical patent/CN111126333B/en
Publication of CN111126333A publication Critical patent/CN111126333A/en
Application granted granted Critical
Publication of CN111126333B publication Critical patent/CN111126333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G06Q 50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Tourism & Hospitality (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

A garbage classification method based on a lightweight convolutional neural network, belonging to the technical field of garbage classification. The invention solves the problem that existing methods cannot combine low model complexity with high classification accuracy. The invention divides the feature extraction stage into 9 parts; the convolutions of each part combine depthwise separable convolution with ordinary convolution, the convolution kernel sizes alternate between 1 × 1 and 3 × 3, and batch normalization is applied to the result of each convolution. Unlike the commonly used ReLU activation function and Flatten connection layer, the model of the invention adopts Leaky ReLU as the activation function and a global average pooling layer as the connection layer. Experimental results show that, after training and testing on the TrashNet data set, the network achieves an accuracy of 93.02%; its classification accuracy is high while its model complexity remains low, so both can be taken into account simultaneously. The invention can be applied to intelligent garbage classification.

Description

Garbage classification method based on light convolutional neural network
Technical Field
The invention belongs to the technical field of garbage classification, and particularly relates to a garbage classification method based on a light-weight convolutional neural network.
Background
In recent years, rapid economic growth has driven urbanization, but environmental pollution has worsened as cities develop. Rapid industrial and economic development consumes large amounts of resources and increases municipal domestic waste. At present, the dumping of domestic garbage occupies more than 500 million square meters of land in China, and nearly two thirds of large and medium-sized cities are encircled by garbage. Undoubtedly, recycling municipal solid waste is of great significance for alleviating environmental pollution and excessive resource consumption. However, how to recycle and dispose of garbage effectively is a major problem for every country in the world. The sustainable development strategy proposed in recent years has contributed greatly to solving this problem, but the recycling of classified garbage still consumes much time and labor and remains inefficient. Intelligent technology under the sustainable development strategy offers a new solution to the excessive manpower consumption and low efficiency of municipal domestic waste treatment. Against this background, building smart cities and realizing intelligent recycling of household garbage can effectively improve resource utilization and alleviate urban environmental problems.
Deep learning technology has made outstanding contributions to the development of smart cities. In the past five years, many researchers have experimented with deep learning and other image processing techniques against the background of intelligent garbage management and recycling. In 2016, Yang M and Thung G used an improved Support Vector Machine (SVM) and a fine-tuned Convolutional Neural Network (CNN), AlexNet, to identify garbage categories in the context of intelligent garbage sorting. Training the SVM on a data set of 6 garbage categories that they collected themselves, they measured a classification accuracy of 63%; the accuracy of the CNN model was measured at 22%. In 2017, Awe O et al. fine-tuned the Faster R-CNN network on the PASCAL VOC data set; based on the data set of Yang M and Thung G (TrashNet), they produced 10,000 garbage images in 3 categories and realized the detection and classification of three types of garbage: waste residue, recyclable garbage and waste paper. Testing gave a mean average precision (mAP) of 68.3% over the three categories. In 2018, Kennedy T fine-tuned the VGG19 network and, also based on the TrashNet data set, produced a data set of 2645 garbage images in 7 categories; the measured accuracy was 88.42%. Rabano et al. fine-tuned MobileNet, a lightweight deep neural network, with a measured accuracy of 87.2% on the TrashNet data set. Satvilkar M compared a variety of classification methods on a data set of 2390 garbage images in 5 categories derived from TrashNet, and found that the Convolutional Neural Network (CNN) method classified best, with a measured accuracy of 89.81%. B S Costa et al. fine-tuned the VGG16 and AlexNet networks and improved the K-Nearest Neighbor (KNN) and Random Forest (RF) methods; tests on the TrashNet data set showed that the fine-tuned VGG16 network performed best, with a recognition accuracy of 93%. Aral et al. fine-tuned the DenseNet, Inception-V4, Xception and MobileNet networks; training and testing on the TrashNet data set showed that the fine-tuned DenseNet network performed best, reaching an accuracy of 95%. In 2019, Victoria Ruiz et al. fine-tuned classical networks such as VGG, Inception, ResNet and Inception-ResNet; training and testing on the TrashNet data set showed that the fine-tuned ResNet network had the best classification effect, with a recognition accuracy of 88.66%. Adedeji O and Wang Z combined a ResNet network with a multi-class SVM, training and testing on the TrashNet data set to obtain a recognition accuracy of 87%. Abdul Rajak A R et al. tested a fine-tuned AlexNet network on an improved TrashNet data set, measuring a recognition accuracy of 80%. Among these methods, the models with high accuracy generally have high model complexity; conversely, methods with low model complexity do not achieve good accuracy when classifying garbage images. Therefore, existing methods cannot combine low model complexity with high classification accuracy.
Disclosure of Invention
The invention aims to solve the problem that existing methods cannot combine low model complexity with high classification accuracy, and provides a garbage classification method based on a lightweight convolutional neural network.
The technical scheme adopted by the invention for solving the technical problems is as follows: a garbage classification method based on a lightweight convolutional neural network comprises the following steps:
step one, acquiring an original garbage image data set, and performing data enhancement on the acquired images to obtain a data-enhanced image data set;
step two, preprocessing the data-enhanced image data set to obtain a preprocessed image data set;
step three, constructing a convolutional neural network model, and training the constructed convolutional neural network on the preprocessed image data set to obtain a trained convolutional neural network model;
the structure of the convolutional neural network model is as follows: starting from the input end of the convolutional neural network model, the convolutional neural network model sequentially comprises an input layer, a first sub convolution unit, a second sub convolution unit, a third sub convolution unit, a fourth sub convolution unit, a fifth sub convolution unit, a sixth sub convolution unit, a seventh sub convolution unit, an eighth sub convolution unit, a ninth sub convolution unit, a global average pooling layer and a dense connection output layer;
a mechanism that automatically saves the model with the highest accuracy value, a mechanism that automatically reduces the learning rate, and a mechanism that automatically stops training are added to the training process; an initial learning rate is set and used as the current learning rate $L_r$, and the constructed convolutional neural network model is trained at the current learning rate; if none of the accuracy values obtained in the M consecutive training epochs after the highest accuracy value $P_{\max}$ exceeds $P_{\max}$, the current learning rate is reduced to obtain a new learning rate:

$$L'_r = L_r \times C$$

wherein: $L'_r$ is the new learning rate and $C$ is the reduction parameter;

the new learning rate is taken as the current learning rate and training continues; whenever the same condition is met, the current learning rate is reduced again;

if the accuracy values obtained after N consecutive learning-rate reductions still show no improvement, training is terminated; the convolutional neural network model corresponding to the highest accuracy value is taken as the trained convolutional neural network model;
and step four, inputting the garbage image to be identified into the convolutional neural network model trained in the step three for testing, and obtaining a garbage classification result output by the convolutional neural network model.
The invention has the following beneficial effects: the invention provides a garbage classification method based on a lightweight convolutional neural network, which divides the feature extraction stage into 9 parts; the convolutions of each part combine depthwise separable convolution with ordinary convolution, the convolution kernel sizes alternate between 1 × 1 and 3 × 3, and batch normalization is applied to the result of each convolution. Unlike the commonly used ReLU activation function and Flatten connection layer, the model of the invention adopts the Leaky ReLU activation function and a global average pooling layer as the connection layer. Experimental results show that, after training and testing on the TrashNet data set, the network achieves an accuracy of 93.02%; its classification accuracy is high while its model complexity remains low, so the method takes both into account simultaneously, solving the problem that existing methods cannot combine low model complexity with high classification accuracy, and it can be applied to practical intelligent garbage classification.
Drawings
FIG. 1 is a flow chart of a garbage classification method based on a lightweight convolutional neural network of the present invention;
FIG. 2 is a diagram of an initial model architecture proposed by the present invention;
FIG. 3 is a graph comparing the processing effect of a global average pooling layer (GAP) and a connection layer (Flatten);
FIG. 4 is a comparison of a depth separable convolutional layer with a generic convolutional layer;
FIG. 5 is a graph of the confusion matrix resulting from the initial model evaluation;
FIG. 6 is a graph of a confusion matrix obtained after improved model evaluation;
FIG. 7 is a heat map of the initial model's prediction for an image;
FIG. 8 is a heat map of the improved model's prediction for an image;
FIG. 9 is a graph of accuracy variation of the improved model of the present invention using the RMSprop optimization method;
FIG. 10 is a graph of accuracy change of the improved model of the present invention using the Adam optimization method;
FIG. 11 is a graph of accuracy change of an improved model according to the present invention using an SGD + Momentum optimization method;
FIG. 12 is a confusion matrix diagram obtained after an improved model of the invention is evaluated by an RMSprop optimization method;
FIG. 13 is a confusion matrix diagram obtained after the improved model of the present invention is evaluated by using the Adam optimization method;
FIG. 14 is a confusion matrix diagram obtained after an improved model of the present invention is evaluated by an SGD + Momentum optimization method;
FIG. 15 is a parametric quantity comparison of the improved model of the present invention with other models.
Detailed Description
The first embodiment is as follows: as shown in fig. 1, the garbage classification method based on the lightweight convolutional neural network according to the present embodiment includes the following steps:
step one, acquiring an original garbage image data set, and performing data enhancement on the acquired images to obtain a data-enhanced image data set;
step two, preprocessing the data-enhanced image data set to obtain a preprocessed image data set;
step three, constructing a convolutional neural network model, and training the constructed convolutional neural network on the preprocessed image data set to obtain a trained convolutional neural network model;
the structure of the convolutional neural network model is as follows: starting from the input end of the convolutional neural network model, the convolutional neural network model sequentially comprises an input layer, a first sub convolution unit, a second sub convolution unit, a third sub convolution unit, a fourth sub convolution unit, a fifth sub convolution unit, a sixth sub convolution unit, a seventh sub convolution unit, an eighth sub convolution unit, a ninth sub convolution unit, a global average pooling layer and a dense connection output layer;
a mechanism that automatically saves the model with the highest accuracy value, a mechanism that automatically reduces the learning rate, and a mechanism that automatically stops training are added to the training process; an initial learning rate is set and used as the current learning rate $L_r$, and the constructed convolutional neural network model is trained at the current learning rate; if none of the accuracy values obtained in the M consecutive training epochs after the highest accuracy value $P_{\max}$ exceeds $P_{\max}$, the current learning rate is reduced to obtain a new learning rate:

$$L'_r = L_r \times C$$

wherein: $L'_r$ is the new learning rate, $C$ is the reduction parameter, and $P_{\max}$ is the highest accuracy value under the current learning rate;

the new learning rate is taken as the current learning rate and training continues; whenever the same condition is met, the current learning rate is reduced again;

if the accuracy values obtained after N consecutive learning-rate reductions still show no improvement, training is terminated; the convolutional neural network model corresponding to the highest accuracy value is taken as the trained convolutional neural network model;
and step four, inputting the garbage image to be identified into the convolutional neural network model trained in the step three, and obtaining a garbage classification result output by the convolutional neural network model.
In the training process of this embodiment, a mechanism that monitors and saves the model with the highest accuracy value is added. After the 1st training epoch, the accuracy value obtained is the highest so far, so the model obtained in the 1st epoch is saved automatically. After the 2nd epoch, if its accuracy value is higher than that of the 1st epoch, the model obtained in the 2nd epoch is saved automatically; otherwise, it is not saved. By analogy, as training continues, the model with the highest accuracy value is always the one retained.
With N = 25, the condition "the accuracy values obtained after N consecutive learning-rate reductions show no improvement" means the following: if the current learning rate is $n_0$, it becomes $n_1$ after 1 reduction, $n_2$ after 2 reductions, ..., and $n_N$ after N reductions; if none of the accuracy values corresponding to $n_1, n_2, \ldots, n_N$ is higher than the accuracy value corresponding to $n_0$, training is terminated, and the convolutional neural network model corresponding to the highest accuracy value in the whole training process is saved as the trained model.
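For illustration only, these three mechanisms map naturally onto standard Keras callbacks; the following is a minimal sketch assuming a Keras training loop. The file name, the monitored quantity `val_acc`, and the epoch-based patience values are our own assumptions; in particular, Keras's EarlyStopping counts unproductive epochs rather than unproductive learning-rate reductions as the patent's N does.

```python
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

callbacks = [
    # Mechanism 1: automatically save the model whenever the monitored
    # accuracy reaches a new best value.
    ModelCheckpoint('best_model.h5', monitor='val_acc', save_best_only=True),
    # Mechanism 2: multiply the learning rate by C = 0.5 after M = 8
    # epochs without improvement over the best accuracy so far.
    ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=8, verbose=1),
    # Mechanism 3: stop training when improvement stalls; the patent counts
    # N = 25 unproductive learning-rate reductions, approximated here by an
    # epoch-based patience.
    EarlyStopping(monitor='val_acc', patience=25),
]

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=300, callbacks=callbacks)
```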
The convolutional neural network model finally adopted by the invention is obtained by improving the initial model shown in fig. 2. The specific improvements are: the Flatten layer in fig. 2 is replaced by a global average pooling layer (GAP), and the ReLU function used in the model of fig. 2 is replaced by the Leaky ReLU function; that is, the activation function adopted by the convolutional neural network model of the invention is Leaky ReLU, whose mathematical expression is:

$$y_i = \begin{cases} x_i, & x_i \ge 0 \\ \dfrac{x_i}{a}, & x_i < 0 \end{cases}$$

wherein: $x_i$ and $y_i$ are the input and output variables of the Leaky ReLU function, and $a$ takes the value 10.
Replacing the ReLU function with the Leaky ReLU function avoids the problem of neurons that stop learning ("dying" neurons) under the ReLU function. GAP computes the average pixel value of each input feature map and combines these averages (feature points) into a feature vector, so each feature map contributes exactly one feature point. The Flatten layer, by contrast, produces a final vector whose length is the size of the input feature map multiplied by the number of input feature maps (the channel count). Fig. 3 illustrates the difference between GAP and Flatten, where H is the height of the feature map, W its width, and C the number of input feature maps. As fig. 3 makes clear, global average pooling summarizes the spatial information, so it is more robust to spatial variations of the input and reduces the complexity of the model.
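The difference between Flatten and GAP, and the Leaky ReLU variant, can be illustrated with a few lines of NumPy (a sketch written for this description, not code from the patent):

```python
import numpy as np

fmap = np.random.rand(7, 7, 128)      # an H x W x C stack of feature maps

flat = fmap.reshape(-1)               # Flatten: H*W*C = 6272 values
gap = fmap.mean(axis=(0, 1))          # GAP: one average per channel, 128 values
print(flat.shape, gap.shape)          # (6272,) (128,)

def leaky_relu(x, a=10.0):
    # slope 1/a on the negative side; the patent takes a = 10
    return np.where(x >= 0, x, x / a)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.2  3. ]
```

Because the dense output layer then receives 128 features instead of 6272, its weight count shrinks accordingly, which is where the parameter reduction reported later in Table 1 comes from.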
The feature vector produced by the GAP layer is input into a densely connected (Dense) output layer whose activation function is Softmax, and the corresponding probability values are output. In this process, Softmax maps the input feature values to probability values, and the label with the largest output probability is the predicted category label. When n feature points are input, the output $y_k$ of the k-th feature point can be expressed as:

$$y_k = \frac{e^{x_k}}{\sum_{i=1}^{n} e^{x_i}}$$
the invention simultaneously analyzes the complexity of the model and the extraction capability of the model to the image characteristics, and provides a light CNN network model with high image characteristic extraction capability. First, the network structure adopted by the model can be divided into a convolutional layer part for extracting image features and a dense connection layer part for classification. The partial convolutional layer for extracting image features mirrors the network infrastructure (5 partial convolutional layers) of VGG 16. On the basis, a convolution layer of four parts is added, and 9 parts in total are used for extracting the attribute features of the target image. Convolutional layers use a combination of depth separable convolution and convolution. Meanwhile, the convolution kernel of each convolution is alternately used by 3 multiplied by 3 and 1 multiplied by 1, so that the parameter quantity calculated by the model in the training process is greatly reduced, the complexity of the model is reduced, and better image feature extraction capability can be obtained. Secondly, the invention adopts a Batch Normalization (BN) method provided in the literature (Ioffe S, Szegedy C. batch normalization: adaptive deep network training by reducing internal covariance shift [ J ]. arXiv preprinting arXiv:1502.03167,2015.), and the method is not only beneficial to avoiding the gradient disappearance problem possibly caused by deepening of the model layer number, but also improves the generalization capability of the model, so that the model has better robustness. Finally, the invention improves the initial model again on the basis of the above. On one hand, starting from the activation function of the convolutional layer, a conventional modified linear unit function (ReLU) is improved into a variant leakage modified linear unit function (leakage ReLU), and the problem that when the input value is negative, the neuron learns slowly or not is solved. On the other hand, the invention also improves the Flatten layer in the initial model into a global average Pooling layer (GAP), reduces the complexity of the model, and can inhibit the overfitting problem caused by the deepening of the layer number.
Embodiment two: this embodiment differs from embodiment one in the following: in step one, data enhancement is performed on the acquired images to obtain a data-enhanced image data set. The specific process is:
performing data enhancement on the acquired images in the following way: setting the random rotation angle range of the images to 0-20 degrees, setting the shift coefficient of the images in the horizontal and vertical directions to 0.2, and filling with nearest-neighbor values;
and after the data enhancement is finished, obtaining the image data set after the data enhancement.
The TrashNet data set used in the invention contains only 2527 garbage images in 6 categories (cardboard, glass, metal, paper, plastic and other trash), all RGB images. Specifically: 403 cardboard, 501 glass, 410 metal, 594 paper, 482 plastic and 137 other trash.
Because the TrashNet data set contains relatively little data, it is augmented with the data generator in Keras. A shift coefficient of 0.2 in the horizontal and vertical directions means a ratio relative to the total width and total height of the image.
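A minimal sketch of this augmentation with the Keras data generator, using the parameters stated above (the directory path and batch size are illustrative assumptions; the rescale argument folds in the 1/255 scaling of step two):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,         # random rotation in 0-20 degrees
    width_shift_range=0.2,     # horizontal shift as a fraction of total width
    height_shift_range=0.2,    # vertical shift as a fraction of total height
    fill_mode='nearest',       # fill newly exposed pixels with their neighbors
    rescale=1. / 255)          # the 1/255 scaling of step two

train_generator = datagen.flow_from_directory(
    'trashnet/train',          # hypothetical layout: one sub-folder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')
```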
Embodiment three: this embodiment differs from embodiment one in the following: in step two, the data-enhanced image data set is preprocessed to obtain the preprocessed image data set. The specific process is:
the size of each image in the data enhanced image dataset is uniformly scaled to a size of (224 ), and each value in the generated matrix is multiplied by 1/255 so that each value is between 0 and 1, and the pre-processed image dataset is obtained.
Data with relatively large values or heterogeneous data (for example, one feature taking values in the range 0-1 and another in the range 100-200) should not be fed into a neural network directly, since this can trigger large gradient updates and prevent the network from converging. To make network learning easier, the input data should have the following characteristics: (1) small values, mostly within the 0-1 range; (2) homogeneity, i.e., all features should take values in approximately the same range. Therefore, the data images are preprocessed before the experiments begin.
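Equivalently, a single image can be preprocessed by hand; a sketch assuming Pillow is available:

```python
import numpy as np
from PIL import Image

def preprocess(path):
    # Scale the image to 224 x 224 and map pixel values into [0, 1]
    img = Image.open(path).convert('RGB').resize((224, 224))
    return np.asarray(img, dtype=np.float32) / 255.0
```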
Embodiment four: this embodiment differs from embodiment one in the following: the first sub-convolution unit comprises, in order from the input end, a first convolution layer, a first depth separable convolution layer and a first maximum pooling layer;
the second sub-convolution unit comprises a second convolution layer, a second depth separable convolution layer and a second maximum pooling layer in sequence from the input end;
the third sub-convolution unit comprises a third convolution layer, a fourth convolution layer, a third depth separable convolution layer and a third maximum pooling layer in sequence from the input end;
the fourth sub-convolution unit comprises a fifth convolution layer, a sixth convolution layer, a fourth depth separable convolution layer and a fourth maximum pooling layer in sequence from the input end;
the fifth sub-convolution unit includes, in order from the input end, a seventh convolution layer, an eighth convolution layer, a fifth depth-separable convolution layer, and a fifth maximum pooling layer;
the sixth sub-convolution unit includes, in order from the input end, a ninth convolution layer and a sixth depth-separable convolution layer;
the seventh sub-convolution unit includes, in order from the input end, a tenth convolution layer and a seventh depth-separable convolution layer;
the eighth sub-convolution unit includes, in order from the input end, an eleventh convolution layer and an eighth depth-separable convolution layer;
the ninth sub-convolution unit includes, in order from the input end, a twelfth convolution layer, a ninth depth-separable convolution layer, and a thirteenth convolution layer;
in the first sub-convolution unit and the second sub-convolution unit, the first convolution layer and the second convolution layer each employ a convolution kernel of 1 × 1 size, and the first depth-separable convolution layer and the second depth-separable convolution layer each employ a convolution kernel of 3 × 3 size;
in a third sub-convolution unit, the third convolution layer employs convolution kernels of size 1 × 1, the fourth convolution layer employs convolution kernels of size 3 × 3, and the third depth separable convolution layer employs convolution kernels of size 3 × 3;
in the fourth sub-convolution unit, the fifth convolution layer employs convolution kernels of size 1 × 1, the sixth convolution layer employs convolution kernels of size 3 × 3, and the fourth depth separable convolution layer employs convolution kernels of size 3 × 3;
in the fifth sub-convolution unit, the seventh convolution layer employs convolution kernels of size 1 × 1, the eighth convolution layer employs convolution kernels of size 3 × 3, and the fifth depth separable convolution layer employs convolution kernels of size 3 × 3;
in the sixth sub-convolution unit, the ninth convolution layer employs convolution kernels of size 1 × 1, and the sixth depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the seventh sub-convolution unit, the tenth convolution layer employs convolution kernels of size 1 × 1, and the seventh depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the eighth sub-convolution unit, the eleventh convolution layer employs convolution kernels of size 1 × 1, and the eighth depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the ninth sub-convolution unit, the twelfth convolution layer and the thirteenth convolution layer each employ a convolution kernel of 1 × 1 size, and the ninth depth separable convolution layer employs a convolution kernel of 3 × 3 size.
In the present invention, the convolution layers other than the depth-separable convolution layer are all general convolution layers.
Thus, using 1 × 1 convolution kernels for a large number of convolutions significantly reduces the complexity of the model. The number of output channels of each convolutional layer is set according to the order of the sub-convolution units, using powers of 2: the output channel numbers of the convolution layers (both ordinary and depthwise separable) in the first to ninth sub-convolution units are 32, 64, 128, 256, 256, 256, 128, 128 and 128, respectively.
Down-sampling is achieved by the max pooling layers (MaxPooling) in the first to fifth sub-convolution units. The pooling stride is set to 2, i.e., after each pooling the output feature map is half the size of its input. After 5 down-sampling steps, a 7 × 7 feature map is output.
Fig. 4 compares a depthwise separable convolution layer with an ordinary convolution layer. It can be seen that when the number of input channels is 3, the convolution kernel size is 5 × 5 and the number of output channels is 32, the ordinary convolution layer must compute 2400 parameters, while the depthwise separable convolution needs only 171, a reduction of nearly 92.88%. Using depthwise separable convolution therefore significantly reduces the complexity of the model.
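The parameter counts in fig. 4 follow from simple arithmetic (bias terms omitted); a quick check:

```python
k, c_in, c_out = 5, 3, 32              # kernel size, input/output channels

standard = k * k * c_in * c_out        # ordinary convolution: 2400
depthwise = k * k * c_in               # one k x k filter per input channel: 75
pointwise = c_in * c_out               # 1 x 1 cross-channel combination: 96
separable = depthwise + pointwise      # 171

print(standard, separable)             # 2400 171
print(1 - separable / standard)        # 0.92875, i.e. ~92.88% fewer parameters
```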
The invention adds a BN layer after each convolution (both ordinary and depthwise separable convolution layers), which effectively reduces the overfitting caused by the small data set. In addition, L2 regularization with coefficient 0.0005 is applied to the convolutional layers, which penalizes large weight values during learning.
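To make embodiment four concrete, the following is a minimal Keras sketch of the improved model under the settings just described. Keras is the framework used in the experiments below; the helper function, the variable names, and LeakyReLU's alpha=0.1 (encoding the negative-side slope 1/a with a = 10) are our own illustrative choices, not code disclosed in the patent.

```python
from keras.models import Sequential
from keras.layers import (InputLayer, Conv2D, SeparableConv2D, BatchNormalization,
                          LeakyReLU, MaxPooling2D, GlobalAveragePooling2D, Dense)
from keras.regularizers import l2

def conv_bn(model, layer_cls, filters, kernel):
    # Every convolution (ordinary or depthwise separable) carries L2
    # regularization 0.0005 and is followed by BN and a Leaky ReLU.
    if layer_cls is Conv2D:
        reg = {'kernel_regularizer': l2(0.0005)}
    else:  # SeparableConv2D regularizes its depthwise and pointwise kernels
        reg = {'depthwise_regularizer': l2(0.0005),
               'pointwise_regularizer': l2(0.0005)}
    model.add(layer_cls(filters, kernel, padding='same', **reg))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.1))

model = Sequential()
model.add(InputLayer(input_shape=(224, 224, 3)))

# Units 1-2: 1x1 conv -> 3x3 separable conv -> max pooling (32, 64 channels)
for filters in (32, 64):
    conv_bn(model, Conv2D, filters, (1, 1))
    conv_bn(model, SeparableConv2D, filters, (3, 3))
    model.add(MaxPooling2D((2, 2)))

# Units 3-5: 1x1 conv -> 3x3 conv -> 3x3 separable conv -> max pooling
for filters in (128, 256, 256):
    conv_bn(model, Conv2D, filters, (1, 1))
    conv_bn(model, Conv2D, filters, (3, 3))
    conv_bn(model, SeparableConv2D, filters, (3, 3))
    model.add(MaxPooling2D((2, 2)))   # five poolings in total: 224 -> 7

# Units 6-8: 1x1 conv -> 3x3 separable conv, no pooling (256, 128, 128 channels)
for filters in (256, 128, 128):
    conv_bn(model, Conv2D, filters, (1, 1))
    conv_bn(model, SeparableConv2D, filters, (3, 3))

# Unit 9: 1x1 conv -> 3x3 separable conv -> 1x1 conv (128 channels)
conv_bn(model, Conv2D, 128, (1, 1))
conv_bn(model, SeparableConv2D, 128, (3, 3))
conv_bn(model, Conv2D, 128, (1, 1))

# Connection layer and densely connected output layer
model.add(GlobalAveragePooling2D())
model.add(Dense(6, activation='softmax'))   # the 6 TrashNet categories
```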
Embodiment five: this embodiment differs from embodiment one in the following: the initial learning rate is 0.001, the value of M is 8, the value of C is 0.5, and the value of N is 25.
Embodiment six: this embodiment differs from embodiment one in the following: the optimization method adopted by the convolutional neural network model is stochastic gradient descent with a momentum coefficient, and the momentum coefficient takes the value 0.9.
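Combining embodiments five and six, the corresponding Keras compilation step might look as follows (a sketch, not code from the patent; older Keras versions spell the learning-rate argument `lr`, newer ones `learning_rate`):

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(lr=0.001, momentum=0.9),  # initial LR 0.001, momentum 0.9
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```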
Embodiment seven: this embodiment differs from embodiment one in the following: the activation function of the densely connected output layer is Softmax.
Embodiment eight: this embodiment differs from embodiment one in the following: the loss function adopted by the convolutional neural network model is the categorical cross-entropy loss function, whose expression is:

$$L = -\sum_{k=1}^{n} t_k \log y_k$$

wherein: log is the logarithm with base e (the natural constant); $y_k$ is the probability value output by Softmax for the k-th feature point; n is the number of feature points; and $t_k$ is the correct (one-hot) label: $t_k$ is 1 when k is the correct category and 0 otherwise.

The larger the value of $y_k$ for the correct category, the smaller the loss value L; conversely, the smaller $y_k$, the larger the loss value.
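As a numeric illustration of this behavior (our own sketch, not patent code):

```python
import numpy as np

def categorical_cross_entropy(y, t, eps=1e-12):
    # y: Softmax probabilities; t: one-hot correct labels
    return -np.sum(t * np.log(y + eps))

t = np.array([1.0, 0.0, 0.0])
print(categorical_cross_entropy(np.array([0.9, 0.05, 0.05]), t))  # ~0.105, confident and correct
print(categorical_cross_entropy(np.array([0.2, 0.5, 0.3]), t))    # ~1.609, wrong class favored
```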
Experimental preparation
A. Experimental environment
(1) Operating system: Windows 10
(2) Hardware configuration: Intel(R) Pentium(R) CPU G4600 @ 3.60 GHz; 8.00 GB memory; NVIDIA GeForce GTX 1050 Ti GPU
(3) Programming language: Python 3.5.2
(4) Deep learning framework: Keras 2.1.3, TensorFlow-GPU 1.4.0
B. Evaluation indexes used in the experiments
(1) Precision: the ratio of the number of correctly predicted positive samples (TP) to the total number of samples predicted as positive. It can be expressed as:

$$\text{Precision} = \frac{TP}{TP + FP}$$

(2) Recall: the ratio of the number of correctly predicted positive samples to the total number of actual positive samples. It can be expressed as:

$$\text{Recall} = \frac{TP}{TP + FN}$$

(3) F1 Score (F1-Score, F1): F1 is an index calculated from Precision (P) and Recall (R). It can be expressed as:

$$F1 = \frac{2 \times P \times R}{P + R}$$
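A sketch of how these three indexes are computed from a confusion matrix such as those in figs. 5 and 6 (the matrix values are a hypothetical illustration):

```python
import numpy as np

def per_class_metrics(cm):
    # cm[i, j]: number of samples of true class i predicted as class j
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as this class but wrong
    fn = cm.sum(axis=1) - tp          # of this class but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = np.array([[50, 2, 1],
               [4, 45, 3],
               [2, 3, 48]])           # hypothetical 3-class confusion matrix
precision, recall, f1 = per_class_metrics(cm)
print(precision.mean(), recall.mean(), f1.mean())
```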
(4) class Activation Map (CAM): class activation thermodynamic diagrams are two-dimensional networks of scores associated with particular output classes, computed for each location of any input image, which represent how important each location is to the class. The index is realized by a Gradient-weighted Classaging Mapping (Grad-CAM) method proposed in 2017 by Rampraaath R.Selvaarju et al (Selvaarju R, Cogswell M, DasA, et al.Grad-CAM: Visual extensions from network views Visual-based analysis [ C ]// Proceedings of the IEEE International Conference on computer Vision.2017: 618-626).
Discussion of experiments and results
Comparing performance of initial and improved models
The initial model uses the ReLU activation function, while the improved model (i.e., the model finally adopted by the invention) uses the Leaky ReLU function; the connection layer of the initial model is Flatten, while that of the improved model is a global average pooling layer (GAP). The remaining parts of the initial model have the same structure as the improved model.
(1) Number of model parameters
The complexity of the model is analyzed in terms of its number of parameters, obtained with the model-parameter statistics function of the Keras framework. Table 1 compares the initial and improved models on parameter count. As Table 1 shows, the improved model has about 2% fewer parameters than the unmodified initial model; that is, replacing the Flatten layer with a GAP layer reduces the complexity of the model.
TABLE 1 comparison of the quantities of parameters of the initial and improved models
[Table 1 appears as an image in the original publication.]
(2) Accuracy and loss value
The invention improves the activation function used by the convolutional layers of the initial model, replacing the original ReLU function with the Leaky ReLU function. The accuracy and loss values of the initial and improved models are compared in Table 2. As Table 2 shows, the accuracy of the improved model increased by 1.97% and its loss value decreased by 0.14.
TABLE 2 table of comparison results of loss of precision values of initial model and improved model
[Table 2 appears as an image in the original publication.]
(3) Other evaluation indexes
Next, the initial and improved models are compared on the other evaluation indexes (Precision, Recall, F1-Score). Fig. 5 shows the confusion matrix obtained for the initial model, and fig. 6 that for the improved model. As figs. 5 and 6 show, the initial model correctly identified the categories of 202 images, a recognition rate of 86.70%, while the improved model correctly identified 210 images, a recognition rate of 90.13%, an improvement of 3.43% over the initial model.
Table 3 compares the initial and improved models on the other evaluation indexes. As Table 3 shows, the improved model improved by 5.4% on the average Precision, by 5% on the average Recall, and by 5.5% on the average F1-Score.
TABLE 3 evaluation index comparison results of initial model and improved model
[Table 3 appears as an image in the original publication.]
Figs. 7 and 8 show the heat maps predicted for an image of the glass category by the initial and improved models, respectively. As they show, the initial model extracts only part of the thermal features of the glass, while the improved model extracts them almost completely. This demonstrates that the improved model extracts image features better than the initial model.
Therefore, replacing the activation function with the Leaky ReLU function and replacing the Flatten layer with a GAP layer both reduce the complexity of the improved model and improve its feature extraction capability.
Selection of optimization method
Different optimization methods used to train the model lead to different final performance. To obtain the best trained model, the improved model is trained with three optimization methods: Adam, SGD + Momentum and RMSprop.
The accuracy/loss comparison table (Table 4) and the accuracy curves during training under these three optimization methods (figs. 9, 10 and 11) are plotted. From Table 4, in terms of accuracy, the SGD + Momentum optimization method is 0.87% higher than RMSprop and 0.22% higher than Adam; its accuracy is thus higher than that of the other two optimization methods.
TABLE 4 comparison of precision loss results using different optimization methods
[Table 4 appears as an image in the original publication.]
The three optimization methods are then assessed on the other evaluation indexes. Figs. 12 to 14 show the confusion matrices output after evaluating the models trained with the three optimization methods. As they show, with RMSprop the categories of 210 images are correctly identified, a recognition rate of 90.13%; with Adam, 207 images, a recognition rate of 88.84%; and with SGD + Momentum, 212 images, the highest recognition rate at 90.99%.
As Table 5 shows, the model trained with SGD + Momentum is 0.012 higher in Precision, 0.008 higher in Recall and 0.012 higher in F1-Score than the model trained with RMSprop; and 0.04 higher in Precision, 0.016 higher in Recall and 0.035 higher in F1-Score than the model trained with Adam.
TABLE 5 several evaluation index data result tables obtained by training models using different optimization methods
[Table 5 appears as an image in the original publication.]
The above experiments and analysis show that, in terms of the classification performance finally achieved by the improved model, the SGD + Momentum optimization method performs best. The invention therefore adopts the SGD + Momentum optimization method.
Comparison with other methods
(1) Complexity comparison of different models
For the complexity comparison, the present invention reproduces the fine-tuned AlexNet, VGG16, VGG19, MobileNet, ResNet-50, Inception-ResNet, DenseNet121 and DenseNet169 networks from the following literature: (Kennedy T. OscarNet: using transfer learning to classify disposable waste [J]. CS230 Report: Deep Learning. Stanford University, CA, Winter, 2018.), (Rabano S L, Cabatuan M K, Sybingco E, et al. Common Garbage Classification Using MobileNet [C]//2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM). IEEE, 1-4.), (Costa B S, Bernardes A C S, Pereira J V A, et al. Artificial Intelligence in Automated Sorting in Trash Recycling [C]//Anais do XV Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2018: 198-205.), (Adedeji O, Wang Z. Intelligent Waste Classification System Using Deep Learning Convolutional Neural Network [J]. Procedia Manufacturing, 2019, 35: 607-612.), (Ruiz V, Sánchez Á, Vélez J F, et al. Automatic Image-Based Waste Classification [M]. Springer, Cham, 2019.) and (Aral R A, Keskin Ş R, Kaya M, et al. Classification of TrashNet Dataset Based on Deep Learning Models [C]//2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018: 2058-2062.). These reproduced networks are compared in complexity with the improved model of the invention.
Using the model-parameter statistics function of the Keras framework, the total parameter counts of the improved model and of the comparison methods are output, as shown in Table 6; FIG. 15 visualizes the parameter comparison. Compared with the fine-tuned VGG19, VGG16, AlexNet and Inception-ResNet networks, the improved model of the invention reduces the parameter count by 98.84%, 98.8%, 97.76% and 97.03%, respectively; compared with the fine-tuned ResNet50 and Inception networks, by 93.16% and 92.6%; compared with the fine-tuned DenseNet121 and DenseNet169 networks, by 77.09% and 87.24%; and compared with the MobileNet network, by 50.11%.
TABLE 6 comparison of model parameters
[Table 6 appears as an image in the original publication.]
The above experiments and analysis show that the improved model of the invention has lower complexity than the other methods. Its feature extraction capability is verified next.
(2) Comparison of classification effects of different models
The comparison is made from the angle of the model accuracy index. In this experiment, the improved model is compared with the methods in the literature: (Yang M, Thung G. Classification of trash for recyclability status [J]. CS229 Project Report, 2016: 1-6.), (Kennedy T. OscarNet: using transfer learning to classify disposable waste [J]. CS230 Report: Deep Learning. Stanford University, CA, Winter, 2018.), (Rabano S L, Cabatuan M K, Sybingco E, et al. Common Garbage Classification Using MobileNet [C]//2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM). IEEE, 1-4.), (Satvilkar M. Image Based Trash Classification using Machine Learning Algorithms for Recyclability Status [J]. Image, 2018, 13: 08.), (Costa B S, Bernardes A C S, Pereira J V A, et al. Artificial Intelligence in Automated Sorting in Trash Recycling [C]//Anais do XV Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2018: 198-205.), (Aral R A, Keskin Ş R, Kaya M, et al. Classification of TrashNet Dataset Based on Deep Learning Models [C]//2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018: 2058-2062.), (Ruiz V, Sánchez Á, Vélez J F, et al. Automatic Image-Based Waste Classification [M]. Springer, Cham, 2019.) and (Adedeji O, Wang Z. Intelligent Waste Classification System Using Deep Learning Convolutional Neural Network [J]. Procedia Manufacturing, 2019, 35: 607-612.). The compared methods include the improved SVM, CNN, XGB and KNN methods and the fine-tuned VGG16, VGG19, MobileNet, AlexNet, ResNet18, Inception-ResNet, DenseNet121 and DenseNet169 networks. The comparative analysis is based on the data obtained from the experiments.
Table 7 compares the improved model of the invention with the other methods on model accuracy. The analysis shows that the fine-tuned VGG16 network of the literature (Costa B S, Bernardes A C S, Pereira J V A, et al. Artificial Intelligence in Automated Sorting in Trash Recycling) achieved an accuracy of 93%, comparable to that of the improved model of the invention, and the fine-tuned DenseNet networks of the literature (Aral R A, Keskin Ş R, Kaya M, et al. Classification of TrashNet Dataset Based on Deep Learning Models [C]//2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018: 2058-2062.) achieved a still higher accuracy. However, from the earlier analysis of model complexity, the improved model of the invention has 98.8% fewer parameters than the fine-tuned VGG16, and 77.09% and 87.24% fewer parameters than the fine-tuned DenseNet121 and DenseNet169 networks, respectively, so its complexity is lower. The method of the invention can therefore combine low model complexity with high accuracy.
Next, the experiments and result analysis continue with the three indexes Precision, Recall and F1-Score.
TABLE 7 comparison of model accuracy for each method
[Table 7 appears as an image in the original publication.]
Comparison with the other methods was made on the three indexes Precision, Recall and F1-Score. On the per-class average of the Precision index, the Precision of the improved model is higher than that of the other methods. On the Recall index, the Recall values of most classes of the improved model are higher than those of the other methods, and the per-class average Recall of the improved model is also higher. On the F1-Score index, the per-class average F1-Score of the improved model is higher than that of the other methods. The improved model thus classifies more evenly across classes, avoiding markedly better or worse performance on any particular class.
In conclusion, the complexity of the proposed model and its classification effect on target images were evaluated in terms of parameter count, model accuracy and the other indexes. The experimental data show that, compared with the other methods, the improved model has lower model complexity and a better classification effect, addressing the problem that most current models cannot combine low model complexity with high classification accuracy.
The above examples merely explain the computational model and workflow of the invention in detail and do not limit its embodiments. Other variations and modifications can be made by those skilled in the art on the basis of the above description; the embodiments are not exhaustive, and all modifications and variations that are obvious therefrom are considered to fall within the scope of the invention.

Claims (9)

1. A garbage classification method based on a lightweight convolutional neural network is characterized by comprising the following steps:
step one, acquiring an original garbage image data set, and performing data enhancement on the acquired images to obtain a data-enhanced image data set;
step two, preprocessing the data-enhanced image data set to obtain a preprocessed image data set;
step three, constructing a convolutional neural network model, and training the constructed convolutional neural network on the preprocessed image data set to obtain a trained convolutional neural network model;
the structure of the convolutional neural network model is as follows: starting from the input end of the convolutional neural network model, the convolutional neural network model sequentially comprises an input layer, a first sub convolution unit, a second sub convolution unit, a third sub convolution unit, a fourth sub convolution unit, a fifth sub convolution unit, a sixth sub convolution unit, a seventh sub convolution unit, an eighth sub convolution unit, a ninth sub convolution unit, a global average pooling layer and a dense connection output layer;
a mechanism that automatically saves the model with the highest accuracy value, a mechanism that automatically reduces the learning rate, and a mechanism that automatically stops training are added to the training process; an initial learning rate is set and used as the current learning rate $L_r$, and the constructed convolutional neural network model is trained at the current learning rate; if none of the accuracy values obtained in the M consecutive training epochs after the highest accuracy value $P_{\max}$ exceeds $P_{\max}$, the current learning rate is reduced to obtain a new learning rate:

$$L'_r = L_r \times C$$

wherein: $L'_r$ is the new learning rate and $C$ is the reduction parameter;

the new learning rate is taken as the current learning rate and training continues; whenever the same condition is met, the current learning rate is reduced again;

if the accuracy values obtained after N consecutive learning-rate reductions still show no improvement, training is terminated; the convolutional neural network model corresponding to the highest accuracy value is taken as the trained convolutional neural network model;
and step four, inputting the garbage image to be identified into the convolutional neural network model trained in the step three, and obtaining a garbage classification result output by the convolutional neural network model.
2. The method for garbage classification based on the lightweight convolutional neural network as claimed in claim 1, wherein in the first step, the acquired image is subjected to data enhancement to obtain a data-enhanced image dataset, and the specific process is as follows:
performing data enhancement on the acquired images in the following way: setting the random rotation angle range of the images to 0-20 degrees, setting the shift coefficient of the images in the horizontal and vertical directions to 0.2, and filling with nearest-neighbor values;
and after the data enhancement is finished, obtaining the image data set after the data enhancement.
3. The method for garbage classification based on the lightweight convolutional neural network as claimed in claim 1, wherein in the second step, the image data set after data enhancement is preprocessed to obtain a preprocessed image data set, and the specific process is as follows:
the size of each image in the data enhanced image dataset is uniformly scaled to a size of (224 ), and each value in the generated matrix is multiplied by 1/255 so that each value is between 0 and 1, and the pre-processed image dataset is obtained.
4. The method of claim 1, wherein the first sub-convolution unit comprises, in order from the input end, a first convolution layer, a first depth separable convolution layer, and a first maximum pooling layer;
the second sub-convolution unit comprises a second convolution layer, a second depth separable convolution layer and a second maximum pooling layer in sequence from the input end;
the third sub-convolution unit comprises a third convolution layer, a fourth convolution layer, a third depth separable convolution layer and a third maximum pooling layer in sequence from the input end;
the fourth sub-convolution unit comprises a fifth convolution layer, a sixth convolution layer, a fourth depth separable convolution layer and a fourth maximum pooling layer in sequence from the input end;
the fifth sub-convolution unit includes, in order from the input end, a seventh convolution layer, an eighth convolution layer, a fifth depth-separable convolution layer, and a fifth maximum pooling layer;
the sixth sub-convolution unit includes, in order from the input end, a ninth convolution layer and a sixth depth-separable convolution layer;
the seventh sub-convolution unit includes, in order from the input end, a tenth convolution layer and a seventh depth-separable convolution layer;
the eighth sub-convolution unit includes, in order from the input end, an eleventh convolution layer and an eighth depth-separable convolution layer;
the ninth sub-convolution unit includes, in order from the input end, a twelfth convolution layer, a ninth depth-separable convolution layer, and a thirteenth convolution layer;
in the first sub-convolution unit and the second sub-convolution unit, the first convolution layer and the second convolution layer each employ a convolution kernel of 1 × 1 size, and the first depth-separable convolution layer and the second depth-separable convolution layer each employ a convolution kernel of 3 × 3 size;
in a third sub-convolution unit, the third convolution layer employs convolution kernels of size 1 × 1, the fourth convolution layer employs convolution kernels of size 3 × 3, and the third depth separable convolution layer employs convolution kernels of size 3 × 3;
in the fourth sub-convolution unit, the fifth convolution layer employs convolution kernels of size 1 × 1, the sixth convolution layer employs convolution kernels of size 3 × 3, and the fourth depth separable convolution layer employs convolution kernels of size 3 × 3;
in the fifth sub-convolution unit, the seventh convolution layer employs convolution kernels of size 1 × 1, the eighth convolution layer employs convolution kernels of size 3 × 3, and the fifth depth separable convolution layer employs convolution kernels of size 3 × 3;
in the sixth sub-convolution unit, the ninth convolution layer employs convolution kernels of size 1 × 1, and the sixth depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the seventh sub-convolution unit, the tenth convolution layer employs convolution kernels of size 1 × 1, and the seventh depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the eighth sub-convolution unit, the eleventh convolution layer employs convolution kernels of size 1 × 1, and the eighth depth-separable convolution layer employs convolution kernels of size 3 × 3;
in the ninth sub-convolution unit, the twelfth convolution layer and the thirteenth convolution layer each employ a convolution kernel of 1 × 1 size, and the ninth depth separable convolution layer employs a convolution kernel of 3 × 3 size.
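To make the layer ordering of claim 4 concrete, here is a hedged sketch of two representative sub-convolution units in Keras. The claims fix only the kernel sizes and the layer order; the filter counts below are assumptions, and the Leaky ReLU slope 0.1 corresponds to a = 10 in claim 6.

```python
from tensorflow.keras import layers

def lrelu(x):
    # Leaky ReLU with negative slope 1/a = 0.1 (a = 10, claim 6).
    return layers.LeakyReLU(0.1)(x)

def first_sub_conv_unit(x, filters=32):
    x = lrelu(layers.Conv2D(filters, 1)(x))                           # first convolution layer, 1x1
    x = lrelu(layers.SeparableConv2D(filters, 3, padding="same")(x))  # first depth-separable layer, 3x3
    return layers.MaxPooling2D(2)(x)                                  # first maximum pooling layer

def third_sub_conv_unit(x, filters=64):
    x = lrelu(layers.Conv2D(filters, 1)(x))                           # third convolution layer, 1x1
    x = lrelu(layers.Conv2D(filters, 3, padding="same")(x))           # fourth convolution layer, 3x3
    x = lrelu(layers.SeparableConv2D(filters, 3, padding="same")(x))  # third depth-separable layer, 3x3
    return layers.MaxPooling2D(2)(x)                                  # third maximum pooling layer
```

The remaining units follow the same pattern: units one through five end in max pooling, units six through eight drop the pooling layer, and the ninth unit appends a final 1×1 convolution after its depth-separable layer.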
5. The method of claim 1, wherein the initial learning rate is 0.001, the value of M is 8, the value of C is 0.5, and the value of N is 25.
6. The method for garbage classification based on the lightweight convolutional neural network as claimed in claim 1, wherein the convolutional neural network model adopts the Leaky ReLU activation function, whose mathematical expression is:
$$y_i = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a, & x_i < 0 \end{cases}$$
wherein: x_i and y_i are the input and output variables of the Leaky ReLU function, and a takes the value 10.
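A two-line numeric check of this expression; the division-by-a form is the original Leaky ReLU formulation, reconstructed here from the variables named in the claim:

```python
def leaky_relu(x, a=10.0):
    # y = x for x >= 0, y = x / a otherwise (claim 6, a = 10).
    return x if x >= 0 else x / a

assert leaky_relu(5.0) == 5.0
assert leaky_relu(-5.0) == -0.5   # -5 / 10
```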
7. The method of claim 1, wherein the convolutional neural network model adopts a stochastic gradient descent optimization method with momentum coefficients added, and the momentum coefficients take a value of 0.9.
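Claim 7 corresponds to a single optimizer line in most frameworks; the learning rate of 0.001 is taken from claim 5, and pairing it with Keras is an assumption:

```python
import tensorflow as tf

# Stochastic gradient descent with momentum coefficient 0.9 (claim 7).
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
```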
8. The method of claim 1, wherein the activation function of the dense connection output layer is Softmax.
9. The method for garbage classification based on the lightweight convolutional neural network as claimed in claim 1, wherein the loss function adopted by the convolutional neural network model is the categorical cross-entropy loss function, and the expression of the categorical cross-entropy loss L is:
$$L = -\sum_{k=1}^{n} t_k \log y_k$$
wherein: log is the logarithm with the natural constant e as its base, y_k represents the probability value output by the k-th feature neuron after Softmax, n is the number of feature points, and t_k is the correct label in one-hot form: t_k is 1 when k corresponds to the correct class and 0 otherwise.
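A worked instance of this loss (the values are illustrative only): with a one-hot target, only the correct class contributes to the sum.

```python
import numpy as np

y = np.array([0.7, 0.2, 0.1])   # Softmax outputs for n = 3 feature points
t = np.array([1.0, 0.0, 0.0])   # one-hot correct label (t_0 = 1)
loss = -np.sum(t * np.log(y))   # = -log(0.7) ~ 0.357
```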
CN201911405696.5A 2019-12-30 2019-12-30 Garbage classification method based on light convolutional neural network Active CN111126333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911405696.5A CN111126333B (en) 2019-12-30 2019-12-30 Garbage classification method based on light convolutional neural network

Publications (2)

Publication Number Publication Date
CN111126333A true CN111126333A (en) 2020-05-08
CN111126333B CN111126333B (en) 2022-07-26

Family

ID=70506334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911405696.5A Active CN111126333B (en) 2019-12-30 2019-12-30 Garbage classification method based on light convolutional neural network

Country Status (1)

Country Link
CN (1) CN111126333B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598157A (en) * 2020-05-14 2020-08-28 北京工业大学 Identity card image classification method based on VGG16 network level optimization
CN111709477A (en) * 2020-06-16 2020-09-25 浪潮集团有限公司 Method and tool for garbage classification based on improved MobileNet network
CN111832466A (en) * 2020-07-08 2020-10-27 上海东普信息科技有限公司 Violent sorting identification method, device, equipment and storage medium based on VGG network
CN111914613A (en) * 2020-05-21 2020-11-10 淮阴工学院 Multi-target tracking and facial feature information identification method
CN112115974A (en) * 2020-08-18 2020-12-22 郑州睿如信息技术有限公司 Intelligent visual detection method for classification treatment of municipal waste
CN112487938A (en) * 2020-11-26 2021-03-12 南京大学 Method for realizing garbage classification by utilizing deep learning algorithm
CN112633335A (en) * 2020-12-10 2021-04-09 长春理工大学 Garbage classification method and garbage can
CN112733936A (en) * 2021-01-08 2021-04-30 北京工业大学 Recyclable garbage classification method based on image recognition
CN112765353A (en) * 2021-01-22 2021-05-07 重庆邮电大学 Scientific research text-based biomedical subject classification method and device
CN113128521A (en) * 2021-04-30 2021-07-16 西安微电子技术研究所 Method and system for extracting features of miniaturized artificial intelligence model, computer equipment and storage medium
CN113505628A (en) * 2021-04-02 2021-10-15 上海师范大学 Target identification method based on lightweight neural network and application thereof
CN114782762A (en) * 2022-06-23 2022-07-22 南京信息工程大学 Garbage image detection method and community garbage station
CN114896307A (en) * 2022-06-30 2022-08-12 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160167233A1 (en) * 2014-12-11 2016-06-16 Xiaomi Inc. Methods and devices for cleaning garbage
CN108492297A (en) * 2017-12-25 2018-09-04 重庆理工大学 The MRI brain tumors positioning for cascading convolutional network based on depth and dividing method in tumor
CN110119662A (en) * 2018-03-29 2019-08-13 王胜春 A kind of rubbish category identification system based on deep learning
CN110569971A (en) * 2019-09-09 2019-12-13 吉林大学 convolutional neural network single-target identification method based on LeakyRelu activation function
CN110598800A (en) * 2019-09-23 2019-12-20 山东浪潮人工智能研究院有限公司 Garbage classification and identification method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN111126333B (en) Garbage classification method based on light convolutional neural network
WO2020073951A1 (en) Method and apparatus for training image recognition model, network device, and storage medium
CN111144496B (en) Garbage classification method based on hybrid convolutional neural network
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN105760821B (en) The face identification method of the grouped accumulation rarefaction representation based on nuclear space
CN103116766B (en) A kind of image classification method of encoding based on Increment Artificial Neural Network and subgraph
CN103116762B (en) A kind of image classification method based on self-modulation dictionary learning
CN111738512A (en) Short-term power load prediction method based on CNN-IPSO-GRU hybrid model
CN109063911A (en) A kind of Load aggregation body regrouping prediction method based on gating cycle unit networks
CN106485262A (en) A kind of bus load Forecasting Methodology
CN109902953A (en) A kind of classification of power customers method based on adaptive population cluster
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN101789005A (en) Image searching method based on region of interest (ROI)
CN104850890A (en) Method for adjusting parameter of convolution neural network based on example learning and Sadowsky distribution
CN113486764A (en) Pothole detection method based on improved YOLOv3
CN110428413B (en) Spodoptera frugiperda imago image detection method used under lamp-induced device
CN112016574B (en) Image classification method based on feature fusion
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN112686376A (en) Node representation method based on timing diagram neural network and incremental learning method
Zhu et al. Change detection based on the combination of improved SegNet neural network and morphology
CN105631478A (en) Plant classification method based on sparse expression dictionary learning
Gan et al. Research on the algorithm of urban waste classification and recycling based on deep learning technology
Sheng et al. An optimized prediction algorithm based on XGBoost
CN111462090A (en) Multi-scale image target detection method
CN108596118B (en) Remote sensing image classification method and system based on artificial bee colony algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant