
CN116258905A - Garbage detection method of neural network model based on deep learning

Info

Publication number: CN116258905A
Application number: CN202310234295.8A
Authority: CN
Inventors: 黄�俊; 李果; 刘�文
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-06-13

Abstract

The invention provides a garbage detection method using a deep-learning neural network model. A garbage detection model is constructed on the YOLOv4 target detection network and comprises an Input network module, a Backbone network module, a Neck network module and a Head network module. Garbage images of daily scenes are collected and input to the Input module for preprocessing; the preprocessed garbage image undergoes feature extraction in the Backbone network module to obtain feature layers; based on the feature layers, the Neck network module performs feature fusion to obtain the feature layers to be predicted; and the layers to be predicted are input to the Head module, which outputs the detection information of the garbage image. The method introduces the Encoder module of the Vision Transformer feature extraction module at the end of the YOLOv4 feature extraction network; replaces the NMS algorithm with the Soft-NMS algorithm to optimize the screening of candidate boxes; and finally adds a connection layer and a convolution layer and performs the up-sampling and down-sampling operations again. The feature layer of the backbone feature extraction network that carries more small-target feature information is output to the connection layer for feature fusion, increasing the number of output detection layers from 3 to 4. The method achieves a good detection effect in recyclable garbage detection, where garbage objects are easily deformed, stacked and highly transparent.

Description

Garbage detection method of neural network model based on deep learning

Technical Field

The invention belongs to the technical field of garbage detection, and mainly relates to a garbage detection method based on a neural network model of deep learning.

Background

In recent years, the combination of garbage sorting devices and computer vision technology has gradually replaced conventional manual sorting and become the mainstream garbage sorting method. Objects such as pop cans, wine bottles and plastic bottles are made of different materials but look alike, so classification errors occur easily during detection. In real garbage recycling scenes, garbage objects are easily deformed and occluded, and highly transparent objects are often confused with the background, which makes it harder to extract the features of the target objects.

In recent years, target detection algorithms based on deep learning have become the mainstream. In one approach, a deep fully convolutional network is used to detect garbage in images: the network is trained on the Garbage In Images (GINI) dataset, detects images and roughly segments the garbage regions, and its structure is finally optimized for deployment on mobile devices. In another, a MobileNet network is pre-trained on the ImageNet dataset and then trained on a dataset composed of glass, paper, plastic, metal and other garbage. The CompostNet network is used to identify and classify kitchen waste.

Although garbage detection algorithms based on deep learning have achieved good results, the following problems remain: garbage objects made of the same material vary widely in structure and external dimensions, are easily deformed and occluded, and are often confused with the background when they are highly transparent. Because target detection algorithms based on deep learning depend on feature extraction, these conditions reduce the accuracy of garbage object detection and readily lead to missed and false detections.

Disclosure of Invention

In order to solve the technical problems, the invention provides a garbage detection method based on a neural network model of deep learning, which can accurately identify object information in an acquired picture and achieve the effect of rapidly detecting household garbage.

The method comprises the following steps:

constructing a garbage detection model based on the YOLOv4 target detection network, wherein the garbage detection model comprises an Input network module, a Backbone network module, a Neck network module and a Head network module;

collecting garbage images of daily scenes;

inputting the garbage image to an Input module for preprocessing;

performing feature extraction on the preprocessed garbage image through the Backbone network module to obtain a feature layer;

based on the feature layer, carrying out feature fusion through the Neck network module to obtain a feature layer to be predicted;

and inputting the layer to be predicted into a Head module for outputting, and obtaining the detection information of the garbage image.

Optionally, the preprocessing is: carrying out data enhancement processing on the garbage image;

the data enhancement processing includes: performing random scaling, random cropping, random arrangement and Mosaic data enhancement on the garbage image;

the Mosaic data enhancement randomly splices four garbage images into one picture.
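The splicing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mosaic`, the 416-pixel canvas, the split-point range and the nearest-neighbour resize are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

def mosaic(images, out_size=416, rng=None):
    """Splice four (H, W, 3) uint8 images into one out_size x out_size picture."""
    if len(images) != 4:
        raise ValueError("mosaic needs exactly four images")
    if rng is None:
        rng = np.random.default_rng()
    # Random split point, kept away from the borders so every tile stays visible.
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    # (y0, y1, x0, x1) for top-left, top-right, bottom-left, bottom-right tiles.
    tiles = [(0, cy, 0, cx), (0, cy, cx, out_size),
             (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, tiles):
        h, w = y1 - y0, x1 - x0
        # Nearest-neighbour resize by index sampling (assumed stand-in for
        # the random scaling and cropping described in the text).
        rows = np.arange(h) * img.shape[0] // h
        cols = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[rows][:, cols]
    return canvas
```

In a real training pipeline the bounding-box labels of each source image would have to be scaled and shifted into the same tile; that bookkeeping is omitted here.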

Optionally, the Backbone network module includes an activation function and a residual block structure.

Optionally, the activation function is the Mish activation function;

the residual block structure is a stacked composition of one downsampling and multiple residual structures.
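For reference, the Mish activation is commonly defined as x · tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The scalar sketch below follows that standard definition; the helper names are chosen for this example only.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable form of ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    # Mish: smooth and non-monotonic, close to identity for large positive x
    # and close to zero for large negative x.
    return x * math.tanh(softplus(x))
```

Unlike ReLU, Mish lets small negative values pass through, which is one reason it is used in the YOLOv4 backbone.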

Optionally, the Neck network module adopts an SPP+PANet structure, two layers of transformers are added between the SPP structure and the PANet structure to enhance feature extraction, and four feature layers to be predicted are output to the Head network module.
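The effect of passing a feature map through two Transformer encoder layers can be illustrated with the NumPy sketch below. It is only a schematic: the weights are random and untrained, a single attention head is used, positional embeddings are omitted, ReLU replaces the GELU used in Vision Transformer, and all function names and layer sizes are assumptions made for the example rather than details taken from the text.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_layer(tokens, w_q, w_k, w_v, w_o, w_1, w_2):
    """One pre-norm encoder layer: self-attention, then an MLP, each with a residual."""
    x = layer_norm(tokens)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (N, N) attention matrix: every spatial position attends to all others,
    # which is how global context is mixed into the local features.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    tokens = tokens + (attn @ v) @ w_o
    hidden = np.maximum(layer_norm(tokens) @ w_1, 0.0)  # ReLU (assumption)
    return tokens + hidden @ w_2

def encode_feature_map(fmap, num_layers=2, seed=0):
    """Flatten an (H, W, C) map to H*W tokens, apply the layers, reshape back."""
    h, w, c = fmap.shape
    rng = np.random.default_rng(seed)
    tokens = fmap.reshape(h * w, c).astype(np.float64)
    shapes = [(c, c)] * 4 + [(c, 4 * c), (4 * c, c)]
    for _ in range(num_layers):
        weights = [rng.normal(0.0, 0.02, size=s) for s in shapes]
        tokens = encoder_layer(tokens, *weights)
    return tokens.reshape(h, w, c)
```

The output keeps the spatial shape of the input, so the encoded map can be handed to the PANet stage in place of the original one.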

Optionally, the Head network module carries out regression and classification on the layer to be predicted to obtain detection information of the garbage image.

Optionally, the Head network module optimizes the selection of the prediction box using a Soft-NMS algorithm.
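The idea behind Soft-NMS is to decay the scores of overlapping boxes instead of deleting them outright. The text does not state which variant is used; the sketch below assumes the Gaussian variant, and `sigma` and `score_thresh` are illustrative values, not parameters from the source.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns (box index, decayed score) in selection order."""
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining, key=lambda item: item[1])
        remaining.remove(best)
        kept.append((best[0], best[1]))
        for item in remaining:
            overlap = iou(boxes[best[0]], boxes[item[0]])
            # Higher overlap with the selected box means stronger score decay.
            item[1] *= math.exp(-(overlap ** 2) / sigma)
        # Only boxes whose score has decayed below the threshold are dropped.
        remaining = [item for item in remaining if item[1] >= score_thresh]
    return kept
```

With hard NMS at an IoU threshold of 0.5, a box overlapping the top-scoring box by about 0.68 would be deleted; here it survives with a reduced score, which is what helps with stacked objects.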

Optionally, the detection information of the garbage image includes: the size, the center position and the confidence of each object in the garbage image.
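One possible way to hold this detection information in code is a small record with the center, size and confidence, plus a helper that converts it to corner coordinates for drawing. The class and field names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # predicted category name
    confidence: float  # score in [0, 1]
    cx: float          # box center, x (pixels)
    cy: float          # box center, y (pixels)
    width: float       # box width (pixels)
    height: float      # box height (pixels)

    def corners(self):
        """Return (x1, y1, x2, y2) computed from the center and size."""
        half_w, half_h = self.width / 2, self.height / 2
        return (self.cx - half_w, self.cy - half_h,
                self.cx + half_w, self.cy + half_h)

def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence is below an (assumed) threshold."""
    return [d for d in detections if d.confidence >= threshold]
```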

Optionally, constructing the garbage detection model based on the YOLOv4 neural network structure includes:

collecting a data set of a household garbage image, marking the data set, and dividing the marked data set into a training set and a verification set;

selecting the YOLOv4 neural network;

training the YOLOv4 neural network based on the training set, verifying the trained YOLOv4 neural network based on the verification set, and using the verified YOLOv4 neural network as the garbage detection model.

Optionally, training the YOLOv4 neural network based on the training set includes: accelerating training with CUDA, optimizing the model with the SGD method, and dynamically adjusting the learning rate of model training with a cosine annealing algorithm.
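A cosine annealing schedule lowers the learning rate along half a cosine wave. The sketch below uses the initial rate of 0.01 and the 200 training rounds given in the embodiment; the minimum rate `lr_min` is not stated in the text and is set to 0 here as an assumption.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_max=0.01, lr_min=0.0):
    """Learning rate at a given epoch under cosine annealing (no restarts)."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The rate starts at `lr_max`, reaches half of it midway through training, and approaches `lr_min` at the end, so late updates become progressively smaller.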

Compared with the prior art, the invention can bring at least one of the following beneficial effects:

1. A Transformer encoder module is introduced into the YOLOv4 backbone feature extraction network to enhance feature extraction, in particular for deformed and transparent targets.

2. The NMS algorithm is replaced with the Soft-NMS algorithm to optimize the selection of prediction boxes, reducing the miss rate for garbage objects under dense occlusion and improving detection precision.

3. The detection layer is increased to 4 layers, so that the detection performance of target objects with different scales is improved, and the problem that the detection accuracy of recyclable garbage in an actual sorting scene is reduced due to the diversity of morphological scales is solved.
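To make the fourth detection layer concrete, the sketch below lists the output tensor shapes for a 416×416 input. The text does not give the strides or the number of anchors per cell; strides of 4, 8, 16 and 32 and three anchors per cell are assumptions based on common YOLO practice, while the 48 categories come from the data set described in the embodiment.

```python
def head_shapes(input_size=416, strides=(4, 8, 16, 32),
                num_classes=48, anchors_per_cell=3):
    """Return (grid_h, grid_w, channels) for each detection layer."""
    # Each anchor predicts 4 box values, 1 objectness score and the class scores.
    channels = anchors_per_cell * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]
```

Under these assumptions the added layer has a 104×104 grid, four times as many cells as the finest layer of the three-layer design, which is where the extra capacity for small objects would come from.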

Drawings

FIG. 1 is a schematic flow diagram of a garbage detection method based on a neural network model for deep learning according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating steps according to an embodiment of the present invention;

FIG. 3 is a diagram of a YOLOv4 network model in accordance with an embodiment of the present invention;

FIG. 4 is a training loss graph of an embodiment of the present invention;

FIGS. 5-7 are schematic diagrams of detection results according to an embodiment of the present invention.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other, different embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention, and the following embodiments and the features in the embodiments may be combined with each other without conflict.

The drawings are for illustrative purposes only; they are schematic rather than physical representations and are not intended to limit the invention. To better illustrate embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

Examples

As shown in fig. 1, the embodiment provides a garbage detection method based on a neural network model of deep learning, which includes:

collecting garbage images of daily scenes;

inputting the garbage image to an Input module for preprocessing;

Further, the preprocessing is as follows: carrying out data enhancement processing on the garbage image;

Further, the Backbone network module includes an activation function and a residual block structure.

Further, the activation function is a Mish activation function;

Furthermore, the Neck network module adopts an SPP+PANet structure, two layers of transformers are added between the SPP structure and the PANet structure to strengthen feature extraction, and four feature layers to be predicted are output to the Head network module.

Further, the Head network module carries out regression and classification on the layer to be predicted to obtain detection information of the garbage image.

Further, the Head network module optimizes the selection of the prediction box using a Soft-NMS algorithm.

Further, the detection information of the garbage image includes: the size, the center position and the confidence of each object in the garbage image.

Further, constructing the garbage detection model based on the YOLOv4 neural network structure comprises:

selecting the YOLOv4 neural network;

Further, training the YOLOv4 neural network based on the training set includes: accelerating training with CUDA, optimizing the model with the SGD method, and dynamically adjusting the learning rate of model training with a cosine annealing algorithm.

As shown in fig. 2, the specific detection method of the present embodiment is described in detail below.

1. Selecting a data set

First, a data set is selected and downloaded from the internet. It contains 6000 or more household garbage pictures, all taken of articles placed in a living environment under sunlight or indoor light sources; the picture sizes vary. The garbage identification and classification data set covers 48 categories, including disposable snack boxes, books and paper, power banks, leftovers, bags, garbage cans, plastic kitchenware, plastic utensils and the like. Part of the data is shown in figure 3.

2. Selecting a neural network model

The YOLOv4 model improves on YOLOv1, YOLOv2 and YOLOv3 and consists of a backbone network, an enhanced feature extraction network and a detection network. YOLOv4 uses CSPDarknet53 as the backbone network structure. Darknet53 contains 5 large residual blocks, which contain 1, 2, 8, 8 and 4 residual units respectively. The Mish function is used as the activation function in the backbone network, and the Leaky ReLU function is still used as the activation function after the backbone network. Compared with the mainstream CSPResNeXt50 and EfficientNet, YOLOv4 applies spatial pyramid pooling (SPP) to CSPDarknet53 to enlarge the receptive field. CSPDarknet53 adds a Cross Stage Partial (CSP) structure to each large residual block: the feature map of each large residual block is divided into two parts, one part passes through the convolutions, and the other part is merged with the convolution result of the first part through a cross-stage hierarchy, which reduces the amount of computation while maintaining accuracy. CSP thus improves the learning capability of the model and reduces the computational cost. The enhanced feature extraction part of YOLOv4 uses a Path Aggregation Network (PANet). PANet enhances the whole feature hierarchy through bottom-up path augmentation, which uses the accurate localization information in the lower layers and shortens the information path between the lowest and highest features; adaptive feature pooling links the feature grid with all feature levels so that the useful information in each feature level propagates directly to the subsequent sub-networks. The YOLO Head then integrates features with a 3×3 convolution, and a 1×1 convolution adjusts the number of channels. The final prediction part reuses the prediction part of YOLOv3: it decodes the prediction result, combines it with the prior boxes generated by the K-means algorithm to obtain candidate boxes, and then removes redundant boxes with a non-maximum suppression algorithm to obtain the final detection result.
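The receptive-field effect of spatial pyramid pooling can be sketched in NumPy as below. The kernel sizes 5, 9 and 13 are the values commonly used with YOLOv4 and are an assumption here, since the text does not list them; the function names are likewise illustrative.

```python
import numpy as np

def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) feature map."""
    fmap = np.asarray(fmap, dtype=np.float64)
    pad = k // 2
    padded = np.pad(fmap, ((pad, pad), (pad, pad), (0, 0)),
                    constant_values=-np.inf)
    h, w, _ = fmap.shape
    out = np.full_like(fmap, -np.inf)
    # Take the element-wise maximum over all k*k shifted views of the map.
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def spp(fmap, kernels=(5, 9, 13)):
    """Concatenate the input with its pooled versions along the channel axis."""
    fmap = np.asarray(fmap, dtype=np.float64)
    pooled = [max_pool_same(fmap, k) for k in kernels]
    return np.concatenate([fmap] + pooled, axis=-1)
```

Because the stride is 1, the spatial size is unchanged and only the channel count grows (by a factor of four with three kernels), so each output position sees context from windows of several sizes.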

As shown in fig. 3, the YOLOv4 network model is divided into 4 general modules: an Input network module, a Backbone network module, a Neck network module and a Head network module.

In actual deployment, the network parameters of the YOLOv4 network model are trained in advance on a server. The training data set is input into the network, which learns from the data set under given learning parameters and is continuously cross-validated to update the network parameters. After training, the validation performance of the model is observed and the training hyperparameters (such as the learning rate and the number of training rounds) are adjusted repeatedly; once the training result converges stably to a certain value, the trained network parameters are saved, and these are the parameters actually used. The network model is then deployed on the embedded device, a runtime environment for the YOLOv4 network model is configured, and the computer is connected to an image sensor to acquire images of the actual scene.

3. Dataset preprocessing

Data set preprocessing mainly consists of formatting the labeled data set (48 categories, 6000 pictures): files of different formats are placed in separate folders, and the data are divided proportionally into a training set and a verification set. The training set is input into the network model to update its parameters; the verification set guides the direction of the parameter updates and is used to evaluate the training result of the network model.
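The proportional split can be sketched as a seeded shuffle followed by a cut. The 90/10 ratio, the seed and the function name are assumptions for illustration; the text only says the data are divided "according to proportion".

```python
import random

def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle reproducibly and split into (training set, verification set)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]
```

Fixing the seed keeps the split stable across runs, so validation results from different training configurations remain comparable.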

4. Configuration learning model

The model configuration step presets the selected YOLOv4 network model: it takes as input the number and names of the categories and the file paths of the training set and verification set (obtained when reading the data set), and its results are used in the subsequent model training step. The network input size is 416×416, so each data set picture is normalized to that size according to its own width and height. The incoming data are augmented with the Mosaic method, which applies operations such as random flipping, scaling and cropping to the input pictures and splices 4 processed pictures into one picture; this improves the robustness of the model to the background of the target object during training and expands the amount of training data.
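One common way to bring pictures of arbitrary size to a 416×416 input is to scale them by a single factor and pad the remainder ("letterboxing"). The text does not say whether the aspect ratio is preserved, so the sketch below is an assumption; the function names are hypothetical.

```python
def letterbox_params(width, height, target=416):
    """Scale factor, resized size and padding that fit a picture into target x target."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)

def map_box(box, scale, pad):
    """Map an (x1, y1, x2, y2) label box from the original picture to the network input."""
    x1, y1, x2, y2 = box
    pad_x, pad_y = pad
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)
```

For example, an 832×416 picture is scaled by 0.5 to 416×208 and padded by 104 pixels above and below; label boxes are transformed with the same scale and offset.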

5. Training model

Mosaic enhancement is used for the first 70% of the total training rounds and is disabled for the last 30%. During training, the model is optimized with the SGD method; the number of pictures passed to the model at a time (the batch size) is 8; the maximum number of iterations is 200; the cosine annealing algorithm dynamically adjusts the learning rate of model training, and the initial learning rate is set to 0.01. Weights pre-trained on the COCO data set are loaded for initialization, which prevents an overly random initialization of the backbone weights from weakening feature extraction, helps avoid overfitting and accelerates model convergence.
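The augmentation schedule described above can be written down as a per-epoch plan. The sketch assumes that the 200 iterations are training epochs; the batch size of 8 and the 70% cutoff are taken from the text, and the function name and dictionary keys are illustrative.

```python
def training_plan(total_epochs=200, mosaic_fraction=0.7, batch_size=8):
    """Return one settings dict per epoch, with Mosaic on only in the early epochs."""
    cutoff = round(total_epochs * mosaic_fraction)
    return [
        {"epoch": epoch, "mosaic": epoch < cutoff, "batch_size": batch_size}
        for epoch in range(total_epochs)
    ]
```

Under this reading Mosaic is active for epochs 0 to 139 and off for epochs 140 to 199, so the final epochs train on unspliced pictures that resemble the images seen at inference time.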

6. Training completion

The training loss curve of the improved network on the data set is shown in fig. 4. In the figure, Epoch is the number of training iterations and Loss is the training loss. The training curve shows that the model loss levels off after roughly 160 iterations. Figs. 5-7 show the detection results of the improved YOLOv4 model; the parameters displayed on each prediction box are the target category and the confidence.

In summary, to address the large scale variation, frequent occlusion, high transparency and deformation of targets in the recyclable garbage detection task, an improved garbage detection method based on YOLOv4 is provided. The method introduces the Encoder module of the Vision Transformer feature extraction module at the end of the YOLOv4 feature extraction network and fuses local and global features, which strengthens the expressive capability of the feature map, helps with deformed and highly transparent garbage objects, and improves the detection precision of the network. The NMS algorithm is replaced with the Soft-NMS algorithm to optimize the screening of candidate boxes, which helps with occluded and stacked garbage objects: it avoids the hard deletion of candidate boxes that would wrongly remove the boxes of adjacent objects, raise the miss rate and lower the detection precision. Finally, a connection layer and a convolution layer are added, and the up-sampling and down-sampling operations are performed again. The feature layer of the backbone feature extraction network that carries more small-target feature information is output to the connection layer for feature fusion, increasing the number of output detection layers from 3 to 4 and improving the model's ability to detect small objects at the cost of a small increase in model parameters and computation.
The detection results show that the method achieves a good detection effect in recyclable garbage detection, where garbage objects are easily deformed, stacked and highly transparent.

The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A garbage detection method based on a neural network model of deep learning is characterized by comprising the following steps:

collecting garbage images of daily scenes;

inputting the garbage image to an Input module for preprocessing;

2. The garbage detection method based on a deep learning neural network model according to claim 1, wherein the preprocessing is: carrying out data enhancement processing on the garbage image;

3. The deep learning based neural network model garbage detection method of claim 1, wherein the Backbone network module comprises an activation function and a residual block structure.

4. The garbage detection method based on a deep learning neural network model according to claim 3, wherein the activation function is a Mish activation function;

5. The garbage detection method based on the deep learning neural network model according to claim 1, wherein the Neck network module adopts a SPP+PANet structure, two layers of transformers are added between the SPP structure and the PANet structure to strengthen feature extraction, and four feature layers to be predicted are output to the Head network module.

6. The garbage detection method based on the deep learning neural network model according to claim 1, wherein the Head network module carries out regression and classification on the layer to be predicted to obtain detection information of the garbage image.

7. The garbage detection method based on deep learning neural network model of claim 6, wherein the Head network module uses Soft-NMS algorithm to optimize the selection of prediction boxes.

8. The garbage detection method based on a deep learning neural network model according to claim 1, wherein the detection information of the garbage image includes: the size, the center position and the confidence of each object in the garbage image.

9. The garbage detection method based on a deep learning neural network model according to claim 1, wherein constructing the garbage detection model based on a YOLOv4 neural network structure comprises:

selecting the YOLOv4 neural network;

10. The garbage detection method based on a deep learning neural network model of claim 9, wherein training the YOLOv4 neural network based on the training set comprises: accelerating training with CUDA, optimizing the model with the SGD method, and dynamically adjusting the learning rate of model training with a cosine annealing algorithm.