CN114492658A - Real-time household garbage detection method and device, electronic equipment and medium

Info

Publication number: CN114492658A
Application number: CN202210129621.4A
Authority: CN
Inventors: 江祥奎; 胡浩昌
Original assignee: Xian University of Posts and Telecommunications
Current assignee: Xian University of Posts and Telecommunications
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-05-13

Abstract

The invention discloses a real-time household garbage detection method based on a combination of attention mechanisms, comprising the following steps: acquiring a garbage image detection data set; inputting the garbage images in the data set into an improved YOLOv5s network model and training it iteratively on a GPU to obtain the optimal weights of the improved YOLOv5s network model; and loading the optimal weights into the improved YOLOv5s network model, inputting the garbage image to be detected, and outputting a detection result, wherein the detection result comprises the position of the target garbage in the image and the category of the target garbage. The method detects garbage images with higher speed and higher precision and can meet the requirement of real-time household garbage detection. It reduces the computational cost of the network model to a certain extent, improves inference speed and detection precision, provides an efficient detection method for garbage classification, reduces labor cost, and accelerates the development of intelligent garbage classification.

Description

Real-time household garbage detection method and device, electronic equipment and medium

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a real-time household garbage detection method and device based on attention mechanism combination, electronic equipment and a medium.

Background

In recent years, with the rapid development of the global economy, residents' consumption levels have risen markedly, urban populations have kept growing, and the quantity of household garbage has increased steadily. By 2020, the annual global output of household garbage was about 34 million tons, and China's annual output was about 20 million tons, the highest in the world. For this reason, in 2017 the Ministry of Housing and Urban-Rural Development issued a notice on accelerating household garbage classification in some key cities, making clear that household garbage classification work was to be sped up. In 2019, the office of the National Government Offices Administration issued the corresponding notice "Reference Standard for Evaluating Household Garbage Classification Work in Public Institutions", which set out requirements for further promoting the related work.

Household garbage has become a main source of environmental pollution; it seriously damages the global ecological environment and affects the normal life of residents. With environmental protection as a global goal and with the issuance of garbage classification policies, countries have begun to implement garbage classification step by step. By 2020, about 11 countries had adopted garbage classification. In China, 237 cities have implemented a garbage classification policy, and 30 cities have issued mandatory garbage classification regulations. Garbage classification benefits environmental protection and resource recycling and is consistent with the global theme of environmental protection. Most household garbage is mixed raw garbage with a high water content and a high proportion of organic matter, so traditional household garbage treatment methods cannot solve the problem effectively or quickly. Traditional household garbage classification is also labor-intensive, inefficient and of poor working quality. A scientific, automatic and intelligent garbage classification method is therefore needed, and there is a pressing need for a more complete detection and identification method that can identify multiple types of garbage at the same time.

Disclosure of Invention

To address the defects of the prior art, the invention provides a method, a device, electronic equipment and a medium for detecting household garbage in real time based on a combination of attention mechanisms. The method is based on an improved YOLOv5s network to which a combination of attention mechanisms is added. It detects and identifies multiple categories of garbage at the same time, better solves the problems of garbage detection and identification, and has a wide range of application scenarios and market value.

In order to solve the technical problems, the invention adopts the technical scheme that: a real-time household garbage detection method based on attention mechanism combination is characterized by comprising the following steps:

acquiring a garbage image detection data set;

respectively inputting the garbage images in the garbage image detection data set into an improved YOLOv5s network model, performing iterative training through a GPU, and training to obtain the optimal weight of the improved YOLOv5s network model; the improved YOLOv5s network model is improved on a YOLOv5s network model, and specifically comprises the following steps: introducing a lightweight feedforward convolution attention module CBAM behind a Focus module at the front end of a backbone network of the YOLOv5s network model, and arranging an SE-Net channel attention module behind the tail end of the backbone network;

and loading the optimal weight into the improved YOLOv5s network model, inputting the garbage image to be detected, and outputting a detection result, wherein the detection result comprises the position of the target garbage on the image and the category of the target garbage.

The invention introduces the lightweight feedforward convolutional block attention module CBAM after the Focus module at the front end of the backbone network. The Focus module slices, concatenates and convolves the input image and outputs a feature map. This feature map enters the channel attention module CAM, where global max pooling and global average pooling are applied to it; for each pooled result the number of channels is compressed and then expanded back to the original feature dimension, giving two activated results. The two activated results are summed and passed through a Sigmoid function to produce the channel weights, which are multiplied with the original input feature map to complete the channel attention operation. The output of the channel attention module is input to the spatial attention module SAM, where max pooling and average pooling are applied along the channel dimension, the two resulting feature maps are concatenated, a convolution changes the channel dimension, and a Sigmoid function outputs a spatial weight map. This weight map is multiplied with the feature map input to the module to complete the spatial attention operation, and the processed feature map is sent to the next convolution module;

the SE-Net channel attention module is placed at the tail end of the backbone network. After the SPP module completes the spatial pyramid pooling operation, three feature maps are output at the tail end of the backbone network. The two-dimensional feature of each channel is compressed into one real number while the number of channels is kept unchanged, giving the current feature map. The excitation operation then reduces the feature dimension to 1/r of the original through a fully connected layer followed by a ReLU activation, restores the original feature dimension through a second fully connected layer, and converts the result into normalized weights between 0 and 1 with a Sigmoid function. Finally, the weighted feature map is input into the Neck network of YOLOv5s.
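The SE-Net step described above (compress each channel to one number, reduce to 1/r, ReLU, restore, Sigmoid, reweight) can be sketched without any deep learning framework. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the bias-free fully connected layers and the toy weights are all hypothetical, and average pooling is assumed for the compression step.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights):
    """Bias-free fully connected layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_block(fmap, w_reduce, w_expand):
    """Apply an SE-style channel reweighting.

    fmap:     C x H x W nested lists
    w_reduce: (C/r) x C weights of the first fully connected layer
    w_expand: C x (C/r) weights of the second fully connected layer
    Returns (reweighted feature map, per-channel weights).
    """
    # Compress each channel's 2-D feature into one real number (assumed: mean).
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Reduce to 1/r of the channel dimension, then ReLU.
    hidden = [max(0.0, h) for h in dense(squeezed, w_reduce)]
    # Restore the original dimension and normalize to (0, 1) with Sigmoid.
    scale = [sigmoid(s) for s in dense(hidden, w_expand)]
    # Weight every channel of the input feature map.
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, scale)]
    return out, scale


# Toy example: C = 4 channels of size 2 x 2, reduction ratio r = 2.
toy_fmap = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
toy_reduce = [[0.1, 0.2, 0.3, 0.4], [-0.4, -0.3, -0.2, -0.1]]
toy_expand = [[0.5, 0.1], [0.2, 0.2], [-0.3, 0.4], [0.0, 0.0]]
toy_out, toy_scale = se_block(toy_fmap, toy_reduce, toy_expand)
print(toy_scale)
```

The output keeps the input shape; only the relative strength of each channel changes.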

Optionally, in the improved YOLOv5s network model, in addition to introducing the lightweight feedforward convolutional block attention module CBAM after the Focus module at the front end of the backbone network of the YOLOv5s network model and placing the SE-Net channel attention module at the tail end of the backbone network, a small target detection layer is added at the prediction end of the YOLOv5s network model to improve the head network structure, so as to improve the detection of small target garbage by the improved YOLOv5s network model.

Optionally, the method for acquiring a garbage image detection data set includes:

collecting garbage images containing various daily living garbage, labeling garbage names, garbage categories and garbage positions in the garbage images, and then establishing a garbage image detection data set; the garbage categories comprise four garbage classification categories of household garbage, recyclable garbage, harmful garbage and other garbage.

The annotations are stored as files in xml format. The annotated information mainly includes: the VOC format selected for annotating the garbage image data set, the name and category of the garbage to be detected in the garbage image, the name and storage path of the file generated for the annotated image, and the coordinate values, length and width of the target garbage in the image.

Each garbage item labeled with a name is then labeled with its major category, the major categories being household garbage, recyclable garbage, harmful garbage and other garbage, and the garbage image detection data set is established from the garbage images labeled with both the specific garbage name and the major category.

Optionally, the specific process of establishing the garbage image detection data set is as follows:

step S1-1: determining the type and size of the garbage to be detected, and collecting garbage images of the corresponding names by video capture, camera shooting, online collection or similar means;

step S1-2: annotating the garbage images obtained in S1-1 with the target detection annotation software Labelimg and storing the annotations as files in xml format, wherein the annotated information mainly includes: the VOC format selected for annotating the garbage image data set, the garbage name and garbage category of the garbage to be detected, the name and storage path of the file generated for the annotated image, and the coordinate values, length and width of the target garbage in the image;

step S1-3: after annotation is completed, counting the number of images in the garbage image data set, converting all annotation files from xml format into txt format, and normalizing the coordinate values of the garbage to obtain the garbage image detection data set.
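Step S1-3 (xml to txt conversion with coordinate normalization) can be sketched as follows. This is only an illustration under assumptions: the source does not show its conversion script, the class list and sample annotation are hypothetical, and the output layout (class index, then box center and size divided by the image width and height) is the label layout commonly used for YOLO training.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; the index in this list becomes the class id.
CLASS_NAMES = ["apple_core", "button_battery", "capsule"]

# Hypothetical VOC-style annotation of the kind Labelimg writes.
SAMPLE_XML = """
<annotation>
  <filename>img_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>button_battery</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_yolo_lines(xml_text, class_names):
    """Convert one VOC xml annotation into normalized txt label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_names:
            continue  # skip labels outside the chosen class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Normalize center and size to the range 0..1.
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        class_id = class_names.index(name)
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


print("\n".join(voc_to_yolo_lines(SAMPLE_XML, CLASS_NAMES)))
```

In practice each xml file would produce one txt file with the same base name, one line per annotated object.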

Optionally, the anchor box values of the improved YOLOv5s network model are set to [5,6,8,14,15,11], [10,13,16,30,33,23], [30,61,62,45,59,119], [116,90,156,198,373,326]. During training, the improved YOLOv5s network model outputs prediction boxes based on the set anchor box values, compares them with the ground-truth boxes, calculates the difference between the two, and updates the network model parameters iteratively by backpropagation so that they are adjusted adaptively.
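To make the anchor values and the box comparison concrete, the sketch below groups each anchor list into (width, height) pairs and computes the intersection over union (IoU) of a predicted box and a ground-truth box. Two assumptions are made here: reading each flat list as three width-height pairs follows the usual YOLOv5 configuration convention, and plain IoU is used only as one simple measure of the "difference between the two boxes"; the source does not name the loss it uses.

```python
# Anchor values as listed in the text, one list per detection scale.
ANCHORS = [
    [5, 6, 8, 14, 15, 11],
    [10, 13, 16, 30, 33, 23],
    [30, 61, 62, 45, 59, 119],
    [116, 90, 156, 198, 373, 326],
]


def anchor_pairs(flat):
    """Group a flat anchor list into (width, height) pairs (assumed layout)."""
    return list(zip(flat[0::2], flat[1::2]))


def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


for scale_index, flat in enumerate(ANCHORS):
    print(scale_index, anchor_pairs(flat))

# Hypothetical predicted and ground-truth boxes in pixels.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

The first list holds the smallest anchors, which is consistent with the extra detection layer for small target garbage.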

Optionally, the garbage images of the garbage image detection data set are input into the improved YOLOv5s network model and trained on the GPU for 1000 iterations, with the image size set to 640 × 640, the batch size of each training iteration set to 16, and the initial learning rate set to 0.01; the iterative training is performed on the GPU in the created virtual environment.

Optionally, the detection result includes the name, category, size of the prediction box, confidence value of the detected garbage and the position information of the target garbage on the image, so that simultaneous recognition of multiple types of household garbage is realized, and the purpose of real-time detection of the household garbage is achieved.

In order to solve the above problems, the present invention also provides a real-time household garbage detection device based on attention mechanism combination, comprising:

the preprocessing module is used for acquiring a garbage image detection data set;

the training module is used for respectively inputting the garbage images in the garbage image detection data set into an improved YOLOv5s network model, performing iterative training through a GPU, and training to obtain the optimal weight of the improved YOLOv5s network model; the improved YOLOv5s network model is improved on a YOLOv5s network model, and specifically comprises the following steps: introducing a lightweight feedforward convolution attention module CBAM behind a Focus module at the front end of a backbone network of the YOLOv5s network model, and arranging an SE-Net channel attention module behind the tail end of the backbone network;

and the garbage detection module is used for loading the optimal weight into the improved YOLOv5s network model, inputting a garbage image to be detected, outputting a detection result and obtaining the position and the belonging category of the target garbage on the image.

In order to solve the above problem, the present invention also provides an electronic device, wherein the electronic device includes:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for real-time detection of household garbage based on attention mechanism combination according to any one of claims 1-6.

In order to solve the above problems, the present invention further provides a computer readable storage medium storing computer instructions which are operated to execute the method for real-time detection of household garbage based on attention mechanism combination according to any one of claims 1 to 6.

Compared with the prior art, the invention has the following advantages:

1. The invention improves the YOLOv5s network model by introducing the lightweight feedforward convolutional block attention module CBAM after the Focus module at the front end of the backbone network and placing an SE-Net channel attention module at the tail end of the backbone network. Adding this combination of attention mechanisms to the YOLOv5s network model improves the detection accuracy, recall and mAP of the model while reducing the inference time and the amount of computation. A small target detection layer is also added at the prediction end to improve the head network structure for detecting small target garbage objects such as capsules and button batteries, which improves the detection accuracy for small target garbage. The invention achieves excellent detection precision while maintaining a high detection speed and can meet the requirement of real-time household garbage detection. It reduces the computational cost of the network model to a certain extent, improves inference speed and detection precision, provides an efficient detection method for garbage classification, reduces labor cost, and accelerates the development of intelligent garbage classification.

2. The invention can detect and identify many common types of household garbage. The accuracy rate of household garbage detection reaches 93.5%, the recall rate reaches 91.1%, mAP@0.5 reaches 96.4%, the inference time is 13.1 ms, and the model computation is 8.9 GFLOPS. Compared with the original YOLOv5s model, the amount of computation is reduced by 45.3%, the detection precision is improved by 4.6%, and the inference time is reduced by 6.7 ms. Multiple kinds of garbage can be recognized at the same time, which provides a technical solution for intelligent garbage processing and achieves real-time detection of household garbage.

3. The household garbage detection method, the household garbage detection device, the electronic equipment and the computer readable storage medium solve the problems of garbage detection and real-time identification.

The technical solution of the present invention is further described in detail by the accompanying drawings and examples.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention;

FIG. 2 is a comparison graph of model training effects of four pre-training models according to the present invention;

FIG. 3 is a schematic diagram of an improved YOLOv5s network model incorporating a combination of attention mechanisms in the YOLOv5s network model of comparative example 2;

FIG. 4 is a graph of the training effect of the improved YOLOv5s network model of comparative example 2 with the attention mechanism combination introduced in the YOLOv5s network model;

FIG. 5 is a schematic diagram of an improved YOLOv5s network model in comparative example 3 with the addition of a small target detection layer to the YOLOv5s network model;

FIG. 6 is a graph showing the training effect of the improved YOLOv5s network model of comparative example 3 in which a small target detection layer is added to the YOLOv5s network model;

FIG. 7 is a schematic diagram of an improved YOLOv5s network model according to embodiment 1 of the present invention;

FIG. 8 is a comparison graph of the training effect of the improved YOLOv5s network model of the embodiment of the present invention and the original YOLOv5s network model of comparative example 1;

FIG. 9 is a diagram of the garbage detection effect based on the improved YOLOv5s network model.

Detailed Description

Example 1

As shown in fig. 1, the method for detecting domestic garbage in real time based on attention mechanism combination in this embodiment includes the following steps:

S1: acquiring a garbage image detection data set, and dividing the garbage image detection data set into a training set, a verification set and a test set;

In this embodiment, the garbage images are collected by online collection, by capturing household garbage from video, by camera shooting and similar means. All household garbage in the images is labeled by name, giving 13 named kinds of garbage, which are divided into four categories: household garbage, recyclable garbage, harmful garbage and other garbage. A garbage detection data set containing these categories is established and divided into a training set, a verification set and a test set. The specific steps are as follows:

step S1-1: determining the type and size of the garbage to be detected, and collecting garbage images of the corresponding names by capturing garbage in real-life scenes from video, by camera shooting, and by online collection with a crawler script written in the Python programming language;

step S1-2: annotating the collected picture data set with the target detection annotation software Labelimg and storing the annotations as files in xml format, giving 13 named kinds of garbage in total: apple core, cylindrical battery, button battery, book, capsule, pencil, toothbrush, old trousers, mobile phone, remote controller, old jacket, vegetable leaf and watermelon peel. The annotated information mainly includes: the VOC format selected for annotating the data set, the name and category of the garbage to be detected, the name and storage path of the file generated for the annotated picture, and the coordinate values, length and width of the target garbage in the image;

step S1-3: after annotation is completed, counting the size of the data set, which gives 7671 valid samples in total; writing a script in the Python programming language to convert all annotation files into txt format and normalize the coordinate values, obtaining the garbage image detection data set; and dividing the garbage image detection data set into a training set, a verification set and a test set in a sample ratio of 7:1:2. The household garbage comprises 2204 samples of apple cores, toothbrushes, vegetable leaves and watermelon peels; the recyclable garbage comprises 2179 samples of books, old trousers and old jackets; the harmful garbage comprises 1805 samples of cylindrical batteries, button batteries, pencils and mobile phones; and the other garbage comprises 1483 samples of capsules and remote controllers.
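The 7:1:2 division in step S1-3 can be sketched as a shuffled split. This is a minimal sketch under assumptions: the source does not say how samples were assigned to the three sets, so the random shuffle, the fixed seed and the integer file identifiers are illustrative choices. The per-category counts are taken from the text and sum to the stated total.

```python
import random

# Sample counts per major category, as stated in the text.
CATEGORY_COUNTS = {
    "household garbage": 2204,
    "recyclable garbage": 2179,
    "harmful garbage": 1805,
    "other garbage": 1483,
}


def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle items and split them into train/verification/test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility (assumption)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


n_samples = sum(CATEGORY_COUNTS.values())
train_set, val_set, test_set = split_dataset(range(n_samples))
print(n_samples, len(train_set), len(val_set), len(test_set))
```

With integer division the remainder falls into the test set, so the three parts always cover every sample exactly once.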

S2: inputting the garbage images in the training set and the verification set into the improved YOLOv5s network model, performing iterative training on a GPU, and obtaining the optimal weights. Specifically, the data in the training set is trained for 1000 iterations, with the image size set to 640 × 640, the batch size of each training iteration set to 16, and the initial learning rate set to 0.01; the iterative training is performed on the GPU in the created virtual environment.
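The training settings in S2 can be written down as a small configuration. The sketch below assumes, without confirmation from the source, that training is launched through the `train.py` script of the open-source Ultralytics YOLOv5 repository; the three yaml file names are hypothetical placeholders, and the initial learning rate of 0.01 is assumed to be supplied as `lr0` inside the hyperparameter file rather than on the command line. The command is only built and printed, not executed.

```python
import shlex

# Values stated in the text.
TRAIN_SETTINGS = {
    "epochs": 1000,      # number of training iterations
    "img": 640,          # input images are resized to 640 x 640
    "batch_size": 16,
    "lr0": 0.01,         # initial learning rate (assumed to live in the hyp yaml)
}


def build_train_command(data_yaml, model_yaml, hyp_yaml, device="0"):
    """Build an argument list for YOLOv5's train.py (assumed training entry point)."""
    return [
        "python", "train.py",
        "--img", str(TRAIN_SETTINGS["img"]),
        "--batch-size", str(TRAIN_SETTINGS["batch_size"]),
        "--epochs", str(TRAIN_SETTINGS["epochs"]),
        "--data", data_yaml,    # data set description (paths, class names)
        "--cfg", model_yaml,    # model structure, including attention modules
        "--hyp", hyp_yaml,      # hyperparameters, including lr0
        "--device", device,     # GPU index
    ]


# Hypothetical file names, for illustration only.
command = build_train_command("garbage.yaml", "yolov5s_improved.yaml", "hyp_garbage.yaml")
print(shlex.join(command))
```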

As shown in FIG. 7, the improved YOLOv5s network model is an improvement on the YOLOv5s network model, specifically: the lightweight feedforward convolutional block attention module CBAM is introduced after the Focus module at the front end of the backbone network of the YOLOv5s network model. The Focus module slices, concatenates and convolves the input image and outputs a feature map. This feature map enters the channel attention module, where max pooling and average pooling are applied to it; for each pooled result the number of channels is compressed and then expanded back to the original feature dimension, giving two activated results. The two activated results are added and passed through a Sigmoid function to produce the channel weights, which are multiplied with the original input feature map to complete the channel attention operation. The output of the channel attention module is input to the spatial attention module, where max pooling and average pooling are applied along the channel dimension, the two resulting feature maps are stacked together, a convolution changes the channel dimension, and a Sigmoid function outputs a spatial weight map. This weight map is multiplied with the feature map input to the module to complete the spatial attention operation, and the processed feature map is sent to the next convolution module;

the SE-Net channel attention module is placed at the tail end of the backbone network. After the SPP module completes the spatial pyramid pooling operation, three feature maps are output at the tail end of the backbone network. The two-dimensional feature of each channel is compressed into one real number while the number of channels is kept unchanged, giving the current feature map. The excitation operation then reduces the feature dimension to 1/r of the original through a fully connected layer followed by a ReLU activation, restores the original feature dimension through a second fully connected layer, and converts the result into normalized weights between 0 and 1 with a Sigmoid function. Finally, the weighted feature map is input into the Neck network;

and a small target detection layer is added at the prediction end of the YOLOv5s network model to improve the head network structure. This addresses the high error rate of small target garbage detection, since the 13 kinds of garbage include small targets such as cylindrical batteries, button batteries and capsules, and thereby improves the detection precision. The improved YOLOv5s network model is finally obtained.
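The CBAM step described above for FIG. 7 (channel attention followed by spatial attention) can be sketched in plain Python. This is an illustration under assumptions rather than the patented implementation: the function names, the bias-free shared two-layer perceptron, the zero-padded k × k convolution and the toy weights are all hypothetical, and the spatial pooling is taken along the channel axis as in the published CBAM design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w_reduce, w_expand):
    """Compress the channel vector, apply ReLU, expand back to C values."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w_reduce]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_expand]


def channel_attention(fmap, w_reduce, w_expand):
    """fmap is C x H x W; returns fmap scaled by per-channel weights."""
    max_desc = [max(max(row) for row in ch) for ch in fmap]
    avg_desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    summed = [
        a + b
        for a, b in zip(
            shared_mlp(max_desc, w_reduce, w_expand),
            shared_mlp(avg_desc, w_reduce, w_expand),
        )
    ]
    weights = [sigmoid(s) for s in summed]
    return [[[v * wt for v in row] for row in ch] for ch, wt in zip(fmap, weights)]


def spatial_attention(fmap, kernel):
    """kernel is 2 x k x k (k odd); returns fmap scaled by a per-pixel weight."""
    height, width = len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool along the channel axis, then stack the two maps.
    max_map = [[max(ch[i][j] for ch in fmap) for j in range(width)] for i in range(height)]
    avg_map = [[sum(ch[i][j] for ch in fmap) / len(fmap) for j in range(width)] for i in range(height)]
    stacked = [max_map, avg_map]
    attn = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for c in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:  # zero padding
                            acc += kernel[c][di][dj] * stacked[c][y][x]
            attn[i][j] = sigmoid(acc)
    return [[[ch[i][j] * attn[i][j] for j in range(width)] for i in range(height)] for ch in fmap]


def cbam(fmap, w_reduce, w_expand, kernel):
    """Channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap, w_reduce, w_expand), kernel)


# Toy input: 2 channels of size 2 x 2, channel reduction from 2 to 1.
toy_fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
toy_kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
toy_out = cbam(toy_fmap, [[0.3, -0.2]], [[0.5], [-0.5]], toy_kernel)
print(toy_out)
```

The output has the same shape as the input, so the module can sit between the Focus module and the next convolution module without changing the layers around it.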

The anchor box values of the improved YOLOv5s network model are set to [5,6,8,14,15,11], [10,13,16,30,33,23], [30,61,62,45,59,119], [116,90,156,198,373,326]. During training, the improved YOLOv5s network model outputs prediction boxes based on the set anchor box values, compares them with the ground-truth boxes, calculates the difference between the two, and updates the network model parameters iteratively by backpropagation so that they are adjusted adaptively.

S3: inputting the garbage image to be detected into the trained improved YOLOv5s network model loaded with the optimal weights to obtain a real-time garbage detection result. As shown in FIG. 9, the detection result mainly includes the category and name of the detected garbage, the size of the prediction box, the confidence value and the position information of the target.
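The detection result described in S3 can be modeled as a small record that carries the garbage name, confidence and box and derives the major category from the name. This is a sketch under assumptions: the `Detection` class, its field names and the pixel-corner box layout are hypothetical; only the name-to-category grouping is taken from step S1-3.

```python
from dataclasses import dataclass

# Grouping of the 13 garbage names into four categories, as given in step S1-3.
CATEGORY_OF = {
    "apple core": "household garbage",
    "toothbrush": "household garbage",
    "vegetable leaf": "household garbage",
    "watermelon peel": "household garbage",
    "book": "recyclable garbage",
    "old trousers": "recyclable garbage",
    "old jacket": "recyclable garbage",
    "cylindrical battery": "harmful garbage",
    "button battery": "harmful garbage",
    "pencil": "harmful garbage",
    "mobile phone": "harmful garbage",
    "capsule": "other garbage",
    "remote controller": "other garbage",
}


@dataclass
class Detection:
    """One detected object (hypothetical structure, for illustration)."""
    name: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels

    @property
    def category(self):
        return CATEGORY_OF[self.name]

    @property
    def size(self):
        x_min, y_min, x_max, y_max = self.box
        return (x_max - x_min, y_max - y_min)


# Hypothetical detection, not a result reported in the source.
example = Detection(name="button battery", confidence=0.93, box=(10, 20, 50, 60))
print(example.name, example.category, example.size, example.confidence)
```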

In this embodiment, the training set and the verification set from step S1 are input into the four officially provided pre-trained models YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x, and each is trained for 100 iterations. After the iterative training is completed, the corresponding optimal weights are obtained, and the four trained models are then tested on the test set. A comparison of the training effects is shown in FIG. 2, and the test results are shown in Table 1.

Table 1. Test results of the four pre-trained models

As can be seen from the above table: YOLOv5s has an mAP of 91.8% and an inference time of 19.8 ms; YOLOv5m has an mAP of 96% and an inference time of 70.1 ms; YOLOv5l has an mAP of 92.9% and an inference time of 122.9 ms; and YOLOv5x has an mAP of 91.9% and an inference time of 306.1 ms.

The performance indices of the four pre-trained models show that, for the data set used by the method, the YOLOv5m model has the highest accuracy, but its inference time is long and cannot meet the real-time requirement. The YOLOv5s model has lower accuracy than the other three models, but its inference time is the shortest and can meet the real-time requirement. According to actual requirements, the method therefore selects the YOLOv5s network model as the pre-trained model and then improves it to raise the accuracy rate and the recall rate.
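The selection reasoning above can be expressed as a small rule: among the models whose inference time fits a real-time budget, take the one with the highest mAP. The mAP and inference times below are the values reported in the text; the 30 ms budget is a hypothetical threshold chosen only for illustration, since the source does not state a numeric real-time limit.

```python
# (mAP in percent, inference time in ms), as reported for the four models.
MODELS = {
    "YOLOv5s": (91.8, 19.8),
    "YOLOv5m": (96.0, 70.1),
    "YOLOv5l": (92.9, 122.9),
    "YOLOv5x": (91.9, 306.1),
}


def pick_model(models, max_latency_ms):
    """Return the highest-mAP model within the latency budget, or None."""
    eligible = {
        name: stats for name, stats in models.items() if stats[1] <= max_latency_ms
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][0])


# Hypothetical real-time budget of 30 ms per image.
print(pick_model(MODELS, max_latency_ms=30.0))
```

With a looser budget the rule would pick YOLOv5m, which matches the observation that YOLOv5m is the most accurate but too slow for real-time use.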

Comparative example 1

The process of this comparative example is the same as that of example 1, except that: the network model trained in this comparative example is the original YOLOv5s.

Comparative example 2

The process of this comparative example is the same as that of example 1, except that: the improved YOLOv5s trained in the comparative example is an improvement on a YOLOv5s network model, specifically, a lightweight feedforward convolution attention module CBAM is introduced behind a Focus module at the front end of a backbone network of the YOLOv5s network model, and an SE-Net channel attention module is arranged behind the tail end of the backbone network. A schematic diagram of the modified YOLOv5s is shown in fig. 3.

Comparative example 3

The process of this comparative example is the same as that of example 1, except that: the improved YOLOv5s trained in this comparative example is an improvement on the YOLOv5s network model in which only a small target detection layer is added at the prediction end of the YOLOv5s network model to improve the head network structure. A schematic diagram of this improved YOLOv5s is shown in FIG. 5.

The training effects and detection results of the four network models in example 1 and comparative examples 1 to 3 are described below with reference to FIG. 4, FIG. 6, FIG. 8 and Table 2:

Table 2. Test results of the detection methods in example 1 and comparative examples 1 to 3

As can be seen from the above table: the original YOLOv5s has an mAP of 91.8%, an inference time of 19.8 ms and a computation of 16.5 GFLOPS, while the improved YOLOv5s network model has an mAP of 96.4%, an inference time of 13.1 ms and a computation of 8.9 GFLOPS. The YOLOv5s network model with the attention mechanism combination used in comparative example 2 has an mAP of 94.5%, an inference time of 8.5 ms and a computation of 6.0 GFLOPS. The YOLOv5s network model with the small target detection layer added in comparative example 3 has an mAP of 93.6%, an inference time of 32.2 ms and a computation of 27.7 GFLOPS. The performance indices of the four network models show that, for the data set used by the invention, adding the small target detection layer gives a longer inference time than the network model with the attention mechanism combination but improves the recall rate, which is more critical for garbage classification. Compared with the original YOLOv5s network model, the improved YOLOv5s network model in the detection method improves the mAP by 4.6%, the accuracy rate by 2.5% and the recall rate by 3%, reduces the inference time by 6.7 ms, and reduces the computation by 7.6 GFLOPS.
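The differences quoted above follow directly from the reported figures and can be checked with a few lines of arithmetic. The dictionary layout and key names are illustrative; the numbers are the ones stated in the text.

```python
# Reported figures: mAP in percent, inference time in ms, computation in GFLOPS.
RESULTS = {
    "YOLOv5s (comparative example 1)": {"mAP": 91.8, "inference_ms": 19.8, "gflops": 16.5},
    "attention only (comparative example 2)": {"mAP": 94.5, "inference_ms": 8.5, "gflops": 6.0},
    "small target layer only (comparative example 3)": {"mAP": 93.6, "inference_ms": 32.2, "gflops": 27.7},
    "improved YOLOv5s (example 1)": {"mAP": 96.4, "inference_ms": 13.1, "gflops": 8.9},
}

baseline = RESULTS["YOLOv5s (comparative example 1)"]
improved = RESULTS["improved YOLOv5s (example 1)"]

# Change of each metric relative to the original YOLOv5s, rounded to one decimal.
delta = {key: round(improved[key] - baseline[key], 1) for key in baseline}

for name, row in RESULTS.items():
    print(f"{name:50s} {row['mAP']:5.1f} {row['inference_ms']:6.1f} {row['gflops']:5.1f}")
print(delta)
```

A positive mAP change and negative changes in inference time and computation correspond to the improvements described in the paragraph above.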

Example 2

The invention also provides a real-time household garbage detection device based on attention mechanism combination, which comprises:

the preprocessing module is used for acquiring a garbage image detection data set and dividing the garbage image detection data set into a training set, a verification set and a test set;

the training module is used for respectively inputting the garbage images in the training set and the verification set into an improved YOLOv5s network model, performing iterative training by a GPU, and training to obtain the optimal weight of the improved YOLOv5s network model; the improved YOLOv5s network model is improved on a YOLOv5s network model, and specifically comprises the following steps: introducing a lightweight feedforward convolution attention module CBAM behind a Focus module at the front end of a backbone network of the YOLOv5s network model, and arranging an SE-Net channel attention module behind the tail end of the backbone network;

In this embodiment, the garbage images in the training set and the verification set are input into the improved YOLOv5s network model shown in FIG. 7 and trained iteratively on the GPU to obtain the optimal weights. Specifically, the data in the training set is trained for 1000 iterations, with the image size set to 640 × 640, the batch size of each training iteration set to 16, and the initial learning rate set to 0.01; the iterative training is performed on the GPU in the created virtual environment.

The anchor frame values of the improved YOLOv5s network model are set to [5,6,8,14,15,11], [10,13,16,30,33,23], [30,61,62,45,59,119] and [116,90,156,198,373,326]. During training, the improved YOLOv5s network model outputs prediction frames on the basis of the set anchor frame values, compares them with the real frames, calculates the difference between the two, and adaptively adjusts the network model parameters through iterative backward updates.
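The four anchor lists above can be read as one list per detection layer, each holding three (width, height) pairs; this reading follows the usual YOLOv5 convention and is an assumption here. The sketch below splits the lists into pairs and computes a plain intersection over union (IoU) between a prediction frame and a real frame as one simple measure of their difference. Plain IoU is a stand-in chosen for illustration and is not claimed to be the loss used by the model; the helper names are hypothetical.

```python
# Anchor frame values as given in the text, one flat list per detection layer.
ANCHORS = [
    [5, 6, 8, 14, 15, 11],
    [10, 13, 16, 30, 33, 23],
    [30, 61, 62, 45, 59, 119],
    [116, 90, 156, 198, 373, 326],
]


def anchor_pairs(flat):
    """Split a flat anchor list into (width, height) pairs."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def iou_xyxy(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


layers = [anchor_pairs(flat) for flat in ANCHORS]
# Illustrative prediction and real frames; 1 - IoU grows as they diverge.
difference = 1.0 - iou_xyxy((0, 0, 10, 10), (5, 5, 15, 15))
```

The smallest anchors in the first list suit small objects, which is consistent with the extra small target detection layer described for the improved model.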

The optimal weights are loaded into the trained improved YOLOv5s network model and the garbage image to be detected is input, yielding a real-time garbage detection result as shown in fig. 9. The detection result mainly comprises the category and name of the detected garbage, the size of the prediction box, a confidence value and the position information of the target.
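The fields of the detection result listed above can be represented as a small record type, as sketched below. The class name `Detection`, the helper `keep_confident`, the 0.5 threshold, the sample values and the pairing of names with categories are all assumptions made for illustration; none of them come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    name: str          # garbage name, e.g. one of the 13 labelled types
    category: str      # garbage classification category
    confidence: float  # confidence value between 0 and 1
    xmin: float        # position of the prediction box in the image
    ymin: float
    xmax: float
    ymax: float

    @property
    def size(self):
        """Width and height of the prediction box."""
        return (self.xmax - self.xmin, self.ymax - self.ymin)

    @property
    def center(self):
        """Center point of the prediction box."""
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def keep_confident(detections, threshold=0.5):
    """Drop low-confidence detections and sort the rest, best first."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


# Hypothetical detections for one image.
results = keep_confident([
    Detection("pencil", "other garbage", 0.31, 10, 10, 50, 20),
    Detection("button cell", "harmful garbage", 0.92, 100, 120, 140, 160),
    Detection("books", "recyclable garbage", 0.77, 200, 50, 400, 300),
])
```

A display or sorting stage can then work from `results` without needing to know how the network produced the boxes.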

Example 3

The present invention also provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the real-time household garbage detection method based on attention mechanism combination of embodiment 1.

Example 4

This embodiment also provides a computer-readable storage medium storing computer instructions which, when run, execute the real-time household garbage detection method based on attention mechanism combination of embodiment 1.

The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention in any way. Any simple modification, change or equivalent variation of the above embodiments made according to the technical essence of the invention still falls within the protection scope of the technical solution of the invention.

Claims

1. A real-time household garbage detection method based on attention mechanism combination is characterized by comprising the following steps:

acquiring a garbage image detection data set;

2. The method as claimed in claim 1, wherein, in addition to introducing a lightweight feedforward convolutional attention module CBAM after the Focus module at the front end of the backbone network of the YOLOv5s network model and arranging an SE-Net channel attention module after the tail end of the backbone network, the improved YOLOv5s network model adds a small target detection layer at the prediction end of the YOLOv5s network model to improve the head network structure, so as to improve the detection of small target garbage by the improved YOLOv5s network model.

3. The real-time household garbage detection method based on attention mechanism combination as claimed in claim 1, wherein acquiring the garbage image detection data set comprises the following steps:

collecting garbage images containing various daily life garbage, labeling garbage names, garbage categories and garbage positions in the garbage images, and then establishing a garbage image detection data set; the garbage categories comprise four garbage classification categories of household garbage, recyclable garbage, harmful garbage and other garbage.

and storing the annotations as files in xml format, wherein the annotation information mainly comprises: the format selected for the annotated garbage image data set, namely the VOC format; the name and category of the garbage to be detected in the garbage image; the name and storage path of the file generated for each annotated picture; and the coordinate values, length and width of the target garbage in the image.

4. The real-time household garbage detection method based on attention mechanism combination as claimed in claim 2, wherein the specific process for establishing the garbage image detection data set is as follows:

step S1-2: annotating the garbage images obtained in S1-1 with the target detection annotation software LabelImg and saving the annotations as files in xml format, wherein the annotation information mainly comprises: the format selected for the annotated data set, namely the VOC format; the name and category of the garbage to be detected; the name and storage path of the file generated for each annotated image; and the coordinate values, length and width of the target garbage in the image;

5. The real-time household garbage detection method based on attention mechanism combination according to claim 1 or 2, wherein the anchor frame values of the improved YOLOv5s network model are set to [5,6,8,14,15,11], [10,13,16,30,33,23], [30,61,62,45,59,119] and [116,90,156,198,373,326]; during training, the improved YOLOv5s network model outputs prediction frames on the basis of the set anchor frame values, compares them with the real frames, calculates the difference between the two, and adaptively adjusts the network model parameters through iterative backward updates.

6. The method as claimed in claim 1, wherein the garbage images of the garbage image detection data set are input into the improved YOLOv5s network model and subjected to 1000 iterations of training on a GPU, with the image size set to 640 × 640, the batch size of each training iteration set to 16, and the initial learning rate set to 0.01, the training being performed on the GPU in the created virtual environment.

7. The method according to claim 1, wherein the detection result comprises the name and category of the target garbage, the prediction box size, the confidence value and the position information of the target garbage in the image.

8. A real-time household garbage detection device based on attention mechanism combination is characterized by comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the real-time household garbage detection method based on attention mechanism combination according to any one of claims 1-7.

10. A computer-readable storage medium storing computer instructions which, when run, execute the real-time household garbage detection method based on attention mechanism combination according to any one of claims 1 to 7.