CN110852351A - Image-based garbage classification method and device, terminal equipment and storage medium - Google Patents


Publication number
CN110852351A
CN110852351A (application CN201911003601.7A)
Authority
CN
China
Prior art keywords
image
garbage
information
module
classification
Prior art date
Legal status
Pending
Application number
CN201911003601.7A
Other languages
Chinese (zh)
Inventor
唐蔚然
谢洪涛
Current Assignee
Suzhou Magic Island Information Technology Co Ltd
Original Assignee
Suzhou Magic Island Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Magic Island Information Technology Co Ltd filed Critical Suzhou Magic Island Information Technology Co Ltd
Priority to CN201911003601.7A priority Critical patent/CN110852351A/en
Publication of CN110852351A publication Critical patent/CN110852351A/en
Pending legal-status Critical Current

Classifications

    • G06F18/2411 Pattern recognition; classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/25 Pattern recognition; fusion techniques
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06Q50/26 Services; government or public services
    • G06V10/30 Image preprocessing; noise filtering
    • G06V10/40 Extraction of image or video features

Abstract

The invention discloses an image-based garbage classification method and apparatus, a terminal device, and a storage medium, wherein the garbage classification method comprises the following steps: performing feature extraction on an input image by using a basic feature network to obtain garbage image features; respectively performing noise suppression on the garbage image features by using at least three attention mechanism modules to obtain at least three corresponding pieces of deep feature information; performing bilinear aggregation on the at least three pieces of deep feature information pairwise to obtain at least three pieces of enhancement information; fusing the at least three pieces of enhancement information to obtain garbage image representation information; and classifying the garbage image representation information by using a classifier, so as to realize classification of the input image. The technical scheme of the embodiment of the invention achieves state-of-the-art results on three widely used benchmark datasets.

Description

Image-based garbage classification method and device, terminal equipment and storage medium
Technical Field
The present invention relates to a garbage classification method, and in particular, to a garbage classification method and apparatus based on an image, a terminal device, and a storage medium.
Background
Garbage classification aims at sorting different garbage according to material and purpose, so as to effectively reduce environmental pollution and resource waste; automatic classification of garbage images currently has broad application prospects.
Because garbage image data exhibit large intra-class differences and inter-class similarities, traditional image classification algorithms cannot solve this problem well. At present, bilinear aggregation algorithms perform high-order mapping on image features to capture the differences between fine-grained images. However, this bilinear aggregation approach ignores a problem: high-order mapping not only introduces more image detail information, but also amplifies noise information, which impairs the feature representation capability. In addition, because noise takes diverse forms in image representations, it is difficult to remove the noise effectively with a single noise suppression mechanism.
Disclosure of Invention
In order to solve the technical problem, embodiments of the present invention provide a method and an apparatus for image-based garbage classification, a terminal device, and a storage medium.
To achieve this purpose, the technical scheme of the invention is realized as follows:
an embodiment of a first aspect of the present invention provides an image-based garbage classification method, where the garbage classification method includes:
performing feature extraction on an input image by using a basic feature network to obtain garbage image features;
respectively performing noise suppression on the garbage image features by using at least three attention mechanism modules to obtain at least three corresponding pieces of deep feature information;
performing bilinear aggregation on the at least three pieces of deep feature information pairwise to obtain at least three pieces of enhancement information;
fusing the at least three pieces of enhancement information to obtain garbage image representation information;
and classifying the garbage image representation information by using a classifier, so as to realize classification of the input image.
Further, the attention mechanism module comprises: the system comprises a spatial attention module based on the feature information of different areas of the garbage image, a channel attention module based on the feature channel information of the garbage image, and an area relation attention module based on the relation between different areas of the garbage image.
Further, the expression of the spatial attention module is:
f_s(X_i) = diag(ω_i)X_i
wherein diag(·) is a diagonalization operation that generates a matrix with the elements of the input vector as diagonal elements; the vector ω_i is obtained as follows: the feature X_i is mapped to a single-channel feature through two layers of 1×1 convolution and a ReLU activation function, and feature normalization is then performed with a softmax operation to obtain ω_i.
Further, the expression of the channel attention module is:
f_c(X_i) = X_i diag(c_i) + X_i
wherein diag(·) is a diagonalization operation and c_i is obtained as follows: the feature X_i is spatially averaged with a global average pooling operation to obtain a global vector feature; information is extracted through two fully connected layers, with dimension changes c → c/16 and c/16 → c; and c_i is obtained by normalization with a Sigmoid activation function.
Further, the expression of the region relationship attention module is:
f_r(X_i) = softmax(θ(X_i)φ(X_i)^T)X_i
wherein θ(X_i) and φ(X_i) are both obtained from X_i by convolution and pooling operations.
Further, classifying the garbage image representation information by using a classifier comprises: using a cross entropy loss function as the optimization objective.
Further, classifying the garbage image representation information by using a classifier comprises:
performing data augmentation on the training data set;
and shuffling the augmented training data, performing batch training in a preset quantity, and, at the same time, randomly cropping a region of a preset size from the original images of the training data set as input to the classifier.
An embodiment of a second aspect of the present invention provides an image-based garbage classification apparatus, including:
a feature extraction module, configured to perform feature extraction on an input image by using a basic feature network to obtain garbage image features;
a noise suppression module, configured to perform noise suppression on the garbage image features by using at least three attention mechanism modules to obtain at least three corresponding pieces of deep feature information;
an enhancement module, configured to perform bilinear aggregation on the at least three pieces of deep feature information pairwise to obtain at least three pieces of enhancement information;
a fusion module, configured to fuse the at least three pieces of enhancement information to obtain garbage image representation information;
and a classification module, configured to classify the garbage image representation information by using a classifier, so as to realize classification of the input image.
In a third aspect, the present invention provides a terminal device, where the terminal device includes a processor and a memory for storing processor-executable instructions, and the processor executes the steps of any one of the above garbage classification methods.
A fourth aspect of the present invention provides a computer-readable storage medium, storing computer instructions, which when executed, implement the steps of any one of the above-mentioned garbage classification methods.
The embodiments of the present invention provide an image-based garbage classification method and apparatus, a terminal device, and a storage medium. On the one hand, multiple complementary denoised image features can be extracted from garbage images to capture distinctive image features; on the other hand, a hierarchical fusion method based on high-order mapping is provided, in which multiple noise-suppressed image features are high-order mapped and effectively fused to obtain a more robust global image representation for classification. The technical scheme of the embodiments of the invention achieves state-of-the-art results on three widely used benchmark datasets.
Drawings
FIG. 1 is an alternative flow chart of a garbage classification method according to an embodiment of the present invention;
FIG. 2 is another alternative flow chart of the garbage classification method provided by the embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a channel attention module according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a region relationship attention module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention, belong to the scope of protection of the invention.
As shown in fig. 1 and fig. 2, an embodiment of the first aspect of the present invention provides an image-based garbage classification method, where the method includes the following steps:
and S10, extracting the features of the input image by using the basic feature network to obtain a garbage image.
Specifically, the basic feature network may be any general convolutional neural network; that is, the basic feature network includes a plurality of convolution modules, and each convolution module includes a plurality of convolutional layers and activation functions. The convolution modules are connected by average pooling or max pooling layers, and the number and size of convolution kernels within each convolution module are basically unchanged; the number of convolution kernels increases from module to module as the network deepens. Here, we take the output of the last layer of the basic feature network as the final image representation feature. Taking VGG-16 as an example, the basic feature network comprises five convolution modules, each containing a different number of convolution and activation operations. The numbers of feature channels output by the five modules are, in order: 64, 128, 256, 512, 512. As the modules progress, the resolution of the output features decreases in turn, and the semantic level of the extracted features rises in turn. Finally, the feature output by the last layer of the basic feature network is defined as X_i ∈ R^{N×C}, where N is the number of spatial pixels, C is the number of feature channels, and i is the sample index.
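As a rough illustration, the shape progression above can be checked with a short sketch (illustrative only; the function name and the assumption that every module ends in a 2×2 pooling that halves resolution are ours, not the patent's):

```python
def vgg16_feature_shape(height, width):
    # returns (N, C) of the final feature X_i in R^{N x C}
    channels = [64, 128, 256, 512, 512]  # channels output by the five modules
    h, w = height, width
    for _ in channels:
        h, w = h // 2, w // 2  # assume each module ends in 2x2 pooling
    return h * w, channels[-1]
```

For a 448 × 448 input this gives N = 14 × 14 = 196 spatial pixels and C = 512 channels.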
And S20, respectively performing noise suppression on the garbage image features 10 by using at least three attention mechanism modules to obtain at least three corresponding pieces of deep feature information 20.
Further, the attention mechanism module comprises: the system comprises a spatial attention module based on the feature information of different areas of the garbage image, a channel attention module based on the feature channel information of the garbage image, and an area relation attention module based on the relation between different areas of the garbage image.
In one specific example of the present invention, as shown in fig. 2 to 5, the three attention mechanism modules are used to refine, and eliminate redundancy in, the spatial information of the image, the channel information of the features, and the region relationship information, respectively. The image spatial information includes important spatial information, which indicates which regions' texture information in the garbage image is more important for fine-grained garbage classification, such as the pop can region in fig. 2. The feature channel information includes important channel information, which indicates which channels in the image feature representation are discriminative for garbage classification, such as channels related to texture and shape. The region relationship information includes important region relationship information, which indicates the relationships between which regions of the garbage image are valuable for garbage classification; for example, the pull ring region and the can body region in the figure are related to each other, indicating that the can is probably an empty pop can.
In particular, the spatial attention module is denoted by f_s, the channel attention module by f_c, and the region relationship attention module by f_r.
Further, f_s first maps the feature X_i to a single-channel feature through two layers of 1×1 convolution (Conv) and a ReLU activation function. Then a vector ω_i of dimension N is obtained after feature normalization with a softmax operation. The weights in ω_i correspond to the importance of each pixel of the feature X_i: regions with high weights carry more important information, and regions with low weights carry more noise. Finally, the vector ω_i is used as weights to weight the spatial pixels in X_i. The specific expression of the spatial attention module is:
f_s(X_i) = diag(ω_i)X_i
wherein diag(·) is a diagonalization operation that generates a matrix with the elements of the input vector as diagonal elements. Using softmax has the following advantages: softmax keeps the values of ω_i in (0, 1), so that large weights concentrate on important regions of the image; and softmax suppresses the gradient explosion problem in deep networks.
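The spatial weighting f_s(X_i) = diag(ω_i)X_i can be sketched in plain Python as follows (a minimal sketch; the `logits` argument stands in for the single-channel map produced by the 1×1 convolutions, which are not implemented here):

```python
import math

def softmax(v):
    # numerically stable softmax over a list of scores
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def spatial_attention(X, logits):
    # f_s(X) = diag(omega) X: weight each of the N spatial rows of X by a
    # softmax-normalized scalar omega_n derived from `logits`
    omega = softmax(logits)
    return [[w * x for x in row] for w, row in zip(omega, X)]
```

With uniform logits every spatial position receives the same weight 1/N, so the feature is simply scaled.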
f_c first spatially averages X_i with a global average pooling operation to obtain a global vector feature. Information is then extracted through two fully connected layers, with dimension changes c → c/16 and c/16 → c. Finally, c_i is obtained by normalization with a Sigmoid activation function. The expression of the channel attention module is:
f_c(X_i) = X_i diag(c_i) + X_i
preferably, a residual attention mechanism is used here to make the training more stable.
Compared with the spatial attention module, the region relationship attention module f_r performs one additional region interaction operation. The relationship between spatial positions is obtained in the form of an outer product, and finally the spatial relationship weights are obtained by normalization with a softmax operation. The expression of the region relationship attention module is:
f_r(X_i) = softmax(θ(X_i)φ(X_i)^T)X_i
where θ(X_i) and φ(X_i) are both obtained from X_i by convolution and pooling operations, and the softmax here operates along the matrix row vectors.
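The region interaction can be sketched as follows (the projections Q and K stand in for the convolved-and-pooled θ(X_i) and φ(X_i); those names, and passing the projections in directly, are our assumptions):

```python
import math

def softmax(v):
    # numerically stable softmax over a list of scores
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def region_relation_attention(Q, K, X):
    # relation matrix R[n][m] = <Q_n, K_m> via the outer-product interaction
    R = [[sum(q * k for q, k in zip(qrow, krow)) for krow in K] for qrow in Q]
    A = [softmax(row) for row in R]  # normalize along each matrix row
    # re-weight X: output row n is the A[n]-weighted sum of the rows of X
    n_regions, n_channels = len(X), len(X[0])
    return [[sum(A[n][m] * X[m][c] for m in range(n_regions))
             for c in range(n_channels)] for n in range(len(A))]
```

With all-zero projections every region attends uniformly to all regions, so each output row is the mean of the rows of X.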
And S30, performing bilinear aggregation on the at least three pieces of deep feature information pairwise to obtain at least three pieces of enhancement information.
Bilinear aggregation is a widely used feature aggregation method. Through an outer product operation, it performs channel-to-channel interaction between two feature representations of an image, and finally performs spatial average pooling to obtain the final representation. Because of the outer product operation, the method can map image features to a high-order semantic space to obtain a richer feature representation, and therefore it also has stronger discriminative power. Further, bilinear aggregation is performed pairwise between the three denoised feature representations obtained above, using the stepwise method in FIG. 2.
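The pairwise aggregation B(x_1, x_2) = x_1^T x_2 with spatial average pooling can be sketched as (illustrative, list-based representation; the function name is an assumption):

```python
def bilinear_aggregation(x1, x2):
    # B(x1, x2) = x1^T x2 averaged over the N spatial positions:
    # channel-channel interactions giving a C x C second-order representation
    n, c = len(x1), len(x1[0])
    return [[sum(x1[k][i] * x2[k][j] for k in range(n)) / n
             for j in range(c)] for i in range(c)]
```

Note that the output is C × C regardless of N, which is why the result captures channel interactions rather than spatial layout.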
and S40, fusing at least three types of enhancement information to obtain junk image expression information.
Specifically, the final representation is obtained in a cascade manner. Compared with direct concatenation, this method can additionally explore the hidden association information among the three features, and therefore has stronger discriminative power:
Y_i = [B(f_s(X_i), f_c(X_i)); B(f_s(X_i), f_r(X_i)); B(f_c(X_i), f_r(X_i))]
where B(·, ·) is the bilinear aggregation function
B(x_1, x_2) = x_1^T x_2, with x_1, x_2 ∈ R^{N×C},
and [·; ·] is the concatenation (cascade) operation.
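A sketch of the cascade fusion under the same simplified, list-based representation (hypothetical helper names; not the patent's implementation):

```python
def bilinear_aggregation(x1, x2):
    # B(x1, x2) = x1^T x2 averaged over the N spatial positions
    n, c = len(x1), len(x1[0])
    return [[sum(x1[k][i] * x2[k][j] for k in range(n)) / n
             for j in range(c)] for i in range(c)]

def fuse(fs, fc, fr):
    # cascade fusion: concatenate the flattened pairwise bilinear aggregations
    # of the three denoised features into one global representation
    def flat(m):
        return [v for row in m for v in row]
    return (flat(bilinear_aggregation(fs, fc))
            + flat(bilinear_aggregation(fs, fr))
            + flat(bilinear_aggregation(fc, fr)))
```

For C-channel inputs the fused vector has length 3C², here 12 for C = 2.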
And S50, classifying the garbage image representation information by using a classifier to realize classification of the input image.
Further, classifying the garbage image representation information by using a classifier comprises: using a cross entropy loss function as the optimization objective.
Since the classification of garbage images is essentially a fine-grained classification problem, a cross entropy loss function is adopted as the optimization objective, with the expression:
L = -Σ_i log( exp(a_{y_i}) / Σ_j exp(a_j) )
where y_i denotes the true classification result, i.e. the label, of sample i, and a_j denotes the score assigned by the classifier to class j.
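For a single sample, the loss above can be sketched as follows (illustrative; computed with the log-sum-exp trick for numerical stability):

```python
import math

def cross_entropy(scores, label):
    # loss = -log(exp(a_y) / sum_j exp(a_j)), computed via log-sum-exp
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]
```

The loss approaches zero as the score of the true class dominates the others.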
Further, in order to reduce the risk of model overfitting, classifying the garbage image representation information by using a classifier comprises:
performing data augmentation (such as flipping, stretching, and the like) on the training data set;
shuffling the augmented training data and training in batches of a preset size (e.g., batch size 8), while randomly cropping regions of a preset size (e.g., 448 × 448) from the original images of the training data set as input to the classifier.
When training the network, stochastic gradient descent is used as the optimizer, and the learning rate decay strategy is set to exponential decay; the initial learning rate is 0.01. Meanwhile, the layer before the classifier uses Dropout with a ratio of 0.5, and the coefficient of the L2 penalty term is set to 0.0005. The network is initialized with the MSRA method, with the Gaussian parameters set to a normal distribution N(0, 2/n), where n is the number of parameters.
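The exponential learning-rate decay can be sketched as follows (the decay rate and step granularity are assumptions; the patent fixes only the initial rate of 0.01):

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    # lr = lr0 * rate^(step / decay_steps)
    return initial_lr * decay_rate ** (step / decay_steps)
```

For example, with a hypothetical decay rate of 0.1 per 100 steps, the rate falls from 0.01 to 0.001 after 100 steps.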
To validate the effectiveness of the present invention, we evaluated on three widely used fine-grained object classification benchmark datasets: a fine-grained bird dataset (CUB-200), a fine-grained vehicle dataset (Car-196), and an action recognition dataset (MPII). The specific dataset information is as follows:
1. the CUB-200 dataset consists of 11788 bird pictures of 200 categories. The training/testing of the data set is divided into: 5994 training pictures and 5794 test pictures;
2. the Car-196 data set consists of a total of 196 categories of 16185 Car class pictures. The training/testing of the data set is divided into: 8114 training pictures and 8041 test pictures;
3. the MPII data set consists of 15205 pictures of 393 behavior classes. The training/testing of the data set is divided into: 8218 training pictures and 6987 test pictures.
Experiments show that the garbage classification method of the embodiment of the invention achieves the best experimental results on the three benchmark datasets. The recognition accuracies on the CUB-200 and Car-196 datasets are 86.2% and 91.5%, respectively, and the mAP (mean Average Precision) on the MPII dataset is 32.7%.
An embodiment of a second aspect of the present invention provides an image-based garbage classification apparatus, including:
a feature extraction module, configured to perform feature extraction on an input image by using a basic feature network to obtain garbage image features;
a noise suppression module, configured to perform noise suppression on the garbage image features by using at least three attention mechanism modules to obtain at least three corresponding pieces of deep feature information;
an enhancement module, configured to perform bilinear aggregation on the at least three pieces of deep feature information pairwise to obtain at least three pieces of enhancement information;
a fusion module, configured to fuse the at least three pieces of enhancement information to obtain garbage image representation information;
and a classification module, configured to classify the garbage image representation information by using a classifier, so as to realize classification of the input image.
In a third aspect, the present invention provides a terminal device, where the terminal device includes a processor and a memory for storing processor-executable instructions, and the processor executes the steps of any one of the above garbage classification methods.
The terminal device may further include a network interface, an input device, a hard disk, and a display device.
The various interfaces and devices described above may be interconnected by a bus architecture. A bus architecture may be any architecture that may include any number of interconnected buses and bridges. One or more Central Processing Units (CPUs), represented in particular by a processor, and one or more memories, represented by a memory, are connected together by various circuits. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface can be connected to a network (such as the internet, a local area network, etc.), and can acquire relevant data from the network and store the relevant data in the hard disk.
The input device can receive various instructions input by an operator and send the instructions to the processor for execution. The input device may include a keyboard or a pointing device (e.g., a mouse, trackball, touch pad, touch screen, or the like).
The display device can display the result obtained by the processor executing the instruction.
The memory is used for storing programs and data necessary for operating the operating system, intermediate results in the calculation process of the processor and the like.
It will be appreciated that the memory in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memories of the apparatus and methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory stores elements, executable modules or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application programs include various application programs, such as a browser, for realizing various application services. The program implementing the method of the embodiment of the present invention may be included in the application programs.
The processor acquires the panoramic image when calling and executing the application program and the data stored in the memory, specifically, the application program and the data can be a program or an instruction stored in the application program; preprocessing the panoramic image to obtain a subimage to be processed; inputting the sub-image to be processed into a multi-path convolution neural network to obtain a deep characteristic map of the sub-image to be processed; performing pooling treatment on the deep layer characteristic diagram; and inputting the deep characteristic map subjected to pooling into a full-connected model, and taking the output of the full-connected model as the position information after relocation.
The method disclosed by the above embodiments of the invention can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in the processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
A fourth aspect of the present invention provides a computer-readable storage medium, storing computer instructions, which when executed, implement the steps of any one of the above-mentioned garbage classification methods.
It is understood that storage media include, but are not limited to: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Other steps of garbage classification according to embodiments of the present invention are understood and readily implemented by those skilled in the art and therefore will not be described in detail.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (10)

1. An image-based garbage classification method, characterized in that the garbage classification method comprises:
performing feature extraction on an input image by using a basic feature network to obtain a garbage image;
respectively performing noise suppression on the garbage image by using at least three attention mechanism modules to obtain at least three corresponding kinds of depth feature information;
performing bilinear aggregation on the at least three kinds of depth feature information pairwise to obtain at least three kinds of enhancement information;
fusing the at least three kinds of enhancement information to obtain garbage image expression information;
and classifying the garbage image expression information by using a classifier, so as to realize classification of the input image.
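Purely as an illustration (not part of the claims), the data flow of claim 1 can be sketched in NumPy. Everything here is an assumption made for the sketch: the backbone is replaced by random features, the three attention branches by a simple reweighting stand-in, and the classifier by a randomly initialised linear map; only the structure (three branches → pairwise bilinear aggregation → fused expression vector → classifier) mirrors the claimed steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_stub(x, seed):
    # Stand-in for one attention module: reweights the (c, n) feature map.
    w = np.abs(np.random.default_rng(seed).standard_normal(x.shape[0]))
    w /= w.sum()
    return (x.T * w).T

def bilinear_aggregate(a, b):
    # Pairwise bilinear aggregation of two (c, n) maps -> a (c*c,) vector.
    return (a @ b.T).ravel() / a.shape[1]

# Hypothetical backbone output: c channels over n spatial positions.
c, n, num_classes = 8, 16, 4
features = rng.standard_normal((c, n))

# Three attention branches (spatial / channel / region-relation stand-ins).
branches = [attention_stub(features, s) for s in (1, 2, 3)]

# Pairwise bilinear aggregation of the three branches -> three enhanced vectors.
enhanced = [bilinear_aggregate(branches[i], branches[j])
            for i, j in [(0, 1), (0, 2), (1, 2)]]

# Fuse by concatenation, then classify with a random linear classifier.
expression = np.concatenate(enhanced)            # garbage-image expression information
W = rng.standard_normal((num_classes, expression.size)) * 0.01
logits = W @ expression
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The fused expression vector has dimension 3·c², which is why bilinear methods typically follow fusion with a compact classifier rather than further convolutions.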
2. The garbage classification method of claim 1, wherein the attention mechanism modules comprise: a spatial attention module based on feature information of different regions of the garbage image, a channel attention module based on feature channel information of the garbage image, and a region relation attention module based on the relations between different regions of the garbage image.
3. The garbage classification method of claim 2, wherein the expression of the spatial attention module is:
f_s(X_i) = diag(ω_i)X_i
wherein diag(·) is a diagonalization operation that uses the elements of the input vector as diagonal elements to generate a matrix, and the vector ω_i is obtained as follows: the feature X_i is mapped into a feature with a channel number of 1 through two layers of 1×1 convolution and a ReLU activation function; the result is then normalized by a softmax operation to obtain ω_i.
4. The garbage classification method of claim 2, wherein the expression of the channel attention module is:
f_c(X_i) = X_i diag(c_i) + X_i
wherein diag(·) is a diagonalization operation, and c_i is obtained as follows: the feature X_i is spatially averaged by a global average pooling operation to obtain a global vector feature; information is then extracted through two fully connected layers, with dimension changes c → c/16 and c/16 → c respectively; finally, normalization through a Sigmoid activation function yields c_i.
5. The garbage classification method of claim 2, wherein the expression of the region relation attention module is given by a formula that appears in the original filing as an image (FDA0002242061610000011), in which the two operand terms (images FDA0002242061610000012 and FDA0002242061610000013) are each obtained from X_i by convolution and pooling operations.
6. The garbage classification method of claim 1, wherein classifying the garbage image expression information by using a classifier comprises: using a cross-entropy loss function as the optimization objective.
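The cross-entropy objective named in claim 6, written out in a few lines of NumPy; the logits are hypothetical class scores invented for the example:

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy: negative log-probability of the true class,
    # computed in a numerically stable way (max-shift before exponentiating).
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

logits = np.array([2.0, 0.5, -1.0, 0.1])   # hypothetical scores for 4 classes
loss_correct = cross_entropy(logits, 0)    # true class has the highest score
loss_wrong = cross_entropy(logits, 2)      # true class has the lowest score
```

The loss is small when the classifier assigns high probability to the true class and grows without bound as that probability approaches zero, which is what makes it a suitable optimization target here.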
7. The garbage classification method of claim 1, wherein classifying the garbage image expression information by using a classifier comprises:
performing data augmentation on a training data set;
and shuffling the augmented training data, performing training in batches of a preset size, and meanwhile randomly cropping a region of a preset size from each original image of the training data set and inputting the region into the classifier.
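A toy sketch of the training-data handling in claim 7 (shuffle, fixed-size batches, random crop of a preset size); all array shapes and sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    # Randomly intercept a size x size region of the original image.
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

# Toy "training set": 10 single-channel 32x32 images with labels.
images = rng.standard_normal((10, 32, 32))
labels = np.arange(10)

# Shuffle, then iterate in batches of a preset size, cropping each image.
order = rng.permutation(len(images))
batch_size, crop = 4, 24
batches = []
for start in range(0, len(order), batch_size):
    idx = order[start:start + batch_size]
    batch = np.stack([random_crop(images[i], crop) for i in idx])
    batches.append((batch, labels[idx]))
```

Random cropping acts as the augmentation step: the classifier sees a different sub-region of each image every epoch, which discourages it from memorising fixed pixel positions.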
8. An image-based garbage classification device, characterized in that the garbage classification device comprises:
the feature extraction module is used for performing feature extraction on an input image by using a basic feature network to obtain a garbage image;
the noise suppression module is used for performing noise suppression on the garbage image by using at least three attention mechanism modules to obtain at least three corresponding kinds of depth feature information;
the enhancement module is used for performing bilinear aggregation on the at least three kinds of depth feature information pairwise to obtain at least three kinds of enhancement information;
the fusion module is used for fusing the at least three kinds of enhancement information to obtain garbage image expression information;
and the classification module is used for classifying the garbage image expression information by using a classifier, so as to realize classification of the input image.
9. A terminal device, characterized in that the terminal device comprises a processor and a memory for storing processor-executable instructions, the processor performing the steps of the garbage classification method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed, implement the steps of the garbage classification method of any one of claims 1 to 7.
CN201911003601.7A 2019-10-22 2019-10-22 Image-based garbage classification method and device, terminal equipment and storage medium Pending CN110852351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911003601.7A CN110852351A (en) 2019-10-22 2019-10-22 Image-based garbage classification method and device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911003601.7A CN110852351A (en) 2019-10-22 2019-10-22 Image-based garbage classification method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110852351A true CN110852351A (en) 2020-02-28

Family

ID=69596970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911003601.7A Pending CN110852351A (en) 2019-10-22 2019-10-22 Image-based garbage classification method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110852351A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368942A (en) * 2020-05-27 2020-07-03 深圳创新奇智科技有限公司 Commodity classification identification method and device, electronic equipment and storage medium
CN111368942B (en) * 2020-05-27 2020-08-25 深圳创新奇智科技有限公司 Commodity classification identification method and device, electronic equipment and storage medium
CN116152115A (en) * 2023-04-04 2023-05-23 湖南融城环保科技有限公司 Garbage image denoising processing method based on computer vision
CN117522388A (en) * 2023-11-08 2024-02-06 永昊环境科技(集团)有限公司 Intelligent sanitation processing method for urban environment
CN117522388B (en) * 2023-11-08 2024-04-12 永昊环境科技(集团)有限公司 Intelligent sanitation processing method for urban environment

Similar Documents

Publication Publication Date Title
Gao et al. Multiscale residual network with mixed depthwise convolution for hyperspectral image classification
He et al. Supercnn: A superpixelwise convolutional neural network for salient object detection
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
Kao et al. Visual aesthetic quality assessment with a regression model
CN109344618B (en) Malicious code classification method based on deep forest
Li et al. HEp-2 specimen image segmentation and classification using very deep fully convolutional network
DE102017100609A1 (en) Online capture and classification of dynamic gestures with recurrent folding neural networks
CN107683469A (en) A kind of product classification method and device based on deep learning
CN109063719B (en) Image classification method combining structure similarity and class information
Oloyede et al. Improving face recognition systems using a new image enhancement technique, hybrid features and the convolutional neural network
Tursun et al. MTRNet++: One-stage mask-based scene text eraser
Shen et al. Deep cross residual network for HEp-2 cell staining pattern classification
Cao et al. Learning crisp boundaries using deep refinement network and adaptive weighting loss
CN110852351A (en) Image-based garbage classification method and device, terminal equipment and storage medium
WO2016170965A1 (en) Object detection method and image search system
Panella et al. Semantic segmentation of cracks: Data challenges and architecture
Zhang et al. Feature pyramid network for diffusion-based image inpainting detection
CN111783514A (en) Face analysis method, face analysis device and computer-readable storage medium
Qi et al. Hep-2 cell classification: The role of gaussian scale space theory as a pre-processing approach
CN114723010B (en) Automatic learning enhancement method and system for asynchronous event data
Song et al. Towards genetic programming for texture classification
WO2020119624A1 (en) Class-sensitive edge detection method based on deep learning
Liu et al. Image classification method on class imbalance datasets using multi-scale CNN and two-stage transfer learning
Bacea et al. Single stage architecture for improved accuracy real-time object detection on mobile devices
Karsh et al. mIV3Net: modified inception V3 network for hand gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200228