CN118429733A - Multi-head attention-driven kitchen garbage multi-label classification method - Google Patents


Info

Publication number
CN118429733A
Authority
CN
China
Prior art keywords
kitchen waste
module
label classification
layer
head attention
Prior art date
Legal status
Granted
Application number
CN202410900342.2A
Other languages
Chinese (zh)
Other versions
CN118429733B (en)
Inventor
梁桥康
李进涛
秦海
刘铭峰
柳力元
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN202410900342.2A
Publication of CN118429733A
Application granted
Publication of CN118429733B
Status: Active

Classifications

    • G06V 10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/454 — Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/7715 — Feature extraction, e.g. by transforming the feature space
    • G06V 10/82 — Image or video recognition using neural networks
    • G06V 20/60 — Scenes; scene-specific elements; type of objects
    • G06N 3/042 — Knowledge-based neural networks; logical representations of neural networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/0495 — Quantised networks; sparse networks; compressed networks


Abstract

The invention discloses a multi-head attention-driven kitchen waste multi-label classification method, which comprises the following steps: constructing a kitchen waste multi-label classification data set comprising images of multiple different categories of kitchen waste, where the label of each image contains one or more categories; constructing a multi-head attention-driven graph convolution lightweight network model comprising a feature extraction module, a multi-head attention module and a dynamic graph convolution module, in which the feature extraction module extracts features from an input image, the features are sent to the multi-head attention module to strengthen the category-aware regions of the feature map, and then to the dynamic graph convolution module to adaptively capture those regions; training the graph convolution lightweight network model with the constructed kitchen waste multi-label classification data set; and finally performing multi-label classification on kitchen waste images to be predicted using the trained classification model. The invention reduces the performance loss caused by cutting model parameters, enhances recognition capability and improves the multi-label classification effect.

Description

A multi-head attention-driven kitchen waste multi-label classification method

Technical Field

The present invention relates to the field of image processing, and in particular to a multi-head attention-driven kitchen waste multi-label classification method.

Background Art

In recent years, with deep learning used as a means of improving garbage sorting efficiency, many advanced classification algorithms have been proposed. However, most publicly available garbage datasets are designed for household-waste recognition, and research on kitchen waste classification in real scenes is lacking. In addition, kitchen waste images often contain multiple categories, making the task a typical multi-label image classification problem in computer vision. The main goal of multi-label image classification is to accurately predict all the categories present in an image; since objects in the real world usually appear together, this setting also better matches human common-sense perception.

From the perspective of kitchen waste multi-label classification, real scenes place high demands on the real-time performance of the model. Because image backgrounds are complex, object categories are diverse, and correlations exist between object labels, current methods typically need fairly complex network models to reach high classification accuracy, while basic deep convolutional networks cannot guarantee it. Studying intelligent and efficient multi-label classification algorithms for kitchen waste in real scenes therefore has significant practical value.

Summary of the Invention

The present invention provides a multi-head attention-driven kitchen waste multi-label classification method which reduces the performance loss caused by cutting the number of model parameters while further enhancing recognition capability and improving the multi-label classification effect.

To achieve the above technical objectives, the present invention adopts the following technical solution:

A multi-head attention-driven kitchen waste multi-label classification method, comprising:

constructing a kitchen waste multi-label classification dataset containing a number of images for each of several kitchen waste categories, the label of each image comprising one or more categories;

constructing a multi-head attention-driven graph convolution lightweight network model comprising a lightweight feature extraction module, a multi-head attention module and a dynamic graph convolution module; the feature extraction module extracts a feature map from the image input to the model, the extracted feature map is processed by the multi-head attention module to strengthen its category-aware regions, and the strengthened feature map is then processed by the dynamic graph convolution module to adaptively capture the category-aware regions and output the predicted categories;

training the graph convolution lightweight network model with the constructed kitchen waste multi-label classification dataset;

and finally using the trained model, i.e. the kitchen waste multi-label classification model, to perform multi-label classification on kitchen waste images to be predicted.

Further, since the category distribution of kitchen waste differs across seasons, the kitchen waste multi-label classification dataset is obtained by selecting kitchen waste images with various category combinations from different periods of the year.

Further, the lightweight feature extraction module uses the lightweight backbone network ShuffleNetV2.

Further, the multi-head attention module comprises a first fully connected layer, a scaled dot-product attention submodule, a second fully connected layer, a Dropout layer and a normalization layer;

the first fully connected layer reduces the dimensionality of the input feature map $F \in \mathbb{R}^{W \times H \times D}$ to the feature map $F' \in \mathbb{R}^{WH \times D}$, where $W$, $H$ and $D$ denote the length, width and number of channels of the image respectively;

the scaled dot-product attention submodule adopts a multi-head attention mechanism, each head taking the feature map $F'$ as key and value while the query uses a set of learnable parameters, computed as:

$$\mathrm{MHA}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W$$

$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right)V_i,\qquad Q_i = Q W_i^{Q},\; K_i = K W_i^{K},\; V_i = V W_i^{V}$$

where $\mathrm{MHA}(Q,K,V)$ denotes the output of the scaled dot-product attention submodule with the multi-head mechanism, $\mathrm{Concat}$ denotes feature concatenation, $W$ is an additional weight matrix, $\mathrm{head}_i$ is the output of the $i$-th head, and $Q_i$, $K_i$, $V_i$ are obtained by multiplying the submodule's input query, key and value by the corresponding weight matrices; $\sqrt{d_k}$ is the scaling factor; $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ are the weight matrices to be learned for the $i$-th head.

the second fully connected layer, the Dropout layer and the normalization layer further process the output of the scaled dot-product attention submodule, expressed as:

$$V_c = \mathrm{LayerNorm}\big(\mathrm{Dropout}\big(\mathrm{FC}(\mathrm{MHA}(Q,K,V))\big) \oplus Q\big)$$

where $\oplus$ denotes point-wise addition, $\mathrm{FC}$, $\mathrm{Dropout}$ and $\mathrm{LayerNorm}$ denote the processing of the second fully connected layer, the Dropout layer and the normalization layer respectively, and $V_c$ is the feature map output by the multi-head attention module.

Further, the dynamic graph convolution module comprises a static graph convolution layer and a dynamic graph convolution layer;

the dynamic graph convolution layer processes the output $H$ of the static graph convolution layer, expressed as:

$$H' = \delta\big(A_d H W_d\big),\qquad A_d = \sigma\big(W_A\,[H;\,H_g]\big)$$

where $A_d \in \mathbb{R}^{C \times C}$ is the correlation matrix of the dynamic graph convolution layer, $W_d$ is its state-update weight, $\sigma$ is the Sigmoid activation function and $\delta$ is the LeakyReLU activation function; $[H;\,H_g]$ is obtained by concatenating the feature map $H$ with its global representation $H_g$, which is produced by applying pooling, a 1×1 one-dimensional convolution and an activation function to the output $H$ of the static graph convolution layer; $C$ is the total number of categories in kitchen waste multi-label classification, and $D_1$ is the dimension of the output feature map $H$ of the static graph convolution layer.

Further, the static graph convolution layer processes the input feature map, expressed as:

$$H = \delta\big(A_s V_c W_s\big)$$

where $V_c$ is the feature map output by the multi-head attention module, $H \in \mathbb{R}^{C \times D_1}$ is the output feature map of the static graph convolution layer, composed of the features corresponding to the $C$ categories, i.e. $H = [h_1, h_2, \dots, h_C]$; $\delta$ is the LeakyReLU activation function, $A_s$ is the correlation matrix of the static graph convolution layer, and $W_s$ is its state-update weight.

The multi-head attention-driven kitchen waste multi-label classification method of the present invention uses a lightweight network to optimize the graph convolution classification model, while introducing a multi-head attention mechanism to reduce the loss of feature information, capture feature information at different levels, strengthen the feature extraction capability of the backbone network in complex scenes and reduce the performance loss caused by cutting model parameters; a dynamic graph convolution module further enables adaptive capture of semantic-aware regions, further enhancing recognition capability and improving the multi-label classification effect. Compared with existing kitchen waste classification technologies, it has the following advantages:

(1) To overcome the low accuracy of traditional GCN methods that use an ordinary deep convolutional network as the feature extraction backbone, and the large parameter count of those using a Transformer backbone, the present invention adopts ShuffleNetV2 as the feature extraction backbone and optimizes the network model to achieve a lightweight design.

(2) The present invention designs a multi-head attention module and a dynamic graph convolution module to optimize the lightweight graph convolution classification network, reducing the performance loss caused by cutting model parameters while further enhancing recognition capability and improving the multi-label classification effect.

(3) The method has strong practicality and generalization ability: it not only achieves good results in the present kitchen waste multi-label classification task, but also obtains excellent multi-label classification accuracy on the MS-COCO and VOC 2007 datasets.

Brief Description of the Drawings

FIG. 1 is a flow chart of the multi-head attention-driven kitchen waste multi-label classification method according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the multi-head attention module according to an embodiment of the present invention.

FIG. 3 shows the relationship between parameter count and accuracy for different backbone networks according to an embodiment of the present invention.

FIG. 4 shows the influence of fusing different modules on the classification mAP according to an embodiment of the present invention.

FIG. 5 shows the influence of fusing different modules on the per-category classification AP according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described in detail below. This embodiment is carried out on the basis of the technical solution of the present invention and gives a detailed implementation and specific operating process to further explain the technical solution.

The technical solution of the present invention realizes multi-label classification of kitchen waste images based on a multi-head attention-driven graph convolution lightweight network; the Python programming language can be used for experiments, and the C/C++ programming language for engineering applications.

The present invention provides a multi-head attention-driven graph convolution lightweight network multi-label classification method; the processing flow is shown in FIG. 1 and comprises the following steps:

Step 1: construct the multi-label kitchen waste classification dataset MLKW collected in real scenes, including dataset collection and annotation.

The data in this embodiment were collected on the conveyor line of a kitchen waste sorting center; the 15,994 kitchen waste images obtained form the MLKW dataset, which is close to real scenes and of strong engineering relevance. To reflect the characteristics of kitchen waste in different periods, and considering that the category distribution of kitchen waste differs across seasons, images were selected from different periods of the year: 3,771 images in spring, 4,735 in summer, 2,130 in autumn and 4,358 in winter. After cleaning and screening the image data, a multi-label kitchen waste dataset covering eight categories was obtained, with 3,107 images fully annotated and the remainder partially annotated; the dataset was divided into a training set and a test set at a ratio of 8:2. The method was additionally validated on the PASCAL VOC 2007 and MS-COCO datasets, which are widely used in multi-label classification: VOC 2007 contains 9,963 images covering 20 common categories, while MS-COCO contains a training set (82,081 images) and a validation set (40,504 images), 122,581 images in total covering 80 common categories, with about 2.9 category labels per image.
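The 8:2 train/test split described above can be sketched in a few lines of Python (a minimal illustration only; the function name and fixed seed are hypothetical, not taken from the patent):

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=0):
    """Shuffle image ids reproducibly, then split them 8:2 into
    train/test lists, as described for the MLKW dataset."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Example with the 3,107 fully annotated images mentioned above.
train, test = split_dataset(range(3107))
```

With 3,107 images this yields 2,485 training and 622 test images; in practice a stratified split per category may be preferable for multi-label data.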

Step 2: construct the multi-head attention-driven graph convolution lightweight network model, comprising the lightweight feature extraction module, the multi-head attention module and the dynamic graph convolution module.

1. Feature extraction module.

In this embodiment, the lightweight backbone network ShuffleNetV2 is used as the feature extraction network of the GCN model to achieve the lightweight design, referred to as GCLN. To suit the lightweight model, the original image resolution of 3,256×2,724 is resized to 448×448. Before training, ShuffleNetV2_x1_0 pre-trained on the ImageNet-1K dataset is taken as the backbone network.

2. Multi-head attention module.

The feature map extracted by the feature extraction module is fed into the multi-head attention module to strengthen its category-aware regions.

The purpose of the multi-head attention module (MHA) is to capture the category regions of interest. The features $F \in \mathbb{R}^{W \times H \times D}$ extracted by the backbone network, where $W$, $H$ and $D$ denote the length, width and number of channels of the image respectively, are reduced in dimensionality to $F' \in \mathbb{R}^{WH \times D}$, which serves as the key and value of the multi-head attention mechanism; the query uses a set of learnable parameters to perform a global review of the image features, see FIG. 2. In the MHA, scaled dot-product attention is the computational core, computed as:

$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \qquad (1)$$

where $Q$, $K$, $V$ are obtained by multiplying the fully-connected-layer inputs $q'$, $k'$, $v'$ by the corresponding weight matrices $W^{Q}$, $W^{K}$, $W^{V}$; $\sqrt{d_k}$ is the scaling factor, and the output has the same dimensions as the initial inputs $q$, $k$, $v$. See FIG. 2.
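Equation (1) can be illustrated with a short NumPy sketch (a minimal, hypothetical rendering, not the patent's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Equation (1): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy check: 4 learnable query tokens attending over W*H = 6 positions.
rng = np.random.default_rng(0)
out, w = scaled_dot_product_attention(rng.normal(size=(4, 8)),
                                      rng.normal(size=(6, 8)),
                                      rng.normal(size=(6, 8)))
```

Each row of the attention weights sums to 1, and the output keeps the query's shape, matching the statement that the output dimension equals that of the initial inputs.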

Unlike the multi-head attention mechanism in the Vision Transformer, the present invention redesigns it and introduces a residual connection to reduce information loss. The computation proceeds as follows: the $q$, $k$, $v$ matrices are mapped into different subspaces to generate multiple heads, and scaled dot-product attention is applied to each head, capturing information from different subspaces and dimensions, where:

$$\mathrm{head}_i = \mathrm{Attention}\big(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\big) \qquad (2)$$

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}\big(\mathrm{head}_1,\dots,\mathrm{head}_h\big)\,W \qquad (3)$$

where $W$ is an additional weight matrix and $\mathrm{Concat}$ denotes feature concatenation.

After the multi-head attention, a Dropout layer and normalization are added to mitigate overfitting during training; in addition, a residual connection reduces information loss. The computation is:

$$V_c = \mathrm{LayerNorm}\big(\mathrm{Dropout}\big(\mathrm{FC}(\mathrm{MultiHead}(Q,K,V))\big) \oplus Q\big) \qquad (4)$$

where $V_c$ denotes the features of the categories of interest, $c$ the number of categories, $\oplus$ point-wise addition, and $Q$ is obtained by multiplying $q'$ by $W^{Q}$.
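Equations (2)–(4) can be sketched together in NumPy (a simplified, hypothetical sketch: Dropout is omitted as at inference time, and all weight matrices here are random placeholders):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mha_block(Fp, q_learn, Wq, Wk, Wv, Wo, n_heads=2):
    """Multi-head scaled dot-product attention with a learnable query,
    followed by a residual connection and layer normalization.
    Fp: flattened feature map (W*H, D); q_learn: learnable query (C, D)."""
    Q, K, V = q_learn @ Wq, Fp @ Wk, Fp @ Wv
    d_h = Q.shape[-1] // n_heads
    heads = []
    for i in range(n_heads):                      # scaled dot-product per head
        sl = slice(i * d_h, (i + 1) * d_h)
        s = Q[:, sl] @ K[:, sl].T / np.sqrt(d_h)
        s = np.exp(s - s.max(-1, keepdims=True))
        heads.append((s / s.sum(-1, keepdims=True)) @ V[:, sl])
    out = np.concatenate(heads, -1) @ Wo          # Concat(head_1..head_h) W
    return layer_norm(out + Q)                    # residual add, then LayerNorm

rng = np.random.default_rng(1)
C, D = 8, 16                                      # 8 waste categories, D channels
Vc = mha_block(rng.normal(size=(49, D)), rng.normal(size=(C, D)),
               *(rng.normal(size=(D, D)) * 0.1 for _ in range(4)))
```

The result `Vc` has one row per category, as expected of the class-aware features fed to the graph convolution stage.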

3. Dynamic graph convolution module.

The feature map output by the multi-head attention module is fed into the dynamic graph convolution module for final classification.

In this embodiment, the features $V_c$ of the categories of interest are obtained through the multi-head attention mechanism. To give the graph convolution an image-specific label dependency, a dynamic graph convolution module (Dynamic GCN) is designed.

$V_c$ is passed successively to the static GCN and the dynamic GCN. A single static GCN layer is simply defined as $H = \delta(A_s V_c W_s)$, where $\delta$ is the LeakyReLU activation function, $A_s$ is the correlation matrix and $W_s$ is the state-update weight. The dynamic GCN is then applied; its correlation matrix $A_d$ is a feature matrix influenced by the image features, which effectively alleviates overfitting. The output $H'$ of the dynamic GCN is computed as:

$$H' = \delta\big(A_d H W_d\big) \qquad (5)$$

where $\sigma$ is the Sigmoid activation function, $\delta$ the LeakyReLU activation function, $W_d$ the state-update weight and $W_A$ a weight used to construct the dynamic correlation matrix $A_d$; $[H;\,H_g]$ is obtained by concatenating $H$ with its global representation $H_g$, which is produced sequentially by global average pooling and a conv layer. Formally, $A_d$ is defined as:

$$A_d = \sigma\big(W_A\,[H;\,H_g]\big) \qquad (6)$$
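The static and dynamic propagation of equations (5)–(6) can be rendered loosely in NumPy (a hypothetical sketch: the global representation is reduced to average pooling, the conv layer is folded into the weight $W_A$, and all weights are random placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def dynamic_gcn(Vc, As, Ws, WA, Wd):
    """Static GCN layer followed by a dynamic one whose correlation
    matrix A_d is built per image.
    Vc: (C, D) class-specific features from the attention module."""
    H = leaky_relu(As @ Vc @ Ws)                  # static layer: H = δ(A_s V_c W_s)
    # Global representation H_g: pooled summary of H, broadcast to each node.
    Hg = np.tile(H.mean(axis=0, keepdims=True), (H.shape[0], 1))
    Ad = sigmoid(np.concatenate([H, Hg], axis=1) @ WA)  # A_d = σ(W_A [H; H_g])
    return leaky_relu(Ad @ H @ Wd)                # dynamic layer: H' = δ(A_d H W_d)

rng = np.random.default_rng(2)
C, D = 8, 16
Hp = dynamic_gcn(rng.normal(size=(C, D)),
                 rng.normal(size=(C, C)) * 0.1,
                 rng.normal(size=(D, D)) * 0.1,
                 rng.normal(size=(2 * D, C)) * 0.1,
                 rng.normal(size=(D, D)) * 0.1)
```

The dynamic correlation matrix has shape $C \times C$ and is recomputed from each image's features, which is what makes the label dependency image-specific.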

Step 3: train the graph convolution lightweight network model with the constructed kitchen waste multi-label classification dataset.

During training, the dynamic graph convolution module (DGCN) uses the nonlinear activation function LeakyReLU with a slope of 0.2. The hyperparameters are as follows: the optimizer is SGD with momentum 0.9 and weight decay 0.0001; the batch size is 16; the initial learning rate is 0.5 for the multi-head attention module (MHA) and the DGCN and 0.05 for the backbone network; training runs for 50 epochs, with the learning rate decayed by a factor of 0.1 at the 30th and 40th epochs. The software environment requires Ubuntu 16.04 LTS or above, PyTorch 1.6 and Python 3.6 or above. The hardware platform comprises a Tesla V100 GPU with CUDA 11.0 and CuDNN 8.0.2, at least 64 GB of CPU memory and a solid-state drive of at least 512 GB.
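The step schedule above (decay by 0.1 at epochs 30 and 40) can be written as a small helper (an illustrative sketch; the function name is hypothetical, and in PyTorch the equivalent would be a MultiStepLR-style scheduler):

```python
def learning_rate(epoch, base_lr, milestones=(30, 40), gamma=0.1):
    """Step schedule: multiply the learning rate by `gamma` once the
    epoch reaches each milestone (the 30th and 40th of 50 epochs)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Head modules (MHA/DGCN) start at 0.5; the backbone starts at 0.05.
schedule = [learning_rate(e, 0.5) for e in (0, 29, 30, 40, 49)]
```

This gives 0.5 for epochs 0–29, 0.05 for epochs 30–39, and 0.005 from epoch 40 onward.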

Step 4: use the trained model, i.e. the kitchen waste multi-label classification model, to perform multi-label classification on kitchen waste images to be predicted.

To facilitate understanding of the technical effects of the present invention, a comparison between the present invention and traditional methods is provided below:

Table 1 and FIG. 3 compare different lightweight backbone networks (ShuffleNetV2, MobileNetV3, EfficientNet) under different multi-label image classification algorithms (ML-GCN, ADD-GCN, MHA-GCN), mainly in terms of mean average precision (mAP) and parameter count (Params). Table 1 shows that with the same backbone, the present invention outperforms the other methods on all metrics. FIG. 3 shows that a small increase in the number of model parameters yields a large performance gain: on the MLKW dataset the present invention improves on ML-GCN by at least 8.6% and on ADD-GCN by at least 4.8%, demonstrating the effectiveness of the method.

Table 2 presents the experimental results of the present invention and other methods on the VOC 2007 dataset. The AP values on multiple categories are superior to those of the other methods, and compared with ResNet101 the mAP is raised to 94.0%. Notably, although the present invention was designed for the kitchen waste dataset, its performance on VOC 2007 is on par with ML-GCN, which also demonstrates the generalization and scalability of the method.

Table 3 presents the experimental results on the MS-COCO dataset, showing the performance of MHA-GCN and comparisons with ResNet101, SRTN, ML-GCN and other methods. MHA-GCN outperforms the other methods on multiple metrics including mAP, OF1 and CF1; in addition, its overall performance is 6.1% higher than that of ResNet101, further demonstrating the advantages of the method.

To further verify the effectiveness of the multi-head attention module (MHA) and the dynamic graph convolution module (GCN) of the proposed method, the influence of different module combinations on the classification mAP and on the per-category AP was evaluated on the MLKW dataset; see Figure 4 and Figure 5, respectively.

The above embodiments are preferred embodiments of the present application. Those of ordinary skill in the art may make various changes or improvements on this basis; provided they do not depart from the overall concept of the present application, such changes or improvements shall fall within the scope of protection claimed by the present application.

Claims (6)

1. A multi-head attention-driven kitchen garbage multi-label classification method, characterized by comprising the following steps:
Constructing a kitchen waste multi-label classification data set, wherein the multi-label classification data set comprises a plurality of images of kitchen waste in different categories, and the labels of each image comprise one or more categories;
constructing a multi-head attention-driven graph convolution lightweight network model, wherein the model comprises a lightweight feature extraction module, a multi-head attention module, and a dynamic graph convolution module; the feature extraction module extracts feature maps from the input image; the extracted feature maps are sent to the multi-head attention module, which strengthens the category-aware regions of the feature maps; and the feature maps with strengthened category-aware regions are sent to the dynamic graph convolution module, which adaptively captures the category-aware regions and outputs the predicted categories;
training the graph convolution lightweight network model using the constructed kitchen waste multi-label classification data set;
finally, using the model obtained through training, namely the kitchen waste multi-label classification model, to perform multi-label classification on the kitchen waste image to be predicted.
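The three stages of claim 1 can be sketched as a simple composition of the three modules. The module bodies below are illustrative placeholders to show the data flow only, not the patented implementations:

```python
import numpy as np

def extract_features(image):
    """Lightweight backbone stand-in: produces a toy 2-D feature map."""
    return image.mean(axis=2)

def multi_head_attention(feat):
    """Attention-module stand-in: rescales the feature map in place of
    strengthening category-aware regions."""
    return feat / (np.abs(feat).max() + 1e-8)

def dynamic_graph_conv(feat, n_classes):
    """Graph-convolution stand-in: maps the feature map to per-class logits."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((feat.size, n_classes))
    return feat.reshape(-1) @ w

def classify(image, n_classes=5, threshold=0.0):
    """Multi-label prediction: one binary decision per category."""
    logits = dynamic_graph_conv(multi_head_attention(extract_features(image)), n_classes)
    return (logits > threshold).astype(int)

labels = classify(np.ones((8, 8, 3)), n_classes=5)
```

The key structural point is that, unlike single-label classification, the output is a binary vector over all categories rather than a single argmax.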
2. The multi-head attention-driven kitchen waste multi-label classification method according to claim 1, wherein the kitchen waste multi-label classification data set is constructed by selecting kitchen waste images of a plurality of different categories and category combinations from different time periods of a year, based on the different categories and distributions of kitchen waste across the seasons.
3. The multi-head attention-driven kitchen waste multi-label classification method according to claim 1, wherein the lightweight feature extraction module uses the lightweight backbone network ShuffleNetV2.
4. The multi-head attention-driven kitchen waste multi-label classification method according to claim 1, wherein the multi-head attention module comprises a first fully-connected layer, a scaled dot-product attention sub-module, a second fully-connected layer, a Dropout layer, and a normalization layer;
the first fully-connected layer reduces the dimension of the input feature map $X \in \mathbb{R}^{H \times W \times C}$ and converts it into the feature map $X'$; wherein $H$, $W$, and $C$ represent the length, width, and number of channels of the image, respectively;
the scaled dot-product attention sub-module adopts a multi-head attention mechanism; each head uses the feature map $X'$ as the key and the value, while the query uses a set of learnable parameters $Q$; the calculation formula is:

$$O = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O, \qquad \mathrm{head}_i = \mathrm{softmax}\!\left(\frac{(Q W_i^Q)(X' W_i^K)^{\top}}{\sqrt{d_k}}\right)(X' W_i^V)$$

wherein $O$ represents the output of the scaled dot-product attention sub-module employing the multi-head attention mechanism, $\mathrm{Concat}(\cdot)$ represents feature concatenation, $W^O$ is the additional weight matrix, $\mathrm{head}_i$ is the output of the $i$-th head of the scaled dot-product attention module, the query, key, and value of each head are obtained by multiplying the sub-module inputs with the corresponding weight matrices, $\sqrt{d_k}$ is a scaling factor, and $W_i^Q$, $W_i^K$, $W_i^V$ are respectively the learnable weight matrices of the $i$-th head;
The second fully-connected layer, dropout layer, and normalization layer further process the output of the scaled dot product attention sub-module, expressed as:
In the method, in the process of the invention, Representing a point-by-point addition,Respectively representing the treatment of a second full connection layer, a Dropout layer and a normalization layer; and the characteristic diagram represents the output of the multi-head attention module.
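A numerical sketch of the attention computation in claim 4, with the keys and values taken from the flattened feature map and the query given by a learnable parameter matrix; all shapes, head counts, and initializations below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def learnable_query_mha(x, wq, wk, wv, wo, q, n_heads):
    """x: (N, D) flattened feature map used as key/value; q: (M, D) learnable query."""
    d_head = wq.shape[1] // n_heads
    heads = []
    for i in range(n_heads):
        s = slice(i * d_head, (i + 1) * d_head)
        qi, ki, vi = q @ wq[:, s], x @ wk[:, s], x @ wv[:, s]
        attn = softmax(qi @ ki.T / np.sqrt(d_head))  # scaled dot product
        heads.append(attn @ vi)                      # (M, d_head)
    return np.concatenate(heads, axis=-1) @ wo       # concat heads, apply W^O

rng = np.random.default_rng(0)
N, M, D, H = 16, 4, 32, 4
x = rng.standard_normal((N, D))                      # flattened feature map (key/value)
q = rng.standard_normal((M, D))                      # learnable query parameters
wq, wk, wv, wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
out = learnable_query_mha(x, wq, wk, wv, wo, q, n_heads=H)
```

Because the query is a fixed-size learnable matrix rather than a projection of the input, the output size is independent of the number of spatial positions in the feature map.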
5. The multi-head attention driven kitchen waste multi-label classification method according to claim 1, wherein the dynamic graph convolution module comprises a static graph convolution layer and a dynamic graph convolution layer;
the dynamic graph convolution layer processes the output $H$ of the static graph convolution layer, expressed as:

$$H' = \mathrm{LeakyReLU}\!\left(A_d H W_d\right), \qquad A_d = \sigma\!\left(\mathrm{Conv}_{1\times 1}\left([H;\, H_g]\right)\right)$$

wherein $A_d$ is the correlation matrix of the dynamic graph convolution layer, $W_d$ is the state update weight of the dynamic graph convolution layer, $\mathrm{LeakyReLU}$ is the activation function, and $\sigma$ is the $\mathrm{Sigmoid}$ activation function; $A_d$ is obtained by concatenating the feature map $H$ with its global representation $H_g$, where the global representation $H_g$ is obtained by applying pooling, a 1×1 one-dimensional convolution, and an activation function to the output $H \in \mathbb{R}^{C \times D_1}$ of the static graph convolution layer; $C$ represents the total number of categories of the kitchen waste multi-label classification, and $D_1$ represents the dimension of the output feature map $H$ of the static graph convolution layer.
6. The multi-head attention-driven kitchen waste multi-label classification method according to claim 5, wherein the static graph convolution layer processes the input feature map, expressed as:

$$H = \mathrm{LeakyReLU}\!\left(A_s X'' W_s\right)$$

wherein $X''$ is the feature map output by the multi-head attention module, $H$ is the output feature map of the static graph convolution layer, consisting of the features corresponding to the $C$ categories, i.e. $H = [h_1, h_2, \ldots, h_C]$; $\mathrm{LeakyReLU}$ is the activation function, $A_s$ is the correlation matrix of the static graph convolution layer, and $W_s$ is the state update weight of the static graph convolution layer.
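A shape-level sketch of the static and dynamic graph convolution layers of claims 5 and 6. The activation placements (LeakyReLU for the update, Sigmoid for the dynamic correlation matrix), mean pooling for the global representation, and a plain matrix in place of the 1×1 convolution are assumptions where the claim text is ambiguous:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def static_graph_conv(x, a_s, w_s):
    """H = LeakyReLU(A_s X'' W_s); x: (C, D), a_s: (C, C), w_s: (D, D1)."""
    return leaky_relu(a_s @ x @ w_s)

def dynamic_graph_conv(h, w_conv, w_d):
    """Build a dynamic correlation matrix from H and its global representation,
    then apply the graph update H' = LeakyReLU(A_d H W_d)."""
    h_g = h.mean(axis=0, keepdims=True)                # pooled global representation
    z = np.concatenate([h, np.repeat(h_g, h.shape[0], axis=0)], axis=1)  # (C, 2*D1)
    a_d = sigmoid(z @ w_conv)                          # (C, C) dynamic correlation matrix
    return leaky_relu(a_d @ h @ w_d)

rng = np.random.default_rng(1)
C, D, D1 = 6, 32, 16                                   # C categories, feature dims
x = rng.standard_normal((C, D))                        # per-class features from attention module
a_s = rng.random((C, C)); a_s /= a_s.sum(axis=1, keepdims=True)  # row-normalized static graph
h = static_graph_conv(x, a_s, rng.standard_normal((D, D1)) * 0.1)
h2 = dynamic_graph_conv(h, rng.standard_normal((2 * D1, C)) * 0.1, rng.standard_normal((D1, D1)) * 0.1)
```

The design point is that $A_s$ is fixed after training, while $A_d$ is recomputed per image from the features themselves, which is what lets the module adapt the label-correlation graph to each input.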
CN202410900342.2A 2024-07-05 2024-07-05 Multi-head attention-driven kitchen garbage multi-label classification method Active CN118429733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410900342.2A CN118429733B (en) 2024-07-05 2024-07-05 Multi-head attention-driven kitchen garbage multi-label classification method


Publications (2)

Publication Number Publication Date
CN118429733A true CN118429733A (en) 2024-08-02
CN118429733B CN118429733B (en) 2024-10-11

Family

ID=92321837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410900342.2A Active CN118429733B (en) 2024-07-05 2024-07-05 Multi-head attention-driven kitchen garbage multi-label classification method

Country Status (1)

Country Link
CN (1) CN118429733B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
JP6980958B1 (en) * 2021-06-23 2021-12-15 中国科学院西北生態環境資源研究院 Rural area classification garbage identification method based on deep learning
CN114612681A (en) * 2022-01-30 2022-06-10 西北大学 Multi-label image classification method, model construction method and device based on GCN
CN116484740A (en) * 2023-04-28 2023-07-25 南京信息工程大学 A Line Parameter Identification Method Based on Mining Spatial Topological Features of Power Grid
CN116863531A (en) * 2023-05-22 2023-10-10 山东师范大学 Human behavior recognition method and system based on self-attention enhanced graph neural network
US20240119721A1 (en) * 2022-10-06 2024-04-11 Qualcomm Incorporated Processing data using convolution as a transformer operation
WO2024139297A1 (en) * 2022-12-30 2024-07-04 深圳云天励飞技术股份有限公司 Road disease identification method and re-identification method, and related device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAI QIN, ET AL.: "Active Learning-DETR: Cost-Effective Object Detection for Kitchen Waste", IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 22 February 2024 (2024-02-22), pages 1 - 15 *
CHEN JIAWEI; HAN FANG; WANG ZHIJIE: "Targeted sentiment analysis based on self-attention gated graph convolutional network", Journal of Computer Applications, no. 08, 10 August 2020 (2020-08-10), pages 38 - 42 *
GONG LIANGWEI ET AL.: "Multi-label image classification algorithm based on multi-head class-specific residual attention and graph convolution", Microelectronics & Computer, 31 August 2023 (2023-08-31), pages 45 - 54 *

Also Published As

Publication number Publication date
CN118429733B (en) 2024-10-11

Similar Documents

Publication Publication Date Title
US20220382553A1 (en) Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery
CN109919183B (en) Image recognition method, apparatus, device and storage medium based on small samples
CN110188227A (en) A Hash Image Retrieval Method Based on Deep Learning and Low-Rank Matrix Optimization
CN110222718B (en) Image processing method and device
CN114780767B (en) Large-scale image retrieval method and system based on deep convolutional neural network
CN116152792B (en) Vehicle re-identification method based on cross-context and characteristic response attention mechanism
Shamsolmoali et al. High-dimensional multimedia classification using deep CNN and extended residual units
CN111597929A (en) Group Behavior Recognition Method Based on Channel Information Fusion and Group Relationship Spatial Structured Modeling
CN111008224A (en) A time series classification and retrieval method based on deep multi-task representation learning
CN116612288A (en) Multi-scale lightweight real-time semantic segmentation method and system
CN115830666A (en) A video expression recognition method and application based on spatio-temporal feature decoupling
CN109492610B (en) Pedestrian re-identification method and device and readable storage medium
CN113989566A (en) Image classification method and device, computer equipment and storage medium
CN118429733B (en) Multi-head attention-driven kitchen garbage multi-label classification method
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN111222515B (en) Image translation method based on context-aware attention
CN118154989A (en) A garbage classification method and system based on neural network
CN118447324A (en) A multi-label image classification method based on deep learning
CN110852272B (en) Pedestrian detection method
CN112528077A (en) Video face retrieval method and system based on video embedding
CN118470553A (en) Hyperspectral remote sensing image processing method based on spatial spectral attention mechanism
CN118840205A (en) Financial product processing method and device, storage medium and electronic equipment
CN118135306A (en) Double-branch hyperspectral image classification method based on graph convolution neural network and attention mechanism
CN118133114A (en) Track prediction method, medium and system based on graph neural network
CN117036711A (en) Weak supervision semantic segmentation method based on attention adjustment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant