JP2022058173A

JP2022058173A - Image classification method and apparatus

Info

Publication number: JP2022058173A
Application number: JP2021124754A
Authority: JP
Inventors: 杰蒋; Jie Jiang; 君燕楊; Junyan Yang; 輝許; Hui Xu; 家豪孫; Jiahao Sun; 陽劉; Yang Liu; 来康; Lai Kang; 迎梅魏; Yingmei Wei; 毓湘謝; Yuxiang Xie
Original assignee: Nat Univ Defense Tech Pla; National University of Defense Technology
Current assignee: Nat Univ Defense Tech Pla; National University of Defense Technology
Priority date: 2020-09-30
Filing date: 2021-07-29
Publication date: 2022-04-11
Anticipated expiration: 2041-07-29
Also published as: CN111898709B; CN111898709A; JP7013057B1

Abstract

To provide an image classification method and apparatus for recognizing and classifying image data. SOLUTION: A residual mechanism is incorporated into an attention model so that contextual information within the attention mechanism is combined without increasing the number of parameters, which helps the attention model extract the features of interest for an image classification task more accurately. Image data is input into the resulting residual attention mechanism model, improving the efficiency and accuracy of image classification when the image data is recognized and classified. SELECTED DRAWING: Figure 1

Description

One or more embodiments of this specification relate to the technical field of image recognition, and in particular to an image classification method and device.

As society becomes increasingly information-driven, images have gradually taken the place of text as an important medium through which people transmit and store information. The disorder and sheer volume of the information contained in images pose a major challenge for image information processing. How to classify images effectively and extract the useful information we need has become a closely watched problem in the field of computer vision.

However, as society develops, the volume of image data is growing exponentially and the range of image applications keeps expanding. The network structures and algorithms used for image classification in the prior art are far from meeting the requirement of classifying image data of various types, various properties, and no inherent order completely and efficiently, so the efficiency and accuracy of conventional image classification methods leave room for improvement.

In view of this, an object of one or more embodiments of this specification is to propose an image classification method and device that solve the problem of low efficiency and accuracy in image classification.

Based on the above object, one or more embodiments of this specification provide an image classification method comprising:
establishing a residual network model and replacing the standard convolution on the original edge of the residual network model with a dilated ("holed") convolution to generate a dilated residual network backbone;
generating weight layers of the residual network model based on the channel attention module and the spatial attention module of an attention mechanism model;
generating a residual attention mechanism model composed of the dilated residual network backbone and the weight layers, and training the residual attention mechanism model; and
inputting image data into the residual attention mechanism model to recognize and classify the image data.

In some embodiments, generating the weight layers of the residual network model comprises: generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series.

In some embodiments, the channel attention weight layer is generated by matrix addition of the channel attention module and a residual network edge.

In some embodiments, before the matrix addition of the channel attention module and the residual network edge is performed, the method further comprises performing a deconvolution operation on the channel attention module.

In some embodiments, replacing the standard convolution on the original edge of the residual network model with a dilated convolution comprises: replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution is followed in series by batch normalization and then by a rectified linear unit (ReLU) activation function.
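The series arrangement described above (dilated convolution, then batch normalization, then ReLU) can be illustrated with a minimal one-dimensional sketch in plain Python. All function names here are hypothetical, and the normalization is simplified: it normalizes a single output sequence and omits the learnable scale and shift that batch normalization layers normally carry. It is meant only to show the order of operations, not the patent's actual layer.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated convolution (no padding, stride 1, no kernel flip)."""
    span = (len(kernel) - 1) * dilation + 1  # extent covered by the dilated kernel
    return [
        sum(signal[i + k * dilation] * w for k, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def batch_norm(values, eps=1e-5):
    """Simplified normalization to zero mean and unit variance (assumption:
    no learnable scale/shift, statistics taken over this one sequence)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def dilated_conv_bn_relu(signal, kernel, dilation=2):
    """Apply the three stages in series."""
    return relu(batch_norm(dilated_conv1d(signal, kernel, dilation)))


signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
kernel = [1.0, 0.0, -1.0]
print(dilated_conv1d(signal, kernel, 2))  # [-4.0, -3.0, -1.0, 2.0]
print(dilated_conv_bn_relu(signal, kernel))
```

After normalization the two below-average responses become negative and are zeroed by ReLU, while the two above-average responses pass through.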

Based on the same concept, one or more embodiments of this specification further provide an image classification device comprising:
a backbone module that establishes a residual network model and replaces the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone;
a weight module that generates weight layers of the residual network model based on the channel attention module and the spatial attention module of an attention mechanism model;
a generation module that generates a residual attention mechanism model composed of the dilated residual network backbone and the weight layers, and trains the residual attention mechanism model; and
a classification module that inputs image data into the residual attention mechanism model to recognize and classify the image data.

In some embodiments, the weight module generating the weight layers of the residual network model comprises: generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series.

In some embodiments, the weight module generates the channel attention weight layer by matrix addition of the channel attention module and a residual network edge.

In some embodiments, before the weight module performs the matrix addition of the channel attention module and the residual network edge, it further performs a deconvolution operation on the channel attention module.

In some embodiments, the backbone module replacing the standard convolution on the original edge of the residual network model with a dilated convolution comprises: replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution is followed in series by batch normalization and then by a rectified linear unit (ReLU) activation function.

As can be seen from the above, the image classification method and device according to one or more embodiments of this specification comprise: establishing a residual network model and replacing the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone; generating weight layers of the residual network model based on the channel attention module and the spatial attention module of an attention mechanism model; generating a residual attention mechanism model composed of the dilated residual network backbone and the weight layers, and training the residual attention mechanism model; and inputting image data into the residual attention mechanism model to recognize and classify the image data. One or more embodiments of this specification incorporate a residual mechanism into the attention model, combining contextual information within the attention mechanism without increasing the number of parameters and helping the attention model extract the features of interest for the image classification task more accurately, which improves the efficiency and accuracy of image classification. In addition, the training time of the attention mechanism model improved by this technical solution is reduced to about half of the original, which greatly improves training efficiency.

To explain one or more embodiments of this specification, or the technical solutions of the prior art, more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely one or more embodiments of this specification, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of an image classification method according to one or more embodiments of this specification. FIG. 2 is a schematic diagram of the application principle of a residual network model according to one or more embodiments of this specification. FIG. 3 is a schematic diagram of a residual block of the dilated residual network backbone according to one or more embodiments of this specification. FIG. 4 is a schematic structural diagram of an attention mechanism model according to one or more embodiments of this specification. FIG. 5 is a schematic structural diagram of the residual attention mechanism model (Dilated-CBAM) according to one or more embodiments of this specification. FIG. 6 is a schematic structural diagram of a residual channel attention module according to one or more embodiments of this specification. FIG. 7 is a schematic structural diagram of another residual channel attention module according to one or more embodiments of this specification. FIG. 8 is a schematic structural diagram of an image classification device according to one or more embodiments of this specification.

To make the objects, technical solutions, and advantages of this specification clearer, the specification is described in further detail below with reference to specific embodiments and the drawings.

It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this specification have the ordinary meanings understood by a person skilled in the art. "First", "second", and similar terms used in this disclosure do not indicate any order, quantity, or importance; they merely distinguish different components. "Include", "comprise", and similar terms mean that the element, member, or method step preceding the term covers the elements, members, or method steps listed after the term and their equivalents, without excluding other elements, members, or method steps. "Connected", "coupled", and similar terms are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like indicate only relative positional relationships; if the absolute position of the described object changes, the relative positional relationship may change accordingly.

As described in the background section, image classification specifically means that a computer, aided by the relevant algorithms, determines the category of an image from input data. As an important foundation for research on object detection tasks, image segmentation tasks, and the like, it has considerable value for academic research and for scientific and technological applications, and most research work in computer vision is related to image classification tasks. With the rapid progress of deep learning, image classification technology has improved markedly at both the hardware and software levels; on many existing large datasets it has reached a level exceeding the recognition ability of the human eye, and a growing number of researchers are turning their attention to image classification and related areas of computer vision.

As a popular research direction in computer vision, image object classification is widely applied in many fields, including: intelligent video analysis, pedestrian detection, and face recognition in security; wrong-way driving detection, vehicle counting, traffic scene object recognition, and license plate detection and recognition in traffic monitoring; object recognition and counting, product recognition and classification, and product quality evaluation in logistics management and statistics; and content-based image retrieval, automatic album clustering, person detection, and object detection in intelligent album analysis.

However, as the volume of image data grows and the range of applications keeps expanding, existing network structures and algorithms are far from meeting the requirement of classifying image data of various types, various properties, and no inherent order completely and efficiently. Researchers therefore need to further study and improve convolutional neural network architectures in order to raise the efficiency and accuracy of image classification.

In view of the above, a residual mechanism is incorporated into the attention model: a residual edge is used within the attention model to perform an identity mapping inside the attention module, so that contextual information within the attention mechanism is combined without increasing the number of parameters. This helps the attention model extract the features of interest for the image classification task more accurately, which improves the efficiency and accuracy of image classification. In addition, the training time of the attention mechanism model improved by this technical solution is reduced to about half of the original, which greatly improves training efficiency.

Referring to FIG. 1, which is a schematic flow diagram of an image classification method according to one embodiment of this specification, the image classification method specifically includes the following steps 101 to 104.

Step 101: establish a residual network model, and replace the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone.

That is, for the original edge (the edge shown in the figure), a convolution layer of standard convolution is built so that the information in the feature map input to the residual edge is refined and the effect of the residual network is optimized. Here, standard convolution means ordinary convolution. Put plainly in mathematical terms, the input matrix and the convolution kernel (the kernel is also a matrix) are multiplied element by corresponding element and the products are summed, so a single convolution yields a single number; once the kernel has traversed the entire input matrix, a result matrix is obtained. The most common two-dimensional kernel for ordinary convolution is 3×3, although 5×5 or 7×7 kernels may be designed depending on the network design.
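The multiply-and-sum traversal described above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (stride 1, no padding, and no kernel flip, as is usual in deep learning); the function name `conv2d_valid` and the sample values are hypothetical and do not come from the specification.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply corresponding elements and sum them:
            # each kernel position produces exactly one number.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        result.append(row)
    return result


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(conv2d_valid(image, kernel))  # [[18, 21], [30, 33]]
```

A 4×4 input and a 3×3 kernel leave four valid positions, so the result matrix is 2×2.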

Next, the "holed" convolution is dilated convolution, also called atrous convolution, which increases the receptive field of the model by inserting holes into a standard convolution kernel. Compared with ordinary convolution, dilated convolution adds a dilation rate parameter, which specifies the spacing between the points of the convolution kernel. If a dilation rate were assigned to ordinary convolution, its value would be 1, meaning that the kernel points are adjacent. In dilated convolution the dilation rate is not 1: for example, a dilation rate of 2 means that neighboring kernel points are separated by one pixel, so a 3×3 kernel with a dilation rate of 2 has the same receptive field as a standard 5×5 kernel. In a residual network, the feature maps obtained from the input image in the early stages generally capture contour information. Because dilated convolution enlarges the receptive field, it can screen the useful information in these early feature maps more effectively, and it can combine the contour and edge feature maps extracted early with the detail feature maps extracted later, so that image information is aggregated more effectively and the classification performance of the network improves.
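The receptive-field claim can be checked with a small plain-Python sketch. The standard relation for the extent covered by a dilated kernel is k_eff = k + (k - 1)(d - 1), where k is the kernel size and d the dilation rate; the function names and sample data below are hypothetical and serve only as an illustration (stride 1, no padding).

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d_valid(image, kernel, dilation=1):
    """2-D dilated convolution on lists of lists; dilation=1 is ordinary convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - effective_kernel_size(kh, dilation) + 1
    out_w = len(image[0]) - effective_kernel_size(kw, dilation) + 1
    return [
        [
            # Kernel points are sampled `dilation` pixels apart.
            sum(
                image[i + a * dilation][j + b * dilation] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5 grid, values 0..24
ones = [[1] * 3 for _ in range(3)]

print(effective_kernel_size(3, 2))                     # 5
print(dilated_conv2d_valid(image, ones, dilation=2))   # [[108]]
```

With a dilation rate of 2, the nine weights of the 3×3 kernel span the whole 5×5 input, so only one output position fits; the parameter count stays at nine while the receptive field matches that of a 5×5 kernel.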

Step 102: generate the weight layers of the residual network model based on the channel attention module and the spatial attention module of the attention mechanism model.

The purpose of this step is to set the channel attention module and the spatial attention module of the attention mechanism model as the weight layers of the residual network model. Here, the attention mechanism model (CBAM, Convolutional Block Attention Module) is, as shown in FIG. 4, an attention mechanism module that combines spatial and channel attention; the "×" inside a circle denotes element-wise multiplication of matrices. In this specific embodiment, the channel attention mechanism of the CBAM model is used: max pooling and average pooling are applied per channel, the resulting feature maps are fed into a shared multilayer perceptron (shared MLP), the two outputs are added element-wise, and a sigmoid activation function then makes the result nonlinear. This extends the expressive power of the channel attention, so that more effective channel weights are obtained.
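The channel attention computation described above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the tiny weight matrices `w1` and `w2`, and the use of ReLU in the hidden layer of the shared MLP are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vector, w1, w2):
    """Two-layer perceptron; the same weights serve both pooled descriptors.
    Assumption: ReLU is used in the hidden layer."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature_map, w1, w2):
    """feature_map: list of channels, each a 2-D list.
    Returns one weight in (0, 1) per channel."""
    flat = [[v for row in channel for v in row] for channel in feature_map]
    max_desc = [max(values) for values in flat]                # max pooling
    avg_desc = [sum(values) / len(values) for values in flat]  # average pooling
    mlp_max = shared_mlp(max_desc, w1, w2)
    mlp_avg = shared_mlp(avg_desc, w1, w2)
    # Element-wise addition of the two MLP outputs, then sigmoid.
    return [sigmoid(a + b) for a, b in zip(mlp_max, mlp_avg)]


feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 1.0], [0.0, 1.0]],  # channel 1
]
w1 = [[0.5, -0.5]]     # hypothetical: 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]   # hypothetical: 1 hidden unit -> 2 channels
weights = channel_attention(feature_map, w1, w2)
print(weights)
```

Each returned weight would then multiply every value of its channel, which is the element-wise multiplication marked "×" in the figure.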

In a specific embodiment, because the weight layers are based on the channel attention module and the spatial attention module, two weight layers are generated and set on the dilated residual network backbone. The two modules may be used directly as weight layers, or they may be taken out and further adjusted to form the weight layers. For example, for the channel attention module, the corresponding weight layer may be generated by matrix addition of a standard-convolution residual edge and the channel attention module, or by matrix addition with a dilated residual edge like the one in the previous step. The two generated weight layers may be arranged in parallel or in series. When arranged in series, either the channel attention weight layer or the spatial attention weight layer may come first.
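One possible reading of the residual channel attention weight layer and of the series arrangement is sketched below in plain Python. The exact wiring is defined by the figures, which are not reproduced in this text, so this sketch makes several assumptions: the residual edge is treated as an identity mapping added element-wise to the channel-weighted feature map, the attention weights are passed in as given values, and all function names are hypothetical.

```python
def scale_channels(feature_map, channel_weights):
    """Multiply every value of each channel by that channel's weight."""
    return [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(feature_map, channel_weights)
    ]


def add_elementwise(a, b):
    """Matrix addition of two feature maps with identical shapes."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def residual_channel_attention(feature_map, channel_weights):
    """Channel-weighted features plus the input carried by the residual edge
    (assumed here to be an identity mapping)."""
    return add_elementwise(scale_channels(feature_map, channel_weights), feature_map)


def scale_positions(feature_map, spatial_weights):
    """Multiply every channel by one shared 2-D map of spatial weights."""
    return [
        [[v * s for v, s in zip(row, s_row)] for row, s_row in zip(channel, spatial_weights)]
        for channel in feature_map
    ]


def serial_attention(feature_map, channel_weights, spatial_weights):
    """Series arrangement: channel weight layer first, spatial weight layer second."""
    attended = residual_channel_attention(feature_map, channel_weights)
    return scale_positions(attended, spatial_weights)


x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
channel_w = [0.5, 0.0]
spatial_w = [[1.0, 0.0], [0.5, 1.0]]

print(residual_channel_attention(x, channel_w))
# [[[1.5, 3.0], [4.5, 6.0]], [[2.0, 2.0], [2.0, 2.0]]]
print(serial_attention(x, channel_w, spatial_w))
# [[[1.5, 0.0], [2.25, 6.0]], [[2.0, 0.0], [1.0, 2.0]]]
```

Note that the channel with weight 0.0 still passes through unchanged via the residual edge, and that the addition introduces no extra parameters. The deconvolution step mentioned for some embodiments is not modeled here.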

Step 103: generate a residual attention mechanism model composed of the dilated residual network backbone and the weight layers, and train the residual attention mechanism model.

The purpose of this step is to combine the generated backbone and weight layers into a residual attention mechanism model and to train it. As shown in FIG. 5, which is a schematic structural diagram of the residual attention mechanism model (Dilated-CBAM) in a specific embodiment, the "×" inside a circle denotes element-wise multiplication of matrices, and the "+" inside a circle denotes element-wise matrix addition. The Dilated-CBAM model is then trained for image classification.

In a specific application scenario, to verify the image classification performance of the Dilated-CBAM model, and to make use of a Dilated-CBAM model trained and optimized on the training set of the CIFAR-10 dataset (an image dataset that, like CIFAR-100, is a labeled dataset derived from the larger dataset of 80 million tiny images), the CIFAR-10 test set was used to verify the accuracy and convergence behavior of the trained network and weights when classifying image data of the same nature, as shown in Table 1. Here, Train acc denotes the model's classification accuracy on the CIFAR-10 training set, Test acc denotes its classification accuracy on the CIFAR-10 test set, and EPOCH denotes the number of epochs, where one epoch is one complete pass of the full dataset forward through the neural network and back once. The models compared are, in order: (1) an 18-layer residual network model (ResNet-18); (2) the existing CBAM model; (3) an experimental model in which dilated convolution is embedded in the CBAM model; (4) the Dilated-CBAM framework whose channel attention module is the original channel attention module of the CBAM model; (5) the Dilated-CBAM framework whose channel attention module is a residual channel attention module combining the channel attention module with a residual network edge; (6) the Dilated-CBAM framework whose channel attention module is a dilated residual channel attention module combining the channel attention module with a dilated residual network edge; (7) the Dilated-CBAM framework in which the dilated convolution in the dilated residual network backbone is replaced with group convolution (groups conv); (8) the Dilated-CBAM framework with an embedded ELU activation function; and (9) the Dilated-CBAM framework with an embedded SELU activation function. For the Dilated-CBAM model whose channel attention module is the residual channel attention module combining the channel attention module with a residual network edge (the fifth row of the table), the classification accuracy reaches 98.7% on the training set and 93.5% on the test set, and the model converges in only 10 epochs.

Step 104: input image data into the residual attention mechanism model to recognize and classify the image data.

The purpose of this step is to input the image to be recognized into the trained residual attention mechanism model, which recognizes and classifies the image. The image data may be obtained from an external device such as a video camera or still camera, obtained by the user over an external network, or retrieved from a database stored on the system or server itself.

The recognition and classification result may be stored, presented, or processed further. The result may be the specific category to which a single image belongs, or the outcome of classifying a set of images. The way the result is output can be chosen flexibly according to the application scenario and implementation needs.

For example, in an application scenario where the method of this embodiment runs on a single device, the recognition and classification result can be output directly to a display component of that device (a display, projector, or the like), so that the operator of the device can see the result directly on the display component.

As another example, when the method of this embodiment is executed on a system composed of a plurality of devices, the classification result can be transmitted over any data communication method (wired connection, NFC, Bluetooth (registered trademark), Wi-Fi, cellular mobile network, and so on) to a predetermined device acting as a receiver elsewhere in the system, so that the receiving device can perform subsequent processing on it. Optionally, the predetermined device may be a predetermined server. Such a server is usually deployed in the cloud and, as a data processing and storage center, can store and distribute the classification result. The recipient of the distribution is a terminal device, whose owner or operator may be the current user, an institution or individual that owns the image, or an organization, individual, or website involved in presenting the image.

As a further example, when the method of this embodiment is executed on a system composed of a plurality of devices, the classification result can be sent directly to a predetermined terminal device over any data communication method. The terminal device may be one or more of those listed in the preceding paragraph.

The image classification method provided by one or more embodiments of this specification includes: establishing a residual network model and replacing the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone; generating a weight layer of the residual network model based on the channel attention module and the spatial attention module of an attention mechanism model; generating a residual attention mechanism model composed of the dilated residual network backbone and the weight layer, and training the residual attention mechanism model; and inputting image data into the residual attention mechanism model to recognize and classify the image data. One or more embodiments of this specification incorporate a residual mechanism into the attention model, which combines context information inside the attention mechanism without adding parameters and helps the attention model extract the features relevant to the image classification task more accurately, thereby improving the efficiency and accuracy of image classification. In addition, the training time of the attention mechanism model improved by this technical solution is reduced to about half of the original, which substantially improves training efficiency.

Note that the method of one or more embodiments of this specification may be executed by a single device, such as one computer or server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices working together. In such a distributed scenario, any one of the devices may execute only one or more steps of the method, and the devices interact with one another to complete the method.

Specific embodiments of this specification have been described above; other embodiments also fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that of the embodiments and still achieve the desired result. Likewise, the procedures depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

In an optional embodiment of this specification, to obtain the best image recognition results, generating the weight layer of the residual network model includes:
generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series.

Here, the sequential series arrangement is the arrangement of the channel attention module and the spatial attention module shown in the structural schematic diagram of FIG. 5. In a specific application scenario, the feature map is first processed by the channel attention module, the result is fed into the spatial attention module for processing, and the output is then added element-wise to the residual edge. Alternatively, the two modules may be connected in series with the spatial attention module ahead of the channel attention module, or the two modules may be arranged in parallel.
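The ordering described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `add_elementwise` and `residual_attention`, the `channel_first` flag, and the representation of a feature map as a nested C x H x W list are assumptions made for the example, and the two attention stages are passed in as arbitrary callables rather than implemented.

```python
def add_elementwise(a, b):
    """Element-wise sum of two C x H x W nested lists of equal shape."""
    return [[[p + q for p, q in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]


def residual_attention(x, channel_fn, spatial_fn, channel_first=True):
    """Apply the two attention stages in series, then add the residual edge.

    channel_fn and spatial_fn are placeholders (assumed to preserve shape);
    channel_first=False gives the alternative serial order mentioned in the text.
    """
    stages = (channel_fn, spatial_fn) if channel_first else (spatial_fn, channel_fn)
    out = x
    for stage in stages:
        out = stage(out)
    # The residual edge carries the unmodified input forward.
    return add_elementwise(out, x)
```

Swapping the order changes the result in general, which is why the text singles out the channel-first arrangement of FIG. 5 as the one used in this application scenario.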

In an optional embodiment of this specification, to let the receptive field extract multi-scale context information and thereby weight image regions more accurately, matrix addition is performed between the channel attention module and the residual network edge to generate the channel attention weight layer. The residual network edge here is the residual edge of an existing residual network.

As shown in FIG. 6, in a specific embodiment, the basic channel attention module of the Dilated-CBAM model follows the CBAM model: it extracts global channel features through average pooling and max pooling, feeds each of the resulting feature maps into a multilayer perceptron to compute the relationships between channels, and outputs a channel weight matrix. Matrix addition is then performed between the residual edge of the residual network model and the channel weight matrix. In FIG. 6, a circled "+" denotes element-wise matrix addition, and a circled S-shaped curve denotes an activation function such as Sigmoid.
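As a rough illustration of the channel attention path, the following Python sketch computes CBAM-style channel weights (average pooling, max pooling, a shared two-layer perceptron, summation, Sigmoid) and then adds a residual-edge feature map. The function names, the weight matrices `w1` and `w2`, and in particular the exact point at which the residual edge is added (here, after the input has been scaled by the channel weights) are assumptions for the example; the text and FIG. 6 do not fix these details.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron shared by both pooled vectors (ReLU hidden layer)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_weights(x, w1, w2):
    """x: C x H x W nested list. Returns one weight in (0, 1) per channel."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(r) for r in ch) for ch in x]
    summed = [a + m for a, m in zip(shared_mlp(avg, w1, w2),
                                    shared_mlp(mx, w1, w2))]
    return [sigmoid(s) for s in summed]


def channel_attention_with_residual(x, residual, w1, w2):
    """Scale each channel by its weight, then add the residual edge.

    Assumption: residual has the same C x H x W shape as x.
    """
    weights = channel_weights(x, w1, w2)
    return [[[w * v + r for v, r in zip(row, res_row)]
             for row, res_row in zip(ch, res_ch)]
            for w, ch, res_ch in zip(weights, x, residual)]
```

Because the perceptron is shared between the two pooled vectors, the residual addition introduces no additional trainable parameters in this sketch.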

In a specific application scenario, an image is stored and computed as numerical matrices, one matrix per channel, and the spatial attention module acts on the matrix of each channel. From a mathematical point of view, there is no context-linking problem within a single matrix, so the Dilated-CBAM model does not apply the residual mechanism to the spatial attention module. That is, the spatial attention module of the Dilated-CBAM model is the spatial attention module of the existing CBAM model, reused unchanged.
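For reference, a CBAM-style spatial attention step can be sketched as follows: the feature map is pooled across channels (mean and max), the two pooled maps are convolved with a small kernel, and a Sigmoid produces one weight per pixel. The function names and the zero-padded convolution are choices made for this sketch, and the kernel is supplied by the caller rather than learned.

```python
import math


def spatial_weights(x, kernel):
    """x: C x H x W nested list; kernel: 2 x k x k (k odd), one slice per pooled map.

    Returns an H x W map of weights in (0, 1).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    avg = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = (avg, mx)
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            acc = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding at borders
                            acc += kernel[p][di][dj] * pooled[p][ii][jj]
            row.append(1.0 / (1.0 + math.exp(-acc)))
        out.append(row)
    return out


def apply_spatial(x, weights):
    """Multiply every channel by the same H x W weight map."""
    return [[[v * w for v, w in zip(row, w_row)]
             for row, w_row in zip(ch, weights)] for ch in x]
```

No residual term appears here, matching the statement that the spatial module is taken over from CBAM unchanged.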

In an optional embodiment of this specification, to reconcile the changes in image size that occur during feature extraction and enlarge the image again, so that the element-wise matrix addition between the feature map on the residual edge and the feature map output by the channel attention module is better matched, the method further includes performing a deconvolution operation on the channel attention module before the matrix addition between the channel attention module and the residual network edge.

FIG. 7 shows FIG. 6 with the deconvolution operation added. The single circle in the figure denotes the deconvolution operation. In this specific application scenario, deconvolution is performed so that the matrices to be added match better and accuracy improves; in other application scenarios, deconvolution is not necessarily required.
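Deconvolution is commonly realized as a transposed convolution, which enlarges a feature map so that its size can match the tensor it is to be added to. The sketch below assumes that reading; the function name, the single-channel input, the square kernel, and the absence of padding are simplifications for illustration. With no padding, an H x W input and a k x k kernel give an output of size ((H - 1) * stride + k) x ((W - 1) * stride + k).

```python
def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution without padding.

    x: H x W nested list; kernel: k x k nested list.
    """
    H, W = len(x), len(x[0])
    k = len(kernel)
    out_h, out_w = (H - 1) * stride + k, (W - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(H):
        for j in range(W):
            # Each input value "stamps" a scaled copy of the kernel onto the output.
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += x[i][j] * kernel[di][dj]
    return out
```

For instance, a 2 x 2 input with a 2 x 2 kernel and stride 2 yields a 4 x 4 output, which could then be added element-wise to a 4 x 4 residual-edge feature map.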

In an optional embodiment of this specification, to extract image contours more accurately and to increase computation speed and convergence speed, replacing the standard convolution on the original edge of the residual network model with a dilated convolution includes:
replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution, batch normalization, and a rectified linear unit (ReLU) activation function are connected in series.
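The series of dilated convolution, batch normalization, and ReLU can be illustrated with a small single-channel sketch. The function names are hypothetical, the convolution uses no padding, and the normalization step is a simplification: it normalizes one feature map by its own mean and variance and omits the learnable scale and shift of a full batch normalization layer.

```python
import math


def dilated_conv2d(x, kernel, dilation=2):
    """Single-channel dilated convolution without padding.

    Kernel taps are spaced `dilation` pixels apart, so a k x k kernel
    covers a span of dilation * (k - 1) + 1 pixels per side.
    """
    H, W = len(x), len(x[0])
    k = len(kernel)
    span = dilation * (k - 1) + 1
    return [[sum(kernel[di][dj] * x[i + di * dilation][j + dj * dilation]
                 for di in range(k) for dj in range(k))
             for j in range(W - span + 1)]
            for i in range(H - span + 1)]


def normalize(x, eps=1e-5):
    """Simplified normalization: zero mean, unit variance over one feature map."""
    vals = [v for row in x for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def dilated_block(x, kernel, dilation=2):
    """Dilated convolution, then normalization, then ReLU, in series."""
    return relu(normalize(dilated_conv2d(x, kernel, dilation)))
```

With a 3 x 3 kernel and dilation 2, each output value sees a 5 x 5 region of the input while still using only nine weights, which is how a dilated convolution widens the receptive field without adding parameters.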

Based on the same concept, one or more embodiments of this specification further provide an image classification device which, as shown in FIG. 8, includes:
a backbone module 801 that establishes a residual network model and replaces the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone;
a weight module 802 that generates a weight layer of the residual network model based on the channel attention module and the spatial attention module of an attention mechanism model;
a generation module 803 that generates a residual attention mechanism model composed of the dilated residual network backbone and the weight layer, and trains the residual attention mechanism model; and
a classification module 804 that inputs image data into the residual attention mechanism model and recognizes and classifies the image data.

In one optional embodiment, the weight module 802 generating the weight layer of the residual network model includes:
generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series.

In one optional embodiment, the weight module 802 performs matrix addition between the channel attention module and the residual network edge to generate the channel attention weight layer.

In one optional embodiment, before performing the matrix addition between the channel attention module and the residual network edge, the weight module 802 further performs a deconvolution operation on the channel attention module.

In one optional embodiment, the backbone module 801 replacing the standard convolution on the original edge of the residual network model with a dilated convolution includes:
replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution, batch normalization, and a rectified linear unit (ReLU) activation function are connected in series.

For convenience of description, the device above is described with its functions divided into separate modules. When one or more embodiments of this specification are implemented, the functions of the modules may of course be implemented in the same one or more pieces of software and/or hardware.

The device of the above embodiment is used to implement the corresponding method of the preceding embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Within the spirit of the present disclosure, technical features of the above embodiments or of different embodiments may be combined with one another, and steps may be implemented in any order. Many other variations of the different aspects of one or more embodiments of this specification exist as described above, but for brevity they are not described in detail.

Furthermore, to simplify the description and discussion and to avoid obscuring one or more embodiments of this specification, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Devices may also be shown in block diagram form to avoid obscuring one or more embodiments of this specification, which also reflects the fact that the details of implementing such block-diagram devices depend heavily on the platform on which the embodiments are to be implemented (that is, these details should be well within the understanding of those skilled in the art). Where specific details (for example, circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that one or more embodiments of this specification can be practiced without these specific details or with variations of them. These descriptions should therefore be regarded as illustrative rather than restrictive.

Although the present disclosure has been described with reference to specific embodiments, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the above description. For example, the embodiments discussed can also be used with other memory architectures (for example, dynamic RAM (DRAM)).

One or more embodiments of this specification are intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Accordingly, any omission, modification, equivalent substitution, or improvement made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of the present disclosure.

Claims

1. An image classification method, comprising:
establishing a residual network model and replacing the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone;
generating a weight layer of the residual network model based on a channel attention module and a spatial attention module of an attention mechanism model;
generating a residual attention mechanism model composed of the dilated residual network backbone and the weight layer, and training the residual attention mechanism model; and
inputting image data into the residual attention mechanism model, and recognizing and classifying the image data,
wherein replacing the standard convolution on the original edge of the residual network model with a dilated convolution includes replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution, batch normalization, and a rectified linear unit (ReLU) activation function are connected in series,
wherein generating the weight layer of the residual network model includes generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series, and
wherein matrix addition is performed between the channel attention module and a residual network edge to generate the channel attention weight layer.

2. The method of claim 1, further comprising performing a deconvolution operation on the channel attention module before performing the matrix addition between the channel attention module and the residual network edge.

3. An image classification device, comprising:
a backbone module that establishes a residual network model and replaces the standard convolution on the original edge of the residual network model with a dilated convolution to generate a dilated residual network backbone;
a weight module that generates a weight layer of the residual network model based on a channel attention module and a spatial attention module of an attention mechanism model;
a generation module that generates a residual attention mechanism model composed of the dilated residual network backbone and the weight layer, and trains the residual attention mechanism model; and
a classification module that inputs image data into the residual attention mechanism model and recognizes and classifies the image data,
wherein the backbone module replacing the standard convolution on the original edge of the residual network model with a dilated convolution includes replacing the standard convolution on the original edge with a convolution layer in which a dilated convolution, batch normalization, and a rectified linear unit (ReLU) activation function are connected in series,
wherein the weight module generating the weight layer of the residual network model includes generating a channel attention weight layer and a spatial attention weight layer based on the channel attention module and the spatial attention module, and arranging the channel attention weight layer and the spatial attention weight layer sequentially in series, and
wherein the weight module performs matrix addition between the channel attention module and a residual network edge to generate the channel attention weight layer.

4. The device of claim 3, wherein the weight module further performs a deconvolution operation on the channel attention module before performing the matrix addition between the channel attention module and the residual network edge.