JP6991960B2

JP6991960B2 - Image recognition device, image recognition method and program

Info

Publication number: JP6991960B2
Application number: JP2018246993A
Authority: JP
Inventors: 建鋒徐 (Jianfeng Xu); 和之田坂 (Kazuyuki Tasaka)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2022-01-13
Anticipated expiration: 2038-12-28
Also published as: JP2020107185A

Description

The present invention relates to an image recognition device, an image recognition method, and a program, and more particularly to a technique for reducing the load of image recognition processing.

In recent years, techniques that use deep learning, a kind of neural network, to recognize the class of an object in an image have been put into practical use. Among these, neural network structures with more layers have been proposed to improve recognition accuracy. In a neural network, each additional layer makes it possible to extract more sophisticated and complex features. Deepening the network therefore plays an important role in improving the recognition accuracy of machine learning models that use neural networks.

On the other hand, the deeper the neural network, the more computation the recognition process requires, and the greater the computing power needed to execute it. For this reason, devices with relatively small computational resources, such as IoT (Internet of Things) devices, may have difficulty executing machine learning models based on deep neural networks.

To address this problem, Non-Patent Document 1, for example, proposes a technique in which output layers are attached to a plurality of the layers constituting a machine learning model, and image recognition is performed by selecting an output layer according to fluctuations in the available computational resources.

Non-Patent Document 1: Gao Huang, et al., "Multi-Scale Dense Convolutional Networks for Resource Efficient Image Classification," International Conference on Learning Representations (ICLR), 2018.

In the technique of Non-Patent Document 1, recognition accuracy decreases the shallower the output layer at which recognition ends, that is, the closer that output layer is to the input layer. In addition, because the neural network of Non-Patent Document 1 has a layer structure that extends two-dimensionally, the machine learning model becomes large and may be difficult to train. There is thus room for improvement in reducing the processing load of image recognition that uses neural network machine learning models.

The present invention has been made in view of these points, and its object is to provide a technique for reducing the processing load of image recognition processing that uses a neural network machine learning model.

A first aspect of the present invention is an image recognition device. The device includes: a model acquisition unit that acquires a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; a group recognition unit that identifies a subject group by applying to the image data a front-stage filter group, among the plurality of filters, for recognizing to which of a plurality of predetermined subject groups the subject included in the image data belongs; an individual recognition unit that identifies the recognition target included in the image data based on the result of applying the front-stage filter group and the result of applying a rear-stage filter group, which is the filter group obtained by excluding the front-stage filter group from the plurality of filters; and a filter selection unit that selects, from the rear-stage filter group, the filters to be applied by the individual recognition unit, based at least on the subject group identified by the group recognition unit and on the priorities set for the respective filters constituting the rear-stage filter group.

The filter selection unit may select the filters to be applied by the individual recognition unit in descending order of the priorities set for the respective filters of the rear-stage filter group, within the range permitted by the computational resources of the image recognition device.
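A minimal sketch of this greedy selection, assuming each rear-stage filter has a known cost (for example, its multiply-accumulate count) and that the available computational resources are expressed as a single budget in the same units. The names `select_filters`, `costs`, and `budget` are hypothetical and not taken from the patent.

```python
from typing import Dict, List


def select_filters(priorities: Dict[str, float],
                   costs: Dict[str, float],
                   budget: float) -> List[str]:
    """Select rear-stage filters in descending priority while they fit the budget."""
    selected: List[str] = []
    used = 0.0
    # sorted() is stable, so filters with equal priority keep their input order.
    for name in sorted(priorities, key=lambda n: priorities[n], reverse=True):
        if used + costs[name] > budget:
            break  # stop at the first filter that no longer fits
        selected.append(name)
        used += costs[name]
    return selected


print(select_filters({"f0": 0.9, "f1": 0.5, "f2": 0.7},
                     {"f0": 10, "f1": 10, "f2": 10},
                     budget=25))  # ['f0', 'f2']
```

This version stops at the first filter that does not fit, which keeps the selection strictly in priority order; skipping that filter and continuing with cheaper, lower-priority ones would be an alternative design.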

The image recognition device may further include a weight index calculation unit that calculates, for each of the plurality of filters, an index indicating the magnitude of its weight coefficients, and a priority setting unit that sets a higher priority for a filter whose index indicates large weight coefficients than for a filter whose index indicates small weight coefficients.
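The patent does not specify the weight index. One common choice, assumed here, is the L1 norm of each filter's flattened weights. The sketch below uses that norm directly as the priority, so larger weights yield higher priority and filters with identical norms receive equal priority. The function names are hypothetical.

```python
from typing import Dict, List


def l1_norm(weights: List[float]) -> float:
    """Sum of the absolute weight values of one flattened filter."""
    return sum(abs(w) for w in weights)


def priorities_from_weights(filters: Dict[str, List[float]]) -> Dict[str, float]:
    """Use each filter's L1 norm as its priority: larger weights -> higher priority."""
    return {name: l1_norm(w) for name, w in filters.items()}


prio = priorities_from_weights({"a": [0.1, -0.2], "b": [1.0, -1.0], "c": [0.0, 0.5]})
print(sorted(prio, key=prio.get, reverse=True))  # ['b', 'c', 'a']
```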

The image recognition device may further include a similarity calculation unit that calculates, for each of the plurality of filters, its similarity to the other filters, and a priority setting unit that sets a higher priority for a filter that has no similar filter than for a filter that does.
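As an illustration only, the sketch below assumes cosine similarity between flattened filter weights and a hypothetical threshold of 0.9 for "similar"; the text specifies neither. Following the wording above literally, every filter that has at least one similar partner receives the lower priority.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two flattened filters (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def priorities_from_similarity(filters: Dict[str, List[float]],
                               threshold: float = 0.9) -> Dict[str, int]:
    """Priority 1 for filters with no similar partner, 0 for filters that have one."""
    prio: Dict[str, int] = {}
    for name, w in filters.items():
        has_similar = any(cosine(w, other) >= threshold
                          for other_name, other in filters.items() if other_name != name)
        prio[name] = 0 if has_similar else 1
    return prio


print(priorities_from_similarity({"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]}))
# {'a': 0, 'b': 0, 'c': 1}
```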

The priority setting unit may set different priorities for the filters of the rear-stage filter group for each of the plurality of subject groups.

When equal priorities are set for two or more filters of the rear-stage filter group, the filter selection unit may randomly select filters from among those with equal priority.
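Per-group priorities and random tie-breaking can be sketched together. The priority table, group names, and filter names below are hypothetical placeholders. Shuffling first and then applying Python's stable sort leaves filters of equal priority in random order.

```python
import random
from typing import Dict, List, Optional

# Hypothetical priority table: subject group -> {rear-stage filter -> priority}.
PRIORITY_TABLE: Dict[str, Dict[str, int]] = {
    "mammal": {"f0": 3, "f1": 3, "f2": 1, "f3": 2},
    "plant": {"f0": 1, "f1": 2, "f2": 3, "f3": 3},
}


def order_filters(group: str,
                  table: Dict[str, Dict[str, int]] = PRIORITY_TABLE,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Return the group's filters from highest to lowest priority, ties in random order."""
    rng = rng or random.Random()
    prio = table[group]
    names = list(prio)
    rng.shuffle(names)  # random order first ...
    names.sort(key=lambda n: prio[n], reverse=True)  # ... then a stable sort by priority
    return names


print(order_filters("mammal"))  # f0/f1 in random order, then f3, then f2
```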

The image recognition device may further include a learning unit that generates the machine learning model by machine learning using a neural network, based on learning data in which a plurality of pieces of image data, the subject included in each piece of image data, and the subject group of that subject are associated with one another. The learning unit may include a front-stage learning unit that generates the front-stage filter group, and a rear-stage learning unit that, after the front-stage learning unit has generated the front-stage filter group, generates the rear-stage filter group using the results of applying the front-stage filter group, without back-propagating errors into the front-stage filter group.
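A minimal sketch of how such learning data might be organized, assuming each sample is stored as an image path together with its subject label and a subject-group label derived from a fixed mapping. The record type, field names, and mapping (which loosely follows the mammal/plant/artifact example given later in the description) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mapping from recognition targets to subject groups.
SUBJECT_TO_GROUP: Dict[str, str] = {
    "cat": "mammal", "dog": "mammal",
    "cabbage": "plant", "carrot": "plant",
    "automobile": "artifact", "building": "artifact",
}


@dataclass(frozen=True)
class Sample:
    image_path: str  # hypothetical location of the training image
    subject: str     # recognition-target label (source of identification label B)
    group: str       # subject-group label (used to train the group recognizer)


def make_samples(pairs: List[Tuple[str, str]]) -> List[Sample]:
    """Attach the subject group to each (image path, subject) pair."""
    return [Sample(path, subject, SUBJECT_TO_GROUP[subject]) for path, subject in pairs]


samples = make_samples([("img/0001.jpg", "dog"), ("img/0002.jpg", "cabbage")])
print([s.group for s in samples])  # ['mammal', 'plant']
```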

A second aspect of the present invention is an image recognition method. In this method, a processor executes: a step of acquiring a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; a step of identifying a subject group by applying to the image data a front-stage filter group, among the plurality of filters, for recognizing to which of a plurality of predetermined subject groups the subject included in the image data belongs; a step of selecting one or more filters from a rear-stage filter group, which is the filter group obtained by excluding the front-stage filter group from the plurality of filters, based at least on the identified subject group and on the priorities set for the respective filters constituting the rear-stage filter group; and a step of identifying the recognition target included in the image data based on the result of applying the front-stage filter group and the result of applying the selected filters.

A third aspect of the present invention is a program. The program causes a computer to implement: a function of acquiring a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; a function of identifying a subject group by applying to the image data a front-stage filter group, among the plurality of filters, for recognizing to which of a plurality of predetermined subject groups the subject included in the image data belongs; a function of selecting one or more filters from a rear-stage filter group, which is the filter group obtained by excluding the front-stage filter group from the plurality of filters, based at least on the identified subject group and on the priorities set for the respective filters constituting the rear-stage filter group; and a function of identifying the recognition target included in the image data based on the result of applying the front-stage filter group and the result of applying the selected filters.

According to the present invention, the processing load of image recognition processing that uses a neural network machine learning model can be reduced.

FIG. 1 is a diagram schematically showing the general functional configuration of a convolutional neural network.
FIG. 2 is a diagram schematically showing the layer structure of the neural network machine learning model known as AlexNet.
FIG. 3 is a diagram schematically showing the layer structure of the convolutional neural network used by the image recognition device according to the embodiment.
FIG. 4 is a diagram for explaining the learning process of the neural network according to the embodiment.
FIG. 5 is a diagram for explaining the recognition process of the neural network according to the embodiment.
FIG. 6 is a diagram schematically showing the functional configuration of the image recognition device according to the embodiment.
FIG. 7 is a diagram schematically showing, in table form, the filter priorities for each subject group.
FIG. 8 is a diagram schematically showing the functional configuration of the learning unit according to the embodiment.
FIG. 9 is a flowchart for explaining the flow of the image recognition processing executed by the image recognition device according to the embodiment.

<Convolutional neural network>
The image recognition device according to the embodiment executes image recognition processing using a neural network machine learning model. As its main example, it uses the machine learning model of a convolutional neural network (CNN). As background for the image recognition device according to the embodiment, convolutional neural networks are therefore briefly described first.

FIG. 1 is a diagram schematically showing the general functional configuration of a convolutional neural network. Neural networks of many different configurations have been proposed, but they share a common basic configuration: a stack (or graph structure) of layers of several types. A neural network learns its model parameters so that its output for given input data takes appropriate values; in other words, it learns the model parameters that minimize a loss function defined so that the output for the input data takes appropriate values.
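The text does not name a specific loss function. For a classifier that outputs one of several identification labels, softmax cross-entropy is a common choice; the sketch below, an assumption rather than the patent's stated loss, shows how it scores a model's raw outputs (logits) against the index of the correct label.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability that the model assigns to the correct label index."""
    return -math.log(softmax(logits)[target])


print(cross_entropy([0.0, 0.0, 0.0], 0))  # log(3), about 1.0986: uninformed model
print(cross_entropy([4.0, 0.0, 0.0], 0))  # much smaller: confident and correct
```

Training then adjusts the model parameters to drive this loss down over the learning data.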

FIG. 1 shows a machine learning model trained to output the type of subject included in an input image I. In the example of FIG. 1, the input image I fed to the input layer Li is processed first by the first convolutional layer C1 and then by the second convolutional layer C2, and then passes through the pooling layer P, the first fully connected layer F1, and the second fully connected layer F2 to reach the output layer Lo. The output layer Lo outputs an identification label B indicating the type of subject included in the input image I.

For example, when the machine learning model shown in FIG. 1 is a model for recognizing a plurality of animals such as dogs, cats, and monkeys, an identification label B identifying each animal to be recognized is assigned in advance. When an input image I is fed to the input layer Li of this model, the output layer Lo outputs the identification label B indicating which of the predetermined recognition targets the subject is. The identification label B is, for example, a bit string uniquely assigned to each of the recognition targets.
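One possible bit-string encoding, assumed here for illustration, is a one-hot code with a single 1 at the target's index; the patent requires only that each string be unique. The target list below is hypothetical.

```python
from typing import List

# Hypothetical recognition targets, in a fixed order.
TARGETS: List[str] = ["dog", "cat", "monkey"]


def label_bits(target: str, targets: List[str] = TARGETS) -> str:
    """One-hot bit string: a single '1' at the target's index."""
    idx = targets.index(target)
    return "".join("1" if i == idx else "0" for i in range(len(targets)))


def decode(bits: str, targets: List[str] = TARGETS) -> str:
    """Map a one-hot bit string back to its recognition target."""
    return targets[bits.index("1")]


print(label_bits("cat"))  # 010
print(decode("001"))      # monkey
```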

In a neural network, the output of a preceding layer becomes the input of the adjacent following layer. Each convolutional layer of a convolutional neural network applies filters to the signal input from the preceding layer, and the filter outputs become the output of that layer.
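To make "applying a filter" concrete, the sketch below slides one small filter over a single-channel input without padding (stride 1), which is the cross-correlation that most CNN libraries call convolution. Bias terms, activation functions, and multi-channel inputs are omitted for brevity.

```python
from typing import List

Matrix = List[List[float]]


def apply_filter(x: Matrix, k: Matrix) -> Matrix:
    """Slide filter k over input x (no padding, stride 1) and return the feature map.

    Like most CNN libraries, this computes cross-correlation: the filter is not flipped.
    """
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
diagonal_diff = [[1, 0],
                 [0, -1]]
print(apply_filter(image, diagonal_diff))  # [[-4, -4], [-4, -4]]
```

A convolutional layer with several filters would run this once per filter and stack the resulting feature maps as its output channels, which the next layer then takes as input.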

FIG. 2 is a diagram schematically showing the layer structure of the neural network machine learning model known as AlexNet. As shown in FIG. 2, AlexNet includes an input layer Li, five convolutional layers (first convolutional layer C1, second convolutional layer C2, third convolutional layer C3, fourth convolutional layer C4, and fifth convolutional layer C5), two fully connected layers (first fully connected layer F1 and second fully connected layer F2), and an output layer Lo, and its final layer is configured to output 1000 kinds of identification labels B. That is, AlexNet is a convolutional neural network for recognizing 1000 types of recognition targets.

Although not illustrated, network structures with even deeper layers have been proposed to improve recognition accuracy. For example, the neural network machine learning model known as ResNet has a layer structure of 152 layers. Because a neural network can extract more sophisticated and complex features with each additional layer, deepening the network is considered to play an important role in improving recognition accuracy.

<Neural network according to the embodiment>
FIG. 3 is a diagram schematically showing the layer structure of the convolutional neural network N used by the image recognition device according to the embodiment. The neural network N according to the embodiment is described below with reference to FIG. 3.

Like a conventional convolutional neural network, the machine learning model of the convolutional neural network N used by the image recognition device according to the embodiment includes a plurality of convolutional layers and outputs which of a plurality of recognition targets the subject included in the input image I is. The machine learning model of the convolutional neural network N used by the image recognition device 1 according to the embodiment therefore includes a plurality of filters as model parameters.

In a convolutional neural network, each additional layer allows more sophisticated and complex features to be extracted, so a deeper network can recognize even similar recognition targets that are hard to tell apart. Conversely, recognition targets that differ markedly from one another can be recognized even with few layers. These observations suggest that the early part of a convolutional neural network recognizes coarse features of the recognition targets, and that later parts recognize increasingly detailed features specific to each target. A similar effect can be obtained by changing the number of filters in each layer: with more filters per layer, even similar recognition targets that are hard to distinguish can be recognized, whereas markedly different targets can be recognized with few filters per layer.

Accordingly, the filters constituting a machine learning model are thought to include both filters that contribute to the recognition of all recognition targets and filters that play an important role in recognizing some recognition targets but contribute little to recognizing others. Examples of the former are filters for recognizing features shared by a plurality of recognition targets; examples of the latter are filters for distinguishing specific recognition targets that have similar features.

For example, suppose the recognition targets of a machine learning model include mammals such as cats, dogs, monkeys, and cows; plants such as mandarin oranges, carrots, radishes, and cabbages; and artifacts such as automobiles and buildings. In this case, a filter that responds strongly to mammalian "eyes," for example, captures the coarse feature of whether or not the recognition target is a mammal, and is therefore considered to contribute to the recognition of all recognition targets.

In contrast, a filter used to distinguish "dog" from "cat," for example, is expected to contribute greatly when the recognition target is a dog or a cat, but little when it is something other than a cat or dog, such as a "cabbage" or an "automobile." That is, some filters play an important role in recognizing certain recognition targets while contributing little to the recognition of others.

As shown in FIG. 3, the machine learning model of the convolutional neural network N used by the image recognition device according to the embodiment has two filter groups: a front-stage filter group and a rear-stage filter group. In the example of FIG. 3, the front-stage filter group consists of the filters in five convolutional layers (first front-stage convolutional layer C_f1, second front-stage convolutional layer C_f2, third front-stage convolutional layer C_f3, fourth front-stage convolutional layer C_f4, and fifth front-stage convolutional layer C_f5) and two fully connected layers (first fully connected layer F1 and second fully connected layer F2). The rear-stage filter group consists of the filters in five convolutional layers (first rear-stage convolutional layer C_r1, second rear-stage convolutional layer C_r2, third rear-stage convolutional layer C_r3, fourth rear-stage convolutional layer C_r4, and fifth rear-stage convolutional layer C_r5).

For example, when the present embodiment is applied to the AlexNet shown in FIG. 2, the filters of each layer are divided equally between the front-stage filter group and the rear-stage filter group. Specifically, in the first convolutional layer, each of the front-stage and rear-stage filter groups has (55 × 55) nodes × 48 filters. Likewise, the second to fifth convolutional layers have (27 × 27) nodes × 128 filters, (13 × 13) nodes × 192 filters, (13 × 13) nodes × 192 filters, and (13 × 13) nodes × 128 filters, respectively. The first fully connected layer F1, the second fully connected layer F2, the third fully connected layer F3, and the fourth fully connected layer F4 each have 2048 nodes.
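The per-stage sizes above follow from halving the filter counts of the original AlexNet convolutional layers (96, 256, 384, 384, and 256 filters) and its 4096-node hidden fully connected layers. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
from typing import List, Tuple

# Original AlexNet: (output side length, number of filters) for conv1..conv5.
ALEXNET_CONV: List[Tuple[int, int]] = [(55, 96), (27, 256), (13, 384), (13, 384), (13, 256)]
ALEXNET_FC_NODES = 4096  # nodes in each of AlexNet's two hidden fully connected layers


def split_evenly(count: int) -> Tuple[int, int]:
    """Split a filter (or node) count into front-stage and rear-stage halves."""
    front = count // 2
    return front, count - front


conv_per_stage = [(side, split_evenly(n)[0]) for side, n in ALEXNET_CONV]
fc_per_stage = split_evenly(ALEXNET_FC_NODES)[0]

for i, (side, n) in enumerate(conv_per_stage, start=1):
    print(f"conv{i}: ({side} x {side}) nodes x {n} filters per stage")
print(f"fully connected: {fc_per_stage} nodes per layer")
```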

[Learning process]
First, the learning process of the neural network N according to the embodiment is described.

FIGS. 4(a) to 4(c) are diagrams for explaining the learning process of the neural network N according to the embodiment. Specifically, FIGS. 4(a) and 4(b) show the first-stage learning and the second-stage learning of the front-stage filter group, respectively, and FIG. 4(c) shows the learning of the rear-stage filter group. Here, the front-stage filter group is the filter group used to recognize to which of a plurality of predetermined subject groups the subject included in the input image I belongs. The rear-stage filter group is the filter group obtained by excluding the filters of the front-stage filter group from the filters constituting the machine learning model.

As shown in FIG. 4(a), the first-stage learning of the front-stage filter group generates a first front-stage filter group that, given learning data (image data for training) as input, outputs the identification label B of the subject associated with that image data. In other words, the first-stage learning of the front-stage filter group is the same as an ordinary machine learning process.

As shown in FIG. 4(b), the second-stage learning of the front-stage filter group generates a machine learning model that, given learning data as input, outputs a group identification label indicating the subject group to which the image data belongs. Specifically, a second front-stage filter group that recognizes the group to which the image data belongs is generated by fine-tuning the first fully connected layer F1 and the second fully connected layer F2 included in the front-stage filter group.
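A toy sketch of this fine-tuning, assuming one scalar parameter per layer purely for illustration: the second front-stage model starts from the parameters of the first, and a gradient step updates only the fully connected layers, so the convolutional parameters stay identical to those of the first model. The parameter values and gradients are made up.

```python
from typing import Dict, Set

Params = Dict[str, float]  # toy model: one scalar parameter per layer


def fine_tune_step(params: Params, grads: Params, trainable: Set[str], lr: float = 0.1) -> Params:
    """Return new parameters after one SGD step that updates only `trainable` layers."""
    return {name: value - lr * grads[name] if name in trainable else value
            for name, value in params.items()}


# First front-stage model (class labels), obtained in the first-stage learning.
first_stage: Params = {"C_f1": 0.3, "C_f2": -0.1, "F1": 0.8, "F2": 0.5}
# Made-up gradients of the group-label loss.
grads: Params = {"C_f1": 1.0, "C_f2": 1.0, "F1": 1.0, "F2": -2.0}

# Second front-stage model: same convolutional parameters, fine-tuned F1' and F2'.
second_stage = fine_tune_step(first_stage, grads, trainable={"F1", "F2"})
print(second_stage)
```

Because a new dictionary is returned, the first model's F1 and F2 remain available alongside F1' and F2'.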

In FIG. 4(b), the fine-tuned first and second fully connected layers are denoted F1' and F2', respectively. Because only these layers are fine-tuned, the convolutional layers of the first front-stage filter group and those of the second front-stage filter group are shared. Since the two groups are largely common, both are simply referred to as the front-stage filter group unless they need to be distinguished.

As shown in FIG. 4(c), the rear-stage filter group is trained together with the machine learning model generated in the first-stage learning of the front-stage filter group. That is, the rear-stage filter group is trained so that the outputs of the first front-stage filter group and of the rear-stage filter group together yield the identification label B of the learning data. During this training, errors are not back-propagated into the front-stage filter group; the rear-stage filter group is trained using the results of applying the front-stage filter group.
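The training schedule can be illustrated with a deliberately tiny model: a frozen front-stage feature, one trainable rear-stage feature, and a logistic output layer that combines both. This is a hypothetical stand-in for the CNN described above, trained with plain stochastic gradient descent on binary cross-entropy; the point is only that no gradient is computed for, or applied to, the front-stage parameter.

```python
import math
import random
from typing import Dict, List, Tuple

Head = Tuple[float, float, float]  # weights for the front feature, the rear feature, and a bias


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(front: Dict[str, float], w_rear: float, head: Head, x: float) -> Tuple[float, float, float]:
    """Return (front feature, rear feature, predicted probability of label 1)."""
    a, b, c = head
    h_f = front["w"] * x          # front-stage output: computed, never updated here
    h_r = math.tanh(w_rear * x)   # rear-stage output (trainable)
    return h_f, h_r, sigmoid(a * h_f + b * h_r + c)


def train_rear_stage(front: Dict[str, float], data: List[Tuple[float, int]],
                     epochs: int = 200, lr: float = 0.5, seed: int = 0) -> Tuple[float, Head]:
    """SGD on binary cross-entropy for the rear-stage weight and the output head only."""
    w_rear = random.Random(seed).uniform(-0.5, 0.5)
    a = b = c = 0.0
    for _ in range(epochs):
        for x, y in data:
            h_f, h_r, p = forward(front, w_rear, (a, b, c), x)
            g = p - y  # d(loss)/d(logit) for sigmoid + binary cross-entropy
            grad_w_rear = g * b * (1.0 - h_r * h_r) * x  # chain rule through tanh
            a, b, c = a - lr * g * h_f, b - lr * g * h_r, c - lr * g
            w_rear -= lr * grad_w_rear
            # No gradient flows into front["w"]: the front stage stays frozen.
    return w_rear, (a, b, c)


front = {"w": 1.0}  # pretend this came from the first-stage learning
data = [(x, 1 if x > 0 else 0) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w_rear, head = train_rear_stage(front, data)
print([round(forward(front, w_rear, head, x)[2]) for x, _ in data])
```

In a deep-learning framework the same effect is usually obtained by detaching the front-stage outputs or excluding the front-stage parameters from the optimizer.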

[Recognition process]
Next, the recognition process of the neural network N according to the embodiment is described.

FIGS. 5(a) and 5(b) are diagrams for explaining the recognition process of the neural network N according to the embodiment. The recognition process consists of two stages: a group recognition stage, which recognizes the group to which the subject included in the input image I belongs, and a subject recognition stage, which recognizes the subject itself.

FIG. 5(a) is a diagram for explaining group recognition of the input image I. Group recognition uses the second front-stage filter group generated in the second-stage learning of the front-stage filter group. Applying the second front-stage filter group to the input image I recognizes the subject group to which the subject included in the input image I belongs.

The image recognition device according to the embodiment can be expected to achieve its highest recognition accuracy when it applies all the filters in the rear-stage filter group. However, a certain level of recognition accuracy can be maintained without applying all of them. Therefore, when the computational resources of the image recognition device are small to begin with, or when they are large but the load of other processing temporarily leaves only a small share for applying the machine learning model, the device can balance computational feasibility against recognition accuracy by choosing which rear-stage filters to apply according to the available resources.

As described in detail later, the image recognition device according to the embodiment chooses which filters of the post-stage filter group to actually apply, based on the subject group of the subject in the input image I. Then, as shown in FIG. 5(b), the output of the first pre-stage filter group is combined with the output of the selected post-stage filters. The result is an identification label B indicating the subject in the input image I.

Filter selection only pays off if the extra computation for group recognition of the input image I is smaller than the computation saved by skipping post-stage filters. In the neural network N according to the embodiment, the first and second pre-stage filter groups share the same convolutional layers. The convolutional results computed by the second pre-stage filter group to recognize the subject group can therefore be reused as the convolutional results of the first pre-stage filter group. As a result, the only additional cost of recognizing the subject group is essentially the computation of the first fully connected layer F1' and the second fully connected layer F2'. This extra cost can therefore be expected to stay well below the computation saved by selecting filters from the post-stage filter group.
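The cost argument can be made concrete with a rough multiply-accumulate (MAC) count. The sketch below is illustrative only, and all of the following sizes are hypothetical, not taken from this document:
- one post-stage convolutional layer with a 14 x 14 output, 3 x 3 kernels and 512 channels, in which half of the output filters are skipped;
- a group-recognition head of a 7 x 7 x 512 input feeding a 256-unit F1', and an F2' covering 10 subject groups.

```python
def conv_macs(h_out: int, w_out: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a k x k convolution producing an h_out x w_out x c_out map."""
    return h_out * w_out * k * k * c_in * c_out


def fc_macs(n_in: int, n_out: int) -> int:
    """Multiply-accumulates of a fully connected layer."""
    return n_in * n_out


# Hypothetical sizes for illustration only; they are not taken from the patent.
full_layer = conv_macs(14, 14, 3, 512, 512)  # post-stage layer with all 512 filters
half_layer = conv_macs(14, 14, 3, 512, 256)  # same layer with only 256 filters selected
savings = full_layer - half_layer

# Group recognition reuses the shared convolutional features, so its extra cost
# is only the two fully connected layers F1' and F2' (10 hypothetical groups).
group_overhead = fc_macs(7 * 7 * 512, 256) + fc_macs(256, 10)

print(f"saved: {savings:,} MACs, group-recognition overhead: {group_overhead:,} MACs")
```

With these made-up sizes, the group-recognition overhead is under 3% of what skipping half of one layer saves. The real ratio depends entirely on the actual architecture.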

In this way, the image recognition device according to the embodiment reduces the processing load of recognition by choosing which filters to apply to the input image I.

<Functional configuration of the image recognition device 1 according to the embodiment>
FIG. 6 schematically shows the functional configuration of the image recognition device 1 according to the embodiment. The image recognition device 1 includes a storage unit 2 and a control unit 3. The arrows in FIG. 6 show the main data flows; other data flows not shown in FIG. 6 may also exist. Each block in FIG. 6 is a unit of function, not a unit of hardware (device). The blocks may therefore be implemented in a single device or split across several devices. Data may pass between functional blocks by any means, such as a data bus, a network, or a portable storage medium.

The storage unit 2 includes:
- a ROM (Read Only Memory) that stores the BIOS (Basic Input Output System) of the computer implementing the image recognition device 1;
- a RAM (Random Access Memory) that serves as the work area of the image recognition device 1;
- a large-capacity storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), which stores the OS (Operating System), application programs, and the information those programs refer to at run time.

The control unit 3 is a processor of the image recognition device 1, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit). By executing a program stored in the storage unit 2, it functions as the following units:
- an image acquisition unit 30
- a model acquisition unit 31
- a group recognition unit 32
- a filter selection unit 33
- an individual recognition unit 34
- a resource acquisition unit 35
- a weight index calculation unit 36
- a priority setting unit 37
- a similarity calculation unit 38
- a learning unit 39

FIG. 6 shows an example in which the image recognition device 1 is a single device. The image recognition device 1 may instead be built from computational resources such as multiple processors and memories, as in a cloud computing system. In that case, each unit of the control unit 3 is realized by at least one of those processors executing a program.

The image acquisition unit 30 acquires the input image I, which is the image data the image recognition device 1 processes. The model acquisition unit 31 acquires a machine learning model that includes a plurality of filters as model parameters. Given the input image I, this model outputs information indicating which of a plurality of predetermined recognition targets the subject in the image is.

The group recognition unit 32 identifies the subject group by applying the pre-stage filter group (that is, the second pre-stage filter group described above) to the input image I. This filter group is the subset of the model's filters that recognizes which of a plurality of predetermined subject groups the subject in the input image I belongs to.

The filter selection unit 33 selects one or more filters from the post-stage filter group, which consists of all filters in the machine learning model other than the pre-stage filter group. It bases the selection at least on two things: the subject group identified by the group recognition unit 32, and the priority set for each filter in the post-stage filter group. Filter selection by the filter selection unit 33 is described in detail later.

The individual recognition unit 34 identifies the recognition target in the input image I from two results: the result of applying the first pre-stage filter group, and the result of applying the filters selected by the filter selection unit 33. The image recognition device 1 thus applies to the input image I only the chosen subset of the model's post-stage filter group. This reduces the processing load of image recognition compared with using every filter in the machine learning model.

The resource acquisition unit 35 acquires the computational resources of the image recognition device 1. Here, "computational resources" means the computing capacity the device can allocate to image recognition, such as the power of its CPU, GPU, or other processors and the capacity of its main memory. These resources differ by type of device; a blade server and an IoT device, for example, differ greatly. Even on the same device, the share available for image recognition can change with the resources taken by other processes running in parallel. The resource acquisition unit 35 acquires the resources available when the image recognition device 1 performs image recognition using the machine learning model.

Within the limits of the device's computational resources, the filter selection unit 33 picks filters for the individual recognition unit 34 in descending order of the priority set for each post-stage filter. In general, applying more filters can be expected to give higher recognition accuracy. Choosing post-stage filters according to its resources lets the image recognition device 1 run the most accurate recognition it can afford.
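The following is a minimal sketch of priority-ordered selection under a compute budget. It rests on two assumptions, neither of which the text specifies:
- each post-stage filter has a known scalar cost, and the available resources are summarized as one budget number;
- selection stops at the first filter that no longer fits (a strict prefix of the priority order) rather than skipping it and trying cheaper ones.

```python
from typing import List


def select_filters(priorities: List[float], costs: List[float], budget: float) -> List[int]:
    """Return indices of post-stage filters taken in descending priority order
    until the next filter would exceed the compute budget."""
    order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    chosen: List[int] = []
    used = 0.0
    for i in order:
        if used + costs[i] > budget:
            break  # assumption: stop at the first filter that does not fit
        chosen.append(i)
        used += costs[i]
    return chosen
```

For example, `select_filters([0.2, 0.9, 0.5, 0.7], [1.0, 1.0, 1.0, 1.0], 3.0)` keeps the three highest-priority filters, indices 1, 3 and 2.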

[Filter selection process]
Next, the filter selection process in the image recognition device 1 according to the embodiment is described.

As described above, in a neural network the output of each layer is the input of the next adjacent layer. Each convolutional layer of a convolutional neural network applies its filters to the signal from the preceding layer, and the filter outputs form that layer's output. Therefore, the larger the absolute values of a convolutional layer's filter weight coefficients, the larger the absolute values of the next layer's input signals can be.

In other words, the magnitude of a convolutional filter's weight coefficients can serve as an index of the activity of the corresponding unit in the next layer. In neural networks, units with high activity are said to contribute more to recognition ability than units with low activity.

The weight index calculation unit 36 therefore computes, for each filter in the post-stage filter group, an index of the magnitude of its weight coefficients. One such index is the sum of the absolute values of the filter's weight coefficients divided by the number of coefficients. Another is the sum of their squares divided by the number of coefficients. With either index, a larger value means the filter has larger weight coefficients.
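Both indices are simple means over a filter's weights. The sketch below assumes, for illustration, that a filter is stored as a nested list indexed [m][n][f], matching the three-dimensional filters described elsewhere in this document. It flattens the filter and computes the mean absolute value and the mean square.

```python
from typing import List

Filter3D = List[List[List[float]]]  # weights of one filter, indexed [m][n][f]


def flatten(filt: Filter3D) -> List[float]:
    """List every weight coefficient of a three-dimensional filter."""
    return [w for row in filt for col in row for w in col]


def mean_abs_weight(filt: Filter3D) -> float:
    """Sum of absolute weight values divided by the number of weights."""
    ws = flatten(filt)
    return sum(abs(w) for w in ws) / len(ws)


def mean_sq_weight(filt: Filter3D) -> float:
    """Sum of squared weight values divided by the number of weights."""
    ws = flatten(filt)
    return sum(w * w for w in ws) / len(ws)


example = [[[1.0, -3.0]], [[2.0, -2.0]]]  # a tiny 2 x 1 x 2 filter
```

The mean square weights large coefficients more heavily than the mean absolute value, so the two indices can rank filters differently.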

The priority setting unit 37 gives filters whose index (as computed by the weight index calculation unit 36) shows large weight coefficients a higher priority than filters whose index shows small ones. The image recognition device 1 can thus favor the filters considered to contribute most to the recognition ability of the machine learning model.

The number of priority levels that the priority setting unit 37 assigns within a post-stage filter group is arbitrary, up to the number of filters in the group. For example, if there are as many levels as filters, the priorities give a complete ranking of the filters in that convolutional layer.

Alternatively, when there are two or more filters, just two levels, "high" and "low", may be used. In this case, the priority setting unit 37 sets a predetermined threshold A. A filter whose weight-magnitude index exceeds threshold A gets "high" priority, and a filter whose index is below threshold A gets "low" priority.
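The two ways of turning the weight-magnitude index into priorities can be sketched as follows:
- a full ranking, where rank 1 is the largest index;
- the two-level "high"/"low" scheme with threshold A.

The text does not say what happens when an index equals A exactly; the sketch assumes such a filter is "low".

```python
from typing import List


def rank_priorities(indices: List[float]) -> List[int]:
    """Full ranking: rank 1 goes to the filter with the largest weight index."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    ranks = [0] * len(indices)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def two_level_priorities(indices: List[float], threshold_a: float) -> List[str]:
    """Two levels: 'high' above threshold A, otherwise 'low' (ties assumed low)."""
    return ["high" if x > threshold_a else "low" for x in indices]
```

For instance, indices `[0.5, 2.0, 1.0]` give ranks `[3, 1, 2]`. With `threshold_a = 1.0` they give `["low", "high", "low"]`.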

The priority setting unit 37 may also adjust each filter's priority according to how similar the filters in the post-stage filter group are to one another. In general, the closer two filters' weight coefficients are, the more similar the features they are thought to extract. So when two filters are similar and one of them already extracts the feature, leaving out the other should change final recognition accuracy only slightly. Conversely, a filter with no similar counterpart may extract features that no other filter can.

The similarity calculation unit 38 therefore computes, for each filter in the post-stage filter group, its similarity to the other filters. The priority setting unit 37 gives filters with no similar counterpart a higher priority than filters that do have one. Within its resource limits, the image recognition device 1 can then choose filters that extract different features.

The similarity calculation unit 38 may use the "distance" between filters as their similarity measure. Any quantity satisfying the axioms of a distance (metric) will do; the Euclidean distance between filters is one example. Specifically, the similarity calculation unit 38 calculates the similarity D(i, j) between the i-th filter and the j-th filter using equation (1).

Here, I(m, n, f) is the element of the three-dimensional i-th filter at vertical position m, horizontal position n, and height f. J(m, n, f) is the corresponding element of the j-th filter. Equation (1) is the Euclidean distance between the two filters, normalized by the number of filter elements (the number of weight coefficients). D is therefore a dissimilarity: the smaller D is for two filters, the more similar they are. The similarity calculation unit 38 may instead measure filter similarity with, for example, the cosine similarity.
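Equation (1) itself is not reproduced in this text. It is described only as the Euclidean distance normalized by the number of weights. The sketch below therefore assumes the division by the element count comes after the square root. It also shows the cosine-similarity alternative. Filters are taken as flattened lists of equal length.

```python
import math
from typing import Sequence


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened filters divided by the number
    of weights (assumed placement of the normalization in equation (1))."""
    if len(a) != len(b):
        raise ValueError("filters must have the same number of weights")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened filters (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The two measures point in opposite directions: a smaller distance means more similar, while a larger cosine means more similar. Any threshold or ranking has to be flipped when switching between them.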

As the similarity score S(i) of the i-th filter, the similarity calculation unit 38 may compute S(i) = Σ_j D(i, j) (j = 1, ..., Nf, where Nf is the number of filters). It may then give higher priority to filters with larger S(i), that is, filters with no similar counterpart.
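Below, S(i) sums each filter's distance to every filter, including itself (D(i, i) = 0, so that term adds nothing). Filters are then ordered from largest to smallest score. The distance assumes, as one reading of equation (1), that the Euclidean distance is divided by the number of weights after the square root.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form of equation (1): Euclidean distance / number of weights."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)


def uniqueness_scores(filters: Sequence[Sequence[float]]) -> List[float]:
    """S(i) = sum over j of D(i, j); large values mean no similar filter exists."""
    return [sum(distance(fi, fj) for fj in filters) for fi in filters]


def order_by_uniqueness(filters: Sequence[Sequence[float]]) -> List[int]:
    """Filter indices from most to least unique (highest to lowest priority)."""
    scores = uniqueness_scores(filters)
    return sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)


filters = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]]  # two near-duplicates and one outlier
```

Here the outlier `[5.0, 5.0]` comes first. The two near-duplicate filters score lower, because each is close to the other.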

As with the weight index, the priority setting unit 37 may also treat similarity in two levels, "similar" and "dissimilar". In this case, it sets a predetermined threshold B. A pair of filters whose similarity exceeds threshold B counts as "similar"; a pair below threshold B counts as "dissimilar". When the weight-index-based priorities use two levels, the priority setting unit 37 may also compute similarity only for the filters with "low" priority. It may then raise to "high" the priority of any such filter that is dissimilar to every other filter.
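A minimal sketch of promoting unique "low"-priority filters follows. It makes three assumptions, none fixed by the text:
- similarity is measured with a distance, so "similar" is read as a distance below threshold B rather than a similarity above it;
- each low-priority filter is compared against all other post-stage filters;
- distances use the Euclidean distance divided by the number of weights.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form of equation (1): Euclidean distance / number of weights."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)


def promote_unique(filters: Sequence[Sequence[float]], levels: List[str],
                   threshold_b: float) -> List[str]:
    """Raise a 'low' filter to 'high' if no other filter lies within threshold B."""
    result = list(levels)
    for i, level in enumerate(levels):
        if level != "low":
            continue  # only low-priority filters are re-examined
        has_similar = any(
            distance(filters[i], filters[j]) < threshold_b
            for j in range(len(filters)) if j != i
        )
        if not has_similar:
            result[i] = "high"  # dissimilar to every other filter
    return result
```

With filters `[[0, 0], [0, 0.1], [5, 5]]` and levels `["high", "low", "low"]`, the near-duplicate at index 1 stays "low". The isolated filter at index 2 is promoted.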

As described above, a machine learning model is likely to contain filters that matter greatly for recognizing one target but contribute little to recognizing another. A filter's importance may therefore depend on the type of subject.

The priority setting unit 37 therefore sets the priorities of the post-stage filters separately for each predetermined subject group. The per-group priority setting is described below.

Suppose there are P subject groups (P is an integer of 2 or more) and the post-stage filter group contains Q filters (Q is an integer of 2 or more). Let the q-th filter be f_q (1 ≤ q ≤ Q), and let W_pq be the importance order of filter f_q in the p-th subject group (1 ≤ p ≤ P).

Step 1: The priority setting unit 37 sets p to 1.
Step 2: The priority setting unit 37 acquires the test data T_p of the p-th subject group.
Step 3: The priority setting unit 37 applies the pre-stage filter group and all filters of the post-stage filter group to the test data T_p and calculates the recognition rate R_p.

Step 4: The priority setting unit 37 sets q to 1.
Step 5: The priority setting unit 37 calculates the recognition rate R_pq on the test data T_p when the post-stage filters are applied with filter f_q excluded.
Step 6: The priority setting unit 37 calculates the recognition-rate drop C_q = R_p - R_pq. C_q is the decrease in recognition rate caused by excluding filter f_q; in other words, it measures how much filter f_q contributes to the recognition rate.

Step 7: The priority setting unit 37 updates q to q + 1.
Step 8: The priority setting unit 37 repeats Steps 5 to 7 until q exceeds Q.
Step 9: The priority setting unit 37 sorts the Q drops C_q in descending order. The resulting order of the indices q is the importance order W_pq of the filters f_q in the p-th subject group.

Step 10: The priority setting unit 37 updates p to p + 1.
Step 11: The priority setting unit 37 repeats Steps 2 to 10 until p exceeds P.
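The per-group ranking above is a leave-one-out ablation. The sketch below is not the patented implementation:
- the recognition-rate evaluation is passed in as a function, because the text does not define it;
- `toy_evaluate` is a hypothetical stand-in that just sums per-filter contributions;
- indices are 0-based, unlike the 1-based f_q and p used in the steps.

```python
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")


def importance_orders(
    test_data: Sequence[T],
    num_filters: int,
    evaluate: Callable[[T, Set[int]], float],
) -> Dict[int, List[int]]:
    """For each subject group p, return post-stage filter indices sorted by the
    drop in recognition rate C_q = R_p - R_pq caused by removing filter q."""
    all_filters = set(range(num_filters))
    orders: Dict[int, List[int]] = {}
    for p, t_p in enumerate(test_data):               # Steps 1, 2, 10, 11
        r_p = evaluate(t_p, all_filters)              # Step 3: all filters applied
        drops = []
        for q in range(num_filters):                  # Steps 4, 7, 8
            r_pq = evaluate(t_p, all_filters - {q})   # Step 5: filter q excluded
            drops.append((r_p - r_pq, q))             # Step 6: C_q
        drops.sort(key=lambda d: d[0], reverse=True)  # Step 9: largest drop first
        orders[p] = [q for _, q in drops]
    return orders


# Hypothetical evaluator: each group's "test data" is a per-filter contribution
# table, and the recognition rate is the sum over the active filters.
def toy_evaluate(contrib: Dict[int, float], active: Set[int]) -> float:
    return sum(contrib[q] for q in active)


groups = [{0: 0.1, 1: 0.5, 2: 0.2}, {0: 0.4, 1: 0.0, 2: 0.3}]
orders = importance_orders(groups, 3, toy_evaluate)
```

This takes P x (Q + 1) evaluations on test data. That is why the ranking is computed offline and stored, not recomputed for every input image.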

This procedure gives the priority setting unit 37 an importance order of the post-stage filters for each predetermined subject group. Within each group, more important filters get higher priority. The priority setting unit 37 can thus set different filter priorities for each subject group.

As with filter similarity, the priority setting unit 37 may treat importance in two levels: important or not. In this case, it sets a predetermined threshold C. Filter f_q is important if the recognition rate R_pq is below threshold C, and not important if R_pq is threshold C or above.
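The two-level importance rule is a single comparison per filter. The sketch below assumes, for illustration, that the rates R_pq for one subject group are stored in a dictionary keyed by filter index.

```python
from typing import Dict


def important_filters(rates_without: Dict[int, float], threshold_c: float) -> Dict[int, bool]:
    """Mark filter q as important when the recognition rate measured without it
    (R_pq) falls below threshold C; otherwise it is not important."""
    return {q: r < threshold_c for q, r in rates_without.items()}
```

A low R_pq means removing f_q hurt recognition, so the filter is flagged as important.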

FIG. 7 is a table showing the filter priorities for each subject group. Specifically, FIG. 7 shows the data structure of a priority database. For the first subject group, it stores each filter's weight-coefficient magnitude, similarity, importance order, and priority. The priority database is held in the storage unit 2 and managed by the priority setting unit 37. By consulting it, the filter selection unit 33 can choose which post-stage filters the individual recognition unit 34 applies. The choice is based at least on the subject group and the priority of each post-stage filter.

If the priority setting unit 37 uses fewer priority levels than there are post-stage filters, several filters may share a priority. When two or more filters in the post-stage filter group share a priority, the filter selection unit 33 may choose among them at random. It can then select filters without consulting any index other than priority.
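Random tie-breaking can be sketched with a shuffle followed by a stable sort. Python's `sorted` keeps equal keys in their existing order even with `reverse=True`, so filters sharing a priority end up in random relative order, while higher priorities still come first. The numeric priority levels in the example (2 = high, 1 = low) are illustrative assumptions.

```python
import random
from typing import List


def order_with_random_ties(priorities: List[int], rng: random.Random) -> List[int]:
    """Filter indices by descending priority, with ties broken uniformly at random."""
    idx = list(range(len(priorities)))
    rng.shuffle(idx)  # random order first...
    # ...then a stable sort keeps that random order within each priority level.
    return sorted(idx, key=lambda i: priorities[i], reverse=True)


order = order_with_random_ties([1, 2, 1, 2], random.Random(0))
```

Passing in a seeded `random.Random` keeps the selection reproducible for testing, without touching the global random state.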

The learning unit 39 generates the machine learning model by machine learning with a neural network. Its training data associates a plurality of pieces of image data with the subject contained in each image and with that subject's subject group.

FIG. 8 schematically shows the functional configuration of the learning unit 39 according to the embodiment. As shown in FIG. 8, the learning unit 39 includes a pre-stage learning unit 390 and a post-stage learning unit 391.

First, the image acquisition unit 30 acquires a plurality of pieces of image data for which both the subject in each image and that subject's subject group are known. The pre-stage learning unit 390 uses this image data as training data and generates the pre-stage filter group by machine learning with a neural network, as described with reference to FIGS. 4(a) and 4(b).

As described with reference to FIG. 4(c), the post-stage learning unit 391 starts after the pre-stage learning unit 390 has generated the pre-stage filter group. It generates the post-stage filter group from the outputs of the pre-stage filter group, without back-propagating errors into the pre-stage filter group.

Specifically, the post-stage learning unit 391 updates each layer's filter weights by back-propagating the error between the output of the output layer Lo and the identification label B associated with the input image I. During this, it fixes the filter weights of the first to fifth pre-stage convolutional layers C_f1, C_f2, C_f3, C_f4, and C_f5 and prohibits their update. The post-stage filter group is thus generated while the pre-stage filter group stays fixed.
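Freezing the pre-stage while training the post-stage can be shown on a toy model. The model here is hypothetical, a scalar chain y = w_post * (w_pre * x) trained on squared error, not the convolutional network of the document:
- `w_pre` stands in for the fixed pre-stage convolutional layers;
- `w_post` stands in for the post-stage filters;
- gradient descent updates only `w_post`, so the error is never propagated into `w_pre`.

```python
from typing import List, Tuple


def train_post_stage(w_pre: float, w_post: float, data: List[Tuple[float, float]],
                     lr: float = 0.1, epochs: int = 50) -> Tuple[float, float]:
    """Gradient descent on (y - target)^2 that updates only the post-stage weight."""
    for _ in range(epochs):
        for x, target in data:
            h = w_pre * x                       # pre-stage output, treated as a constant
            y = w_post * h                      # post-stage output
            grad_post = 2.0 * (y - target) * h  # d/dw_post of (y - target)^2
            w_post -= lr * grad_post
            # No update to w_pre: its weights are fixed during post-stage learning.
    return w_pre, w_post


w_pre_out, w_post_out = train_post_stage(2.0, 0.0, [(1.0, 6.0)])
```

The frozen weight comes back unchanged (2.0). The trainable weight converges to 3.0, the value that maps the fixed pre-stage output 2.0 to the target 6.0.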

<Processing flow of the image recognition method executed by the image recognition device 1>
FIG. 9 is a flowchart of the image recognition process executed by the image recognition device 1 according to the embodiment. The process in this flowchart starts, for example, when the image recognition device 1 starts up.

The model acquisition unit 31 acquires a machine learning model that includes a plurality of filters as model parameters (S2). The model outputs information indicating which of a plurality of predetermined recognition targets the subject in the input image I, the image data to be processed, is.

The group recognition unit 32 identifies the subject group by applying the pre-stage filter group to the input image I (S4). This is the subset of the filters that recognizes which of a plurality of predetermined subject groups the subject in the input image I belongs to.

The filter selection unit 33 selects one or more filters from the post-stage filter group (S6). The selection is based at least on the subject group identified by the group recognition unit 32 and the priority set for each post-stage filter.

The individual recognition unit 34 identifies the recognition target in the input image I from the result of applying the pre-stage filter group and the result of applying the selected filters (S8).
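Steps S2 to S8 can be tied together in one function. In this sketch the model components are hypothetical callables, not an API from this document:
- `trunk` stands for the shared pre-stage convolutional layers;
- `group_head` stands for F1' and F2';
- per-filter costs and a scalar budget stand in for the acquired computational resources.

The shared features are computed once and reused by both the group head and the selected post-stage filters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class TwoStageModel:
    trunk: Callable[[Sequence[float]], float]           # shared pre-stage conv layers
    group_head: Callable[[float], int]                  # F1', F2': subject-group classifier
    post_filters: List[Callable[[float], float]]        # post-stage filters
    classify: Callable[[float, Dict[int, float]], str]  # produces identification label B
    priority_order: Dict[int, List[int]]                # group -> filter indices, best first
    costs: List[float]                                  # compute cost of each post-stage filter


def recognize(model: TwoStageModel, image: Sequence[float], budget: float) -> str:
    features = model.trunk(image)          # computed once, shared by S4 and S8
    group = model.group_head(features)     # S4: identify the subject group
    chosen: List[int] = []
    used = 0.0
    for q in model.priority_order[group]:  # S6: highest priority first, within budget
        if used + model.costs[q] > budget:
            break
        chosen.append(q)
        used += model.costs[q]
    responses = {q: model.post_filters[q](features) for q in chosen}
    return model.classify(features, responses)  # S8: final recognition


# Toy components (hypothetical): the label simply lists the filters that ran.
model = TwoStageModel(                     # S2: the acquired model
    trunk=lambda img: float(sum(img)),
    group_head=lambda f: 0 if f < 10 else 1,
    post_filters=[lambda f: f, lambda f: 2 * f, lambda f: 3 * f],
    classify=lambda f, r: ",".join(str(q) for q in sorted(r)),
    priority_order={0: [2, 0, 1], 1: [1, 2, 0]},
    costs=[1.0, 1.0, 1.0],
)
label_a = recognize(model, [1.0, 2.0], budget=2.0)  # group 0, filters 2 and 0 run
label_b = recognize(model, [5.0, 6.0], budget=1.0)  # group 1, only filter 1 runs
```

Because the toy label lists the filters that ran, the example makes visible how the subject group and the budget together decide which post-stage filters are used.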

<Effects of the image recognition device 1 according to the embodiment>
As described above, the image recognition device 1 according to the embodiment reduces the processing load of image recognition that uses a neural-network machine learning model.

Although the present invention has been described above using embodiments, its technical scope is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, the specific form in which the device is distributed or integrated is not limited to the above embodiment. All or part of the device may be functionally or physically distributed or integrated in any unit. New embodiments arising from any combination of the embodiments are also included among the embodiments of the present invention. Such a new embodiment retains the effects of the original embodiments it combines.

1 ... Image recognition device
2 ... Storage unit
3 ... Control unit
30 ... Image acquisition unit
31 ... Model acquisition unit
32 ... Group recognition unit
33 ... Filter selection unit
34 ... Individual recognition unit
35 ... Resource acquisition unit
36 ... Weight index calculation unit
37 ... Priority setting unit
38 ... Similarity calculation unit
39 ... Learning unit
390 ... Pre-stage learning unit
391 ... Post-stage learning unit
N ... Convolutional neural network

Claims

An image recognition device comprising:
a model acquisition unit that acquires a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is;
a group recognition unit that identifies a subject group by applying to the image data a pre-stage filter group, namely those of the plurality of filters used to recognize which of a plurality of predetermined subject groups the subject included in the image data belongs to;
an individual recognition unit that identifies the recognition target included in the image data based on the result of applying the pre-stage filter group and the result of applying a post-stage filter group, which is the group of the plurality of filters excluding the pre-stage filter group; and
a filter selection unit that selects, from the post-stage filter group, the filters to be applied by the individual recognition unit, based at least on the subject group identified by the group recognition unit and a priority set for each filter constituting the post-stage filter group.

The image recognition device according to claim 1, wherein the filter selection unit selects the filters to be applied by the individual recognition unit in descending order of the priority set for each filter constituting the post-stage filter group, within the range allowed by the computational resources of the image recognition device.
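The resource-bounded selection of claim 2 can be sketched as a greedy pass in descending priority that stops once the next filter would exceed a budget. The reference list names a resource acquisition unit 35. Here the available resource is reduced to a single hypothetical integer `budget`, and each filter has a hypothetical integer `cost`. The sketch stops at the first filter that does not fit. This preserves strict priority order: a lower-priority filter is never taken ahead of a higher-priority one that was skipped. Skipping and continuing would be another reasonable reading of the claim.

```python
from typing import Dict, List


def select_within_budget(priority: Dict[int, float], cost: Dict[int, int],
                         budget: int) -> List[int]:
    """Pick filters in descending priority while their cumulative cost fits the budget."""
    chosen: List[int] = []
    used = 0
    for fid in sorted(priority, key=lambda f: (-priority[f], f)):
        if used + cost[fid] > budget:
            break  # strict priority order: stop at the first filter that does not fit
        chosen.append(fid)
        used += cost[fid]
    return chosen


priority = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.7}
cost = {0: 10, 1: 10, 2: 10, 3: 10}
```

With a budget of 25, the order is 1, 3, 2, 0, and the result is `[1, 3]`: adding filter 2 would bring the total to 30.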

The image recognition device according to claim 1 or 2, further comprising:
a weight index calculation unit that calculates, for each of the plurality of filters, an index indicating the magnitude of its weight coefficients; and
a priority setting unit that sets a higher priority for a filter whose index indicates large weight coefficients than for a filter whose index indicates small weight coefficients.
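Claim 3 does not fix the weight index. A common choice in filter-pruning work is the L1 norm (sum of absolute weights), and the sketch below assumes it. The index value is used directly as the priority, so filters with equal magnitude receive equal priority. Kernels are single-channel 2-D lists, and the function names are hypothetical.

```python
from typing import Dict, Sequence

Kernel = Sequence[Sequence[float]]


def l1_index(kernel: Kernel) -> float:
    """Sum of absolute weights: one possible 'magnitude of weight coefficients' index."""
    return sum(abs(w) for row in kernel for w in row)


def priorities_from_weights(filters: Dict[int, Kernel]) -> Dict[int, float]:
    """Larger index -> higher priority (the index itself serves as the priority value)."""
    return {fid: l1_index(kernel) for fid, kernel in filters.items()}


filters = {
    0: [[0.1, -0.1], [0.0, 0.0]],   # small weights -> low priority
    1: [[1.0, -1.0], [0.5, 0.0]],   # large weights -> high priority
    2: [[0.0, 0.0], [0.0, 0.3]],
}
```

Ranking by `priorities_from_weights(filters)` in descending order gives filters 1, 2, 0.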

The image recognition device according to any one of claims 1 to 3, further comprising:
a similarity calculation unit that calculates, for each of the plurality of filters, its similarity to the other filters; and
a priority setting unit that sets a higher priority for a filter that has no similar filter than for a filter that has another similar filter.
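For claim 4, the sketch assumes cosine similarity between flattened kernels as the similarity measure. It also assumes a hypothetical threshold of 0.95 for calling two filters "similar". A filter with no similar partner gets priority 1 and a redundant one gets priority 0. None of these choices is specified in the text.

```python
import math
from typing import Dict, List, Sequence

Kernel = Sequence[Sequence[float]]


def flatten(kernel: Kernel) -> List[float]:
    return [w for row in kernel for w in row]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def priorities_from_similarity(filters: Dict[int, Kernel], threshold: float = 0.95) -> Dict[int, int]:
    """Priority 1 for filters with no similar filter, 0 for filters that have one."""
    flat = {fid: flatten(k) for fid, k in filters.items()}
    prio = {}
    for fid, vec in flat.items():
        has_similar = any(cosine(vec, flat[other]) >= threshold for other in flat if other != fid)
        prio[fid] = 0 if has_similar else 1
    return prio


filters = {
    0: [[1.0, 0.0], [0.0, 1.0]],
    1: [[2.0, 0.0], [0.0, 2.1]],   # nearly parallel to filter 0
    2: [[0.0, 1.0], [-1.0, 0.0]],  # orthogonal to both
}
```

Filters 0 and 1 point in almost the same direction (cosine about 0.9997), so both get priority 0. Filter 2 is unique and gets priority 1.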

The image recognition device according to claim 3 or 4, wherein the priority setting unit varies, for each of the plurality of subject groups, the priority set for each filter constituting the post-stage filter group.

The image recognition device according to any one of claims 1 to 5, wherein, when the same priority is set for two or more filters constituting the post-stage filter group, the filter selection unit randomly selects a filter from among the filters having that priority.
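The random tie-break of claim 6 can be implemented by shuffling the filter ids and then applying Python's stable sort on priority alone. Filters with different priorities end up in priority order. Filters with equal priority keep their shuffled, and therefore random, relative order. The function name and the optional seeded `random.Random` argument are illustrative choices, not from the source.

```python
import random
from typing import Dict, List, Optional


def select_with_random_ties(priority: Dict[int, float], k: int,
                            rng: Optional[random.Random] = None) -> List[int]:
    """Top-k filters by priority; among equal priorities the choice is random."""
    rng = rng if rng is not None else random.Random()
    ids = list(priority)
    rng.shuffle(ids)                          # random order first...
    ids.sort(key=lambda f: -priority[f])      # ...stable sort keeps that order within ties
    return ids[:k]


priority = {0: 0.9, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.1}
```

With `k=2`, filter 0 always comes first. The second slot is filled by whichever of the tied filters 1, 2 and 3 the shuffle placed first.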

The image recognition device according to any one of claims 1 to 6, further comprising a learning unit that generates the machine learning model by machine learning using a neural network, based on training data that associates a plurality of pieces of image data, the subject included in each piece of image data, and the subject group of that subject, wherein
the learning unit comprises:
a pre-stage learning unit that generates the pre-stage filter group; and
a post-stage learning unit that, after the pre-stage learning unit has generated the pre-stage filter group, generates the post-stage filter group using the results of applying the pre-stage filter group, without back-propagating errors into the pre-stage filter group.
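The two-stage training of claim 7 can be reduced to a toy with one scalar weight per stage and squared-error SGD. This is a deliberate simplification; the real stages are convolutional filter groups trained on group and subject labels. Stage 1 fits the pre-stage weight `a` to a group target. Stage 2 fits the post-stage weight `b` on the pre-stage output `a * x`. In stage 2, `a` is treated as a constant, so no gradient reaches it. In deep learning frameworks the same effect is usually obtained by freezing the pre-stage parameters or stopping the gradient, for example with `Tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX. All names and data here are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (input x, group target g, subject target y)


def train_pre_stage(data: List[Sample], lr: float = 0.01, epochs: int = 200) -> float:
    """Stage 1: fit pre-stage weight a so that a*x predicts the group target g."""
    a = 0.0
    for _ in range(epochs):
        for x, g, _y in data:
            err = a * x - g
            a -= lr * 2.0 * err * x        # gradient of (a*x - g)^2 w.r.t. a
    return a


def train_post_stage(data: List[Sample], a: float, lr: float = 0.01, epochs: int = 200) -> float:
    """Stage 2: fit post-stage weight b on h = a*x; a is frozen (no back-propagation)."""
    b = 0.0
    for _ in range(epochs):
        for x, _g, y in data:
            h = a * x                      # pre-stage output, used but never updated
            err = b * h - y
            b -= lr * 2.0 * err * h        # gradient w.r.t. b only
    return b


# Toy data with g = 2x and y = 6x, so the stages should converge to a = 2 and b = 3.
data = [(1.0, 2.0, 6.0), (0.5, 1.0, 3.0), (-1.0, -2.0, -6.0)]
```

Calling `a = train_pre_stage(data)` and then `b = train_post_stage(data, a)` gives values close to 2 and 3. `a` is identical before and after stage 2 because stage 2 only reads it.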

An image recognition method in which a processor executes:
a step of acquiring a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is;
a step of identifying a subject group by applying to the image data a pre-stage filter group, namely those of the plurality of filters used to recognize which of a plurality of predetermined subject groups the subject included in the image data belongs to;
a step of selecting one or more filters from the filters constituting a post-stage filter group, which is the group of the plurality of filters excluding the pre-stage filter group, based at least on the identified subject group and a priority set for each filter constituting the post-stage filter group; and
a step of identifying the recognition target included in the image data based on the result of applying the pre-stage filter group and the result of applying the selected filters.

A program that causes a computer to implement:
a function of acquiring a machine learning model which includes a plurality of filters as model parameters and outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is;
a function of identifying a subject group by applying to the image data a pre-stage filter group, namely those of the plurality of filters used to recognize which of a plurality of predetermined subject groups the subject included in the image data belongs to;
a function of selecting one or more filters from the filters constituting a post-stage filter group, which is the group of the plurality of filters excluding the pre-stage filter group, based at least on the identified subject group and a priority set for each filter constituting the post-stage filter group; and
a function of identifying the recognition target included in the image data based on the result of applying the pre-stage filter group and the result of applying the selected filters.