JP2020107185A - Image recognition device, image recognition method and program

Info

Publication number: JP2020107185A
Application number: JP2018246993A
Authority: JP
Inventors: 建鋒徐; Kenho Jo; 和之田坂; Kazuyuki Tasaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-09
Anticipated expiration: 2038-12-28
Also published as: JP6991960B2

Abstract

To reduce the processing load of image recognition processing that uses a machine learning model of a neural network. SOLUTION: A model acquisition unit 31 acquires a machine learning model that recognizes a subject included in image data to be processed. A group recognition unit 32 identifies a subject group by applying to the image data a pre-stage filter group, which is the group of filters, among the plurality of filters included in the machine learning model, used to recognize which subject group the subject included in the image data belongs to. An individual recognition unit 34 identifies a recognition target included in the image data based on an application result of the pre-stage filter group and an application result of a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters. A filter selection unit 33 selects, from the post-stage filter group, the filters that the individual recognition unit 34 is to apply, based on the subject group identified by the group recognition unit 32 and the priority set for each filter constituting the post-stage filter group. SELECTED DRAWING: Figure 6

Description

The present invention relates to an image recognition device, an image recognition method, and a program, and more particularly to a technique for reducing the load of image recognition processing.

In recent years, techniques that recognize the class of an object in an image using deep learning, a kind of neural network, have been put into practical use. Among such techniques, neural network structures with more layers have been proposed in order to improve recognition accuracy. In a neural network, each additional layer makes it possible to extract higher-level and more complex features. Making the network deeper therefore plays an important role in improving the recognition accuracy of machine learning models that use neural networks.

On the other hand, the deeper the neural network, the greater the amount of computation at recognition time, and the computing power required to execute the recognition processing tends to increase. It can therefore be difficult to run a machine learning model with a deep neural network on a device with relatively small computing resources, such as an IoT (Internet of Things) device.

To address this problem, Non-Patent Document 1, for example, proposes a technique in which output layers are attached to a plurality of the layers constituting a machine learning model, and image recognition is performed by choosing an output layer according to fluctuations in the available computing resources.

Non-Patent Document 1: Gao Huang, et al., Multi-Scale Dense Convolutional Networks for Resource Efficient Image Classification, International Conference on Learning Representations (ICLR) 2018.

In the technique of Non-Patent Document 1, the shallower the layer (that is, the closer to the input layer) at whose output layer the recognition computation ends, the lower the recognition accuracy. In addition, because the neural network of Non-Patent Document 1 has a layer structure that extends in two dimensions, the machine learning model becomes large and training can become difficult. There is thus room for improvement in reducing the processing load of image recognition processing that uses a machine learning model of a neural network.

The present invention has been made in view of these points, and its object is to provide a technique for reducing the processing load of image recognition processing that uses a machine learning model of a neural network.

A first aspect of the present invention is an image recognition device. The device includes: a model acquisition unit that acquires a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; a group recognition unit that identifies a subject group by applying to the image data a pre-stage filter group, which is the group of filters, among the plurality of filters, for recognizing which of a plurality of predetermined subject groups the subject included in the image data belongs to; an individual recognition unit that identifies the recognition target included in the image data based on an application result of the pre-stage filter group and an application result of a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters; and a filter selection unit that selects, from the post-stage filter group, the filters that the individual recognition unit is to apply, based at least on the subject group identified by the group recognition unit and the priority set for each filter constituting the post-stage filter group.

The filter selection unit may select the filters that the individual recognition unit is to apply in descending order of the priority set for each filter constituting the post-stage filter group, within the range permitted by the computing resources of the image recognition device.
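The priority-ordered selection described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `select_filters`, the uniform `cost_per_filter`, and the scalar `budget` are assumptions introduced here, as is the convention that a larger number means a higher priority.

```python
from typing import Dict, List


def select_filters(priorities: Dict[str, int],
                   cost_per_filter: float,
                   budget: float) -> List[str]:
    """Pick post-stage filters in descending priority until the budget is used up.

    Assumptions (not from the source): every filter costs the same amount,
    cost_per_filter is positive, and a larger number is a higher priority.
    """
    ranked = sorted(priorities, key=lambda name: priorities[name], reverse=True)
    max_count = int(budget // cost_per_filter)  # how many filters fit in the budget
    return ranked[:max_count]


# Hypothetical priorities for four post-stage filters.
example_priorities = {"r1": 3, "r2": 9, "r3": 5, "r4": 1}
chosen = select_filters(example_priorities, cost_per_filter=1.0, budget=2.5)
print(chosen)
```

With a budget that fits two filters, only the two highest-priority filters are applied; a larger budget simply extends the same ranked list.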

The image recognition device may further include: a weight index calculation unit that calculates, for each of the plurality of filters, an index indicating the magnitude of its weight coefficients; and a priority setting unit that sets a higher priority for a filter whose index indicates large weight coefficients than for a filter whose index indicates small weight coefficients.
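One way to read this is sketched below. The text only says "an index indicating the magnitude of the weight coefficients"; using the L1 norm (sum of absolute values) as that index is an assumption made for illustration, and the names `l1_norm` and `priorities_from_weights` are hypothetical.

```python
from typing import Dict, List


def l1_norm(weights: List[float]) -> float:
    """Sum of absolute weight values; assumed here as the magnitude index."""
    return sum(abs(w) for w in weights)


def priorities_from_weights(filters: Dict[str, List[float]]) -> Dict[str, int]:
    """Rank filters by L1 norm; the largest norm gets the highest priority value."""
    ranked = sorted(filters, key=lambda name: l1_norm(filters[name]))
    return {name: rank + 1 for rank, name in enumerate(ranked)}


# Hypothetical flattened weight coefficients of three filters.
example_filters = {
    "r1": [0.1, -0.2, 0.05],
    "r2": [1.5, -0.7, 0.9],
    "r3": [0.4, 0.4, -0.3],
}
weight_priorities = priorities_from_weights(example_filters)
print(weight_priorities)
```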

The image recognition device may further include: a similarity calculation unit that calculates, for each of the plurality of filters, its similarity to the other filters; and a priority setting unit that sets a higher priority for a filter that has no similar filter than for a filter that has a similar filter.
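A minimal sketch of a similarity-based priority follows. The source does not name a similarity measure or a threshold; cosine similarity, the threshold of 0.9, and the two-level priority (2 for filters with no similar peer, 1 otherwise) are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two flattened filters (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def priorities_from_similarity(filters: Dict[str, List[float]],
                               threshold: float = 0.9) -> Dict[str, int]:
    """Give priority 2 to filters with no similar peer and priority 1 otherwise."""
    result = {}
    for name, weights in filters.items():
        has_peer = any(
            cosine(weights, other) >= threshold
            for other_name, other in filters.items()
            if other_name != name
        )
        result[name] = 1 if has_peer else 2
    return result


# Hypothetical filters: r1 and r2 are nearly parallel, r3 is unlike both.
similarity_example = {"r1": [1.0, 0.0], "r2": [0.99, 0.05], "r3": [0.0, 1.0]}
similarity_priorities = priorities_from_similarity(similarity_example)
print(similarity_priorities)
```

The intuition is that dropping one of two near-duplicate filters loses little information, whereas dropping a filter that has no counterpart loses a feature nothing else captures.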

The priority setting unit may vary the priority set for each filter constituting the post-stage filter group from one subject group to another.

When the same priority is set for two or more filters constituting the post-stage filter group, the filter selection unit may select filters at random from among the filters with that same priority.
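Random tie-breaking can be sketched by shuffling first and then sorting with a stable sort, so that filters with equal priority keep their shuffled (random) relative order. The function name `select_with_random_ties` and the `count` parameter are hypothetical; the source does not describe how the random choice is implemented.

```python
import random
from typing import Dict, List, Optional


def select_with_random_ties(priorities: Dict[str, int],
                            count: int,
                            rng: Optional[random.Random] = None) -> List[str]:
    """Select `count` filters by descending priority, breaking ties at random."""
    rng = rng or random.Random()
    names = list(priorities)
    rng.shuffle(names)  # random order among all filters
    # Python's sort is stable, so equal-priority filters stay in shuffled order.
    names.sort(key=lambda name: priorities[name], reverse=True)
    return names[:count]


# Hypothetical priorities: r2, r3, and r4 are tied.
tied_priorities = {"r1": 5, "r2": 2, "r3": 2, "r4": 2}
picked = select_with_random_ties(tied_priorities, count=2)
print(picked)
```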

The image recognition device may further include a learning unit that generates the machine learning model by machine learning using a neural network, based on learning data in which a plurality of pieces of image data, the subject included in each piece of image data, and the subject group of that subject are associated with one another. The learning unit may include: a pre-stage learning unit that generates the pre-stage filter group; and a post-stage learning unit that, after the pre-stage learning unit has generated the pre-stage filter group, generates the post-stage filter group using the application result of the pre-stage filter group without performing error backpropagation into the pre-stage filter group.

A second aspect of the present invention is an image recognition method. In this method, a processor executes the steps of: acquiring a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; identifying a subject group by applying to the image data a pre-stage filter group, which is the group of filters, among the plurality of filters, for recognizing which of a plurality of predetermined subject groups the subject included in the image data belongs to; selecting one or more filters from among the filters constituting a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters, based at least on the identified subject group and the priority set for each filter constituting the post-stage filter group; and identifying the recognition target included in the image data based on an application result of the pre-stage filter group and an application result of the selected filters.

A third aspect of the present invention is a program. The program causes a computer to implement the functions of: acquiring a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed is; identifying a subject group by applying to the image data a pre-stage filter group, which is the group of filters, among the plurality of filters, for recognizing which of a plurality of predetermined subject groups the subject included in the image data belongs to; selecting one or more filters from among the filters constituting a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters, based at least on the identified subject group and the priority set for each filter constituting the post-stage filter group; and identifying the recognition target included in the image data based on an application result of the pre-stage filter group and an application result of the selected filters.

According to the present invention, the processing load of image recognition processing that uses a machine learning model of a neural network can be reduced.

FIG. 1 is a diagram schematically showing the general functional configuration of a convolutional neural network.
FIG. 2 is a diagram schematically showing the layer structure of the neural network machine learning model known as AlexNet.
FIG. 3 is a diagram schematically showing the layer structure of the convolutional neural network used by the image recognition device according to the embodiment.
FIG. 4 is a diagram for explaining the learning process of the neural network according to the embodiment.
FIG. 5 is a diagram for explaining the recognition process of the neural network according to the embodiment.
FIG. 6 is a diagram schematically showing the functional configuration of the image recognition device according to the embodiment.
FIG. 7 is a diagram schematically showing, in tabular form, the filter priorities for each subject group.
FIG. 8 is a diagram schematically showing the functional configuration of the learning unit according to the embodiment.
FIG. 9 is a flowchart for explaining the flow of the image recognition processing executed by the image recognition device according to the embodiment.

<Convolutional neural network>
The image recognition device according to the embodiment is a device for executing image recognition processing using a machine learning model of a neural network. As its main example, the device uses a machine learning model of a convolutional neural network (CNN). As background for the device according to the embodiment, the convolutional neural network is therefore briefly described first.

FIG. 1 is a diagram schematically showing the general functional configuration of a convolutional neural network. Neural networks with various configurations have been proposed, but their basic configuration is the same: it is expressed as a stack of several kinds of layers (or as a graph structure). A neural network learns its model parameters so that the output for the input data takes an appropriate value. In other words, it learns the model parameters so as to minimize a loss function defined such that the output for the input data takes an appropriate value.

FIG. 1 shows a machine learning model trained to output the type of subject included in an input image I. In the example shown in FIG. 1, the input image I supplied to the input layer Li is processed by the first convolution layer C1 and then the second convolution layer C2, and passes through the pooling layer P, the first fully connected layer F1, and the second fully connected layer F2 to reach the output layer Lo. The output layer Lo outputs an identification label B indicating the type of subject included in the input image I.

For example, when the machine learning model shown in FIG. 1 is a model for recognizing a plurality of animals such as dogs, cats, and monkeys, an identification label B that specifies the animal is assigned in advance to each animal to be identified. When the input image I is supplied to the input layer Li of this model, the output layer Lo outputs the identification label B indicating which of the plurality of predetermined recognition targets the subject is. The identification label B is, for example, a bit string uniquely assigned to each of the recognition targets.

In a neural network, the output of a layer becomes the input of the adjacent layer that follows it. Each convolution layer in a convolutional neural network applies filters to the signal received from the preceding layer, and the output of the filters is the output of that layer.

FIG. 2 is a diagram schematically showing the layer structure of the neural network machine learning model known as AlexNet. As shown in FIG. 2, AlexNet includes an input layer Li, five convolution layers (the first convolution layer C1, the second convolution layer C2, the third convolution layer C3, the fourth convolution layer C4, and the fifth convolution layer C5), two fully connected layers (the first fully connected layer F1 and the second fully connected layer F2), and an output layer Lo, and its final layer is configured to output one of 1000 kinds of identification labels B. That is, AlexNet is a convolutional neural network for recognizing 1000 kinds of recognition targets.

Although not illustrated, network structures with even deeper layers have been proposed to improve recognition accuracy. For example, the neural network machine learning model known as ResNet has a layer structure of 152 layers. Because a neural network can extract higher-level and more complex features with each additional layer, making the network deeper is considered to play an important role in improving recognition accuracy.

<Neural network according to the embodiment>
FIG. 3 is a diagram schematically showing the layer structure of the convolutional neural network N used by the image recognition device according to the embodiment. The neural network N according to the embodiment is described below with reference to FIG. 3.

Like a conventional convolutional neural network, the machine learning model of the convolutional neural network N used by the image recognition device according to the embodiment has a plurality of convolution layers and outputs which of a plurality of recognition targets the subject included in the input image I is. The machine learning model of the convolutional neural network N used by the image recognition device 1 according to the embodiment therefore includes a plurality of filters as model parameters.

In a convolutional neural network, higher-level and more complex features can be extracted with each additional layer, so a deeper network can recognize even similar recognition targets that are hard to tell apart. Conversely, if the recognition targets differ markedly from one another, they can be recognized even with few layers. This suggests that the earlier part of a convolutional neural network recognizes the broad features of the recognition targets, and that the later stages recognize increasingly detailed features specific to each target. The same effect can be obtained by changing the number of filters in each layer. That is, increasing the number of filters in each layer makes it possible to recognize even similar recognition targets that are hard to tell apart. Conversely, if the recognition targets differ markedly from one another, they can be recognized even when each layer has few filters.

Accordingly, the filters constituting a machine learning model are thought to include both filters that contribute to the recognition of every recognition target and filters that play an important role in recognizing one recognition target while contributing little to the recognition of another. Examples of the former are filters that recognize features shared by many recognition targets; examples of the latter are filters that distinguish between particular recognition targets with similar features.

For example, suppose the recognition targets of a machine learning model include mammals such as cats, dogs, monkeys, and cows; plants such as mandarin oranges, carrots, radishes, and cabbages; and man-made objects such as cars and buildings. In this case, a filter that responds strongly to the "eyes" of mammals, for example, can be regarded as a filter that captures the broad feature of whether or not the recognition target is a mammal, and hence as a filter that contributes to the recognition of all recognition targets.

In contrast, a filter used to distinguish a "dog" from a "cat", for example, is thought to contribute greatly to recognition when the recognition target is a dog or a cat, but to contribute little when the target is something else, such as a "cabbage" or a "car". That is, there are thought to be filters that play an important role in recognizing one recognition target while contributing little to the recognition of another.

As shown in FIG. 3, the machine learning model of the convolutional neural network N used by the image recognition device according to the embodiment has two filter groups: a pre-stage filter group and a post-stage filter group. In the example shown in FIG. 3, the pre-stage filter group consists of the filters included in five convolution layers (the first pre-stage convolution layer C_f1, the second pre-stage convolution layer C_f2, the third pre-stage convolution layer C_f3, the fourth pre-stage convolution layer C_f4, and the fifth pre-stage convolution layer C_f5) and two fully connected layers (the first fully connected layer F1 and the second fully connected layer F2). The post-stage filter group consists of the filters included in five convolution layers (the first post-stage convolution layer C_r1, the second post-stage convolution layer C_r2, the third post-stage convolution layer C_r3, the fourth post-stage convolution layer C_r4, and the fifth post-stage convolution layer C_r5).

For example, when the present embodiment is applied to the AlexNet shown in FIG. 2, the filters of each layer are divided equally between the pre-stage filter group and the post-stage filter group. Specifically, as a neural network structure, the first convolution layer of each of the pre-stage and post-stage filter groups has (55×55) nodes × 48 filters. Likewise, the second through fifth convolution layers have (27×27) nodes × 128 filters, (13×13) nodes × 192 filters, (13×13) nodes × 192 filters, and (13×13) nodes × 128 filters, respectively. The first fully connected layer F1, the second fully connected layer F2, the third fully connected layer F3, and the fourth fully connected layer F4 each have 2048 nodes.
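The per-group sizes above follow from halving the layer widths of the standard AlexNet. The sketch below reproduces that arithmetic; the full-model filter counts (96, 256, 384, 384, 256) and the 4096-node fully connected width are the commonly cited AlexNet figures and are not stated in this document, and the helper name `split_equally` is hypothetical.

```python
from typing import List, Tuple

# Commonly cited AlexNet widths (assumed here; not given in the text above).
ALEXNET_CONV_FILTERS: List[int] = [96, 256, 384, 384, 256]
ALEXNET_FC_NODES: int = 4096


def split_equally(total: int) -> Tuple[int, int]:
    """Split a layer's filters into (pre-stage, post-stage) halves."""
    pre = total // 2
    return pre, total - pre


conv_split = [split_equally(n) for n in ALEXNET_CONV_FILTERS]
fc_split = split_equally(ALEXNET_FC_NODES)

for layer, (pre, post) in enumerate(conv_split, start=1):
    print(f"conv{layer}: pre-stage {pre} filters, post-stage {post} filters")
print(f"fully connected: {fc_split[0]} nodes per group")
```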

[Learning process]
First, the learning process of the neural network N according to the embodiment will be described.

FIGS. 4(a) to 4(c) are diagrams for explaining the learning process of the neural network N according to the embodiment. Specifically, FIGS. 4(a) and 4(b) show the first-stage learning and the second-stage learning of the pre-stage filter group, respectively, and FIG. 4(c) shows the learning of the post-stage filter group. Here, the pre-stage filter group is the filter group used to recognize which of a plurality of predetermined subject groups the subject included in the input image I belongs to. The post-stage filter group is the filter group obtained by excluding the filters of the pre-stage filter group from the filters constituting the machine learning model.

As shown in FIG. 4(a), the first-stage learning of the pre-stage filter group generates a first pre-stage filter group that, given learning data (image data for training) as input, outputs the identification label B of the subject associated with that image data. That is, the first-stage learning of the pre-stage filter group is the same as an ordinary machine learning procedure.

As shown in FIG. 4(b), the second-stage learning of the pre-stage filter group generates a machine learning model that, given learning data as input, outputs a group identification label indicating the subject group to which the image data belongs. Specifically, the first fully connected layer F1 and the second fully connected layer F2 included in the pre-stage filter group are fine-tuned to generate a second pre-stage filter group that recognizes the group to which the image data belongs.

In FIG. 4(b), the fine-tuned first and second fully connected layers are denoted as the first fully connected layer F1' and the second fully connected layer F2', respectively. The convolution layers of the first pre-stage filter group and those of the second pre-stage filter group are therefore identical. Because the first and second pre-stage filter groups are largely the same, they are simply called the pre-stage filter group except where the two need to be distinguished.

As shown in FIG. 4(c), the post-stage filter group is trained together with the machine learning model generated in the first-stage learning of the pre-stage filter group. That is, the post-stage filter group is trained so that the identification label B of the learning data is output using the output of the first pre-stage filter group and the output of the post-stage filter group. During this training, errors are not backpropagated into the pre-stage filter group; the post-stage filter group is trained using the application result of the pre-stage filter group.
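The idea of training the post-stage part while leaving the pre-stage part untouched can be illustrated with a deliberately tiny toy model. This is not a convolutional network and not the training procedure of the embodiment: it assumes a scalar model whose output is the sum of a frozen pre-stage term `a_frozen * x` and a trainable post-stage term `b * x`, trained by plain gradient descent on a squared error. All names and numbers are hypothetical.

```python
from typing import List, Tuple


def train_post_stage(data: List[Tuple[float, float]],
                     a_frozen: float,
                     b_init: float = 0.0,
                     lr: float = 0.05,
                     epochs: int = 200) -> float:
    """Fit only the post-stage parameter b; a_frozen never receives a gradient."""
    b = b_init
    for _ in range(epochs):
        for x, y in data:
            pre_out = a_frozen * x   # pre-stage output, used as a constant
            post_out = b * x         # post-stage output
            error = (pre_out + post_out) - y
            # Gradient of error**2 with respect to b only; a_frozen is not updated.
            b -= lr * 2.0 * error * x
    return b


# Toy data generated by y = 3x; the frozen pre-stage part already explains 2x.
toy_data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]
b_learned = train_post_stage(toy_data, a_frozen=2.0)
print(round(b_learned, 6))
```

The post-stage parameter learns only the residual that the frozen pre-stage part does not explain, which mirrors why the pre-stage output stays usable on its own for group recognition after the post-stage filters have been trained.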

[Recognition process]
Next, the recognition process of the neural network N according to the embodiment will be described.

FIGS. 5(a) and 5(b) are diagrams for explaining the recognition process of the neural network N according to the embodiment. The recognition process consists of two stages: a group recognition stage that recognizes the group to which the subject included in the input image I belongs, and a subject recognition stage that recognizes the subject itself.

FIG. 5(a) is a diagram for explaining the group recognition of the input image I. Group recognition is performed using the second pre-stage filter group generated in the second-stage learning of the pre-stage filter group. Applying the second pre-stage filter group to the input image I recognizes the subject group to which the subject included in the input image I belongs.

The image recognition device according to the embodiment can be expected to recognize the recognition target with the highest accuracy when all the filters in the post-stage filter group are applied. However, a certain level of recognition accuracy can be maintained even when not all of them are applied. Therefore, when the computing resources of the image recognition device are small to begin with, or when they are large but other heavy processing temporarily leaves few resources for applying the machine learning model, the device can choose which post-stage filters to apply according to the available computing resources, balancing the feasibility of the computation against recognition accuracy.

Although details are described later, the image recognition device according to the embodiment selects the filters actually applied from among the filters of the post-stage filter group, based on the subject group to which the subject in the input image I belongs. Then, as shown in FIG. 5B, the output of the first pre-stage filter group and the output of the selected post-stage filters are combined to output the identification label B indicating the subject included in the input image I.

Here, if the additional computation required for group recognition of the input image I exceeded the computation saved by selecting among the post-stage filters, the selection would be pointless. In the neural network N according to the embodiment, however, the convolutional layers of the first pre-stage filter group and those of the second pre-stage filter group are shared. The convolutional-layer result of the second pre-stage filter group, computed to recognize the subject group to which the subject in the input image I belongs, can therefore be reused as the convolutional-layer result of the first pre-stage filter group. As a result, the additional computational cost of recognizing the subject group in the recognition process of the neural network N is substantially only the computation of the first fully connected layer F1' and the second fully connected layer F2'. The increase in computation required for group recognition of the input image I can thus be expected to be well below the reduction obtained by selecting among the post-stage filters.

In this way, the image recognition device according to the embodiment can reduce the processing load of recognition by selecting which filters to apply to the input image I.

<Functional configuration of the image recognition device 1 according to the embodiment>
FIG. 6 is a diagram schematically showing the functional configuration of the image recognition device 1 according to the embodiment. The image recognition device 1 includes a storage unit 2 and a control unit 3. In FIG. 6, arrows indicate the main data flows, and there may be data flows not shown in FIG. 6. Each functional block in FIG. 6 represents a functional unit, not a hardware (device) unit. The functional blocks shown in FIG. 6 may therefore be implemented in a single device or distributed across a plurality of devices. Data may be exchanged between the functional blocks by any means, such as a data bus, a network, or a portable storage medium.

The storage unit 2 comprises a ROM (Read Only Memory) that stores the BIOS (Basic Input Output System) and the like of the computer implementing the image recognition device 1, a RAM (Random Access Memory) serving as the work area of the image recognition device 1, and a mass storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) that stores the OS (Operating System), application programs, and various information referenced when the application programs are executed.

The control unit 3 is a processor of the image recognition device 1, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit). By executing the program stored in the storage unit 2, it functions as the image acquisition unit 30, the model acquisition unit 31, the group recognition unit 32, the filter selection unit 33, the individual recognition unit 34, the resource acquisition unit 35, the weight index calculation unit 36, the priority setting unit 37, the similarity calculation unit 38, and the learning unit 39.

Note that FIG. 6 shows an example in which the image recognition device 1 is configured as a single device. However, the image recognition device 1 may also be realized by computing resources such as a plurality of processors and memories, as in a cloud computing system. In that case, each unit of the control unit 3 is realized by at least one of the plurality of processors executing the program.

The image acquisition unit 30 acquires the input image I, which is the image data to be processed by the image recognition device 1. The model acquisition unit 31 acquires a machine learning model that includes a plurality of filters as model parameters. The machine learning model acquired by the model acquisition unit 31 according to the embodiment outputs information indicating which of a plurality of predetermined recognition targets the subject included in the input image I corresponds to.

The group recognition unit 32 identifies the subject group by applying to the input image I the pre-stage filter group (that is, the second pre-stage filter group described above), which is the subset of the filters in the machine learning model used to recognize which of a plurality of predetermined subject groups the subject in the input image I belongs to.

The filter selection unit 33 selects one or more filters from the post-stage filter group, which is the set of filters in the machine learning model excluding the pre-stage filter group. The filter selection unit 33 selects filters based at least on the subject group identified by the group recognition unit 32 and the priority set for each filter of the post-stage filter group. Details of the filter selection are described later.

The individual recognition unit 34 identifies the recognition target included in the input image I based on the application result of the first pre-stage filter group and the result of applying the filters selected by the filter selection unit 33 to the input image I. In this way, the image recognition device 1 selects, from the post-stage filter group of the machine learning model, the filters actually applied to the input image I. Compared with using every filter in the machine learning model, this reduces the processing load of image recognition.

The resource acquisition unit 35 acquires the computing resources of the image recognition device 1. The "computing resources" acquired here are the computing capacity that the image recognition device 1 can allocate to image recognition, such as the power of its processors (CPU, GPU, and the like) and the capacity of its main storage device. The computing resources differ depending on the type of the image recognition device 1, for example between a blade server and an IoT device. Even for the same image recognition device 1, the resources that can be allocated to image recognition may change depending on how much is allocated to other processes running in parallel. The resource acquisition unit 35 acquires the computing resources available when the image recognition device 1 performs image recognition using the machine learning model.

Within the range permitted by the computing resources of the image recognition device 1, the filter selection unit 33 selects the filters to be applied by the individual recognition unit 34 in descending order of the priority set for each filter of the post-stage filter group. In general, the more filters are applied in image recognition, the higher the recognition accuracy that can be expected. By selecting post-stage filters according to its computing resources, the image recognition device 1 can perform the image recognition that is expected to give the highest accuracy among those it is able to execute.
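The priority-ordered selection under a resource limit can be sketched as follows. This is only an illustration: the per-filter `costs`, the scalar `budget`, and the convention that a larger number means a higher priority are assumptions made for the example, not details given in the description.

```python
from typing import List


def select_filters(priorities: List[int], costs: List[float], budget: float) -> List[int]:
    """Return indices of post-stage filters chosen in descending priority order.

    priorities[i]: larger means higher priority (assumed convention).
    costs[i]: hypothetical compute cost of applying filter i.
    budget: hypothetical compute available for the post-stage filters.
    """
    # Visit filters from highest to lowest priority (stable for ties).
    order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    selected: List[int] = []
    used = 0.0
    for i in order:
        if used + costs[i] > budget:
            break  # stop at the first filter that no longer fits
        selected.append(i)
        used += costs[i]
    return selected
```

With three filters of equal cost and a budget that fits two of them, the two highest-priority filters are returned.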

[Filter selection processing]
Next, the filter selection processing in the image recognition device 1 according to the embodiment will be described.

As described above, in a neural network the output of one layer becomes the input of the adjacent following layer. Each convolutional layer in a convolutional neural network applies its filters to the signal received from the preceding layer, and the filter outputs become the output of that layer. Therefore, the larger the absolute values of the weight coefficients of a filter in a convolutional layer, the larger the absolute value of the input signal to the next layer can be.

In other words, the magnitude of the weight coefficients of a filter in a convolutional layer can serve as an indicator of the activity of the corresponding unit in the next layer. In a neural network, units with high activity are said to contribute more to recognition ability than units with low activity.

The weight index calculation unit 36 therefore calculates, for each of the filters in the post-stage filter group, an index indicating the magnitude of its weight coefficients. This index is, for example, the sum of the absolute values of the filter's weight coefficients divided by the number of weight coefficients. Alternatively, it may be the sum of the squares of the weight coefficients divided by the number of weight coefficients. In either case, a larger index indicates that the filter contains larger weight coefficients.
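The two example indices can be written as short functions. This is a minimal sketch that assumes each filter is given as a flat sequence of its weight coefficients; the function names are illustrative only.

```python
from typing import Sequence


def mean_abs_index(weights: Sequence[float]) -> float:
    """Sum of absolute weight values divided by the number of weights."""
    return sum(abs(w) for w in weights) / len(weights)


def mean_square_index(weights: Sequence[float]) -> float:
    """Sum of squared weight values divided by the number of weights."""
    return sum(w * w for w in weights) / len(weights)
```

Either index grows with the magnitude of the coefficients; the squared variant gives more weight to a few large coefficients.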

The priority setting unit 37 assigns a higher priority to a filter whose index, as calculated by the weight index calculation unit 36, indicates large weight coefficients than to a filter whose index indicates small weight coefficients. This allows the image recognition device 1 to preferentially select the filters considered to contribute most to the recognition ability of the machine learning model.

The number of priority levels that the priority setting unit 37 assigns to the filters of a post-stage filter group is arbitrary, up to the number of filters in that group. For example, if the number of priority levels equals the number of filters, the filters in the convolutional layer can be fully ranked by priority.

Alternatively, when there are two or more filters, the priority may have just two levels, "high" and "low". In this case, the priority setting unit 37 sets a predetermined threshold A and assigns priority "high" to a filter whose weight index exceeds the threshold A and priority "low" to a filter whose weight index is less than the threshold A.
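The two-level assignment amounts to a single comparison, sketched below. The text does not say what happens when the index equals threshold A exactly; treating that case as "low" is an assumption of this sketch.

```python
def two_level_priority(index: float, threshold_a: float) -> str:
    """Map a weight index to "high" or "low" priority using threshold A."""
    # An index exactly equal to the threshold is treated as "low" here
    # (assumption; the text covers only "exceeds" and "less than").
    return "high" if index > threshold_a else "low"
```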

The priority setting unit 37 may also change the priority set for each filter based on the similarity between the filters in the post-stage filter group. In general, the closer the weight coefficients of two filters are, the more similar the features they extract. When two filters are similar, therefore, the final recognition accuracy is expected to change little if only one of them is used. Conversely, a filter that has no similar counterpart may extract features that no other filter can.

The similarity calculation unit 38 therefore calculates, for each filter in the post-stage filter group, its similarity to the other filters. The priority setting unit 37 assigns a higher priority to a filter that has no similar counterpart than to a filter that does. This allows the image recognition device 1 to select filters that extract different features, within the range permitted by its computing resources.

Here, the similarity calculation unit 38 may calculate the "distance" between filters as the measure of their similarity. This "distance" may be any quantity that satisfies the axioms of a metric; one example is the Euclidean distance between the filters. Specifically, the similarity calculation unit 38 calculates the similarity measure D(i,j) between the i-th filter and the j-th filter using equation (1).
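Equation (1) itself does not survive in this text. A form consistent with the accompanying description (the Euclidean distance between two filters, normalized by the number of filter elements) would be the following. The filter dimensions M, N, and F, and the placement of the normalization outside the square root, are assumptions rather than the published formula.

```latex
D(i,j) = \frac{1}{M N F}
  \sqrt{\sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{f=1}^{F}
        \bigl( I(m,n,f) - J(m,n,f) \bigr)^{2}}
\tag{1}
```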

Here, I(m,n,f) is the element of the three-dimensional i-th filter at vertical position m, horizontal position n, and height f, and J(m,n,f) is the corresponding element of the j-th filter. Equation (1) is the Euclidean distance between the two filters normalized by the number of filter elements (the number of weight coefficients). The smaller the dissimilarity D between two filters, the more similar they are. The similarity calculation unit 38 may instead calculate the similarity between filters using, for example, cosine similarity.

The similarity calculation unit 38 may calculate S(i) = ΣD(i,j) (j = 1, ..., Nf, where Nf is the number of filters) as the similarity score S(i) of the i-th filter, and a high priority may be assigned to a filter for which this value is large (that is, a filter with no similar counterpart).
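The score S(i) can be sketched in a few lines. The sketch assumes each filter is flattened to a list of weights of equal length, and that the Euclidean distance is divided by the element count outside the square root; the exact normalization is an assumption, since equation (1) is not reproduced in this text.

```python
import math
from typing import List, Sequence


def filter_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened filters, divided by element count."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)


def dissimilarity_scores(filters: List[Sequence[float]]) -> List[float]:
    """S(i) = sum over j of D(i, j); a large S(i) means filter i is unlike the rest."""
    return [sum(filter_distance(fi, fj) for fj in filters) for fi in filters]
```

In a set where two filters are identical and a third differs, the third receives the largest score and would therefore be given the higher priority.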

As with the weight index, the priority setting unit 37 may use two levels of similarity, "similar" and "dissimilar". In this case, the priority setting unit 37 sets a predetermined threshold B and treats a pair of filters as "similar" when their similarity exceeds the threshold B and as "dissimilar" when it is less than the threshold B. When the priority derived from the weight index has two levels, the priority setting unit 37 may also compute the similarity between filters only for the filters whose priority is "low". In this case, the priority setting unit 37 may raise to "high" the priority of any filter that is dissimilar to every other filter.

As described above, the filters of a machine learning model are thought to include filters that play an important role in recognizing one recognition target while contributing little to the recognition of another. The importance of a filter can therefore vary with the type of subject to be recognized.

The priority setting unit 37 therefore varies the priority set for each filter of the post-stage filter group for each of the plurality of predetermined subject groups. The setting of per-group filter priorities by the priority setting unit 37 is described concretely below.

Suppose there are P subject groups (P is an integer of 2 or more) and Q filters in the post-stage filter group (Q is an integer of 2 or more). Let the q-th filter be f_q (1 ≤ q ≤ Q), and let W_pq be the order of importance of the filter f_q in the p-th group (1 ≤ p ≤ P).

Step 1: The priority setting unit 37 sets p to 1.
Step 2: The priority setting unit 37 acquires the test data T_p of the p-th subject group.
Step 3: The priority setting unit 37 applies the pre-stage filter group and all filters of the post-stage filter group to the test data T_p and calculates the recognition rate R_p.
Step 4: The priority setting unit 37 sets q to 1.
Step 5: The priority setting unit 37 calculates the recognition rate R_pq of the test data T_p when the filter f_q is excluded from the filters of the post-stage filter group.
Step 6: The priority setting unit 37 calculates the decrease in recognition rate C_q, obtained by subtracting the recognition rate R_pq from the recognition rate R_p. The decrease C_q is the drop in recognition rate caused by excluding the filter f_q; that is, it indicates the contribution of the filter f_q to the recognition rate.
Step 7: The priority setting unit 37 updates the value of q to q+1.
Step 8: The priority setting unit 37 repeats Steps 5 to 7 until q exceeds Q.
Step 9: The priority setting unit 37 sorts the Q decreases C_q in descending order. The resulting order of the subscripts q is the order of importance W_pq of the filters f_q in the p-th subject group.
Step 10: The priority setting unit 37 updates the value of p to p+1.
Step 11: The priority setting unit 37 repeats Steps 2 to 10 until p exceeds P.

Through the above processing, the priority setting unit 37 can obtain, for each predetermined subject group, the order of importance of the filters of the post-stage filter group. For each subject group, the priority setting unit 37 raises the priority of filters with higher importance. In this way, the priority setting unit 37 can vary the priority set for each filter of the post-stage filter group for each of the predetermined subject groups.
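The inner loop of the procedure (for one subject group p) is a leave-one-out ranking and can be sketched as below. The callable `recognition_rate` is a hypothetical stand-in for applying the pre-stage filter group plus the listed post-stage filters to the test data T_p; how that evaluation is carried out is not specified here.

```python
from typing import Callable, List, Sequence


def rank_filters_by_importance(
    num_filters: int,
    recognition_rate: Callable[[Sequence[int]], float],
) -> List[int]:
    """Return filter indices ordered from most to least important for one group.

    recognition_rate(active) is assumed to return the recognition rate on the
    group's test data when only the post-stage filters in `active` are applied.
    """
    all_filters = list(range(num_filters))
    baseline = recognition_rate(all_filters)  # R_p: all post-stage filters applied
    drops = []
    for q in all_filters:
        remaining = [k for k in all_filters if k != q]
        # C_q = R_p - R_pq: drop in recognition rate when filter q is removed
        drops.append(baseline - recognition_rate(remaining))
    # Larger drop means the filter contributes more, so it ranks earlier.
    return sorted(all_filters, key=lambda q: drops[q], reverse=True)
```

Calling this once per subject group, each time with an evaluator bound to that group's test data, yields the per-group orderings W_pq. Note that the cost is Q+1 evaluations per group.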

As with the filter similarity, the priority setting unit 37 may use two levels of importance: important or not important. In this case, the priority setting unit 37 sets a predetermined threshold C, treats the filter f_q as important when the recognition rate R_pq is smaller than the threshold C (that is, when removing f_q lowers the recognition rate substantially), and treats the filter f_q as not important when the recognition rate R_pq is equal to or greater than the threshold C.

FIG. 7 is a diagram schematically showing, in table form, the filter priorities for each subject group. Specifically, FIG. 7 shows the data structure of a priority database that stores, for the first subject group, the weight coefficient magnitude, similarity, order of importance, and priority of each filter. The priority database is stored in the storage unit 2 and managed by the priority setting unit 37. By referring to the priority database, the filter selection unit 33 can select, from the post-stage filter group, the filters to be applied by the individual recognition unit 34, based at least on the subject group and the priority set for each filter of the post-stage filter group.

When the number of priority levels that the priority setting unit 37 assigns to the filters of the post-stage filter group is smaller than the number of filters, several filters may end up with the same priority. When two or more filters of the post-stage filter group have the same priority, the filter selection unit 33 may select at random among those filters. This allows the filter selection unit 33 to select filters without referring to any indicator other than the priority.
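Random selection among equal-priority filters can be sketched by shuffling before a stable sort. The parameter `k` (the number of filters to apply) and the optional `rng` argument are assumptions introduced for this example.

```python
import random
from typing import List, Optional


def select_with_random_ties(
    priorities: List[int], k: int, rng: Optional[random.Random] = None
) -> List[int]:
    """Pick k filter indices by descending priority, breaking ties at random."""
    if rng is None:
        rng = random.Random()
    indices = list(range(len(priorities)))
    # Shuffle first; the stable sort below then leaves ties in random order.
    rng.shuffle(indices)
    indices.sort(key=lambda i: priorities[i], reverse=True)
    return indices[:k]
```

Filters with strictly higher priority are always chosen first; only the choice within a priority level varies between calls.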

The learning unit 39 generates the machine learning model by machine learning using a neural network, based on training data in which a plurality of image data items, the subject included in each image data item, and the subject group of that subject are associated with one another.

FIG. 8 is a diagram schematically showing the functional configuration of the learning unit 39 according to the embodiment. As shown in FIG. 8, the learning unit 39 includes a pre-stage learning unit 390 and a post-stage learning unit 391.

First, the image acquisition unit 30 acquires a plurality of image data items for which the included subject and the subject group to which it belongs are known. Using these image data items as training data, the pre-stage learning unit 390 generates the pre-stage filter group by machine learning using a neural network, as described with reference to FIGS. 4A and 4B.

As described with reference to FIG. 4C, after the pre-stage learning unit 390 has generated the pre-stage filter group, the post-stage learning unit 391 generates the post-stage filter group using the application result of the pre-stage filter group, without back-propagating errors into the pre-stage filter group.

Specifically, the post-stage learning unit 391 updates the weights of the filters in each layer by back-propagating the error between the output of the output layer Lo and the identification label B associated with the input image I. In doing so, the post-stage learning unit 391 fixes the weights of the filters in the first pre-stage convolutional layer C_f1, the second pre-stage convolutional layer C_f2, the third pre-stage convolutional layer C_f3, the fourth pre-stage convolutional layer C_f4, and the fifth pre-stage convolutional layer C_f5, and prohibits their update. The post-stage learning unit 391 can thereby generate the post-stage filter group while keeping the pre-stage filter group fixed.
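Keeping the pre-stage weights fixed while the post-stage weights are updated can be illustrated with a plain gradient-descent step that skips frozen parameters. This is a framework-free sketch: the dictionary layout, the parameter names, and the use of simple SGD are assumptions, not the training procedure of the embodiment.

```python
from typing import Dict, List, Set


def sgd_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    frozen: Set[str],
    lr: float = 0.5,
) -> Dict[str, List[float]]:
    """Return updated parameters; entries named in `frozen` are left unchanged."""
    updated = {}
    for name, values in params.items():
        if name in frozen:
            updated[name] = list(values)  # pre-stage filters: update prohibited
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated
```

In practice a deep learning framework would achieve the same effect by excluding the pre-stage parameters from the optimizer or disabling their gradients.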

<Processing flow of the image recognition method executed by the image recognition device 1>
FIG. 9 is a flowchart for explaining the flow of the image recognition processing executed by the image recognition device 1 according to the embodiment. The processing in this flowchart starts, for example, when the image recognition device 1 is started.

The model acquisition unit 31 acquires a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets the subject included in the input image I, the image data to be processed, corresponds to (S2).

The group recognition unit 32 identifies the subject group by applying to the input image I the pre-stage filter group, which is the subset of the plurality of filters used to recognize which of a plurality of predetermined subject groups the subject in the input image I belongs to (S4).

The filter selection unit 33 selects one or more filters from the post-stage filter group, based at least on the subject group identified by the group recognition unit 32 and the priority set for each filter of the post-stage filter group (S6).

The individual recognition unit 34 identifies the recognition target included in the input image I based on the application result of the pre-stage filter group and the application result of the selected filters (S8).
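Steps S4 to S8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function names `select_filters` and `recognize`, the dictionary layout of `model`, the toy filters, the labels, the priority values, and the use of a fixed filter count `max_filters` as the resource limit are all hypothetical. Ties in priority are broken here by filter index for reproducibility.

```python
def select_filters(priorities, group, max_filters):
    """S6: pick post-stage filter indices for `group`, highest priority first."""
    group_priorities = priorities[group]
    # sorted() is stable, so filters with equal priority keep index order.
    ranked = sorted(range(len(group_priorities)),
                    key=lambda i: group_priorities[i], reverse=True)
    return sorted(ranked[:max_filters])


def recognize(image, model, max_filters):
    """Run S4 to S8 on one image using an already acquired model (S2)."""
    features = model["pre_stage"](image)          # S4: apply the pre-stage filter group
    group = model["classify_group"](features)     # S4: identify the subject group
    chosen = select_filters(model["priorities"], group, max_filters)  # S6
    # S8: apply only the selected post-stage filters, then pick the label
    # whose filter responded most strongly.
    responses = {i: model["post_stage"][i](features) for i in chosen}
    best = max(responses, key=responses.get)
    return group, chosen, model["labels"][best]


# Hypothetical toy model: every value below is invented for illustration.
model = {
    "pre_stage": lambda image: sum(image) / len(image),
    "classify_group": lambda f: "animal" if f < 0.5 else "vehicle",
    "post_stage": [lambda f: 1.0 - f, lambda f: 0.8 - f,
                   lambda f: f, lambda f: f - 0.2],
    "labels": ["cat", "dog", "car", "bus"],
    "priorities": {"animal": [0.9, 0.7, 0.1, 0.2],
                   "vehicle": [0.2, 0.1, 0.8, 0.6]},
}

result = recognize([0.1, 0.3, 0.2], model, max_filters=2)
```

With these toy values, only two of the four post-stage filters are evaluated for the input, which is where the reduction in processing load comes from in this sketch.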

<Effects of the Image Recognition Device 1 According to the Embodiment>
As described above, the image recognition device 1 according to the embodiment can reduce the processing load of image recognition processing that uses a neural network machine learning model.

Although the present invention has been described above using the embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment, and various modifications and changes are possible within the scope of its gist. For example, the specific form of distribution and integration of the device is not limited to the above embodiment, and all or part of the device may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination also has the effects of the original embodiments.

1 ... Image recognition device
2 ... Storage unit
3 ... Control unit
30 ... Image acquisition unit
31 ... Model acquisition unit
32 ... Group recognition unit
33 ... Filter selection unit
34 ... Individual recognition unit
35 ... Resource acquisition unit
36 ... Weight index calculation unit
37 ... Priority setting unit
38 ... Similarity calculation unit
39 ... Learning unit
390 ... Pre-stage learning unit
391 ... Post-stage learning unit
N ... Convolutional neural network

Claims

An image recognition device comprising:
a model acquisition unit that acquires a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed corresponds to;
a group recognition unit that identifies a subject group by applying, to the image data, a pre-stage filter group which, among the plurality of filters, is used to recognize which of a plurality of subject groups the subject included in the image data belongs to;
an individual recognition unit that identifies the recognition target included in the image data based on the application result of the pre-stage filter group and the application result of a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters; and
a filter selection unit that selects, from the post-stage filter group, a filter to be applied by the individual recognition unit, based on at least the subject group identified by the group recognition unit and the priority set for each filter constituting the post-stage filter group.

The image recognition device according to claim 1, wherein
the filter selection unit selects filters to be applied by the individual recognition unit in descending order of the priority set for each filter constituting the post-stage filter group, within the range that the computational resources of the image recognition device allow.

The image recognition device according to claim 1, further comprising:
a weight index calculation unit that calculates, for each of the plurality of filters, an index indicating the magnitude of the weight coefficients of that filter; and
a priority setting unit that sets a higher priority for a filter whose index indicates large weight coefficients than for a filter whose index indicates small weight coefficients.

The image recognition device according to claim 1, further comprising:
a similarity calculation unit that calculates, for each of the plurality of filters, its similarity to the other filters; and
a priority setting unit that sets a higher priority for a filter that has no similar filter than for a filter that has a similar filter.

The image recognition device according to claim 3 or 4, wherein
the priority setting unit changes, for each of the plurality of subject groups, the priority set for each filter constituting the post-stage filter group.

The image recognition device according to claim 1, wherein
when an equal priority is set for two or more filters constituting the post-stage filter group, the filter selection unit randomly selects a filter from among the filters for which the equal priority is set.

The image recognition device according to claim 1, further comprising
a learning unit that generates the machine learning model by machine learning using a neural network, based on learning data in which a plurality of image data, the subject included in each of the plurality of image data, and the subject group of that subject are associated with one another, wherein
the learning unit includes:
a pre-stage learning unit that generates the pre-stage filter group; and
a post-stage learning unit that, after the pre-stage learning unit has generated the pre-stage filter group, generates the post-stage filter group using the application result of the pre-stage filter group, without back-propagating errors to the pre-stage filter group.

An image recognition method in which a processor executes:
a step of acquiring a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed corresponds to;
a step of identifying a subject group by applying, to the image data, a pre-stage filter group which, among the plurality of filters, is used to recognize which of a plurality of subject groups the subject included in the image data belongs to;
a step of selecting one or more filters from among the filters constituting a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters, based on at least the identified subject group and the priority set for each filter constituting the post-stage filter group; and
a step of identifying the recognition target included in the image data based on the application result of the pre-stage filter group and the application result of the selected filters.

A program that causes a computer to realize:
a function of acquiring a machine learning model that includes a plurality of filters as model parameters and that outputs information indicating which of a plurality of predetermined recognition targets a subject included in image data to be processed corresponds to;
a function of identifying a subject group by applying, to the image data, a pre-stage filter group which, among the plurality of filters, is used to recognize which of a plurality of subject groups the subject included in the image data belongs to;
a function of selecting one or more filters from among the filters constituting a post-stage filter group, which is the filter group obtained by excluding the pre-stage filter group from the plurality of filters, based on at least the identified subject group and the priority set for each filter constituting the post-stage filter group; and
a function of identifying the recognition target included in the image data based on the application result of the pre-stage filter group and the application result of the selected filters.