JP6948851B2 - Information processing device, information processing method

Info

Publication number: JP6948851B2
Application number: JP2017118841A
Authority: JP
Inventors: 大岳八谷
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-06-30
Filing date: 2017-06-16
Publication date: 2021-10-13
Anticipated expiration: 2037-06-16
Also published as: JP2018010626A

Description

The present invention relates to an information processing technique using a hierarchical neural network.

In recent years, services have emerged that analyze the activity patterns of people and crowds from images and video acquired by surveillance cameras, or that detect and report specific events. Realizing such a service requires recognition technology based on machine learning that can recognize, from moving images captured by a surveillance camera, the attribute of an object (for example, whether it is a person or a car), the type of action (for example, walking or running), and the type of item a person is carrying (for example, a bag or a basket). Deep Neural Networks (hereinafter abbreviated as DNN) are attracting attention as a machine learning method that achieves highly accurate recognition. The services described above are used in a variety of environments, such as nursing care facilities, ordinary homes, public facilities such as stations and urban areas, and stores such as supermarkets and convenience stores. On the other hand, the learning data used to train a DNN is often acquired in an environment different from the one in which the service is actually used. For example, learning data may be acquired from performances acted out by developers in a laboratory. A recognizer trained on such learning data comes to depend on feature amounts peculiar to that learning data, and consequently does not perform adequately in the environment where the surveillance camera is actually installed. There is therefore a growing demand for identifying the feature amounts that a trained DNN uses for recognition.

In Non-Patent Document 1, from among the feature maps of a specific layer of a trained DNN, those with high activation for the input evaluation image data are selected, and each selected feature map is visualized by successively applying the inverse transforms of the pooling layers and convolution layers until it is mapped back to the input layer.

In Non-Patent Document 2, the evaluation image data is divided into regions, and partial images, each with one region removed, are input to the trained DNN. The regions of the image that contribute to recognition are then selected based on the change in the DNN's recognition accuracy when each partial image is input.
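The region-removal procedure attributed to Non-Patent Document 2 might be sketched as follows. This is only an illustration under stated assumptions, not the cited authors' implementation: `score_fn` is a hypothetical stand-in for a trained DNN's output for one category, and the grid size and fill value are arbitrary choices.

```python
def occlusion_contributions(image, score_fn, grid=2, fill=0.0):
    """Return a grid x grid table of score drops caused by blanking each region."""
    height, width = len(image), len(image[0])
    base = score_fn(image)  # score for the unmodified image
    drops = [[0.0] * grid for _ in range(grid)]
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            occluded = [row[:] for row in image]  # copy, then blank one region
            for y in range(y0, y1):
                for x in range(x0, x1):
                    occluded[y][x] = fill
            drops[gy][gx] = base - score_fn(occluded)  # one evaluation per region
    return drops


# Toy stand-in for a DNN (assumption): the score is the mean of the top-left quadrant.
def toy_score(image):
    h, w = len(image) // 2, len(image[0]) // 2
    return sum(image[y][x] for y in range(h) for x in range(w)) / (h * w)


image = [[1.0] * 4 for _ in range(4)]
drops = occlusion_contributions(image, toy_score)
```

In this toy case only the top-left region produces a nonzero drop. Note that the score function is evaluated once per removed region.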

Non-Patent Document 3 proposes a method called Dropout, in which a DNN is trained while the values of randomly selected neurons are set to zero or perturbed with noise. This method limits the number of DNN neurons that are activated, so that excessive fitting to the learning data is avoided while recognition accuracy is improved.
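As a rough illustration of the zeroing case only, a dropout step over a list of neuron values could look like the sketch below. The function name and the "inverted" rescaling of surviving values are assumptions made for this example; the cited paper instead scales the weights at test time.

```python
import random


def dropout(values, drop_prob, rng):
    """Zero each value with probability drop_prob; rescale survivors (inverted dropout)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    # A value survives when the random draw falls below keep_prob.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [0.5, 1.2, 0.0, 2.0]
dropped = dropout(activations, drop_prob=0.5, rng=rng)
```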

Non-Patent Document 1: Visualizing and Understanding Convolutional Networks, M. D. Zeiler and R. Fergus, European Conference on Computer Vision (ECCV), 2014
Non-Patent Document 2: Object Detectors Emerge in Deep Scene CNNs, B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba, International Conference on Learning Representations (ICLR), 2015
Non-Patent Document 3: Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Journal of Machine Learning Research 15 (2014) 1929-1958

However, the method described in Non-Patent Document 1 does not necessarily visualize the feature maps that contributed to recognition of the evaluation image data. Specifically, the information in a highly activated feature map visualized by Non-Patent Document 1 may be lost while it is propagated to the output layer of the DNN, because of small weight coefficients or cancellation with other feature maps. In that case, the highly activated feature map does not contribute to recognition. Conversely, the information in a weakly activated feature map may be amplified while it is propagated to the output layer, because of large weight coefficients or synergy with other feature maps. In that case, the weakly activated feature map does contribute to recognition. Therefore, with the method described in Non-Patent Document 1, the user cannot grasp how much a visualized feature map is actually used for recognition. Nor can the user grasp whether feature maps other than the visualized ones contribute to recognition.
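The gap between activation and contribution can be seen in a small numerical example. All values below are hypothetical, and a single linear output is assumed for simplicity: units A and B are strongly activated but have small, opposite-signed weights, while unit C is weakly activated but has a large weight.

```python
# Hypothetical activations of three units and their weights toward one output score.
activations = {"A": 10.0, "B": 10.0, "C": 0.5}
weights = {"A": 0.01, "B": -0.01, "C": 4.0}

# For a linear output, each unit contributes weight * activation.
contributions = {u: weights[u] * activations[u] for u in activations}
output = sum(contributions.values())

# The most active unit and the most contributing unit are not the same.
most_active = max(activations, key=activations.get)
most_contributing = max(contributions, key=lambda u: abs(contributions[u]))
```

Here A and B cancel each other and C supplies essentially the whole output, so ranking units by activation alone would point at the wrong units.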

In Non-Patent Document 2, on the other hand, the regions of the image data that contribute to recognition accuracy can be visualized. This lets the user grasp which regions of the image data contribute to recognition and by how much. However, because the visualization method of Non-Patent Document 2 does not visualize feature maps, it is not clear which features within a selected region the DNN actually uses for recognition. For example, when several objects are present in the same region, it is not clear which object's information contributes to recognition. Likewise, when a human face is selected, it is not clear whether recognition relies on the facial expression, color, size, or shape, or on parts such as the hair, eyes, or mouth. Furthermore, the method described in Non-Patent Document 2 must obtain the DNN output value for every partial image created by removing a region, so the computation is time-consuming.

With the method described in Non-Patent Document 3, on the other hand, a DNN can be trained so that a limited number of neurons contribute to recognition. However, this method does not explicitly select the neurons that contribute to recognition. To identify the contributing neurons, an expert must therefore analyze how neurons activate for a variety of evaluation data. In other words, a separate method for identifying the neurons that contribute to recognition is required.

Moreover, with the method described in Non-Patent Document 3, the neurons that contribute to recognition are acquired based on the learning data, but those neurons are not necessarily useful in actual recognition. As described above, learning data acquired in a particular environment may contain biases specific to that environment, and the contributing neurons acquired from such learning data may erroneously represent feature amounts that are not actually needed for recognition. For example, suppose that the learning data for recognizing the actions "walking" and "running" is biased such that a desk always appears in the "walking" data and never appears in the "running" data. In that case, the method described in Non-Patent Document 3 acquires the neuron corresponding to the feature amount of the desk as a neuron that contributes to recognition. However, no such bias exists in the general environments in which the trained DNN is actually used, so that neuron is not useful and may even harm recognition. For example, when a desk appears in video of a "running" action, the DNN may misrecognize the action as "walking".

Thus, with the method described in Non-Patent Document 3, when the learning data is biased, the neurons that contribute to recognition may represent erroneous feature amounts, and in addition the user cannot easily confirm that this has happened.

The present invention has been made in view of these problems, and provides a technique for identifying the feature maps or neurons of a DNN that contribute to recognition of evaluation data.

One aspect of the present invention is characterized by comprising:
a first neural network that outputs, for each of a plurality of categories, as an output value, the probability that input data includes an object belonging to that category;
a second neural network obtained by modifying a specific unit in the first neural network, the second neural network outputting, for each of the plurality of categories, as an output value, the probability that the input data includes an object belonging to that category;
calculation means for obtaining, for each of the plurality of categories, difference information representing the difference between the output value that the first neural network output for that category and the output value that the second neural network output for that category; and
output means for outputting to a display device, based on the difference information obtained by the calculation means, information representing the contribution of the specific unit to the output value for the input data.

According to the configuration of the present invention, the feature maps or neurons of a DNN that contribute to recognition of evaluation data can be identified.

A diagram showing a configuration example of the recognition learning system 1.
A diagram showing an example of the information stored in the storage unit M1.
A diagram showing an example of the network structure of a DNN.
A diagram showing an example of the information stored in the storage unit M2.
A diagram explaining how change information is obtained.
A diagram showing a display example of the GUI.
A diagram showing a display example of the GUI.
A flowchart of the operation of the recognition learning system 1.
A diagram showing a configuration example of the recognition learning system 1a.
A diagram showing a display example of the GUI.
A diagram showing an example of dropout ratios.
A flowchart of the operation of the recognition learning system 1a.
A diagram showing a configuration example of the recognition learning system 1b.
A diagram showing a hardware configuration example of a computer device.
A diagram explaining how change information is obtained.
A diagram explaining how change information is obtained.

Embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described below are examples of concrete implementations of the present invention and are specific examples of the configurations recited in the claims.

[First Embodiment]
This embodiment describes an example of an information processing device configured as follows. The information processing device obtains the output value of a first neural network for each category with respect to input data (first calculation). It then obtains, for each category with respect to the input data, the output value of a second neural network in which a designated unit of the first neural network has been changed (second calculation). For each category, it obtains change information representing the change between the output value obtained in the first calculation and the output value obtained in the second calculation (third calculation), and, based on that change information, outputs information representing the contribution of the designated unit to a display device.
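The three calculations can be illustrated with a toy two-layer network. This is a sketch under assumptions rather than the embodiment's implementation: the network shape and weights are hypothetical, "changing" a unit is taken to mean forcing its value to zero, and the change information is taken to be a simple per-category difference.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_w, output_w, disabled_unit=None):
    """Two-layer toy network; optionally force one hidden unit's value to zero."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in hidden_w]
    if disabled_unit is not None:
        hidden[disabled_unit] = 0.0  # assumption: "changing" a unit means zeroing it
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in output_w]
    return softmax(scores)


def change_information(x, hidden_w, output_w, unit):
    first = forward(x, hidden_w, output_w)                       # first calculation
    second = forward(x, hidden_w, output_w, disabled_unit=unit)  # second calculation
    return [a - b for a, b in zip(first, second)]                # third calculation


# Hypothetical weights: hidden unit 0 supports category 0, hidden unit 1 supports category 1.
hidden_w = [[1.0, 0.0], [0.0, 1.0]]
output_w = [[2.0, 0.0], [0.0, 2.0]]
x = [1.0, 0.5]
change = change_information(x, hidden_w, output_w, unit=0)
```

With these weights, zeroing hidden unit 0 lowers the probability of category 0 and raises that of category 1, so the change is positive for category 0 and negative for category 1.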

This embodiment describes a case in which such an information processing device is applied to the recognition learning device 10 in the recognition learning system 1 shown in FIG. 1. As shown in FIG. 1, the recognition learning system 1 has a recognition learning device 10 and a terminal device 100, which are configured so that they can exchange data with each other over a wireless or wired network. The network may be, for example, a fixed telephone network, a mobile telephone network, or the Internet. Although FIG. 1 shows the recognition learning device 10 and the terminal device 100 as separate devices, they may be integrated into a single device.

This embodiment describes a case in which a user of the recognition learning system 1 checks, with respect to images or video for learning (hereinafter, learning data), whether the trained DNN uses unnecessary feature amounts for recognition. Specifically, the recognition learning system 1 identifies the feature amounts of the DNN that contributed to recognition of an image or video used for evaluation (hereinafter, evaluation data), and displays information indicating those feature amounts superimposed on the evaluation data. Here, an unnecessary feature amount is, for example, one that depends on an object or event peculiar to the environment in which the learning data was acquired and that was captured unintentionally at acquisition time. For example, when the learning data was obtained by filming performances acted out in a laboratory, the laboratory's experimental equipment and the performers' habits, clothing, and postures are such environment-specific objects and events. The user here is, for example, a researcher or developer who develops this system, or a system integrator who tunes the DNN in order to provide this system to end users together with surveillance cameras. A recognition target of the DNN is a state of an object that can be conceptualized and put into words, and is characterized by label information that expresses the state linguistically. Recognition targets include, for example, attributes of objects such as "person" and "car", actions of objects such as "walking" and "running", and items carried by a person such as "bag" and "basket". One example of a DNN is the Convolutional Neural Network (hereinafter abbreviated as CNN) proposed in the following document.
・ ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky, I. Sutskever and G. E. Hinton, Advances in Neural Information Processing Systems 25 (NIPS 2012)

First, the terminal device 100 is described. The terminal device 100 has a display unit DS that displays various kinds of information and an operation detection unit OP that detects user operations performed on the display unit DS. The terminal device 100 may be, for example, a PC (Personal Computer), a tablet PC, a smartphone, or a feature phone.

The display unit DS has an image display panel such as a liquid crystal panel or an organic EL panel, and displays various kinds of information received from the recognition learning device 10. As described in detail later, the display unit DS displays the evaluation data, unit visualization information generated by the visualization unit 15 (described later) for visualizing feature amounts, and change information generated by the detection unit 13 that indicates the degree to which feature amounts contribute to recognition. The display unit DS also displays lists of the unit IDs that identify the feature maps and neurons making up the DNN stored in the recognition learning device 10 (described later), and of the category IDs that identify the categories to be recognized.

The operation detection unit OP has a touch sensor arranged on the image display panel of the display unit DS. It detects user operations based on the movement of the user's finger or a stylus, and transmits operation information indicating the detected operation to the recognition learning device 10. The operation detection unit OP may instead have input devices such as a controller, keyboard, and mouse, and acquire operation information indicating the user's operation on the image displayed on the image display panel. The operation information includes, for example, an instruction to select evaluation data, an instruction to execute visualization, and an instruction to select a unit ID or category ID. When the operation detection unit OP detects "execute visualization" as the operation information, it transmits the evaluation data stored in the terminal device 100 to the recognition learning device 10. When it detects selection of a unit ID and a category ID as the operation information, it receives the unit visualization information and change information corresponding to that unit ID and category ID from the recognition learning device 10, and causes the display unit DS to display them superimposed on the evaluation data.

Next, the recognition learning device 10 is described. The storage unit M1 stores the following information in association with the category IDs that identify the categories to be recognized: a layer ID that identifies each layer of the DNN, layer name information indicating the name of that layer, a lower layer ID identifying the layer immediately below it, an upper layer ID identifying the layer immediately above it, and processing parameter information indicating the processing method and processing parameters of that layer. FIG. 2 shows an example of the information stored in the storage unit M1.

In FIG. 2, the category IDs and layer IDs are represented as character strings of letters and digits, but they are not limited to any particular representation. In the case of FIG. 2 there are two categories to be recognized, identified by the category IDs "C01" and "C02".

In FIG. 2, the layer ID "L01" is stored in association with the layer name "input layer", the lower layer ID "NULL" (indicating that no layer exists below the layer with layer ID "L01"), the upper layer ID "L02", and the processing parameter "processing method: data input". This means that the layer with layer ID "L01" is the input layer, that no layer exists below it, that the layer one level above it has layer ID "L02", and that the processing performed in the input layer is data input. In other words, the input layer receives data such as an image or video into the DNN and transfers it to the layer with layer ID "L02".

Also in FIG. 2, the layer ID "L02" is stored in association with the layer name "Convolution1 layer", the lower layer ID "L01", the upper layer ID "L03", and the processing parameter "processing method: Convolution…". This means that the layer with layer ID "L02" is named the Convolution1 layer, that the layer one level below it is the input layer, and that the layer one level above it has layer ID "L03". It also means that the processing performed in the Convolution1 layer is a convolution operation on the data received from the input layer, using weight coefficients and bias terms as processing parameters. In other words, the Convolution1 layer performs a convolution operation with weight coefficients and bias terms on the data received from the input layer, and outputs the result to the layer with layer ID "L03" (the Pooling1 layer). Besides data input and Convolution, the processing methods that a processing parameter can hold include those described in the following document: Pooling, which obtains the maximum value for each filter; InnerProduct, which computes the inner product of the input data and the weight coefficients; and softmax, which computes the probability that the evaluation data belongs to each category.
・ Y. Jia et al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014
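For orientation, the three additional processing methods named above can be written compactly in plain Python. The sketch below is not taken from Caffe; the function names, the 2x2 pooling window, and the stride of 2 are assumptions chosen for illustration.

```python
import math


def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 over a 2-D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[y][x], feature_map[y][x + 1],
             feature_map[y + 1][x], feature_map[y + 1][x + 1])
         for x in range(0, cols - 1, 2)]
        for y in range(0, rows - 1, 2)
    ]


def inner_product(inputs, weights, bias):
    """Inner product of the input values and weight coefficients, plus a bias term."""
    return sum(w * v for w, v in zip(weights, inputs)) + bias


def softmax(scores):
    """Probability of each category given raw scores (numerically stabilized)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
score = inner_product([1.0, 2.0], [3.0, 4.0], 0.5)
probabilities = softmax([2.0, 1.0])
```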

The processing parameters also include the size, number, and stride width of the filters used in the processing of each layer, and the values of the weight coefficients and bias terms used in the Convolution and InnerProduct layers.

FIG. 3 shows an example of the network structure of the DNN defined by the information stored in the storage unit M1. The DNN illustrated in FIG. 3 consists of an input layer 301, a Convolution1 layer 302, a Pooling1 layer 303, a Convolution2 layer 304, a Pooling2 layer 305, an InnerProduct layer 306, and an output layer 307. The processing performed between the input layer 301 and the Convolution1 layer 302 is the "Convolution processing 311" defined by the processing parameter information corresponding to the Convolution1 layer 302. The processing performed between the Convolution1 layer 302 and the Pooling1 layer 303 is the "Pooling processing 312" defined by the processing parameter information corresponding to the Pooling1 layer 303. The processing performed between the Pooling2 layer 305 and the InnerProduct layer 306 is the "InnerProduct processing 313" defined by the processing parameter information corresponding to the InnerProduct layer 306. The processing performed between the InnerProduct layer 306 and the output layer 307 is the "softmax processing 314" defined by the processing parameter information corresponding to the output layer 307.

In FIG. 3, each Convolution layer and Pooling layer contains a plurality of feature maps, and the InnerProduct layer and the output layer each contain a plurality of neurons. Units such as feature maps and neurons are identified by unit IDs. For example, the two feature maps in the Convolution1 layer 302 are identified by the unit ID "F02001" 321 and the unit ID "F02002" 322. The two neurons in the InnerProduct layer 306 are identified by the unit ID "F06001" 323 and the unit ID "F06002" 324. In FIG. 3, the category IDs "C01" and "C02" of the recognition targets are assigned to the two neurons of the output layer 307, respectively. That is, as described in detail later, the output value of the neuron with category ID "C01" is the output score information corresponding to category ID "C01", and the output value of the neuron with category ID "C02" is the output score information corresponding to category ID "C02".

Because the information stored in the storage unit M1 defines the network structure of the DNN in this way, it may be referred to below as the structure information of the DNN.
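One possible in-memory form of this structure information is sketched below. The class name, field names, and the `ordered_layers` helper are hypothetical and are not part of the embodiment. Only the first three layers are included for brevity, so the Pooling1 record is given no upper layer here even though the DNN of FIG. 3 continues above it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayerRecord:
    layer_id: str
    name: str
    lower_id: Optional[str]  # None plays the role of "NULL"
    upper_id: Optional[str]
    params: dict = field(default_factory=dict)


def ordered_layers(records):
    """Follow upper-layer IDs from the layer with no lower layer up to the top."""
    by_id = {r.layer_id: r for r in records}
    current = next(r for r in records if r.lower_id is None)
    order = [current]
    while current.upper_id is not None:
        current = by_id[current.upper_id]
        order.append(current)
    return order


# Records may be stored in any order; the links determine the layer sequence.
records = [
    LayerRecord("L02", "Convolution1 layer", "L01", "L03", {"method": "Convolution"}),
    LayerRecord("L01", "input layer", None, "L02", {"method": "data input"}),
    LayerRecord("L03", "Pooling1 layer", "L02", None, {"method": "Pooling"}),
]
names = [r.name for r in ordered_layers(records)]
```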

The storage unit M2 stores unit state information, which indicates the state of each unit as the result of processing the evaluation data in each layer of the DNN, and output score information, which indicates the output score of the DNN for each category to be recognized. Specifically, the storage unit M2 stores the DNN's output score information for each category in association with the category ID that identifies the category. The storage unit M2 also stores, in association with the layer ID that identifies each layer of the DNN, the unit IDs that identify the units (feature maps or neurons) in that layer and the unit state information indicating the state of each unit. FIG. 4 shows an example of the information stored in the storage unit M2.

In FIG. 4, each unit ID is represented as a character string of letters and digits, but the representation of the unit ID is not limited to any particular form. A unit ID is generated from the layer ID of the layer to which the unit belongs and the position of the unit within that layer. For example, the unit ID of the first unit in the layer with layer ID "L02" is "F02001", and the unit ID of the second unit in the same layer is "F02002".
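The ID rule described above can be sketched as follows. The exact format (the letter "F", a two-digit layer number, and a three-digit position) is an assumption inferred from the examples "F02001" and "F02002"; the text itself states that other representations are possible, and the function name `make_unit_id` is hypothetical.

```python
def make_unit_id(layer_id: str, index: int) -> str:
    """Build a unit ID from a layer ID such as 'L02' and a 1-based position.

    Assumed format (inferred from the examples in the text):
    'F' + two-digit layer number + three-digit position within the layer.
    """
    layer_number = int(layer_id.lstrip("L"))  # 'L02' -> 2
    return f"F{layer_number:02d}{index:03d}"


print(make_unit_id("L02", 1))  # first unit of layer L02
print(make_unit_id("L02", 2))  # second unit of layer L02
```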

In FIG. 4, "10.5" is stored as the output score information for category ID "C01", and "3.8" is stored as the output score information for category ID "C02". In association with layer ID "L01", the unit ID "F02001" and, as its unit state, the matrix of a feature map are stored. In association with layer ID "L06", the unit ID "F06001" and, as its unit state, the value of a neuron are stored.
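One possible in-memory layout for the contents of the storage unit M2 is sketched below. The class name `EvaluationStore`, its field names, and the feature-map values are hypothetical; only the category scores 10.5 and 3.8 and the ID strings come from the description. The layer and unit pairing in the example follows the ID rule stated for FIG. 4 (layer "L02" holds unit "F02001").

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A unit state is either a single neuron value or a feature-map matrix.
UnitState = Union[float, List[List[float]]]


@dataclass
class EvaluationStore:
    """Hypothetical stand-in for storage unit M2."""
    # category ID -> output score of the DNN
    output_scores: Dict[str, float] = field(default_factory=dict)
    # layer ID -> unit ID -> unit state
    unit_states: Dict[str, Dict[str, UnitState]] = field(default_factory=dict)

    def put_unit(self, layer_id: str, unit_id: str, state: UnitState) -> None:
        """Store a unit state under its layer ID and unit ID."""
        self.unit_states.setdefault(layer_id, {})[unit_id] = state


store = EvaluationStore()
store.output_scores["C01"] = 10.5
store.output_scores["C02"] = 3.8
store.put_unit("L02", "F02001", [[0.1, 0.0], [0.4, 0.9]])  # made-up feature map
store.put_unit("L06", "F06001", 0.7)                       # made-up neuron value
```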

Returning to FIG. 1, the processing unit 11 calculates the output score information of each recognition target category of the DNN for the evaluation data, and stores in the storage unit M2 the unit state information of each unit obtained in the course of the calculation. Specifically, the processing unit 11 reads from the storage unit M1 the category IDs of the categories to be recognized by the DNN and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID. The processing unit 11 then constructs the DNN based on the structural information thus read, and processes the evaluation data received from the terminal device 100 by applying the processing parameter information of each layer in order from the lowest layer to the highest layer. Of the outputs (output score information) from the top layer of the DNN, the processing unit 11 stores the output score information corresponding to each category ID read from the storage unit M1 in the storage unit M2 in association with that category ID.

In the present embodiment, an image is used as the evaluation data, but the evaluation data is not limited to images. For example, video can be the recognition target, as proposed in the following documents.
・ Two-stream convolutional networks for action recognition in videos, K. Simonyan and A. Zisserman, Advances in Neural Information Processing Systems 25 (NIPS), 2014.
・ 3D Convolutional Neural Networks for Human Action Recognition, S. Ji, W. Xu, M. Yang and K. Yu, Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2012

The processing unit 11 stores in the storage unit M2 the unit state information of each unit obtained between inputting the evaluation data to the input layer and obtaining the output of the top layer, in association with the layer ID of the layer to which the unit belongs and the unit ID of the unit. The processing unit 11 then outputs a trigger to the processing unit 12.

In response to the trigger from the processing unit 11, the processing unit 12 reads from the storage unit M1 the category IDs of the recognition targets and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID. The processing unit 12 also reads from the storage unit M2 the output score information associated with each category ID and the unit state information associated with each layer ID and unit ID. The processing unit 12 then performs a prescribed process on the unit state information corresponding to a specific unit ID among the unit IDs that were read. Here, the specific unit ID is a unit ID designated (set) in advance by the user as identifying a unit to be visualized (a visualization target unit). For example, to visualize the first feature map of the Convolution1 layer, the user sets "F02001" as the specific unit ID. To visualize all feature maps of the Convolution1 layer, the user sets "F02*" as the specific unit ID using a wildcard. Various processes are conceivable as the "prescribed process" performed on the unit state information corresponding to the specific unit ID; two examples (a first process and a second process) are described below.
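The wildcard designation of visualization target units can be illustrated with shell-style pattern matching. This is a minimal sketch: the description does not say how the wildcard is evaluated, so the use of Python's `fnmatch` module and the function name `select_target_units` are assumptions.

```python
from fnmatch import fnmatchcase


def select_target_units(unit_ids, pattern):
    """Return the unit IDs that match a user-specified ID or wildcard pattern."""
    return [unit_id for unit_id in unit_ids if fnmatchcase(unit_id, pattern)]


all_units = ["F02001", "F02002", "F06001", "F06002"]
print(select_target_units(all_units, "F02001"))  # a single feature map
print(select_target_units(all_units, "F02*"))    # every unit of layer 02
```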

In the first process, the processing unit 12 generates, as additional unit information, a separate set that has the same size as the set of numerical values represented by the unit state information corresponding to the specific unit ID (among the unit state information read from the storage unit M2) and whose elements are all 0. For example, when the unit state information represents the matrix of a feature map, a matrix of the same size whose elements are all 0 is generated as the additional unit information. When the unit state information is the value of a neuron, a neuron value of 0 is generated as the additional unit information. In the subsequent processing, the unit state information corresponding to the specific unit ID is replaced by this all-zero unit (feature map or neuron), so the output from the unit becomes 0 and the unit is in effect deleted from the DNN.
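The first process amounts to building an all-zero value with the same shape as the stored unit state. The sketch below uses plain Python lists for a feature map and a float for a neuron; the function name `zeros_like` is hypothetical.

```python
def zeros_like(state):
    """Return all-zero additional unit information shaped like `state`.

    `state` is either a neuron value (a number) or a feature map
    given as a list of rows.
    """
    if isinstance(state, (int, float)):
        return 0.0
    return [[0.0 for _ in row] for row in state]


feature_map = [[0.1, 0.0, 0.3], [0.4, 0.9, 0.2]]
print(zeros_like(feature_map))  # same 2x3 shape, every element zero
print(zeros_like(0.7))          # a neuron is replaced by the value 0
```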

In the second process, the processing unit 12 generates, as additional information, a separate set that has the same size as the set of numerical values represented by the unit state information corresponding to the specific unit ID (among the unit state information read from the storage unit M2) and whose elements are all random values. The random values are, for example, independent and identically distributed according to a normal distribution, a Laplace distribution, or the like. For example, when the unit state information represents the matrix of a feature map, a matrix of the same size whose elements are all random values is generated as the additional information. When the unit state information is the value of a neuron, a random neuron value is generated as the additional information. The processing unit 12 then generates the additional unit information by adding the additional information to the unit state information corresponding to the specific unit ID, element by element.
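The second process can be sketched as drawing independent noise of the same shape and adding it element by element. The normal distribution is one of the examples named in the text; the standard deviation `sigma`, the seed, and the function names are assumptions made for illustration.

```python
import random


def make_noise(state, sigma=1.0, rng=random):
    """Draw i.i.d. zero-mean Gaussian noise shaped like `state` (the additional information)."""
    if isinstance(state, (int, float)):
        return rng.gauss(0.0, sigma)
    return [[rng.gauss(0.0, sigma) for _ in row] for row in state]


def add_elementwise(state, noise):
    """Add the noise to the unit state element by element (the additional unit information)."""
    if isinstance(state, (int, float)):
        return state + noise
    return [[s + n for s, n in zip(state_row, noise_row)]
            for state_row, noise_row in zip(state, noise)]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
feature_map = [[0.1, 0.0], [0.4, 0.9]]
noise = make_noise(feature_map, sigma=0.5, rng=rng)
perturbed = add_elementwise(feature_map, noise)
```

Unlike the first process, the noise itself has to be kept, because the correlation-based change measure described later pairs each noise draw with the resulting output score.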

The processing unit 12 then outputs to the detection unit 13 the specific unit ID (the unit ID of the unit state information that was subjected to the prescribed process) and the additional unit information.

The detection unit 13 reads from the storage unit M1 the category IDs of the recognition targets and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID. The detection unit 13 further reads from the storage unit M2 the output score information associated with each category ID and the unit state information associated with each layer ID and unit ID. The detection unit 13 then calculates, in the same manner as the processing unit 11, the output score information of each recognition target category of the DNN for the evaluation data, except that it uses the additional unit information as the unit state information corresponding to the specific unit ID. The detection unit 13 does not need to recalculate the unit state information associated with the layer IDs of layers below the layer of the specific unit ID; it can use the unit state information stored in the storage unit M2. For example, when the prescribed process is performed on a unit of the Convolution2 layer, the unit state information of the Convolution1 layer and the Pooling1 layer is reused in calculating the output score information.
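The reuse of stored lower-layer states can be sketched with a toy network in which every layer is a function from a list of unit values to a list of unit values. The three lambda layers, the function names, and the numbers are all made up for illustration; they are not the layers of FIG. 3.

```python
def run_layers(layers, data):
    """Full forward pass; states[k] holds the unit values output by layer k."""
    states = []
    current = data
    for layer in layers:
        current = layer(current)
        states.append(current)
    return states


def rerun_with_replacement(layers, cached_states, layer_idx, unit_idx, new_value):
    """Recompute the final output with one unit replaced.

    Layers up to and including `layer_idx` are not recomputed: their
    cached states are reused, and only the layers above are run again.
    """
    current = list(cached_states[layer_idx])  # copy the cached states of the target layer
    current[unit_idx] = new_value             # swap in the additional unit information
    for layer in layers[layer_idx + 1:]:
        current = layer(current)
    return current


# Toy three-layer network (made-up arithmetic).
layers = [
    lambda v: [2 * v[0], v[1] + 1],
    lambda v: [v[0] + v[1], v[0] - v[1]],
    lambda v: [v[0] * 1.0, v[1] * 0.5],       # plays the role of the output layer
]
states = run_layers(layers, [1.0, 2.0])
rerun = rerun_with_replacement(layers, states, layer_idx=1, unit_idx=0, new_value=0.0)
```

In this sketch, replacing a unit in layer index 1 reruns only the last layer, which mirrors the remark that perturbing a higher layer leaves more cached work reusable.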

In this way, the detection unit 13 calculates the output score information of each recognition target category of the DNN for the evaluation data when the unit corresponding to the specific unit ID is replaced with the additional unit information. For each category, the detection unit 13 obtains the change of the calculated output score information relative to the output score information stored in the storage unit M2 (change information indicating how the output score information changes when the unit corresponding to the specific unit ID is replaced with the additional unit information). Various calculation processes are conceivable for obtaining the change information; two examples (a first calculation process and a second calculation process) are described below.

In the first calculation process, the detection unit 13 obtains, as the change information, the difference between the output score information of the DNN for the evaluation data when the unit state information corresponding to the specific unit ID is replaced with the additional unit information and the output score information stored in the storage unit M2. In the first calculation process, the change information is obtained, for example, according to equation (1). As in equation (1), the absolute value of the difference may be taken so that the change information does not become negative.

In equation (1), ΔS_{c,u} is the change information obtained for category c when the unit state information of unit ID = u is replaced with the additional unit information. S_c is the output score information of category c read from the storage unit M2, and S_{c,u} is the output score information output from the DNN for category c when the unit state information of unit ID = u is replaced with the additional unit information.
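The body of equation (1) does not appear in this text. Based only on the description above (a difference between the stored score and the score after replacement, with an absolute value), one plausible reconstruction is the following; the exact typeset form in the original publication may differ.

```latex
\Delta S_{c,u} = \left| S_c - S_{c,u} \right| \tag{1}
```

With the values given for FIG. 5 (a score of 10.5 before replacement and 8.5 after), this form yields a change of 2.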

In the second calculation process, the detection unit 13 obtains, as the change information, the correlation coefficient between the output score information of the DNN for the evaluation data when the unit state information corresponding to the specific unit ID is replaced with the additional unit information and the additional information used to generate that additional unit information. In this case, the processing unit 12 also needs to output the additional information to the detection unit 13. Specifically, for each visualization target unit (or for some of them), the processing unit 12 and the detection unit 13 repeat the following processing. The processing unit 12 adds additional information to the unit state information of the visualization target unit to generate additional unit information, and the detection unit 13 calculates the output score information of the DNN using the additional unit information in place of the unit state information of the visualization target unit. The detection unit 13 then calculates the correlation coefficient as the change information by evaluating equation (2) over the pairs of each calculated output score information and the additional information used to calculate it.

In equation (2), N is the number of repetitions (the number of pairs). S_{c,u,i} is the output score information of category c when the unit state information of unit ID = u is replaced with the additional unit information generated by the i-th execution of the prescribed process. a_i is the i-th additional information.
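The body of equation (2) does not appear in this text either. If the correlation coefficient is taken to be the ordinary Pearson coefficient over the N pairs, and if each a_i is treated as a scalar (for a feature map, some scalar summary of the noise matrix would be needed; both points are assumptions not stated in the description), it could be written as follows, where the barred quantities are the means over i = 1, ..., N.

```latex
\Delta S_{c,u} =
\frac{\sum_{i=1}^{N} \left( S_{c,u,i} - \bar{S}_{c,u} \right) \left( a_i - \bar{a} \right)}
     {\sqrt{\sum_{i=1}^{N} \left( S_{c,u,i} - \bar{S}_{c,u} \right)^{2}}\,
      \sqrt{\sum_{i=1}^{N} \left( a_i - \bar{a} \right)^{2}}}
\tag{2}
\qquad
\bar{S}_{c,u} = \frac{1}{N} \sum_{i=1}^{N} S_{c,u,i}, \quad
\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i
```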

How the change information is obtained when the processing unit 12 performs the first process described above is explained with reference to FIG. 5. In FIG. 5, units contained in the Convolution1 layer 501 and the Convolution2 layer 502 of the DNN (units 511 and 512, respectively) are set as visualization target units. In this case, the processing unit 12 generates a unit (additional unit information) 531 that has the same size as the unit 511 and whose elements are all 0, and a unit (additional unit information) 532 that has the same size as the unit 512 and whose elements are all 0.

In FIG. 5, the detection unit 13 obtains 8.5 as the output score information of the DNN when the unit 511 is replaced with the unit 531 (the output score information of the DNN when the unit 512 is replaced with the unit 532). The output score information of the DNN before the replacement is 10.5 in FIG. 5. The change information is therefore 2. This calculation of the change information is performed for each category.

In this way, the processing unit 12 and the detection unit 13 can calculate, for each visualization target unit, the change information for each category. That is, the following series of processing is performed for each visualization target unit: the output score information for each category of the DNN in which the unit state information of the visualization target unit has been replaced with the corresponding additional unit information is calculated, and the change information relative to the output score information for each category of the DNN before the replacement is obtained.

The detection unit 13 then outputs to the selection unit 14, for each category ID, sets of the specific unit ID, the change information, and the unit state information. That is, the detection unit 13 outputs to the selection unit 14 the set of change information ΔS_{c,u} for each category c calculated by equation (1), equation (2), or the like.

Based on the input change information, the selection unit 14 selects, for each input category ID, the unit IDs of units that contribute strongly to recognition. To do so, the selection unit 14 selects, for each category ID, unit IDs with large change information values as the unit IDs of strongly contributing units. Specifically, for example, the selection unit 14 selects, for each category ID, all unit IDs whose change information is equal to or greater than a threshold. Alternatively, the selection unit 14 selects, for each category ID, a prescribed number of unit IDs in descending order of change information value. The selection unit 14 then outputs to the visualization unit 15, for each category ID, the pairs of selected unit ID and change information. The selection unit 14 may also select units that contribute to recognition across all categories rather than per category. For example, the selection unit 14 obtains a statistic such as the mean, sum, or maximum of the change information of a specific unit ID over all categories, and selects units for which that statistic is large.
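The three selection rules described above (threshold, top-k, and a statistic over all categories) can be sketched as follows. The function names and the example change values are made up for illustration; the rules themselves follow the description.

```python
def select_by_threshold(changes, threshold):
    """Unit IDs whose change information is at least `threshold`."""
    return [unit_id for unit_id, delta in changes.items() if delta >= threshold]


def select_top_k(changes, k):
    """The `k` unit IDs with the largest change information, largest first."""
    return sorted(changes, key=changes.get, reverse=True)[:k]


def aggregate_over_categories(changes_by_category, stat=max):
    """Combine each unit's change information over all categories with `stat`."""
    per_unit = {}
    for changes in changes_by_category.values():
        for unit_id, delta in changes.items():
            per_unit.setdefault(unit_id, []).append(delta)
    return {unit_id: stat(values) for unit_id, values in per_unit.items()}


# category ID -> unit ID -> change information (made-up values)
changes_by_category = {
    "C01": {"F02001": 2.0, "F02002": 0.3, "F06001": 1.1},
    "C02": {"F02001": 0.1, "F02002": 0.9, "F06001": 0.2},
}
per_category_threshold = select_by_threshold(changes_by_category["C01"], 1.0)
per_category_top1 = select_top_k(changes_by_category["C02"], 1)
overall_top2 = select_top_k(aggregate_over_categories(changes_by_category, stat=sum), 2)
```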

The threshold compared with the change information and the number of unit IDs to be selected can be set, for example, by a person adjusting a numerical value displayed on the display unit DS of the terminal device 100. The operation detection unit OP detects the operation by which the person changes the numerical value, and outputs the numerical value and operation information to the recognition learning device 10. On receiving the numerical value and the operation information from the terminal device 100, the recognition learning device 10 stores the numerical value, as the threshold or as the number of unit IDs to be selected, in a memory (not shown) within the recognition learning device 10.

The visualization unit 15 generates, as unit visualization information, information for visualizing the unit corresponding to each unit ID received from the selection unit 14. Specifically, the visualization unit 15 reads from the storage unit M1 the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID, and generates the unit visualization information based on them. For example, as described in Non-Patent Document 1, a method can be used in which the unit state information is returned to the input layer by successively applying the inverse transformations of the lower pooling layers and convolution layers. This makes it possible to identify, on the image serving as the evaluation data, the object (feature) corresponding to the visualization target unit. The unit visualization information is information indicating the region of the identified object (feature) on the image and the graphical object to be placed in that region.

The visualization unit 15 then transmits to the terminal device 100 the unit IDs and change information received from the selection unit 14, the layer IDs corresponding to those unit IDs, the category IDs, and the unit visualization information.

The display unit DS of the terminal device 100 displays the GUI (graphical user interface) illustrated in FIG. 6. In this GUI, DS1 is the image held by the terminal device 100 as the evaluation data. DS2 is a display area that shows the list of unit IDs received from the visualization unit 15 together with the layer IDs. DS3 is a display area that shows the list of category IDs received from the visualization unit 15. Suppose that, as illustrated in FIG. 7, the operation detection unit OP detects that one of the unit IDs listed in DS2 has been designated by a user operation US1, and that one of the category IDs listed in DS3 has been designated by a user operation US2. Then, as shown in FIG. 7, the display unit DS of the terminal device 100 displays, as the contribution DS102, the change information corresponding to the designated unit ID and category ID among the change information received from the recognition learning device 10. The display unit DS further displays the object DS101 indicated by the unit visualization information corresponding to the designated unit ID, within the region indicated by that unit visualization information (the head region). Both the contribution DS102 and the object DS101 are displayed superimposed on the image serving as the evaluation data. However, the layout of the GUI is not limited to that shown in FIG. 7, and the contribution DS102 and the object DS101 need not be superimposed on the image serving as the evaluation data. As the information representing the contribution of the designated unit, the change information need not be displayed as is; it may be normalized to a value of suitable magnitude, expressed as a level for each predetermined range, or presented as a graph.

Next, the operation of the recognition learning system 1 described above is explained with reference to the flowchart of FIG. 8. FIG. 8 is a flowchart showing an example of visualizing the features that contribute to the recognition processing in the DNN. Since the details of each process shown in FIG. 8 are as described above, only a brief explanation is given below.

First, the display unit DS of the terminal device 100 displays a list of evaluation data (V101). The list of evaluation data may be, for example, a list of image thumbnails or a list of video previews. When the operation detection unit OP detects that the user has selected one item from the list of evaluation data and has entered an "execute visualization" instruction, the terminal device 100 transmits the evaluation data selected from the list to the recognition learning device 10 (V102). The processing unit 12 of the recognition learning device 10 receives the evaluation data transmitted from the terminal device 100 (V102).

Next, the processing unit 11 of the recognition learning device 10 reads from the storage unit M1 the category IDs of the categories to be recognized by the DNN and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID (V103).

Next, based on the structural information thus read, the processing unit 11 applies the processing parameter information of each layer to the evaluation data received from the terminal device 100, in order from the lowest layer to the highest layer, and obtains the output score information for each category (V104).

Of the outputs (output score information) from the top layer of the DNN, the processing unit 11 stores the output score information corresponding to each category ID read from the storage unit M1 in the storage unit M2 in association with that category ID (V105). The processing unit 11 further stores the unit state information of each unit in the storage unit M2 in association with the layer ID of the layer to which the unit belongs and the unit ID of the unit (V105). The processing unit 11 then outputs a trigger to the processing unit 12.

Next, the processing unit 12 initializes to 0 the counter variable i used to count the visualization target units (V106). The processing unit 12 further reads from the storage unit M1 the category IDs of the recognition targets and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID (V106). The processing unit 12 also reads from the storage unit M2 the output score information associated with each category ID and the unit state information associated with each layer ID and unit ID (V106).

Let N (N is an integer of 2 or more) be the number of specific unit IDs among the unit IDs that were read. The processing unit 12 generates additional unit information by performing the prescribed process on the unit state information corresponding to the i-th specific unit ID (V107). The processing unit 12 then outputs to the detection unit 13 the i-th specific unit ID and the additional unit information generated for it (V107).

In the same manner as the processing unit 11, the detection unit 13 calculates the output score information of each recognition target category of the DNN for the evaluation data, except that it uses the additional unit information in place of the unit state information corresponding to the i-th specific unit ID (V108).

The detection unit 13 then obtains, for each category, the change between the output score information calculated in V108 and the output score information stored in the storage unit M2 (V109). The detection unit 13 then increments the counter variable i by 1 (V110). If the incremented value of i is N or more, the processing proceeds via V111 to V112; if it is less than N, the processing returns via V111 to V107.
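The loop of steps V106 to V111 can be sketched as follows, using the absolute difference as the change measure. The callback `score_with_replacement` is a hypothetical stand-in for steps V107 and V108 (generating the additional unit information and rerunning the DNN), and the scores for "F02002" are made up; the values 10.5, 3.8, and 8.5 are taken from the FIG. 4 and FIG. 5 examples.

```python
def compute_changes(target_unit_ids, base_scores, score_with_replacement):
    """One perturbed run per visualization target unit (steps V106 to V111).

    base_scores            : category ID -> score stored in M2
    score_with_replacement : hypothetical callback, unit ID -> {category ID: score}
    """
    changes = {}
    i = 0                                                # V106: initialize the counter
    n = len(target_unit_ids)
    while i < n:                                         # V111: repeat until i reaches N
        unit_id = target_unit_ids[i]
        new_scores = score_with_replacement(unit_id)     # V107 and V108
        changes[unit_id] = {                             # V109: per-category change
            category: abs(base_scores[category] - new_scores[category])
            for category in base_scores
        }
        i += 1                                           # V110: increment the counter
    return changes


base_scores = {"C01": 10.5, "C02": 3.8}
fake_runs = {
    "F02001": {"C01": 8.5, "C02": 3.8},
    "F02002": {"C01": 10.0, "C02": 3.0},  # made-up scores
}
changes = compute_changes(["F02001", "F02002"], base_scores, fake_runs.__getitem__)
```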

For each category ID, the selection unit 14 selects the unit IDs of units that contribute strongly to recognition, and outputs to the visualization unit 15 the pairs of selected unit ID and change information (V112).

The visualization unit 15 generates, as unit visualization information, information for visualizing the unit corresponding to each unit ID received from the selection unit 14 (V113). The visualization unit 15 then transmits to the terminal device 100 the unit IDs and change information received from the selection unit 14, the layer IDs corresponding to those unit IDs, the category IDs, and the unit visualization information (V113).

The display unit DS of the terminal device 100 displays the image held by the terminal device 100 as the evaluation data, the list of unit IDs and the layer IDs received from the visualization unit 15, and the list of category IDs received from the visualization unit 15. Suppose that, in this state, the user designates a unit ID and a category ID on the GUI. The display unit DS then displays, superimposed on the evaluation data, the contribution represented by the change information corresponding to the designated unit ID and category ID, and the object indicated by the unit visualization information corresponding to the designated unit ID (V114).

このように、本実施形態によれば、評価用データに対して、ＤＮＮの認識に寄与する特徴マップまたはニューロンなどのユニットの情報を可視化することが出来る。然るに利用者は、学習データ特有の特徴量など不要な特徴量が認識に利用されていないかを確認することができる。そして、もしＤＮＮが不要な特徴量を認識に用いていることが分かった場合、利用者は該特徴量を含むデータを学習データから削除して、ＤＮＮを再学習することができる。これにより、利用者は不要な特徴量を用いないＤＮＮを獲得することができる。 As described above, according to the present embodiment, information on the units, such as feature maps or neurons, that contribute to the DNN's recognition of the evaluation data can be visualized. The user can therefore check whether unnecessary features, such as features peculiar to the learning data, are being used for recognition. If it turns out that the DNN uses an unnecessary feature for recognition, the user can delete the data containing that feature from the learning data and relearn the DNN. As a result, the user can obtain a DNN that does not use unnecessary features.

また、本実施形態では、出力スコア情報の変化を検出する際に、既に計算した各ユニットの状態を再利用する。これにより、認識に寄与するユニットを高速に求めることができる。特に上位の階層におけるユニットほど再利用できる下位層のユニットがより多いため、より高速に求めることができる。そのため、利用者は、より多くの評価用データを用いてＤＮＮの認識に寄与する特徴量を確認することができる。 Further, in the present embodiment, when detecting a change in the output score information, the already calculated state of each unit is reused. As a result, a unit that contributes to recognition can be obtained at high speed. In particular, the higher the level, the more units in the lower layer that can be reused, so it can be obtained at higher speed. Therefore, the user can confirm the feature amount that contributes to the recognition of DNN by using more evaluation data.
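The reuse of already computed unit states can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the tiny fully connected network, the function names, the choice of zeroing a unit as the "specified processing", and the sign convention (baseline score minus perturbed score) are all assumptions made for the example. The point it shows is that, after one full forward pass, perturbing a unit in layer k only requires recomputing the layers above k.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def forward_from(layers, start, state):
    """Run layers[start:] on `state`; the final layer is followed by softmax."""
    for k in range(start, len(layers)):
        w, b = layers[k]
        state = dense(w, b, state)
        if k < len(layers) - 1:
            state = relu(state)
    return softmax(state)

def score_changes(layers, x):
    """Return (baseline scores, {(layer, unit): per-category score change}).

    Assumes at least one hidden layer, i.e. len(layers) >= 2.
    """
    # One full pass: cache the output state of every hidden layer.
    cache = []
    state = x
    for w, b in layers[:-1]:
        state = relu(dense(w, b, state))
        cache.append(state)
    baseline = forward_from(layers, len(layers) - 1, cache[-1])
    changes = {}
    for k, cached in enumerate(cache):
        for u in range(len(cached)):
            perturbed = list(cached)
            perturbed[u] = 0.0  # assumed "specified processing": zero the unit
            # Layers 0..k are not recomputed; only layers above k are run.
            scores = forward_from(layers, k + 1, perturbed)
            changes[(k, u)] = [b0 - s for b0, s in zip(baseline, scores)]
    return baseline, changes
```

The higher the perturbed layer, the fewer layers `forward_from` has to run, which matches the observation above that units in upper layers are evaluated faster.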

なお、本実施形態では、各ユニットに対する出力スコア情報の独立的な変化に基づき、認識に寄与するユニットを選択する場合について説明した。しかしながら、これらの一連の処理を、複数のユニットの共起性を考慮して行ってもよい。例えば、以下の文献にて記載されているＦｏｒｗａｒｄＳｅｌｅｃｔｉｏｎまたはＢａｃｋｗａｒｄＳｅｌｅｃｔｉｏｎを用いて、近似的に出力スコアの変化を最大化するユニットの組み合わせを選択してもよい。
In this embodiment, the case has been described in which the units that contribute to recognition are selected based on the independent change in the output score information for each unit. However, this series of processes may also take the co-occurrence of a plurality of units into account. For example, Forward Selection or Backward Selection, described in the following document, may be used to select a combination of units that approximately maximizes the change in the output score.
・ Feature Selection for Reinforcement Learning: Evaluating Implicit State-Reward Dependency via Conditional Mutual Information, H. Hachiya & M. Sugiyama, ECML2010
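A greedy forward selection over units can be sketched as below. This is a generic illustration of the technique, not the procedure of the cited paper: `joint_change` is a hypothetical callback that returns the change in output score when all units in the given list are perturbed together, and the unit IDs and scores in the usage note are made up.

```python
def forward_selection(unit_ids, joint_change, k):
    """Greedily grow a set of up to k units, adding at each step the unit
    that gives the largest joint change together with those already chosen."""
    selected = []
    remaining = list(unit_ids)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda u: joint_change(selected + [u]))
        selected.append(best)
        remaining.remove(best)
    return selected

def toy_joint_change(subset):
    # Hypothetical scores: F04002 and F04003 co-occur and add a joint bonus.
    single = {"F04001": 0.30, "F04002": 0.20, "F04003": 0.25}
    bonus = 0.40 if "F04002" in subset and "F04003" in subset else 0.0
    return sum(single[u] for u in subset) + bonus
```

With the toy scores, `forward_selection(["F04001", "F04002", "F04003"], toy_joint_change, 2)` picks `F04001` first and then `F04003`, although the pair `F04002` and `F04003` has the larger joint change. This is why such a greedy search maximizes the change only approximately.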

［第２の実施形態］ [Second Embodiment]
本実施形態を含め、以降の各実施形態では、第１の実施形態との差分について重点的に説明し、以下で特に触れない限りは、第１の実施形態と同様であるものとする。本実施形態に係る認識学習システム１ａの構成例について、図９を用いて説明する。本実施形態に係る認識学習システム１ａは、学習済みのＤＮＮが不要な特徴量を認識に用いていないかを利用者が確認し、もし用いられている場合は該特徴量の重要度を低く設定して認識器を再学習させる構成を有する。つまり、可視化された特徴量に対する利用者からのフィードバックを示す操作情報に基づいて、認識学習装置１０ａがＤＮＮを再学習する点において、第１の実施形態と異なる。 In this and each subsequent embodiment, the description focuses on the differences from the first embodiment; unless otherwise noted below, the configuration is the same as in the first embodiment. A configuration example of the recognition learning system 1a according to the present embodiment will be described with reference to FIG. 9. In the recognition learning system 1a, the user checks whether the trained DNN uses unnecessary features for recognition and, if it does, sets a low importance for those features and has the recognizer relearned. That is, this embodiment differs from the first in that the recognition learning device 10a relearns the DNN based on operation information representing the user's feedback on the visualized features.

本実施形態に係る認識学習システム１ａは認識学習装置１０ａと端末装置１００ａとを有しており、認識学習装置１０ａと端末装置１００ａとの間は第１の実施形態と同様、有線や無線等のネットワークを介して互いにデータ通信が可能なように構成されている。 The recognition learning system 1a according to the present embodiment has a recognition learning device 10a and a terminal device 100a, and between the recognition learning device 10a and the terminal device 100a, as in the first embodiment, a wired or wireless connection is used. It is configured to allow data communication with each other via a network.

端末装置１００ａの操作検出部ＯＰは、第１の実施形態と同様に利用者の表示部ＤＳに対する操作情報を検知するのである。本実施形態では更に操作検出部ＯＰは、後述する重要度情報の設定指示や、ＤＮＮの再学習の実行指示を検知する。 The operation detection unit OP of the terminal device 100a detects the operation information for the display unit DS of the user as in the first embodiment. In the present embodiment, the operation detection unit OP further detects an instruction for setting importance information, which will be described later, and an instruction for executing DNN re-learning.

本実施形態では、表示部ＤＳは、図７のＧＵＩの代わりに、図１０に例示するＧＵＩを表示する。図１０のＧＵＩでは、ＤＳ２からユニットＩＤとして「Ｆ０４００１」、ＤＳ３からカテゴリＩＤとして「Ｃ０２」が選択されている。その結果、背景にある建物（ユニット可視化情報が示す領域）にオブジェクトＤＳ１０１（ユニット可視化情報が示すオブジェクト）が重畳されて表示されている。また図１０のＧＵＩでは、表示領域ＤＳ４内にオブジェクトＤＳ１０１で示したユニット（特徴量）に対する利用者からのフィードバック操作ＵＳ３を取得するための重要度のプルダウンメニューＤＳ４０１と再学習の実行ボタンＤＳ４０２とが表示されている。プルダウンメニューＤＳ４０１は、指示することで複数の重要度（例えば０〜１の間の実数値で、値が大きいほどより高い重要度を表し、値が小さいほどより低い重要度を表す）の一覧を表示するので、利用者は一覧から１つの重要度を選択指示することができる。実行ボタンＤＳ４０２は、指示することで認識学習装置１０ａに対して再学習を指示することができる。 In the present embodiment, the display unit DS displays the GUI illustrated in FIG. 10 instead of the GUI of FIG. 7. In the GUI of FIG. 10, "F04001" is selected as the unit ID from DS2 and "C02" as the category ID from DS3. As a result, the object DS101 (the object indicated by the unit visualization information) is superimposed on the building in the background (the area indicated by the unit visualization information). The GUI of FIG. 10 also displays, in the display area DS4, an importance pull-down menu DS401 and a relearning execution button DS402 for acquiring the user's feedback operation US3 on the unit (feature) shown by the object DS101. When operated, the pull-down menu DS401 lists a plurality of importance values (for example, real values between 0 and 1, where a larger value represents a higher importance and a smaller value a lower importance), and the user can select one importance from the list. Operating the execution button DS402 instructs the recognition learning device 10a to relearn.

操作検出部ＯＰは、利用者によるプルダウンメニューＤＳ４０１や実行ボタンＤＳ４０２に対する操作を示す操作情報を検知する。操作情報が「プルダウンメニューＤＳ４０１を用いた重要度の入力」である場合には、端末装置１００ａは、入力された重要度を示す重要度情報を、オブジェクトＤＳ１０１に対応する可視化対象ユニットのユニットＩＤに関連付けて記憶する。一方、操作情報が「実行ボタンＤＳ４０２の指示」である場合には、端末装置１００ａは、記憶している重要度情報と、該重要度情報と関連づけて記憶されているユニットＩＤと、を再学習の実行指示と共に認識学習装置１０ａに対して送信する。なお、利用者がプルダウンメニューＤＳ４０１を用いて重要度を設定していない場合には、デフォルトの重要度を示す重要度情報が送信されることになる。このデフォルトの重要度については特定の重要度に限らないが、例えば１である。また、このデフォルトの重要度は、認識学習装置１０ａからユニットＩＤに関連付けられて入力した変化情報の値が設定されてもよい。 The operation detection unit OP detects operation information indicating the user's operation on the pull-down menu DS401 or the execution button DS402. When the operation information is "input of importance using the pull-down menu DS401", the terminal device 100a stores importance information indicating the input importance in association with the unit ID of the visualization target unit corresponding to the object DS101. When the operation information is "instruction of the execution button DS402", the terminal device 100a transmits the stored importance information and the unit ID stored in association with it to the recognition learning device 10a, together with a relearning execution instruction. If the user has not set an importance using the pull-down menu DS401, importance information indicating a default importance is transmitted. The default importance is not limited to a specific value, but is 1, for example. Alternatively, the value of the change information received from the recognition learning device 10a in association with the unit ID may be set as the default importance.

一方、認識学習装置１０ａの再学習部１６は、端末装置１００ａから再学習の実行指示を受けると、学習データを用いて、重要度情報に基づきＤＮＮを学習する。具体的には、端末装置１００ａからユニットＩＤと重要度情報との組みを入力したことに応じて再学習部１６は記憶部Ｍ１から、ＤＮＮが認識対象とするカテゴリＩＤと、各階層ＩＤに関連付けられた下階層ＩＤ、上階層ＩＤ、処理パラメータ情報、を読み込む。そして再学習部１６は、記憶部Ｍ１から読み込んだＤＮＮの構造情報と、端末装置１００ａから受信した重要度情報と、に基づく重要度付き学習方法を用いて、学習データに対するＤＮＮの識別誤差を最小化するように処理パラメータ情報を更新する。ここで、更新が行われる処理パラメータ情報は、例えば、Ｃｏｎｖｏｌｕｔｉｏｎ処理やＩｎｎｅｒＰｒｏｄｕｃｔ処理の重み係数とバイアス項の値である。この学習データは、画像や映像などの入力データと入力データが属するカテゴリＩＤの複数の組から成るデータであり、予め作成されたものである。また、この重要度付き学習方法には、例えば、次の２つの学習方法がある。 Meanwhile, on receiving a relearning execution instruction from the terminal device 100a, the re-learning unit 16 of the recognition learning device 10a trains the DNN on the learning data based on the importance information. Specifically, in response to receiving a pair of unit ID and importance information from the terminal device 100a, the re-learning unit 16 reads from the storage unit M1 the category IDs that the DNN recognizes and the lower layer ID, upper layer ID, and processing parameter information associated with each layer ID. Then, using an importance-weighted learning method based on the DNN structure information read from the storage unit M1 and the importance information received from the terminal device 100a, the re-learning unit 16 updates the processing parameter information so as to minimize the identification error of the DNN on the learning data. The processing parameter information that is updated is, for example, the weight coefficients and bias terms of the Convolution processing and the InnerProduct processing. The learning data is created in advance and consists of a plurality of pairs of input data, such as an image or video, and the category ID to which the input data belongs. There are, for example, the following two importance-weighted learning methods.

第１の学習方法として、再学習部１６は、端末装置１００ａから受信したユニットＩＤと重要度情報とに基づき、記憶部Ｍ１から読み込んだＤＮＮの構造情報の各ユニットのドロップアウトする割合を設定する。ドロップアウトとは上記の非特許文献３にて提案されているように、学習過程の各反復においてランダムに選んだユニットを一時的にネットワークから切り離す処理のことで、ドロップアウトされたユニットに係る処理パラメータ情報は、該反復において更新が行われない。 As the first learning method, the re-learning unit 16 sets the dropout ratio of each unit in the DNN structure information read from the storage unit M1, based on the unit ID and importance information received from the terminal device 100a. As proposed in Non-Patent Document 3 above, dropout is a process that temporarily disconnects randomly selected units from the network in each iteration of the learning process; the processing parameter information of a dropped-out unit is not updated in that iteration.

このドロップアウトが行われる各ユニットの割合は、通常は固定の０．５（上記文献）などに設定されるが、この第１の学習方法では、該割合を、以下の式（３）のように、入力した重要度情報に基づき設定される。 The ratio of each unit to which this dropout is performed is usually set to a fixed value of 0.5 (above document) or the like, but in this first learning method, the ratio is set as the following equation (3). Is set based on the input importance information.
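The image of equation (3) is not reproduced in this text. One form that agrees with the values stated in the description (importance 1 gives a ratio of 0.5, importance 0.1 gives 0.95) and with the ratios 0.5, 0.95 and 0.7 shown for FIG. 11 is the linear rule below. It is an assumed reconstruction, not necessarily the published formula.

```latex
r = 1 - \frac{I}{2}, \qquad 0 \le I \le 1 \tag{3}
```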

式（３）においてｒはドロップアウトの割合で、Ｉは重要度情報が表す重要度である。例えば、重要度Ｉが１の場合は、ドロップアウトの割合は通常の割合０．５に設定される。しかし、重要度Ｉが０．１のユニットについては、ドロップアウトの割合は通常の割合より高い値、例えば、０．９５に設定される。これにより、重要度が低いユニットは、高い頻度でドロップアウトが行われるため、該ユニットの処理パラメータ情報は更新が行われにくくなる。そのため、該ユニットの認識への寄与は相対的に小さくなる。 In equation (3), r is the dropout rate and I is the importance represented by the importance information. For example, if the importance I is 1, the dropout rate is set to the normal rate of 0.5. However, for units of importance I of 0.1, the dropout rate is set to a higher value than the normal rate, for example 0.95. As a result, the unit of low importance is dropped out frequently, so that the processing parameter information of the unit is difficult to be updated. Therefore, the contribution of the unit to recognition is relatively small.
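A per-unit dropout ratio of this kind can be sketched as follows. The linear rule `r = 1 - I / 2` used here is an assumption: it is chosen only because it reproduces the two values stated above (0.5 for importance 1, 0.95 for importance 0.1). The function names and unit IDs are illustrative.

```python
import random

def dropout_rate(importance):
    # Assumed rule r = 1 - I/2; reproduces r = 0.5 at I = 1 and r = 0.95 at I = 0.1.
    return 1.0 - importance / 2.0

def sample_keep_mask(importances, rng):
    """For one learning iteration, return {unit_id: True if the unit is kept}.

    A unit is dropped with probability dropout_rate(importance), so a unit of
    low importance takes part in, and is updated by, fewer iterations.
    """
    return {uid: rng.random() >= dropout_rate(imp)
            for uid, imp in importances.items()}
```

For example, with `importances = {"F04001": 1.0, "F04002": 0.1}` and `rng = random.Random(0)`, repeated calls keep `F04001` in roughly half of the iterations and `F04002` in roughly one iteration in twenty.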

各ユニットに設定されたドロップアウトの割合の一例を図１１に示す。図１１では、Ｃｏｎｖｏｌｕｔｉｏｎ２層１２０１の特徴マップ１２０２および特徴マップ１２０３のそれぞれにドロップアウトの割合０．５および０．９５が設定されている。また、図１１では、Ｉｎｎｅｒｐｒｏｄｕｃｔ層１２０４のニューロン１２０５のドロップアウトの割合が０．７に設定されている。 FIG. 11 shows an example of the dropout ratio set for each unit. In FIG. 11, the dropout ratios of 0.5 and 0.95 are set for the feature map 1202 and the feature map 1203 of the Convolution 2 layer 1201, respectively. Also, in FIG. 11, the dropout rate of neuron 1205 in Inner product layer 1204 is set to 0.7.

第２の学習方法として、再学習部１６は、端末装置１００ａから受信したユニットＩＤと重要度情報とに基づく罰則項を、以下の式（４）のように最小化を行う識別誤差に付加する。 As the second learning method, the re-learning unit 16 adds a penalty term based on the unit ID and importance information received from the terminal device 100a to the identification error to be minimized, as in the following equation (4).
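The image of equation (4) is not reproduced in this text. Given the symbols the description defines for it (a parameter vector θ, an identification error E(θ), a balancing coefficient λ, and a diagonal matrix U holding the reciprocals of the importances), a natural form is the quadratic penalty below. This is an assumed reconstruction, not necessarily the published formula.

```latex
\min_{\theta} \; E(\theta) + \lambda\, \theta^{\top} U\, \theta \tag{4}
```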

式（４）においてθはＤＮＮの各ユニットの処理パラメータ情報を要素に持つベクトル、Ｅ（θ）は学習データに対するＤＮＮの識別誤差、λは誤差と重要度の罰則項のバランスを取るための係数、Ｕは各ユニットの重要度の逆数を対角成分にもつ行列である。例えば、ｉ番目のユニットの重要度が０．５の場合は、行列Ｕの要素Ｕ_ｉｉは２となる。ここで、重要度が低いユニットほど、ユニットの処理パラメータ情報に対する罰則が強くなるため、式（４）を最小化するように学習したＤＮＮは、より重要度の低いユニットを使わないように学習される。 In equation (4), θ is a vector whose elements are the processing parameter information of each unit of the DNN, E(θ) is the identification error of the DNN on the learning data, λ is a coefficient that balances the error against the importance penalty term, and U is a matrix whose diagonal components are the reciprocals of the importance of each unit. For example, when the importance of the i-th unit is 0.5, the element U_ii of the matrix U is 2. Because a unit with lower importance incurs a stronger penalty on its processing parameter information, a DNN trained to minimize equation (4) learns not to use units of lower importance.
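The importance penalty can be sketched as below, assuming the penalized objective has the quadratic form E(θ) + λ θᵀUθ with U diagonal. That form, the function name, and the mapping `unit_of_param` (which says to which unit each parameter belongs, so that every parameter of a unit shares that unit's diagonal entry) are assumptions made for illustration.

```python
def penalized_loss(error, theta, unit_of_param, importance, lam):
    """Return error + lam * sum_j U_jj * theta_j**2, where U_jj is the
    reciprocal of the importance of the unit that owns parameter j.

    Importances must be strictly positive, since their reciprocal is used.
    """
    penalty = sum(
        (1.0 / importance[unit_of_param[j]]) * t * t
        for j, t in enumerate(theta)
    )
    return error + lam * penalty
```

With `theta = [1.0, 2.0]` owned by units of importance 1.0 and 0.5, the diagonal entries are 1 and 2, so the penalty is 1·1 + 2·4 = 9. The parameter of the less important unit is penalized twice as strongly per unit of squared magnitude.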

なお、詳細は省くが、第１および第２の学習方法において、各階層の処理パラメータ情報は、最初に初期化された後、識別誤差を最小化するようにする。そのために、ＳｔｏｃｈａｓｔｉｃＧｒａｄｉｅｎｔＤｅｓｃｅｎｔ（ＳＧＤ）やＡｄａＤｅｌｔａ（Y. Jia et al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014）などの勾配法が用いられる。 Although details are omitted, in both the first and second learning methods the processing parameter information of each layer is first initialized and then updated so as to minimize the identification error. A gradient method such as Stochastic Gradient Descent (SGD) or AdaDelta (Y. Jia et al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014) is used for this purpose.

そして、再学習部１６は、更新した処理パラメータ情報を、記憶部Ｍ１に階層ＩＤに関連付けて記憶させる。これにより、記憶部Ｍ１に格納されているＤＮＮの構造情報が再学習により更新されたことになる。 Then, the re-learning unit 16 stores the updated processing parameter information in the storage unit M1 in association with the layer ID. As a result, the structural information of the DNN stored in the storage unit M1 is updated by re-learning.

次に、本実施形態に係る認識学習システム１ａの動作について、図１２のフローチャートを用いて説明する。図１２において図８に示した処理ステップと同じ処理ステップには同じステップ番号を付しており、該処理ステップに係る説明は省略する。 Next, the operation of the recognition learning system 1a according to the present embodiment will be described with reference to the flowchart of FIG. 12. In FIG. 12, processing steps that are the same as those shown in FIG. 8 are given the same step numbers, and their description is omitted.

Ｖ１１４の処理の後、利用者が「プルダウンメニューＤＳ４０１を用いた重要度の入力」を行ったとする。このとき、端末装置１００ａは、入力された重要度を示す重要度情報を、オブジェクトＤＳ１０１に対応する可視化対象ユニットのユニットＩＤに関連付けて記憶する（Ｆ１０１）。一方、利用者が「実行ボタンＤＳ４０２の指示」を行った場合には、端末装置１００ａは、重要度情報と、該重要度情報と関連づけて記憶されているユニットＩＤと、を認識学習装置１０ａに対して送信する（Ｆ１０１）。 It is assumed that the user performs "input of importance using the pull-down menu DS401" after the processing of V114. At this time, the terminal device 100a stores the input importance information indicating the importance in association with the unit ID of the visualization target unit corresponding to the object DS101 (F101). On the other hand, when the user gives an "instruction of the execution button DS402", the terminal device 100a transmits the importance information and the unit ID stored in association with the importance information to the recognition learning device 10a. (F101).

次に、再学習部１６は、端末装置１００ａから受信したユニットＩＤと重要度情報とに基づき、記憶部Ｍ１から読み込んだＤＮＮの構造情報の各ユニットのドロップアウトする割合を設定する（Ｆ１０２）。次に、再学習部１６は、処理パラメータ情報を初期化した後、識別誤差を最小化するようにＳＧＤやＡｄａＤｅｌｔａなどの勾配法を用いて、処理パラメータ情報を更新する（Ｆ１０３）。次に、再学習部１６は、Ｆ１０３で更新した処理パラメータ情報を、対応する階層ＩＤと関連付けて記憶部Ｍ１に記憶させる（Ｆ１０４）。 Next, the re-learning unit 16 sets the dropout ratio of each unit of the DNN structural information read from the storage unit M1 based on the unit ID received from the terminal device 100a and the importance information (F102). Next, the re-learning unit 16 initializes the processing parameter information, and then updates the processing parameter information by using a gradient method such as SGD or AdaDelta so as to minimize the identification error (F103). Next, the re-learning unit 16 stores the processing parameter information updated in F103 in the storage unit M1 in association with the corresponding hierarchical ID (F104).

このように、本実施形態によれば、第１の実施形態に係る効果に加え、もしＤＮＮが不要な特徴量を認識に用いていることが分かった場合、利用者は該特徴量に対して低い重要度を設定して、ＤＮＮを再学習することができる。これにより、利用者は直感的および簡単な操作で不要な特徴量を用いないＤＮＮを獲得することができる。 As described above, according to the present embodiment, in addition to the effects of the first embodiment, if it turns out that the DNN uses an unnecessary feature for recognition, the user can set a low importance for that feature and relearn the DNN. As a result, the user can obtain, through intuitive and simple operations, a DNN that does not use unnecessary features.

［第３の実施形態］ [Third Embodiment]
本実施形態に係る認識学習システム１ｂの構成例について、図１３を用いて説明する。本実施形態に係る認識学習システム１ｂは、利用者が用意した評価用データの認識において寄与度の低い特徴マップおよびニューロンを選定し、ＤＮＮから削除する構成を有する。ここで、評価用データは、例えば、ある特定のドメインの複数の画像または複数のクリップから構成される映像である。ドメインとは、本システムが利用されると想定される環境であり、例えば、介護施設、一般家庭、公共施設の駅や市街、店舗などである。 A configuration example of the recognition learning system 1b according to the present embodiment will be described with reference to FIG. 13. The recognition learning system 1b selects feature maps and neurons that contribute little to the recognition of evaluation data prepared by the user, and deletes them from the DNN. Here, the evaluation data is, for example, a plurality of images of a specific domain, or video composed of a plurality of clips. A domain is an environment in which the system is expected to be used, such as a nursing care facility, a general household, a public facility such as a station or urban area, or a store.

本実施形態に係る認識学習システム１ｂは認識学習装置１０ｂと端末装置１００とを有しており、認識学習装置１０ｂと端末装置１００との間は第１の実施形態と同様、有線や無線等のネットワークを介して互いにデータ通信が可能なように構成されている。 The recognition learning system 1b according to the present embodiment has a recognition learning device 10b and a terminal device 100. As in the first embodiment, the recognition learning device 10b and the terminal device 100 are configured to be capable of data communication with each other via a wired or wireless network.

選択部１４ｂは、検出部１３から入力した変化情報に基づき、入力したカテゴリＩＤごとに、認識への寄与度が低いユニットのユニットＩＤを選択する。このユニットＩＤの選択方法として、選択部１４ｂは、カテゴリＩＤごとに、変化情報が小さいユニットＩＤを、寄与度の低いユニットのユニットＩＤとして選択する。例えば、選択部１４は、カテゴリＩＤごとに、様々な評価用データに対する変化情報の平均を各ユニットＩＤについて求め、該平均が閾値未満の変化情報を持つユニットＩＤを全て選択する。また、選択部１４ｂは、カテゴリＩＤごとに、平均の小さい順に先頭から規定数の平均に対応するユニットＩＤを選択ユニットＩＤとして選択する。そして選択部１４ｂは、カテゴリＩＤごとに、選択ユニットＩＤと変化情報との組を可視化部１５及び削除部１７に対して出力する。 The selection unit 14b selects the unit ID of the unit having a low contribution to recognition for each category ID input based on the change information input from the detection unit 13. As a method of selecting the unit ID, the selection unit 14b selects a unit ID having a small change information as a unit ID of a unit having a low contribution degree for each category ID. For example, the selection unit 14 obtains the average of the change information for various evaluation data for each unit ID for each category ID, and selects all the unit IDs having the change information whose average is less than the threshold value. Further, the selection unit 14b selects, as the selection unit ID, the unit ID corresponding to the average of the specified number from the beginning in ascending order of the average for each category ID. Then, the selection unit 14b outputs a set of the selection unit ID and the change information to the visualization unit 15 and the deletion unit 17 for each category ID.
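The two selection rules for one category ID can be sketched as follows. The data layout (one dictionary of unit ID to change value per evaluation sample) and the function names are assumptions for illustration; the change values are taken to be non-negative magnitudes.

```python
def mean_change(samples):
    """Average the change information of each unit ID over the evaluation samples."""
    collected = {}
    for sample in samples:
        for uid, change in sample.items():
            collected.setdefault(uid, []).append(change)
    return {uid: sum(v) / len(v) for uid, v in collected.items()}

def below_threshold(means, threshold):
    """Rule 1: every unit ID whose mean change is below the threshold."""
    return sorted(uid for uid, m in means.items() if m < threshold)

def k_smallest(means, k):
    """Rule 2: the k unit IDs with the smallest mean change, in ascending order."""
    return sorted(means, key=lambda uid: means[uid])[:k]
```

The threshold rule returns a variable number of units, while the fixed-count rule always returns `k` units (or all of them if fewer exist), which makes the size of the pruned network predictable.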

削除部１７は、選択ユニットＩＤに対応するユニットをＤＮＮから削除する。具体的には、選択部１４ｂから、選択ユニットＩＤと変化情報との組を入力したことに応じて、削除部１７は、記憶部Ｍ１から、ＤＮＮが認識対象とするカテゴリＩＤと、各階層ＩＤに関連付けられた下階層ＩＤ、上階層ＩＤ、処理パラメータ情報、を読み込む。そして、削除部１７は、選択部１４ｂから入力した選択ユニットＩＤに基づく更新方法で、ＤＮＮの構造情報を更新する。更新方法として、例えば、処理パラメータ情報に含まれている、選択ユニットＩＤのユニットの重み係数およびバイアス項を０にするなどして該ユニットを削除する。また、削除部１７は、選択ユニットＩＤのユニットが属する階層の処理パラメータ情報が保持するフィルタ数を、削除したユニット数に応じて減らす。そして削除部１７は、更新した構造情報を、記憶部Ｍ１に記憶させる。 The deletion unit 17 deletes the unit corresponding to the selected unit ID from the DNN. Specifically, in response to the input of the set of the selection unit ID and the change information from the selection unit 14b, the deletion unit 17 receives the category ID to be recognized by the DNN and each layer ID from the storage unit M1. Read the lower layer ID, upper layer ID, and processing parameter information associated with. Then, the deletion unit 17 updates the structural information of the DNN by an update method based on the selection unit ID input from the selection unit 14b. As an update method, for example, the unit is deleted by setting the weighting coefficient and the bias term of the unit of the selected unit ID to 0, which are included in the processing parameter information. Further, the deletion unit 17 reduces the number of filters held in the processing parameter information of the hierarchy to which the unit of the selected unit ID belongs according to the number of deleted units. Then, the deletion unit 17 stores the updated structural information in the storage unit M1.

なお、可視化部１５は、選択ユニットＩＤに対応するユニットを可視化するユニット可視化情報を生成する。そして、端末装置１００は、生成されたユニット可視化情報に基づいてオブジェクトを表示部ＤＳに表示する。これにより、利用者は、認識学習装置１０ｂにより、削除されたユニットを確認することができる。 The visualization unit 15 generates unit visualization information for visualizing the unit corresponding to the selected unit ID. Then, the terminal device 100 displays the object on the display unit DS based on the generated unit visualization information. As a result, the user can confirm the deleted unit by the recognition learning device 10b.

なお、削除部１７は、削除したユニットの重み係数およびバイアス項などの処理パラメータ情報を認識学習システム１ｂ内に保持しておいてもよい。そして、端末装置１００は、削除されたユニットのユニット可視化情報とともに、「復旧」ボタンを表示部ＤＳに表示する。そして、端末装置１００の操作検出部ＯＰが利用者によるユニット可視化情報の選択及び「復旧」ボタンに対する操作を示す操作情報を検出した場合、端末装置１００は、認識学習装置１０ｂの削除部１７に対して操作情報を送信する。削除部１７は、端末装置１００から操作情報を受信したことに応じて、自装置内に記憶しておいた、利用者が選択したユニット可視化情報に対応するユニットＩＤに対応する処理パラメータ情報を選択し、記憶部Ｍ１に、該処理パラメータ情報を追加する。これにより利用者は、認識学習装置１０ｂにより削除されたユニットを確認し、もし重要なユニットが削除されたことが分かった場合は、該ユニットをＤＮＮに復旧させることができる。 The deletion unit 17 may hold processing parameter information such as the weighting coefficient and the bias term of the deleted unit in the recognition learning system 1b. Then, the terminal device 100 displays the "recovery" button on the display unit DS together with the unit visualization information of the deleted unit. Then, when the operation detection unit OP of the terminal device 100 detects the operation information indicating the selection of the unit visualization information by the user and the operation for the "recovery" button, the terminal device 100 responds to the deletion unit 17 of the recognition learning device 10b. And send the operation information. In response to receiving the operation information from the terminal device 100, the deletion unit 17 selects the processing parameter information corresponding to the unit ID corresponding to the unit visualization information selected by the user, which is stored in the own device. Then, the processing parameter information is added to the storage unit M1. As a result, the user can confirm the deleted unit by the recognition learning device 10b, and if it is found that the important unit has been deleted, the user can restore the unit to DNN.
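Deletion with a retained backup, and the later restore, can be sketched as follows. The representation of a layer as a dictionary from unit ID to a `(weights, bias)` pair, and the function names, are assumptions for illustration; the filter count of the layer is simply the size of the dictionary.

```python
def delete_units(layer, unit_ids, backup):
    """Remove the given units from `layer`, keeping their parameters in `backup`.

    Returns the new number of filters (units) in the layer.
    """
    for uid in unit_ids:
        if uid in layer:
            backup[uid] = layer.pop(uid)
    return len(layer)

def restore_unit(layer, uid, backup):
    """Put a previously deleted unit back into `layer` from `backup`.

    Returns the new number of filters (units) in the layer.
    """
    layer[uid] = backup.pop(uid)
    return len(layer)
```

In this sketch a restored unit is appended at the end of the dictionary. A real implementation would also have to restore the unit's original position and the matching input weights of the layer above, which are omitted here.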

このように、本実施形態によれば、特定のドメインにおける評価データに対してＤＮＮの認識に寄与しない特徴マップまたはニューロンを削除することができる。これにより、特定のドメインにおいて、ＤＮＮは認識精度を維持しながら、軽量および高速に認識ができるようになる。例えば、様々なドメインを含む学習データを用いて多様な環境に対応可能なＤＮＮを学習しておき、実際に本システムが利用される特定のドメインに合わせて、ＤＮＮを調整するようなことができる。 Thus, according to the present embodiment, feature maps or neurons that do not contribute to the DNN's recognition of evaluation data in a specific domain can be deleted. As a result, in that domain, the DNN becomes lighter and faster while maintaining recognition accuracy. For example, a DNN that handles diverse environments can be trained on learning data covering various domains and then tuned to the specific domain in which the system is actually used.

［第４の実施形態］ [Fourth Embodiment]
特定のユニットＩＤに対応するユニット状態情報について行う「規定の処理」には様々な処理が考えられるが、例えば、以下のような処理（第３の処理、第４の処理）も考えられる。 Various processes are conceivable as the "specified processing" performed on the unit state information corresponding to a specific unit ID; for example, the following processes (a third process and a fourth process) are also possible.

第３の処理として、処理部１２は、記憶部Ｍ２から読み込んだユニット状態情報のうち特定のユニットＩＤに対応するユニットと同じ階層の任意のユニットＩＤに関連付けられたユニット状態情報を付加ユニット情報として生成する。ここで、任意のユニットＩＤとは、例えば、特定のユニットＩＤと隣り合うユニットＩＤや、ランダムに選択したユニットのＩＤや、固定のユニットＩＤなどに相当する。ここで、ランダムなユニットＩＤは、例えば、同じ階層内のユニットＩＤの中から、一様分布に従って選択される。なお、「規定の処理」として、所定のユニット状態情報に、付加ユニット情報を足すなどの四則演算などの処理を施してもよい。 As a third process, the processing unit 12 uses the unit status information associated with any unit ID in the same hierarchy as the unit corresponding to the specific unit ID among the unit status information read from the storage unit M2 as additional unit information. Generate. Here, the arbitrary unit ID corresponds to, for example, a unit ID adjacent to a specific unit ID, an ID of a randomly selected unit, a fixed unit ID, or the like. Here, the random unit IDs are selected according to a uniform distribution, for example, from the unit IDs in the same hierarchy. As the "specified processing", processing such as four arithmetic operations such as adding additional unit information to the predetermined unit status information may be performed.

第４の処理として、処理部１２は、記憶部Ｍ２から読み込んだユニット状態情報のうち特定のユニットＩＤに対応するユニット状態情報が表す数値の集合と同サイズ且つ要素が所定の値を持つ特徴マップまたはニューロンを示す付加ユニット情報を生成する。ここで、所定の値とは、例えば、予め定められた固定の数値パターンである。 As a fourth process, the processing unit 12 is a feature map having the same size as a set of numerical values represented by the unit state information corresponding to a specific unit ID among the unit state information read from the storage unit M2 and having a predetermined value for the element. Or generate additional unit information indicating a neuron. Here, the predetermined value is, for example, a predetermined fixed numerical pattern.
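The third and fourth processes can be sketched as follows. Unit states are modelled as flat lists of numbers keyed by unit ID within one layer. The function names, the wrap-around choice of the adjacent unit, and the exclusion of the target unit itself from the random draw are assumptions made for illustration.

```python
import random

def additional_from_neighbor(states, unit_ids, target):
    """Third process: use the state of the unit adjacent to `target`."""
    idx = unit_ids.index(target)
    neighbor = unit_ids[(idx + 1) % len(unit_ids)]  # assumed: next unit, wrapping
    return list(states[neighbor])

def additional_from_random(states, unit_ids, target, rng):
    """Third process: use the state of a unit drawn uniformly from the same layer."""
    candidates = [u for u in unit_ids if u != target]  # assumed: exclude the target
    return list(states[rng.choice(candidates)])

def additional_constant(states, target, value):
    """Fourth process: same size as the target state, every element set to `value`."""
    return [value] * len(states[target])

def apply_additional(state, additional, mode):
    """Combine a unit state with additional unit information."""
    if mode == "replace":
        return list(additional)
    if mode == "add":
        return [s + a for s, a in zip(state, additional)]
    raise ValueError("unsupported mode: " + mode)
```

For example, with states `{"F02001": [1.0, 2.0], "F02002": [3.0, 4.0]}`, replacing `F02001` by its neighbor yields `[3.0, 4.0]`, and adding the neighbor yields `[4.0, 6.0]`.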

この「規定の処理」に必要な処理情報は、自装置内または外部の記憶装置に記憶されている。例えば、処理情報は、自装置内の記憶部Ｍ１のＤＮＮの構造情報の一部として記憶されている。この処理情報には、例えば、「規定の処理」を示すＩＤ、付加ユニット情報、ランダム値を生成する確率分布の情報、および差し替えや四則演算などの付加ユニット情報と特定のユニット情報とに対する処理情報などがある。 The processing information required for this "specified processing" is stored in a storage device inside or outside the own device. For example, the processing information is stored as a part of the structural information of the DNN of the storage unit M1 in the own device. This processing information includes, for example, an ID indicating "specified processing", additional unit information, information on a probability distribution for generating a random value, and processing information for additional unit information such as replacement and four arithmetic operations and specific unit information. and so on.

また、「規定の処理」は、ＤＮＮの構造の一部として処理を施してもよい。具体的には、処理部１２は、処理対象である所定の階層と、一つ上位の階層との間に、「規定の処理」を施すユニット付加処理層を挿入した構造を示すＤＮＮ構造情報を生成する。ここで、ユニット付加処理層の各ユニット情報は、図１６で後述するように、付加ユニット情報に対応しており、一つ下位の階層の各ユニット情報に対して、「規定の処理」を適用するように、ＤＮＮ構造情報の処理パラメータが設定される。そして、処理部１２は、生成したＤＮＮ構造情報を、記憶部Ｍ１に記憶させる。 Further, the "specified processing" may be performed as a part of the structure of the DNN. Specifically, the processing unit 12 provides DNN structure information indicating a structure in which a unit addition processing layer for performing "specified processing" is inserted between a predetermined layer to be processed and a layer one level higher. Generate. Here, each unit information of the unit addition processing layer corresponds to the addition unit information as will be described later in FIG. 16, and "specified processing" is applied to each unit information of the next lower layer. The processing parameters of the DNN structure information are set so as to be performed. Then, the processing unit 12 stores the generated DNN structure information in the storage unit M1.

図１５は、ＤＮＮの可視化対象のユニットに第３の処理を適用する一例を示す図である。まず、図１５では、記憶部Ｍ１に格納されているＤＮＮのＣｏｎｖｏｌｕｔｉｏｎ１層５０１およびＣｏｎｖｏｌｕｔｉｏｎ２層５０２に含まれるユニットが可視化対象ユニットに設定されている場合について説明されている。具体的には、図１５では、第３の処理として、ユニット５１１と５１２とそれぞれ同じ階層で隣り合うユニット５３１−２、５３２−２を付加ユニット情報として選択し、ユニット５１１、５１２のユニット状態情報がそれぞれ差し替えられる（５４１−２、５４２−２）または加算されることが示されている。 FIG. 15 is a diagram showing an example of applying the third process to the unit to be visualized by DNN. First, FIG. 15 describes a case where the units included in the Convolution 1 layer 501 and the Convolution 2 layer 502 of the DNN stored in the storage unit M1 are set as the visualization target units. Specifically, in FIG. 15, as the third process, units 531-2 and 532-2 adjacent to the units 511 and 512 in the same hierarchy are selected as additional unit information, and the unit status information of the units 511 and 512 is selected. Are shown to be replaced (541-2, 542-2) or added, respectively.

図１６は、「規定の処理」をＤＮＮの階層の処理として適用する一例を示す図である。まず、図１６では、記憶部Ｍ１に格納されているＤＮＮのＣｏｎｖｏｌｕｔｉｏｎ１層５０１およびＣｏｎｖｏｌｕｔｉｏｎ２層５０２に含まれるユニットが可視化対象ユニットに設定されている場合について説明されている。具体的には、図１６では、Ｃｏｎｖｏｌｕｔｉｏｎ１層５０１とＣｏｎｖｏｌｕｔｉｏｎ２層５０２の出力が、それぞれユニット付加処理１層５０１−３とユニット付加処理２層５０２−３に入力され、上述した第１から第４の処理が適用されることを示している。例えば、Ｃｏｎｖｏｌｕｔｉｏｎ１層のユニットＩＤがＦ０２００１のユニットに対しては、ユニット付加処理１層のユニットＩＤがＦ０３００１の付加ユニット情報が適用される。また、Ｃｏｎｖｏｌｕｔｉｏｎ２層のユニットＩＤがＦ０５００３のユニットに対しては、ユニット付加処理２層のユニットＩＤがＦ０６００３の付加ユニット情報が適用される。例えば、ユニット付加処理１層にて第４の処理が用いられる場合、付加ユニット情報Ｆ０３００１をユニットＩＤがＦ０２００１のユニットと同じ大きさで要素が所定の値を持つように設定し、ユニットＩＤがＦ０２００１のユニットを置き換えるまたは加算するなどの四則演算を適用する。 FIG. 16 is a diagram showing an example in which the "specified processing" is applied as processing of a DNN layer. FIG. 16 illustrates the case where the units included in the Convolution 1 layer 501 and the Convolution 2 layer 502 of the DNN stored in the storage unit M1 are set as the visualization target units. Specifically, FIG. 16 shows that the outputs of the Convolution 1 layer 501 and the Convolution 2 layer 502 are input to the unit addition processing 1 layer 501-3 and the unit addition processing 2 layer 502-3, respectively, where the first to fourth processes described above are applied. For example, the additional unit information with unit ID F03001 in the unit addition processing 1 layer is applied to the unit with unit ID F02001 in the Convolution 1 layer. Likewise, the additional unit information with unit ID F06003 in the unit addition processing 2 layer is applied to the unit with unit ID F05003 in the Convolution 2 layer. For example, when the fourth process is used in the unit addition processing 1 layer, the additional unit information F03001 is set to have the same size as the unit with unit ID F02001 and elements with the predetermined value, and an arithmetic operation such as replacement or addition is applied to the unit with unit ID F02001.

［第５の実施形態］ [Fifth Embodiment]
図８のステップＶ１０６とステップＶ１０７との間のステップにおいて、「規定の処理」としてどのような処理を行うのかを設定するようにしても良い。その場合、設定された「規定の処理」を実現するための処理が以降の各ステップにおいて行われることになる。例えば、処理部１２は、記憶部Ｍ１から読み込んだ処理情報に基づき「規定の処理」を設定する。例えば、図１６の説明にて前述したように「規定の処理」がＤＮＮの構造の一部として処理される場合は、ユニット付加処理層を挿入した構造および「規定の処理」に対応した処理パラメータを示すＤＮＮ構造情報を生成する。そして処理部１２は、生成したＤＮＮ構造情報を記憶部Ｍ１に記憶させる。 In a step between step V106 and step V107 of FIG. 8, the kind of processing to be performed as the "specified processing" may be set. In that case, the processing that realizes the set "specified processing" is performed in each subsequent step. For example, the processing unit 12 sets the "specified processing" based on the processing information read from the storage unit M1. For example, when the "specified processing" is performed as part of the DNN structure as described above for FIG. 16, the processing unit 12 generates DNN structure information indicating the structure with the unit addition processing layer inserted and the processing parameters corresponding to the "specified processing". The processing unit 12 then stores the generated DNN structure information in the storage unit M1.

Each of the above embodiments has been described using the problem of identifying a plurality of states as an example, but the present invention is not limited to this and can be applied to general identification problems, for example the anomaly detection problem of distinguishing normal from abnormal.

In each of the above embodiments, each of the recognition learning devices 10, 10a, and 10b has been described as including the storage unit M1 and the storage unit M2, but the storage unit M1 and the storage unit M2 may instead be external devices capable of communicating with the recognition learning devices 10, 10a, and 10b. For example, the storage unit M1 and the storage unit M2 may reside on a server capable of data communication with the recognition learning devices 10, 10a, and 10b via a network, or another device may include the storage unit M1 and the storage unit M2. The same applies to the other functional units.

The configurations of the embodiments and modifications described above may be combined as appropriate, in part or in whole, and part or all of those configurations may also be used selectively.

[Sixth Embodiment]
Each functional unit constituting the recognition learning devices 10, 10a, and 10b may be implemented in hardware, but every unit other than the storage unit M1 and the storage unit M2 may instead be implemented in software (a computer program). In that case, a computer device capable of executing this software (one that has the storage unit M1 and the storage unit M2, or that can perform data communication with them) can be used as the recognition learning devices 10, 10a, and 10b. An example of the hardware configuration of such a computer device is described with reference to the block diagram of FIG. 14.

The CPU 901 performs processing using computer programs and data stored in the RAM 902 and the ROM 903. The CPU 901 thereby controls the operation of the entire computer device, and executes or controls each of the processes described above as being performed by the recognition learning devices 10, 10a, and 10b to which the computer device is applied.

The RAM 902 has an area for storing computer programs and data loaded from the ROM 903 and the external storage device 906, and data received from outside via the I/F (interface) 907. The RAM 902 also has a work area that the CPU 901 uses when executing various processes. In this way, the RAM 902 can provide various areas as needed. The ROM 903 stores setting data of the computer device that does not need to be rewritten, a boot program, and the like.

The operation unit 904 consists of user interfaces such as a mouse and a keyboard, and the user can operate it to input various instructions to the CPU 901. For example, the user can operate it to input setting information such as threshold values to the computer device.

The display unit 905 consists of a CRT, a liquid crystal screen, or the like, and can display the results of processing by the CPU 901 as images, text, and so on. The display unit 905 may also be a projection device that projects images and text onto a projection surface. The operation unit 904 and the display unit 905 may also be integrated to form a touch panel screen.

The external storage device 906 is a large-capacity information storage device, typified by a hard disk drive. The external storage device 906 stores an OS (operating system) as well as computer programs and data for causing the CPU 901 to execute or control each of the processes described above as being performed by the recognition learning devices 10, 10a, and 10b. These computer programs include programs for causing the CPU 901 to execute or control the functions of the functional units of the recognition learning devices 10, 10a, and 10b in FIGS. 1, 9, and 13, excluding the storage unit M1 and the storage unit M2. The data stored in the external storage device 906 includes information that the recognition learning devices 10, 10a, and 10b treat as known (threshold values and the like). The storage unit M1 and the storage unit M2 may also be provided in the external storage device 906. The computer programs and data stored in the external storage device 906 are loaded into the RAM 902 as needed under the control of the CPU 901 and are then processed by the CPU 901.

The I/F 907 functions as an interface for data communication with external devices. For example, data communication with the terminal device 100 (100a) is performed via the I/F 907.

The CPU 901, the RAM 902, the ROM 903, the operation unit 904, the display unit 905, the external storage device 906, and the I/F 907 are all connected to the bus 908. The configuration of the computer device shown in FIG. 14 can also be applied to the terminal device 100 (100a). In that case, the display unit 905 functions as the display unit DS, and the operation detection unit OP can be implemented by the operation unit 904.

As described above, according to each of the above embodiments and modifications, the features of the DNN that contribute to the recognition of the evaluation data can be visualized. The user can therefore check whether the DNN relies on features peculiar to the learning data. In addition, the DNN can be re-trained based on the user's feedback on the importance of the visualized features. The user can therefore control the DNN so that it does not use features peculiar to the learning data. Furthermore, features of the DNN that do not contribute to the recognition of the evaluation data can be deleted. The DNN can therefore be made faster and lighter to suit the usage environment.
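The contribution measure behind this visualization can be sketched as follows. The sketch assumes a toy network with one ReLU hidden layer and a softmax output, where each hidden neuron stands in for a unit, and it modifies a unit by setting its output to 0; the weights, the function names (`forward`, `contributions`, `rank_units`), and the input are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to per-category probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: List[float], w_hidden: List[List[float]], w_out: List[List[float]],
            zeroed_unit: Optional[int] = None) -> List[float]:
    """Run the toy network; if `zeroed_unit` is given, force that unit's output
    to 0 (this plays the role of the second, modified network)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    if zeroed_unit is not None:
        hidden[zeroed_unit] = 0.0
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)


def contributions(x: List[float], w_hidden: List[List[float]],
                  w_out: List[List[float]]) -> List[List[float]]:
    """For each unit, return the per-category difference between the original
    output and the output obtained with that unit zeroed."""
    base = forward(x, w_hidden, w_out)
    diffs = []
    for unit in range(len(w_hidden)):
        changed = forward(x, w_hidden, w_out, zeroed_unit=unit)
        diffs.append([b - c for b, c in zip(base, changed)])
    return diffs


def rank_units(diffs: List[List[float]], category: int) -> List[int]:
    """Unit indices in descending order of difference for one category."""
    return sorted(range(len(diffs)), key=lambda u: diffs[u][category], reverse=True)


# Hypothetical weights: 2 inputs, 3 hidden units, 2 categories.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
W_OUT = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]

diffs = contributions([1.0, 0.0], W_HIDDEN, W_OUT)
ranking = rank_units(diffs, category=0)
```

In this toy case only the first unit is active for the given input, so zeroing it shifts the category probabilities while zeroing the other two changes nothing. Units whose difference stays near zero across many evaluation inputs are the kind of candidates the text describes as deletable.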

(Other Examples)
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

11: processing unit; 12: processing unit; 13: detection unit; 14: selection unit; 15: visualization unit

Claims

An information processing apparatus comprising:
a first neural network that outputs, as an output value for each of a plurality of categories, the probability that a target belonging to the category is included in input data;
a second neural network in which a specific unit in the first neural network has been changed, and which outputs, as an output value for each of the plurality of categories, the probability that a target belonging to the category is included in the input data;
calculation means for obtaining, for each of the plurality of categories, difference information representing the difference between the output value output for the category by the first neural network and the output value output for the category by the second neural network; and
output means for outputting to a display device, based on the difference information obtained by the calculation means, information indicating the contribution of the specific unit to the output value for the input data.

The information processing apparatus according to claim 1, wherein the second neural network is a neural network in which the output values of all neurons in the specific unit of the first neural network are changed to 0.

The information processing apparatus according to claim 1, wherein the second neural network is a neural network in which a predetermined value is added to the output value of each neuron in the specific unit of the first neural network.

The information processing apparatus according to any one of claims 1 to 3, wherein the first and second neural networks are neural networks having a plurality of layers, and
the second neural network uses, as the processing result of a layer lower than the layer to which the specific unit belongs, the processing result of that lower layer obtained when the first neural network computed the output value.

The information processing apparatus according to any one of claims 1 to 4, wherein the calculation means obtains the difference as the difference information.

The information processing apparatus according to any one of claims 1 to 4, wherein the calculation means obtains the difference information based on the output value of the first neural network and the information used for the change.

The information processing apparatus according to any one of claims 1 to 6, wherein the second neural network is a neural network in which a plurality of specific units in the first neural network are changed in sequence.

The information processing apparatus according to any one of claims 1 to 7, wherein the output means outputs, for each of the plurality of categories, a specified number of pieces of the difference information in descending order, together with information representing the specific units corresponding to that difference information.

The information processing apparatus according to any one of claims 1 to 7, wherein the output means identifies, for each of the plurality of categories, a specified number of specific units in ascending order of the average of the difference information over a plurality of input data, and outputs the difference information obtained by the calculation means for the identified specific units and information representing the identified specific units.

The information processing apparatus according to claim 9, further comprising
means for deleting the identified specific units from the first neural network.

The information processing apparatus according to any one of claims 1 to 10, wherein the output means further outputs, to the display device, information representing the features of the input data that contribute to the output value of the specific unit.

The information processing apparatus according to claim 11, further comprising means for accepting a user's selection of a unit,
wherein the output means treats the unit selected by the user as the specific unit and causes the display device to display information representing the features of the input data that contribute to the output value of the specific unit.

The information processing apparatus according to claim 12, further comprising means for accepting a user's selection of a category,
wherein the output means causes the display device to display information representing the features of the input data that, in the unit selected by the user, contribute to the output value of the category selected by the user.

The information processing apparatus according to any one of claims 1 to 13, wherein the output means further outputs, to the display device, information representing those elements of the input data that contribute to the output value of the specific unit, so that the elements contributing to the output value of the specific unit are displayed in an identifiable manner.

The information processing apparatus according to claim 14, further comprising
means for re-training the first neural network, when an importance is input for an element displayed on the display device, by a learning method that uses the importance.

The information processing apparatus according to any one of claims 1 to 15, wherein the unit is a feature map of a neural network or a neuron.

An information processing method performed by an information processing apparatus having:
a first neural network that outputs, as an output value for each of a plurality of categories, the probability that a target belonging to the category is included in input data; and
a second neural network in which a specific unit in the first neural network has been changed, and which outputs, as an output value for each of the plurality of categories, the probability that a target belonging to the category is included in the input data,
the method comprising:
a calculation step in which calculation means of the information processing apparatus obtains, for each of the plurality of categories, difference information representing the difference between the output value output for the category by the first neural network and the output value output for the category by the second neural network; and
an output step in which output means of the information processing apparatus outputs to a display device, based on the difference information obtained in the calculation step, information indicating the contribution of the specific unit to the output value for the input data.

A computer program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 16.