JP2017076193A

JP2017076193A - Brain activity analysis device, brain activity analysis method and brain activity analysis program

Info

Publication number: JP2017076193A
Application number: JP2015202275A
Authority: JP
Inventors: 神谷之康 (Yukiyasu Kamitani); 堀川友慈 (Tomoyasu Horikawa)
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2015-10-13
Filing date: 2015-10-13
Publication date: 2017-04-20
Anticipated expiration: 2035-10-13
Also published as: JP6643771B2

Abstract

PROBLEM TO BE SOLVED: To provide a brain activity analysis device capable of identifying an object category from a brain activity signal measured while a subject is viewing, or imagining, an object image containing an object that was not used in training the decoder.

SOLUTION: A brain activity analysis device comprises: a feature vector extraction part 3016 that extracts visual feature vectors for a plurality of reference image data stored in a general-purpose image database 4000; a feature vector prediction part 3014 that generates an estimated visual feature vector from a brain activity pattern of the subject; and an identification processing part 3020 that identifies the object category corresponding to the brain activity pattern occurring in a prescribed region of the subject's brain, based on the degree of correlation between the extracted visual feature vectors and the estimated visual feature vector.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a brain activity analysis apparatus and a brain activity analysis method that use functional brain imaging.

(General object recognition technology)
The technique by which a computer recognizes, by their general names, the objects contained in an unconstrained real-world scene image is called “general object recognition”. In other words, general object recognition means predicting (classifying) the category of an object in an input image that is not present in the database used to train the classifier by machine learning.

By contrast, “specific object recognition” presupposes that the database used to train the classifier already contains images of the object to be recognized; the object in the input image is matched against the images in the database and identified.

Among the techniques used for specific object recognition, a commonly used feature is SIFT (Scale Invariant Feature Transform). Proposed in 1999 by David G. Lowe of the University of British Columbia, SIFT is a feature that is invariant to changes in scale. With SIFT, an image can be matched as the same image whether it is enlarged or reduced, and the features obtained are invariant to rotation as well as to scale.

Naturally, general object recognition poses more technical challenges than specific object recognition. In unconstrained images, the range covered by a single class denoted by a “general name” is broad, and the appearance of objects belonging to the same class varies enormously. This makes it difficult to 1) extract features of the target, 2) build a recognition model (classifier), and 3) build a training data set.

As for general object recognition techniques, in the 2000s, once advances in computer technology made it possible to process large amounts of data at high speed, so-called “statistical machine learning” came to be applied to general image recognition.

In this context, a technique called “Bag-of-keypoints” or “Bag-of-visual words” was developed as a method that uses local features in an image.

In “Bag-of-visual words”, positions within the image are ignored and the image is treated as a set of local features (visual words). By vector-quantizing the feature vectors of the local features, the image feature is expressed as a histogram of the occurrence frequencies of roughly several thousand local features (visual words) extracted from the image.

As these local features, the SIFT technique described above is used to describe each feature point as a vector of a predetermined dimension (called a SIFT descriptor).

In general, a Bag-of-visual words feature representation is computed for the training images by 1) extracting feature points, 2) computing SIFT descriptor vectors, 3) creating a codebook by k-means clustering of all SIFT descriptor vectors from all training images, and 4) building, for each image, a histogram of its SIFT descriptor vectors based on the codebook.
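Steps 3) and 4) of this procedure can be sketched as follows. This is a minimal illustration, not the implementation used in any cited work: the descriptors are random 128-dimensional vectors standing in for real SIFT descriptors (steps 1 and 2 are not performed), and the function names `build_codebook` and `bovw_histogram`, the codebook size, and the iteration count are all assumptions made for the example.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Step 3: cluster all training descriptors with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize the k centers with randomly chosen descriptors.
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Step 4: assign each descriptor to its nearest visual word and count occurrences."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize to relative frequencies

# Hypothetical stand-in for steps 1-2: 5 images, 200 descriptors each, 128 dimensions.
rng = np.random.default_rng(42)
images = [rng.normal(size=(200, 128)) for _ in range(5)]

codebook = build_codebook(np.vstack(images), k=16)
histograms = np.array([bovw_histogram(d, codebook) for d in images])
print(histograms.shape)  # one 16-bin histogram per image
```

Each row of `histograms` is the fixed-length feature vector of one image, regardless of how many descriptors the image produced.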

For example, Patent Document 1 discloses a configuration in which an object recognition unit has the following functions. It receives image data from a video analysis unit and extracts from it feature data for specific object recognition, for example a Bag-of-Features feature computed from local features such as SIFT features (features representing the intensity-change characteristics of a local region). A Bag-of-Features feature is obtained, for example, by clustering a set of local features in advance with the k-means method to find an arbitrary number of representative local features, and then expressing how often each representative local feature appears in a single image. The unit performs specific object recognition by comparing the feature data with object features stored in an object recognition database, and stores the specific object recognition result and the position information of the specific object in a video database. It also performs general object recognition on the image data using a general object recognizer and general object names stored in the object recognition database, and stores the general object recognition result in the video database.

However, object recognition based on Bag-of-Features features computed from such local features has had certain limits. One main reason is thought to be that recognition must adopt local image features that are invariant (in a sense, insensitive) to differences in the appearance of an object in the image, while also retaining the discriminative power (in a sense, sensitivity) needed to distinguish the object from similar categories.

Recently, however, so-called “deep learning” using convolutional neural networks has been shown to perform very well in general object recognition (see, for example, Non-Patent Document 1).

A convolutional neural network is a type of feedforward neural network whose structure is based on findings from neuroscience.

In the biological visual system, an image taken in from the outside world by the eye and formed on the retina is transmitted to the visual cortex of the brain as an electrical signal. Among the countless neurons in the visual cortex, some are known to behave selectively: they fire when a specific pattern is presented at a specific location on the retina and do not fire otherwise.

There are two types of such cells, called simple cells and complex cells, which respond selectively only when a line segment of a specific orientation and thickness is presented at a specific position on the retina (or in the visual field). Simple and complex cells differ in their selectivity for the position of the input: the former are strict about position, whereas the latter have a certain tolerance.

A convolutional neural network models the functions of these simple cells and complex cells as the structure of a neural network.
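The correspondence can be illustrated with a small sketch, under the common reading that a convolution layer with rectification plays the role of simple cells and a max-pooling layer plays the role of complex cells. The kernel, the image size, and the function names `simple_cells` and `complex_cells` are assumptions chosen for the example, not taken from the source.

```python
import numpy as np

def simple_cells(image, kernel):
    """Position-strict feature detection: valid 2-D cross-correlation plus rectification."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def complex_cells(responses, pool=2):
    """Position-tolerant responses: non-overlapping max pooling over pool x pool blocks."""
    h = responses.shape[0] // pool * pool
    w = responses.shape[1] // pool * pool
    blocks = responses[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

# A 3x3 filter tuned to a one-pixel-wide vertical line segment.
kernel = np.array([[-1.0, 2.0, -1.0]] * 3)

def vertical_bar(col, size=8):
    """Synthetic stimulus: a vertical bar at the given column."""
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img

# The same bar shifted by one pixel.
s_a = simple_cells(vertical_bar(3), kernel)
s_b = simple_cells(vertical_bar(4), kernel)
c_a = complex_cells(s_a)
c_b = complex_cells(s_b)

print(np.array_equal(s_a, s_b))  # simple-cell maps differ after the shift
print(np.array_equal(c_a, c_b))  # pooled maps are unchanged by this shift
```

The pooled output is only tolerant to shifts that stay within one pooling block; a larger shift changes it, which matches the "certain tolerance" described above rather than full position invariance.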

Meanwhile, with the development of magnetic resonance imaging (MRI), a non-invasive measurement method, it has become possible to observe the activity of the human visual cortex under conditions close to real time.

Functional brain imaging, including functional magnetic resonance imaging (fMRI), which uses magnetic resonance imaging to visualize hemodynamic responses associated with human brain activity, has been used to detect differences between brain activity during sensory stimulation or a cognitive task and brain activity at rest or during a control task. In this way it identifies the brain activation regions corresponding to the components of the brain function of interest, that is, it clarifies the functional localization of the brain.

Changes in blood flow alter the NMR signal intensity because oxygenated and deoxygenated hemoglobin in the blood have different magnetic properties. Oxygenated hemoglobin is diamagnetic and does not affect the relaxation time of the hydrogen atoms of the surrounding water, whereas deoxygenated hemoglobin is paramagnetic and alters the surrounding magnetic field. Therefore, when the brain is stimulated, local blood flow increases, and oxygenated hemoglobin increases, the change can be detected as an MRI signal. Stimuli given to the subject include, for example, visual stimuli, auditory stimuli, or the performance of a predetermined task.

In brain function research, brain activity is measured by measuring the rise in the nuclear magnetic resonance signal (MRI signal) of hydrogen atoms that accompanies the decrease in the concentration of deoxygenated hemoglobin in red blood cells in venules and capillaries (the BOLD effect).

In particular, in studies of human motor function, brain activity is measured with the MRI apparatus while the subject performs some movement.

In humans, brain activity must be measured non-invasively. Decoding techniques that extract more detailed information from fMRI data have been developed for this purpose, and decoding the brain activity of the lower visual cortex has succeeded in reconstructing the image the subject is currently viewing (for example, Non-Patent Document 2). In particular, because fMRI analyzes brain activity in units of voxels (volumetric pixels), stimulus input and recognition state can be estimated from the spatial pattern of brain activity.

Moreover, it is becoming possible to interpret from brain activity not only what a person is actually seeing but also the content of mental activity, such as what the person is imagining (Non-Patent Document 3) or even what the person is dreaming (Non-Patent Document 4).

Patent Document 1: Japanese Patent Laying-Open No. 2015-32905

Non-Patent Document 1: A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” In Proceedings of Neural Information Processing Systems, 2012.
Non-Patent Document 2: Kamitani Y, Tong F. “Decoding the visual and subjective contents of the human brain.” Nat Neurosci. 2005; 8: 679-85.
Non-Patent Document 3: Reddy, L., Tsuchiya, N. & Serre, T. “Reading the mind's eye: Decoding category information during mental imagery.” Neuroimage 50, 818-825 (2010).
Non-Patent Document 4: Horikawa, T., Tamaki, M., Miyawaki, Y. & Kamitani, Y. “Neural decoding of visual imagery during sleep.” Science 340, 639-642 (2013).

As described above, recent studies have succeeded in decoding seen or imagined content by measuring brain activity with fMRI.

Most of these studies rely on a classification-based approach, in which a statistical classifier (decoder) is trained to learn the relationship between brain activity patterns and the visual content to be decoded.

However, such an approach imposes a fundamental constraint on the number of possible outputs: the output of the classifier is limited to the classes used to train the decoder, and the decoder cannot make predictions about any class that was not used in training.

That is, prediction is limited to the training samples, and prediction comparable to general object recognition in image recognition has not yet been achieved.

Furthermore, if a method for decoding object categories in general from brain activity could be established, it would bring practical benefits to technologies that use information decoded from brain activity, and would also advance understanding of how the human brain represents an enormous number of objects.

The present invention has been made to solve the problems described above. Its object is to provide a brain activity analysis apparatus, a brain activity analysis method, and a brain activity analysis program that can identify the category of a seen or imagined object even from brain activity signals measured while the subject is viewing or imagining an object image containing an object that was not used in training the decoder.

Another object of the present invention is to provide a brain activity analysis apparatus, a brain activity analysis method, and a brain activity analysis program that can shorten the time required for machine learning of the classifier when identifying the category of a seen or imagined object from brain activity signals measured while the subject is viewing or imagining an object image.

According to one aspect of the present invention, a brain activity analysis apparatus comprises: an image database that stores a plurality of reference image data, each associated with category information of the object contained in the image; a visual feature extraction unit that extracts visual feature vectors for the reference image data; an interface for receiving signals from a brain activity detection device that measures signals indicating brain activity in a predetermined region of a target person's brain; feature prediction means for generating an estimated visual feature vector from a brain activity pattern, by machine learning based on signals measured in advance as signals indicating brain activity in the predetermined region of a subject's brain while a plurality of test images were presented to the subject; and identification means for identifying the category of the object corresponding to the brain activity pattern occurring in the predetermined region of the target person, based on the magnitude of the correlation between the visual feature vectors extracted by the visual feature extraction unit and the estimated visual feature vector.
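The role of the identification means can be sketched as follows, assuming that "correlation" is the Pearson correlation coefficient and that each candidate category is represented by a single visual feature vector. The category names, the vector length, the noise level, and the function names are hypothetical, and the estimated vector is simulated rather than decoded from real brain activity.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_category(estimated, category_features):
    """Rank candidate categories by correlation with the estimated feature vector."""
    scores = {name: pearson(estimated, vec) for name, vec in category_features.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Hypothetical candidate set: one feature vector per category from the image database.
rng = np.random.default_rng(0)
category_features = {name: rng.normal(size=100) for name in ["dog", "car", "chair"]}

# Simulated decoder output: a noisy version of the "car" feature vector.
estimated = category_features["car"] + 0.5 * rng.normal(size=100)

ranking, scores = identify_category(estimated, category_features)
print(ranking[0])
```

Because identification is a comparison against feature vectors computed from images, the candidate set can include categories for which no brain data was ever collected.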

Preferably, the number of object categories in the reference image data stored in the image database is larger than the number of object categories in the test images presented to the subject.

Preferably, the visual feature vector that corresponds to a test image and is used for machine learning of the feature prediction means is the average of the visual feature vectors of a plurality of reference image data belonging to the same category.
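The category averaging described here amounts to a group-wise mean. A minimal sketch follows, with made-up two-dimensional feature vectors and a hypothetical function name `category_average`:

```python
from collections import defaultdict

import numpy as np

def category_average(features, labels):
    """Average the visual feature vectors of all images sharing a category label."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)
    return {label: np.mean(vecs, axis=0) for label, vecs in groups.items()}

# Toy data: three reference images, two of them in the same category.
features = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]])
labels = ["cat", "cat", "cup"]

averaged = category_average(features, labels)
print(averaged["cat"])  # mean of the two "cat" vectors
```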

Preferably, the visual feature extraction unit is a convolutional neural network having a multilayer structure.

Preferably, the estimated visual feature vector predicted by the feature prediction means is a feature extracted from the firing (activation) pattern of an intermediate layer of the convolutional neural network.

Preferably, the estimated visual feature vector predicted by the feature prediction means is an image feature vector based on SIFT + BoF.

According to another aspect of the present invention, there is provided a brain activity analysis method in which an arithmetic unit identifies the category of the object that a target person is viewing or imagining, based on signals from a brain activity detection device that measures signals indicating brain activity in a predetermined region of the target person's brain. The method comprises the steps of: the arithmetic unit machine-learning a process of estimating an estimated visual feature vector from a brain activity pattern, based on signals measured in advance as signals indicating brain activity in the predetermined region of a subject's brain while a plurality of test images were presented to the subject; the arithmetic unit estimating the estimated visual feature vector based on the signals from the brain activity detection device; the arithmetic unit calculating the degree of similarity between the estimated visual feature vector and the visual feature vectors extracted for reference image data from an image database that stores a plurality of reference image data associated with category information of the object contained in the image; and the arithmetic unit identifying, based on the calculated degree of similarity, the category of the object corresponding to the brain activity pattern occurring in the predetermined region of the target person.
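The first two steps (learning the mapping, then estimating a feature vector from new signals) can be sketched as a multi-output linear regression. The text above only says "machine learning", so the choice of ridge regression here is an assumption made for illustration, as are the synthetic data, the dimensions, and the function names `fit_ridge` and `predict`.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression without intercept: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_voxels = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

def predict(X, W):
    """Estimate visual feature vectors from brain activity patterns."""
    return X @ W

rng = np.random.default_rng(1)
n_train, n_voxels, n_features = 200, 50, 10

# Synthetic stand-ins: voxel patterns and the visual features of the presented images.
true_W = rng.normal(size=(n_voxels, n_features))
X_train = rng.normal(size=(n_train, n_voxels))
Y_train = X_train @ true_W + 0.1 * rng.normal(size=(n_train, n_features))

# Learning step: fit the decoder on the training trials.
W = fit_ridge(X_train, Y_train, alpha=1.0)

# Estimation step: apply the decoder to activity patterns not seen in training.
X_new = rng.normal(size=(5, n_voxels))
Y_est = predict(X_new, W)
print(Y_est.shape)
```

Since the decoder outputs a feature vector rather than a class label, its output can afterwards be compared with the feature vectors of any reference image, including categories absent from the training trials.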

According to still another aspect of the present invention, there is provided a brain activity analysis program that causes a computer to execute a process of identifying the category of the object that a target person is viewing or imagining, based on signals from a brain activity detection device that measures signals indicating brain activity in a predetermined region of the target person's brain. The program causes the computer to execute the steps of: an arithmetic unit of the computer machine-learning a process of estimating an estimated visual feature vector from a brain activity pattern, based on signals measured in advance as signals indicating brain activity in the predetermined region of a subject's brain while a plurality of test images were presented to the subject; the arithmetic unit estimating the estimated visual feature vector based on the signals from the brain activity detection device; the arithmetic unit calculating the degree of similarity between the estimated visual feature vector and the visual feature vectors extracted for reference image data from an image database that stores a plurality of reference image data associated with category information of the object contained in the image; and the arithmetic unit identifying, based on the calculated degree of similarity, the category of the object corresponding to the brain activity pattern occurring in the predetermined region of the target person.

According to the present invention, the category of a seen or imagined object can be identified even from brain activity signals measured while the subject is viewing or imagining an object image containing an object that was not used in training the decoder.

Also according to the present invention, the time required for machine learning of the classifier can be shortened when identifying the category of a seen or imagined object from brain activity signals measured while the subject is viewing or imagining an object image.

FIG. 1 is a schematic diagram showing the overall configuration of an MRI apparatus 10.
FIG. 2 is a hardware block diagram of a data processing unit 32.
FIG. 3 is a functional block diagram showing a configuration for performing general object recognition on an object that the subject has viewed or imagined.
FIG. 4 is a flowchart for explaining the procedure for constructing the brain activity analysis apparatus system.
FIG. 5 is a conceptual diagram for explaining the process of identifying, from the target person's brain activity data, the object that the target person is viewing or imagining.
FIG. 6 is a diagram showing the concept of a CNN model.
FIG. 7 is a conceptual diagram for explaining an example of the configuration of a CNN model.
FIG. 8 is a diagram for explaining the concept of modeling.
FIG. 9 is a diagram for explaining the concept of modeling.
FIG. 10 is a conceptual diagram for explaining the flow of the image presentation experiment and the imagery experiment.
FIG. 11 is a diagram showing, for multiple visual areas, the prediction accuracy for the features of presented images.
FIG. 12 is a diagram showing the accuracy of predicting object-specific features from multiple visual areas for viewed images.
FIG. 13 is a diagram showing the accuracy of predicting object-specific features from multiple visual areas for imagined images.
FIG. 14 is a diagram showing the concept of object category identification.
FIG. 15 is a conceptual diagram for explaining the identification procedure in the identification analysis.
FIG. 16 is a diagram showing the semantic distance between the rank of an identified predicted category and the target category.
FIG. 17 is a diagram showing the identification performance for viewed objects as a function of visual feature and candidate set size.
FIG. 18 is a diagram evaluating the identification performance under combinations of multiple features and regions of interest.

The configuration of the MRI system according to an embodiment of the present invention is described below with reference to the drawings. In the following embodiments, components and processing steps given the same reference numerals are the same or equivalent, and their description is not repeated unless necessary.
[Embodiment 1]
FIG. 1 is a schematic diagram showing the overall configuration of the MRI apparatus 10.

As described above, the MRI apparatus can measure the activity of a region of interest in the brain by measuring fMRI signals.

As shown in FIG. 1, the MRI apparatus 10 includes: a magnetic field application mechanism 11 that applies a controlled magnetic field to the region of interest of a subject 2 and irradiates it with RF waves; a receiving coil 20 that receives the response wave (NMR signal) from the subject 2 and outputs an analog signal; a drive unit 21 that controls the magnetic field applied to the subject 2 and controls the transmission and reception of RF waves; and a data processing unit 32 that sets the control sequence of the drive unit 21 and processes various data signals to generate images.

Here, the central axis of the cylindrical bore in which the subject 2 is placed is taken as the Z axis, the X axis is defined in the horizontal direction orthogonal to the Z axis, and the Y axis is defined in the vertical direction.

With this configuration, the static magnetic field applied by the magnetic field application mechanism 11 aligns the nuclear spins of the atomic nuclei constituting the subject 2 with the magnetic field direction (Z axis), and the spins precess about this direction at the Larmor frequency specific to each nucleus.

When an RF pulse at this Larmor frequency is applied, the atoms resonate, absorb energy, and are excited, producing the nuclear magnetic resonance (NMR) phenomenon. When the RF pulse is stopped after this resonance, the atoms release energy and return to their original steady state; during this relaxation process they emit an electromagnetic wave (NMR signal) at the same frequency as the Larmor frequency.
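For reference, the Larmor frequency is proportional to the static field strength: f = (γ/2π)·B0, where γ is the gyromagnetic ratio of the nucleus. The short sketch below evaluates this for hydrogen nuclei, using γ/2π ≈ 42.577 MHz/T. The field strengths of 1.5 T, 3 T and 7 T are typical scanner values chosen for illustration; the source does not state the field strength of the MRI apparatus 10.

```python
# Gyromagnetic ratio of the hydrogen nucleus (1H) divided by 2*pi, in MHz per tesla.
GAMMA_BAR_1H_MHZ_PER_T = 42.577

def larmor_frequency_mhz(b0_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Larmor frequency f = (gamma / 2*pi) * B0, returned in MHz."""
    return gamma_bar * b0_tesla

# Illustrative static field strengths (assumed, not from the source).
for b0 in (1.5, 3.0, 7.0):
    print(f"B0 = {b0:.1f} T -> f = {larmor_frequency_mhz(b0):.2f} MHz")
```

At 3 T, for example, the RF pulse and the received NMR signal are both at roughly 128 MHz.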

The emitted NMR signal is received by the receiving coil 20 as the response wave from the subject 2, and the data processing unit 32 forms an image of the region of interest of the subject 2.

The magnetic field application mechanism 11 includes a static magnetic field generating coil 12, a gradient magnetic field generating coil 14, an RF irradiation unit 16, and a bed 18 on which the subject 2 is placed in the bore.

The subject 2 lies, for example, supine on the bed 18. Although the arrangement is not particularly limited, the subject 2 can view, for example through prism glasses 4, the screen shown on a display 6 installed perpendicular to the Z axis. The image on the display 6 provides visual stimulation to the subject 2. Alternatively, the visual stimulation may be provided by projecting an image in front of the subject 2 with a projector.

このような視覚刺激は、被験者に対して物体の画像を提示したり、被験者が想像するべき物体を文字などで提示したりすることに使用される。 Such visual stimulation is used for presenting an image of an object to a subject or presenting an object to be imagined by a subject using characters or the like.

駆動部２１は、静磁場電源２２と、傾斜磁場電源２４と、信号送信部２６と、信号受信部２８と、寝台１８をＺ軸方向の任意位置に移動させる寝台駆動部３０とを備える。 The drive unit 21 includes a static magnetic field power supply 22, a gradient magnetic field power supply 24, a signal transmission unit 26, a signal reception unit 28, and a bed drive unit 30 that moves the bed 18 to an arbitrary position in the Z-axis direction.

The data processing unit 32 includes: an input unit 40 that accepts various operations and information input from an operator (not shown); a display unit 38 that displays various images and information relating to the region of interest of the subject 2; a storage unit 36 that stores the programs for executing various processes, control parameters, image data (structural images and the like), and other electronic data; a control unit 42 that controls the operation of each functional unit, for example by generating the control sequence that drives the drive unit 21; an interface unit 44 that exchanges various signals with the drive unit 21; a data collection unit 46 that collects data consisting of a group of NMR signals originating from the region of interest; an image processing unit 48 that forms an image based on the NMR signal data; and a network interface unit 50 for communicating over a network.

As described later, the data processing unit 32 exchanges data with the general-purpose image database 4000 via the network interface unit 50. The general-purpose image database 4000 stores a large number of data sets in which image data is associated, as tag information, with annotations about that image data (information such as the category of an object contained in the image). The object category may be the "general name of an object" conventionally used in general object recognition. Hereinafter, the image data contained in the general-purpose image database 4000 is referred to as "reference image data".

The data processing unit 32 may be a dedicated computer, or it may be a general-purpose computer that executes the functions of each functional unit and performs the specified computation, data processing, and control sequence generation based on a program installed in the storage unit 36. In the following description, the data processing unit 32 is assumed to be a general-purpose computer.

The static magnetic field generating coil 12 passes a current supplied from the static magnetic field power supply 22 through a helical coil wound around the Z axis to generate an induced magnetic field, producing a static magnetic field in the Z-axis direction in the bore. The region of interest of the subject 2 is placed in the region of the bore where the static magnetic field is highly uniform. More specifically, the static magnetic field generating coil 12 consists of, for example, four air-core coils, whose combination creates a uniform magnetic field inside the bore and orients the spins of predetermined atomic nuclei in the body of the subject 2, specifically hydrogen nuclei.

The gradient magnetic field generating coil 14 consists of an X coil, a Y coil, and a Z coil (not shown) and is provided on the inner peripheral surface of the cylindrical static magnetic field generating coil 12.
The X, Y, and Z coils superimpose gradient magnetic fields on the uniform magnetic field in the bore, switching in turn among the X-axis, Y-axis, and Z-axis directions, and thereby give the static magnetic field an intensity gradient. During excitation, the Z coil tilts the field strength along the Z direction to restrict the resonance plane. Immediately after the Z-direction field is applied, the Y coil applies a brief gradient that phase-modulates the detected signal in proportion to the Y coordinate (phase encoding). The X coil then applies a gradient during data acquisition, frequency-modulating the detected signal in proportion to the X coordinate (frequency encoding).

The superimposed gradient magnetic fields are switched by outputting different pulse signals from the transmission unit 24 to the X, Y, and Z coils in accordance with the control sequence. This makes it possible to identify the position in the subject 2 at which the NMR phenomenon occurs, and provides the position information in three-dimensional coordinates needed to form an image of the subject 2.

As described above, three orthogonal gradient magnetic fields are used, each assigned to the slice direction, the phase-encoding direction, or the frequency-encoding direction, and by combining them images can be taken from various angles. For example, in addition to transverse slices in the same orientation as those captured by an X-ray CT apparatus, sagittal and coronal slices orthogonal to them can be imaged, as can oblique slices whose normal is not parallel to any axis of the three orthogonal gradient magnetic fields.

The RF irradiation unit 16 irradiates the region of interest of the subject 2 with RF (radio frequency) pulses based on a high-frequency signal transmitted from the signal transmission unit 26 in accordance with the control sequence.
Although the RF irradiation unit 16 is built into the magnetic field application mechanism 11 in FIG. 1, it may instead be provided on the bed 18 or integrated with the receiving coil 20.

The receiving coil 20 detects the response wave (NMR signal) from the subject 2 and is placed close to the subject 2 so that the NMR signal can be detected with high sensitivity.
When the electromagnetic wave of the NMR signal cuts across the windings of the receiving coil 20, a weak current is produced by electromagnetic induction. This weak current is amplified by the signal receiving unit 28, converted from an analog signal into a digital signal, and sent to the data processing unit 32.

That is, when a high-frequency electromagnetic field at the resonance frequency is applied through the RF irradiation unit 16 to the subject 2 while a Z-axis gradient field is superimposed on the static field, the predetermined nuclei (for example, hydrogen nuclei) in the portion where the field strength satisfies the resonance condition are selectively excited and begin to resonate. The predetermined nuclei in the portion that meets the resonance condition (for example, a slice of predetermined thickness through the subject 2) are excited, and their spins precess in unison. When the excitation pulse is stopped, the electromagnetic wave radiated by the precessing spins induces a signal in the receiving coil 20, and this signal is detected for a short time. From this signal, tissue containing the predetermined atoms in the body of the subject 2 is observed. To determine where the signal originates, the X and Y gradient fields are applied when the signal is detected.

Based on the data accumulated in the storage unit 36, the image processing unit 48 measures the detected signal while repeatedly applying the excitation signal, converts the resonance frequency into the X coordinate by a first Fourier transform, recovers the Y coordinate by a second Fourier transform to obtain an image, and displays the corresponding image on the display unit 38.
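
The two-stage Fourier reconstruction described above can be sketched with NumPy on synthetic data. This is a minimal sketch under idealized assumptions (noise-free acquisition, fully sampled data, axis 1 standing in for the frequency-encoded X direction and axis 0 for the phase-encoded Y direction). The phantom and all function names are hypothetical and are not part of the apparatus.

```python
import numpy as np


def make_phantom(n=64):
    """Build a simple synthetic slice: a bright rectangle on a dark background."""
    image = np.zeros((n, n))
    image[n // 4 : n // 2, n // 3 : 2 * n // 3] = 1.0
    return image


def acquire_raw_data(image):
    """Simulate ideal acquisition: each raw sample is one 2-D Fourier coefficient."""
    return np.fft.fft2(image)


def reconstruct(raw):
    """Recover the image with two successive 1-D inverse Fourier transforms."""
    # First transform: frequency-encoded direction (X, taken here as axis 1).
    partial = np.fft.ifft(raw, axis=1)
    # Second transform: phase-encoded direction (Y, taken here as axis 0).
    return np.abs(np.fft.ifft(partial, axis=0))


phantom = make_phantom()
recovered = reconstruct(acquire_raw_data(phantom))
print("max reconstruction error:", float(np.max(np.abs(recovered - phantom))))
```

Applying the two 1-D transforms in sequence is equivalent to a single 2-D inverse transform; they are written separately here only to mirror the two steps named in the text.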

For example, with such an MRI system, the BOLD signal is imaged in real time, and the control unit 42 applies the analysis processing described later to the images captured in time series, whereby fMRI images are acquired and information on brain activity can be obtained.

FIG. 2 is a hardware block diagram of the data processing unit 32.

As noted above, the hardware of the data processing unit 32 is not particularly limited, and a general-purpose computer can be used.

In FIG. 2, the computer main body 2010 of the data processing unit 32 includes, in addition to a memory drive 2020 and a disk drive 2030: an arithmetic unit (CPU) 2040; a bus 2050 connected to the disk drive 2030 and the memory drive 2020; a ROM 2060 for storing programs such as a boot-up program; a RAM 2070, connected to the CPU 2040, the bus 2050, and the ROM 2060, for temporarily storing the instructions of application programs and providing temporary storage space; a nonvolatile storage device 2080 for storing application programs, system programs, and data; and a communication interface 2090. The communication interface 2090 corresponds to the interface unit 44, which exchanges signals with the drive unit 21 and the like, and to the network interface unit 50, which communicates with other computers via a network (not shown). A hard disk drive (HDD), a solid state drive (SSD), or the like can be used as the nonvolatile storage device 2080. The nonvolatile storage device 2080 corresponds to the storage unit 36.

The functions of the data processing unit 32, for example those of the control unit 42, the data collection unit 46, and the image processing unit 48, are realized by arithmetic processing that the CPU 2040 executes based on a program.

The program that causes the data processing unit 32 to execute the functions of the embodiment described above may be stored on a recording medium such as a CD-ROM 2200 or a memory medium 2210, inserted into the disk drive 2030 or the memory drive 2020, and then transferred to the nonvolatile storage device 2080. Alternatively, the program may be downloaded from a network via the communication interface. The program is loaded into the RAM 2070 when it is executed.

The data processing unit 32 further includes a keyboard 2100 and a mouse 2110 as input devices and a display 2120 as an output device. The keyboard 2100 and the mouse 2110 correspond to the input unit 40, and the display 2120 corresponds to the display unit 38.

The program for functioning as the data processing unit 32 described above need not include an operating system (OS) that causes the computer main body 2010 to execute the functions of an information processing apparatus or the like. The program need only include those instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the data processing unit 32 operates is well known, and a detailed description is omitted.

The program may be executed by a single computer or by multiple computers. That is, the processing may be centralized or distributed.

FIG. 3 is a functional block diagram showing, among the functions executed by the data processing unit 32, the configuration for performing general object recognition on an object that the subject has viewed or imagined, using brain activity data measured by the MRI apparatus 10.

The nonvolatile storage device 2080 stores the following data:
1) Feature extractor data 3100 for specifying a feature extractor that extracts, from image data, a feature vector representing the features of an object in the image.
In practice, this feature extractor is a classifier that performs object recognition in software, and the parameters and the like needed to run that software correspond to the feature extractor data 3100.

The following can be used as the classifier:

a1) Convolutional neural network (CNN) model
b1) HMAX model
c1) GIST
d1) SIFT + BoF model
These classifier models are described in detail later.

2) Data for training a predictor that predicts image feature vectors from brain activity data, and predictor data for specifying the predictor.
Specifically, the following data is included:

a2) Learning MRI measurement data 3102: brain activity data from the region of interest, measured while the subject views image data containing an object.
b2) Object feature vectors 3104, extracted by the classifier from the images presented to the subject when the learning MRI measurement data 3102 was measured.
c2) Object-specific feature vectors 3106, obtained for each object category to be predicted by averaging the object feature vectors 3104 within that category.
d2) Predictor data 3110 for specifying the predictor.
This predictor likewise performs in software the process of predicting image feature vectors from brain activity data, and the parameters and the like needed to run that software correspond to the predictor data 3110.
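
The averaging in item c2) can be sketched as follows. This is a minimal illustration only: the feature values, category names, and the function name `category_mean_features` are made up for the example, and real feature vectors would come from the classifier specified by the feature extractor data 3100.

```python
from collections import defaultdict

import numpy as np


def category_mean_features(features, labels):
    """Average per-image feature vectors within each object category."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    # One averaged ("object-specific") vector per category.
    return {label: np.mean(vectors, axis=0) for label, vectors in grouped.items()}


# Toy per-image feature vectors (stand-ins for feature vectors 3104).
image_features = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]])
image_labels = ["cup", "cup", "dog"]

object_specific = category_mean_features(image_features, image_labels)
for label, vector in object_specific.items():
    print(label, vector)
```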

This predictor is likewise assumed to have been trained by machine learning, so that the parameters and the like described above are determined in advance. Training is carried out so that the predictor predicts either the object feature vector 3104 or the object-specific feature vector 3106 from the learning MRI measurement data 3102.
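
One way to picture this training step is as a multi-output linear regression from voxel patterns to feature vectors. The sketch below uses closed-form ridge regression purely as a stand-in; the passage above does not fix the learning algorithm, and the synthetic data, dimensions, and function names are all assumptions for illustration.

```python
import numpy as np


def fit_linear_decoder(brain, targets, ridge=1.0):
    """Fit weights W so that brain @ W approximates targets (closed-form ridge regression)."""
    n_voxels = brain.shape[1]
    gram = brain.T @ brain + ridge * np.eye(n_voxels)
    return np.linalg.solve(gram, brain.T @ targets)


def predict_features(brain, weights):
    """Predict feature vectors from brain activity patterns."""
    return brain @ weights


rng = np.random.default_rng(0)
# Synthetic stand-in for learning MRI measurement data 3102: 200 samples x 30 voxels.
brain_train = rng.standard_normal((200, 30))
# Hypothetical ground-truth mapping, used only to generate targets for this demo.
true_map = rng.standard_normal((30, 5))
# Stand-in for the training targets (feature vectors 3104 or 3106).
feature_targets = brain_train @ true_map

weights = fit_linear_decoder(brain_train, feature_targets, ridge=1e-6)
print("max weight error:", float(np.max(np.abs(weights - true_map))))
```

Whether the targets are per-image vectors (3104) or category averages (3106) only changes the rows of `feature_targets`; the fitting routine itself is the same in this sketch.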

3) MRI measurement data 3112: brain activity data from the region of interest, measured by the MRI apparatus for the subject whose viewed or imagined object is to be estimated.

The MRI measurement data 3112 is, for example, measured for the subject by the MRI apparatus 10 shown in FIG. 1 and stored in the storage unit 36 (nonvolatile storage device 2080).

The functions executed by the CPU 2040 include the following:
1) A predictor generation unit 3012 that receives the learning MRI measurement data 3102 and either the object feature vectors 3104 or the object-specific feature vectors 3106 from the nonvolatile storage device 2080 via the communication interface 2090, generates a predictor by machine learning, and stores predictor data 3110 specifying the generated predictor in the nonvolatile storage device 2080.

2) A feature vector prediction unit 3014 that operates based on the predictor data 3110 in the nonvolatile storage device 2080 and predicts a feature vector from the MRI measurement data 3112 received via the communication interface 2090, that is, from the brain activity data of a subject who is viewing an image of an object or imagining an object.

3) A feature vector extraction unit 3016 that reads reference image data and its tag information from the general-purpose image database 4000 via the communication interface 2090 and calculates the feature vector of each image using the classifier specified by the feature extractor data 3100.

4) A correlation value calculation unit 3018 that calculates the degree of similarity (for example, the correlation) between the feature vector predicted by the feature vector prediction unit 3014 (hereinafter, the "estimated visual feature vector") and the feature vector calculated by the feature vector extraction unit 3016 (hereinafter, the "visual feature vector").

5) An identification processing unit 3020 that identifies, based on the similarity calculated by the correlation value calculation unit 3018, the object category corresponding to the brain activity pattern arising in the region of interest of the subject's brain. As the condition for identification, for example, the object category whose visual feature vector has the largest correlation with the estimated visual feature vector, among the visual feature vectors corresponding to a predetermined number of reference image data, may be taken as the identification result. Alternatively, as soon as an object category whose correlation value exceeds a predetermined value is found, that category may be taken as the identification result.
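
The two identification conditions in item 5) can be sketched as follows, assuming Pearson correlation as the similarity measure and one visual feature vector per candidate category. The candidate vectors, category names, and function names are hypothetical toy values.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_best(estimated, candidates):
    """Condition 1: return the category with the largest correlation, and that correlation."""
    scores = {name: pearson(estimated, vector) for name, vector in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def identify_first_above(estimated, candidates, threshold):
    """Condition 2: return the first category whose correlation exceeds threshold, else None."""
    for name, vector in candidates.items():
        if pearson(estimated, vector) > threshold:
            return name
    return None


# Toy visual feature vectors for three candidate categories.
candidates = {
    "dog": np.array([4.0, 3.0, 2.0, 1.0]),
    "car": np.array([1.0, 3.0, 2.0, 4.0]),
    "cup": np.array([1.0, 2.0, 3.0, 4.0]),
}
# Toy estimated visual feature vector (as would be produced by unit 3014).
estimated = np.array([1.1, 1.9, 3.2, 3.9])

best_name, best_score = identify_best(estimated, candidates)
print(best_name, round(best_score, 3))
print(identify_first_above(estimated, candidates, threshold=0.7))
```

Note that the two conditions can disagree: the threshold rule stops at the first candidate that is good enough, so its result depends on the order in which candidates are examined.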

In the above description, the feature vector extraction unit 3016 calculates visual feature vectors for the reference image data in the general-purpose image database 4000 each time the identification processing unit 3020 performs identification. Alternatively, for example, the feature vector extraction unit 3016 may calculate visual feature vectors in advance for at least part of the reference image data in the general-purpose image database 4000 and store multiple pairs of an object category and its corresponding visual feature vector in the nonvolatile storage device 2080 as a category feature database. The correlation value calculation unit 3018 and the identification processing unit 3020 then calculate the similarity between the visual feature vectors in the category feature database and the estimated visual feature vector to identify the object category.

In particular, when visual feature vectors are calculated in advance for only part of the reference image data and the pairs of object category and corresponding visual feature vector are stored as the category feature database, the data in this database can be used like cache data in the identification process. That is, the correlation value calculation unit 3018 first calculates the correlation between the estimated visual feature vector and the visual feature vectors stored in advance in the category feature database. If, for example, no object category reaches the predetermined correlation value, or the number of reference images for which the correlation has been calculated has not reached a predetermined number, the reference image data in the general-purpose image database 4000 is read out sequentially, the feature vector extraction unit 3016 calculates its visual feature vectors, the correlation value calculation unit 3018 calculates the correlation values, and the identification processing unit 3020 performs the identification.
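
The cache-like use of the category feature database can be sketched as below. This is a simplified illustration under several assumptions: one feature vector per category, Pearson correlation as the similarity, a toy `toy_extract` function standing in for the feature vector extraction unit 3016, and a stopping rule (stop once both the threshold and the minimum count are met) that the text does not spell out.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_with_cache(estimated, cache, database, extract, threshold, min_compared):
    """Search cached vectors first; fall back to the image database only if needed.

    cache:    dict mapping category -> precomputed visual feature vector
    database: iterable of (category, image) pairs
    extract:  callable mapping an image to its visual feature vector
    """
    best_name, best_score, compared = None, -np.inf, 0

    def consider(name, vector):
        nonlocal best_name, best_score, compared
        compared += 1
        score = pearson(estimated, vector)
        if score > best_score:
            best_name, best_score = name, score

    def done():
        return best_score >= threshold and compared >= min_compared

    for name, vector in cache.items():
        consider(name, vector)

    if not done():
        for name, image in database:
            if name in cache:
                continue  # already compared via the cache
            cache[name] = extract(image)  # compute on demand and remember it
            consider(name, cache[name])
            if done():
                break
    return best_name, best_score


def toy_extract(image):
    """Stand-in feature extractor: treats the toy 'image' itself as the feature vector."""
    return np.asarray(image, dtype=float)


cache = {"dog": np.array([4.0, 3.0, 2.0, 1.0])}
database = [("dog", [4, 3, 2, 1]), ("cup", [1, 2, 3, 4]), ("car", [1, 3, 2, 4])]
estimated = np.array([1.1, 1.9, 3.2, 3.9])

name, score = identify_with_cache(estimated, cache, database, toy_extract,
                                  threshold=0.9, min_compared=2)
print(name, round(score, 3), sorted(cache))
```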

In the following description, and in particular for the experimental results, the subject for whom the learning MRI measurement data 3102 was measured and the subject whose viewed or imagined object is estimated from the MRI measurement data 3112 are assumed to be the same person.

However, the first subject (or subjects) for whom the learning MRI measurement data 3102 is measured and the second subject for whom the MRI measurement data 3112 is measured need not be the same person.

Accordingly, regardless of whether the first and second subjects are the same, the second subject, for whom the MRI measurement data 3112 is measured and whose viewed or imagined object is to be estimated, is specifically referred to as the "target subject".

Although it is not an experiment directly on a predictor of visual feature vectors as described above, successful conversion of the brain activity pattern evoked in one subject by a visual stimulus into the brain activity pattern of another subject receiving the same stimulus has already been reported, for example in the following reference.

Reference: Yamada, K., Miyawaki, Y., Kamitani, Y.: Inter-subject neural code converter for visual image representation. NeuroImage, Vol. 113, pp. 289-297 (2015)
In this reference, for a pair of subjects, statistical machine learning is applied over multiple visual stimuli to the brain activity patterns (voxel patterns) measured when the same visual stimulus is presented, so that one subject's brain activity pattern is predicted from the other's (this is called a neural code converter). In this statistical machine learning, so that arbitrary brain activity patterns can be predicted, multiple random patterns are presented to the two subjects, and the two subjects' brain activity patterns for the same presented pattern are recorded in association with each other. The neural code converter is trained by modeling the amplitude of the fMRI signal of each voxel in the region of interest of the subject to be predicted as a linear combination of the amplitudes of the fMRI signals of the voxels in the region of interest of the other subject. The sparse logistic regression method described later can be used to train such a neural code converter.
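
The linear-combination model behind the neural code converter can be sketched as follows. Closed-form ridge regression is used here only as a simple stand-in for the sparse estimation method mentioned in the text, and the voxel counts, noise level, and function names are assumptions chosen for the demonstration on synthetic data.

```python
import numpy as np


def fit_converter(source, target, ridge=1.0):
    """Model each target-subject voxel as a linear combination of source-subject voxels plus a bias."""
    design = np.hstack([source, np.ones((source.shape[0], 1))])
    penalty = ridge * np.eye(design.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(design.T @ design + penalty, design.T @ target)


def convert(source, weights):
    """Convert source-subject patterns into predicted target-subject patterns."""
    design = np.hstack([source, np.ones((source.shape[0], 1))])
    return design @ weights


rng = np.random.default_rng(0)
n_source_voxels, n_target_voxels = 40, 25
# Hypothetical ground-truth relation between the two subjects (for this demo only).
mixing = rng.standard_normal((n_source_voxels, n_target_voxels))

# Paired responses of both subjects to the same 300 stimuli (synthetic).
source_train = rng.standard_normal((300, n_source_voxels))
target_train = source_train @ mixing + 0.5 + 0.01 * rng.standard_normal((300, n_target_voxels))

weights = fit_converter(source_train, target_train, ridge=1e-3)

# Responses to 50 new stimuli: convert one subject's patterns into the other's.
source_new = rng.standard_normal((50, n_source_voxels))
target_new = source_new @ mixing + 0.5
converted = convert(source_new, weights)
print("max conversion error:", float(np.max(np.abs(converted - target_new))))
```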

By applying such a technique, a predictor trained on brain activity measurements from the first subject can be used even when the first subject and the second subject (the target subject) are different people. For example, if the second subject's brain activity pattern is converted into the first subject's brain activity pattern by the neural code converter, the "estimated visual feature vector" corresponding to the second subject's brain activity pattern can be predicted.
(Procedure for constructing the brain activity analysis system)
FIG. 4 is a flowchart explaining the procedure for constructing the brain activity analysis system shown in FIG. 3.

FIG. 5 is a conceptual diagram explaining how the brain activity analysis system constructed in this way identifies, from the target subject's brain activity data, the object that the target subject is viewing or imagining.

Referring to FIG. 4, the feature vector extraction unit 3016, for example, uses visual features to extract visual feature patterns from the object images in the reference image data from the general-purpose image database 4000, represents each object image by a feature vector of visual features, and generates the object feature vectors 3104 (S100). The object-specific feature vectors 3106 are generated at the same time.

Next, the reference image data from which the object feature vectors 3104 were extracted is presented to the subject. Based on those feature vectors and on the learning MRI measurement data 3102, which consists of the subject's brain activity data measured during the presentation, the predictor generation unit 3012 trains the predictor (decoder) by machine learning to estimate, from a brain activity pattern, the visual feature pattern of the viewed object (S102).

The feature vector prediction unit 3014 receives signals from a brain activity detection device that measures signals indicating brain activity in a plurality of predetermined regions of the target subject's brain, and uses the trained predictor (decoder) to estimate the feature pattern (estimated visual feature vector) of the object that the target subject is viewing or imagining (S104).

The correlation value calculation unit 3018 calculates the similarity (for example, the correlation value) between the estimated visual feature vector and the object-specific feature vectors calculated from images in the general-purpose image database that are tagged with annotations (including the object category of the object in the image) (S106).

Based on the similarity, the identification processing unit 3020 identifies (classifies) the viewed or imagined object using the annotation of the image with the greatest similarity (S108).

The learning data used to train the predictor is a part of the reference image data contained in the general-purpose image database 4000. The general-purpose image database 4000, however, stores reference image data belonging to far more object categories than were used for training. Therefore, as described in detail later, the brain activity analysis apparatus of the present embodiment can identify the category of the object that the target subject is viewing or imagining even when that object belongs to a category on which the predictor was not trained.
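
Steps S102 to S108, including identification of categories never used for training, can be illustrated end to end on synthetic data. Everything in this sketch is an assumption made for the demonstration: brain patterns are simulated as a noisy linear function of the feature vectors, the decoder is a ridge regression, and only the unseen categories are used as candidates. It illustrates the idea and is not a reproduction of the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_voxels = 20, 100
n_train_categories, n_novel_categories = 150, 50

# Hypothetical category feature vectors (stand-ins for vectors computed in S100).
train_features = rng.standard_normal((n_train_categories, n_features))
novel_features = rng.standard_normal((n_novel_categories, n_features))

# Synthetic "brain": voxel responses are a noisy linear function of the features.
encoding = rng.standard_normal((n_features, n_voxels))


def simulate_brain(features, noise=0.1):
    return features @ encoding + noise * rng.standard_normal((len(features), n_voxels))


def fit_decoder(brain, features, ridge=1.0):
    """S102: train a linear decoder from brain patterns to feature vectors."""
    gram = brain.T @ brain + ridge * np.eye(brain.shape[1])
    return np.linalg.solve(gram, brain.T @ features)


def rowwise_correlation(a, b):
    """Pearson correlation between every row of a and every row of b."""
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return a @ b.T / a.shape[1]


decoder = fit_decoder(simulate_brain(train_features), train_features)  # S102
estimated = simulate_brain(novel_features) @ decoder                   # S104
similarity = rowwise_correlation(estimated, novel_features)            # S106
identified = similarity.argmax(axis=1)                                 # S108

accuracy = float(np.mean(identified == np.arange(n_novel_categories)))
print(f"identification accuracy on unseen categories: {accuracy:.2f}")
```

Because the decoder predicts a feature vector rather than a category label, the candidate set in S106 can be extended with any category for which a feature vector can be computed from images, which is what allows categories outside the training set to be identified.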

また、図５を参照して、対象者が、画像２００を視認している場合、対象者の脳の関心領域に生じる活動パターンがｆＭＲＩにより計測され、その際に生じる活動の特徴パターン２１０が計測される。この特徴パターン２１０に基づいて、学習用データ２２０により予め訓練された予測器（デコーダ）２３０が、推定視覚特徴ベクトル２４０を推定する。 Referring to FIG. 5, when the subject views the image 200, the activity pattern arising in the region of interest of the subject's brain is measured by fMRI, and the feature pattern 210 of that activity is obtained. Based on this feature pattern 210, a predictor (decoder) 230 trained in advance with the learning data 220 estimates the estimated visual feature vector 240.

推定視覚特徴ベクトル２４０と、汎用画像データベース４０００中の各参照画像データについて特徴ベクトル抽出部３０１６が抽出した視覚特徴ベクトル群２５０とを、相関値算出部３０１８の算出結果に基づいて、識別処理部３０２０が比較することで、対象者が、視認しているまたは想像している物体のカテゴリを識別することができる。 The identification processing unit 3020 compares the estimated visual feature vector 240 with the visual feature vector group 250, which the feature vector extraction unit 3016 extracted for each reference image data in the general-purpose image database 4000, based on the calculation result of the correlation value calculation unit 3018, and thereby identifies the category of the object that the subject is viewing or imagining.
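The flow of steps S104 to S108 can be illustrated with a minimal Python sketch. It assumes that Pearson correlation is used as the similarity and that random arrays stand in for the outputs of units 3014 and 3016; the function name `identify_category` and the category labels are hypothetical and do not appear in this description.

```python
import numpy as np

def identify_category(predicted, category_features, category_names):
    """Return the best-matching category name and all correlation values."""
    # Centering each vector makes the normalized dot product equal Pearson's r.
    p = predicted - predicted.mean()
    c = category_features - category_features.mean(axis=1, keepdims=True)
    r = (c @ p) / (np.linalg.norm(c, axis=1) * np.linalg.norm(p))
    return category_names[int(np.argmax(r))], r

rng = np.random.default_rng(0)
names = ["cat", "car", "chair"]          # hypothetical category labels
features = rng.normal(size=(3, 1000))    # stand-in for vectors from unit 3016
# Stand-in for the decoder output of unit 3014: a noisy copy of the "car" vector.
predicted = features[1] + 0.5 * rng.normal(size=1000)
best, r = identify_category(predicted, features, names)
```

Because the comparison only needs a feature vector per candidate category, candidates can be added to `features` without retraining the decoder.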

なお、特に限定されないが、本実施の形態において、ｆＭＲＩにより撮像される脳の関心領域（ＲＯＩ:Region of Interest）としては、以下のようなものがある：
・視覚野であるＶ１野−Ｖ４野、
・外側後頭複合体（ＬＯＣ：lateral occipital complex）、
・紡錘状顔領域（ＦＦＡ：fusiform face area）および
・海馬傍回場所領域（ＰＰＡ：parahippocampal place area）
ここで、海馬傍回場所領域とは、海馬傍皮質の下位領域で (顔や物体ではなく) 風景の符号化と認知に重要な役割を持つとされる。ｆＭＲＩ研究により、この脳領域は被験者が自然風景や都市風景などの画像 (つまり"場所"の画像) のような地理的な風景の刺激を呈示された際に高い活動を示すことが知られている。
［視覚特徴ベクトルを抽出する識別器（分類器）］
以下では、図４において、ステップＳ１００の処理として、視覚的特徴を使用して、物体画像から特徴パターンを抽出し、各視覚的特徴の特徴ベクトルによって物体画像を表現する前提としての識別器について、説明する。 Although not particularly limited, in this embodiment the regions of interest (ROI) of the brain imaged by fMRI include the following:
・ Visual areas V1 to V4,
・ Lateral occipital complex (LOC),
・ Fusiform face area (FFA), and
・ Parahippocampal place area (PPA)
Here, the parahippocampal place area is a subregion of the parahippocampal cortex that is thought to play an important role in the encoding and recognition of scenes (rather than faces or objects). fMRI studies have shown that this brain region is highly active when subjects are presented with scene stimuli such as images of natural or urban landscapes (that is, images of "places").
[Classifier for extracting visual feature vectors]
The following describes the classifier that underlies step S100 in FIG. 4, in which visual features are used to extract a feature pattern from an object image and the object image is represented by a feature vector for each visual feature.

上述したように、識別器（分類器）としては、畳み込みニューラルネットワーク(ＣＮＮ)モデル、ＨＭＡＸモデル、ＧＩＳＴ、ＳＩＦＴ＋ＢｏＦモデルが考えられる。 As described above, as the classifier (classifier), a convolutional neural network (CNN) model, a HMAX model, a GIST, and a SIFT + BoF model can be considered.

以下、これらの識別器について、特に、ＣＮＮモデルを中心として説明する。 In the following, these discriminators will be described focusing on the CNN model.

識別器は、一般には、画像から、多層の処理層を経て、最終的に、物体の認識結果を出力するように構成される。 The discriminator is generally configured to output an object recognition result from an image through multiple processing layers.

図６では、一例として、ＣＮＮモデルについて概念を示す図である。図６では、合計８層のＣＮＮ１層〜ＣＮＮ８層での処理により、物体認識が実行される。 FIG. 6 is a diagram illustrating a concept of the CNN model as an example. In FIG. 6, object recognition is executed by processing in a total of eight layers CNN1 to CNN8.

（ＣＮＮモデル）
図７は、ＣＮＮモデルの構成の一例を説明するための概念図である。 (CNN model)
FIG. 7 is a conceptual diagram for explaining an example of the configuration of the CNN model.

図７に示すように、典型的なＣＮＮモデルは、入力側から、畳込み層、プーリング層の順で重ね、これを何度か繰り返す構造を持つ。ただしこの２種類の層はいつもペアで使われるわけではなく、畳込み層のみ複数回繰り返した後、プーリング層を１層重ねることもある。また、局所コントラスト正規化(local contrast normalization)と呼ばれる画像濃淡の正規化を行う層が設置される場合もある。 As shown in FIG. 7, a typical CNN model has a structure in which a convolution layer and a pooling layer are stacked in this order from the input side and this is repeated several times. However, these two types of layers are not always used in pairs, and only the convolution layer may be repeated several times, and then one pooling layer may be stacked. In some cases, a layer for normalizing image density called local contrast normalization is provided.

畳込み層とプーリング層の繰り返しの後には、隣接層間のユニットが全結合した（すべて密に結合した）層が配置される。これは普通の順伝播型ニューラルネットの層間結合であるが、畳込み層などと区別するために、層間が全結合(fully-connected)であると言う。 After the repeated convolution and pooling layers come layers in which the units of adjacent layers are fully (densely) connected. This is the ordinary inter-layer connection of a feedforward neural network, but it is called fully-connected to distinguish it from convolution layers and the like.

最後のプーリング層から出力層の間には、通常この全結合層が複数、連続して配置される。最後に位置する出力層は、通常のニューラルネット同様に設計される。 Between the last pooling layer and the output layer, several of these fully connected layers are usually placed in succession. The output layer at the end is designed in the same way as in an ordinary neural network.

例えば目的がクラス分類なら，この層の活性化関数をソフトマックス(softmax) 関数とする。 For example, if the purpose is classification, the activation function of this layer is the softmax function.

「畳み込み層」は、上述した単純型細胞をモデル化したものということができる。また、「プーリング層」は、上述した複雑型細胞をモデル化したものということができる。 The “convolution layer” can be said to be a model of the above-described simple cell. The “pooling layer” can be said to be a model of the above-described complex cell.

図８および図９は、このようなモデル化の概念を説明するための図である。 FIGS. 8 and 9 are diagrams for explaining the concept of such modeling.

単純型細胞は，図８のような構造の単層ネットワークの各ユニットでモデル化できる。左側の層が入力で、右側が出力である。各層のユニットは２次元的に並び、図８（ａ）、（ｂ）のように右の層のユニットは，左の層のたとえば４×４のユニット群とのみ結合を持ち、その４×４のユニットに図８（ｃ）のような特定のパタンが入力されたときのみ、それに反応して活性化する（発火する）とする。そのパタンは（右の層の）全ユニットで共通である。 A simple cell can be modeled by each unit of a single-layer network structured as shown in FIG. 8. The left layer is the input and the right layer is the output. The units in each layer are arranged two-dimensionally. As shown in FIGS. 8(a) and 8(b), a unit in the right layer is connected only to a group of, for example, 4 × 4 units in the left layer, and is activated (fires) only when a specific pattern such as that in FIG. 8(c) is input to those 4 × 4 units. The pattern is common to all units in the right layer.

一方、複雑型細胞は，図９に示すように、図８の単層ネットワークの上位に層を追加したとき、そのユニットによってモデル化できる。 On the other hand, as shown in FIG. 9, complex cells can be modeled by units when a layer is added above the single-layer network of FIG.

追加した層のユニットは，中間層のたとえば３×３のユニット群と結合を持ち、これらのユニットのうち、１つでも活性化すると、自身も活性化するとする。中間層のユニットが活性化するパタンが図８（ｃ）のとき、全体への入力が図９（ａ）から図９（ｂ）のように変わると、中間層で活性化するユニットは同図のように変化する。 A unit in the added layer is connected to a group of, for example, 3 × 3 units in the intermediate layer, and is activated whenever at least one of those units is activated. When the pattern that activates the intermediate-layer units is that of FIG. 8(c) and the overall input changes from FIG. 9(a) to FIG. 9(b), the unit activated in the intermediate layer changes as shown in the figure.

一方、出力側のユニットは、中間層のユニットがどれか１つでも活性化していれば活性化するため、図９（ａ）および図９（ｂ）のいずれの入力でも活性化する。 On the other hand, the output unit is activated if any one of the units in the intermediate layer is activated. Therefore, the output unit is activated by any of the inputs in FIGS. 9A and 9B.

このように、中間層のユニット（単純型細胞）は入力パタンの位置変化に敏感であるものの、出力側の層のユニット（複雑型細胞）は一定の（この例では、３×３）範囲の位置ずれに鈍感である。 Thus, the units of the intermediate layer (simple cells) are sensitive to changes in the position of the input pattern, whereas the units of the output-side layer (complex cells) are insensitive to positional shifts within a certain range (3 × 3 in this example).

図９の中間層と出力側の層が、畳込みニューラルネットワークを構成する、畳込み層およびプーリング層に、それぞれ対応する。 The intermediate layer and the output layer in FIG. 9 correspond to the convolution layer and the pooling layer, which constitute the convolutional neural network, respectively.
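The simple-cell and complex-cell behavior described above can be sketched in Python as follows. The diagonal 4 × 4 pattern, the 12 × 12 input, and the use of non-overlapping 3 × 3 pooling windows are assumptions made for illustration; the figures may use a different pattern and overlapping windows.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the responses."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(responses, size):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = responses.shape[0] // size * size
    w = responses.shape[1] // size * size
    blocks = responses[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

pattern = np.eye(4)                      # hypothetical 4x4 pattern (a diagonal line)
image_a = np.zeros((12, 12))
image_a[0:4, 0:4] = pattern              # pattern at the top-left corner
image_b = np.zeros((12, 12))
image_b[1:5, 1:5] = pattern              # same pattern shifted by one unit

# A "simple cell" fires only on a complete match with the pattern.
simple_a = (convolve_valid(image_a, pattern) >= pattern.sum()).astype(int)
simple_b = (convolve_valid(image_b, pattern) >= pattern.sum()).astype(int)
# A "complex cell" fires if any simple cell in its 3x3 window fires.
complex_a = max_pool(simple_a, 3)
complex_b = max_pool(simple_b, 3)
```

In this sketch the simple-cell maps differ between the two inputs, while the pooled maps are identical, which is the shift tolerance attributed to the pooling layer.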

畳み込みニューラルネットワーク(ＣＮＮ)モデルの詳細については、たとえば、以下の文献にも開示がある。 Details of the convolutional neural network (CNN) model are also disclosed in, for example, the following documents.

文献１：岡谷貴之著、「深層学習」、講談社、２０１５年４月７日、第１刷発行
以下の説明では、畳込みニューラルネットワークによるCNNモデルは、５つの畳み込み層（ＣＮＮ１−５）および３つの全結合層（ＣＮＮ６−８）から成る深層構造を備えた人工ニューラルネットワークであるものとする。 Reference 1: Takayuki Okatani, "Deep Learning", Kodansha, first printing published April 7, 2015.
In the following description, the CNN model is assumed to be an artificial neural network with a deep structure consisting of five convolutional layers (CNN1-5) and three fully connected layers (CNN6-8).

また、学習等に使用される画像データは、一例として、オンライン画像データベースImageNet（汎用画像データベース４０００の一例）から集めたものを使用するものとして説明する。この画像データベースは、２０１１年の秋にリリースされたものであり、画像がWordNetの中の階層によってグループ化された画像データベースである。以下のサイトで公開されている。 Further, as an example, the image data used for learning and the like will be described using data collected from the online image database ImageNet (an example of the general-purpose image database 4000). This image database was released in the fall of 2011, and is an image database in which images are grouped by hierarchy in WordNet. It is published on the following site.

http://www.image-net.org/;
また、WordNetについては以下に開示がある。 http://www.image-net.org/;
In addition, WordNet is disclosed below.

Fellbaum, C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press (1998).
WordNetは、英語の概念辞書（意味辞書）である。WordNetでは英単語がsynsetと呼ばれる同義語のグループに分類され、簡単な定義や、他の同義語のグループとの関係が記述される木構造を有している。すなわち、WordNetは、語を類義関係のセット(synset)でグループ化している点に特徴があり、一つのsynsetが一つの概念に対応する。また、各synsetは上位下位関係などの多様な関係で結ばれている。 Fellbaum, C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press (1998).
WordNet is an English concept dictionary (semantic dictionary). In WordNet, English words are classified into synonym groups called synsets, and have a tree structure that describes simple definitions and relationships with other synonym groups. That is, WordNet is characterized in that words are grouped by a set of synonyms (synset), and one synset corresponds to one concept. Each synset is connected by various relationships such as upper and lower relationships.

ＣＮＮモデルは、ImageNetの中の画像で、たとえば１，０００の物体カテゴリを分類するように訓練された。第１〜第７層の各々で、そのユニットの一部、たとえば、１，０００ユニットをランダムに選択し、第８層の中では、分類する物体カテゴリの数に対応して設けられる１，０００ユニットをすべて使用して、画像特徴とする。 The CNN model was trained on images in ImageNet to classify, for example, 1,000 object categories. In each of the first to seventh layers, a subset of the units (for example, 1,000 units) is selected at random. In the eighth layer, all 1,000 units, whose number corresponds to the number of object categories to be classified, are used. The outputs of these units serve as the image features.

すなわち、それらのユニットの出力のベクトルを画像特徴ベクトルとして、各画像を表わし、それぞれ、ＣＮＮ１−ＣＮＮ８モデルという名前をつけることにする。 That is, each image is represented by the vector of outputs of those units as its image feature vector, and the resulting features for the eight layers are named the CNN1 to CNN8 models, respectively.
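As a rough illustration of the random unit selection, the following Python sketch picks a fixed subset of units once and reuses the same indices for every image, so that corresponding elements of different feature vectors refer to the same unit. The layer size, the seed, and the function name `select_units` are hypothetical.

```python
import numpy as np

def select_units(layer_outputs, n_units, seed):
    """Flatten each image's layer output and keep the same random units for all images."""
    flat = layer_outputs.reshape(layer_outputs.shape[0], -1)
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(flat.shape[1], size=n_units, replace=False))
    return flat[:, idx], idx

rng = np.random.default_rng(0)
# Stand-in activations: 5 images, one layer with 8 x 16 x 16 = 2,048 units.
layer_outputs = rng.random((5, 8, 16, 16))
selected, idx = select_units(layer_outputs, n_units=1000, seed=42)
```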

（ＨＭＡＸモデル）
ＨＭＡＸモデルでは、特徴は、多層の中で階層的に計算される。 (HMAX model)
In the HMAX model, features are calculated hierarchically in multiple layers.

ここで、ＨＭＡＸモデルについては、以下の文献に記載されているので、以下では、その概略について説明する。 Here, since the HMAX model is described in the following documents, an outline thereof will be described below.

文献２：T. Serre and M. Riesenhuber, ”Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex，”CBCLPaper 239/AIMemo 2004-017, Massachusetts Inst. of Technology, Cambridge, 2004.
文献３：Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. & Poggio, T. ”Robust object recognition with cortex-like mechanisms．” IEEE Trans. Pattern Anal. Mach. Intell. 29, 411-426 (2007).
ＨＭＡＸモデルの複数の層は、画像層および６つの後続する層（Ｓ１，Ｃ１，Ｓ２，Ｃ２，Ｓ３およびＣ３層）から成り、それらはテンプレートマッチングと最大化オペレーションとを交互に実行する。 Reference 2: T. Serre and M. Riesenhuber, "Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex," CBCL Paper 239 / AI Memo 2004-017, Massachusetts Inst. of Technology, Cambridge, 2004.
Reference 3: Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. & Poggio, T. “Robust object recognition with cortex-like mechanisms.” IEEE Trans. Pattern Anal. Mach. Intell. 29, 411-426 (2007).
The layers of the HMAX model consist of an image layer and six subsequent layers (S1, C1, S2, C2, S3 and C3 layers), which alternately perform template matching and maximization operations.

ここで、Ｓ１層は、入力画像に対して、様々な方向／スケールのガボールフィルタをかける層である。 Here, the S1 layer is a layer for applying Gabor filters of various directions / scales to the input image.

Ｃ１層は、近傍位置／スケールのＳ１層から入力を受け取り、その中の最大値を出力する。 The C1 layer receives an input from the S1 layer of the near position / scale and outputs the maximum value among them.

Ｓ２層は、Ｃ１層からの入力と事前に取得しているＮ個の形状パッチとの類似度を出力する層である。 The S2 layer is a layer that outputs the similarity between the input from the C1 layer and the N shape patches acquired in advance.

Ｃ２層は、Ｓ２層からの入力を受け取り、各形状ごとに全ての位置／スケールで最大の信号を出力する。 The C2 layer receives the input from the S2 layer and outputs the maximum signal at every position / scale for each shape.

また、Ｓ３層は、形状の様々な向きのテンプレートを持ち、Ｃ３層は、全ての向きを統合する処理をする。 In addition, the S3 layer has templates with various orientations of the shape, and the C3 layer performs processing for integrating all orientations.

各層の計算では、Ｃ２層およびＣ３層の中の特徴の数が１，０００に設定された以外は、文献３と、同じパラメーターを使用している。 In the calculation of each layer, the same parameters as in Reference 3 are used except that the number of features in the C2 layer and the C3 layer is set to 1,000.

３つのタイプのＨＭＡＸ特徴から成るベクトルによって各画像を表わし、それらはＳ１層、Ｓ２層およびＣ２層の中のユニットの１，０００個のランダムに選択された出力およびＣ３層の１，０００個の出力すべてからなっている。 Each image is represented by a vector consisting of three types of HMAX features, made up of 1,000 randomly selected outputs of units in the S1, S2 and C2 layers and all 1,000 outputs of the C3 layer.

（ＧＩＳＴモデル）
ＧＩＳＴ特徴は画像を小領域に区切り、それらの小領域に対し様々な方向，周波数のガボールフィルターをかけることにより、シーン情報を記述する特徴量である。 (GIST model)
The GIST feature is a feature amount that describes scene information by dividing an image into small regions and applying Gabor filters of various directions and frequencies to the small regions.

具体例としては、ＧＩＳＴ特徴を計算するために、画像は最初にグレイスケールに変換され、最大２５６ピクセル幅を持つようにサイズ変更される。 As a specific example, to calculate GIST features, the image is first converted to grayscale and resized to have a maximum width of 256 pixels.

次に、画像は１セットのガボール・フィルタ（１６の方向、４つのスケール）を使用してフィルタリングされる。 The image is then filtered using a set of Gabor filters (16 directions, 4 scales).

その後、フィルタリングされた画像は、４×４のグリッド（１６個のブロック）に分けられ、各ブロック内のフィルタリングされた出力は、各フィルタにつき１６個の応答を抽出するために平均される。 The filtered image is then divided into a 4 × 4 grid (16 blocks) and the filtered output in each block is averaged to extract 16 responses for each filter.

多数のフィルタからの応答は、各画像につき、１０２４次元（１６［方向］×４［スケール］×１６［ブロック］＝１０２４）の特徴ベクトルを生成するように連結される。 The responses from multiple filters are concatenated to generate a feature vector of 1024 dimensions (16 [directions] × 4 [scales] × 16 [blocks] = 1024) for each image.
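The block-averaging and concatenation steps can be sketched in Python as below. The Gabor filtering itself is omitted: the array `responses` is a random stand-in for the 64 filtered images (16 orientations × 4 scales), and the function name `gist_from_responses` is hypothetical.

```python
import numpy as np

def gist_from_responses(responses, grid=4):
    """Average each filter response over a grid x grid layout of blocks and concatenate."""
    n_filters, h, w = responses.shape
    bh, bw = h // grid, w // grid
    cropped = responses[:, :bh * grid, :bw * grid]
    blocks = cropped.reshape(n_filters, grid, bh, grid, bw)
    # The mean over the within-block axes gives one value per filter and block.
    return blocks.mean(axis=(2, 4)).reshape(-1)

# Stand-in for 16 orientations x 4 scales of filtered 256 x 256 images.
responses = np.random.default_rng(0).random((16 * 4, 256, 256))
gist = gist_from_responses(responses)
```

With these sizes the result has 64 × 16 = 1,024 elements, matching the dimensionality stated above.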

（ＳＩＦＴ＋ＢｏＦモデル）
ＳＩＦＴ＋ＢｏＦによる画像特徴は、脳における神経処理に基づかないものの、中間レベルの視覚皮質の領野との描写的な類似点が報告されている。 (SIFT + BoF model)
Although image features with SIFT + BoF are not based on neural processing in the brain, descriptive similarities with areas of mid-level visual cortex have been reported.

一枚の画像に対し、１，０００個のパッチを設ける。特徴量はベクトル量子化を行い、１，０００個のクラスタにクラスタリングされる。最後に、ヒストグラムが正規化される。 1,000 patches are provided for one image. The feature quantity is vector quantized and clustered into 1,000 clusters. Finally, the histogram is normalized.

ＳＩＦＴ＋ＢｏＦモデルでは、使用する視覚的特徴は、ＳＩＦＴ記述子から計算される。 In the SIFT + BoF model, the visual features to use are calculated from the SIFT descriptor.

ＢｏＦアプローチでは、視覚的特徴ベクトル（visual words）の要素は、それぞれベクトル量子化された記述子によって作成される。 In the BoF approach, each element of the visual feature vector (visual words) is created from vector-quantized descriptors.

独立したトレーニング画像セットから計算された約１００万のＳＩＦＴ記述子を使用して、１組の１０００個のビジュアルワードを生成するために、ｋ−ｍｅａｎｓ法が実行される。 The k-means method is performed to generate a set of 1000 visual words using approximately 1 million SIFT descriptors calculated from independent training image sets.

また、各ビジュアルワードの頻度は、各イメージのＢｏＦヒストグラムを作成するために計算される。 Also, the frequency of each visual word is calculated to create a BoF histogram for each image.

最後に、上記の処理によって得られたヒストグラムはすべて、ユニット・ノルム・ベクトルになるためにＬ−１正規化される。 Finally, all the histograms obtained by the above processing are L1-normalized so that they become unit-norm vectors.
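The histogram construction described above can be sketched in Python as follows. SIFT extraction and k-means clustering are not performed here: `visual_words` and `descriptors` are random stand-ins for the 1,000 cluster centers and the 128-dimensional SIFT descriptors of one image, and the function name `bof_histogram` is hypothetical.

```python
import numpy as np

def bof_histogram(descriptors, visual_words):
    """Assign each descriptor to its nearest visual word and return an L1-normalized histogram."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors ** 2).sum(axis=1, keepdims=True)
          - 2.0 * descriptors @ visual_words.T
          + (visual_words ** 2).sum(axis=1))
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(visual_words)).astype(float)
    return counts / counts.sum()

rng = np.random.default_rng(0)
visual_words = rng.random((1000, 128))   # stand-in for k-means cluster centers
descriptors = rng.random((1000, 128))    # stand-in for the descriptors of one image
hist = bof_histogram(descriptors, visual_words)
```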

そこで、以下では、「ＳＩＦＴ＋ＢｏＦによる画像特徴ベクトル」とは、画像をＳＩＦＴ記述子による局所的特徴の集合として考え、局所特徴の特徴ベクトルをベクトル量子化することで、画像の特徴量を、画像から抽出した複数の局所的特徴の出現頻度のヒストグラムとして表現したベクトルをいう。
［視覚的特徴予測］
以下では、図４のステップＳ１０２で記載したような、予測器（デコーダ）を、脳活性パターンから、視認された物体の視覚的特徴のパターンを推定するように訓練する手続きについて、説明する。 Accordingly, in the following, an "image feature vector based on SIFT + BoF" refers to a vector obtained by regarding an image as a set of local features given by SIFT descriptors and vector-quantizing the feature vectors of those local features, so that the feature of the image is expressed as a histogram of the occurrence frequencies of the local features extracted from the image.
[Visual feature prediction]
In the following, a procedure for training a predictor (decoder) as described in step S102 of FIG. 4 to estimate a visual feature pattern of a visually recognized object from a brain activity pattern will be described.

以下に説明するように、線形回帰関数を使用して、ｆＭＲＩで測定された脳活動から、視認された物体の視覚的特徴ベクトルを推定するためのデコーダを構築する。 As described below, a linear regression function is used to construct a decoder for estimating visual feature vectors of a viewed object from brain activity measured with fMRI.

このようなデコーダとしては、たとえば、予測に重要な特徴を自動的に選ぶことができる、スパースロジスティック回帰(Sparse Logistic Regression:ＳＬＲ)を使用することができる。
なお、スパースロジスティック回帰については、たとえば、以下の文献に開示がある。 As such a decoder, for example, Sparse Logistic Regression (SLR), which can automatically select features important for prediction, can be used.
Note that sparse logistic regression is disclosed in, for example, the following documents.

文献４：Okito Yamashita, Masa aki Sato, Taku Yoshioka, Frank Tong, and Yukiyasu Kamitani. ”Sparse Estimation automatically selects voxels relevant for the decoding of fMRI activity patterns．” NeuroImage, Vol. 42, No. 4, pp. 1414-1429, 2008.
ここで、スパースロジスティック回帰による推定は、ｆＭＲＩデータの場合のように、説明変数の次元の数が高い場合に、良い結果が得られることが知られている。 Reference 4: Okito Yamashita, Masa-aki Sato, Taku Yoshioka, Frank Tong, and Yukiyasu Kamitani. "Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns." NeuroImage, Vol. 42, No. 4, pp. 1414-1429, 2008.
Here, it is known that the estimation by sparse logistic regression can provide a good result when the number of explanatory variable dimensions is high, as in the case of fMRI data.

入力として、ｄ個のボクセルの脳活動から成るｆＭＲＩ測定サンプルを以下の式で表す。 As an input, an fMRI measurement sample composed of brain activities of d voxels is expressed by the following equation.

このｆＭＲＩ測定サンプルが与えられたとき、回帰関数は、以下のように表現される。 When this fMRI measurement sample is given, the regression function is expressed as follows.

ここで、ｘ_iは、ボクセルｉのｆＭＲＩ振幅を特定するためのスカラー値であり、ｗ_ｉは、ボクセルｉの重みであり、ｗ₀は、バイアス値である。簡単のために、バイアスｗ₀は、重みベクトルの中に、以下のようにして、含めるものとする。 Here, x_i is a scalar value specifying the fMRI amplitude of voxel i, w_i is the weight of voxel i, and w₀ is the bias. For simplicity, the bias w₀ is included in the weight vector as follows.

また、ダミー変数ｘ₀＝１が、測定サンプルベクトルの中に、以下のようにして含まれるものとする。 Further, it is assumed that the dummy variable x ₀ = 1 is included in the measurement sample vector as follows.

この関数を用いると、各特徴ベクトルのｌ番目の要素を、以下の式で表されるようなガウスノイズを回帰関数ｙ（ｘ）に付加することで説明されるターゲット変数ｔ_l（ｌ∈｛１，…，Ｌ｝）としてモデル化することができる。 Using this function, the l-th element of each feature vector can be modeled as a target variable t_l (l ∈ {1, ..., L}) that is explained by adding Gaussian noise to the regression function y(x), as expressed by the following equation.

ここで、εは、ノイズ精度βのゼロ平均ガウス確率変数である。 Here, ε is a zero mean Gaussian random variable with noise accuracy β.
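The equations referred to in the preceding paragraphs are not reproduced in this text. The following LaTeX block is a reconstruction from the surrounding definitions (d voxels, weights w_i, bias w₀, dummy variable x₀ = 1, zero-mean Gaussian noise with precision β); it is offered as a sketch of the intended formulas, not as the original typesetting.

```latex
\begin{align}
\mathbf{x} &= \{x_1, \dots, x_d\}^{T} \\
y(\mathbf{x}) &= \sum_{i=1}^{d} w_i x_i + w_0 \\
\mathbf{w} &= \{w_0, w_1, \dots, w_d\}^{T}, \qquad
\mathbf{x} = \{x_0, x_1, \dots, x_d\}^{T}, \quad x_0 = 1 \\
y(\mathbf{x}) &= \mathbf{w}^{T}\mathbf{x} \\
t_l &= y(\mathbf{x}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \beta^{-1})
\end{align}
```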

訓練データセットが与えられたとき、回帰関数が目的関数を最適化するように、ＳＬＲは、回帰関数のための重みおよびバイアスを計算する。 Given a training data set, SLR calculates the weights and bias of the regression function so that the regression function optimizes the objective function.

目的関数を構築するために、以下のように尤度関数を表現する。 To construct the objective function, the likelihood function is expressed as follows.

ここで、Ｎは、サンプルの個数であり、Ｘは、Ｎ×（ｄ＋１）のｆＭＲＩデータマトリックスであって、そのｎ番目の行は、（ｄ＋１）次元ベクトルｘ_nである。
ｔ_l＝｛ｔ_l1，…，ｔ_lN｝^Tは、視覚特徴ベクトルの要素のサンプルである。
Here, N is the number of samples, and X is an N × (d + 1) fMRI data matrix whose n-th row is the (d + 1)-dimensional vector x_n.
t_l = {t_l1, ..., t_lN}^T are the samples of the l-th element of the visual feature vector.
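The likelihood function itself is missing from this text. Under the Gaussian noise model with precision β stated above, and with independent samples, it would take the following form (a reconstruction from those assumptions, not a copy of the original equation):

```latex
\begin{equation}
P(\mathbf{t}_l \mid \mathbf{X}, \mathbf{w}, \beta)
  = \prod_{n=1}^{N} \left( \frac{\beta}{2\pi} \right)^{1/2}
    \exp\left\{ -\frac{\beta}{2} \left( t_{ln} - \mathbf{w}^{T}\mathbf{x}_n \right)^{2} \right\}
\end{equation}
```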

ベイズパラメタ推定を行ない、重み推定へスパース化を行うために、関連度自動決定事前分布（ＡＲＤ：automatic relevance determination prior）が採用される。ＡＲＤについては、たとえば、以下の文献にも開示がある。 An automatic relevance determination (ARD) prior is adopted to perform Bayesian parameter estimation and to introduce sparsity into the weight estimation. ARD is also disclosed in, for example, the following document.

文献５：Bishop, C.M. Pattern Recognition and Machine Learning. New York: Springer (2006).
訓練データセットを以下のように表す。 Reference 5: Bishop, C.M. Pattern Recognition and Machine Learning. New York: Springer (2006).
The training data set is represented as follows:

訓練データセットが与えられたときの重みパラメータｗの推定が行われる。 The weight parameter w is estimated when a training data set is given.

ここで、重みｗに対してガウス事前分布を仮定し、重み精度パラメタα＝｛α₀，…，α_d｝^Tに対しては、無情報事前分布を仮定し、ノイズ精度としてβを仮定すると、以下のように表される。 Here, a Gaussian prior is assumed for the weights w, and non-informative priors are assumed for the weight precision parameters α = {α₀, ..., α_d}^T and for the noise precision β. These priors are expressed as follows.
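The prior distributions are not reproduced in this text. One common way to write a zero-mean Gaussian prior with per-weight precision α_i together with non-informative priors on the precisions is shown below; the specific 1/α_i and 1/β forms are an assumption of this sketch, and P₀ is notation introduced here for the priors.

```latex
\begin{align}
P_0(\mathbf{w} \mid \boldsymbol{\alpha})
  &= \prod_{i=0}^{d} \left( \frac{\alpha_i}{2\pi} \right)^{1/2}
     \exp\left\{ -\frac{1}{2} \alpha_i w_i^{2} \right\} \\
P_0(\boldsymbol{\alpha}) &= \prod_{i=0}^{d} \alpha_i^{-1} \\
P_0(\beta) &= \beta^{-1}
\end{align}
```

Under such a prior, a large estimated α_i drives the corresponding weight w_i toward zero, which is how ARD yields sparse weights.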

ベイズ推定のフレームワークでは、すべての評価されたパラメーターの結合確率分布を考慮し、重みを、以下のｗについての結合事後確率Ｐ（ｗ，α，β｜Ｘ，ｔ_l）を評価することにより推定する； In the Bayesian estimation framework, the joint probability distribution of all estimated parameters is considered, and the weights are estimated by evaluating the following joint posterior probability P(w, α, β | X, t_l):
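The joint posterior is not reproduced in this text. By Bayes' theorem it can be written as the product of the likelihood and the priors divided by the evidence; in the sketch below, P₀ denotes the prior distributions of w, α and β (notation introduced here for illustration).

```latex
\begin{equation}
P(\mathbf{w}, \boldsymbol{\alpha}, \beta \mid \mathbf{X}, \mathbf{t}_l)
  = \frac{P(\mathbf{t}_l \mid \mathbf{X}, \mathbf{w}, \beta)\,
          P_0(\mathbf{w} \mid \boldsymbol{\alpha})\,
          P_0(\boldsymbol{\alpha})\, P_0(\beta)}
         {\int d\mathbf{w}\, d\boldsymbol{\alpha}\, d\beta\;
          P(\mathbf{t}_l \mid \mathbf{X}, \mathbf{w}, \beta)\,
          P_0(\mathbf{w} \mid \boldsymbol{\alpha})\,
          P_0(\boldsymbol{\alpha})\, P_0(\beta)}
\end{equation}
```

The integral in the denominator is what makes the posterior hard to evaluate analytically.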

ここで、結合事後確率の評価は、分析的に扱いにくいという前提で、変分ベイズ法（ＶＢ）を使用して、それを近似する。 Here, the evaluation of the joint posterior probability is approximated using the variational Bayes method (VB) on the assumption that it is difficult to handle analytically.

パラメタ推定アルゴリズムについての詳細については、たとえば、以下の文献に開示がある。 Details of the parameter estimation algorithm are disclosed in, for example, the following documents.

文献６：Sato, M.A. Online model selection based on the variational Bayes. Neural Comp. 13, 1649-1681 (2001).
文献７：Sato, M.A. et al. Hierarchical Bayesian estimation for MEG inverse problem. Neuroimage 23, 806-826 (2004).
トレーニング画像セッションのｆＭＲＩサンプルを与えられたとしたときに、視認された物体カテゴリに対する個々の視覚的特徴の特徴ベクトルを推定する線形回帰モデルが、訓練される。 Reference 6: Sato, M.A. Online model selection based on the variational Bayes. Neural Comp. 13, 1649-1681 (2001).
Reference 7: Sato, M.A. et al. Hierarchical Bayesian estimation for MEG inverse problem. Neuroimage 23, 806-826 (2004).
Given the fMRI samples of the training image session, a linear regression model is trained to estimate the feature vector of each visual feature for the viewed object category.
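To make the shape of such a decoder concrete, the following Python sketch fits one linear map from voxel patterns to all feature elements at once. It deliberately substitutes closed-form ridge regression for the sparse Bayesian estimation with an ARD prior described above, so it is a stand-in and not the estimation method of this embodiment; the data are synthetic and all names are hypothetical.

```python
import numpy as np

def add_dummy(X):
    """Prepend the dummy variable x_0 = 1 so that the bias is part of the weights."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_linear_decoder(X, T, ridge=1.0):
    """Ridge-regularized least squares (a stand-in for the sparse Bayesian fit)."""
    Xb = add_dummy(X)
    penalty = ridge * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                      # leave the bias term unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ T)

def predict_features(X, W):
    return add_dummy(X) @ W

rng = np.random.default_rng(0)
n_train, n_test, d, L = 300, 40, 50, 20      # samples, voxels, feature elements
W_true = rng.normal(size=(d + 1, L))
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
T_train = add_dummy(X_train) @ W_true + 0.1 * rng.normal(size=(n_train, L))
T_test = add_dummy(X_test) @ W_true

W = fit_linear_decoder(X_train, T_train)
T_pred = predict_features(X_test, W)
```

Each column of `W` plays the role of the weight vector w for one feature element t_l.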

テスト・データセットに対しては、ｆＭＲＩ信号の信号対雑音比を増加させるために、同じカテゴリ（テストイメージ・セッションの３５のサンプル、想像実験中の１０個のサンプル）に対応するｆＭＲＩサンプルは、複数の試行にわたって平均する。 For the test data set, the fMRI samples corresponding to the same category (35 samples in the test image session, 10 samples in the imagery experiment) are averaged across trials to increase the signal-to-noise ratio of the fMRI signal.
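The trial averaging can be sketched in Python as follows, assuming the samples are stacked in one array with a category label per sample; the array sizes and the function name `average_by_category` are hypothetical.

```python
import numpy as np

def average_by_category(samples, labels):
    """Average all fMRI samples that share a category label."""
    categories = np.unique(labels)
    means = np.stack([samples[labels == c].mean(axis=0) for c in categories])
    return categories, means

rng = np.random.default_rng(0)
n_categories, n_repeats, n_voxels = 50, 35, 200
labels = np.repeat(np.arange(n_categories), n_repeats)
samples = rng.normal(size=(n_categories * n_repeats, n_voxels))  # stand-in fMRI data
categories, averaged = average_by_category(samples, labels)
```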

学習されたモデルを使用して、画像提示実験および想像実験中の各カテゴリに対して、１つの推定された特徴ベクトルを構成するために、平均されたｆＭＲＩサンプルから視認された／想像された物体の特徴ベクトルが推定される。
［実験］
以下では、識別解析のために行った実験の内容について、さらに説明する。
（識別解析の実験のための準備）
次に、図４のステップＳ１０６〜Ｓ１０８で記載したような、脳活性パターンから推定された多物体の視覚的特徴のパターンを使用して、視認された、あるいは、想像された物体を識別する処理の実験を行うための準備の処理について説明する。 Using the learned model, the feature vector of the viewed or imagined object is estimated from the averaged fMRI samples, which yields one estimated feature vector for each category in the image presentation experiment and in the imagery experiment.
[Experiment]
Below, the content of the experiment conducted for identification analysis is further demonstrated.
(Preparation for discrimination analysis experiment)
Next, the preparation for the experiments on the identification process is described. In this process, a viewed or imagined object is identified using the visual feature patterns of many objects estimated from brain activity patterns, as described in steps S106 to S108 of FIG. 4.

識別解析では、視認された／想像された物体のカテゴリは、ｆＭＲＩの脳活動パターンから推定された視覚的特徴ベクトルを使用して識別された。 In the discriminant analysis, categories of objects that were viewed / imagined were identified using visual feature vectors estimated from fMRI brain activity patterns.

識別解析に先立って、視覚的特徴ベクトルは、以下に説明する実験では、汎用画像データベース４０００内のすべてのカテゴリ（１５，３７２個のカテゴリ）における、画像のすべてに対して計算された。 Prior to discriminant analysis, visual feature vectors were calculated for all of the images in all categories (15,372 categories) in the general-purpose image database 4000 in the experiments described below.

すべてのカテゴリに対して「物体特有の特徴ベクトル」を生成するために、個別画像の視覚的特徴ベクトルは、各カテゴリ内で平均され、候補カテゴリの組が形成された。 In order to generate “object-specific feature vectors” for all categories, the visual feature vectors of the individual images were averaged within each category to form a set of candidate categories.

その後、訓練されたＳＬＲモデルを使用して、ｆＭＲＩ測定サンプルから、視認された／想像された物体の視覚的特徴ベクトルを推定し、推定された特徴ベクトルと、候補集合での各カテゴリの特徴ベクトルの間のピアソンの相関係数が計算された。 The trained SLR model is then used to estimate the visual feature vector of the viewed / imagined object from the fMRI measurement sample, and the estimated feature vectors and feature vectors for each category in the candidate set The Pearson correlation coefficient between was calculated.

変動する候補の数に対してパフォーマンスを定量化するために、視認された／想像された物体カテゴリおよびランダムに選択されたカテゴリから成る候補集合を作成した。 In order to quantify performance against a varying number of candidates, a candidate set was created consisting of visually recognized / imagined object categories and randomly selected categories.

ここで、候補集合でのカテゴリのどれもデコードするモデルのトレーニングに使用されたものではない。 Here, none of the categories in the candidate set were used to train the model to decode.

予測された特徴ベクトルが与えられると、候補集合中の最も高い相関係数を備えたカテゴリの選択によりカテゴリが識別された。各サンプルの平均識別性能は、提示されていないカテゴリを１００回リサンプリングすることにより計算された。
（実験結果）
以下では、脳活動データからの画像の特徴ベクトルの予測について行った実験、および、推定された特徴ベクトルにより物体のカテゴリを推定する処理についての実験の結果を説明する。 Given the predicted feature vector, the category was identified by selecting the category with the highest correlation coefficient in the candidate set. The average discrimination performance of each sample was calculated by resampling the unpresented category 100 times.
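The candidate-set evaluation described above can be sketched in Python as follows. The sketch assumes that an identification counts as correct when the true category has the highest Pearson correlation among the candidates, and it uses random vectors in place of real category features; the function name and parameters are hypothetical.

```python
import numpy as np

def identification_accuracy(predicted, true_feature, other_features,
                            n_candidates, n_repeats=100, seed=0):
    """Fraction of resampled candidate sets in which the true category wins."""
    rng = np.random.default_rng(seed)
    r_true = np.corrcoef(predicted, true_feature)[0, 1]
    hits = 0
    for _ in range(n_repeats):
        # Draw the distractor categories for this candidate set.
        idx = rng.choice(len(other_features), size=n_candidates - 1, replace=False)
        r_others = [np.corrcoef(predicted, other_features[i])[0, 1] for i in idx]
        hits += int(r_true > max(r_others))
    return hits / n_repeats

rng = np.random.default_rng(1)
others = rng.normal(size=(200, 500))         # stand-in for untrained categories
true_feature = rng.normal(size=500)
predicted = true_feature + 0.5 * rng.normal(size=500)
acc = identification_accuracy(predicted, true_feature, others, n_candidates=50)
```

Varying `n_candidates` gives the performance as a function of candidate set size.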
(Experimental result)
In the following, the results of experiments performed on the prediction of image feature vectors from brain activity data and the processing of estimating an object category based on estimated feature vectors will be described.

簡単に要約すると、対象者が自然画像（１５０のカテゴリ）を見ている間に、脳活動はｆＭＲＩによって記録された。 Briefly summarized, brain activity was recorded by fMRI while the subject was viewing natural images (150 categories).

その後、訓練されたデコーダは、ｆＭＲＩ活性パターンからのデコーダ・トレーニングの中で使用されなかった、視認されたまたは想像された物体の特徴ベクトルを推定した。 The trained decoder then estimated, from the fMRI activity patterns, the feature vectors of viewed or imagined objects that had not been used in decoder training.

推定された特徴ベクトルを、オンライン画像データベース中のイメージから計算された物体に特有の特徴ベクトル３１０６と比較することによって、視認され、または想像された物体を、データベース（１５，３７２のカテゴリ）で定義された物体カテゴリに基づいて、識別した。 By comparing the estimated feature vector with the object-specific feature vectors 3106 calculated from images in the online image database, the viewed or imagined object was identified in terms of the object categories defined in the database (15,372 categories).

任意の物体カテゴリが、特徴空間の中で表わされるので、識別された物体カテゴリがデコーダの訓練において使用されるものに制限されていない。 Since any object category is represented in the feature space, the identified object categories are not limited to those used in decoder training.

訓練されたデコーダは、うまく個々の特徴の値を推定し、ほとんどの関心領域の組合せに対するｆＭＲＩ信号からの物体の識別を可能にした。高位および低位の視覚的特徴は、それぞれ、より高次のおよびより低次の皮質領域のｆＭＲＩ信号から、一層よく推定される傾向があった。 The trained decoder successfully estimated individual feature values and allowed identification of objects from the fMRI signal for most region of interest combinations. Higher and lower visual features tended to be better estimated from fMRI signals in higher and lower cortical areas, respectively.

また、さらに重要なことは、より高次の皮質領域から推定された中間レベルの特徴は、物体カテゴリの識別で最も有用なものであった。 More importantly, the mid-level features estimated from higher cortical areas were the most useful in identifying object categories.

さらに、物体カテゴリについて想像することは、中間レベルの視覚的特徴を推定するような脳活動を引き起こし、かつ、統計的に有意レベルで物体識別を行なうのに十分であった。 Furthermore, imagining an object category evoked brain activity from which mid-level visual features could be estimated, and this was sufficient to identify objects at a statistically significant level.

したがって、実験の結果は、物体カテゴリの制限のあるセットで訓練されたデコードモデルが、任意の物体カテゴリをデコードするために汎化することを実証するものである。さらに、想像により誘起された脳活動に対して、物体カテゴリ識別に成功したことは、視覚的な知覚に誘発された特徴レベル表現は、また、トップダウンの視覚的な像に使用されることを示唆する。
（実験の具体的内容）
２種類の実験が行なわれた。つまり、画像提示実験、および、想像実験である。 The experimental results therefore demonstrate that a decoding model trained on a limited set of object categories generalizes to decode arbitrary object categories. In addition, the successful identification of object categories from imagery-induced brain activity suggests that the feature-level representations evoked by visual perception are also used in top-down visual imagery.
(Specific contents of the experiment)
Two types of experiments were conducted: an image presentation experiment and an imagery experiment.

図１０は、画像提示実験および想像実験の流れを説明するための概念図である。 FIG. 10 is a conceptual diagram for explaining the flow of the image presentation experiment and the imagery experiment.

まず、図１０（ａ）に示されるように、「画像提示実験」では、被験者が、連続して視覚的な物体画像（９秒間各々示された）を見ている間、ｆＭＲＩ信号が測定された。 First, as shown in FIG. 10(a), in the "image presentation experiment", fMRI signals were measured while the subject viewed a sequence of object images (each shown for 9 seconds).

（ａ）画像提示実験のデザイン
ディスプレイの中心に、中心凝視スポットとともに画像が提示された。 (a) Design of the image presentation experiment
Images were presented at the center of the display with a central fixation spot.

各刺激ブロックがブロック開始を被験者に知られる前に、凝視スポットのカラーは、０．５秒間、白から赤に変化した。 Before each stimulus block, the color of the fixation spot changed from white to red for 0.5 seconds to notify the subject that the block was about to start.

被験者は、各々の全体にわたって安定に固定された状態を維持し、各反復に対してボタンを押すことで答えて、画像についてone-back repetition detection taskを行なった。 Subjects maintained steady fixation throughout and performed a one-back repetition detection task on the images, responding with a button press to each repetition.

被験者に対して機械学習のためのデータを取得する「トレーニング画像セッション」においては、１５０の異なる物体カテゴリ（各カテゴリからの８つのイメージ）からの合計１，２００の画像が、各々、一回提示された。 In a “training image session” in which data for machine learning is acquired for a subject, a total of 1,200 images from 150 different object categories (eight images from each category) are each presented once. It was done.

対象者に対する「テストイメージ・セッション」では、５０の物体カテゴリ（各カテゴリから１つのイメージ）からの合計５０のイメージが、各々、３５回、示された。 In a “test image session” for the subject, a total of 50 images from 50 object categories (one image from each category) were each shown 35 times.

一方、図１０（ｂ）に示すように、「想像実験」では、対象者が画像提示実験のテストイメージ・セッション中に示された５０の物体カテゴリの１つを想像する間に、ｆＭＲＩ信号が測定された。 In contrast, as shown in FIG. 10(b), in the "imagery experiment", fMRI signals were measured while the subject imagined one of the 50 object categories shown during the test image session of the image presentation experiment.

対象者は、最初の信号音で目を閉じて、キュー期間に赤い文字によって示されたカテゴリと関係する可能な限り多数の物体画像を想像した。 At the first beep, the subject closed their eyes and imagined as many object images as possible related to the category that had been indicated in red letters during the cue period.

続いて、被験者は、第２の信号音で目を開き、それらが合図された物体カテゴリに対応する物体を想像することができたかどうか、評価した。 Subsequently, the subject opened their eyes with the second signal sound and evaluated whether they were able to imagine an object corresponding to the signaled object category.

テストイメージ・セッションおよび想像実験のカテゴリは、上述のとおり、トレーニング画像セッションにおいては使用されなかった。 As described above, the categories of the test image session and the imagery experiment were not used in the training image session.

解析のためには、上述したＶ１野，Ｖ２野，Ｖ３野，Ｖ４野，ＬＯＣ，ＦＦＡ，ＰＰＡ、低次の視覚野（ＬＶＣと呼ぶ：Ｖ１野−Ｖ３野）、より高次の視覚野（ＨＶＣと呼ぶ：ＬＯＣ，ＦＦＡ，ＰＰＡを含む領域をカバー）、および、上にリストされた視覚的な副領域をすべてカバーする視覚野全体（視覚野ＶＣと呼ぶ）を含む、多数の視覚野からのｆＭＲＩ信号を使用した。 For the analysis, fMRI signals from multiple visual areas were used: the above-described areas V1, V2, V3, V4, LOC, FFA and PPA; the lower visual cortex (LVC: areas V1 to V3); the higher visual cortex (HVC: a region covering LOC, FFA and PPA); and the entire visual cortex (VC), which covers all of the visual subareas listed above.

１組の線形回帰関数（スパース線形回帰モデル）が、各脳領域に対応するｆＭＲＩ信号から視覚的特徴ベクトル（各視覚的特徴ごとに、約１，０００個の特徴要素）を推定するために使用された。 A set of linear regression functions (sparse linear regression model) is used to estimate visual feature vectors (approximately 1,000 feature elements for each visual feature) from the fMRI signal corresponding to each brain region It was done.

予測器（デコーダ）は、トレーニング画像セッションからのｆＭＲＩ信号を使用して、見られた物体から計算される特徴ベクトルの要素の値を推定するように訓練された。その後、訓練された予測器（デコーダ）は、テストイメージ・セッションおよび想像実験の測定されたｆＭＲＩ信号からテスト物体カテゴリに対して各視覚的特徴のパターンを推定するために使用された。 The predictor (decoder) was trained to use the fMRI signal from the training image session to estimate the value of the feature vector element calculated from the viewed object. A trained predictor (decoder) was then used to estimate the pattern of each visual feature for the test object category from the measured fMRI signal of the test image session and imaginary experiment.
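As an illustration only, the training of one decoder (one regression function per feature unit) might be sketched as follows. The text does not specify the fitting algorithm, so this sketch assumes Lasso (L1-penalized least squares) fitted by coordinate descent as a stand-in for the sparse linear regression model; the function names, the penalty value, and the toy data are all hypothetical.

```python
# Hypothetical sketch: one sparse linear decoder for ONE feature unit.
# Lasso with coordinate descent is an assumed stand-in for the
# "sparse linear regression model"; it is not taken from the source.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_decoder(voxels, targets, l1_penalty=0.1, n_iterations=100):
    """Minimize (1/2n) * sum((y - x.w)^2) + l1_penalty * sum(|w|).

    voxels:  list of fMRI samples, each a list of voxel amplitudes
    targets: value of one feature unit for each sample
    """
    n_samples = len(voxels)
    weights = [0.0] * len(voxels[0])
    for _ in range(n_iterations):
        for j in range(len(weights)):
            rho = 0.0   # correlation of voxel j with the partial residual
            norm = 0.0  # squared norm of voxel j over the samples
            for x, y in zip(voxels, targets):
                full = sum(w * xi for w, xi in zip(weights, x))
                residual = y - (full - weights[j] * x[j])
                rho += x[j] * residual
                norm += x[j] * x[j]
            if norm == 0.0:
                weights[j] = 0.0
            else:
                weights[j] = (soft_threshold(rho / n_samples, l1_penalty)
                              / (norm / n_samples))
    return weights


def predict_feature(weights, voxel_pattern):
    """Estimate the feature-unit value from a new voxel pattern."""
    return sum(w * x for w, x in zip(weights, voxel_pattern))


# Toy data: the feature unit depends on voxel 0 only.
train_voxels = [[1.0, 0.0], [2.0, 0.1], [3.0, -0.1], [4.0, 0.0]]
train_targets = [2.0, 4.0, 6.0, 8.0]
decoder = fit_sparse_decoder(train_voxels, train_targets)
```

In this toy case the L1 penalty drives the weight of the uninformative voxel to exactly zero, which is the sparsity property the text relies on. A full decoder would repeat the fit for each of the roughly 1,000 feature units.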

(b) Design of the imagery experiment
The start of each trial was signaled to the subject by a change in the color of the fixation mark.

A cue stimulus consisting of a number of object names was presented visually for 3 seconds. The start and end of the imagery period were signaled by beeps.

After the first beep, the subject was asked to imagine as many object images as possible related to the category indicated in red letters.

The subject kept imagining with eyes closed (15 seconds) until the second beep. The subject was then asked to rate the accuracy of what they had imagined (3 seconds).

The actual cue consisted of, for example, 50 object names; only a subset of these words is depicted in the figure because of space limitations.
(Image feature prediction accuracy)
First, we examined whether the values of the visual feature vector for a presented image could be estimated computationally from brain activity patterns in multiple visual areas.

Feature prediction accuracy was evaluated with Pearson's correlation coefficient, by comparing the vectors of predicted values and true values across all image samples of the test image session.
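The evaluation described above can be sketched as follows. This is a minimal illustration that assumes the correlation is computed per feature unit across test samples and then averaged over units; the function names and the handling of constant inputs are assumptions, not part of the source.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # assumption: treat a constant input as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def feature_prediction_accuracy(predicted, true):
    """Mean over feature units of the per-unit correlation across samples.

    predicted, true: lists of samples, each a list of feature-unit values.
    """
    n_units = len(true[0])
    per_unit = [
        pearson([sample[u] for sample in predicted],
                [sample[u] for sample in true])
        for u in range(n_units)
    ]
    return sum(per_unit) / n_units
```

For example, `feature_prediction_accuracy(predicted, true)` would be called once per combination of visual feature and brain region, with `predicted` holding the decoder outputs for the test image session.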

Because the distribution of feature values and the number of feature units in the original population differed among the visual features, differences in prediction accuracy between visual features are difficult to interpret.

The analysis therefore focused on within-feature differences in prediction accuracy across brain regions (that is, across different visual areas).

FIG. 11 shows the prediction accuracy for the features of the presented images in multiple visual areas.

The mean prediction accuracy over the feature units of each visual feature was computed from the estimated feature values and the feature values of the presented images using Pearson's correlation coefficient (averaged over five subjects). Error bars denote 95% confidence intervals (CI) across subjects.

In FIG. 11, for each feature (for example, CNN1), the brain areas are arranged from left to right in the order V1, V2, V3, V4, LOC, FFA, PPA, LVC, HVC, VC. The same order is used in the subsequent graphs, so the description is not repeated.

For all feature-ROI pairs, the true values and the feature values estimated from brain activity patterns were positively correlated (Wilcoxon signed-rank test, p < 0.05, for all pairs and all subjects).

The choice of visual feature and visual area affected accuracy. As seen in the results for the CNN features, higher-order (higher-layer) visual features tended to be estimated better from fMRI signals in higher cortical areas than in lower cortical areas, whereas lower-order (lower-layer) visual features tended to be estimated better from fMRI signals in lower cortical areas than in higher cortical areas.

Similar trends were observed for the HMAX model and the other two models. These results indicate a strong association, in terms of feature prediction accuracy, between the hierarchy of visual areas and the complexity level of the visual features.
(Object-specific feature prediction accuracy)
FIG. 12 shows the prediction accuracy of object-specific features from multiple visual areas for viewed images.

To make the decoding model better suited to decoding object categories, the decoders can be further customized to estimate, from the fMRI signals, feature patterns that are representative of individual object categories.

For this purpose, the decoders are trained further using “object-specific feature vectors,” each constructed by averaging the feature vectors of multiple images tagged with the same object category. A decoder retrained in this way is called an “object-specific feature decoder.”
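The construction of an object-specific feature vector by averaging can be sketched as follows. This is a minimal illustration; the function names and the (category, feature vector) input layout are hypothetical.

```python
def object_specific_vector(image_feature_vectors):
    """Element-wise mean of the feature vectors of one category's images."""
    n_images = len(image_feature_vectors)
    return [sum(column) / n_images for column in zip(*image_feature_vectors)]


def build_category_vectors(tagged_features):
    """Group (category, feature_vector) pairs and average within each category."""
    grouped = {}
    for category, vector in tagged_features:
        grouped.setdefault(category, []).append(vector)
    return {category: object_specific_vector(vectors)
            for category, vectors in grouped.items()}


# Toy example with two-element feature vectors.
category_vectors = build_category_vectors([
    ("duck", [1.0, 2.0]),
    ("duck", [3.0, 4.0]),
    ("cup", [0.0, 1.0]),
])
```

The resulting dictionary maps each category name to one vector, which can serve both as a training target for the object-specific feature decoder and as a candidate vector in the identification analysis.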

We then tested whether the object-specific feature decoders could estimate the object-specific feature patterns of the categories viewed or imagined by the subject. FIG. 12 shows the result of this test: the prediction accuracy for object-specific features of the viewed object categories in multiple visual areas.

The object-specific feature decoders successfully estimated the object-specific feature patterns of the viewed object categories for all feature-ROI combinations.

In contrast to the performance trend in the image feature prediction analysis shown in FIG. 11, for most visual features the object-specific feature patterns were estimated better from fMRI signals in higher cortical areas than in lower cortical areas.

FIG. 13 shows the prediction accuracy of object-specific features from multiple visual areas for imagined images.

As shown in FIG. 13, the object-specific patterns of imagined object categories were also estimated adequately for most features, using the same object-specific feature decoders as for viewed objects, trained on activity in the higher visual cortex.

These results suggest that decoders trained to estimate the feature patterns of viewed objects generalize to estimating the feature patterns of imagined objects.

In other words, this provides evidence that imagining an object category is sufficient to evoke brain activity from which visual features can be estimated.

Furthermore, object-specific feature prediction accuracy was found to be positively related to “category discriminability,” both between and within visual features.
(Object category identification with generic object decoding)
FIG. 14 illustrates the concept of object category identification.

Pearson's correlation coefficient was computed between the estimated feature vector and the object-specific feature vector of each category in a candidate set, which consisted of the presented or imagined category and a specified number of categories arbitrarily selected from the general-purpose image database 4000.

The category with the highest correlation coefficient was selected as the estimated category (marked with a star in FIG. 14).

More specifically, an identification analysis was performed to examine whether the features predicted by decoders trained to estimate object-specific feature patterns are useful for decoding viewed or imagined objects.

In the identification analysis, a set of candidate feature vectors was constructed from the categories used in the test image session (and the imagery experiment) and a specified number (for example, 10,000) of categories arbitrarily selected from the 15,332 categories provided by the general-purpose image database 4000.

Given an fMRI sample measured from the subject, category identification was performed by selecting the object-specific feature vector having the highest correlation coefficient with the estimated feature vector and assigning the category corresponding to the selected vector.
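The identification step described above can be sketched as follows. This is a minimal illustration with toy three-category data; the function names and candidate vectors are hypothetical, and the real candidate set would hold thousands of object-specific feature vectors.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient (inputs assumed non-constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_categories(estimated_vector, candidate_vectors):
    """Return (category, correlation) pairs, highest correlation first."""
    scored = [(pearson(estimated_vector, vector), name)
              for name, vector in candidate_vectors.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored]


def identify_category(estimated_vector, candidate_vectors):
    """Assign the category whose vector correlates best with the estimate."""
    return rank_categories(estimated_vector, candidate_vectors)[0][0]


# Toy candidate set: object-specific feature vectors keyed by category.
candidates = {
    "duck": [1.0, 2.0, 3.0, 4.0],
    "wren": [1.0, 2.0, 2.5, 4.5],
    "cup": [4.0, 3.0, 2.0, 1.0],
}
estimated = [1.1, 1.9, 3.2, 3.9]  # vector decoded from one fMRI sample
ranking = rank_categories(estimated, candidates)
```

Keeping the full ranking rather than only the best match makes it straightforward to inspect the top-ranked candidates as well as the single identified category.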

FIG. 15 is a conceptual diagram illustrating the identification procedure in the identification analysis.

FIG. 15 shows the top six categories predicted by the classifier for viewed objects.

For viewed and imagined objects, when the estimated feature patterns were used, the true object category was either correctly selected or ranked highly in terms of correlation.

Even when the correct category was not assigned, the top six categories included reasonable ones. For example, the feature vector estimated for “duck” was misidentified as another type of bird, “rock wren.”

That is, in the identification analysis, the candidate categories ranked highly by correlation tended to be semantically similar to the target category for most CNN features and for SIFT+BoF.

Thus, as noted above, even when the final identification was incorrect, the classifier could estimate categories that were semantically similar to the target category.

FIG. 16 shows the relationship between the rank of the predicted categories and their semantic distance to the target category.

That is, with the predicted categories arranged in descending order of rank as in FIG. 15, the relationship between rank and semantic distance to the target object category was evaluated quantitatively.

Here, the “semantic distance” is defined as the shortest path length between categories in the WordNet tree.
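The shortest path length between two categories in a tree can be sketched as follows. This is a minimal illustration that assumes the hierarchy is given as a child-to-parent mapping; the toy hierarchy below is hypothetical and is not the actual WordNet structure.

```python
def path_to_root(node, parent):
    """List of nodes from the given node up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def semantic_distance(a, b, parent):
    """Number of edges on the shortest path between a and b in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:  # first shared ancestor is the lowest one
            return steps_from_a[node] + steps_from_b
    raise ValueError("categories do not share a common ancestor")


# Hypothetical toy hierarchy (child -> parent).
toy_parent = {
    "duck": "bird",
    "wren": "bird",
    "bird": "animal",
    "dog": "animal",
    "animal": "entity",
    "cup": "artifact",
    "artifact": "entity",
}
```

In this toy tree, two birds are two edges apart, while a bird and an artifact are connected only through the root, which gives the larger distance expected of semantically dissimilar categories.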

In the experiment, for each of the 50 test samples, a candidate set was constructed by randomly sampling 999 false categories. The 1,000 feature vectors (one true category and 999 false categories) were then ranked according to their similarity (correlation) to the predicted feature vector, and the semantic distance between the true category and each of the categories ranked 1st through 1,000th was computed. This procedure was repeated for all samples.

As shown in FIG. 16(a), higher-ranked categories tended to have shorter semantic distances to the target category.

As shown in FIG. 16(b), semantic distance was positively correlated with rank, particularly for the higher-level CNN features and SIFT+BoF.

In other words, these results mean that, for these visual features, even when the true category was not correctly identified, the misidentified categories were semantically similar to the true category.

Accordingly, the classifier may present to the user not only the single category with the highest correlation but also, for example, a list of a predetermined number of top-ranked categories in descending order of correlation.
Moreover, object category identification as in the present embodiment is not limited by the number of categories used to train the decoding model, so identification accuracy can be evaluated for thousands of object categories, including those not used for model training.

FIG. 17 shows identification performance for viewed objects, evaluated for all visual features as a function of candidate set size.

For example, with a candidate set size of 2, an accuracy of about 80% was obtained for most visual features, and even as the number of candidates increased, performance remained above chance level (dotted line in FIG. 17) for most visual features.
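How identification accuracy and chance level depend on candidate set size can be sketched as follows. This is a minimal illustration that assumes the correlations with the true and false categories have already been computed, and that candidate sets are resampled a fixed number of times; the function names, the number of repeats, and the seed are hypothetical.

```python
import random


def chance_level(set_size):
    """Probability of picking the true category by guessing."""
    return 1.0 / set_size


def identification_accuracy(true_correlations, false_correlations,
                            set_size, n_repeats=100, seed=0):
    """Fraction of trials in which the true category has the top correlation.

    true_correlations:  per test sample, correlation with the true category
    false_correlations: per test sample, correlations with false categories
    set_size:           candidate set size (one true + set_size - 1 false)
    """
    rng = random.Random(seed)
    hits = 0
    trials = 0
    for r_true, r_false in zip(true_correlations, false_correlations):
        for _ in range(n_repeats):
            distractors = rng.sample(r_false, set_size - 1)
            hits += all(r_true > r for r in distractors)
            trials += 1
    return hits / trials
```

With a candidate set size of 2, `chance_level(2)` is 0.5, and the chance level falls as 1/N as the set grows, so accuracies at different set sizes should be compared against their own chance levels.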

Although accuracy generally decreased as the number of candidates increased, the ordering of the features by identification accuracy remained largely consistent.

In general, mid-level features such as CNN4 to CNN7 and SIFT+BoF consistently showed high performance in identifying viewed objects across candidate set sizes.

Here, “mid-level features” refers to features (patterns of firing units) identified in layers other than the top and bottom layers when the classifier has a multilayer structure. When the classifier uses features that model the process of image recognition in the human visual cortex, “mid-level features” refers to features that are more complex than line-segment features such as those detected by Gabor filters, having complex graphical patterns like combinations of such segments, yet without a structure complex enough to carry meaning in the way a face or an object does. An example is the “image feature vector based on SIFT+BoF” described above.

FIG. 18 shows the identification performance evaluated for multiple feature-ROI combinations.

That is, as shown in FIG. 18, to quantitatively evaluate identification performance for multiple feature-ROI combinations, identification performance for viewed or imagined objects was evaluated with a candidate set size of 2. In FIG. 18, chance level is therefore 50%.

As shown in FIG. 18, viewed or imagined objects were identified at a statistically significant level for most feature-ROI combinations (Wilcoxon signed-rank test, p < 0.01).

Mid-level features estimated from higher cortical areas were the most useful for identifying viewed and imagined object categories.

The successful identification using brain activity during imagery suggests that feature-level representations evoked in visual perception are also recruited in top-down visual imagery.

As described above, the brain activity analysis apparatus and brain activity analysis method of the present embodiment can estimate (identify) arbitrary object categories viewed or imagined by a subject from fMRI signals of the subject's visual cortex.

It has also been demonstrated that, in such estimation (identification), mid-level features contribute to accurate identification of viewed and imagined objects.

A decoder trained on brain activity evoked by visual stimuli can estimate the feature patterns of imagined objects as well as viewed ones, and imagined objects can be identified using feature patterns estimated by the same decoder from brain activity evoked during the imagery task. However, mental images of the same kind of object do not necessarily share pixel-level similarity. Features that are more complex and invariant are therefore better suited to object identification. At the same time, mid-level features are expected to be better suited to object identification than overly complex visual features.
Thus, although low-level and high-level visual features could identify visual objects with reasonable accuracy, higher accuracy was obtained when mid-level features were used in the identification analysis. This suggests an important contribution of mid-level representations to accurate object identification.
For example, the highest-layer CNN feature (CNN8) did not achieve the highest identification accuracy, whereas feature prediction accuracy and generic object recognition performance were higher with mid-level CNN features.

One possible cause of this behavior is that the CNN8 features are relatively vulnerable to the noise introduced by prediction from brain activity.

The brain activity analysis apparatus and brain activity analysis method of the present embodiment can also provide an information retrieval system based on brain activity, by translating brain activity into words or concepts.

Such a system is useful when a user does not know or cannot recall the name of a particular object but can imagine what it looks like.

With the configuration described above, when identifying the category of a viewed or imagined object from brain activity signals measured while the subject views or imagines an object image, it is not necessary to perform machine learning in advance for all target categories, even when the aim is to identify a large number of general object categories. The time required for machine learning of the classifier can therefore be shortened.

Moreover, the category of a viewed or imagined object can be identified even from brain activity signals measured while the subject views or imagines an object image containing an object that was not used in training the classifier.

The embodiment disclosed herein is an illustration of a configuration for concretely carrying out the present invention and does not limit the technical scope of the present invention. The technical scope of the present invention is indicated not by the description of the embodiment but by the claims, and is intended to include modifications within the literal scope of the claims and the scope of equivalents.

2: subject, 6: display, 10: MRI apparatus, 11: magnetic field application mechanism, 12: static magnetic field generation coil, 14: gradient magnetic field generation coil, 16: RF irradiation unit, 18: bed, 20: reception coil, 21: drive unit, 22: static magnetic field power supply, 24: gradient magnetic field power supply, 26: signal transmission unit, 28: signal reception unit, 30: bed drive unit, 32: data processing unit, 36: storage unit, 38: display unit, 40: input unit, 42: control unit, 44: interface unit, 46: data collection unit, 48: image processing unit, 50: network interface.

Claims

A brain activity analysis apparatus comprising:
an image database storing a plurality of reference image data associated with category information of objects included in the images;
a visual feature extraction unit that extracts visual feature vectors for the reference image data;
an interface for receiving signals from a brain activity detection device for measuring signals indicating brain activity in a predetermined region of a subject's brain;
feature prediction means for generating an estimated visual feature vector estimated from a brain activity pattern, by machine learning based on signals measured in advance as signals indicating brain activity in the predetermined region of the subject's brain when a plurality of test images were presented to the subject; and
identification means for identifying the category of an object corresponding to the brain activity pattern occurring in the predetermined region of the subject, based on the magnitude of the correlation between the visual feature vectors extracted by the visual feature extraction unit and the estimated visual feature vector.

The brain activity analysis apparatus according to claim 1, wherein the number of object categories in the reference image data stored in the image database is greater than the number of object categories in the test image presented to the subject.

The brain activity analysis apparatus, wherein the visual feature vector corresponding to a test image used for machine learning of the feature prediction means is an average of the visual feature vectors of a plurality of reference image data belonging to the same category.

The brain activity analysis apparatus according to claim 1, wherein the visual feature extraction unit is a convolutional neural network having a multilayer structure.

The brain activity analysis apparatus according to claim 4, wherein the estimated visual feature vector predicted by the feature prediction unit is a feature amount extracted from a firing pattern of an intermediate layer of the convolutional neural network.

The brain activity analysis apparatus according to claim 1, wherein the estimated visual feature vector predicted by the feature prediction unit is an image feature vector based on SIFT + BoF.

A brain activity analysis method in which a computing device identifies the category of an object that a subject is viewing or imagining, based on signals from a brain activity detection device for measuring signals indicating brain activity in a predetermined region of the subject's brain, the method comprising:
a step in which the computing device machine-learns a process of estimating an estimated visual feature vector from a brain activity pattern, based on signals measured in advance as signals indicating brain activity in the predetermined region of the subject's brain when a plurality of test images were presented to the subject;
a step in which the computing device estimates an estimated visual feature vector based on signals from the brain activity detection device;
a step of calculating the similarity between the estimated visual feature vector and visual feature vectors extracted from reference image data in an image database storing a plurality of reference image data associated with category information of objects included in the images; and
a step of identifying the category of an object corresponding to the brain activity pattern occurring in the predetermined region of the subject, based on the magnitude of the calculated similarity.

A brain activity analysis program for causing a computer to execute processing that identifies the category of an object that a subject is viewing or imagining, based on signals from a brain activity detection device for measuring signals indicating brain activity in a predetermined region of the subject's brain, the program causing the computer to execute:
a step in which a computing device of the computer machine-learns a process of estimating an estimated visual feature vector from a brain activity pattern, based on signals measured in advance as signals indicating brain activity in the predetermined region of the subject's brain when a plurality of test images were presented to the subject;
a step in which the computing device estimates an estimated visual feature vector based on signals from the brain activity detection device;
a step of calculating the similarity between the estimated visual feature vector and visual feature vectors extracted from reference image data in an image database storing a plurality of reference image data associated with category information of objects included in the images; and
a step in which the computing device identifies the category of an object corresponding to the brain activity pattern occurring in the predetermined region of the subject, based on the magnitude of the calculated similarity.