JP2017041206A

JP2017041206A - Learning device, search device, method, and program

Info

Publication number: JP2017041206A
Application number: JP2015164218A
Authority: JP
Inventors: 卓弘金子; Takuhiro Kaneko; 隆行黒住; Takayuki Kurozumi; 邦夫柏野; Kunio Kashino
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-08-21
Filing date: 2015-08-21
Publication date: 2017-02-23
Anticipated expiration: 2035-08-21
Also published as: JP6397385B2

Abstract

PROBLEM TO BE SOLVED: To search for a multimodal signal even when some of its modals are missing. SOLUTION: A target feature extraction unit 40 extracts feature data from each input target signal, which may be single-modal or multimodal. For each target signal, a target feature quantization unit 42 obtains target quantized data by converting the feature data of the target signal into quantized data expressed in codes, based on the extracted feature data and a previously created conversion table. For each target signal, a search unit 44 then searches a previously created database, based on the target quantized data, for the attributes associated with the accumulated quantized data corresponding to that target quantized data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a search device, a method, and a program for searching for multimodal signals.

Conventionally, one method of searching for a multimodal signal is to search an accumulated signal, which is time-series data, for locations similar to a target signal, which is also time-series data.

Japanese Patent No. 4358229

However, the conventional method has a problem: when a plurality of modals are used and some of them are missing, a multimodal signal cannot be searched for.

The present invention has been made to solve the above problem, and its object is to provide a learning device, a search device, a method, and a program capable of searching for a multimodal signal even when some modals are missing.

To achieve the above object, a learning device according to a first invention includes: a learning feature extraction unit that extracts feature data from each input multimodal learning signal; a learning unit that creates, based on the feature data of each learning signal extracted by the learning feature extraction unit, a conversion table from the feature data to common codes; an accumulated feature extraction unit that extracts feature data from each input single-modal or multimodal accumulated signal; an accumulated feature quantization unit that, for each accumulated signal, obtains accumulated quantized data by converting the feature data of the accumulated signal into quantized data expressed in the codes, based on the feature data extracted by the accumulated feature extraction unit and the conversion table created by the learning unit; and a database creation unit that, for each accumulated signal, creates a database by registering in it the accumulated quantized data obtained by the accumulated feature quantization unit in association with an attribute of the accumulated signal.

A learning method according to a second invention is a learning method in a learning device including a learning feature extraction unit, a learning unit, an accumulated feature extraction unit, an accumulated feature quantization unit, and a database creation unit. The learning feature extraction unit extracts feature data from each input multimodal learning signal. The learning unit creates a conversion table from the feature data to common codes, based on the feature data of each learning signal extracted by the learning feature extraction unit. The accumulated feature extraction unit extracts feature data from each input single-modal or multimodal accumulated signal. For each accumulated signal, the accumulated feature quantization unit obtains accumulated quantized data by converting the feature data of the accumulated signal into quantized data expressed in the codes, based on the feature data extracted by the accumulated feature extraction unit and the conversion table created by the learning unit. For each accumulated signal, the database creation unit creates a database by registering in it the accumulated quantized data obtained by the accumulated feature quantization unit in association with an attribute of the accumulated signal.

According to the first and second inventions, the learning feature extraction unit extracts feature data from each input multimodal learning signal, and the learning unit creates a conversion table from the feature data to common codes based on the extracted feature data of each learning signal. The accumulated feature extraction unit extracts feature data from each input single-modal or multimodal accumulated signal. For each accumulated signal, the accumulated feature quantization unit obtains accumulated quantized data by converting the feature data of the accumulated signal into quantized data expressed in the codes, based on the extracted feature data and the created conversion table. For each accumulated signal, the database creation unit creates a database by registering in it the obtained accumulated quantized data in association with an attribute of the accumulated signal.

Thus, feature data is extracted from each input multimodal learning signal, and a conversion table is created based on that feature data. Feature data is also extracted from each input single-modal or multimodal accumulated signal, and accumulated quantized data is obtained for each accumulated signal based on its extracted feature data and the created conversion table. The database creation unit then registers, for each accumulated signal, the accumulated quantized data in association with an attribute of the accumulated signal. This builds a database in which multimodal signals can be searched for even when some modals are missing.

In the learning device according to the first invention, when the feature data of an accumulated signal lacks data corresponding to a modal included in the multimodal learning signal, the accumulated feature quantization unit may obtain the accumulated quantized data in any of the following ways: based on the conversion table and feature data in which the missing portion of the feature data of the accumulated signal is filled with zeros; based on the feature data of the accumulated signal and the conversion table, ignoring the data in the feature data stored in the conversion table that corresponds to the missing portion; or based on the conversion table and feature data in which the missing portion is filled with a representative value of the corresponding feature data of the learning signals.
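The three ways of handling a missing modal can be sketched as follows. This is only a minimal illustration under stated assumptions: the conversion table is modeled as a codebook of centroid vectors, quantization picks the nearest codeword by squared Euclidean distance, and the representative value is taken to be a per-dimension mean of the learning features. The names `quantize`, `codebook`, `missing`, and `train_mean` are hypothetical and do not appear in the specification.

```python
import numpy as np

def quantize(feature, codebook, missing=None, strategy="zero", train_mean=None):
    """Return the index (code) of the nearest codeword for one feature vector.

    feature    : 1-D array of length D (values at missing positions are unused)
    codebook   : (K, D) array, an assumed stand-in for the conversion table
    missing    : boolean mask of length D, True where the modal is missing
    strategy   : "zero", "ignore", or "representative"
    train_mean : assumed representative value (per-dimension mean of learning features)
    """
    x = np.asarray(feature, dtype=float).copy()
    cb = np.asarray(codebook, dtype=float)
    if missing is None:
        missing = np.zeros(x.shape, dtype=bool)
    missing = np.asarray(missing, dtype=bool)

    if strategy == "zero":
        # Fill the missing portion with zeros, then compare in all dimensions.
        x[missing] = 0.0
        diff = cb - x
    elif strategy == "ignore":
        # Compare only in the dimensions that are present.
        keep = ~missing
        diff = cb[:, keep] - x[keep]
    elif strategy == "representative":
        # Fill the missing portion with a representative value from learning data.
        if train_mean is None:
            raise ValueError("train_mean is required for the representative strategy")
        x[missing] = np.asarray(train_mean, dtype=float)[missing]
        diff = cb - x
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return int(np.argmin(np.sum(diff ** 2, axis=1)))


# Toy data: the first two dimensions stand for one modal, the last two for another.
codebook = np.array([[0, 0, 5, 5], [1, 1, 0, 0], [4, 4, 4, 4]], dtype=float)
feature = np.array([0.0, 0.0, np.nan, np.nan])   # second modal is missing
missing = np.array([False, False, True, True])
code_zero = quantize(feature, codebook, missing, "zero")
code_ignore = quantize(feature, codebook, missing, "ignore")
code_repr = quantize(feature, codebook, missing, "representative",
                     train_mean=[2.0, 2.0, 4.0, 4.0])
```

In this toy data, zero filling pulls the vector toward codewords whose missing-modal part is near zero, whereas ignoring the missing dimensions compares only what was actually observed, so the strategies can yield different codes for the same input.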

A search device according to a third invention includes: a target feature extraction unit that extracts feature data from each input single-modal or multimodal target signal; a target feature quantization unit that, for each target signal, obtains target quantized data by converting the feature data of the target signal into quantized data expressed in the codes, based on the feature data extracted by the target feature extraction unit and the conversion table created in the learning device according to claim 1; and a search unit that, for each target signal, searches the database created in the learning device, based on the target quantized data obtained by the target feature quantization unit, for the attribute associated with the accumulated quantized data corresponding to the target quantized data.

A search method according to a fourth invention is a search method in a search device including a target feature extraction unit, a target feature quantization unit, and a search unit. The target feature extraction unit extracts feature data from each input single-modal or multimodal target signal. For each target signal, the target feature quantization unit obtains target quantized data by converting the feature data of the target signal into quantized data expressed in the codes, based on the feature data extracted by the target feature extraction unit and the conversion table created by the learning method. For each target signal, the search unit searches the database created in the learning device, based on the target quantized data obtained by the target feature quantization unit, for the attribute associated with the accumulated quantized data corresponding to the target quantized data.

According to the third and fourth inventions, the target feature extraction unit extracts feature data from each input single-modal or multimodal target signal. For each target signal, the target feature quantization unit obtains target quantized data based on the extracted feature data of the target signal and the conversion table created in the learning device according to the first invention. For each target signal, the search unit searches the database created in the learning device, based on the obtained target quantized data, for the attribute associated with the accumulated quantized data corresponding to the target quantized data.

Thus, feature data is extracted from each input single-modal or multimodal target signal, and target quantized data is obtained for each target signal based on its extracted feature data and the created conversion table. The created database is then searched, based on the target quantized data, for the attribute associated with the accumulated quantized data corresponding to the target quantized data. This makes it possible to search for a multimodal signal even when some modals are missing.
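The registration and lookup described above can be pictured with a short sketch. It assumes, purely for illustration, that quantized data is a sequence of integer codes and that "corresponding" means an exact match of code sequences; the specification does not state the matching rule in this passage. The function names and the attribute strings are hypothetical.

```python
from collections import defaultdict

def build_database(accumulated):
    """Register (code_sequence, attribute) pairs, keyed by the code sequence."""
    db = defaultdict(list)
    for codes, attribute in accumulated:
        db[tuple(codes)].append(attribute)
    return dict(db)

def search(db, target_codes):
    """Return the attributes registered under the same code sequence (exact match assumed)."""
    return db.get(tuple(target_codes), [])


# Hypothetical accumulated quantized data with attributes.
db = build_database([
    ([3, 1, 4], "spin"),
    ([2, 7], "step"),
    ([3, 1, 4], "spin, second take"),
])
hits = search(db, [3, 1, 4])
misses = search(db, [9])
```

Because signals of every modal combination are mapped to the same set of codes, a query with a missing modal can still be compared with entries that were registered from complete signals.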

In the search device according to the third invention, when the feature data of a target signal lacks data corresponding to a modal included in the multimodal learning signal, the target feature quantization unit may obtain the target quantized data in any of the following ways: based on the conversion table and feature data in which the missing portion of the feature data of the target signal is filled with zeros; based on the feature data of the target signal and the conversion table, ignoring the data in the feature data stored in the conversion table that corresponds to the missing portion; or based on the conversion table and feature data in which the missing portion is filled with a representative value of the corresponding feature data of the learning signals.

In the search device according to the third invention, the learning signal may include two or more items of sensor data or media data, the accumulated signal may include one or more items of sensor data or media data, and the target signal may include one or more items of sensor data or media data.

The program of the present invention is a program for causing a computer to function as each unit of the above learning device or search device.

As described above, according to the learning device, method, and program of the present invention, feature data is extracted from each input multimodal learning signal, and a conversion table is created based on that feature data. Feature data is also extracted from each input single-modal or multimodal accumulated signal, and accumulated quantized data is obtained for each accumulated signal based on its extracted feature data and the created conversion table. The database creation unit then registers, for each accumulated signal, the accumulated quantized data in association with an attribute of the accumulated signal. This builds a database in which multimodal signals can be searched for even when some modals are missing.

Further, according to the search device, method, and program of the present invention, feature data is extracted from each input single-modal or multimodal target signal, and target quantized data is obtained for each target signal based on its extracted feature data and the created conversion table. The created database is then searched, based on the target quantized data, for the attribute associated with the accumulated quantized data corresponding to the target quantized data. This makes it possible to search for a multimodal signal even when some modals are missing.

A block diagram showing the functional configuration of the multimodal signal search device according to the first embodiment of the present invention.
A flowchart of the learning signal processing routine in the multimodal signal search device according to the first embodiment of the present invention.
A flowchart of the accumulated signal processing routine in the multimodal signal search device according to the first embodiment of the present invention.
A flowchart of the search processing routine in the multimodal signal search device according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the multimodal signal search device according to the second embodiment of the present invention.
A flowchart of the accumulated signal processing routine in the multimodal signal search device according to the second embodiment of the present invention.
A flowchart of the search processing routine in the multimodal signal search device according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the multimodal signal search device according to the third embodiment of the present invention.
A flowchart of the accumulated signal processing routine in the multimodal signal search device according to the third embodiment of the present invention.
A flowchart of the search processing routine in the multimodal signal search device according to the third embodiment of the present invention.
A diagram showing an example of data collection using wearable modals.
A diagram showing an example of data collection using externally installed modals.
A diagram showing an example of experimental results.
A diagram showing an example of experimental results.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Configuration of Multimodal Signal Search Device According to First Embodiment of the Present Invention>
First, the configuration of the multimodal signal search device according to the first embodiment of the present invention will be described. As shown in FIG. 1, the multimodal signal search device 100 according to the first embodiment can be implemented by a computer including a CPU, a RAM, and a ROM that stores various data and programs for executing a learning signal processing routine, an accumulated signal processing routine, and a search processing routine, which are described later. Functionally, as shown in FIG. 1, the multimodal signal search device 100 includes a learning signal acquisition unit 10, an accumulated signal acquisition unit 12, a target signal acquisition unit 14, a calculation unit 20, and an output unit 90.

The learning signal acquisition unit 10 acquires at least two multimodal signals to be used for learning (hereinafter, learning signals) and outputs them to the learning feature extraction unit 22. Here, a multimodal signal comprises, for example, audio signal data (or acoustic signal data) collected with a microphone, image signal data captured with a camera, and sensor signal data collected with wearable or environment-installed sensors. Specifically, the sensors include acceleration sensors, gyro sensors, geomagnetic sensors, illuminance sensors, pressure sensors, proximity sensors, temperature sensors, humidity sensors, heart rate monitors and electrocardiographs, barometric pressure sensors, GPS, and depth sensors, and the sensor signal data include acceleration, angular acceleration, geomagnetism, illuminance, pressure, proximity, temperature, humidity, heart rate and electrocardiogram, barometric pressure, GPS data, and depth maps. Text may also be used as a modal, in which case character data is used as the signal data. In the first embodiment, audio signal data and image signal data are used as the multimodal learning signal. Sensor signal data collected with measuring instruments is treated as sensor data, and character data is treated as one kind of media data. The sensor signal data is not limited to the above examples, and other sensor signal data may be used. Likewise, the media data is not limited to character data, and other media data may be used.

The accumulated signal acquisition unit 12 acquires at least one single-modal or multimodal signal to be accumulated in a database described later (hereinafter, accumulated signal) and outputs it to the accumulated feature extraction unit 30. In the first embodiment, an accumulated signal may contain modals of the same kinds and number as those acquired as the learning signal, or may contain only some of those modals, with the rest missing. In the first embodiment, attribute data corresponding to each accumulated signal is assumed to be attached to that signal. However, as long as the correspondence between attribute data and accumulated data can be made clear, the attribute data need not be attached and may instead be input separately to the attribute assigning unit 34. Here, attribute data is information useful for describing the data, such as descriptions or tags relating to the environment in which the data was acquired or the content of the data. For dance data, for example, this is information on the dance techniques and composition or on the type of performer. The accumulated signal itself may also be used as attribute data. For dance data, for example, the image signal and acoustic signal data obtained by filming and recording the dance may be used as attribute data.

For example, in the first embodiment the learning signal is multimodal, consisting of an audio signal data modal and an image signal data modal. The accumulated signal may therefore be acquired either as the same multimodal signal, consisting of the audio signal data modal and the image signal data modal, or as only one of the two modals.

The target signal acquisition unit 14 acquires at least one single-modal or multimodal signal serving as a query (hereinafter, target signal) and outputs it to the target feature extraction unit 40. As with the accumulated signal described above, a target signal may contain modals of the same kinds and number as those acquired as the learning signal, or may contain only some of those modals, with the rest missing.

The calculation unit 20 includes a learning feature extraction unit 22, a learning unit 24, a conversion table storage unit 26, an accumulated feature extraction unit 30, an accumulated feature quantization unit 32, an attribute assigning unit 34, a database creation unit 36, a database storage unit 38, a target feature extraction unit 40, a target feature quantization unit 42, and a search unit 44.

For each learning signal input from the learning signal acquisition unit 10, the learning feature extraction unit 22 extracts feature data from the learning signal and outputs it to the learning unit 24. Specifically, feature data is extracted for each modal included in the learning signal. Each item of data in the extracted feature data corresponds to the minimum unit required for processing in the learning unit 24 described later.

Feature data is extracted from audio signal data (or acoustic signal data) as follows. First, the signal data is resampled at a specified sampling frequency, for example 8000 Hz. As preprocessing at this stage, high-frequency emphasis may be applied, for example by pre-emphasis with a coefficient of 0.76. Next, segments of a fixed window width are cut out of the signal while the window is shifted by a fixed interval. Example parameters are a window width of 1024 samples and a shift width of 100 samples. A discrete Fourier transform is then applied to each cut-out segment to obtain a short-time frequency spectrum. Because the frequency spectrum of an audio signal often contains noise in the low-frequency region, only part of the obtained spectrum may be used, for example the 65th through 512th data points counted from the low-frequency end. Arranging the resulting short-time frequency spectra in time order yields a time-series vector. Although the first embodiment uses a discrete Fourier transform, a discrete cosine transform may be used instead as the method of conversion to a power spectrum. Other known features, such as the spectral envelope obtained from the audio signal data or the temporal variation of the fundamental frequency, may also be used as the feature data.
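The audio feature extraction steps can be sketched in NumPy as below. This is a minimal illustration, not the patented implementation. It assumes the input has already been resampled to 8000 Hz, that pre-emphasis takes the common form y[n] = x[n] - a·x[n-1], that segments are cut out with a plain rectangular window, and that the magnitude of the transform is used as the spectrum. The function name `audio_features` and its parameters are hypothetical.

```python
import numpy as np

def audio_features(signal, pre_emphasis=0.76, win=1024, shift=100, lo=64, hi=512):
    """Short-time magnitude spectra of a signal assumed to be sampled at 8000 Hz.

    Returns an array of shape (n_frames, hi - lo): one spectrum per frame,
    arranged in time order.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) < win:
        raise ValueError("signal is shorter than one window")

    # High-frequency emphasis (assumed form): y[n] = x[n] - a * x[n-1].
    y = np.append(x[0], x[1:] - pre_emphasis * x[:-1])

    # Cut out windows of `win` samples, shifted by `shift` samples each time.
    n_frames = 1 + (len(y) - win) // shift
    frames = np.stack([y[i * shift: i * shift + win] for i in range(n_frames)])

    # Discrete Fourier transform of each frame; rfft yields win // 2 + 1 bins.
    spectra = np.abs(np.fft.rfft(frames, axis=1))

    # Keep the 65th through 512th data points (1-based), dropping the low band.
    return spectra[:, lo:hi]


# One second of a 1000 Hz tone at 8000 Hz.
tone = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000)
feats = audio_features(tone)
```

With the example parameters, one second of audio gives 70 frames of 448 values each, and a 1000 Hz tone peaks at bin 128 of the transform, which is index 64 after the low band is dropped.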

For image signal data, the video is first resampled at a specified frame rate as preprocessing, for example a frame rate of 15. At this stage, the image size may be reduced to speed up feature extraction, for example to 48 pixels vertically by 64 pixels horizontally. Next, for each image in the video, the image area is divided into blocks at regular intervals, and the average of each of the R, G, and B values is calculated within each block, giving a feature vector for each image. As parameters, the image is divided into 12 parts vertically and 16 parts horizontally. For video, the image signal data continues over time, so the above feature extraction is applied to each image and the results are concatenated to obtain a time-series vector. Although the first embodiment uses the RGB data within block regions, local descriptors such as the well-known Scale-Invariant Feature Transform (SIFT) (Non-Patent Document 1: David G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, 1999.) may also be used as the image feature data.
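The block-averaging step can be sketched as follows, assuming each frame has already been resampled and reduced to 48 by 64 pixels and is held as a NumPy array of shape (height, width, 3). The function names are hypothetical, and the sketch requires the image size to be divisible by the block grid, which the example parameters satisfy.

```python
import numpy as np

def block_rgb_features(frame, rows=12, cols=16):
    """Mean R, G, B per block for one frame; returns a vector of length rows * cols * 3."""
    img = np.asarray(frame, dtype=float)
    h, w, c = img.shape
    if h % rows or w % cols:
        raise ValueError("image size must be divisible by the block grid")
    bh, bw = h // rows, w // cols
    # Split height into (rows, bh) and width into (cols, bw), then average each block.
    blocks = img.reshape(rows, bh, cols, bw, c)
    return blocks.mean(axis=(1, 3)).reshape(-1)

def video_features(frames):
    """Apply the per-frame extraction to every frame and stack the results in time order."""
    return np.stack([block_rgb_features(f) for f in frames])


# A 48 x 64 frame whose top-left 4 x 4 block has red value 8 and is black elsewhere.
frame = np.zeros((48, 64, 3))
frame[0:4, 0:4, 0] = 8.0
vec = block_rgb_features(frame)
seq = video_features([frame] * 5)
```

With a 12 by 16 grid on a 48 by 64 image, each block is 4 by 4 pixels and each frame yields a 576-dimensional vector.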

For 9-axis sensor data obtained by an acceleration sensor, a gyro sensor, and a geomagnetic sensor, or for heart rate and electrocardiographic data obtained using a heart rate monitor or electrocardiograph, feature data is extracted for each axis. First, as preprocessing, the data is resampled at a constant sampling frequency, for example 200 Hz. In addition, filtering such as smoothing may be applied as preprocessing to remove noise. Next, a segment of fixed window width is cut out of the signal on the time axis, and this is repeated from the beginning of the signal to its end while shifting the window by a fixed interval. As parameters, for example, the window width is 1 second and the shift width is 1 second. A short-time frequency spectrum is then obtained by applying the discrete cosine transform to each cut-out segment, and a time-series vector is obtained by arranging these short-time frequency spectra in time order. Although the first embodiment uses the discrete cosine transform, the discrete Fourier transform may be used instead to convert the signal into a power spectrum. The peak position information of the sensor signal data may also be used as feature data. Furthermore, although feature data is extracted for each axis in the above example, in the case of a 3-axis sensor the magnitude of the sensor value may be obtained by taking the square root of the sum of squares of the sensor data of the three axes, and the same processing as described above may be applied to that value.
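As a rough sketch of the windowing and transform steps for one sensor axis: the text does not say which DCT variant is used, so the unnormalized DCT-II below is an assumption, as are the helper names `dct_ii`, `windowed_dct`, and `magnitude`. Window and shift are given in samples (200 each for 1 second at 200 Hz).

```python
import math


def dct_ii(samples):
    """Unnormalized DCT-II of a list of floats (direct O(n^2) evaluation)."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def windowed_dct(signal, window, shift):
    """Slide a window over `signal` and return one DCT spectrum per window."""
    return [
        dct_ii(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, shift)
    ]


def magnitude(x, y, z):
    """Per-sample square root of the sum of squares of a 3-axis reading."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]
```

For a 3-axis sensor, `windowed_dct(magnitude(x, y, z), 200, 200)` would apply the same processing to the magnitude signal instead of to each axis.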

For depth data obtained from a depth sensor, a skeleton model of the human body is obtained from the depth data using a known method, and feature data is extracted by performing frequency analysis on the trajectory data of each joint. For character data in a sentence, in the case of English, the words appearing in the sentence are acquired as feature data based on delimiters such as spaces, periods, and commas. For illuminance data, pressure data, proximity data, temperature data, humidity data, atmospheric pressure data, and GPS data obtained from an illuminance sensor, a pressure sensor, a proximity sensor, a temperature sensor, a humidity sensor, an atmospheric pressure sensor, and GPS, arbitrary feature data is extracted using a known method.

なお、上述した特徴データの抽出方法は、特に限定されず、他の公知の手法を用いてもよい。また、特徴データのスケールは、個々のモーダルごとに異なるので、特徴データの抽出の後処理として、特徴データの中心化や正規化を行い、モーダル間の差異の緩和を行ってもよい。 Note that the feature data extraction method described above is not particularly limited, and other known methods may be used. In addition, since the scale of the feature data is different for each modal, the feature data may be centered or normalized as a post-processing of the feature data extraction to reduce the difference between the modals.
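One common way to center and normalize the feature data of a modal is per-dimension standardization. The sketch below assumes that choice; the text only says that centering or normalization may be applied, and the function name `standardize` is introduced here for illustration.

```python
import math


def standardize(vectors):
    """Center each dimension to zero mean and scale it to unit variance."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        # A constant dimension has zero variance; leave it unscaled.
        stds.append(math.sqrt(var) or 1.0)
    return [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vectors]
```

Applying this separately to each modal before the modals are combined brings their feature values onto a comparable scale.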

The learning unit 24 creates, based on the feature data of each learning signal input from the learning feature extraction unit 22, a conversion table that converts feature data into a common code (or number) for the combination of modals acquired as the learning signals, and stores the table in the conversion table storage unit 26. In the first embodiment, a conversion table is created for the combination of the audio-signal modal and the image-signal modal included in each learning signal. Although the first embodiment describes creating a conversion table for this two-modal combination, the invention is not limited to this; the number of modals in the combination is not limited. For example, two modals such as audio feature data and image feature data may be combined, or three modals such as audio feature data, image feature data, and acceleration feature data may be combined. A conversion table can therefore be created for any combination of modals. The combination of modals corresponding to the conversion table created by the learning unit 24 is defined in advance.

Specifically, the conversion table is created, for example, by obtaining representative vectors V_k with the LBG algorithm, which is based on the known K-means method, and assigning the number k to each representative vector. The conversion table thus serves to convert a feature vector close to the representative vector V_k into the number k. Here, k = 1, 2, ..., K, where K is the number of representative vectors, for example K = 100. Several variants of the K-means method exist; for example, the Elkan algorithm is used. Because the result of the K-means method depends on its initial values, initial values must be set; for example, random values are used. K-means iterates from the initial values until convergence, and the number of iterations is capped at, for example, 50. In the first embodiment, if the dimension of the audio modal feature data is D_x and the dimension of the image modal feature data is D_y, the dimension of the representative vector V_k is D_x + D_y. The order of the modals is defined in advance; in the first embodiment, the elements of the image modal follow those of the audio modal.
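The idea of learning representative vectors from concatenated feature vectors can be sketched with a plain Lloyd-style K-means loop. Note the substitutions: this is neither the LBG splitting procedure nor the Elkan acceleration named in the text, only the basic iteration they build on, with random initialization from the data and a cap of 50 iterations. The names `train_codebook` and `squared_l2` are hypothetical, and codes are numbered from 0 here rather than from 1.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, max_iter=50, seed=0):
    """Return k representative vectors; the list index serves as the code number."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(max_iter):
        # Assignment step: group each vector with its nearest center.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_l2(v, centers[j]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for j, group in enumerate(groups):
            if group:
                dim = len(group[0])
                new_centers.append(
                    [sum(v[d] for v in group) / len(group) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty group
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers
```

Each training vector is the audio feature vector (dimension D_x) followed by the image feature vector (dimension D_y), so every representative vector has D_x + D_y elements.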

The conversion table storage unit 26 stores the conversion table created by the learning unit 24. In the first embodiment, since K = 100, 100 pairs of a representative vector V_k and its number k are stored.

蓄積特徴抽出部３０は、蓄積信号取得部１２から入力された蓄積信号の各々について、当該蓄積信号から特徴データを抽出し、蓄積特徴量子化部３２へ出力する。なお、蓄積特徴抽出部３０における、蓄積信号からの特徴データの抽出方法は、上述した学習特徴抽出部２２と同様であるため、詳細な説明は省略する。 The accumulated feature extraction unit 30 extracts feature data from each accumulated signal input from the accumulated signal acquisition unit 12 and outputs the feature data to the accumulated feature quantization unit 32. Note that the method for extracting feature data from the accumulated signal in the accumulated feature extraction unit 30 is the same as that of the learning feature extraction unit 22 described above, and thus detailed description thereof is omitted.

For each accumulated signal, the accumulated feature quantization unit 32 converts the feature data of each minimum processing unit included in the accumulated signal into quantized data, based on the feature data of the accumulated signal extracted by the accumulated feature extraction unit 30 and the conversion table stored in the conversion table storage unit 26. It then generates accumulated quantized data from the converted quantized data and outputs it to the attribute assigning unit 34. Here, quantization uses the number k of the representative vector V_k closest to the feature data as the quantized value (quantized data). If the target accumulated signal lacks one or more of the modals in the combination covered by the conversion table, the feature data values corresponding to each missing modal are filled with zeros.

Specifically, starting from the head of the feature data of each modal included in the target accumulated signal, the minimum processing unit data of the modals are combined to form the processing feature data. The distance between the processing feature data and each representative vector V_k in the conversion table is then calculated, the representative vector V_k with the smallest distance is determined, and the number k corresponding to it is acquired from the conversion table as the quantized data. This processing is repeated for each minimum processing unit, from the head of the feature data of each modal in the accumulated signal for as far as processing is possible. Then, from the values of k acquired in this iteration, a histogram over the values of k is created as the accumulated quantized data of the accumulated signal. The histogram represents, for example, a probability distribution: each bin holds the count of the corresponding k divided by the total number of acquired values of k.
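The quantization and histogram steps above can be sketched as follows, assuming the processing feature data has already been assembled into one vector per minimum processing unit. The names `nearest_code` and `quantized_histogram` are illustrative, the L2 distance is used (compared via its square, which gives the same nearest vector), and codes are 0-based.

```python
def nearest_code(vector, codebook):
    """Index of the representative vector with the smallest squared L2 distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(vector, codebook[k])),
    )


def quantized_histogram(units, codebook):
    """Quantize every minimum processing unit and return relative code frequencies."""
    if not units:
        raise ValueError("at least one processing unit is required")
    counts = [0] * len(codebook)
    for unit in units:
        counts[nearest_code(unit, codebook)] += 1
    total = len(units)
    # Dividing by the total turns the counts into a probability distribution.
    return [c / total for c in counts]
```

The resulting list has one entry per code and sums to 1, which makes histograms of signals with different lengths comparable.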

As an example, consider converting one piece of processing feature data into quantized data when the conversion table was created from a combination of two modals. If the dimension of the feature data of one modal M_1 is D_1 and that of the other modal M_2 is D_2, the dimension of each representative vector V_k in the conversion table is D_1 + D_2 (it is defined in advance that modal M_2 follows modal M_1). When the modals included in the accumulated signal match those covered by the conversion table, the vector W_t at a time t in the accumulated feature data has dimension D_1 + D_2, which equals the dimension of the representative vectors V_k. The distance between W_t and each V_k can therefore be calculated directly, and the quantized data is obtained as the k that minimizes this distance. The distance is, for example, the L2 distance; any other known distance measure, such as the L1 distance or the Hamming distance, may also be used. When a modal is missing, for example the first modal M_1, the t-th vector W_t^(2) in the accumulated feature data has only D_2 dimensions, so that, compared with the representative vectors V_k, the first D_1 dimensions are missing. In the first embodiment, the missing part is filled with zeros. That is, the distance is calculated between each representative vector V_k and the vector formed by concatenating D_1 zeros with W_t^(2), and the quantized data is obtained as the k that minimizes this distance. Although this example describes the case where the first modal is missing, the case where the second modal is missing can be handled in the same way.
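The zero-filling of a missing modal can be illustrated with a short sketch. Representing a missing modal as `None`, and the names `fill_missing` and `quantize`, are assumptions of this example; the distance is the squared L2 distance and codes are 0-based.

```python
def fill_missing(parts, dims):
    """Concatenate per-modal vectors in the predefined order, zero-filling gaps.

    `parts` holds one feature vector per modal, or None where the modal is
    missing; `dims` holds the feature dimension of each modal (D_1, D_2, ...).
    """
    joined = []
    for part, dim in zip(parts, dims):
        joined.extend(part if part is not None else [0.0] * dim)
    return joined


def quantize(parts, dims, codebook):
    """Return the index k of the representative vector nearest to the filled vector."""
    vector = fill_missing(parts, dims)
    return min(
        range(len(codebook)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(vector, codebook[k])),
    )
```

For example, with D_1 = 2 and D_2 = 1, a unit whose first modal is missing and whose second modal is `[4.0]` is compared with the representative vectors as `[0.0, 0.0, 4.0]`.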

For each accumulated signal, the attribute assigning unit 34 associates the accumulated quantized data of the accumulated signal, input from the accumulated feature quantization unit 32, with the attribute data of the accumulated signal, and outputs the pair to the database creation unit 36.

データベース作成部３６は、属性付与部３４から入力された蓄積信号各々についての、蓄積量子化データと属性データとの組み合わせを、データベース記憶部３８に記憶されているデータベースに登録する。 The database creation unit 36 registers a combination of accumulated quantized data and attribute data for each accumulated signal input from the attribute assigning unit 34 in a database stored in the database storage unit 38.

データベース記憶部３８には、蓄積量子化データと属性データとの組み合わせの各々が記憶されているデータベースが記憶されている。 The database storage unit 38 stores a database in which each combination of accumulated quantized data and attribute data is stored.

目的特徴抽出部４０は、目的信号取得部１４から入力された目的信号の各々について、当該目的信号から特徴データを抽出し、目的特徴量子化部４２へ出力する。なお、目的特徴抽出部４０における、目的信号からの特徴データの抽出方法は、上述した学習特徴抽出部２２と同様であるため、詳細な説明は省略する。 The target feature extraction unit 40 extracts feature data from the target signal for each of the target signals input from the target signal acquisition unit 14 and outputs the feature data to the target feature quantization unit 42. Note that the method of extracting feature data from the target signal in the target feature extraction unit 40 is the same as that of the learning feature extraction unit 22 described above, and thus detailed description thereof is omitted.

For each target signal, the target feature quantization unit 42 converts the feature data of each minimum processing unit included in the target signal into quantized data, based on the feature data of the target signal extracted by the target feature extraction unit 40 and the conversion table stored in the conversion table storage unit 26. It then generates target quantized data from the converted quantized data and outputs it to the search unit 44. The feature data of the target signal is converted into target quantized data in the same way that the accumulated feature quantization unit 32 converts the feature data of an accumulated signal into accumulated quantized data, so a detailed description is omitted.

For each target signal, the search unit 44 searches for the attribute of the target signal based on the target quantized data acquired by the target feature quantization unit 42 and the database stored in the database storage unit 38, and outputs the search result from the output unit 90.

Specifically, for each target signal, the degree of coincidence is calculated between the histogram that is the target quantized data of the target signal and each histogram that is accumulated quantized data in the database. When the degree of coincidence exceeds a predetermined threshold, the two histograms are judged to match, the attribute data corresponding to the matching accumulated quantized data is acquired from the database, and that attribute data is output from the output unit 90 as the attribute data of the target signal. The degree of coincidence is evaluated by, for example, the L1 distance, but the evaluation is not limited to a specific distance measure; any known measure such as the L2 distance or the Hamming distance may be used. Alternatively, a discriminant function may be learned in advance from the accumulated quantized data and the assigned attributes using logistic regression, a support vector machine, or the like, and the learned discriminant function may be used to output the attribute data corresponding to the target quantized data. When several pieces of accumulated quantized data are judged to match, the attribute data corresponding to the one with the highest degree of coincidence (for example, the smallest distance) may be output from the output unit 90, or the corresponding attribute data sorted in ascending order of distance may be output. The attribute data sorted by the calculated degree of coincidence may also be output from the output unit 90 without using a threshold.
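A minimal sketch of the histogram search with the L1 distance follows. Because a smaller distance means a better match, this sketch interprets the threshold as a maximum allowed distance; that interpretation, the names `l1_distance` and `search`, and the attribute labels in the usage note are assumptions made for illustration.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def search(target_hist, database, max_distance=None):
    """Return (distance, attribute) pairs sorted from best to worst match.

    `database` is a list of (histogram, attribute) pairs. When `max_distance`
    is given, entries farther away than it are dropped; when it is None, all
    entries are ranked without thresholding.
    """
    scored = [(l1_distance(target_hist, hist), attr) for hist, attr in database]
    if max_distance is not None:
        scored = [item for item in scored if item[0] <= max_distance]
    return sorted(scored, key=lambda item: item[0])
```

Given a database with hypothetical attributes such as `[([1.0, 0.0], "walking"), ([0.0, 1.0], "running")]`, the first element of the returned list corresponds to the best match, and the full list gives the ranked output described above.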

<Operation of the Multimodal Signal Search Device According to the First Embodiment of the Present Invention>
Next, the operation of the multimodal signal search device 100 according to the first embodiment of the present invention will be described. When the learning signal acquisition unit 10 acquires the learning signals, the multimodal signal search device 100 executes the learning signal processing routine shown in FIG. 2. When the accumulated signal acquisition unit 12 receives accumulated signals, the device executes the accumulated signal processing routine shown in FIG. 3. When the target signal acquisition unit 14 receives target signals, the device executes the search processing routine shown in FIG. 4.

始めに、図２に示す学習信号処理について説明する。 First, the learning signal processing shown in FIG. 2 will be described.

まず、図２に示す学習信号処理ルーチンのステップＳ１００で、受け付けた学習信号の各々のうち、処理対象となる学習信号を決定する。 First, in step S100 of the learning signal processing routine shown in FIG. 2, a learning signal to be processed is determined from each of the received learning signals.

次に、ステップＳ１０２で、処理対象の学習信号について特徴データを抽出する。 Next, in step S102, feature data is extracted for the learning signal to be processed.

Next, in step S104, it is determined whether the processing of step S102 has been completed for all received learning signals. If so, the learning signal processing proceeds to step S106. If not, the processing returns to step S100, changes the learning signal to be processed, and repeats steps S102 to S104.

次に、ステップＳ１０６で、ステップＳ１０２において取得した受け付けた学習信号各々の特徴データに基づいて、変換テーブルを作成し、変換テーブル記憶部２６に記憶し、学習信号処理ルーチンを終了する。 Next, in step S106, a conversion table is created based on the feature data of each of the accepted learning signals acquired in step S102, stored in the conversion table storage unit 26, and the learning signal processing routine ends.

次に、図３に示す蓄積信号処理ルーチンについて説明する。 Next, the accumulated signal processing routine shown in FIG. 3 will be described.

まず、図３に示す蓄積信号処理ルーチンのステップＳ１２０で、変換テーブル記憶部２６に記憶されている変換テーブルを読み込む。 First, in step S120 of the accumulated signal processing routine shown in FIG. 3, the conversion table stored in the conversion table storage unit 26 is read.

次に、ステップＳ１２２で、受け付けた蓄積信号の各々のうち、処理対象となる蓄積信号を決定する。 Next, in step S122, an accumulated signal to be processed is determined from each of the accepted accumulated signals.

次に、ステップＳ１２４で、処理対象となる蓄積信号について、上述のステップＳ１０２と同様に特徴データを抽出する。 Next, in step S124, feature data is extracted from the accumulated signal to be processed in the same manner as in step S102 described above.

次に、ステップＳ１２６で、処理対象となる蓄積信号について、ステップＳ１２４において取得した特徴データから処理対象となる最小単位を決定する。 Next, in step S126, for the accumulated signal to be processed, the minimum unit to be processed is determined from the feature data acquired in step S124.

次に、ステップＳ１２８で、処理対象となる蓄積信号について、上述のステップＳ１０６において取得した変換テーブルの対象となる全てのモーダルを含むか否かを判定する。蓄積信号に、対象となる全てのモーダルを含む場合には、蓄積信号処理は、ステップＳ１３２へ移行する。一方、蓄積信号に、対象となる全てのモーダルを含まない場合（一部欠損している）場合には、蓄積信号処理は、ステップＳ１３０へ移行する。 Next, in step S128, it is determined whether or not the accumulated signal to be processed includes all the modals to be converted in the conversion table acquired in step S106 described above. When the accumulated signal includes all the modals to be processed, the accumulated signal processing moves to step S132. On the other hand, when the accumulated signal does not include all the target modals (partially missing), the accumulated signal processing proceeds to step S130.

次に、ステップＳ１３０で、ステップＳ１２４において取得した、処理対象となる最小単位の特徴データの欠損部分に対応している部分の要素にゼロを埋める。 Next, in step S130, zero is filled in the element of the part corresponding to the missing part of the feature data of the minimum unit to be processed acquired in step S124.

次に、ステップＳ１３２で、ステップＳ１２４において取得した、又はステップＳ１３０においてゼロを埋めた処理対象となる最小単位の特徴データと、ステップＳ１２０において取得した変換テーブルとに基づいて、処理対象となる最小単位に対応する量子化データであるｋの値を決定する。 Next, in step S132, the minimum unit to be processed based on the feature data of the minimum unit to be processed acquired in step S124 or padded with zeros in step S130 and the conversion table acquired in step S120. The value of k which is the quantized data corresponding to is determined.

Next, in step S134, it is determined whether the processing of steps S128 to S132 has been completed for all minimum units of the accumulated signal being processed. If so, the accumulated signal processing proceeds to step S136. If not, the processing returns to step S126, changes the minimum unit to be processed, and repeats steps S128 to S134.

次に、ステップＳ１３６で、処理対象となる蓄積信号について、ステップＳ１３２において取得した当該蓄積信号に含まれる最小単位毎のｋの値の各々に基づいて、蓄積量子化データを生成する。 Next, in step S136, for the accumulated signal to be processed, accumulated quantized data is generated based on each value of k for each minimum unit included in the accumulated signal acquired in step S132.

次に、ステップＳ１３８で、処理対象となる蓄積信号について、ステップＳ１３６において取得した蓄積量子化データと当該蓄積信号に付加されている属性データとを紐づける。 Next, in step S138, for the accumulated signal to be processed, the accumulated quantized data acquired in step S136 is associated with the attribute data added to the accumulated signal.

次に、ステップＳ１４０で、ステップＳ１３８において取得した蓄積量子化データと属性データとの組み合わせをデータベース記憶部３８に記憶されているデータベースに記憶する。 Next, in step S140, the combination of the accumulated quantized data and the attribute data acquired in step S138 is stored in the database stored in the database storage unit 38.

次に、ステップＳ１４２で、受け付けた全ての蓄積信号について、ステップＳ１２４〜ステップＳ１４０までの処理を終了したか否かを判定する。全ての蓄積信号について、ステップＳ１２４〜ステップＳ１４０までの処理を終了したと判定した場合には、蓄積信号処理ルーチンは終了する。一方、全ての蓄積信号について、ステップＳ１２４〜ステップＳ１４０までの処理を終了していないと判定した場合には、蓄積信号処理ルーチンは、ステップＳ１２２へ移行し、処理対象となる蓄積信号を変更し、ステップＳ１２４〜ステップＳ１４２までの処理を繰り返す。 Next, in step S142, it is determined whether or not the processing from step S124 to step S140 has been completed for all received accumulated signals. If it is determined that the processing from step S124 to step S140 has been completed for all the accumulated signals, the accumulated signal processing routine ends. On the other hand, if it is determined that the processing from step S124 to step S140 has not been completed for all the accumulated signals, the accumulated signal processing routine proceeds to step S122, changes the accumulated signal to be processed, The processing from step S124 to step S142 is repeated.
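The control flow of the accumulated signal processing routine can be summarized in a compact sketch. It assumes that feature extraction (step S124) has already produced, for each signal, a list of minimum processing units, each holding one vector per modal or `None` for a missing modal; the dictionary layout and the name `build_database` are hypothetical, and codes are 0-based.

```python
def build_database(signals, dims, codebook):
    """Sketch of steps S122 to S142: one (histogram, attribute) entry per signal.

    Each signal is a dict with "units" (a list of minimum processing units,
    each a list of per-modal vectors or None) and "attribute".
    """
    database = []
    for signal in signals:                        # S122 / S142: loop over signals
        counts = [0] * len(codebook)
        for unit in signal["units"]:              # S126 / S134: loop over units
            vector = []
            for part, dim in zip(unit, dims):     # S128 / S130: zero-fill gaps
                vector.extend(part if part is not None else [0.0] * dim)
            code = min(                           # S132: nearest representative
                range(len(codebook)),
                key=lambda k: sum(
                    (x - c) ** 2 for x, c in zip(vector, codebook[k])
                ),
            )
            counts[code] += 1
        total = sum(counts)
        if total == 0:
            continue                              # no units: nothing to register
        histogram = [c / total for c in counts]   # S136: accumulated quantized data
        database.append((histogram, signal["attribute"]))  # S138 / S140
    return database
```

The search processing routine follows the same loop structure up to the histogram, after which the histogram is compared with the database entries instead of being registered.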

次に、図４に示す探索処理ルーチンについて説明する。 Next, the search processing routine shown in FIG. 4 will be described.

まず、図４に示す探索処理ルーチンのステップＳ１５０で、変換テーブル記憶部２６に記憶されている変換テーブルを読み込む。 First, in step S150 of the search processing routine shown in FIG. 4, the conversion table stored in the conversion table storage unit 26 is read.

次に、ステップＳ１５２で、データベース記憶部３８に記憶されているデータベースを読み込む。 Next, in step S152, the database stored in the database storage unit 38 is read.

次に、ステップＳ１５４で、受け付けた目的信号の各々のうち、処理対象となる目的信号を決定する。 Next, in step S154, a target signal to be processed is determined from each of the received target signals.

次に、ステップＳ１５６で、処理対象となる目的信号について、上述のステップＳ１０２と同様に特徴データを抽出する。 Next, in step S156, feature data is extracted for the target signal to be processed in the same manner as in step S102 described above.

次に、ステップＳ１５８で、処理対象となる目的信号について、ステップＳ１５６において取得した特徴データから処理対象となる最小単位を決定する。 Next, in step S158, for the target signal to be processed, the minimum unit to be processed is determined from the feature data acquired in step S156.

次に、ステップＳ１６０で、処理対象となる目的信号について、上述のステップＳ１０６において取得した変換テーブルの対象となる全てのモーダルを含むか否かを判定する。目的信号に、対象となる全てのモーダルを含む場合には、探索処理は、ステップＳ１６４へ移行する。一方、目的信号に、対象となる全てのモーダルを含まない場合（一部欠損している）場合には、探索処理は、ステップＳ１６２へ移行する。 Next, in step S160, it is determined whether or not the target signal to be processed includes all modals that are targets of the conversion table acquired in step S106 described above. When the target signal includes all the modals to be processed, the search process proceeds to step S164. On the other hand, when the target signal does not include all of the target modals (partially missing), the search process proceeds to step S162.

次に、ステップＳ１６２で、ステップＳ１５６において取得した、処理対象となる最小単位の特徴データの欠損部分に対応している部分の要素にゼロを埋める。 Next, in step S162, zeros are filled in the elements of the part corresponding to the missing part of the feature data of the minimum unit to be processed acquired in step S156.

次に、ステップＳ１６４で、ステップＳ１５６において取得した、又はステップＳ１６２においてゼロを埋めた処理対象となる最小単位の特徴データと、上述のステップＳ１５０において取得した変換テーブルとに基づいて、処理対象となる最小単位に対応する量子化データであるｋの値を決定する。 Next, in step S164, the processing is performed based on the minimum unit feature data acquired in step S156 or zero-filled in step S162 and the conversion table acquired in step S150 described above. The value of k that is quantized data corresponding to the minimum unit is determined.

Next, in step S166, it is determined whether the processing of steps S160 to S164 has been completed for all minimum units of the target signal being processed. If so, the search processing proceeds to step S168. If not, the processing returns to step S158, changes the minimum unit to be processed, and repeats steps S160 to S166.

次に、ステップＳ１６８で、処理対象となる目的信号について、ステップＳ１６４において取得した当該目的信号に含まれる最小単位毎のｋの値の各々に基づいて、目的量子化データを生成する。 Next, in step S168, for the target signal to be processed, target quantized data is generated based on each value of k for each minimum unit included in the target signal acquired in step S164.

Next, in step S170, for the target signal being processed, the attribute data corresponding to the target signal is searched for based on the target quantized data acquired in step S168 and the database acquired in step S152.

次に、ステップＳ１７２で、処理対象となる目的信号について、ステップＳ１７０において取得した属性データを探索結果として出力部９０から出力する。 Next, in step S172, for the target signal to be processed, the attribute data acquired in step S170 is output from the output unit 90 as a search result.

Next, in step S174, it is determined whether the processing of steps S156 to S172 has been completed for all received target signals. If so, the search processing routine ends. If not, the routine returns to step S154, changes the target signal to be processed, and repeats steps S156 to S174.

As described above, the multimodal signal search device according to the first embodiment of the present invention extracts feature data from each input target signal, which may be single-modal or multimodal; acquires target quantized data for each target signal based on the extracted feature data and the created conversion table; and searches the created database for the attribute associated with the accumulated quantized data that corresponds to the target quantized data. This makes it possible to search for a multimodal signal even when some of the modals are missing.

In addition, because single-modal or multimodal signals are quantized with a conversion table that maps every combination of the target modalities to a common code, the search remains possible even when some modalities are missing, and the corresponding attribute data can be acquired.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

Next, a multimodal signal search apparatus according to the second embodiment will be described.

The second embodiment differs from the first embodiment in that, when part of an accumulated signal or a target signal is missing, the accumulated quantized data and the target quantized data are generated with the missing portion ignored. Components and operations identical to those of the multimodal signal search apparatus according to the first embodiment are given the same reference numerals, and their description is omitted.

<Configuration of the Multimodal Signal Search Apparatus According to the Second Embodiment>
Next, the configuration of the multimodal signal search apparatus according to the second embodiment of the present invention will be described. As shown in FIG. 5, the multimodal signal search apparatus 200 according to the second embodiment can be implemented by a computer that includes a CPU, a RAM, and a ROM storing various data and the programs for executing the learning signal processing routine, accumulated signal processing routine, and search processing routine described later. Functionally, as shown in FIG. 5, the multimodal signal search apparatus 200 includes a learning signal acquisition unit 10, an accumulated signal acquisition unit 12, a target signal acquisition unit 14, a calculation unit 220, and an output unit 90.

The calculation unit 220 includes a learning feature extraction unit 22, a learning unit 24, a conversion table storage unit 26, an accumulated feature extraction unit 30, an accumulated feature quantization unit 232, an attribute assignment unit 34, a database creation unit 36, a database storage unit 38, a target feature extraction unit 40, a target feature quantization unit 242, and a search unit 44.

For each accumulated signal, the accumulated feature quantization unit 232 converts the feature data of each minimum processing unit contained in the accumulated signal into quantized data, based on the feature data extracted by the accumulated feature extraction unit 30 and the conversion table stored in the conversion table storage unit 26. It then generates accumulated quantized data from the converted quantized data and outputs it to the attribute assignment unit 34. If the accumulated signal lacks one or more of the modalities in the combination covered by the conversion table, the portion of the feature data corresponding to each missing modality is ignored.

For example, consider converting one piece of feature data to be processed into quantized data when the conversion table was created from a combination of two modalities. Let D_1 be the dimensionality of the feature data of one modality, M_1, and D_2 that of the other modality, M_2. Each representative vector V_k of the conversion table then has D_1 + D_2 dimensions (it is defined in advance that the M_1 components come first, followed by the M_2 components). When the modalities contained in the accumulated signal match those covered by the conversion table, the t-th vector W_t of the accumulated feature data also has D_1 + D_2 dimensions, the same as V_k, so the quantized data is obtained by computing the distance between W_t and V_k directly and finding the k that minimizes it. When a modality is missing, for example the first modality M_1, the vector W_t^(2) at time t in the accumulated feature data has only D_2 dimensions; compared with V_k, the first D_1 dimensions are absent. The second embodiment handles the missing portion by ignoring it: the distance is computed between W_t^(2) and the (D_1 + 1)-th through (D_1 + D_2)-th dimensions of V_k, and the quantized data is the k that minimizes this distance. Although this example describes the case where the first modality is missing, the case where the second modality is missing can be handled in the same way.
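The partial-distance quantization described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `quantize_ignoring_missing`, the toy conversion table, and the use of Euclidean distance are all assumptions, since the text only speaks of "the distance" between vectors.

```python
import math

def quantize_ignoring_missing(w, codebook, offset):
    """Return the index k of the representative vector nearest to w.

    Only the dimensions offset .. offset + len(w) - 1 of each representative
    vector are compared, so the dimensions of a missing modality are ignored.
    Euclidean distance is an assumption made for this sketch.
    """
    best_k, best_d = -1, math.inf
    for k, v in enumerate(codebook):
        d = math.dist(w, v[offset:offset + len(w)])
        if d < best_d:
            best_k, best_d = k, d
    return best_k

# Hypothetical conversion table with D_1 = 2 (modality M_1) and D_2 = 2 (modality M_2).
D1 = 2
codebook = [
    [0.0, 0.0, 1.0, 1.0],
    [5.0, 5.0, 4.0, 4.0],
    [9.0, 9.0, 0.0, 8.0],
]

w_full = [5.1, 4.9, 4.2, 3.9]  # both modalities present
w_m2_only = [0.9, 1.2]         # M_1 missing: compare against dimensions D_1+1 .. D_1+D_2
w_m1_only = [9.1, 8.8]         # M_2 missing: compare against dimensions 1 .. D_1

k_full = quantize_ignoring_missing(w_full, codebook, 0)
k_m2_only = quantize_ignoring_missing(w_m2_only, codebook, D1)
k_m1_only = quantize_ignoring_missing(w_m1_only, codebook, 0)
print(k_full, k_m2_only, k_m1_only)
```

Because the same table of representative vectors is used in every case, a signal with a missing modality still maps to the same set of codes k as a complete signal.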

For each target signal, the target feature quantization unit 242 converts the feature data of each minimum processing unit contained in the target signal into quantized data, based on the feature data extracted by the target feature extraction unit 40 and the conversion table stored in the conversion table storage unit 26. It then generates target quantized data from the converted quantized data and outputs it to the search unit 44. The method of converting the feature data of a target signal into target quantized data is the same as the method by which the accumulated feature quantization unit 232 converts the feature data of an accumulated signal into accumulated quantized data, so a detailed description is omitted.

The rest of the configuration of the multimodal signal search apparatus according to the second embodiment is the same as that of the first embodiment, so its description is omitted.

<Operation of the Multimodal Signal Search Apparatus According to the Second Embodiment of the Present Invention>
Next, the operation of the multimodal signal search apparatus 200 according to the second embodiment of the present invention will be described. When the learning signal acquisition unit 10 acquires the learning signals, the multimodal signal search apparatus 200 executes the learning signal processing routine shown in FIG. 2. When the accumulated signal acquisition unit 12 receives accumulated signals, the apparatus executes the accumulated signal processing routine shown in FIG. 6. When the target signal acquisition unit 14 receives target signals, the apparatus executes the search processing routine shown in FIG. 7. The learning signal processing routine of the second embodiment is the same as that of the first embodiment, so its description is omitted.

First, the accumulated signal processing routine shown in FIG. 6 will be described.

In step S200 of the accumulated signal processing routine shown in FIG. 6, the value of k, which is the quantized data, is determined based on the feature data of the minimum unit to be processed acquired in step S124 and the conversion table acquired in step S120, with the portion of the feature data corresponding to any missing modality ignored.

In step S202, the value of k, which is the quantized data corresponding to the minimum unit to be processed, is determined based on the feature data of the minimum unit to be processed acquired in step S124 and the conversion table acquired in step S120.

The other steps of the accumulated signal processing routine according to the second embodiment are the same as those of the first embodiment, so their description is omitted.

Next, the search processing routine shown in FIG. 7 will be described.

In step S220 of the search processing routine shown in FIG. 7, the value of k, which is the quantized data, is determined based on the feature data of the minimum unit to be processed acquired in step S156 and the conversion table acquired in step S150, with the portion of the feature data corresponding to any missing modality ignored.

In step S222, the value of k, which is the quantized data corresponding to the minimum unit to be processed, is determined based on the feature data of the minimum unit to be processed acquired in step S156 and the conversion table acquired in step S150.

The other steps of the search processing routine according to the second embodiment are the same as those of the first embodiment, so their description is omitted.

As described above, the multimodal signal search apparatus according to the second embodiment of the present invention extracts feature data from each input target signal, which may be single-modal or multimodal; acquires target quantized data for each target signal based on the extracted feature data and the created conversion table, ignoring the missing portion when part of the target signal is missing; and searches the created database, based on that target quantized data, for the attributes associated with the accumulated quantized data corresponding to the target quantized data. As a result, a multimodal signal can be searched for even when some of the modalities are missing.

Next, a multimodal signal search apparatus according to the third embodiment will be described.

The third embodiment differs from the first and second embodiments in that, when part of an accumulated signal or a target signal is missing, the missing portion is filled with a representative value of the feature data of the corresponding learning signals before the accumulated quantized data and the target quantized data are generated. Components and operations identical to those of the multimodal signal search apparatuses according to the first and second embodiments are given the same reference numerals, and their description is omitted.

<Configuration of the Multimodal Signal Search Apparatus According to the Third Embodiment>
Next, the configuration of the multimodal signal search apparatus according to the third embodiment of the present invention will be described. As shown in FIG. 8, the multimodal signal search apparatus 300 according to the third embodiment can be implemented by a computer that includes a CPU, a RAM, and a ROM storing various data and the programs for executing the learning signal processing routine, accumulated signal processing routine, and search processing routine described later. Functionally, as shown in FIG. 8, the multimodal signal search apparatus 300 includes a learning signal acquisition unit 10, an accumulated signal acquisition unit 12, a target signal acquisition unit 14, a calculation unit 320, and an output unit 90.

The calculation unit 320 includes a learning feature extraction unit 22, a learning unit 24, a conversion table storage unit 26, an accumulated feature extraction unit 30, an accumulated feature quantization unit 332, an attribute assignment unit 34, a database creation unit 36, a database storage unit 38, a target feature extraction unit 40, a target feature quantization unit 342, and a search unit 44.

For each accumulated signal, the accumulated feature quantization unit 332 converts the feature data of each minimum processing unit contained in the accumulated signal into quantized data, based on the feature data extracted by the accumulated feature extraction unit 30 and the conversion table stored in the conversion table storage unit 26. It then generates accumulated quantized data from the converted quantized data and outputs it to the attribute assignment unit 34. If the accumulated signal lacks one or more of the modalities in the combination covered by the conversion table, the missing portion is filled with a representative value of the learning-signal feature data for each missing modality. A representative value is a basic statistic that summarizes an entire distribution with a single number, for example the mean, median, mode, minimum, or maximum.

For example, consider converting one piece of feature data to be processed into quantized data when the conversion table was created from a combination of two modalities. Let D_1 be the dimensionality of the feature data of one modality, M_1, and D_2 that of the other modality, M_2. Each representative vector V_k of the conversion table then has D_1 + D_2 dimensions (it is defined in advance that the M_1 components come first, followed by the M_2 components). When the modalities contained in the accumulated signal match those covered by the conversion table, the vector W_t at time t in the accumulated feature data also has D_1 + D_2 dimensions, the same as V_k, so the quantized data is obtained by computing the distance between W_t and V_k directly and finding the k that minimizes it. When a modality is missing, for example the first modality M_1, the t-th vector W_t^(2) of the accumulated feature data has only D_2 dimensions; compared with V_k, the first D_1 dimensions are absent. The third embodiment handles the missing portion by filling it with representative values of the feature data of the corresponding learning signals. A representative value is computed for each dimension of the learning-signal feature data, and concatenating the per-dimension values yields a D_1-dimensional vector. The distance between the representative vector V_k and the vector formed by concatenating this D_1-dimensional vector with W_t^(2) is then computed, and the quantized data is the k that minimizes this distance. Although this example describes the case where the first modality is missing, the case where the second modality is missing can be handled in the same way.
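The fill-then-quantize procedure of the third embodiment can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `representative_vector` and `quantize`, the toy learning features and conversion table, the choice of the mean as the representative value, and the use of Euclidean distance are all illustrative and are not specified by the text.

```python
import math
from statistics import mean

def representative_vector(learning_features, start, stop):
    """Per-dimension representative value (here, the mean) over the learning feature data."""
    return [mean(f[d] for f in learning_features) for d in range(start, stop)]

def quantize(w, codebook):
    """Index k of the representative vector nearest to w (Euclidean distance assumed)."""
    return min(range(len(codebook)), key=lambda k: math.dist(w, codebook[k]))

# Hypothetical learning feature data with D_1 = 2 (modality M_1) and D_2 = 2 (modality M_2).
D1 = 2
learning = [
    [0.0, 0.0, 1.0, 1.0],
    [4.0, 6.0, 4.0, 4.0],
    [8.0, 9.0, 0.0, 8.0],
]
# Hypothetical conversion table (representative vectors V_k).
codebook = [
    [0.0, 0.0, 1.0, 1.0],
    [5.0, 5.0, 4.0, 4.0],
    [9.0, 9.0, 0.0, 8.0],
]

fill = representative_vector(learning, 0, D1)  # D_1-dimensional fill vector for the missing M_1
w_m2_only = [4.2, 3.9]                         # observed M_2 features only
w_filled = fill + w_m2_only                    # concatenate: M_1 part first, then M_2 part
k = quantize(w_filled, codebook)
print(fill, k)
```

Swapping `mean` for `statistics.median` or `statistics.mode` would give the other representative values mentioned in the text without changing the rest of the sketch.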

For each target signal, the target feature quantization unit 342 converts the feature data of each minimum processing unit contained in the target signal into quantized data, based on the feature data extracted by the target feature extraction unit 40 and the conversion table stored in the conversion table storage unit 26. It then generates target quantized data from the converted quantized data and outputs it to the search unit 44. The method of converting the feature data of a target signal into target quantized data is the same as the method by which the accumulated feature quantization unit 332 converts the feature data of an accumulated signal into accumulated quantized data, so a detailed description is omitted.

The rest of the configuration of the multimodal signal search apparatus according to the third embodiment is the same as that of the first embodiment, so its description is omitted.

<Operation of the Multimodal Signal Search Apparatus According to the Third Embodiment of the Present Invention>
Next, the operation of the multimodal signal search apparatus 300 according to the third embodiment of the present invention will be described. When the learning signal acquisition unit 10 acquires the learning signals, the multimodal signal search apparatus 300 executes the learning signal processing routine shown in FIG. 2. When the accumulated signal acquisition unit 12 receives accumulated signals, the apparatus executes the accumulated signal processing routine shown in FIG. 9. When the target signal acquisition unit 14 receives target signals, the apparatus executes the search processing routine shown in FIG. 10. The learning signal processing routine of the third embodiment is the same as that of the first embodiment, so its description is omitted.

First, the accumulated signal processing routine shown in FIG. 9 will be described.

In step S300 of the accumulated signal processing routine shown in FIG. 9, the elements corresponding to the missing portion of the feature data of the minimum unit to be processed, acquired in step S124, are filled with the representative values of the feature data of the corresponding learning signals.

The other steps of the accumulated signal processing routine according to the third embodiment are the same as those of the first embodiment, so their description is omitted.

Next, the search processing routine shown in FIG. 10 will be described.

In step S320 of the search processing routine shown in FIG. 10, the elements corresponding to the missing portion of the feature data of the minimum unit to be processed, acquired in step S156, are filled with the representative values of the feature data of the corresponding learning signals.

The other steps of the search processing routine according to the third embodiment are the same as those of the first embodiment, so their description is omitted.

As described above, the multimodal signal search apparatus according to the third embodiment of the present invention extracts feature data from each input target signal, which may be single-modal or multimodal; acquires target quantized data for each target signal based on the extracted feature data and the created conversion table, filling any missing portion of the target signal with the representative values of the feature data of the corresponding learning signals; and searches the created database, based on that target quantized data, for the attributes associated with the accumulated quantized data corresponding to the target quantized data. As a result, a multimodal signal can be searched for even when some of the modalities are missing.

For example, although the first, second, and third embodiments describe the case where the learning processing, accumulation processing, and search processing are performed in a single apparatus, the invention is not limited to this. For example, the learning processing and accumulation processing may be performed in a learning device, and the search processing may be performed in a search device that is separate from the learning device.

Also, in the first, second, and third embodiments, when part of an accumulated signal or a target signal is missing, each embodiment is described as performing one of three processes: filling the missing portion with zeros, ignoring it, or filling it with the representative values of the learning-signal feature data. The invention is not limited to this. For example, when part of an accumulated signal or a target signal is missing, whichever of these three processes is appropriate may be selected freely each time processing is performed.
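Selecting among the three ways of handling a missing modality could look like the following sketch. The function `quantize_with_policy`, the policy names, the toy conversion table, and the use of Euclidean distance are assumptions made for illustration and do not appear in the original text.

```python
import math

def quantize_with_policy(w_present, offset, codebook, policy, representative=None):
    """Quantize a feature vector that covers only dims offset .. offset + len(w_present) - 1.

    policy: "zero" (fill missing dims with 0), "ignore" (compare present dims only),
    or "representative" (fill missing dims from a full-length representative vector).
    """
    lo, hi = offset, offset + len(w_present)
    if policy == "ignore":
        query = list(w_present)
        candidates = [v[lo:hi] for v in codebook]
    else:
        if policy == "zero":
            query = [0.0] * len(codebook[0])
        elif policy == "representative":
            if representative is None:
                raise ValueError("representative vector required")
            query = list(representative)
        else:
            raise ValueError(f"unknown policy: {policy}")
        query[lo:hi] = w_present  # overwrite the observed dimensions
        candidates = codebook
    return min(range(len(codebook)), key=lambda k: math.dist(query, candidates[k]))

# Hypothetical conversion table with D_1 = 2 and D_2 = 2; modality M_1 is missing.
codebook = [
    [0.0, 0.0, 1.0, 1.0],
    [5.0, 5.0, 4.0, 4.0],
    [9.0, 9.0, 0.0, 8.0],
]
w_m2_only = [4.2, 3.9]
rep = [4.0, 5.0, 2.0, 4.0]  # assumed per-dimension means of the learning feature data

k_zero = quantize_with_policy(w_m2_only, 2, codebook, "zero")
k_ignore = quantize_with_policy(w_m2_only, 2, codebook, "ignore")
k_rep = quantize_with_policy(w_m2_only, 2, codebook, "representative", rep)
print(k_zero, k_ignore, k_rep)
```

In this toy data the zero-fill policy picks a different code from the other two because the learning features are not centered on zero, which illustrates why the choice of policy can matter.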

Furthermore, although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium or provided via a network.

<Experimental Example>
An example in which the processing of the multimodal signal search apparatuses according to the first, second, and third embodiments was applied to real data is described below. In the experiment, data was collected for hip-hop dance. Two types of data collection were performed: collection using wearable modalities, as shown in FIG. 11, and collection using externally installed modalities, as shown in FIG. 12. FIG. 11 shows the combinations of acquired data types and body parts. As FIGS. 11 and 12 show, data was collected in a variety of modalities, and the technique of the invention can be applied to any combination of them, but the verification targeted image signal data, audio signal data, and acceleration sensor data. The image signal data was collected with a fixed camera placed in front of the performer. The audio signal data was the dance music played through a speaker and recorded by the microphone attached to the camera. The acceleration sensor data was collected with wearable devices attached to the top of the performer's head, the torso, both wrists, and both ankles.

There were eight modalities in total: one type of image signal data, one type of audio signal data, and six types of acceleration sensor data. One of these signals, or a combination of two, was treated as single-modal or multimodal data, respectively, and used as the input for the accumulated signals, learning signals, and target signals. The dance consisted of four parts, and the part to which a signal belongs was used as the attribute data. In other words, in this experiment the accumulated quantized data obtained from single-modal or multimodal signals was registered in the database together with the attribute data identifying the part, and when a target signal was given, the output obtained from the database was the attribute of the target signal, that is, the part to which it belongs. The data was divided as follows. The data of one of the four parts was used as the learning signal to create the conversion table. The remaining three parts were used as the accumulated signals and target signals, and the degree of matching was evaluated on them. The experiment therefore evaluates whether the correct one of three parts can be identified, a task on which random guessing succeeds with probability 1/3. During data collection the performer performed the same dance twice; the first performance was used as the accumulated signal and the second as the target signal.

In the experiment, comparative verification was carried out by varying the combination of modalities used for the target signal, accumulated signal, and learning signal. The multimodal data in this experiment is assumed to be a combination of two modalities. For notational convenience, the first modality is called modality M_1 and the second modality M_2. Eight modalities were used. Because it also matters which modality's acquisition times are used as the reference when the two are joined, the pairs are ordered, and the number of two-modality combinations is 8P2 = 8 × 7 = 56.
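The count of 56 ordered pairs can be checked with a short sketch. The modality labels below are hypothetical names chosen for this illustration; the text specifies only one image signal, one audio signal, and six acceleration sensors (head top, torso, both wrists, both ankles).

```python
from itertools import permutations

# Hypothetical labels for the eight modalities used in the experiment.
modalities = [
    "image", "audio",
    "acc_head", "acc_torso",
    "acc_wrist_left", "acc_wrist_right",
    "acc_ankle_left", "acc_ankle_right",
]

# Ordered pairs (M_1, M_2): the order matters because M_1 is taken here to be the
# modality whose acquisition times serve as the reference when joining the two.
ordered_pairs = list(permutations(modalities, 2))
print(len(ordered_pairs))  # 8 * 7 = 56
```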

The nine patterns that were compared are as follows.

(1) A pattern in which the target signal, the accumulated signal, and the learning signal all use only the data of modal M₁.
(2) A pattern in which the target signal, the accumulated signal, and the learning signal all use only the data of modal M₂.
(3) A pattern in which the target signal, the accumulated signal, and the learning signal all use data combining modal M₁ and modal M₂.
(4) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, and the target signal and the accumulated signal use only modal M₁. A modal is therefore missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the values of the missing portion are ignored.
(5) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, and the target signal and the accumulated signal use only modal M₂. A modal is therefore missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the values of the missing portion are ignored.
(6) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, and the target signal and the accumulated signal use only modal M₁. A modal is therefore missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the missing portion is filled with zeros. Because centering is applied as post-processing after feature extraction in this experiment, filling with zeros corresponds to filling with the mean, which is one representative value of the feature data.
(7) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, and the target signal and the accumulated signal use only modal M₂. A modal is therefore missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the missing portion is filled with zeros. Because centering is applied as post-processing after feature extraction in this experiment, filling with zeros corresponds to filling with the mean, which is one representative value of the feature data.
(8) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, the target signal uses only modal M₁, and the accumulated signal uses only modal M₂. In this case a modal is missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the values of the missing portion are ignored.
(9) A pattern in which the learning signal uses data combining modal M₁ and modal M₂, the target signal uses only modal M₂, and the accumulated signal uses only modal M₁. In this case a modal is missing when the target feature quantization unit and the accumulated feature quantization unit perform quantization; the values of the missing portion are ignored.
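The two ways of handling a missing modal in the patterns above, ignoring the missing values and filling them with zeros, can be illustrated with a small sketch. It assumes that quantization assigns a feature vector to the nearest codeword under squared Euclidean distance, which is a common choice for a K-means code table but is an assumption here. The function name, the `None` marker for missing dimensions, and the two-dimensional codebook are hypothetical.

```python
def nearest_code(feature, codebook, mode="ignore"):
    """Return the index of the nearest codeword.

    feature  : list of floats; None marks dimensions of a missing modal.
    codebook : list of codewords learned on the combined M1 + M2 features.
    mode     : "ignore" compares only the dimensions that are present;
               "zero" fills missing dimensions with 0.0 (the mean, if the
               features were centered) and compares all dimensions.
    """
    if mode == "ignore":
        filled = feature
        dims = [i for i, x in enumerate(feature) if x is not None]
    elif mode == "zero":
        filled = [0.0 if x is None else x for x in feature]
        dims = list(range(len(filled)))
    else:
        raise ValueError(f"unknown mode: {mode}")

    def sqdist(codeword):
        return sum((filled[i] - codeword[i]) ** 2 for i in dims)

    return min(range(len(codebook)), key=lambda k: sqdist(codebook[k]))


# Hypothetical codebook: dimension 0 belongs to M1, dimension 1 to M2.
codebook = [[1.0, 3.0], [-0.5, 0.0]]

# A signal that has only modal M1; the M2 dimension is missing.
feature_m1_only = [0.4, None]

code_ignore = nearest_code(feature_m1_only, codebook, mode="ignore")
code_zero = nearest_code(feature_m1_only, codebook, mode="zero")
print(code_ignore, code_zero)  # 0 1
```

In this toy example the two strategies select different codes for the same input: ignoring the missing dimension compares only the M₁ component, whereas zero filling also penalizes codewords whose M₂ component is far from the mean.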

Of the nine patterns above, (8) and (9) are cross-modal search patterns, in which the modals of the target signal and the accumulated signal are entirely different. The results are summarized in FIG. 13. Each value is the accuracy averaged over the 56 modal combinations. After averaging, the pattern pairs (1) and (2), (4) and (5), (6) and (7), and (8) and (9) each evaluate the same set of combinations. However, the K-means algorithm used to create the code table depends on its initial values, which are chosen randomly, so the values within a pair do not necessarily coincide.

The task in this experiment is to identify the correct part among three parts, so selecting an answer at random gives an accuracy of 1/3 ≈ 0.33. The results for patterns (1) to (9) in FIG. 13 all exceed 0.33, which demonstrates the effectiveness of the invention.

FIG. 14 shows the results when the wearable device on the top of the head is used as modal M₁ and the wearable device on the torso is used as modal M₂. In FIG. 14, the upper row shows the results for patterns (1) to (3) from left to right, the middle row those for patterns (4) to (6), and the lower row those for patterns (7) to (9). In each confusion matrix, the vertical axis represents the part to which the target signal belongs and the horizontal axis represents the part to which the accumulated signal belongs. In this example the accuracy is 100% for every pattern, which demonstrates the effectiveness of the invention. The numerical values in FIG. 14 represent distances, so a smaller value indicates a higher degree of matching.
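Reading an accuracy off a distance matrix of this kind can be sketched as follows, assuming that the predicted part for each target signal is the accumulated part with the smallest distance in its row. The matrices here are made-up numbers for illustration and are not the values of FIG. 14; the function name is hypothetical.

```python
def accuracy_from_distances(dist):
    """Fraction of target parts whose nearest accumulated part is the same part.

    dist[i][j] is the distance between target part i (row) and
    accumulated part j (column); a smaller value means a better match.
    """
    correct = 0
    for i, row in enumerate(dist):
        predicted = min(range(len(row)), key=row.__getitem__)
        if predicted == i:
            correct += 1
    return correct / len(dist)


# Hypothetical distance matrix for three parts: the diagonal is smallest.
perfect = [
    [0.1, 0.8, 0.9],
    [0.7, 0.2, 0.6],
    [0.9, 0.5, 0.3],
]

# Hypothetical matrix in which the first target part is matched wrongly.
one_error = [
    [0.5, 0.1, 0.9],
    [0.7, 0.2, 0.6],
    [0.9, 0.5, 0.3],
]

acc_perfect = accuracy_from_distances(perfect)
acc_one_error = accuracy_from_distances(one_error)
chance_level = 1 / len(perfect)
print(acc_perfect, acc_one_error, chance_level)
```

An accuracy of 100% corresponds to every row having its minimum on the diagonal, and the chance level for three parts is 1/3.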

DESCRIPTION OF SYMBOLS
10 Learning signal acquisition unit
12 Accumulated signal acquisition unit
14 Target signal acquisition unit
20 Arithmetic unit
22 Learning feature extraction unit
24 Learning unit
26 Conversion table storage unit
30 Accumulated feature extraction unit
32 Accumulated feature quantization unit
34 Attribute assignment unit
36 Database creation unit
38 Database storage unit
40 Target feature extraction unit
42 Target feature quantization unit
44 Search unit
90 Output unit
100 Multimodal signal search device
200 Multimodal signal search device
220 Arithmetic unit
232 Accumulated feature quantization unit
242 Target feature quantization unit
300 Multimodal signal search device
320 Arithmetic unit
332 Accumulated feature quantization unit
342 Target feature quantization unit

Claims

A learning device including:
a learning feature extraction unit that extracts feature data for each of input multimodal learning signals;
a learning unit that creates a conversion table from the feature data to a common code based on the feature data of each of the learning signals extracted by the learning feature extraction unit;
an accumulated feature extraction unit that extracts feature data for each of input single-modal or multimodal accumulated signals;
an accumulated feature quantization unit that acquires, for each of the accumulated signals, accumulated quantized data obtained by converting the feature data of the accumulated signal into quantized data using the code, based on the feature data of the accumulated signal extracted by the accumulated feature extraction unit and the conversion table created by the learning unit; and
a database creation unit that creates a database by registering, for each of the accumulated signals, the accumulated quantized data of the accumulated signal acquired by the accumulated feature quantization unit in the database in association with an attribute of the accumulated signal.

A search device including:
a target feature extraction unit that extracts feature data for each of input single-modal or multimodal target signals;
a target feature quantization unit that acquires, for each of the target signals, target quantized data obtained by converting the feature data of the target signal into quantized data using the code, based on the feature data of the target signal extracted by the target feature extraction unit and the conversion table created in the learning device according to claim 1; and
a search unit that searches, for each of the target signals, the database created in the learning device for the attribute associated with the accumulated quantized data corresponding to the target quantized data, based on the target quantized data of the target signal acquired by the target feature quantization unit.

The learning device according to claim 1, wherein, when data corresponding to a modal included in the multimodal learning signal is missing from the feature data of the accumulated signal, the accumulated feature quantization unit acquires the accumulated quantized data in one of the following ways: based on the conversion table and feature data in which the missing portion of the feature data of the accumulated signal is filled with zeros; based on the feature data of the accumulated signal and the conversion table, ignoring the data that corresponds to the missing portion among the feature data stored in the conversion table; or based on the conversion table and feature data in which the missing portion of the feature data of the accumulated signal is filled with a representative value of the corresponding feature data of the learning signal.

The search device according to claim 2, wherein, when data corresponding to a modal included in the multimodal learning signal is missing from the feature data of the target signal, the target feature quantization unit acquires the target quantized data in one of the following ways: based on the conversion table and feature data in which the missing portion of the feature data of the target signal is filled with zeros; based on the feature data of the target signal and the conversion table, ignoring the data that corresponds to the missing portion among the feature data stored in the conversion table; or based on the conversion table and feature data in which the missing portion of the feature data of the target signal is filled with a representative value of the corresponding feature data of the learning signal.

The search device according to claim 2 or 4, wherein:
the learning signal includes two or more sensor data or media data,
the accumulated signal includes one or more sensor data or media data, and
the target signal includes one or more sensor data or media data.

A learning method in a learning device including a learning feature extraction unit, a learning unit, an accumulated feature extraction unit, an accumulated feature quantization unit, and a database creation unit, wherein:
the learning feature extraction unit extracts feature data for each of input multimodal learning signals;
the learning unit creates a conversion table from the feature data to a common code based on the feature data of each of the learning signals extracted by the learning feature extraction unit;
the accumulated feature extraction unit extracts feature data for each of input single-modal or multimodal accumulated signals;
the accumulated feature quantization unit acquires, for each of the accumulated signals, accumulated quantized data obtained by converting the feature data of the accumulated signal into quantized data using the code, based on the feature data of the accumulated signal extracted by the accumulated feature extraction unit and the conversion table created by the learning unit; and
the database creation unit creates a database by registering, for each of the accumulated signals, the accumulated quantized data of the accumulated signal acquired by the accumulated feature quantization unit in the database in association with an attribute of the accumulated signal.

A search method in a search device including a target feature extraction unit, a target feature quantization unit, and a search unit, wherein:
the target feature extraction unit extracts feature data for each of input single-modal or multimodal target signals;
the target feature quantization unit acquires, for each of the target signals, target quantized data obtained by converting the feature data of the target signal into quantized data using the code, based on the feature data of the target signal extracted by the target feature extraction unit and the conversion table created in the learning method according to claim 6; and
the search unit searches, for each of the target signals, the database created in the learning device for the attribute associated with the accumulated quantized data corresponding to the target quantized data, based on the target quantized data of the target signal acquired by the target feature quantization unit.

A program for causing a computer to function as each part of the learning device according to claim 1 or 3, or the search device according to claim 2, claim 4, or claim 5.