JP2017027526A

JP2017027526A - Feature value generation method, feature value generation device, and feature value generation program

Info

Publication number: JP2017027526A
Application number: JP2015148079A
Authority: JP
Inventors: 豪入江; Takeshi Irie; 潤島村; Jun Shimamura; 明小島; Akira Kojima
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-07-27
Filing date: 2015-07-27
Publication date: 2017-02-02
Anticipated expiration: 2035-07-27
Also published as: JP6397378B2

Abstract

PROBLEM TO BE SOLVED: To provide a feature value generation method, a feature value generation device, and a feature value generation program that make it possible to discover semantically related signal content with high accuracy.
SOLUTION: The method includes: generating a correspondence between each word and a word vector on the basis of the co-occurrence of words included in a document, so that words that are more likely to co-occur have word vectors closer to each other; extracting the document feature value of the document using the learned word vectors; extracting the initial feature value of a signal content; finding the relative geometric relationship between the signal content and the document using at least one of the word vector and the document feature value; and generating, on the basis of the initial feature value, the relative geometric relationship, and a relation indicator, a feature value conversion function that converts the initial feature value into a low-dimensional feature value, and outputting the generated function.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a feature value generation method, a feature value generation device, and a feature value generation program that, when generating feature values for searching and recognizing signal content such as images, audio, and video, use documents so that the generated feature values enable the discovery of more semantically related signal content.

Advances in communication environments, computers, and distributed processing infrastructure have made the number of media contents (images, video, audio, etc.) circulating on networks enormous. For example, one search engine is said to index trillions of web pages. One site reportedly receives 350 million image uploads per day, and another reportedly has 64 hours of new video published every minute.

Hereinafter, for convenience, content consisting of signal media such as images, video, and audio is referred to as signal content.

While this enormous volume of signal content is a rich source of information for users, it also makes it increasingly difficult to quickly reach the signal content one wants to view. Accordingly, demand is growing for media analysis technology that efficiently finds the signal content a user wants to browse or watch.

In signal content analysis, the procedure for discovering semantically related signal content plays an important role. For example, when classifying signal content, items belonging to similar semantic concepts are normally placed in the same category. When searching with a signal content item as the query, the basic requirement is to retrieve signal content that is semantically related to that query. In content recommendation, signal content whose meaning is similar to what the user has viewed or is viewing is discovered and recommended, and in content summarization, the content must be condensed so that it contains no semantic duplication.

A typical procedure for discovering semantically related signal content is as follows. First, each signal content item is represented by some feature value. Next, similarity is computed by measuring how close the feature values are, and items with higher similarity are regarded as more semantically related. As a simple example, if the signal content is an image or video, similarity can be measured using the color histogram of the image (video frame) as the feature value. For an audio signal, similarity can be measured using features derived from the frequency characteristics of the waveform (such as Mel-Frequency Cepstral Coefficients). Needless to say, if there are 1,000 content items, similarity must be computed against each of the 1,000 items, and those with high similarity are then picked out as similar content.
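The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the invention: the function names, the 4-bins-per-channel quantization, the use of cosine similarity, and the toy pixel data are all assumptions made for the example.

```python
import math

def color_histogram(pixels, bins=4):
    """Quantize (r, g, b) pixels (0-255 per channel) into a normalized
    joint histogram with bins**3 cells. Hypothetical helper."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(query, database):
    """Compare the query with every stored feature and sort by similarity."""
    scored = [(name, cosine_similarity(query, feat))
              for name, feat in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data (assumed): each "image" is ten identical pixels.
query = color_histogram([(220, 20, 30)] * 10)            # reddish image
database = {
    "reddish_object": color_histogram([(210, 30, 40)] * 10),
    "greenish_object": color_histogram([(60, 200, 40)] * 10),
}
ranking = rank_by_similarity(query, database)
```

With this toy data the reddish object ranks first purely because its colors fall into the same histogram cell as the query, regardless of what the objects actually are.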

However, measuring the similarity of signal content involves the following two important problems.
(1) It requires an enormous amount of computation time.
(2) It is difficult to discover semantically similar signal content.

These two problems are described concretely below.

(1) The feature value (vector) of signal content is usually high-dimensional, so computing similarity takes an enormous amount of time. Taking images and video as an example, even a simple feature such as a color histogram is generally a real-valued vector of several hundred to several thousand dimensions. Recently used neural-network feature representations have several thousand dimensions, and feature representations based on sparse representations or Fisher kernels can reach several hundred thousand to several million dimensions. Furthermore, because similarity must be computed for every pair of content items, whatever similarity measure is used, the computation time is proportional to both the feature dimension D and the number of content items N.
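As a rough back-of-the-envelope illustration of this proportionality, the sketch below counts the multiply-add operations needed to compare one query against every stored vector. The collection size and the two dimensionalities (4096 and 64) are hypothetical values chosen for the example, not figures from the source.

```python
def brute_force_cost(num_items: int, dim: int) -> int:
    """Multiply-add count for comparing one query with every stored vector:
    one pass over `dim` coordinates for each of `num_items` items."""
    return num_items * dim

N = 1_000_000                              # assumed collection size
cost_high = brute_force_cost(N, 4096)      # assumed high-dimensional feature
cost_low = brute_force_cost(N, 64)         # assumed low-dimensional feature
speedup = cost_high / cost_low
```

Under these assumptions, reducing the dimension from 4096 to 64 cuts the per-query work by the same factor of 64, which is why a low-dimensional feature directly addresses problem (1).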

(2) As typified by image features such as the color histogram mentioned above, most feature representations of signal content such as images, video, and audio describe physical properties. Naturally, closeness of physical features does not necessarily mean that the signal content is semantically related. For example, the content that should be judged similar to an image of a "(red) apple" is not a "red bag" but other fruit such as a "green apple" or an "orange", yet this cannot be evaluated correctly by, at least, the closeness of color histograms.

In view of the above, what is needed in practice is a technique that can generate feature values for signal content that (1) are fast to compare and (2) enable the discovery of semantically related content.

Several inventions relating to such techniques have been made and disclosed.

For example, Non-Patent Document 1 discloses a method that, given a large set of images and accompanying semantic labels (that is, labels indicating which semantic category each image belongs to), uses a Convolutional Neural Network (CNN) to learn the relationship between images and semantic labels and to convert images into feature values.

Patent Document 1 discloses a feature value generation technique for compressing the feature values of two types of co-occurring signal content. It reduces the dimensionality of the original feature values even when the data include signal content for which one or both feature values are missing, so that co-occurrence is not observed.

Patent Document 1: JP 2010-282277 A

Non-Patent Document 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Proc. Advances in Neural Information Processing Systems (NIPS), pp. 1097-1105, 2012.

The technique disclosed in Non-Patent Document 1 makes it possible to obtain semantic image features by learning the relationship between images and semantic labels from image-label pairs. However, it presupposes an enormous number of images (approximately 1.2 million in the example disclosed in Non-Patent Document 1) with a known semantic label for each. In most cases semantic labels must be assigned by hand, and labeling such a large number of images takes so much effort that the technique is often impractical. In addition, the technique applies only to images and cannot be applied to other signal content such as audio.

The technique disclosed in Patent Document 1 presupposes pairs of two types of co-occurring content and uses their correlation to generate new low-dimensional feature values. Unlike the technique of Non-Patent Document 1, it has the advantage that semantic labels need not be assigned directly to images.

In the technique of Patent Document 1, feature values are learned from statistics (correlation) between the feature values of signal content and the feature values of documents. However, the simple correlation between the physical features of signal content and the semantic features of documents is often not significant, and as a result it was often difficult to obtain feature values that allow semantically related signal content to be discovered. In particular, this technique requires many co-occurring pairs of signal content and documents, and sufficient accuracy was difficult to achieve when not enough pairs could be collected.

The present invention has been made in view of the above circumstances. Its object is to provide a feature value generation method, a feature value generation device, and a feature value generation program that can generate low-dimensional feature values of signal content and that capture the characteristic geometric properties of document feature values, which correspond to the semantic content of documents. By learning the relevance between signal content and documents on the basis of those geometric properties, they make it possible to discover semantically related signal content accurately even when few pairs of signal content and documents are available.

To achieve the above object, the feature value generation method of the present invention is a method for learning a feature value conversion function that generates low-dimensional feature values of signal content, given one or more signal contents of a desired type, one or more documents, and relation indicators that indicate whether a relationship exists for one or more pairs of the signal content and the documents. The method comprises: a word vector learning step of generating, based on the co-occurrence of words contained in the documents, a correspondence between each word and a word vector such that words that co-occur more readily have word vectors closer to each other; a document feature extraction step of extracting document feature values of the documents using the learned word vectors; an initial feature value extraction step of extracting initial feature values of the signal content; and a feature value conversion function generation step of obtaining a relative geometric relationship between the signal content and the documents using at least one of the word vectors and the document feature values, and of generating and outputting, based on the initial feature values, the relative geometric relationship, and the relation indicators, a feature value conversion function that converts the initial feature values into low-dimensional feature values.
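The word vector learning step starts from word co-occurrence within the documents. As a minimal sketch of that raw statistic only, the Python below counts how often two words appear near each other. The sliding-window definition of co-occurrence, the window size, the whitespace tokenizer, and the toy documents are assumptions; how the counts are turned into word vectors is not shown here.

```python
from collections import Counter

def cooccurrence_counts(documents, window=2):
    """Count unordered word pairs that occur within `window` tokens of each
    other in the same document. Hypothetical helper for illustration."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts

# Toy corpus (assumed).
docs = ["red apple fruit", "green apple fruit", "red bag leather"]
counts = cooccurrence_counts(docs)
```

In this toy corpus "apple" and "fruit" co-occur twice while "apple" and "bag" never co-occur, so a word vector learner driven by these statistics would be expected to place "apple" closer to "fruit" than to "bag".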

When signal content of the desired type is given, a feature value generation method for generating the low-dimensional feature value of the signal content may comprise: an initial feature value extraction step of extracting the initial feature value of the signal content; and a dimension reduction step of reducing the dimensionality of the initial feature value based on the feature value conversion function generated by the above feature value generation method, and outputting the result.

The feature value generation device of the present invention is a device for learning a feature value conversion function that generates low-dimensional feature values of signal content, given one or more signal contents of a desired type, one or more documents, and relation indicators that indicate whether a relationship exists for one or more pairs of the signal content and the documents. The device comprises: a word vector learning unit that generates, based on the co-occurrence of words contained in the documents, a correspondence between each word and a word vector such that words that co-occur more readily have word vectors closer to each other; a document feature extraction unit that extracts document feature values of the documents using the learned word vectors; an initial feature value extraction unit that extracts initial feature values of the signal content; and a feature value conversion function generation unit that obtains a relative geometric relationship between the signal content and the documents using at least one of the word vectors and the document feature values, and that generates and outputs, based on the initial feature values, the relative geometric relationship, and the relation indicators, a feature value conversion function that converts the initial feature values into low-dimensional feature values.

When signal content of the desired type is given, a feature value generation device for generating the low-dimensional feature value of the signal content may comprise: an initial feature value extraction unit that extracts the initial feature value of the signal content; and a dimension reduction unit that reduces the dimensionality of the initial feature value based on the feature value conversion function generated by the above feature value generation device, and outputs the result.

The feature value generation program of the present invention is a program for causing a computer to execute each step of the above feature value generation method.

According to the present invention having the above features, capturing the geometric properties of document feature values captures the semantic content of documents more accurately, and the relationship between signal content and documents is learned using those geometric properties. This makes it possible to provide a feature value generation method, a feature value generation device, and a feature value generation program that can generate low-dimensional feature values of signal content with which more semantically related signal content can be discovered accurately, even when few pairs of signal content and documents are available.

Furthermore, because the feature values of signal content generated by the present invention have far fewer dimensions than the original initial feature values, similar content can be found quickly. The invention can therefore support uses that demand more real-time performance. As a concrete use case, a user walking through town can photograph a place or product of interest with a mobile terminal and search for similar places or products.

With these two advantages, the present invention can generate feature values of signal content that (1) are fast to compare and (2) enable the discovery of semantically similar content.

FIG. 1 is a block diagram showing the configuration of a feature value generation device according to a first embodiment.
FIG. 2 is a schematic diagram for explaining a method of generating a feature value conversion function according to the embodiment.
FIG. 3 is a flowchart showing the flow of a feature value conversion function learning process executed by the feature value generation device according to the embodiment.
FIG. 4 is a flowchart showing the flow of a feature value conversion process executed by the feature value generation device according to the embodiment.
FIG. 5 is a block diagram showing the configuration of a feature value generation device according to a second embodiment.

Embodiments of the present invention are described below with reference to the drawings.

<< Overall configuration >>

FIG. 1 is a functional block diagram showing an example of the configuration of a feature value generation device 1 according to an embodiment of the present invention.

The feature value generation device 1 shown in the figure includes a word vector learning unit 11, a document feature value extraction unit 12, an initial feature value extraction unit 13, a feature value conversion function generation unit 14, and a dimension reduction unit 15. The feature value generation device 1 also includes a storage unit 3 as a storage device.

The feature value generation device 1 is connected to a content database 2 via communication means, and the two exchange information. The feature value generation device 1 executes a feature value conversion function learning process, in which it generates a feature value conversion function 31 based on the signal content 21, documents 22, and relation indicators 23 registered in the content database 2 and stores the function in the storage unit 3. The feature value generation device 1 also executes a feature value conversion process, in which it uses the learned feature value conversion function 31 to generate a new low-dimensional feature value 5 from the initial feature value of signal content 4.

In the figure, solid arrows indicate data communication and its direction during the feature value conversion function learning process, and dashed arrows indicate data communication and its direction during the feature value conversion process.

The content database 2 may be inside or outside the feature value generation device 1. Any known communication means may be used. In this embodiment, the content database 2 is assumed to be external to the feature value generation device 1, and the device is connected to the content database 2 via communication means that communicate over the Internet using TCP/IP. In this embodiment, the content database 2 is assumed to include a so-called RDBMS (Relational Database Management System), but it is not limited to this and may be a database using another management system.

As shown in the figure, the content database 2 stores signal content 21, documents 22, and relation indicators 23. The signal content 21 consists of a set of content files. For example, when the type of the signal content 21 is image, it consists of a set of image files; when the type is sound, a set of sound files; and when the type is video, a set of video files. The documents 22, in turn, consist of a set of document files.

In the content database 2, each content file of the signal content 21 and each document file of the documents 22 is associated with an identifier that uniquely identifies it (for example, an ID given by a file-specific serial number), and any file can be referenced by specifying its identifier.

The relation indicators 23 indicate "relationships" between the content files of the signal content 21 and the document files of the documents 22. Each one describes, as a pair of identifiers, a content file and a document file that are judged to be related. The "relationship" here is preferably relatedness in terms of the semantic content of the signal content 21 or the document 22. Any method may be used to generate the identifier pairs: they may be generated manually, mechanically, or by both.

For example, consider the case where the signal content 21 consists of images. When identifier pairs are generated manually, suppose a person judges that the content of the third image file, which has the identifier "image 3", and the content of the eighth document file, which has the identifier "document 8", are related to each other. Then, on the user's instruction, information indicating the identifier pair {"image 3", "document 8"} is stored in the content database 2 as a relation indicator 23.

An example of generating identifier pairs mechanically is collecting image files from web pages. Most simply, an image file A and a document file B on the same web page are regarded as related, and information indicating the identifier pair {"image A", "document B"} is stored in the content database 2 as a relation indicator 23. Alternatively, image files and document files that are close to each other in the web page layout may be regarded as related, and information indicating the pairs of their identifiers may be stored in the content database 2 as relation indicators 23. Generating identifier pairs mechanically has the advantage that relation indicators 23 are obtained without manual effort.
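The simplest mechanical rule above (everything on the same web page is related) can be sketched as follows. The page structure, the identifier strings, and the function name are hypothetical; a real crawler and the layout-proximity variant are outside the scope of this sketch.

```python
def build_relation_indicators(pages):
    """Pair every image identifier with every document identifier that
    appears on the same page. Returns a set of (image_id, document_id)."""
    relations = set()
    for page in pages:
        for image_id in page["images"]:
            for document_id in page["documents"]:
                relations.add((image_id, document_id))
    return relations

# Assumed input: one dict per crawled web page.
pages = [
    {"images": ["image_A"], "documents": ["document_B"]},
    {"images": ["image_3", "image_4"], "documents": ["document_8"]},
    {"images": ["image_5"], "documents": []},   # page with no text
]
relations = build_relation_indicators(pages)
```

An image that appears on a page without any document, such as `image_5` here, simply receives no relation indicator, which is consistent with indicators being required for only part of the signal content.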

The relation indicators 23 need not be assigned to all of the signal content 21 stored in the content database 2; it suffices that they are assigned to part of it. In particular, a feature of the technique of this embodiment is that it can capture the relative geometric relationship in the document feature space, which correlates strongly with the meaning of documents. It is therefore sufficient if relation indicators 23 are given for, for example, about half of all the signal content 21.

Metadata may also be included, for example data describing the content of the signal content 21 (its title, summary text, keywords, etc.) and data on the format of the signal content 21 (its data size, the size of a thumbnail representing it, etc.). This embodiment describes the case where metadata is not used.

The geometric relationship mentioned above is now described concretely. The geometric relationship is expressed, for example, as a matrix W. The (i, j)-th element of W represents the closeness between the i-th document feature value and the j-th document feature value; the larger the element, the closer they are. As an example, as shown in FIG. 2, the document feature values 8A, which have large values with one another as elements of W, and likewise the document feature values 8B, can be considered to form groups (implicitly). No grouping process is actually performed, however. Document feature values that have mutually large elements in W are merely described here, for convenience, as a kind of soft group. Because the geometric relationship W expresses semantic relatedness, document feature values belonging to the same group are also highly related semantically.
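One common way to build such a closeness matrix is a Gaussian kernel on pairwise distances. The sketch below uses that choice purely as an assumption; this passage does not state how W is actually computed, and the function name, the bandwidth `sigma`, and the toy two-dimensional document features are hypothetical.

```python
import math

def geometric_relation(doc_features, sigma=1.0):
    """Return W where W[i][j] grows as document features i and j get closer.
    Assumed form: Gaussian kernel exp(-||x_i - x_j||^2 / (2 * sigma^2)),
    with a zero diagonal."""
    n = len(doc_features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2
                         for a, b in zip(doc_features[i], doc_features[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

# Toy document features (assumed): two tight clusters far apart.
doc_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
W = geometric_relation(doc_features)
```

In this toy case W[0][1] and W[2][3] are close to 1 while W[0][2] is essentially 0, so documents 0 and 1 form one soft group and documents 2 and 3 another, without any explicit clustering step.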

Under this premise, the initial feature values 9A that are connected through the relation indicator 23 to document feature values 8A of the same group can also be regarded as belonging to one group. Likewise, the initial feature values 9B connected through the relation indicator 23 to document feature values 8B of the same group can be regarded as belonging to one group. In this way, the initial feature values 9A and the initial feature values 9B are each considered to form groups that reflect the semantic relatedness of the document feature values.

That is, a feature value conversion function is obtained that reduces dimensionality so that initial feature values 9A (and likewise initial feature values 9B) belonging to the same group are placed in the same dimension, while initial feature values belonging to different groups are placed in different dimensions. Details of the feature value conversion function are described later.

Each unit of the feature value generation device 1 and the content database 2 are implemented by a computer, server, or the like that includes an arithmetic processing device, a storage device, and so on, and the processing of each unit of the feature value generation device 1 is executed by programs. In this embodiment the programs are stored in a storage device of the feature value generation device 1, but they may instead be recorded on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or be provided over a network. Furthermore, no component needs to be implemented by a single computer or server; components may be distributed over a plurality of computers connected by a network.

The search device 6 is not essential to this embodiment; it is provided when the feature value generation device 1 of this embodiment is used to search for semantically related signal contents. The search device 6 is connected to the feature value generation device 1 and to the content database 2 so that they can communicate with one another.

<< Processing units >>

Next, each processing unit of the feature value generation device 1 in this embodiment is described.

The word vector learning unit 11 acquires the documents 22 from the content database 2, learns and generates, for each of the one or more words contained in the documents, a word vector 32 that characterizes the word, outputs the word vectors 32 to the document feature value extraction unit 12, and stores them in the storage unit 3.

The document feature value extraction unit 12 acquires the word vectors 32 from the word vector learning unit 11 and the documents 22 from the content database 2, and extracts the document feature value of each document file of the documents 22 based on the word vectors 32. It then outputs the extracted document feature values to the feature value conversion function generation unit 14.

In the feature value conversion function learning process, the initial feature value extraction unit 13 acquires the signal contents 21 stored in the content database 2, analyzes them, extracts their initial feature values, and outputs these to the feature value conversion function generation unit 14. In the feature value conversion process, the initial feature value extraction unit 13 acquires the signal content 4 input from outside via the communication means described above, analyzes it, extracts its initial feature value, and outputs this to the dimension reduction unit 15.

The feature value conversion function generation unit 14 acquires the document feature values from the document feature value extraction unit 12, the initial feature values from the initial feature value extraction unit 13, and the relation indicator 23 from the content database 2, and reads the word vectors 32 from the storage unit 3. Using the document feature values, the initial feature values, the relation indicator 23, and the word vectors 32, it learns and generates the feature value conversion function 31, which converts an initial feature value into a new low-dimensional feature value 5, and stores the function in the storage unit 3.

The dimension reduction unit 15 acquires the initial feature value from the initial feature value extraction unit 13, reads the feature value conversion function 31 from the storage unit 3, and generates the low-dimensional feature value 5 by converting the initial feature value with the feature value conversion function 31. The dimension reduction unit 15 then stores the generated low-dimensional feature value 5 in the content database 2. As a result, each low-dimensional feature value 5 is stored in the content database 2 in association with the corresponding signal content 21.

When the feature value generation device 1 is connected to the search device 6, the dimension reduction unit 15 also outputs the generated low-dimensional feature value 5 to the search device 6. When a signal content 4 is input at the user's instruction, the search device 6 acquires the low-dimensional feature value 5 corresponding to the signal content 4 from the feature value generation device 1. The search device 6 then uses the content database 2 to search for signal contents 21 whose low-dimensional feature values are similar to that of the signal content 4, acquires the search result 7 from the content database 2, and outputs it to the source from which the signal content 4 was input.
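
A minimal sketch of such a similarity search over stored low-dimensional feature values is given below. Ranking by Euclidean distance, the function name `search`, and the toy database are assumptions for illustration; the description does not specify the similarity measure or the index structure used by the search device 6.

```python
import numpy as np

def search(query_feature: np.ndarray, db_features: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k stored features closest to the query.

    Euclidean distance is an assumed similarity measure.
    """
    dists = np.linalg.norm(db_features - query_feature, axis=1)
    return np.argsort(dists)[:top_k]

# Hypothetical low-dimensional feature values of four stored signal contents.
db = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]])
result = search(np.array([0.0, 0.1]), db, top_k=2)  # indices of the two nearest contents
```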

<< Process overview >>

Next, the processes of the feature value generation device 1 in this embodiment are outlined. The feature value generation device 1 executes a feature value conversion function learning process, which learns and generates the feature value conversion function, and a feature value conversion process, which converts an initial feature value into a low-dimensional feature value.

First, the feature value conversion function learning process is described. FIG. 3 is a flowchart showing its flow. The feature value conversion function learning process must be performed at least once before the feature value conversion process is executed on, for example, a signal content 4 to be searched, and it is executed when a user instruction is input.

In step S201, the word vector learning unit 11 acquires the documents 22 from the content database 2, learns and generates the word vectors 32 based on the co-occurrence of words in the documents 22 so that words that co-occur more readily have word vectors closer to each other, and stores the word vectors 32 in the storage unit 3.

In the next step S202, the document feature value extraction unit 12 acquires the documents 22 from the content database 2 and the word vectors 32 from the storage unit 3. Based on the acquired word vectors 32, it then performs feature extraction on each document file included in the documents 22, or on a part of a document file, to extract the document feature values, and outputs them to the feature value conversion function generation unit 14.

In the next step S203, the initial feature value extraction unit 13 acquires the signal contents 21 from the content database 2, performs feature extraction on each content file included in the acquired signal contents 21, or on a part of a content file, to extract the initial feature values, and outputs them to the feature value conversion function generation unit 14.

In the next step S204, the feature value conversion function generation unit 14 obtains the relative geometric relation of the signal contents 21 and the documents 22 using at least one of the word vectors acquired from the storage unit 3 and the document feature values acquired from the document feature value extraction unit 12. It then generates the feature value conversion function 31 based on the initial feature values acquired from the initial feature value extraction unit 13, the obtained relative geometric relation, and the relation indicator 23 acquired from the content database 2, and stores the feature value conversion function 31 in the storage unit 3.

Through the above process, the feature value generation device 1 generates the feature value conversion function 31 from the signal contents 21, the documents 22, and the relation indicator 23 stored in the content database 2. Details of the processing executed in each step are described later.

Next, the feature value conversion process is described. FIG. 4 is a flowchart showing its flow. The feature value conversion process reduces the dimensionality of the initial feature value of the signal content 4 using the feature value conversion function 31 stored in the storage unit 3. It is executed when a user instruction is input after the user has designated the signal content 4.

In step S301, the initial feature value extraction unit 13 acquires the signal content 4 designated by the user via the communication means described above, extracts the initial feature value of the acquired signal content 4, and outputs it to the dimension reduction unit 15. In this embodiment the signal content 4 designated by the user is acquired in this way, but the acquisition method is not limited to this; when the signal content 4 is stored in the storage unit 3, it may be acquired from the storage unit 3.

In the next step S302, the dimension reduction unit 15 reduces the dimensionality of the initial feature value acquired from the initial feature value extraction unit 13, based on the feature value conversion function 31 acquired from the storage unit 3, converting it into the low-dimensional feature value 5, and outputs the result.

Through the above process, the feature value generation device 1 obtains the low-dimensional feature value 5 of the signal content 4 designated by the user.

<< Details of each process >>

In the following, the details of each of the processes described above are explained using an example from this embodiment.

[Learning word vectors]

A word vector 32 is a finite-dimensional vector uniquely determined for a word. Let Dy be the dimensionality of the word vectors 32; Dy can be set to any positive integer, for example Dy = 100 or Dy = 1000. Let vec(v) denote the mapping that converts a word v in the vocabulary Voc into its word vector 32. The mapping vec(v) can be implemented, for example, as a hash or lookup table with the word v as the key and the word vector 32 as the value.

Various known methods can be used to generate the word vector 32 of an arbitrary word v in the vocabulary Voc. This embodiment uses a method based on singular value decomposition (SVD), in which the word vectors 32 are generated by the following procedure.

(1) The words appearing in the documents are listed without duplication. At this point, words that can be regarded as essentially the same word may be merged into one word in advance, for example by absorbing spelling variants or by applying stemming, which matches inflected words by their stems. Words that appear extremely often or extremely rarely may also be removed. When words are removed based on frequency, one may, for example, compute the term frequency-inverse document frequency (TF-IDF) score of each word and remove words with low scores. The set of words that remains after this processing is the vocabulary Voc.

(2) For every word in the vocabulary Voc, its frequency of occurrence in each document is counted, and a matrix of size (number of documents) × (number of words) is generated with these frequencies as its elements. The elements need not be raw frequencies; TF-IDF scores, for example, may be used instead.

(3) SVD is applied to the generated matrix. Keeping Dy singular values, SVD factorizes the matrix (approximately, when Dy is smaller than the rank of the matrix) into a (number of documents) × Dy matrix, a Dy × Dy matrix, and a Dy × (number of words) matrix. Each column of the last of these, the Dy × (number of words) matrix, is the word vector 32 of one word in the vocabulary Voc.
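
Steps (2) and (3) can be sketched with NumPy as follows. The function name `svd_word_vectors` and the toy count matrix are hypothetical; the sketch assumes that the document-by-word count matrix of step (2) has already been built and simply keeps the first Dy right singular vectors.

```python
import numpy as np

def svd_word_vectors(count_matrix: np.ndarray, dim: int) -> np.ndarray:
    """Return a (dim x n_words) matrix whose columns are word vectors.

    count_matrix has shape (n_documents, n_words), as in step (2).
    """
    # Thin SVD: count_matrix = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(count_matrix, full_matrices=False)
    return Vt[:dim, :]  # keep the Dy leading right singular vectors

# Toy counts: words 0 and 1 co-occur in documents 0-1, words 2 and 3 in documents 2-3.
counts = np.array([[2, 1, 0, 0],
                   [1, 2, 0, 0],
                   [0, 0, 3, 1],
                   [0, 0, 1, 3]], dtype=float)
word_vecs = svd_word_vectors(counts, dim=2)  # column j is the vector of word j
```

In this toy example the vectors of the co-occurring words 0 and 1 coincide, while the vector of word 2 is orthogonal to them.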

As described above, this embodiment generates the word vectors 32 with an SVD-based method, but many other generation methods exist and may be used instead. In that case, it is preferable to use a method such as Skip-gram (SG) or Continuous Bag-of-Words (CBOW), described in Reference 1 below.

[Reference 1] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and Their Compositionality," In Proc. Advances in Neural Information Processing Systems (NIPS), 2013.

Methods such as SG and CBOW learn the word vectors 32 from the context in which words appear in the documents. In the SG method, each word in a document is given as input, and a three-layer neural-network function is trained to predict the words that appear before and after it. If the second layer of this model has Dy nodes, the connection weights between the first and second layers are ultimately expressed as a Dy × (number of words) matrix, and this matrix is used as the word vectors 32.

The CBOW method, unlike the SG method, is given the words appearing before and after a target word as input and trains a three-layer neural-network function to predict the target word. As in the SG method, the connection weights between the first and second layers are used as the word vectors 32.
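
The difference between the two setups can be illustrated by how training examples are drawn from a token sequence. The sketch below only prepares the (input, prediction target) examples and does not train a network; the function names and the `window` parameter are hypothetical and are not taken from Reference 1.

```python
def skipgram_pairs(tokens, window=2):
    """SG: each word is the input, each neighbouring word is a prediction target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: the neighbouring words are the input, the middle word is the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

tokens = ["the", "cat", "sat", "down"]
sg = skipgram_pairs(tokens, window=1)   # e.g. ("cat", "the"), ("cat", "sat")
cb = cbow_examples(tokens, window=1)    # e.g. (["the", "sat"], "cat")
```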

According to Reference 1, word vectors 32 generated in this way capture semantic content far better than word vectors 32 obtained by SVD or similar methods, and they exhibit characteristic geometric properties; for example, vec("Berlin") − vec("Germany") + vec("France") ≈ vec("Paris"). The word vector generation method of Reference 1 therefore yields word vectors 32 suited to the purpose of this embodiment, in that the semantic relations between words correspond in a characteristic way to the geometric relations between their word vectors.

Through the above procedure, the feature value generation device 1 learns and generates the word vectors 32 from the input documents 22.

[Extraction of document feature values]

Next, the method of extracting the document feature value of an arbitrary document based on the generated word vectors 32 is described in detail.

This procedure is very simple: all words appearing in the document that are included in the vocabulary Voc are listed, and a statistic of the word vectors 32 corresponding to the listed words is used as the document feature value. Because the statistic should be valid regardless of the number of words in the document, it is desirable that it be expressible as a point in the word vector space, with the same dimensionality as the word vectors 32 themselves, whether the document contains many words or only one. For example, any statistic that can be computed per dimension may be used, such as the per-dimension sum, maximum, median, or mode. Most simply, however, it is preferable to compute the mean vector of all the word vectors 32 and use it as the document feature value.
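
The mean-vector variant can be sketched as follows. The function name `document_feature` and the two-dimensional toy vectors are hypothetical; `vec` plays the role of the lookup table vec(v), and counting every occurrence of a word (rather than each distinct word once) is an assumption of this sketch.

```python
import numpy as np

def document_feature(words, vec):
    """Mean of the word vectors of all in-vocabulary words of a document."""
    vectors = [vec[w] for w in words if w in vec]  # skip words outside Voc
    if not vectors:
        raise ValueError("document contains no word from the vocabulary")
    return np.mean(vectors, axis=0)

# Hypothetical 2-dimensional word vectors (Dy = 2).
vec = {"sea": np.array([1.0, 0.0]), "beach": np.array([0.0, 1.0])}
feat = document_feature(["sea", "beach", "unknown"], vec)  # mean of the two known vectors
```

The result has the same dimensionality as a single word vector, so a one-word document and a long document are represented in the same space.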

Through the above procedure, the feature value generation device 1 obtains the document feature values of the documents 22 stored in the content database 2.

[Extraction of initial feature values]

Next, the method of extracting the initial feature value of a signal content 21 is described.

In the initial feature value extraction process, the initial feature values that can be extracted differ depending on the type of the signal content 21 (image, sound, video, and so on). However, which initial feature values are extracted from each type of signal content 21 is not an essential requirement of this embodiment; known initial feature values may be extracted using known feature extraction processes.

Specifically, any numerical data with dimensions (a scalar or a vector) extracted from the signal content 21 is valid as its initial feature value. Examples of initial feature extraction for various kinds of signal content 21 that suit this embodiment are described below.

When the signal content 21 is an image, features such as brightness features, color features, texture features, concept features, and scene features are extracted from it as initial feature values.

Brightness features are extracted as a histogram by counting the V values in the HSV color space. In this case, each image included in the signal content 21 is represented as a vector with as many dimensions as there are quantization levels of the V value (for example, 256 levels for 8-bit quantization).
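
A minimal sketch of this brightness histogram is shown below. It assumes an 8-bit RGB image held as a NumPy array of shape (height, width, 3) and uses the fact that the V channel of HSV is the per-pixel maximum of R, G, and B; the function name `brightness_histogram` is hypothetical.

```python
import numpy as np

def brightness_histogram(rgb: np.ndarray, levels: int = 256) -> np.ndarray:
    """Histogram of HSV V values for an 8-bit RGB image of shape (H, W, 3)."""
    v = rgb.max(axis=2)                       # V = max(R, G, B) for each pixel
    return np.bincount(v.ravel(), minlength=levels)

# Toy 2x2 image: three black pixels and one pixel with V = 200.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [10, 200, 30]
hist = brightness_histogram(rgb)
```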

Color features are extracted as a histogram by counting the values along each axis (L*, a*, b*) of the L*a*b* color space. The number of histogram bins per axis may be, for example, 4 for L*, 14 for a*, and 14 for b*. In this case the three axes are binned jointly, giving 4 × 14 × 14 = 784 bins in total, so each image included in the signal content 21 is represented as a 784-dimensional vector.
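
The 784-dimensional color histogram can be sketched as a joint three-dimensional histogram. The sketch assumes that the pixels have already been converted to L*a*b* and that L* lies in [0, 100] and a*, b* in [-128, 127]; these value ranges and the function name `lab_histogram` are assumptions, not values given in the description.

```python
import numpy as np

def lab_histogram(lab_pixels: np.ndarray) -> np.ndarray:
    """Joint 4 x 14 x 14 histogram of L*a*b* pixels, flattened to 784 dimensions.

    lab_pixels has shape (n_pixels, 3) with columns L*, a*, b*.
    """
    ranges = [(0.0, 100.0), (-128.0, 127.0), (-128.0, 127.0)]  # assumed value ranges
    hist, _ = np.histogramdd(lab_pixels, bins=(4, 14, 14), range=ranges)
    return hist.ravel()

# Three hypothetical pixels; the first two fall into the same bin.
pixels = np.array([[50.0, 0.0, 0.0], [50.0, 0.0, 0.0], [10.0, -100.0, 100.0]])
feat = lab_histogram(pixels)
```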

Texture features are extracted as statistics of the gray-level histogram (such as contrast), the power spectrum, and so on. Alternatively, local features may be extracted. Local features are suitable because, like color, motion, and so on, they can then be extracted as a histogram. As local features, for example, SIFT (Scale Invariant Feature Transform), described in Reference 2 below, or SURF (Speeded Up Robust Features), described in Reference 3 below, can be used.

[Reference 2] D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, pp. 91-110, 2004.

[Reference 3] H. Bay, T. Tuytelaars, and L.V. Gool, "SURF: Speeded Up Robust Features", Lecture Notes in Computer Science, vol. 3951, pp. 404-417, 2006.

A local feature extracted in this way is represented, for example, as a 128-dimensional real-valued vector. Each such vector is converted into a code by referring to a codebook learned and generated in advance, and a histogram is generated by counting the occurrences of each code. In this case, the number of histogram bins equals the number of codes in the codebook. Alternatively, the sparse representation described in Reference 4, or the Fisher-kernel-based feature representations described in References 5 and 6, may be used as the local feature representation.

[Reference 4] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, and Yihong Gong, "Locality-constrained Linear Coding for Image Classification", IEEE Conference on Computer Vision and Pattern Recognition, pp. 3360-3367, 2010.

[Reference 5] Florent Perronnin, Jorge Sanchez, Thomas Mensink, "Improving the Fisher Kernel for Large-Scale Image Classification", European Conference on Computer Vision, pp. 143-156, 2010.

[Reference 6] Herve Jegou, Florent Perronnin, Matthijs Douze, Jorge Sanchez, Patrick Perez, Cordelia Schmid, "Aggregating Local Image Descriptors into Compact Codes", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 34, No. 9, pp. 1704-1716, 2012.

Whichever local feature is used, the resulting initial feature value is a real-valued vector whose length depends on the number of codes in the codebook.
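
The codebook-based histogram can be sketched as follows. The sketch assumes nearest-codeword assignment under Euclidean distance, which is a common choice but is not stated in the description; the function name `bovw_histogram` and the two-dimensional toy descriptors are hypothetical (real SIFT descriptors would be 128-dimensional).

```python
import numpy as np

def bovw_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Count how many local descriptors are assigned to each codeword."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = d2.argmin(axis=1)                  # index of the nearest codeword
    return np.bincount(codes, minlength=len(codebook))

# Hypothetical codebook with three codewords and four local descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[0.5, 0.2], [9.0, 9.5], [0.1, 0.1], [1.0, 9.0]])
hist = bovw_histogram(descriptors, codebook)  # one bin per codeword
```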

Concept features describe the objects contained in an image, the events the image captures, and so on, and are extracted as initial feature values. Any objects or events may be used, for example "sea", "mountain", or "ball". If "sea" appears in an image, the image is judged to belong to the concept "sea". Whether each image belongs to each concept is determined with concept classifiers. Usually one concept classifier is prepared per concept; given the feature value of an image, it outputs, as a membership level, whether the image belongs to the concept it is meant to recognize. A concept classifier is obtained by learning the relationship between image feature values (for example, the local features described above) and ground-truth labels, entered manually in advance, that indicate which concepts each image belongs to. A support vector machine, for example, may be used as the learner. When concept features are extracted, the membership levels for all concepts are combined into a vector. The generated initial feature value is then a vector with as many dimensions as there are concepts.
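
The construction of a concept feature vector from per-concept classifiers can be sketched as follows. The sketch assumes already-trained linear classifiers (for example, linear support vector machines) given by hypothetical `weights` and `biases`, and maps each decision value to a membership level in (0, 1) with a logistic function; this mapping is an illustrative assumption.

```python
import numpy as np

def concept_feature(image_feature: np.ndarray, weights: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """One membership level per concept, from one linear classifier per concept."""
    scores = weights @ image_feature + biases   # decision value of each classifier
    return 1.0 / (1.0 + np.exp(-scores))        # assumed squashing to (0, 1)

# Hypothetical classifiers for three concepts ("sea", "mountain", "ball").
weights = np.array([[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]])
biases = np.zeros(3)
levels = concept_feature(np.array([2.0, 1.0]), weights, biases)
```

The output has one dimension per concept, matching the statement that the initial feature value has as many dimensions as there are concepts.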

Scene features are feature values that express the scenery or scene of an image. To extract them, the GIST descriptor described in Reference 7 below can be used, for example. The GIST descriptor divides an image into a plurality of regions and is expressed by the coefficients obtained when filters with fixed orientations are applied to each region. In this case, the generated initial feature value is a vector whose length depends on the filter configuration (the number of regions and the number of orientations).

［参考文献７］A. Oliva and A. Torralba, "Building the gist of a scene: the role of global image features in recognition", Progress in Brain Research, 155, pp.23-36, 2006. [Reference 7] A. Oliva and A. Torralba, "Building the gist of a scene: the role of global image features in recognition", Progress in Brain Research, 155, pp. 23-36, 2006.

また、非特許文献１に記載のＣＮＮによる特徴量を初期特徴量として抽出しても良い。 Further, the feature amount by CNN described in Non-Patent Document 1 may be extracted as the initial feature amount.

次に、信号コンテンツ２１の種類が音である場合には、信号コンテンツ２１から、例えば音高特徴、音圧特徴、スペクトル特徴、リズム特徴、発話特徴、音楽特徴、音イベント特徴等を初期特徴量として抽出する。 Next, when the signal content 21 is sound, features such as pitch features, sound pressure features, spectral features, rhythm features, utterance features, music features, and sound event features are extracted from the signal content 21 as initial feature amounts.

音高特徴を抽出する場合は、信号コンテンツ２１に含まれる音ファイルから例えば音高（ピッチ）の特徴量を抽出すれば良い。抽出方法としては、例えば、下記の参考ウェブサイトに記載の方法等を適用することができる。この場合、ピッチを１次元ベクトル（スカラー）として表現しても良く、あるいはピッチを複数の次元に量子化し、複数の次元を持つベクトルとして表現しても良い。 When extracting pitch features, for example, feature values of pitch (pitch) may be extracted from the sound file included in the signal content 21. As an extraction method, for example, a method described in the following reference website can be applied. In this case, the pitch may be expressed as a one-dimensional vector (scalar), or the pitch may be quantized into a plurality of dimensions and expressed as a vector having a plurality of dimensions.
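The two pitch representations mentioned above (a scalar, or a vector obtained by quantizing the pitch into several dimensions) can be illustrated with a minimal Python sketch. The function names, the log-spaced bin layout, and the 50 to 800 Hz range are assumptions chosen for illustration; the description does not specify a quantization scheme.

```python
import math


def pitch_as_scalar(pitch_hz):
    """Represent the pitch as a one-dimensional vector (a scalar)."""
    return [float(pitch_hz)]


def pitch_as_quantized_vector(pitch_hz, n_bins=12, f_min=50.0, f_max=800.0):
    """Quantize the pitch into n_bins log-spaced bins and return a one-hot vector.

    The bin layout and the frequency range are illustrative assumptions.
    """
    # Clamp to the supported range so that every pitch falls into some bin.
    clamped = min(max(pitch_hz, f_min), f_max)
    # Position of the pitch on a log-frequency axis, normalized to [0, 1].
    position = math.log(clamped / f_min) / math.log(f_max / f_min)
    index = min(int(position * n_bins), n_bins - 1)
    vector = [0.0] * n_bins
    vector[index] = 1.0
    return vector
```

Either output can serve as an initial feature amount; the quantized form has as many dimensions as there are bins.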

［参考ウェブサイト］http://en.wikipedia.org/wiki/Pitch_detection_algorithm [Reference website] http://en.wikipedia.org/wiki/Pitch_detection_algorithm

音圧特徴を抽出する場合は、信号コンテンツ２１に含まれる音ファイルから音声波形データの振幅値の特徴量を初期特徴量として抽出すれば良い。また、信号コンテンツ２１に含まれる音ファイルから音声波形データの短時間パワースペクトルを抽出し、任意の帯域の平均パワーを計算して特徴量を求め、初期特徴量としても良い。音声波形データの振幅値を抽出しても、短時間パワースペクトルを抽出しても、生成した初期特徴量は、音圧を計算するバンドの数に依存した長さのベクトルとなる。 When extracting the sound pressure feature, the feature value of the amplitude value of the speech waveform data may be extracted from the sound file included in the signal content 21 as the initial feature value. Alternatively, a short-time power spectrum of voice waveform data may be extracted from a sound file included in the signal content 21, and an average power in an arbitrary band may be calculated to obtain a feature value, which may be used as an initial feature value. Whether the amplitude value of the speech waveform data is extracted or the short-time power spectrum is extracted, the generated initial feature amount is a vector whose length depends on the number of bands for calculating the sound pressure.

スペクトル特徴を抽出する場合は、信号コンテンツ２１に含まれる音ファイルから例えばメル尺度ケプストラム係数（ＭＦＣＣ：Mel-Frequency Cepstral Coefficients ）の特徴量を初期特徴量として抽出すれば良い。 When extracting spectral features, for example, a feature value of Mel-Frequency Cepstral Coefficients (MFCC) may be extracted from the sound file included in the signal content 21 as an initial feature value.

リズム特徴を抽出する場合は、信号コンテンツ２１に含まれる音ファイルから例えばテンポの特徴量を初期特徴量として抽出すればよい。テンポを抽出する際には、例えば下記の参考文献８に記載の方法等を適用することができる。 When extracting rhythm features, for example, a tempo feature value may be extracted from the sound file included in the signal content 21 as an initial feature value. When extracting the tempo, for example, the method described in Reference Document 8 below can be applied.

［参考文献８］E.D. Scheirer, "Tempo and Beat Analysis of Acoustic Musical Signals ", Journal of Acoustic Society America, Vol. 103, Issue 1, pp.588-601, 1998. [Reference 8] E.D. Scheirer, "Tempo and Beat Analysis of Acoustic Musical Signals", Journal of Acoustic Society America, Vol. 103, Issue 1, pp.588-601, 1998.

発話特徴及び音楽特徴は、それぞれ発話の有無及び音楽の有無を表す。発話特徴又は音楽特徴を抽出する場合は、信号コンテンツ２１に含まれる音ファイルから、発話又は音楽が存在する区間を特徴量として抽出すれば良い。発話又は音楽が存在する区間を識別するためには、例えば下記の参考文献９に記載の方法等を適用することができる。 The utterance feature and the music feature represent the presence or absence of utterance and the presence or absence of music, respectively. When extracting an utterance feature or a music feature, a section in which an utterance or music is present may be extracted as a feature amount from a sound file included in the signal content 21. In order to identify a section where speech or music exists, for example, the method described in Reference Document 9 below can be applied.

［参考文献９］K. Minami, A. Akutsu, H. Hamada, and Y. Tonomura, "Video Handling with Music and Speech Detection", IEEE Multimedia, vol. 5, no. 3, pp.17-25, 1998. [Reference 9] K. Minami, A. Akutsu, H. Hamada, and Y. Tonomura, "Video Handling with Music and Speech Detection", IEEE Multimedia, vol. 5, no. 3, pp. 17-25, 1998 .

音イベント特徴を抽出する場合は、音イベントとして、例えば、笑い声、大声等の感情的な音声、又は、銃声、爆発音等の環境音の生起等を検出し、このような音イベントの特徴量を初期特徴量として抽出すれば良い。このような音イベントを検出する際には、例えば下記の参考文献１０に記載の方法等を適用することができる。 When extracting sound event features, the occurrence of sound events, for example emotional vocalizations such as laughter and shouting, or environmental sounds such as gunshots and explosions, is detected, and the feature amounts of these sound events are extracted as initial feature amounts. To detect such sound events, the method described in Reference 10 below, for example, can be applied.

［参考文献１０］国際公開第２００８／０３２７８７号公報 [Reference 10] International Publication No. 2008/032787

信号コンテンツ２１の種類が映像である場合は、映像が一般に画像及び音のストリームであることから、上述した画像特徴及び音特徴を用いて初期特徴量を抽出することができる。映像中の何れの画像の区間を分析するか、又は何れの音の区間を分析するかについては、例えば、映像を複数の区間に予め分割し、その区間毎に１つの画像を抽出して特徴量を抽出する。また、映像を複数の区間に予め分割し、その区間毎に音の特徴量を抽出する。このようにして、初期特徴抽出処理を実施する。 When the type of the signal content 21 is a video, since the video is generally an image and sound stream, the initial feature amount can be extracted using the above-described image feature and sound feature. Regarding which image section in the video to analyze or which sound section to analyze, for example, the video is divided into a plurality of sections in advance, and one image is extracted for each section. Extract the amount. Further, the video is divided into a plurality of sections in advance, and the sound feature amount is extracted for each section. In this way, the initial feature extraction process is performed.

なお、映像を複数の区間に分割する場合は、映像を予め定めた一定の間隔で分割しても良く、例えば下記の参考文献１１に記載の分割方法等を適用し、映像が不連続に途切れる点であるカット点で分割してもよい。望ましくは、後者の分割方法を適用すると良い。映像を複数の区間に分割した結果として、各々の区間の開始点（開始時刻）と終了点（終了時刻）が得られるが、この時刻毎に別々の初期特徴量として扱えば良い。 When dividing a video into a plurality of sections, the video may be divided at predetermined fixed intervals, or it may be divided at cut points, the points at which the video breaks discontinuously, by applying, for example, the division method described in Reference 11 below. The latter method is preferable. Dividing the video yields a start point (start time) and an end point (end time) for each section, and the feature amounts may be handled as separate initial feature amounts for each of these time ranges.
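The two ways of dividing a video into sections can be sketched as follows. This is a minimal illustration, not the method of Reference 11: the function names are hypothetical, and the cut points are assumed to come from an external detector that is outside the scope of the sketch.

```python
def fixed_interval_segments(duration_sec, interval_sec):
    """Split [0, duration_sec] into consecutive (start, end) sections of fixed length."""
    if duration_sec <= 0 or interval_sec <= 0:
        raise ValueError("duration_sec and interval_sec must be positive")
    segments = []
    start = 0.0
    while start < duration_sec:
        # The last section is shortened so that it ends with the video.
        end = min(start + interval_sec, duration_sec)
        segments.append((start, end))
        start = end
    return segments


def segments_from_cut_points(duration_sec, cut_points_sec):
    """Split the video at cut points supplied by an external cut detector."""
    # Keep only cut points strictly inside the video and order them in time.
    inner = sorted(t for t in cut_points_sec if 0.0 < t < duration_sec)
    boundaries = [0.0] + inner + [float(duration_sec)]
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Each returned (start, end) pair identifies one section from which an image frame and a sound feature amount would then be extracted.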

［参考文献１１］Y. Tonomura, A. Akutsu, Y. Taniguchi, and G. Suzuki, "Structured Video Computing", IEEE Multimedia, pp.34-43, 1994. [Reference 11] Y. Tonomura, A. Akutsu, Y. Taniguchi, and G. Suzuki, "Structured Video Computing", IEEE Multimedia, pp.34-43, 1994.

以上のようにして抽出した初期特徴量は、抽出した特徴量のうちの何れか１つの特徴量であっても良く、複数の特徴量から計算した特徴量であっても良い。また、初期特徴量は、上述した方法で抽出した特徴量に限らず、その他の公知の抽出方法で取得した特徴量を初期特徴量として用いても良い。 The initial feature quantity extracted as described above may be any one of the extracted feature quantities, or may be a feature quantity calculated from a plurality of feature quantities. The initial feature value is not limited to the feature value extracted by the above-described method, and a feature value acquired by another known extraction method may be used as the initial feature value.

［特徴量変換関数の生成］ [Generate feature conversion function]

次に、特徴量変換関数の生成方法について詳細に説明する。 Next, a method for generating a feature quantity conversion function will be described in detail.

複数の信号コンテンツ２１のうちの信号コンテンツｉから抽出された初期特徴量をｘｉ∈ＲＤｘと表す。また、信号コンテンツの初期特徴量次元はＤｘである。 The initial feature amount extracted from signal content i among the plurality of signal contents 21 is denoted xi ∈ R^Dx, where Dx is the dimensionality of the initial feature amount of the signal content.

このとき、信号コンテンツをｄ次元（ｄ≦Ｄｘ）に低次元化する特徴量変換関数ｆ：ＲＤｘ→Ｒｄを求める。本実施形態では、信号コンテンツ２１と文書２２の関係性を学習し、活用することで、文書２２の持つ意味的な内容を信号コンテンツ２１の初期特徴量に移し、意味的な類似性がより反映された低次元特徴量を生成することである。これを実現するために、下記２つの方針を採る。 Here, we seek a feature amount conversion function f: R^Dx → R^d that reduces the initial feature amount of signal content to d dimensions (d ≤ Dx). The aim of the present embodiment is to learn and exploit the relationship between the signal content 21 and the documents 22 so that the semantic content of the documents 22 is transferred to the initial feature amounts of the signal content 21, producing low-dimensional feature amounts that better reflect semantic similarity. To achieve this, the following two policies are adopted.

（１）初期特徴量・文書特徴量間の関係の保存：文書特徴量から捕捉される意味内容を初期特徴量に移すためには、信号コンテンツ２１（初期特徴量）及び文書２２（文書特徴量）の関係を崩さないような特徴量変換関数ｆを生成する。本実施形態では、信号コンテンツ２１及び文書２２の関係は関係指示子２３として与えられるため、関係指示子２３が指示する関係を保存するように特徴量変換関数ｆを学習する。 (1) Preservation of the relationship between initial feature amounts and document feature amounts: to transfer the semantic content captured by the document feature amounts to the initial feature amounts, a feature amount conversion function f is generated that does not break the relationship between the signal content 21 (initial feature amounts) and the documents 22 (document feature amounts). In the present embodiment, this relationship is given by the relation indicator 23, so f is learned so as to preserve the relationship that the relation indicator 23 specifies.

（２）文書特徴量空間の幾何関係の保存：先に述べた通り、単語ベクトル３２には、例えばＳＧやＣＢＯＷにおける「ｖｅｃ（“ベルリン”）−ｖｅｃ（“ドイツ”）＋ｖｅｃ（“フランス”）≒ｖｅｃ（“パリ”）」等、単語の持つ意味と、単語ベクトル３２の幾何関係の間に明確な関連性があることが知られている。然るに、単語ベクトル空間上の幾何関係を保存するように、特徴量変換関数ｆを学習すれば、単語ベクトルの持つ意味的な関係性を効果的に保存した特徴量変換関数ｆを得ることができる。上記特許文献１に記載の低次元化方法のように単純な相関を用いる場合とは異なり、意味内容に関連した幾何関係を捉えることで、文書２２の量が少ない場合であっても、有効な低次元特徴量を構成することができる点で利がある。 (2) Preservation of the geometric relationships in the document feature space: as stated earlier, the word vectors 32 are known to exhibit a clear connection between the meanings of words and the geometric relationships among their vectors, for example vec(“Berlin”) − vec(“Germany”) + vec(“France”) ≈ vec(“Paris”) in SG and CBOW. Accordingly, if the feature amount conversion function f is learned so as to preserve the geometric relationships in the word vector space, an f that effectively preserves the semantic relationships carried by the word vectors can be obtained. Unlike the simple correlation used in the dimension reduction method of Patent Document 1 above, capturing geometric relationships tied to semantic content has the advantage that effective low-dimensional feature amounts can be constructed even when the number of documents 22 is small.

以降、一般性を失うことなく、関係指示子２３の与えられている信号コンテンツ２１及び文書２２の組の数をＮとし、当該信号コンテンツ２１の初期特徴量ｘｉ（ｉ＝１，２，・・・，Ｎ）と文書特徴量ｙｉ（ｉ＝１，２，・・・，Ｎ）について、同一インデクスを持つ組（例えばｘ１とｙ１、ｘ２とｙ２など）は関係指示子２３によって関係が指示されている組であるとする。また、これらは平均０に正規化されているとする。すなわち、初期特徴量ｘｉの平均ベクトルは０ベクトルである。 Thereafter, without losing generality, the number of sets of the signal content 21 and the document 22 to which the relation indicator 23 is given is N, and the initial feature amount xi (i = 1, 2,... .., N) and the document feature quantity yi (i = 1, 2,..., N), a pair having the same index (for example, x1 and y1, x2 and y2, etc.) is instructed by the relation indicator 23. Suppose that it is a pair. These are normalized to an average of 0. That is, the average vector of the initial feature quantity xi is a zero vector.

次に、特徴量変換関数ｆを生成する際の具体的な手続きを説明する。 Next, a specific procedure for generating the feature quantity conversion function f will be described.

まず、単語ベクトル空間にある幾何関係を捉える処理について説明する。これを実現するためには、各々の文書特徴量ｙｉが、他の文書特徴量ｙｊ≠ｉとどのような幾何関係にあるかを調べる必要がある。一般に、この幾何関係は、Ｎ個の文書特徴量ベクトルｙｉ（ｉ＝１，２，・・・，Ｎ）を入力情報として、文書特徴量ｙｉ及び文書特徴量ｙｊの間の幾何関係Ｗを出力とする、下記の最適化問題を解くことによって求められる。 First, a process for capturing a geometric relationship in the word vector space will be described. In order to realize this, it is necessary to examine how each document feature amount yi has a geometric relationship with other document feature amounts yj ≠ i. In general, this geometric relationship outputs the geometric relationship W between the document feature value yi and the document feature value yj with N document feature value vectors yi (i = 1, 2,..., N) as input information. Is obtained by solving the following optimization problem.

ここで、幾何関係Ｗ＝｛ｗｉｊ｝はＮ×Ｎの行列であり、そのｉ行目幾何関係ｗｉは、文書特徴量ｙｉの他の文書特徴量ｙｊとの幾何関係を表す。ここで言う幾何関係Ｗは、より具体的には、文書特徴量ｙｉを、他の文書特徴量ｙｊ（ｊ＝１，２，・・・，Ｎ、ただしｉ≠ｊ）の線形和で表現した際の、線形結合重みを表している。この計算方法は所謂最小二乗法であり、公知の勾配法や行列分解を用いて解くことができる。結果として得られる幾何関係Ｗは、より意味的に関連した内容を持つ文書２２同士の関係に対応する要素（つまり文書特徴量ｙｉ及び文書特徴量ｙｊの間の関係に対応する要素ｗｉｊ）ほど大きい値を持つ傾向にある。 Here, the geometrical relationship W = {wij} is an N × N matrix, and the i-th row geometrical relationship wi represents a geometrical relationship between the document feature amount yi and another document feature amount yj. More specifically, the geometric relationship W described here represents the document feature amount yi as a linear sum of other document feature amounts yj (j = 1, 2,..., N, i ≠ j). Represents the linear combination weight. This calculation method is a so-called least square method, which can be solved using a known gradient method or matrix decomposition. The resulting geometric relationship W is larger as the element corresponding to the relationship between the documents 22 having more semantically related contents (that is, the element wij corresponding to the relationship between the document feature amount yi and the document feature amount yj). Tend to have a value.
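The formula for this optimization problem is not reproduced in the text. Going only by the description above (each yi is expressed as a linear combination of the other yj, and the weights are found by least squares), one plausible form is sketched below. The exact notation, and any additional constraints or normalization in the original equation, are assumptions.

```latex
\min_{W} \; \sum_{i=1}^{N} \Bigl\| \, y_i - \sum_{j \neq i} w_{ij}\, y_j \Bigr\|_2^2
```

Under this reading, row w_i of W holds the linear combination weights that best reconstruct y_i from the remaining document feature amounts, and the diagonal entries w_ii are not used.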

わかりやすい例を挙げると、文書特徴量ｙｉがリンゴについて記述した文書２２の文書特徴量であったとする。また、青リンゴについて記述した文書２２の文書特徴量ｙｊと、自動車について記述した文書２２の文書特徴量ｙｋがあったとする。このとき、文書特徴量ｙｋと比較し、文書特徴量ｙｊの方が文書特徴量ｙｉに類似しているため、幾何関係Ｗのうち、文書特徴量ｙｋに対応する幾何関係ｗｋはほとんどの値が０を取り、文書特徴量ｙｊに対応する幾何関係ｗｊはほとんどの値が１に近い値を取る。 To give a simple example, suppose yi is the document feature amount of a document 22 describing apples, yj is that of a document 22 describing green apples, and yk is that of a document 22 describing automobiles. Because yj is more similar to yi than yk is, within the geometric relationship W the geometric relation wk corresponding to yk takes values that are mostly 0, whereas the geometric relation wj corresponding to yj takes values that are mostly close to 1.

あるいは、各々の文書特徴量ｙｉは、他の極少数の文書特徴量としか有効な幾何関係Ｗを持たない、つまり、文書特徴量ｙｉを持つ文書２２に意味的に関連した文書２２は、Ｎ個の文書の中で非常に限られている場合が多くある。この場合には、文書特徴量ｙｉ及び文書特徴量ｙｊの間の幾何関係Ｗは、Ｎ個の文書特徴量ベクトルｙｉ（ｉ＝１，２，・・・，Ｎ）を入力情報として、文書特徴量ｙｉ及び文書特徴量ｙｊの間の幾何関係Ｗを出力情報とする、下記（１）式及び（２）式の最適化問題を解くことによって求められる。 Alternatively, each document feature amount yi has an effective geometric relationship W with only a very small number of other document feature amounts, that is, the document 22 semantically related to the document 22 having the document feature amount yi is N There are many cases where the number of documents is very limited. In this case, the geometric relationship W between the document feature value yi and the document feature value yj is obtained by using N document feature value vectors yi (i = 1, 2,..., N) as input information. It is obtained by solving the optimization problem of the following formulas (1) and (2) using the geometric relationship W between the quantity yi and the document feature quantity yj as output information.

この最適化問題はベクトルｗｉ中のいくつかの限られた要素にのみ非ゼロの値を持つように正則化する、所謂スパース回帰問題、スパースコーディング問題と言われる問題であり、例えば、最小角回帰（ＬｅａｓｔＡｎｇｌｅＲｅｇｒｅｓｓｉｏｎ：ＬＡＲＳ），ＡｌｔｅｒｎａｔｉｎｇＤｉｒｅｃｔｉｏｎＭｅｔｈｏｄｏｆＭｕｌｔｉｐｌｉｅｒｓ（ＡＤＭＭ）等のいくつかの公知の数値計算アルゴリズムを用いて解くことができる。 This optimization problem is a so-called sparse regression problem or sparse coding problem in which only a limited number of elements in the vector wi are regularized to have non-zero values. It can be solved by using some known numerical calculation algorithms such as (Least Angle Regression: LARS), Alternating Direction Method of Multipliers (ADMM).

一方、本実施形態では、必ずしもコンテンツデータベース２に登録された全ての文書２２に関係指示子２３が与えられている必要はない。特に、関係指示子２３が与えられている文書２２が少ない場合には、単語ベクトル空間上の幾何関係を捉えるに足る十分な量のデータ（文書特徴量）が得られず、結果として幾何関係の捕捉に失敗する、すなわち、意味的に関連した（していない）文書２２同士の幾何関係Ｗの要素が、本来、大きく（小さく）なるべきところ、そうはならないような場合も出てくる。そこで、本実施形態では、関係指示子２３が与えられていない文書２２の文書特徴量も活用して、幾何関係を捉えるに十分な量のデータを得ることを考える。 On the other hand, in the present embodiment, the relation indicator 23 need not be given for every document 22 registered in the content database 2. In particular, when few documents 22 have a relation indicator 23, not enough data (document feature amounts) is available to capture the geometric relationships in the word vector space, and the capture may fail; that is, elements of the geometric relationship W between semantically related (unrelated) documents 22 that should be large (small) may not turn out so. The present embodiment therefore also uses the document feature amounts of documents 22 that have no relation indicator 23, in order to obtain enough data to capture the geometric relationships.

関係指示子２３が与えられていない文書２２がＭ個存在するとする。関係指示子２３が与えられていない文書２２も含めた文書特徴量ｕｉ（ｉ＝１，２，・・・，（Ｎ＋Ｍ））と表す（最初のＮ個は関係指示子２３が与えられている文書２２の文書特徴量ｙｉ（ｉ＝１，・・・，Ｎ））。この場合にも上記述べた手続きと同様に、Ｎ＋Ｍ個の文書特徴量ｕｉを用いてこれらの幾何関係Ｗを求めればよい。具体的には、（１）式及び（２）式に対応する問題は、それぞれ下記（３）式及び（４）式となる。 Assume that there are M documents 22 to which no relationship indicator 23 is given. This is expressed as a document feature ui (i = 1, 2,..., (N + M)) including the document 22 to which the relationship indicator 23 is not given (the first N items are given the relationship indicator 23). Document feature amount yi (i = 1,..., N)) of the document 22. In this case, similar to the above-described procedure, these geometrical relationships W may be obtained using N + M document feature quantities ui. Specifically, the problems corresponding to the equations (1) and (2) are the following equations (3) and (4), respectively.

上記（３）式及び（４）式を、上記（１）式及び（２）式を用いた手続と同様の手続きで解くことによって幾何関係Ｗを求めれば良い。なお、関係指示子２３を持たない文書２２の文書特徴量は、必ずしも文書２２から抽出した文書特徴量である必要はない。具体的には、単語ベクトル３２自体を、直接関係指示子２３を持たない文書２２として利用することも可能である。単語ベクトル３２自体は、単語ベクトル空間上のＤｙ次元ベクトルであり、この空間を定める“基底”とも言える特別なデータであることから、単語ベクトル空間中の幾何関係Ｗを捉える上では特別有益な情報をもたらすものであり、効果的である。 The geometric relationship W may be obtained by solving equations (3) and (4) with the same procedure as for equations (1) and (2). Note that the document feature amounts of documents 22 without a relation indicator 23 need not be feature amounts extracted from documents 22. Specifically, the word vectors 32 themselves can be used directly as documents 22 that have no relation indicator 23. Each word vector 32 is a Dy-dimensional vector in the word vector space and is special data that can be regarded as a “basis” defining that space, so it provides particularly useful information for capturing the geometric relationship W in the word vector space and is effective for that purpose.

以上のような手続きによって、特徴量生成装置１は、単語ベクトル空間上の幾何関係Ｗを捕捉する。 Through the procedure as described above, the feature quantity generation device 1 captures the geometric relationship W in the word vector space.

続いて、単語ベクトル空間上の文書特徴量同士の幾何関係Ｗを可能な限り保存し、かつ、関係指示子２３によって表される信号コンテンツ２１及び文書２２の関係を最大限保存するように、初期特徴量を、初期特徴量の次元より低次元な次元を持つ低次元特徴量へと変換する特徴量変換関数ｆを生成する。具体的には、下記の手続きに従って特徴量変換関数ｆを生成する。 Subsequently, an initial setting is made so that the geometric relationship W between the document feature quantities in the word vector space is stored as much as possible, and the relationship between the signal content 21 and the document 22 represented by the relationship indicator 23 is stored as much as possible. A feature quantity conversion function f for converting the feature quantity into a low-dimensional feature quantity having a dimension lower than that of the initial feature quantity is generated. Specifically, the feature quantity conversion function f is generated according to the following procedure.

本実施形態では、特徴量変換関数ｆとして以下の形式をとるものを考える。 In the present embodiment, a feature quantity conversion function f that takes the following form is considered.

上記（５）式において、αｋ，ｔはパラメータ、κ（ｘｔ，ｘ）はカーネル関数である。カーネル関数は、下記（６）式に示す関数であると共に、Ｎ個の初期特徴量｛ｘ１，・・・，ｘＮ｝に対して、下記（７）式を満たし、任意の実数αｉ、αｊに対して、下記（８）式を満たす関数である。 In the above equation (5), αk, t is a parameter, and κ (xt, x) is a kernel function. The kernel function is a function shown in the following formula (6), and satisfies the following formula (7) for N initial feature quantities {x1,..., XN}, and is set to arbitrary real numbers αi and αj. On the other hand, it is a function that satisfies the following equation (8).

このような関数は無数に存在するが、例を挙げれば、下記（９）式、（１０）式、（１１）式等が存在する。ただし、β、γは正の実数値パラメータ、ｐは整数パラメータであり、適宜決定してよい。 There are an infinite number of such functions. For example, the following formulas (9), (10), and (11) exist. However, β and γ are positive real value parameters, and p is an integer parameter, which may be determined as appropriate.
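As an illustration, the sketch below implements three commonly used kernel functions that are symmetric and positive semidefinite, which is presumably what the conditions above require. Whether they coincide exactly with equations (9) to (11) is an assumption, since those equations are not reproduced in the text; the function names and default parameter values are likewise hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def gaussian_kernel(x, y, beta=1.0):
    """exp(-beta * ||x - y||^2) with beta > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-beta * squared_distance)


def polynomial_kernel(x, y, gamma=1.0, p=2):
    """(x . y + gamma)^p with integer p; p = 1 and gamma = 0 gives the linear kernel."""
    return (dot(x, y) + gamma) ** p


def quadratic_form(kernel, points, coeffs):
    """sum_i sum_j c_i c_j kernel(x_i, x_j); non-negative for a valid kernel."""
    return sum(
        ci * cj * kernel(xi, xj)
        for ci, xi in zip(coeffs, points)
        for cj, xj in zip(coeffs, points)
    )
```

The `quadratic_form` helper gives a quick numerical check of positive semidefiniteness on a sample of points; it does not prove the property in general.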

なお、上記（１１）式のカーネル関数を用い、さらにｐ＝１、γ＝０とした場合には、得られる特徴量変換関数ｆは単純な線形関数に帰着され、低次元特徴量を求める処理、及び、特徴量変換関数ｆを求めるための処理量が大きく低減されるという利点がある。一方、この場合以外の場合には特徴量変換関数ｆは非線形関数となり、処理量は増加するものの、最終的に得られる低次元特徴量に移すことのできる文書２２の情報量は増え、結果的に得られる精度が向上するという利点がある。 When the kernel function of the above equation (11) is used and p = 1 and γ = 0, the obtained feature quantity conversion function f is reduced to a simple linear function, and processing for obtaining a low-dimensional feature quantity is performed. There is an advantage that the processing amount for obtaining the feature amount conversion function f is greatly reduced. On the other hand, in other cases, the feature quantity conversion function f becomes a non-linear function and the processing amount increases, but the information amount of the document 22 that can be transferred to the finally obtained low-dimensional feature amount increases, and as a result There is an advantage that the accuracy obtained is improved.

上記（５）式において、ｂｋは下記（１２）式、すなわち平均値を求めることで定められる定数であるため、上記（５）式は、下記（１３）式のような内積の形式に変換することができる。ただし、ａｋ、κ（ｘ）は、下記（１４）式で表される。 In the above equation (5), since bk is a constant determined by calculating the following equation (12), that is, calculating the average value, the above equation (5) is converted into an inner product form such as the following equation (13). be able to. However, ak and κ (x) are expressed by the following equation (14).
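Equations (5) and (12) to (14) are not reproduced in the text. A reading consistent with the description (b_k is a mean, and the function can be rewritten as an inner product) is sketched below. The symbol \mu_t and the precise placement of the mean are assumptions introduced for illustration.

```latex
\begin{aligned}
f_k(x) &= \sum_{t=1}^{T} \alpha_{k,t}\, \kappa(x_t, x) - b_k,
\qquad
b_k = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \alpha_{k,t}\, \kappa(x_t, x_i), \\
f_k(x) &= a_k^{\top} \kappa(x),
\qquad
a_k = [\alpha_{k,1}, \dots, \alpha_{k,T}]^{\top}, \\
\kappa(x) &= [\kappa(x_1, x) - \mu_1, \dots, \kappa(x_T, x) - \mu_T]^{\top},
\qquad
\mu_t = \frac{1}{N} \sum_{i=1}^{N} \kappa(x_t, x_i).
\end{aligned}
```

Substituting b_k into the first line and collecting the terms for each t gives the inner product form, so subtracting b_k amounts to centering each kernel coordinate over the N training samples.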

ここで、Ｔは、特徴量変換関数ｆ、具体的にはカーネルベクトル写像κ（ｘ）を定める定数である。定数Ｔは、Ｔ≦Ｎの範囲で任意の値に決めてよい。例えば、Ｔ＝３００として、全ての初期特徴量｛ｘ１，・・・，ｘＮ｝の中から初期特徴量をランダムにＴ個選出して特徴量変換関数ｆの生成に用いても良く、あるいはＫ−ｍｅａｎｓ等のクラスタリング法を用いて決定された代表ベクトルを特徴量変換関数ｆの生成に用いても良い。 Here, T is a constant that determines the feature amount conversion function f, specifically, the kernel vector mapping κ (x). The constant T may be set to an arbitrary value in the range of T ≦ N. For example, assuming that T = 300, T initial feature quantities may be randomly selected from all the initial feature quantities {x1,..., XN} and used to generate the feature quantity conversion function f. A representative vector determined using a clustering method such as -means may be used to generate the feature amount conversion function f.
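The construction of the kernel vector map from T randomly selected initial feature amounts can be sketched as follows. This is a minimal illustration under assumptions: the Gaussian kernel, the mean-centering of each coordinate, and all function names are hypothetical choices, and the parameter vectors passed to `transform` are assumed to have been learned elsewhere.

```python
import math
import random


def gaussian_kernel(x, y, beta=1.0):
    """exp(-beta * ||x - y||^2); one possible choice of kernel function."""
    return math.exp(-beta * sum((a - b) ** 2 for a, b in zip(x, y)))


def build_kernel_map(features, T, kernel=gaussian_kernel, seed=0):
    """Pick T anchors at random (T <= N) and return a centered kernel vector map."""
    rng = random.Random(seed)
    anchors = rng.sample(features, T)
    n = len(features)
    # Mean kernel value of each anchor over all N initial feature amounts.
    means = [sum(kernel(a, x) for x in features) / n for a in anchors]

    def kernel_map(x):
        # T-dimensional vector of centered kernel values.
        return [kernel(a, x) - m for a, m in zip(anchors, means)]

    return kernel_map


def transform(kernel_map, A, x):
    """Low-dimensional feature: one inner product per parameter vector a_k in A."""
    kx = kernel_map(x)
    return [sum(a_t * k_t for a_t, k_t in zip(a_k, kx)) for a_k in A]
```

Replacing the random selection with cluster centers from K-means would change only how `anchors` is chosen; the rest of the map stays the same.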

然るに、特徴量変換関数ｆを求めるためには、パラメータ｛αｋ｝を決定すれば良い。そこで、以下に、パラメータ｛αｋ｝を決定する方法を説明する。 However, in order to obtain the feature amount conversion function f, the parameter {αk} may be determined. Therefore, a method for determining the parameter {αk} will be described below.

便宜上、関係指示子２３を持たない文書特徴量も含めた文書特徴量｛ｕ１，・・・，ｕ（Ｎ＋Ｍ）｝（ただし、ｕ１＝ｙ１，・・・，ｕＮ＝ｙＮ）を定義しておく。ただし、関係指示子２３を持たない文書特徴量は必須ではなく、その場合には、Ｍ＝０とすれば良い。さらに、上記初期特徴量ｘｉと同様の手続きによって、κ（ｘｉ）に相当するカーネルベクトル写像ρ（ｕｉ）を求めておく。 For convenience, document feature values {u1,..., U (N + M)} including document feature values having no relation designator 23 (where u1 = y1,..., UN = yN) are defined. . However, the document feature amount not having the relation indicator 23 is not essential, and in this case, M = 0 may be set. Further, a kernel vector map ρ (ui) corresponding to κ (xi) is obtained by the same procedure as the initial feature value xi.

κ（ｘｉ）（ｉ＝１，２，・・・，Ｎ）およびρ（ｕｉ）（ｉ＝１，２，・・・，Ｎ＋Ｍ）を並べた行列Ｋ＝［κ（ｘ１），・・・，κ（ｘＮ）］、Ρ＝［ρ（ｕ１），・・・，ρ（ｕＮ＋Ｍ）］を定義する。さらに、特徴量変換関数ｆに加えて、全く同様の形式を持つ文書特徴量を低次元化する変換関数ｇを用意し、上記（１３）式におけるαｋに相当するパラメータをθｋ（ｋ＝１，２，・・・，ｄ）とする。なお、ｇ並びにθｋは数理的理由によって便宜上用いるものであって、本実施形態において特徴量変換関数ｆとして利用することはない。αｋを並べた行列をΑ＝［α１，・・・，αｄ］と定義し、θｋ（ｋ＝１，２，・・・，ｄ）を並べた行列をΘ＝［θ１，・・・，θｄ］と定義し、以下の数式を解く。 Define the matrices K = [κ(x1), ..., κ(xN)] and Ρ = [ρ(u1), ..., ρ(uN+M)], whose columns are κ(xi) (i = 1, 2, ..., N) and ρ(ui) (i = 1, 2, ..., N+M), respectively. In addition to the feature amount conversion function f, prepare a conversion function g of exactly the same form that reduces the dimensionality of the document feature amounts, and let θk (k = 1, 2, ..., d) be the parameters corresponding to αk in equation (13). Note that g and θk are introduced only for mathematical convenience and are not themselves used as the feature amount conversion function f in the present embodiment. Define Α = [α1, ..., αd] as the matrix whose columns are αk and Θ = [θ1, ..., θd] as the matrix whose columns are θk (k = 1, 2, ..., d), and solve the following equation.

ここで、上記（１５）式の第１項は、初期特徴量及び文書特徴量のそれぞれをＡおよびΘによって低次元化した際に得られる低次元特徴量において、関係指示子２３によって指示された組同士の線形または非線形な相関を最大化することを要請する項である。第２項は、幾何関係Ｗにより捕捉されている単語ベクトル３２上の幾何関係Ｗが、低次元化された後も保存されることを要請する項である。すなわち、上記（１５）式を解くことによって求められる特徴量変換関数ｆは、信号コンテンツ２１の低次元特徴量と同じく低次元化された文書特徴量との関係、および、単語ベクトル３２上の意味を反映した幾何関係Ｗの、双方を最大限保存する項である。 Here, the first term of equation (15) requires maximizing the linear or nonlinear correlation between the pairs indicated by the relation indicator 23, measured on the low-dimensional feature amounts obtained by reducing the initial feature amounts and the document feature amounts with Α and Θ, respectively. The second term requires that the geometric relationship W captured among the word vectors 32 be preserved after dimension reduction. In other words, the feature amount conversion function f obtained by solving equation (15) preserves, as far as possible, both the relationship between the low-dimensional feature amounts of the signal content 21 and the likewise reduced document feature amounts, and the geometric relationship W that reflects the meaning carried by the word vectors 32.

上記（１５）式に対して簡単な代数変形を適用すると、下記（１６）式が得られる。 When a simple algebraic modification is applied to the above equation (15), the following equation (16) is obtained.

ここで、下記（１７）式は、Ｇについて凸であるので、Ｇについて微分してその極値を取ることで、下記（１８）式に示すように、一般化固有値問題に帰着できる。 Here, since the following equation (17) is convex with respect to G, by differentiating G and taking its extreme value, it can be reduced to a generalized eigenvalue problem as shown in the following equation (18).

このような一般化固有値問題の解は、反復法などの公知の数値計算アルゴリズムによって求めることができる。Ｇの要素は、求めたいパラメータＡを含んでいるので、Ｇの要素を用いて特徴量変換関数ｆを得ることができる。 A solution of such a generalized eigenvalue problem can be obtained by a known numerical calculation algorithm such as an iterative method. Since the element of G includes the parameter A to be obtained, the feature quantity conversion function f can be obtained using the element of G.

以上のようにして、特徴量生成装置１は、目的としていた性質を最適に満たすような特徴量変換関数ｆを得る。 As described above, the feature quantity generation device 1 obtains the feature quantity conversion function f that optimally satisfies the target property.

［低次元化］ [Lower dimension]

特徴量変換関数ｆを求めた後であれば、任意の信号コンテンツ２１に対する初期特徴量を低次元化することができる。具体的には、初期特徴量ｘに対して平均が０になるようにシフトした後、下記（１９）式により新たな低次元特徴量を計算する。 After the feature amount conversion function f is obtained, the initial feature amount for any signal content 21 can be reduced. Specifically, after shifting so that the average becomes 0 with respect to the initial feature value x, a new low-dimensional feature value is calculated by the following equation (19).

＜＜意味的に関連したコンテンツの発見への適用＞＞ << Application to discovery of semantically related content >>

上記説明した本実施形態に係る特徴量生成装置１を、意味的に関連のある信号コンテンツ２１を検索する目的で利用する場合の一例について説明する。ここでは、信号コンテンツ２１が画像である場合を例に挙げて説明する。 An example will be described in which the feature value generation apparatus 1 according to the present embodiment described above is used for the purpose of searching for signal content 21 that is semantically related. Here, a case where the signal content 21 is an image will be described as an example.

例えば、コンテンツデータベース２に、Ｎ枚のデータベース画像が格納されているとする。上記説明した特徴量変換関数学習処理を通じて特徴量変換関数ｆを求め、これが記憶部３に格納されているものとし、さらに、上記Ｎ枚のデータベース画像に対応する低次元特徴量Ｚ＝｛ｚ１，・・・，ｚＮ｝が得られているものとする。このとき、目的は新たなクエリ画像が利用者から与えられた時に、当該クエリ画像に意味的に関連のあるデータベース画像を発見することである。 For example, it is assumed that N database images are stored in the content database 2. It is assumed that the feature quantity conversion function f is obtained through the above-described feature quantity conversion function learning process and stored in the storage unit 3, and further, the low-dimensional feature quantity Z = {z 1, corresponding to the N database images. ..., zN} is obtained. At this time, when a new query image is given by the user, the purpose is to find a database image that is semantically related to the query image.

まず、クエリ画像に対して初期特徴量抽出処理を施し、初期特徴量ｘｑを抽出する。その後、上記（１９）式に基づいて、特徴量ｘｑを低次元化し、低次元特徴量ｚｑを求める。 First, an initial feature quantity extraction process is performed on the query image to extract an initial feature quantity xq. Thereafter, the feature quantity xq is reduced in dimension based on the above equation (19) to obtain a low-dimensional feature quantity zq.

この低次元特徴量ｚｑと、低次元特徴量Ｚのそれぞれと距離を計算し、距離が小さいデータベース画像を意味的に関連のあるデータベース画像として出力する。低次元特徴量ｚｑ及び低次元特徴量Ｚはいずれも低次元であるため、距離計算に要する時間は、低次元化される前の初期特徴量を用いた場合に比べて短くなり、高速に演算することができる。また、物理的な特徴量である初期特徴量とは異なり、本実施形態により得られる低次元特徴量は文書２２の持つ意味内容と関連性が高くなるように学習された特徴量変換関数ｆを介して変換されているため、意味的な関連性の高いデータベース画像を精度良く発見することが可能である。 The distance between each of the low-dimensional feature quantity zq and the low-dimensional feature quantity Z is calculated, and a database image with a small distance is output as a database image that is semantically related. Since both the low-dimensional feature value zq and the low-dimensional feature value Z are low-dimensional, the time required for the distance calculation is shorter than when using the initial feature value before the reduction in dimension, and the calculation is performed at high speed. can do. In addition, unlike the initial feature quantity that is a physical feature quantity, the low-dimensional feature quantity obtained by the present embodiment is obtained by using the feature quantity conversion function f learned so as to be highly related to the semantic content of the document 22. Therefore, it is possible to find a database image having high semantic relevance with high accuracy.
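The search step described above can be sketched as a brute-force nearest-neighbor lookup in the low-dimensional space. The Euclidean distance and the function names are assumptions for illustration; the description only says that a distance is computed and that database images with small distances are output.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def search(z_query, database, top_k=3):
    """Return the indices of the top_k database entries closest to z_query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: euclidean(z_query, database[i]),
    )
    return ranked[:top_k]
```

Because each comparison costs time proportional to d rather than to Dx, the scan over all N database entries becomes cheaper as d is reduced.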

以上が、本実施形態において、意味的に関連のある信号コンテンツ２１を検索する目的で利用する場合一例である。 The above is an example in the case of using for the purpose of searching for the signal content 21 that is semantically related in the present embodiment.

以上のように、本実施形態によれば、文書特徴量の持つ幾何的な特性を捉えることで、文書２２の持つ意味内容をより正確に捉え、これを用いて信号コンテンツ２１と文書２２の関係性を学習することで、信号コンテンツ２１と文書２２のペアが少ないような場合であっても、より意味的に関連した信号コンテンツ２１の発見が可能となる。 As described above, according to the present embodiment, by capturing the geometric characteristics of the document feature amount, the semantic content of the document 22 is captured more accurately, and using this, the relationship between the signal content 21 and the document 22 is determined. By learning the nature, it is possible to discover the signal content 21 more semantically related even when there are few pairs of the signal content 21 and the document 22.

さらに、本実施形態で生成される信号コンテンツ２１の低次元特徴量は、元の初期特徴量と比べて非常に低次元であることから、高速な類似コンテンツの発見が可能であり、本実施形態によって（１）高速でありながらも、（２）意味的に類似したコンテンツの発見を可能にする信号コンテンツ２１の特徴量を生成可能である。 Furthermore, since the low-dimensional feature amount of the signal content 21 generated in the present embodiment is very low compared to the original initial feature amount, it is possible to find similar content at high speed. (1) While being fast, (2) it is possible to generate a feature amount of the signal content 21 that enables the discovery of semantically similar content.

It goes without saying that any application and configuration may be adopted as long as the main features of the present embodiment are satisfied. For example, the feature value conversion function generation process and the feature value conversion process are separable, and the devices that perform them can also be separated. The function generation process, which runs as a batch process, can therefore be placed on a server device, while the conversion process, which runs online, can be built into a client device such as a smartphone. FIG. 5 shows an example of the device configuration in this case.

As shown in FIG. 5, the server device 100 includes a word vector learning unit 101, a document feature extraction unit 102, an initial feature value extraction unit 103, a feature value conversion function generation unit 104, a dimensionality reduction unit 105, and a storage unit 300. The storage unit 300 stores a feature value conversion function 301 and word vectors 302. The content database 200 stores signal content 201, documents 202, relation indicators 203, and low-dimensional feature values 204. The server device 100 is connected to the content database 200.

The client device 400, also shown in FIG. 5, includes an initial feature value extraction unit 401, a dimensionality reduction unit 402, and a storage unit 500. The storage unit 500 stores a feature value conversion function 501.

Here, the components that the server device 100 and the client device 400 have in common (the initial feature value extraction unit and the storage unit) are configured to have the same functions in both devices, and components with the same names as those shown in FIG. 1 may have the same functions as in FIG. 1. In addition, the feature value conversion function 301 stored in the storage unit 300 of the server device 100 and the feature value conversion function 501 stored in the storage unit 500 of the client device 400 are assumed to be synchronized as needed by some communication means.

A search device 800 is also provided. The search device 800 may be built into the server device 100 or connected to the server device 100 externally.

The processing in this device configuration is outlined below.

First, the server device 100 performs the feature value conversion function learning process described above, generates a feature value conversion function as needed, and synchronizes it with the feature value conversion function of the client device 400. It also applies the same processing as described above to the content in the content database 200 to generate the low-dimensional feature values 204, and stores them in the content database 200.

Meanwhile, when the client device 400 receives a search request from a user, that is, an input of new signal content 600, it generates a low-dimensional feature value 700 for the signal content 600 and outputs it to the search device 800.

On receiving the low-dimensional feature value 700 from the client device 400, the search device 800 uses it to query the content database 200, finds semantically related signal content based on the low-dimensional feature value 700, and outputs the found signal content to the user as a search result 900.
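The division of work among the server device 100, the client device 400, and the search device 800 can be sketched as follows. This is an illustrative sketch under stated assumptions: the conversion function is modeled as a hypothetical linear projection f(x) = Wx, synchronization is modeled as copying the matrix W, and the class names, method names, and toy data are invented for the example rather than taken from the embodiment.

```python
import math


def make_conversion_function(W):
    """Build a hypothetical linear conversion function f(x) = W x."""
    def f(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return f


class Server:
    """Holds the learned function and the database's low-dimensional features."""

    def __init__(self, W, database):
        self.W = W  # stands in for the learned feature value conversion function
        f = make_conversion_function(W)
        # Precompute low-dimensional feature values for every database item.
        self.low_dim = {cid: f(x) for cid, x in database.items()}


class Client:
    """Only converts features; it never runs the learning process."""

    def __init__(self):
        self.f = None

    def sync(self, server):
        # Assumed synchronization: copy the function parameters from the server.
        self.f = make_conversion_function(server.W)

    def query(self, initial_feature):
        return self.f(initial_feature)


class SearchDevice:
    """Ranks database items by distance to the received low-dimensional feature."""

    def __init__(self, server):
        self.server = server

    def search(self, z, k=1):
        items = self.server.low_dim
        return sorted(items, key=lambda cid: math.dist(z, items[cid]))[:k]


# Toy 4-dimensional initial features reduced to 2 dimensions (illustrative).
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
database = {
    "img_a": [0.0, 0.0, 9.0, 9.0],
    "img_b": [1.0, 1.0, 0.0, 0.0],
    "img_c": [5.0, 5.0, 1.0, 1.0],
}

server = Server(W, database)
client = Client()
client.sync(server)
z = client.query([1.0, 0.9, 7.0, 7.0])
result = SearchDevice(server).search(z, k=1)
print(result)
```

In this sketch only the short vector `z` crosses from the client to the search device, which mirrors the flow of the low-dimensional feature value 700 in the text.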

By configuring the server device 100 and the client device 400 in this way, the server device 100 performs the feature value conversion function learning process and the client device 400 performs only the feature value conversion process.

The advantages of configuring the server device 100 and the client device 400 in this way are as follows. In general, a client device (a PC, smartphone, or the like) has less computing power than a server device, so it may be unsuitable for computationally heavy processing such as generating the feature value conversion function. With this configuration, the learning process is performed as needed on a server device with high computing power, and the client device performs only the conversion process, which requires little computation. Furthermore, transmitting a large volume of data over a network normally takes considerable time, but with this configuration only the low-dimensional feature value, which carries little information, needs to be transmitted, which improves the responsiveness of search.
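The transmission saving can be illustrated with a small size calculation. The dimensions used here (4096 for the initial feature value, 64 for the low-dimensional one) and the 32-bit float encoding are hypothetical values chosen for the example; the text does not state specific sizes.

```python
import struct


def payload_bytes(vector):
    """Size in bytes of a vector serialized as little-endian 32-bit floats."""
    return len(struct.pack(f"<{len(vector)}f", *vector))


# Hypothetical dimensions, chosen only to illustrate the ratio.
initial_feature = [0.0] * 4096
low_dim_feature = [0.0] * 64

initial_size = payload_bytes(initial_feature)
low_dim_size = payload_bytes(low_dim_feature)
print(initial_size, low_dim_size, initial_size // low_dim_size)
```

Under these assumptions the payload shrinks in proportion to the number of dimensions, so sending the low-dimensional feature value takes a correspondingly smaller share of network time.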

DESCRIPTION OF SYMBOLS
1 Feature value generation device
2 Content database
3 Storage unit
4 Signal content
5 Low-dimensional feature value
11 Word vector learning unit
12 Document feature value extraction unit
13 Initial feature value extraction unit
14 Feature value conversion function generation unit
15 Dimensionality reduction unit
21 Signal content
22 Document
23 Relation indicator
31 Feature value conversion function
32 Word vector

Claims

A feature value generation method for learning a feature value conversion function that generates a low-dimensional feature value of signal content, given one or more items of signal content of a desired type and one or more documents, and given a relation indicator indicating the presence or absence of a relationship for one or more pairs of the signal content and the documents, the method comprising:
a word vector learning step of generating, based on the co-occurrence of words included in the documents, a correspondence between each word and a word vector such that words that are more likely to co-occur have word vectors closer to each other;
a document feature extraction step of extracting a document feature value of each document using the learned word vectors;
an initial feature value extraction step of extracting an initial feature value of the signal content; and
a feature value conversion function generation step of obtaining a relative geometric relationship between the signal content and the documents using at least one of the word vectors and the document feature values, and of generating and outputting, based on the initial feature value, the relative geometric relationship, and the relation indicator, a feature value conversion function that converts the initial feature value into a low-dimensional feature value.

A feature value generation method for generating a low-dimensional feature value of signal content when signal content of a desired type is given, the method comprising:
an initial feature value extraction step of extracting an initial feature value of the signal content; and
a dimensionality reduction step of reducing the dimensionality of the initial feature value based on the feature value conversion function generated by the feature value generation method according to claim 1, and outputting the result.

A feature value generation device for learning a feature value conversion function that generates a low-dimensional feature value of signal content, given one or more items of signal content of a desired type and one or more documents, and given a relation indicator indicating the presence or absence of a relationship for one or more pairs of the signal content and the documents, the device comprising:
a word vector learning unit that generates, based on the co-occurrence of words included in the documents, a correspondence between each word and a word vector such that words that are more likely to co-occur have word vectors closer to each other;
a document feature extraction unit that extracts a document feature value of each document using the learned word vectors;
an initial feature value extraction unit that extracts an initial feature value of the signal content; and
a feature value conversion function generation unit that obtains a relative geometric relationship between the signal content and the documents using at least one of the word vectors and the document feature values, and that generates and outputs, based on the initial feature value, the relative geometric relationship, and the relation indicator, a feature value conversion function that converts the initial feature value into a low-dimensional feature value.

A feature value generation device that generates a low-dimensional feature value of signal content when signal content of a desired type is given, the device comprising:
an initial feature value extraction unit that extracts an initial feature value of the signal content; and
a dimensionality reduction unit that reduces the dimensionality of the initial feature value based on the feature value conversion function generated by the feature value generation device according to claim 3, and outputs the result.

A feature value generation program for causing a computer to execute each step of the feature value generation method according to claim 1 or 2.