JP6993250B2

JP6993250B2 - Content feature extractor, method, and program

Info

Publication number: JP6993250B2
Application number: JP2018016372A
Authority: JP
Inventors: 昭悟木村; ズービンガラマーニ; 悠介椋田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2018-02-01
Filing date: 2018-02-01
Publication date: 2022-01-13
Anticipated expiration: 2038-02-01
Also published as: JP2019133496A

Description

The present invention relates to a content feature extraction device, method, and program, and in particular to a content feature extraction device, method, and program for extracting features of content.

Image features, which represent the characteristics of images and videos, play an extremely important role in a wide range of image and video tasks, such as object recognition (identifying the objects contained in an image or video), object detection (locating those objects), and caption generation (describing the content of an image or video). Recent image feature extraction is based on supervised learning: a feature model, typically a convolutional neural network, is trained on a large-scale dataset consisting of a large number of images and videos together with the labels assigned to each of them. However, these supervised methods require very large labeled datasets. Assigning an accurate label to every image and video takes a great deal of human labor, and this is one of the bottlenecks of feature extraction.

To remove this bottleneck, feature extraction methods that use web images have been devised. The advantage of web images is that label-equivalent information useful for feature learning can be obtained without human labor, although it is less accurate than manually assigned labels. For example, when images are collected with a web image search system, the query used for the search can serve as label-equivalent information (Non-Patent Document 1). When images are collected from a content sharing site, the text tags assigned on that site can be used (Non-Patent Document 2).

(Non-Patent Document 1) Sukhbaatar, Bruna, Paluri, Bourdev and Fergus, “Training convolutional networks from noisy labels,” Proc. International Conference on Learning Representations (ICLR), 2015.

(Non-Patent Document 2) Joulin, van der Maaten and Jabri, “Learning visual features from large weakly supervised data,” Proc. European Conference on Computer Vision (ECCV), 2016.

However, these existing techniques do not reflect an important property of the label-equivalent information obtained from web images: unlike manually assigned labels, the absence of a particular label on an image does not mean that the image lacks the content associated with that label. For example, when images are collected with a web image search system, a collected image can naturally be expected to contain content other than the query used in the search, but collecting labels that cover all of that content is extremely difficult. General feature learning methods, including the existing techniques above, perform discriminative learning by equating the absence of a label with the absence of the related content in the image, and therefore cannot learn appropriate image features.

The present invention has been made to solve the above problem, and its object is to provide a content feature extraction device, method, and program capable of extracting features of content that take into account digital content sets prepared in advance.

To achieve the above object, the content feature extraction method according to the present invention is a content feature extraction method in a content feature extraction device that uses a plurality of digital content sets, each consisting of digital contents prepared in advance, to extract features of at least one of the digital contents included in the digital content sets and new digital content given separately from the digital content sets. The method includes: a step in which a content basic feature extraction unit extracts, for each of the digital contents included in the digital content sets, a content basic feature, which is a basic feature of the digital content; a step in which a graph construction unit constructs a content graph, which is a graph expressing which digital content set each of the digital contents belongs to; a step in which a link prediction model learning unit learns, based on the content basic features extracted from each of the digital contents and on the content graph, a link prediction model, which is a model that predicts membership in a digital content set from the content basic feature of the digital content; and a step in which a content feature calculation unit calculates, based on the learned link prediction model, a content feature, which is a feature of the digital content, for at least one of the digital contents included in the digital content sets and the new digital content.

The content feature extraction device according to the present invention is a content feature extraction device that uses a plurality of digital content sets, each consisting of digital contents prepared in advance, to extract features of at least one of the digital contents included in the digital content sets and new digital content given separately from the digital content sets. The device includes: a content basic feature extraction unit that extracts, for each of the digital contents included in the digital content sets, a content basic feature, which is a basic feature of the digital content; a graph construction unit that constructs a content graph, which is a graph expressing which digital content set each of the digital contents belongs to; a link prediction model learning unit that learns, based on the content basic features extracted from each of the digital contents and on the content graph, a link prediction model, which is a model that predicts membership in a digital content set from the content basic feature of the digital content; and a content feature calculation unit that calculates, based on the learned link prediction model, a content feature, which is a feature of the digital content, for at least one of the digital contents included in the digital content sets and the new digital content.

The program according to the present invention is a program for causing a computer to execute each step of the content feature extraction method of the present invention.

The content feature extraction device, method, and program of the present invention have the effect that features of content can be extracted that take into account digital content sets prepared in advance.

A block diagram showing the configuration of the content feature extraction device according to the first embodiment of the present invention.

A flowchart showing the content feature extraction processing routine in the content feature extraction device according to the first embodiment of the present invention.

A block diagram showing the configuration of the content feature extraction device according to the second embodiment of the present invention.

A flowchart showing the content feature extraction processing routine in the content feature extraction device according to the second embodiment of the present invention.

A block diagram showing the configuration of the content feature extraction device according to the third embodiment of the present invention.

A flowchart showing the content feature extraction processing routine in the content feature extraction device according to the third embodiment of the present invention.

A diagram showing an example of the statistics, in the experimental results, of the number of digital content sets to which each digital content belongs.

A diagram showing an example of the statistics, in the experimental results, of the number of digital contents contained in each digital content set.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Overview of the Embodiments of the Present Invention>

First, an overview of the embodiments of the present invention is given.

The technique according to the embodiments of the present invention provides a label prediction model that reflects the nature of label-equivalent information obtained from, for example, web images, and also provides a new means of feature learning that makes use of this label prediction model.

More specifically, the technique (1) defines image sets, each holding images with the same label-equivalent information, (2) constructs a graph expressing whether each image belongs to each image set, (3) performs label prediction by predicting the presence or absence of edges in the graph, and (4) learns image features using either the label prediction results or the error function of the label prediction model.

For clarity, the description so far has covered feature learning for images only. However, in the embodiments described below, the targets from which features are extracted using the learned label prediction model are not limited to images; the technique is applicable to a variety of targets such as acoustic signals, text, and sensor signals. In the following, these targets of feature extraction are collectively called digital content, and a set of digital contents holding the same label-equivalent information is called a digital content set.

The embodiments of the present invention provide a means of content feature extraction that uses a plurality of digital content sets prepared in advance to extract features of the digital contents included in those sets, or of new digital content given separately from them.

<Configuration of the Content Feature Extraction Device According to the First Embodiment of the Present Invention>

Next, the configuration of the content feature extraction device according to the first embodiment of the present invention is described. As shown in FIG. 1, the content feature extraction device 100 according to the first embodiment can be implemented as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the content feature extraction processing routine described later. Functionally, as shown in FIG. 1, the content feature extraction device 100 includes an input unit 10, a computation unit 20, and an output unit 50.

The input unit 10 accepts, for learning, a plurality of digital content sets consisting of labeled digital contents. The input unit 10 also accepts new digital content whose label is unknown.

The computation unit 20 includes a link prediction model storage unit 22, a content feature model storage unit 24, a content basic feature extraction unit 30, a content set basic feature extraction unit 32, a graph construction unit 34, a link prediction model learning unit 36, a content feature model learning unit 40, a content feature calculation unit 42, and an alternating model optimization unit 44.

The content basic feature extraction unit 30 extracts, for each of the digital contents included in the plurality of digital content sets accepted by the input unit 10, a content basic feature, which is a basic feature of the digital content.

The method of extracting content basic features is not particularly limited, and various feature extraction methods can be selected according to the type of digital content. As one example of a basic feature extraction method for images, this embodiment describes features obtained with VGG-net, described in Non-Patent Document 3 below.

(Non-Patent Document 3) Simonyan and Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint, arXiv:1409.1556, http://arxiv.org/abs/1409.1556.

VGG-net is an object recognition model consisting of a 16-layer or 19-layer convolutional neural network, and is trained by supervised learning on a large-scale object recognition dataset. When VGG-net is used as an image feature extraction model, it is common to use an intermediate output of the object recognition model trained on the object recognition dataset, for example the output of the 14th layer (FC6: 4096 dimensions), the 15th layer (FC7: 4096 dimensions), or the final layer (FC8: 1000 dimensions) of the 16-layer VGG-net (VGG16). The content basic feature extraction unit 30 inputs digital content into the object recognition model trained in this way and extracts the content basic feature.

The content set basic feature extraction unit 32 extracts, for each of the plurality of digital content sets accepted by the input unit 10, a content set basic feature, which is a basic feature of the digital content set.

The method of extracting content set basic features is not particularly limited, and various feature extraction methods can be selected according to the digital content that represents the digital content set. As one example of a basic feature extraction method for text information included in a digital content set, this embodiment describes a method that uses a word or word-sequence embedding technique, typified by the method described in Non-Patent Document 4 below.

(Non-Patent Document 4) Mikolov, Sutskever, Chen, Corrado and Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems 26 (NIPS 2013).

The method described in Non-Patent Document 4 uses a model that converts a word or word sequence into a single multidimensional vector; this model is trained by unsupervised learning on a large-scale document dataset. When the text information included in a digital content set is a single word or a single word sequence, the output obtained by giving that word or word sequence to the model can be used directly as the content set basic feature. When the text information contains several words or word sequences, the vectors obtained from all of them are, for example, averaged and the result is used as the content set basic feature.
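The averaging rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the embedding table `EMBEDDINGS`, its three-dimensional vectors, and the function name `content_set_basic_feature` are all hypothetical stand-ins for a word embedding model trained on a large document dataset.

```python
import numpy as np

# Hypothetical toy embedding table (assumption). A real system would obtain
# these vectors from an embedding model trained on a large document dataset.
EMBEDDINGS = {
    "dog": np.array([1.0, 0.0, 0.0]),
    "puppy": np.array([0.8, 0.2, 0.0]),
    "beach": np.array([0.0, 0.0, 1.0]),
}


def content_set_basic_feature(terms, embeddings=EMBEDDINGS):
    """Return one vector for a content set described by one or more terms."""
    vectors = [embeddings[t] for t in terms]
    if len(vectors) == 1:
        # A single word or word sequence: use the model output as it is.
        return vectors[0].copy()
    # Several words or word sequences: average their vectors.
    return np.mean(vectors, axis=0)


single = content_set_basic_feature(["beach"])
averaged = content_set_basic_feature(["dog", "puppy"])
```

Averaging keeps the feature dimension independent of how many terms describe a set, which is why it is a convenient default here.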

The graph construction unit 34 constructs a content graph, which is a graph expressing which digital content set each of the digital contents belongs to.

Various forms of the content graph and ways of constructing it are conceivable. In this embodiment, each digital content and each digital content set is associated with a node, and when a digital content I_i is included in a digital content set G_c, an edge is placed between the node corresponding to I_i and the node corresponding to G_c. Writing N_I for the total number of digital contents and N_C for the total number of digital content sets, this content graph is represented by an N_C × N_I binary adjacency matrix A = (a_{c,i}): a link exists between a node pair (c, i) whose element a_{c,i} is 1, and no link exists between any other node pair. A content graph constructed in this way is a bipartite graph in which one node set consists of the nodes corresponding to digital contents and the other consists of the nodes corresponding to digital content sets.
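As an illustration of this construction, the sketch below builds the N_C × N_I binary adjacency matrix from a list of content sets. The function name `build_content_graph` and the representation of each set as a Python set of content indices are assumptions made for the example.

```python
import numpy as np


def build_content_graph(content_sets, num_contents):
    """Build the N_C x N_I binary adjacency matrix of the bipartite graph.

    content_sets: list with one iterable of content indices per content set.
    num_contents: total number of digital contents, N_I.
    """
    A = np.zeros((len(content_sets), num_contents), dtype=np.int8)
    for c, members in enumerate(content_sets):
        for i in members:
            A[c, i] = 1  # edge between set node c and content node i
    return A


# Three content sets over four contents; content 1 and 3 belong to two sets.
content_sets = [{0, 1}, {1, 2, 3}, {3}]
A = build_content_graph(content_sets, num_contents=4)
```

Rows index content sets and columns index contents, matching the N_C × N_I orientation in the text.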

The link prediction model learning unit 36 learns a link prediction model that predicts membership in a digital content set from the content basic feature and the content set basic feature, based on the content basic features extracted from each of the digital contents by the content basic feature extraction unit 30, the content set basic features extracted from each of the digital content sets by the content set basic feature extraction unit 32, and the content graph constructed by the graph construction unit 34. It stores the learned model in the link prediction model storage unit 22.

The link prediction model and its learning method are not particularly limited; this embodiment describes a method based on the linear model and rank loss shown below.

First, the content basic features extracted from the N_I digital contents are denoted [...], and the content set basic features extracted from the N_C digital content sets are denoted [...].

Using these basic features, the link prediction value [...], which indicates whether a link exists between the node corresponding to the i-th digital content and the node corresponding to the c-th digital content set, is modeled by equation (1) below.

... (1)

Here, [...] are all model parameters, and the link prediction model is characterized by these model parameters. When content set basic features are available, the link prediction model is modified as in equation (2) below by adding the model parameter [...].

... (2)

A link prediction model that does not use the content latent variable z_Ii, a model parameter corresponding to each digital content, or the content set latent variable z_Cc, a model parameter corresponding to each digital content set, is also possible. When the content latent variable z_Ii is not used, the link prediction model is modified as in equation (3) below.

... (3)

When the content set latent variable z_Cc is not used, the model parameters related to the content set latent variable are omitted in the same way. In the following, for simplicity, the link prediction model of equation (3) is used. The treatment is almost the same when another link prediction model is used.

The main task of the link prediction model learning unit 36 is to bring the link prediction value [...] modeled by equation (3) close to the corresponding element a_{c,i} of the adjacency matrix A that characterizes the content graph. Specifically, the model parameters that minimize the rank loss function shown in equation (4) below are derived.

... (4)

Here, l(·) is a loss function and Ω(·) is a regularization term; both must be chosen so that they are subdifferentiable with respect to the model parameters. For example, the hinge loss or the squared norm can be used as the loss function, and a weighted linear sum of the squared norms of the parameter matrices can be used as the regularization term. In addition, [...].
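A rank loss of this kind can be sketched as follows. Equation (4) is not reproduced in this text, so the code assumes one common pairwise form: for every content, each set that contains it should score higher than each set that does not, with a hinge loss on the score difference and a squared-norm regularizer. The function names, the margin of 1, and `reg_weight` are assumptions for illustration.

```python
import numpy as np


def hinge(score_gap):
    """Hinge loss on the gap between a positive and a negative score."""
    return np.maximum(0.0, 1.0 - score_gap)


def rank_loss(scores, A, params, reg_weight=0.01):
    """Assumed pairwise rank loss plus squared-norm regularization.

    scores: (N_C, N_I) array of link prediction values.
    A:      (N_C, N_I) binary adjacency matrix of the content graph.
    params: list of parameter arrays entering the regularization term.
    """
    total = 0.0
    for i in range(A.shape[1]):
        positives = np.flatnonzero(A[:, i] == 1)  # sets containing content i
        negatives = np.flatnonzero(A[:, i] == 0)  # sets not containing it
        for c in positives:
            for c_neg in negatives:
                total += hinge(scores[c, i] - scores[c_neg, i])
    regularizer = reg_weight * sum(np.sum(p ** 2) for p in params)
    return float(total + regularizer)


A = np.array([[1, 0], [0, 1]])
scores = np.array([[2.0, 0.0], [0.0, 0.5]])
loss = rank_loss(scores, A, params=[np.ones((2, 2))])
```

Only the relative order of scores within a column matters here, so a missing label is never treated as a hard negative target value.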

Because it is difficult to minimize the rank loss function of equation (4) over all parameters simultaneously, stochastic gradient descent is used to update each parameter separately and sequentially so as to reduce the loss. Since the rank loss function of equation (4) is subdifferentiable with respect to the model parameters, partially differentiating the expression inside the sum of equation (4), [...], and the regularization term, [...], with respect to each model parameter yields an update rule for each model parameter with the digital content i, a digital content set c that contains it, and a digital content set c' that does not contain it held fixed. The model parameters are updated with these update rules by stochastic gradient descent. That is, the model parameters are updated once for each triplet consisting of a digital content i, a digital content set c that contains it, and a digital content set c' that does not contain it.
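The triplet-wise update can be sketched as below. Since equations (3) and (4) are not reproduced in this text, the sketch assumes a bilinear score z_c · (W x_i), where z_c is a latent vector for content set c and x_i is the content basic feature, together with a hinge loss and squared-norm regularization. The names `score`, `sgd_triplet_step`, `Z`, `W`, the learning rate, and the toy data are all assumptions.

```python
import numpy as np


def score(Z, W, x, c):
    """Assumed link prediction value for set c and a content with feature x."""
    return Z[c] @ (W @ x)


def sgd_triplet_step(Z, W, x, c_pos, c_neg, lr=0.1, reg=1e-3):
    """One in-place update for a triplet (content i, set c, set c')."""
    h = W @ x
    diff = Z[c_pos] - Z[c_neg]
    # The hinge loss has a nonzero subgradient only when the margin is violated.
    g = 1.0 if (diff @ h) < 1.0 else 0.0
    # Compute every gradient before modifying any parameter.
    grad_W = -g * np.outer(diff, x) + 2.0 * reg * W
    grad_pos = -g * h + 2.0 * reg * Z[c_pos]
    grad_neg = g * h + 2.0 * reg * Z[c_neg]
    W -= lr * grad_W
    Z[c_pos] -= lr * grad_pos
    Z[c_neg] -= lr * grad_neg


rng = np.random.default_rng(0)
# Toy data: four contents with 2-d basic features, two content sets.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
Z = 0.1 * rng.standard_normal((2, 2))
W = 0.1 * rng.standard_normal((2, 2))

for _ in range(2000):
    i = rng.integers(A.shape[1])                       # sample a content
    c_pos = rng.choice(np.flatnonzero(A[:, i] == 1))   # a set containing it
    c_neg = rng.choice(np.flatnonzero(A[:, i] == 0))   # a set not containing it
    sgd_triplet_step(Z, W, X[i], c_pos, c_neg)

scores = Z @ W @ X.T  # (N_C, N_I) link prediction values after training
```

Sampling one triplet per step keeps each update cheap, at the cost of noisier progress than a full pass over all triplets.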

Updating of the model parameters stops when a predetermined condition is satisfied. Possible stopping conditions include the number of model parameter updates exceeding a predetermined number, or the difference between the parameters before and after an update, obtained by referring to the link prediction model storage unit 22, falling below a predetermined threshold.
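The two stopping conditions mentioned above can be combined in a small helper such as the one below. The function name `should_stop`, the use of the largest absolute parameter change as the "difference", and the halving update that stands in for a real parameter update are assumptions for illustration.

```python
import numpy as np


def should_stop(step, old_params, new_params, max_steps=1000, tol=1e-6):
    """Return True when either stopping condition is met.

    Condition 1: the number of updates exceeds max_steps.
    Condition 2: the largest absolute parameter change is below tol.
    """
    if step > max_steps:
        return True
    change = max(np.max(np.abs(n - o)) for o, n in zip(old_params, new_params))
    return bool(change < tol)


params = [np.array([1.0, -2.0])]
step = 0
while True:
    # Stand-in for one parameter update (assumption): halve every parameter.
    new_params = [0.5 * p for p in params]
    step += 1
    done = should_stop(step, params, new_params, max_steps=100, tol=1e-3)
    params = new_params  # corresponds to storing the updated parameters
    if done:
        break
```

Keeping the previous parameters available for comparison plays the role that the link prediction model storage unit 22 plays in the text.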

This completes the link prediction model learning process of the link prediction model learning unit 36.

The content feature model learning unit 40 learns a content feature model, which is a model for extracting the content features of digital content, from the link prediction model learned by the link prediction model learning unit 36 and the digital contents included in the digital content sets, and stores it in the content feature model storage unit 24.

The method of learning the content feature model is not particularly limited. As one example, this embodiment takes VGG-Net as the content feature model and describes a method that retrains VGG-Net using the link prediction model stored in the link prediction model storage unit 22.

As explained for the content basic feature extraction unit 30, VGG-Net is a model consisting of a 16-layer or 19-layer convolutional neural network, and is normally trained by supervised learning on a large-scale object recognition dataset. That is, using as the teacher signal a label indicating which of several object types prepared in advance appears in a given image, the model parameters are updated so as to reduce the difference between the model's prediction and the correct label.

In this embodiment, by contrast, the digital contents used to learn the link prediction model are first input to VGG-Net, and content basic features are extracted in the same way as in the content basic feature extraction unit 30. Next, the content basic features are input to the learned link prediction model, and the rank loss is computed with the rank loss function shown in equation (4). Finally, the model parameters of VGG-Net are updated so that this rank loss decreases. Because the link prediction model is fixed during this procedure, the training of VGG-Net differs from ordinary training only in its loss function, and general neural network training techniques such as error backpropagation can be applied as they are.
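The idea of updating only the feature extractor while the link prediction model stays fixed can be sketched as follows. To keep the example small, a single linear layer `V` stands in for VGG-Net, and the link prediction model is assumed to be the bilinear score z_c · (W x) with a hinge loss on one triplet; these choices, and the names `triplet_hinge` and `finetune_step`, are assumptions and not the patent's implementation.

```python
import numpy as np


def triplet_hinge(Z, W, x, c_pos, c_neg):
    """Hinge rank loss for one triplet under the assumed bilinear score."""
    diff = Z[c_pos] - Z[c_neg]
    return max(0.0, 1.0 - float(diff @ (W @ x)))


def finetune_step(V, Z, W, r, c_pos, c_neg, lr=0.1):
    """Update only the feature extractor V; Z and W are left untouched."""
    x = V @ r                          # feature produced by the extractor
    diff = Z[c_pos] - Z[c_neg]
    if diff @ (W @ x) < 1.0:           # margin violated: nonzero subgradient
        grad_x = -(W.T @ diff)         # gradient of the loss w.r.t. the feature
        V -= lr * np.outer(grad_x, r)  # backpropagate into the extractor
    return V


Z = np.array([[1.0, 0.0], [0.0, 1.0]])  # fixed link prediction parameters
W = np.eye(2)                            # fixed link prediction parameters
V = np.zeros((2, 3))                     # toy linear stand-in for VGG-Net
r = np.array([1.0, 0.0, 0.0])            # raw input of one digital content

loss_before = triplet_hinge(Z, W, V @ r, c_pos=0, c_neg=1)
for _ in range(10):
    finetune_step(V, Z, W, r, c_pos=0, c_neg=1)
loss_after = triplet_hinge(Z, W, V @ r, c_pos=0, c_neg=1)
```

With a deep network in place of `V`, the same gradient with respect to the feature would simply be passed to ordinary backpropagation.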

VGG-Netのモデルパラメータの学習において、モデルパラメータをランダムな初期値に設定して学習を開始してもよいが、大規模物体認識データセットを用いてあらかじめモデルを学習しておき、そのモデルパラメータを初期値として用いることもできる。また、本実施形態におけるVGG-Netのモデルパラメータの学習において、すべての層のモデルパラメータを更新しても良いが、畳み込み層（第１層から第１３層まで）、あるいはこれら畳み込み層のうち入力に近い層（第１層から第１０層まで、第１層から第６層まで、など）のモデルパラメータを固定して、それ以外のモデルパラメータを更新することもできる。 In training the VGG-Net model parameters, training may start from random initial values, but the model may also be pre-trained on a large-scale object recognition data set and the resulting parameters used as initial values. Further, all layers may be updated in this training, but it is also possible to fix the parameters of the convolutional layers (the 1st to 13th layers), or of those convolutional layers closest to the input (for example, the 1st to 10th layers or the 1st to 6th layers), and update only the remaining parameters.
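As a small illustration of the layer-freezing options just listed, the following sketch enumerates which layers of a 16-layer VGG-Net would remain trainable for a given freezing depth. The layer names (`conv1` to `conv13`, `fc6` to `fc8`) are simplified, hypothetical labels chosen to match the layer numbering used in this description; they are not the identifiers of any particular library.

```python
# Layers 1-13 are convolutional, layers 14-16 are fully connected (FC6, FC7, FC8).
VGG16_LAYERS = [f"conv{i}" for i in range(1, 14)] + ["fc6", "fc7", "fc8"]

def trainable_layers(freeze_up_to):
    """Return the layers still updated when layers 1..freeze_up_to are fixed."""
    if not 0 <= freeze_up_to <= len(VGG16_LAYERS):
        raise ValueError("freeze_up_to must be between 0 and 16")
    return VGG16_LAYERS[freeze_up_to:]

all_layers = trainable_layers(0)     # update every layer
fc_only = trainable_layers(13)       # fix all convolutional layers
partial = trainable_layers(10)       # fix only the 10 layers nearest the input
```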

コンテンツ特徴量算出部４２は、コンテンツ特徴量モデル学習部４０で学習されたコンテンツ特徴量モデルを用いて、ディジタルコンテンツ集合に含まれるディジタルコンテンツから、コンテンツ特徴量を算出する。 The content feature amount calculation unit 42 calculates the content feature amount from the digital contents included in the digital content set by using the content feature amount model learned by the content feature amount model learning unit 40.

コンテンツ特徴量算出部４２は、コンテンツ特徴量モデルを用いる場合と、用いない場合が考えられるが、本実施の形態ではコンテンツ特徴量モデルを用いている。コンテンツ特徴量モデルを用いる場合は、コンテンツ基本特徴量抽出部３０と同様にして、その途中経過、例えば、16層VGG-net（VGG16）の第14層（FC6：4096次元）、第15層（FC7：4096次元）あるいは最終層（FC8：1000次元）の出力を利用することができる。 The content feature amount calculation unit 42 may or may not use the content feature amount model; in the present embodiment, the content feature amount model is used. When the content feature amount model is used, an intermediate output can be used in the same manner as in the content basic feature amount extraction unit 30, for example the output of the 14th layer (FC6: 4096 dimensions), the 15th layer (FC7: 4096 dimensions), or the final layer (FC8: 1000 dimensions) of the 16-layer VGG-net (VGG16).

モデル交互最適化部４４は、リンク予測モデル学習部３６の処理と、コンテンツ特徴量モデル学習部４０の処理とを交互に繰り返し実行することで、リンク予測モデル及びコンテンツ特徴量モデルを最適化する。モデル交互最適化部４４により、繰り返し終了条件を満たすかを判定し、満たしていれば学習処理を終了し、満たしていなければ、リンク予測モデル学習部３６の処理と、コンテンツ特徴量モデル学習部４０の処理とを実行することを繰り返す。 The model alternating optimization unit 44 optimizes the link prediction model and the content feature amount model by alternately and repeatedly executing the processing of the link prediction model learning unit 36 and the processing of the content feature amount model learning unit 40. The model alternating optimization unit 44 determines whether the iteration end condition is satisfied. If it is satisfied, the learning process ends; if not, the processing of the link prediction model learning unit 36 and the processing of the content feature amount model learning unit 40 are executed again.

このモデル交互最適化部４４は、必ずしも必須となる構成要素ではないが、この構成要素を追加することにより、リンク予測モデル及びコンテンツ特徴量モデルが精緻化され、より有用なコンテンツ特徴量を抽出することが可能となる。 The model alternating optimization unit 44 is not necessarily an indispensable component, but by adding this component, the link prediction model and the content feature amount model are refined, and a more useful content feature amount is extracted. It becomes possible.

交互最適化の方法は特に限定されるものではないが、本実施の形態では、コンテンツ特徴量モデル学習部４０においてVGG-Netの再学習を用いる場合の方法について述べる。 The method of alternate optimization is not particularly limited, but in the present embodiment, a method when the re-learning of VGG-Net is used in the content feature amount model learning unit 40 will be described.

コンテンツ特徴量モデル学習部４０においてVGG-Netの再学習を用いる場合、コンテンツ基本特徴量抽出部３０とコンテンツ特徴量算出部４２とは、ほぼ同様の機能を有することとなる。すなわち、いずれの処理部においても、VGG-Netを特徴量モデルとして採用し、入力されたディジタルコンテンツから特徴量を抽出することとなる。 When the re-learning of VGG-Net is used in the content feature amount model learning unit 40, the content basic feature amount extraction unit 30 and the content feature amount calculation unit 42 have substantially the same functions. That is, in any processing unit, VGG-Net is adopted as a feature amount model, and the feature amount is extracted from the input digital content.

そこで、モデル交互最適化部４４は、以下の手順によってリンク予測モデルとコンテンツ特徴モデルを交互に最適化する。 Therefore, the model alternating optimization unit 44 alternately optimizes the link prediction model and the content feature model by the following procedure.

（１）リンク予測モデル学習部３６により、コンテンツ基本特徴量もしくはコンテンツ特徴量を用いて、リンク予測モデルを学習する。
（２）コンテンツ特徴量モデル学習部４０により、固定したリンク予測モデルを用いて、コンテンツ特徴量モデルを学習する。
（３）コンテンツ特徴量算出部４２により、固定したコンテンツ特徴量モデルを用いて、コンテンツ特徴量を学習する。
（４）（１）～（３）を繰り返し実行する。繰り返しは、所定の条件を満たした際に停止する。繰り返し終了条件としては、更新回数が所定数を超える、リンク予測モデル及びコンテンツ特徴量モデルについて、更新前パラメータと更新後パラメータとの差分が所定の閾値よりも小さくなる、などの条件が考えられる。

(1) The link prediction model learning unit 36 learns a link prediction model using the content basic feature amount or the content feature amount.
(2) The content feature amount model learning unit 40 learns the content feature amount model using the fixed link prediction model.
(3) The content feature amount calculation unit 42 learns the content feature amount using the fixed content feature amount model.
(4) Steps (1) to (3) are repeated. The repetition stops when a predetermined condition is satisfied. Possible end conditions include the number of updates exceeding a predetermined number, or the difference between the pre-update and post-update parameters of the link prediction model and the content feature amount model falling below a predetermined threshold.
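Steps (1) to (4) can be sketched as a generic loop. This is only an outline under stated assumptions: the three callables `fit_link`, `fit_feature`, and `compute_features` are hypothetical placeholders for units 36, 40, and 42, model parameters are represented as flat lists of floats, and both end conditions mentioned in step (4) (a round limit and a parameter-change threshold) are checked.

```python
def alternate_optimize(fit_link, fit_feature, compute_features, features,
                       max_rounds=10, tol=1e-4):
    """Alternate steps (1)-(3) until one of the end conditions of step (4) holds."""
    prev = None
    for round_no in range(1, max_rounds + 1):
        link_params = fit_link(features)             # step (1)
        feature_params = fit_feature(link_params)    # step (2): link model fixed
        features = compute_features(feature_params)  # step (3): feature model fixed
        params = link_params + feature_params
        if prev is not None:
            delta = max(abs(a - b) for a, b in zip(params, prev))
            if delta < tol:                          # parameters stopped changing
                break
        prev = params
    return link_params, feature_params, features, round_no

# Toy stand-ins that contract toward a fixed point, only to exercise the loop.
toy_fit_link = lambda feats: [0.5 * feats[0]]
toy_fit_feature = lambda link: [0.5 * link[0]]
toy_compute = lambda feat: [feat[0]]

*_, rounds_converged = alternate_optimize(toy_fit_link, toy_fit_feature,
                                          toy_compute, [1.0], max_rounds=20)
*_, rounds_capped = alternate_optimize(toy_fit_link, toy_fit_feature,
                                       toy_compute, [1.0], max_rounds=3)
```

With the toy callables the first run stops on the parameter-change threshold and the second on the round limit, which are the two kinds of end condition named in step (4).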

以上の処理によって、リンク予測モデル及びコンテンツ特徴量モデルが学習される。 By the above processing, the link prediction model and the content feature amount model are learned.

以上のように学習されたコンテンツ特徴量モデルを用いて、コンテンツ特徴量算出部４２は、入力部１０で受け付けた新規ディジタルコンテンツのコンテンツ特徴量を算出し、リンク予測部４６に出力する。 Using the content feature amount model learned as described above, the content feature amount calculation unit 42 calculates the content feature amount of the new digital content received by the input unit 10 and outputs it to the link prediction unit 46.

リンク予測部４６は、リンク予測モデル記憶部２２に記憶されたリンク予測モデル、及び新規ディジタルコンテンツのディジタルコンテンツ特徴量を入力とし、これらからディジタルコンテンツ集合の各々への新規ディジタルコンテンツの所属の有無を予測し、出力部５０に予測結果を出力する。 The link prediction unit 46 inputs the link prediction model stored in the link prediction model storage unit 22 and the digital content feature amount of the new digital content, and determines whether or not the new digital content belongs to each of the digital content sets. It makes a prediction and outputs the prediction result to the output unit 50.

＜本発明の第１の実施の形態に係るコンテンツ特徴量抽出装置の作用＞ <Operation of the content feature amount extraction device according to the first embodiment of the present invention>

次に、本発明の第１の実施の形態に係るコンテンツ特徴量抽出装置１００の作用について説明する。入力部１０においてラベル付きのディジタルコンテンツからなる複数のディジタルコンテンツ集合を受け付けると、コンテンツ特徴量抽出装置１００は、図２に示すコンテンツ特徴量抽出処理ルーチンを実行する。 Next, the operation of the content feature amount extraction device 100 according to the first embodiment of the present invention will be described. When the input unit 10 receives a plurality of digital content sets composed of labeled digital contents, the content feature amount extraction device 100 executes the content feature amount extraction processing routine shown in FIG. 2.

まず、ステップＳ１００では、入力部１０で受け付けた複数のディジタルコンテンツ集合に含まれるディジタルコンテンツの各々について、ディジタルコンテンツの基本的な特徴量であるコンテンツ基本特徴量を抽出する。 First, in step S100, the content basic feature amount, which is the basic feature amount of the digital content, is extracted for each of the digital contents included in the plurality of digital content sets received by the input unit 10.

ステップＳ１０２では、入力部１０で受け付けた複数のディジタルコンテンツ集合の各々について、ディジタルコンテンツ集合の基本的な特徴量であるコンテンツ集合基本特徴量を抽出する。 In step S102, the content set basic feature amount, which is the basic feature amount of the digital content set, is extracted for each of the plurality of digital content sets received by the input unit 10.

ステップＳ１０４では、ディジタルコンテンツの各々がいずれのディジタルコンテンツ集合に含まれるかを表現するグラフであるコンテンツグラフを構築する。 In step S104, a content graph, which is a graph representing which digital content set each of the digital contents is included in, is constructed.
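A content graph of this kind is a bipartite membership structure between digital contents and digital content sets. The sketch below builds it as a 0/1 matrix; the dictionary input format and the names `build_content_graph`, `img1`, and so on are assumptions made for illustration, since the section does not fix a data representation.

```python
def build_content_graph(content_sets):
    """Return (contents, set_names, A) with A[i][c] = 1 if content i is in set c."""
    set_names = sorted(content_sets)
    contents = sorted({item for members in content_sets.values() for item in members})
    A = [[1 if item in content_sets[name] else 0 for name in set_names]
         for item in contents]
    return contents, set_names, A

# Each set corresponds to one label; img2 carries both labels.
sets = {"cat": {"img1", "img2"}, "outdoor": {"img2", "img3"}}
contents, set_names, A = build_content_graph(sets)
```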

ステップＳ１０６では、ステップＳ１００でディジタルコンテンツの各々から抽出したコンテンツ基本特徴量、又はステップＳ１１０で抽出したコンテンツ特徴量、ステップＳ１０２で複数のディジタルコンテンツ集合の各々から抽出したコンテンツ集合基本特徴量、及びステップＳ１０４で構築したコンテンツグラフに基づいて、ディジタルコンテンツ集合への所属の有無を、コンテンツ基本特徴量及びコンテンツ集合基本特徴量から予測するリンク予測モデルを学習し、リンク予測モデル記憶部２２に記憶する。 In step S106, the content basic feature amount extracted from each of the digital contents in step S100, the content feature amount extracted in step S110, the content set basic feature amount extracted from each of the plurality of digital content sets in step S102, and the step. Based on the content graph constructed in S104, a link prediction model that predicts the presence or absence of belonging to the digital content set from the content basic feature amount and the content set basic feature amount is learned and stored in the link prediction model storage unit 22.

ステップＳ１０８では、ステップＳ１０６で学習されたリンク予測モデル、及びディジタルコンテンツ集合に含まれるディジタルコンテンツから、ディジタルコンテンツのコンテンツ特徴量を抽出するためのモデルであるコンテンツ特徴量モデルを学習し、コンテンツ特徴量モデル記憶部２４に記憶する。 In step S108, the link prediction model learned in step S106 and the content feature amount model, which is a model for extracting the content feature amount of the digital content from the digital content included in the digital content set, are learned, and the content feature amount is learned. It is stored in the model storage unit 24.

ステップＳ１１０では、ステップＳ１０８で学習されたコンテンツ特徴量モデルを用いて、ディジタルコンテンツ集合に含まれるディジタルコンテンツから、コンテンツ特徴量を算出する。 In step S110, the content feature amount is calculated from the digital content included in the digital content set by using the content feature amount model learned in step S108.

ステップＳ１１２では、繰り返し終了条件を満たすかを判定し、満たしていればステップＳ１１４へ移行し、満たしていなければステップＳ１０６～Ｓ１１０の処理を繰り返す。 In step S112, it is determined whether or not the repetition end condition is satisfied, and if it is satisfied, the process proceeds to step S114, and if it is not satisfied, the processes of steps S106 to S110 are repeated.

ステップＳ１１４では、ステップＳ１０８で学習されたコンテンツ特徴量モデルを用いて、入力部１０で受け付けた新規ディジタルコンテンツのコンテンツ特徴量を算出する。 In step S114, the content feature amount of the new digital content received by the input unit 10 is calculated using the content feature amount model learned in step S108.

ステップＳ１１６では、ステップＳ１０６で学習されたリンク予測モデルと、ステップＳ１１４で算出されたコンテンツ特徴量とを用いて、ディジタルコンテンツ集合の各々への新規ディジタルコンテンツの所属の有無を予測し、出力部５０に出力して処理を終了する。 In step S116, the presence or absence of new digital content belonging to each of the digital content sets is predicted using the link prediction model learned in step S106 and the content feature amount calculated in step S114, and the output unit 50 Output to and end the process.

以上説明したように、本発明の第１の実施の形態に係るコンテンツ特徴量抽出装置によれば、ディジタルコンテンツ集合に含まれるディジタルコンテンツの各々について、ディジタルコンテンツの基本的な特徴量であるコンテンツ基本特徴量を抽出し、ディジタルコンテンツの各々がいずれのディジタルコンテンツ集合に含まれるかを表現するグラフであるコンテンツグラフを構築し、ディジタルコンテンツの各々から抽出したコンテンツ基本特徴量、及びコンテンツグラフに基づいて、ディジタルコンテンツ集合への所属の有無をディジタルコンテンツのコンテンツ基本特徴量から予測するモデルであるリンク予測モデルを学習し、学習されたリンク予測モデルに基づいて、ディジタルコンテンツ集合に含まれるディジタルコンテンツ、及び新規ディジタルコンテンツの少なくとも一方について、ディジタルコンテンツの特徴量であるコンテンツ特徴量を計算することにより、ラベルの性質を考慮したコンテンツ特徴量を抽出することができ、新規ディジタルコンテンツについて、精度よくラベルを予測することができる。 As described above, according to the content feature amount extracting device according to the first embodiment of the present invention, for each of the digital contents included in the digital content set, the content basic which is the basic feature amount of the digital content. A content graph is constructed by extracting features and expressing which digital content set each digital content is included in, and based on the basic content features extracted from each of the digital contents and the content graph. , Learn the link prediction model, which is a model that predicts the presence or absence of belonging to the digital content set from the basic content features of the digital content, and based on the learned link prediction model, the digital content included in the digital content set, and By calculating the content feature amount, which is the feature amount of the digital content, for at least one of the new digital contents, the content feature amount considering the properties of the label can be extracted, and the label is accurately predicted for the new digital content. can do.

＜本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置の構成＞ <Structure of content feature amount extraction device according to the second embodiment of the present invention>

次に、本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置の構成について説明する。なお、第１の実施の形態と同様となる箇所については同一符号を付して説明を省略する。 Next, the configuration of the content feature amount extraction device according to the second embodiment of the present invention will be described. The same parts as those in the first embodiment are designated by the same reference numerals and the description thereof will be omitted.

第２の実施の形態では、コンテンツ特徴量モデル学習部、及びモデル交互最適化部を用いずに、学習したリンク予測モデルを用いてコンテンツ特徴量を算出する。 In the second embodiment, the content feature amount is calculated using the learned link prediction model without using the content feature amount model learning unit and the model alternating optimization unit.

図３に示すように、本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置２００は、ＣＰＵと、ＲＡＭと、後述するコンテンツ特徴量抽出処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。このコンテンツ特徴量抽出装置２００は、機能的には図３に示すように入力部２１０と、演算部２２０と、出力部２５０とを備えている。 As shown in FIG. 3, the content feature amount extraction device 200 according to the second embodiment of the present invention includes a CPU, a RAM, a program for executing a content feature amount extraction processing routine described later, and various data. It can be configured with a computer including a stored ROM. The content feature amount extraction device 200 functionally includes an input unit 210, a calculation unit 220, and an output unit 250 as shown in FIG.

入力部２１０は、学習用に、ラベル付きのディジタルコンテンツからなる複数のディジタルコンテンツ集合を受け付ける。また、入力部２１０は、ラベルが未知の新規に与えられた新規ディジタルコンテンツを受け付ける。 The input unit 210 accepts a plurality of digital content sets composed of labeled digital contents for learning. Further, the input unit 210 accepts a newly given new digital content having an unknown label.

演算部２２０は、リンク予測モデル記憶部２２と、コンテンツ基本特徴量抽出部３０と、コンテンツ集合基本特徴量抽出部３２と、グラフ構築部３４と、リンク予測モデル学習部３６と、コンテンツ特徴量算出部２４２と、リンク予測部４６とを含んで構成されている。 The calculation unit 220 includes a link prediction model storage unit 22, a content basic feature amount extraction unit 30, a content set basic feature amount extraction unit 32, a graph construction unit 34, a link prediction model learning unit 36, and a content feature amount calculation. A unit 242 and a link prediction unit 46 are included.

コンテンツ特徴量算出部２４２は、リンク予測モデル記憶部２２に記憶されたリンク予測モデルに基づいて、ディジタルコンテンツ集合に含まれるディジタルコンテンツから、コンテンツ特徴量を算出する。 The content feature amount calculation unit 242 calculates the content feature amount from the digital contents included in the digital content set, based on the link prediction model stored in the link prediction model storage unit 22.

コンテンツ特徴量算出部２４２でリンク予測モデルに基づいてコンテンツ特徴量を算出する場合には、スパース符号化に基づく方法により算出を行う。スパース符号化に基づく方法は、コンテンツ基本特徴量をコンテンツ集合基本特徴量（及び利用可能な場合には加えてコンテンツ集合潜在変数）の疎な線形和で表現する方法である。（３）式でモデル化したリンク予測モデルは、モデルパラメータで変換したコンテンツ基本特徴量と、ディジタルコンテンツ集合を特徴付けるベクトルとの内積が大きい、すなわち両ベクトルが類似しているときに、リンク予測値が大きくなることを示している。この点に着目して、以下（５）式の最小化問題の解として得られる線形和の重み係数αを、コンテンツ基本特徴量ｘから得られる新しい特徴量、すなわちコンテンツ特徴量として算出する。 When the content feature amount calculation unit 242 calculates the content feature amount based on the link prediction model, the calculation is performed by a method based on sparse coding. The sparse coding method expresses the content basic feature amount as a sparse linear sum of the content set basic feature amounts (and, where available, the content set latent variables). The link prediction model of Eq. (3) shows that the link prediction value becomes large when the inner product between the content basic feature amount transformed by the model parameters and the vector characterizing the digital content set is large, that is, when the two vectors are similar. Focusing on this point, the weighting coefficients α of the linear sum, obtained as the solution of the minimization problem of Eq. (5) below, are calculated as a new feature amount derived from the content basic feature amount x, that is, as the content feature amount.

・・・（５）

... (5)

なお、第２の実施の形態の他の構成は第１の実施の形態と同様であるため、説明を省略する。 Since the other configurations of the second embodiment are the same as those of the first embodiment, the description thereof will be omitted.
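The sparse coding step of Eq. (5) can be illustrated with a generic solver. The exact objective of Eq. (5) is not reproduced in this section, so the sketch assumes the standard L1-regularized least-squares form, minimizing 0.5 * ||x - Σ_c α_c d_c||² + λ ||α||_1, where each atom d_c stands for the vector characterizing one digital content set, and solves it with the iterative shrinkage-thresholding algorithm (ISTA). The function names and the parameter `lam` are hypothetical.

```python
def soft_threshold(value, t):
    """Shrink value toward zero by t (proximal operator of the L1 norm)."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0

def sparse_code(x, atoms, lam=0.1, n_iter=200):
    """Minimize 0.5 * ||x - sum_c alpha[c] * atoms[c]||^2 + lam * ||alpha||_1 by ISTA."""
    # Conservative step size: 1 / (squared Frobenius norm of the dictionary).
    step = 1.0 / sum(a * a for atom in atoms for a in atom)
    dims = range(len(x))
    alpha = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = [sum(alpha[c] * atoms[c][k] for c in range(len(atoms))) for k in dims]
        resid = [x[k] - recon[k] for k in dims]
        alpha = [soft_threshold(alpha[c] + step * sum(atoms[c][k] * resid[k] for k in dims),
                                step * lam)
                 for c in range(len(atoms))]
    return alpha

# Two toy set vectors; the content feature x is close to the first one.
atoms = [[1.0, 0.0], [0.0, 1.0]]
alpha = sparse_code([1.0, 0.1], atoms, lam=0.2)  # approximately [0.8, 0.0]
```

The weight on the second atom is driven exactly to zero, which is the sparsity the method relies on: the resulting α indicates the few content sets that best explain the content.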

＜本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置の作用＞ <Operation of the content feature amount extraction device according to the second embodiment of the present invention>

次に、本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置２００の作用について説明する。なお、第１の実施の形態と同様となる箇所については同一符号を付して説明を省略する。 Next, the operation of the content feature amount extracting device 200 according to the second embodiment of the present invention will be described. The same parts as those in the first embodiment are designated by the same reference numerals and the description thereof will be omitted.

入力部２１０においてラベル付きのディジタルコンテンツからなる複数のディジタルコンテンツ集合を受け付けると、コンテンツ特徴量抽出装置２００は、図４に示すコンテンツ特徴量抽出処理ルーチンを実行する。 When the input unit 210 receives a plurality of digital content sets composed of labeled digital contents, the content feature amount extraction device 200 executes the content feature amount extraction processing routine shown in FIG.

ステップＳ２００では、リンク予測モデル記憶部２２に記憶されたリンク予測モデルに基づいて、上記（５）式に従って、新規ディジタルコンテンツについて、ディジタルコンテンツの特徴量であるコンテンツ特徴量を計算する。 In step S200, based on the link prediction model stored in the link prediction model storage unit 22, the content feature amount, which is the feature amount of the digital content, is calculated for the new digital content according to the above equation (5).

以上説明したように、本発明の第２の実施の形態に係るコンテンツ特徴量抽出装置によれば、ディジタルコンテンツ集合に含まれるディジタルコンテンツの各々について、ディジタルコンテンツの基本的な特徴量であるコンテンツ基本特徴量を抽出し、ディジタルコンテンツの各々がいずれのディジタルコンテンツ集合に含まれるかを表現するグラフであるコンテンツグラフを構築し、ディジタルコンテンツの各々から抽出したコンテンツ基本特徴量、及びコンテンツグラフに基づいて、ディジタルコンテンツ集合への所属の有無をディジタルコンテンツのコンテンツ基本特徴量から予測するモデルであるリンク予測モデルを学習し、学習されたリンク予測モデルに基づいて、ディジタルコンテンツ集合に含まれるディジタルコンテンツ、及び新規ディジタルコンテンツの少なくとも一方について、ディジタルコンテンツの特徴量であるコンテンツ特徴量を計算することにより、ラベルの性質を考慮したコンテンツ特徴量を抽出することができ、新規ディジタルコンテンツについて、精度よくラベルを予測することができる。 As described above, according to the content feature amount extracting device according to the second embodiment of the present invention, for each of the digital contents included in the digital content set, the content basic feature amount, which is the basic feature amount of the digital content, is extracted; a content graph, which is a graph expressing which digital content set each digital content is included in, is constructed; a link prediction model, which is a model that predicts the presence or absence of belonging to the digital content set from the content basic feature amount of the digital content, is learned based on the content basic feature amounts extracted from each of the digital contents and the content graph; and, based on the learned link prediction model, the content feature amount, which is the feature amount of the digital content, is calculated for at least one of the digital content included in the digital content set and the new digital content. As a result, a content feature amount that takes the properties of the labels into account can be extracted, and labels can be predicted accurately for new digital content.

＜本発明の第３の実施の形態に係るコンテンツ特徴量抽出装置の構成＞ <Structure of content feature amount extraction device according to the third embodiment of the present invention>

次に、本発明の第３の実施の形態に係るコンテンツ特徴量抽出装置の構成について説明する。なお、第１の実施の形態と同様となる箇所については同一符号を付して説明を省略する。 Next, the configuration of the content feature amount extraction device according to the third embodiment of the present invention will be described. The same parts as those in the first embodiment are designated by the same reference numerals and the description thereof will be omitted.

第３の実施の形態は、ディジタルコンテンツの新規ディジタルコンテンツ集合への所属の有無を予測する場合である。 The third embodiment is a case of predicting whether or not the digital content belongs to a new digital content set.

図５に示すように、本発明の第３の実施の形態に係るコンテンツ特徴量抽出装置３００は、ＣＰＵと、ＲＡＭと、後述するコンテンツ特徴量抽出処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。このコンテンツ特徴量抽出装置３００は、機能的には図５に示すように入力部３１０と、演算部３２０と、出力部３５０とを備えている。 As shown in FIG. 5, the content feature amount extraction device 300 according to the third embodiment of the present invention includes a CPU, a RAM, a program for executing a content feature amount extraction processing routine described later, and various data. It can be configured with a computer including a stored ROM. The content feature amount extraction device 300 functionally includes an input unit 310, a calculation unit 320, and an output unit 350 as shown in FIG.

入力部３１０は、学習用に、ラベル付きのディジタルコンテンツからなる複数のディジタルコンテンツ集合を受け付ける。また、入力部３１０は、新規のディジタルコンテンツ集合である新規ディジタルコンテンツ集合を受け付ける。また、入力部３１０は、ラベルが未知の新規に与えられた新規ディジタルコンテンツを受け付ける。 The input unit 310 accepts a plurality of digital content sets composed of labeled digital contents for learning. Further, the input unit 310 accepts a new digital content set, which is a new digital content set. Further, the input unit 310 accepts a newly given new digital content having an unknown label.

演算部３２０は、リンク予測モデル記憶部２２と、コンテンツ特徴量モデル記憶部２４と、コンテンツ基本特徴量抽出部３０と、コンテンツ集合基本特徴量抽出部３２と、グラフ構築部３４と、リンク予測モデル学習部３６と、コンテンツ特徴量モデル学習部４０と、コンテンツ特徴量算出部４２と、モデル交互最適化部４４と、新規ディジタルコンテンツ集合リンク予測部３４０とを含んで構成されている。 The calculation unit 320 includes a link prediction model storage unit 22, a content feature amount model storage unit 24, a content basic feature amount extraction unit 30, a content set basic feature amount extraction unit 32, a graph construction unit 34, and a link prediction model. It includes a learning unit 36, a content feature amount model learning unit 40, a content feature amount calculation unit 42, a model alternating optimization unit 44, and a new digital content set link prediction unit 340.

新規ディジタルコンテンツ集合リンク予測部３４０は、入力部３１０で受け付けた新規ディジタルコンテンツについて、リンク予測モデル学習部３６で学習された、リンク予測モデルに対応する、新規ディジタルコンテンツ集合についての潜在変数予測モデルと、リンク予測モデルとに基づいて、新規ディジタルコンテンツ集合への新規ディジタルコンテンツの所属の有無を予測する。 For the new digital content received by the input unit 310, the new digital content set link prediction unit 340 predicts whether the new digital content belongs to the new digital content set, based on the link prediction model learned by the link prediction model learning unit 36 and on a latent variable prediction model for the new digital content set that corresponds to that link prediction model.

新規ディジタルコンテンツ集合リンク予測部３４０により、リンク予測モデル学習部３６において考慮されていなかった新規ディジタルコンテンツ集合を考慮することが可能となる。ディジタルコンテンツ集合が、特定のテキストラベルが付与されたディジタルコンテンツの集合である場合には、新規ディジタルコンテンツ集合へのリンク予測は、学習の際には考慮されていなかったテキストラベルをディジタルコンテンツに付与するかどうかを判断する過程に相当する。 The new digital content set link prediction unit 340 makes it possible to consider a new digital content set that was not considered in the link prediction model learning unit 36. If the digital content set is a set of digital content with a particular text label, then link prediction to the new digital content set gives the digital content a text label that was not considered during training. It corresponds to the process of deciding whether to do it.

新規ディジタルコンテンツ集合へのリンク予測の手段は特に限定されるものではないが、本実施の形態においては、以下に示す状況を想定した方法について述べる。 The means for predicting the link to the new digital content set is not particularly limited, but in the present embodiment, a method assuming the following situations will be described.

新規ディジタルコンテンツ集合に何らかのテキスト情報が含まれており、コンテンツ集合基本特徴量抽出部３２に記載の方法を用いてディジタルコンテンツ集合基本特徴量が抽出できる。また、新規ディジタルコンテンツ集合が空集合、すなわち、新規ディジタルコンテンツ集合にディジタルコンテンツが１つも含まれていない、状況を想定する。また、ディジタルコンテンツが含まれていても良いが、以降に示す方法では利用しないものとする。 Some text information is included in the new digital content set, and the digital content set basic feature amount can be extracted by using the method described in the content set basic feature amount extraction unit 32. Further, it is assumed that the new digital content set is an empty set, that is, the new digital content set does not contain any digital content. Further, although digital contents may be included, they are not used in the methods shown below.

新規ディジタルコンテンツ集合リンク予測部３４０は、潜在変数予測モデル学習部３４３と、潜在変数予測部３４４と、リンク予測部３４６とを含んで構成されている。 The new digital content set link prediction unit 340 includes a latent variable prediction model learning unit 343, a latent variable prediction unit 344, and a link prediction unit 346.

潜在変数予測モデル学習部３４３は、リンク予測モデル記憶部２２に記憶されたリンク予測モデル、及びリンク予測モデル学習部３６で用いたディジタルコンテンツ集合基本特徴量から、リンク予測モデルのモデルパラメータの一部である潜在変数を予測するモデルである潜在変数予測モデルを学習する。 The latent variable prediction model learning unit 343 is a part of the model parameters of the link prediction model from the link prediction model stored in the link prediction model storage unit 22 and the digital content set basic feature quantity used in the link prediction model learning unit 36. Learn a latent variable prediction model, which is a model for predicting latent variables.

ディジタルコンテンツ集合ｃへのディジタルコンテンツｉの所属の有無を示すスコアであるリンク予測値は、上記（３）式の通り、ディジタルコンテンツ基本特徴量ｘ_ｉ、ディジタルコンテンツ集合基本特徴量ｙ_ｃ、及びディジタルコンテンツ集合潜在変数ｚ_Ｃｃから計算される。 The link prediction value, which is a score indicating whether or not the digital content i belongs to the digital content set c, is calculated from the digital content basic feature amount x_i, the digital content set basic feature amount y_c, and the digital content set latent variable z_Cc, as shown in Eq. (3) above.

しかし、新規ディジタルコンテンツ集合ｃ’においては、ディジタルコンテンツ集合基本特徴量ｙ_ｃ’は計算可能であるものの、ディジタルコンテンツ集合潜在変数ｚ_Ｃｃ’が利用できない。そこで、リンク予測モデル学習部３６で用いたディジタルコンテンツ集合基本特徴量Ｙ、及びリンク予測モデル学習部３６で学習したリンク予測モデルのモデルパラメータであるディジタルコンテンツ集合潜在変数Ｚ_Ｃを利用して、以下（６）式のディジタルコンテンツ集合基本特徴量からディジタルコンテンツ集合潜在変数を予測するモデルを考え、この潜在変数予測モデルｆ(・；θ)を学習により求める。 However, for a new digital content set c', the digital content set basic feature amount y_c' can be calculated, but the digital content set latent variable z_Cc' is not available. Therefore, using the digital content set basic feature amounts Y used in the link prediction model learning unit 36 and the digital content set latent variables Z_C, which are model parameters of the link prediction model learned by the link prediction model learning unit 36, a model that predicts the digital content set latent variable from the digital content set basic feature amount is considered as in Eq. (6) below, and this latent variable prediction model f(·; θ) is obtained by learning.

$$z_{Cc} = f(y_c;\, \theta) \tag{6}$$

ここで、θは潜在変数予測モデルのモデルパラメータである。つまり、上記の潜在変数予測モデルｆ(・；θ)は、ｙ_ｃ’からｚ_Ｃｃ’を予測する。この学習は、ディジタルコンテンツ集合基本特徴量Ｙとディジタルコンテンツ集合潜在変数Ｚ_Ｃとを学習データとして、コンテンツ集合基本特徴量ｙからディジタルコンテンツ集合潜在変数ｚ_Ｃｃへの変換関数ｆ()を求めるものである。 Here, θ is the model parameter of the latent variable prediction model. That is, the latent variable prediction model f(·; θ) predicts z_Cc' from y_c'. This learning obtains the conversion function f() from the content set basic feature amount y to the digital content set latent variable z_Cc, using the digital content set basic feature amounts Y and the digital content set latent variables Z_C as training data.

潜在変数予測モデルの構成は特に限定されるものではないが、例えば、以下のような方法が考えられる。 The configuration of the latent variable prediction model is not particularly limited, but the following methods can be considered, for example.

単純な線形回帰モデルを採用し、リンク予測モデルを固定して以下（７）式のモデルパラメータθを学習により求める。 A simple linear regression model is adopted, the link prediction model is fixed, and the model parameter θ of the following equation (7) is obtained by learning.

・・・（７）

... (7)

（３）式のリンク予測モデルに上記（７）式の線形回帰モデルを代入した式を、新たなリンク予測モデルとして採用し、リンク予測モデルを再学習する。 The formula obtained by substituting the linear regression model of the above formula (7) into the link prediction model of the formula (3) is adopted as a new link prediction model, and the link prediction model is relearned.

また、リンク予測モデルの損失関数（４）式に、上記の線形回帰モデルを考慮した新たな正則化項を加えて、リンク予測モデルを再学習する。このとき、リンク予測モデルの損失関数（４）式は、以下（８）式のように修正される。 Further, the link prediction model is relearned by adding a new regularization term in consideration of the above linear regression model to the loss function (4) of the link prediction model. At this time, the loss function equation (4) of the link prediction model is modified as the following equation (8).

・・・（８）

... (8)

サポートベクトル回帰などの非線形回帰モデルを採用し、リンク予測モデルを固定して非線形回帰モデルのモデルパラメータを学習により求める。 A nonlinear regression model such as support vector regression is adopted, the link prediction model is fixed, and the model parameters of the nonlinear regression model are obtained by learning.

ニューラルネットワークモデルを採用し、リンク予測モデルを固定してニューラルネットワークのモデルパラメータを学習により求める。 A neural network model is adopted, the link prediction model is fixed, and the model parameters of the neural network are obtained by learning.

潜在変数予測部３４４は、潜在変数予測モデル学習部３４３で学習された潜在変数予測モデル、及び新規ディジタルコンテンツ集合を入力とし、新規ディジタルコンテンツ集合からコンテンツ集合基本特徴量を抽出し、新規ディジタルコンテンツ集合についてのコンテンツ集合基本特徴量、及び潜在変数予測モデルからコンテンツ集合潜在変数を予測し、新規ディジタルコンテンツ集合についてのコンテンツ集合潜在変数を出力する。 The latent variable prediction unit 344 inputs the latent variable prediction model learned by the latent variable prediction model learning unit 343 and the new digital content set, extracts the content set basic feature quantity from the new digital content set, and extracts the new digital content set. The content set latent variable is predicted from the content set basic feature quantity and the latent variable prediction model, and the content set latent variable for the new digital content set is output.

潜在変数予測部３４４は、まず、新規ディジタルコンテンツ集合からコンテンツ集合基本特徴量を抽出する。抽出方法は、コンテンツ集合基本特徴量抽出部３２に記載の方法と同様である。 The latent variable prediction unit 344 first extracts the content set basic features from the new digital content set. The extraction method is the same as the method described in the content set basic feature amount extraction unit 32.

潜在変数予測部３４４は、続いて、抽出したコンテンツ集合基本特徴量である新規コンテンツ集合基本特徴量ｙ_ｃ’、及び潜在変数予測モデルｆ(・；θ)から、以下（９）式のコンテンツ集合潜在変数を予測する。 The latent variable prediction unit 344 then predicts the content set latent variable of Eq. (9) below from the new content set basic feature amount y_c', which is the extracted content set basic feature amount, and the latent variable prediction model f(·; θ).

$$\hat{z}_{Cc'} = f(y_{c'};\, \theta) \tag{9}$$
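The learning and prediction of the latent variable can be sketched for the simplest option mentioned above, a linear regression with the link prediction model held fixed. The sketch assumes f(y; θ) = θ y without a bias term and fits θ by plain gradient descent on the squared error between θ y_c and the learned z_Cc; the exact form of Eq. (7) is not reproduced in this section, and the toy data and function names are hypothetical.

```python
def predict_latent(theta, y):
    """Predict a set latent variable from a set basic feature (assumed f(y) = theta y)."""
    return [sum(t * v for t, v in zip(row, y)) for row in theta]

def fit_linear_map(Y, Z, lr=0.1, n_steps=2000):
    """Fit theta by gradient descent on 0.5 * sum_c ||theta y_c - z_c||^2."""
    d_out, d_in = len(Z[0]), len(Y[0])
    theta = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(n_steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for y, z in zip(Y, Z):
            pred = predict_latent(theta, y)
            for r in range(d_out):
                err = pred[r] - z[r]
                for k in range(d_in):
                    grad[r][k] += err * y[k]
        theta = [[theta[r][k] - lr * grad[r][k] for k in range(d_in)]
                 for r in range(d_out)]
    return theta

# Known sets: basic features Y and latent variables Z taken from the learned link model.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Z = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
theta = fit_linear_map(Y, Z)

# A new set c' has a basic feature but no learned latent variable, so it is predicted.
z_hat_new = predict_latent(theta, [2.0, 1.0])  # approximately [4.0, 3.0]
```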

リンク予測部３４６は、リンク予測モデル記憶部２２に記憶されたリンク予測モデル、予測されたコンテンツ集合潜在変数である予測コンテンツ集合潜在変数、新規ディジタルコンテンツについてのディジタルコンテンツ特徴量、及び新規コンテンツ集合基本特徴量を入力とし、これらから新規ディジタルコンテンツ集合を含むディジタルコンテンツ集合の各々への新規ディジタルコンテンツの所属の有無を予測し、出力部３５０に予測結果を出力する。 The link prediction unit 346 takes as input the link prediction model stored in the link prediction model storage unit 22, the predicted content set latent variable, the digital content feature amount of the new digital content, and the new content set basic feature amount. From these it predicts whether the new digital content belongs to each of the digital content sets, including the new digital content set, and outputs the prediction result to the output unit 350.

新規ディジタルコンテンツ集合ｃ’への新規ディジタルコンテンツｉ’の所属の有無の予測は、リンク予測モデル（３）式を用いることで実現できる。ただし、（３）式では、新規ディジタルコンテンツ集合に関するコンテンツ集合潜在変数は未知であるため、以下（１０）式の予測コンテンツ集合潜在変数を代わりに用いる。 Whether the new digital content i' belongs to the new digital content set c' can be predicted using the link prediction model of Eq. (3). However, because the content set latent variable for the new digital content set is unknown in Eq. (3), the predicted content set latent variable is used in its place, as in Eq. (10) below.

・・・（１０）

... (10)

この（１０）式で得られたリンク予測値がある一定以上大きな値を取るときに、ディジタルコンテンツｉ’が新規ディジタルコンテンツ集合ｃ’に所属すると判断する。 When the link prediction value obtained from Eq. (10) is at or above a certain value, the digital content i' is judged to belong to the new digital content set c'.
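The decision rule can be sketched as a thresholded score. Because Eqs. (3) and (10) are not reproduced in this section, the score below is an assumed form based on the description of the link prediction model as an inner product: the content feature x is transformed by a matrix `W`, and the set is characterized by the concatenation of its basic feature y and its predicted latent variable. The names `W`, `link_score`, `belongs`, and the threshold value are hypothetical.

```python
def link_score(W, x, y_set, z_set):
    """Assumed score: inner product of W x with the set vector [y_set, z_set]."""
    set_vec = list(y_set) + list(z_set)
    h = [sum(w * v for w, v in zip(row, x)) for row in W]
    return sum(a * b for a, b in zip(h, set_vec))

def belongs(W, x, y_set, z_hat, threshold=0.5):
    """Judge membership when the score is at or above the threshold."""
    return link_score(W, x, y_set, z_hat) >= threshold

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy parameters of the link model
y_new, z_hat = [1.0, 0.0], [0.5]          # new set: basic feature and predicted latent variable

score = link_score(W, [0.5, 0.2], y_new, z_hat)
is_member = belongs(W, [0.5, 0.2], y_new, z_hat)
is_not_member = belongs(W, [-0.5, 0.2], y_new, z_hat)
```

In practice the threshold would be chosen on held-out data; the value used here is arbitrary.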

以上の各部の処理によって、新規ディジタルコンテンツ集合リンク予測部３４０は、新規ディジタルコンテンツ集合へのディジタルコンテンツの所属の有無を予測し、新規ディジタルコンテンツ集合についての予測結果を出力する。 Through the processing of each of the above units, the new digital content set link prediction unit 340 predicts whether or not the digital content belongs to the new digital content set, and outputs the prediction result for the new digital content set.

＜本発明の第３の実施の形態に係るコンテンツ特徴量抽出装置の作用＞ <Operation of the content feature amount extraction device according to the third embodiment of the present invention>

次に、本発明の第３の実施の形態に係るコンテンツ特徴量抽出装置３００の作用について説明する。なお、第１の実施の形態と同様となる箇所については同一符号を付して説明を省略する。 Next, the operation of the content feature amount extraction device 300 according to the third embodiment of the present invention will be described. The same parts as those in the first embodiment are designated by the same reference numerals and the description thereof will be omitted.

When the input unit 310 receives a plurality of digital content sets composed of labeled digital contents, the content feature amount extraction device 300 executes the content feature amount extraction processing routine shown in FIG. 6.

In step S302, a latent variable prediction model is learned from the link prediction model stored in the link prediction model storage unit 22 and the digital content set basic feature amounts used to train the link prediction model. The latent variable prediction model is a model that predicts the latent variables that form part of the model parameters of the link prediction model.

In step S304, the latent variable prediction model learned in step S302 and the new digital content set are taken as input. The content set basic feature amount is extracted from the new digital content set, the content set latent variable is predicted from that content set basic feature amount and the latent variable prediction model, and the content set latent variable for the new digital content set is output.

In step S306, the link prediction model stored in the link prediction model storage unit 22, the predicted content set latent variable, and the digital content feature amount of the new digital content are taken as input. From these, whether the new digital content belongs to each of the digital content sets, including the new digital content set, is predicted, and the prediction result is output.
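Steps S302 to S306 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the latent variable prediction model is taken to be a ridge regression from content set basic feature amounts to content set latent variables, the link score is taken to be a sigmoid of an inner product, and content feature amounts are assumed to have the same dimension as the latent variables. None of these choices, nor the function names, are confirmed by the text.

```python
import numpy as np

def fit_latent_predictor(set_basic_features, set_latents, reg=1e-3):
    """S302 (assumed form): ridge regression from set basic features to latents."""
    A = set_basic_features
    gram = A.T @ A + reg * np.eye(A.shape[1])
    return np.linalg.solve(gram, A.T @ set_latents)  # shape (d_set, d_latent)

def predict_set_latent(W, new_set_basic_feature):
    """S304: predict the latent variable of a new digital content set."""
    return new_set_basic_feature @ W

def predict_links(content_features, set_latents, threshold=0.5):
    """S306 (assumed score): sigmoid of inner product, then thresholding."""
    scores = 1.0 / (1.0 + np.exp(-(content_features @ set_latents.T)))
    return scores >= threshold

# Synthetic data standing in for a trained link prediction model.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(4, 3))
train_set_feats = rng.normal(size=(50, 4))       # set basic feature amounts
train_set_latents = train_set_feats @ true_map   # latents from the link model

W = fit_latent_predictor(train_set_feats, train_set_latents)
new_latent = predict_set_latent(W, rng.normal(size=4))

# Existing sets plus the new set; two new contents with 3-dim features.
all_latents = np.vstack([train_set_latents, new_latent])
membership = predict_links(rng.normal(size=(2, 3)), all_latents)
print(membership.shape)
```

The last column of `membership` corresponds to the new digital content set, whose latent variable was never learned directly but only predicted from its basic feature amount.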

As described above, the content feature amount extraction device according to the third embodiment of the present invention performs the following: it extracts, for each digital content included in the digital content sets, a content basic feature amount, which is a basic feature amount of the digital content; it constructs a content graph, which is a graph expressing which digital content set contains each digital content; it learns, based on the content basic feature amounts extracted from the digital contents and the content graph, a link prediction model, which is a model that predicts from the content basic feature amount of a digital content whether that content belongs to a digital content set; it calculates, based on the learned link prediction model, a content feature amount, which is the feature amount of the digital content, for at least one of the digital content included in the digital content sets and new digital content; and it predicts the latent variable of a new digital content set. As a result, content feature amounts that take the nature of the labels into account can be extracted, and labels can be predicted accurately for new digital content.

<Experimental results>

To verify the embodiments described so far, a dataset was collected independently and used to train the link prediction model and the content feature amount model. The dataset consists of 65,000 digital content sets and 1.5 million digital contents (still images). FIG. 7 shows statistics on the number of digital content sets to which each digital content belongs, and FIG. 8 shows statistics on the number of digital contents contained in each digital content set.
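The two statistics reported in FIG. 7 and FIG. 8 correspond to the degrees of the two node types in the content graph. A minimal sketch of how they could be computed from membership pairs is shown below; the function name and the toy identifiers are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def membership_statistics(memberships):
    """Count sets per content and contents per set from (content, set) edges."""
    edges = set(memberships)  # drop duplicate edges before counting
    sets_per_content = Counter(content for content, _ in edges)
    contents_per_set = Counter(group for _, group in edges)
    return sets_per_content, contents_per_set

# Toy content graph: each pair is one edge (content belongs to set).
edges = [
    ("img1", "boardA"),
    ("img1", "boardB"),
    ("img2", "boardA"),
    ("img2", "boardA"),  # duplicate edge, counted once
]
sets_per_content, contents_per_set = membership_statistics(edges)
print(dict(sets_per_content), dict(contents_per_set))
```

A histogram over the values of each counter would give the kind of distribution the figures summarize.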

In the verification below, the 4096-dimensional output of the 15th layer (FC7) of VGG-Net was used as the content basic feature amount, and the 300-dimensional text embedding described in Non-Patent Document 4 was used as the content set basic feature amount. Using these basic feature amounts and the content graph constructed from the dataset, the link prediction model was trained, and two methods were carried out: extracting content feature amounts directly from the link prediction model, and extracting content feature amounts after training a content feature amount model using the link prediction model.

To evaluate the performance of the content feature amounts extracted in this way, experiments were conducted on public image datasets. The datasets used were as follows.

- Datasets for classifying food images: UECFOOD100, UECFOOD256
- Datasets for classifying clothing images: Apparel, Hipster
- Dataset for predicting the impression (positive or negative) conveyed by an image: Instagram

In addition to these public datasets, a dataset collected independently from Pinterest, a social networking service for sharing images, was prepared for evaluation. This dataset contains 32 class labels and 63,000 images, with class labels drawn from a variety of categories such as women's fashion, holiday's events, tattoos, science and nature, and sports. The task is multi-label prediction, in which each image may have multiple class labels.

In the evaluation experiments, the following seven methods were compared.

1. VGG: The content basic feature amount is used as is as the feature amount for each task.
2. FT-GRP: The link prediction model is not used. Instead, VGG-Net is retrained with the digital content sets as the correct labels, and the retrained VGG-Net is used as the feature amount for each task.
3. FT-WORD: The link prediction model is not used. Instead, VGG-Net is retrained with the text words contained in the digital content sets as the correct labels, and the retrained VGG-Net is used as the feature amount for each task.
4. PROP-SC: The method of the above embodiments in which the content feature amount model is not used and content feature amounts are extracted directly from the link prediction model using sparse coding.
5. PROP-FT: The method of the above embodiments in which a content feature amount model is trained and content feature amounts are extracted from that model.
6. VGG+SC: The content feature amounts of PROP-SC and of VGG are concatenated and used together.
7. VGG+FT: The content feature amounts of PROP-FT and of VGG are concatenated and used together.
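The concatenation used in methods 6 and 7 can be sketched as follows. The L2 normalization of each block before concatenation is an assumption added here so that neither block dominates by scale; the text only states that the features are concatenated. The 128-dimensional size of the second block is arbitrary.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps avoids division by zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def concat_features(base_features, proposed_features, normalize=True):
    """Concatenate two per-image feature matrices along the feature axis."""
    if base_features.shape[0] != proposed_features.shape[0]:
        raise ValueError("both feature matrices must describe the same images")
    if normalize:  # assumed preprocessing, not stated in the text
        base_features = l2_normalize(base_features)
        proposed_features = l2_normalize(proposed_features)
    return np.hstack([base_features, proposed_features])

rng = np.random.default_rng(0)
vgg = rng.normal(size=(5, 4096))   # stands in for VGG-Net FC7 outputs
prop = rng.normal(size=(5, 128))   # stands in for PROP-SC or PROP-FT features
combined = concat_features(vgg, prop)
print(combined.shape)
```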

As evaluation metrics, mean average precision (MAP) was used for the Instagram dataset, whose task is binary classification, and for the Pinterest dataset, whose task is multi-label prediction. Classification accuracy (ACC) was used for UECFOOD100/256, Apparel, and Hipster, whose task is multi-class classification. Both metrics range from a minimum of 0 to a maximum of 1, and larger values indicate better task performance.
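For reference, the two metrics can be computed as in the sketch below. The text does not say whether MAP is averaged over classes or over images; averaging over classes is assumed here, and the function names are illustrative.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def average_precision(relevant, scores):
    """Average of precision@k over the ranks k at which a relevant item occurs."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=float)[order]
    if rel.sum() == 0:
        return 0.0  # no relevant items: AP defined as 0 in this sketch
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(label_matrix, score_matrix):
    """Mean of per-class AP; rows are images, columns are classes (assumed)."""
    labels = np.asarray(label_matrix)
    scores = np.asarray(score_matrix)
    per_class = [average_precision(labels[:, j], scores[:, j])
                 for j in range(labels.shape[1])]
    return float(np.mean(per_class))

ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
acc = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
print(ap, acc)
```

In the toy ranking, the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6.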

Table 1 shows the results of the verification experiments.

VGG, trained on a large-scale object recognition dataset, performed well on each dataset, but PROP-SC and PROP-FT, trained on web image data, achieved performance close to it. This demonstrates the usefulness of the methods described in the above embodiments. In addition, VGG+SC and VGG+FT outperformed VGG. This indicates that the methods described in the above embodiments extract feature amounts whose properties differ substantially from those of feature amounts learned from a large-scale object recognition dataset, and that these feature amounts are applicable across various domains.

The present invention is not limited to the above-described embodiments, and various modifications and applications are possible without departing from the gist of the invention.

For example, each of the above embodiments describes content feature amount extraction in which the training of the link prediction model and the content feature amount model, the extraction of content feature amounts, and the link prediction are all performed by the same device, but the invention is not limited to this. For example, the processing may be divided between a device that trains the link prediction model and the content feature amount model and a device that extracts content feature amounts and performs link prediction.

10, 210, 310 Input unit
20, 220, 320 Calculation unit
22 Link prediction model storage unit
24 Content feature amount model storage unit
30 Content basic feature amount extraction unit
32 Content set basic feature amount extraction unit
34 Graph construction unit
36 Link prediction model learning unit
40 Content feature amount model learning unit
42 Content feature amount calculation unit
44 Model alternating optimization unit
46, 246 Link prediction unit
50, 250, 350 Output unit
100, 200, 300 Content feature amount extraction device
243 Content feature amount calculation unit
340 New digital content set link prediction unit
342 Latent variable prediction model learning unit
344 Latent variable prediction unit

Claims

A content feature amount extraction method in a content feature amount extraction device that uses a plurality of digital content sets, each consisting of digital contents prepared in advance, to extract a feature amount of at least one of the digital content included in the digital content sets and new digital content given separately from the digital content sets, the method including:
a step in which a content basic feature amount extraction unit extracts, for each of the digital contents included in the digital content sets, a content basic feature amount, which is a basic feature amount of the digital content;
a step in which a graph construction unit constructs a content graph, which is a graph expressing which digital content set contains each of the digital contents;
a step in which a link prediction model learning unit learns, based on the content basic feature amount extracted from each of the digital contents and the content graph, a link prediction model, which is a model that predicts the presence or absence of belonging to the digital content set from the content basic feature amount of the digital content; and
a step in which a content feature amount calculation unit calculates, based on the learned link prediction model, a content feature amount, which is a feature amount of the digital content, for at least one of the digital content included in the digital content sets and the new digital content.

The content feature amount extraction method according to claim 1, further including a step in which a content set basic feature amount extraction unit extracts, for each of the plurality of digital content sets, a content set basic feature amount, which is a basic feature amount of the digital content set,
wherein, in the step of the link prediction model learning unit, a link prediction model that predicts the presence or absence of belonging to a digital content set from the content basic feature amount and the content set basic feature amount is learned based on the content basic feature amount extracted from each of the digital contents, the content set basic feature amount extracted from each of the plurality of digital content sets, and the content graph.

The content feature amount extraction method according to claim 1 or claim 2, further including a step in which a content feature amount model learning unit learns, from the link prediction model and the digital contents included in the digital content sets, a content feature amount model, which is a model for extracting the content feature amount of the digital content,
wherein, in the step of the content feature amount calculation unit, the content feature amount is calculated from at least one of the digital content included in the digital content sets and the new digital content by using the learned content feature amount model.

The content feature amount extraction method according to claim 3, further including a step in which a model alternating optimization unit optimizes the link prediction model and the content feature amount model by alternately and repeatedly executing the step of the link prediction model learning unit and the step of the content feature amount model learning unit,
wherein, in the step of the link prediction model learning unit, the link prediction model, which is a model that predicts the presence or absence of belonging to the digital content set from the content feature amount of the digital content, is learned based on the content feature amount extracted from each of the digital contents using the learned content feature amount model, and on the content graph.

The content feature amount extraction method according to any one of claims 1 to 4, further including a step in which a new digital content set link prediction unit predicts a latent variable for a new digital content set, which is a newly given digital content set, based on a latent variable prediction model that has been learned in advance and that predicts the latent variables included in the link prediction model, and predicts whether digital content belongs to the new digital content set based on the predicted latent variable and the link prediction model.

A content feature amount extraction device that uses a plurality of digital content sets, each consisting of digital contents prepared in advance, to extract a feature amount of at least one of the digital content included in the digital content sets and new digital content given separately from the digital content sets, the device including:
a content basic feature amount extraction unit that extracts, for each of the digital contents included in the digital content sets, a content basic feature amount, which is a basic feature amount of the digital content;
a graph construction unit that constructs a content graph, which is a graph expressing which digital content set contains each of the digital contents;
a link prediction model learning unit that learns, based on the content basic feature amount extracted from each of the digital contents and the content graph, a link prediction model, which is a model that predicts the presence or absence of belonging to the digital content set from the content basic feature amount of the digital content; and
a content feature amount calculation unit that calculates, based on the learned link prediction model, a content feature amount, which is a feature amount of the digital content, for at least one of the digital content included in the digital content sets and the new digital content.

A program for causing a computer to execute each step of the content feature amount extraction method according to any one of claims 1 to 5.