JP2018025949A

JP2018025949A - Learning device, image search device, method, and program

Info

Publication number: JP2018025949A
Application number: JP2016157008A
Authority: JP
Inventors: Wataru Mitsuta; Ryuichiro Higashinaka; Yoshihiro Matsuo
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-08-09
Filing date: 2016-08-09
Publication date: 2018-02-15
Anticipated expiration: 2036-08-09
Also published as: JP6553557B2

Abstract

PROBLEM TO BE SOLVED: To provide a learning device capable of learning a ranking model for accurately searching for a video suited to the lyrics data of a song. SOLUTION: A ranker learning unit 64 learns a ranking model based on a keyword pair feature of correct data, a keyword pair feature of incorrect data, a topic pair feature of the correct data, and a topic pair feature of the incorrect data, thereby learning a ranking model for accurately searching for a video suited to the lyrics data of a song. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a video search device, a method, and a program, and more particularly to a learning device, a video search device, a method, and a program for searching for a video suited to the lyrics data of a song.

There is strong demand for retrieving videos from text, as in image search on the Web. If videos can be retrieved from text, there is no need to check each video visually while searching, which reduces cost. In addition, if a video that matches a text can be obtained, the video can visually supplement the content of the text.

Takenobu Tokunaga, Information Retrieval and Language Processing (情報検索と言語処理, Language and Computation series), Chapter 2 "Fundamentals of Information Retrieval" and Chapter 4 "Use of Language Processing Technology", University of Tokyo Press, 1999.

In image search and similar applications, the input text is usually a keyword. However, no technique has yet been established for retrieving a matching video when the input is long, subjective text such as song lyrics.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a learning device, method, and program capable of learning a ranking model for accurately searching for a video suited to the lyrics data of a song.

Another object is to provide a video search device, method, and program capable of accurately searching for a video suited to the lyrics data of a song.

To achieve the above objects, a learning device according to a first invention is a learning device that learns a ranking model for searching for a video suited to the lyrics data of a song, and comprises: a lyric keyword extraction unit that extracts lyric keywords from each piece of lyrics data included in correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and in incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data; a summary text keyword extraction unit that extracts summary text keywords from each piece of summary text data included in the correct data and the incorrect data; and a ranker learning unit that learns the ranking model based on a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the correct data and the summary text keywords extracted from the summary text data of the correct data, and a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the incorrect data and the summary text keywords extracted from the summary text data of the incorrect data.
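As a rough illustration of what a keyword pair feature could look like, the sketch below forms every combination of a lyric keyword and a summary text keyword and treats each combination as one sparse binary feature. The encoding, the function name `keyword_pair_features`, the `kw:` prefix, and the summary keywords in the example are all assumptions made for illustration; this section does not specify how the combinations are encoded.

```python
from itertools import product


def keyword_pair_features(lyric_keywords, summary_keywords):
    """Return one binary feature per (lyric keyword, summary keyword) combination."""
    return {
        f"kw:{lyric}|{summary}"
        for lyric, summary in product(set(lyric_keywords), set(summary_keywords))
    }


# Hypothetical example: the same lyrics paired with a suited and an unsuited summary.
lyric_keywords = ["私", "あなた", "会う"]
correct_features = keyword_pair_features(lyric_keywords, ["恋人", "再会"])
incorrect_features = keyword_pair_features(lyric_keywords, ["雪", "スキー"])
```

Because the two feature sets share no element in this example, a learner can in principle assign different weights to combinations seen in correct pairs and in incorrect pairs.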

A learning device according to a second invention is a learning device that learns a ranking model for searching for a video suited to the lyrics data of a song, and comprises: a lyric topic extraction unit that extracts lyric topics from each piece of lyrics data included in correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and in incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data; a summary text topic extraction unit that extracts summary text topics from each piece of summary text data included in the correct data and the incorrect data; and a ranker learning unit that learns the ranking model based on a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the correct data and the summary text topics extracted from the summary text data of the correct data, and a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the incorrect data and the summary text topics extracted from the summary text data of the incorrect data.
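A topic pair feature can be pictured in the same way as a keyword pair. The sketch below assumes that each text comes with a probability per topic, keeps the most probable topics on each side, and pairs them up. The cut-off `k`, the `tp:` prefix, the function names, and the probability values are illustrative assumptions rather than details taken from the patent.

```python
def top_topics(topic_probs, k):
    """Return the ids of the k most probable topics, most probable first."""
    ranked = sorted(topic_probs.items(), key=lambda item: item[1], reverse=True)
    return [topic_id for topic_id, _ in ranked[:k]]


def topic_pair_features(lyric_topic_probs, summary_topic_probs, k=2):
    """Return one binary feature per (lyric topic, summary topic) combination."""
    lyric_top = top_topics(lyric_topic_probs, k)
    summary_top = top_topics(summary_topic_probs, k)
    return {f"tp:{lt}|{st}" for lt in lyric_top for st in summary_top}


# Hypothetical topic distributions over four topics.
lyric_probs = {0: 0.05, 1: 0.60, 2: 0.30, 3: 0.05}
summary_probs = {0: 0.70, 1: 0.10, 2: 0.15, 3: 0.05}
features = topic_pair_features(lyric_probs, summary_probs, k=2)
```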

A learning device according to a third invention is a learning device that learns a ranking model for searching for a video suited to the lyrics data of a song, and comprises: a lyric keyword extraction unit that extracts lyric keywords from each piece of lyrics data included in correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and in incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data; a summary text keyword extraction unit that extracts summary text keywords from each piece of summary text data included in the correct data and the incorrect data; a lyric topic extraction unit that extracts lyric topics from each piece of lyrics data included in the correct data and the incorrect data; a summary text topic extraction unit that extracts summary text topics from each piece of summary text data included in the correct data and the incorrect data; and a ranker learning unit that learns the ranking model based on a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the correct data and the summary text keywords extracted from the summary text data of the correct data, a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the incorrect data and the summary text keywords extracted from the summary text data of the incorrect data, a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the correct data and the summary text topics extracted from the summary text data of the correct data, and a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the incorrect data and the summary text topics extracted from the summary text data of the incorrect data.
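To make the role of the correct and incorrect features concrete, here is a minimal pairwise learning sketch. It uses a simple perceptron-style update that raises the weights of features from a correct pair and lowers those from an incorrect pair until the correct pair scores higher. This algorithm is a stand-in chosen for brevity; the section does not say which ranking learner is used, and the feature strings, `score`, and `train_pairwise` are hypothetical.

```python
from collections import defaultdict


def score(weights, features):
    """Sum the weights of the binary features that are present."""
    return sum(weights[f] for f in features)


def train_pairwise(pairs, epochs=10):
    """pairs: list of (correct_features, incorrect_features) for the same lyrics."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for correct, incorrect in pairs:
            # Update only when the correct pair does not yet outrank the incorrect one.
            if score(weights, correct) <= score(weights, incorrect):
                for f in correct:
                    weights[f] += 1.0
                for f in incorrect:
                    weights[f] -= 1.0
    return weights


# Hypothetical keyword pair ("kw:") and topic pair ("tp:") features.
correct = {"kw:会う|再会", "tp:1|0"}
incorrect = {"kw:会う|雪", "tp:1|3"}
weights = train_pairwise([(correct, incorrect)])
```

Keyword pair and topic pair features simply share one weight vector here, which is one straightforward way to learn from both kinds of feature at once.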

In the learning devices according to the first to third inventions, the summary text keywords may be keywords representing a person, a place, a season, or an event.

A video search device according to a fourth invention is a video search device that searches for a video suited to the lyrics data of a song, and comprises: a lyric keyword extraction unit that extracts lyric keywords from the input lyrics data of a song; a ranking model storage unit that stores a ranking model for searching for a video suited to the lyrics data of a song, the ranking model having been learned in advance based on a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and summary text keywords extracted from the summary text data of the correct data, and a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data, and summary text keywords extracted from the summary text data of the incorrect data; and a video search unit that searches for a video suited to the input lyrics data based on the ranking model and, for each of the videos, a keyword pair feature representing combinations of the lyric keywords extracted by the lyric keyword extraction unit and summary text keywords extracted from the summary text data attached to that video.
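The search step can be sketched as scoring each candidate video with a learned model and sorting the candidates by score. The example below assumes a linear model over keyword pair features stored as a plain dictionary; the weights, the video ids, and the function `search_videos` are hypothetical and only illustrate the flow from input lyrics to a ranked list.

```python
def search_videos(lyric_keywords, videos, weights):
    """Rank video ids by the model score of their keyword pair features.

    videos maps a video id to the keywords of its summary text.
    """
    def video_score(summary_keywords):
        return sum(
            weights.get(f"kw:{lyric}|{summary}", 0.0)
            for lyric in set(lyric_keywords)
            for summary in set(summary_keywords)
        )

    return sorted(videos, key=lambda vid: video_score(videos[vid]), reverse=True)


# Hypothetical model weights and candidate videos.
weights = {"kw:会う|再会": 1.0, "kw:会う|雪": -1.0}
videos = {"v1": ["雪", "スキー"], "v2": ["恋人", "再会"]}
ranking = search_videos(["私", "会う"], videos, weights)
```

Feature combinations that the model never saw during learning contribute a score of zero in this sketch.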

A video search device according to a fifth invention is a video search device that searches for a video suited to the lyrics data of a song, and comprises: a lyric topic extraction unit that extracts lyric topics from the input lyrics data of a song; a ranking model storage unit that stores a ranking model for searching for a video suited to the lyrics data of a song, the ranking model having been learned in advance based on a topic pair feature representing combinations of lyric topics extracted from the lyrics data of correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and summary text topics extracted from the summary text data of the correct data, and a topic pair feature representing combinations of lyric topics extracted from the lyrics data of incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data, and summary text topics extracted from the summary text data of the incorrect data; and a video search unit that searches for a video suited to the input lyrics data based on the ranking model and, for each of the videos, a topic pair feature representing combinations of the lyric topics extracted by the lyric topic extraction unit and summary text topics extracted from the summary text data attached to that video.

A video search device according to a sixth invention is a video search device that searches for a video suited to the lyrics data of a song, and comprises: a lyric keyword extraction unit that extracts lyric keywords from the input lyrics data of a song; a lyric topic extraction unit that extracts lyric topics from the input lyrics data; a ranking model storage unit that stores a ranking model for searching for a video suited to the lyrics data of a song, the ranking model having been learned in advance based on a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and summary text keywords extracted from the summary text data of the correct data, a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data, and summary text keywords extracted from the summary text data of the incorrect data, a topic pair feature representing combinations of lyric topics extracted from the lyrics data of the correct data and summary text topics extracted from the summary text data of the correct data, and a topic pair feature representing combinations of lyric topics extracted from the lyrics data of the incorrect data and summary text topics extracted from the summary text data of the incorrect data; and a video search unit that searches for a video suited to the input lyrics data based on the ranking model and, for each of the videos, a keyword pair feature representing combinations of the lyric keywords extracted by the lyric keyword extraction unit and summary text keywords extracted from the summary text data attached to that video, and a topic pair feature representing combinations of the lyric topics extracted by the lyric topic extraction unit and summary text topics extracted from the summary text data attached to that video.

A learning method according to a seventh invention is a learning method in a learning device that learns a ranking model for searching for a video suited to the lyrics data of a song, and comprises: a step in which a lyric keyword extraction unit extracts lyric keywords from each piece of lyrics data included in correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and in incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data; a step in which a summary text keyword extraction unit extracts summary text keywords from each piece of summary text data included in the correct data and the incorrect data; and a step in which a ranker learning unit learns the ranking model based on a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the correct data and the summary text keywords extracted from the summary text data of the correct data, and a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the incorrect data and the summary text keywords extracted from the summary text data of the incorrect data.

A video search method according to an eighth invention is a video search method in a video search device that searches for a video suited to the lyrics data of a song and that comprises a lyric keyword extraction unit, a ranking model storage unit, and a video search unit, the ranking model storage unit storing a ranking model for searching for a video suited to the lyrics data of a song, learned in advance based on a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of incorrect data, which is a pair of the lyrics data of a song and summary text data attached to a video not suited to that lyrics data, and summary text keywords extracted from the summary text data of the incorrect data. The method comprises: a step in which the lyric keyword extraction unit extracts lyric keywords from the input lyrics data of a song; and a step in which the video search unit searches for a video suited to the input lyrics data based on, for each of the videos, a keyword pair feature representing combinations of the lyric keywords extracted by the lyric keyword extraction unit and summary text keywords extracted from the summary text data attached to that video, a keyword pair feature representing combinations of lyric keywords extracted from the lyrics data of correct data, which is a pair of the lyrics data of a song and summary text data attached to a video suited to that lyrics data, and summary text keywords extracted from the summary text data of the correct data, and the ranking model stored in the ranking model storage unit.

A program according to a ninth invention is a program for causing a computer to function as each unit of the learning device according to any of the first to third inventions or of the video search device according to any of the fourth to sixth inventions.

According to the learning device, method, and program of the present invention, lyric keywords are extracted from each piece of lyrics data, summary text keywords are extracted from each piece of summary text data, lyric topics are extracted from each piece of lyrics data, and summary text topics are extracted from each piece of summary text data. The ranking model is then learned based on a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the correct data and the summary text keywords extracted from the summary text data of the correct data, a keyword pair feature representing combinations of the lyric keywords extracted from the lyrics data of the incorrect data and the summary text keywords extracted from the summary text data of the incorrect data, a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the correct data and the summary text topics extracted from the summary text data of the correct data, and a topic pair feature representing combinations of the lyric topics extracted from the lyrics data of the incorrect data and the summary text topics extracted from the summary text data of the incorrect data. This makes it possible to learn a ranking model for accurately searching for a video suited to the lyrics data of a song.

According to the video search device, method, and program of the present invention, lyric keywords and lyric topics are extracted from the input lyrics data of a song. A video suited to the input lyrics data is then searched for based on the ranking model and, for each video, a keyword pair feature representing combinations of the extracted lyric keywords and summary text keywords extracted from the summary text data attached to that video, and a topic pair feature representing combinations of the extracted lyric topics and summary text topics extracted from the summary text data attached to that video. This makes it possible to accurately search for a video suited to the lyrics data of a song.

A block diagram showing the configuration of the learning device according to the embodiment of the present invention.
A diagram showing an example of the result of morphological analysis of lyrics data.
A diagram showing an example of the result of word extraction.
A diagram showing an example of the result of morphological analysis of a summary text.
A diagram showing an example of the five highest topic probability values for a set of lyrics.
A block diagram showing the configuration of the video search device according to the embodiment of the present invention.
A diagram showing an example of the ranking result for summary texts.
A flowchart showing the learning processing routine in the learning device according to the embodiment of the present invention.
A flowchart showing the video search processing routine in the video search device according to the embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

First, an overview of the embodiment of the present invention will be given.

The music video search in the embodiment of the present invention consists of a learning process and a search process. In the learning process, the learning device creates the ranking model required for the search process. In the search process, the video search device searches for videos for the lyrics data of a song, based on the ranking model created in the learning process.

<Configuration of the Learning Device According to the Embodiment of the Present Invention>

Next, the configuration of the learning device according to the embodiment of the present invention will be described. As shown in FIG. 1, the learning device 100 according to the embodiment of the present invention can be configured as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the learning processing routine described later. Functionally, the learning device 100 includes a calculation unit 20, as shown in FIG. 1.

The calculation unit 20 includes a lyrics data DB 30, a lyric keyword extraction unit 32, a per-lyrics keyword list 34, a summary text data DB 36, a summary text keyword extraction unit 38, a per-summary-text keyword list 40, a lyric topic model creation unit 42, a lyric topic model 44, a lyric topic extraction unit 46, a per-lyrics topic list 48, a summary text topic model creation unit 50, a summary text topic model 52, a summary text topic extraction unit 54, a per-summary-text topic list 56, lyrics/summary text pair correct data 58, an incorrect data creation unit 60, lyrics/summary text pair incorrect data 62, a ranker learning unit 64, and a ranking model storage unit 66.

The lyrics data DB 30 stores the lyrics data of a plurality of songs.

As described below, the lyric keyword extraction unit 32 extracts lyric keywords from each piece of lyrics data stored in the lyrics data DB 30, creates a keyword list for each set of lyrics, and saves the lists as the per-lyrics keyword list 34.

Specifically, the lyric keyword extraction unit 32 first performs morphological analysis on each set of lyrics in the lyrics data.

For example, suppose the lyrics are as follows.

私はあなたに会いたい (I want to see you)
今すぐにでも会いたいの (I want to see you right now)
・・・

The above lyrics are morphologically analyzed as shown in FIG. 2. The morphological analyzer used here is JTAG, developed by NTT (R).

In FIG. 2, each row represents one word, and the columns show, from left to right, the surface form, part of speech, standard form, base form, reading, and semantic attributes. The semantic attributes consist of three fields: semantic attributes for nouns, semantic attributes for proper nouns, and semantic attributes for predicates. A semantic attribute is a number that represents a meaning.

The morphological analysis result above shows that 「あなた」 ("you") has the semantic attributes 15 and 2651. Details of the semantic attributes are given in Non-Patent Document 2 below.

Non-Patent Document 2: Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Oyama, Yoshihiko Hayashi (1997), Nihongo Goi-Taikei (日本語語彙大系), Iwanami Shoten.

Next, the lyric keyword extraction unit 32 uses the result of the morphological analysis to extract predetermined words as keywords. Specifically, it extracts words whose part of speech is a noun, a verb stem, or an adjective stem and that are not suffixes. From the morphological analysis result above, the words shown in FIG. 3 are extracted.

Here, for morphemes whose part of speech is a verb stem or an adjective stem, the base form is extracted. For morphemes whose part of speech is a noun, the standard form is extracted.

As a result of the above procedure, the following keyword list is created for the lyrics in question.

[私 (I), あなた (you), 会う (meet), 今 (now), 会う (meet), ...]
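The extraction rule described above can be sketched as a filter over morphological analysis results. The `Morpheme` record, the part-of-speech labels, and the hand-written segmentation of the first example line are assumptions made for illustration; they do not reproduce the actual output format or tag set of the analyzer.

```python
from dataclasses import dataclass

# Part-of-speech labels are hypothetical stand-ins for the analyzer's own tags.
CONTENT_POS = {"noun", "verb_stem", "adjective_stem"}


@dataclass
class Morpheme:
    surface: str            # surface form as it appears in the lyrics
    pos: str                # part of speech
    standard: str           # standard form
    base: str               # base form
    is_suffix: bool = False


def extract_keywords(morphemes):
    """Keep nouns, verb stems, and adjective stems that are not suffixes."""
    keywords = []
    for m in morphemes:
        if m.pos not in CONTENT_POS or m.is_suffix:
            continue
        # Nouns contribute their standard form; verb and adjective stems their base form.
        keywords.append(m.standard if m.pos == "noun" else m.base)
    return keywords


# Illustrative, hand-written analysis of 「私はあなたに会いたい」.
line = [
    Morpheme("私", "noun", "私", "私"),
    Morpheme("は", "particle", "は", "は"),
    Morpheme("あなた", "noun", "あなた", "あなた"),
    Morpheme("に", "particle", "に", "に"),
    Morpheme("会い", "verb_stem", "会う", "会う"),
    Morpheme("たい", "auxiliary", "たい", "たい"),
]
keywords = extract_keywords(line)
```

A list is used rather than a set so that repeated words are kept, as with 「会う」 in the keyword list above.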

This keyword extraction process is applied to all the lyrics in the lyrics data, a keyword list is created for each set of lyrics, and the lists are saved as the per-lyrics keyword list 34.

概要テキストデータＤＢ３６には、映像に付与された概要テキストが格納されている。概要テキストとは、映像の説明であったり、映像の台本であったり、映像中の会話の情報であったり、テキストで表されるものであれば何でもよい。本実施例では、映像の説明が書かれたテキストを概要テキストと呼ぶ。 The summary text data DB 36 stores the summary text attached to each video. The summary text may be a description of the video, a script of the video, the dialogue in the video, or anything else that can be expressed as text. In this embodiment, text describing the video is referred to as summary text.

概要テキスト用キーワード抽出部３８は、以下に説明するように、概要テキストデータＤＢ３６に格納されている概要テキストデータの各々から、概要テキスト用キーワードを抽出し、概要テキストごとにキーワードリストを作成して、各概要テキストのキーワードリスト４０として保存する。各概要テキストのキーワードリスト４０は、概要テキストにおいて説明されている映像中の状況を表すキーワードのリストである。 As described below, the summary text keyword extraction unit 38 extracts summary text keywords from each piece of summary text data stored in the summary text data DB 36, creates a keyword list for each summary text, and stores the lists as the keyword list 40 of each summary text. The keyword list 40 of each summary text is a list of keywords representing the situation in the video described by the summary text.

概要テキスト用キーワードには、具体的には以下に列挙する６種類のキーワードがある。 Specifically, the summary text keywords include the following six types of keywords.

・人物キーワード：映像の人物を表す単語や表現 Person keyword: words and expressions representing a person in the video
・場所キーワード：映像中の場所を表す単語や表現 Place keyword: words and expressions representing a place in the video
・季節キーワード：映像中の季節を表す単語や表現 Season keyword: words and expressions representing a season in the video
・イベントキーワード：映像中のイベントを表す単語や表現 Event keyword: words and expressions representing an event in the video
・行動キーワード：映像中の行動や動作を表す単語や表現 Behavior keyword: words and expressions representing a behavior or action in the video
・感情キーワード：映像中の感情を表す単語や表現 Emotion keyword: words and expressions representing an emotion in the video

なお、アプリケーション依存で、これら以外のキーワードを定義してもよい。ここでは、映像を検索するのに重要と考えられるキーワードを定義している。 Other keywords may be defined depending on the application. Here, keywords that are considered important for video retrieval are defined.

概要テキスト用キーワード抽出部３８では、これらのキーワードを抽出するために、日本語語彙大系（図示省略）、感情語抽出器（図示省略）、評価表現抽出器（図示省略）を用いる。 The summary text keyword extracting unit 38 uses a Japanese vocabulary system (not shown), an emotion word extractor (not shown), and an evaluation expression extractor (not shown) to extract these keywords.

日本語語彙大系は、上記の状況のうち、人物キーワード、場所キーワード、季節キーワード、イベントキーワードを表す単語を抽出するために利用する。日本語語彙大系には、名詞の意味属性が階層的に整理されており、各意味属性には、上位の意味属性と、下位の意味属性がある。 The Japanese vocabulary system is used to extract words representing person keywords, place keywords, seasonal keywords, and event keywords from the above situation. In the Japanese vocabulary system, semantic attributes of nouns are arranged hierarchically, and each semantic attribute has an upper semantic attribute and a lower semantic attribute.

例えば、「場所」の意味属性の上位には「具体」があり、下位には「施設」、「地域」、「自然」がある。この意味属性の階層情報を利用して、以下のように、各状況を表すと考えられるキーワードを列挙した。 For example, the semantic attribute “place” has “concrete” above it and “facility”, “region”, and “nature” below it. Using this hierarchy of semantic attributes, the keywords considered to represent each situation were enumerated as follows.

・人物キーワード：「人」、「衣」、「衣料」、および、これらの下位の意味属性に対応する単語 Person keywords: words corresponding to “person”, “clothes”, “clothing”, and the semantic attributes below them
・場所キーワード：「場所」、「建造物」、「乗り物」、「仕事場」、および、これらの下位の意味属性に対応する単語 Place keywords: words corresponding to “place”, “building”, “vehicle”, “workplace”, and the semantic attributes below them
・季節キーワード：「季節」、および、これらの下位の意味属性に対応する単語 Season keywords: words corresponding to “season” and the semantic attributes below it
・イベントキーワード：「式・行事等」、「生活」、および、これらの下位の意味属性に対応する単語 Event keywords: words corresponding to “ceremonies, events, etc.”, “daily life”, and the semantic attributes below them

ここで、列挙されたキーワードにマッチした単語が概要テキストにあれば、それらは、人物、場所、季節、もしくは、イベントキーワードとして抽出される。 Here, if the summary text contains words that match the enumerated keywords, they are extracted as person, place, season, or event keywords.

例えば、以下の概要テキストを考える。 For example, consider the following summary text:

夏になる
少年が公園に行く
It becomes summer.
A boy goes to the park.

上記の概要テキストに対し形態素解析を行った結果を図４に示す。先に述べたように、最後のカラムには意味属性が記述されている。 FIG. 4 shows the result of the morphological analysis performed on the above summary text. As described above, semantic attributes are described in the last column.

ここで、「夏」は、意味属性が2674（夏）であり、2672（季節）の下位属性であるため、季節キーワードとして抽出される。現状、単語が複数の意味属性を持つ場合は、先頭の意味属性を利用して抽出を行うが、すべてを用いても良い。 Here, “summer” has a semantic attribute of 2674 (summer) and is a subordinate attribute of 2672 (season), and thus is extracted as a seasonal keyword. Currently, when a word has a plurality of semantic attributes, extraction is performed using the first semantic attribute, but all may be used.

なお、単語を抽出する際は、単語の標準形（３列目）を抽出する。標準形を用いることで、表記の僅かな違いを吸収してキーワードを抽出することができる。 When extracting a word, the standard form (third column) of the word is extracted. By using the standard form, keywords can be extracted while absorbing slight differences in notation.

上記の概要テキストに対しては、以下の単語が抽出される。 For the above summary text, the following words are extracted:

人物キーワード「少年」
場所キーワード「公園」
季節キーワード「夏」
イベントキーワードなし
Person keyword: "boy"
Place keyword: "park"
Season keyword: "summer"
Event keyword: none
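The hierarchy lookup described above can be sketched as follows. Only the link from 2674 (summer) to 2672 (season) is taken from the text; every other identifier in `PARENT` and `CATEGORY_ROOTS` is a made-up string placeholder rather than a real semantic attribute number, and the function names are hypothetical. As in the embodiment, only the first semantic attribute of each word is consulted.

```python
# Child -> parent links of a toy semantic-attribute hierarchy.
# 2674 -> 2672 comes from the text; the string IDs are invented placeholders.
PARENT = {
    2674: 2672,          # summer -> season
    "boy": "person",     # placeholder link
    "park": "place",     # placeholder link
}

# Root attributes of each keyword category (placeholders except 2672).
CATEGORY_ROOTS = {
    "person": {"person"},
    "place": {"place"},
    "season": {2672},
    "event": {"event"},
}


def category_of(attr):
    """Walk up the hierarchy until an attribute that roots a category is found."""
    seen = set()
    while attr is not None and attr not in seen:
        for category, roots in CATEGORY_ROOTS.items():
            if attr in roots:
                return category
        seen.add(attr)            # guards against accidental cycles
        attr = PARENT.get(attr)   # None once the top of the hierarchy is reached
    return None


def extract_situation_keywords(words):
    """words: list of (standard_form, [semantic attributes]) for one summary text."""
    result = {category: [] for category in CATEGORY_ROOTS}
    for standard, attrs in words:
        if not attrs:
            continue
        category = category_of(attrs[0])  # first semantic attribute only
        if category is not None:
            result[category].append(standard)
    return result


# Toy analysis of the two-sentence summary text used in the example.
summary_words = [("夏", [2674]), ("なる", []), ("少年", ["boy"]),
                 ("公園", ["park"]), ("行く", [])]
print(extract_situation_keywords(summary_words))
```

Using all semantic attributes of a word instead of the first one, which the text also allows, would only require looping over `attrs`.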

次に、感情語抽出器について説明する。感情語抽出器は、感情キーワードを抽出するために用いる。感情語抽出器としては、NTT(R)が開発したリッチインデクサという言語処理ツールを用いる。リッチインデクサには、予め決められた所定のキーワードリストを元に、感情に関わるキーワードを抽出する機能がある。この、リッチインデクサの機能を用いて、感情キーワードとして、例えば、楽しい、驚く、悲しい、幸せ、安心、心配といったキーワードを抽出する。 Next, the emotion word extractor will be described. The emotion word extractor is used to extract emotion keywords. As the emotion word extractor, a language processing tool called rich indexer developed by NTT (R) is used. The rich indexer has a function of extracting a keyword related to emotion based on a predetermined keyword list. Using this rich indexer function, for example, keywords such as fun, surprise, sad, happiness, relief, and worry are extracted as emotion keywords.

次に、評価表現抽出器について説明する。評価表現抽出器は、行動・感情キーワードを抽出するために用いる。行動にも様々あるが、ポジティブやネガティブといった極性に関わる行動を映像に関する重要な行動・感情と捉え、評価表現の中で、行動・感情に関するものを抽出する。評価表現とは、評価・感情に関わる言語表現を指す。ここでは、行動・感情に関する評価表現のリストを作成し、それらに合致するものを抽出することで、行動・感情キーワードとする。例えば、ほほえむ、ゆっくり、爽やか、切ない、慌てるといったキーワードを抽出する。 Next, the evaluation expression extractor will be described. The evaluation expression extractor is used to extract behavior/emotion keywords. There are many kinds of behavior, but behaviors related to polarity, such as positive or negative, are regarded as the important behaviors and emotions for a video, and the evaluation expressions that concern behavior or emotion are extracted. An evaluation expression is a linguistic expression related to evaluation or emotion. Here, a list of evaluation expressions concerning behavior and emotion is prepared, and expressions that match the list are extracted as behavior/emotion keywords. For example, keywords such as smile, slowly, refreshing, heartrending, and panic are extracted.

概要テキスト用キーワード抽出部３８では、上記の日本語語彙大系、感情語抽出器、及び評価表現抽出器を用いて、概要テキストデータＤＢ３６に格納されている全ての概要テキストデータに対して概要テキスト用キーワードの抽出を行い、概要テキストごとにキーワードのリストを作成し、各概要テキストのキーワードリスト４０として保存する。 Using the Japanese vocabulary system, the emotion word extractor, and the evaluation expression extractor described above, the summary text keyword extraction unit 38 extracts summary text keywords from all the summary text data stored in the summary text data DB 36, creates a keyword list for each summary text, and stores the lists as the keyword list 40 of each summary text.

歌詞用トピックモデル作成部４２は、歌詞データＤＢ３０に格納されている歌詞データの各々から歌詞用トピックモデル４４を作成する。 The lyric topic model creation unit 42 creates a lyric topic model 44 from each of the lyric data stored in the lyric data DB 30.

トピックモデルとは、文書が複数の潜在トピックから生成されると仮定したモデルであり、単語の表層だけではない、文書の背後にある構造を分析するためによく用いられるものである。トピックモデルについては、以下の非特許文献３が詳しい。 A topic model is a model that assumes a document is generated from a plurality of latent topics, and it is often used to analyze the structure behind a document rather than only the surface forms of its words. Topic models are described in detail in Non-Patent Document 3 below.

非特許文献３：トピックモデル,岩田具治(著),講談社,2015. Non-Patent Document 3: Topic Model, Tomoharu Iwata (Author), Kodansha, 2015.

歌詞用トピックモデル作成部４２では、具体的には、Latent Dirichlet Allocation（LDA）というアルゴリズムを用いて、歌詞データからトピックモデルを作成する。これは、トピックモデルを作成するのに一般的なアルゴリズムである。トピックモデルの構築には、各文書（すなわち、歌詞データのそれぞれ）を単語集合で表す必要があるが、ここでは、形態素解析の結果得られるすべての単語を利用した。トピック数は300とした。トピックモデル作成のツールには、gensimと呼ばれるライブラリを用いた。LDAについては、フリーソフトも多いため、それらを用いてもよい。 Specifically, the lyrics topic model creation unit 42 creates a topic model from the lyrics data using an algorithm called Latent Dirichlet Allocation (LDA), which is a common algorithm for creating topic models. To construct a topic model, each document (that is, each piece of lyrics data) must be represented as a set of words; here, all words obtained from the morphological analysis were used. The number of topics was set to 300. A library called gensim was used as the tool for creating the topic model. Many free software implementations of LDA are available, and any of them may be used instead.

歌詞用トピック抽出部４６は、歌詞データＤＢ３０に格納されている歌詞データの各々から、歌詞用トピックモデル４４に基づいて、歌詞用トピックを抽出し、各歌詞のトピックリスト４８を作成する。先に述べたように、トピックモデルでは、文書の背後に存在する潜在トピックを仮定し、それらが混ざりあって一つの文書が生成されていると考える。逆に言えば、一つの文書を、トピックモデルを用いて分析することで（これをinferenceという）、含まれている潜在トピックを調べることができる。ある文書に多く含まれている潜在トピックは、その文書の主要なトピックと考えられるので、それらを抽出する。 The lyrics topic extraction unit 46 extracts lyric topics from each piece of lyrics data stored in the lyrics data DB 30 based on the lyrics topic model 44, and creates the topic list 48 of each lyric. As described above, a topic model assumes latent topics that exist behind a document and regards a document as being generated from a mixture of them. Conversely, by analyzing a document with the topic model (this is called inference), the latent topics it contains can be examined. Latent topics that account for a large share of a document are considered to be its main topics, so they are extracted.

歌詞データについて、トピックを抽出する際には、歌詞用トピックモデル４４を利用する。歌詞用トピックモデルを用いた分析により、歌詞データ中のトピックリストを作成する。具体的には、各歌詞において、一定の割合以上含まれるトピックのみを抽出し、その歌詞のトピックリストとする。本発明の実施の形態ではこの閾値を0.1と定めた。例えば、ある歌詞のトピックの上位５個の含まれる度合い（確率値）が、図５に示すようになっていた場合、閾値が0.1以上のトピックを抽出することで、トピック85と122をこの歌詞のトピックリストとして抽出する。なお、85や122はトピックを表す番号である。 When extracting topics from the lyrics data, the lyrics topic model 44 is used. A topic list for each piece of lyrics data is created by analysis with the lyrics topic model. Specifically, for each lyric, only the topics whose proportion is at or above a certain level are extracted and used as the topic list of that lyric. In the embodiment of the present invention, this threshold is set to 0.1. For example, when the degrees of inclusion (probability values) of the top five topics of a certain lyric are as shown in FIG. 5, extracting the topics whose probability value is 0.1 or more yields topics 85 and 122 as the topic list of this lyric. Here, 85 and 122 are numbers that identify topics.
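The thresholding step can be sketched as follows, assuming the inference result for one lyric is available as a mapping from topic number to probability. The function name `extract_topics` is hypothetical, and the probability values are invented for illustration because the values of FIG. 5 are not reproduced in this text; only the topic numbers 85 and 122 and the threshold 0.1 come from the description.

```python
def extract_topics(topic_probs, threshold=0.1):
    """Return topic numbers whose probability is at least `threshold`,
    ordered from the most to the least probable."""
    ranked = sorted(topic_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, prob in ranked if prob >= threshold]


# Invented top-five distribution for one lyric (not the values of FIG. 5).
lyric_topic_probs = {85: 0.41, 122: 0.23, 7: 0.06, 201: 0.04, 16: 0.02}
print(extract_topics(lyric_topic_probs))
```

The same function would serve for summary texts, since the description uses the same threshold for both.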

概要テキスト用トピックモデル作成部５０は、歌詞用トピックモデル作成部４２と同様の処理を、概要テキストデータＤＢ３６に格納されている概要テキストデータに対して行うことで、概要テキストデータについての概要テキスト用トピックモデル５２を作成する。本実施の形態では、トピック数は50とした。 The summary text topic model creation unit 50 creates the summary text topic model 52 for the summary text data by applying the same processing as the lyrics topic model creation unit 42 to the summary text data stored in the summary text data DB 36. In the present embodiment, the number of topics is set to 50.

概要テキスト用トピック抽出部５４は、概要テキストデータＤＢ３６に格納されている概要テキストデータの各々から、概要テキスト用トピックモデル５２に基づいて、概要テキスト用トピックを抽出し、各概要テキストのトピックリスト５６を作成する。閾値は、歌詞用トピック抽出部４６と同じとした。 The summary text topic extraction unit 54 extracts summary text topics from each piece of summary text data stored in the summary text data DB 36 based on the summary text topic model 52, and creates the topic list 56 of each summary text. The threshold is the same as that used by the lyrics topic extraction unit 46.

歌詞・概要テキストペア正解データ５８は、楽曲の歌詞データと、当該楽曲の歌詞データに適した映像に付与された概要テキストデータとが正しく対応付いている正解データのペアの集合である。これらの対応付けは人手で行ったものである。 The lyrics / outline text pair correct answer data 58 is a set of correct data pairs in which the lyrics data of music and the outline text data attached to the video suitable for the lyrics data of the music are correctly associated. These associations are performed manually.

不正解データ作成部６０は、歌詞・概要テキストペア正解データ５８を用いて、楽曲の歌詞データと、当該楽曲の歌詞データに適していない映像に付与された概要テキストデータとが対応付いたペアの集合である歌詞・概要テキストペア不正解データ６２を作成する。 The incorrect answer data creating unit 60 uses the lyrics / summary text pair correct answer data 58 to create a pair of lyrics data corresponding to the song and summary text data attached to the video not suitable for the song lyrics data. The lyrics / summary text pair incorrect answer data 62 as a set is created.

本実施の形態の目的は、歌詞に合った概要テキストを検索することで、その概要テキストに紐付いた映像を検索することである。すなわち、歌詞に対して、複数の概要テキストから対応付くものと対応付かないものを分類出来ればよい。 The purpose of this embodiment is to retrieve the video associated with a summary text by searching for the summary text that matches the lyrics. In other words, for given lyrics, it suffices to classify a plurality of summary texts into those that correspond to the lyrics and those that do not.

一般に、分類問題は教師あり学習で行われる。そのためには、正解データ（正例と呼ぶ）と不正解データ（負例と呼ぶ）の両方が必要である。 In general, classification problems are solved by supervised learning, which requires both correct answer data (called positive examples) and incorrect answer data (called negative examples).

しかしながら、教師データとして、対応付けられた正例は持っているものの、負例を持っていなかったため、負例を自動生成することにした。具体的には、歌詞について、所定の概要テキストの集合からランダムに選択し、それを負例とした。ランダムに選ばれた概要テキストは対応付いていることは稀であると考えられるため、負例として利用することが可能である。このような手法は疑似負例の生成とも呼ばれ、機械学習において、よく用いられる手法である。 However, although the training data contained associated positive examples, it contained no negative examples, so negative examples were generated automatically. Specifically, for each lyric, a summary text was selected at random from a predetermined set of summary texts, and the resulting pair was used as a negative example. A randomly selected summary text only rarely corresponds to the lyrics, so it can be used as a negative example. This technique is also called pseudo-negative example generation and is commonly used in machine learning.

このように、不正解データ作成部６０は、不正解データとして、歌詞と概要テキスト（歌詞と対応付いているもの以外の概要テキスト）とをランダムに組み合わせたペアを作成する。なお、ランダムに選択する以外に、人手で対応付かないことが確認されている歌詞と概要テキストのペアを不正解データとして利用してもよい。 In this way, the incorrect answer data creation unit 60 creates, as incorrect answer data, pairs that randomly combine lyrics with a summary text other than the one associated with those lyrics. Instead of random selection, pairs of lyrics and summary text that have been manually confirmed not to correspond may be used as incorrect answer data.

本実施の形態では、正例と負例の割合は１:１に設定した。この割合は、後段のランカ学習部６４の性能に応じて、変更してもよい。 In the present embodiment, the ratio of positive examples to negative examples is set to 1: 1. This ratio may be changed according to the performance of the ranker learning unit 64 in the subsequent stage.

歌詞・概要テキストペア正解データ５８のそれぞれについて不正解データを一つずつ作成し、学習データとした。学習データには、更に、歌詞データと当該歌詞に紐付く概要テキスト、及び歌詞データと当該歌詞データに紐付かない概要テキストが含まれている。 One incorrect answer data was created for each of the lyrics / outline text pair correct answer data 58 and used as learning data. The learning data further includes lyrics data and summary text associated with the lyrics, and lyrics data and summary text not associated with the lyrics data.
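Pseudo-negative generation at a 1:1 ratio can be sketched as follows. The identifiers (`L1`, `S1`, and so on), the function name, and the fixed random seed are assumptions made for this illustration; the only behavior taken from the description is that each correct pair receives one randomly chosen summary text that is not associated with the same lyrics.

```python
import random


def make_pseudo_negatives(positive_pairs, summary_ids, rng):
    """For each (lyric, summary) positive pair, draw one summary at random
    that is not associated with that lyric, giving a 1:1 positive/negative ratio."""
    associated = {}
    for lyric, summary in positive_pairs:
        associated.setdefault(lyric, set()).add(summary)

    negatives = []
    for lyric, _ in positive_pairs:
        candidates = [s for s in summary_ids if s not in associated[lyric]]
        if not candidates:
            continue  # no unassociated summary exists for this lyric
        negatives.append((lyric, rng.choice(candidates)))
    return negatives


# Toy identifiers, invented for illustration.
positives = [("L1", "S1"), ("L2", "S2"), ("L3", "S3")]
summaries = ["S1", "S2", "S3", "S4"]
negatives = make_pseudo_negatives(positives, summaries, random.Random(0))

# Learning data: label 1 for correct pairs, 0 for pseudo-negative pairs.
learning_data = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
print(learning_data)
```

Passing the random generator in explicitly keeps the sampling reproducible, which is convenient when the ratio of positive to negative examples is later tuned.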

ランカ学習部６４は、正解データの歌詞データから抽出された歌詞用キーワード及び正解データの概要テキストデータから抽出された概要テキスト用キーワードの組み合わせを表すキーワードペア特徴量と、不正解データの歌詞データから抽出された歌詞用キーワード及び不正解データの概要テキストデータから抽出された概要テキスト用キーワードの組み合わせを表すキーワードペア特徴量と、正解データの歌詞データから抽出された歌詞用トピック及び正解データの概要テキストデータから抽出された概要テキスト用トピックの組み合わせを表すトピックペア特徴量と、不正解データの歌詞データから抽出された歌詞用トピック及び不正解データの概要テキストデータから抽出された概要テキスト用トピックの組み合わせを表すトピックペア特徴量とに基づいて、ランキングモデルを学習し、ランキングモデル記憶部６６に記憶する。上記において、正解データは歌詞・概要テキストペア正解データ５８に基づき、不正解データは歌詞・概要テキストペア不正解データ６２に基づく。 The ranker learning unit 64 learns a ranking model based on: a keyword pair feature amount representing a combination of the lyric keywords extracted from the lyrics data of the correct answer data and the summary text keywords extracted from the summary text data of the correct answer data; a keyword pair feature amount representing a combination of the lyric keywords extracted from the lyrics data of the incorrect answer data and the summary text keywords extracted from the summary text data of the incorrect answer data; a topic pair feature amount representing a combination of the lyric topics extracted from the lyrics data of the correct answer data and the summary text topics extracted from the summary text data of the correct answer data; and a topic pair feature amount representing a combination of the lyric topics extracted from the lyrics data of the incorrect answer data and the summary text topics extracted from the summary text data of the incorrect answer data. It then stores the learned ranking model in the ranking model storage unit 66. Here, the correct answer data is based on the lyrics/summary text pair correct answer data 58, and the incorrect answer data is based on the lyrics/summary text pair incorrect answer data 62.

ランカ学習部６４においては、上記の学習用データから特徴量を抽出し、この特徴量を元に評価関数を学習することで、楽曲の歌詞データに適した映像のランキングが可能なランキングモデル（ランカ）を作成する。 The ranker learning unit 64 extracts a feature amount from the learning data and learns an evaluation function based on the feature amount, whereby a ranking model (ranker) capable of ranking videos suitable for song lyrics data. ).

特徴量の抽出には、学習データのそれぞれから得られる、各歌詞のキーワードリスト３４と各概要テキストのキーワードリスト４０、及び各歌詞のトピックリスト４８と各概要テキストのトピックリスト５６を用いる。特徴量としては、２種類あり、キーワードペア特徴量とトピックペア特徴量がある。それぞれの特徴量は、正解データは歌詞・概要テキストペア正解データ５８と、不正解データは歌詞・概要テキストペア不正解データ６２とのそれぞれの全ての組み合わせについて抽出する。 The feature amounts are extracted using the keyword list 34 of each lyric and the keyword list 40 of each summary text, and the topic list 48 of each lyric and the topic list 56 of each summary text, obtained for each item of the learning data. There are two types of feature amounts: keyword pair feature amounts and topic pair feature amounts. Each is extracted for every pair in the lyrics/summary text pair correct answer data 58 (the correct answer data) and in the lyrics/summary text pair incorrect answer data 62 (the incorrect answer data).

キーワードペア特徴量とは、歌詞用キーワード抽出部３２で作成した、歌詞データの各歌詞のキーワードリスト３４に含まれる単語と、概要テキスト用キーワード抽出部３８で作成した、概要テキストの各概要テキストのキーワードリスト４０に含まれる単語をもとに、そのすべての組み合わせを特徴量にしたものである。例えば、歌詞データと概要テキストとのキーワードリストのそれぞれが、以下のように構成されているとする。 The keyword pair feature amounts are the words included in the keyword list 34 of each lyric of the lyrics data created by the lyric keyword extracting unit 32 and the summary texts of the summary text created by the summary text keyword extracting unit 38. Based on the words included in the keyword list 40, all the combinations are used as feature amounts. For example, it is assumed that each of the keyword lists of lyrics data and summary text is configured as follows.

歌詞のキーワードリスト:[君(4回),会う(3回) ,ドキドキ(1回) ,...]
概要テキストのキーワードリスト:[カジュアル(2回),二人(2回),楽しい(1回),...]
Keyword list of the lyrics: [you (4 times), meet (3 times), heart-pounding (1 time), ...]
Keyword list of the summary text: [casual (2 times), two people (2 times), fun (1 time), ...]

この場合に、上記の歌詞のキーワードリスト及び概要テキストのキーワードリストの組み合わせから、キーワードペア特徴量として、"君-カジュアル","君-二人","君-楽しい","会う-カジュアル","会う-二人","会う-楽しい","ドキドキ-カジュアル","ドキドキ-二人","ドキドキ-楽しい"といった特徴量を抽出する。このようにして、歌詞のキーワードリスト及び概要テキストのキーワードリストの全ての組み合わせについてキーワードペア特徴量を抽出する。 In this case, from the combination of the keyword list of the lyrics and the keyword list of the summary text above, feature amounts such as "you-casual", "you-two people", "you-fun", "meet-casual", "meet-two people", "meet-fun", "heart-pounding-casual", "heart-pounding-two people", and "heart-pounding-fun" are extracted as keyword pair feature amounts. In this manner, keyword pair feature amounts are extracted for all combinations of a lyric keyword list and a summary text keyword list.

キーワードペア特徴量の値としては、当該キーワードペア特徴量を持つ組み合わせ、すなわち歌詞のキーワードリスト及び概要テキストのキーワードリストの組み合わせにおいて当該キーワードペア特徴量が出現したか否かを２値として利用する。なお、二値ではなく、特徴量の値として組み合わせにおける頻度情報を利用してもよい。 As the value of the keyword pair feature value, whether or not the keyword pair feature value appears in a combination having the keyword pair feature value, that is, a combination of the keyword list of lyrics and the keyword list of summary text is used as a binary value. Note that the frequency information in the combination may be used as the feature value instead of the binary value.

学習に使う素性は、学習データの全ての正例において、一定数以上出現する特徴量のみとしてもよい。そうすることで、特徴量の空間が小さくなり学習コストが低くなる。本発明の実施の形態では出現数の閾値を５に設定した。すなわち、学習データの組み合わせに５個以上含まれる特徴量が学習に使用される。 The features used for learning may be limited to feature amounts that appear at least a certain number of times across all positive examples of the learning data. This shrinks the feature space and lowers the learning cost. In the embodiment of the present invention, the threshold on the number of appearances is set to 5; that is, only feature amounts that appear five or more times in the pairs of the learning data are used for learning.
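Pair feature extraction and the frequency cut-off can be sketched as follows. The function names are hypothetical, and one point is an assumption: a feature is counted once per positive example that contains it, which is one plausible reading of "appears five or more times". Representing the features of a pair as a set corresponds to the binary (present or absent) feature value described above; the same `pair_features` function applies unchanged to topic lists.

```python
from collections import Counter
from itertools import product


def pair_features(lyric_items, summary_items):
    """All combinations of a lyric-side item and a summary-side item,
    as a set of 'a-b' strings (binary features: present or absent)."""
    return {f"{a}-{b}" for a, b in product(lyric_items, summary_items)}


def select_features(positive_examples, min_count=5):
    """Keep only features found in at least `min_count` positive examples.
    Assumption: each example counts a feature at most once."""
    counts = Counter()
    for lyric_items, summary_items in positive_examples:
        counts.update(pair_features(lyric_items, summary_items))
    return {feature for feature, count in counts.items() if count >= min_count}


# Keyword lists from the example in the text (occurrence counts omitted).
keyword_pairs = pair_features(["君", "会う", "ドキドキ"], ["カジュアル", "二人", "楽しい"])
print(sorted(keyword_pairs))

# Topic lists from the topic pair example in the text.
topic_pairs = pair_features([85, 122], [33, 2, 27])
print(sorted(topic_pairs))
```

To use frequency information instead of binary values, the set could be replaced by a `Counter` built from the occurrence counts of each keyword.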

トピックペア特徴量は、歌詞用トピック抽出部４６で作成した、各歌詞のトピックリスト４８と、概要テキスト用トピック抽出部５４で作成した、各概要テキストのトピックリスト５６に含まれるトピックをもとに、キーワードペア特徴量と同様に、その組み合わせすべてを特徴量にしたものである。例えば、歌詞データと概要テキストとのトピックリストがそれぞれ以下のように構成されているとする。 The topic pair feature amount is based on the topics included in the topic list 48 for each lyrics created by the topic extraction unit 46 for lyrics and the topic list 56 for each summary text created by the topic extraction unit 54 for summary text. Like the keyword pair feature quantity, all the combinations are made feature quantities. For example, it is assumed that the topic lists of the lyrics data and the summary text are configured as follows.

歌詞のトピックリスト:[85,122]
概要テキストのトピックリスト:[33,2,27]
Topic list of the lyrics: [85,122]
Topic list of the summary text: [33,2,27]

この場合に、特徴量として、"85-33","85-2","85-27","122-33","122-2","122-27"というトピックペア特徴量が抽出される。 In this case, topic pair feature values “85-33”, “85-2”, “85-27”, “122-33”, “122-2”, “122-27” are extracted as feature values. Is done.

ここでも、学習データの全ての正例において、一定数以上出現する特徴量のみを用いてもよいが、本発明の実施の形態では学習データの正例における全てのトピックペア特徴量を利用している。 Again, in all positive examples of learning data, only feature quantities that appear in a certain number or more may be used, but in the embodiment of the present invention, all topic pair feature quantities in the positive examples of learning data are used. Yes.

このようにして、学習データにおける正例、負例のそれぞれについてキーワードペア特徴量及びトピックペア特徴量を抽出し、正例と負例を分類することのできる評価関数を機械学習によって学習する。これは単純な二値分類問題であるので、分類問題によく用いられるアルゴリズムを用いればよい。ここでは、ロジスティック回帰を利用する。ほかのアルゴリズムとして、サポートベクトルマシン（SVM）を用いてもよい。なお、二値分類問題のモデルは、一般に分類対象の事例について正例らしさ（もしくは負例らしさ）の信頼度を出力できるため、その数値を用いて、複数の分類対象をランキングすることができる。本発明の実施の形態でも、ロジスティック回帰で得られた分類モデルを用いてランキングを行う。なお、ランキングSVMのようにランキングに特化した機械学習のアルゴリズムを用いて学習を行ってもよい。その場合は、正例を負例よりも上位にランキングするように評価関数を学習すればよい。 In this way, keyword pair feature values and topic pair feature values are extracted for each of positive examples and negative examples in the learning data, and an evaluation function that can classify positive examples and negative examples is learned by machine learning. Since this is a simple binary classification problem, an algorithm often used for the classification problem may be used. Here, logistic regression is used. As another algorithm, a support vector machine (SVM) may be used. In addition, since the binary classification problem model can generally output the reliability of the likelihood of being positive (or the likelihood of being negative) for the case of the classification target, it is possible to rank a plurality of classification targets by using the numerical values. In the embodiment of the present invention, ranking is performed using a classification model obtained by logistic regression. Note that learning may be performed using a machine learning algorithm specialized for ranking, such as ranking SVM. In that case, the evaluation function may be learned so that positive examples are ranked higher than negative examples.
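How a binary classifier doubles as a ranker can be sketched as follows. This is a from-scratch logistic regression trained by plain stochastic gradient descent over binary pair features, written only to illustrate the idea; it is not the training procedure of the embodiment, which does not state its optimizer or library. The function names, hyperparameters, and toy feature sets are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(examples, epochs=200, lr=0.5):
    """examples: list of (set of feature names, label) with label 1 or 0.
    Returns (weights, bias) learned by stochastic gradient descent on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(weights.get(f, 0.0) for f in features)
            error = sigmoid(z) - label       # gradient of the log loss w.r.t. z
            bias -= lr * error
            for f in features:               # binary features: value is 1 when present
                weights[f] = weights.get(f, 0.0) - lr * error
    return weights, bias


def score(model, features):
    """Probability of being a positive example; unseen features are ignored."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features))


# Toy learning data, invented for illustration: pair features with labels.
training = [
    ({"君-二人", "会う-楽しい", "85-33"}, 1),
    ({"君-工場", "会う-会議", "85-2"}, 0),
]
model = train_logistic_regression(training)

matching = score(model, {"君-二人", "85-33"})
non_matching = score(model, {"君-工場", "85-2"})
print(matching > non_matching)
```

Because `score` returns a probability, candidates can be ordered by it directly, which is the property the description relies on when it reuses the classifier for ranking.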

ランキングモデル記憶部６６には、ランカ学習部６４で学習された、楽曲の歌詞データに適した映像を検索するためのランキングモデルが格納される。 The ranking model storage unit 66 stores a ranking model for searching for videos suitable for the lyrics data of the music learned by the ranker learning unit 64.

＜本発明の実施の形態に係る映像検索装置の構成＞ <Configuration of Video Retrieval Device According to Embodiment of the Present Invention>

次に、本発明の実施の形態に係る映像検索装置の構成について説明する。図６に示すように、本発明の実施の形態に係る映像検索装置２００は、ＣＰＵと、ＲＡＭと、後述する映像検索処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。この映像検索装置２００は、機能的には図６に示すように入力部２１０と、演算部２２０と、出力部２７０とを備えている。 Next, the configuration of the video search device according to the embodiment of the present invention will be described. As shown in FIG. 6, the video search device 200 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores a program for executing a video search processing routine described later and various data. Functionally, as shown in FIG. 6, the video search device 200 includes an input unit 210, a calculation unit 220, and an output unit 270.

入力部２１０は、楽曲の歌詞データを受け付ける。歌詞の他に、歌手名、作曲者名、楽曲ジャンルなどの情報も受け付けてもよい。歌詞の情報は演算部２２０へと出力され、解析が行われる。 The input unit 210 receives lyric data of music. In addition to the lyrics, information such as a singer name, a composer name, and a music genre may be received. The lyrics information is output to the calculation unit 220 for analysis.

演算部２２０は、歌詞用キーワード抽出部２３２と、各概要テキストのキーワードリスト２４０と、歌詞用トピックモデル２４４と、歌詞用トピック抽出部２４６と、各概要テキストのトピックリスト２５６と、映像検索部２６４と、ランキングモデル記憶部２６６とを含んで構成されている。 The calculation unit 220 includes a keyword extraction unit 232 for lyrics, a keyword list 240 for each summary text, a topic model for lyrics 244, a topic extraction unit for lyrics 246, a topic list 256 for each summary text, and a video search unit 264. And a ranking model storage unit 266.

歌詞用キーワード抽出部２３２は、入力部２１０で受け付けた楽曲の歌詞データから、歌詞用キーワードを抽出する。具体的な処理は、上記歌詞用キーワード抽出部３２と同様の処理を行えばよい。 The lyric keyword extracting unit 232 extracts lyric keywords from the lyric data of the music received by the input unit 210. The specific process may be performed in the same manner as the keyword extraction unit 32 for lyrics.

各概要テキストのキーワードリスト２４０には、上記各概要テキストのキーワードリスト４０と同様のものが格納されている。 The keyword list 240 for each summary text stores the same items as the keyword list 40 for each summary text.

歌詞用トピックモデル２４４には、上記歌詞用トピックモデル４４と同様のものが格納されている。 The lyrics topic model 244 stores the same as the lyrics topic model 44.

歌詞用トピック抽出部２４６は、入力部２１０で受け付けた楽曲の歌詞データから、歌詞用トピックモデル２４４に基づいて、歌詞用トピックを抽出する。具体的な処理は、上記歌詞用トピック抽出部４６と同様の処理を行えばよい。 The lyrics topic extraction unit 246 extracts the lyrics topic from the lyrics data of the music received by the input unit 210 based on the lyrics topic model 244. The specific process may be the same process as the topic extraction unit 46 for lyrics.

ランキングモデル記憶部２６６には、上記ランキングモデル記憶部６６と同様のものが格納されている。 The ranking model storage unit 266 stores the same items as the ranking model storage unit 66.

映像検索部２６４は、映像の各々に対する、歌詞用キーワード抽出部２３２によって抽出された歌詞用キーワード及び、各概要のキーワードリスト２４０において映像に付与された概要テキストデータから抽出された概要テキスト用キーワードの組み合わせを表すキーワードペア特徴量と、歌詞用トピック抽出部２４６によって抽出された歌詞用トピック、及び各概要テキストのトピックリスト２５６において映像に付与された概要テキストデータから抽出された概要テキスト用トピックの組み合わせを表すトピックペア特徴量と、ランキングモデル記憶部２６６に格納されているランキングモデルとに基づいて、入力された楽曲の歌詞データに適した映像を検索する。 The video search unit 264 searches for a video suitable for the input lyrics data of a piece of music based on the following, obtained for each video: a keyword pair feature amount representing a combination of the lyric keywords extracted by the lyric keyword extraction unit 232 and the summary text keywords, held in the keyword list 240 of each summary text, that were extracted from the summary text data attached to the video; a topic pair feature amount representing a combination of the lyric topics extracted by the lyrics topic extraction unit 246 and the summary text topics, held in the topic list 256 of each summary text, that were extracted from the summary text data attached to the video; and the ranking model stored in the ranking model storage unit 266.

映像検索部２６４は、まず、歌詞用キーワード抽出部２３２で抽出した歌詞用キーワードのリストと、検索対象である概要テキストのキーワードリスト２４０とのペアを作り、前述のキーワードペア特徴量を抽出する。また、歌詞用トピック抽出部２４６で抽出した歌詞用トピックのリストと、検索対象である概要テキストのトピックリスト２５６とのペアを作り、前述のトピックペア特徴量を抽出する。そして、キーワードペア特徴量及びトピックペア特徴量のそれぞれの特徴量について、ランキングモデル記憶部２６６に格納されているランキングモデル（本発明の実施の形態においてはロジスティック回帰のモデル）を適用することで、正例らしさ（すなわち、対応付いているかどうか）のスコアを求める。このスコアに基づいて、概要テキストをランキングすることにより、最も対応付いていると考えられる概要テキストが取得でき、また、その結果概要テキストに紐付いている映像を出力することができる。 First, the video search unit 264 pairs the list of lyric keywords extracted by the lyric keyword extraction unit 232 with the keyword list 240 of each summary text to be searched, and extracts the keyword pair feature amounts described above. It likewise pairs the list of lyric topics extracted by the lyrics topic extraction unit 246 with the topic list 256 of each summary text to be searched, and extracts the topic pair feature amounts described above. It then applies the ranking model stored in the ranking model storage unit 266 (a logistic regression model in the embodiment of the present invention) to the keyword pair feature amounts and the topic pair feature amounts to obtain a score indicating how likely the pair is to be a positive example (that is, whether the lyrics and the summary text correspond). By ranking the summary texts based on this score, the summary text considered to correspond best can be obtained, and as a result the video associated with that summary text can be output.

例えば、ある歌詞データについて、５つの映像に付与された概要テキストのランキングを行った結果を図７に示す。 For example, FIG. 7 shows the result of ranking the summary texts given to five videos for certain lyrics data.

図７の結果は、概要テキストの番号と出力されたスコア（ロジスティック回帰を用いているため正例らしさを表す確率値）を表しており、この値が高いものほど、歌詞とその概要テキストが対応付いていることを表している。この例では２番目の概要テキストが、0.909のスコアと高く、入力された歌詞データと最も対応付いていると判定されている。 The result of FIG. 7 shows the number of the summary text and the output score (probability value indicating the likelihood of being a positive example because logistic regression is used). The higher this value, the more the lyrics correspond to the summary text Indicates that it is attached. In this example, the second summary text has a high score of 0.909, and it is determined that it corresponds most with the input lyrics data.

出力部２７０は、映像検索部２６４が出力した概要テキストのランキング情報から、上位Ｎ個の概要テキストに紐付いた映像をＮ個出力する。Ｎはアプリケーションに応じて設定すればよい。本発明の実施の形態では、Ｎは５としているが、最も対応付いている映像のみを検索したい場合Ｎを１とすればよい。 The output unit 270 outputs N videos associated with the top N summary texts based on the summary text ranking information output by the video search unit 264. N may be set according to the application. In the embodiment of the present invention, N is set to 5. However, if it is desired to search only the most associated video, N may be set to 1.
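The final selection step can be sketched as follows. The function name, the `video_of` mapping, and the video identifiers are hypothetical; of the scores, only the value 0.909 for the second summary text is taken from the description of FIG. 7, and the remaining values are invented placeholders because the figure is not reproduced in this text.

```python
def top_n_videos(scores, video_of, n=5):
    """Rank summary texts by score (highest first) and return the videos
    associated with the top n summary texts."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [video_of[summary_id] for summary_id, _ in ranked[:n]]


# Summary text number -> score. Only 0.909 for summary text 2 comes from the text.
scores = {1: 0.120, 2: 0.909, 3: 0.400, 4: 0.050, 5: 0.330}

# Hypothetical mapping from summary text number to its video.
video_of = {i: f"video_{i}" for i in scores}

print(top_n_videos(scores, video_of, n=5))  # all five, best first
print(top_n_videos(scores, video_of, n=1))  # only the best match
```

Setting `n=1` corresponds to the case in which only the best-matching video is wanted.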

＜本発明の実施の形態に係る学習装置の作用＞ <Operation of Learning Device According to Embodiment of the Present Invention>

次に、本発明の実施の形態に係る学習装置１００の作用について説明する。学習装置１００は、図８に示す学習処理ルーチンを実行する。 Next, the operation of the learning device 100 according to the embodiment of the present invention will be described. The learning device 100 executes a learning process routine shown in FIG.

まず、ステップＳ１００では、歌詞データＤＢ３０に格納されている歌詞データの各々から、歌詞用キーワードを抽出し、歌詞ごとのキーワードリストを作成して、各歌詞のキーワードリスト３４として保存する。 First, in step S100, a lyric keyword is extracted from each of the lyric data stored in the lyric data DB 30, a keyword list for each lyric is created, and saved as a keyword list 34 for each lyric.

次に、ステップＳ１０２では、概要テキストデータＤＢ３６に格納されている概要テキストデータの各々から、概要テキスト用キーワードを抽出し、概要テキストごとにキーワードリストを作成して、各概要テキストのキーワードリスト４０として保存する。 Next, in step S102, a summary text keyword is extracted from each of the summary text data stored in the summary text data DB 36, a keyword list is created for each summary text, and the keyword list 40 of each summary text is created. save.

ステップＳ１０４では、歌詞データＤＢ３０に格納されている歌詞データの各々から歌詞用トピックモデル４４を作成する。 In step S104, a lyrics topic model 44 is created from each of the lyrics data stored in the lyrics data DB 30.

ステップＳ１０６では、歌詞データＤＢ３０に格納されている歌詞データの各々から、歌詞用トピックモデル４４に基づいて、歌詞用トピックを抽出し、各歌詞のトピックリスト４８を作成する。 In step S106, the lyrics topic is extracted from each of the lyrics data stored in the lyrics data DB 30 based on the lyrics topic model 44, and a topic list 48 of each lyrics is created.

ステップＳ１０８では、ステップＳ１０４と同様の処理を、概要テキストデータＤＢ３６に格納されている概要テキストデータに対して行う事で、概要テキストデータについての概要テキスト用トピックモデル５２を作成する。 In step S108, the summary text topic model 52 for the summary text data is created by performing the same processing as in step S104 on the summary text data stored in the summary text data DB 36.

ステップＳ１１０では、概要テキストデータＤＢ３６に格納されている概要テキストデータの各々から、概要テキスト用トピックモデル５２に基づいて、概要テキスト用トピックを抽出し、各概要テキストのトピックリスト５６を作成する。 In step S110, a summary text topic is extracted from each summary text data stored in the summary text data DB 36 based on the summary text topic model 52, and a topic list 56 of each summary text is created.

In step S112, the lyrics/summary text pair correct answer data 58 is used to create the lyrics/summary text pair incorrect answer data 62, which is a set of pairs each associating the lyrics data of a song with summary text data attached to a video that is not suitable for the lyrics data of that song.
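One way to picture step S112 is the following Python sketch. This passage does not say how the unsuitable summary texts are chosen, so random sampling from the summary texts of other correct pairs is an assumption, as are the function name `make_incorrect_pairs` and its parameters.

```python
import random

def make_incorrect_pairs(correct_pairs, negatives_per_pair=1, seed=0):
    """Pair each lyrics id with summary ids that are not its correct partner.

    Assumption: negatives are drawn at random from the summary texts that
    appear in the correct answer data.
    """
    rng = random.Random(seed)
    correct = set(correct_pairs)
    summaries = sorted({summary_id for _, summary_id in correct_pairs})
    incorrect = []
    for lyrics_id, _ in correct_pairs:
        # Only summaries that do not form a correct pair with this lyrics id.
        candidates = [s for s in summaries if (lyrics_id, s) not in correct]
        k = min(negatives_per_pair, len(candidates))
        for summary_id in rng.sample(candidates, k):
            incorrect.append((lyrics_id, summary_id))
    return incorrect

# Toy stand-in for the lyrics/summary text pair correct answer data 58.
correct_pairs = [("song1", "video1"), ("song2", "video2"), ("song3", "video3")]
incorrect_pairs = make_incorrect_pairs(correct_pairs)
```

Fixing the seed keeps the generated incorrect answer data reproducible between training runs.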

In step S114, the ranking model is learned on the basis of the following four feature amounts, the learned model is stored in the ranking model storage unit 66, and the processing ends: (1) a keyword pair feature amount representing the combination of the lyrics keywords extracted from the lyrics data of the correct answer data and the summary text keywords extracted from the summary text data of the correct answer data; (2) a keyword pair feature amount representing the combination of the lyrics keywords extracted from the lyrics data of the incorrect answer data and the summary text keywords extracted from the summary text data of the incorrect answer data; (3) a topic pair feature amount representing the combination of the lyrics topics extracted from the lyrics data of the correct answer data and the summary text topics extracted from the summary text data of the correct answer data; and (4) a topic pair feature amount representing the combination of the lyrics topics extracted from the lyrics data of the incorrect answer data and the summary text topics extracted from the summary text data of the incorrect answer data.
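The idea behind step S114 can be sketched in Python as follows. The encoding of pair features as one binary feature per (lyrics item, summary text item) combination and the perceptron-style pairwise update are assumptions made for illustration; this passage does not name the learning algorithm used by the ranker learning unit 64. The names `pair_features`, `featurize`, `score`, and `train_pairwise` are hypothetical.

```python
from collections import defaultdict
from itertools import product

def pair_features(lyrics_items, summary_items, prefix):
    """One binary feature per (lyrics item, summary text item) combination."""
    return {f"{prefix}:{l}|{s}" for l, s in product(lyrics_items, summary_items)}

def featurize(example):
    """Union of the keyword pair features and the topic pair features."""
    return (pair_features(example["lyrics_keywords"], example["summary_keywords"], "kw")
            | pair_features(example["lyrics_topics"], example["summary_topics"], "tp"))

def score(weights, features):
    """Linear score: sum of the weights of the active features."""
    return sum(weights[f] for f in features)

def train_pairwise(correct, incorrect, epochs=10):
    """Perceptron-style pairwise ranking (a stand-in learner, not the patent's).

    Assumption: correct[i] and incorrect[i] share the same lyrics, so the
    correct pair should score higher than the incorrect one.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for pos, neg in zip(correct, incorrect):
            fp, fn = featurize(pos), featurize(neg)
            if score(weights, fp) <= score(weights, fn):
                for f in fp - fn:
                    weights[f] += 1.0  # reward features seen only in the correct pair
                for f in fn - fp:
                    weights[f] -= 1.0  # penalize features seen only in the incorrect pair
    return weights

correct = [{"lyrics_keywords": ["snow", "winter"], "summary_keywords": ["snowy", "park"],
            "lyrics_topics": [3], "summary_topics": [7]}]
incorrect = [{"lyrics_keywords": ["snow", "winter"], "summary_keywords": ["beach", "summer"],
              "lyrics_topics": [3], "summary_topics": [1]}]
weights = train_pairwise(correct, incorrect)
```

The resulting `weights` dictionary plays the role of the ranking model stored in the ranking model storage unit 66 in this toy setting.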

As described above, the learning device according to the embodiment of the present invention extracts lyrics keywords from each item of lyrics data, summary text keywords from each item of summary text data, lyrics topics from each item of lyrics data, and summary text topics from each item of summary text data. It then learns the ranking model on the basis of a keyword pair feature amount representing the combination of the lyrics keywords extracted from the lyrics data of the correct answer data and the summary text keywords extracted from the summary text data of the correct answer data; a keyword pair feature amount representing the combination of the lyrics keywords extracted from the lyrics data of the incorrect answer data and the summary text keywords extracted from the summary text data of the incorrect answer data; a topic pair feature amount representing the combination of the lyrics topics extracted from the lyrics data of the correct answer data and the summary text topics extracted from the summary text data of the correct answer data; and a topic pair feature amount representing the combination of the lyrics topics extracted from the lyrics data of the incorrect answer data and the summary text topics extracted from the summary text data of the incorrect answer data. In this way, a ranking model for accurately searching for a video suitable for the lyrics data of a song can be learned.

<Operation of the Video Search Device According to the Embodiment of the Present Invention>

Next, the operation of the video search device 200 according to the embodiment of the present invention will be described. When the input unit 210 receives the lyrics data of a song, the video search device 200 executes the video search processing routine shown in FIG. 9.

First, in step S200, lyrics keywords are extracted from the lyrics data of the song received by the input unit 210.

Next, in step S202, lyrics topics are extracted from the lyrics data of the song received by the input unit 210 on the basis of the lyrics topic model 244.

In step S204, summary text that is attached to a video and is suitable for the lyrics data of the input song is searched for on the basis of the following: a keyword pair feature amount, computed for each of the videos, representing the combination of the lyrics keywords extracted by the lyrics keyword extraction unit 232 and the summary text keywords that were extracted from the summary text data attached to the video and are held in the keyword list 240 for each summary text; a topic pair feature amount representing the combination of the lyrics topics extracted by the lyrics topic extraction unit 246 and the summary text topics that were extracted from the summary text data attached to the video and are held in the topic list 256 for each summary text; and the ranking model stored in the ranking model storage unit 266.

In step S206, from the ranking information of the summary texts retrieved in step S204, the N videos linked to the top N summary texts are output, and the processing ends.
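Steps S204 and S206 can be sketched in Python as follows, assuming a linear ranking model held as a dictionary of feature weights and binary pair features. Both are assumptions made for illustration, and the names `pair_features` and `search_videos`, the weights, and the candidate data are hypothetical.

```python
from itertools import product

def pair_features(lyrics_items, summary_items, prefix):
    """One binary feature per (lyrics item, summary text item) combination."""
    return {f"{prefix}:{l}|{s}" for l, s in product(lyrics_items, summary_items)}

def search_videos(lyrics_keywords, lyrics_topics, candidates, weights, n):
    """Return the ids of the N videos whose summary text scores highest."""
    scored = []
    for video_id, summary in candidates.items():
        features = (pair_features(lyrics_keywords, summary["keywords"], "kw")
                    | pair_features(lyrics_topics, summary["topics"], "tp"))
        # Features unseen during training contribute nothing to the score.
        total = sum(weights.get(f, 0.0) for f in features)
        scored.append((total, video_id))
    # Highest score first; ties are broken by video id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [video_id for _, video_id in scored[:n]]

# Hypothetical learned weights and per-video keyword and topic lists.
weights = {"kw:snow|snowy": 1.0, "tp:3|7": 0.5, "kw:snow|beach": -1.0}
candidates = {
    "video1": {"keywords": ["snowy", "park"], "topics": [7]},
    "video2": {"keywords": ["beach"], "topics": [1]},
    "video3": {"keywords": ["city"], "topics": [7]},
}
top = search_videos(["snow"], [3], candidates, weights, n=2)
print(top)  # ['video1', 'video3']
```

In this toy data, video1 scores 1.5, video3 scores 0.5, and video2 scores -1.0, so the top two videos are video1 and video3.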

As described above, the video search device according to the embodiment of the present invention extracts lyrics keywords and lyrics topics from the lyrics data of the input song. It then searches for a video suitable for the lyrics data of the input song on the basis of a keyword pair feature amount, computed for each of the videos, representing the combination of the extracted lyrics keywords and the summary text keywords extracted from the summary text data attached to the video; a topic pair feature amount representing the combination of the extracted lyrics topics and the summary text topics extracted from the summary text data attached to the video; and the ranking model. In this way, a video suitable for the lyrics data of a song can be retrieved accurately.

Furthermore, if an appropriate video can be retrieved from long, subjective text such as song lyrics, this leads to systems that can provide videos matched to a song, broadening the ways in which music can be enjoyed.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, in the embodiment described above, the learning device 100 includes the lyrics data DB 30, the lyrics keyword extraction unit 32, the keyword list 34 for each lyrics, the summary text data DB 36, the summary text keyword extraction unit 38, the keyword list 40 for each summary text, the lyrics topic model creation unit 42, the lyrics topic model 44, the lyrics topic extraction unit 46, the topic list 48 for each lyrics, the summary text topic model creation unit 50, the summary text topic model 52, the summary text topic extraction unit 54, the topic list 56 for each summary text, the lyrics/summary text pair correct answer data 58, the incorrect answer data creation unit 60, the lyrics/summary text pair incorrect answer data 62, the ranker learning unit 64, and the ranking model storage unit 66, and learns the ranking model on the basis of the keyword pair feature amount and the topic pair feature amount. However, the present invention is not limited to this configuration. For example, the learning device 100 may include the lyrics data DB 30, the lyrics keyword extraction unit 32, the keyword list 34 for each lyrics, the summary text data DB 36, the summary text keyword extraction unit 38, the keyword list 40 for each summary text, the lyrics/summary text pair correct answer data 58, the incorrect answer data creation unit 60, the lyrics/summary text pair incorrect answer data 62, the ranker learning unit 64, and the ranking model storage unit 66, and learn the ranking model on the basis of the keyword pair feature amount. In this case, the calculation unit 220 of the video search device 200 may include the lyrics keyword extraction unit 232, the keyword list 240 for each summary text, the video search unit 264, and the ranking model storage unit 266, and search for a video suitable for the lyrics data of the input song on the basis of the keyword pair feature amount and the ranking model.

Alternatively, the learning device 100 may include the lyrics data DB 30, the summary text data DB 36, the lyrics topic model creation unit 42, the lyrics topic model 44, the lyrics topic extraction unit 46, the topic list 48 for each lyrics, the summary text topic model creation unit 50, the summary text topic model 52, the summary text topic extraction unit 54, the topic list 56 for each summary text, the lyrics/summary text pair correct answer data 58, the incorrect answer data creation unit 60, the lyrics/summary text pair incorrect answer data 62, the ranker learning unit 64, and the ranking model storage unit 66, and learn the ranking model on the basis of the topic pair feature amount. In this case, the calculation unit 220 of the video search device 200 may include the lyrics topic model 244, the lyrics topic extraction unit 246, the topic list 256 for each summary text, the video search unit 264, and the ranking model storage unit 266, and search for a video suitable for the lyrics data of the input song on the basis of the topic pair feature amount and the ranking model.

20, 220 Calculation unit
32, 232 Lyrics keyword extraction unit
34 Keyword list for each lyrics
38 Summary text keyword extraction unit
40, 240 Keyword list for each summary text
42 Lyrics topic model creation unit
44, 244 Lyrics topic model
46, 246 Lyrics topic extraction unit
48 Topic list for each lyrics
50 Summary text topic model creation unit
52 Summary text topic model
54 Summary text topic extraction unit
56, 256 Topic list for each summary text
58 Lyrics/summary text pair correct answer data
60 Incorrect answer data creation unit
62 Lyrics/summary text pair incorrect answer data
64 Ranker learning unit
66 Ranking model storage unit
100 Learning device
200 Video search device
210 Input unit
264 Video search unit
266 Ranking model storage unit
270 Output unit

Claims

A learning device for learning a ranking model for searching for a video suitable for lyrics data of a song, the learning device comprising:
a lyrics keyword extraction unit that extracts a lyrics keyword from each item of lyrics data included in correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and in incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song;
a summary text keyword extraction unit that extracts a summary text keyword from each item of summary text data included in the correct answer data and the incorrect answer data; and
a ranker learning unit that learns the ranking model on the basis of a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the correct answer data and the summary text keyword extracted from the summary text data of the correct answer data, and a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the incorrect answer data and the summary text keyword extracted from the summary text data of the incorrect answer data.

A learning device for learning a ranking model for searching for a video suitable for lyrics data of a song, the learning device comprising:
a lyrics topic extraction unit that extracts a lyrics topic from each item of lyrics data included in correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and in incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song;
a summary text topic extraction unit that extracts a summary text topic from each item of summary text data included in the correct answer data and the incorrect answer data; and
a ranker learning unit that learns the ranking model on the basis of a topic pair feature amount representing a combination of the lyrics topic extracted from the lyrics data of the correct answer data and the summary text topic extracted from the summary text data of the correct answer data, and a topic pair feature amount representing a combination of the lyrics topic extracted from the lyrics data of the incorrect answer data and the summary text topic extracted from the summary text data of the incorrect answer data.

A learning device for learning a ranking model for searching for a video suitable for lyrics data of a song, the learning device comprising:
a lyrics keyword extraction unit that extracts a lyrics keyword from each item of lyrics data included in correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and in incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song;
a summary text keyword extraction unit that extracts a summary text keyword from each item of summary text data included in the correct answer data and the incorrect answer data;
a lyrics topic extraction unit that extracts a lyrics topic from each item of lyrics data included in the correct answer data and the incorrect answer data;
a summary text topic extraction unit that extracts a summary text topic from each item of summary text data included in the correct answer data and the incorrect answer data; and
a ranker learning unit that learns the ranking model on the basis of a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the correct answer data and the summary text keyword extracted from the summary text data of the correct answer data, a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the incorrect answer data and the summary text keyword extracted from the summary text data of the incorrect answer data, a topic pair feature amount representing a combination of the lyrics topic extracted from the lyrics data of the correct answer data and the summary text topic extracted from the summary text data of the correct answer data, and a topic pair feature amount representing a combination of the lyrics topic extracted from the lyrics data of the incorrect answer data and the summary text topic extracted from the summary text data of the incorrect answer data.

A video search device for searching for a video suitable for lyrics data of a song, the video search device comprising:
a lyrics keyword extraction unit that extracts a lyrics keyword from the lyrics data of an input song;
a ranking model storage unit that stores a ranking model for searching for a video suitable for lyrics data of a song, the ranking model having been learned in advance on the basis of a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the correct answer data, and a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the incorrect answer data; and
a video search unit that searches for a video suitable for the lyrics data of the input song on the basis of a keyword pair feature amount, for each of videos, representing a combination of the lyrics keyword extracted by the lyrics keyword extraction unit and a summary text keyword extracted from summary text data attached to the video, and the ranking model.

A video search device for searching for a video suitable for lyrics data of a song, the video search device comprising:
a lyrics topic extraction unit that extracts a lyrics topic from the lyrics data of an input song;
a ranking model storage unit that stores a ranking model for searching for a video suitable for lyrics data of a song, the ranking model having been learned in advance on the basis of a topic pair feature amount representing a combination of a lyrics topic extracted from the lyrics data of correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and a summary text topic extracted from the summary text data of the correct answer data, and a topic pair feature amount representing a combination of a lyrics topic extracted from the lyrics data of incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song, and a summary text topic extracted from the summary text data of the incorrect answer data; and
a video search unit that searches for a video suitable for the lyrics data of the input song on the basis of a topic pair feature amount, for each of videos, representing a combination of the lyrics topic extracted by the lyrics topic extraction unit and a summary text topic extracted from summary text data attached to the video, and the ranking model.

A video search device for searching for a video suitable for lyrics data of a song, the video search device comprising:
a lyrics keyword extraction unit that extracts a lyrics keyword from the lyrics data of an input song;
a lyrics topic extraction unit that extracts a lyrics topic from the lyrics data of the input song;
a ranking model storage unit that stores a ranking model for searching for a video suitable for lyrics data of a song, the ranking model having been learned in advance on the basis of a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the correct answer data, a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the incorrect answer data, a topic pair feature amount representing a combination of a lyrics topic extracted from the lyrics data of the correct answer data and a summary text topic extracted from the summary text data of the correct answer data, and a topic pair feature amount representing a combination of a lyrics topic extracted from the lyrics data of the incorrect answer data and a summary text topic extracted from the summary text data of the incorrect answer data; and
a video search unit that searches for a video suitable for the lyrics data of the input song on the basis of a keyword pair feature amount, for each of videos, representing a combination of the lyrics keyword extracted by the lyrics keyword extraction unit and a summary text keyword extracted from summary text data attached to the video, a topic pair feature amount representing a combination of the lyrics topic extracted by the lyrics topic extraction unit and a summary text topic extracted from the summary text data attached to the video, and the ranking model.

The learning device according to claim 1, wherein the summary text keyword is a keyword representing a person, a place, a season, or an event.

A learning method in a learning device for learning a ranking model for searching for a video suitable for lyrics data of a song, the learning method comprising:
a step in which a lyrics keyword extraction unit extracts a lyrics keyword from each item of lyrics data included in correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and in incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song;
a step of extracting a summary text keyword from each item of summary text data included in the correct answer data and the incorrect answer data; and
a step in which a ranker learning unit learns the ranking model on the basis of a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the correct answer data and the summary text keyword extracted from the summary text data of the correct answer data, and a keyword pair feature amount representing a combination of the lyrics keyword extracted from the lyrics data of the incorrect answer data and the summary text keyword extracted from the summary text data of the incorrect answer data.

A video search method in a video search device that searches for a video suitable for lyrics data of a song and that includes a lyrics keyword extraction unit, a video search unit, and a ranking model storage unit that stores a ranking model for searching for a video suitable for lyrics data of a song, the ranking model having been learned in advance on the basis of a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of incorrect answer data, which is a pair of lyrics data of a song and summary text data attached to a video not suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the incorrect answer data, the video search method comprising:
a step in which the lyrics keyword extraction unit extracts a lyrics keyword from the lyrics data of an input song; and
a step in which the video search unit searches for a video suitable for the lyrics data of the input song on the basis of a keyword pair feature amount, for each of videos, representing a combination of the lyrics keyword extracted by the lyrics keyword extraction unit and a summary text keyword extracted from summary text data attached to the video, a keyword pair feature amount representing a combination of a lyrics keyword extracted from the lyrics data of correct answer data, which is a pair of lyrics data of a song and summary text data attached to a video suitable for the lyrics data of the song, and a summary text keyword extracted from the summary text data of the correct answer data, and the ranking model stored in the ranking model storage unit.

A program for causing a computer to function as each unit of the learning device according to any one of claims 1 to 3 and claim 7, or of the video search device according to any one of claims 4 to 6.