JP2012014504A

JP2012014504A - Information selection device, server, information selection method, and program

Info

Publication number: JP2012014504A
Application number: JP2010151038A
Authority: JP
Inventors: Masami Nakazawa; 昌美中澤; Keiichiro Hoashi; 啓一郎帆足; Kazunori Matsumoto; 一則松本; Yasuhiro Takishima; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2010-07-01
Filing date: 2010-07-01
Publication date: 2012-01-19
Anticipated expiration: 2030-07-01
Also published as: JP5600498B2

Abstract

PROBLEM TO BE SOLVED: To provide an information selection device that selects appropriate content containing information related to a program being broadcast.
SOLUTION: The information selection device receives subtitle information that is broadcast together with a program and represents the content of that program. From the words in the received subtitle information, it extracts subject words representing the subject of the subtitle information and content words representing its content. Using a plurality of search conditions generated by combining the subject words and content words, it searches the content published on a network for content related to the subtitle information. For each search condition, it calculates a similarity indicating the correlation between the subtitle information and the text information that is contained in the retrieved content and represents the substance of that content. It then selects one search condition from the plurality of search conditions based on the calculated similarities, and outputs position information indicating the area where the content retrieved with the selected search condition is stored.

Description

The present invention relates to an information selection device, a server device, an information selection method, and a program.

In recent years, research has been conducted on connection models based on topic structure, that is, the structure of the information provided by television broadcasting, with the aim of integrating television broadcasts with Web content.
For example, Non-Patent Document 1 describes a query-free search mechanism that, based on such a connection model, searches for Web pages that complement the content of a program provided by television broadcasting. The technique of Non-Patent Document 1 generates four queries according to the type of Web page to be searched for. The queries are built from the subject words and content words contained in the subtitle information transmitted together with the program: SD (Subject-Deepening), SB (Subject-Broadening), CD (Content-Deepening), and CB (Content-Broadening).

Here, a subject word is a keyword that appears frequently in the subtitle information and has a strong directed co-occurrence with other words. A content word is a keyword that appears frequently in the subtitle information and has a strong undirected co-occurrence with the subject words.
SD is a query for Web pages whose body text contains the subject word; SB is a query for Web pages whose body text contains the content word; CB is a query for Web pages whose title contains the subject word; and CD is a query for Web pages whose title contains the content word.

In the above method, searching with four queries built from the subject words and content words retrieves not merely Web pages that resemble the program content, but Web pages that describe the program content in more detail or from a different viewpoint.

FIG. 6 is a flowchart showing the Web page (content) search process according to the method of Non-Patent Document 1.
As shown in the figure, the subtitle information transmitted with the program is received (step S901), and subject words and content words are extracted from it (step S902). Using the extracted subject words and content words, the CD, CB, SD, and SB queries are generated (step S903), and each query is sent to a search server (search engine) to search for Web pages (step S904). The URLs identifying the retrieved Web pages are then received (step S905).
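The flow of steps S901 to S905 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Non-Patent Document 1: the keyword extractor, query builder, and search function are hypothetical callables supplied by the caller. Note that every result of every query is returned without any filtering, which is the behavior criticized in the paragraphs that follow.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callable types; none of these names come from the source.
Extractor = Callable[[str], Tuple[List[str], List[str]]]
QueryBuilder = Callable[[List[str], List[str]], Dict[str, str]]
SearchFn = Callable[[str], List[str]]  # query string -> ranked URLs


def prior_art_search(subtitle: str, extract: Extractor,
                     make_queries: QueryBuilder,
                     search: SearchFn) -> Dict[str, List[str]]:
    """Steps S901-S905: every URL from every query is kept (no selection)."""
    # S901: `subtitle` is the subtitle text received with the program.
    subject_words, content_words = extract(subtitle)         # S902
    queries = make_queries(subject_words, content_words)     # S903
    return {name: search(q) for name, q in queries.items()}  # S904-S905
```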

With this method, for every topic contained in the subtitle information, the URLs of Web pages that give detailed information digging into the topic, and of Web pages that present the topic from a different viewpoint, are obtained, so that information about the topics can be searched exhaustively.

Non-Patent Document 1: Ma Qiang, Katsumi Tanaka, "Search Mechanism for Integration of Broadcast and Web Content Based on Topic Structure," IPSJ Transactions on Databases, Vol. 45, No. SIG10 (TOD23), pp. 18-36, 2004.

However, when the Web pages related to a program obtained by the method of Non-Patent Document 1 are to be displayed on the television screen, together with the program, to a viewer (user) watching it, there may be too many Web pages to display them all. In addition, because Web pages related to the program content are searched exhaustively, the results may include Web pages only weakly related to the program content, that is, Web pages carrying information the user does not want.

In short, because Web pages (content) containing information related to the program are searched exhaustively, the number of search results grows and content only weakly related to the program is included, with the result that appropriate content related to the program sometimes cannot be selected from the content on the network.

The present invention has been made to solve the above problem, and its object is to provide an information selection device, an information selection method, a program, and a server device capable of selecting appropriate content containing information related to a program being broadcast.

To solve the above problem, the present invention is an information selection device comprising: a subtitle acquisition unit that receives subtitle information broadcast together with a program and representing the content of the program; a content search unit that extracts, from the words contained in the subtitle information received by the subtitle acquisition unit, subject words representing the subject of the subtitle information and content words representing its content, and searches the content published on a network for content related to the subtitle information using a plurality of search conditions generated by combining the subject words and the content words; a similarity calculation unit that, for each of the plurality of search conditions, calculates a similarity indicating the correlation between the subtitle information and text information that is contained in the content retrieved by the content search unit and represents the substance of that content, and selects one search condition from the plurality of search conditions based on the calculated similarities; and an output unit that outputs position information indicating the area where the content retrieved using the search condition selected by the similarity calculation unit is stored.

Further, in the invention described above, the present invention further comprises: a subtitle feature calculation unit that calculates a subtitle feature, which is a vector representing the characteristics of the subtitle information, based on how frequently each of a plurality of words extracted in advance from previously broadcast subtitle information appears in the subtitle information; and a text feature calculation unit that calculates a text feature, which is a vector representing the characteristics of the text information, based on how frequently each of the plurality of words appears in the text information. The similarity calculation unit calculates the similarity from the subtitle feature and the text feature, and the output unit selects either the search condition whose similarities have the highest average, or the search condition that retrieved the single content item with the highest similarity, and outputs the position information corresponding to the content retrieved with the selected search condition.
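The two selection rules in this paragraph (highest average similarity, or the condition that retrieved the single most similar item) can be compared in a short sketch. The function name and the choice to skip conditions that returned no content are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def select_condition(similarities: Dict[str, List[float]],
                     rule: str = "mean") -> str:
    """Pick one search condition from per-condition similarity scores.

    rule="mean": condition whose retrieved content has the highest average.
    rule="max":  condition that retrieved the single most similar item.
    """
    # Assumption: conditions that returned no content are ignored.
    scored = {cond: s for cond, s in similarities.items() if s}
    if not scored:
        raise ValueError("no search condition returned any content")
    aggregators = {"mean": mean, "max": max}
    if rule not in aggregators:
        raise ValueError(f"unknown rule: {rule}")
    agg = aggregators[rule]
    return max(scored, key=lambda cond: agg(scored[cond]))
```

For example, with per-condition scores `{"SD": [0.9, 0.1], "CD": [0.6, 0.6]}`, the mean rule selects CD (0.6 against 0.5) while the max rule selects SD (0.9), so the choice of rule can change the outcome.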

Further, in the invention described above, the subtitle feature calculation unit calculates the subtitle feature weighted by tf-idf (Term Frequency & Inverse Document Frequency) over the plurality of words, the text feature calculation unit calculates the text feature weighted by tf-idf over the plurality of words, and the similarity calculation unit calculates the cosine correlation value (cosine similarity) between the subtitle feature and the text feature as the similarity.
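A minimal sketch of tf-idf vectors over the fixed word list and their cosine correlation follows. The text does not define the idf term at this point; this sketch assumes the common log(N/df) form computed over a reference corpus such as past subtitles, with raw counts as the term frequency. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def idf_table(vocab: Sequence[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Assumed idf definition: log(N / df) over a tokenized reference corpus."""
    doc_sets = [set(doc) for doc in corpus]
    n_docs = len(doc_sets)
    idf: Dict[str, float] = {}
    for w in vocab:
        df = sum(1 for d in doc_sets if w in d)
        # Words absent from the reference corpus get weight 0 (an assumption).
        idf[w] = math.log(n_docs / df) if df else 0.0
    return idf


def tfidf_vector(tokens: List[str], vocab: Sequence[str],
                 idf: Dict[str, float]) -> List[float]:
    """One component per word in the fixed word list: raw count times idf."""
    tf = Counter(tokens)
    return [tf[w] * idf[w] for w in vocab]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In use, one vector is computed for the subtitle text and one for each retrieved page, and the resulting cosine scores, grouped by search condition, feed the selection rule.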

Further, in the invention described above, the content search unit searches for content satisfying the plurality of search conditions by sending them to a search server, which searches the content published on the network for content satisfying a received condition and returns the position information of that content. The text feature calculation unit calculates the text feature, for each of the plurality of search conditions, for the top n (n ≥ 1) content items ranked by the search server among the content retrieved by the content search unit.

Further, in the invention described above, the present invention further comprises a query generation unit that generates, as the plurality of search conditions: a first search condition for searching the content published on the network for content whose text information contains the subject word and whose title does not contain the content word; a second search condition for content whose title contains the content word and whose text information does not contain the subject word; a third search condition for content whose text information contains the content word and whose title does not contain the subject word; and a fourth search condition for content whose title contains the subject word and whose text information does not contain the content word. The content search unit searches for content by sending the first to fourth search conditions generated by the query generation unit to the search server, and the output unit selects one of the first to fourth search conditions and outputs the content retrieved with the selected search condition.

Further, in the invention described above, when a search using any of the first to fourth search conditions yields fewer than n content items, the content search unit regenerates that search condition with fewer of the subject words or content words used to generate it, and searches for content again.
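This retry can be sketched as below. The text only says that the number of subject words or content words is reduced; which word to drop is not specified, so this sketch assumes it removes the last word of whichever list is currently longer, never emptying either list. `build_query` and `search` are hypothetical callables.

```python
from typing import Callable, List


def search_with_fallback(subject_words: List[str], content_words: List[str],
                         build_query: Callable[[List[str], List[str]], str],
                         search: Callable[[str], List[str]],
                         n: int) -> List[str]:
    """Re-query with fewer keywords until at least n results come back.

    Assumed policy: drop the last word of the longer list, keeping at
    least one word in each list; stop when no further reduction is possible.
    """
    sw, cw = list(subject_words), list(content_words)
    while True:
        results = search(build_query(sw, cw))
        if len(results) >= n or (len(sw) <= 1 and len(cw) <= 1):
            return results[:n]
        if len(cw) >= len(sw) and len(cw) > 1:
            cw.pop()
        else:
            sw.pop()
```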

Further, the present invention is a server device comprising: the information selection device according to the invention described above; a selection information storage unit that inputs the subtitle information of each of a plurality of programs being broadcast to the information selection device and stores the position information output by the information selection device in association with the corresponding program; and a request processing unit that receives request information indicating one of the plurality of programs, reads the position information corresponding to the program indicated by the request information from the selection information storage unit, and sends the read position information to the sender of the request information.
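The server device can be pictured as a keyed store in front of the selector. Below is a minimal in-memory sketch with hypothetical names: `select` stands in for the information selection device, program identifiers are plain strings, and a new batch of subtitles for a program overwrites that program's stored URLs.

```python
from typing import Callable, Dict, List


class SelectionServer:
    """In-memory sketch of the server device (all names are hypothetical)."""

    def __init__(self, select: Callable[[str], List[str]]) -> None:
        # `select` plays the role of the information selection device:
        # it maps a program's subtitle text to the selected URLs.
        self._select = select
        # Selection information storage: program id -> position information.
        self._store: Dict[str, List[str]] = {}

    def ingest(self, program_id: str, subtitle: str) -> None:
        """Feed one program's subtitles to the selector and store its output."""
        self._store[program_id] = self._select(subtitle)

    def handle_request(self, program_id: str) -> List[str]:
        """Return the stored URLs for the requested program (empty if unknown)."""
        return list(self._store.get(program_id, []))
```

Returning an empty list for an unknown program is a choice made for this sketch; the text does not say how such requests are handled.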

Further, the present invention is an information selection method comprising: a subtitle acquisition step of receiving subtitle information broadcast together with a program and representing the content of the program; a content search step of extracting, from the words contained in the subtitle information received in the subtitle acquisition step, subject words representing the subject of the subtitle information and content words representing its content, and searching the content published on a network for content related to the subtitle information using a plurality of search conditions generated by combining the subject words and the content words; a similarity calculation step of calculating, for each of the plurality of search conditions, a similarity indicating the correlation between the subtitle information and text information that is contained in the content retrieved in the content search step and represents the substance of that content, and selecting one search condition from the plurality of search conditions based on the calculated similarities; and an output step of outputting position information indicating the area where the content retrieved using the search condition selected in the similarity calculation step is stored.

Further, the present invention is a program for causing a computer provided in an information selection device to execute: a subtitle acquisition step of receiving subtitle information broadcast together with a program and representing the content of the program; a content search step of extracting, from the words contained in the subtitle information received in the subtitle acquisition step, subject words representing the subject of the subtitle information and content words representing its content, and searching the content published on a network for content related to the subtitle information using a plurality of search conditions generated by combining the subject words and the content words; a similarity calculation step of calculating, for each of the plurality of search conditions, a similarity indicating the correlation between the subtitle information and text information that is contained in the content retrieved in the content search step and represents the substance of that content, and selecting one search condition from the plurality of search conditions based on the calculated similarities; and an output step of outputting position information indicating the area where the content retrieved using the search condition selected in the similarity calculation step is stored.

According to the present invention, for each of a plurality of search conditions for searching for content containing information related to a program's subtitle information, the similarity between the subtitle information and the content retrieved with that condition is calculated, and one search condition is selected based on the calculated similarities. The most suitable search condition can thus be selected from the plurality of search conditions.
Consequently, of the many content items retrieved with the plurality of search conditions, only the content retrieved with the selected, most suitable search condition is output, so that the content best matching the program can be presented to the user.

FIG. 1 is a schematic block diagram showing a usage form of the information selection device 1 in the first embodiment.
FIG. 2 is a schematic block diagram showing the configuration of the information selection device 1 in the same embodiment.
FIG. 3 is a diagram showing an example of the subtitle feature (vector) in the same embodiment.
FIG. 4 is a flowchart showing the content selection process of the information selection device 1 in the same embodiment.
FIG. 5 is a schematic block diagram showing the configuration of the information selection server 100 in the second embodiment.
FIG. 6 is a flowchart showing the Web page (content) search process.

Hereinafter, an information selection device, an information selection method, a program, and a server device according to embodiments of the present invention will be described with reference to the drawings.

(First embodiment)
FIG. 1 is a schematic block diagram showing a usage form of the information selection device 1 in the first embodiment.
As shown in FIG. 1, the information selection device 1 is connected to a search server 4 and a plurality of content servers 5 via a network 6, and is also connected to a set-top box 2.

The set-top box 2 is connected to a display device 3 that displays video. The set-top box 2 receives, from an external transmitting station, a broadcast signal containing the video information, audio information, and subtitle information of a broadcast program. It converts the received video information and subtitle information into a video signal that the display device 3 can display, converts the audio information into an audio signal that the display device 3 can output, and outputs both signals to the display device 3. Here, subtitle information is textual information, such as closed captions or subtitle broadcasts, that represents the dialogue of the broadcast program and the content of the program.

In addition, based on operation information input by the user, the set-top box 2 receives content distributed by the content servers 5 via the network 6 and displays it on the display device 3. The set-top box 2 also sends the received subtitle information to the information selection device 1 successively as it arrives. The set-top box 2 may receive the broadcast signal over a wired line such as an optical fiber or coaxial cable, or over a wireless line.

Based on the subtitle information received from the set-top box 2, the information selection device 1 uses the search server 4 to search for content related to the substance of the subtitle information, and sends position information indicating the area where the retrieved content is stored to the set-top box 2. Here, the position information is information, such as a URL (Uniform Resource Locator), that identifies a content server 5 and the area in which the content distributed by that content server 5 is stored.
The display device 3 decodes the input video signal and displays the corresponding video. The display device 3 is, for example, a television receiver or a liquid crystal display.

Based on a search condition received via the network 6, the search server 4 searches the content distributed on the network 6 for content satisfying that condition, and returns to the sender of the condition a search result that includes a ranked list of the retrieved content. The ranking is based, for example, on how closely each content item matches the search condition and on how many times it has been viewed. Examples of such a search server 4 are the network search servers (search engines) operated by Google (registered trademark), Yahoo! (registered trademark), and others.

A content server 5 is a server device that stores Web page information, including text information whose content is expressed in sentences and the title of that text information, and publishes the stored Web pages. When a content server 5 receives a request to view a stored Web page, it sends the information of the requested Web page to the requester. The network 6 may be a public network such as the Internet, or a network restricted to specific users such as an intranet.

FIG. 2 is a schematic block diagram showing the configuration of the information selection device 1 in the present embodiment.
As shown in the figure, the information selection device 1 includes a word list storage unit 11, a subtitle acquisition unit 12, a URL acquisition unit 13, a text acquisition unit 14, a text feature calculation unit 15, a subtitle feature calculation unit 16, a similarity calculation unit 17, and a URL output unit 18.

The word list storage unit 11 stores in advance list information containing m (m ≥ 1) words. The list information is, for example, a set of words generated in advance by extracting, from previously broadcast subtitle information, words that appear frequently and important words that characterize each piece of subtitle information. The words in the list information are nouns or adjectives.
The subtitle acquisition unit 12 receives subtitle information from an external device and outputs it to the URL acquisition unit 13 and the subtitle feature calculation unit 16. In the present embodiment, the subtitle acquisition unit 12 receives the subtitle information from the set-top box 2.
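One simple way to build such a word list is to count nouns and adjectives across past subtitles and keep the m most frequent. This sketch covers only the frequency criterion mentioned above, not the "important characteristic words" criterion, and it assumes the subtitles have already been morphologically analysed into (word, part-of-speech) pairs with hypothetical POS labels.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_word_list(tagged_subtitles: Iterable[List[Tuple[str, str]]], m: int,
                    keep_pos: Tuple[str, ...] = ("noun", "adjective")) -> List[str]:
    """Return the m most frequent nouns/adjectives in past subtitles.

    Each subtitle is a list of (word, pos) pairs; the POS labels used in
    `keep_pos` are placeholders for whatever the morphological analyser emits.
    """
    counts = Counter(word
                     for doc in tagged_subtitles
                     for word, pos in doc
                     if pos in keep_pos)
    return [word for word, _ in counts.most_common(m)]
```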

The URL acquisition unit 13 performs morphological analysis on the subtitle information input from the subtitle acquisition unit 12, splits it into words according to part of speech, and extracts from those words subject words representing the subject of the subtitle information and content words representing its content. Using a plurality of search conditions generated by combining the extracted subject words and content words, the URL acquisition unit 13 then searches the content published on the network 6 for content related to the subtitle information, and outputs the position information of the retrieved content to the text acquisition unit 14 and the URL output unit 18. The following describes the case where the position information is a URL.

The URL acquisition unit 13 includes a keyword extraction unit 131, a query generation unit 132, and a content search unit 133.
The keyword extraction unit 131 performs morphological analysis on the input subtitle information, splits it into words according to part of speech, extracts the subject words and content words of the subtitle information from those words, and outputs them to the query generation unit 132. At this time, the keyword extraction unit 131 selects, for example, two subject words and three content words. The number of subject words and content words it selects may be changed according to, for example, the length of the subtitle information.
The subject words and content words are extracted using, for example, the method described in Non-Patent Document 1. Specifically, a word that appears frequently in the subtitle information and has a strong co-occurrence relationship with other words is taken as a subject word, and a word in the subtitle information that has a strong co-occurrence relationship with the subject words is taken as a content word.
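The co-occurrence criteria above can be illustrated with a simplified heuristic. This is an assumed stand-in, not the exact scoring of Non-Patent Document 1: sentences arrive as pre-tokenized word lists (morphological analysis is out of scope), directed co-occurrence is approximated by co(w, v)/df(v), undirected co-occurrence by the Jaccard coefficient, and the defaults of two subject words and three content words follow the example given above. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def extract_keywords(sentences: List[List[str]], n_subject: int = 2,
                     n_content: int = 3) -> Tuple[List[str], List[str]]:
    """Heuristic subject/content word extraction (an assumed simplification).

    df[w]      : number of sentences containing w
    co[(a, b)] : number of sentences containing both a and b
    Subject score: df[w] times the mean directed co-occurrence co(w, v) / df[v].
    Content score: df[v] times the summed Jaccard co-occurrence with the
                   chosen subject words.
    """
    sets = [set(s) for s in sentences]
    df = Counter(w for s in sets for w in s)
    co: Counter = Counter()
    for s in sets:
        for a, b in combinations(sorted(s), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    vocab = list(df)

    def subject_score(w: str) -> float:
        others = [v for v in vocab if v != w]
        if not others:
            return 0.0
        return df[w] * sum(co[(w, v)] / df[v] for v in others) / len(others)

    subjects = sorted(vocab, key=subject_score, reverse=True)[:n_subject]

    def content_score(v: str) -> float:
        return df[v] * sum(co[(s, v)] / (df[s] + df[v] - co[(s, v)])
                           for s in subjects)

    rest = [v for v in vocab if v not in subjects]
    contents = sorted(rest, key=content_score, reverse=True)[:n_content]
    return subjects, contents
```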

So that peripheral information related to the subtitle information can be searched for, the query generation unit 132 generates four search conditions, as the plurality of search conditions described above, by combining the subject words and content words extracted by the keyword extraction unit 131, and outputs them to the content search unit 133. The four search conditions are generated using, for example, the query generation method described in Non-Patent Document 1.

The four search conditions generated by the query generation unit 132 are: (A) an SD (Subject Deepening) search condition, which digs deeper into the content of the subtitle information by refining it based on the subject words; (B) a CD (Content Deepening) search condition, which digs deeper into the content of the subtitle information by refining it based on the content words; (C) an SB (Subject Broadening) search condition, which searches for information from a viewpoint different from the subtitle information by broadening its content based on the subject words; and (D) a CB (Content Broadening) search condition, which searches for information from a viewpoint different from the subtitle information by broadening its content based on the content words.

Specifically, (A) the SD search condition (first search condition) searches for content in which the subject words are included in the text information (body text) that describes the content in sentences, and the content words are not included in the title (heading). Here, the title of the content is, for example, the words described in the title tag.
(B) The CD search condition (second search condition) searches for content in which the content words are included in the title and the subject words are not included in the body text.
(C) The SB search condition (third search condition) searches for content in which the content words are included in the body text and the subject words are not included in the title.
(D) The CB search condition (fourth search condition) searches for content in which the subject words are included in the title and the content words are not included in the body text.
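The four conditions can be written down as include/exclude rules over two fields. The sketch below makes a few assumptions:

- A document is a hypothetical dict with lowercase `title` and `body` strings.
- Matching is plain substring containment.
- "Words are included" means every listed word must appear, and "words are not included" means none of them may appear.

Translating these rules into a particular search server's query syntax is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One search condition as (field, word) pairs; fields are 'title' or 'body'."""
    name: str
    include: tuple  # every pair must match
    exclude: tuple  # no pair may match


def _pairs(field, words):
    return tuple((field, w) for w in words)


def build_conditions(subjects, contents):
    """Build the SD, CD, SB and CB conditions from subject and content words."""
    return [
        # SD: subject words in body, content words not in title
        Condition("SD", _pairs("body", subjects), _pairs("title", contents)),
        # CD: content words in title, subject words not in body
        Condition("CD", _pairs("title", contents), _pairs("body", subjects)),
        # SB: content words in body, subject words not in title
        Condition("SB", _pairs("body", contents), _pairs("title", subjects)),
        # CB: subject words in title, content words not in body
        Condition("CB", _pairs("title", subjects), _pairs("body", contents)),
    ]


def matches(cond, doc):
    """doc: dict with 'title' and 'body' strings (hypothetical format)."""
    return (all(w in doc[f] for f, w in cond.include)
            and not any(w in doc[f] for f, w in cond.exclude))
```

For a page titled "history of shirase" whose body reads "the icebreaker shirase", the subject word `shirase` and content word `icebreaker` give a match only under SD.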

The content search unit 133 searches for content matching each of the four search conditions generated by the query generation unit 132. Specifically, it sends each of the four search conditions to the search server 4 via the network 6 and receives the search result for each condition. It then outputs the URLs of the top n (n ≥ 1) pieces of content in each search result to the text acquisition unit 14, associated with the corresponding search condition. Here, n is a predetermined value. It is set in advance based on, for example, the number of pieces of content (or of content summaries) that can be displayed on the display device 3 together with the video of the broadcast program.

If a search condition generated by the query generation unit 132 yields fewer than n URLs, the content search unit 133 removes some of the terms in that condition so that the search returns more URLs, and searches again. For example, to preserve the character of each search condition, it first relaxes the condition by removing its negative terms one at a time, searching again after each removal. If fewer than n URLs are still obtained after the negative terms have been removed, it removes the positive terms one at a time and repeats the search so that n URLs are obtained.
For example, suppose the keyword extraction unit 131 extracts two subject words s1 and s2 and three content words c1, c2, and c3 from the caption information. To relax the CD search condition, the content search unit 133 changes the condition "content words c1, c2, and c3 are included in the title, and subject words s1 and s2 are not included in the body text" to "content words c1, c2, and c3 are included in the title, and subject word s1 is not included in the body text", and searches again.
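The relaxation order above can be sketched as a generator of progressively looser conditions. Several points are assumptions:

- The search server is modeled as a hypothetical callable `search(include, exclude)` that returns a list of URLs.
- Terms are removed from the end of each list, which matches the s1, s2 to s1 example.
- At least one positive term is always kept; the text does not say whether all positive terms may be dropped.

```python
def relaxation_sequence(include, exclude):
    """Yield (include, exclude) pairs from strictest to loosest.

    Negative terms are dropped one at a time (last first, as in the
    s1, s2 -> s1 example), then positive terms one at a time; at least
    one positive term is kept (an assumption).
    """
    include, exclude = list(include), list(exclude)
    yield tuple(include), tuple(exclude)
    while exclude:
        exclude.pop()
        yield tuple(include), tuple(exclude)
    while len(include) > 1:
        include.pop()
        yield tuple(include), tuple(exclude)


def search_with_relaxation(search, include, exclude, n):
    """Relax the condition until `search` (hypothetical) returns >= n URLs."""
    urls = []
    for inc, exc in relaxation_sequence(include, exclude):
        urls = search(inc, exc)
        if len(urls) >= n:
            break
    return urls[:n]  # may still hold fewer than n if even the loosest query is short
```

For the CD example, the second step of `relaxation_sequence(["c1", "c2", "c3"], ["s1", "s2"])` is `(("c1", "c2", "c3"), ("s1",))`, which is exactly the relaxed condition described above.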

The text acquisition unit 14 takes the n content URLs for each search condition output by the content search unit 133. Using these URLs, it receives the text information contained in that content from the content server 5 via the network 6. It then outputs the received text information of each piece of content, associated with the corresponding search condition, to the text feature amount calculation unit 15.
For content written in HTML, for example, the text information is extracted from the content data by selecting the text in the range delimited by the body tag. For content written in MHTML, the text information is extracted by selecting the document contained in the original HTML file.
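For the HTML case, the sketch below uses only the standard-library `html.parser` module to collect the text inside `<body>`. Two points go beyond the text:

- It also skips `<script>` and `<style>` contents, on the assumption that they are not prose.
- The MHTML case (unpacking the original HTML part first) is not handled.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects the text that appears inside <body>...</body>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities such as &amp; are decoded
        self._in_body = False
        self._skip = 0  # nesting depth inside <script>/<style> (assumed non-prose)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip and data.strip():
            self.parts.append(data.strip())


def body_text(html):
    """Return the whitespace-joined text of the <body> element."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```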

The text feature amount calculation unit 15 reads the list information stored in the word list storage unit 11. For each piece of text information input from the text acquisition unit 14, it then calculates a text feature amount, which is a feature amount weighted by tfidf (Term Frequency & Inverse Document Frequency) over the words in the list information.
Specifically, for the text information of each piece of content, the unit calculates a tfidf value for each of the m words in the list information. It takes the m-dimensional vector whose elements are these m tfidf values as the text feature amount.

Here, tfidf is an index of the importance of a character string α. It is the product of two factors:

- the frequency with which α appears in a given document (content), called the term frequency (tf);
- the inverse of the frequency of documents, within the whole document set (the whole set of content), in which α appears, called the inverse document frequency (idf).

The larger the value, the more important α is.

The method of calculating the text feature amount is described in detail below.
Among the retrieved pieces of content, the tfidf value of the i-th word (1 ≤ i ≤ m) of the list information in the text information of the j-th piece of content (1 ≤ j ≤ 4n) is given by equation (1).

Here, tf_{i,j} is the number of occurrences of the i-th word in the text information of the j-th piece of content. N is the total number of pieces of text information in the content for which text feature amounts are calculated (N = 4n). df_i is the number of pieces of text information (body text) in that content that contain the i-th word.
The tfidf values calculated for each word in this way form the vector t_j of equation (2), where tfidf_{i,j} denotes the value given by equation (1). This vector is used as the text feature amount for the text information of the j-th piece of content.
$$ t_j = \left(\mathrm{tfidf}_{1,j},\ \mathrm{tfidf}_{2,j},\ \ldots,\ \mathrm{tfidf}_{m,j}\right) \tag{2} $$
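Equation (1) is not reproduced in this text. The sketch below therefore assumes the common form tfidf_{i,j} = tf_{i,j} × log(N / df_i), using the natural logarithm. It builds one m-dimensional vector t_j per retrieved text. The document's actual log base or smoothing may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs, word_list):
    """Return one m-dimensional tfidf vector per document.

    docs: list of token lists (the 4n retrieved texts).
    word_list: the m words of the list information.
    Assumes tfidf_{i,j} = tf_{i,j} * log(N / df_i); equation (1) may differ.
    """
    n_docs = len(docs)                        # N
    counts = [Counter(doc) for doc in docs]   # tf_{i,j}; missing words count as 0
    df = {w: sum(1 for c in counts if w in c) for w in word_list}  # df_i
    return [
        # A word absent from every document (df_i = 0) gets 0.0.
        [c[w] * math.log(n_docs / df[w]) if df[w] else 0.0 for w in word_list]
        for c in counts
    ]
```

Words from the list that do not occur in a text get 0, which matches the zero entries described for FIG. 3. A word that occurs in every text also gets 0, because log(N / N) = 0.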

Like the text feature amount calculation unit 15, the caption feature amount calculation unit 16 calculates a caption feature amount for the caption information input from the caption acquisition unit 12. This is a feature amount weighted by tfidf over the words in the list information stored in the word list storage unit 11.
For example, when the following caption information is input, the tfidf value of the word "Antarctic" (南極) in the caption information is calculated as 25.9, and that of the word "Shirase" (しらせ) as 57.9.
(Caption information): ">> Now, the icebreaker Shirase, which served in Antarctic research for 25 years, was retired last year. Today the new Shirase, built as its successor, departs on its first voyage to the Antarctic. We report live. >> This is the new Shirase. It is commonly called an Antarctic research ship; officially it is called an icebreaker, written with the characters for "breaking ice." ... (omitted)"

FIG. 3 shows an example of the caption feature amount (vector) in the present embodiment.
In the illustrated example, 0 (zero) is assigned to "head" (ヘッド), "once hand" (ひとたび手), and "fifth" (５つ目), which do not appear in the caption information. The calculated tfidf value 7.530645 is assigned to "departure" (出発), which does appear.
In this way, among the words in the list information, 0 is assigned to each element corresponding to a word that does not appear in the caption information. The tfidf value is assigned to each element corresponding to a word that does appear. The text feature amounts that the text feature amount calculation unit 15 calculates for the text information of each piece of content are assigned values in the same way as the caption feature amount.

Returning to FIG. 2, the similarity calculation unit 17 takes the caption feature amount calculated by the caption feature amount calculation unit 16 and each text feature amount calculated by the text feature amount calculation unit 15. From these it calculates a similarity indicating the correlation between the caption information and the text information of the content. Based on the calculated similarities, it selects from the four search conditions the best condition for retrieving content whose text information is similar to the caption information.

Specifically, the similarity calculation unit 17 uses the text feature amounts calculated by the text feature amount calculation unit 15 (vectors t_j, 1 ≤ j ≤ 4n) and the caption feature amount t_k. It calculates, as the similarity, the cosine correlation value cos(t_j, t_k) given by equation (3).
$$ \cos(t_j, t_k) = \frac{t_j \cdot t_k}{\lVert t_j \rVert \, \lVert t_k \rVert} \tag{3} $$
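Equation (3) is the ordinary cosine of the angle between the two tfidf vectors. The sketch below computes it directly. Returning 0.0 when either vector is all zeros is an assumption; the text does not cover that case. Such a vector arises when a text contains none of the m listed words.

```python
import math


def cosine(t_j, t_k):
    """Cosine correlation cos(t_j, t_k) of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(t_j, t_k))
    norm = math.sqrt(sum(a * a for a in t_j)) * math.sqrt(sum(b * b for b in t_k))
    # An all-zero vector has no direction; treat it as dissimilar (assumption).
    return dot / norm if norm else 0.0
```

Because tfidf values are non-negative, the result lies between 0 (no shared weighted words) and 1 (proportional vectors).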

The similarity calculation unit 17 calculates the similarity between each text feature amount and the caption feature amount, and computes the average similarity for each of the four search conditions. It selects the search condition with the highest average as the optimum search condition and outputs information indicating that condition to the URL output unit 18.

The URL output unit 18 receives two inputs: information indicating the selected search condition from the similarity calculation unit 17, and, from the content search unit 133, the n URLs associated with each of the four search conditions (SB, SD, CB, CD). From these URLs, it selects the n URLs associated with the search condition indicated by the similarity calculation unit 17 and outputs them to the outside (the set-top box 2).
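Selecting by average similarity and passing on the matching URLs can be sketched as two small functions. The dictionary layouts are hypothetical: similarities map each condition name to per-content scores, and URLs map each condition name to its n URLs. Each condition is assumed to have at least one retrieved item, since `statistics.mean` raises on an empty list. Ties go to the first condition in dictionary order.

```python
from statistics import mean


def select_condition(similarities):
    """similarities: dict mapping condition name -> list of per-content scores."""
    return max(similarities, key=lambda name: mean(similarities[name]))


def output_urls(similarities, urls_by_condition):
    """Return the n URLs retrieved with the condition of highest average similarity."""
    return urls_by_condition[select_condition(similarities)]
```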

FIG. 4 is a flowchart showing the content selection processing of the information selection device 1 in the present embodiment.
When the caption acquisition unit 12 of the information selection device 1 receives caption information from the set-top box 2 (step S101), it outputs the received caption information to the keyword extraction unit 131 and the caption feature amount calculation unit 16.
The keyword extraction unit 131 extracts the subject words and content words from the caption information input from the caption acquisition unit 12 and outputs them to the query generation unit 132 (step S102).

The query generation unit 132 uses the subject words and content words input from the keyword extraction unit 131 to generate the SD, SB, CD, and CB search conditions, and outputs them to the content search unit 133 (step S103).
The content search unit 133 sends a search request containing the search conditions generated by the query generation unit 132 to the search server 4, which is connected via the network 6. It then receives the search result for each condition. For each search condition, it extracts the URLs of the top n pieces of content from the search result and outputs them, associated with that condition, to the text acquisition unit 14 (step S104).

The text acquisition unit 14 uses the URLs input from the content search unit 133 to acquire the text information of each retrieved piece of content (step S105). The text feature amount calculation unit 15 then calculates the text feature amount for the text information of each piece of content (step S106). It uses the text information acquired by the text acquisition unit 14 and the word list stored in the word list storage unit 11.
The caption feature amount calculation unit 16 calculates the caption feature amount for the caption information (step S107). It uses the caption information input from the caption acquisition unit 12 and the word list stored in the word list storage unit 11.

The similarity calculation unit 17 calculates the similarity between two feature amounts (step S108): the text feature amount of each piece of content, calculated by the text feature amount calculation unit 15, and the caption feature amount, calculated by the caption feature amount calculation unit 16.
The similarity calculation unit 17 then calculates the average similarity of the content for each of the four search conditions and outputs information indicating the condition with the highest average to the URL output unit 18. The URL output unit 18 outputs to the set-top box 2 the URLs retrieved with the search condition indicated by that information (step S109).

The set-top box 2 then causes the display device 3 to display information indicating the content at the input URLs, together with the video of the broadcast program.
In this way, content highly relevant to the broadcast program is displayed on the display device 3 together with the program and provided to the user.

Through the processing described above, the information selection device 1 uses the SB, SD, CB, and CD search conditions generated from the caption information to retrieve content that describes the program in more detail or from a different viewpoint. Based on the similarity between the retrieved content and the caption information, it then identifies which of the SB, SD, CB, and CD search conditions retrieved content highly similar to the caption information. It outputs the content retrieved with that condition.
In this way, content containing peripheral information related to the program is first retrieved with a plurality of search conditions. The search condition that retrieved content highly similar to the program is then selected, and the retrieved content is narrowed down to what that condition returned. This makes it possible to select content more similar to the program, and thus to provide the user with appropriate content related to the program, or with the URLs of such content.

In addition, the information selection device 1 calculates the text feature amount and the caption feature amount using m words extracted in advance from previously broadcast caption information. The text feature amounts of the retrieved content are therefore based on words that appear frequently in caption information, which improves the accuracy of the similarity to the caption information.
The information selection device 1 also calculates text feature amounts, and their similarity to the caption information, only for the top n pieces of retrieved content for each search condition. This keeps the processing time for the text feature amounts and similarities within a fixed bound. As a result, the optimum content related to the caption information, which is updated continually as the broadcast program progresses, can be output each time the caption information is updated.

In the first embodiment described above, the information selection device 1 transmits the selected URLs to the set-top box 2. However, the configuration is not limited to this; the selected URLs may instead be transmitted to a predetermined device via the network 6 or the like. Content related to the program can then be displayed on a device other than the display device 3 showing the broadcast program, such as a portable information terminal. The content is thus no longer limited by the area available for it on the display device 3. This also allows appropriate content related to the program to be displayed on portable information terminals and similar devices that lack the processing capacity to execute the content selection processing described above.

In the first embodiment described above, the information selection device 1, the set-top box 2, and the display device 3 are independent devices. However, the configuration is not limited to this: the set-top box 2 may include the information selection device 1, or the display device 3 may include the information selection device 1 and the set-top box 2.

(Second Embodiment)
FIG. 5 is a schematic block diagram showing the configuration of the information selection server 100 in the second embodiment.
As shown in the figure, the information selection server 100 includes an information selection device 1, a broadcast reception unit 101, a selection information storage unit 102, and a request processing unit 103. The information selection device 1, search server 4, content server 5, and network 6 in this embodiment have the same configurations as in the first embodiment. They are given the same reference numerals, and their description is omitted.

The broadcast reception unit 101 receives a plurality of broadcast signals carrying broadcast programs. It outputs the text (caption) information of the program in each broadcast signal to the information selection device 1 in turn, which causes the information selection device 1 to output URLs related to each program to the selection information storage unit 102 in turn. The broadcast reception unit 101 also has the selection information storage unit 102 store each URL that the information selection device 1 outputs for the text information. Each URL is stored together with channel information identifying the broadcast signal that carried that text information.

The broadcast reception unit 101 also outputs caption information to the information selection device 1 at predetermined intervals, so that the URLs associated with each item of channel information in the selection information storage unit 102 are updated.
The broadcast reception unit 101 may receive broadcast signals over a wired line such as optical fiber or coaxial cable, or over a wireless link.

Based on the caption information input from the broadcast reception unit 101, the information selection device 1 outputs n URLs related to that caption information to the selection information storage unit 102.
The selection information storage unit 102 stores the channel information identifying each of the broadcast signals received by the broadcast reception unit 101, associated with the URLs output by the information selection device 1.

The request processing unit 103 is connected to a plurality of terminal devices 8-1 to 8-N via the network 9. A terminal device may send request information containing channel information that identifies one of the broadcast signals. When the request processing unit 103 receives such a request, it reads the URLs corresponding to that channel information from the selection information storage unit 102. It then returns response information containing those URLs to the sender of the request.
Here, the terminal devices 8-1 to 8-N are devices that can communicate with the information selection server 100 via the network 9, such as portable information terminals, notebook PCs, desktop PCs, or set-top boxes.
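The sketch below models the storage and request handling of this embodiment as an in-memory channel-to-URLs map. The class name, the `update` method, and the request/response dictionaries are hypothetical stand-ins; the text specifies no message format or storage technology. Each new output for a channel replaces the previous URLs, mirroring the periodic update described above.

```python
class SelectionStore:
    """In-memory stand-in for selection information storage unit 102 and
    request processing unit 103 (hypothetical interface)."""

    def __init__(self):
        self._urls_by_channel = {}

    def update(self, channel, urls):
        # Called whenever the device outputs URLs for a channel's captions;
        # the newest URLs replace the previous ones.
        self._urls_by_channel[channel] = list(urls)

    def handle_request(self, request):
        """request: dict with a 'channel' key; returns a response dict."""
        channel = request["channel"]
        return {"channel": channel, "urls": self._urls_by_channel.get(channel, [])}
```

A terminal asking about a channel with no stored URLs receives an empty list rather than an error; that choice is also an assumption.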

As described above, the information selection server 100 stores the URLs of content related to each program carried by the plurality of broadcast signals, associated with channel information identifying each broadcast signal. In response to requests from the terminal devices 8-1 to 8-N, it transmits to them the URLs of content related to the program being broadcast.
As a result, the terminal devices 8-1 to 8-N can easily receive, through the information selection server 100, the URLs of content related to the broadcast program, even when their processing capacity is limited. A user can, for example, operate the terminal device 8-1 to browse the content indicated by a received URL and obtain more detailed information related to the program.

In this embodiment, two separate networks are described. The network 6 connects the search server 4 and content server 5 to the information selection server 100. The network 9 connects the terminal devices 8-1 to 8-N to the information selection server 100. These devices may, however, be connected via a single network.

In the first and second embodiments described above, the information selection device 1 receives caption information from an external device. Alternatively, the information selection device 1 may itself receive a broadcast signal and obtain the caption information by demodulating and decoding it.

In the first and second embodiments described above, the similarity calculation unit 17 selects the search condition with the highest average similarity. Alternatively, it may select the search condition that was used to retrieve the single piece of content with the highest similarity.
This guarantees that the content with the highest similarity is always output. It is particularly useful when only a little content can be presented to the user. For example, the space for displaying content on the display device 3 may be so small that only one or two items fit; appropriate content can still be presented in that case.
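The sketch below contrasts the two selection rules. The dictionary of per-condition similarity lists is a hypothetical layout, as in the earlier selection sketch. With one condition that found a single excellent match among weak ones, and another that found uniformly moderate matches, the rules disagree.

```python
from statistics import mean


def select_by_average(similarities):
    """Pick the condition whose retrieved content has the highest mean similarity."""
    return max(similarities, key=lambda name: mean(similarities[name]))


def select_by_max(similarities):
    """Variant: pick the condition that retrieved the single most similar item."""
    return max(similarities, key=lambda name: max(similarities[name]))
```

For example, with `{"SD": [0.9, 0.1], "SB": [0.6, 0.6]}`, the average rule chooses SB, while the maximum rule chooses SD. The maximum rule suits a display that has room for only one or two items.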

The information selection device 1 described above may contain a computer system. In that case, the operations of the following units are stored as a program on a computer-readable recording medium: the caption acquisition unit 12, URL acquisition unit 13, text acquisition unit 14, text feature amount calculation unit 15, caption feature amount calculation unit 16, similarity calculation unit 17, and URL output unit 18. Each functional unit operates when a computer reads and executes this program. Here, a computer-readable recording medium means a magnetic disk, magneto-optical disk, CD-ROM, DVD-ROM, semiconductor memory, or the like. The computer program may also be distributed to a computer over a communication line, and the computer that receives it may execute the program.

DESCRIPTION OF SYMBOLS
1 ... Information selection device
2 ... Set-top box
3 ... Display device
4 ... Search server
5 ... Content server
6, 9 ... Network
8-1, 8-2, 8-N ... Terminal device
11 ... Word list storage unit
12 ... Caption acquisition unit
13 ... URL acquisition unit
14 ... Text acquisition unit
15 ... Text feature amount calculation unit
16 ... Caption feature amount calculation unit
17 ... Similarity calculation unit
18 ... URL output unit
100 ... Information selection server
131 ... Keyword extraction unit
132 ... Query generation unit
133 ... Content search unit

Claims

An information selection apparatus comprising:
a subtitle acquisition unit that receives subtitle information that is broadcast together with a program and represents the content of the program;
a content search unit that extracts, from the words included in the subtitle information received by the subtitle acquisition unit, a subject word representing the subject of the subtitle information and a content word representing the content of the subtitle information, and that searches the content published on a network for content related to the subtitle information using a plurality of search conditions generated by combining the subject word and the content word;
a similarity calculation unit that calculates, for each of the plurality of search conditions, a similarity indicating the correlation between the subtitle information and the text information that is included in the content retrieved by the content search unit and describes that content, and that selects one of the plurality of search conditions based on the calculated similarity; and
an output unit that outputs position information indicating the area in which the content retrieved using the search condition selected by the similarity calculation unit is stored.

2. The information selection device according to claim 1, further comprising:
a subtitle feature amount calculation unit that calculates a subtitle feature amount, which is a vector representing the features of the subtitle information, based on how often each of a plurality of words extracted in advance from previously broadcast subtitle information appears in the subtitle information; and
a text feature amount calculation unit that calculates a text feature amount, which is a vector representing the features of the text information, based on how often each of the plurality of words appears in the text information,
wherein the similarity calculation unit calculates the similarity from the subtitle feature amount and the text feature amount, and
the output unit selects either the search condition with the highest average of the similarities calculated for each of the plurality of search conditions, or the search condition that retrieved the content with the highest similarity among the plurality of search conditions, and outputs the position information corresponding to the content retrieved with the selected search condition.
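
Claim 2 offers two alternative rules for picking a search condition once the similarities are known. The sketch below shows both, assuming the similarities are already available as a mapping from a search-condition label to the scores of the content that condition retrieved. The function name `select_condition` and the labels `cond1` to `cond4` are hypothetical, not taken from the specification.

```python
from statistics import mean


def select_condition(similarities: dict[str, list[float]], rule: str = "mean") -> str:
    """Hypothetical helper: pick the search condition whose results best match the subtitles.

    rule="mean": condition with the highest average similarity.
    rule="max":  condition that retrieved the single most similar piece of content.
    """
    aggregators = {"mean": mean, "max": max}
    if rule not in aggregators:
        raise ValueError(f"unknown rule: {rule}")
    aggregate = aggregators[rule]
    # Assumption: conditions that retrieved nothing cannot be scored and are skipped.
    scored = {cond: scores for cond, scores in similarities.items() if scores}
    if not scored:
        raise ValueError("no search condition retrieved any content")
    return max(scored, key=lambda cond: aggregate(scored[cond]))


sims = {
    "cond1": [0.42, 0.38, 0.40],
    "cond2": [0.55, 0.10, 0.12],
    "cond3": [0.30, 0.31, 0.29],
    "cond4": [],
}
print(select_condition(sims, "mean"))  # cond1
print(select_condition(sims, "max"))   # cond2
```

Under the average rule, a condition whose results are consistently relevant wins (cond1, mean 0.40); under the maximum rule, a single highly relevant page is enough (cond2, 0.55). The claim does not say how empty result sets are treated; skipping them is an assumption of this sketch.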

3. The information selection device according to claim 2, wherein
the subtitle feature amount calculation unit calculates the subtitle feature amount weighted by tf-idf (term frequency-inverse document frequency) using the plurality of words;
the text feature amount calculation unit calculates the text feature amount weighted by tf-idf using the plurality of words; and
the similarity calculation unit calculates the cosine correlation value between the subtitle feature amount and the text feature amount as the similarity.
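
Claims 2 and 3 describe bag-of-words vectors over a fixed vocabulary taken from past subtitles, weighted by tf-idf and compared by cosine correlation. A minimal sketch follows. It assumes the text has already been split into words (Japanese subtitles would normally need a morphological analyzer for this step), takes document frequencies from the same past-subtitle corpus the vocabulary came from, and uses the common smoothed idf variant log((1 + N) / (1 + df)) + 1. The claims specify neither the idf formula nor the reference corpus, and all function names are hypothetical.

```python
import math
from collections import Counter


def build_idf(vocabulary, corpus):
    """Smoothed idf for each vocabulary word over a corpus of tokenized documents.

    Assumption: `corpus` is the past subtitle data the vocabulary was extracted
    from; the claims do not say which corpus supplies document frequencies.
    """
    n_docs = len(corpus)
    doc_sets = [set(doc) for doc in corpus]
    idf = {}
    for word in vocabulary:
        df = sum(1 for doc in doc_sets if word in doc)
        idf[word] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return idf


def tfidf_vector(tokens, vocabulary, idf):
    """Raw term frequency times idf, one component per vocabulary word.

    Words outside the vocabulary are ignored.
    """
    counts = Counter(tokens)
    return [counts[word] * idf[word] for word in vocabulary]


def cosine(u, v):
    """Cosine correlation; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


past_subtitles = [["typhoon", "rain", "wind"], ["election", "vote"], ["typhoon", "evacuation"]]
vocabulary = sorted({w for doc in past_subtitles for w in doc})
idf = build_idf(vocabulary, past_subtitles)

subtitle = tfidf_vector(["typhoon", "rain", "typhoon"], vocabulary, idf)
page_a = tfidf_vector(["typhoon", "evacuation", "rain"], vocabulary, idf)
page_b = tfidf_vector(["election", "vote", "vote"], vocabulary, idf)
print(cosine(subtitle, page_a) > cosine(subtitle, page_b))  # True
```

Page A shares "typhoon" and "rain" with the subtitles and scores above zero; page B shares no vocabulary word with them and scores exactly 0.0.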

4. The information selection device according to claim 2, wherein
the content search unit searches for content satisfying the plurality of search conditions by transmitting the plurality of search conditions to a search server that searches the content published on the network for content satisfying a received condition and returns the position information of that content, and
the text feature amount calculation unit calculates the text feature amount for the top n (n ≥ 1) pieces of content, as ranked by the search server for each of the plurality of search conditions, among the content retrieved by the content search unit.
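
Claim 4 delegates the actual search to an external search server and keeps only the top n results per search condition. The sketch below models the server as a plain callable that returns URLs in ranked order. That interface, the name `top_n_per_condition`, and the stand-in index are assumptions for illustration; the claims do not define the server's API.

```python
from typing import Callable, Sequence

# Hypothetical search-server interface: one search condition in,
# URLs out, best-ranked first.
SearchFn = Callable[[str], Sequence[str]]


def top_n_per_condition(conditions: Sequence[str], search: SearchFn, n: int) -> dict[str, list[str]]:
    """Send each condition to the search server and keep its top n URLs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {condition: list(search(condition))[:n] for condition in conditions}


# Stand-in for the search server: a fixed ranked index.
fake_index = {"q1": ["u1", "u2", "u3"], "q2": ["u4"]}
result = top_n_per_condition(["q1", "q2"], lambda q: fake_index.get(q, []), n=2)
print(result)  # {'q1': ['u1', 'u2'], 'q2': ['u4']}
```

The text feature amount is then computed only for these URLs. Here q2 returns fewer than n results, which is the case claim 6 handles by relaxing the condition.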

5. The information selection device according to any one of claims 1 to 4, further comprising a query generation unit that generates, as the plurality of search conditions:
a first search condition for searching the content published on the network for content that includes the subject word in its text information and does not include the content word in its title;
a second search condition for searching the content published on the network for content that includes the content word in its title and does not include the subject word in its text information;
a third search condition for searching the content published on the network for content that includes the content word in its text information and does not include the subject word in its title; and
a fourth search condition for searching the content published on the network for content that includes the subject word in its title and does not include the content word in its text information,
wherein the content search unit searches for content by transmitting the first to fourth search conditions generated by the query generation unit to the search server, and
the output unit selects one of the first to fourth search conditions and outputs the content retrieved based on the selected search condition.
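
The four search conditions of claim 5 are symmetric: each requires one word set in one field (text information or title) and excludes the other word set from the other field. The sketch below encodes them as data and evaluates them against a local document. It assumes that "includes the subject word" means every listed word must appear, that exclusion means none of the listed words may appear, and that matching is plain substring matching; a real search server would instead receive each condition translated into its own query syntax, which the claims leave unspecified. `SearchCondition` and `generate_conditions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchCondition:
    """Content must contain every word of `include` in `include_field` and
    none of the words of `exclude` in `exclude_field` ("text" or "title")."""
    include: tuple[str, ...]
    include_field: str
    exclude: tuple[str, ...]
    exclude_field: str

    def matches(self, doc: dict[str, str]) -> bool:
        # Simplification (assumption): substring matching on the raw field text.
        has_all = all(word in doc[self.include_field] for word in self.include)
        has_none = not any(word in doc[self.exclude_field] for word in self.exclude)
        return has_all and has_none


def generate_conditions(subject_words, content_words):
    s, c = tuple(subject_words), tuple(content_words)
    return [
        SearchCondition(s, "text", c, "title"),   # 1st: subject in text, content not in title
        SearchCondition(c, "title", s, "text"),   # 2nd: content in title, subject not in text
        SearchCondition(c, "text", s, "title"),   # 3rd: content in text, subject not in title
        SearchCondition(s, "title", c, "text"),   # 4th: subject in title, content not in text
    ]


conditions = generate_conditions(["typhoon"], ["evacuation"])
doc = {"title": "typhoon update", "text": "heavy rain as the typhoon nears"}
print([cond.matches(doc) for cond in conditions])  # [True, False, False, True]
```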

6. The information selection device according to claim 5, wherein, when a search using any of the first to fourth search conditions yields fewer than n pieces of content, the content search unit reduces the number of subject words or content words used to generate that search condition, regenerates the search condition, and searches for content again.
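
Claim 6 relaxes a search condition when it returns fewer than n results. The claim does not say which words are dropped, so the sketch below assumes the subject and content word lists are ordered from most to least important and drops the last word of whichever list is longer, always keeping at least one word of each kind. `build_query` and `search` are hypothetical callables standing in for the query generation unit and the search server.

```python
def search_with_fallback(subject_words, content_words, build_query, search, n):
    """Retry with fewer words until at least n results come back.

    build_query(subject_words, content_words) -> query   (hypothetical)
    search(query) -> ranked list of URLs                  (hypothetical)
    """
    subject, content = list(subject_words), list(content_words)
    while True:
        results = list(search(build_query(subject, content)))
        if len(results) >= n:
            return results[:n]
        # Assumption: drop the last (least important) word of the longer list,
        # keeping at least one subject word and one content word.
        if len(subject) >= len(content) and len(subject) > 1:
            subject.pop()
        elif len(content) > 1:
            content.pop()
        else:
            return results  # nothing left to drop; return what was found


# Stand-in search server: the more words in the query, the fewer hits.
def fake_search(query):
    return [f"u{i}" for i in range(max(0, 5 - len(query)))]


hits = search_with_fallback(["a", "b", "c"], ["x", "y"], lambda s, c: tuple(s) + tuple(c), fake_search, n=2)
print(hits)  # ['u0', 'u1']
```

In this run the first query has five words and returns nothing; dropping two subject words brings the query down to three words, which returns the required two hits.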

7. A server device comprising:
the information selection device according to any one of claims 1 to 6;
a selection information storage unit that inputs the subtitle information of each of a plurality of broadcast programs to the information selection device and stores the position information output by the information selection device in association with the corresponding program; and
a request processing unit that receives request information indicating one of the plurality of programs, reads the position information corresponding to the program indicated by the request information from the selection information storage unit, and transmits the read position information to the sender of the request information.
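
Claim 7 precomputes the selection for each broadcast program and answers later requests from storage. A minimal in-memory sketch follows. The class and method names are hypothetical, the `select` callable stands in for the whole information selection device, and returning a value stands in for transmitting it to the requesting terminal.

```python
from typing import Callable, Optional, Sequence


class InformationSelectionServer:
    """Hypothetical in-memory sketch of the server device in claim 7."""

    def __init__(self, select: Callable[[str], Sequence[str]]):
        # `select` stands in for the information selection device: it takes a
        # program's subtitle text and returns the position information (URLs).
        self._select = select
        # Stands in for the selection information storage unit.
        self._store: dict[str, list[str]] = {}

    def register_program(self, program_id: str, subtitles: str) -> None:
        """Run the selection once per program and store the result."""
        self._store[program_id] = list(self._select(subtitles))

    def handle_request(self, program_id: str) -> Optional[list[str]]:
        """Request processing: look up stored URLs; None for an unknown program."""
        urls = self._store.get(program_id)
        return list(urls) if urls is not None else None


server = InformationSelectionServer(lambda text: ["https://example.com/" + text.split()[0]])
server.register_program("news-0701", "typhoon approaches")
print(server.handle_request("news-0701"))   # ['https://example.com/typhoon']
print(server.handle_request("drama-0701"))  # None
```

Storing results per program means a terminal's request needs only a lookup rather than a fresh search. How programs are identified is not specified; a string key is assumed here.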

8. An information selection method comprising:
a subtitle acquisition step of receiving subtitle information, broadcast together with a program, representing the content of the program;
a content search step of extracting, from the words included in the subtitle information received in the subtitle acquisition step, a subject word representing the subject of the subtitle information and a content word representing the content of the subtitle information, and searching the content published on a network for content related to the subtitle information by using a plurality of search conditions generated by combining the subject word and the content word;
a similarity calculation step of calculating, for each of the plurality of search conditions, a similarity indicating the correlation between the subtitle information and text information that is included in the content retrieved in the content search step and represents the substance of that content, and selecting one search condition from the plurality of search conditions based on the calculated similarity; and
an output step of outputting position information indicating an area in which the content retrieved using the search condition selected in the similarity calculation step is stored.

9. A program causing a computer provided in an information selection device to execute:
a subtitle acquisition step of receiving subtitle information, broadcast together with a program, representing the content of the program;
a content search step of extracting, from the words included in the subtitle information received in the subtitle acquisition step, a subject word representing the subject of the subtitle information and a content word representing the content of the subtitle information, and searching the content published on a network for content related to the subtitle information by using a plurality of search conditions generated by combining the subject word and the content word;
a similarity calculation step of calculating, for each of the plurality of search conditions, a similarity indicating the correlation between the subtitle information and text information that is included in the content retrieved in the content search step and represents the substance of that content, and selecting one search condition from the plurality of search conditions based on the calculated similarity; and
an output step of outputting position information indicating an area in which the content retrieved using the search condition selected in the similarity calculation step is stored.