JP2016126567A - Content recommendation device and program

Info

Publication number: JP2016126567A
Application number: JP2015000334A
Authority: JP
Inventors: Atsushi Matsui (松井 淳); Takeshi Kobayakawa (小早川 健); Yuiko Yamauchi (山内 結子)
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2015-01-05
Filing date: 2015-01-05
Publication date: 2016-07-11
Anticipated expiration: 2035-01-05
Also published as: JP6429382B2

Abstract

PROBLEM TO BE SOLVED: To search for content by adding, to an original search term, phrases in other notations that are closely related semantically to that search term.
SOLUTION: An acquisition unit of a content recommendation device acquires primary query data indicating a list of search terms. A search term candidate extraction unit extracts candidate phrases from a plurality of pieces of corpus data in which the same topic can be described in different notations. From the candidate phrases, a query expansion unit selects those whose similarity to one of the search terms in the primary query data exceeds a predetermined condition and which also co-occur in the corpus data with a search term other than the one that yielded that similarity, and adds them to the search terms. A search unit searches for content using the search terms in the primary query data and the search terms added by the query expansion unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a content recommendation device and a program.

Some content processing devices for viewing content such as television programs include a content recommendation function (see, for example, Patent Documents 1 and 2). This function has the advantage that users can easily find the content they want to view. It is realized by analyzing and processing the viewing history of each user in a content playback environment constructed by some means.

For example, the content processing device of Patent Document 2 accumulates viewing log information consisting of each user's content playback start time and playback end time together with a content ID that identifies the viewed content. Using heuristic rules programmed in advance by the system designer, the device mechanically extracts, from the accumulated viewing log information, the entries that reflect each user's preferences. From the language information corresponding to the extracted viewing log information, the device then extracts, by some means, phrases (a query) representing the topics that motivated each viewing action. The language information corresponding to the viewing log information is the caption text of the content the user viewed, or metadata such as a program summary obtained from the electronic program guide (EPG) that accompanies the viewed content.

Separately, there is a technique called social data mining, which analyzes the enormous record of posts made on social media such as Twitter by an unspecified large number of users (see, for example, Non-Patent Document 1). Social data mining makes it possible to extract the everyday interests of the general public and the wide variety of topics in social life, together with the concrete language expressions used for them. There is also a technique for automatically determining which content a post in social data refers to (see, for example, Non-Patent Document 2).

Patent Document 1: JP 2005-348253 A
Patent Document 2: JP 2012-065119 A

Non-Patent Document 1: M. A. Russell, "Introduction to Social Data: Techniques for Data Mining, Analysis, and Visualization" (入門ソーシャルデータ), O'Reilly Japan, 2011
Non-Patent Document 2: Mariko Hirano, Keisuke Kobe, Takeshi Kobayakawa, "Automatic detection of tweet target programs: for comprehensive and continuous detection", Proceedings of the 2013 Winter Conference of the Institute of Image Information and Television Engineers, The Institute of Image Information and Television Engineers, 2013, 3-7

When setting a query, that is, the set of phrases used as the condition for a content search, the diversity of notation must be taken into account, because a query representing a particular topic does not necessarily have a single notation. For example, "soccer" and "football" refer to the same ball sport in many contexts. If only one of several notations for the same topic is set as the query, content described using another notation is, in principle, difficult to retrieve. A content search device may therefore add to the query phrases that differ in surface notation but are assumed to match what the user intends. However, if the phrases assumed by the device differ from the content the user intends, it remains difficult in principle to correctly identify the content that matches the user's intention even when those phrases are used in the query. As a result, there is a risk that some or most of the content that should be presented to the user will be missing from the recommendation list.

As described above, a query used for content recommendation must be set with the diversity of notation taken into account. However, Patent Documents 1 and 2 offer no concrete solution to the technical problem of how to incorporate this diversity into the query-setting procedure.

According to the technique of Non-Patent Document 1, social data mining can extract the everyday interests of the general public and the wide variety of topics in social life, together with the concrete language expressions used for them. Expressions extracted in this way are candidates for search terms to be added to a query when searching for content. The technique of Non-Patent Document 2 automatically determines which content a post refers to. In content recommendation for content services, such techniques are expected to be useful for realizing query expansion that accounts for diversity. However, neither the social data mining technique of Non-Patent Document 1 nor the technique of Non-Patent Document 2 for automatically determining the target content of a tweet defines a specific means for identifying other phrases that are semantically related to an arbitrary query phrase.

The present invention has been made in view of these circumstances, and provides a content recommendation device and a program that can search for content by adding, to an original search term, phrases in other notations that are closely related semantically to that search term.

One aspect of the present invention is a content recommendation device comprising: an acquisition unit that acquires primary query data indicating a list of search terms, which are phrases used for a search; a search term candidate extraction unit that extracts candidate phrases from a plurality of pieces of corpus data in which the same topic can be described in different notations; a query expansion unit that selects, from the candidate phrases, a candidate phrase whose similarity to any one of the search terms in the primary query data is higher than a predetermined condition and which co-occurs in the corpus data with any of the search terms other than the search term that yielded that similarity, and adds the selected phrase to the search terms; and a search unit that searches for content using the search terms in the primary query data and the search terms added by the query expansion unit.
According to this invention, the content recommendation device obtains candidate phrases from a plurality of pieces of corpus data in which the same topic can be described in different notations. From those candidates, the device adds to the search terms each phrase whose similarity to an original search term is higher than the predetermined condition and which co-occurs in the corpus data with an original search term different from the one used in the similarity judgment. The device then searches for content using both the original search terms and the added search terms.
As a result, the content recommendation device can search for and recommend content by adding, to the original search terms, phrases in other notations that are closely related semantically to those search terms.
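The selection rule stated in this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the method fixed by the text: the similarity measure (character-bigram Jaccard), the threshold value, the representation of each corpus document as a set of phrases, and the names `expand_query`, `jaccard`, and `char_bigrams` are all hypothetical choices made for the example.

```python
def char_bigrams(s):
    # Character bigrams of a phrase; fall back to the phrase itself when it is too short.
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a, b):
    # Assumed similarity measure: Jaccard overlap of character bigrams.
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def expand_query(primary, candidates, corpus, similarity=jaccard, threshold=0.3):
    """Return the primary search terms plus the candidates that satisfy both conditions.

    primary:    list of original search terms
    candidates: candidate phrases extracted from the corpus
    corpus:     iterable of documents, each given as a set of phrases (assumption)
    """
    added = []
    for cand in candidates:
        if cand in primary or cand in added:
            continue
        # Condition 1: similarity to some original search term exceeds the threshold.
        similar_to = {t for t in primary if similarity(cand, t) > threshold}
        if not similar_to:
            continue
        # Original search terms that appear in the same document as the candidate.
        cooccurring = set()
        for doc in corpus:
            if cand in doc:
                cooccurring.update(t for t in primary if t in doc)
        # Condition 2: co-occurrence with a search term other than the similar one.
        if any(cooccurring - {s} for s in similar_to):
            added.append(cand)
    return list(primary) + added


primary_terms = ["soccer", "world cup"]
documents = [
    {"soccer japan", "world cup"},
    {"soccer fan", "soccer"},
    {"baseball", "world cup"},
]
expanded = expand_query(primary_terms, ["soccer japan", "soccer fan", "baseball"], documents)
```

In this toy data, "soccer japan" resembles "soccer" and co-occurs with the different term "world cup", so it is added; "soccer fan" co-occurs only with the term it resembles, and "baseball" resembles no search term, so neither is added. The `similarity` parameter is left pluggable because a surface-based measure such as the one assumed here would not relate "soccer" to "football".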

One aspect of the present invention is the content recommendation device described above, wherein the search term candidate extraction unit extracts the candidate phrases from the corpus data within a predetermined period.
According to this invention, the content recommendation device selects the phrases to add as search terms from candidate phrases extracted from corpus data belonging to a predetermined period.
As a result, the content recommendation device can search for and recommend content by adding, to the original search terms, phrases in other notations that are closely related semantically to those search terms and that also reflect current events.
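Restricting the corpus to a predetermined period can be sketched as a simple timestamp filter. The sketch assumes each post is a `(posted_at, text)` pair and uses a seven-day window; the function name `posts_in_window` and the window length are illustrative assumptions, since the text does not fix the period.

```python
from datetime import datetime, timedelta


def posts_in_window(posts, now, window=timedelta(days=7)):
    """Keep only the text of posts published within `window` before `now`.

    posts: iterable of (posted_at: datetime, text: str) pairs (assumed layout)
    """
    start = now - window
    return [text for posted_at, text in posts if start <= posted_at <= now]


sample_posts = [
    (datetime(2015, 1, 4, 21, 0), "great soccer match tonight"),
    (datetime(2014, 11, 1, 9, 0), "old post about baseball"),
]
recent = posts_in_window(sample_posts, now=datetime(2015, 1, 5, 0, 0))
```

Candidate extraction would then run only on `recent`, so phrases tied to topics that are no longer current do not enter the candidate list.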

One aspect of the present invention is the content recommendation device described above, wherein the acquisition unit acquires primary query data consisting of phrases extracted from text information about content viewed by a user.
According to this invention, the content recommendation device acquires the original search terms, which represent the user's preferences, based on the history of content the user has viewed.
As a result, the content recommendation device can use the history of content the user has viewed to search for and present other content that matches the user's preferences.

One aspect of the present invention is the content recommendation device described above, wherein the acquisition unit acquires primary query data consisting of phrases extracted from text information about the portion of the content that the user played back.
According to this invention, the content recommendation device acquires search terms from text information representing the portion of the content that the user actually played back.
As a result, the content recommendation device can acquire search terms that represent the user's preferences well, and can therefore recommend content that better matches what the user is looking for.
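Limiting the text to the played-back portion can be sketched as follows, assuming the text information is caption text with a start offset per caption line and that the playback start and end points are known. The data layout and the name `captions_in_played_portion` are assumptions made for illustration only.

```python
def captions_in_played_portion(captions, play_start, play_end):
    """Return caption lines whose start offset falls inside the played interval.

    captions:   iterable of (offset_seconds, text) pairs (assumed layout)
    play_start: playback start point in seconds
    play_end:   playback end point in seconds (exclusive)
    """
    return [text for offset, text in captions if play_start <= offset < play_end]


program_captions = [
    (0, "Tonight's headlines"),
    (120, "Soccer: the national team wins"),
    (300, "Weather forecast"),
]
viewed_text = captions_in_played_portion(program_captions, play_start=100, play_end=280)
```

Search terms for the primary query would then be extracted from `viewed_text` rather than from the whole program.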

One aspect of the present invention is the content recommendation device described above, wherein the search term candidate extraction unit extracts the candidate phrases from the tags or the body text of the corpus data.
According to this invention, the content recommendation device extracts candidate phrases from the tags or the body text of the corpus data.
As a result, by using the tags included in the corpus data, the content recommendation device can extract, as candidates, phrases that represent the body text well while keeping the processing load low. Even when tags are not available, the device can extract candidate phrases from the body text of the corpus data.
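A minimal sketch of tag-first extraction with a fallback to the body text is shown below. It assumes tags are written as `#hashtag` tokens and uses a plain word-character regex as a stand-in for real phrase extraction; Japanese text would need morphological analysis instead, which is outside this sketch. The name `candidate_phrases` is hypothetical.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\w+")


def candidate_phrases(post):
    """Return candidate phrases for one post: its tags if any, else words from the body."""
    tags = HASHTAG.findall(post)
    if tags:
        # Tags are cheap to extract and usually summarize the post.
        return tags
    # No tag available: fall back to the body text (naive tokenization, an assumption).
    return WORD.findall(post)


tagged = candidate_phrases("Great match tonight #soccer #WorldCup")
untagged = candidate_phrases("Great match tonight")
```

Using the tags when present avoids analyzing the full body text, which is the load reduction the paragraph above refers to.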

One aspect of the present invention is a program for causing a computer to function as a content recommendation device comprising: acquisition means for acquiring primary query data indicating a list of search terms, which are phrases used for a search; search term candidate extraction means for extracting candidate phrases from a plurality of pieces of corpus data in which the same topic can be described in different notations; query expansion means for selecting, from the candidate phrases, a candidate phrase whose similarity to any one of the search terms in the primary query data is higher than a predetermined condition and which co-occurs in the corpus data with any of the search terms other than the search term that yielded that similarity, and adding the selected phrase to the search terms; and search means for searching for content using the search terms in the primary query data and the search terms added by the query expansion means.

According to the present invention, content can be searched for by adding, to an original search term, phrases in other notations that are closely related semantically to that search term.

FIG. 1 is a functional block diagram showing the configuration of a content recommendation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the processing flow of content recommendation processing by the content recommendation device of the embodiment.
FIG. 3 is a diagram showing the processing flow of primary query generation processing by the viewing history analysis unit of the embodiment.
FIG. 4 is a diagram showing the processing flow of secondary query selection processing by the query expansion unit of the embodiment.
FIG. 5 is a diagram showing the processing flow of recommended content selection processing by the recommendation list generation unit of the embodiment.
FIG. 6 is a diagram showing an example of the user viewing history information output by the viewing history recording unit of the embodiment.
FIG. 7 is a diagram showing an example of the unviewed content information output by the unviewed content information recording unit of the embodiment.
FIG. 8 is a diagram showing an example of the primary query data output by the viewing history analysis unit of the embodiment.
FIG. 9 is a diagram showing an example of the social data stored by the social data recording unit of the embodiment.
FIG. 10 is a diagram showing an example of the secondary query candidate list data output by the social data analysis unit of the embodiment.
FIG. 11 is a diagram showing an example of the extended query data generated by the query expansion unit of the embodiment.
FIG. 12 is a diagram showing the relationship between the extended query and the unviewed content information of the embodiment.
FIG. 13 is a diagram showing an example of the recommended content list data output by the recommendation list generation unit of the embodiment.
FIG. 14 is a diagram showing an example of the recommended content presentation screen that the recommended content presentation unit of the embodiment displays on the content display device.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The content recommendation device of the present embodiment takes as the primary query a set of search terms that describe the user's preferences by linguistic means such as keywords. It then generates an extended query by adding to the primary query a secondary query, which is a set of phrases in other notations whose content is related to the primary query. The secondary query is a set of phrases that are closely related semantically to the original search terms; in other words, it is a set of auxiliary search terms that express the user's preferences indirectly. The content recommendation device of the present embodiment uses the generated extended query to search for content to recommend to the user (hereinafter also referred to as "recommended content").
In this way, the content recommendation device of the present embodiment searches for content to recommend by using the original search terms, which represent the user's preferences, together with other search terms that are closely related semantically to them. The device therefore enables advanced content recommendation that takes into account the user's latent preferences and the relatedness of topics.

The individual concepts (that is, topics) that make up a user's preferences are often influenced by current events and other social circumstances, and are considered to change from day to day. When setting a query, it is therefore necessary to give sufficient consideration to how topical each subject is at the time the recommendation is made. For example, when the user's interest lies in a transient topic limited to a very short period, such as entertainment news or reports of incidents and accidents, the query needs to reflect not only the user's long-term preferences but also search terms for current topics in which the user is likely to take a new interest. The content recommendation device of the present embodiment therefore sets the extended query used to search for recommended content by considering topicality in addition to the diversity of notation. The device can thereby realize content recommendation that better matches what the user is looking for, reflecting not only the user's latent preferences but also the current topics that newly emerge each day.
The content recommendation device of the present embodiment is therefore well suited to, for example, broadcasting services that handle a large amount of highly specialized content such as sports broadcasts and highly topical content such as news and documentary programs.

FIG. 1 is a functional block diagram showing the configuration of a content recommendation system according to an embodiment of the present invention; only the functional blocks relevant to the present embodiment are shown. As shown in the figure, the content recommendation system includes a content recommendation device 1 and a content display device 3. The content recommendation device 1 and the content display device 3 are connected via a network 9 such as an IP (Internet Protocol) network. A social media service providing device 5 is also connected to the network 9. Although the figure shows only one content display device 3 and one social media service providing device 5, there may be a plurality of each.

The content recommendation device 1 is realized by, for example, one or more computer devices. The content recommendation device 1 includes a viewing history recording unit 11, an unviewed content information recording unit 12, a viewing history analysis unit 13 (acquisition unit), a social data recording unit 14, a social data analysis unit 15 (search term candidate extraction unit), a query expansion unit 16, a recommendation list generation unit 17 (search unit), a recommended content presentation unit 18, and a storage unit 19. When the content recommendation device 1 is realized by a plurality of computer devices connected via a network, the assignment of functional units to computer devices is arbitrary.

The viewing history recording unit 11 acquires and records user viewing history information. The user viewing history information indicates the history of content the user has viewed in the content service handled by the content recommendation device 1. It includes identification information for each item of content the user viewed and text information about that content. The content can be any content data, such as a television program, video, still image, web page, document, text, or e-book. For example, the viewing history recording unit 11 obtains the user viewing history information to be recorded based on content viewing information received from the content display device 3.

The unviewed content information recording unit 12 acquires and records unviewed content information. The unviewed content information includes text information about content that the user has not yet viewed, among the content that the content service can provide to the user. Content the user has not viewed is also referred to as "unviewed content". Unviewed content can be any content data, such as a television program, video, still image, web page, document, text, or e-book.

The viewing history analysis unit 13 analyzes the user viewing history information recorded in the viewing history recording unit 11 and generates primary query data indicating a set of phrases that serve as the primary search terms. A phrase (search term) that is an element of the set indicated by the primary query data is referred to as a "primary query phrase".

The social data recording unit 14 acquires, from the social media service providing device 5, social data posted on social media by an unspecified large number of contributors, and records it. Twitter is one example of social media. Social data is, for example, data published on social media accessible over the Internet, and includes text information indicating what a contributor said and information on the date and time the post was published. In social data representing the posts of an unspecified large number of contributors, the same topic can be described with phrases in different notations.

The social data analysis unit 15 analyzes the social data recorded in the social data recording unit 14 and generates secondary query candidate list data indicating a set of secondary query candidate phrases. A secondary query candidate is a candidate for a phrase to be included in the secondary query. The secondary query is a set of phrases that are closely related semantically to the primary query phrases and that serve as search terms to be added to the primary query. In other words, the secondary query candidate phrases are the search term candidate phrases.

The query expansion unit 16 compares the primary query phrases in the primary query data with the secondary query candidate phrases in the secondary query candidate list data, and extracts, as secondary query phrases, those candidates that are closely related semantically to the primary query phrases. The query expansion unit 16 then generates extended query data, which is the set of search terms obtained by adding the extracted secondary query phrases to the primary query phrases.

The recommendation list generation unit 17 calculates a matching score using the extended query data and the text information about each item of unviewed content indicated by the unviewed content information. It ranks the unviewed content according to the calculated matching scores and selects the recommended content from the unviewed content based on that ranking. The recommendation list generation unit 17 then generates recommended content list data, which lists information identifying the selected recommended content.
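The scoring and ranking step can be sketched as follows. The passage does not define the matching score at this point, so the sketch assumes the simplest possible one: the number of extended-query terms that appear in the text information of an unviewed item. The names `matching_score` and `recommend`, the `top_n` cutoff, and the exclusion of zero-score items are illustrative assumptions.

```python
def matching_score(text, query_terms):
    # Assumed score: how many extended-query terms occur in the item's text.
    return sum(1 for term in query_terms if term in text)


def recommend(unviewed, query_terms, top_n=3):
    """Rank unviewed content by matching score and return the top content IDs.

    unviewed: dict mapping content ID to its text information (assumed layout)
    """
    scored = [(matching_score(text, query_terms), cid) for cid, text in unviewed.items()]
    # Drop items that match nothing, then sort by descending score (ties by content ID).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [cid for _, cid in ranked[:top_n]]


unviewed_items = {
    "c1": "soccer world cup highlights",
    "c2": "cooking show",
    "c3": "soccer news",
}
recommended = recommend(unviewed_items, ["soccer", "world cup"])
```

Here "c1" matches two terms and "c3" matches one, so the list is `["c1", "c3"]`, while "c2" is left out. A weighted score, for example one that counts primary terms more heavily than added terms, would fit the same interface.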

The recommended content presentation unit 18 presents information about each item of recommended content indicated by the recommended content list data. Specifically, it transmits recommended content presentation information, in which information about the recommended content is set, to the user's content display device 3 and causes the device to display it.

The storage unit 19 stores text information about each item of content. The storage unit 19 also stores information on the content that the content service can provide to each user.

The content display device 3 is, for example, the user's personal computer, smartphone, tablet terminal, or television receiver. The content display device 3 includes an operation unit 31, an acquisition unit 32, an output unit 33, a notification unit 34, and a reception unit 35. The operation unit 31 accepts operations by the user. It is, for example, a key, a button, a mouse, a touch sensor arranged on a touch panel, or a device that receives operations from a remote controller. The acquisition unit 32 acquires, from the content provided by the content service available to the user, the content selected by the user operation accepted by the operation unit 31. The output unit 33 is a display or a speaker, and outputs the content acquired by the acquisition unit 32. The notification unit 34 notifies the content recommendation device 1 of content viewing information indicating the content the user viewed. The reception unit 35 receives recommended content presentation information from the content recommendation device 1 and causes the output unit 33 to display it.

続いて、コンテンツ推薦装置１の動作を説明する。以下では、コンテンツがテレビ番組である場合を例に説明する。
視聴履歴記録部１１は、各々のユーザが過去に視聴したコンテンツの履歴を示すユーザ視聴履歴情報を、ユーザの識別情報であるユーザＩＤと対応付けて記録している。具体的には、視聴履歴記録部１１は、コンテンツ表示装置３の通知部３４からユーザが視聴したコンテンツの情報と、ユーザを特定する情報とを設定したコンテンツ視聴情報を受信する。ユーザが視聴したコンテンツは、コンテンツ表示装置３の操作部３１が受けたユーザの操作に基づいて取得部３２が取得し、出力部３３により出力したコンテンツである。視聴履歴記録部１１は、受信したコンテンツ視聴情報に基づいて、ユーザが視聴したコンテンツのコンテンツＩＤと、そのコンテンツの内容を記述したテキスト情報とを含むユーザ視聴履歴情報を、ユーザＩＤと対応付けて記録する。テキスト情報は、例えば、番組概要文などである。記憶部１９は、コンテンツ推薦装置１が受信した放送波から取得した番組概要文を記憶しており、視聴履歴記録部１１は、番組概要文を記憶部１９から読み出す。 Next, the operation of the content recommendation device 1 will be described. Hereinafter, a case where the content is a television program will be described as an example.
The viewing history recording unit 11 records user viewing history information, which indicates the history of content each user has viewed in the past, in association with a user ID that identifies the user. Specifically, the viewing history recording unit 11 receives, from the notification unit 34 of the content display device 3, content viewing information in which information on the content viewed by the user and information specifying the user are set. The content viewed by the user is the content that the acquisition unit 32 acquired based on the user operation received by the operation unit 31 of the content display device 3 and that the output unit 33 output. Based on the received content viewing information, the viewing history recording unit 11 records user viewing history information, which includes the content ID of the content viewed by the user and text information describing that content, in association with the user ID. The text information is, for example, a program summary. The storage unit 19 stores program summaries acquired from the broadcast waves received by the content recommendation device 1, and the viewing history recording unit 11 reads the program summary from the storage unit 19.

上記のように、視聴履歴記録部１１は、ユーザ視聴履歴情報によって、ユーザが視聴したコンテンツの内容を示すテキスト情報を、コンテンツ単位で記録することを基本とする。なお、視聴履歴記録部１１は、特許文献２に記載のように、ユーザがコンテンツを視聴したときの細かな操作履歴を詳細に記録した情報をさらにユーザ視聴履歴情報に設定してもよい。この場合、コンテンツ表示装置３の通知部３４は、ユーザがコンテンツを視聴したときの操作履歴をさらにコンテンツ推薦装置１に通知する。操作履歴は、例えば、コンテンツの再生開始点及び再生終了点などである。 As described above, the viewing history recording unit 11 basically records text information indicating the content of the content viewed by the user in units of content based on the user viewing history information. Note that, as described in Patent Document 2, the viewing history recording unit 11 may further set information that records in detail a detailed operation history when a user views content as user viewing history information. In this case, the notification unit 34 of the content display device 3 further notifies the content recommendation device 1 of the operation history when the user views the content. The operation history is, for example, a content reproduction start point and a reproduction end point.

一方、ソーシャルデータ記録部１４は、ネットワーク９を介してソーシャルメディアサービス提供装置５にアクセスする。ソーシャルメディアサービス提供装置５は、不特定多数の投稿者が投稿したソーシャルデータを公開している。ソーシャルデータは、投稿者の発言内容を示すテキスト情報と、そのテキスト情報の投稿日時を示すタイムスタンプとを含む。ソーシャルデータ記録部１４は、不特定多数の投稿者が投稿したソーシャルデータをアクセス先のソーシャルメディアサービス提供装置５から取得し、記録する。 On the other hand, the social data recording unit 14 accesses the social media service providing apparatus 5 via the network 9. The social media service providing apparatus 5 publishes social data posted by an unspecified number of contributors. The social data includes text information indicating the content of the utterance of the poster and a time stamp indicating the posting date and time of the text information. The social data recording unit 14 acquires and records social data posted by an unspecified number of contributors from the social media service providing apparatus 5 as an access destination.

なお、ソーシャルデータ記録部１４は、取得可能な全てのソーシャルデータを取得することを基本とする。つまり、ソーシャルデータ記録部１４は、アクセス可能なソーシャルメディア上の全ての発言記録のデータを収集する。ただし、解析対象のコンテンツを限定したコンテンツ推薦や、時事性をより重視したコンテンツ推薦を実現する用途の場合、ソーシャルデータ記録部１４は、収集対象のソーシャルデータを分類し、選別する処理をさらに行ってよい。収集対象のソーシャルデータの分類や選別には、ソーシャルデータの発言対象を自動判定する既存の技術や、発言の日時を特定可能な補助的手段を利用することができる。ソーシャルデータの発言対象を自動判定する技術としては、非特許文献２の技術が利用可能である。また、発言の日時を特定可能な形態でソーシャルデータを網羅的に取得する技術としては、「橋本翔、“tw twitter client on Ruby”、［online］、インターネット＜URL:http://shokai.github.io/tw/＞」などが利用可能である。これにより、ソーシャルデータ記録部１４は、タイムスタンプが所定期間内の投稿日時を示すソーシャルデータを収集する。例えば、最近の話題を反映した場合、ソーシャルデータ記録部１４は、例えば、現在から数日、数週間、数か月、あるいは、数年前までの期間のソーシャルデータを収集し、過去の話題を反映したい場合、指定された過去の期間のソーシャルデータを収集する。 As a rule, the social data recording unit 14 acquires all social data that can be acquired. That is, the social data recording unit 14 collects all utterance records on accessible social media. However, for applications such as content recommendation restricted to particular content to be analyzed, or content recommendation that places more weight on topicality, the social data recording unit 14 may additionally classify and select the social data to be collected. For this classification and selection, existing techniques for automatically determining the subject of a social data utterance, and auxiliary means capable of identifying the date and time of an utterance, can be used. The technique of Non-Patent Document 2 can be used to automatically determine the subject of an utterance. As a technique for exhaustively acquiring social data in a form in which the date and time of each utterance can be identified, "Sho Hashimoto, "tw twitter client on Ruby", [online], Internet <URL: http://shokai.github.io/tw/>" and the like can be used. In this way, the social data recording unit 14 collects social data whose time stamps indicate posting dates and times within a predetermined period.
For example, to reflect recent topics, the social data recording unit 14 collects social data for a period extending from the present back several days, weeks, months, or years; to reflect past topics, it collects social data for a specified past period.
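The period-based selection of social data described above can be illustrated with a minimal sketch. The record layout (a dict with `text` and `timestamp` keys) and the function name `collect_recent` are assumptions made for illustration; the description only states that each item of social data carries text information and a time stamp.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def collect_recent(posts: List[Dict], now: datetime, window: timedelta) -> List[Dict]:
    """Keep only posts whose time stamp lies in [now - window, now]."""
    start = now - window
    return [p for p in posts if start <= p["timestamp"] <= now]


# Hypothetical sample data: one recent post and one older post.
now = datetime(2015, 1, 5, 12, 0)
posts = [
    {"text": "#drama finale tonight", "timestamp": datetime(2015, 1, 4, 21, 0)},
    {"text": "#drama first episode", "timestamp": datetime(2014, 10, 1, 21, 0)},
]
recent = collect_recent(posts, now, timedelta(days=7))
```

Reflecting past topics would amount to passing an explicit past start and end instead of a window ending at the present.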

図２は、コンテンツ推薦装置１によるコンテンツ推薦処理の処理フローを示す図である。上記の処理により、コンテンツ推薦処理の開始前に、視聴履歴記録部１１には、各々のユーザが視聴したコンテンツのコンテンツＩＤと、そのコンテンツの内容を記述したテキスト情報を含んだユーザ視聴履歴情報が記録されている。コンテンツ推薦装置１は、各ユーザについて、図２に示すコンテンツ推薦処理を実行する。 FIG. 2 is a diagram showing the processing flow of content recommendation processing by the content recommendation device 1. As a result of the processing described above, before the content recommendation processing starts, the viewing history recording unit 11 has recorded user viewing history information that includes the content ID of the content viewed by each user and text information describing that content. The content recommendation device 1 executes the content recommendation processing shown in FIG. 2 for each user.

視聴履歴記録部１１は、コンテンツを推薦するユーザのユーザＩＤが付与されているユーザ視聴履歴情報を未視聴コンテンツ情報記録部１２及び視聴履歴解析部１３に出力する（ステップＳ１１０）。 The viewing history recording unit 11 outputs the user viewing history information associated with the user ID of the user to whom content is to be recommended to the unviewed content information recording unit 12 and the viewing history analysis unit 13 (step S110).

未視聴コンテンツ情報記録部１２は、視聴履歴記録部１１から受信したユーザ視聴履歴情報に基づいて、コンテンツを推薦するユーザの未視聴コンテンツを検索する。具体的には、未視聴コンテンツ情報記録部１２は、記憶部１９に記憶されている各ユーザに提供可能なコンテンツの情報を参照し、ユーザ視聴履歴情報にコンテンツＩＤが設定されておらず、かつ、ユーザが利用可能なコンテンツを検索し、未視聴コンテンツとする。未視聴コンテンツ情報記録部１２は、未視聴コンテンツの内容を記述したテキスト情報を記憶部１９から読み出す。テキスト情報は、例えば、番組概要文などであり、コンテンツ推薦装置１が放送波から取得して記憶部１９に蓄積しておく。未視聴コンテンツ情報記録部１２は、未視聴コンテンツのテキスト情報のリストである未視聴コンテンツ情報を生成する（ステップＳ１１０）。 The unviewed content information recording unit 12 searches for content not yet viewed by the user to whom content is to be recommended, based on the user viewing history information received from the viewing history recording unit 11. Specifically, the unviewed content information recording unit 12 refers to the information, stored in the storage unit 19, on content that can be provided to each user, searches for content whose content ID is not set in the user viewing history information and which is available to the user, and treats that content as unviewed content. The unviewed content information recording unit 12 reads text information describing the unviewed content from the storage unit 19. The text information is, for example, a program summary, which the content recommendation device 1 acquires from broadcast waves and accumulates in the storage unit 19. The unviewed content information recording unit 12 generates unviewed content information, which is a list of the text information of the unviewed content (step S110).
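The unviewed-content search above is essentially a set difference restricted to what the user can access. The following is a minimal sketch under assumed data shapes: `catalog` maps a content ID to its summary text, and `viewed_ids` and `available_ids` are sets of content IDs. None of these names appear in the description.

```python
from typing import Dict, List, Set


def find_unviewed(catalog: Dict[str, str],
                  viewed_ids: Set[str],
                  available_ids: Set[str]) -> List[Dict[str, str]]:
    """Return text information for content that is available but not yet viewed."""
    return [
        {"content_id": cid, "text": text}
        for cid, text in catalog.items()
        if cid in available_ids and cid not in viewed_ids
    ]


# Hypothetical catalog held in the storage unit.
catalog = {
    "c1": "Drama episode 1 summary",
    "c2": "Travel documentary summary",
    "c3": "Premium-only concert summary",
}
unviewed = find_unviewed(catalog, viewed_ids={"c1"}, available_ids={"c1", "c2"})
```

Here only `c2` remains: `c1` has been viewed and `c3` is not available to this user.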

上記により、未視聴コンテンツ情報記録部１２は、コンテンツ推薦装置１が処理対象として想定したコンテンツ・サービスにおいてユーザが計算機可読な状態で入手可能な全てのコンテンツの中から、ユーザがまだ視聴していないコンテンツを検索する。入手可能なコンテンツは、例えば、一週間先までの放送予定番組などでもよく、ユーザが契約しているコンテンツ・サービスにおいて現在配信可能なコンテンツなどとしてもよい。なお、ユーザの視聴傾向に明らかな偏りがあることが予めわかっている場合、未視聴コンテンツ情報記録部１２は、ジャンルなどのコンテンツの属性により、未視聴コンテンツとして選択する対象を限定する処理を行ってもよい。 In this way, the unviewed content information recording unit 12 searches for content that the user has not yet viewed among all content that the user can obtain in a computer-readable form in the content service that the content recommendation device 1 assumes as its processing target. The obtainable content may be, for example, programs scheduled for broadcast up to one week ahead, or content currently deliverable in a content service to which the user subscribes. If it is known in advance that the user's viewing tendencies are clearly biased, the unviewed content information recording unit 12 may restrict the content selected as unviewed content according to content attributes such as genre.

視聴履歴解析部１３は、視聴履歴記録部１１からユーザ視聴履歴情報を受信する。視聴履歴解析部１３は、受信したユーザ視聴履歴情報に記述されている各々の視聴済みコンテンツの内容に関するテキスト情報を解析して、一次的な検索語の集合を示す一次クエリデータを生成する（ステップＳ１１５）。
具体的には、視聴履歴解析部１３は、ユーザ視聴履歴情報に記述されている番組概要文などのテキスト情報を、公知の形態素解析技術を用いて単語単位に分割する。視聴履歴解析部１３は、分割されたそれらの単語の中から、形態素解析の結果として各単語に付与された品詞などの情報に基づいて、検索語となる語句（単語）を選定する。例えば、視聴履歴解析部１３は、固有名詞（例えば、人名）などの意味的に重要な語句（単語）を検索語として選択する。視聴履歴解析部１３は、選択した語句をリストの形式で記述して一次クエリデータとする。 The viewing history analysis unit 13 receives the user viewing history information from the viewing history recording unit 11. The viewing history analysis unit 13 analyzes the text information on each viewed content described in the received user viewing history information, and generates primary query data indicating a primary set of search terms (step S115).
Specifically, the viewing history analysis unit 13 divides text information such as a program summary sentence described in the user viewing history information into words using a known morphological analysis technique. The viewing history analysis unit 13 selects a word (word) as a search word from the divided words based on information such as a part of speech given to each word as a result of morphological analysis. For example, the viewing history analysis unit 13 selects semantically important phrases (words) such as proper nouns (for example, personal names) as search terms. The viewing history analysis unit 13 describes the selected word / phrase in the form of a list to obtain primary query data.

なお、ユーザ視聴履歴情報に操作履歴が設定されている場合、視聴履歴解析部１３は、特許文献２に記載のように、ユーザが視聴したコンテンツの再生区間に対応するテキスト情報を形態素解析の対象に限定してもよい。コンテンツの再生区間は、ユーザ視聴履歴情報に設定されている操作履歴が示すコンテンツの再生開始点及び再生終了点により示される。コンテンツの再生区間に対応するテキスト情報は、例えば、その再生区間におけるコンテンツの字幕のデータである。コンテンツ推薦装置１は、放送波から取得した各コンテンツの字幕の情報を記憶部１９は蓄積しておき、視聴履歴解析部１３は、再生区間におけるコンテンツの字幕のデータを記憶部１９から読み出す。 When an operation history is set in the user viewing history information, the viewing history analysis unit 13 may, as described in Patent Document 2, restrict the target of morphological analysis to the text information corresponding to the playback section of the content viewed by the user. The playback section is given by the playback start point and playback end point indicated by the operation history set in the user viewing history information. The text information corresponding to the playback section is, for example, the subtitle data of the content in that section. The storage unit 19 of the content recommendation device 1 accumulates the subtitle information of each content acquired from broadcast waves, and the viewing history analysis unit 13 reads the subtitle data for the playback section from the storage unit 19.
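Restricting the analysis target to the playback section might look like the sketch below. It assumes, purely for illustration, that subtitles are stored as `(time_in_seconds, text)` pairs and that the operation history yields a start and end time in seconds; the description does not specify either format.

```python
from typing import List, Tuple


def subtitles_in_section(subtitles: List[Tuple[float, str]],
                         start: float, end: float) -> str:
    """Concatenate the subtitle text whose display time falls inside [start, end]."""
    return " ".join(text for t, text in subtitles if start <= t <= end)


# Hypothetical subtitle track and playback section (60 s to 180 s).
subtitles = [
    (10.0, "Opening remarks"),
    (75.0, "Interview with the director"),
    (150.0, "Behind the scenes"),
    (400.0, "Closing credits"),
]
section_text = subtitles_in_section(subtitles, start=60.0, end=180.0)
```

Only the text returned here would then be passed to morphological analysis.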

ソーシャルデータ記録部１４は、記録したソーシャルデータをソーシャルデータ解析部１５に出力する（ステップＳ１２０）。時事性を考慮する場合、ソーシャルデータ記録部１４は、所定の期間のソーシャルデータをソーシャルデータ解析部１５に出力する。また、ソーシャルデータ記録部１４は、所定の発言対象のソーシャルデータをソーシャルデータ解析部１５に出力してもよい。また、ソーシャルデータ記録部１４は、事前にソーシャルデータを収集せず、ステップＳ１１５の処理の後にソーシャルデータを収集し、ソーシャルデータ解析部１５に出力してもよい。この場合、ソーシャルデータ記録部１４は、ステップＳ１１５の処理において得られた一次クエリデータを利用してソーシャルデータを取得し、記録してもよい。 The social data recording unit 14 outputs the recorded social data to the social data analysis unit 15 (step S120). When topicality is taken into account, the social data recording unit 14 outputs social data for a predetermined period to the social data analysis unit 15. The social data recording unit 14 may also output only social data about a predetermined utterance subject to the social data analysis unit 15. Alternatively, the social data recording unit 14 may collect social data after the processing of step S115 rather than in advance, and output it to the social data analysis unit 15. In this case, the social data recording unit 14 may use the primary query data obtained in step S115 to acquire and record the social data.

ソーシャルデータ解析部１５は、ソーシャルデータ記録部１４から受信したソーシャルデータを解析し、一次クエリデータに追加する検索語の候補となる二次クエリ候補の語句を抽出する。ソーシャルデータ解析部１５は、抽出した二次クエリ候補の語句を設定した二次クエリ候補リストデータを生成する（ステップＳ１２５）。 The social data analysis unit 15 analyzes the social data received from the social data recording unit 14 and extracts words of secondary query candidates that are candidates for search terms to be added to the primary query data. The social data analysis unit 15 generates secondary query candidate list data in which the extracted secondary query candidate words are set (step S125).

ソーシャルデータ解析部１５は、ソーシャルメディアの一つであるツイッターにおけるハッシュタグのように、ソーシャルデータ本体に付与されたラベルが利用可能である場合には、それらラベルの文字列（語句）をそのまま二次クエリ候補の語句として用いることを基本とする。また、ソーシャルデータ解析部１５は、ソーシャルデータの本体を視聴履歴解析部１３と同様の処理により解析し、固有名詞などの重要な語句をそのソーシャルデータの本体から直接抽出する処理を行ってもよい。ソーシャルデータの本体とは、ソーシャルデータにおいて投稿者の発言内容を文字列で記述した本文のデータである。 When labels attached to the social data body are available, such as hashtags on Twitter, which is one social media service, the social data analysis unit 15 as a rule uses the character strings (words) of those labels as they are as secondary query candidate words. The social data analysis unit 15 may also analyze the body of the social data by the same processing as the viewing history analysis unit 13 and extract important words such as proper nouns directly from the body. The body of the social data is the text data in which the content of the poster's utterance is written as a character string.

ソーシャルデータ解析部１５は、取得した二次クエリ候補の各々の語句が、その語句が得られた元のソーシャルデータにおいて二次クエリ候補の他の語句と共起する場合、二次クエリ候補リストデータに、二次クエリ候補の語句に付加して補足情報を記録する。二次クエリ候補の語句の補足情報には、その二次クエリ候補の語句が得られた元のソーシャルデータにおいて共起する二次クエリ候補の他の語句全てが設定される。この補足情報は、次のステップＳ１３０の二次クエリ選定処理において利用される。 When an acquired secondary query candidate word co-occurs with other secondary query candidate words in the social data from which it was obtained, the social data analysis unit 15 records supplementary information attached to that word in the secondary query candidate list data. The supplementary information of a secondary query candidate word lists all other secondary query candidate words that co-occur with it in the social data from which it was obtained. This supplementary information is used in the secondary query selection processing of the next step S130.
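A minimal sketch of building the secondary query candidate list with its co-occurrence supplementary information follows. It assumes the hashtag-only variant described above, treats each post as a plain string, and uses a simple regular expression for hashtags; the function name and the dict-of-sets layout are illustrative choices, not part of the description.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

HASHTAG = re.compile(r"#(\w+)")


def build_candidates(posts: List[str]) -> Dict[str, Set[str]]:
    """Map each hashtag to the set of other hashtags seen in the same post."""
    cooccurring: Dict[str, Set[str]] = defaultdict(set)
    for text in posts:
        tags = set(HASHTAG.findall(text))
        for tag in tags:
            # Supplementary information: every other candidate in this post.
            cooccurring[tag] |= tags - {tag}
    return dict(cooccurring)


# Hypothetical posts.
posts = [
    "#drama #actorA great episode",
    "#drama #soundtrack on repeat",
    "#news",
]
candidates = build_candidates(posts)
```

A candidate that never co-occurs with another candidate, such as `news` here, ends up with empty supplementary information.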

クエリ拡張部１６は、上記により一次クエリデータと二次クエリ候補リストデータの両者が生成された後、二次クエリ候補リストデータに設定されている語句の中から、一次クエリの複数の語句に内容的に何らかの関係が存在する語句を抽出する。さらに、クエリ拡張部１６はそれら抽出した二次クエリ候補の語句の中から選択した語句を一次クエリデータに追加し、拡張クエリデータとする（ステップＳ１３０）。 After both the primary query data and the secondary query candidate list data have been generated as described above, the query expansion unit 16 extracts, from the words set in the secondary query candidate list data, words that are related in content to a plurality of words of the primary query. The query expansion unit 16 then adds words selected from the extracted secondary query candidate words to the primary query data to obtain expanded query data (step S130).

そこでまず、クエリ拡張部１６は、二次クエリ候補リストデータに設定されている二次クエリ候補の各語句と一次クエリデータに設定されている一次クエリの各語句との類似度を何らかの手段により計算する。クエリ拡張部１６は、二次クエリ候補の語句のうち、一次クエリの語句との類似度が所定の閾値を超えた語句に限り、二次クエリとして採用する処理を基本とする。 To this end, the query expansion unit 16 first calculates, by some means, the similarity between each secondary query candidate word set in the secondary query candidate list data and each primary query word set in the primary query data. As a rule, the query expansion unit 16 adopts as secondary queries only those secondary query candidate words whose similarity to a primary query word exceeds a predetermined threshold.

なお、二次クエリ候補の語句にソーシャルデータにおいて共起した二次クエリ候補の他の語句を記述した補足情報を付加した場合、その補足情報に一次クエリの語句のみを残すようにしてもよい。そして、クエリ拡張部１６は、二次クエリ候補の語句と一次クエリの語句の対のうち、その二次クエリ候補の語句の補足情報に対の一方となっている一次クエリの語句以外の語句が設定されていない対については、類似度計算の対象から除外してもよい。この処理を施すことにより、クエリ拡張部１６は、少なくとも一次クエリのいずれかの語句と意味的な関係が深く、かつ、少なくともひとつの他の一次クエリの語句との間に何らかの意味的なつながりがあることが保証された語句を抽出することが可能となる。すなわち、単一の一次クエリの語句としか意味的なつながりを持たない語句は二次クエリの候補から除外され、複数の一次クエリの語句と意味的なつながりをもった語句のみが二次クエリの語句として抽出される。 When supplementary information describing the other secondary query candidate words that co-occurred in the social data has been attached to a secondary query candidate word, only primary query words may be left in that supplementary information. Then, among the pairs of a secondary query candidate word and a primary query word, the query expansion unit 16 may exclude from the similarity calculation any pair for which the supplementary information of the secondary query candidate word contains no word other than the primary query word that forms the pair. With this processing, the query expansion unit 16 can extract words that are guaranteed to be closely related in meaning to at least one primary query word and to have some semantic connection with at least one other primary query word. That is, words semantically connected to only a single primary query word are excluded from the secondary query candidates, and only words semantically connected to a plurality of primary query words are extracted as secondary query words.
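The pair-exclusion rule above can be sketched as follows. The sketch assumes the candidate list is a dict from candidate word to its co-occurrence set, as an illustrative layout, and keeps a (candidate, primary word) pair only when the candidate co-occurs with some primary query word other than the one in the pair. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def pairs_for_similarity(candidates: Dict[str, Set[str]],
                         primary: Set[str]) -> List[Tuple[str, str]]:
    """List the (candidate, primary word) pairs that remain similarity targets."""
    pairs = []
    for cand, cooccurring in candidates.items():
        kept = cooccurring & primary          # leave only primary query words
        for p in sorted(primary):
            if kept - {p}:                    # co-occurs with a different primary word
                pairs.append((cand, p))
    return pairs


# Hypothetical data: "soundtrack" co-occurred with the primary word "drama".
primary = {"actorA", "drama"}
candidates = {"soundtrack": {"drama", "composerB"}, "fanart": {"meme"}}
pairs = pairs_for_similarity(candidates, primary)
```

In this example `soundtrack` may still be compared with `actorA`, because it co-occurs with the different primary word `drama`, while the pair with `drama` itself and every pair involving `fanart` are dropped.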

語句間の類似度を定量的に計算する技術としては、多階層神経回路網による意味的距離を反映した単語のベクトル表現の技術（例えば、参考文献１参照）がある。また、単語の文書における出現傾向にもとづく特異値の大きさを特徴量の重み付けに用いた単語のクラスタリングの技術（例えば、参考文献２）も利用可能である。しかし、一次クエリデータから取得した任意の語句と、二次クエリ候補リストデータから選んだ任意の語句との対についての意味的な類似度を数値化できる技術であれば、どのような計算方法でもよく、他の技術を用いてもよい。 Techniques for quantitatively calculating the similarity between words include vector representations of words that reflect semantic distance, obtained with a multilayer neural network (see, for example, Reference 1). A word clustering technique that uses the magnitude of singular values, based on the occurrence tendencies of words in documents, to weight features (for example, Reference 2) can also be used. However, any calculation method may be used, including other techniques, as long as it can quantify the semantic similarity of a pair consisting of an arbitrary word taken from the primary query data and an arbitrary word selected from the secondary query candidate list data.

（参考文献１）西尾泰和，「word2vecによる自然言語処理」，オライリー・ジャパン，２０１４年５月 (Reference 1) Yasukazu Nishio, “Natural Language Processing with word2vec”, O'Reilly Japan, May 2014

（参考文献２）平野真理子、神戸喬輔、小早川健，「大規模データの俯瞰とターゲットデータの抽出に対する文書―単語行列の特異値分解と特異値による重み付けの有効性」，言語処理学会，自然言語処理学会論文誌，２０１３年，Ｖｏｌ.２０，ｎｏ．３，ｐ．３３５−３６５ (Reference 2) Mariko Hirano, Keisuke Kobe, Takeshi Kobayakawa, "Effectiveness of singular value decomposition of the document-word matrix and weighting by singular values for overviewing large-scale data and extracting target data", The Association for Natural Language Processing, Journal of Natural Language Processing, 2013, Vol. 20, No. 3, pp. 335-365

クエリ拡張部１６は、一次クエリの各語句と二次クエリ候補の各語句との論理的に可能な全ての対について、上述したように語句間の類似度を計算し、類似度が所定の閾値以上であるという条件を満たす二次クエリ候補の語句を二次クエリ(検索語）の語句として選択することを基本とする。このとき、クエリ拡張部１６は、選択した二次クエリの各々の語句（検索語）に、類似度の計算結果の値に基づいて別途算出したスコアを付与してもよい。スコアは、例えば、同一の語句同士の類似度が１となるように、類似度を正規化した値を用いることができる。また、スコアとして類似度自体を用いてもよい。このスコアは、次のステップＳ１３５の推薦コンテンツ選択処理において利用される。 The query expansion unit 16 calculates the similarity between words as described above for all logically possible pairs of each word of the primary query and each word of the secondary query candidate, and the similarity is a predetermined threshold value. Basically, a word of a secondary query candidate that satisfies the above condition is selected as a word of a secondary query (search word). At this time, the query expansion unit 16 may assign a score calculated separately based on the value of the similarity calculation result to each word (search term) of the selected secondary query. For the score, for example, a value obtained by normalizing the similarity so that the similarity between the same words becomes 1 can be used. Moreover, you may use similarity degree itself as a score. This score is used in the recommended content selection process in the next step S135.
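As one possible reading of the threshold-based selection above, the sketch below uses cosine similarity between word vectors, for which identical words score 1 and no further normalization is needed. The vectors are toy values chosen by hand; in practice they would come from a technique such as those in Reference 1 or Reference 2. Taking the maximum similarity over the primary query words as the score is an assumption, since the description leaves the score calculation open.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_secondary(primary_vecs: Dict[str, List[float]],
                     candidate_vecs: Dict[str, List[float]],
                     threshold: float) -> Dict[str, float]:
    """Keep candidates whose best similarity to any primary word meets the threshold."""
    selected = {}
    for cand, cv in candidate_vecs.items():
        best = max((cosine(cv, pv) for pv in primary_vecs.values()), default=0.0)
        if best >= threshold:
            selected[cand] = best  # score attached to the secondary query word
    return selected


# Toy two-dimensional vectors (assumed values).
primary_vecs = {"drama": [1.0, 0.0]}
candidate_vecs = {"soundtrack": [0.8, 0.6], "weather": [0.0, 1.0]}
secondary = select_secondary(primary_vecs, candidate_vecs, threshold=0.7)
```

The returned mapping plays the role of the secondary query words together with the scores used later in recommended content selection.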

推薦リスト生成部１７は、未視聴コンテンツ情報記録部１２から未視聴コンテンツ情報を読み出す。推薦リスト生成部１７は、未視聴コンテンツ情報に記述された未視聴コンテンツのリストの中から、ユーザに提示すべき推薦コンテンツを、拡張クエリデータを用いて選定する。推薦リスト生成部１７は、選定した推薦コンテンツを、拡張クエリデータとマッチする順にリストの形式でまとめる。そこで、推薦リスト生成部１７は、未視聴コンテンツ情報に設定されている各未視聴コンテンツのテキスト情報と、拡張クエリデータに設定されている一次クエリと二次クエリの各語句（拡張クエリデータの各要素）とのペアについてマッチングスコアを算出する。推薦リスト生成部１７は、算出したマッチングスコアに応じて各々の未視聴コンテンツに順位を付け、順位が上位Ｎ個（Ｎは１以上の整数）の未視聴コンテンツのコンテンツＩＤを列挙したリストを示す推薦コンテンツリストデータを生成する（ステップＳ１３５）。 The recommendation list generation unit 17 reads the unviewed content information from the unviewed content information recording unit 12. The recommendation list generation unit 17 uses the expanded query data to select the recommended content to be presented to the user from the list of unviewed content described in the unviewed content information. The recommendation list generation unit 17 arranges the selected recommended content in a list in order of how well it matches the expanded query data. To do so, the recommendation list generation unit 17 calculates a matching score for each pair of the text information of an unviewed content set in the unviewed content information and a word of the primary query or secondary query set in the expanded query data (each element of the expanded query data). The recommendation list generation unit 17 ranks the unviewed contents according to the calculated matching scores, and generates recommended content list data indicating a list of the content IDs of the top N (N is an integer of 1 or more) unviewed contents (step S135).

マッチングスコアは、検索語が未視聴コンテンツの内容を記述したテキスト情報に出現した回数などとすることができる。推薦リスト生成部１７は、原則として、一次クエリと二次クエリのそれぞれについてのマッチングスコアを同等に扱う方法を基本とする。具体的には、一次クエリの語句（検索語）および二次クエリの語句（検索語）のそれぞれについて独立にマッチングスコアを計算し、それらのマッチングスコアを同等の重みで扱った和（単純和）をとる。なお、推薦リスト生成部１７は、後者の二次クエリに対するマッチングスコアに何らかの方法により決定した重みを乗じた上で、前者の一次クエリに対するマッチングスコアに加算する重み付けの処理を別途、追加して行ってもよい。 The matching score may be, for example, the number of times a search term appears in the text information describing the unviewed content. As a rule, the recommendation list generation unit 17 treats the matching scores for the primary query and the secondary query equally. Specifically, it calculates a matching score independently for each primary query word (search term) and each secondary query word (search term), and takes the sum of those matching scores with equal weights (a simple sum). The recommendation list generation unit 17 may additionally perform weighting, in which the matching score for the secondary query is multiplied by a weight determined by some method and then added to the matching score for the primary query.

また、二次クエリのマッチングスコアに重みを乗ずる方法の場合、使用する重みは、ヒューリスティックに定めた経験値に固定する方法の他に、一次クエリと二次クエリとの間の類似度を用いる方法が考えられる。後者のクエリ間の類似度を利用する具体的な方法としては、例えば、二次クエリの各語句に付加されているスコアの平均値を重み（０から１の間の数値を持つ重み）とする。二次クエリの各語句に付加されているスコアは、上述したように、一次クエリの語句との類似度に基づいてクエリ拡張部１６が算出したスコアである。推薦コンテンツ提示部１８は、二次クエリの各語句のマッチングスコアを合計し、合計したマッチングスコアに類似度に基づくスコアの平均値を乗算した後、一次クエリに対するマッチングスコアと加算する。 When the matching score of the secondary query is multiplied by a weight, the weight may be fixed to a heuristically determined empirical value, or it may be derived from the similarity between the primary query and the secondary query. As a specific way of using the similarity between queries, for example, the average of the scores attached to the secondary query words is used as the weight (a weight with a value between 0 and 1). As described above, the score attached to each secondary query word is the score calculated by the query expansion unit 16 based on the similarity to a primary query word. The recommended content presentation unit 18 sums the matching scores of the secondary query words, multiplies the summed matching score by the average of the similarity-based scores, and then adds the result to the matching score for the primary query.
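The weighted matching score and the top-N ranking can be sketched as follows. Counting occurrences with `str.count` (non-overlapping substring matches) is an illustrative simplification, and the function names and sample data are assumptions. The weight is the average of the scores attached to the secondary query words, as in the example given above.

```python
from typing import Dict, Iterable, List


def count_hits(text: str, terms: Iterable[str]) -> int:
    """Total number of occurrences of the given search terms in the text."""
    return sum(text.count(t) for t in terms)


def recommend(unviewed: Dict[str, str], primary: List[str],
              secondary_scores: Dict[str, float], top_n: int) -> List[str]:
    """Rank unviewed content by primary hits plus weighted secondary hits."""
    weight = (sum(secondary_scores.values()) / len(secondary_scores)
              if secondary_scores else 0.0)
    scored = [
        (count_hits(text, primary) + weight * count_hits(text, secondary_scores), cid)
        for cid, text in unviewed.items()
    ]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps input order on ties
    return [cid for _, cid in scored[:top_n]]


# Hypothetical unviewed content and expanded query.
unviewed = {
    "c1": "drama with soundtrack soundtrack",
    "c2": "weather report",
    "c3": "drama drama",
}
ranking = recommend(unviewed, primary=["drama"],
                    secondary_scores={"soundtrack": 0.8}, top_n=2)
```

Setting the weight to 1 regardless of the scores would give the simple sum that the description treats as the basic method.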

最後に、推薦コンテンツ提示部１８は、推薦コンテンツリストデータに記載された推薦コンテンツの内容をユーザに提示する。つまり、推薦コンテンツ提示情報は、推薦コンテンツリストデータに記述された各々の推薦コンテンツに関する情報を記憶部１９から読み出し、読み出した情報を設定した推薦コンテンツ提示情報を生成する(ステップＳ１４０）。推薦コンテンツ提示部１８は、生成した推薦コンテンツ提示情報をユーザのコンテンツ表示装置に送信する(ステップＳ１４５）。コンテンツ表示装置３の受信部３５は、受信した推薦コンテンツ提示情報を出力部３３に出力させる。 Finally, the recommended content presentation unit 18 presents the content of the recommended content described in the recommended content list data to the user. That is, as the recommended content presentation information, information related to each recommended content described in the recommended content list data is read from the storage unit 19, and recommended content presentation information in which the read information is set is generated (step S140). The recommended content presentation unit 18 transmits the generated recommended content presentation information to the user's content display device (step S145). The receiving unit 35 of the content display device 3 causes the output unit 33 to output the received recommended content presentation information.

推薦コンテンツ提示情報は、推薦コンテンツリストデータに記述された各推薦コンテンツを特定するためのテキスト情報である。例えば、推薦コンテンツが放送番組である場合、推薦コンテンツ提示情報には、放送番組の放送日、放送開始時刻、番組名などを記述する。また、推薦コンテンツにユーザが直接アクセスするための情報や、推薦コンテンツの映像の一部を切り出したサンプル映像を、記憶部１９あるいはネットワークを介して接続されるコンピュータサーバから取得できる場合には、推薦コンテンツ提示部１８は、それらの情報を補助的情報としてコンテンツ提示情報に設定してもよい。推薦コンテンツにユーザが直接アクセスするための情報には、例えば、インターネット配信コンテンツのリンク情報を利用することができる。また、サンプル映像には、サムネイル画像、ハイライト映像、予告動画などを利用することができる。 The recommended content presentation information is text information for specifying each recommended content described in the recommended content list data. For example, when the recommended content is a broadcast program, the recommended content presentation information describes the broadcast date of the broadcast program, the broadcast start time, the program name, and the like. In addition, when the information for directly accessing the recommended content by the user or the sample video obtained by cutting out a part of the video of the recommended content can be acquired from the storage unit 19 or a computer server connected via the network, the recommended content The presentation unit 18 may set such information as auxiliary information in the content presentation information. As information for the user to directly access the recommended content, for example, link information of Internet distribution content can be used. Moreover, a thumbnail image, a highlight video, a notice video, etc. can be used for the sample video.

推薦コンテンツ提示情報の表示形態は、コンピュータ装置の画面に一覧表示が可能な、テキストベースの静的な表示形式を基本とする。なお、推薦コンテンツのサンプル映像が利用可能である場合には、それら補助的情報（動画像）を画面上の所定の領域に、推薦コンテンツリストデータに記載された順に提示（動作再生）するなど、視覚的な工夫を別途実装してもよい。 The display format of the recommended content presentation information is basically a text-based static display format that can be displayed as a list on the screen of the computer device. If sample video of recommended content is available, such auxiliary information (moving image) is presented in a predetermined area on the screen in the order described in the recommended content list data (operation playback). A visual device may be implemented separately.

なお、図２の処理において、コンテンツ推薦装置１は、ステップＳ１１０の処理、ステップＳ１１５の処理、ならびに、ステップＳ１２０からステップＳ１２５までの処理のうち任意の処理を並行して実行してもよい。 In the process of FIG. 2, the content recommendation device 1 may execute any process among the process of step S110, the process of step S115, and the process from step S120 to step S125 in parallel.

図３は、視聴履歴解析部１３による一次クエリ生成処理の処理フローを示す図であり、図２のステップＳ１１５における一次クエリ生成処理の詳細を示す。
視聴履歴解析部１３は、視聴履歴記録部１１からユーザ視聴履歴情報を受信する(ステップＳ２０５）。基本の方法では、視聴履歴解析部１３は、ユーザ視聴履歴情報から視聴済みコンテンツの内容を表す番組概要などのテキスト情報を取り出す(ステップＳ２１０）。別の方法としては、視聴履歴解析部１３は、特許文献２に記載された方法のように、ユーザ視聴履歴情報に設定されている視聴コンテンツの再生区間に対応する字幕テキストなどのテキスト情報を記憶部１９から取得する。 FIG. 3 is a diagram showing a processing flow of primary query generation processing by the viewing history analysis unit 13, and shows details of the primary query generation processing in step S115 of FIG.
The viewing history analysis unit 13 receives the user viewing history information from the viewing history recording unit 11 (step S205). In the basic method, the viewing history analysis unit 13 extracts, from the user viewing history information, text information such as a program summary representing the viewed content (step S210). Alternatively, as in the method described in Patent Document 2, the viewing history analysis unit 13 acquires from the storage unit 19 text information, such as subtitle text, corresponding to the playback section of the viewed content set in the user viewing history information.

視聴履歴解析部１３は、ステップＳ２１０において取り出したテキスト情報に対応した文字列に対して形態素解析の処理を施して、品詞情報が付与された語句（形態素）の列に分解する(ステップＳ２１５）。形態素解析の対象となるテキスト情報は、すなわち、視聴コンテンツ全体あるいは視聴コンテンツの再生区間に対応した文字列である。形態素解析の具体的な手段としては、オープンソースの形態素解析ソフトウェアであるMeCabなどの公知の技術が利用可能である。 The viewing history analysis unit 13 performs a morpheme analysis process on the character string corresponding to the text information extracted in step S210, and decomposes it into a string of words (morphemes) to which part-of-speech information is assigned (step S215). The text information to be subjected to morphological analysis is a character string corresponding to the entire viewing content or the playback section of the viewing content. As a specific means of morphological analysis, a known technique such as MeCab which is open source morphological analysis software can be used.

次に、視聴履歴解析部１３は、ステップＳ２１５の形態素解析により得られた品詞情報付きの語句の列から、視聴コンテンツ全体、あるいは、視聴コンテンツの再生区間にかかる話題を特定可能な語句を選定する(ステップＳ２２０）。例えば、視聴履歴解析部１３は、品詞情報に基づいて、人名や組織名、地域名、商品名などの語句のように、指し示す対象物が限定的な名詞（固有名詞）を選定する。最後に、視聴履歴解析部１３は、選定した語句をリスト形式にまとめて一次クエリデータとして出力する。 Next, the viewing history analysis unit 13 selects a phrase that can specify the entire viewing content or the topic related to the playback section of the viewing content from the sequence of phrases with part-of-speech information obtained by the morphological analysis in step S215. (Step S220). For example, the viewing history analysis unit 13 selects nouns (proper nouns) whose target objects are limited, such as phrases such as names of people, organizations, regions, and products, based on the part of speech information. Finally, the viewing history analysis unit 13 collects the selected words and phrases in a list format and outputs them as primary query data.
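The word selection of step S220 can be sketched as a part-of-speech filter. The sketch assumes that the morphological analysis of step S215 has already produced `(surface, part_of_speech)` pairs; the tag name `proper_noun` and the sample tokens are placeholders and do not reflect the actual output format of MeCab or any other analyzer.

```python
from typing import List, Tuple

# Parts of speech treated as topic-identifying (an assumption for illustration).
TOPIC_POS = {"proper_noun"}


def build_primary_query(morphemes: List[Tuple[str, str]]) -> List[str]:
    """Collect topic-identifying words in order of first appearance, without duplicates."""
    query: List[str] = []
    for surface, pos in morphemes:
        if pos in TOPIC_POS and surface not in query:
            query.append(surface)
    return query


# Hypothetical analyzer output for one program summary.
morphemes = [
    ("Tokyo", "proper_noun"),
    ("で", "particle"),
    ("Yamada", "proper_noun"),
    ("が", "particle"),
    ("歌う", "verb"),
    ("Tokyo", "proper_noun"),
]
primary_query = build_primary_query(morphemes)
```

The resulting list corresponds to the primary query data that is output in list form.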

FIG. 4 shows the processing flow of the secondary query selection process performed by the query expansion unit 16, and details the secondary query selection process in step S130 of FIG. 2. The example here selects secondary query phrases from among the secondary query candidate phrases by using supplementary information.
First, the query expansion unit 16 receives the primary query data from the viewing history analysis unit 13 and the secondary query candidate list data from the social data analysis unit 15 (step S305). Next, for each phrase in the secondary query candidate list data, the query expansion unit 16 determines whether that phrase also appears in the primary query data. It then excludes from the secondary query candidate phrases every phrase that matches any phrase in the primary query data (step S310).

Next, for each secondary query candidate phrase remaining after the primary query phrases were removed in step S310, the query expansion unit 16 determines whether each phrase in its supplementary information appears in the primary query data. The supplementary information attached to a secondary query candidate phrase lists the other phrases that co-occur with it in the social data. The query expansion unit 16 excludes from the supplementary information every phrase that matches none of the phrases in the primary query data (step S315). As a result, phrases that do not appear in the primary query data are removed from the co-occurrence partners of each secondary query candidate phrase.

After step S315 has removed all phrases other than primary query phrases from the supplementary information of the secondary query candidates, the query expansion unit 16 converts, by some means, each phrase in the secondary query candidate list data and each phrase in the primary query data into a vector representation that reflects the semantic distance between words. For every possible pair of a secondary query candidate phrase and a primary query phrase, the query expansion unit 16 then calculates, by some means, the semantic similarity between the two phrases (step S320). Any existing method, such as those of Reference 1 or Reference 2 mentioned above, can be used to evaluate the similarity quantitatively, but the method is not limited to these.

Let phrase A be a secondary query candidate phrase, and let phrase C be a primary query phrase against which the similarity to phrase A is calculated. For every phrase A whose similarity to at least one primary query phrase C exceeds a predetermined threshold, the query expansion unit 16 writes each partner phrase C whose similarity exceeded the threshold into the secondary query candidate list data, overwriting and saving it. The threshold may, for example, be fixed at an empirically determined value, but it is not limited to this.
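As an illustration of step S320, the sketch below uses cosine similarity between phrase vectors. The specification leaves both the vectorization and the similarity measure open, so cosine similarity is an assumption made here, and the two-dimensional vectors, the threshold value, and the names `cosine` and `similar_partners` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_partners(candidate_vecs, primary_vecs, threshold):
    """Map each candidate phrase A to the primary phrases C whose similarity
    to A exceeds the threshold; candidates with no such C are omitted."""
    partners = {}
    for phrase_a, vec_a in candidate_vecs.items():
        hits = {
            phrase_c
            for phrase_c, vec_c in primary_vecs.items()
            if cosine(vec_a, vec_c) > threshold
        }
        if hits:
            partners[phrase_a] = hits
    return partners


# Toy vectors, made up for illustration only.
primary_vecs = {"Scotland": [1.0, 0.0], "politics": [0.0, 1.0]}
candidate_vecs = {"Glasgow": [0.9, 0.1], "whiskey": [0.5, 0.5]}
partners = similar_partners(candidate_vecs, primary_vecs, threshold=0.8)
print(partners)
```

With these toy vectors only "Glasgow" has a partner ("Scotland"); "whiskey" lies equally far from both primary phrases and stays below the threshold.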

Next, the query expansion unit 16 reads from the secondary query candidate list data each secondary query candidate phrase A, the primary query phrase C whose similarity to phrase A exceeds the predetermined threshold, and the primary query phrase B with which phrase A co-occurs (step S325). Phrase B is read from the supplementary information of phrase A.
For each secondary query candidate phrase A, the query expansion unit 16 determines whether the primary query phrase B with which phrase A co-occurs is identical to the primary query phrase C whose similarity to phrase A exceeds the predetermined threshold. If phrase B and phrase C are identical, the query expansion unit 16 does not select phrase A as a secondary query phrase and removes it from the secondary query candidate list data together with its supplementary information and phrase C. If phrase B and phrase C differ, the query expansion unit 16 leaves phrase A in the secondary query candidate list data as is (step S330).

Finally, the query expansion unit 16 adopts the phrases A remaining in the secondary query candidate list data as the secondary query. That is, it adopts as secondary query phrases those secondary query candidate phrases for which the primary query data contains a partner phrase whose similarity exceeds the predetermined threshold, and for which that partner phrase differs from the primary query phrase with which the candidate co-occurs. The query expansion unit 16 outputs, as extended query data, a list formed by concatenating the primary query data, which contains the list of primary query phrases, with the list of adopted secondary query phrases (step S335).
As in the specific example of extended query data in FIG. 11, described later, the value obtained in the preceding similarity calculation (the similarity between a secondary query phrase and a primary query phrase) may be recorded alongside each phrase in the extended query data, but this is not essential.

FIG. 5 shows the processing flow of the recommended content selection process performed by the recommendation list generation unit 17, and details the recommended content selection process in step S135 of FIG. 2. The case shown here applies weighting to the matching score of the secondary query.
First, the recommendation list generation unit 17 receives unviewed content information from the unviewed content information recording unit 12 and extended query data from the query expansion unit 16 (step S405). It obtains the primary query data, which is a subset of the extended query data. For each content item listed in the unviewed content information, the recommendation list generation unit 17 calculates a matching score between the text information of that content and the primary query phrases belonging to the primary query data, and uses it as the primary score (step S410). One possible way to calculate the primary score is simply to add up the occurrence counts of phrases that match a primary query phrase at the surface-form level and use the total as is, but the calculation is not limited to this.

Next, the recommendation list generation unit 17 obtains the list of secondary query phrases, which is the remaining subset of the extended query data. For each content item listed in the unviewed content information, it calculates a matching score between the text information of that content and the secondary query phrases, and uses it as the secondary score (step S415). As with the primary score, one possible way to calculate the secondary score is to use the cumulative occurrence count of phrases that match a secondary query phrase at the surface-form level as is, but the calculation is not limited to this.

Next, the recommendation list generation unit 17 multiplies the primary score and the secondary score calculated for each content item by their respective predetermined weights, sums the results, and uses the sum as the matching score of that content (step S420). After calculating the matching scores for all unviewed content, the recommendation list generation unit 17 sorts the content by some means based on the matching score. For example, the well-known UNIX (registered trademark) sort command can be used to sort the list of unviewed content, but the means is not limited to this. The recommendation list generation unit 17 stores the content IDs of the N unviewed content items with the highest matching scores in the recommended content list data and outputs it (step S425).
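Steps S410 to S425 can be sketched as below, using the simple occurrence-count scoring that the description mentions as one possibility. The weights, the value of N, the content IDs, and the program texts are hypothetical, and Python's built-in sort stands in for the UNIX sort command.

```python
def count_hits(text, terms):
    """Total number of surface-level occurrences of the terms in the text."""
    return sum(text.count(term) for term in terms)


def recommend(unviewed, primary_terms, secondary_terms,
              w_primary=1.0, w_secondary=0.5, top_n=3):
    """Return the IDs of the top-N unviewed items by weighted matching score."""
    scored = []
    for content_id, text in unviewed.items():
        primary_score = count_hits(text, primary_terms)      # step S410
        secondary_score = count_hits(text, secondary_terms)  # step S415
        total = w_primary * primary_score + w_secondary * secondary_score  # S420
        scored.append((total, content_id))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content_id for _, content_id in scored[:top_n]]       # step S425


# Hypothetical program summaries keyed by content ID.
unviewed = {
    "P1": "Scotland architecture tour of Glasgow",
    "P2": "Scotland referendum results",
    "P3": "cooking show",
}
ranking = recommend(
    unviewed,
    primary_terms=["architecture", "Scotland", "politics"],
    secondary_terms=["Glasgow", "referendum"],
    top_n=2,
)
print(ranking)
```

Here P1 scores 2.5 (two primary hits, one secondary hit) and P2 scores 1.5, so both rank above P3, which matches nothing. Setting `w_secondary` below `w_primary` is only one possible choice of weights.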

Next, an operation example of the content recommendation device 1 is described using specific data.
FIG. 6 shows a specific example of the user viewing history information output by the viewing history recording unit 11. For each content item, the user viewing history information records information identifying the content the user viewed and text information describing that content. When the content to be analyzed is a broadcast program, the user viewing history information lists, as shown in the figure, the broadcast channel name, broadcast date and time, program name, and program summary of each program the user viewed. This information about broadcast programs can be obtained from broadcast waves in computer-readable form by using commercial software such as SKNET's MonsterTV.

FIG. 7 shows a specific example of the unviewed content information output by the unviewed content information recording unit 12. The unviewed content information describes, in a form equivalent to the user viewing history information, content that is not included in the user viewing history information and that is available to the user now or in the future. When the content to be analyzed is a broadcast program, the unviewed content information lists the program ID, broadcast channel name, broadcast date and time, program name, and program summary of each program scheduled for broadcast within one week of the start of the content recommendation process. This information about scheduled programs can likewise be obtained from broadcast waves in computer-readable form by using commercial software such as SKNET's MonsterTV, mentioned above.

FIG. 8 shows a specific example of the primary query data output by the viewing history analysis unit 13. The primary query data lists the search terms, that is, the phrases that the viewing history analysis unit 13 extracted from the user viewing history information in step S115. The primary query data in the figure is a set of search terms consisting of the three phrases "architecture", "Scotland", and "politics", which the viewing history analysis unit 13 extracted from the user viewing history information shown in FIG. 6.

FIG. 9 shows a specific example of the social data stored by the social data recording unit 14. The social data in the figure consists of the tweets that the social data recording unit 14 obtained by entering each of the primary query phrases "architecture", "Scotland", and "politics" from the primary query data of FIG. 8 as a search term on Twitter's tweet log search screen. The character string in parentheses in each entry gives the author of the tweet and the date and time it was posted. The character string following the parentheses is the body of the tweet. The character strings beginning with "#" at the end of an entry are labels (hashtags) for classifying the content of the tweet.

FIG. 10 shows a specific example of the secondary query candidate list data output by the social data analysis unit 15. The data in the figure is the list of secondary query candidate phrases that the social data analysis unit 15 extracted in step S125, using hashtags, from the social data shown in FIG. 9. Each of the secondary query candidate phrases "architecture", "Scotland", "Glasgow", "narrow", "minimal", "referendum", "politics", and "military song" is followed, in parentheses, by supplementary information listing the other phrases with which it co-occurred in the social data.
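One way such a candidate list could be built from tweets is sketched below. The details of step S125 are not given in this passage, so the rule used here is an assumption: each hashtag becomes a candidate phrase, and its co-occurring phrases are the other hashtags of the same tweet plus the search term that retrieved the tweet. The entry layout, the sample tweets, and the name `build_candidates` are hypothetical.

```python
import re
from collections import defaultdict

# Matches a half-width or full-width hash sign followed by word characters.
HASHTAG = re.compile(r"[#＃](\w+)")


def build_candidates(tweets_by_search_term):
    """Map each hashtag to the set of phrases it co-occurs with.

    Co-occurrence is assumed to mean: other hashtags in the same tweet,
    plus the search term under which the tweet was retrieved.
    """
    candidates = defaultdict(set)
    for search_term, tweets in tweets_by_search_term.items():
        for tweet in tweets:
            tags = HASHTAG.findall(tweet)
            for tag in tags:
                candidates[tag].update(t for t in tags if t != tag)
                if tag != search_term:
                    candidates[tag].add(search_term)
    return dict(candidates)


# Made-up entries in the "(author date) body #tags" layout described above.
tweets_by_search_term = {
    "architecture": [
        "(user_a 2014-12-01 10:00) Stunning old buildings here #Glasgow #Scotland",
        "(user_b 2014-12-02 21:30) Living small #narrow #minimal",
    ],
}
candidates = build_candidates(tweets_by_search_term)
print(candidates["Glasgow"])
```

Under this assumed rule, "Glasgow" ends up with the supplementary information (architecture, Scotland), which has the same shape as the entries of FIG. 10 quoted in the text.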

In step S310 of FIG. 4, the query expansion unit 16 determines, for each phrase in the secondary query candidate list data shown in FIG. 10, whether that phrase appears in the primary query data. It excludes the primary query phrases "architecture", "Scotland", and "politics" from the phrases in the secondary query candidate list data. This leaves "Glasgow (architecture, Scotland)", "narrow (architecture, minimal)", "minimal (architecture, narrow)", "whiskey (Scotland)", "referendum (Scotland)", and "military song (politics)" in the secondary query candidate list data.

Further, in step S315, the query expansion unit 16 removes from the supplementary information in the secondary query candidate list data any phrase that does not appear in the primary query data. Any secondary query candidate phrase whose supplementary information contains no phrase appearing in the primary query data is also removed from the secondary query candidate list data. This leaves "Glasgow (architecture, Scotland)", "narrow (architecture)", "minimal (architecture)", "whiskey (Scotland)", "referendum (Scotland)", and "military song (politics)" in the secondary query candidate list data.
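The two filtering steps S310 and S315 can be reproduced on this example with a few lines of code. The sketch below is illustrative only: the candidate list is modeled as a dictionary from phrase to a set of co-occurring phrases, the function name `prefilter` is hypothetical, and the supplementary information of the three primary query phrases is left empty because it is not quoted in the text and those entries are removed in step S310 anyway.

```python
def prefilter(candidates, primary_terms):
    """Apply steps S310 and S315 to a candidate list.

    candidates: phrase -> set of phrases it co-occurs with in the social data.
    Returns the remaining candidates with their co-occurrence partners
    restricted to primary query phrases.
    """
    primary = set(primary_terms)
    kept = {}
    for phrase, cooccurring in candidates.items():
        if phrase in primary:
            continue  # S310: drop candidates that are primary query phrases
        partners = cooccurring & primary  # S315: keep primary partners only
        if partners:  # drop candidates left without any primary partner
            kept[phrase] = partners
    return kept


primary_terms = ["architecture", "Scotland", "politics"]
candidates = {
    "architecture": set(),  # supplementary information not quoted in the text
    "Scotland": set(),
    "politics": set(),
    "Glasgow": {"architecture", "Scotland"},
    "narrow": {"architecture", "minimal"},
    "minimal": {"architecture", "narrow"},
    "whiskey": {"Scotland"},
    "referendum": {"Scotland"},
    "military song": {"politics"},
}
remaining = prefilter(candidates, primary_terms)
for phrase, partners in remaining.items():
    print(phrase, sorted(partners))
```

The result matches the list given above: six candidates remain, and "narrow" and "minimal" keep only "architecture" as a co-occurrence partner.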

The query expansion unit 16 calculates the similarity between each of the remaining secondary query candidate phrases "Glasgow", "narrow", "minimal", "whiskey", "referendum", and "military song" and each of the primary query phrases "architecture", "Scotland", and "politics". Then, in step S320, the query expansion unit 16 selects the phrase "Glasgow", whose similarity to the primary query phrase "Scotland" is at or above the predetermined value, and the phrase "referendum", whose similarity to the primary query phrase "politics" is at or above the predetermined value. It writes the partner primary query phrase "Scotland" into the secondary query candidate list data in association with the candidate phrase "Glasgow", and likewise writes the partner primary query phrase "politics" in association with the candidate phrase "referendum". The query expansion unit 16 deletes from the secondary query candidate list data the phrases "narrow", "minimal", "whiskey", and "military song", whose similarity to the primary query phrases is below the predetermined value, together with their supplementary information.

In steps S325 to S330, the query expansion unit 16 performs the following processing. It reads from the secondary query candidate list data the candidate phrase "Glasgow", its supplementary information (architecture, Scotland), and the partner primary query phrase "Scotland", whose similarity is at or above the predetermined value. Because the supplementary information contains the primary query phrase "architecture", which differs from the partner phrase "Scotland", the query expansion unit 16 selects "Glasgow" as a secondary query phrase and leaves it in the secondary query candidate list data as is.
The query expansion unit 16 also reads from the secondary query candidate list data the candidate phrase "referendum", its supplementary information (Scotland), and the partner primary query phrase "politics", whose similarity is at or above the predetermined value. Because the supplementary information contains the primary query phrase "Scotland", which differs from the partner phrase "politics", the query expansion unit 16 selects "referendum" as a secondary query phrase and leaves it in the secondary query candidate list data as is.

FIG. 11 shows a specific example of the extended query data generated by the query expansion unit 16.
In the list of phrases in the extended query data shown in the figure, the first three, "architecture", "Scotland", and "politics", are carried over from the primary query data. The last two, "Glasgow" and "referendum", are the phrases that the query expansion unit 16 adopted as the secondary query from among the secondary query candidate phrases in step S335.

The numerical value shown next to each phrase in the figure is the highest of the similarities, calculated by the query expansion unit 16, between that phrase and each primary query phrase. The similarity between identical phrases is 1.00. Each phrase carried over from the primary query data is most similar to itself, so its value is 1.00.
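Assembling the extended query data with these values can be sketched as follows. The similarity numbers below are made up for illustration and are not the values of FIG. 11; the name `build_extended_query` and the list-of-pairs layout are likewise hypothetical.

```python
def build_extended_query(primary_terms, secondary_similarities):
    """Concatenate primary and adopted secondary phrases, each with a value.

    secondary_similarities: adopted phrase -> {primary phrase: similarity}.
    Primary phrases get 1.0 (similarity to themselves); each secondary
    phrase gets its highest similarity to any primary phrase.
    """
    extended = [(term, 1.0) for term in primary_terms]
    for phrase, sims in secondary_similarities.items():
        extended.append((phrase, max(sims.values())))
    return extended


# Hypothetical similarity values, not taken from the figure.
secondary_similarities = {
    "Glasgow": {"architecture": 0.31, "Scotland": 0.82, "politics": 0.12},
    "referendum": {"architecture": 0.10, "Scotland": 0.35, "politics": 0.77},
}
extended_query = build_extended_query(
    ["architecture", "Scotland", "politics"], secondary_similarities
)
for phrase, value in extended_query:
    print(f"{phrase}\t{value:.2f}")
```

The primary phrases come first with 1.00, followed by the adopted secondary phrases with their best similarity, which mirrors the ordering described for the figure.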

FIG. 12 illustrates the relationship between the extended query phrases and the unviewed content information. In the figure, the phrases in circles R1 to R3 are the primary query phrases "architecture", "Scotland", and "politics", respectively. The phrases in circles R4 and R5 are the secondary query candidate phrases that were selected for the secondary query, "Glasgow" and "referendum". The phrases in circles R6 to R9 are the secondary query candidates that were not selected for the secondary query, "narrow", "minimal", "whiskey", and "military song". The center of each circle represents the position of its phrase when projected, by some means, onto a vector space that reflects semantic similarity. That is, phrases whose circles are close together in the figure have high semantic similarity.

The arrows attached to the circles indicate that the phrases in those circles co-occur in the social data. The phrase at the tail of an arrow is in the primary query data, and the phrase at its head is a secondary query candidate. For example, the secondary query candidate (and secondary query) phrase "Glasgow" co-occurs in the social data with the primary query phrases "architecture" and "Scotland". Likewise, the secondary query candidate phrase "narrow" co-occurs in the social data with the primary query phrase "architecture".

Text information T1 represents unviewed content information that matched the primary query phrases "Scotland" and "architecture" and the secondary query phrase "Glasgow". Text information T2 represents unviewed content information that matched the primary query phrase "Scotland" and the secondary query phrase "referendum". These are specific examples of content that moves up the recommendation list when the secondary query is used.

The content recommendation device 1 uses both the co-occurrence relationships between primary query phrases and secondary query candidate phrases (the arrows) and the similarity between phrases (the proximity of the circles) to decide which candidate phrases to adopt as the secondary query. It then uses the adopted secondary query phrases together with the primary query to calculate matching scores against the text information of unviewed content. This allows the content recommendation device 1 to rank content that better reflects the user's latent preferences and current topics of general public interest near the top of the recommended content list.

The criteria for accepting or rejecting secondary query phrases are explained in detail below with specific examples. As described above, among the secondary query candidate phrases, "Glasgow" and "referendum" in circles R4 and R5 were adopted as the secondary query, and "narrow", "minimal", "whiskey", and "military song" in circles R6 to R9 were rejected. Whether a secondary query candidate phrase A is adopted or rejected as a secondary query phrase is decided by the following two criteria.

(1) The secondary query candidate phrase A has a high similarity to some primary query phrase C.
(2) The primary query phrase B that co-occurs with phrase A differs from the primary query phrase C of (1).

The content recommendation device 1 adopts as a secondary query phrase any candidate phrase A that satisfies both criteria (1) and (2), and rejects any phrase A that fails either or both of them. A phrase A that satisfies both criteria is highly similar to a primary query phrase C, which reflects the user's interests, and is also related, in some context in the social data, to a primary query phrase B that differs from phrase C. In other words, to be adopted, phrase A must be inferred, from its co-occurrence with phrase B in the social data, to point to something of latent interest to the user, and must also be close in meaning to phrase C, which points to something of explicit interest to the user. A phrase A that satisfies both criteria is something the content recommendation device 1 has inferred to be of latent interest to the user by using co-occurrence with primary query phrases (the user's explicit interests) in social data, where current topics are widely discussed. It is also likely to be a search term (secondary query) that reflects current topics in society at large.

For example, the secondary query candidate phrase "Glasgow" shown in FIG. 12 (an instance of phrase A) co-occurs with the primary query phrase "architecture" (an instance of phrase B) and is semantically close to another primary query phrase, "Scotland" (an instance of phrase C), which is different from "architecture". It is therefore adopted as a secondary query phrase. Similarly, the phrase "referendum" (an instance of phrase A) co-occurs with the primary query phrase "Scotland" (an instance of phrase B) and is semantically close to the primary query phrase "politics" (an instance of phrase C), so it is adopted as a secondary query phrase. In contrast, the phrases "narrow", "minimal", "whiskey", and "military song" (instances of phrase A) have no semantically close primary query phrase other than their respective co-occurrence partners among "architecture", "Scotland", and "politics", and so they are not adopted as secondary query phrases. If a phrase such as "liquor", which is semantically close to the secondary query candidate phrase "whiskey", were present in the primary query, "whiskey" could be adopted as a secondary query phrase. As shown in FIG. 12, the content retrieved using the adopted secondary query phrases "Glasgow" and "referendum" reflects the user's latent preferences (the magnificent architecture that survives in Glasgow) and current topics of general public interest (the referendum on Scottish independence).
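The two-criteria selection described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `expand_query`, the toy similarity table, the threshold value 0.5, and the pairing of each rejected candidate with a particular co-occurrence partner are all assumptions made for the example.

```python
from typing import Callable

def expand_query(
    primary: list[str],
    candidates: dict[str, set[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Return the candidate phrases A adopted as secondary query phrases.

    candidates maps each candidate phrase A to the primary query phrases
    it co-occurs with in the corpus (assumed to be computed elsewhere).
    """
    adopted = []
    for phrase_a, cooccurring in candidates.items():
        # Adopt A if some primary phrase C is similar enough to A and A
        # co-occurs with some primary phrase B that is different from C.
        ok = any(
            similarity(phrase_a, c) > threshold
            and any(b != c for b in cooccurring if b in primary)
            for c in primary
        )
        if ok:
            adopted.append(phrase_a)
    return adopted

# Toy similarity values (hypothetical); unlisted pairs count as dissimilar.
SIM = {("Glasgow", "Scotland"): 0.9, ("referendum", "politics"): 0.8}

def toy_similarity(a: str, b: str) -> float:
    return SIM.get((a, b), 0.0)

primary = ["architecture", "Scotland", "politics"]
candidates = {
    "Glasgow": {"architecture"},
    "referendum": {"Scotland"},
    "narrow": {"architecture"},      # partner assignment is assumed
    "minimal": {"architecture"},     # partner assignment is assumed
    "whiskey": {"Scotland"},         # partner assignment is assumed
    "military song": {"politics"},   # partner assignment is assumed
}
secondary = expand_query(primary, candidates, toy_similarity, 0.5)
print(secondary)
```

With these toy inputs only "Glasgow" and "referendum" are adopted, matching the example of FIG. 12; the other four candidates co-occur with a primary phrase but have no second, semantically close primary phrase.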

FIG. 13 shows an example of the recommended content list data output by the recommendation list generation unit 17. The recommended content list data shown in the figure sets the program name, broadcast date and time, and program summary of each recommended content item.
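As a rough illustration of the three fields named above, one entry of the recommended content list data might be modeled as below. The class name, field names, and the sample values are hypothetical and do not come from FIG. 13.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class RecommendedContent:
    """One entry of the recommended content list (field names assumed)."""
    program_name: str
    broadcast_at: datetime
    summary: str

def format_entry(item: RecommendedContent) -> str:
    # Render one list entry as a single line of text.
    return f"{item.broadcast_at:%Y-%m-%d %H:%M} {item.program_name}: {item.summary}"

# Placeholder values for illustration only.
entry = RecommendedContent(
    "Sample Program", datetime(2015, 1, 5, 20, 0), "A placeholder summary."
)
print(format_entry(entry))
```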

FIG. 14 shows a display example of the recommended content presentation screen that the recommended content presentation unit 18 causes the content display device to display. The figure is a GUI (graphical user interface) screen in which the contents of the recommended content list shown in FIG. 13 are displayed by a web browser. The tabs "Upcoming broadcasts", "System settings 1", and "System settings 2" shown at the top of the recommended content presentation screen are options for displaying, respectively, a list of unviewed content, a list of the contents of the user's expanded query, and the setting of the weight by which the recommendation list generation unit 17 multiplies the secondary score. These displays are not essential in the present embodiment.

In the above embodiment, the primary query phrases are extracted from the user viewing history information; however, the primary query phrases may instead be keywords entered by the user.
Also, in the above embodiment, the secondary query candidate phrases are obtained using social data, but secondary query phrases may be obtained using other data. Any data can be used provided that, like social data, it is machine-readable corpus data in which diverse notations are used for the same topic and in which the subject of a topic can be identified by time, for example by a timestamp.
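A minimal sketch of how timestamped corpus data could be restricted to a period and scanned for co-occurrences is shown below. It assumes each corpus item has already been split into words; the `Post` type, the function name, and the sample posts and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Post:
    """One corpus item: a timestamp and its already-tokenized words (assumed)."""
    posted_at: datetime
    words: frozenset[str]

def cooccurring_candidates(posts, primary, start, end):
    """Map each non-primary word to the primary phrases it shares a post with,
    using only posts whose timestamp falls in [start, end)."""
    primary_set = set(primary)
    result: dict[str, set[str]] = {}
    for post in posts:
        if not (start <= post.posted_at < end):
            continue  # outside the selected period
        hits = post.words & primary_set
        if not hits:
            continue  # no primary query phrase in this post
        for word in post.words - primary_set:
            result.setdefault(word, set()).update(hits)
    return result

# Sample posts for illustration only.
posts = [
    Post(datetime(2014, 9, 18), frozenset({"Scotland", "referendum"})),
    Post(datetime(2013, 1, 1), frozenset({"Scotland", "whiskey"})),
]
found = cooccurring_candidates(
    posts, ["Scotland"], datetime(2014, 9, 1), datetime(2014, 10, 1)
)
print(found)
```

Changing `start` and `end` changes which topics can surface as candidates, which is why a corpus whose items carry timestamps is required.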

The above description covers the case where the content recommendation device 1 and the content display device 3 are connected via a network; however, the content display device 3 may be configured to include the content recommendation device 1. Alternatively, the content display device 3 may be configured to include some of the functional units of the content recommendation device 1. For example, the content display device 3 may include the viewing history recording unit 11 of the content recommendation device 1, and may further include the unviewed content information recording unit 12 and the viewing history analysis unit 13.

According to the embodiment described above, the content recommendation device 1 uses social media provided on the Internet to extract, as secondary query phrases, other linguistic expressions that are semantically closely related to the primary query, which is a set of search terms describing the user's preferences. The content recommendation device 1 adds the secondary query phrases, extracted on the basis of the primary query phrases, as search terms to the primary query data, which is a set of search terms describing the user's preferences. The content recommendation device 1 then searches for content using the set of search terms to which the secondary query phrases have been added. This allows the content recommendation device 1 to recommend content that better matches what the user is looking for. Furthermore, by limiting the time period of the social media used to extract the secondary query phrases, the content recommendation device 1 can recommend content that reflects, in addition to the user's latent preferences, current topics that newly emerge day by day, or topics from the past.
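The search step with the expanded query might look like the following sketch. The scoring rule here is an assumption made purely for illustration: the primary score and the secondary score are taken to be simple occurrence counts of the respective phrases, and the secondary score is multiplied by a weight before being added. The function name, the weight value, and the sample contents are hypothetical.

```python
def rank_contents(contents, primary, secondary, secondary_weight=0.5):
    """Rank contents by an assumed score: primary hits + weight * secondary hits.

    contents maps a title to its descriptive text.
    """
    ranked = []
    for title, text in contents.items():
        primary_score = sum(text.count(term) for term in primary)
        secondary_score = sum(text.count(term) for term in secondary)
        total = primary_score + secondary_weight * secondary_score
        if total > 0:  # keep only contents matching at least one search term
            ranked.append((title, total))
    # Highest score first; ties broken by title for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked

# Sample contents for illustration only.
contents = {
    "A": "Scotland referendum on independence",
    "B": "architecture of Glasgow and Scotland",
    "C": "cooking show",
}
ranking = rank_contents(
    contents,
    primary=["architecture", "Scotland", "politics"],
    secondary=["Glasgow", "referendum"],
)
print(ranking)
```

Setting `secondary_weight` to 0 reduces the ranking to one based on the primary query alone, which makes the contribution of the added phrases easy to compare.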

As described above, the content recommendation device 1 of the present embodiment can automatically generate, from the original search terms, a query (expanded query) that appropriately reflects the diversity of phrasing and the topicality of subjects. By searching for content using the generated query, the content recommendation device 1 can provide content recommendations that match the user's preferences more closely than before.
In addition, the content recommendation device 1 of the present embodiment makes possible highly diverse content recommendations based on phrases (the expanded query) that are semantically closely related to the original search terms. As a result, it can recommend content that is difficult to find from the original search terms alone and that is rich in the potential to uncover or reveal new interests of the user (serendipity).

The content recommendation device 1 and the content display device 3 described above each contain a computer system. The operating steps of the content recommendation device 1 and the content display device 3 are stored in the form of a program on a computer-readable recording medium, and the processing described above is performed when the computer system reads and executes this program. The computer system referred to here includes a CPU, various memories, an OS, and hardware such as peripheral devices.

The "computer system" also includes a website-providing environment (or display environment) when a WWW system is used.
The "computer-readable recording medium" refers to portable media such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and to storage devices such as a hard disk built into a computer system. The "computer-readable recording medium" also includes media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain time, such as volatile memory inside a computer system acting as the server or client in that case. The program may implement only part of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

1 Content recommendation device
11 Viewing history recording unit
12 Unviewed content information recording unit
13 Viewing history analysis unit
14 Social data recording unit
15 Social data analysis unit
16 Query expansion unit
17 Recommendation list generation unit
18 Recommended content presentation unit
19 Storage unit
3 Content display device
31 Operation unit
32 Acquisition unit
33 Output unit
34 Notification unit
35 Reception unit
5 Social media service providing device
9 Network

Claims

An acquisition unit that acquires primary query data indicating a list of search terms that are terms used in the search;
A search word candidate extraction unit that extracts words of a search word candidate from a plurality of corpus data in which the same topic can be described by different notations;
A query expansion unit that selects, from the search term candidate phrases, a search term candidate phrase whose similarity to any of the search terms included in the primary query data is higher than a predetermined condition and which co-occurs in the corpus data with any of the search terms different from the search term for which the similarity higher than the predetermined condition was obtained, and adds the selected phrase to the search terms;
A search unit for searching for content using the search term included in the primary query data and the search term added by the query expansion unit;
A content recommendation device comprising:

The content recommendation device according to claim 1, wherein the search word candidate extraction unit extracts the search term candidate phrases from the corpus data within a predetermined period.

The content recommendation device according to claim 1, wherein the acquisition unit acquires primary query data composed of phrases extracted from text information related to content viewed by a user.

The content recommendation device according to any one of claims 1 to 3, wherein the acquisition unit acquires primary query data composed of phrases extracted from text information related to a portion of content reproduced by a user.

The content recommendation device according to any one of claims 1 to 4, wherein the search word candidate extraction unit extracts the search term candidate phrases from a tag or text of the corpus data.

A program for causing a computer to function as a content recommendation device comprising:
Acquisition means for acquiring primary query data indicating a list of search terms that are terms used in the search;
Search word candidate extraction means for extracting search term candidate phrases from a plurality of corpus data in which the same topic can be described by different notations;
Query expansion means for selecting, from the search term candidate phrases, a search term candidate phrase whose similarity to any of the search terms included in the primary query data is higher than a predetermined condition and which co-occurs in the corpus data with any of the search terms different from the search term for which the similarity higher than the predetermined condition was obtained, and for adding the selected phrase to the search terms; and
Search means for searching for content using the search terms included in the primary query data and the search terms added by the query expansion means.