JP7407560B2

JP7407560B2 - Keyword evaluation device, keyword evaluation method, and keyword evaluation program

Info

Publication number: JP7407560B2
Application number: JP2019197588A
Authority: JP
Inventors: 豊金子; 祐太星; 勇太萩尾
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2024-01-04
Anticipated expiration: 2039-10-30
Also published as: JP2021071569A

Description

The present invention relates to a device, a method, and a program for evaluating keywords.

Technologies have been proposed for communication robots that watch video such as television programs together with people.
For example, Patent Document 1 proposes a technique in which a robot uses social media comments related to a video to generate utterances according to its internal personality or emotional state and is operated accordingly, so that it acts as if it were watching the video together with the user.
Patent Document 2 proposes a robot control technique in which the robot responds to commands from a person, such as changing the channel, and also murmurs spontaneously while facing the television, so that it behaves as if it were watching television autonomously.

Patent Document 1: Japanese Patent No. 6122792
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-180472
Patent Document 3: Japanese Patent No. 5194198
Patent Document 4: Japanese Patent No. 6486165

Non-Patent Document 1: Makoto Yamamoto, Hiroaki Tanimoto, Naoko Nitta, Noboru Babaguchi, "Estimating the intervals of interest of a specific person while watching TV to acquire personal preferences", IEICE Transactions D, Vol. J90-D, No. 8, pp. 2202-2211, 2007

For a communication robot that watches television with a person to act in ways that empathize with that person about topics related to the program being watched, it is important to know what the person with the robot is interested in. One such piece of information is the degree of interest, an index of how interested the person is in a keyword such as a proper noun.
For example, once it is known that the person watching television with the robot is a fan of "A-ko", that is, has a high degree of interest in her, the robot can produce empathetic utterances such as "A-ko is cute, isn't she?" and informational utterances such as "A-ko is on TV right now."

Patent Document 3 proposes a technique for recommending programs that a person is likely to be interested in based on past viewing history. This technique presents content that the user prefers and, in particular, estimates preferences while also taking relationships with other people into account.
However, the estimation target of such program recommendation techniques is the program itself, so they cannot be used to estimate which keywords related to the program being viewed are of interest.

Patent Document 4 proposes a technique for extracting candidate keywords of interest from the many keywords related to a program. The technique uses a plurality of dictionaries prepared in advance to extract, from the many keywords related to a program, those presumed to be of strong general interest.
However, the extracted keywords are merely keywords that the dictionaries estimate to be of strong interest, and they do not reflect the interests of the individual viewing the program.

Non-Patent Document 1 proposes a technique for estimating the intervals of interest in a television program being viewed. The technique captures the facial expressions of a person watching television with a camera and estimates the intervals of interest from those expressions. The keywords tagged to an interval of interest in the program can then be estimated to be keywords of interest.
However, when multiple keywords are tagged within an estimated interval of interest, it is not possible to estimate which of them the viewer is interested in. Furthermore, capturing the facial expressions of a person watching television requires installing a camera in the home, which is difficult in practice. Even if a camera could be installed, the position and orientation of the face vary in daily life, for example when the person watches while lying down, so it is difficult to extract facial expressions accurately.

Thus, when a program is tagged with many related keywords, some kind of reaction from the user who is viewing is needed as information in order to estimate which keywords the viewer is interested in. However, installing sensor devices such as cameras in the home, or observing the viewer's reactions with sensors worn on the body, is often difficult in practice. In addition, information obtainable from viewing history or remote control operations can be used to estimate preferences at the program level, but is insufficient for estimating the degree of interest in individual keywords related to a program.

An object of the present invention is to provide a keyword evaluation device, a keyword evaluation method, and a keyword evaluation program that can evaluate a user's degree of interest in each keyword.

A keyword evaluation device according to the present invention includes: an utterance generation unit that generates an utterance containing an input keyword and outputs it to a user; a reaction acquisition unit that acquires the type of the user's reaction to the utterance; and an interest calculation unit that calculates the user's degree of interest in the keyword based on the reaction type.

The reaction acquisition unit may acquire, as the reaction type, one of a plurality of types including positive and negative.

The reaction acquisition unit may measure the user's reaction time to the utterance and, if the reaction time exceeds a predetermined time, acquire a no-reaction type as the reaction type.

The keyword evaluation device may include an utterance type selection unit that selects the type of the utterance from a predetermined number of utterance types with predetermined probabilities.

The keyword evaluation device may include a preference calculation unit that calculates, for each utterance type and based on the reaction type, a degree of preference for utterances containing the keyword, and the interest calculation unit may calculate the degree of interest from statistical information on the degrees of preference.

The reaction acquisition unit may measure the user's reaction time to the utterance, and the preference calculation unit may weight the degree of preference based on the reaction time.

The keyword evaluation device may include a keyword extraction unit that extracts, from a broadcast program, the keywords contained in a predetermined database.

In a keyword evaluation method according to the present invention, a computer executes: an utterance generation step of generating an utterance containing an input keyword and outputting it to a user; a reaction acquisition step of acquiring the type of the user's reaction to the utterance; and an interest calculation step of calculating the user's degree of interest in the keyword based on the reaction type.

A keyword evaluation program according to the present invention causes a computer to function as the keyword evaluation device.

According to the present invention, a user's degree of interest in each keyword can be evaluated.

FIG. 1 is a block diagram showing the functional configuration of the keyword evaluation device in the embodiment.
FIG. 2 is a diagram showing the detailed functional configuration of the utterance generation unit in the embodiment.
FIG. 3 is a diagram showing an example of the disclosure template in the embodiment.
FIG. 4 is a diagram showing an example of the question template in the embodiment.
FIG. 5 is a diagram showing an example of the confirmation template in the embodiment.
FIG. 6 is a diagram showing an example of entries registered in the keyword dictionary 18C in the embodiment.
FIG. 7 is a diagram showing an example of the degrees of preference recorded in the keyword dictionary 17 in the embodiment.

An example of an embodiment of the present invention is described below.
The keyword evaluation method in this embodiment evaluates a user's degree of interest in a keyword. In particular, this embodiment illustrates a method in which a robot that watches a broadcast program, such as a television program, together with a viewer evaluates the viewer's degree of interest in keywords related to the program being viewed.
In this embodiment, the index value of interest in a keyword is called the degree of interest, and a larger value indicates higher interest.

In this embodiment, a keyword evaluation device that evaluates the degree of interest in keywords is built into a robot that watches programs together with a person. The robot speaks to the person who is viewing, using utterances that contain keywords related to the broadcast program being viewed, and evaluates the degree of interest in each keyword by acquiring the person's reactions.

FIG. 1 is a block diagram showing the functional configuration of the keyword evaluation device 1 in this embodiment.
The keyword evaluation device 1 is an information processing device that includes a control unit, a storage unit, and various interfaces. The functions of this embodiment are realized by the control unit executing software (a keyword evaluation program) stored in the storage unit.

The control unit of the keyword evaluation device 1 includes a keyword extraction unit 11, an utterance type selection unit 12, an utterance generation unit 13, a reaction acquisition unit 14, a preference calculation unit 15, and an interest calculation unit 16.
The storage unit of the keyword evaluation device 1 holds the keyword evaluation program as well as various databases such as a keyword dictionary 17.

In this embodiment, the keyword evaluation device 1 uses voice input and output as an example of the interface with the user. The utterance output O from the keyword evaluation device 1 is presented to the user as speech by a speech synthesis device 2, as voice output OA. The voice input IA, which is the user's response to the voice output OA, is converted into text by a speech recognition device 3 and input to the keyword evaluation device 1 as a response sentence input I.

Although this embodiment describes a voice-based interface with the user, the interface is not limited to this. For example, the utterance output O may be shown on a display, and the user may enter the response sentence input I with a keyboard or the like.

The keyword extraction unit 11 extracts keywords that exist in the keyword dictionary 17 from an input sentence T input to the keyword evaluation device 1. In this embodiment, the input sentence T is described as subtitle information (closed captions) contained in the broadcast program being viewed. For example, if the subtitle sentence "Today I would like to go and eat ramen at this restaurant." is input as the input sentence T and the keyword dictionary 17 contains the keyword "ramen", the keyword extraction unit 11 extracts the keyword "ramen".
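The extraction step can be illustrated with a minimal Python sketch. The source does not specify how the input sentence is matched against the keyword dictionary 17, so plain substring matching is an assumption made here, and the function name `extract_keywords` is hypothetical.

```python
from typing import Iterable, List


def extract_keywords(input_sentence: str, keyword_dictionary: Iterable[str]) -> List[str]:
    """Return the dictionary keywords found in the sentence, ordered by first appearance.

    Assumption: a keyword "exists" in the sentence if it occurs as a substring.
    """
    hits = []
    for keyword in keyword_dictionary:
        position = input_sentence.find(keyword)
        if keyword and position != -1:
            hits.append((position, keyword))
    # Sort by position so the result follows the order of the subtitle text.
    return [keyword for _, keyword in sorted(hits)]


subtitle = "Today I would like to go and eat ramen at this restaurant."
print(extract_keywords(subtitle, {"ramen", "Sanuki udon"}))
```

Substring matching suits text without word delimiters, but a real system might instead run morphological analysis first; that choice is outside what the source states.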

The keyword dictionary 17 is a database in which the keywords to be evaluated are registered. The keywords are mainly proper nouns, such as names of people, places, and facilities, and common nouns. In this embodiment, the keywords are assumed to be registered in the keyword dictionary 17 in advance. In addition, for each keyword registered in the keyword dictionary 17, a degree of preference indicating the strength of preference is stored for each utterance type described later.

The method of extracting subtitle information from a broadcast program is not limited. For example, the robot may acquire the subtitle information by working in conjunction with a television receiver, or may acquire subtitle information distributed as a broadcast-linked service over communication facilities such as the Internet.
Although the input sentence T is subtitle information in this embodiment, it is not limited to this. For example, it may be a sentence extracted by image processing of the television picture or speech recognition of the television audio, using a camera or microphone mounted on the robot.

The frequency with which keywords are extracted from subtitle information or the like may be adjusted as appropriate, for example by sampling at random with a fixed interval in between. Alternatively, the keyword extraction unit 11 may narrow down the number of extracted keywords by giving priority to keywords of high importance, based on word frequency or the like, or to keywords that have been uttered only a few times and have not yet been sufficiently evaluated.

The utterance type selection unit 12 selects the type of the utterance to be generated by the utterance generation unit 13 from a predetermined number of utterance types with predetermined probabilities.
Here, the utterance types classify the utterances that occur in conversations between people watching television. For example, Document A below defines eight types: "question", "instruction", "information", "disclosure", "reflection", "confirmation", "interpretation", and "response".
Document A: Yuta Hoshi, Yutaka Kaneko, Yuta Hagio, Yasuhiro Murasaki, Michihiro Uehara, "Analysis of dialogue between people while watching TV for robot speech", Institute of Electronics, Information and Communication Engineers, IEICE Technical Report, CNR2019-1 (2019-06), pp. 1-6

In this embodiment, there are four utterance types, "question", "information", "disclosure", and "confirmation", and the utterance type selection unit 12 selects one of these four with a predetermined probability.
A "question" is an utterance that asks the other party something, such as "Do you like A-ko?"
"Information" is an utterance that provides the other party with some information, such as "A-ko used to appear in the drama XX."
"Disclosure" is an utterance that conveys one's own thoughts or feelings to the other party, such as "I love A-ko."
"Confirmation" is an utterance that seeks the other party's confirmation of something, such as "A-ko is beautiful, isn't she?"

According to Document A, in conversations between people watching television, the utterance type that opens a conversation is most often "disclosure" at 20 to 40%, followed by "confirmation" at 14 to 20%, "question" at 8 to 16%, and "information" at 4 to 9%. The utterance type selection unit 12 therefore selects at random with, for example, a probability of 45% for "disclosure", 30% for "confirmation", 15% for "question", and 10% for "information".
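The weighted random choice described above can be sketched in Python with the standard library's `random.choices`. The weights are the example ratios from the text; the names `UTTERANCE_TYPE_WEIGHTS` and `select_utterance_type` are hypothetical.

```python
import random
from typing import Optional

# Example selection ratios from the text (percent); they sum to 100.
UTTERANCE_TYPE_WEIGHTS = {
    "disclosure": 45,
    "confirmation": 30,
    "question": 15,
    "information": 10,
}


def select_utterance_type(rng: Optional[random.Random] = None) -> str:
    """Pick one utterance type at random according to the predetermined weights."""
    rng = rng if rng is not None else random.Random()
    types = list(UTTERANCE_TYPE_WEIGHTS)
    weights = list(UTTERANCE_TYPE_WEIGHTS.values())
    return rng.choices(types, weights=weights, k=1)[0]


print(select_utterance_type())
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient when testing the rest of the pipeline.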

The utterance generation unit 13 generates an utterance based on the keyword extracted by the keyword extraction unit 11 and the utterance type selected by the utterance type selection unit 12, and outputs it as the utterance output O.
Specifically, the utterance generation unit 13 generates an utterance that matches the utterance type selected by the utterance type selection unit 12 and contains the keyword extracted by the keyword extraction unit 11. For this purpose, the utterance generation unit 13 may include utterance generation means corresponding to each of the four utterance types.

For example, Document B below proposes a technique that takes subtitle sentences from past broadcast programs, stores those containing emotion words as template sentences, and combines them with a keyword to automatically generate an utterance that expresses an emotion and contains that keyword. In this technique, verb phrases expressing desires, such as "want to eat", "want to talk", and "want to go", and adjectives, such as "beautiful", "interesting", and "big", are called representative words, and the representative word whose feature vector is close to that of the target keyword is selected. Once the representative word for the target keyword has been selected, an utterance is generated from a template sentence containing that representative word. In this embodiment, this technique is applied to generate the "disclosure", "question", and "confirmation" utterances among the four utterance types described above.
Document B: Japanese Unexamined Patent Publication No. 2018-190077

FIG. 2 is a diagram showing the detailed functional configuration of the utterance generation unit 13 in this embodiment.
The utterance generation unit 13 includes a template extraction unit 131, a vector distance calculation unit 132, a template selection unit 133, a keyword search unit 134, and an information sentence generation unit 135.
The utterance generation unit 13 also refers to the databases stored in the storage unit: a template database 18A, a feature vector database 18B, and a keyword dictionary 18C. The keyword dictionary 18C is a separate database from the keyword dictionary 17.

The keyword Key extracted by the keyword extraction unit 11 and the utterance type C selected by the utterance type selection unit 12 are input to the utterance generation unit 13.
When the utterance type C is "question", "disclosure", or "confirmation", the keyword Key and the utterance type C are input to the template extraction unit 131. When the utterance type C is "information", they are input to the keyword search unit 134.

The template extraction unit 131 extracts from the template database 18A the disclosure template when the utterance type C is "disclosure", the question template when it is "question", and the confirmation template when it is "confirmation".
Each XX template (where XX is "disclosure", "question", or "confirmation") is data that stores templates for generating "XX" sentences, and contains representative words and the template sentences corresponding to each representative word.

FIG. 3 is a diagram showing an example of the disclosure template in this embodiment.
The disclosure template is data in which each representative word is registered as a pair with the template sentences used to generate "disclosure" sentences for that representative word.
In this example, six representative words, "want to talk", "want to go", "want to eat", "beautiful", "interesting", and "big", are registered together with the template sentences corresponding to each. For example, two template sentences, "I want to talk with %key" and "I want to chat with %key", are registered for the verb-phrase representative word "want to talk". Likewise, two template sentences, "%key is really beautiful" and "What a beautiful %key", are registered for the adjective representative word "beautiful".
In a template sentence, %key indicates the position where the keyword is inserted.

FIG. 4 is a diagram showing an example of the question template in this embodiment.
The question template has the same structure as the disclosure template: template sentences for generating "question" sentences are registered together with representative words.
In this example, template sentences are registered for three verb-phrase representative words: "want to talk", "want to go", and "want to eat". For example, two template sentences, "Have you ever talked with %key?" and "Do you want to chat with %key?", are registered for the representative word "want to talk".

FIG. 5 is a diagram showing an example of the confirmation template in this embodiment.
The confirmation template has the same data structure as the disclosure template and the question template: template sentences for generating "confirmation" sentences are registered together with representative words.
In this example, template sentences are registered for three adjective representative words: "beautiful", "interesting", and "delicious". For example, two template sentences, "%key is really beautiful, isn't it?" and "A beautiful %key is lovely, isn't it?", are registered for the representative word "beautiful".

For the group of templates extracted by the template extraction unit 131, the vector distance calculation unit 132 calculates the vector distance between the feature vector of each representative word in the template and the feature vector of the keyword Key.
The feature vector database 18B stores precomputed feature vectors for at least the keywords recorded in the keyword dictionary 17 and the representative words contained in each template.

One method of calculating feature vectors is word2vec. word2vec uses a three-layer neural network, and it is empirically known that, in the feature vectors obtained by training on a large amount of word-segmented text, similar words have vectors that lie close together.
The method of calculating feature vectors is not limited to word2vec. Any method may be used as long as the feature vectors of semantically similar words are distributed close together in the vector space.

Here, letting vec_k be the feature vector of the keyword Key and vec_t be the feature vector of a representative word t, the cosine similarity cos(vec_k, vec_t) = (vec_k · vec_t) / (|vec_k| |vec_t|) can be used, for example, as the measure of how close the two vectors are.
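The cosine similarity formula above can be written directly in Python. This is a minimal sketch using only the standard library; the function name `cosine_similarity` and the handling of zero vectors (raising an error) are choices made here, not part of the source.

```python
import math
from typing import Sequence


def cosine_similarity(vec_k: Sequence[float], vec_t: Sequence[float]) -> float:
    """cos(vec_k, vec_t) = (vec_k . vec_t) / (|vec_k| |vec_t|)."""
    if len(vec_k) != len(vec_t):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(vec_k, vec_t))
    norm_k = math.sqrt(sum(a * a for a in vec_k))
    norm_t = math.sqrt(sum(b * b for b in vec_t))
    if norm_k == 0 or norm_t == 0:
        # Assumption: treat a zero vector as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_k * norm_t)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

The value ranges from -1 to 1, and a larger value means the two vectors point in more similar directions.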

From the cosine similarities that the vector distance calculation unit 132 has calculated between the feature vector of the keyword Key and the feature vector of each representative word, the template selection unit 133 selects the representative word whose feature vector is closest to that of the keyword Key, that is, the one with the largest cosine similarity.

Next, the template selection unit 133 selects one of the template sentences for the selected representative word and generates the utterance (a "question", "disclosure", or "confirmation" sentence) by replacing %key in the template sentence with the keyword Key.
For example, when "Sanuki udon" is input to the utterance generation unit 13 as the keyword Key and "confirmation" as the utterance type C, the template extraction unit 131 extracts the confirmation template of FIG. 5. The vector distance calculation unit 132 then calculates the vector distance between the feature vector vec_SanukiUdon of "Sanuki udon" and each of the representative-word feature vectors vec_beautiful, vec_interesting, and vec_delicious. If these results show that vec_delicious is closest, the template selection unit 133 selects one of the template sentences for "delicious" at random. If the template sentence "%key is delicious, isn't it?" is selected, the template selection unit 133 generates the utterance "Sanuki udon is delicious, isn't it?" as the "confirmation" sentence.
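The "Sanuki udon" walk-through can be sketched in Python as follows. Everything concrete in the data is illustrative: the three-dimensional feature vectors are toy values rather than real word2vec output, the template strings are English renderings of the Japanese templates, the template for "interesting" is hypothetical, and the names `CONFIRMATION_TEMPLATE`, `FEATURE_VECTORS`, and `generate_utterance` are not from the source.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

# Representative word -> template sentences ("%key" marks the keyword position).
CONFIRMATION_TEMPLATE: Dict[str, List[str]] = {
    "beautiful": ["%key is really beautiful, isn't it?",
                  "A beautiful %key is lovely, isn't it?"],
    "interesting": ["%key is interesting, isn't it?"],  # hypothetical entry
    "delicious": ["%key is delicious, isn't it?"],
}

# Toy stand-in for the feature vector database 18B (assumed values).
FEATURE_VECTORS: Dict[str, List[float]] = {
    "Sanuki udon": [0.9, 0.1, 0.0],
    "delicious": [1.0, 0.0, 0.0],
    "beautiful": [0.0, 1.0, 0.0],
    "interesting": [0.0, 0.0, 1.0],
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def generate_utterance(keyword: str,
                       template: Dict[str, List[str]],
                       vectors: Dict[str, List[float]],
                       rng: Optional[random.Random] = None) -> str:
    """Pick the representative word closest to the keyword, then fill in a template."""
    rng = rng if rng is not None else random.Random()
    # Largest cosine similarity = closest representative word.
    best_word = max(template, key=lambda w: cosine_similarity(vectors[keyword], vectors[w]))
    sentence = rng.choice(template[best_word])  # random choice among its templates
    return sentence.replace("%key", keyword)


print(generate_utterance("Sanuki udon", CONFIRMATION_TEMPLATE, FEATURE_VECTORS))
```

With these toy vectors, "delicious" has the largest cosine similarity to "Sanuki udon", so the only template registered for it is filled in with the keyword.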

The keyword search unit 134 acquires the explanatory text for the input keyword Key from the keyword dictionary 18C.
Explanatory text, such as the meaning of each keyword, is registered in the keyword dictionary 18C.

FIG. 6 is a diagram showing an example of entries registered in the keyword dictionary 18C in this embodiment.
In this example, "ammonite", "Sanuki udon", and "Sirius" are registered as keywords. The explanation registered for "ammonite" is "a creature with a snail-like shell that lived in the sea 350 million years ago", the explanation for "Sanuki udon" is "a specialty udon of Kagawa Prefecture", and the explanation for "Sirius" is "the brightest star in Canis Major and, apart from the sun, the brightest star visible from Earth".

The information sentence generation unit 135 uses the explanatory text obtained by the keyword search unit 134 and the keyword Key to generate an "information" sentence as the utterance, such as "I hear that (Key) is (explanatory text)" or "(Key) is (explanatory text), you know".
For example, when "Sirius" is input to the utterance generation unit 13 as the keyword and "information" as the utterance type, the keyword search unit 134 obtains the explanatory text "the brightest star in the constellation Canis Major and, apart from the Sun, the brightest star visible from Earth", and the information sentence generation unit 135 outputs "I hear that Sirius is the brightest star in the constellation Canis Major and, apart from the Sun, the brightest star visible from Earth".
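As a rough sketch of this lookup-and-fill step, the keyword dictionary 18C can be modeled as a mapping from keyword to explanatory text. The English sentence pattern, the dictionary literal, and the function name `generate_information_sentence` are assumptions for illustration only.

```python
# Entries modeled on the FIG. 6 example; the English wording is a translation.
KEYWORD_DICTIONARY = {
    "ammonite": "a creature with a snail-like shell that lived in the sea 350 million years ago",
    "Sanuki udon": "a specialty udon of Kagawa Prefecture",
    "Sirius": (
        "the brightest star in the constellation Canis Major and, "
        "apart from the Sun, the brightest star visible from Earth"
    ),
}


def generate_information_sentence(keyword, dictionary):
    """Return an "information" sentence, or None if the keyword has no entry."""
    explanation = dictionary.get(keyword)
    if explanation is None:
        return None  # Nothing to say about an unregistered keyword.
    # Hypothetical English rendering of the "(Key) is (explanatory text)" pattern.
    return f"I hear that {keyword} is {explanation}."


if __name__ == "__main__":
    print(generate_information_sentence("Sirius", KEYWORD_DICTIONARY))
```

Returning None for an unregistered keyword is a design choice of this sketch; the text does not specify what happens when no explanation is found.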

The reaction acquisition unit 14 acquires a reaction type Cr, which indicates the user's preference, from the response sentence input I, which is the user's reaction to the uttered sentence output O produced by the utterance generation unit 13. It also measures a reaction time Tr, which is the time from the output of the uttered sentence output O until the response sentence input I is received.

The reaction type Cr may be, for example, one of three types: "yes", indicating a positive preference; "no", indicating a negative preference; and "no reaction", indicating indifference that is neither positive nor negative.
When the reaction time Tr cannot be measured because it exceeds a predetermined time Th (in which case Tr is set to 0), that is, when no response sentence input I is received within the time Th, the reaction acquisition unit 14 sets the reaction type Cr to "no reaction". When the reaction time Tr is within the time Th, that is, when a response sentence input I is received within the time Th, the reaction acquisition unit 14 performs two-class classification to determine whether the response sentence input I is a "yes" sentence or a "no" sentence.

One way to classify a sentence into two classes is, for example, to build a classifier from training data in advance; learning methods include those using a neural network and those using a support vector machine. The classification method is not limited to these.
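The timeout rule and the two-class decision can be sketched as below. The cue-word classifier is only a toy stand-in for a trained neural network or support vector machine, and the names `classify_yes_no`, `acquire_reaction`, and the cue lists are hypothetical.

```python
# Hypothetical cue words; a real device would use a classifier trained in advance.
NEGATIVE_CUES = ("no", "not", "don't", "dislike", "hate")


def classify_yes_no(response_text):
    """Toy two-class classifier: "no" if any negative cue word appears, else "yes"."""
    words = response_text.lower().replace(",", " ").replace(".", " ").split()
    if any(word in NEGATIVE_CUES for word in words):
        return "no"
    return "yes"


def acquire_reaction(response_text, elapsed_seconds, threshold_seconds,
                     classify=classify_yes_no):
    """Return (reaction type Cr, reaction time Tr).

    If no response arrived, or it arrived after the threshold Th,
    the reaction type is "no reaction" and Tr is set to 0.
    """
    if response_text is None or elapsed_seconds > threshold_seconds:
        return "no reaction", 0.0
    return classify(response_text), elapsed_seconds


if __name__ == "__main__":
    print(acquire_reaction("Yes, I like it", 1.5, 5.0))
    print(acquire_reaction(None, 0.0, 5.0))
```

Passing the classifier as a parameter keeps the timeout logic independent of whichever learning method is actually used.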

The preference calculation unit 15 calculates the degree of preference for the keyword Key as numerical data from the reaction type Cr and the reaction time Tr acquired by the reaction acquisition unit 14. The result is recorded in the keyword dictionary 17, for each keyword Key and utterance type C, as the preference level for an utterance of utterance type C that includes the keyword Key.

Here, let S_C_Key denote the preference level for an utterance of utterance type C that includes the keyword Key. The method of calculating the preference level is not limited, but the preference calculation unit 15 may, for example, apply a weight based on the reaction time Tr and obtain S_C_Key by the following formula.
S_C_Key = {α_C × n(Cr) - β_C × (1 - n(Cr))} / Tr (when Tr ≠ 0)
S_C_Key = 0 (when Tr = 0, that is, when Cr = "no reaction")
Note that n(Cr) is a binary function that is 1 when Cr is "yes" and 0 when Cr is "no", and α_C and β_C are constants determined in advance.
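The formula above translates directly into a small function. The sketch below assumes the binary form of n(Cr); the function name `preference_score` and the constant values used in the example call are hypothetical.

```python
def preference_score(reaction_type, reaction_time, alpha, beta):
    """Compute S_C_Key = {alpha * n - beta * (1 - n)} / Tr, or 0 when there is no reaction.

    reaction_type: "yes", "no", or "no reaction" (Cr)
    reaction_time: seconds until the response (Tr); 0 means no reaction
    alpha, beta:   per-utterance-type constants (alpha_C, beta_C)
    """
    if reaction_type == "no reaction" or reaction_time == 0:
        return 0.0
    n = 1 if reaction_type == "yes" else 0  # binary n(Cr)
    return (alpha * n - beta * (1 - n)) / reaction_time


if __name__ == "__main__":
    # Hypothetical constants alpha = beta = 1.0.
    print(preference_score("yes", 2.0, 1.0, 1.0))  # quick "yes" gives a positive score
    print(preference_score("no", 4.0, 1.0, 1.0))   # slow "no" gives a small negative score
```

Because the score is divided by Tr, a faster answer yields a score of larger magnitude, which is how the reaction time acts as a weight.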

α_C and β_C can be set for each utterance type C. For example, an utterance of the type "question" asks the user directly about the keyword, so a "yes" or "no" answer from the user is more reliable data than a reaction type estimated from the response to an utterance of another type. Therefore, by setting α_question and β_question to larger values than those for the other utterance types, the preference level for "question" utterances can be given a greater influence on the interest level than the preference levels for the other utterance types.

In this embodiment, n(Cr) is described as a binary function taking the value 0 or 1, but it is not limited to this; n(Cr) may be an integer or a real number. For example, if a value such as a confidence score can be obtained when the reaction acquisition unit 14 classifies the reaction type, that value can be used. Likewise, when sensor data, such as data from a heart rate monitor, is used as input to the reaction acquisition unit 14, that value can also be used.

FIG. 7 is a diagram showing an example of the preference levels recorded in the keyword dictionary 17 in this embodiment.
In this example, "ammonite", "Sanuki udon", and "Sirius" are recorded as keywords. For each keyword, the total of the preference levels calculated so far is recorded for each utterance type.
Each time a reaction to an utterance is obtained, the preference calculation unit 15 adds the calculated preference level to the preference level recorded in the keyword dictionary 17 for the corresponding keyword and utterance type, and increments the utterance count by 1.

For example, for the keyword "ammonite", a "question" utterance has been made once so far, with a preference level of +1.2, and a "disclosure" utterance has been made once, with a preference level of +0.2. Similarly, for the keyword "Sanuki udon", "disclosure" utterances have been made three times, with a total preference level of +0.3, and a "confirmation" utterance has been made once, with a preference level of -0.2.

When the reaction type for an utterance is "no reaction", the preference level added is 0 but the utterance is still counted, so the average preference level moves toward a neutral state that is neither positive (like) nor negative (dislike).
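The bookkeeping described here, a running total and an utterance count per keyword and utterance type, can be sketched as follows. The class name `PreferenceTable` and its methods are hypothetical; they only illustrate how a "no reaction" entry pulls the average toward zero.

```python
class PreferenceTable:
    """Running preference totals and utterance counts, keyed by (keyword, utterance type)."""

    def __init__(self):
        self._entries = {}  # (keyword, utterance_type) -> [total, count]

    def record(self, keyword, utterance_type, score):
        """Add one calculated preference level and count one utterance."""
        entry = self._entries.setdefault((keyword, utterance_type), [0.0, 0])
        entry[0] += score
        entry[1] += 1

    def average(self, keyword, utterance_type):
        """Average preference level, or None if no utterance has been made yet."""
        entry = self._entries.get((keyword, utterance_type))
        if entry is None:
            return None
        total, count = entry
        return total / count


if __name__ == "__main__":
    table = PreferenceTable()
    table.record("Sanuki udon", "disclosure", 0.3)
    table.record("Sanuki udon", "disclosure", 0.0)  # "no reaction": adds 0, still counted
    print(table.average("Sanuki udon", "disclosure"))
```

After the second call the total is unchanged but the count is 2, so the average is half of what it was after the first call.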

When a keyword is specified, the interest level calculation unit 16 calculates and outputs the user's interest level K for that keyword, using statistical information on the preference levels that were calculated from past reaction types and reaction times for utterances including the keyword.

Specifically, the interest level calculation unit 16 calculates, for example, the average of the preference levels recorded for the keyword in the keyword dictionary 17 as the interest level. In the example of FIG. 7, the interest level of the keyword "ammonite" is (1.2/1 + 0.2/1)/2 = 0.7, the interest level of the keyword "Sanuki udon" is (0.3/3 - 0.2/1)/2 = -0.05 (-0.1 when rounded to one decimal place), and the interest level of the keyword "Sirius" is (0.4/2 - 0.2/1)/2 = 0.
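The averaging in this example can be checked with a short sketch. It assumes the interest level is the mean, over utterance types, of each type's average preference level, as the worked expressions suggest; the function name `interest_level` and the list-of-pairs representation are hypothetical.

```python
def interest_level(entries):
    """Mean over utterance types of (total preference / utterance count).

    entries: list of (total_preference, utterance_count) pairs, one per utterance type.
    Types with no utterances are skipped; returns 0.0 if nothing has been recorded.
    """
    averages = [total / count for total, count in entries if count > 0]
    if not averages:
        return 0.0
    return sum(averages) / len(averages)


# (total, count) pairs taken from the FIG. 7 example in the text.
AMMONITE = [(1.2, 1), (0.2, 1)]
SANUKI_UDON = [(0.3, 3), (-0.2, 1)]
SIRIUS = [(0.4, 2), (-0.2, 1)]

if __name__ == "__main__":
    for name, entries in [("ammonite", AMMONITE),
                          ("Sanuki udon", SANUKI_UDON),
                          ("Sirius", SIRIUS)]:
        print(name, round(interest_level(entries), 3))
```

Evaluated exactly, the three expressions give 0.7, -0.05, and 0 respectively, up to floating-point rounding.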

According to this embodiment, in order to estimate how interested the user is in the keywords stored in the keyword dictionary 17, the keyword evaluation device 1 uses the utterance generation unit 13 to generate an uttered sentence output O for a keyword extracted by the keyword extraction unit 11. From the response sentence input I, which is the user's response to this uttered sentence output O, the keyword evaluation device 1 determines the reaction type with the reaction acquisition unit 14 and, based on this reaction type, calculates the interest level for the keyword with the interest level calculation unit 16.

When the keyword evaluation device 1 is incorporated into a robot, for example, the robot can address an utterance containing a keyword to a person, and the interest level for the keyword can be estimated from the reaction to that utterance. Whereas conventional methods estimate the interest level by passively collecting information from a person watching television, this embodiment uses an active method in which the robot speaks to the user, so the user's interest level for each keyword can be evaluated from the differences in the user's reactions to the individual keywords.
As a result, the robot can achieve natural communication, such as choosing to talk about things the user likes or dislikes, or changing its behavior according to the user's interest level in a keyword.

In addition, when the keyword evaluation device 1 is incorporated into a robot, for example, it extracts keywords contained in the keyword dictionary 17 from subtitle text or the like while the robot watches a broadcast program such as a television program together with the user, and causes the robot to make an utterance about each such keyword.
The keyword evaluation device 1 can thereby evaluate, from the user's reactions to the robot's utterances, the user's interest level for each keyword related to the broadcast program being viewed.

As a result, when a keyword is input, the robot can act differently depending on the user's interest level in that keyword. That is, the robot can decide whether it should make an utterance about a given keyword, select what action to take, and present information of interest, such as related programs, to the user at an appropriate time.
The content uttered about a keyword with a high interest level and the content uttered in order to evaluate the interest level may be the same. In that case, the various templates described above are shared. In either case, the preference level and the interest level may be updated according to the user's response.

By acquiring, as the reaction type, one of a plurality of types including positive and negative, the keyword evaluation device 1 can quantify and evaluate the user's interest level in a keyword on a consistent scale.

The keyword evaluation device 1 measures the user's reaction time to an utterance and, when the reaction time exceeds a predetermined time, acquires "no reaction" as the reaction type.
The keyword evaluation device 1 can thereby identify the kind of preference in which the user is indifferent to the keyword, and can evaluate the interest level appropriately.

The keyword evaluation device 1 uses the utterance type selection unit 12 to select the type of the utterance from a predetermined number of utterance types with predetermined probabilities.
The interest level could be evaluated using, for example, only "question" utterances. However, if the robot asks nothing but questions, the user may respond at first but will eventually stop responding, for instance because the questions become annoying, so variation in utterance types is necessary. Document A reports that conversations between people often begin with a "disclosure" sentence, and that "question", "information", and "confirmation" sentences are uttered less often than "disclosure" sentences but draw a higher response rate from the other party.
Therefore, by also selecting the type of the robot's utterances to a person with probabilities close to those of conversation between people, the keyword evaluation device 1 can have the robot speak with a mix of utterance types similar to a person's and achieve natural dialogue, without boring the user as would, for example, a robot that only asks questions or a robot that only reads out information.
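Selecting an utterance type with predetermined probabilities amounts to weighted random sampling. In the sketch below the probability values are placeholders chosen only so that "disclosure" is the most frequent type; they are not taken from Document A or from the embodiment, and `select_utterance_type` is a hypothetical name.

```python
import random

# Hypothetical probabilities; the text only says they are predetermined
# and should resemble the mix found in conversation between people.
UTTERANCE_TYPE_PROBABILITIES = {
    "disclosure": 0.4,
    "question": 0.2,
    "information": 0.2,
    "confirmation": 0.2,
}


def select_utterance_type(probabilities, rng=random):
    """Draw one utterance type, weighted by its predetermined probability."""
    types = list(probabilities)
    weights = [probabilities[t] for t in types]
    return rng.choices(types, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demonstration repeatable
    draws = [select_utterance_type(UTTERANCE_TYPE_PROBABILITIES, rng) for _ in range(1000)]
    for utterance_type in UTTERANCE_TYPE_PROBABILITIES:
        print(utterance_type, draws.count(utterance_type))
```

Over many draws the observed frequencies approach the configured probabilities, so no single utterance type dominates the dialogue unless its probability is set that way.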

The keyword evaluation device 1 uses the preference calculation unit 15 to calculate, for each utterance type, the preference level for an utterance including the keyword based on the reaction type, and updates the keyword dictionary 17.
For example, when the utterance type "question" is selected for the keyword "Ms. A", the keyword evaluation device 1 asks the user "Do you like Ms. A?" and estimates the preference level for the "question" sentence about "Ms. A" from the user's response, such as "I like her", "I don't like her much", or "I dislike her".
Because the keyword evaluation device 1 calculates the interest level from statistical information on the preference levels estimated for each utterance type, it can reflect in the interest level the reliability of the user's responses to each type of utterance, for example by giving larger values to the preference level for "question" sentences.

By weighting the preference level based on the reaction time to the utterance, the keyword evaluation device 1 reflects the strength of the user's like or dislike of the keyword in the preference level, and can evaluate the interest level appropriately.

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments. Furthermore, the effects described in the embodiments are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

In the embodiment described above, the template for the representative word selected for the keyword is used when generating an utterance, but any method that generates an utterance about the keyword may be used. Also, although the vector distance between feature vectors is used to select the representative word, candidate representative words may instead be determined in advance for each keyword and one selected from among them. Alternatively, without using representative words at all, an individual utterance may be prepared in advance for each keyword registered in the keyword dictionary 17 and used directly.

Also, although a method using the keyword dictionary 18C has been described for generating "information" sentences, the method is not limited to this. For example, a method of connecting to the Internet and obtaining the explanatory text for a keyword from an online encyclopedia such as Wikipedia is also applicable.

In the embodiment described above, the response sentence input I, obtained by the speech recognition device 3 converting the voice input IA into text data, is used to determine the user's reaction type. However, the method is not limited to this; any method may be used that can determine "yes" or "no" as the reaction, that is, whether the user is positive or negative toward the uttered sentence output O. For example, an image of the user's face captured by a camera may be used as input, or the user may wear a device that acquires physiological data, such as an electrocardiograph, a heart rate monitor, or an electroencephalograph.
When a plurality of people are using the device at the same time, the keyword evaluation device 1 may acquire the reaction type from a reaction such as whether the conversation among those people became lively in response to the utterance.

The embodiment described above assumes a single user, but the invention is not limited to this. By identifying each of a plurality of users and managing the preference levels in the keyword dictionary 17 separately for each user, an interest level can be obtained for each user.

In the embodiment described above, the keyword evaluation device 1 is described as being incorporated into the robot, but it is not limited to this; it may be located outside the robot and communicatively connected to the robot by wire, wirelessly, or via a network.
Also, although the various databases are described as being included in the keyword evaluation device 1, they are not limited to this and may be located on an external server such as a cloud server.

This embodiment has mainly described the configuration and operation of the keyword evaluation device 1, but the present invention is not limited to this and may be configured as a method or a program that includes the respective components and evaluates keywords.

Furthermore, the functions of the keyword evaluation device 1 may be realized by recording a program for realizing them on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium.

The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into the computer system.

The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

1 Keyword evaluation device
2 Speech synthesis device
3 Speech recognition device
11 Keyword extraction unit
12 Utterance type selection unit
13 Utterance generation unit
14 Reaction acquisition unit
15 Preference calculation unit
16 Interest level calculation unit
17 Keyword dictionary
18A Template database
18B Feature vector database
18C Keyword dictionary
131 Template extraction unit
132 Vector distance calculation unit
133 Template selection unit
134 Keyword search unit
135 Information sentence generation unit

Claims

1. A keyword evaluation device comprising:
an utterance generation unit that receives as input a keyword extracted from at least one of subtitles, audio, and images included in a broadcast program being viewed, generates an utterance including the keyword, and outputs the utterance to a user;
a reaction acquisition unit that acquires a reaction type of the user to the utterance; and
an interest level calculation unit that calculates an interest level of the user in the keyword based on the reaction type.

2. The keyword evaluation device according to claim 1, wherein the reaction acquisition unit acquires, as the reaction type, one of a plurality of types including positive and negative.

3. The keyword evaluation device according to claim 2, wherein the reaction acquisition unit measures a reaction time of the user to the utterance and, when the reaction time exceeds a predetermined time, acquires a no-reaction type as the reaction type.

4. The keyword evaluation device according to any one of claims 1 to 3, further comprising an utterance type selection unit that selects the type of the utterance from a predetermined number of utterance types with a predetermined probability.

5. The keyword evaluation device according to claim 4, further comprising a preference calculation unit that calculates, for each utterance type, a preference level for an utterance including the keyword based on the reaction type,
wherein the interest level calculation unit calculates the interest level based on statistical information on the preference level.

6. The keyword evaluation device according to claim 5, wherein the reaction acquisition unit measures a reaction time of the user to the utterance, and
the preference calculation unit weights the preference level based on the reaction time.

7. The keyword evaluation device according to any one of claims 1 to 6, further comprising a keyword extraction unit that extracts, from a broadcast program, the keyword included in a predetermined database.

8. A keyword evaluation method in which a computer executes:
an utterance generation step of receiving as input a keyword extracted from at least one of subtitles, audio, and images included in a broadcast program being viewed, generating an utterance including the keyword, and outputting the utterance to a user;
a reaction acquisition step of acquiring a reaction type of the user to the utterance; and
an interest level calculation step of calculating an interest level of the user in the keyword based on the reaction type.

9. A keyword evaluation program for causing a computer to function as the keyword evaluation device according to any one of claims 1 to 7.