JP7058588B2 - Conversation system and conversation program

Info

Publication number: JP7058588B2
Application number: JP2018211056A
Authority: JP
Inventors: 國彦加藤; 貴子吉村; 貴信海上; 忠向井
Original assignee: Tomy Co Ltd
Current assignee: Tomy Co Ltd
Priority date: 2018-11-09
Filing date: 2018-11-09
Publication date: 2022-04-22
Anticipated expiration: 2038-11-09
Also published as: JP2020077272A

Description

The present invention relates to a conversation system and a conversation program, and in particular to utterances related to an image shared with a user.

Techniques for extracting, as keywords, items related to a subject appearing in an image are known. For example, Patent Document 1 discloses an information processing device that acquires the names and descriptions of things included as subjects in an image such as a photograph. In this device, information related to a given image is acquired as image-related information, and keywords are generated from it. Patent Document 2 discloses an information retrieval device that extracts, from an image, information related to that image. In this device, feature data is first extracted from the image data to be searched. A database is then searched using this feature data as the search key. The database stores image feature data together with the keywords associated with it. Feature data identical or similar to the search key is thereby identified, and the keywords associated with it are extracted.

Techniques are also known for generating a response sentence to a user's utterance by referring to keywords related to a subject appearing in an image captured by a camera. For example, Patent Document 3 discloses an information processing device that generates a response sentence by referring to information related to the object the user is looking at. Specifically, the user's utterance is first acquired by a microphone, and the object the robot is looking at is imaged by a CCD camera serving as its "eye". Next, dictionary information for words related to the imaged object is consulted, and speech recognition and language analysis are performed. Then, from the information stored in a knowledge database, the information related to the imaged object is consulted, and a response sentence corresponding to the language analysis result (the result of semantic understanding) is generated. A response sentence to the user's utterance can thereby be generated accurately and quickly.

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-101450
Patent Document 2: Japanese Unexamined Patent Publication No. 2002-297648
Patent Document 3: Japanese Unexamined Patent Publication No. 2001-188780

Conventional conversation systems respond passively when the user asks something, but almost none of them speak spontaneously without a prompt from the user, in other words, actively propose topics from the system side. As a result, when the user says little, the conversation tends to stall, and a conversational flow is hard to establish. This is no different when a response sentence to the user's utterance is generated using keywords acquired from a camera image. Note that Patent Document 3 uses keywords acquired from a camera image to improve the accuracy and responsiveness of the response sentence, not to give continuity to the flow of the conversation.

The present invention has been made in view of these circumstances. Its object is to give continuity to the flow of conversation in a conversation system that converses using keywords related to a subject, where the keywords are acquired from an image shared with the user.

To solve this problem, the first invention provides a conversation system that has a keyword generation unit and a conversation processing unit and that converses with a user. The keyword generation unit takes as input a shared image shared with the user and generates a plurality of keywords related to the subject appearing in the shared image. The conversation processing unit selects a first keyword from the plurality of keywords generated by the keyword generation unit and spontaneously utters a conversational sentence using the first keyword. Further, when it is determined that the user's response has been interrupted in the conversation started by the utterance using the first keyword, the conversation processing unit selects, from the plurality of keywords generated by the keyword generation unit, a second keyword different from the first keyword and spontaneously utters a conversational sentence using the second keyword.

In the first invention, a keyword management table that manages the usage state of keywords in the conversation with the user may further be provided. In this case, the keyword generation unit newly registers the generated keywords in the keyword management table. The conversation processing unit updates the usage state of the first keyword in the keyword management table when the first keyword is selected, and updates the usage state of the second keyword in the keyword management table when the second keyword is selected.

In the first invention, the conversation processing unit may determine that the user's response has been interrupted when at least one of the following holds: the user has given no response for a predetermined time, or the content of the user's response satisfies a predetermined condition. The conversation processing unit may also utter a first conversational sentence by inserting the first keyword into a first conversational sentence template selected according to the attribute of the first keyword, and utter a second conversational sentence by inserting the second keyword into a second conversational sentence template selected according to the attribute of the second keyword.

The second invention provides a conversation program that converses with a user by causing a computer to execute processing having first to fourth steps. In the first step, a shared image shared with the user is taken as input, and a plurality of keywords related to the subject appearing in the shared image are generated. In the second step, a first keyword is selected from the plurality of keywords, and a conversational sentence using the first keyword is spontaneously uttered. In the third step, it is determined whether the user's response has been interrupted in the conversation started by the utterance using the first keyword. In the fourth step, when it is determined that the user's response has been interrupted, a second keyword different from the first keyword is selected from the plurality of keywords, and a conversational sentence using the second keyword is spontaneously uttered.

In the second invention, a fifth step may further be provided in which the plurality of keywords are newly registered in a keyword management table that manages the usage state of keywords in the conversation with the user. In this case, the second step includes a step of updating the usage state of the first keyword in the keyword management table, and the fourth step includes a step of updating the usage state of the second keyword in the keyword management table.

In the second invention, the third step may determine that the user's response has been interrupted when at least one of the following holds: the user has given no response for a predetermined time, or the content of the user's response satisfies a predetermined condition. The second step may include a step of uttering a first conversational sentence by inserting the first keyword into a first conversational sentence template selected according to the attribute of the first keyword, and the fourth step may include a step of uttering a second conversational sentence by inserting the second keyword into a second conversational sentence template selected according to the attribute of the second keyword.

According to the present invention, the system speaks spontaneously and continuously by selectively using a plurality of keywords related to the shared image. When the user's response is interrupted in a conversation started by an utterance using the first keyword, the system spontaneously makes an utterance using a second keyword different from the first, presenting the user with a new topic. Because these keywords relate to a shared image that the user recognizes and shares, presenting a new topic within that scope does not strike the user as abrupt. The conversation can thereby be given continuity in a natural flow.

In particular, if the usage state of keywords in the conversation with the user is managed in a keyword management table, the system can avoid repeating a topic about the same keyword immediately after that topic has been interrupted, that is, it can avoid duplicated topics.

FIG. 1: Block configuration diagram of the conversation system
FIG. 2: Explanatory diagram of an example keyword management table
FIG. 3: Flowchart showing the procedure of conversation processing
FIG. 4: Diagram showing an example of keywords generated from a shared image
FIG. 5: Diagram showing an example of initial registration in the keyword management table
FIG. 6: Diagram showing an example of the flow of a conversation with the user
FIG. 7: Explanatory diagram of the keyword management table according to the first modification
FIG. 8: Explanatory diagram of the keyword management table according to the second modification

FIG. 1 is a block configuration diagram of the conversation system according to the present embodiment. The conversation system 1 is installed in an interactive robot toy, a smartphone with a dialogue application installed, a smart speaker, a personal computer (PC), or the like, and converses with the user who is its conversation partner. In this conversation, the conversation system 1 responds to the user's questions as they arise and also spontaneously and actively presents the user with topics related to a "shared image". In this specification, a "shared image" is an image that is shared with the user who is the conversation partner and that the user is aware of. Examples include an image of the robot's surroundings captured by a camera mounted as the eyes of a toy such as an interactive robot, an image captured by a smartphone's built-in camera, and an image shown on the display of a smartphone or personal computer (PC). Every subject appearing in these images is something the user is aware of, and can therefore serve as a topic shared with the user.

The conversation system 1 outputs utterances based on the user's utterances collected by the microphone 2 and on the content of a specific shared image. The user's utterance need not be acquired only as voice information collected by the microphone 2; for example, text entered with a keyboard or the like, as with a LINE bot, may be acquired as the user's utterance. As a means of acquiring the shared image, a camera that captures an image in response to a user operation can be used, for example. Alternatively, the shared image may be an image selected by the user from images stored in advance in the conversation system 1, as when a toy displays a pre-stored image (image file), or an image that the user is currently viewing.

In addition to the microphone 2 that collects the user's utterances, the conversation system 1 consists mainly of a keyword generation unit 3 and a conversation processing unit 4. The conversation system 1 also includes, as storage units for the necessary information, a keyword management table 5, an attribute table 6, a conversational sentence template storage unit 7, and the like.

When conversing with the user, the keyword generation unit 3 takes a shared image as input and generates and outputs, as keywords (text), items related to the subject appearing in the shared image. For example, the keyword "Mt. Fuji" is obtained from a shared image whose subject is Mt. Fuji. Keyword extraction techniques are themselves known, and any of them can be used. For example, the techniques described in Patent Document 1 or Patent Document 2 may be used, or a deep-learning object detection algorithm such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector) may be used. The keywords generated by the keyword generation unit 3 are output to the conversation processing unit 4 and newly registered in the keyword management table 5.

The keyword management table 5 manages the usage state of keywords in the conversation with the user. FIG. 2 is an explanatory diagram of an example of the keyword management table 5. This keyword management table 5 manages, as a status, the usage state in the conversation with the user for each of a plurality of keywords A to D. There are two statuses: "unused", indicating that the keyword has not yet been used in the conversation, and "used", indicating that it has already been used. When keywords A to D are newly registered in the keyword management table 5, their statuses are all set to "unused".
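
The keyword management table described above can be sketched as a small mapping from keyword to status. This is only an illustration: the class name `KeywordManagementTable` and its methods are hypothetical and do not appear in the specification, which defines only the two statuses and the rule that new entries start as "unused".

```python
from enum import Enum


class Status(Enum):
    UNUSED = "unused"  # not yet used in the conversation
    USED = "used"      # already used in the conversation


class KeywordManagementTable:
    """Hypothetical in-memory stand-in for keyword management table 5."""

    def __init__(self):
        self._status = {}  # keyword -> Status, in registration order

    def register(self, keywords):
        # New registrations always start as "unused".
        for keyword in keywords:
            self._status[keyword] = Status.UNUSED

    def mark_used(self, keyword):
        self._status[keyword] = Status.USED

    def unused(self):
        return [k for k, s in self._status.items() if s is Status.UNUSED]


table = KeywordManagementTable()
table.register(["A", "B", "C", "D"])
table.mark_used("A")
remaining = table.unused()
print(remaining)
```

After keyword A has been used once, only B, C, and D remain as candidates for the next spontaneous utterance.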

The conversation processing unit 4 includes a text-to-speech (TTS) unit 4a that reads text aloud. Like existing smart speakers, it analyzes the user's voice acquired by the microphone 2 and utters a sentence (response sentence) in reply to its content. In addition to passive utterances made in response to the user's utterances, the conversation processing unit 4 also makes spontaneous utterances that actively present new topics. A spontaneous utterance is a system-driven, push-type utterance and, unlike a passive utterance (response), does not depend on the user's utterance. Various topics are thereby presented to the user within the scope of matters related to the shared image shared with the user.

Spontaneous utterances are made by referring to the keyword management table 5. Specifically, from the keywords A to D generated by the keyword generation unit 3, one whose status in the keyword management table 5 is "unused" (for example, keyword A) is first selected. When keyword A is selected, the keyword management table 5 is updated and the status of keyword A is changed from "unused" to "used". Next, the attribute of keyword A is identified by referring to the attribute table 6. Then, from the many conversational sentence templates stored in the conversational sentence template storage unit 7, a template corresponding to the attribute of keyword A is selected. If several templates correspond to the attribute of keyword A, one of them is selected either according to a predetermined selection rule or at random. Finally, keyword A is inserted into the blank in the selected template to generate a conversational sentence about keyword A, which is uttered by the text-to-speech unit 4a.

The conversation processing unit 4 also determines whether the user's response has been interrupted in the conversation started by the utterance using keyword A. When it is determined that the user's response has been interrupted, the user is presented with a new topic that uses a keyword other than the most recently used keyword A, that is, one of keywords B to D. Specifically, from the keywords A to D generated by the keyword generation unit 3, one whose status in the keyword management table 5 is "unused" at that time (for example, keyword B) is first selected. When keyword B is selected, the keyword management table 5 is updated and the status of keyword B is changed from "unused" to "used". Next, the attribute of keyword B is identified by referring to the attribute table 6. Then, from the many conversational sentence templates stored in the conversational sentence template storage unit 7, one of the templates corresponding to the attribute of keyword B is selected. Finally, keyword B is inserted into the blank in the selected template to generate a conversational sentence about keyword B, which is uttered by the text-to-speech unit 4a.

FIG. 3 is a flowchart showing the procedure of conversation processing. The conversation system shown in FIG. 1 can be realized by causing a computer (microcomputer) to execute a pre-installed computer program (including an app). The following description takes as an example the case shown in FIG. 4, in which a smartphone with a conversation app installed uses, as the shared image, an image of Mt. Fuji scenery captured with its built-in camera.

First, in step 1, a shared image is input to the conversation system 1. The shared image may be designated in response to an explicit instruction from the user, or automatically by the system.

Next, in step 2, items related to the subject appearing in the shared image input in step 1 are generated and output as a plurality of keywords. For example, as shown in FIG. 4, seven keywords are acquired from the shared image of Mt. Fuji: "Mt. Fuji", "mountain", "snow", "sky", "blue", "white", and "volcano". Depending on the keyword extraction algorithm, a confidence (likelihood) is output along with each keyword.

In step 3, the keywords acquired in step 2 are newly registered in the keyword management table 5. For example, as shown in FIG. 5, the seven keywords "Mt. Fuji", "mountain", "snow", "sky", "blue", "white", and "volcano" are registered in the keyword management table 5 with their status set to "unused".

In step 4, one keyword whose status is "unused" is selected from the keywords generated in step 2. The selection may be random, or it may follow a predetermined selection rule, for example in descending order of confidence. The keyword management table 5 is then updated, and the status of the selected keyword is changed from "unused" to "used". For example, if "Mt. Fuji" is selected from the seven keywords above, the status of "Mt. Fuji" is changed from "unused" to "used".
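
Step 4 can be sketched as follows, covering both the random selection and the confidence-ordered rule mentioned above. The `Entry` type, the function name `select_keyword`, and the confidence values are hypothetical; the specification does not give concrete confidence values or a data layout.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    keyword: str
    confidence: float  # likelihood reported by the extractor (values below are made up)
    used: bool = False


def select_keyword(entries, by_confidence=True, rng=random):
    """Pick one unused keyword and flip its status to used (step 4)."""
    candidates = [e for e in entries if not e.used]
    if not candidates:
        return None  # assumption: signal exhaustion to the caller
    if by_confidence:
        chosen = max(candidates, key=lambda e: e.confidence)
    else:
        chosen = rng.choice(candidates)
    chosen.used = True
    return chosen.keyword


entries = [Entry("Mt. Fuji", 0.95), Entry("mountain", 0.90), Entry("volcano", 0.60)]
first = select_keyword(entries)
second = select_keyword(entries)
print(first, "->", second)
```

Because the status is updated at selection time, a second call never returns the keyword chosen by the first call.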

In step 5, a spontaneous conversational sentence is uttered using the keyword selected in step 4 (spontaneous utterance). Taking the case where "Mt. Fuji" is selected as the keyword, the attribute table 6 is consulted and "mountain" is identified as the attribute of "Mt. Fuji". Next, the template "○○ is beautiful, isn't it?" is selected from the conversational sentence template storage unit 7 as the template corresponding to the attribute "mountain". The keyword "Mt. Fuji" is then inserted at "○○" in the selected template, the sentence "Mt. Fuji is beautiful, isn't it?" is uttered, and the topic of "Mt. Fuji" is raised with the user.
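
The attribute lookup and template filling of step 5 can be sketched as below. The dictionaries stand in for attribute table 6 and conversational sentence template storage unit 7; their names, the `{keyword}` placeholder (used here in place of "○○"), and the English wording of the template are assumptions made for illustration.

```python
import random

# Hypothetical stand-in for attribute table 6: keyword -> attribute.
ATTRIBUTE_TABLE = {"Mt. Fuji": "mountain"}

# Hypothetical stand-in for template storage unit 7: attribute -> templates.
TEMPLATES = {"mountain": ["{keyword} is beautiful, isn't it?"]}


def build_spontaneous_utterance(keyword, rng=random):
    """Look up the keyword's attribute, pick a template, fill the blank."""
    attribute = ATTRIBUTE_TABLE[keyword]
    # With several templates per attribute, one is picked at random here;
    # a fixed selection rule would work equally well.
    template = rng.choice(TEMPLATES[attribute])
    return template.format(keyword=keyword)


sentence = build_spontaneous_utterance("Mt. Fuji")
print(sentence)
```

In a full system the resulting text would be handed to the text-to-speech unit rather than printed.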

In step 6, it is determined whether the conversation started by the spontaneous utterance of step 5 has been interrupted. The determination condition can be set arbitrarily. For example, the conversation may be determined to be interrupted when the user has given no response for a predetermined time, because a lack of response can be taken to mean that the user is not interested in the topic. The conversation may also be determined to be interrupted when the user did respond but the content of the response satisfies a predetermined condition, for example when it is a negative response or a passive (half-hearted) affirmation. The intonation of the user's voice may also be taken into account.
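
One possible form of the step 6 check is sketched below. The specification leaves the condition open, so the timeout value, the phrase lists, and the function name `is_interrupted` are all assumptions; voice intonation is not modeled.

```python
from typing import Optional

# All values below are illustrative assumptions, not taken from the specification.
SILENCE_TIMEOUT_S = 10.0
NEGATIVE_RESPONSES = {"i'm not scared", "no", "not really"}
PASSIVE_AFFIRMATIONS = {"that's right", "sounds good", "yeah"}


def is_interrupted(response: Optional[str], silence_s: float) -> bool:
    """Return True when the current topic should be considered interrupted.

    response  -- recognized user text, or None if the user said nothing
    silence_s -- seconds elapsed since the system's last utterance
    """
    if response is None:
        # Condition 1: no response for a predetermined time.
        return silence_s >= SILENCE_TIMEOUT_S
    # Condition 2: the response content matches a predetermined pattern.
    normalized = response.strip().lower().rstrip(".!?")
    return normalized in NEGATIVE_RESPONSES or normalized in PASSIVE_AFFIRMATIONS


print(is_interrupted(None, 12.0))
print(is_interrupted("How tall is it?", 0.0))
```

Either condition alone is sufficient, matching the "at least one of" wording used for the interruption determination.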

If the result of step 6 is negative, that is, if the user has responded and the conversation is determined not to be interrupted, the process proceeds to step 7, where a passive conversational sentence is uttered in reply to the user's response (passive utterance). This passive utterance is repeated until the conversation is determined to be interrupted (steps 6 and 7). The conversation with the user that began with the system's spontaneous utterance "Mt. Fuji is beautiful, isn't it?" thereby continues.

If, on the other hand, the result of step 6 is affirmative, that is, if the conversation is determined to be interrupted, the process returns to step 4. A new keyword whose status is "unused" is then selected (step 4), and a spontaneous conversational sentence using this keyword is uttered (step 5). The conversation thereby moves from the topic of "Mt. Fuji" to another topic (for example, "volcano").

This series of exchanges with the user continues until a separate routine determines that the conversation has ended.
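
The loop formed by steps 3 to 7 can be sketched as follows, with the user simulated by a scripted list of responses. The function name `run_conversation`, the callback parameters, and the stopping rule (stop once no unused keyword remains) are assumptions; the specification leaves the end of the conversation to a separate routine, and silence is treated here as an immediate interruption because timing is not modeled.

```python
from typing import Callable, Iterable, List, Optional


def run_conversation(
    keywords: List[str],
    user_responses: Iterable[Optional[str]],
    make_opening: Callable[[str], str],
    make_reply: Callable[[str], str],
    is_interrupted: Callable[[str], bool],
) -> List[str]:
    """Return everything the system says, in order."""
    status = {kw: "unused" for kw in keywords}  # step 3: register as unused
    responses = iter(user_responses)
    spoken = []
    while True:
        unused = [kw for kw, s in status.items() if s == "unused"]
        if not unused:
            break  # assumed stopping rule, see lead-in
        keyword = unused[0]                      # step 4: select a keyword
        status[keyword] = "used"                 #         and update its status
        spoken.append(make_opening(keyword))     # step 5: spontaneous utterance
        while True:
            response = next(responses, None)     # None models "no response"
            if response is None or is_interrupted(response):  # step 6
                break
            spoken.append(make_reply(response))  # step 7: passive utterance
    return spoken


spoken = run_conversation(
    keywords=["Mt. Fuji", "volcano"],
    user_responses=["How tall is it?", "That's right", "I'm not scared"],
    make_opening=lambda kw: f"Let's talk about {kw}.",
    make_reply=lambda r: f"(reply to: {r})",
    is_interrupted=lambda r: r in {"That's right", "I'm not scared"},
)
print(spoken)
```

The inner loop corresponds to the repetition of steps 6 and 7, and leaving it corresponds to the return to step 4.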

FIG. 6 is a diagram showing an example of the flow of a conversation with the user. First, as a spontaneous utterance by the conversation system 1, "Mt. Fuji is beautiful, isn't it?" is uttered with "Mt. Fuji" as the keyword. Starting from this, the user's responses and the passive utterances of the conversation system 1 alternate. Later, when the user responds "That's right" (a passive affirmation), the interruption condition is satisfied and the topic of "Mt. Fuji" ends.

When this topic ends, the conversation system 1 spontaneously utters "But volcanoes are scary, aren't they?", using the keyword "volcano", which differs from "Mt. Fuji". Later, when the user responds "I'm not scared" (a negative response), the interruption condition is satisfied and the topic of "volcano" ends.

When this topic ends, the conversation system 1 spontaneously utters "Then let's go climb a mountain", using the keyword "mountain", which differs from "volcano". Later, when the user responds "Sounds good" (a passive affirmation), the interruption condition is satisfied and the topic of "mountain" ends.

When this topic ends, the conversation system 1 spontaneously utters "A day without snow would be nice", using the keyword "snow", which differs from "mountain", and the exchange with the user continues from there. In a situation where the captured image of Mt. Fuji is shared with the user, the flow of the conversation is not unnatural even though the topic moves in the order "Mt. Fuji", "volcano", "mountain", "snow", and the user does not find it abrupt.

Thus, according to the present embodiment, the system utters spontaneously and continuously by selectively using a plurality of keywords related to the shared image. If the user's response is interrupted in a conversation that began with an utterance using one keyword, the system spontaneously makes an utterance using a different keyword, presenting the user with a new topic. Because these keywords all relate to the image shared with the user, a new topic presented within that range does not strike the user as abrupt. The conversation can therefore be given continuity in a natural flow.
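The topic-switching behavior summarized above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. The function names, the phrase sets, the 10-second silence limit, and the scripted-response format are all assumptions introduced for illustration; the source states only that the interruption condition can be met by prolonged non-response or by response content such as a passive affirmation or a negative response.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical phrase sets and timeout. The source names the response
# categories but does not enumerate phrases or give a time value.
PASSIVE_AFFIRMATIONS = {"that's right", "sounds good"}
NEGATIVE_RESPONSES = {"they're not scary"}
SILENCE_LIMIT_SECONDS = 10.0


def is_interrupted(response: Optional[str], silence_seconds: float) -> bool:
    """Return True when the current topic should be treated as ended."""
    if response is None:
        # No response yet: interrupted once the silence limit is reached.
        return silence_seconds >= SILENCE_LIMIT_SECONDS
    text = response.strip().lower()
    return text in PASSIVE_AFFIRMATIONS or text in NEGATIVE_RESPONSES


def run_conversation(
    keywords: List[str],
    scripted: Iterable[Tuple[Optional[str], float]],
) -> List[str]:
    """Return the keywords that became topics, in order.

    `scripted` is a sequence of (user response, seconds of silence)
    pairs that stands in for live speech input.
    """
    topics: List[str] = []
    remaining = iter(scripted)
    for keyword in keywords:
        topics.append(keyword)  # spontaneous utterance opens the topic
        for response, silence in remaining:
            if is_interrupted(response, silence):
                break  # topic ends; move on to a different keyword
        else:
            break  # script exhausted while the topic is still open
    return topics
```

Fed the four keywords of FIG. 6 and a script in which the user eventually gives a passive affirmation or a negative response to each topic, `run_conversation` returns the keywords in the order "Mt. Fuji", "volcano", "mountain", "snow".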

In addition, according to the present embodiment, keyword management table 5 is used to manage the usage state of keywords in the conversation with the user. This avoids situations such as the topic of "Mt. Fuji" being raised again immediately after the topic of "Mt. Fuji" has ended. The conversation can therefore be given continuity without making the user feel that something is off.
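As a rough illustration of how a table of usage states prevents a topic from repeating, the following Python sketch models keyword management table 5 as a dictionary from keyword to status. The names `register_keywords` and `select_next_keyword`, and the policy of picking the first unused keyword in registration order, are assumptions; the source specifies only that the usage state is registered and updated.

```python
from typing import Dict, Iterable, Optional

UNUSED = "unused"
USED = "used"


def register_keywords(table: Dict[str, str], keywords: Iterable[str]) -> None:
    """Newly register generated keywords with the status 'unused'."""
    for keyword in keywords:
        # setdefault keeps the status of a keyword that is already registered.
        table.setdefault(keyword, UNUSED)


def select_next_keyword(table: Dict[str, str]) -> Optional[str]:
    """Pick an unused keyword and mark it as used.

    Assumption: the first unused keyword in registration order is chosen.
    Returns None when every keyword has already been used.
    """
    for keyword, status in table.items():
        if status == UNUSED:
            table[keyword] = USED
            return keyword
    return None
```

Because a keyword is marked as used at the moment it is selected, a later call cannot return it again, so "Mt. Fuji" cannot be chosen directly after the "Mt. Fuji" topic ends.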

In the embodiment described above, the status in keyword management table 5 records whether each keyword has been used (unused / used). Alternatively, as shown in FIG. 7, the number of times each keyword has been used (0 times, 1 time, 2 times, and so on) may be managed. As shown in FIG. 8, it is also possible to manage, by means of a flag, only which keyword is the current theme. For example, if the user's response is interrupted in a conversation that began with a spontaneous utterance using keyword A, the status (usage state) in FIG. 8 is consulted and a keyword other than the current keyword A, that is, one of keywords B to D, is selected for the next spontaneous utterance. As in the embodiment described above, this prevents the immediately preceding topic from being repeated when moving to a new topic.
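The two alternative status schemes can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the source says only that usage counts may be managed (FIG. 7) and that a current-theme flag may be managed (FIG. 8). Choosing the least-used keyword in the first function and choosing the first non-current keyword in the second are hypothetical selection policies, as are the function names.

```python
from typing import Dict, Optional


def select_least_used(counts: Dict[str, int]) -> Optional[str]:
    """FIG. 7 style: the table stores how many times each keyword was used.

    Assumption: the keyword with the smallest count is chosen; ties go to
    the keyword registered first.
    """
    if not counts:
        return None
    keyword = min(counts, key=counts.get)
    counts[keyword] += 1
    return keyword


def select_other_than_current(flags: Dict[str, bool]) -> Optional[str]:
    """FIG. 8 style: a single flag marks the keyword that is the current theme.

    Any keyword other than the flagged one is eligible. Assumption: the
    first eligible keyword in registration order is chosen.
    """
    candidates = [keyword for keyword, is_current in flags.items() if not is_current]
    if not candidates:
        return None
    chosen = candidates[0]
    for keyword in flags:
        flags[keyword] = keyword == chosen  # move the flag to the new theme
    return chosen
```

With flags of `{"A": True, "B": False, "C": False, "D": False}`, the second function returns one of B to D and moves the flag to it, so keyword A is never repeated immediately, although it becomes eligible again later.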

1 Conversation system
2 Microphone
3 Keyword generation unit
4 Conversation processing unit
4a Text-to-speech unit
5 Keyword management table
6 Attribute table
7 Conversation sentence template storage unit

Claims

A conversation system that converses with a user, comprising:
a keyword generation unit that takes as input a shared image shared with the user and generates a plurality of keywords related to a subject shown in the shared image; and
a conversation processing unit that selects a first keyword from the plurality of keywords generated by the keyword generation unit and spontaneously utters a conversation sentence using the first keyword, and that, when it determines that the user's response has been interrupted in the conversation started by the utterance of the conversation sentence using the first keyword, selects a second keyword different from the first keyword from the plurality of keywords generated by the keyword generation unit and spontaneously utters a conversation sentence using the second keyword.

The conversation system according to claim 1, further comprising a keyword management table that manages the usage state of keywords in the conversation with the user, wherein
the keyword generation unit newly registers the generated plurality of keywords in the keyword management table, and
the conversation processing unit updates the usage state of the first keyword in the keyword management table in response to the selection of the first keyword, and updates the usage state of the second keyword in the keyword management table in response to the selection of the second keyword.

The conversation system according to claim 1 or 2, wherein the conversation processing unit determines that the user's response has been interrupted when at least one of the following is satisfied: the user has remained unresponsive for a predetermined time, or the content of the user's response satisfies a predetermined condition.

The conversation system according to any one of claims 1 to 3, wherein the conversation processing unit utters a first conversation sentence by inserting the first keyword into a first conversation sentence template selected according to an attribute of the first keyword, and utters a second conversation sentence by inserting the second keyword into a second conversation sentence template selected according to an attribute of the second keyword.

A conversation program for conversing with a user, the program causing a computer to execute a process comprising:
a first step of taking as input a shared image shared with the user and generating a plurality of keywords related to a subject shown in the shared image;
a second step of selecting a first keyword from the plurality of keywords and spontaneously uttering a conversation sentence using the first keyword;
a third step of determining whether the user's response has been interrupted in the conversation started by the utterance of the conversation sentence using the first keyword; and
a fourth step of, when it is determined that the user's response has been interrupted, selecting a second keyword different from the first keyword from the plurality of keywords and spontaneously uttering a conversation sentence using the second keyword.

The conversation program according to claim 5, further comprising a fifth step of newly registering the plurality of keywords in a keyword management table that manages the usage state of keywords in the conversation with the user, wherein
the second step includes a step of updating the usage state of the first keyword in the keyword management table, and
the fourth step includes a step of updating the usage state of the second keyword in the keyword management table.

The conversation program according to claim 5 or 6, wherein, in the third step, the user's response is determined to have been interrupted when at least one of the following is satisfied: the user has remained unresponsive for a predetermined time, or the content of the user's response satisfies a predetermined condition.

The conversation program according to any one of claims 5 to 7, wherein
the second step includes a step of uttering a first conversation sentence by inserting the first keyword into a first conversation sentence template selected according to an attribute of the first keyword, and
the fourth step includes a step of uttering a second conversation sentence by inserting the second keyword into a second conversation sentence template selected according to an attribute of the second keyword.