JP2022043901A - Dialogue system, interactive robot, program, and information processing method

Info

Publication number: JP2022043901A
Application number: JP2020149403A
Authority: JP
Inventors: 純司三谷; Junji Mitani
Original assignee: Sintokogio Ltd
Current assignee: Sintokogio Ltd
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2022-03-16
Anticipated expiration: 2040-09-04
Also published as: JP7472727B2

Abstract

To improve the accuracy of estimating a subject of conversation in a dialogue with a user. SOLUTION: Processors (11, 21) included in a dialogue system 1 execute: estimation processing (S103) of estimating a subject of conversation based on a first user voice uttered by a user; generation processing (S105) of generating a response voice that responds to the first user voice; and determination processing (S109-S111) of determining whether or not the subject of conversation estimated by the estimation processing is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice indicates negative content. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for interacting with a user.

Techniques for interacting with users are known. For example, Patent Document 1 describes a technique for estimating the topic corresponding to an input voice by using a plurality of language models that are hierarchically organized according to the type of topic. The technique estimates the topic by selecting one language model based on the similarity between a tentative recognition result of the input voice and each language model, the reliability of the recognition result, and the depth of the hierarchy.

Japanese Patent No. 5212910 (published March 8, 2013)

However, with the technique described in Patent Document 1, the accuracy of topic estimation drops when the tentative recognition result is incorrect. The technique may therefore estimate a topic different from the actual topic, and there is room for improvement in its topic estimation accuracy.

An object of one aspect of the present invention is to realize a technique that improves the accuracy of topic estimation in a dialogue with a user.

In order to solve the above problem, a dialogue system according to one aspect of the present invention includes one or more processors. The one or more processors execute an estimation process, a generation process, and a determination process. Further, an information processing method according to one aspect of the present invention is an information processing method executed by one or more processors. The information processing method includes an estimation step, a generation step, and a determination step.

In the estimation process (estimation step), the one or more processors estimate a topic based on a first user voice uttered by the user. In the generation process (generation step), the one or more processors generate a response voice that responds to the first user voice. In the determination process (determination step), the one or more processors determine whether or not the topic estimated by the estimation process is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice indicates negative content.

According to one aspect of the present invention, it is possible to realize a technique that improves the accuracy of topic estimation in a dialogue with a user.

FIG. 1 is a block diagram showing the configuration of a dialogue system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of an information processing method according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the detailed flow of the estimation process included in the information processing method shown in FIG. 2.
FIG. 4 is a diagram showing a specific example of a similar keyword database in an embodiment of the present invention.
FIG. 5 is a diagram explaining an application example of the dialogue system according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

<Overview of dialogue system>
The dialogue system 1 is a system that interacts with a user by acquiring a user voice uttered by the user, and by generating and outputting a response voice that responds to the acquired user voice. The dialogue system 1 estimates a topic based on a first user voice. The dialogue system 1 then determines whether the estimated topic is correct or incorrect based on whether or not a second user voice indicates negative content. The second user voice is a voice uttered by the user in response to the output of the response voice that responds to the first user voice. In the present embodiment, the users targeted by the dialogue system 1 are care recipients, elderly people, and the like, and the dialogue system 1 is used to interact with these users. However, the users targeted by the dialogue system 1 are not limited to these examples.

<Configuration of dialogue system 1>
The configuration of the dialogue system 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the dialogue system 1 according to an embodiment of the present invention. As shown in FIG. 1, the dialogue system 1 includes a dialogue robot 10 and a server 20.

(Configuration of dialogue robot 10)
The configuration of the dialogue robot 10 will be described with reference to FIG. 1. As shown in FIG. 1, the dialogue robot 10 includes a controller 110, a microphone 120, and a speaker 130. For example, when the dialogue robot 10 has a face portion (not shown), the microphone 120 and the speaker 130 may be included in the face portion.

The controller 110 controls the operation of the entire dialogue robot 10. The controller 110 includes a processor 11, a primary memory 12, a secondary memory 13, a communication interface 14, and an input/output interface 15. The processor 11, the primary memory 12, the secondary memory 13, the communication interface 14, and the input/output interface 15 are connected to one another via a bus.

A program P1 is stored in the secondary memory 13. The program P1 is a program for causing the processor 11 to execute at least a part of the information processing method S described later. The processor 11 loads the program P1 stored in the secondary memory 13 into the primary memory 12. The processor 11 then executes each step included in the information processing method S in accordance with the instructions contained in the program P1 loaded into the primary memory 12.

Devices that can be used as the processor 11 include, for example, a CPU (Central Processing Unit), a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a microcontroller, or a combination of these. The processor 11 is sometimes called an "arithmetic unit".

Devices that can be used as the primary memory 12 include, for example, a semiconductor RAM (Random Access Memory). The primary memory 12 is sometimes called a "main storage device". Devices that can be used as the secondary memory 13 include, for example, a flash memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), an ODD (Optical Disk Drive), an FDD (Floppy (registered trademark) Disk Drive), or a combination of these. The secondary memory 13 is sometimes called an "auxiliary storage device". The secondary memory 13 may be built into the controller 110, or may be built into another computer (for example, the server 20) connected to the controller 110 (dialogue robot 10) via the communication interface 14 or the input/output interface 15. In the present embodiment, storage in the controller 110 is realized by two memories (the primary memory 12 and the secondary memory 13), but the configuration is not limited to this. That is, storage in the controller 110 may be realized by a single memory. In that case, for example, one storage area of that memory may be used as the primary memory 12 and another storage area of that memory may be used as the secondary memory 13.

Another computer is connected to the communication interface 14 by wire or wirelessly via a network N1. Here, at least the server 20 is connected as the other computer. Examples of the communication interface 14 include interfaces such as Ethernet (registered trademark) and Wi-Fi (registered trademark). Available networks include a PAN (Personal Area Network), a LAN (Local Area Network), a CAN (Campus Area Network), a MAN (Metropolitan Area Network), a WAN (Wide Area Network), a GAN (Global Area Network), or an internetwork including these networks. The internetwork may be an intranet, an extranet, or the Internet.

The microphone 120 and the speaker 130 are connected to the input/output interface 15. Examples of the input/output interface 15 include interfaces such as USB (Universal Serial Bus), ATA (Advanced Technology Attachment), SCSI (Small Computer System Interface), and PCI (Peripheral Component Interconnect).

(Configuration of server 20)
The configuration of the server 20 will be described with reference to FIG. 1. As shown in FIG. 1, the server 20 includes a processor 21, a primary memory 22, a secondary memory 23, and a communication interface 24. The processor 21, the primary memory 22, the secondary memory 23, and the communication interface 24 are connected to one another via a bus.

The secondary memory 23 stores a program P2, a plurality of topic keyword databases (DBs) 231, a plurality of similar keyword databases (DBs) 232, and a voice database (DB) 233. Details of the DBs 231 to 233 will be described later. The program P2 is a program for causing the processor 21 to execute at least a part of the information processing method S described later. The processor 21 loads the program P2 stored in the secondary memory 23 into the primary memory 22. The processor 21 then executes each step included in the information processing method S in accordance with the instructions contained in the program P2 loaded into the primary memory 22.

Devices that can be used as the processor 21 include, for example, a CPU (Central Processing Unit), a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a microcontroller, or a combination of these. The processor 21 is sometimes called an "arithmetic unit".

Devices that can be used as the primary memory 22 include, for example, a semiconductor RAM (Random Access Memory). The primary memory 22 is sometimes called a "main storage device". Devices that can be used as the secondary memory 23 include, for example, a flash memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), an ODD (Optical Disk Drive), an FDD (Floppy (registered trademark) Disk Drive), or a combination of these. The secondary memory 23 is sometimes called an "auxiliary storage device". The secondary memory 23 may be built into the server 20, or may be built into another computer (for example, a computer constituting a cloud server) connected to the server 20 via the communication interface 24. In the present embodiment, storage in the server 20 is realized by two memories (the primary memory 22 and the secondary memory 23), but the configuration is not limited to this. That is, storage in the server 20 may be realized by a single memory. In that case, for example, one storage area of that memory may be used as the primary memory 22 and another storage area of that memory may be used as the secondary memory 23.

Another computer is connected to the communication interface 24 by wire or wirelessly via the network N1. Here, at least the dialogue robot 10 is connected as the other computer. Examples of the communication interface 24 include interfaces such as Ethernet (registered trademark) and Wi-Fi (registered trademark). Available networks include a PAN (Personal Area Network), a LAN (Local Area Network), a CAN (Campus Area Network), a MAN (Metropolitan Area Network), a WAN (Wide Area Network), a GAN (Global Area Network), or an internetwork including these networks. The internetwork may be an intranet, an extranet, or the Internet.

<Flow of information processing method S>
The information processing method S executed by the dialogue system 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the information processing method S. In FIG. 2, the left side shows the processing executed by the processor 11 (dialogue robot 10), and the right side shows the processing executed by the processor 21 (server 20). As shown in FIG. 2, the information processing method S includes steps S101 to S114.

(Step S101)
In step S101, the processor 11 of the dialogue robot 10 executes a voice acquisition process of acquiring the voice input via the microphone 120 as the first user voice. The processor 11 also temporarily stores the acquired first user voice in the primary memory 12 or the secondary memory 13.

(Step S102)
In step S102, the processor 11 executes a voice recognition process on the first user voice. The voice recognition result obtained by the voice recognition process is text information corresponding to the first user voice. A known technique can be applied as the voice recognition process. For example, the processor 11 may read and execute a known voice recognition program stored in the secondary memory 13, or may use a voice recognition service provided by a cloud server (not shown).

(Steps S103 to S104)
In step S103, the processor 11 executes an estimation process of estimating a topic based on the first user voice. The estimation process is executed using the voice recognition result from step S102. By executing this step, the keywords included in the first user voice are identified and the topic related to the first user voice is estimated. To execute step S103, the processor 11 sends an inquiry to the server 20. In step S104, the server 20 responds to the inquiry by searching the plurality of topic keyword DBs 231 and the plurality of similar keyword DBs 232. Details of steps S103 to S104 will be described later.

(Step S105)
In step S105, the processor 11 executes a generation process of generating a response voice that responds to the first user voice. Specifically, the processor 11 generates text information for the response by using the text information that is the voice recognition result of the first user voice. The processor 11 then generates the response voice from the text information for the response by using a speech synthesis technique.

(Specific examples of response voice)
Specific examples of the response voice will now be described. For example, the processor 11 generates a response voice that includes a repetition of a keyword included in the first user voice, or a response voice that includes a question related to that keyword. Here, repetition means saying the keyword back to the user. The question related to the keyword is desirably a question that delves further into the content of the first user voice.

For example, suppose that the text information "今日はいい天気です" ("The weather is nice today") is obtained as the voice recognition result of the first user voice, and that "いい" ("nice") and "天気" ("weather") are identified as keywords from that voice recognition result. In this case, the processor 11 generates the response voice "いい天気ですね" ("Nice weather, isn't it?"), which includes a repetition of these keywords.

As another example, suppose that the text information "昨日は公園に行ったんですよ" ("I went to the park yesterday") is obtained as the voice recognition result of the first user voice, and that "公園" ("park") and "行った" ("went") are identified as keywords from that voice recognition result. In this case, the processor 11 generates the response voice "どこの公園に行ったんですか" ("Which park did you go to?"), which includes a question related to these keywords.
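The two response styles above can be illustrated with a small sketch. The description does not say how the response text is actually composed, so the fixed templates and the function names `repeat_response` and `question_response` below are assumptions made purely for illustration; a real system would need proper language generation and speech synthesis.

```python
def repeat_response(keywords: list[str]) -> str:
    """Build a response that repeats the extracted keywords (hypothetical template)."""
    # Assumption: keywords are concatenated in order and followed by "ですね".
    return "".join(keywords) + "ですね"


def question_response(place: str, verb: str) -> str:
    """Build a follow-up question about a place keyword (hypothetical template)."""
    # Assumption: a single fixed "which <place> did you <verb>" pattern.
    return f"どこの{place}に{verb}んですか"


if __name__ == "__main__":
    print(repeat_response(["いい", "天気"]))
    print(question_response("公園", "行った"))
```

With the keywords from the two examples in the text, these templates reproduce the response texts quoted there.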

(Step S106)
In step S106, the processor 11 executes a voice output process of outputting the response voice from the speaker 130.

(Step S107)
In step S107, the processor 11 executes a voice acquisition process of acquiring the voice input via the microphone 120 as the second user voice.

(Step S108)
In step S108, the processor 11 executes a voice recognition process on the second user voice. The voice recognition result obtained by the voice recognition process is text information corresponding to the second user voice.

(Step S109)
In step S109, the processor 11 determines whether or not the second user voice indicates negative content.

(Specific example 1 of negation determination process)
For example, the processor 11 executes the negation determination process, which determines whether or not the second user voice indicates negative content, based on the voice recognition result described above. Specifically, when the voice recognition result of the second user voice contains a negative keyword, the processor 11 determines that the second user voice indicates negative content. Examples of negative keywords include, but are not limited to, "いや" ("no"), "違う" ("that's wrong"), and "そうじゃない" ("that's not it"). In this case, for example, the secondary memory 13 stores the negative keywords in advance. The processor 11 performs the negation determination process by referring to the voice recognition result of the second user voice and the negative keywords stored in the secondary memory 13.
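The keyword-based negation determination can be sketched as a simple substring check. This is a minimal sketch, assuming the recognition result is a plain string; the function name `is_negative` and the constant `NEGATIVE_KEYWORDS` are hypothetical, and the keyword list contains only the three examples named in the text.

```python
# Negative keywords stored in advance (only the examples given in the description).
NEGATIVE_KEYWORDS = ("いや", "違う", "そうじゃない")


def is_negative(recognized_text: str, keywords=NEGATIVE_KEYWORDS) -> bool:
    """Return True if the recognition result contains any stored negative keyword."""
    # Assumption: plain substring matching; the description does not specify
    # how the keyword lookup is performed.
    return any(keyword in recognized_text for keyword in keywords)


if __name__ == "__main__":
    print(is_negative("いや、そうじゃないよ"))
    print(is_negative("はい、そうです"))
```

Plain substring matching can produce false positives when a keyword happens to appear inside a longer, unrelated word, so a practical implementation would probably match against morphological analysis output instead.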

(Specific example 2 of negation determination process)
The processor 11 may also execute the negation determination process using another method, either instead of or in addition to the method based on the voice recognition result described in specific example 1. Specific example 2 is one such other method, in which the negation determination process is performed based on the user's gestures or facial expressions. An example of a negative gesture is shaking the head from side to side. An example of a negative facial expression is the corners of the mouth turning down. For example, the processor 11 uses a camera (not shown) to acquire an image of the user uttering the second user voice. The processor 11 then analyzes the acquired image to determine whether or not the image shows the features of a negative gesture or the features of a negative facial expression. In this case, for example, the secondary memory 13 stores in advance the image features of negative gestures or negative facial expressions. The processor 11 performs the negation determination process by referring to the image of the user uttering the second user voice and the features stored in the secondary memory 13.

(Step S110)
If the result of step S109 is Yes, in step S110 the processor 11 determines that the topic related to the first user voice is another topic. Another topic means a topic other than the topic estimated in step S103. In other words, the processor 11 determines that the topic estimated in step S103 is not correct. At this point, for example, the processor 11 may output a response voice indicating that the estimated topic was wrong. Specific examples of such a response voice include, but are not limited to, "I'm sorry, I made a mistake."

After executing step S110, the processor 11 ends the information processing method S. Before ending the information processing method S, the processor 11 erases the temporarily stored first user voice from the primary memory 12 or the secondary memory 13. For example, after ending the information processing method S, the processor 11 may execute the information processing method S again.

(Step S111)
If the result of step S109 is No, in step S111 the processor 11 determines that the topic estimated in step S103 is correct.

(Step S112)
In step S112, the processor 11 determines whether or not the estimated topic satisfies a predetermined condition. In the present embodiment, the predetermined condition is that "the topic is related to health".

If the result of step S112 is No, the information processing method S ends. Before ending the information processing method S, the processor 11 erases the temporarily stored first user voice from the primary memory 12 or the secondary memory 13. For example, after ending the information processing method S, the processor 11 may execute the information processing method S again.
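The branching of steps S109 to S112 can be summarized in a short sketch. The enum `Outcome`, the function `decide`, and the representation of a topic as a string label such as `"health"` are assumptions introduced for illustration; the description defines only the conditions, not a data model.

```python
from enum import Enum


class Outcome(Enum):
    TOPIC_WRONG = "S110: estimated topic judged incorrect, method ends"
    TOPIC_OK_END = "S111 then S112 No: topic correct, method ends"
    TOPIC_OK_RECORD = "S111 then S112 Yes: topic correct, continue to recording"


def decide(second_voice_is_negative: bool, estimated_topic: str,
           condition_topic: str = "health") -> Outcome:
    """Mirror the S109 to S112 branching for one dialogue turn."""
    if second_voice_is_negative:            # S109: Yes
        return Outcome.TOPIC_WRONG          # S110
    # S109: No, so the estimated topic is treated as correct (S111).
    if estimated_topic == condition_topic:  # S112: predetermined condition
        return Outcome.TOPIC_OK_RECORD
    return Outcome.TOPIC_OK_END


if __name__ == "__main__":
    print(decide(False, "health"))
    print(decide(True, "health"))
```

In both terminating branches the temporarily stored first user voice would be erased before the method ends, which the sketch leaves out.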

(Steps S113 to S114)
If the result of step S112 is Yes, in step S113 the processor 11 requests recording of the first user voice by transmitting the temporarily stored first user voice to the server 20. In step S114, the processor 21 of the server 20 executes a recording process of recording the received first user voice in the voice DB 233. The processor 21 transmits, to the dialogue robot 10, access information for the first user voice recorded in the voice DB 233. The first user voice recorded in the voice DB 233 can be downloaded, played back, or both by accessing it with the access information. Upon receiving the access information, the processor 11 erases the temporarily stored first user voice from the primary memory 12 or the secondary memory 13.

(Voice DB 233)
The voice DB 233 stores the first user voice. The voice DB 233 may also store related information in association with the first user voice. Examples of related information include the date and time, the keywords included in the first user voice, the topic estimated by the estimation process, identification information of the user, and the current position of the dialogue robot 10. In this case, in step S113, the processor 11 of the dialogue robot 10 transmits the related information to the server 20 together with the first user voice. The processor 21 of the server 20 associates these pieces of information received from the dialogue robot 10 with one another and records them in the voice DB 233.
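One way to picture a voice DB entry and the access information returned by the recording process is the in-memory sketch below. The class names `VoiceRecord` and `VoiceDB`, the field names, and the use of a random hexadecimal token as access information are all assumptions; the description does not specify the storage schema or the form of the access information.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class VoiceRecord:
    """First user voice plus the optional related information listed in the text."""
    audio: bytes
    recorded_at: datetime
    keywords: list[str] = field(default_factory=list)
    topic: Optional[str] = None
    user_id: Optional[str] = None
    robot_position: Optional[tuple[float, float]] = None


class VoiceDB:
    """Hypothetical in-memory stand-in for the voice DB 233."""

    def __init__(self) -> None:
        self._records: dict[str, VoiceRecord] = {}

    def record(self, rec: VoiceRecord) -> str:
        # Assumption: the access information is an opaque random token.
        access_info = uuid.uuid4().hex
        self._records[access_info] = rec
        return access_info

    def fetch(self, access_info: str) -> VoiceRecord:
        return self._records[access_info]


if __name__ == "__main__":
    db = VoiceDB()
    token = db.record(VoiceRecord(b"\x00\x01", datetime(2020, 9, 4, 10, 0),
                                  keywords=["頭", "痛い"], topic="health"))
    print(db.fetch(token).topic)
```

Keeping the audio and the related information in one record makes it straightforward to return everything a later reader of the access information might need in a single lookup.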

(Step S115)
In step S115, the processor 11 executes a transmission process of transmitting information about the user to the outside. The information about the user includes the access information for the first user voice. The information about the user may also include the related information stored in the voice DB 233 in association with the first user voice. Examples of the destination of the information about the user include, but are not limited to, a manager who looks after the user's health (a family member, a caregiver, an attending physician, or the like). Examples of the means of transmitting the information about the user include, but are not limited to, e-mail. For example, the secondary memory 13 stores in advance information indicating the destination and the transmission means.
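Since e-mail is named as one possible transmission means, the notification could be assembled as follows using Python's standard `email.message.EmailMessage`. The function name `build_notification`, the subject line, the body layout, and the example address are hypothetical; actually sending the message (for example over SMTP) is omitted.

```python
from email.message import EmailMessage
from typing import Optional


def build_notification(to_addr: str, access_info: str,
                       related: Optional[dict] = None) -> EmailMessage:
    """Compose a message carrying the access information and optional related info."""
    msg = EmailMessage()
    msg["To"] = to_addr
    # Assumption: the subject and body format are illustrative only.
    msg["Subject"] = "Recorded user voice (health-related topic)"
    lines = [f"Access information: {access_info}"]
    for key, value in (related or {}).items():
        lines.append(f"{key}: {value}")
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    message = build_notification("caregiver@example.com", "abc123",
                                 {"topic": "health"})
    print(message)
```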

（話題推定処理の詳細）
次に、ステップＳ１０３～Ｓ１０４における話題の推定処理の詳細について、図３を参照して説明する。図３は、話題の推定処理の詳細な流れを示すフローチャートである。図３に示すように、話題の推定処理は、ステップＳ２０１～Ｓ２０６を含む。 (Details of topic estimation processing)
Next, the details of the topic estimation process in steps S103 to S104 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the detailed flow of the topic estimation process. As shown in FIG. 3, the topic estimation process includes steps S201 to S206.

（ステップＳ２０１）
ステップＳ２０１において、対話ロボット１０のプロセッサ１１は、第１のユーザ音声の音声認識結果から、１または複数のキーワードを抽出する。キーワードの抽出処理には、例えば、公知の自然言語処理の技術（例えば、形態素解析等）を適用可能である。 (Step S201)
In step S201, the processor 11 of the dialogue robot 10 extracts one or a plurality of keywords from the voice recognition result of the first user voice. For example, a known natural language processing technique (for example, morphological analysis or the like) can be applied to the keyword extraction process.

（ステップＳ２０２）
ステップＳ２０２において、プロセッサ１１は、抽出した各キーワードが複数の話題キーワードＤＢ２３１のうち何れに含まれるかに基づいて、話題を推定する。当該ステップの処理を実行するため、プロセッサ１１は、各キーワードを含む話題キーワードＤＢ２３１をサーバ２０に問い合わせ、サーバ２０は、問い合わせに応答する。 (Step S202)
In step S202, the processor 11 estimates the topic based on which of the plurality of topic keyword DBs 231 contains each extracted keyword. To execute this step, the processor 11 queries the server 20 for the topic keyword DBs 231 that contain each keyword, and the server 20 responds to the query.

（話題キーワードＤＢ２３１）
ここで、話題キーワードＤＢ２３１について説明する。サーバ２０の二次メモリ２３は、複数の話題の各々に関連付けて話題キーワードＤＢ２３１を記憶している。各話題キーワードＤＢ２３１は、当該話題において用いられる１以上のキーワードを含む。同一のキーワードが、複数の話題キーワードＤＢ２３１に含まれていてもよい。複数の話題には、「健康に関連する話題」が含まれる。「健康に関連する話題」とは、例えば、体調または病気に関連する話題を含む。その他、複数の話題には、例えば、「オレオレ詐欺などの特殊詐欺にあっていることを想起させる話題」、および「人間関係に関する話題」等が含まれていてもよいが、これらに限られない。 (Topic keyword DB231)
Here, the topic keyword DB 231 will be described. The secondary memory 23 of the server 20 stores a topic keyword DB 231 in association with each of a plurality of topics. Each topic keyword DB 231 contains one or more keywords used in its topic. The same keyword may be included in more than one topic keyword DB 231. The plurality of topics includes "health-related topics", which include, for example, topics related to physical condition or illness. The plurality of topics may also include, for example, "topics suggesting that the user is being targeted by a special fraud such as an 'ore-ore' (it's me) scam" and "topics related to human relationships", but is not limited to these.

（キーワードを含む話題キーワードＤＢ２３１が１つの場合）
プロセッサ１１は、抽出した１または複数のキーワードが１つの話題キーワードＤＢ２３１に含まれる場合、当該話題キーワードＤＢ２３１に関連付けられた話題を、第１ユーザ音声に関連する話題として推定する。このようなケースとして、抽出したキーワードが１つであり、かつ、当該キーワードを含む話題キーワードＤＢ２３１が１つの場合がある。また、このようなケースとして、複数のキーワードが抽出され、かつ、各キーワードを含む話題キーワードＤＢ２３１が全て同一の場合がある。 (When there is one topic keyword DB231 including keywords)
When the one or more extracted keywords are contained in a single topic keyword DB 231, the processor 11 estimates the topic associated with that topic keyword DB 231 as the topic related to the first user voice. One such case is when a single keyword is extracted and only one topic keyword DB 231 contains it. Another such case is when a plurality of keywords are extracted and the topic keyword DBs 231 containing them are all the same one.

（キーワードを含む話題キーワードＤＢ２３１が複数の場合）
また、プロセッサ１１は、抽出した１または複数のキーワードが複数の話題キーワードＤＢ２３１に含まれる場合、そのうち何れかに関連付けられた話題を、第１ユーザ音声に関連する話題として推定する。このようなケースとして、抽出したキーワードが１つであり、かつ、当該キーワードが複数の話題キーワードＤＢ２３１に含まれる場合がある。また、このようなケースとして、複数のキーワードが抽出され、そのうち少なくとも２つが互いに異なる話題キーワードＤＢ２３１に含まれる場合がある。 (When there are multiple topic keyword DB231 including keywords)
When the one or more extracted keywords are contained in a plurality of topic keyword DBs 231, the processor 11 estimates the topic associated with one of them as the topic related to the first user voice. One such case is when a single keyword is extracted and that keyword is contained in a plurality of topic keyword DBs 231. Another such case is when a plurality of keywords are extracted and at least two of them are contained in mutually different topic keyword DBs 231.

例えば、プロセッサ１１は、該当する複数の話題キーワードＤＢに関連付けられた話題のうち、所定のルールに基づいて何れかの話題を選択する。プロセッサ１１は、選択した話題を、第１のユーザ音声に関連する話題として推定する。所定のルールの具体例としては、（１）話題の固定的な優先順位、（２）話題の動的な優先順位、および（３）キーワードの個数が挙げられるが、これらに限られない。 For example, the processor 11 selects one of the topics associated with the corresponding plurality of topic keyword DBs based on a predetermined rule. The processor 11 estimates the selected topic as a topic related to the first user voice. Specific examples of predetermined rules include, but are not limited to, (1) fixed priority of topics, (2) dynamic priority of topics, and (3) number of keywords.

（１）話題の固定的な優先順位に基づく場合、二次メモリ１３は、複数の話題間に定められた固定的な優先順位をあらかじめ記憶しておく。プロセッサ１１は、該当する複数の話題キーワードＤＢに関連付けられた話題のうち、固定的な優先順位が最も高いものを選択する。 (1) When based on a fixed priority of a topic, the secondary memory 13 stores in advance a fixed priority determined between a plurality of topics. The processor 11 selects the topic having the highest fixed priority from the topics associated with the corresponding plurality of topic keyword DBs.

（２）話題の動的な優先順位に基づく場合、二次メモリ１３は、過去に実行されたステップＳ１１１で正しいと判断された話題の履歴を記憶しておく。プロセッサ１１は、話題の履歴に応じて話題の優先順位を動的に変化させる。プロセッサ１１は、該当する複数の話題キーワードＤＢ２３１に関連付けられた話題のうち、動的な優先順位が最も高いものを選択する。例えば、プロセッサ１１は、直近のステップＳ１１１で正しいと判断された話題の優先順位を最も高くしてもよい。また、プロセッサ１１は、直近の所定回数または直近の所定期間中におけるステップＳ１１１で正しいと判断された回数が多い順に話題の優先順位を高くしてもよい。 (2) When based on a dynamic priority of topics, the secondary memory 13 stores a history of the topics determined to be correct in past executions of step S111. The processor 11 dynamically changes the priority of the topics according to this history. The processor 11 selects the topic having the highest dynamic priority from among the topics associated with the corresponding plurality of topic keyword DBs 231. For example, the processor 11 may give the highest priority to the topic determined to be correct in the most recent step S111. Alternatively, the processor 11 may rank the topics in descending order of the number of times each was determined to be correct in step S111 over the most recent predetermined number of executions or during the most recent predetermined period.

（３）キーワードの個数に基づく場合、プロセッサ１１は、該当する複数の話題キーワードＤＢ２３１のうち、抽出されたキーワードを最も多く含むものを選択する。 (3) Based on the number of keywords, the processor 11 selects the one containing the most extracted keywords from the corresponding plurality of topic keyword DB231.
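The three selection rules above can be sketched as follows. This is only an illustration under stated assumptions: the DB contents, the topic names, the `FIXED_PRIORITY` ordering, and the use of the fixed priority to break ties in rules (2) and (3) are hypothetical and are not taken from this description.

```python
from collections import Counter

# Hypothetical stand-ins for the topic keyword DBs 231 (contents are invented).
TOPIC_KEYWORD_DBS = {
    "health": {"head", "painful", "fever"},
    "fraud": {"transfer", "account", "painful"},
    "relationships": {"friend", "quarrel"},
}

# Hypothetical fixed priority for rule (1): earlier means higher priority.
FIXED_PRIORITY = ["health", "fraud", "relationships"]


def matching_topics(keywords, dbs=TOPIC_KEYWORD_DBS):
    """Count, per topic, how many extracted keywords its DB contains."""
    counts = Counter()
    for topic, db in dbs.items():
        hits = sum(1 for kw in keywords if kw in db)
        if hits:
            counts[topic] = hits
    return counts


def select_by_fixed_priority(counts):
    """Rule (1): pick the matching topic ranked highest in FIXED_PRIORITY."""
    return min(counts, key=FIXED_PRIORITY.index) if counts else None


def select_by_history(counts, history):
    """Rule (2): pick the matching topic confirmed most often in `history`.

    `history` lists topics judged correct in past runs of step S111.
    Ties fall back to the fixed priority (an assumption of this sketch).
    """
    if not counts:
        return None
    confirmed = Counter(history)
    return max(counts, key=lambda t: (confirmed[t], -FIXED_PRIORITY.index(t)))


def select_by_keyword_count(counts):
    """Rule (3): pick the topic whose DB contains the most extracted keywords."""
    if not counts:
        return None
    return max(counts, key=lambda t: (counts[t], -FIXED_PRIORITY.index(t)))
```

With the keywords `["painful", "transfer"]`, rule (1) selects "health" because it ranks first, while rule (3) selects "fraud" because its DB contains both keywords. An empty result from `matching_topics` corresponds to the case in which no topic is estimated.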

（ステップＳ２０３）
ステップＳ２０３において、プロセッサ１１は、ステップＳ２０２において話題が推定されたか否かを判断する。例えば、抽出した１または複数のキーワードを含む話題キーワードＤＢ２３１が１つも無い場合、プロセッサ１１は、話題が推定されなかったと判断する。 (Step S203)
In step S203, the processor 11 determines whether or not the topic has been estimated in step S202. For example, if there is no topic keyword DB 231 including one or a plurality of extracted keywords, the processor 11 determines that the topic has not been estimated.

（ステップＳ２０３でＹｅｓの場合）
ステップＳ２０３でＹｅｓの場合、プロセッサ１１は、ステップＳ１０３における話題の推定処理を終了する。これにより、ステップＳ１０３で特定したキーワードとして、ステップＳ２０１で抽出した１または複数のキーワードが適用される。また、ステップＳ１０３で推定した話題として、ステップＳ２０２で推定した話題が適用される。 (In the case of Yes in step S203)
If Yes in step S203, the processor 11 ends the topic estimation process in step S103. As a result, one or a plurality of keywords extracted in step S201 are applied as the keywords specified in step S103. Further, as the topic estimated in step S103, the topic estimated in step S202 is applied.

（ステップＳ２０３でＮｏの場合）
ステップＳ２０３でＮｏの場合、プロセッサ１１は、ステップＳ２０４からの処理を実行する。 (When No in step S203)
If No in step S203, the processor 11 executes the process from step S204.

（ステップＳ２０４）
ステップＳ２０４において、プロセッサ１１は、抽出した各キーワードが、類似キーワードであるか否かを判断する。当該ステップの処理を実行するため、プロセッサ１１は、各キーワードを類似キーワードとして含む話題キーワードＤＢ２３１をサーバ２０に問い合わせ、サーバ２０は、問い合わせに応答する。 (Step S204)
In step S204, the processor 11 determines whether or not each of the extracted keywords is a similar keyword. In order to execute the process of the step, the processor 11 inquires the server 20 of the topic keyword DB 231 including each keyword as a similar keyword, and the server 20 responds to the inquiry.

（類似キーワードＤＢ２３２）
ここで、類似キーワードＤＢ２３２について説明する。サーバ２０の二次メモリ２３は、複数の話題の各々に関連付けて、類似キーワードＤＢ２３２を記憶している。各類似キーワードＤＢ２３２は、当該話題で用いられる正解キーワードと、当該正解キーワードに類似する類似キーワードとを関連付けて格納している。類似キーワードは、正解キーワードを発話したユーザ音声に対する音声認識により、誤認識される可能性が高いキーワードである。 (Similar keyword DB232)
Here, the similar keyword DB 232 will be described. The secondary memory 23 of the server 20 stores a similar keyword DB 232 in association with each of the plurality of topics. Each similar keyword DB 232 stores a correct keyword used in its topic in association with similar keywords that resemble the correct keyword. A similar keyword is a keyword that speech recognition is likely to output by mistake when the user utters the correct keyword.

図４は、類似キーワードＤＢ２３２の具体例を示す図である。図４に示すように、正解キーワード「痛い」に対して、類似キーワード「イッタイ（一体）」、「イッタ（行った）」、「イタ（居た）」、および「タイ」がそれぞれ関連付けられている。例えば、これらの類似キーワードは、「痛い」と発話したユーザ音声に対する音声認識処理によって得られた、「痛い」とは異なるキーワードである。 FIG. 4 is a diagram showing a specific example of the similar keyword DB 232. As shown in FIG. 4, the similar keywords "ittai" (一体), "itta" (行った, "went"), "ita" (居た, "was there"), and "tai" (タイ) are each associated with the correct keyword "itai" (痛い, "painful"). For example, these similar keywords are keywords other than "itai" that were obtained by speech recognition of user voices uttering "itai".

なお、図４に示す正解キーワード「痛い」は、例えば、「健康に関連する話題」において用いられるキーワードであり、当該話題に関連付けられた話題キーワードＤＢ２３１に含まれている。 The correct answer keyword "painful" shown in FIG. 4 is, for example, a keyword used in "a topic related to health" and is included in the topic keyword DB 231 associated with the topic.

（ステップＳ２０４でＮｏの場合）
ステップＳ２０４でＮｏの場合、プロセッサ１１は、図２のステップＳ１１０を実行し、第１のユーザ音声に関連する話題は、他の話題であると判断する。他の話題とは、ここでは、複数の話題キーワードＤＢ２３１に関連付けられた話題の何れでもない話題である。この場合、プロセッサ１１は、情報処理方法Ｓを終了し、その後、例えば、再度情報処理方法Ｓを実行してもよい。 (If No in step S204)
If No in step S204, the processor 11 executes step S110 in FIG. 2 and determines that the topic related to the first user voice is another topic. Here, another topic means a topic that is none of the topics associated with the plurality of topic keyword DBs 231. In this case, the processor 11 ends the information processing method S and may then, for example, execute the information processing method S again.

（ステップＳ２０４でＹｅｓの場合）
ステップＳ２０４でＹｅｓの場合、プロセッサ１１は、ステップＳ２０５の処理を実行する。 (In the case of Yes in step S204)
If Yes in step S204, the processor 11 executes the process of step S205.

（ステップＳ２０５）
ステップＳ２０５において、プロセッサ１１は、第１のユーザ音声の音声認識結果に含まれる類似キーワードを、類似キーワードＤＢ２３２において当該類似キーワードに関連付けられた正解キーワードに置換する。つまり、プロセッサ１１は、第１のユーザ音声の音声認識結果から抽出した１または複数のキーワードのうち、誤認識である可能性が高い類似キーワードを正解キーワードに置換する。 (Step S205)
In step S205, the processor 11 replaces the similar keyword included in the voice recognition result of the first user voice with the correct answer keyword associated with the similar keyword in the similar keyword DB232. That is, the processor 11 replaces a similar keyword with a high possibility of erroneous recognition with a correct answer keyword among one or a plurality of keywords extracted from the voice recognition result of the first user voice.

なお、ステップＳ２０４において、抽出されたあるキーワードが、類似キーワードとして複数の類似キーワードＤＢ２３２に含まれると判定される場合がある。この場合、プロセッサ１１は、該当する複数の類似キーワードＤＢ２３２のうち、所定のルールに基づいて何れかを選択する。また、プロセッサ１１は、選択した類似キーワードＤＢ２３２を用いて、上述した置換処理を実行すればよい。なお、複数の類似キーワードＤＢ２３２から何れかを選択するルールの具体例としては、ステップＳ２０２で説明した所定のルールと同様、（１）話題の固定的な優先順位、（２）話題の動的な優先順位、および（３）キーワードの個数が挙げられるが、これらに限られない。 In step S204, it may be determined that an extracted keyword is contained as a similar keyword in a plurality of similar keyword DBs 232. In this case, the processor 11 selects one of the corresponding similar keyword DBs 232 based on a predetermined rule, and executes the above-described replacement process using the selected similar keyword DB 232. Specific examples of the rule for selecting one of the plurality of similar keyword DBs 232 include, as with the predetermined rules described for step S202, (1) a fixed priority of topics, (2) a dynamic priority of topics, and (3) the number of keywords, but are not limited to these.

例えば、第１のユーザ音声の音声認識結果から、２つのキーワード「頭」および「一体」が抽出されたとする。ここで、２つのキーワードのうち「一体」は、類似キーワードである。また、当該類似キーワード「一体」には、正解キーワード「痛い」が関連付けられている。このため、プロセッサ１１は、音声認識結果に含まれる類似キーワード「一体」を正解キーワード「痛い」に置換する。これにより、置換後の音声認識結果は、２つのキーワード「頭」および「痛い」を含む。 For example, suppose that two keywords, "head" (頭) and "ittai" (一体), are extracted from the speech recognition result of the first user voice. Of the two, "ittai" is a similar keyword, and the correct keyword "painful" (痛い, itai) is associated with it. The processor 11 therefore replaces the similar keyword "ittai" in the speech recognition result with the correct keyword "painful". As a result, the replaced speech recognition result contains the two keywords "head" and "painful".
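The replacement in step S205 can be illustrated with a small lookup table. This is a sketch under assumptions: `SIMILAR_KEYWORD_DB` is a hypothetical in-memory stand-in for a similar keyword DB 232, populated with the FIG. 4 entries, and the keywords are assumed to have been extracted already in step S201.

```python
# Hypothetical stand-in for the similar keyword DB 232 of the health topic.
# Each similar (misrecognized) keyword maps to its correct keyword.
SIMILAR_KEYWORD_DB = {
    "一体": "痛い",
    "行った": "痛い",
    "居た": "痛い",
    "タイ": "痛い",
}


def replace_similar_keywords(keywords, similar_db=SIMILAR_KEYWORD_DB):
    """Replace every similar keyword with its correct keyword (step S205).

    Keywords that are not registered as similar keywords are kept as-is.
    """
    return [similar_db.get(kw, kw) for kw in keywords]


replaced = replace_similar_keywords(["頭", "一体"])
```

For the example above, `replaced` holds the two keywords 「頭」 and 「痛い」.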

（ステップＳ２０６）
ステップＳ２０６において、プロセッサ１１は、置換後の音声認識結果に基づいて、話題を推定する処理を実行する。話題を推定する処理については、ステップＳ２０２の処理と同様である。ただし、置換前の音声認識結果の代わりに、置換後の音声認識結果に含まれる各キーワードを用いる点が異なる。 (Step S206)
In step S206, the processor 11 executes a process of estimating a topic based on the voice recognition result after the replacement. The process of estimating the topic is the same as the process of step S202. However, the difference is that each keyword included in the voice recognition result after replacement is used instead of the voice recognition result before replacement.

ステップＳ２０６の処理を実行すると、プロセッサ１１は、ステップＳ１０３における話題の推定処理を終了する。これにより、ステップＳ１０３で特定したキーワードとして、置換後の音声認識結果に含まれる１または複数のキーワードが適用される。また、ステップＳ１０３で推定した話題として、ステップＳ２０６で推定した話題が適用される。 When the process of step S206 is executed, the processor 11 ends the topic estimation process in step S103. As a result, as the keyword specified in step S103, one or a plurality of keywords included in the replaced voice recognition result are applied. Further, as the topic estimated in step S103, the topic estimated in step S206 is applied.
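Taken together, steps S202 to S206 can be sketched as one function. The sketch makes several simplifying assumptions that are not part of this description: the DBs are plain in-memory dictionaries rather than data queried from the server 20, the topic is chosen by keyword count only, and all similar keyword DBs are merged into one lookup instead of one being selected by a rule.

```python
def estimate(keywords, topic_dbs):
    """Steps S202/S206: return the topic whose DB holds the most keywords, or None."""
    best_topic, best_hits = None, 0
    for topic, db in topic_dbs.items():
        hits = sum(1 for kw in keywords if kw in db)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic


def estimate_topic_flow(keywords, topic_dbs, similar_dbs):
    """Sketch of steps S202 to S206; returns (topic, keywords actually used)."""
    topic = estimate(keywords, topic_dbs)                  # S202
    if topic is not None:                                  # S203: Yes
        return topic, keywords
    # Simplification: merge every topic's similar keyword DB into one lookup.
    merged = {}
    for db in similar_dbs.values():
        merged.update(db)
    if not any(kw in merged for kw in keywords):           # S204: No
        return None, keywords                              # treated as "another topic"
    replaced = [merged.get(kw, kw) for kw in keywords]     # S205
    return estimate(replaced, topic_dbs), replaced         # S206


# Hypothetical DB contents for illustration only.
topic_dbs = {"health": {"痛い", "熱"}}
similar_dbs = {"health": {"一体": "痛い", "タイ": "痛い"}}
result = estimate_topic_flow(["頭", "一体"], topic_dbs, similar_dbs)
```

In this toy setup the first estimation finds no topic, so the similar keyword 「一体」 is replaced by 「痛い」 and the second estimation yields the health topic.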

その後の図２のステップＳ１０５では、プロセッサ１１は、置換後の音声認識結果に基づいて、応答音声を生成する。例えば、前述の例では、置換後の音声認識結果には、２つのキーワード「頭」および「痛い」が含まれている。この場合、ステップＳ１０５において、プロセッサ１１は、キーワードを復唱する応答音声「頭が痛いのですか？」を生成する。 Then, in step S105 of FIG. 2, the processor 11 generates a response voice based on the replaced speech recognition result. In the above example, the replaced speech recognition result contains the two keywords "head" and "painful". In this case, in step S105, the processor 11 generates the response voice "Do you have a headache?", which repeats the keywords.

このように、図２のステップＳ１０３における話題の推定処理では、プロセッサ１１は、複数の話題の何れかを推定する。 As described above, in the topic estimation process in step S103 of FIG. 2, the processor 11 estimates any of a plurality of topics.

以上で、対話システム１が実行する情報処理方法Ｓの流れの説明を終了する。 This completes the description of the flow of the information processing method S executed by the dialogue system 1.

＜適用例＞
対話システム１の適用例について、図５を参照して説明する。図５は、対話システム１の適用例を説明する図である。図５に示す対話ロボット１０は、被介護者との対話を行う用途で、被介護者の傍らに配置される。図５に示すように、対話ロボット１０、被介護者、および介護者は、以下のステップＮ１～Ｎ７を実行する。 <Application example>
An application example of the dialogue system 1 will be described with reference to FIG. FIG. 5 is a diagram illustrating an application example of the dialogue system 1. The dialogue robot 10 shown in FIG. 5 is arranged beside the care recipient for the purpose of having a dialogue with the care recipient. As shown in FIG. 5, the dialogue robot 10, the care recipient, and the caregiver perform the following steps N1 to N7.

（ステップＮ１：被介護者による発話）
被介護者は、「頭が痛い」と発話する。 (Step N1: Utterance by the care recipient)
The care recipient says, "I have a headache."

（ステップＮ２：対話ロボット１０による話題の推定）
対話ロボット１０は、被介護者が発話した音声を第１のユーザ音声として取得する。また、対話ロボット１０は、第１のユーザ音声の音声認識結果「頭が一体」に対して、類似キーワードＤＢ２３２を用いて置換処理を行う。これにより、置換後の音声認識結果は、キーワード「頭」および「痛い」を含む。また、当該キーワード「頭」および「痛い」は、「健康に関連する話題」に関連付けられた話題キーワードＤＢ２３１に含まれるとする。そこで、対話ロボット１０は、「健康に関連する話題」を推定する（図２のステップＳ１０１～Ｓ１０４）。 (Step N2: Estimating the topic by the dialogue robot 10)
The dialogue robot 10 acquires the voice uttered by the care recipient as the first user voice. The dialogue robot 10 then performs the replacement process, using the similar keyword DB 232, on the speech recognition result 「頭が一体」 of the first user voice, in which "itai" (painful) was misrecognized as "ittai". As a result, the replaced speech recognition result contains the keywords "head" and "painful". Assume that the keywords "head" and "painful" are contained in the topic keyword DB 231 associated with "health-related topics". The dialogue robot 10 therefore estimates "health-related topics" as the topic (steps S101 to S104 in FIG. 2).

（ステップＮ３：対話ロボット１０による応答）
対話ロボット１０は、置換後の音声認識結果に含まれるキーワード「頭」および「痛い」を用いて、これらのキーワードを復唱する応答音声「頭が痛いのですか？」を生成して出力する（ステップＳ１０５～Ｓ１０６）。 (Step N3: Response by the dialogue robot 10)
The dialogue robot 10 uses the keywords "head" and "painful" contained in the replaced speech recognition result to generate and output the response voice "Do you have a headache?", which repeats these keywords (steps S105 to S106).

（ステップＮ４：被介護者による否定応答）
対話ロボット１０の応答音声に対して、被介護者が「違う」等と否定応答した場合について説明する。この場合、対話ロボット１０は、被介護者の否定応答を第２のユーザ音声として取得する。また、対話ロボット１０は、第２のユーザ音声が否定的な内容を示すため、被介護者の話題は「健康に関連する話題」以外であると判断する。また、対話ロボット１０は、「すみません、間違えました」等といった音声を出力する（ステップＳ１０７～Ｓ１０９、Ｓ１１０）。続いて、ステップＮ１からの動作が繰り返される。 (Step N4: Negative response by the care recipient)
A case where the care recipient gives a negative response such as “No” to the response voice of the dialogue robot 10 will be described. In this case, the dialogue robot 10 acquires the negative response of the care recipient as the second user voice. Further, the dialogue robot 10 determines that the topic of the care recipient is other than the "topic related to health" because the second user voice shows a negative content. Further, the dialogue robot 10 outputs a voice such as "I'm sorry, I made a mistake" (steps S107 to S109, S110). Subsequently, the operation from step N1 is repeated.

（ステップＮ５：被介護者による肯定応答）
対話ロボット１０の応答音声に対して、被介護者が「そう」等と肯定応答した場合について説明する。対話ロボット１０は、被介護者の肯定応答を第２のユーザ音声として取得する。また、対話ロボット１０は、第２のユーザ音声が否定的な内容を示していないため、推定した「健康に関連する話題」が正しいと判断する（ステップＳ１０７～Ｓ１０９、Ｓ１１１）。 (Step N5: Acknowledgment by the care recipient)
A case where the care recipient gives an affirmative response such as “yes” to the response voice of the dialogue robot 10 will be described. The dialogue robot 10 acquires the affirmative response of the care recipient as a second user voice. Further, the dialogue robot 10 determines that the estimated "health-related topic" is correct because the second user voice does not show a negative content (steps S107 to S109, S111).
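The branch between steps N4 and N5 can be sketched as a check on the recognized text of the second user voice. The phrase list `NEGATIVE_PHRASES` and the simple substring matching are assumptions made for illustration; how negative content is actually detected is not shown in this part of the description. As in step N5, anything that is not negative is treated as confirming the estimated topic.

```python
# Hypothetical list of phrases regarded as negative content.
NEGATIVE_PHRASES = ("違う", "ちがう", "いいえ")


def topic_is_correct(second_user_text, negative_phrases=NEGATIVE_PHRASES):
    """Return False if the second user voice shows negative content, else True."""
    return not any(phrase in second_user_text for phrase in negative_phrases)


def next_action(second_user_text):
    """Map the determination to the branch taken in the application example."""
    if topic_is_correct(second_user_text):
        return "record_and_notify"      # corresponds to step N5 leading to N6
    return "apologize_and_retry"        # corresponds to step N4
```

Substring matching is a deliberately naive choice here; it would, for example, misjudge an utterance that merely contains a negative word in a non-negating context.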

（ステップＮ６：対話ロボット１０から介護者への通知）
次に、対話ロボット１０は、被介護者の話題が健康に関連するため、第１のユーザ音声を、サーバ２０に送信することにより音声ＤＢ２３３に記録する。また、対話ロボット１０は、この被介護者の連絡先として介護者の電子メールアドレスを記憶している。そこで、対話ロボット１０は、この被介護者の情報を含む電子メールを、当該介護者の電子メールアドレス宛てに送信する。送信した電子メールには、音声ＤＢ２３３に記録した第１のユーザ音声に対するアクセス情報が含まれる（ステップＳ１１２～Ｓ１１５）。 (Step N6: Notification from the dialogue robot 10 to the caregiver)
Next, since the topic of the care recipient is related to health, the dialogue robot 10 records the first user voice in the voice DB 233 by transmitting it to the server 20. Further, the dialogue robot 10 stores the caregiver's e-mail address as the contact information of the care recipient. Therefore, the dialogue robot 10 sends an e-mail containing the information of the care recipient to the e-mail address of the caregiver. The transmitted e-mail includes access information for the first user voice recorded in the voice DB 233 (steps S112 to S115).

（ステップＮ７：介護者による第１のユーザ音声の再生）
介護者は、受信した電子メールに含まれるアクセス情報を用いて、サーバ２０の音声ＤＢ２３３にアクセスし、被介護者の第１のユーザ音声「頭が痛い」を再生する。 (Step N7: Playback of the first user voice by the caregiver)
The caregiver uses the access information contained in the received e-mail to access the voice DB 233 of the server 20 and plays back the care recipient's first user voice, "I have a headache."

＜本実施形態の効果＞
本実施形態に係る対話システム１は、第１のユーザ音声の音声認識結果に基づいて推定した話題が正しいか否かを、第２のユーザ音声が否定的な内容を示すか否かに基づいて判断する。その結果、第２のユーザ音声を考慮しない場合と比較して、話題の推定精度が向上する。 <Effect of this embodiment>
The dialogue system 1 according to the present embodiment determines whether the topic estimated from the speech recognition result of the first user voice is correct, based on whether the second user voice shows negative content. As a result, the estimation accuracy of the topic is improved compared with the case where the second user voice is not taken into consideration.

また、本実施形態に係る対話システム１は、第１のユーザ音声の音声認識結果に含まれる類似キーワードを、当該類似キーワードに関連付けられた正解キーワードに置換し、置換後の音声認識結果に基づいて話題を推定する。その結果、第１のユーザ音声を誤認識する可能性を低減できるので、話題の推定精度がさらに向上する。 Further, the dialogue system 1 according to the present embodiment replaces the similar keyword included in the voice recognition result of the first user voice with the correct answer keyword associated with the similar keyword, and based on the voice recognition result after the replacement. Estimate the topic. As a result, the possibility of erroneously recognizing the first user voice can be reduced, so that the estimation accuracy of the topic is further improved.

また、本実施形態に係る対話システム１は、推定した話題が健康に関連する場合、第１のユーザ音声を音声ＤＢ２３３に記録するとともに、記録した第１のユーザ音声に対するアクセス情報をユーザの管理者の連絡先に送信する。その結果、対話ロボット１０は、ユーザが対話を楽しむ用途で利用されつつ、緊急を有する可能性が高い健康に関連する発話を検知し、検知した発話を迅速に外部に通知することができる。 Further, when the estimated topic is related to health, the dialogue system 1 according to the present embodiment records the first user voice in the voice DB 233 and transmits access information for the recorded first user voice to the contact address of the person who manages the user. As a result, while being used by the user for enjoying dialogue, the dialogue robot 10 can detect health-related utterances, which are likely to be urgent, and promptly notify the outside of the detected utterances.

〔変形例〕
（話題の数の変形例）
本実施形態において、対話システム１は、第１のユーザ音声に関連する話題として、複数の話題のうち何れかを推定するものとして説明した。これに限らず、対話システム１は、第１のユーザ音声に関連する話題が、１つの特定の話題であるか否かを推定してもよい。この場合、サーバ２０は、特定の話題に関する話題キーワードＤＢ２３１および類似キーワードＤＢ２３２を１つずつ記憶する。１つの特定の話題は、例えば、健康に関連する話題であってもよい。この場合、対話システム１は、第１のユーザ音声に関連する話題が「健康に関連する話題」であるか否かを精度よく推定することができる。 [Modification example]
(Variation example of the number of topics)
In the present embodiment, the dialogue system 1 has been described as estimating one of a plurality of topics as the topic related to the first user voice. However, the dialogue system 1 is not limited to this and may instead estimate whether or not the topic related to the first user voice is one specific topic. In this case, the server 20 stores one topic keyword DB 231 and one similar keyword DB 232 for the specific topic. The specific topic may be, for example, a health-related topic. In this case, the dialogue system 1 can accurately estimate whether or not the topic related to the first user voice is a "health-related topic".

（所定条件の変形例）
また、本実施形態において、外部への情報送信を行うか否かを判断する所定条件として、「話題が健康に関連する」との条件を適用する例について説明した。ただし、当該所定条件はこれに限られず、他の話題に関連するとの条件であってもよい。 (Modification example of predetermined conditions)
Further, in the present embodiment, an example in which the condition that "the topic is related to health" is applied as a predetermined condition for determining whether or not to transmit information to the outside has been described. However, the predetermined condition is not limited to this, and may be a condition related to other topics.

（話題推定処理の変形例）
また、本実施形態の話題の推定処理において、プロセッサ１１は、音声認識結果の確度に応じて、図３のステップＳ２０２およびＳ２０３を省略してもよい。例えば、プロセッサ１１は、ステップＳ２０２およびＳ２０３を、音声認識結果の確度が閾値以上の場合には実行し、閾値未満の場合には省略してもよい。これにより、プロセッサ１１は、音声認識結果の確度が高い場合には、まずは置換処理を行わずに話題を推定する。このため、確度が高いにも関わらず置換処理を行うことによって誤った話題が推定される可能性が低減される。また、これにより、プロセッサ１１は、音声認識結果の確度が低い場合には、先に置換処理を実行してから話題を推定する。このため、確度の低い音声認識結果を用いて誤った話題が推定される可能性が低減される。 (Modified example of topic estimation processing)
Further, in the topic estimation process of the present embodiment, the processor 11 may omit steps S202 and S203 of FIG. 3 depending on the confidence of the speech recognition result. For example, the processor 11 may execute steps S202 and S203 when the confidence of the speech recognition result is equal to or greater than a threshold, and omit them when it is less than the threshold. In this way, when the confidence is high, the processor 11 first estimates the topic without performing the replacement process. This reduces the possibility that an erroneous topic is estimated because the replacement process was performed even though the confidence was high. Conversely, when the confidence is low, the processor 11 executes the replacement process first and then estimates the topic. This reduces the possibility that an erroneous topic is estimated from a low-confidence speech recognition result.
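This variation can be sketched as a thin wrapper that decides whether to attempt the direct estimation before falling back to the replacement path. The function name, the default threshold of 0.8, and the two callables `direct` (standing in for step S202) and `with_replacement` (standing in for steps S204 to S206) are hypothetical; the description does not give a concrete threshold for the accuracy (confidence) of the speech recognition result.

```python
def estimate_with_confidence(keywords, confidence, direct, with_replacement,
                             threshold=0.8):
    """Skip steps S202 and S203 when the recognition confidence is below threshold.

    direct(keywords)           -> topic or None   (stand-in for step S202)
    with_replacement(keywords) -> topic or None   (stand-in for steps S204 to S206)
    """
    if confidence >= threshold:
        topic = direct(keywords)            # S202
        if topic is not None:               # S203: Yes
            return topic
    # Low confidence, or no topic found directly: go through the replacement path.
    return with_replacement(keywords)


# Toy stand-ins for illustration only.
def direct(kws):
    return "health" if "painful" in kws else None


def with_replacement(kws):
    return "health (after replacement)"
```

With these stand-ins, a high-confidence result containing "painful" is resolved directly, whereas a low-confidence result always goes through the replacement path first.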

（サーバ２０が主要なステップを実行する変形例）
また、本実施形態に係る情報処理方法Ｓにおいて、対話ロボット１０が実行するステップの一部を、サーバ２０が実行してもよい。例えば、対話ロボット１０は、音声取得処理（ステップＳ１０１、Ｓ１０７）および音声出力処理（ステップＳ１０６）を実行し、サーバ２０が、その他のステップを実行してもよい。この場合、対話ロボット１０は、取得したユーザ音声をサーバ２０に送信し、サーバ２０から応答音声を受信して出力すればよい。 (Modified example in which the server 20 executes the main steps)
Further, in the information processing method S according to the present embodiment, the server 20 may execute a part of the steps executed by the dialogue robot 10. For example, the dialogue robot 10 may execute the voice acquisition process (steps S101 and S107) and the voice output process (step S106), and the server 20 may execute other steps. In this case, the dialogue robot 10 may transmit the acquired user voice to the server 20, receive the response voice from the server 20, and output it.

（ＤＢ２３１～２３３の格納場所の変形例）
また、本実施形態において、話題キーワードＤＢ２３１、類似キーワードＤＢ２３２、および音声ＤＢ２３３は、サーバ２０の二次メモリ２３に記憶されるものとして説明した。これに限らず、これらのＤＢ２３１～２３３の一部または全部は、対話システム１の外部（例えば、クラウドサーバ等）に記憶されてもよい。また、これらのＤＢ２３１～２３３の一部または全部は、対話ロボット１０の二次メモリ１３に記憶されてもよい。 (Modification example of storage location of DB231 to 233)
Further, in the present embodiment, the topic keyword DB 231, the similar keyword DB 232, and the voice DB 233 have been described as being stored in the secondary memory 23 of the server 20. However, some or all of these DBs 231 to 233 may be stored outside the dialogue system 1 (for example, on a cloud server). Some or all of these DBs 231 to 233 may also be stored in the secondary memory 13 of the dialogue robot 10.

（対話ロボット１０が全てのステップを実行する変形例）
また、上述したＤＢ２３１～２３３の何れもサーバ２０が記憶しない場合、本実施形態に係る情報処理方法Ｓの全てのステップを対話ロボット１０が実行すればよい。この場合、サーバ２０は省略可能である。換言すると、本実施形態においては、複数のプロセッサ（対話ロボット１０のプロセッサ１１、およびサーバ２０のプロセッサ２１）が連携して情報処理方法Ｓを実行する構成を採用しているが、本発明は、これに限定されず、対話ロボット１０の単一のプロセッサ１１を用いて情報処理方法Ｓを実行する構成を採用してもよい。 (Modification example in which the dialogue robot 10 executes all steps)
Further, when the server 20 does not store any of the above-mentioned DBs 231 to 233, the dialogue robot 10 may execute all the steps of the information processing method S according to the present embodiment. In this case, the server 20 can be omitted. In other words, in the present embodiment, a configuration is adopted in which a plurality of processors (processor 11 of the interactive robot 10 and processor 21 of the server 20) cooperate to execute the information processing method S. The present invention is not limited to this, and a configuration in which the information processing method S is executed using a single processor 11 of the interactive robot 10 may be adopted.

（対話ロボット１０の代わりとなる構成）
また、本実施形態において、対話システム１は、対話ロボット１０の代わりに、スマートフォン、タブレット、スマートスピーカ、パーソナルコンピュータ等といった、プロセッサおよびメモリを備えるコンピュータを含んでもよい。この場合、当該メモリに対話ロボット１０と同様のプログラムＰ１を記憶し、当該プロセッサがプログラムＰ１を読み込んで実行する。これにより、対話システム１は、上述した実施形態と同様に動作し、同様の効果を奏する。 (A configuration that replaces the dialogue robot 10)
Further, in the present embodiment, the dialogue system 1 may include a computer having a processor and a memory such as a smartphone, a tablet, a smart speaker, a personal computer, etc., instead of the dialogue robot 10. In this case, the program P1 similar to that of the dialogue robot 10 is stored in the memory, and the processor reads and executes the program P1. As a result, the dialogue system 1 operates in the same manner as in the above-described embodiment and has the same effect.

〔まとめ〕
態様１に係る対話システムは、１または複数のプロセッサを備える。前記１または複数のプロセッサは、推定処理と、生成処理と、判断処理とを実行する。推定処理は、ユーザが発話した第１のユーザ音声に基づいて話題を推定する処理である。生成処理は、前記第１のユーザ音声に応答する応答音声を生成する処理である。判断処理は、前記応答音声の出力に対応して前記ユーザが発話した第２のユーザ音声が、否定的な内容を示すか否かに基づいて、前記推定処理により推定した話題が正しいか否かを判断する処理である。〔summary〕
The dialogue system according to aspect 1 comprises one or more processors. The one or more processors execute an estimation process, a generation process, and a determination process. The estimation process is a process of estimating a topic based on a first user voice uttered by a user. The generation process is a process of generating a response voice that responds to the first user voice. The determination process is a process of determining whether or not the topic estimated by the estimation process is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice shows negative content.

上記構成により、第１のユーザ音声に基づいて推定した話題が正しいか否かを、第２のユーザ音声が否定的な内容を示すか否かに基づいて判断する。第２のユーザ音声は、第１のユーザ音声に応答するために生成した音声である。その結果、第２のユーザ音声を考慮しない場合と比較して、話題の推定精度が向上する。 With the above configuration, it is determined whether or not the topic estimated based on the first user voice is correct based on whether or not the second user voice shows negative content. The second user voice is a voice generated in response to the first user voice. As a result, the estimation accuracy of the topic is improved as compared with the case where the second user voice is not taken into consideration.

態様２に係る対話システムは、態様１に係る対話システムの特徴に加えて、以下の特徴を有している。すなわち、態様２に係る対話システムにおいて、前記１または複数のプロセッサは、特定の話題で用いられる正解キーワードと、当該正解キーワードに類似する類似キーワードとを関連付けた類似キーワードデータベースを参照する。また、前記１または複数のプロセッサは、前記第１のユーザ音声の音声認識結果に含まれる前記類似キーワードを、前記類似キーワードデータベースにおいて当該類似キーワードに関連付けられた正解キーワードに置換する置換処理をさらに実行する。また、前記１または複数のプロセッサは、置換後の前記音声認識結果に基づいて前記推定処理および前記生成処理を実行する。 The dialogue system according to aspect 2 has the following features in addition to the features of the dialogue system according to aspect 1. That is, in the dialogue system according to aspect 2, the one or more processors refer to a similar keyword database in which a correct keyword used in a specific topic is associated with similar keywords that resemble the correct keyword. The one or more processors further execute a replacement process of replacing a similar keyword contained in the speech recognition result of the first user voice with the correct keyword associated with that similar keyword in the similar keyword database. The one or more processors then execute the estimation process and the generation process based on the replaced speech recognition result.

上記構成により、置換後の音声認識結果が誤っている可能性が低くなる。その結果、そのような置換後の音声認識結果に基づくことにより、話題の推定精度がさらに向上する。 With the above configuration, it is less likely that the voice recognition result after replacement is incorrect. As a result, the estimation accuracy of the topic is further improved based on the speech recognition result after such replacement.

態様３に係る対話システムは、態様２に係る対話システムの特徴に加えて、以下の特徴を有している。すなわち、態様３に係る対話システムにおいて、前記１または複数のプロセッサは、前記特定の話題を含む複数の話題の各々に関連付けられた前記類似キーワードデータベースを参照し、前記推定処理において、前記複数の話題の何れかを推定する。 The dialogue system according to the third aspect has the following features in addition to the features of the dialogue system according to the second aspect. That is, in the dialogue system according to the third aspect, the one or the plurality of processors refer to the similar keyword database associated with each of the plurality of topics including the specific topic, and in the estimation process, the plurality of topics. Estimate one of them.

上記構成により、複数の話題のそれぞれについて、第１のユーザ音声の音声認識結果が誤りである可能性を低くすることができる。 With the above configuration, it is possible to reduce the possibility that the voice recognition result of the first user voice is erroneous for each of the plurality of topics.

態様４に係る対話システムは、態様１から態様３の何れか一態様に係る対話システムの特徴に加えて、以下の特徴を有している。すなわち、態様４に係る対話システムにおいて、前記１または複数のプロセッサは、前記判断処理により正しいと判断された話題が所定条件を満たす場合、前記ユーザに関する情報を外部に送信する送信処理をさらに実行する。 The dialogue system according to aspect 4 has the following features in addition to the features of the dialogue system according to any one of aspects 1 to 3. That is, in the dialogue system according to aspect 4, the one or more processors further execute a transmission process of transmitting information about the user to the outside when the topic determined to be correct by the determination process satisfies a predetermined condition.

With the above configuration, a user's utterance on a topic that satisfies the predetermined condition can be detected and promptly reported to the outside.
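The transmission process might be sketched as follows. The predetermined condition is modeled here as membership in a set of alert topics, and `send` is a caller-supplied callback standing in for the actual transport; both are assumptions, since the source does not define the condition or the destination.

```python
def maybe_notify(confirmed_topic, user_info, send,
                 alert_topics=frozenset({"health"})):
    """Send user information when the confirmed topic meets the condition.

    `alert_topics` is a hypothetical stand-in for the predetermined
    condition, and `send` is any callable that delivers the payload.
    Returns True if a notification was sent.
    """
    if confirmed_topic in alert_topics:
        send({"topic": confirmed_topic, "user": user_info})
        return True
    return False


# Usage: collect outgoing messages in a list instead of a real network call.
outbox = []
sent_health = maybe_notify("health", {"id": "user-1"}, outbox.append)
sent_weather = maybe_notify("weather", {"id": "user-1"}, outbox.append)
```

Only a topic that the determination process has already confirmed should be passed in, so that an incorrectly estimated topic does not trigger a notification.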

The dialogue system according to the fifth aspect has the following features in addition to those of the dialogue system according to the fourth aspect. In the dialogue system according to the fifth aspect, the one or more processors further execute a recording process of recording the first user voice in a memory. In the transmission process, the one or more processors include access information for the first user voice recorded in the memory in the information about the user and transmit it.

With the above configuration, the recipient of the access information can access and play back the first user voice relating to the topic that satisfies the predetermined condition.
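A minimal sketch of recording the voice and handing out access information is shown below. The in-memory `VoiceStore` class, the random key used as access information, and the field names are hypothetical; the source does not say what form the access information takes.

```python
import uuid


class VoiceStore:
    """Hypothetical in-memory store for recorded user voices."""

    def __init__(self):
        self._records = {}

    def record(self, audio: bytes) -> str:
        # Store the audio and return a key that serves as access information.
        key = uuid.uuid4().hex
        self._records[key] = audio
        return key

    def fetch(self, key: str) -> bytes:
        # Look up the audio that was recorded under the given key.
        return self._records[key]


def build_user_info(user_id: str, access_key: str) -> dict:
    """Bundle the access information into the information about the user."""
    return {"user_id": user_id, "voice_access": access_key}


store = VoiceStore()
key = store.record(b"\x00\x01fake-audio")
info = build_user_info("user-1", key)
```

A recipient of `info` can pass `info["voice_access"]` to `store.fetch` to retrieve the recorded first user voice for playback.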

The dialogue system according to the sixth aspect has the following features in addition to those of the dialogue system according to any one of the first to fifth aspects. In the dialogue system according to the sixth aspect, the one or more processors estimate, in the estimation process, a health-related topic as the topic.

With the above configuration, whether or not the user's topic is related to health can be estimated more accurately.

The dialogue robot according to the seventh aspect is a dialogue robot included in the dialogue system according to any one of the first to sixth aspects, and comprises any of the one or more processors. The processor of the dialogue robot executes at least a voice acquisition process of acquiring the first user voice and the second user voice, and a voice output process of outputting the response voice.

With the above configuration, the user can use the dialogue system according to any one of the first to sixth aspects in the form of a dialogue robot that interacts with the user.

The program according to the eighth aspect is a program for operating the dialogue system according to any one of the first to sixth aspects, and causes the one or more processors to execute each of the above processes.

With the above configuration, the same effect as that of the dialogue system according to the first aspect is obtained.

The information processing method according to the ninth aspect includes an estimation step, a generation step, and a determination step. In the estimation step, one or more processors estimate a topic based on a first user voice uttered by a user. In the generation step, the one or more processors generate a response voice that responds to the first user voice. In the determination step, the one or more processors determine whether or not the topic estimated in the estimation step is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice indicates negative content.
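The three steps can be sketched on already-recognized text as follows. The keyword tables, the list of negative words, the response wording, and the function names are all assumptions for illustration; the source does not fix how topics are matched or how negative content is detected, and a real system would operate on speech rather than strings.

```python
# Hypothetical data: topic keywords and words treated as negative content.
TOPIC_KEYWORDS = {
    "health": {"headache", "fever"},
    "weather": {"rain", "sunny"},
}
NEGATIVE_WORDS = {"no", "nope", "wrong"}


def estimation_step(first_text):
    """Estimate a topic from the first user utterance (None if no match)."""
    words = set(first_text.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return None


def generation_step(topic):
    """Generate a response that lets the user confirm or deny the topic."""
    if topic is None:
        return "Could you tell me more?"
    return f"Are you talking about {topic}?"


def determination_step(second_text):
    """Judge the estimated topic correct unless the reply is negative."""
    cleaned = second_text.lower().replace(",", " ").replace(".", " ")
    return not (set(cleaned.split()) & NEGATIVE_WORDS)


topic = estimation_step("I have a headache")
response = generation_step(topic)
confirmed = determination_step("Yes, exactly")
denied = determination_step("No, that is wrong")
```

The design point is that the user's second utterance acts as feedback: an estimate is only treated as correct after the user has had the chance to reject it.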

1 Dialogue system
10 Dialogue robot
11, 21 Processor
12, 22 Primary memory
13, 23 Secondary memory
14, 24 Communication interface
15 Input/output interface
20 Server
110 Controller
120 Microphone
130 Speaker
231 Topic keyword DB
232 Similar keyword DB
233 Voice DB

Claims

A dialogue system comprising one or more processors, wherein
the one or more processors execute:
an estimation process of estimating a topic based on a first user voice uttered by a user;
a generation process of generating a response voice that responds to the first user voice; and
a determination process of determining whether or not the topic estimated by the estimation process is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice indicates negative content.

The dialogue system according to claim 1, wherein
the one or more processors:
refer to a similar keyword database that associates a correct keyword used in a specific topic with a similar keyword that resembles the correct keyword;
further execute a replacement process of replacing the similar keyword contained in a speech recognition result of the first user voice with the correct keyword associated with the similar keyword in the similar keyword database; and
execute the estimation process and the generation process based on the speech recognition result after the replacement.

The dialogue system according to claim 2, wherein
the one or more processors:
refer to the similar keyword database associated with each of a plurality of topics including the specific topic; and
in the estimation process, estimate one of the plurality of topics.

The dialogue system according to any one of claims 1 to 3, wherein
the one or more processors further execute a transmission process of transmitting information about the user to the outside when the topic determined to be correct by the determination process satisfies a predetermined condition.

The dialogue system according to claim 4, wherein
the one or more processors:
further execute a recording process of recording the first user voice in a memory; and
in the transmission process, include access information for the first user voice recorded in the memory in the information about the user and transmit it.

The dialogue system according to any one of claims 1 to 5, wherein
the one or more processors estimate, in the estimation process, a health-related topic as the topic.

A dialogue robot included in the dialogue system according to any one of claims 1 to 6, the dialogue robot comprising any of the one or more processors, wherein
the processor of the dialogue robot executes at least
a voice acquisition process of acquiring the first user voice and the second user voice, and a voice output process of outputting the response voice.

A program for operating the dialogue system according to any one of claims 1 to 6, the program causing the one or more processors to execute each of the processes.

An information processing method comprising:
an estimation step in which one or more processors estimate a topic based on a first user voice uttered by a user;
a generation step in which the one or more processors generate a response voice that responds to the first user voice; and
a determination step in which the one or more processors determine whether or not the topic estimated in the estimation step is correct, based on whether or not a second user voice uttered by the user in response to the output of the response voice indicates negative content.