JP2018013595A

JP2018013595A - Information processing device, terminal device, system, information processing method, and program

Info

Publication number: JP2018013595A
Application number: JP2016142734A
Authority: JP
Inventors: 晋太郎石田; Shintaro Ishida
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-07-20
Filing date: 2016-07-20
Publication date: 2018-01-25

Abstract

PROBLEM TO BE SOLVED: To determine a more appropriate answer in voice interaction technology. SOLUTION: An information processing device includes: first estimation means for estimating a subjective state of a questioner on the basis of an acoustic feature quantity of a voice signal representing a question from the questioner; and determination means for determining an answer to the question on the basis of the subjective state of the questioner estimated by the first estimation means. SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing device, a terminal device, a system, an information processing method, and a program.

In recent years, speech recognition technology has been built into a wide range of products and has become widespread. One form of speech recognition technology is voice interaction technology, which gives an appropriate answer to a question spoken by a questioner; each question is associated with at least one answer. Questions in voice interaction technology fall roughly into two types. The first type has exactly one answer, for example a question about the height of Mt. Fuji, to which the answer is "It is 3776 meters." The second type has multiple possible answers, for example a question about recommended lunches offered near the current location. In recent years, it has become desirable to answer the latter type appropriately in addition to the former. To give an appropriate answer to a question with multiple possible answers, the system needs to grasp the state of the questioner at the time the question is asked. Otherwise, the answer may not match what the questioner wants, and the questioner's request goes unmet.
Various methods exist for grasping the state of the questioner and answering accordingly. For example, one method estimates the questioner's state from information that can be acquired with sensors or the like regardless of the questioner's intention, such as gender, age, and current location, and decides the answer according to the estimated state. Hereinafter, the state of the questioner estimated from information that can be acquired regardless of the questioner's intention is referred to as the objective state of the questioner. Patent Document 1 proposes a method of determining the content of an answer based on attribute information, which is an objective state of the questioner. More specifically, Patent Document 1 discloses a method in which a navigation system mounted on an automobile, when searching for a golf course, answers with the optimal golf course based on the questioner's objective state, such as age, average score, and current location.

Patent Document 1: JP 2007-148118 A

Even when the content of a question is the same, the desired answer differs depending on the subjective state of the questioner who asked it. The subjective state of the questioner is a state determined by the questioner's own intention, such as the questioner's psychological state.
For example, when the questioner is short on time and wants an answer immediately, an answer suited to the questioner's subjective state (for example, being impatient) is required. When a questioner who is operating a device such as a camera is in a hurry, the desired answer is an operation method that requires few operations.
However, Patent Document 1 considers only the objective state of the questioner.
An object of the present invention is to determine a more appropriate answer in voice interaction technology.

The information processing device according to the present invention includes: first estimation means for estimating a subjective state of a questioner based on an acoustic feature quantity of a voice signal representing a question from the questioner; and determination means for determining an answer to the question based on the subjective state of the questioner estimated by the first estimation means.

According to the present invention, a more appropriate answer can be determined in voice interaction technology.

FIG. 1 is a diagram showing an example of the system configuration of a voice interaction system.
FIG. 2 is a diagram showing an example of the hardware configuration of each component of the voice interaction system.
FIG. 3 is a diagram showing an example of the functional configuration of each component of the voice interaction system.
FIG. 4 is a flowchart showing an example of answer selection processing.
FIG. 5 is a flowchart showing an example of initial setting processing.
FIG. 6 is a flowchart showing an example of processing for estimating the questioner's subjective state.
FIG. 7 is a flowchart showing an example of answer determination processing.
FIG. 8 is a flowchart showing an example of processing for updating the answer database.
FIG. 9 is a diagram showing an example of an interface screen.
FIG. 10 is a diagram showing an example of presented answers and the like.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following, the voice information of a question uttered by a questioner is referred to as question utterance information. The question utterance information includes question content information, which indicates the meaning of the spoken words, and acoustic information, which is information on the uttered sound.

<Embodiment 1>
This embodiment describes processing in a voice interaction system that presents an appropriate answer to a questioner's question utterance information based not only on the objective state of the questioner but also on the subjective state.
FIG. 1 is a diagram illustrating an example of the system configuration of the voice interaction system according to the present embodiment.
The voice interaction system includes an information terminal 10, a camera 20, and a server 30. The information terminal 10, the camera 20, and the server 30 are connected to one another via a network such as the Internet. It is assumed that the questioner is shooting with the camera 20. The questioner asks the information terminal 10 a question by voice. The information terminal 10 that has received the question transmits the question utterance information to the server 30. The server 30 determines an appropriate answer to the questioner's question based on the transmitted question utterance information and transmits the answer to the information terminal 10. The information terminal 10 presents the transmitted answer to the questioner by outputting it as voice or displaying it on the display unit 12.
The information terminal 10 is a terminal device such as a smartphone, a tablet terminal, or a wearable terminal. The information terminal 10 can acquire question utterance information from the questioner and display an answer to it, among other things. The information terminal 10 can perform voice signal processing on the question utterance information and save acquired speech recognition results. The information terminal 10 includes various sensors and can acquire the current position, the current time, and the like based on information from the sensors.

The camera 20 is an imaging device such as a camera. The camera 20 can acquire images, sensor information, and the like. The camera 20 does not itself determine the answer to a question from the questioner; instead, it receives setting information from the information terminal 10 and applies the received setting information to itself.
The server 30 is an information processing device, such as a personal computer, a tablet device, or a server device, connected to the information terminal 10 and the camera 20. The server 30 can perform speech recognition and answer retrieval based on the question utterance information received from the information terminal 10, and can store information acquired from web services or the like as a database.
In the present embodiment, the voice interaction system includes three components. However, when the information terminal 10 has the functions of the server 30, the voice interaction system may include only the information terminal 10 and the camera 20. Likewise, when the camera 20 has the functions of the information terminal 10 and the server 30, the voice interaction system may consist of the camera 20 alone.

FIG. 2 is a diagram illustrating an example of the hardware configuration of each component of the voice interaction system.
The information terminal 10 includes an operation unit 11, a display unit 12, a communication unit 13, a CPU 14, a memory 15, a storage device 16, a microphone 17, and a sensor 18. These elements are connected to one another via the system bus of the information terminal 10.
The operation unit 11 is the operation unit of the information terminal 10. The operation unit 11 is, for example, a button or a touch panel display used to accept operations by the questioner. Via the operation unit 11, the questioner can input information such as whether the questioner is satisfied with an answer to a question.
The display unit 12 is the display unit of the information terminal 10. The display unit 12 is, for example, a display, and can show content such as the input question and its answers.

The communication unit 13 is the communication unit of the information terminal 10. The communication unit 13 is used to transmit data acquired via the microphone 17, the sensor 18, or the like to the server 30, and to receive information processed by the server 30, such as speech recognition results and answers.
The CPU 14 is the CPU of the information terminal 10. The CPU 14 performs various kinds of processing using computer programs and data stored in the memory 15 and the storage device 16. The CPU 14 also performs noise removal on voice information, creation of the display layout for answer information, and the like.
The memory 15 is the memory of the information terminal 10. The memory 15 includes a work area that temporarily holds data acquired from the microphone, sensors, and the like, as well as computer programs stored in the storage device 16.
The storage device 16 is the storage device of the information terminal 10. The storage device 16 stores information such as computer programs for voice processing and the display layout format for answer information. The storage device 16 is, for example, a hard disk drive (HDD) or a solid state drive (SSD).

The microphone 17 is the microphone of the information terminal 10. The microphone 17 can acquire question utterance information from the questioner. The CPU 14 transmits the question utterance information to the server 30 via the communication unit 13.
The sensor 18 is the sensor of the information terminal 10. The sensor 18 is, for example, a GPS receiver or a proximity sensor. The sensor 18 can acquire information on the current location of the information terminal 10 and on the devices, attachments, and the like that the questioner currently has.
When the CPU 14 executes processing based on a program stored in the memory 15 or the storage device 16, the functions of the information terminal 10 described later with reference to FIG. 3 and the processing of the information terminal 10 in the flowcharts described later with reference to FIGS. 4 and 5 are realized.

The camera 20 includes a communication unit 21, a CPU 22, a memory 23, a storage device 24, and a sensor 25. These elements are connected to one another via the system bus of the camera 20.
The communication unit 21 is the communication unit of the camera 20. The communication unit 21 is used to transmit data acquired via the sensor 25 or the like to the information terminal 10 and the server 30, and to receive information from the information terminal 10 and the server 30.
The CPU 22 is the CPU of the camera 20. The CPU 22 performs various kinds of processing using computer programs and data stored in the memory 23 and the storage device 24.
The memory 23 is the memory of the camera 20. The memory 23 includes a work area that temporarily holds data acquired from the sensor 25 and the like, as well as computer programs stored in the storage device 24.

The storage device 24 is the storage device of the camera 20. The storage device 24 stores information such as computer programs for shooting processing. The storage device 24 is, for example, an HDD or an SSD.
The sensor 25 is the sensor of the camera 20. The sensor 25 includes, for example, an image sensor, a gyro sensor, and an illuminance sensor. The sensor 25 can acquire information on the orientation of the camera 20. The CPU 22 performs shooting processing via the sensor 25.
When the CPU 22 executes processing based on a program stored in the memory 23 or the storage device 24, the functions of the camera 20 described later with reference to FIG. 3, the processing of the camera 20, and the like are realized.

The server 30 includes a communication unit 31, a CPU 32, a memory 33, and a storage device 34. These elements are connected to one another via the system bus of the server 30.
The communication unit 31 is the communication unit of the server 30. The communication unit 31 is used to receive data acquired and processed by the information terminal 10 and the camera 20, and to transmit information processed by the server 30, such as speech recognition results and answers.
The CPU 32 is the CPU of the server 30. The CPU 32 performs various kinds of processing using computer programs and data stored in the memory 33 and the storage device 34. The CPU 32 handles processing that would take a long time on the information terminal 10 or the camera 20.
The memory 33 is the memory of the server 30. The memory 33 has, for example, a work area for temporarily holding data acquired from the microphone 17, the sensor 18, and the like of the information terminal 10. The memory 33 also has the work area that the CPU 32 needs when executing processing.
The storage device 34 is the storage device of the server 30. The storage device 34 stores computer programs for various kinds of processing, the models required for speech recognition, a database of answer candidates for questions, and the like. The storage device 34 is, for example, an HDD or an SSD.
When the CPU 32 executes processing based on a program stored in the memory 33 or the storage device 34, the functions of the server 30 described later with reference to FIG. 3 and the processing of the server 30 in the flowcharts described later with reference to FIGS. 4 to 8 are realized.

FIG. 3 is a diagram illustrating an example of the functional configurations of the information terminal 10, the camera 20, and the server 30 included in the voice interaction system.
The information terminal 10 includes an information terminal control unit 311, a voice acquisition unit 312, a content presentation unit 313, a display operation unit 314, an operation content acquisition unit 315, a sensor information acquisition unit 316, and a display condition setting unit 317.
The information terminal control unit 311 controls tasks so that acquisition of question utterance information, acquisition of sensor information, display of content, and the like by the information terminal 10 proceed without delay.
The voice acquisition unit 312 can receive a question from the questioner via the microphone 17 and acquire question utterance information. As needed, it can also acquire environmental sounds other than speech, such as traffic noise, the sound of running water, and crowd noise.
The content presentation unit 313 can display the questioner's question and the content of the answers to it on the display unit 12 and present them to the questioner.
The display operation unit 314 can switch or select the content displayed on the display unit 12 via the operation unit 11.
Based on the questioner's operations via the operation unit 11, the operation content acquisition unit 315 can acquire information such as which answers the questioner read, which answers the questioner executed, and whether the questioner was satisfied with the presented answers.
The sensor information acquisition unit 316 can acquire, from the sensor 18 included in the information terminal 10, information such as the questioner's current location and the devices and attachments the questioner currently has.
The display condition setting unit 317 can set display conditions via the display operation unit 314, such as the method of deciding what to display on the display unit 12 and the upper limit on the number of answers displayed.

The camera 20 includes a camera control unit 321, a parameter setting unit 322, and a sensor information acquisition unit 323.
The camera control unit 321 controls the camera 20 so that processing such as imaging, parameter setting, and sensor information acquisition by the camera 20 proceeds without delay.
The parameter setting unit 322 applies parameters related to the operation of the camera 20 to the camera 20. For example, when the question utterance information acquired from the questioner is a question about how to use or configure the camera 20, the parameter setting unit 322 can change the camera parameters after the questioner has confirmed the answer. The parameter setting unit 322 can update the parameters of the camera 20 based on the questioner's operations via the operation unit of the camera 20 or based on setting information received from the information terminal 10.
The sensor information acquisition unit 323 can acquire, from the sensor 25, information such as the direction in which the camera is facing and the ambient brightness.

The server 30 includes a server control unit 331, a voice recognition unit 332, an initial setting execution unit 333, an objective state estimation unit 334, a subjective state estimation unit 335, an answer information storage unit 336, a questioner information storage unit 337, an emotion information storage unit 338, and an answer determination unit 339.
The server control unit 331 controls tasks so that processing such as speech recognition, answer determination, and database updates by the server 30 proceeds without delay.
Based on the question utterance information acquired by the voice acquisition unit 312, the voice recognition unit 332 can extract the acoustic feature quantities of the questioner's voice and convert the utterance content into text.
The initial setting execution unit 333 has two functions. The first is to acquire objective information about the questioner, such as gender, age, and place of residence, and register it in the storage device 34 or the like. The second is to acquire subjective information about the questioner, such as acoustic information of the questioner's utterance, which becomes available only once a question has been spoken and input, and register it in the storage device 34 or the like. The initial setting execution unit 333 extracts, in advance, acoustic information from voice information acquired from the questioner in an emotionally normal state and registers it in the storage device 34 or the like.

Voice information for the emotionally normal state can be acquired, for example, as follows. The voice acquisition unit 312 has the questioner, while in an emotionally normal state, read aloud a sentence prepared in advance, and acquires the uttered voice information. In the present embodiment, for example, the information terminal 10 displays a message such as "Please calm yourself and read the following sentence aloud" on the display unit 12 to present it to the questioner. Together with the message, the information terminal 10 displays the sentence to be read on the display unit 12. The questioner reads the displayed sentence aloud, and the voice acquisition unit 312 acquires that voice.
Alternatively, the voice acquisition unit 312 may have the questioner, while in an emotionally normal state, utter arbitrary sentences or words over a certain period and acquire the voice of those utterances. The voice acquisition unit 312 may also acquire the voice of the questioner in an emotionally normal state by other methods. The questioner information storage unit 337 stores the acoustic information extracted from the voice of the questioner in an emotionally normal state, acquired by the voice acquisition unit 312, in a database that holds the information used for the questioner state estimation processing. Hereinafter, this database is referred to as the questioner database.
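The registration of calm-state acoustic information can be illustrated with a minimal sketch. The specification does not fix which acoustic features are stored or how the questioner database is laid out, so everything named here is an assumption for illustration: the `AcousticBaseline` fields (speech rate and mean pitch), the dictionary keys of each utterance, the in-memory `questioner_db`, and the questioner ID.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AcousticBaseline:
    """Calm-state acoustic features for one questioner (hypothetical fields)."""
    speech_rate: float    # spoken units (e.g. syllables) per second
    mean_pitch_hz: float  # average fundamental frequency; an assumed feature


def build_baseline(utterances):
    """Average the features over utterances recorded in a calm state.

    Each utterance is a dict with hypothetical keys:
    'units' (number of syllables spoken), 'duration_s', and 'pitch_hz'.
    """
    if not utterances:
        raise ValueError("at least one calm-state utterance is required")
    rates = [u["units"] / u["duration_s"] for u in utterances]
    pitches = [u["pitch_hz"] for u in utterances]
    return AcousticBaseline(speech_rate=mean(rates), mean_pitch_hz=mean(pitches))


# In-memory stand-in for the questioner database, keyed by a hypothetical ID.
questioner_db = {}


def register_baseline(questioner_id, utterances):
    """Store the calm-state baseline so later estimation can compare against it."""
    questioner_db[questioner_id] = build_baseline(utterances)
    return questioner_db[questioner_id]
```

Averaging over several utterances is one simple design choice; it covers both the prepared-sentence method and the method of collecting arbitrary utterances over a period.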

The objective state estimation unit 334 estimates the objective state of the questioner. The objective state of the questioner is the state of the questioner estimated from objective information that can be acquired regardless of the questioner's intention. An objective state, such as the questioner's physical state, can be estimated from objective information acquired by sensors or the like. Objective information is information not directly related to the questioner's intention, such as the questioner's current location and the devices and attachments the questioner currently has. The physical state of the questioner is a state relating to the questioner's body and belongings, such as gender, age, height, and what the questioner is carrying. Physical information registered in advance, such as the questioner's gender and age, is also included in the objective information. The objective state estimation unit 334 may estimate, as the objective state of the questioner, an objective state corresponding to a single piece of objective information or one corresponding to multiple pieces of objective information. For example, when the sensor 25 acquires objective information indicating that a telephoto lens is attached to the camera 20, the objective state estimation unit 334 may estimate that the questioner's objective state is that of having a telephoto lens.
The subjective state estimation unit 335 estimates the subjective state of the questioner. The subjective state of the questioner is a state determined by the questioner's own intention, such as the questioner's psychological state. The psychological state of the questioner is a state of mind such as "impatient" or "calm". The subjective state of the questioner can be estimated based on, for example, the acoustic information in the question utterance information and information acquired via sensors or the like. For example, when the speech rate of the question utterance information is faster than in the predefined normal state, the subjective state estimation unit 335 can estimate that the questioner is impatient. The subjective state estimation unit 335 may estimate the subjective state of the questioner based on a single piece of the information acquired from sensors or the like, or based on multiple pieces of information.
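The speech-rate comparison described above can be sketched as a simple threshold rule. This is an illustrative assumption, not the method of the specification: the function name, the `margin` ratio of 1.2, and the two-label output are all hypothetical, and a real estimator could combine several acoustic features and sensor readings.

```python
def estimate_subjective_state(speech_rate, baseline_rate, margin=1.2):
    """Label the questioner 'impatient' when speech is clearly faster than baseline.

    speech_rate and baseline_rate are in the same unit (e.g. syllables per second).
    margin is a hypothetical ratio: the utterance must exceed baseline_rate * margin
    to count as faster than the normal state.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    if speech_rate > baseline_rate * margin:
        return "impatient"
    return "calm"
```

With a calm-state baseline of 5.0 units per second, an utterance at 7.0 exceeds the threshold of 6.0 and is labeled impatient, while an utterance at 5.5 is labeled calm.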

回答情報記憶部３３６は、質問内容と回答内容と付属情報との組みを回答データという１つの単位とし、複数の回答データを含むデータベース（以下では、回答データベース）の記憶と、更新を行う。回答データベースは、例えば、記憶装置３４等に実装される。回答データは、例えば、質問内容として「運動会での徒競走の写真の撮り方」というテキスト情報、回答内容としてシーンモードやシャッタースピードの設定値の情報を含む。更に、回答データは、例えば、付属情報として、回答に対応する客観的状態や回答に対応する主観的状態、及び、その回答データについての質問者の満足度の情報等を含む。回答データは、例えば、回答に対応する主観的状態として、その回答が過去、使用された際の質問者の主観的状態のうち、最も多かった主観的状態を含むこととしてもよい。回答内容と付属情報とを含む回答データは、回答候補と回答候補に対応する主観的状態と回答候補に対する前記質問者の満足の度合いを示す満足度との対応を示す対応情報の一例である。
本実施形態では、音声取得部３１２により取得された質問発話情報が音声認識部３３２により認識された後、回答決定部３３９が音声認識部３３２による認識結果と回答データベースとに基づいて、適切な回答を抽出する。そして、内容提示部３１３が、抽出された回答を、質問者に提示する。回答の提示後に質問者が行った操作内容は、操作内容取得部３１５によって取得され、その内容に応じて回答情報記憶部３３６が回答データベースを更新する。より具体的には、回答情報記憶部３３６は、回答データベースの項目のうち、その回答が選択された際の質問者の主観的状態の比率や、回答に対する満足度等の項目を更新する。本実施形態では、回答情報記憶部３３６は、回答データベースの更新において以上のような方法を用いているが、この方法に限定するものではない。回答情報記憶部３３６は、例えば、回答を、ウェブ等を介して外部から新たに取得した情報に基づいて、回答データベースを更新することとしてもよい。 The answer information storage unit 336 treats a set of question content, answer content, and attached information as one unit called answer data, and stores and updates a database containing a plurality of answer data (hereinafter, the answer database). The answer database is implemented in, for example, the storage device 34. The answer data includes, for example, the text “how to photograph a footrace at a sports day” as the question content, and setting values for the scene mode and shutter speed as the answer content. The answer data further includes, as attached information, for example, the objective state corresponding to the answer, the subjective state corresponding to the answer, and information on the questioner's degree of satisfaction with the answer data. As the subjective state corresponding to the answer, the answer data may include, for example, the subjective state that occurred most often among the subjective states of questioners when the answer was used in the past. The answer data including the answer content and the attached information is an example of correspondence information indicating the correspondence among an answer candidate, the subjective state corresponding to the answer candidate, and a satisfaction level indicating the questioner's degree of satisfaction with the answer candidate.
In the present embodiment, after the question utterance information acquired by the voice acquisition unit 312 is recognized by the voice recognition unit 332, the answer determination unit 339 extracts an appropriate answer based on the recognition result from the voice recognition unit 332 and the answer database. The content presentation unit 313 then presents the extracted answer to the questioner. The operations performed by the questioner after the answer is presented are acquired by the operation content acquisition unit 315, and the answer information storage unit 336 updates the answer database accordingly. More specifically, among the items in the answer database, the answer information storage unit 336 updates items such as the ratio of questioner subjective states observed when the answer was selected and the degree of satisfaction with the answer. In the present embodiment, the answer information storage unit 336 updates the answer database in this way, but the update is not limited to this method. For example, the answer information storage unit 336 may update the answer database based on information newly acquired from outside via the web or the like.
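One way to picture an answer data record and its update after feedback is the sketch below. The field names, the use of a counter for the subjective-state ratio, and the definition of satisfaction as the fraction of satisfied presentations are all assumptions made for illustration; the description does not specify how these items are stored or computed.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnswerData:
    question: str            # question content, e.g. a text string
    answer: dict             # answer content, e.g. camera setting values
    objective_state: dict    # attached information: objective state
    # Attached information: how often each subjective state was observed
    # when this answer was used (hypothetical representation of the ratio).
    subjective_counts: Counter = field(default_factory=Counter)
    satisfied: int = 0
    presented: int = 0

    @property
    def subjective_state(self):
        """Most frequent subjective state so far, or None if never used."""
        if not self.subjective_counts:
            return None
        return self.subjective_counts.most_common(1)[0][0]

    @property
    def satisfaction(self):
        """Fraction of presentations marked as satisfying (assumed metric)."""
        return self.satisfied / self.presented if self.presented else 0.0

    def record_feedback(self, subjective_state, was_satisfied):
        """Update the record from one questioner response."""
        self.subjective_counts[subjective_state] += 1
        self.presented += 1
        if was_satisfied:
            self.satisfied += 1
```

Keeping raw counts rather than a stored ratio makes each update a simple increment, and the ratio can be derived when needed.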

質問者情報記憶部３３７は、質問者の質問発話情報に加えて、質問者の客観的状態の推定に利用可能な情報と質問者の主観的状態の推定に利用可能な情報との両方を記憶する質問者データベースの記憶と、更新を行う。
本実施形態では、質問者情報記憶部３３７は、質問者データベースに質問発話情報の発話内容、音響情報、センサ情報等を記憶する。また、質問者情報記憶部３３７は、初期設定実行部３３３により取得された質問者の性別や年齢等の情報や、感情が平常である場合の質問者の音響情報等の情報も、質問者データベースに記憶する。また、質問者情報記憶部３３７は、質問発話情報の発話内容と音響情報、更にセンサ情報取得部３１６とセンサ情報取得部３２３により取得されたセンサ情報の内容で、質問者データベースを更新する。質問者情報記憶部３３７は、質問者データベースに、全てのデータを取得時間と紐づけて一元的に管理可能なように情報を記憶したり、質問者の主観的状態及び客観的状態のデータ等のデータの属性ごとに分類して管理可能なように記憶したりする。質問者データベースの更新方法は、このような方法に限定されるものではない。
感情情報記憶部３３８は、各種音響特徴量と感情を示すラベル（怒り、悲しみ、焦り、平常状態などのテキスト）との組を一つのデータ（以下では、感情データ）とし、複数の感情データを含むデータベース（以下では、感情データベース）の記憶を行う。音響特徴量としては、発話速度や基本周波数、音量、抑揚の大きさ、パワーや発話持続期間、更にはＭＦＣＣやその分散等がある。また、本実施形態では、それぞれの音響特徴量は、ラベルが示す感情の状態における複数の人の発声データから導出された平均的な特徴量である。本実施形態では、回答情報記憶部３３６、質問者情報記憶部３３７、感情情報記憶部３３８は、それぞれ別個の機能構成要素となっているが、これらのうちの幾つかを適宜統合した機能構成要素がサーバー３０に含まれることとしてもよい。 The questioner information storage unit 337 stores and updates a questioner database that holds, in addition to the questioner's question utterance information, both information usable for estimating the questioner's objective state and information usable for estimating the questioner's subjective state.
In the present embodiment, the questioner information storage unit 337 stores the utterance content of the question utterance information, acoustic information, sensor information, and the like in the questioner database. The questioner information storage unit 337 also stores in the questioner database information such as the questioner's gender and age acquired by the initial setting execution unit 333, and information such as the questioner's acoustic information when the emotional state is normal. In addition, the questioner information storage unit 337 updates the questioner database with the utterance content and acoustic information of the question utterance information, and with the sensor information acquired by the sensor information acquisition unit 316 and the sensor information acquisition unit 323. The questioner information storage unit 337 stores information in the questioner database either so that all data can be managed centrally in association with its acquisition time, or so that it can be managed by data attribute, such as data on the questioner's subjective state and data on the questioner's objective state. The method for updating the questioner database is not limited to these methods.
The emotion information storage unit 338 treats a pair of acoustic feature amounts and a label indicating an emotion (text such as anger, sadness, impatience, or normal state) as one piece of data (hereinafter, emotion data), and stores a database containing a plurality of emotion data (hereinafter, the emotion database). Examples of the acoustic feature amounts include speech rate, fundamental frequency, volume, degree of intonation, power, utterance duration, and MFCCs and their variance. In the present embodiment, each acoustic feature amount is an average feature amount derived from utterance data of a plurality of people in the emotional state indicated by the label. In the present embodiment, the answer information storage unit 336, the questioner information storage unit 337, and the emotion information storage unit 338 are separate functional components, but the server 30 may instead include functional components that integrate some of them as appropriate.

回答決定部３３９は、質問者が所望する回答を提示するために、質問者からの質問に対しての適切な回答の決定を行う。
回答決定部３３９は、例えば、以下の三つの機能を含む。
一つ目は、回答データベースの中から現在受けている質問と同じ質問内容を含む回答データを抽出して、抽出した回答データを回答候補に決定する。そして、現在の質問者の客観的状態と回答候補に対応する客観的状態とを比較して、類似であるか否かを判定する機能である。客観的状態が類似か否かの判定処理は、例えば以下のように行われる。現在所持している三脚やレンズといったカメラ２０の付属品についての情報が客観的情報として利用される場合、回答決定部３３９は、付属品毎に、質問の際の質問者と回答データとにおいて、所持状態が同一か否かを判定する。回答決定部３３９は、半数以上の付属品について所持状態が同一である場合は、客観的状態が類似であると判定する。回答決定部３３９は、客観的状態の類似判定方法として、以上の方法以外の方法を用いてもよい。
二つ目は、客観的状態を比較した上で類似であると判定した回答データの主観的状態と、質問の際の質問者の主観的状態を比較して、同一であるか否かを判定する機能である。即ち、質問の際の質問者の主観的状態が「焦っている」と推定された場合、比較中の回答データに対応する主観的状態が「焦っている」であるか否かを判定する機能である。
三つ目は、客観的状態、及び主観的状態が、回答データに対応する客観的状態、及び主観的状態と類似又は同一であると判断された回答データにおいて、満足度が閾値以上である回答データを質問者の質問に対する回答に決定する機能である。本実施形態では、回答決定部３３９は、最適な回答を提示するために以上のような方法を行うが、他の方法を行ってもよい。 The answer determination unit 339 determines an appropriate answer to the question from the questioner in order to present the answer desired by the questioner.
The answer determination unit 339 includes, for example, the following three functions.
The first function extracts, from the answer database, answer data containing the same question content as the question currently being received, and sets the extracted answer data as answer candidates. It then compares the current objective state of the questioner with the objective state corresponding to each answer candidate and determines whether they are similar. Whether the objective states are similar is determined, for example, as follows. When information about accessories of the camera 20 that the questioner currently possesses, such as a tripod or a lens, is used as objective information, the answer determination unit 339 determines, for each accessory, whether the possession state is the same for the questioner at the time of the question and for the answer data. If the possession state is the same for half or more of the accessories, the answer determination unit 339 determines that the objective states are similar. The answer determination unit 339 may use a method other than the above to determine the similarity of objective states.
The second function compares the subjective state of answer data whose objective state was determined to be similar with the subjective state of the questioner at the time of the question, and determines whether they are the same. That is, when the subjective state of the questioner at the time of the question is estimated to be “impatient”, this function determines whether the subjective state corresponding to the answer data under comparison is “impatient”.
The third function determines, as the answer to the questioner's question, answer data whose satisfaction level is equal to or greater than a threshold, from among the answer data whose objective state and subjective state were determined to be similar or identical to those of the questioner. In the present embodiment, the answer determination unit 339 uses the above method to present an optimal answer, but other methods may be used.
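The three functions can be read as successive filters over the answer database. The sketch below is only an illustration under stated assumptions: answer data are plain dictionaries with hypothetical keys, matching of question content is reduced to string equality, the objective-state similarity test is passed in as a function, and the threshold value of 0.5 is arbitrary.

```python
def select_answers(answer_db, question, objective_state, subjective_state,
                   is_similar, threshold=0.5):
    """Return the answer data that pass all three checks, in database order.

    is_similar(current, candidate) is a caller-supplied predicate for the
    objective-state comparison; threshold is a hypothetical satisfaction
    cut-off.
    """
    selected = []
    for data in answer_db:
        # Function 1: same question content and similar objective state.
        if data["question"] != question:
            continue
        if not is_similar(objective_state, data["objective_state"]):
            continue
        # Function 2: identical subjective state.
        if data["subjective_state"] != subjective_state:
            continue
        # Function 3: satisfaction at or above the threshold.
        if data["satisfaction"] >= threshold:
            selected.append(data)
    return selected
```

Ordering the checks this way means the satisfaction threshold is applied only to candidates that already match both the objective and the subjective state.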

図４は、回答選択処理の一例を示すフローチャートである。
Ｓ４０１において、表示条件設定部３１７は、表示操作部３１４を介して、操作部１１に対する質問者の操作に基づいて、表示部１２への表示処理に関する条件を決定する。
Ｓ４０２において、初期設定実行部３３３は、質問者の音声対話システムの使用履歴に応じて、初期設定を行う。Ｓ４０２の処理の詳細は、図５で後述する。
Ｓ４０３において、音声取得部３１２は、質問者による質問の発声を、マイク１７を介して取得し、取得した音声情報から、質問発話情報を取得する。そして、音声取得部３１２は、取得した質問発話情報を、サーバー３０に送信する。音声取得部３１２による質問発話情報のサーバー３０への送信処理は、音声送信処理の一例である。
Ｓ４０４において、音声認識部３３２は、通信部１３及び通信部３１を介して、Ｓ４０３で音声取得部３１２により取得された質問者の質問発話情報を取得する。音声認識部３３２は、取得した質問発話情報から、音響情報と意味情報とを取得する。
Ｓ４０５において、客観状態推定部３３４は、センサ情報取得部３１６とカメラのセンサ情報取得部３２３を介して、センサ１８、センサ２５により取得された情報を取得する。そして、客観状態推定部３３４は、取得した情報に基づいて、質問者の客観的状態を推定する。
Ｓ４０６において、主観状態推定部３３５は、Ｓ４０４で音声認識部３３２により取得された質問発話情報の音響情報と、センサ情報取得部３１６及びセンサ情報取得部３２３により取得されたセンサ１８、センサ２５からの情報と、に基づいて、次の処理を行う。即ち、主観状態推定部３３５は、質問者の主観的状態を推定する。Ｓ４０６の処理の詳細は、図６で後述する。 FIG. 4 is a flowchart illustrating an example of an answer selection process.
In S401, the display condition setting unit 317 determines the conditions for display processing on the display unit 12, based on the questioner's operation on the operation unit 11 received via the display operation unit 314.
In S402, the initial setting execution unit 333 performs initial setting according to the questioner's history of using the voice interaction system. Details of the process of S402 will be described later with reference to FIG. 5.
In S403, the voice acquisition unit 312 acquires the utterance of the question by the questioner via the microphone 17, and acquires the question utterance information from the acquired voice information. Then, the voice acquisition unit 312 transmits the acquired question utterance information to the server 30. The process of transmitting the question utterance information to the server 30 by the voice acquisition unit 312 is an example of a voice transmission process.
In S404, the voice recognition unit 332 acquires the questioner's question utterance information acquired by the voice acquisition unit 312 in S403 via the communication unit 13 and the communication unit 31. The voice recognition unit 332 acquires acoustic information and semantic information from the acquired question utterance information.
In S405, the objective state estimation unit 334 acquires the information acquired by the sensor 18 and the sensor 25 via the sensor information acquisition unit 316 and the sensor information acquisition unit 323 of the camera. The objective state estimation unit 334 then estimates the objective state of the questioner based on the acquired information.
In S406, the subjective state estimation unit 335 estimates the subjective state of the questioner based on the acoustic information of the question utterance information acquired by the voice recognition unit 332 in S404 and on the information from the sensor 18 and the sensor 25 acquired by the sensor information acquisition unit 316 and the sensor information acquisition unit 323. Details of the process of S406 will be described later with reference to FIG. 6.

Ｓ４０７において、回答決定部３３９は、以下の処理を実行する。
まず、回答決定部３３９は、音声認識部３３２から取得した質問発話情報の意味情報をもとに、回答情報記憶部３３６により記憶された回答データベースの中から、同じ質問内容を含む回答データを抽出して回答候補に決定する。そして、回答決定部３３９は、客観状態推定部３３４により取得された質問の際の質問者の客観的状態が、回答候補に対応する客観的状態と類似するか否かを判定する。
次に、回答決定部３３９は、対応する客観的状態が質問の際の質問者の客観的状態と類似であると判定した回答候補を特定する。そして、回答決定部３３９は、特定した回答候補それぞれについて、対応する主観的状態が、主観状態推定部３３５により取得された質問の際の質問者の主観的状態と同一であるか否かを判定する。
そして、回答決定部３３９は、対応する主観的状態が質問の際の質問者の主観的状態と同一である回答候補について、対応する満足度が設定された閾値以上であるか否かを判定する。そして、回答決定部３３９は、対応する満足度が設定された閾値以上である回答候補を質問者の質問に対する回答として決定する。本実施形態では、回答決定部３３９は、最適な回答を決定するために以上のような方法で回答の決定を行っているが、この方法に限定するものではない。Ｓ４０７の処理の詳細は、図７で後述する。 In S407, the answer determination unit 339 executes the following processing.
First, based on the semantic information of the question utterance information acquired from the voice recognition unit 332, the answer determination unit 339 extracts answer data containing the same question content from the answer database stored by the answer information storage unit 336, and sets the extracted answer data as answer candidates. The answer determination unit 339 then determines whether the objective state of the questioner at the time of the question, acquired by the objective state estimation unit 334, is similar to the objective state corresponding to each answer candidate.
Next, the answer determination unit 339 identifies the answer candidates whose corresponding objective state was determined to be similar to the objective state of the questioner at the time of the question. For each identified answer candidate, the answer determination unit 339 determines whether the corresponding subjective state is the same as the subjective state of the questioner at the time of the question, acquired by the subjective state estimation unit 335.
Then, for each answer candidate whose corresponding subjective state is the same as the subjective state of the questioner at the time of the question, the answer determination unit 339 determines whether the corresponding satisfaction level is equal to or greater than a set threshold. The answer determination unit 339 determines an answer candidate whose satisfaction level is equal to or greater than the set threshold as the answer to the questioner's question. In the present embodiment, the answer determination unit 339 determines the answer by the above method in order to determine an optimal answer, but the determination is not limited to this method. Details of the process of S407 will be described later with reference to FIG. 7.

Ｓ４０８において、回答決定部３３９は、Ｓ４０７で回答として決定した回答データの内容を、内容提示部３１３を介して質問者に提示する。例えば、回答決定部３３９は、Ｓ４０７で回答として決定した回答データの情報を情報端末１０に送信し、内容提示部３１３を介して、情報端末１０に送信した回答データの内容を、表示部１２に表示するよう指示する。回答決定部３３９によるＳ４０７で回答として決定された回答データの内容を情報端末１０に送信処理は、回答送信処理の一例である。そして、内容提示部３１３は、回答データの内容を表示部１２に表示することで、質問者に提示する。また、内容提示部３１３は、表示部１２に、提示した回答データに対する質問者からの応答を受け付ける受付画面を表示する。受付画面は、例えば、提示された回答データに満足したか否かの選択ボタンや、回答データを確認したか否かの選択ボタン、回答データに対応するパラメータをカメラ２０に反映させるか否かの選択ボタン等を含む。
Ｓ４０９において、操作内容取得部３１５は、Ｓ４０８で提示された回答データに対する質問者の応答を、表示操作部３１４を介して取得する。操作内容取得部３１５は、例えば、質問者がＳ４０８で表示された受付画面に対しての入力に応じて、質問者がどの回答に目を通したか否か、回答で提示された操作を実行したか否か、提示した回答に満足したか否か等の情報を取得する。回答情報記憶部３３６は、操作内容取得部３１５により取得されたこれらの情報を用いて、適切なタイミングで、回答データベースを更新する。Ｓ４０９の処理の詳細は、図８で後述する。 In step S 408, the answer determination unit 339 presents the content of the answer data determined as the answer in step S 407 to the questioner via the content presentation unit 313. For example, the answer determination unit 339 transmits the information of the answer data determined as an answer in S407 to the information terminal 10, and the content of the answer data transmitted to the information terminal 10 via the content presentation unit 313 is displayed on the display unit 12. Instruct to display. The process of transmitting the content of the answer data determined as the answer in S407 by the answer determination unit 339 to the information terminal 10 is an example of the answer transmission process. Then, the content presentation unit 313 displays the content of the answer data on the display unit 12 to present it to the questioner. Further, the content presentation unit 313 displays on the display unit 12 a reception screen that receives a response from the questioner with respect to the presented answer data. The reception screen includes, for example, a selection button indicating whether or not the presented answer data is satisfied, a selection button indicating whether or not the answer data is confirmed, and whether or not the parameter corresponding to the answer data is reflected on the camera 20. Includes selection buttons.
In S409, the operation content acquisition unit 315 acquires, via the display operation unit 314, the questioner's response to the answer data presented in S408. For example, according to the questioner's input on the reception screen displayed in S408, the operation content acquisition unit 315 acquires information such as which answers the questioner read, whether the questioner carried out the operation presented in the answer, and whether the questioner was satisfied with the presented answer. The answer information storage unit 336 uses the information acquired by the operation content acquisition unit 315 to update the answer database at an appropriate timing. Details of the process of S409 will be described later with reference to FIG. 8.

図５は、初期設定処理の一例を示すフローチャートである。
Ｓ５０１において、初期設定実行部３３３は、質問者による音声対話システムの利用が初めてか否かを判定する。初期設定実行部３３３は、質問者による音声対話システムの利用が初めてと判定した場合、Ｓ５０２に進み、質問者による音声対話システムの利用が初めてでないと判定した場合、図５の処理を終了する。初期設定実行部３３３は、例えば、表示部１２に利用が初めてか否かを選択するＹＥＳ／ＮＯ形式のボタンを含む選択画面を表示させ、質問者による選択画面を介した操作に基づいて、利用が初めてか否かの情報を取得することとしてもよい。
また、情報端末１０は、質問者のログイン処理等により質問者の情報を取得することとして、初期設定実行部３３３は、情報端末１０から質問者の情報を取得する。そして、初期設定実行部３３３は、取得した質問者について、感情状態が平常な場合の発声の音響情報が質問者データベース内に記憶されているか否かを判定することとしてもよい。その場合、初期設定実行部３３３は、質問者データベース内に記憶されていると判定した場合、その質問者による音声対話システムの利用は初めてではないと判定する。初期設定実行部３３３は、質問者データベース内に記憶されていないと判定した場合、その質問者による音声対話システムの利用は初めてであると判定する。 FIG. 5 is a flowchart illustrating an example of the initial setting process.
In S501, the initial setting execution unit 333 determines whether this is the questioner's first use of the voice interaction system. If the initial setting execution unit 333 determines that this is the first use, the process proceeds to S502; if it determines that this is not the first use, the process of FIG. 5 ends. For example, the initial setting execution unit 333 may cause the display unit 12 to display a selection screen including YES/NO buttons for selecting whether this is the first use, and may acquire the information on whether this is the first use based on the questioner's operation on the selection screen.
Alternatively, the information terminal 10 may acquire the questioner's information through the questioner's login process or the like, and the initial setting execution unit 333 may acquire that information from the information terminal 10. The initial setting execution unit 333 may then determine whether acoustic information of utterances made in a normal emotional state is stored in the questioner database for that questioner. In that case, if the initial setting execution unit 333 determines that such acoustic information is stored in the questioner database, it determines that this is not the questioner's first use of the voice interaction system. If it determines that such acoustic information is not stored in the questioner database, it determines that this is the questioner's first use.
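The database-based variant of the first-use check might look like the following sketch. The questioner database is modeled as a plain dictionary keyed by a questioner identifier, and the key name `baseline_acoustics` and the `record_baseline` callback are hypothetical names introduced only for this illustration.

```python
def is_first_use(questioner_db, questioner_id):
    """True when no normal-state acoustic information is stored (S501)."""
    record = questioner_db.get(questioner_id, {})
    return record.get("baseline_acoustics") is None


def run_initial_setting(questioner_db, questioner_id, record_baseline):
    """Record a normal-state baseline only on first use; otherwise do nothing.

    record_baseline is a caller-supplied function that returns the acoustic
    features measured while the questioner reads a displayed text (S502).
    Returns True if a baseline was recorded.
    """
    if not is_first_use(questioner_db, questioner_id):
        return False
    entry = questioner_db.setdefault(questioner_id, {})
    entry["baseline_acoustics"] = record_baseline()
    return True
```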

Ｓ５０２において、初期設定実行部３３３は、質問者の感情状態が平常である場合の音響情報を質問者データベースから取得する。感情状態が平常であるとは、怒りや悲しみ等の感情を感じていない落ち着いた感情状態である。
感情状態が平常である場合の音響情報を取得するには、例えば以下の手順の実行が必要である。まず、内容提示部３１３が、質問者に発話してもらうテキストを表示部１２に表示する。表示されるテキストは少なくとも一つ以上の文を含む。次に、音声認識部３３２は、質問者が発話したテキスト音声を、音声取得部３１２を介して取得し、取得した音声情報を認識し、その音声情報に対応する音響情報を取得する。そして、質問者情報記憶部３３７は、音声認識部３３２により取得された音響情報を、その質問者の感情状態が平常な状態の音響情報として質問者データベースに記憶する。例えば、音響情報は、発話速度や音量、抑揚の大きさ、基本周波数、パワーや発話持続時間、更にはＭＦＣＣやその分散等の音響特徴量である。本実施形態では、質問者の感情状態が平常である場合の音響情報の取得を、音声対話システムを初めて使用する場合にのみ取得されることとする。しかし、例えば、過去に音声対話システムを利用したことのある質問者が改めて音声対話システムを利用し始めた際に、音声対話システムは、質問者の感情状態が平常である場合の音響情報を取得することとしてもよい。 In step S 502, the initial setting execution unit 333 acquires acoustic information when the emotional state of the questioner is normal from the questioner database. The emotional state is normal is a calm emotional state in which no emotions such as anger and sadness are felt.
To acquire the acoustic information for the normal emotional state, a procedure such as the following is carried out. First, the content presentation unit 313 displays on the display unit 12 a text for the questioner to read aloud. The displayed text contains at least one sentence. Next, the voice recognition unit 332 acquires, via the voice acquisition unit 312, the speech of the questioner reading the text, recognizes the acquired voice information, and acquires the acoustic information corresponding to that voice information. The questioner information storage unit 337 then stores the acoustic information acquired by the voice recognition unit 332 in the questioner database as the acoustic information for the questioner's normal emotional state. The acoustic information consists of acoustic feature amounts such as speech rate, volume, degree of intonation, fundamental frequency, power, utterance duration, and MFCCs and their variance. In the present embodiment, the acoustic information for the questioner's normal emotional state is acquired only when the voice interaction system is used for the first time. However, for example, when a questioner who has used the voice interaction system in the past starts using it again, the voice interaction system may acquire the acoustic information for the questioner's normal emotional state anew.

図６は、質問者の主観的状態の推定処理の一例を示すフローチャートである。
Ｓ６０１において、主観状態推定部３３５は、質問者の感情状態を感情データベースに保存されているモデル感情データと比較するために、質問者の感情データとモデル感情データとの関係Ｆを取得する。モデル感情データとは、ラベルが示す感情の状態における複数の人の音声データから導出された平均的な特徴量である。本実施形態では、主観状態推定部３３５は、感情が平常な状態での感情データの特徴量に基づいて、関係Ｆを取得する。また、関係Ｆは、モデル感情データのある特徴量が質問者の感情データの同じ特徴量の何倍に相当するか、で算出される。例えば、感情状態が平常な状態でのモデル感情データの特徴量が（ａ１、ａ２、ａ３、…）であり、感情状態が平常な状態での質問者の感情データの特徴量が（ｂ１、ｂ２、ｂ３、…）であり、関係Ｆが（ｆ１、ｆ２、ｆ３、…）であるとする。その場合、ａ１＝ｆ１×ｂ１、ａ２＝ｆ２×ｂ２、…の関係を満たす。主観状態推定部３３５は、この関係から、関係Ｆを算出する。
Ｓ６０２において、主観状態推定部３３５は、Ｓ６０１で算出した関係Ｆを、ある感情状態の質問者の感情データの特徴量に適用する。即ち、主観状態推定部３３５は、ある感情状態の質問者の感情データの特徴量（Ｂ１、Ｂ２、Ｂ３、…）に対して、関係Ｆを適用する。その結果、適用後の特徴量は（Ｂ'１、Ｂ'２、Ｂ'３、…）（Ｂ'１＝ｆ１×Ｂ１、Ｂ'２＝ｆ２×Ｂ２、…）となる。これにより、ある感情状態の質問者の感情データの特徴量を感情データベースに保存されているモデル感情データと比較できるようになる。即ち、ある感情状態の質問者の感情データの特徴量に関係Ｆを適用した特徴量が、感情データベースに保存されているモデル感情データの特徴量と比較される特徴量となる。 FIG. 6 is a flowchart illustrating an example of the process of estimating the subjective state of the questioner.
In S601, the subjective state estimation unit 335 acquires a relationship F between the questioner's emotion data and the model emotion data, so that the questioner's emotional state can be compared with the model emotion data stored in the emotion database. The model emotion data are average feature amounts derived from voice data of a plurality of people in the emotional state indicated by the label. In the present embodiment, the subjective state estimation unit 335 acquires the relationship F based on the feature amounts of the emotion data for the normal emotional state. The relationship F is calculated as how many times larger each feature amount of the model emotion data is than the same feature amount of the questioner's emotion data. For example, suppose that the feature amounts of the model emotion data for the normal emotional state are (a1, a2, a3, ...), the feature amounts of the questioner's emotion data for the normal emotional state are (b1, b2, b3, ...), and the relationship F is (f1, f2, f3, ...). In that case, the relationships a1 = f1 × b1, a2 = f2 × b2, ... hold. The subjective state estimation unit 335 calculates the relationship F from these relationships.
In S602, the subjective state estimation unit 335 applies the relationship F calculated in S601 to the feature amounts of the questioner's emotion data in a certain emotional state. That is, the subjective state estimation unit 335 applies the relationship F to the feature amounts (B1, B2, B3, ...) of the questioner's emotion data in a certain emotional state. The feature amounts after application are (B′1, B′2, B′3, ...), where B′1 = f1 × B1, B′2 = f2 × B2, and so on. This makes it possible to compare the feature amounts of the questioner's emotion data in a certain emotional state with the model emotion data stored in the emotion database. In other words, the feature amounts obtained by applying the relationship F to the questioner's feature amounts are the ones compared with the feature amounts of the model emotion data stored in the emotion database.
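The per-feature scaling of S601 and S602 can be sketched as below. The function names are hypothetical, feature vectors are modeled as plain lists of floats, and the sketch assumes every normal-state feature of the questioner is non-zero so that the ratio f = a / b is defined.

```python
def compute_relation(model_normal, questioner_normal):
    """S601: f_i = a_i / b_i from the two normal-state feature vectors."""
    if len(model_normal) != len(questioner_normal):
        raise ValueError("feature vectors must have the same length")
    return [a / b for a, b in zip(model_normal, questioner_normal)]


def apply_relation(relation, features):
    """S602: B'_i = f_i * B_i for the questioner's current features."""
    if len(relation) != len(features):
        raise ValueError("feature vectors must have the same length")
    return [f * x for f, x in zip(relation, features)]


# Illustrative values only: two features per vector.
relation = compute_relation([6.0, 200.0], [4.0, 100.0])
scaled = apply_relation(relation, [6.0, 120.0])
```

With these illustrative numbers, F is (1.5, 2.0) and the scaled features are (9.0, 240.0), which places the questioner's current utterance on the same scale as the model data.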

Ｓ６０３において、主観状態推定部３３５は、感情データベースより感情データを１つ選択する。選択された感情データの特徴量を（Ａ１、Ａ２、Ａ３、…）とする。
Ｓ６０４において、主観状態推定部３３５は、Ｓ６０２で算出したある感情状態の質問者の特徴量に関係Ｆを適用した特徴量（Ｂ'１、Ｂ'２、Ｂ'３、…）と、Ｓ６０３で選択した感情データの特徴量（Ａ１、Ａ２、Ａ３、…）について類似度Ｍを算出する。類似度Ｍは、特徴量（Ｂ'１、Ｂ'２、Ｂ'３、…）を要素とするベクトルＢ'と特徴量（Ａ１、Ａ２、Ａ３、…）を要素とするベクトルＡを用いて、ベクトルＢ'とベクトルＡの内積からベクトルＡの絶対値の二乗を引いた値である。即ち、主観状態推定部３３５は、Ｍ＝｜Ｂ'・Ａ−｜Ａ｜²｜の式を用いて、類似度Ｍを算出する。
Ｓ６０５において、主観状態推定部３３５は、感情データベースに含まれる全ての感情データに対して、類似度Ｍを算出したか否かを判定する。主観状態推定部３３５は、全ての感情データに対して類似度Ｍを算出した場合は、特徴量の類似度比較処理を終了し、Ｓ６０７に進む。主観状態推定部３３５は、類似度Ｍを算出していない感情データがある場合は、Ｓ６０６の処理に進む。
Ｓ６０６において、主観状態推定部３３５は、感情データベースから、Ｓ６０３〜Ｓ６０４の処理が行われていない感情データを取得した後、取得した感情データについて、Ｓ６０３〜Ｓ６０４の処理を行う。
Ｓ６０７において、主観状態推定部３３５は、感情データベースに含まれる感情データのうち、Ｓ６０４で算出した類似度Ｍが最小となる感情データに対応する感情を示すラベルを、質問の際の質問者の主観的状態として推定する。 In S603, the subjective state estimation unit 335 selects one emotion data from the emotion database. Let the feature amount of the selected emotion data be (A1, A2, A3,...).
In S604, the subjective state estimation unit 335 calculates a similarity M between the feature amounts (B′1, B′2, B′3, ...) obtained in S602 by applying the relationship F to the questioner's feature amounts in a certain emotional state, and the feature amounts (A1, A2, A3, ...) of the emotion data selected in S603. Using the vector B′ whose elements are the feature amounts (B′1, B′2, B′3, ...) and the vector A whose elements are the feature amounts (A1, A2, A3, ...), the similarity M is the absolute value of the inner product of B′ and A minus the square of the magnitude of A. That is, the subjective state estimation unit 335 calculates the similarity M using the equation M = |B′·A - |A|²|.
In S605, the subjective state estimation unit 335 determines whether the similarity M has been calculated for all emotion data included in the emotion database. If the similarity M has been calculated for all emotion data, the subjective state estimation unit 335 ends the feature amount similarity comparison and proceeds to S607. If there is emotion data for which the similarity M has not been calculated, the subjective state estimation unit 335 proceeds to S606.
In S606, the subjective state estimation unit 335 acquires from the emotion database emotion data that has not yet undergone the processes of S603 to S604, and then performs the processes of S603 to S604 on the acquired emotion data.
In S607, the subjective state estimation unit 335 estimates, as the subjective state of the questioner at the time of the question, the label indicating the emotion corresponding to the emotion data for which the similarity M calculated in S604 is smallest among the emotion data included in the emotion database.
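The loop of S603 to S607 amounts to computing M for every entry in the emotion database and taking the label with the smallest value. The sketch below assumes the emotion database is a dictionary from label to model feature vector; the function names and sample values are illustrative only.

```python
def similarity(b_prime, a):
    """S604: M = |B'.A - |A|^2|, smaller means closer to the model data."""
    dot = sum(x * y for x, y in zip(b_prime, a))
    norm_sq = sum(y * y for y in a)
    return abs(dot - norm_sq)


def estimate_label(b_prime, emotion_db):
    """S603 to S607: return the label whose emotion data minimizes M."""
    if not emotion_db:
        raise ValueError("emotion database is empty")
    return min(emotion_db, key=lambda label: similarity(b_prime, emotion_db[label]))


# Illustrative values only.
emotion_db = {
    "normal": [1.0, 1.0],
    "impatient": [2.0, 1.0],
    "anger": [1.0, 3.0],
}
label = estimate_label([2.0, 1.0], emotion_db)
```

When B′ equals A, the inner product equals |A|² and M is zero, which is why the minimum is taken. Note that M is also zero for any B′ whose difference from A is orthogonal to A, so a small M does not by itself guarantee that the two vectors are close.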

図７は、回答決定処理の一例を示すフローチャートである。
Ｓ７０１において、回答決定部３３９は、回答データベースより、回答データを一つ選択する。
Ｓ７０２において、回答決定部３３９は、客観状態推定部３３４により取得された質問の際の質問者の客観的状態と、Ｓ７０１で選択した回答データに対応する客観的状態とを比較し、客観的状態が類似であるか否かを判定する。回答決定部３３９は、類似であると判定した場合、Ｓ７０３に進み、類似でないと判定した場合、Ｓ７０９に進む。
Ｓ７０２における客観状態推定部３３４により取得された質問の際の質問者の客観的状態と、Ｓ７０１で選択した回答データに対応する客観的状態と、の類似か否かの判定処理について説明する。
例えば、客観的状態として、カメラ２０に接続される付属品（三脚、望遠レンズ等）の情報が利用される場合について説明する。本実施形態では、カメラ２０に接続される付属品は、三脚と望遠レンズとの２つであるとする。客観状態推定部３３４は、カメラ２０からカメラ２０に接続されている付属品の情報を、質問者の客観的情報として取得する。客観状態推定部３３４は、例えば、カメラ２０から客観的情報として、三脚が接続されていることを示す情報を取得する。その場合、客観状態推定部３３４は、三脚が接続されていることを示す客観的情報に基づいて、質問者の客観的状態を、三脚を所持している状態として推定する。 FIG. 7 is a flowchart illustrating an example of an answer determination process.
In S701, the answer determination unit 339 selects one piece of answer data from the answer database.
In S702, the answer determination unit 339 compares the objective state of the questioner at the time of the question, acquired by the objective state estimation unit 334, with the objective state corresponding to the answer data selected in S701, and determines whether the objective states are similar. If the answer determination unit 339 determines that they are similar, the process proceeds to S703. If it determines that they are not similar, the process proceeds to S709.
A process for determining whether or not the objective state of the questioner at the time of the question acquired by the objective state estimation unit 334 in S702 is similar to the objective state corresponding to the answer data selected in S701 will be described.
For example, a case where information on accessories (tripod, telephoto lens, etc.) connected to the camera 20 is used as an objective state will be described. In the present embodiment, it is assumed that there are two accessories connected to the camera 20, that is, a tripod and a telephoto lens. The objective state estimation unit 334 acquires information on accessories connected to the camera 20 from the camera 20 as objective information of the questioner. The objective state estimation unit 334 acquires information indicating that a tripod is connected as objective information from the camera 20, for example. In this case, the objective state estimation unit 334 estimates the objective state of the questioner as a state of possessing the tripod based on objective information indicating that the tripod is connected.

As another example, when the objective state estimation unit 334 acquires from the camera 20, as objective information, information indicating that a tripod is connected and information indicating that a telephoto lens is connected, it estimates the objective state of the questioner as a state of possessing a tripod and a telephoto lens. When the objective state estimation unit 334 acquires from the camera 20, as objective information, information indicating that the tripod is not connected and information indicating that the telephoto lens is not connected, it estimates the objective state of the questioner as a state of possessing nothing.
The answer determination unit 339 determines whether two objective states are similar by comparing, for example, the objective information corresponding to the questioner's objective state at the time of the question, as estimated by the objective state estimation unit 334, with the objective information corresponding to the objective state associated with the answer data acquired in S701. For example, the answer determination unit 339 determines that the two objective states are similar when at least half of the items of objective information corresponding to them are common.
In the present embodiment, the answer determination unit 339 compares the accessory possession state, which is the objective information corresponding to the objective state estimated by the objective state estimation unit 334, with the accessory possession state indicated by the objective state associated with the answer data acquired in S701.
For example, when the objective state estimated by the objective state estimation unit 334 is a state of possessing no accessories, the objective information corresponding to the questioner's objective state at the time of the question is information indicating that the tripod is not connected and information indicating that the telephoto lens is not connected. When the objective state associated with the answer data acquired in S701 is also a state of possessing no accessories, the corresponding objective information is likewise information indicating that the tripod is not connected and information indicating that the telephoto lens is not connected. In that case, the answer determination unit 339 determines that the two objective states are similar because all items of the corresponding objective information are common.

As another example, when the objective state estimated by the objective state estimation unit 334 is a state of possessing no accessories, the objective information corresponding to the questioner's objective state at the time of the question is information indicating that the tripod is not connected and information indicating that the telephoto lens is not connected. When the objective state associated with the answer data acquired in S701 is a state of possessing a tripod and a telephoto lens, the corresponding objective information is information indicating that the tripod is connected and information indicating that the telephoto lens is connected. In that case, the answer determination unit 339 determines that the two objective states are not similar because none of the corresponding objective information is common.
As a further example, when the objective state estimated by the objective state estimation unit 334 is a state of possessing a tripod, the objective information corresponding to the questioner's objective state at the time of the question is information indicating that the tripod is connected and information indicating that the telephoto lens is not connected. When the objective state associated with the answer data acquired in S701 is a state of possessing a tripod and a telephoto lens, the corresponding objective information is information indicating that the tripod is connected and information indicating that the telephoto lens is connected. In that case, the information indicating that the tripod is connected is common, so one of the two items of objective information, which is at least half, is common, and the answer determination unit 339 determines that the two objective states are similar.
In the present embodiment, the voice interaction system uses, as objective information, information on whether each accessory is connected to the camera 20, but other information may be used. For example, the voice interaction system may use, as objective information of the questioner, information indicating whether the sensor value from the illuminance sensor of the camera 20 is greater than or equal to a set threshold.
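The "at least half in common" rule illustrated by the three accessory examples above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `objective_states_similar` and the representation of objective information as a dictionary of item name to boolean are hypothetical and do not appear in the embodiment.

```python
def objective_states_similar(asked: dict, stored: dict) -> bool:
    """Return True when at least half of the objective-information items agree.

    Each dict maps an item name (e.g. "tripod") to whether it is connected.
    The dict representation is an assumption made for this sketch.
    """
    items = asked.keys() | stored.keys()
    # An item is "common" when both states report the same value for it.
    common = sum(
        1 for k in items
        if k in asked and k in stored and asked[k] == stored[k]
    )
    # "Half or more": 2 * common >= total number of items.
    return 2 * common >= len(items)


nothing = {"tripod": False, "telephoto": False}
tripod_only = {"tripod": True, "telephoto": False}
both = {"tripod": True, "telephoto": True}

print(objective_states_similar(nothing, nothing))    # all items common
print(objective_states_similar(nothing, both))       # no items common
print(objective_states_similar(tripod_only, both))   # one of two items common
```

Other boolean objective information, such as whether an illuminance value is at or above a threshold, could be added as further dictionary entries without changing the comparison.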

In S703, the answer determination unit 339 determines whether the subjective state associated with the answer data selected in S701 is the same as the questioner's subjective state at the time of the question, as estimated by the subjective state estimation unit 335. For example, when the subjective state estimation unit 335 estimates the questioner's subjective state at the time of the question to be an “in a hurry” state, the answer determination unit 339 determines whether the subjective state associated with the answer data selected in S701 is the “in a hurry” state. If the answer determination unit 339 determines that the two subjective states are the same, the process proceeds to S704; otherwise, the process proceeds to S709.
In S704, the answer determination unit 339 determines whether the satisfaction level p associated with the answer data selected in S701 is greater than or equal to a set threshold (hereinafter, p_bound). The display condition setting unit 317 determines the threshold p_bound and instructs the server 30 to store the determined threshold p_bound in the storage device 34. For example, the display condition setting unit 317 displays, on the display unit 12, an input screen for accepting input of the value of the threshold p_bound. The display condition setting unit 317 can then determine the value of the threshold p_bound based on the questioner's input on the input screen.
If the answer determination unit 339 determines that the satisfaction level p associated with the answer data selected in S701 is greater than or equal to the threshold p_bound, the process proceeds to S705. If it determines that the satisfaction level p is less than the threshold p_bound, the process proceeds to S709.

In S705, the answer determination unit 339 determines whether the number of pieces of answer data determined in S708 as answers to the question from the questioner is less than the set answer upper limit L. The display condition setting unit 317 determines the answer upper limit L and instructs the server 30 to store the determined answer upper limit L in the storage device 34. For example, the display condition setting unit 317 displays, on the display unit 12, an input screen for accepting input of the answer upper limit L. The display condition setting unit 317 can then determine the value of the answer upper limit L based on the questioner's input on the input screen.
If the answer determination unit 339 determines that the number of pieces of answer data determined in S708 as answers to the question is less than the set answer upper limit L, the process proceeds to S708. If it determines that the number is greater than or equal to the set answer upper limit L, the process proceeds to S706.
In S706, the answer determination unit 339 determines whether the satisfaction level p associated with the answer data selected in S701 is greater than pmin, the minimum of the satisfaction levels associated with the answer data determined in S708 as answers to the question. If the answer determination unit 339 determines that p is greater than pmin, the process proceeds to S707. If it determines that p is less than or equal to pmin, the process proceeds to S709.

In S707, the answer determination unit 339 excludes, from the answers to the questioner's question, the answer data whose associated satisfaction level is pmin among the answer data determined in S708 as answers to the question. If a plurality of pieces of answer data determined in S708 have an associated satisfaction level of pmin, the answer determination unit 339 excludes all of them from the answers to the questioner's question. The answer determination unit 339 can thereby reduce the possibility of presenting an excessive number of answers to the questioner.
In S708, the answer determination unit 339 determines the answer data selected in S701 as an answer to the question from the questioner.
In S709, the answer determination unit 339 determines whether the processing of S702 to S708 has been performed on all the answer data in the answer database. If the answer determination unit 339 determines that the processing of S702 to S708 has been performed on all the answer data in the answer database, the process proceeds to S711. If it determines that the answer database contains answer data on which the processing of S702 to S708 has not been performed, then in S710 it refers to the answer candidates in the answer database excluding the answer data processed in the current iteration, and the process returns to S701. In S701, one piece of answer data is again selected from the answer candidates.
In S711, the answer determination unit 339 determines whether any answer data has been determined in S708 as an answer to the question from the questioner. If the answer determination unit 339 determines that such answer data exists, the process of FIG. 7 ends. If it determines that no such answer data exists, the process proceeds to S712.
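The candidate loop of S701 to S710 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: `Answer`, `similar`, and `select_answers` are hypothetical names, S702 is assumed to be the objective-state similarity check described earlier (with a non-similar candidate skipped), and the answer upper limit L is assumed to be at least 1. An empty result corresponds to the "no answer data exists" branch of S711.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    subjective: str      # subjective state stored with the answer data
    objective: dict      # objective information, e.g. {"tripod": True}
    satisfaction: float  # satisfaction level p


def similar(a: dict, b: dict) -> bool:
    """Assumed S702 check: at least half of the items agree."""
    items = a.keys() | b.keys()
    common = sum(1 for k in items if k in a and k in b and a[k] == b[k])
    return 2 * common >= len(items)


def select_answers(db, asked_subjective, asked_objective, p_bound, limit):
    if limit < 1:
        raise ValueError("answer upper limit L must be at least 1")
    chosen = []
    for cand in db:                                        # S701, S709, S710
        if not similar(asked_objective, cand.objective):   # S702 (assumed)
            continue
        if cand.subjective != asked_subjective:            # S703
            continue
        if cand.satisfaction < p_bound:                    # S704
            continue
        if len(chosen) >= limit:                           # S705
            p_min = min(a.satisfaction for a in chosen)
            if cand.satisfaction <= p_min:                 # S706
                continue
            # S707: drop every chosen answer whose satisfaction equals p_min.
            chosen = [a for a in chosen if a.satisfaction != p_min]
        chosen.append(cand)                                # S708
    return chosen
```

Because S707 removes every answer tied at pmin, the number of selected answers can drop below L after a replacement; the sketch keeps that behavior as described.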

In S712, the answer determination unit 339 determines whether the answer data included in the answer database contains any answer data whose associated subjective state is the same as the questioner's subjective state at the time of the question. If the answer determination unit 339 determines that such answer data exists, the process proceeds to S713. If it determines that no such answer data exists, the process proceeds to S714.
In S713, the answer determination unit 339 determines the answer data to be used as answers to the questioner's question from among the answer data in the answer database whose associated subjective state was determined in S712 to be the same as the questioner's subjective state at the time of the question. For example, from that answer data, the answer determination unit 339 takes the answer data whose associated satisfaction level is p_bound or higher and determines the L pieces with the highest satisfaction levels as the answers to the questioner's question.

In S714, the answer determination unit 339 determines whether the answer data in the answer database contains any answer data whose associated objective state is similar to the questioner's objective state at the time of the question. If the answer determination unit 339 determines that such answer data exists, the process proceeds to S715. If it determines that no such answer data exists, the process proceeds to S716.
In S715, the answer determination unit 339 determines the answer data to be used as answers to the questioner's question from among the answer data in the answer database whose associated objective state was determined in S714 to be similar to the questioner's objective state at the time of the question. For example, from that answer data, the answer determination unit 339 takes the answer data whose associated satisfaction level is p_bound or higher and determines the L pieces with the highest satisfaction levels as the answers to the questioner's question.
In S716, the answer determination unit 339 takes the answer data in the answer database whose associated satisfaction level is p_bound or higher and determines the L pieces with the highest satisfaction levels as the answers to the questioner's question.
In the present embodiment, the answer determination unit 339 determines the answers by the process of FIG. 7, but the method is not limited to this. For example, when the answer determination unit 339 determines in S711 that no answer data has been determined as an answer, it may, instead of performing the processing of S712 to S716, instruct the information terminal 10 to display on the display unit 12 information indicating that there is no appropriate answer.
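The fallback order of S712 to S716 can be sketched as follows, purely as an illustration under stated assumptions. The function name `fallback_answers`, the dictionary keys `"subjective"`, `"objective"`, and `"p"`, and the `similar` callback are all hypothetical; only the order of the checks and the "p_bound or higher, top L by satisfaction" rule come from the description.

```python
def fallback_answers(db, asked_subjective, asked_objective,
                     p_bound, limit, similar):
    """Pick answers when the main loop of FIG. 7 selected none.

    db is a list of dicts with the (assumed) keys
    "subjective", "objective", and "p" (satisfaction level).
    similar is a callable implementing the objective-state comparison.
    """
    def top(candidates):
        # Keep answers with p >= p_bound, highest satisfaction first, at most L.
        eligible = [a for a in candidates if a["p"] >= p_bound]
        return sorted(eligible, key=lambda a: a["p"], reverse=True)[:limit]

    same_subjective = [a for a in db
                       if a["subjective"] == asked_subjective]      # S712
    if same_subjective:
        return top(same_subjective)                                 # S713

    similar_objective = [a for a in db
                         if similar(asked_objective, a["objective"])]  # S714
    if similar_objective:
        return top(similar_objective)                               # S715

    return top(db)                                                  # S716
```

Note that, as in the description, S713 and S715 can still yield no answers when every matching piece of answer data has a satisfaction level below p_bound.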

FIG. 8 is a flowchart illustrating an example of the update processing of the answer database.
In S801, the operation content acquisition unit 315 determines whether the questioner has entered an evaluation of whether the presented answer to the question was satisfactory. For example, the operation content acquisition unit 315 displays, on the display unit 12 via the content presentation unit 313, the answer to the question together with buttons or the like for selecting whether the answer is satisfactory or unsatisfactory, and prompts the questioner for input. The operation content acquisition unit 315 can acquire information indicating whether the questioner was satisfied with the answer to the question based on the questioner's operation of the buttons or the like displayed on the display unit 12. If the operation content acquisition unit 315 determines that the questioner has entered such an evaluation, the process proceeds to S802. If it determines that the questioner has not entered such an evaluation, the process of FIG. 8 ends.
In S802, the operation content acquisition unit 315 determines whether the questioner was satisfied with the presented answer. The operation content acquisition unit 315 acquires information indicating whether the questioner was satisfied with the answer, by the method described for S801, based on the questioner's operation of the buttons or the like displayed on the display unit 12. When the acquired information indicates satisfaction with the answer, the operation content acquisition unit 315 determines that the questioner was satisfied with the answer; when the acquired information indicates dissatisfaction with the answer, it determines that the questioner was not satisfied with the answer. If the operation content acquisition unit 315 determines that the questioner was satisfied with the presented answer, the process proceeds to S803. If it determines that the questioner was not satisfied with the presented answer, the process of FIG. 8 ends.

In S803, the answer information storage unit 336 updates the answer database using the information indicating that the questioner was satisfied with the presented answer. For example, the answer information storage unit 336 updates the satisfaction level associated with the presented answer data in the answer database. The answer information storage unit 336 calculates the satisfaction level of a piece of answer data as, for example, the ratio of the number of evaluations indicating satisfaction to the total number of satisfied or unsatisfied evaluations given for that answer data. For example, when information indicating satisfaction or dissatisfaction is input by the questioner after an answer has been presented, the voice interaction system stores the input information as history information in the storage device 34 or the like. The answer information storage unit 336 acquires from the storage device 34 or the like, for example, the history information indicating satisfaction or dissatisfaction input by questioners for every occasion on which a given piece of answer data has been presented so far. The answer information storage unit 336 then calculates, as the satisfaction level, the number of evaluations in which the questioner indicated satisfaction divided by the total number of evaluations of the answer data, whether satisfied or unsatisfied. The answer information storage unit 336 then updates the satisfaction level associated with that answer data in the answer database with the calculated value.
In addition, the voice interaction system may present another answer when the questioner inputs information indicating that the answer is unsatisfactory.
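The satisfaction-level calculation described for S803 can be sketched as follows. This is a minimal illustration: the function name `satisfaction_level` and the representation of the history as a list of booleans are assumptions, as is returning `None` when no evaluation exists yet (the description does not state what happens in that case).

```python
from typing import List, Optional


def satisfaction_level(history: List[bool]) -> Optional[float]:
    """Ratio of "satisfied" evaluations to all evaluations of one answer.

    history holds one entry per recorded evaluation:
    True for satisfied, False for unsatisfied.
    """
    if not history:
        # Assumption: no evaluations yet means no satisfaction level.
        return None
    return sum(history) / len(history)


# Three of four questioners were satisfied with this answer data.
p = satisfaction_level([True, True, False, True])
print(p)
```

The resulting value is what would be compared with the threshold p_bound in the answer determination process.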

FIG. 9 is a diagram illustrating examples of interface screens used in the processing from the setting of display conditions and the input of question utterance information through the presentation of answers. FIG. 9A is a diagram illustrating an example of an interface screen used for setting various conditions such as display conditions. FIG. 9B is a diagram illustrating an example of an interface screen used for inputting question utterance information. FIG. 9C is a diagram illustrating an example of an interface screen used for displaying answers to a question.
In the present embodiment, the display operation unit 314 displays the interface screen of FIG. 9A on the display unit 12 in S401, but it may display another screen that provides the same functions as the interface screen of FIG. 9A.
A window 901 is the window of the interface screen of FIG. 9A. In the present embodiment, the display condition setting unit 317 performs the setting of various conditions, the input of question utterance information, the display of answer results, and the like via the window 901.
A space 902 is a space in which items for setting display conditions are arranged. For example, the questioner can enter, in the input box of the space 902, the number of answers to be displayed at one time on the screen that displays answers to a question. In the present embodiment, the number of answers is set as a number, so the questioner enters the desired numerical value in the input box of the space 902. The answer determination unit 339 uses the value entered in the space 902 as, for example, the value of the answer upper limit L in the process of FIG. 7.

A space 903 is a space in which items concerning the determination conditions used for determining answers to a question are arranged. For example, the space 903 includes a slide bar 904 used for inputting the satisfaction threshold p_bound used in the process of evaluating answer candidates. The space 903 also includes, for example, radio buttons 905 used for deciding which of the subjective state and the objective state is given higher priority in the process of evaluating answer candidates. The answer determination unit 339 determines the answers to the question from the questioner based on the determination conditions entered in the space 903.
The slide bar 904 is a slide bar used for determining the satisfaction threshold p_bound. The threshold can be changed by moving the knob to the left or right. The display condition setting unit 317 determines the value of the threshold p_bound according to the position of the knob on the slide bar 904.

The radio buttons 905 are used for selecting which of the subjective state and the objective state is prioritized in the process of evaluating answer candidates. For example, when prioritizing the subjective state is selected via the radio buttons 905, the voice interaction system may compare whether the questioner's subjective state at the time of the question is the same as the subjective state associated with the answer data in the answer database, and may omit the comparison of objective states. Likewise, when prioritizing the objective state is selected via the radio buttons 905, the voice interaction system may compare whether the questioner's objective state at the time of the question is similar to the objective state associated with the answer data in the answer database, and may omit the comparison of subjective states.
The voice interaction system may also include, for example, check boxes corresponding to the objective state and the subjective state instead of the radio buttons 905. In that case, for each state whose check box is checked, the voice interaction system may compare the questioner's state at the time of the question with the state associated with the answer data.
Buttons 906 are an “OK” button and a “Cancel” button used for selecting whether to apply the conditions entered in the spaces 902 and 903. When the display condition setting unit 317 detects selection of the “OK” button, it confirms the application of the conditions entered in the spaces 902 and 903. When the display condition setting unit 317 detects selection of the “Cancel” button, it discards the conditions entered in the spaces 902 and 903.
The display operation unit 314 transmits the condition information entered in the space 902 and the space 903 to the server 30. This transmission process is an example of a condition transmission process.
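The effect of the radio buttons 905, and of the check-box variant, on candidate evaluation can be sketched as follows. This is only an illustration under stated assumptions: `candidate_matches`, the `compare` set, the dictionary keys, and the `similar` callback are hypothetical names. A radio-button selection corresponds to a one-element set, and the check-box variant to any subset of the two states.

```python
def candidate_matches(asked, candidate, compare, similar):
    """Compare only the states named in `compare`.

    asked / candidate: dicts with (assumed) keys "subjective" and "objective".
    compare: set containing "subjective", "objective", or both.
    similar: callable implementing the objective-state similarity check.
    """
    if "subjective" in compare:
        # Subjective states must be identical.
        if asked["subjective"] != candidate["subjective"]:
            return False
    if "objective" in compare:
        # Objective states only need to be similar.
        if not similar(asked["objective"], candidate["objective"]):
            return False
    return True


asked = {"subjective": "in a hurry", "objective": {"tripod": True}}
candidate = {"subjective": "calm", "objective": {"tripod": True}}
exact = lambda a, b: a == b  # stand-in similarity check for this example

# Objective state prioritized: the subjective mismatch is ignored.
print(candidate_matches(asked, candidate, {"objective"}, exact))
# Subjective state prioritized: the candidate is rejected.
print(candidate_matches(asked, candidate, {"subjective"}, exact))
```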

図９（ｂ）は、質問者による質問発話情報の入力に利用されるインターフェース画面の一例を示す図である。表示操作部３１４は、Ｓ４０３で、表示部１２に図９（ｂ）のインターフェース画面を表示することとするが、図９（ｂ）のインターフェース画面と同様の機能を含む他の画面を表示してもよい。
スペース９０７は、質問文を入力する方法等を案内する情報が表示されるスペースである。質問者は、スペース９０７に表示された文を参考に質問を発声することができる。表示条件設定部３１７は、スペース９０７に表示する文の数や内容を、状況に応じて変更してもよい。
ボタン９０８は、質問者による質問発話情報の入力を開始する際に、選択されるボタンである。質問者は、ボタン９０８を選択後にマイク１７に向かって、質問を発話する。
FIG. 9B is a diagram illustrating an example of an interface screen used by the questioner to input question utterance information. In S403, the display operation unit 314 displays the interface screen of FIG. 9B on the display unit 12, but it may instead display another screen having the same functions as the interface screen of FIG. 9B.
A space 907 is a space for displaying guidance on, for example, how to input a question sentence. The questioner can utter a question with reference to the sentences displayed in the space 907. The display condition setting unit 317 may change the number and content of the sentences displayed in the space 907 according to the situation.
A button 908 is selected when the questioner starts inputting question utterance information. After selecting the button 908, the questioner speaks the question toward the microphone 17.
The display operation unit 314 may display the button 908 in different colors before and after it is selected, to indicate that question utterance information can be input. Furthermore, the voice acquisition unit 312 may detect that input of the question utterance information has ended by detecting selection of the button 908 when the questioner has finished uttering the question. The voice recognition unit 332 may also detect voice sections and end the input of question utterance information when no utterance is input from the questioner for a certain period. When the input has ended, the display operation unit 314 may return the button 908 to the color it had before input started.
In the present embodiment, the voice acquisition unit 312 controls the input of question utterance information by means of the button 908, but the method is not limited to this. For example, the voice acquisition unit 312 may end the input of question utterance information at the moment the questioner's utterance breaks off.
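Ending the input when no utterance arrives for a certain period can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the function name, the per-frame energy representation, the threshold, and the frame count are all hypothetical, and the sketch assumes that the timeout only starts counting after speech has begun.

```python
from typing import Optional, Sequence


def find_input_end(frame_energies: Sequence[float],
                   energy_threshold: float = 0.01,
                   max_silent_frames: int = 30) -> Optional[int]:
    """Return the index of the frame at which input should end, or None.

    Hypothetical rule: input ends once speech has started and
    `max_silent_frames` consecutive frames fall below `energy_threshold`.
    """
    speech_started = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # A voiced frame: speech is ongoing, so reset the silence counter.
            speech_started = True
            silent_run = 0
        elif speech_started:
            # A silent frame after speech began: count towards the timeout.
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None
```

Leading silence before the first voiced frame is ignored in this sketch, so a questioner who pauses after selecting the button 908 is not cut off immediately.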

FIG. 9C is a diagram illustrating an example of an interface screen used to display answers to a question from the questioner. In S408, the content presentation unit 313 displays the interface screen of FIG. 9C on the display unit 12, but it may instead display another screen having the same functions as the interface screen of FIG. 9C.
The button 908 is also included in the interface screen of FIG. 9C. When the voice acquisition unit 312 detects selection of the button 908 while the interface screen of FIG. 9C is displayed, it can accept input of question utterance information from the questioner even though the interface screen of FIG. 9B is not displayed.
A space 909 is a space in which the content of the question from the questioner is displayed. By looking at the content of the space 909, the questioner can confirm whether the intended question was input to the voice interaction system.
A list 910 is a list of summaries of the answers presented to the questioner. In the present embodiment, the list 910 is in table format and includes information on what value each parameter of the camera 20 should be set to. The list 910 displays summary information of the answer data that the answer determination unit 339 determined as answers to the questioner's question, ordered from the top in descending order of the corresponding satisfaction level. The list 910 may present the summaries in a format other than a table, for example as a bulleted list.

A space 911 is a space in which the details of one item of answer data are displayed. The space 911 includes, for example, text describing the operation procedure corresponding to the answer data and an explanation, using photographs or drawings of the operation unit 11, of where and how to operate. The space 911 may also include a sample photograph or the like showing what kind of photograph can be taken by performing the operation corresponding to the displayed answer data. In the present embodiment, the detailed display of the answer in the space 911 includes an operation procedure, sample photographs, and the like, but it may additionally include, for example, advice on shooting with the camera 20.
The content presentation unit 313 detects selection of one of the answers displayed in the list 910 based on, for example, a touch operation by the questioner via the operation unit 11. The content presentation unit 313 may then acquire detailed information on the selected answer from the answer database and display it in the space 911.

A button 912 is a button with which the questioner can indicate satisfaction or dissatisfaction with the answer data displayed in the space 911. The content presentation unit 313 may display the button 912 at the stage of presenting the answer data to the questioner in S408. Alternatively, the content presentation unit 313 may withhold the button 912 at that stage and display it after the questioner has performed the operation corresponding to the answer data. For example, suppose the questioner changes the parameters of the camera 20 in accordance with the operation procedure displayed in the space 911. The content presentation unit 313 may display the button 912 upon acquiring, from the camera 20, information indicating that the parameters have been changed in accordance with the answer displayed in the space 911.
The answer determination unit 339 may also transmit to the camera 20 the parameter information of the camera 20 indicated by the answer data determined as the answer to the question, and instruct the camera 20 to apply the transmitted parameters. This allows the camera 20 to apply the parameters indicated by the answer without the questioner performing the operation indicated by the answer.
The answer information storage unit 336 updates the satisfaction level corresponding to the answer data in the answer database based on the questioner's selection on the button 912. This increases the likelihood that the voice interaction system returns the answer the questioner wants in subsequent questions.
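One way to keep a satisfaction level up to date from button 912 selections is sketched below. The formula is an assumption: this passage does not state how the satisfaction level is computed, so the sketch simply uses the fraction of "satisfied" responses. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedbackTally:
    """Hypothetical per-answer record of button 912 selections."""
    satisfied: int = 0
    dissatisfied: int = 0

    def record(self, is_satisfied: bool) -> None:
        # Count one response from the questioner.
        if is_satisfied:
            self.satisfied += 1
        else:
            self.dissatisfied += 1

    def satisfaction(self) -> float:
        # Assumed definition: share of satisfied responses, 0.0 if none yet.
        total = self.satisfied + self.dissatisfied
        return self.satisfied / total if total else 0.0
```

A stored value of this kind could then be compared against the threshold used when answers are determined.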

FIG. 10 is a diagram illustrating examples of the answers presented under various conditions.
In the example of FIG. 10, assume that there are five answer candidates. Assume also that there are four conditions (use conditions [1] to [4] of the state estimation result) governing whether the processing that compares the questioner's state (subjective state, objective state) with the state (subjective state, objective state) corresponding to the answer data (the processing of S702 and S704) is performed. Further assume that there are three tasks, A to C, each consisting of question utterance information. The five answer candidates are as shown in FIG. 10A. The three tasks and the four conditions are as follows.
(Three tasks)
Task A: Question utterance text = "How to photograph a footrace at an athletic meet"
Objective state = has neither a tripod nor a telephoto lens
Subjective state = impatience
Task B: Question utterance text = "How to photograph a footrace at an athletic meet"
Objective state = has neither a tripod nor a telephoto lens
Subjective state = calm
Task C: Question utterance text = "How to photograph a footrace at an athletic meet"
Objective state = has both a tripod and a telephoto lens
Subjective state = calm
(Four conditions)
Use condition [1] of the state estimation result: objective state not used, subjective state not used (the determination processing of S702 and S703 is not performed)
Use condition [2] of the state estimation result: objective state used, subjective state not used (the determination processing of S703 is not performed)
Use condition [3] of the state estimation result: objective state not used, subjective state used (the determination processing of S702 is not performed)
Use condition [4] of the state estimation result: objective state used, subjective state used (the processing of S702 and S704 is performed)

FIG. 10A is a diagram illustrating an example of answer candidates. All of the answer candidates shown in FIG. 10A are answer data evaluated in S407 as highly likely to be what the questioner wants. Each item of answer data contains two broad kinds of information. One is the information that answers the question; in the example of FIG. 10A, this is the "camera settings", namely the mode and the shutter speed. The other is information used to determine whether the answer data matches the questioner's request; in the example of FIG. 10A, this is the objective state (whether a tripod is held, whether a telephoto lens is held), the subjective state (impatience, calm, and so on), and the questioner's satisfaction level with the answer. The subjective state corresponding to an item of answer data may be the most frequent subjective state among the questioners for whom that answer data has been adopted so far.
FIG. 10B is a diagram illustrating, for each task, the answer selected under each condition as the one the questioner most wants. Tasks A to C differ in objective state and subjective state, so different answers are expected accordingly. The letters in the table of FIG. 10B indicate the answer data judged optimal for that task under the specified condition.
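The option of labelling answer data with the most frequent subjective state among past adoptions can be sketched as follows. The function name and the representation of states as strings are assumptions made for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence


def representative_subjective_state(history: Sequence[str]) -> Optional[str]:
    """Return the most frequent subjective state in `history`, or None.

    `history` is a hypothetical log of the subjective states estimated
    each time this answer data was adopted.
    """
    if not history:
        return None
    # most_common(1) yields a one-element list of (state, count) pairs.
    return Counter(history).most_common(1)[0][0]
```

When two states are tied, `Counter.most_common` returns the one that was encountered first, so a real system would need its own tie-breaking policy.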

The results shown in FIG. 10B are examined below for each use condition of the state estimation result.
Under use condition [1], the same answer (E) is judged optimal for all tasks. This is because only the text information of the question utterance information is used; neither the subjective state estimated from acoustic information nor the objective state estimated from sensor information and the like is used. This result is obtained when the answer determination unit 339 determines the answer to the question by performing, in S704, the threshold determination on the satisfaction level corresponding to the answer data. Under use condition [1], the voice interaction system can therefore present only the same answer every time, even when the questioner is in different states, and the questioner cannot obtain the optimal answer.
Under use condition [2], the answer for each task changes according to the camera accessories held (answer D for tasks A and B, answer E for task C). This result is obtained when the answer determination unit 339 determines the answer by determining in S702 whether the objective states are similar and performing in S704 the threshold determination on the satisfaction level corresponding to the answer data. As a result, answers that call for operating a camera accessory the questioner does not have are no longer presented, and, compared with use condition [1], the voice interaction system can present answers that fit the questioner's objective state. However, under use condition [2] the voice interaction system does not use the subjective state obtainable from the acoustic information of the question utterance information. Consequently, even when the questioner wants an answer that can be set up quickly, the system may present an answer describing an operation that takes longer to set up.

Under use condition [3], tasks A and B, which could not be distinguished under use conditions [1] and [2], receive different answers according to the questioner's subjective state estimated from the acoustic information. As a result, when the questioner wants an answer that can be set up quickly, for example, an answer that takes longer to set up is no longer presented. Compared with use condition [1], the voice interaction system thus presents answers that fit the questioner's subjective state. However, because the questioner's objective state obtainable from sensor information and the like is not used, the system may present an answer describing operations involving a tripod or a telephoto lens even when the questioner has neither.
Under use condition [4], the answers differ for all tasks. This is because the voice interaction system uses both the questioner's subjective state and the questioner's objective state, such as whether camera accessories are held. As under use condition [3], the system presents different answers for tasks A and B. In addition, as under use condition [2], it can present answers that match the camera accessories the questioner holds. Under use condition [4], the voice interaction system can therefore present a more appropriate answer according to both the objective state and the subjective state of the questioner. In other words, the voice interaction system of the present embodiment provides a technique for realizing use condition [4] of the state estimation result.
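The comparison of use conditions [1] to [4] can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the candidate table, the satisfaction values, the 0.5 threshold, and all names are hypothetical (the contents of FIG. 10A are not reproduced in the text), and a "similar" objective state is assumed to mean that more than half of the objective items match. The values were chosen so that conditions [1] and [2] reproduce the results stated above (E for every task under [1]; D, D, E under [2]).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    objective: Dict[str, bool]   # accessory name -> whether it is held
    subjective: str              # e.g. "impatient" or "calm"
    satisfaction: float


def objective_similar(a: Dict[str, bool], b: Dict[str, bool]) -> bool:
    # Assumed similarity rule: more than half of the items agree.
    matches = sum(1 for key, value in a.items() if b.get(key) == value)
    return matches * 2 > len(a)


def select_answers(candidates: List[Candidate], objective: Dict[str, bool],
                   subjective: str, use_objective: bool, use_subjective: bool,
                   threshold: float = 0.5) -> List[Candidate]:
    selected = []
    for c in candidates:
        if use_objective and not objective_similar(c.objective, objective):
            continue  # objective-state comparison
        if use_subjective and c.subjective != subjective:
            continue  # subjective-state comparison
        if c.satisfaction < threshold:
            continue  # satisfaction threshold
        selected.append(c)
    # Highest satisfaction first, as in the list 910.
    return sorted(selected, key=lambda c: c.satisfaction, reverse=True)


NONE = {"tripod": False, "telephoto": False}
BOTH = {"tripod": True, "telephoto": True}

# Hypothetical stand-in for the five candidates of FIG. 10A.
CANDIDATES = [
    Candidate("A", BOTH, "impatient", 0.80),
    Candidate("B", {"tripod": True, "telephoto": False}, "calm", 0.60),
    Candidate("C", NONE, "impatient", 0.70),
    Candidate("D", NONE, "calm", 0.85),
    Candidate("E", BOTH, "calm", 0.90),
]
TASKS = {"A": (NONE, "impatient"), "B": (NONE, "calm"), "C": (BOTH, "calm")}
# condition number -> (use objective state, use subjective state)
CONDITIONS = {1: (False, False), 2: (True, False), 3: (False, True), 4: (True, True)}


def best(task: str, condition: int) -> str:
    objective, subjective = TASKS[task]
    use_objective, use_subjective = CONDITIONS[condition]
    ranked = select_answers(CANDIDATES, objective, subjective,
                            use_objective, use_subjective)
    return ranked[0].name
```

With this hypothetical table, condition [3] gives task A a candidate that assumes a tripod and a telephoto lens, which is the failure mode described for [3], while condition [4] returns a different candidate for each of the three tasks.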

As described above, in the present embodiment the voice interaction system accepts a question from the questioner by voice and, based on the accepted question utterance information, estimates the subjective state of the questioner at the time the question was uttered. The voice interaction system then determines, as the answer to the question, answer data whose corresponding subjective state is the same as that subjective state. The voice interaction system can thereby determine and present a more appropriate answer according to the questioner's subjective state.
The voice interaction system can also determine, as the answer to the question, answer data whose corresponding subjective state is the same as the questioner's subjective state at the time of the question and whose corresponding objective state is similar to the questioner's objective state at that time. The voice interaction system can thereby determine and present a more appropriate answer according to both the subjective state and the objective state of the questioner.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
Although a preferred embodiment of the present invention has been described in detail above, the present invention is not limited to that specific embodiment.
For example, part or all of the functional configuration of the voice interaction system described above may be implemented as hardware in the information terminal 10 or the server 30.

10 Information terminal
20 Camera
30 Server

Claims

An information processing apparatus comprising:
first estimating means for estimating a subjective state of a questioner based on an acoustic feature amount of an audio signal indicating a question from the questioner; and
determining means for determining an answer to the question based on the subjective state of the questioner estimated by the first estimating means.

The information processing apparatus according to claim 1, wherein the first estimation unit estimates a psychological state of the questioner as a subjective state of the questioner based on the audio signal.

The information processing apparatus according to claim 2, wherein the first estimating means estimates the psychological state of the questioner based on the acoustic feature amount of the audio signal and acoustic feature amounts corresponding to each of a plurality of set psychological states.

The information processing apparatus according to claim 3, wherein the first estimating means estimates, as the psychological state of the questioner, the psychological state corresponding to the acoustic feature amount that is most similar to the acoustic feature amount of the audio signal among the acoustic feature amounts corresponding to the plurality of psychological states.

The information processing apparatus according to claim 4, further comprising registration means for registering an acoustic feature amount for a subjective state set by the questioner,
wherein the first estimating means estimates, as the psychological state of the questioner, the psychological state corresponding to the acoustic feature amount that is most similar to the acoustic feature amount of the audio signal among the acoustic feature amounts corresponding to the plurality of psychological states, including the acoustic feature amount registered by the registration means.

The information processing apparatus according to claim 1, wherein the determining means determines the answer to the question based on correspondence information, which indicates a correspondence among an answer candidate, a subjective state corresponding to the answer candidate, and a satisfaction level indicating the degree of the questioner's satisfaction with the answer candidate, and on the subjective state of the questioner estimated by the first estimating means.

The information processing apparatus according to claim 6, wherein the determining means determines, as the answer to the question, an answer candidate that is included in the correspondence information, whose corresponding subjective state is the same as the subjective state of the questioner estimated by the first estimating means, and whose satisfaction level is equal to or greater than a set threshold.

The information processing apparatus according to claim 6, further comprising second estimating means for estimating an objective state of the questioner,
wherein the determining means determines the answer to the question based on the correspondence information, which indicates a correspondence among the answer candidate, the subjective state corresponding to the answer candidate, an objective state corresponding to the answer candidate, and the satisfaction level for the answer candidate, on the subjective state of the questioner estimated by the first estimating means, and on the objective state of the questioner estimated by the second estimating means.

The information processing apparatus according to claim 8, wherein the second estimation unit estimates the physical state of the questioner as an objective state of the questioner.

The information processing apparatus according to claim 8 or 9, wherein the second estimation unit estimates an objective state of the questioner based on a signal from a sensor included in the information processing apparatus.

The information processing apparatus according to any one of claims 8 to 10, wherein the determining means determines, as the answer to the question, an answer candidate that is included in the correspondence information, whose corresponding subjective state is the same as the subjective state of the questioner estimated by the first estimating means, whose corresponding objective state is similar to the objective state of the questioner estimated by the second estimating means, and whose satisfaction level is equal to or greater than a set threshold.

The information processing apparatus according to claim 11, wherein the determining means determines, as the answer to the question, an answer candidate that is included in the correspondence information, whose corresponding subjective state is the same as the subjective state of the questioner estimated by the first estimating means, for which more than half of the objective information corresponding to its objective state is the same as the objective information corresponding to the objective state of the questioner estimated by the second estimating means, and whose satisfaction level is equal to or greater than a set threshold.

The information processing apparatus according to claim 7, further comprising:
accepting means for accepting a response to the answer to the question determined by the determining means; and
updating means for updating the correspondence information based on the response accepted by the accepting means.

The information processing apparatus according to claim 13, wherein the accepting means accepts a response indicating whether the answer to the question determined by the determining means is satisfactory, and
the updating means calculates the questioner's satisfaction level for the answer to the question based on a plurality of responses accepted by the accepting means for that answer, and updates the satisfaction level of the answer candidate corresponding to the answer.

The information processing apparatus according to any one of claims 1 to 14, wherein the first estimating means estimates the subjective state of the questioner based on at least one of a volume, a speech rate, and a level of inflection, which are acoustic feature amounts of the audio signal.

The information processing apparatus according to claim 1, further comprising a changing unit that changes a set parameter according to the answer determined by the determining unit.

A terminal device comprising:
reception means for receiving a voice indicating a question from a questioner;
audio transmission means for transmitting an audio signal corresponding to the voice received by the reception means to an information processing apparatus; and
output means for outputting information on an answer to the question transmitted from the information processing apparatus in response to the audio signal transmitted by the audio transmission means.

The terminal device according to claim 17, further comprising:
setting means for setting a determination condition for the answer to the question; and
condition transmission means for transmitting information on the determination condition set by the setting means to the information processing apparatus.

The terminal device according to claim 18, wherein the determination condition includes a condition as to whether or not a subjective state of the questioner is used in an answer determination process for the question.

A system including a terminal device and an information processing apparatus, wherein
the terminal device includes:
reception means for receiving a voice indicating a question from a questioner;
audio transmission means for transmitting an audio signal corresponding to the voice received by the reception means to the information processing apparatus; and
output means for outputting information on an answer to the question transmitted from the information processing apparatus in response to the audio signal transmitted by the audio transmission means, and
the information processing apparatus includes:
estimating means for estimating a subjective state of the questioner based on an acoustic feature amount of the audio signal transmitted from the terminal device;
determining means for determining an answer to the question based on the subjective state of the questioner estimated by the estimating means; and
answer transmission means for transmitting information on the answer to the question determined by the determining means to the terminal device.

An information processing method executed by an information processing apparatus, the method comprising:
an estimation step of estimating a subjective state of a questioner based on an acoustic feature amount of an audio signal indicating a question from the questioner; and
a determination step of determining an answer to the question based on the subjective state of the questioner estimated in the estimation step.

An information processing method executed by a terminal device, the method comprising:
a reception step of receiving a voice indicating a question from a questioner;
an audio transmission step of transmitting an audio signal corresponding to the voice received in the reception step to an information processing apparatus; and
an output step of outputting information on an answer to the question transmitted from the information processing apparatus in response to the audio signal transmitted in the audio transmission step.

An information processing method in a system including a terminal device and an information processing apparatus, the method comprising:
a reception step in which the terminal device receives a voice indicating a question from a questioner;
an audio transmission step in which the terminal device transmits an audio signal corresponding to the voice received in the reception step to the information processing apparatus;
an estimation step in which the information processing apparatus estimates a subjective state of the questioner based on an acoustic feature amount of the audio signal transmitted from the terminal device;
a determination step in which the information processing apparatus determines an answer to the question based on the subjective state of the questioner estimated in the estimation step;
an answer transmission step in which the information processing apparatus transmits information on the answer to the question determined in the determination step to the terminal device; and
an output step in which the terminal device outputs the information on the answer to the question transmitted from the information processing apparatus in the answer transmission step.

A program for causing a computer to function as each unit of the information processing apparatus according to any one of claims 1 to 16.

A program for causing a computer to function as each unit of the terminal device according to any one of claims 17 to 19.