JP2013238986A

JP2013238986A - Processing apparatus, processing system, and output method and program

Info

Publication number: JP2013238986A
Application number: JP2012110831A
Authority: JP
Inventors: Yusuke Tsukuda; 友介佃; Haruomi Azuma; 治臣東; Hideki Ohashi; 英樹大橋; Takahiro Hiramatsu; 嵩大平松
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2012-05-14
Filing date: 2012-05-14
Publication date: 2013-11-28

Abstract

PROBLEM TO BE SOLVED: To provide a processing apparatus, a processing system, and an output method and program for achieving more natural responses. SOLUTION: A processing apparatus includes: a voice recognition unit 21 that recognizes a user's voice; a search request unit 25 that requests a search for information on the basis of the voice recognized by the voice recognition unit 21; an acquisition unit 27 that acquires a search result on the basis of the search for information requested by the search request unit 25; a situation recognition unit 29 that recognizes a situation of the user; and an output control unit 33 that determines, on the basis of the situation of the user recognized by the situation recognition unit 29, whether it is a timing for causing an output unit 19 to output the search result and, if so, causes the output unit 19 to output the search result.

Description

The present invention relates to a processing apparatus, a processing system, an output method, and a program.

Conventionally, there is a known technique that causes a computer to recognize a voice uttered by a user, understand and infer the intention of the utterance, and respond appropriately. Such a technique is required to make the computer respond as naturally as humans do when they converse with each other. For example, Patent Document 1 discloses a technique that, when a user asks a question, responds with the answer to that question.

However, because the related art described above does not take the timing of the response into account, the response is not always made at a timing appropriate for the user. In particular, when several users are conversing with one another, the response may interrupt their conversation.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a processing apparatus, a processing system, an output method, and a program capable of producing more natural responses.

To solve the above problem and achieve the object, a processing apparatus according to one aspect of the present invention includes: a voice recognition unit that recognizes a user's voice; a search request unit that requests a search for information based on the voice recognized by the voice recognition unit; an acquisition unit that acquires a search result based on the search for information requested by the search request unit; a situation recognition unit that recognizes the situation of the user; and an output control unit that determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

A processing system according to another aspect of the present invention includes: a voice recognition unit that recognizes a user's voice; a search request unit that requests a search for information based on the voice recognized by the voice recognition unit; a search unit that searches for the information requested by the search request unit; an acquisition unit that acquires the result of the search performed by the search unit; a situation recognition unit that recognizes the situation of the user; and an output control unit that determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

An output method according to another aspect of the present invention includes: a voice recognition step in which a voice recognition unit recognizes a user's voice; a search request step in which a search request unit requests a search for information based on the voice recognized by the voice recognition unit; an acquisition step in which an acquisition unit acquires a search result based on the search for information requested by the search request unit; a situation recognition step in which a situation recognition unit recognizes the situation of the user; and an output control step in which an output control unit determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

A program according to another aspect of the present invention causes a computer to execute: a voice recognition step of recognizing a user's voice; a search request step of requesting a search for information based on the recognized voice; an acquisition step of acquiring a search result based on the requested search for information; a situation recognition step of recognizing the situation of the user; and an output control step of determining, based on the recognized situation of the user, whether it is a timing to cause an output unit to output the search result and, when it is, causing the output unit to output the search result.

According to the present invention, more natural responses can be produced.

FIG. 1 is a block diagram illustrating an example of the configuration of the processing system according to the present embodiment.
FIG. 2 is a block diagram illustrating an example of the configuration of the voice recognition unit of the present embodiment.
FIG. 3 is a flowchart illustrating an example of processing executed in the processing system of the present embodiment.

Embodiments of a processing apparatus, a processing system, an output method, and a program according to the present invention are described in detail below with reference to the accompanying drawings.

First, the configuration of the processing system of the present embodiment is described.

FIG. 1 is a block diagram illustrating an example of the configuration of the processing system 1 of the present embodiment. As illustrated in FIG. 1, the processing system 1 includes a network agent (hereinafter, "NA") 10 as an example of the processing apparatus and a search server 101 as an example of the search unit. The NA 10 and the search server 101 are connected via the Internet 107.

The search server 101 searches for information published on the Web; it may be, for example, any server that provides a search engine function on the Web. Specifically, the search server 101 receives a search query from the NA 10, searches for information published on the Web according to the received query, and transmits the search result to the NA 10. The information searched by the search server 101 may be dynamic information published on dynamic Web pages or static information published on static Web pages. Although FIG. 1 shows a single search server, the number of search servers is not limited to one.

The NA 10 is a terminal that accesses information and functions published on the Web. In the present embodiment, the NA 10 is assumed to be a portable terminal such as a smartphone or tablet, but it is not limited to this and may be any device capable of accessing the Internet.

In the present embodiment, the NA 10 (processing system 1) is described assuming that user U1 owns the NA 10 and uses it during a conversation with user U2. However, a single user may also use the NA 10 alone, and three or more users may share it.

As illustrated in FIG. 1, the NA 10 includes a voice input unit 11, a GPS (Global Positioning System) receiving unit 13, a communication unit 15, an imaging unit 16, a storage unit 17, an output unit 19, and a control unit 20.

The voice input unit 11 inputs voices uttered by user U1, user U2, and others into the NA 10, and can be implemented by a sound collector such as a microphone.

The GPS receiving unit 13 receives radio waves from GPS satellites and can be implemented by a GPS receiver or the like.

The communication unit 15 communicates with external devices such as the search server 101 via the Internet 107, and can be implemented by a communication device such as a NIC (Network Interface Card).

The imaging unit 16 captures images of user U1, user U2, and others, and can be implemented by an imaging device such as a digital camera.

The storage unit 17 stores the various programs executed by the NA 10 and the data used in the various processes performed by the NA 10. The storage unit 17 can be implemented by any magnetic, optical, or electrical storage device, such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a memory card, an optical disc, a ROM (Read Only Memory), or a RAM (Random Access Memory).

The output unit 19 outputs the processing results of the control unit 20. It may be implemented by a display device for display output, such as a liquid crystal display or a touch-panel display; an audio device for voice output, such as a speaker; a printing device for print output; or a combination of these devices.

The control unit 20 controls each unit of the NA 10 and includes a voice recognition unit 21, a search request unit 25, an acquisition unit 27, a situation recognition unit 29, and an output control unit 33. These units may be implemented in software, that is, by causing a processing device such as a CPU (Central Processing Unit) to execute a program; in hardware such as an IC (Integrated Circuit); or by a combination of software and hardware.

The voice recognition unit 21 recognizes the voice input from the voice input unit 11 and obtains a voice recognition result. FIG. 2 is a block diagram illustrating an example of the configuration of the voice recognition unit 21 of the present embodiment. As illustrated in FIG. 2, the voice recognition unit 21 includes an acoustic analysis unit 51, a conversion unit 53, a determination unit 55, and an extraction unit 57.

The acoustic analysis unit 51 analyzes the voice input from the voice input unit 11 and extracts feature quantities. The conversion unit 53 converts the feature quantities extracted by the acoustic analysis unit 51 into text (a character string) using, for example, speech recognition dictionary data stored in the storage unit 17. The determination unit 55 determines, using natural language processing techniques or the like, whether the text converted by the conversion unit 53 is interrogative (a question). When the determination unit 55 determines that the text is a question, the extraction unit 57 extracts the keywords that are the subject of the question.
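The determination and extraction steps can be illustrated with a deliberately simple sketch. It assumes the conversion unit 53 has already produced English text, and it replaces the unspecified natural language processing with a toy rule (a trailing question mark or a leading interrogative word) plus a hypothetical stop-word list; `is_question`, `extract_keywords`, `STOP_WORDS`, and `WH_WORDS` are illustrative names, not part of the source. Unlike a real extractor, which might keep a multi-word proper noun together, it returns single lowercase tokens.

```python
import re

# Hypothetical word lists; a real system would use a proper NLP toolkit.
WH_WORDS = {"what", "when", "where", "who", "which", "how", "why"}
STOP_WORDS = WH_WORDS | {
    "a", "an", "the", "in", "of", "on", "at", "to",
    "is", "was", "were", "are", "did", "do", "does",
}


def is_question(text: str) -> bool:
    """Toy stand-in for determination unit 55: flags interrogative text."""
    stripped = text.strip()
    if stripped.endswith(("?", "？")):  # ASCII or full-width question mark
        return True
    words = stripped.lower().split()
    return bool(words) and words[0] in WH_WORDS


def extract_keywords(text: str) -> list[str]:
    """Toy stand-in for extraction unit 57: keeps non-stop-word tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For "In what year was the Battle of Sekigahara?", `is_question` returns `True` and `extract_keywords` returns `["year", "battle", "sekigahara"]`.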

For details of the voice recognition method, known techniques such as those disclosed in Patent Document 1 mentioned above, JP 2004-45591 A, and JP 2008-281901 A can be used, so a detailed description is omitted here.

The search request unit 25 requests a search for information based on the voice recognized by the voice recognition unit 21 (the voice recognition result of the voice recognition unit 21). Specifically, when the voice recognition result of the voice recognition unit 21 is a question, the search request unit 25 requests the search server 101 to search for information.

For example, suppose that users U1 and U2 are talking about history and user U1 asks user U2, "In what year was the Battle of Sekigahara?" Suppose further that the determination unit 55 determines that this utterance is a question and the extraction unit 57 extracts "Battle of Sekigahara" and "year" as the keywords that are the subject of the question. In this case, the search request unit 25 uses the extracted keywords "Battle of Sekigahara" and "year" as a search query and requests the search server 101 to perform a Web search.
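A minimal sketch of how the search request unit 25 might turn the extracted keywords into a request for the search server 101. The endpoint URL and the `q` parameter are hypothetical placeholders, since the source does not specify the search server's interface; only `urllib.parse.urlencode` from the Python standard library is used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of search server 101; the source does not specify one.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_request(keywords: list[str]) -> str:
    """Joins the extracted keywords into one query and returns a GET URL."""
    query = " ".join(keywords)
    # urlencode uses quote_plus by default, so spaces become '+'.
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"
```

With the keywords from the example, `build_search_request(["Battle of Sekigahara", "year"])` yields `https://search.example.com/search?q=Battle+of+Sekigahara+year`.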

The acquisition unit 27 acquires the search result. Specifically, the acquisition unit 27 acquires the search result from the search server 101. For example, when the search server 101 performs a search with "Battle of Sekigahara" and "year" as the search query, the acquisition unit 27 acquires "1600" as the search result and generates a response sentence.

The situation recognition unit 29 recognizes the situation of user U1, user U2, and others and obtains a user situation recognition result. The situation recognition unit 29 performs this recognition sequentially (repeatedly). The user situation recognition result corresponds to, for example, at least one of the following: whether user U1, user U2, or another user gives a glance (an eye signal), whether user U1, user U2, or another user is tilting his or her head, and whether users U1 and U2 are silent.

Whether a user gives a glance and whether a user is tilting his or her head can be determined by analyzing images captured by the imaging unit 16. For example, the situation recognition unit 29 can determine whether a glance has been given by analyzing an image captured by the imaging unit 16 and determining whether the line of sight of user U1, user U2, or another user is directed to the front. Similarly, the situation recognition unit 29 can determine whether a user is tilting his or her head by analyzing an image captured by the imaging unit 16 and determining whether the tilt of the head of user U1 or user U2 exceeds a predetermined angle.
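The two image-based checks can be sketched as threshold tests, assuming some face-analysis library (not specified in the source) has already supplied the eye-center coordinates and an estimated gaze direction in degrees. The threshold values and the function names are hypothetical; the source states only that the head tilt is compared with a predetermined angle and that the line of sight is checked for being directed to the front.

```python
import math

# Hypothetical thresholds: the source specifies only "a predetermined angle".
TILT_THRESHOLD_DEG = 15.0
GAZE_TOLERANCE_DEG = 10.0


def head_roll_deg(left_eye: tuple[float, float], right_eye: tuple[float, float]) -> float:
    """Roll angle, in degrees, of the line joining the two eye centers.

    `left_eye` is the eye that appears on the left of the image, so an
    upright face gives an angle near 0.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def is_head_tilted(left_eye, right_eye, threshold_deg: float = TILT_THRESHOLD_DEG) -> bool:
    """Head-tilt cue: the roll angle exceeds the threshold in either direction."""
    return abs(head_roll_deg(left_eye, right_eye)) > threshold_deg


def is_gaze_to_front(yaw_deg: float, pitch_deg: float, tol_deg: float = GAZE_TOLERANCE_DEG) -> bool:
    """Glance cue: the estimated gaze is within `tol_deg` of straight ahead."""
    return abs(yaw_deg) <= tol_deg and abs(pitch_deg) <= tol_deg
```

Using the absolute roll angle treats a tilt to either side the same way, which matches the source's single "predetermined angle".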

Whether the users are silent can be determined from the presence or absence of voice input to the voice input unit 11. For example, the situation recognition unit 29 can monitor the time elapsed since a voice that the voice recognition unit 21 (determination unit 55) determined to be a question was input to the voice input unit 11, and determine that the users are silent if no subsequent voice is input within a predetermined time.
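The silence check amounts to a timer that starts when a question is detected and is cancelled by any later utterance. The sketch below assumes a 3-second timeout (the source says only "a predetermined time") and injects the clock so the behavior can be tested; `SilenceMonitor` and its methods are hypothetical names.

```python
import time
from typing import Callable, Optional


class SilenceMonitor:
    """Reports silence when no speech follows a question within `timeout_s`."""

    def __init__(self, timeout_s: float = 3.0, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # assumed value for the "predetermined time"
        self._clock = clock
        self._question_at: Optional[float] = None
        self._answered = False

    def on_question(self) -> None:
        """Call when determination unit 55 flags an utterance as a question."""
        self._question_at = self._clock()
        self._answered = False

    def on_speech(self) -> None:
        """Call on any later utterance, e.g. the conversation partner answering."""
        if self._question_at is not None:
            self._answered = True

    def is_silent(self) -> bool:
        if self._question_at is None or self._answered:
            return False
        return self._clock() - self._question_at >= self.timeout_s
```

Because `on_speech` cancels the silence state, a partner who answers the question prevents the NA 10 from treating the pause as an invitation to respond.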

The output control unit 33 determines, based on the situation of the user recognized by the situation recognition unit 29 (the user situation recognition result of the situation recognition unit 29), whether it is a timing to cause the output unit 19 to output the search result acquired by the acquisition unit 27, and, when it is, causes the output unit 19 to output the search result. The output control unit 33 makes this determination sequentially, using each new user situation recognition result. If the output control unit 33 does not determine within a certain period that it is a timing to cause the output unit 19 to output the search result, it does not cause the output unit 19 to output the search result. In the following description, the "timing to cause the output unit 19 to output" is referred to as the "output timing."

For example, when the user situation recognition result indicates that a glance has been given, that a user is tilting his or her head, or that the users are silent, the output control unit 33 causes the output unit 19 to output the response sentence (search result) generated by the acquisition unit 27.
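The decision rule in this example is a logical OR over the recognized cues. A minimal sketch, with a hypothetical `UserSituation` record holding the three cues described above:

```python
from dataclasses import dataclass


@dataclass
class UserSituation:
    """Hypothetical container for one user situation recognition result."""
    glance: bool = False       # gaze directed to the front
    head_tilted: bool = False  # head tilt beyond the predetermined angle
    silent: bool = False       # no speech within the predetermined time


def is_output_timing(situation: UserSituation) -> bool:
    """Any single cue is enough to treat the moment as the output timing."""
    return situation.glance or situation.head_tilted or situation.silent
```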

For example, when causing the output unit 19 to perform voice output, the output control unit 33 synthesizes speech from the response sentence (search result) generated by the acquisition unit 27 and causes the output unit 19 to output the resulting voice. When causing the output unit 19 to perform display output, the output control unit 33 converts the response sentence (search result) into drawing data and causes the output unit 19 to output it on a screen.

Note that not all of the units described above are essential to the NA 10; some of them may be omitted.

Next, the operation of the processing system of the present embodiment is described.

FIG. 3 is a flowchart illustrating an example of processing executed by the processing system 1 of the present embodiment.

First, the voice recognition unit 21 recognizes the voice of user U1, user U2, or another user input from the voice input unit 11 and obtains a voice recognition result (step S101).

Next, the voice recognition unit 21 determines whether the voice recognition result is a question (step S103). If the voice recognition result is not a question (No in step S103), the NA 10 does not output a response, and the process returns to step S101.

If the voice recognition result is a question (Yes in step S103), the search request unit 25 requests the search server 101 to perform a Web search using a search query based on the voice recognition result (step S104).

Next, the search server 101 receives the search query from the NA 10, searches for information published on the Web according to the received query, and transmits the search result to the NA 10 (step S105).

Next, the acquisition unit 27 acquires the search result from the search server 101 (step S107).

Next, the situation recognition unit 29 sequentially recognizes the situation of user U1, user U2, and others to obtain user situation recognition results, and the output control unit 33 uses these results to determine whether it is the output timing for the search result acquired by the acquisition unit 27 (step S109).

The situation recognition unit 29 and the output control unit 33 repeat step S109 until the output timing is determined (No in step S109). When the output timing is determined (Yes in step S109), the output control unit 33 causes the output unit 19 to output the search result (step S111). This allows the NA 10 to respond at the moment a response is wanted, and can also be expected to help the conversation along.

If the output timing for the search result is not determined within a certain period, the situation recognition unit 29 and the output control unit 33 end the processing without causing the output unit 19 to output the search result. Thus, when no response from the NA 10 is wanted, the NA 10 does not respond, which avoids interrupting the conversation.
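Steps S109 and S111, together with the cut-off after a certain period, can be sketched as a polling loop with a deadline. The window length, polling interval, and callback names are assumptions; `check_timing` stands in for the combined work of the situation recognition unit 29 and the output control unit 33, and `emit` for the output unit 19. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_and_output(
    result: str,
    check_timing: Callable[[], bool],
    emit: Callable[[str], None],
    window_s: float = 10.0,   # assumed length of the "certain period"
    poll_s: float = 0.1,      # assumed re-check interval for step S109
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Repeats the timing check (S109) and outputs the result once it passes (S111).

    Returns True if the result was output, or False if the window expired
    and the result was discarded without being output.
    """
    deadline = clock() + window_s
    while clock() < deadline:
        if check_timing():
            emit(result)
            return True
        sleep(poll_s)
    return False
```

Returning a flag rather than raising on timeout reflects that a discarded answer is a normal outcome here: it means the users did not want the NA 10 to speak.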

As described above, in the present embodiment, whether it is the output timing is determined using the user situation recognition result, and output is performed only at the output timing, so the NA 10 can respond more naturally.

In particular, according to the present embodiment, when a question arises during a conversation among multiple users, the NA 10 does not respond if another user answers, and responds only if no other user answers. The NA 10 can therefore respond more naturally without disrupting the conversation between the users.

(Modification)
The present invention is not limited to the above embodiment, and various modifications are possible.

(Modification 1)
In the above embodiment, the result of a general Web search is used as the response content, but the result of searching the movement history of user U1 or user U2 may be used instead. For example, location information of user U1 and user U2, determined from radio waves received from GPS satellites by the GPS receiving unit 13, is recorded as a history on a cloud service that provides a storage function on the Web. Then, when a user asks about a place visited in the past, the movement history on the cloud is searched to obtain information on that place, which is output at the output timing. In this way, the NA 10 can also answer questions whose answers depend on the user.
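Searching the movement history could look like the following sketch, assuming each stored GPS fix has already been resolved to a place name (for example by reverse geocoding) and that a question such as "Where did I go on May 1?" has been reduced to a user ID and a date. The record layout and `places_visited` are hypothetical; the source does not describe the cloud storage format.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class LocationRecord:
    """One GPS fix stored in the (hypothetical) cloud movement history."""
    user_id: str
    timestamp: datetime
    latitude: float
    longitude: float
    place_name: str  # assumed to be filled in by reverse geocoding


def places_visited(history: list[LocationRecord], user_id: str, day: date) -> list[str]:
    """Distinct place names visited by `user_id` on `day`, in visit order."""
    places: list[str] = []
    for rec in sorted(history, key=lambda r: r.timestamp):
        if rec.user_id != user_id or rec.timestamp.date() != day:
            continue
        if rec.place_name not in places:  # keep first visit only
            places.append(rec.place_name)
    return places
```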

(Modification 2)
In the above embodiment, the output method may be taken into account in addition to the output timing. Specifically, when the output unit 19 can output in multiple modes, such as display output and voice output, the output control unit 33 may determine which mode to use and output the search result in the determined mode. For example, when the output control unit 33 determines from the location information of user U1 and user U2, obtained from radio waves received from GPS satellites by the GPS receiving unit 13, that the users are in a public place, it may use display output instead of voice output.
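One way to sketch this choice: compare the current GPS position against a list of known public places using the haversine great-circle distance, then prefer display output when the user is in one. The 100 m radius, the place list, and all names here are assumptions; the source does not say how a public place is recognized from the location information.

```python
import math
from enum import Enum

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


class OutputMode(Enum):
    VOICE = "voice"
    DISPLAY = "display"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_public_place(lat: float, lon: float, public_places, radius_m: float = 100.0) -> bool:
    """True if (lat, lon) lies within `radius_m` of any known public place."""
    return any(haversine_m(lat, lon, plat, plon) <= radius_m for plat, plon in public_places)


def choose_output_mode(in_public: bool, can_voice: bool = True, can_display: bool = True) -> OutputMode:
    """Prefers silent display output in public places, voice output otherwise."""
    if in_public and can_display:
        return OutputMode.DISPLAY
    if can_voice:
        return OutputMode.VOICE
    if can_display:
        return OutputMode.DISPLAY
    raise ValueError("output unit supports neither voice nor display output")
```

Falling back to the other mode when the preferred one is unavailable keeps the answer from being silently dropped on devices with only one output type.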

(Hardware configuration)
An example of the hardware configuration of the NA 10 of the present embodiment and modifications is described below. The NA 10 includes a control device such as a CPU, storage devices such as a ROM and a RAM, an external storage device such as an HDD, a display device such as a display, a voice input device such as a microphone, and a communication device such as a communication interface, and has a hardware configuration using an ordinary computer.

The program executed by the NA 10 of the present embodiment and modifications is provided as a file in an installable or executable format stored on a computer-readable storage medium such as a CD-ROM, CD-R, memory card, DVD (Digital Versatile Disk), or flexible disk (FD).

The program executed by the NA 10 of the present embodiment and modifications may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may also be provided or distributed via a network such as the Internet, or provided preinstalled in a ROM or the like.

The program executed by the NA 10 of the present embodiment and modifications has a module configuration for implementing the units described above on a computer. In actual hardware, the CPU reads the program from the HDD into the RAM and executes it, whereby the above units are implemented on the computer.

DESCRIPTION OF SYMBOLS
1 Processing system
10 NA (network agent)
11 Voice input unit
13 GPS receiving unit
15 Communication unit
16 Imaging unit
17 Storage unit
19 Output unit
20 Control unit
21 Voice recognition unit
25 Search request unit
27 Acquisition unit
29 Situation recognition unit
33 Output control unit
51 Acoustic analysis unit
53 Conversion unit
55 Determination unit
57 Extraction unit
101 Search server
107 Internet

Patent Document 1: JP 2007-121577 A

Claims

A processing apparatus comprising:
a voice recognition unit that recognizes a user's voice;
a search request unit that requests a search for information based on the voice recognized by the voice recognition unit;
an acquisition unit that acquires a search result based on the search for information requested by the search request unit;
a situation recognition unit that recognizes a situation of the user; and
an output control unit that determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

The processing apparatus according to claim 1, wherein
the situation recognition unit sequentially recognizes the situation of the user, and
the output control unit sequentially determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause the output unit to output the search result.

The processing apparatus according to claim 2, wherein the output control unit does not cause the output unit to output the search result when it has not determined, within a certain period, that it is a timing to cause the output unit to output the search result.

The processing apparatus according to any one of claims 1 to 3, wherein the search request unit requests the search for the information when the voice recognition result is a question.

The processing apparatus according to any one of claims 1 to 4, wherein the situation of the user recognized by the situation recognition unit is at least one of whether the user gives a glance, whether the user is tilting his or her head, and whether the user and a conversation partner are silent.

A processing system comprising:
a voice recognition unit that recognizes a user's voice;
a search request unit that requests a search for information based on the voice recognized by the voice recognition unit;
a search unit that searches for the information requested by the search request unit;
an acquisition unit that acquires the result of the search performed by the search unit;
a situation recognition unit that recognizes a situation of the user; and
an output control unit that determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

An output method comprising:
a voice recognition step in which a voice recognition unit recognizes a user's voice;
a search request step in which a search request unit requests a search for information based on the voice recognized by the voice recognition unit;
an acquisition step in which an acquisition unit acquires a search result based on the search for information requested by the search request unit;
a situation recognition step in which a situation recognition unit recognizes a situation of the user; and
an output control step in which an output control unit determines, based on the situation of the user recognized by the situation recognition unit, whether it is a timing to cause an output unit to output the search result and, when it is, causes the output unit to output the search result.

A program for causing a computer to execute:
a voice recognition step of recognizing a user's voice;
a search request step of requesting a search for information based on the voice recognized in the voice recognition step;
an acquisition step of acquiring a search result based on the search for information requested in the search request step;
a situation recognition step of recognizing a situation of the user; and
an output control step of determining, based on the situation of the user recognized in the situation recognition step, whether it is a timing to cause an output unit to output the search result and, when it is, causing the output unit to output the search result.