JP5705274B2 - Information processing apparatus and method

Info

Publication number: JP5705274B2
Application number: JP2013146553A
Authority: JP
Inventors: 淳太木下; 満里子藤田; 智史辻田
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2013-07-12
Filing date: 2013-07-12
Publication date: 2015-04-22
Anticipated expiration: 2033-07-12
Also published as: JP2015018174A

Description

The present invention relates to providing information by means of an information processing apparatus.

Today, people carry smart devices (smartphones, tablet PCs, and the like) and can easily run an Internet search whenever they need information. Using speech recognition to make search input easier has also been proposed (see, for example, Patent Document 1).

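Patent Document 1: Japanese Translation of PCT International Application Publication No. 2003-517158
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2009-238199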

However, running a search each time information is needed cannot supply topics in real time to an ongoing conversation and thereby promote it (that is, liven it up).

Techniques exist for extracting important words from a conversation and presenting them as text (see, for example, Patent Document 2). Such a transcript serves as a record but does not supply topics to the conversation. In addition, because important-word extraction basically relies on detecting repeated words, it lags well behind the content of the conversation and is not real-time. In this respect as well, turning important words into text does not supply topics to a conversation.

Even a technique that extracts a theme from a conversation through more complex information processing and provides information accordingly would suffer from the same large delay as important-word extraction. Predicting how the discussion will develop in order to overcome that delay could lead to off-target information, because prediction accuracy is limited. Before any of that, complex information processing requires substantial processing power, which is an obstacle to implementation as a portable device.

An object of the present invention is to provide, in real time, topics that promote conversation.

In view of the above object, an information processing apparatus according to one aspect (1) of the present invention comprises: voice acquisition means for acquiring the voice of a conversation; extraction means for extracting the utterance of the conversation partner from the acquired voice; recognition means for recognizing words from the extracted utterance by speech recognition; related information acquisition means for acquiring, from a server via a communication network, related information based on the recognized words; and output means for outputting the acquired related information.

An information processing method according to another aspect (6) of the present invention expresses the above aspect in the category of a method. In this method, a computer executes: a voice acquisition process of acquiring the voice of a conversation; an extraction process of extracting the utterance of the conversation partner from the acquired voice; a recognition process of recognizing words from the extracted utterance by speech recognition; a related information acquisition process of acquiring, from a server via a communication network, related information based on the recognized words; and an output process of outputting the acquired related information.

Another aspect (2) of the present invention is, in any of the above aspects, a wearable device comprising a microphone, headphones, and means for attachment to a human body.

Another aspect (3) of the present invention, in any of the above aspects, further comprises: part-of-speech determination means for sequentially determining the part of speech of each word recognized by the recognition means; and selection means for selecting the word on which the related information is to be based, according to the parts of speech of words determined one after another.

In another aspect (4) of the present invention, in any of the above aspects, the related information acquisition means acquires the related information based on the recognized word and on information representing the environment.

In another aspect (5) of the present invention, in any of the above aspects, the output means, when the speed of the utterance is less than a predetermined value, presents the acquired candidates for the related information and requests selection of an output target, and, when the speed of the utterance is equal to or greater than the predetermined value, determines the output target from among the acquired candidates according to a predetermined priority order without requesting the selection.

According to the present invention, topics that promote conversation can be provided in real time.

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a diagram showing an example of data in the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing procedure in the embodiment of the present invention.
FIG. 4 is a conceptual diagram showing a usage example of the embodiment of the present invention.

Next, a mode for carrying out the present invention (referred to as the "embodiment") is illustrated with reference to the drawings. Premises shared with what has already been described in the background art, the problem to be solved, and so on are omitted as appropriate.

[1. Configuration]
FIG. 1 shows the configuration of this embodiment. This embodiment relates to the information processing apparatus of the present invention (hereinafter also "the apparatus 1"), configured as a wearable device comprising a microphone M, headphones H (including earphones), and a holder (not shown) that serves as means for attachment to a human body. A search server 2 is a server apparatus that provides information in response to external search requests via a communication network N (the Internet, a mobile phone network, or the like).

The apparatus 1 has the configuration of a computer: an arithmetic control unit 6 such as a CPU, a storage device 7 such as main memory and auxiliary storage, and a communication device 8 (communication equipment, a communication adapter, or the like) for communicating with the communication network N. The search server 2 likewise has a computer configuration, although its specifications differ (not shown). In the apparatus 1, the arithmetic control unit 6 executes a computer program (not shown) stored in the storage device 7, thereby implementing the elements shown in FIG. 1.

Among the implemented elements, the information storage means are not limited to so-called local storage within the apparatus 1 and may be remote storage provided by network computing (cloud) or the like. The storage means shown in this application are the main ones, described in units chosen for convenience of explanation. Actual storage means may include functions that accompany the storage of information, such as input/output and management; their constituent units may be divided or merged; and other storage means such as a work area may be used as appropriate.

Among the storage means, recognition dictionary storage means 35 stores a recognition dictionary for speech recognition (for example, data representing the features of each word or each sound element). Temporary storage means 45 stores each recognized word together with its part of speech. Search condition storage means 55 stores the search conditions to be used when acquiring related information, according to word and part of speech. FIG. 2 shows an example of the temporary storage means 45 and the search condition storage means 55 combined, with data items omitted as appropriate. The contents of the other storage means are not illustrated.

Speech synthesis data storage means 66 stores voice data for each word and each sound element for use in speech synthesis. Part-of-speech dictionary storage means 75 stores a part-of-speech dictionary for determining the part of speech of a word.

The arrows in the drawings (for example, FIG. 1) illustrate the main directions of flows such as data and control; they neither rule out other flows nor limit direction. Each means other than the storage means is processing means that implements or executes an information processing function or action as described below. These functions and actions are units chosen purely for explanation, and they need not correspond one-to-one to actual hardware or software elements.

[2. Operation]
FIG. 3 is a flowchart showing the operation of the apparatus 1. FIG. 4 is a conceptual diagram showing an example of how this embodiment is used.
[2-1. Overview]
First, an overview of the operation is given along the flowchart of FIG. 3, with some steps omitted. In the apparatus 1, voice acquisition means 20 acquires, through the microphone M and by A/D conversion or the like, the voice of a conversation between the user wearing the apparatus 1 as a wearable device ("self" in FIG. 4) and the other party (step S11). In the example of FIG. 4, both the utterance T1 of the user wearing the apparatus 1 and the utterance T2 of the conversation partner are acquired.

Extraction means 30 extracts the conversation partner's utterance from the acquired voice using, for example, differences in input volume that arise from differences in distance to the microphone M, or differences in voice frequency (step S12). Known techniques are used for this extraction, such as frequency estimation, hidden Markov models, pattern matching, neural networks, and decision trees. The partner's utterance T2 is extracted; the user's own utterance T1 is not.
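The volume-based cue in step S12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `rms` and `extract_partner_frames`, the frame representation (lists of samples normalized to the range -1 to 1), and both thresholds are hypothetical. It assumes the wearer's voice arrives louder than the partner's because the microphone M is closer to the wearer's mouth.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def extract_partner_frames(
    frames: List[Sequence[float]],
    noise_floor: float = 0.02,   # hypothetical: below this is treated as silence
    wearer_level: float = 0.3,   # hypothetical: at or above this is the wearer
) -> List[Sequence[float]]:
    """Keep frames louder than the noise floor but quieter than the wearer."""
    return [f for f in frames if noise_floor <= rms(f) < wearer_level]
```

A practical extractor would likely combine this with the frequency-based cues the text mentions, since volume alone fails when the partner speaks loudly or leans in.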

Recognition means 43 recognizes words from the extracted utterance T2 by, for example, comparing it against the recognition dictionary in the recognition dictionary storage means 35 (step S13). Known techniques are used for speech recognition, such as statistical methods, dynamic time warping, and hidden Markov models. For example, several words such as "around here", "Italian", and "△△" are recognized from the utterance T2 (enclosed in wavy lines in FIG. 4).

Related information acquisition means 54 then acquires related information based on the recognized words (for example, a glossary entry or restaurant information) from the search server 2 via the communication network, using a search API or the like (step S18). Output means 65 outputs the acquired related information from the headphones H as synthesized speech generated from the voice data stored in the speech synthesis data storage means 66 (step S22).

In the example of FIG. 4, information on a nearby Italian restaurant matching the recognized words is output as synthesized speech V, as related information based on the words recognized from the partner's utterance T2. Thanks to this related information, the user wearing the apparatus 1 could keep the conversation lively and immediately decide on and propose a course of action (for example, utterance T3). A detailed description, including the other steps, follows.
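The overview flow (steps S12, S13, S18, and S22 applied to audio acquired in S11) can be sketched as a single function that chains four stages. This is only an illustrative skeleton: the function name `provide_topic` and the four callables passed to it are hypothetical placeholders for the extraction, recognition, related information acquisition, and output means, whose real implementations the text leaves to known techniques.

```python
from typing import Callable, List


def provide_topic(
    audio: bytes,
    extract_partner: Callable[[bytes], bytes],
    recognize_words: Callable[[bytes], List[str]],
    fetch_related: Callable[[List[str]], str],
    speak: Callable[[str], None],
) -> str:
    """Run the overview flow once for a chunk of already-acquired audio (S11)."""
    partner_audio = extract_partner(audio)   # S12: keep only the partner's utterance
    words = recognize_words(partner_audio)   # S13: speech recognition
    related = fetch_related(words)           # S18: ask the search server
    speak(related)                           # S22: output, e.g. synthesized speech
    return related
```

Passing the stages in as callables keeps the sketch independent of any particular recognizer or search API, and mirrors the later remark that individual means may be provided by an external server.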

[2-2. Selecting words by part of speech]
An utterance in a conversation contains many words, and one example of a criterion for selecting the word on which related information is based is the word's part of speech. Specifically, part-of-speech determination means 73 sequentially determines the part of speech of each word recognized by the recognition means 43, using the part-of-speech dictionary stored in the part-of-speech dictionary storage means 75 (step S14). The temporary storage means 45 stores the most recent 20 pairs, each consisting of a recognized word and the part of speech determined for that word (step S15).

The parts of speech are not limited to those in general use (for example, "common noun" and "proper noun"). They may also be classifications or reserved words specialized for acquiring related information, such as reserved words denoting a place ("this area", "around here", "nearby", and so on) or reserved words denoting a business type ("Italian", "family restaurant", "post office", "bank", and so on) (see, for example, FIG. 2).

A specific search condition can be associated with a specific part of speech. For example, the related information acquisition means 54 replaces a reserved word denoting a place with information on the current position measured by the apparatus 1 using GPS or the like (for example, "Akasaka, Minato-ku"), and then sends the result to the search server 2 as a search query.

Among reserved words denoting a business type, those predetermined to denote eating and drinking establishments (for example, "Italian" and "family restaurant") can also be associated with a search condition that selects a specific type of search service (for example, restaurant search) for acquiring related information. The information that associates parts of speech with search conditions is stored in advance in the search condition storage means 55.
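The two search conditions described above (substituting the current position for a place reserved word, and switching to a restaurant search for certain business-type reserved words) can be sketched as a small query builder. Everything named here is an assumption for illustration: the function `build_query`, the two word sets, and the service labels "web_search" and "restaurant_search" do not come from the source, which only says such associations are held in the search condition storage means 55.

```python
from typing import List, Tuple

# Hypothetical reserved-word tables standing in for search condition storage 55.
PLACE_WORDS = {"this area", "around here", "nearby"}
RESTAURANT_WORDS = {"Italian", "family restaurant"}


def build_query(words: List[str], current_position: str) -> Tuple[str, str]:
    """Return (search service, query string) for the selected words."""
    service = "web_search"  # hypothetical default service label
    terms = []
    for word in words:
        if word in PLACE_WORDS:
            # Place reserved word: substitute the measured current position.
            terms.append(current_position)
        else:
            if word in RESTAURANT_WORDS:
                # Restaurant-type reserved word: switch the search service.
                service = "restaurant_search"
            terms.append(word)
    return service, " ".join(terms)
```

For the FIG. 4 style input, `build_query(["around here", "Italian"], "Akasaka, Minato-ku")` yields a restaurant search for "Akasaka, Minato-ku Italian".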

Selection means 83 selects the word on which related information is based according to the parts of speech of the words stored in the temporary storage means 45, that is, the parts of speech of words determined one after another (step S17). For example, from the most recent 20 pairs stored in the temporary storage means 45, it selects the word with the highest priority under an ordering such as proper noun > common noun > other.
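Steps S15 and S17 can be sketched with a fixed-size buffer and a priority lookup. This is a minimal sketch, assuming a hypothetical class name `RecentWords` and hypothetical part-of-speech labels; the tie-break (prefer the most recent word among equal priorities) is also an assumption, since the source does not say how ties are resolved.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# Lower rank means higher priority: proper noun > common noun > other.
PRIORITY = {"proper_noun": 0, "common_noun": 1}
OTHER_RANK = 2


class RecentWords:
    """Holds the latest (word, part of speech) pairs, like temporary storage 45."""

    def __init__(self, capacity: int = 20) -> None:
        # deque with maxlen silently drops the oldest pair when full.
        self._pairs: Deque[Tuple[str, str]] = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._pairs)

    def add(self, word: str, pos: str) -> None:
        self._pairs.append((word, pos))

    def select(self) -> Optional[str]:
        """Pick the highest-priority word; newest wins among equal priorities."""
        best_word, best_rank = None, None
        for word, pos in reversed(self._pairs):  # newest first
            rank = PRIORITY.get(pos, OTHER_RANK)
            if best_rank is None or rank < best_rank:
                best_word, best_rank = word, rank
        return best_word
```

Capping the buffer at 20 pairs keeps the selection cheap and biased toward the current topic, which fits the real-time goal stated in the text.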

[2-3. Acquiring related information based on the environment]
Acquisition of related information can also reflect information representing the environment in which the apparatus 1 is being used (for example, time, place, and activity; also called "environment information"). Specifically, the related information acquisition means 54 sequentially updates the environment information (step S16) and acquires related information based on the word selected by the selection means 83 and on the environment information (step S18).

For example, even when the same name of a television program that introduces restaurants is recognized and selected, information on nearby restaurants becomes the related information if the utterance occurs while walking through a restaurant district at lunchtime, whereas television program information about that program becomes the related information if the utterance occurs at home late at night.
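The television-program example can be sketched as a rule that maps the selected word's kind plus the environment information to a kind of related information. The names `Environment` and `choose_related_kind`, the place and kind labels, and the hour ranges used for "lunchtime" and "late at night" are all assumptions made for this illustration; the source gives only the qualitative example.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    hour: int    # local hour of day, 0-23
    place: str   # hypothetical labels, e.g. "restaurant_district" or "home"


def choose_related_kind(word_kind: str, env: Environment) -> str:
    """Decide which kind of related information to fetch for the selected word."""
    if word_kind != "restaurant_tv_program":
        return "web_search"
    lunchtime = 11 <= env.hour < 14              # assumed lunchtime window
    late_night = env.hour >= 23 or env.hour < 4  # assumed late-night window
    if lunchtime and env.place == "restaurant_district":
        return "nearby_restaurants"
    if late_night and env.place == "home":
        return "tv_program_info"
    return "web_search"
```

The same recognized word thus leads to different lookups depending on time and place, which is the point of updating the environment information in step S16.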

[2-4. Choosing the output style]
The output style of related information depends on the speech rate. When the speed of the utterance is less than a predetermined value (step S19: "NO"), the output means 65 presents the acquired candidates for related information and requests selection of an output target (step S20). The candidates are, for example, the top few results (three, five, or so) of a web search based on a given word.

When the speed of the utterance is equal to or greater than the predetermined value (step S19: "YES"), the output means 65 determines the output target from among the acquired candidates according to a predetermined priority order, without requesting a selection (step S21). The predetermined priority order is, for example, the first result for a web search; for a restaurant search, nearest first from the current position measured by the apparatus 1 or an order set in advance by the user; or some other order.
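The branch at step S19 can be sketched as follows. The names `speech_rate` and `decide_output`, the unit (words per second), and the `ask_user` callback are hypothetical; the sketch also assumes the candidate list is already sorted by the predetermined priority order, so that taking the first element implements step S21.

```python
from typing import Callable, Optional, Sequence


def speech_rate(word_count: int, duration_sec: float) -> float:
    """Words per second over the measured utterance (an assumed rate measure)."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return word_count / duration_sec


def decide_output(
    candidates: Sequence[str],
    rate: float,
    threshold: float,
    ask_user: Callable[[Sequence[str]], str],
) -> Optional[str]:
    """Choose what to output, asking the user only when speech is slow."""
    if not candidates:
        return None
    if rate < threshold:
        # S19 "NO": present the candidates and let the user choose (S20).
        return ask_user(candidates)
    # S19 "YES": no prompt; take the top candidate by priority order (S21).
    return candidates[0]
```

A rate exactly equal to the threshold takes the no-prompt branch, matching the "equal to or greater than" wording of the text.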

[3. Effects]
(1) As described above, this embodiment (for example, FIG. 4) uses simple processing that extracts the conversation partner's utterance, acquires related information over the network based on the recognized words, and outputs it. This makes it possible to provide, in real time, topics that promote conversation, such as information the user does not know about a word the partner has said.

(2) In this embodiment, configuring the information processing apparatus of the present invention as a wearable device (for example, FIG. 1) makes it easy to use at any time, so topics that promote conversation can be provided in real time anywhere and in conversation with anyone.

(3) In this embodiment, selecting the word on which related information is based, according to part of speech, from among the words recognized one after another (for example, steps S14 and S17 in FIG. 3) keeps the number of base words from becoming too large and allows related information based on appropriate words to be provided.

(4) In this embodiment, acquiring related information based on the recognized word and on information representing the environment (for example, step S18 in FIG. 3) allows related information appropriate to the TPO (time, place, occasion, and so on) to be provided.

(5) In this embodiment, when the utterance is not fast (for example, step S19 in FIG. 3: "NO"), the user is asked to select the output target from the candidates for related information (step S20). When the utterance is rapid (step S19: "YES"), the partner is assumed to be in a hurry or impatient, so the output target is determined from the candidates according to the predetermined priority order without asking for a selection (step S21). Appropriate information can thus be provided according to the situation.

[4. Other embodiments]
The above embodiment and the contents of the drawings are merely examples; the presence or absence and arrangement of each element, and the order and content of processing, can be changed as appropriate. The present invention therefore also includes the modifications illustrated below and other embodiments.

For example, the information processing apparatus of the present invention is limited neither to a wearable device nor to one that outputs related information as audio from headphones. It may, for example, combine a smartphone carried in a pocket or hung from the neck with a glasses-type head-mounted display unit that outputs related information as visual information, or take some other form.

Each aspect of the present invention can also be understood in other categories not explicitly stated (a method, a program, a system including terminals, and so on). In the method and program categories, "means" as used in the apparatus category is to be read as "process" or "step" as appropriate. All or any part of the "means" can also be read as "part" (unit, section, module, and so on).

The processes and steps shown in the embodiment can also be modified, for example by changing their order, executing several together, or executing them piece by piece. The hardware elements that implement and execute the individual means, processes, and steps may be shared, or may differ for each means, process, or step, or for each timing.

The individual means shown in this application may be implemented by calling functions provided by an external server through an API (application programming interface) or network computing (the so-called cloud and the like). Furthermore, elements such as the means are not limited to computers and may be implemented by other information processing mechanisms that exist now or appear in the future.

1 Information processing apparatus (the apparatus)
2 Search server
6 Arithmetic control unit
7 Storage device
8 Communication device
20 Voice acquisition means
30 Extraction means
35 Recognition dictionary storage means
43 Recognition means
45 Temporary storage means
54 Related information acquisition means
55 Search condition storage means
65 Output means
66 Speech synthesis data storage means
73 Part-of-speech determination means
75 Part-of-speech dictionary storage means
83 Selection means
H Headphones
M Microphone
N Communication network
T1, T2, T3 Utterances
V Related information

Claims

1. An information processing apparatus comprising:
voice acquisition means for acquiring the voice of a conversation;
extraction means for extracting the utterance of the conversation partner from the acquired voice;
recognition means for recognizing words from the extracted utterance by speech recognition;
related information acquisition means for acquiring, from a server via a communication network, related information based on the recognized words; and
output means for outputting the acquired related information,
wherein the output means, when the speed of the utterance is less than a predetermined value, presents the acquired candidates for the related information and requests selection of an output target, and, when the speed of the utterance is equal to or greater than the predetermined value, determines the output target from among the acquired candidates for the related information according to a predetermined priority order without requesting the selection.

2. The information processing apparatus according to claim 1, wherein the information processing apparatus is a wearable device comprising:
a microphone;
headphones; and
means for attachment to a human body.

3. The information processing apparatus according to claim 1, further comprising:
part-of-speech determination means for sequentially determining the part of speech of each word recognized by the recognition means; and
selection means for selecting the word on which the related information is based, according to the parts of speech of words determined one after another.

4. The information processing apparatus according to claim 1, wherein the related information acquisition means acquires the related information based on the recognized word and on information representing an environment.

5. An information processing method in which a computer executes:
a voice acquisition process of acquiring the voice of a conversation;
an extraction process of extracting the utterance of the conversation partner from the acquired voice;
a recognition process of recognizing words from the extracted utterance by speech recognition;
a related information acquisition process of acquiring, from a server via a communication network, related information based on the recognized words; and
an output process of outputting the acquired related information,
wherein, in the output process, when the speed of the utterance is less than a predetermined value, the acquired candidates for the related information are presented and selection of an output target is requested, and, when the speed of the utterance is equal to or greater than the predetermined value, the output target is determined from among the acquired candidates for the related information according to a predetermined priority order without requesting the selection.