JP5035208B2

JP5035208B2 - Information processing apparatus, interface providing method, and program

Info

Publication number: JP5035208B2
Application number: JP2008264227A
Authority: JP
Inventors: 雄介片山; 一郎赤堀
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2008-10-10
Filing date: 2008-10-10
Publication date: 2012-09-26
Anticipated expiration: 2028-10-10
Also published as: JP2010091962A

Description

The present invention relates to a user interface in which a plurality of menus form a hierarchical structure from a first layer to an n-th layer (n being an arbitrary number) and in which, when a user selects one of the selection items of a menu, the display transitions from the menu having that selection item to a menu of another layer.

Conventionally, in order to let a user select selection items by voice without being unsure of what to say, a system (conventional system 1) has been proposed that displays on a display unit a menu consisting of items corresponding to the user's button operation, starts voice input at the same time, and executes the processing corresponding to the item identified from the input voice (see Patent Document 1).

Meanwhile, to address cases in which it is difficult for the user to start voice input by a button operation, or in which the user finds the button operation troublesome, a system (conventional system 2) has been proposed that constantly recognizes the user's utterances without requiring an explicit button operation (see, for example, Patent Document 2).
JP 2007-171809 A JP 2000-194393 A

However, in conventional system 1 described above, the items to be identified are not displayed as a menu unless the user performs a button operation, so the system cannot be applied to an always-on recognition system that involves no button operation, such as conventional system 2.

Furthermore, in a user interface device in which a plurality of menus are arranged hierarchically, a user who wishes to select items across several layers in a single voice input must utter the selection items of each layer in the correct order and along the correct selection path for the hierarchy.

However, with conventional menu display techniques such as that of conventional system 1, the menu is not updated until voice input has ended and the selection item has been fixed by speech recognition. It is therefore difficult to utter several selection items in succession in the correct order and along the correct selection path, which lowers the convenience of the user interface.

The present invention has been made to solve these problems, and its object is to make a user interface for selecting selection items by the user's voice more convenient than before.

To solve the above problems, a user interface device for realizing operation by the user's voice may be given the first configuration described below.
This configuration is a user interface device implementing a user interface in which a plurality of menus form a hierarchical structure from a first layer to an n-th layer (n being an arbitrary number) and in which having the user select a selection item of a menu causes a transition from the menu having that selection item to a menu of another layer.

The device comprises: menu display means for causing a display unit to display a current menu, which is the menu among the plurality of menus that should be displayed at that point in time; item estimation means for estimating to which selection item on which selection path an externally input voice corresponds, among the selection paths each consisting of the selection items that can be selected in transitioning from the menu of the i-th layer (1 ≤ i < n) to the menu of the n-th layer; and menu transition means for transitioning the current menu to the menu of another layer that should be displayed when the selection item estimated by the item estimation means is selected. The menu display means causes the display unit to display an image showing the menu to which the menu transition means has transitioned as the current menu.

In the information processing apparatus configured in this way, it is first estimated to which selection item on which selection path the user's voice corresponds, and the selection item thus estimated is regarded as having been selected by the user. The current menu is then transitioned to the menu of another layer that should be displayed when that selection item is selected, and this menu is displayed on the display unit.

In this way, the menu display can be changed as needed, treating the selection item estimated from the voice input so far as the item selected by the user.
For the user, therefore, it is enough to look at the menu displayed on the display unit, choose any of its selection items, and utter their contents one after another; the display then changes from that menu to the corresponding other menu. This is highly convenient because, unlike the conventional approach, the user need not wait for speech recognition to finish and the menu to change before uttering the item of the next layer.
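
The incremental behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Menu` class, the `follow_path` function, and the menu labels are hypothetical, and the list of recognized items stands in for the output of the item estimation means.

```python
from dataclasses import dataclass, field


@dataclass
class Menu:
    """One menu in the hierarchy; `items` maps an item label to its submenu (None for a leaf)."""
    title: str
    items: dict = field(default_factory=dict)


def follow_path(top, recognized_items):
    """Return the menu that becomes current after the given items are selected in order."""
    current = top
    for label in recognized_items:
        submenu = current.items.get(label)
        if submenu is None:  # unknown item or leaf item: stay on the current menu
            break
        current = submenu
    return current


# Hypothetical three-layer hierarchy.
artist = Menu("Artist", {"play": None})
music = Menu("Music", {"artist": artist, "album": None})
top = Menu("Top", {"music": music, "navigation": None})

# The display is refreshed after each partial estimate, not only when the utterance ends.
for partial in (["music"], ["music", "artist"]):
    print(follow_path(top, partial).title)
```

The point of the loop at the end is that the current menu is recomputed while the user is still speaking, so the next layer is already visible when the user reaches it.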

In this configuration, the current menu may simply remain displayed once it has been displayed. However, if a certain period elapses without any voice input, the user's selection can be regarded as having been interrupted or abandoned, so the menu display may be erased once that period has elapsed, for example to improve the visibility of the display area of the display unit.

For this purpose, the above configuration may be modified, for example, into the second configuration described below. In this configuration, the menu display means erases the menu from the display unit when no voice has been input from outside for a predetermined period or longer after the image showing the menu was displayed on the display unit.

With this configuration, the menu display can be erased when the period without external voice input continues for the predetermined period or longer, which thereafter improves the visibility of the display area of the display unit.

In each of the above configurations, as a way of newly displaying a menu while no menu is displayed on the display unit, the configuration may be modified, for example, into the third configuration described below.

In this configuration, the menu display means causes the display unit to display the current menu when a voice is input from outside while no menu is displayed on the display unit.
With this configuration, when no menu is displayed, for example immediately after the information processing apparatus starts up, the user can bring up the menu by uttering anything at all, regardless of content, so a highly convenient information processing apparatus can be realized. Likewise, when the user speaks after the display has been erased because no voice was input for the predetermined period or longer, as in the second configuration, the current menu can be displayed, which also makes the apparatus highly convenient.

The transition of the current menu in each of the above configurations may be performed solely on the basis of externally input voice. However, if a certain period elapses without voice input after the current menu has transitioned on the basis of externally input voice, the user's selection can be regarded as having been interrupted or abandoned. In that case, when the user later tries to select items again, the user may have forgotten which menu was the current menu before the interruption and may become confused when selecting.

Therefore, when a certain period elapses without voice input after the current menu has transitioned on the basis of externally input voice, it is preferable that selection restart from one fixed menu, for example the top menu. For this purpose, each of the above configurations may be modified, for example, into the fourth configuration described below.

In this configuration, the menu transition means transitions the current menu to the menu of the first layer when no voice has been input from outside for a predetermined period or longer.
With this configuration, even if the current menu has transitioned on the basis of externally input voice, the current menu is returned to the menu of the highest layer (the first layer), that is, the top menu, when the period without external voice input continues for the predetermined period or longer. When selecting again, the user therefore only has to start from the top menu.

As a result, even if the user interrupts or abandons a selection, the user can always resume from the first-layer menu, which prevents confusion during selection and thus improves the convenience of the user interface.
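
The timeout behaviour of the second to fourth configurations can be sketched as a small state holder. Everything here is an assumption made for illustration: the class name `MenuTimeouts`, the two period values, the use of separate periods for hiding and for resetting, and the caller-supplied timestamps (used instead of a real clock so that the sketch is deterministic).

```python
class MenuTimeouts:
    """Tracks menu visibility and the current menu across periods of silence."""

    def __init__(self, hide_after=5.0, reset_after=10.0):
        self.hide_after = hide_after    # hypothetical period before the menu is erased
        self.reset_after = reset_after  # hypothetical period before returning to the top menu
        self.visible = False            # no menu is shown right after start-up
        self.current = "top"
        self.last_voice = None

    def on_voice(self, now, new_menu=None):
        """Any voice input shows the menu; an estimated item may also change the menu."""
        self.last_voice = now
        self.visible = True
        if new_menu is not None:
            self.current = new_menu

    def on_tick(self, now):
        """Periodic check: erase the menu and/or reset to the top menu after silence."""
        if self.last_voice is None:
            return
        silent_for = now - self.last_voice
        if silent_for >= self.hide_after:
            self.visible = False
        if silent_for >= self.reset_after:
            self.current = "top"


state = MenuTimeouts()
state.on_voice(0.0, "music")  # the user says "music": the menu appears and transitions
state.on_tick(6.0)            # 6 s of silence: the menu is erased
state.on_tick(11.0)           # 11 s of silence: the current menu returns to "top"
state.on_voice(12.0)          # any utterance brings the (top) menu back
```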

In each of the above configurations, the estimation of which selection item on which selection path the externally input voice corresponds to may be started when external voice input begins.

The "outside" from which the voice is input means the outside of the user interface device. If the device can receive voice through a microphone, it refers to that microphone; if the device has a path for receiving voice over a network, it refers to that path.

The selection item and selection path to which the externally input voice corresponds may be estimated in any manner. As a specific example, the fifth configuration described below is conceivable.

This configuration comprises speech recognition means that compares the externally input voice with each utterance pattern in an utterance pattern dictionary storing the utterance patterns with which a user selects the selection items, and outputs as a recognition result any utterance pattern whose similarity obtained from the comparison is equal to or greater than a predetermined threshold. For the selection path consisting of the selection items corresponding to the utterance patterns sequentially recognized by the speech recognition means, the item estimation means outputs the latest selection item on that selection path as the estimation result.

With this configuration, the current hypothesis information in the hypothesis search, which is formed from the utterance patterns considered on the way to a result in well-known speech recognition, can be used to estimate what the externally input voice is about to say, and from that to estimate which selection item on which selection path the user ultimately intends to select.
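
The threshold comparison of the fifth configuration can be sketched as below, under loud assumptions. Real speech recognition scores audio against acoustic models; here text strings stand in for utterances, and the ratio from Python's `difflib.SequenceMatcher` stands in for the similarity. The dictionary contents, the threshold value, and the function names `recognize` and `latest_item` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical utterance pattern dictionary: pattern text -> selection path.
PATTERNS = {
    "music artist": ("music", "artist"),
    "music album": ("music", "album"),
    "navigation home": ("navigation", "home"),
}


def recognize(heard, threshold=0.9):
    """Return (pattern, similarity) pairs whose similarity reaches the threshold."""
    results = []
    for pattern in PATTERNS:
        # Text similarity is only a stand-in for an acoustic similarity score.
        similarity = SequenceMatcher(None, heard, pattern).ratio()
        if similarity >= threshold:
            results.append((pattern, similarity))
    return results


def latest_item(pattern):
    """Latest selection item on the selection path of a recognized pattern."""
    return PATTERNS[pattern][-1]


matches = recognize("music artst")  # slightly garbled input
```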

This configuration may further be modified into the sixth configuration described below.
In this configuration, each time the speech recognition means performs recognition, if there are several utterance patterns whose similarity is equal to or greater than the threshold, the item estimation means takes the selection path consisting of the selection items corresponding to the utterance pattern with the highest similarity and outputs the latest selection item on that selection path as the estimation result.

With this configuration, even if several utterance patterns reach the threshold in a given recognition, the selection path consisting of the selection items corresponding to the utterance pattern with the highest similarity among them can be estimated.

Because the utterance pattern with the highest similarity can be said to correspond to the selection path and selection item most likely to match what was actually said, the selection path and selection item can be estimated accurately.
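
The selection rule of the sixth configuration reduces to "filter by threshold, then take the maximum". A minimal sketch follows, assuming that each hypothesis is represented as a tuple of selection items along a path and that similarities are plain numbers; the function name `estimate_item` and the example scores are hypothetical.

```python
def estimate_item(scores, threshold):
    """Pick the latest item on the most similar selection path at or above the threshold.

    scores: dict mapping a selection path (tuple of item labels) to its similarity.
    Returns None when no path reaches the threshold.
    """
    candidates = {path: s for path, s in scores.items() if s >= threshold}
    if not candidates:
        return None
    best_path = max(candidates, key=candidates.get)
    return best_path[-1]


# Hypothetical similarities at one recognition step.
scores = {
    ("music",): 0.4,
    ("music", "artist"): 0.8,
    ("navigation",): 0.7,
}
print(estimate_item(scores, threshold=0.6))
```

In this example two paths pass the threshold of 0.6, and the latest item of the more similar one (`"artist"`) is taken as the estimate.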

As the similarity of an utterance pattern in this configuration, it is possible to use, for example, the likelihood of a probabilistic model corresponding to the content of that utterance pattern, or that likelihood normalized by the processing time of the hypothesis search.

In each of the above configurations, the transition of the current menu may be realized, for example, by updating information that indicates the current menu. To that end, each of the above configurations may be modified into the seventh configuration described below.

In this configuration, the menu transition means transitions the menu by updating current information, which indicates the current menu, so that it indicates the menu of another layer that should be displayed on the basis of the selection item estimated by the item estimation means. The menu display means causes the display unit to display the menu indicated by the current information.
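
One way to picture the seventh configuration is a single piece of stored state that the transition logic writes and the display logic reads. The sketch below is an assumption-laden illustration: the `MENUS` table, the `CurrentInfo` class, and the functions `transition` and `display` are hypothetical names, not taken from the source.

```python
# Hypothetical menu table: menu id -> {item label: id of the menu shown next}.
MENUS = {
    "top": {"music": "music", "navigation": "navigation"},
    "music": {"artist": "artist", "album": "album"},
}


class CurrentInfo:
    """Holds the identifier of the current menu in one storage location."""

    def __init__(self, menu_id="top"):
        self.menu_id = menu_id


def transition(info, estimated_item):
    """Menu transition: update the current information if the item leads to another menu."""
    next_id = MENUS.get(info.menu_id, {}).get(estimated_item)
    if next_id is not None:
        info.menu_id = next_id


def display(info):
    """Menu display: return the items of whichever menu the current information indicates."""
    return sorted(MENUS.get(info.menu_id, {}))


info = CurrentInfo()
transition(info, "music")
print(display(info))
```

Because the display side only reads the current information, the same update path can serve voice input and, as noted in the text, operations on an operation unit.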

With this configuration, the current menu can be transitioned by updating the current information stored in a predetermined storage area.
The transition of the current menu in each of the above configurations may also be performed when the user operates an operation unit.

In each of the above configurations, the menu may always be displayed on the display unit in the same display mode, but the display mode may instead be varied according to the surrounding environment.

For example, the display mode may be varied depending on whether or not the externally input voice has content that follows a selection path.
To that end, each of the above configurations may be modified into the eighth configuration described below.

This configuration comprises: reliability specifying means for specifying, on the basis of the externally input voice, the reliability that the voice has content following the selection path; and first mode determination means for determining, each time the menu transition means transitions the current menu, the display mode of that menu according to the reliability specified by the reliability specifying means. The menu display means causes the display unit to display the menu to which the current menu has transitioned in the display mode that the first mode determination means has determined for that menu.

With this configuration, the display mode of the image showing the menu can be varied according to the reliability that the voice from the voice input unit has content following the selection path described above.

The "reliability" in this configuration may be specified, for example, by preparing a competing model and calculating a likelihood ratio, as in JP-A-11-85188 (hereinafter "Patent Document 3"), or by using a value corresponding to the size of the difference in similarity between the hypothesis with the highest similarity and the other hypotheses.
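
The second approach mentioned above (the gap between the best hypothesis and the others) can be sketched as a simple margin. This is one possible reading, not the method of the source: comparing only against the runner-up, and treating a missing runner-up as similarity 0.0, are assumptions, as is the function name `margin_reliability`.

```python
def margin_reliability(similarities):
    """Gap between the highest similarity and the next highest one.

    Assumption: with a single hypothesis the competitor is treated as 0.0,
    and with no hypotheses the reliability is 0.0.
    """
    ranked = sorted(similarities, reverse=True)
    if not ranked:
        return 0.0
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - runner_up


# A clear winner gives a large margin; near-ties give a small one.
clear = margin_reliability([0.75, 0.25, 0.125])
close = margin_reliability([0.75, 0.75, 0.25])
```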

The display mode varied in this configuration may be of any kind. One example is a display mode based on the size of the image showing the menu, for which the ninth configuration described below is conceivable.

In this configuration, the first mode determination means determines the size of the display area of the current menu according to the reliability specified by the reliability specifying means, and the menu display means causes the display unit to display the menu to which the current menu has transitioned at a size that fits the display area determined by the first mode determination means.
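
A minimal sketch of how a reliability value might be mapped to a display size. The source only states that the size is determined according to the reliability; the linear mapping, the clamping of the reliability to the range 0 to 1, the scale limits, and the name `menu_scale` are all assumptions.

```python
def menu_scale(reliability, min_scale=0.5, max_scale=1.0):
    """Map a reliability in [0, 1] linearly to a display scale factor.

    Values outside [0, 1] are clamped. Higher reliability gives a larger menu.
    """
    r = min(max(reliability, 0.0), 1.0)
    return min_scale + (max_scale - min_scale) * r


low = menu_scale(0.0)   # least reliable input: smallest menu
mid = menu_scale(0.5)
high = menu_scale(1.0)  # most reliable input: full-size menu
```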

With this configuration, the display area of the menu can be made larger as the reliability of the voice from the voice input unit becomes higher.
Another aspect of the surrounding environment that may cause the display mode of the image showing the menu to vary is a command from outside the information processing apparatus; the display mode may be varied according to such a command.

To that end, each of the above configurations may be modified into the tenth configuration described below.
This configuration comprises second mode determination means for determining the display mode of the menu in response to an external command after the menu transition means has transitioned the current menu. The menu display means causes the display unit to display the menu to which the current menu has transitioned in the display mode that the second mode determination means has determined for that menu.

With this configuration, the display mode of the image showing the menu can be varied according to a command from outside.
As the "command from outside" in this configuration, it is conceivable to use, for example, the result of detecting, through communication with a predetermined device that operates on the user's voice input, whether or not that device is currently operating in response to voice input.

For this purpose, the above configuration may be modified into the eleventh configuration described below.
This configuration comprises external voice input determination means for determining, through communication with an external predetermined device (external device) that operates on the user's voice input, whether or not that device is operating in response to voice input. The second mode determination means determines that the current menu should be displayed when the external voice input determination means determines that no voice input is taking place on the external device side, and determines that the current menu should not be displayed when it determines that voice input is taking place on the external device side.

With this configuration, menu transitions can be prevented while a predetermined device that operates on the user's voice input is operating in response to voice input.
When the predetermined device is operating in response to voice input, it is highly likely that the user is speaking for a purpose unrelated to voice input to the present information processing apparatus. If such unrelated voice were taken in and processed, menu transitions that the user did not intend would occur.

Therefore, by preventing the menu from being displayed while the predetermined device is operating in response to voice input, as described above, menu transitions that the user did not intend can be prevented.

As the "command from outside" described above, it is also conceivable to use, for example, the result of detecting that the operation unit of the information processing apparatus, or a predetermined device connected to the information processing apparatus, is being operated by the user.

For this purpose, the tenth or eleventh configuration may be modified into the twelfth configuration described below.
This configuration comprises operation detection means for detecting that the operation unit of the information processing apparatus, or a predetermined device connected to the information processing apparatus, is being operated by the user. The second mode determination means determines that the current menu should be displayed when the operation detection means determines that no operation is being performed, and determines that the current menu should not be displayed when the operation detection means determines that an operation is being performed.

With this configuration, menu transitions can be prevented while the operation unit of the information processing apparatus, or a predetermined device connected to it, is being operated.

When the operation unit of the information processing apparatus or a predetermined device connected to it is being operated, it is highly likely that the user is speaking for a purpose unrelated to menu selection on the information processing apparatus. If such unrelated voice were taken in and processed, menu transitions that the user did not intend would occur.

Therefore, by preventing menu transitions while the operation unit is being operated, as described above, menu transitions that the user did not intend can be prevented.
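
The display decisions of the eleventh and twelfth configurations amount to a simple gate. The sketch below combines both conditions in one function, which is an assumption (the source describes them as separate configurations that may be combined); the function and parameter names are hypothetical.

```python
def should_display_menu(external_device_listening, operation_in_progress):
    """Decide whether the current menu should be displayed.

    The menu is suppressed while an external voice-operated device is handling
    voice input, or while the user is operating an operation unit or a connected
    device, because speech in those situations is likely unrelated to menu selection.
    """
    return not (external_device_listening or operation_in_progress)


idle = should_display_menu(False, False)        # nothing else is going on: show the menu
phone_active = should_display_menu(True, False) # e.g. another voice device is in use: hide it
```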

As the "command from outside" described above, it is also conceivable to use, for example, the result of detecting the number of users located around the information processing apparatus.
For this purpose, any of the tenth to twelfth configurations may be modified into the thirteenth configuration described below.

This configuration comprises user detection means for detecting the number of users located around the information processing apparatus. The second mode determination means determines the size of the display area of the current menu to be the normal size when the user detection means detects that only one user is present, and determines the size to be smaller than normal when it detects that a plurality of users are present.

With this configuration, the display area of the current menu is given the normal size when only one user is detected, and is made smaller than normal when a plurality of users are detected.

When a plurality of users are nearby, the menu displayed on the display unit is not necessarily needed by users other than the one operating the information processing apparatus by voice. By reducing the display area of the menu in such a case, as in the above configuration, a loss of visibility of the display unit for the other users can be avoided, compared with a configuration in which the display mode is not varied.

To detect the number of nearby users in this configuration, sensors may be placed near the areas around the apparatus where users can be located, and the number of users may be detected from the sensor outputs. Alternatively, the areas where users can be located may be captured by a camera, and the number of users may be detected by identifying the users in the video through image processing.
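
The size rule of the thirteenth configuration can be sketched as follows. The source specifies only "normal" for one user and "smaller than normal" for several; the concrete scale values, the handling of a count of zero, and the function name are assumptions. How the count is obtained (seat sensors, camera) is outside this sketch.

```python
def display_scale_for_users(user_count, normal=1.0, reduced=0.5):
    """Return the menu scale for the detected number of nearby users.

    One user: normal size. Several users: reduced size, so the menu takes
    less of the screen away from people who are not using it.
    Assumption: a count of zero is treated like a single user.
    """
    return reduced if user_count > 1 else normal


driver_only = display_scale_for_users(1)
with_passengers = display_scale_for_users(3)
```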

Each of the above configurations may also comprise, as a fourteenth configuration, process execution means for executing a predetermined process assigned to the selection item estimated by the item estimation means.

With this configuration, when a selection item to which a predetermined process is assigned, for example an item in a lowest-layer menu, is regarded as having been selected, the assigned process can be executed.
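
The fourteenth configuration can be pictured as a lookup from selection items to callables. This is a hedged sketch: the mapping `assigned`, the item label `"play"`, and the function name `execute_assigned_process` are hypothetical.

```python
def execute_assigned_process(estimated_item, assigned):
    """Run the process assigned to the estimated item, if there is one.

    assigned: dict mapping an item label to a zero-argument callable.
    Returns the callable's result, or None when no process is assigned
    (for example, when the item only leads to another menu).
    """
    process = assigned.get(estimated_item)
    if process is None:
        return None
    return process()


# Hypothetical assignment for an item in a lowest-layer menu.
assigned = {"play": lambda: "playback started"}
result = execute_assigned_process("play", assigned)
```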

To solve the above problems, the invention may also take the form of a user interface providing method for providing a user interface in which a plurality of menus form a hierarchical structure from a first layer to an n-th layer (n being an arbitrary number) and in which having the user select a selection item of a menu causes a transition from the menu having that selection item to a menu of another layer.

このインタフェース提供方法は、複数のメニューそれぞれのうち、その時点で表示させるべきメニューであるカレントメニューを表示部に表示させるメニュー表示手順と、外部から入力される音声が、第ｉ階層（１≦i＜ｎ）のメニューから第ｎ階層のメニューへと遷移するまでに選択されうる選択項目それぞれの選択経路のうち、いずれの選択経路におけるいずれの選択項目に対応するかを推定する項目推定手順と、前記カレントメニューを、前記項目推定手順により推定した選択項目が選択されることにより表示させるべき別階層のメニューに遷移させるメニュー遷移手順と、を含む。そして、前記メニュー表示手順では、前記メニュー遷移手順にてカレントメニューとして遷移させられたメニューを示す画像を表示部に表示させる。 This interface providing method includes: a menu display procedure for displaying on the display unit the current menu, which is the menu to be displayed at that time among the plurality of menus; an item estimation procedure for estimating to which selection item in which selection route an externally input voice corresponds, among the selection routes of selection items that can be selected before the transition from a menu in the i-th layer (1 ≦ i < n) to a menu in the n-th layer; and a menu transition procedure for transitioning the current menu to the menu in another layer that should be displayed when the selection item estimated by the item estimation procedure is selected. In the menu display procedure, an image showing the menu to which the current menu was transitioned in the menu transition procedure is displayed on the display unit.

このインタフェース提供方法であれば、上述した第１の構成に係るユーザインタフェース装置と同様の作用，効果を得ることができる。 With this interface providing method, the same operations and effects as those of the user interface device according to the first configuration described above can be obtained.
なお、このインタフェース提供方法は、上述した第２〜第１４のいずれかの構成に係るユーザインタフェース装置における各手段を手順として実現した方法としてもよく、この場合、上述した第２〜第１４のいずれかの構成に係るユーザインタフェース装置と同様の作用，効果を得ることができる。 Note that this interface providing method may be a method in which each means of the user interface device according to any one of the second to fourteenth configurations described above is realized as a procedure. In that case, the same operations and effects as those of the user interface device according to that configuration can be obtained.

また、上記課題を解決するためには、上述した第１〜第１４のいずれかの構成に係る全ての手段として機能させるための各種処理手順をコンピュータシステムに実行させるためのプログラムとしてもよい。 Further, to solve the above problem, the invention may be embodied as a program for causing a computer system to execute the various processing procedures for functioning as all of the means according to any one of the first to fourteenth configurations described above.

このプログラムにより制御されるコンピュータシステムであれば、上記第１から第１４のいずれかの構成に係るユーザインタフェース装置の一部を構成することができる。 A computer system controlled by this program can constitute a part of the user interface device according to any one of the first to fourteenth configurations.
なお、上述したプログラムは、コンピュータシステムによる処理に適した命令の順番付けられた列からなるものであって、各種記録媒体や通信回線を介してユーザインタフェース，情報処理装置や、これを利用するユーザ等に提供されるものである。 Note that the above-described program consists of an ordered sequence of instructions suitable for processing by a computer system, and is provided to the user interface, the information processing apparatus, users of these, and the like via various recording media or communication lines.

以下に本発明の実施形態を図面と共に説明する。 Embodiments of the present invention will be described below with reference to the drawings.
（０）全体構成 (0) Overall Configuration
情報処理装置１は、周知のナビゲーション装置のユーザインタフェースを実現すべく、このナビゲーション装置に実装されたものであり、図１に示すように、ＣＰＵ，ＲＯＭ，ＲＡＭなどからなる制御部１０と、入出力インタフェース（Ｉ／Ｏ）２０と、からなる周知のコンピュータシステムであって、ナビゲーション装置のうち、各種情報を記憶する記憶部２，ユーザによる操作を受け付ける操作部３，各種情報を表示する表示部４，マイク５を介した音声の入力を制御する音声入力部６などが接続されている。 The information processing apparatus 1 is mounted on a well-known navigation device in order to realize the user interface of that navigation device. As shown in FIG. 1, it is a well-known computer system comprising a control unit 10, which includes a CPU, a ROM, a RAM, and the like, and an input/output interface (I/O) 20. Connected to it are components of the navigation device such as a storage unit 2 that stores various information, an operation unit 3 that receives user operations, a display unit 4 that displays various information, and a voice input unit 6 that controls voice input via a microphone 5.

これらのうち、制御部１０は、ＲＯＭに記憶されたプログラムに従って各種処理を実行することで、音声入力部６を介した音声の入力レベル（音量）によってユーザの発話音声が含まれているか否かを検出する音声検出手段３１，マイク５を介して入力される音声で示される選択項目（後述する）を推定する項目推定手段３３，マイク５を介して入力される音声の内容を周知の音声認識により解析する音声認識手段３５，音声認識手段３５の解析結果に応じた処理を実施する処理実施手段３７，項目推定手段３３による項目推定結果に基づいて表示すべきメニューを遷移させるメニュー遷移手段３８、表示部４によるメニューの表示を制御するメニュー表示手段３９などとして機能する。これら機能によって、制御部１０は、ナビゲーション装置のユーザインタフェースを実現している。 Among these, the control unit 10 executes various processes according to a program stored in the ROM, and thereby functions as: voice detection means 31 that detects, from the input level (volume) of the voice received via the voice input unit 6, whether the user's speech is included; item estimation means 33 that estimates the selection item (described later) indicated by the voice input via the microphone 5; voice recognition means 35 that analyzes the content of the voice input via the microphone 5 by well-known voice recognition; processing execution means 37 that executes a process according to the analysis result of the voice recognition means 35; menu transition means 38 that transitions the menu to be displayed based on the item estimation result of the item estimation means 33; and menu display means 39 that controls the display of menus on the display unit 4. With these functions, the control unit 10 realizes the user interface of the navigation device.

このユーザインタフェースは、第１階層から第ｎ階層（ｎは任意の数）までの階層構造をなす複数のメニューそれぞれが有する各選択項目をユーザに選択させることにより、その選択項目を有するメニューから別階層のメニューへと遷移するように構成されたものである。 This user interface is configured so that, by having the user select a selection item included in each of a plurality of menus forming a hierarchical structure from the first layer to the n-th layer (n is an arbitrary number), the display transitions from the menu having that selection item to a menu in another layer.

具体的には、ユーザがマイク５に向けて音声を発した以降、表示部４に第１階層のメニュー（トップメニュー）が表示され（図２の画面Ａ参照）、その後、このメニューにて選択可能ないずれかの選択項目を発してなる音声をマイク５から入力すると、その選択項目が選択されたものとして、その選択項目の選択により遷移すべき別階層のメニューへと表示内容を変化させていく（図２の画面Ｂ，Ｃ参照）、といったユーザインタフェースである。そして、最下層のメニューにおいて選択された選択項目に対応する処理が実施されることとなる。 Specifically, once the user speaks toward the microphone 5, the first-layer menu (top menu) is displayed on the display unit 4 (see screen A in FIG. 2). Thereafter, when a voice uttering one of the selection items selectable in this menu is input from the microphone 5, that selection item is regarded as selected, and the display content changes to the menu in another layer to which the selection leads (see screens B and C in FIG. 2). The process corresponding to the selection item selected in the lowest-layer menu is then executed.
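The hierarchical menu described above can be pictured with a small data structure. The following is a minimal sketch, not the implementation of the embodiment: the menu names, item names, and the `menu:` / `action:` prefixes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each menu maps a selection item to either a lower-layer menu ("menu:<name>")
# or, at the lowest layer, an assigned process ("action:<name>").
# All names below are hypothetical examples.
MENUS: Dict[str, Dict[str, str]] = {
    "top": {"destination": "menu:destination", "audio": "menu:audio"},
    "destination": {"address": "menu:address", "home": "action:route_home"},
    "address": {"tokyo": "action:route_tokyo", "nagoya": "action:route_nagoya"},
    "audio": {"radio": "action:play_radio", "cd": "action:play_cd"},
}


def follow(spoken_items: List[str], start: str = "top") -> Tuple[str, Optional[str]]:
    """Walk the hierarchy one selection item at a time.

    Returns (menu reached, assigned process or None if no process was reached).
    """
    current = start
    for item in spoken_items:
        kind, name = MENUS[current][item].split(":", 1)
        if kind == "action":
            return current, name  # lowest layer: a process is assigned
        current = name            # transition to the menu in the next layer
    return current, None
```

For example, uttering "destination" and then "address" would leave the address menu displayed, and a further "nagoya" would select the process assigned to that item.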

以下、上記のような構成の情報処理装置１について、制御部１０により実行される処理手順が異なる実施形態を順に説明する。 Hereinafter, embodiments of the information processing apparatus 1 configured as described above that differ in the processing procedure executed by the control unit 10 will be described in order.
（１）第１実施形態 (1) First Embodiment
（１−１）指示受付処理 (1-1) Instruction Reception Process
はじめに、情報処理装置１が起動された以降、制御部１０のＣＰＵがＲＯＭに格納されたプログラムに従って繰り返し実行する指示受付処理の処理手順を、図３に基づいて説明する。 First, the procedure of the instruction reception process, which the CPU of the control unit 10 repeatedly executes according to the program stored in the ROM after the information processing apparatus 1 is started, will be described with reference to FIG. 3.

この指示受付処理が起動されると、まず、音声入力部６を介した音声の入力が開始されるまで待機状態となる（ｓ１１０：ＮＯ）。ここでは、音声検出手段３１により検出された音声のレベルが一定以上となった場合に、ユーザの発話が開始されたと判定される。 When this instruction receiving process is activated, the process first waits until voice input via the voice input unit 6 is started (s110: NO). Here, it is determined that the user's utterance has been started when the level of the sound detected by the sound detection means 31 becomes a certain level or more.
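The level-based start detection in s110 can be sketched as below. The source states only that an utterance is judged to have started when the detected level reaches a certain value or more; the use of an RMS level, the function names, and the default threshold are assumptions made for illustration.

```python
import math
from typing import Sequence


def rms_level(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (assumed level measure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def utterance_started(frame: Sequence[float], threshold: float = 0.1) -> bool:
    """s110: YES when the level is at or above the threshold, otherwise NO."""
    return rms_level(frame) >= threshold
```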

その後、ユーザの発話が開始されたら（ｓ１１０：ＹＥＳ）、後述する表示内容決定処理が行われる（ｓ１２０）。 Thereafter, when the user's utterance starts (s110: YES), a display content determination process described later is performed (s120).
この表示内容決定処理では、その時点までにマイク５を介して入力された音声が、第ｉ階層（１≦i＜ｎ）のメニューから第ｎ階層のメニューへと遷移するまでに選択されうる選択項目それぞれの選択経路のうち、いずれの選択経路におけるいずれの選択項目に対応するかを推定し、その推定結果に応じて表示部４に表示させるべきメニューを決定する。 In this display content determination process, it is estimated to which selection item in which selection route the voice input via the microphone 5 up to that point corresponds, among the selection routes of selection items that can be selected before the transition from a menu in the i-th layer (1 ≦ i < n) to a menu in the n-th layer, and the menu to be displayed on the display unit 4 is determined according to the estimation result.

次に、上記ｓ１２０での決定事項に基づいて、表示部４にメニューを表示させるためのメニュー表示処理が行われる（ｓ１３０）。ここでは、上記ｓ１２０にて決定されたメニューが、メニュー表示手段３９により表示部４に表示させられる。 Next, a menu display process for displaying a menu on the display unit 4 is performed based on the items determined in s120 (s130). Here, the menu determined in s120 is displayed on the display unit 4 by the menu display means 39.

なお、上記ｓ１２０にてメニューが決定されていない場合、メニュー表示手段３９は、その時点で表示部４に表示されているメニューの表示を消去させる。ここでいう「メニューが決定されていない」とは、メニューの表示を消去させるべき旨が決定されていた場合や、表示させるべきメニューが存在していなかった場合などのことである。 When the menu is not determined in s120, the menu display unit 39 deletes the menu displayed on the display unit 4 at that time. Here, “the menu has not been determined” refers to a case where it has been determined that the menu display should be erased, or a case where there is no menu to be displayed.

次に、音声認識手段３５が、ユーザの音声に対する音声認識を終了すべき状況であるか否かをチェックする（ｓ１５０）。ここでは、上述した音声検出手段３１による音声入力の検出が所定期間以上なされていない場合に、音声認識を終了すべき状況であると判定される。 Next, the voice recognition means 35 checks whether or not the voice recognition for the user's voice should be terminated (s150). Here, it is determined that the voice recognition should be terminated if the voice input detection by the voice detection unit 31 described above has not been performed for a predetermined period or longer.

このｓ１５０で音声認識を終了すべき状況ではないと判定された場合（ｓ１５０：ＮＯ）、プロセスがｓ１２０へと戻り、以降、音声認識を終了すべき状況となるまで、上記ｓ１２０〜ｓ１５０が繰り返し行われる。 When it is determined in s150 that the voice recognition is not to be terminated (s150: NO), the process returns to s120, and thereafter, the above steps s120 to s150 are repeated until the voice recognition is terminated. Is called.

そして、音声認識を終了すべき状況となったら、上記ｓ１５０でその旨が判定され（ｓ１５０：ＹＥＳ）、音声認識手段３５が、その時点までにマイク５を介して入力され、ＣＰＵの内蔵メモリまたはＲＡＭに格納された音声に対して、周知の音声認識を行うことにより、その音声で示される文字列が特定される（ｓ１５２）。 Then, when voice recognition should be terminated, this is determined in s150 (s150: YES), and the voice recognition means 35 performs well-known voice recognition on the voice that has been input via the microphone 5 up to that point and stored in the internal memory of the CPU or in the RAM, thereby identifying the character string indicated by the voice (s152).

そして、処理実施手段３７が、上記ｓ１５２にて特定された文字列に対応する選択項目に基づき、その選択項目に割り当てられた所定の処理を実行した後（ｓ１６０）、プロセスがｓ１１０へと戻る。 Then, after the processing execution unit 37 executes a predetermined process assigned to the selected item based on the selected item corresponding to the character string specified in s152 (s160), the process returns to s110.

このｓ１６０において、例えば、選択項目が第ｎ階層のメニューにおける選択項目でないなど、選択項目に割り当てられた処理が存在していない場合には、現時点で表示されているメニュー及び現在時刻が履歴情報（カレント情報）としてメモリまたはＲＡＭの所定領域に格納され（既に格納されている場合はその履歴情報が更新され）、プロセスがｓ１１０へと戻る。 In this s160, when there is no process assigned to the selected item, for example, the selected item is not a selected item in the n-th layer menu, the currently displayed menu and the current time are displayed as history information ( (Current information) is stored in a predetermined area of the memory or RAM (if already stored, the history information is updated), and the process returns to s110.
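The flow from s110 to s160 can be summarized as one pass of a loop. The sketch below is a simplification under stated assumptions: each audio frame is represented by an already-segmented word (or `None` for silence), the two callables stand in for the display content determination process and the final voice recognition, and `silence_limit` is a hypothetical stand-in for the "predetermined period" checked in s150.

```python
from typing import Callable, Dict, Iterable, List, Optional


def accept_instruction(
    frames: Iterable[Optional[str]],
    decide_menu: Callable[[List[str]], Optional[str]],
    recognize: Callable[[List[str]], str],
    silence_limit: int = 2,
) -> Dict[str, object]:
    """One pass of s110 to s152; s160 would act on the returned text."""
    shown: List[Optional[str]] = []  # menus displayed in s130 (None = erased)
    buffered: List[str] = []         # voice accumulated since s110: YES
    silent = 0
    started = False
    for frame in frames:
        if not started:
            if frame is None:        # s110: NO, keep waiting
                continue
            started = True           # s110: YES
        if frame is None:
            silent += 1
        else:
            silent = 0
            buffered.append(frame)
        shown.append(decide_menu(buffered))  # s120 followed by s130
        if silent >= silence_limit:          # s150: YES
            break
    return {"shown": shown, "text": recognize(buffered)}  # s152
```

Because the menu is re-decided on every pass while input continues, the display can follow the utterance before the final recognition result in s152 is available.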

なお、本実施形態では、上記ｓ１１０で音声の入力が開始されたと判定された以降、そうして入力される音声を示す情報がメモリまたはＲＡＭに蓄積されていき、プロセスがｓ１１０へと戻るとそれまでに蓄積された音声の情報が削除されるように構成されている。 In the present embodiment, after it is determined in s110 that voice input has started, information representing the input voice is accumulated in the memory or the RAM, and when the process returns to s110, the voice information accumulated up to that point is deleted.
（１−２）表示内容決定処理 (1-2) Display Content Determination Process
続いて、上記指示受付処理のｓ１２０である表示内容決定処理の処理手順を図４に基づいて説明する。 Next, the procedure of the display content determination process, which is s120 of the instruction reception process, will be described with reference to FIG. 4.

この表示内容決定処理では、まず、音声認識手段３５が後述する仮説情報を生成する（ｓ２１０）。 In this display content determination process, first, the voice recognition means 35 generates hypothesis information, described below (s210).
ここでは、まず、この時点で蓄積されている情報で示される音声，つまりその時点までにマイク５を介して入力された音声を、予め保持している音響的・言語的確率モデル及び後述する発話パターン辞書と、周知の仮説探索によって比較し、その比較結果たる類似度が最も大きい発話パターンを示す仮説（図５の「１位」参照）について、発話パターン辞書上での位置（現在位置）、類似度、及び現在時刻を示す情報を仮説情報として生成する。この類似度としては、例えば、当該発話パターンの発話内容に対応する確率モデルの尤度や、尤度を仮説探索の処理時間で正規化した値などが利用できる。 Here, the voice represented by the information accumulated at this point, that is, the voice input via the microphone 5 up to that point, is compared by a well-known hypothesis search with acoustic and linguistic probability models held in advance and with an utterance pattern dictionary described later. For the hypothesis indicating the utterance pattern with the highest similarity in that comparison (see "1st place" in FIG. 5), information indicating its position in the utterance pattern dictionary (current position), the similarity, and the current time is generated as hypothesis information. As the similarity, for example, the likelihood of the probability model corresponding to the utterance content of the utterance pattern, or a value obtained by normalizing that likelihood by the processing time of the hypothesis search, can be used.
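Generating the hypothesis information in s210 amounts to picking the top-ranked hypothesis and recording its position, similarity, and time. The sketch below assumes that the similarity is a log-likelihood divided by the search time, which is only one possible reading of "normalizing the likelihood by the processing time"; the class and field names are hypothetical, and the hypothesis search itself is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hypothesis:
    position: str          # position in the utterance pattern dictionary
    log_likelihood: float  # score from the probability models (assumed log domain)
    search_time: float     # processing time of the hypothesis search, in seconds


@dataclass
class HypothesisInfo:
    position: str
    similarity: float
    timestamp: float


def similarity(h: Hypothesis) -> float:
    """Likelihood normalized by search time (an assumed normalization)."""
    return h.log_likelihood / h.search_time


def make_hypothesis_info(hyps: Iterable[Hypothesis], now: float) -> Optional[HypothesisInfo]:
    """Keep only the hypothesis with the highest similarity ("1st place")."""
    best = max(hyps, key=similarity, default=None)
    if best is None:
        return None
    return HypothesisInfo(best.position, similarity(best), now)
```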

上述した発話パターン辞書は、選択項目或いは選択経路を選択するためにユーザがどのように発話するかを表す発話パターンを格納しており、本実施例では、図６に示すように、発話パターンを単語毎の接続関係で規定した有効グラフ状の形で表されている。 The utterance pattern dictionary described above stores utterance patterns representing how a user speaks in order to select a selection item or a selection route. In this example, as shown in FIG. 6, it is represented in the form of a directed graph in which the utterance patterns are defined by the connection relationships between words.

なお、この発話パターン辞書では、想定される複数の発話パターンを単語単位に分解し、この単語を接続していくことにより、第ｉ階層（１≦i＜ｎ）のメニューから第ｎ階層のメニューへと遷移するまでに選択されうる選択項目の選択経路がそれぞれ形成される。 In this utterance pattern dictionary, a plurality of assumed utterance patterns are decomposed into words, and by connecting these words, the selection routes of selection items that can be selected before the transition from a menu in the i-th layer (1 ≦ i < n) to a menu in the n-th layer are each formed.
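As an illustration of how such a dictionary might be assembled, the sketch below decomposes a few hypothetical utterance patterns into words and links each word to the words that can follow it. It assumes that every word appears in only one place, so a word can serve directly as a graph node; a real dictionary would likely need distinct node identifiers, and the wordless `INITIAL` node name is also an assumption.

```python
from typing import Dict, Iterable, List, Set

INITIAL = "<initial>"  # wordless initial position, corresponding to the top menu


def build_dictionary(patterns: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Build a directed graph: node -> set of words that may follow it."""
    graph: Dict[str, Set[str]] = {INITIAL: set()}
    for words in patterns:
        prev = INITIAL
        for word in words:
            graph.setdefault(prev, set()).add(word)  # edge prev -> word
            graph.setdefault(word, set())            # make sure the node exists
            prev = word
    return graph


# Hypothetical utterance patterns, one list of words per selection route.
PATTERNS = [
    ["destination", "address", "tokyo"],
    ["destination", "home"],
    ["audio", "radio"],
]
GRAPH = build_dictionary(PATTERNS)
```

Each path from `INITIAL` to a node without successors then corresponds to one selection route.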

次に、音声認識手段３５は、上記ｓ２１０にて生成された仮説情報をメモリにおける仮説情報用の記憶領域に記憶させる（ｓ２２０）。 Next, the voice recognition means 35 stores the hypothesis information generated in s210 in the storage area for hypothesis information in the memory (s220).
次に、項目推定手段３３は、上記ｓ２２０にて記憶させた仮説情報に基づき、以下の手順に従って上記発話パターン辞書における最終的な現在位置を確定する（ｓ２３０）。 Next, based on the hypothesis information stored in s220, the item estimation means 33 determines the final current position in the utterance pattern dictionary according to the following procedure (s230).

ここでは、まず、最終的な現在位置の候補となる候補位置として、あらかじめ定められた初期位置が設定される（ｓ３１０）。本実施形態では、第１階層のメニューとして定められたトップメニューに対応する位置として単語の存在しない位置が初期位置として定められており（図６「初期位置」参照）、この位置が候補位置に設定される。 Here, first, a predetermined initial position is set as the candidate position, that is, the candidate for the final current position (s310). In the present embodiment, a position with no word, corresponding to the top menu defined as the first-layer menu, is defined as the initial position (see "initial position" in FIG. 6), and this position is set as the candidate position.

続いて、この時点でメモリまたはＲＡＭに履歴情報（カレント情報）が記憶されているか否かがチェックされ（ｓ３２０）、履歴情報が記憶されていれば（ｓ３２０：ＹＥＳ）、この履歴情報で示される現在時刻と実際の現在時刻との差，つまり履歴情報が生成された以降の経過時間Ｔ０が、所定のしきい値ＴＨｓ以上である（ＴＨｓ≦Ｔ０）か否かがチェックされる（ｓ３３０）。 Subsequently, it is checked whether history information (current information) is stored in the memory or the RAM at this point (s320). If history information is stored (s320: YES), it is checked whether the difference between the time recorded in the history information and the actual current time, that is, the elapsed time T0 since the history information was generated, is equal to or greater than a predetermined threshold THs (THs ≦ T0) (s330).

なお、この「しきい値ＴＨｓ」とは、履歴情報が生成された以降、選択項目の選択が中断，中止された場合に到達しうる経過時間として定められたものである。 The threshold THs is defined as an elapsed time that can be reached when the selection of selection items has been interrupted or abandoned after the history information was generated.
このｓ３３０で経過時間Ｔ０がしきい値ＴＨｓ未満である（Ｔ０＜ＴＨｓ）と判定された場合には（ｓ３３０：ＮＯ）、候補位置としてその履歴情報で示される現在位置が設定された後（ｓ３４０）、プロセスが次の処理（ｓ３５０）へと移行する。 If it is determined in s330 that the elapsed time T0 is less than the threshold THs (T0 < THs) (s330: NO), the current position indicated by the history information is set as the candidate position (s340), and the process proceeds to the next step (s350).

また、上記ｓ３２０で履歴情報が記憶されていないと判定された場合（ｓ３２０：ＮＯ），または，上記ｓ３３０で経過時間Ｔ０がしきい値ＴＨｓ以上であると判定された場合（ｓ３３０：ＹＥＳ）、上記ｓ３４０が行われることなく、プロセスが次の処理（ｓ３５０）へと移行する。 If it is determined in s320 that no history information is stored (s320: NO), or if it is determined in s330 that the elapsed time T0 is greater than or equal to the threshold value THs (s330: YES), The process proceeds to the next process (s350) without performing the above s340.

次に、この時点でメモリに仮説情報が記憶されているか否かがチェックされ（ｓ３５０）、仮説情報が記憶されていれば（ｓ３５０：ＹＥＳ）、この仮説情報で示される類似度ｒが所定の最低値ＴＨａより大きい（ＴＨａ＜ｒ）か否かがチェックされる（ｓ３６０）。 Next, it is checked whether hypothesis information is stored in the memory at this time (s350). If hypothesis information is stored (s350: YES), the similarity r indicated by the hypothesis information is a predetermined value. It is checked whether it is larger than the minimum value THa (THa <r) (s360).

このｓ３６０で類似度ｒが最低値ＴＨａより大きいと判定された場合（ｓ３６０：ＹＥＳ）、その仮説情報で示される現在位置の整合性がチェックされる（ｓ３７０）。ここでは、仮説情報で示される現在位置が、この時点でメモリに記憶されている履歴情報で示される現在位置から発話パターン辞書を順方向に辿ることで到達できる位置にあることをもって、現在位置同士の整合性があると判定される。 If it is determined in s360 that the similarity r is greater than the minimum value THa (s360: YES), the consistency of the current position indicated by the hypothesis information is checked (s370). Here, the current position indicated by the hypothesis information is located at a position that can be reached by following the utterance pattern dictionary in the forward direction from the current position indicated by the history information stored in the memory at this time. It is determined that there is consistency.
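The consistency check in s370 is a forward reachability test on the utterance pattern dictionary. A minimal sketch follows, assuming the dictionary is held as an adjacency list; the graph contents and the function name are hypothetical, and a position is treated here as reachable from itself.

```python
from typing import Dict, List

# Hypothetical utterance pattern dictionary as an adjacency list.
GRAPH: Dict[str, List[str]] = {
    "<initial>": ["destination", "audio"],
    "destination": ["address", "home"],
    "address": ["tokyo"],
    "audio": ["radio"],
    "home": [],
    "tokyo": [],
    "radio": [],
}


def is_consistent(graph: Dict[str, List[str]], history_pos: str, hypothesis_pos: str) -> bool:
    """s370: True if hypothesis_pos can be reached by following edges forward
    from history_pos (depth-first search)."""
    stack = [history_pos]
    seen = set()
    while stack:
        node = stack.pop()
        if node == hypothesis_pos:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False
```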

このｓ３７０で現在位置同士の整合性があると判定された場合（ｓ３７０：ＹＥＳ）、候補位置としてその仮説情報で示される現在位置が設定された後（ｓ３８０）、プロセスが次の処理（ｓ４００）へと移行する。 If it is determined in s370 that the current positions are consistent (s370: YES), the current position indicated by the hypothesis information is set as the candidate position (s380), and the process proceeds to the next step (s400).

また、上記ｓ３７０で現在位置同士の整合性がないと判定された場合（ｓ３７０：ＮＯ）、この仮説情報で示される類似度ｒが所定のしきい値ＴＨｂより大きい（ＴＨｂ＜ｒ）か否かがチェックされる（ｓ３９０）。なお、この「しきい値ＴＨｂ」は、しきい値ＴＨａよりも大きな値として定められたものである。 If it is determined in s370 that there is no consistency between the current positions (s370: NO), whether or not the similarity r indicated by this hypothesis information is greater than a predetermined threshold value THb (THb <r). Is checked (s390). This “threshold value THb” is determined as a value larger than the threshold value THa.

このｓ３９０で類似度ｒがしきい値ＴＨｂより大きいと判定された場合（ｓ３９０：ＹＥＳ）、プロセスがｓ３８０へと移行し、候補位置としてその仮説情報で示される現在位置が設定される。 When it is determined in s390 that the similarity r is greater than the threshold value THb (s390: YES), the process proceeds to s380, and the current position indicated by the hypothesis information is set as a candidate position.

また、上記ｓ３５０で仮説情報が記憶されていないと判定された場合（ｓ３５０：ＮＯ），上記ｓ３６０で類似度ｒが最低値ＴＨａ以下であると判定された場合（ｓ３６０：ＮＯ），または，上記ｓ３９０で類似度ｒがしきい値ＴＨｂ以下であると判定された場合（ｓ３９０：ＮＯ）、上記ｓ３８０が行われることなく、プロセスが次の処理（ｓ４００）へと移行する。 Further, when it is determined in s350 that no hypothesis information is stored (s350: NO), when it is determined in s360 that the similarity r is equal to or lower than the minimum value THa (s360: NO), or When it is determined in s390 that the similarity r is equal to or less than the threshold value THb (s390: NO), the process proceeds to the next process (s400) without performing the above s380.

そして、この時点における候補位置が最終的な現在位置として確定される（ｓ４００）。こうして、ｓ３１０〜ｓ４００にての発話パターン辞書における最終的な現在位置が確定された後、項目推定手段３３は、その現在位置に基づいて表示部４に表示させるべきメニューを決定する（ｓ２４０）。 Then, the candidate position at this time is determined as the final current position (s400). Thus, after the final current position in the utterance pattern dictionary in s310 to s400 is determined, the item estimation unit 33 determines a menu to be displayed on the display unit 4 based on the current position (s240).
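The decision sequence from s310 to s400 can be condensed into a single function. The sketch below follows the branch conditions as described (T0 < THs, r > THa, r > THb); the class names, parameter names, and the injected `consistent` callable are hypothetical. One point the description leaves open is what s370 compares against when no history information is stored; this sketch assumes the initial position is used in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class History:
    position: str
    timestamp: float


@dataclass
class HypothesisInfo:
    position: str
    similarity: float


def final_position(
    initial: str,
    history: Optional[History],
    hypothesis: Optional[HypothesisInfo],
    now: float,
    ths: float,
    tha: float,
    thb: float,
    consistent: Callable[[str, str], bool],
) -> str:
    candidate = initial                                        # s310
    if history is not None and now - history.timestamp < ths:  # s320: YES, s330: NO
        candidate = history.position                           # s340
    if hypothesis is not None and hypothesis.similarity > tha:  # s350, s360
        # Assumption: fall back to the initial position when no history exists.
        reference = history.position if history is not None else initial
        if consistent(reference, hypothesis.position):          # s370: YES
            candidate = hypothesis.position                      # s380
        elif hypothesis.similarity > thb:                        # s390: YES
            candidate = hypothesis.position                      # s380
    return candidate                                             # s400
```

The second threshold THb, being larger than THa, lets a very confident hypothesis override the history even when it does not lie forward of the stored position.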

発話パターン辞書は、選択項目を選択するための予め想定された発話パターンを単語毎の接続関係で規定するものであることから、この発話パターン辞書における最終的な現在位置は、いずれかの選択経路に沿って辿り着いた選択項目を示すものとなる。 Since the utterance pattern dictionary defines the utterance patterns assumed in advance for selecting selection items by the connection relationships between words, the final current position in the dictionary indicates the selection item reached along one of the selection routes.

そのため、このｓ２４０では、最終的な現在位置である単語に対応する選択項目が、ユーザにより選択された選択項目とみなされ、その選択項目が選択されることにより遷移させるべき別階層のメニューが存在していれば、そのメニューが表示部４に表示させるべきメニューとして決定される。 For this reason, in this s240, the selection item corresponding to the word at the final current position is regarded as the selection item selected by the user, and there is a menu of another level to be shifted when the selection item is selected. If so, the menu is determined as a menu to be displayed on the display unit 4.

なお、音声の入力が開始された直後などのように仮説情報が記憶されておらず、かつ履歴情報も記憶されていない場合は、候補位置には予め定められた初期位置が設定されているため、表示させるべきメニューとしては初期位置に対応する第１階層のメニューが選ばれることとなる。 Note that if no hypothesis information is stored and no history information is stored, such as immediately after voice input is started, a predetermined initial position is set as a candidate position. As a menu to be displayed, the first layer menu corresponding to the initial position is selected.

次に、メニュー遷移手段３８が、その時点で表示させるべきメニューであるカレントメニューを、上記ｓ２４０にて決定されたメニューに遷移させる（ｓ２５０）。ここでは、カレントメニューおよび現在時刻がメモリまたはＲＡＭの所定領域に格納され（既に格納されている場合はその内容が更新され）、これにより、カレントメニューが遷移する。 Next, the menu transition means 38 transitions the current menu, which is the menu to be displayed at that time, to the menu determined in s240 (s250). Here, the current menu and the current time are stored in a predetermined area of the memory or RAM (if already stored, the contents are updated), and the current menu changes accordingly.

次に、メニュー表示手段３９は、過去の一定期間内にマイク５を介した音声の入力があったか否かをチェックする（ｓ２６０）。ここでは、上述した音声検出手段３１による音声入力の検出が一定期間内になされていれば、過去の一定期間内にマイク５を介した音声の入力があると判定される一方、音声入力の検出が一定期間内になされていなければ、過去の一定期間内にマイク５を介した音声の入力がないと判定される。 Next, the menu display means 39 checks whether or not there has been an input of sound through the microphone 5 within a certain period in the past (s260). Here, if the voice input detection by the voice detection unit 31 described above is performed within a certain period, it is determined that there is a voice input through the microphone 5 within the past certain period, while the voice input is detected. Is not performed within a certain period, it is determined that there is no input of sound via the microphone 5 within the past certain period.

なお、この「一定期間」とは、ユーザによる選択項目の選択が中断，中止された場合に到達しうる経過時間として定められたものである。 This "certain period" is defined as an elapsed time that can be reached when the user's selection of selection items has been interrupted or abandoned.
このｓ２６０で、過去の一定期間内にマイク５を介した音声の入力があると判定された場合には（ｓ２６０：ＹＥＳ）、メニューを表示すべき旨の決定がなされた後（ｓ２７０）、プロセスが上記指示受付処理へと戻る（ｓ１３０へと移行する）。 If it is determined in s260 that there has been voice input via the microphone 5 within the past certain period (s260: YES), it is decided that the menu should be displayed (s270), and the process returns to the instruction reception process (proceeds to s130).

一方、上記ｓ２６０で、過去の一定期間内にマイク５を介した音声の入力がないと判定された場合には（ｓ２６０：ＮＯ）、メニューの表示を消去すべき旨の決定がなされた後（ｓ２８０）、プロセスが上記指示受付処理へと戻る（ｓ１３０へと移行する）。 On the other hand, if it is determined in s260 that there has been no voice input via the microphone 5 within the past certain period (s260: NO), it is decided that the menu display should be erased (s280), and the process returns to the instruction reception process (proceeds to s130).
（１−３）作用，効果 (1-3) Operation and Effects
このように構成された情報処理装置１では、まず、ユーザの音声が、いずれの選択経路におけるいずれの選択項目に対応するかを推定し、そうして推定した選択項目をユーザが選択したものとみなしている（図４のｓ２１０）。このとき、「いずれの選択経路におけるいずれの選択項目に対応するか」は、周知の音声認識の結果に至るまでの発話パターンそれぞれで形成される仮説探索の仮説情報を用いて、外部から入力される音声が何と発話しようとしているのかを推定したうえで、最終的にいずれの選択経路におけるいずれの選択項目を選択しようとしているのかを推定している。 In the information processing apparatus 1 configured as described above, it is first estimated to which selection item in which selection route the user's voice corresponds, and the selection item thus estimated is regarded as selected by the user (s210 in FIG. 4). Here, "which selection item in which selection route" is determined by using the hypothesis information of the hypothesis search, formed from the utterance patterns leading up to the well-known voice recognition result, to estimate what the externally input voice is trying to say, and then estimating which selection item in which selection route the user ultimately intends to select.

そして、その選択項目が選択されることにより表示させるべき別階層のメニューへとカレントメニューを遷移させ（同図ｓ２３０〜ｓ２５０）、これを表示部４に表示させている（図３のｓ１３０）。このように、ユーザが実際に選択した項目に対応させて、メニューの表示を随時変更していくことができる。 Then, when the selected item is selected, the current menu is shifted to a menu of another hierarchy to be displayed (s230 to s250 in the figure), and this is displayed on the display unit 4 (s130 in FIG. 3). As described above, the display of the menu can be changed as needed in accordance with the item actually selected by the user.

そのため、ユーザにとっては、表示部４に表示されるメニューを見ながら、その中の選択項目を任意に選んでその内容を順番に続けて発声していくだけで、そのメニューを該当する別メニューへと表示を変更させていくことができる結果（図２参照）、従来のように音声認識が終了してメニューが変更されるのを待った上で次の階層の項目を発声していくといった手間がかからない点で利便性が高い。 As a result, the user only has to look at the menu displayed on the display unit 4, choose any selection item in it, and utter the items one after another in sequence in order to change the display to the corresponding next menu (see FIG. 2). This is highly convenient in that it eliminates the conventional effort of waiting for voice recognition to finish and the menu to change before uttering the item of the next layer.

また、上記実施形態では、外部から音声が入力されない期間が所定期間以上継続した場合に（図４のｓ２６０「ＮＯ」）、メニューの表示を消去させることができる（同図ｓ２８０，図３のｓ１３０）。 In the above embodiment, the menu display can be erased (s280 in FIG. 3 and s130 in FIG. 3) when a period in which no external sound is input continues for a predetermined period or longer (s260 “NO” in FIG. 4). ).

表示部４に表示させるカレントメニューは、一旦表示された以降、継続的に表示させておけばよいが、音声の入力がないまま一定期間が経過した場合は、ユーザによる選択項目の選択が中断，中止されているといえるため、表示部４における表示領域の視認性を向上させるなどの観点から、その一定期間の経過をもってメニューの表示を消去させることが望ましい。 The current menu to be displayed on the display unit 4 may be displayed continuously after it is displayed once. However, when a certain period of time has elapsed without voice input, selection of the selection item by the user is interrupted. Since it can be said that it has been canceled, it is desirable to delete the display of the menu after a certain period of time from the viewpoint of improving the visibility of the display area in the display unit 4.
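The show-or-erase decision of s260 to s280 reduces to a comparison against the time of the last detected voice input. This is a minimal sketch; the function name and return values are hypothetical, and treating an input exactly at the boundary as "within the period" is an assumption, since the description does not specify the boundary case.

```python
from typing import Optional


def display_decision(last_voice_time: Optional[float], now: float, period: float) -> str:
    """Return "show" (s270) if voice was detected within the past `period`,
    otherwise "erase" (s280)."""
    if last_voice_time is not None and now - last_voice_time <= period:
        return "show"   # s260: YES
    return "erase"      # s260: NO
```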

また、上記実施形態では、本情報処理装置１の起動直後のように、メニューが表示されていない状態の場合、ユーザが何らかの発話を行うことで（図３のｓ１１０：ＹＥＳ）、初期位置として定められたトップメニューを候補位置としてカレントメニューが決定される（図４のｓ３１０〜ｓ４００）。そのため、メニューが表示されていない状態の場合、ユーザが何らかの発話を行うことで、第１階層のトップメニューを表示させることができる。 In the above-described embodiment, when the menu is not displayed, such as immediately after the information processing apparatus 1 is activated, the user performs some utterance (s110: YES in FIG. 3) to determine the initial position. The current menu is determined with the top menu as a candidate position (s310 to s400 in FIG. 4). Therefore, when the menu is not displayed, the top menu of the first hierarchy can be displayed when the user speaks something.

また、上記実施形態では、音声の入力が開始された直後などのように仮説情報が記憶されておらず、かつ履歴情報も記憶されていない場合は、候補位置には予め定められた初期位置が設定される結果、表示させるべきメニューとして初期位置に対応する第１階層のメニューが選ばれる（図４のｓ３１０〜ｓ４００）。 In the above embodiment, when no hypothesis information is stored and history information is not stored, such as immediately after the start of voice input, the candidate position has a predetermined initial position. As a result of the setting, the first layer menu corresponding to the initial position is selected as the menu to be displayed (s310 to s400 in FIG. 4).

つまり、外部から入力される音声に基づいてカレントメニューが遷移していたとしても、外部から音声が入力されない期間が所定期間以上継続した場合には（同図ｓ２６０「ＮＯ」）、その後、カレントメニューが最上位階層（第１階層）のメニュー，つまりトップメニューに戻されるため、選択項目の選択を再度行うにあたってトップメニューから選択の項目を開始すればよいこととなる。 That is, even if the current menu has transitioned based on externally input voice, when a period in which no voice is input continues for the predetermined period or longer ("NO" in s260 of FIG. 4), the current menu is thereafter returned to the top-layer (first-layer) menu, that is, the top menu, so that when selecting items again the user can simply start the selection from the top menu.

これにより、ユーザが選択項目の選択を中断，中止したとしても、その選択の再開時、常にトップメニューから選択項目の選択を行えばよくなるため、選択項目の選択に際しての混乱を防止することができる結果、ユーザインタフェースとしての利便性を高めることができる。 As a result, even if the user interrupts or cancels the selection of the selection item, it is only necessary to always select the selection item from the top menu when the selection is resumed, so that confusion in selecting the selection item can be prevented. Convenience as a user interface can be improved.

また、上記実施形態では、所定の記憶領域に格納されたカレント情報を更新することにより、カレントメニューを遷移させることができる（図４のｓ２５０）。 In the above embodiment, the current menu can be transitioned by updating the current information stored in the predetermined storage area (s250 in FIG. 4).
また、上記実施形態では、各メニューにおける選択項目のうち、所定の処理が割り当てられた選択項目が選択されたとみなされた場合に、その割り当てられた処理を実行することができる（図３のｓ１６０）。 Further, in the above embodiment, when a selection item to which a predetermined process is assigned is regarded as selected from among the selection items in each menu, the assigned process can be executed (s160 in FIG. 3).

また、上記実施形態では、ユーザの音声がいずれの選択経路におけるいずれの選択項目に対応するかを推定する際に実施される音声認識の都度、その認識に際してしきい値以上の類似度となった発話パターンが複数種類認識されていたとしても、その中から最も類似度の大きな発話パターンに対応する選択項目それぞれからなる選択経路を推定することができる。 In the above embodiment, each time voice recognition is performed to estimate to which selection item in which selection route the user's voice corresponds, even if a plurality of utterance patterns with a similarity at or above the threshold are recognized, the selection route consisting of the selection items corresponding to the utterance pattern with the highest similarity among them can be estimated.
（１−４）変形例 (1-4) Modifications
以上、本発明の実施の形態について説明したが、本発明は、上記実施形態に何ら限定されることはなく、本発明の技術的範囲に属する限り種々の形態をとり得ることはいうまでもない。 Embodiments of the present invention have been described above. Needless to say, the present invention is not limited to the above embodiments and can take various forms as long as they fall within its technical scope.

例えば、上記実施形態においては、カレントメニューが遷移させられる都度、その旨のメッセージを表示部４に表示させたり、スピーカーからメッセージ或いはビープ音を出力させることとしてもよい。この場合、カレントメニューが遷移させられた旨をその都度報知することができる。 For example, in the above embodiment, each time the current menu is changed, a message to that effect may be displayed on the display unit 4 or a message or beep sound may be output from the speaker. In this case, it can be notified each time that the current menu is changed.

また、上記実施形態においては、本発明の情報処理装置が、ナビゲーション装置におけるユーザインタフェースを実現するための装置として実装された構成を例示した。しかし、本発明の情報処理装置は、ナビゲーション装置以外の装置におけるユーザインタフェースを実現するための装置として実装してもよい。 In the above embodiment, a configuration was illustrated in which the information processing apparatus of the present invention is implemented as an apparatus for realizing the user interface of a navigation device. However, the information processing apparatus of the present invention may be implemented as an apparatus for realizing the user interface of a device other than a navigation device.

また、上記実施形態では、カレントメニューが遷移した後、外部から音声が入力されない期間が所定期間以上継続した場合には、カレントメニューが直ちに最上位階層のメニュー，つまりトップメニューに戻されるように構成してもよい。 The above embodiment may also be configured so that, when a period in which no voice is input from the outside continues for a predetermined period or longer after the current menu has transitioned, the current menu is immediately returned to the top-layer menu, that is, the top menu.

To this end, when it is determined in s260 of FIG. 4 that no voice has been input through the microphone 5 within a certain past period (s260: NO), the current menu is changed to the first-layer menu (s282) as shown in FIG. 7, after which the process either moves to s280 and decides to erase the menu (FIG. 7(a)) or moves to s270 and decides to display the menu (FIG. 7(b)).

With a configuration that returns the current menu to the top menu of the highest layer in this way, the user can always start from the first-layer menu when resuming selection, regardless of which menu was current before the selection was interrupted or abandoned. This prevents confusion when choosing selection items and, as a result, improves the convenience of the user interface.
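The silence handling of FIG. 7 can be sketched as below. This is an illustrative reading of steps s260, s282, s280 and s270 only: the `Decision` type, the `TOP_MENU` identifier, the use of seconds, and the `keep_visible` flag that switches between the FIG. 7(a) and FIG. 7(b) variants are assumptions made for the example.

```python
from dataclasses import dataclass

TOP_MENU = "top"  # hypothetical identifier for the first-layer (top) menu


@dataclass(frozen=True)
class Decision:
    current_menu: str
    show_menu: bool


def decide_after_silence_check(
    current_menu: str,
    seconds_since_last_voice: float,
    timeout_seconds: float,
    keep_visible: bool,
) -> Decision:
    """Sketch of s260 -> s282 -> s280 (FIG. 7(a)) or s270 (FIG. 7(b))."""
    if seconds_since_last_voice < timeout_seconds:
        # s260: YES, voice was input recently, so keep the menu as it is (s270).
        return Decision(current_menu, True)
    # s260: NO, so return the current menu to the first layer (s282), then
    # either erase the menu (s280) or display it (s270).
    return Decision(TOP_MENU, keep_visible)
```

Calling the function with `keep_visible=False` corresponds to the variant of FIG. 7(a), and `keep_visible=True` to that of FIG. 7(b).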

The above embodiment illustrated a configuration in which menu transitions are driven by voice input through the microphone 5. However, if the apparatus has a path for receiving voice over a network, voice input through that path may be used to drive the menu transitions instead.

The transition of the current menu in the above embodiment may also be performed in response to a user operation on the operation unit 3. In this case, even if no voice input has been detected within a certain past period, it is desirable either not to return the menu to the first layer or erase the menu display, or to apply a period different from the one used for transitions driven by voice input.
(2) Second Embodiment
This embodiment differs only in part of the display content determination process, so only that difference is described.

The difference is that, whereas the first embodiment always displays the menu on the display unit 4 in the same display mode, this embodiment varies the display mode according to the surrounding environment. Specifically, the display mode is varied depending on whether or not the voice input from outside is speech whose content follows a selection route.
(2-1) Display Content Determination Process
In the display content determination process of this embodiment, after s210 to s250 are performed as in the first embodiment, the reliability that the voice input through the microphone 5 up to that point is speech whose content follows the selection route described above is specified on the basis of that voice, as shown in FIG. 8 (s251). Here, speech recognition is performed on the voice that has been input through the microphone 5 up to that point and stored in the built-in memory of the CPU or in the RAM, using a competing model prepared as in Patent Document 3 mentioned above. The reliability is then specified by calculating the likelihood ratio between the similarity (likelihood) of the hypothesis information described above and the similarity (likelihood) of the competing model's hypothesis at the current time.
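The likelihood-ratio calculation in s251 can be illustrated with a short sketch. It assumes that both the recognizer and the competing model report log-likelihoods, which the text does not state; the function names are hypothetical, and how the two scores are produced is outside the scope of the example.

```python
import math


def log_likelihood_ratio(hypothesis_ll: float, competing_ll: float) -> float:
    """Reliability as a log-likelihood ratio (assumes both inputs are log-likelihoods)."""
    return hypothesis_ll - competing_ll


def likelihood_ratio(hypothesis_ll: float, competing_ll: float) -> float:
    """The same ratio in the linear domain."""
    return math.exp(log_likelihood_ratio(hypothesis_ll, competing_ll))
```

A larger value means that the selection-route hypothesis explains the input better than the competing model does, so the value can be compared against thresholds such as TH1 and TH2.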

Note that the reliability may be specified not in s251 but in a separate process independent of this display content determination process. In that case, s251 simply obtains the reliability specified by that separate process.

If the reliability thus specified is greater than a predetermined first threshold TH1 (s252: YES), it is decided that the menu should be displayed at the normal display size (s253), and the process then returns to the instruction reception process (moves to s130).

The "first threshold TH1" used as the criterion in s252 is a value set as a reliability high enough to conclude without problems that the voice input through the microphone 5 up to that point is speech whose content follows the selection route. The decision in s253 that "the menu should be displayed at the normal display size" means that the display area used when the menu is displayed on the display unit 4 should be the same as in the first embodiment.

After the process has thus returned to the instruction reception process, the menu is displayed at the normal display size in s130 of FIG. 3.
If the reliability specified in s251 is greater than a second threshold TH2, which is set to a value smaller than the first threshold TH1 (s252: NO, s254: YES), it is decided that the menu should be displayed at a display size smaller than normal (s255), and the process then returns to the instruction reception process (moves to s130).

The "second threshold TH2" used as the criterion in s254 is a value set as a reliability that is not sufficient to conclude that the voice input through the microphone 5 up to that point is speech whose content follows the selection route. The decision in s255 that "the menu should be displayed at a display size smaller than normal" means that the display area used when the menu is displayed on the display unit 4 should be smaller than the display area in the first embodiment.

After the process has thus returned to the instruction reception process, the menu is displayed at a display size smaller than normal in s130 of FIG. 3.
If the reliability specified in s251 is equal to or less than the second threshold TH2 (s254: NO), the process moves to s280, where it is decided that the menu display should be erased, and then returns to the instruction reception process (moves to s130).
(2-2) Actions and Effects
The information processing apparatus 1 configured in this way provides the following actions and effects in addition to those obtained from the configuration it shares with the first embodiment.

For example, in the above embodiment, the display mode of the image showing the menu can be varied according to the reliability that the voice input from outside is speech whose content follows the selection route described above. Specifically, the higher the reliability of the voice input from outside, the larger the display area of the menu can be made.
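The two-threshold decision of s252 to s255 and s280 can be written compactly. The sketch below is illustrative only: the `Display` enumeration and the function name are assumptions, and the concrete threshold values in the usage note are invented for the example, since the text gives none.

```python
from enum import Enum


class Display(Enum):
    NORMAL = "normal"  # s253: normal display size
    SMALL = "small"    # s255: smaller than normal
    HIDDEN = "hidden"  # s280: menu display erased


def decide_display(reliability: float, th1: float, th2: float) -> Display:
    """Map the reliability from s251 to a display decision; TH2 must be below TH1."""
    if th2 >= th1:
        raise ValueError("TH2 must be smaller than TH1")
    if reliability > th1:   # s252: YES
        return Display.NORMAL
    if reliability > th2:   # s252: NO, s254: YES
        return Display.SMALL
    return Display.HIDDEN   # s254: NO
```

With the made-up values TH1 = 0.8 and TH2 = 0.4, a reliability of 0.9 gives the normal size, 0.5 gives the small size, and 0.4 or less hides the menu. A reliability exactly equal to TH1 falls into the small-size branch because s252 requires a value greater than TH1.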

Although this embodiment varies the display size as the display mode, a display mode other than the display size may be varied instead.
(3) Third Embodiment
This embodiment differs only in part of the display content determination process, so only that difference is described.

The difference is that, whereas the first embodiment always displays the menu on the display unit 4 in the same display mode, this embodiment varies the display mode according to the surrounding environment. Specifically, the display mode is varied depending on whether or not voice is being input to a predetermined device 7 (see FIG. 1) that operates on voice input by the user. The predetermined device 7 is, for example, an information terminal (more specifically, a mobile phone terminal) communicably connected to the information processing apparatus 1.
(3-1) Display Content Determination Process
In the display content determination process of this embodiment, after s210 to s250 are performed as in the first embodiment, the apparatus communicates with the predetermined device 7 and checks whether or not the predetermined device 7 is operating in response to voice input, as shown in FIG. 9 (s410).

If it is determined in s410 that the predetermined device 7 is operating in response to voice input (s410: YES), the process moves to s280, where it is decided that the menu display should be erased, and then returns to the instruction reception process (moves to s130).

On the other hand, if it is determined in s410 that the predetermined device 7 is not operating in response to voice input (s410: NO), the process moves to s270, where it is decided that the menu should be displayed, and then returns to the instruction reception process (moves to s130).
(3-2) Actions and Effects
The information processing apparatus 1 configured in this way provides the following actions and effects in addition to those obtained from the configuration it shares with the first embodiment.

For example, in the above embodiment, the menu can be kept from being displayed while the predetermined device 7, which operates on voice input by the user, is operating in response to voice input.
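The check in s410 amounts to querying the connected device and inverting the answer. The sketch below assumes a hypothetical status interface, `is_receiving_voice_input`; the text does not describe how the predetermined device 7 reports its state, so the protocol and the stub class are stand-ins for illustration.

```python
from typing import Protocol


class VoiceOperatedDevice(Protocol):
    """Hypothetical view of the predetermined device 7 as seen over the connection."""

    def is_receiving_voice_input(self) -> bool: ...


class StubPhone:
    """Stand-in for a connected mobile phone terminal, used only for this example."""

    def __init__(self, in_voice_session: bool) -> None:
        self.in_voice_session = in_voice_session

    def is_receiving_voice_input(self) -> bool:
        return self.in_voice_session


def should_display_menu(device: VoiceOperatedDevice) -> bool:
    """s410: YES leads to s280 (erase the menu); s410: NO leads to s270 (display it)."""
    return not device.is_receiving_voice_input()
```

For instance, `should_display_menu(StubPhone(True))` evaluates to `False`, which corresponds to erasing the menu while the phone is handling voice input.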

When voice is being input to the predetermined device 7, it is highly likely that the user is speaking for a purpose unrelated to voice input to the information processing apparatus 1. If such unrelated speech were taken as input and processed, menu transitions not intended by the user would occur.

Therefore, by keeping the menu from being displayed while the predetermined device 7 is operating in response to voice input, as described above, menu transitions not intended by the user can be prevented.
(4) Fourth Embodiment
This embodiment differs only in part of the display content determination process, so only that difference is described.

The difference is that, whereas the first embodiment always displays the menu on the display unit 4 in the same display mode, this embodiment varies the display mode according to the surrounding environment. Specifically, the display mode is varied depending on whether or not an operation is being performed on the operation unit 3 or on the predetermined device 7 connected to the information processing apparatus 1.
(4-1) Display Content Determination Process
In the display content determination process of this embodiment, after s210 to s250 are performed as in the first embodiment, it is checked whether or not an operation is in progress on the operation unit 3 or on the predetermined device 7 connected to the information processing apparatus 1, as shown in FIG. 10 (s420).

If it is determined in s420 that an operation is being performed on the operation unit 3 or on the predetermined device 7 connected to the information processing apparatus 1 (s420: YES), the process moves to s280, where it is decided that the menu display should be erased, and then returns to the instruction reception process (moves to s130).

On the other hand, if it is determined in s420 that no operation is being performed on the operation unit 3 or on the predetermined device 7 connected to the information processing apparatus 1 (s420: NO), the process moves to s270, where it is decided that the menu should be displayed, and then returns to the instruction reception process (moves to s130).
(4-2) Actions and Effects
The information processing apparatus 1 configured in this way provides the following actions and effects in addition to those obtained from the configuration it shares with the first embodiment.

For example, in the above embodiment, the menu can be kept from being displayed while the operation unit of the information processing apparatus 1 or the predetermined device 7 connected to the information processing apparatus 1 is being operated.

When the operation unit 3 of the information processing apparatus 1 or the predetermined device 7 connected to the information processing apparatus 1 is being operated, it is highly likely that any speech is unrelated to menu selection on the information processing apparatus 1. If such unrelated speech were taken as input and processed, menu transitions not intended by the user would occur.

Therefore, by keeping menu transitions from being performed while the operation unit 3 is being operated, as described above, menu transitions not intended by the user can be prevented.
(5) Fifth Embodiment
This embodiment differs only in part of the display content determination process, so only that difference is described.

The difference is that, whereas the first embodiment always displays the menu on the display unit 4 in the same display mode, this embodiment varies the display mode according to the surrounding environment. Specifically, the display mode is varied according to the number of users located around the information processing apparatus 1.
(5-1) Display Content Determination Process
In the display content determination process of this embodiment, s210 to s260 are performed as in the first embodiment, and after "YES" is determined in s260, the number of users located around the information processing apparatus 1 is checked, as shown in FIG. 11 (s430).

In this embodiment, sensors are arranged near the areas around the information processing apparatus 1 where users can be located, so that the number of users located around it can be detected, and in s430 the number of users is detected on the basis of the detection results of those sensors. Alternatively, the areas where users can be located may be photographed with a camera, and the number of users may be detected by identifying the users in the captured video through image processing.

If the number of users thus checked is "1" (s430: YES), it is decided that the menu should be displayed at the normal display size (s440), and the process then returns to the instruction reception process (moves to s130).

If the number of users checked in s430 is more than one (s430: NO), it is decided that the menu should be displayed at a display size smaller than normal (s450), and the process then returns to the instruction reception process (moves to s130).
(5-2) Actions and Effects
The information processing apparatus 1 configured in this way provides the following actions and effects in addition to those obtained from the configuration it shares with the first embodiment.

For example, in the above embodiment, when it is detected that only one user is located around the information processing apparatus 1 (s430 "YES" in FIG. 11), the display area of the current menu is set to the normal size (s440 in the same figure), whereas when it is detected that multiple users are located there (s430 "NO" in the same figure), the display area of the current menu can be made smaller than normal (s450 in the same figure).
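The branch at s430 can be sketched as a small function. The function name and the string labels are assumptions for illustration. The text covers only the cases of one user and of several users, so rejecting a count below one is likewise an assumption made here rather than behavior described in the embodiment.

```python
def decide_display_size(user_count: int) -> str:
    """s430: one user leads to s440 (normal size); several users lead to s450 (smaller)."""
    if user_count < 1:
        # Assumption: this step is reached only after voice input was detected,
        # so a count below one is treated as invalid input.
        raise ValueError("user_count must be at least 1")
    return "normal" if user_count == 1 else "small"
```

The count itself would come from the sensors or the camera-based detection described for s430; the sketch deliberately leaves that part out.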

When multiple users are located nearby in this way, the menu displayed on the display unit 4 is not necessarily needed by users other than the one operating the information processing apparatus 1 by voice. Reducing the display area of the menu in such a case, as in the above configuration, therefore prevents the visibility of the display unit 4 for the other users from being reduced, compared with a configuration that does not vary the display mode.

Although this embodiment varies the display size as the display mode, a display mode other than the display size may be varied instead.
(6) Correspondence with the Present Invention
In the embodiments described above, s310 in FIG. 8 corresponds to the trust specifying means of the present invention, s253 and s255 in the same figure to the first mode determining means, s440 and s450 in FIG. 11 to the second mode determining means, s410 in FIG. 9 to the external voice input determination means, s420 in FIG. 10 to the operation detecting means, s430 in FIG. 11 to the user detection means, and s160 in FIG. 3 to the processing execution means.

FIG. 1 is a block diagram showing the overall configuration of the information processing apparatus.
FIG. 2 is a diagram showing how the menu displayed on the display unit transitions.
FIG. 3 is a flowchart showing the instruction reception process.
FIG. 4 is a flowchart showing the display content determination process.
FIG. 5 is a diagram showing the process of estimating the selection item selected by the user.
FIG. 6 is a diagram showing the structure of the utterance pattern dictionary.
FIG. 7 is a flowchart showing the display content determination process in another embodiment.
FIG. 8 is a flowchart showing the display content determination process in the second embodiment.
FIG. 9 is a flowchart showing the display content determination process in the third embodiment.
FIG. 10 is a flowchart showing the display content determination process in the fourth embodiment.
FIG. 11 is a flowchart showing the display content determination process in the fifth embodiment.

Explanation of symbols

1 ... information processing apparatus, 2 ... storage unit, 3 ... operation unit, 4 ... display unit, 5 ... microphone, 6 ... voice input unit, 7 ... predetermined device, 10 ... control unit, 20 ... input/output interface, 31 ... voice detection means, 33 ... item estimation means, 35 ... voice recognition means, 37 ... processing execution means, 38 ... menu transition means, 39 ... menu display means.

Claims

An information processing apparatus implementing a user interface configured so that, when a user selects any of the selection items of a plurality of menus forming a hierarchical structure from a first layer to an n-th layer (n is an arbitrary number), a transition is made from the menu having that selection item to a menu of another layer, the apparatus comprising:
Menu display means for displaying, on a display unit, a current menu, which is the menu to be displayed at that time among the plurality of menus;
Item estimation means for repeatedly estimating, while voice is continuously input from outside, which selection item of which selection route the voice corresponds to, among selection routes each consisting of the selection items that can be selected to transition from a menu of an i-th layer (1 ≤ i < n) to a menu of the n-th layer; and
Menu transition means for changing, each time a selection item is estimated by the item estimation means, the current menu to the menu of another layer that is to be displayed when that selection item is selected,
wherein the menu display means causes the display unit to display an image showing the menu to which the current menu has been changed by the menu transition means,
the apparatus further comprising:
Voice recognition means for comparing the voice continuously input from outside up to that point with each utterance pattern in an utterance pattern dictionary, which stores the utterance pattern produced when each of the selection routes is uttered, and outputting the utterance patterns whose similarity obtained by the comparison exceeds a predetermined threshold,
wherein the item estimation means outputs, as an estimation result, the latest selection item in the selection route corresponding to the utterance patterns sequentially recognized by the voice recognition means.

The information processing apparatus according to claim 1, wherein the menu display means erases the menu displayed on the display unit when no voice has been input from outside for a predetermined period or longer after the image of the menu was displayed on the display unit.

The menu display means displays the current menu on the display unit when a voice is input from outside in a state where the menu is not displayed on the display unit. The information processing apparatus described in 1.

The information processing apparatus according to any one of claims 1 to 3, wherein the menu transition means changes the current menu to the menu of the first layer when no voice has been input from outside for a predetermined period or longer.

The information processing apparatus according to claim 4, wherein, each time recognition is performed by the voice recognition means, if a plurality of utterance patterns whose similarity is equal to or greater than the threshold are recognized, the item estimation means outputs, as the estimation result, the latest selection item in the selection route corresponding to the utterance pattern with the highest similarity.

The information processing apparatus according to any one of claims 1 to 5, wherein the menu transition means changes the menu by updating current information, which indicates the current menu, to information indicating the menu of another layer to be transitioned to on the basis of the selection item estimated by the item estimation means.

The information processing apparatus according to any one of claims 1 to 6, further comprising:
Trust specifying means for specifying, on the basis of the voice input from outside, the reliability that the voice is speech whose content follows the selection route; and
First mode determining means for determining the display mode of the menu according to the reliability specified by the trust specifying means after the current menu has been changed by the menu transition means,
wherein the menu display means causes the display unit to display the menu to which the current menu has been changed in the display mode determined for that menu by the first mode determining means.

The information processing apparatus according to claim 7, wherein the first mode determining means determines the size of the display area of the current menu according to the reliability specified by the trust specifying means, and
the menu display means causes the display unit to display the menu to which the current menu has been changed at the display area size determined by the first mode determining means.

The information processing apparatus according to any one of claims 1 to 6, further comprising second mode determining means for receiving a command from outside and determining the display mode of the menu after the current menu has been changed by the menu transition means,
wherein the menu display means causes the display unit to display the menu to which the current menu has been changed in the display mode determined for that menu by the second mode determining means.

The information processing apparatus according to claim 9, further comprising external voice input determination means for determining, through communication with an external predetermined device (external device) that operates on voice input by the user, whether or not the predetermined device is operating in response to voice input,
wherein the second mode determining means decides that the current menu should be displayed when the external voice input determination means determines that no voice input is being made on the external device side, and decides that the current menu should not be displayed when it determines that voice input is being made on the external device side.

The information processing apparatus according to claim 9 or claim 10, further comprising operation detecting means for detecting whether an operation unit of the information processing apparatus or a predetermined device connected to the information processing apparatus is being operated by a user,
wherein the second mode determining means decides that the current menu should be displayed when the operation detecting means determines that no operation is being performed, and decides that the current menu should not be displayed when the operation detecting means determines that an operation is being performed.

The information processing apparatus according to any one of claims 9 to 11, further comprising user detection means for detecting the number of users located around the information processing apparatus,
wherein the second mode determining means sets the size of the display area of the current menu to a normal size when the user detection means detects that only one user is located there, and sets the size of the display area of the current menu to be smaller than normal when it detects that a plurality of users are located there.

The information processing apparatus according to any one of claims 1 to 12, further comprising processing execution means for executing a predetermined process assigned to the selection item estimated by the item estimation means.

A user interface providing method for providing a user interface configured so that, when a user selects any of the selection items of a plurality of menus forming a hierarchical structure from a first layer to an n-th layer (n is an arbitrary number), a transition is made from the menu having that selection item to a menu of another layer, the method comprising:
A menu display procedure for displaying, on a display unit, a current menu, which is the menu to be displayed at that time among the plurality of menus;
An item estimation procedure for repeatedly estimating, while voice is continuously input from outside, which selection item of which selection route the voice corresponds to, among selection routes each consisting of the selection items that can be selected to transition from a menu of an i-th layer (1 ≤ i < n) to a menu of the n-th layer;
A menu transition procedure for changing, each time a selection item is estimated in the item estimation procedure, the current menu to the menu of another layer that is to be displayed when that selection item is selected; and
A voice recognition procedure for comparing the voice continuously input from outside up to that point with each utterance pattern in an utterance pattern dictionary, which stores the utterance pattern produced when each of the selection routes is uttered, and outputting the utterance patterns whose similarity obtained by the comparison exceeds a predetermined threshold,
wherein, in the menu display procedure, an image showing the menu to which the current menu has been changed in the menu transition procedure is displayed on the display unit, and
in the item estimation procedure, the latest selection item in the selection route corresponding to the utterance patterns sequentially recognized in the voice recognition procedure is output as an estimation result.

A program for causing a computer system to execute the various processing procedures for functioning as all the means according to any one of claims 1 to 13.