JP2015076038A

JP2015076038A - Information processing method, information processing apparatus, and program

Info

Publication number: JP2015076038A
Application number: JP2013213686A
Authority: JP
Inventors: 玲二藤川; Reiji Fujikawa; 雅彦原田; Masahiko Harada
Original assignee: NEC Personal Computers Ltd
Current assignee: NEC Personal Computers Ltd
Priority date: 2013-10-11
Filing date: 2013-10-11
Publication date: 2015-04-20

Abstract

PROBLEM TO BE SOLVED: To provide an information processing method, an information processing apparatus, and a program that improve the convenience of executing instructions through voice recognition.
SOLUTION: An information processing method executes a predetermined command that is specified on the basis of predetermined text information recognized from input voice information. The method includes a first mode, in which input of voice information is accepted and the fact that voice input can be accepted is displayed in a first area A1 on a screen, and a second mode, in which the predetermined command is executed and the result of the execution is displayed in a second area A2 on the screen that is larger than the first area.

Description

The present invention relates to an information processing method, an information processing apparatus, and a program.

In recent years, interactive operation support systems that assist users in entering commands to electronic devices such as television receivers and personal computers have been developed (see, for example, Patent Document 1).

The invention described in Patent Document 1 relates to an "interactive operation support system, interactive operation support method, and storage medium". Specifically, it states that "by using as the user interface the animation of a character called a personified assistant, which reacts through speech synthesis and animation, it is possible to give the user a sense of familiarity and at the same time to handle complicated commands and provide an entrance to services. In addition, since a command system that feels close to natural language is provided, the user can easily operate the device with the same feeling as in a normal conversation."

Patent Document 1: JP 2002-41276 A

However, the technique described in Patent Document 1 gives no particular consideration to the relationship between the display in the voice command reception state and the display in the command execution state. Specifically, according to that technique, the assistant is displayed full screen in the command reception state, and when a received command is executed, its result is displayed full screen. As a result, in the command reception state the user cannot see any information processing content other than the assistant. In the command execution state, it is difficult to grasp the relationship between the voice command and the execution result screen.
Accordingly, an object of the present invention is to provide an information processing method, an information processing apparatus, and a program that improve the usability of command execution by voice recognition.

To solve the above problem, the invention according to claim 1 is an information processing method for executing a predetermined command specified based on predetermined text information recognized from input voice information, the method having: a first mode in which input of the voice information is accepted and the fact that input of the voice information can be accepted is displayed in a first area on a screen; and a second mode in which the predetermined command is executed and the execution result is displayed in a second area on the screen that is larger than the first area.

According to the present invention, it is possible to provide an information processing method, an information processing apparatus, and a program that improve the usability of command execution by voice recognition.

FIG. 1 is a block diagram of a personal computer as an information processing apparatus according to an embodiment.
FIG. 2 is an example of a block diagram of the main part of the personal computer shown in FIG. 1.
FIG. 3 is a flowchart showing an example of the operation of the personal computer shown in FIG. 1.
FIG. 4 is a diagram showing a state in which a user is giving an instruction to the personal computer by voice.
FIG. 5 is a conceptual diagram outlining the transitions of the personal computer shown in FIG. 1 from startup through input standby, conversation, timeout, and session startup standby.

Next, an embodiment will be described.
<Configuration>
FIG. 1 is a block diagram of a personal computer as an information processing apparatus according to an embodiment.
The personal computer (hereinafter, PC) 100 shown in FIG. 1 includes a microphone 101, amplifier circuits 102 and 104, a speaker 103, a display device 105, a keyboard 106, a mouse 107, an optical reader 108, a control means 109, an HDD (Hard Disk Drive) 110, a network connection unit 111, an I/O (Input/Output) 112, and a bus line 113.

The microphone 101 has a function of converting the user's voice into an electrical signal. An example of the microphone 101 is a condenser microphone, but a dynamic microphone may also be used.
The amplifier circuit 102 is a circuit that amplifies the electrical signal from the microphone 101.
The speaker 103 has a function of converting an electrical signal into sound. The speaker 103 mainly serves to convey to the user the utterances of an avatar that personifies the PC.
The amplifier circuit 104 is a circuit that amplifies the audio signal to a level sufficient to drive the speaker 103.
The display device 105 has a function of displaying images, characters, and the like, including the avatar and a speech balloon that shows the avatar's utterances as text. An example of the display device 105 is a liquid crystal display element. The display device 105 displays, in a first area A1 on the screen (see FIG. 5 described later), that input of voice information can be accepted. The display device 105 also displays the execution result of a predetermined command in a second area A2 on the screen (see FIG. 5). The first area A1 may be a bar-shaped area that shares a side with at least one side of the rectangular screen. The second area A2 may be the entire area of the screen.
The keyboard 106 is an input device for entering characters, numbers, and symbols.
The mouse 107 is a kind of input device and has functions such as moving the cursor on the display device 105 when the mouse is moved across the desk.
The optical reader 108 has a function of reading optical media such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a CD-R (Compact Disc-Recordable).
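The size relationship between the two display areas described above for the display device 105 can be illustrated with a small geometric sketch. This is only an illustration under stated assumptions: the `Rect` type, the helper names, the 1920x1080 screen, the 120-pixel bar height, and the choice of the bottom edge are all hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def bar_area(screen: Rect, bar_height: int) -> Rect:
    """First area A1: a bar that shares the bottom side of the screen.

    The bottom edge is an assumption; the text only requires that the bar
    share a side with at least one side of the rectangular screen.
    """
    return Rect(screen.x, screen.y + screen.height - bar_height,
                screen.width, bar_height)


def full_area(screen: Rect) -> Rect:
    """Second area A2: here, the entire screen (one permitted option)."""
    return Rect(screen.x, screen.y, screen.width, screen.height)


screen = Rect(0, 0, 1920, 1080)      # assumed screen size
a1 = bar_area(screen, bar_height=120)  # assumed bar height
a2 = full_area(screen)

# The method requires the second area to be larger than the first.
print(a1.area < a2.area)
```

Any layout satisfies the described condition as long as A2 covers more of the screen than A1; the full-screen A2 is only one of the options the text allows.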

The control means 109 is an element that has a function of performing overall control of the PC 100 and a voice processing function; an example is a CPU (Central Processing Unit). The voice processing function mainly outputs input voice as text data, analyzes it, and synthesizes speech.
The control means 109 includes an input control means 109a, a voice recognition means 109b, a voice analysis means 109c, a search means 109d, a voice synthesis means 109e, and a display device control means 109f, each of which is implemented in software.

The input control means 109a has a function of causing processing to be performed based on commands obtained by analyzing the signal into which voice input to the microphone 101 has been converted. It also has a function of converting key input from the keyboard 106 and signals produced by clicks, drags, and similar operations of the mouse 107 into character display, numeric display, symbol display, cursor movement, commands, and the like.
The voice recognition means 109b corresponds to the client-type voice recognition unit 203 described later.
The voice analysis means 109c corresponds to the voice signal interpretation unit 202 described later.
The search means 109d is a means for searching the Internet via the network 207. When the user issues a search instruction, the search means 109d connects to the network with a preset browser, connects to a preset Internet search service provider, and performs a keyword search.
The voice synthesis means 109e corresponds to the client-type voice synthesis unit 210 described later.
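The keyword search performed by the search means 109d can be sketched as building a query URL for a preset search service. This is a minimal sketch, not the implementation in the source: the base URL `https://search.example.com/search`, the query parameter name `q`, and the function name `build_search_url` are placeholders assumed for illustration.

```python
from urllib.parse import urlencode

# Hypothetical preset search service; the source does not name one.
SEARCH_BASE_URL = "https://search.example.com/search"


def build_search_url(keywords: str, base_url: str = SEARCH_BASE_URL) -> str:
    """Return a keyword-search URL for the preset search service.

    urlencode percent-encodes the keywords, so spaces and non-ASCII
    characters are safe to pass in.
    """
    return f"{base_url}?{urlencode({'q': keywords})}"


url = build_search_url("weather tokyo")
print(url)
```

In a desktop setting, the resulting URL could then be handed to the preset browser, for example with the standard library call `webbrowser.open(url)`.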

The display device control means 109f has a function of selecting either area A1 or area A2 as the area of the screen in which the avatar, speech balloon, command execution results, and other items related to voice recognition processing are displayed on the display device 105. That is, when the PC 100 is in the voice input standby state (first mode), the display device control means 109f displays in the first area A1 that voice input can be accepted. This can be indicated by displaying, in a bar-shaped area, the avatar and a speech balloon stating that voice input can be accepted, but the indication is not limited to this. During a conversation, that is, while input of voice information and execution of the command specified by that voice information follow one another within a predetermined time (second mode), the command execution result is displayed in the second area A2. In addition to the command execution result, the second area A2 may also display the avatar, the avatar's speech balloon, the bar-shaped area, guide icons, descriptions of the guide icons, and the like. If supplementary information about the command execution result is written in the avatar's speech balloon, the user can grasp the execution result more accurately.
Depending on the settings of the PC 100, the avatar's speech balloon may, for example, show as text the response to a spoken question from the user, or it may be left blank.
The other functions of the display device control means 109f correspond to the client application unit 204 and the local content unit 208 described later.
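The area selection performed by the display device control means 109f can be sketched as a function from the current mode to the area and its contents. This is an illustrative sketch only: the `Mode` enum, the function names, and the item strings are assumptions introduced here and are not taken from the source.

```python
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    FIRST = "voice input standby"   # first mode
    SECOND = "conversation"         # second mode


def select_area(mode: Mode) -> str:
    """Choose the display area for the given mode."""
    return "A1" if mode is Mode.FIRST else "A2"


def build_screen(mode: Mode, command_result: Optional[str] = None) -> Dict[str, object]:
    """Describe what would be drawn; the item strings are placeholders."""
    if mode is Mode.FIRST:
        # Standby: announce that voice input can be accepted.
        items = ["avatar", "balloon: voice input can be accepted"]
    else:
        # Conversation: show the result of the executed command.
        items = ["avatar", f"command result: {command_result}"]
    return {"area": select_area(mode), "items": items}


standby = build_screen(Mode.FIRST)
talking = build_screen(Mode.SECOND, "weather forecast")
print(standby["area"], talking["area"])
```

Keeping the avatar in both item lists mirrors the text's point that the avatar may be shown in either area.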

The HDD 110 is a kind of storage device and has a ROM (Read Only Memory) area and a RAM (Random Access Memory) area. The ROM area stores the control program, and the RAM area is used as memory.

The network connection unit 111 is a known device having a function of connecting to an external server via the network 207. Either a wireless or a wired connection may be used.
The I/O 112 is an input/output device having a function of connecting external electronic devices such as a USB (Universal Serial Bus) flash memory or a printer.

FIG. 2 is an example of a block diagram of the main part of the PC shown in FIG. 1.
In FIG. 2, in the PC 100 according to the embodiment of the present invention, the voice of the user 200 input from the microphone 101 is converted into voice data (an electrical signal), the voice data is interpreted by the voice signal interpretation unit 202, and the result is recognized by the client-type voice recognition unit 203. The client-type voice recognition unit 203 passes the recognized voice data to the client application unit 204.

The client application unit 204 checks whether an answer to the inquiry from the user 200 is stored in the local content unit 208, which is available offline. If it is, the answer to the user's inquiry is output as speech from the speaker 103 via the text reading unit 209 and the client-type voice synthesis unit 210, both described later.

If the answer to the inquiry from the user 200 is not stored in the local content unit 208, the PC 100 does not have an answer on its own. In that case, an answer is searched for using a search engine or the like on the Internet, via the network connection unit 206 connected to the network 207 such as the Internet, and the search result obtained is output as speech from the speaker 103 via the text reading unit 209 and the client-type voice synthesis unit 210.

The client application unit 204 converts the answer obtained from the local content unit 208 or the network 207 into text (character) data and passes it to the text reading unit 209. The text reading unit 209 reads out the text data and passes it to the client-type voice synthesis unit 210. The client-type voice synthesis unit 210 synthesizes the data into voice data that humans can recognize and passes it to the speaker 103. The speaker 103 converts the voice data (an electrical signal) into sound. In step with the sound emitted from the speaker 103, detailed information related to that sound is displayed on the display device 105.
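The lookup order described above (local content first, network search only when no local answer exists) can be sketched as follows. This is a minimal sketch under assumptions: the function names, the dictionary used as local content, the stub search function, and the fallback message returned when nothing is found are all hypothetical and are not specified in the source.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_answer(
    question: str,
    local_content: Dict[str, str],
    search_network: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (answer_text, origin), preferring offline local content."""
    if question in local_content:
        return local_content[question], "local"
    # No local answer: fall back to a network search.
    answer = search_network(question)
    if answer is None:
        # Assumed behaviour; the source does not say what happens here.
        return "Sorry, I could not find an answer.", "none"
    return answer, "network"


# Stand-ins for the local content unit and the network search.
local = {"what is your name": "I am the avatar of this PC."}
network_calls = []


def fake_search(query: str) -> Optional[str]:
    network_calls.append(query)
    return f"search result for {query}"


print(resolve_answer("what is your name", local, fake_search))
print(resolve_answer("weather", local, fake_search))
```

The returned text would then be passed on for reading and speech synthesis; the origin tag is included only to make the fallback visible in the example.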

Next, the screen display at startup of the information processing apparatus according to the embodiment of the present invention will be described. FIGS. 3 to 5 are diagrams for explaining the screen display at startup of the information processing apparatus according to the embodiment of the present invention.

The avatar of the PC 100 according to the embodiment of the present invention can give various greetings depending on the time of day and the day of the week at startup. For example, when the PC is started in the morning, the avatar shown in FIG. 5 says "Good morning!" and related information is displayed on the display device 105 (FIG. 1) at the same time. Similarly, when the PC is started during the daytime, the avatar says "Hello!". Besides the time of day, the avatar can also vary its utterances according to the day of the week, for example weekdays versus holidays.
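The time-dependent greeting can be sketched as below. Only the two greetings "Good morning!" and "Hello!" come from the text; the hour boundaries, the evening greeting, the weekend suffix, and the use of Saturday and Sunday as a stand-in for holidays are assumptions made for this illustration.

```python
from datetime import datetime


def greeting(now: datetime) -> str:
    """Pick a startup greeting from the time of day and the day of the week."""
    if 5 <= now.hour < 11:        # assumed morning window
        text = "Good morning!"
    elif 11 <= now.hour < 18:     # assumed daytime window
        text = "Hello!"
    else:
        text = "Good evening!"    # assumed; not given in the source
    if now.weekday() >= 5:        # Saturday or Sunday as a proxy for holidays
        text += " Enjoy your day off."  # hypothetical day-of-week variation
    return text


print(greeting(datetime(2013, 10, 11, 8, 0)))   # a Friday morning
print(greeting(datetime(2013, 10, 12, 13, 0)))  # a Saturday afternoon
```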

<Operation>
FIG. 3 is a flowchart showing an example of the operation of the PC shown in FIG. 1. FIG. 4 is a diagram showing a state in which the user is giving an instruction to the PC by voice. FIG. 5(a) shows, among the screens displayed on the display device of the PC shown in FIG. 4, the state before an instruction from the user while voice recognition processing is running, and FIG. 5(b) shows the state after an instruction from the user while voice recognition processing is running.

In FIG. 3, the operations are performed by the control means.
On receiving an instruction to start the series of processes for executing commands by voice recognition, the control means 109 first transitions to the first mode, in which a voice recognition start instruction can be accepted, and displays in area A1 of the display device 105 that a voice recognition start instruction can be accepted (step S1). In addition to indicating that voice input can be accepted, area A1 may display the avatar, and the indication may be shown in a speech balloon as an utterance of the avatar. Area A1 can be displayed as a taskbar that shares a side with at least one side of the rectangular screen of the display device 105, but it is not limited to this.
Next, the control means 109 determines whether the user has input a voice recognition start instruction (step S2). The voice recognition start instruction may be a wake-up keyword spoken by the user (for example, "Sherry") and recognized by simple voice recognition processing, or it may be an operation on a predetermined icon displayed on the screen or on a hardware switch.
The control means 109 waits until the user inputs a voice recognition start instruction (step S2/N). When the user inputs a voice recognition start instruction (step S2/Y), the control means 109 starts voice recognition processing (step S3).
Next, the control means 109 waits for a voice recognition stop instruction from the user (step S4). When the user inputs a voice recognition stop instruction (step S4/Y), the control means 109 ends voice recognition processing and then returns to step S1. A voice recognition stop instruction may be inferred from the absence of voice input for a predetermined continuous time after execution of a command given by voice, may be a sleep word spoken by the user (for example, "bye-bye") and recognized by voice recognition processing, or may be an operation on a predetermined icon displayed on the screen or on a hardware switch. When voice recognition processing ends, a message to that effect may be shown in a speech balloon as an utterance of the avatar.
If the user has not input a voice recognition stop instruction (step S4/N), the control means 109 determines whether the user has input voice that specifies a command (step S5).

If the user has input voice that specifies a command (step S5/Y), the control means 109 identifies and executes the command, displays the execution result in area A2 of the display device 105, which is larger than area A1 (step S6), and returns to step S4 to wait for a voice recognition stop instruction. Area A2 may be the entire display screen of the display device 105, but it is not limited to this. The avatar may be displayed in area A2 together with the command execution result, and supplementary information about the execution result may be shown in a speech balloon as an utterance of the avatar. In this case, it is preferable to display the avatar in area A2 at the same display scale as the avatar displayed in area A1 in step S1, because this reduces the sense of incongruity the user feels when moving between the first mode and the second mode.
Alternatively, when the user inputs voice that specifies a command, a further step may be executed after step S5 that determines whether the execution result of the specified command requires screen display and displays the result in area A2 only when it does. This prevents unnecessary changes to the display area when executing commands such as volume adjustment or screen brightness adjustment, which improves usability. In this step, when the command execution result does not require screen display, the process returns to step S4 in the same manner as after step S5.
If the user has not input voice that specifies a command (step S5/N), the process returns to step S4 and continues to wait for a voice recognition stop instruction or voice that specifies a command.
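The flow of steps S1 to S6 can be sketched as a small loop over recognized utterances. This is a simplified sketch, not the flowchart of FIG. 3 itself: it models only the wake-up keyword and sleep word as start and stop instructions (icons, hardware switches, and timeouts are omitted), and the set of commands that need no screen display, the log format, and the function name are assumptions introduced for illustration.

```python
from typing import Iterable, List, Tuple

WAKE_WORD = "sherry"     # wake-up keyword example from the text
SLEEP_WORD = "bye-bye"   # sleep word example from the text
# Hypothetical commands whose results need no screen display.
NO_DISPLAY_COMMANDS = {"volume up", "volume down"}

READY = ("S1", "A1: voice input can be accepted")


def run_session(utterances: Iterable[str]) -> List[Tuple[str, str]]:
    """Simulate steps S1 to S6 and return a log of (step, action) pairs."""
    log = [READY]                     # S1: first mode, show readiness in A1
    recognizing = False
    for raw in utterances:
        text = raw.strip().lower()
        if not text:
            continue                  # no voice input: keep waiting
        if not recognizing:
            if text == WAKE_WORD:     # S2/Y: start instruction received
                recognizing = True
                log.append(("S3", "recognition started"))
            continue                  # S2/N: ignore anything else
        if text == SLEEP_WORD:        # S4/Y: stop instruction received
            recognizing = False
            log.append(("S4", "recognition stopped"))
            log.append(READY)         # back to S1
        elif text in NO_DISPLAY_COMMANDS:
            # Optional check after S5: no display needed, area unchanged.
            log.append(("S5", f"executed '{text}', display area unchanged"))
        else:                         # S5/Y then S6: show the result in A2
            log.append(("S6", f"A2: result of '{text}'"))
    return log


for step in run_session(["hello", "Sherry", "weather", "volume up", "bye-bye"]):
    print(step)
```

Utterances spoken before the wake-up keyword are ignored, which corresponds to waiting at step S2.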

That is, as shown in FIG. 4, suppose for example that the user 200 is putting on makeup while sitting at a dresser, and that the PC 100, with voice recognition running, is placed on a sofa. When the user 200 utters the wake-up keyword, the control means, acting as a determination means, detects it and begins responding to inquiries given as commands.
When the user 200, while putting on makeup, calls out "Sherry" to the PC 100, the speaker 103 of the PC 100 immediately outputs speech such as "Good morning. How can I help you? Please tell me if there is anything I can do for you." At the same time, area A2 is displayed on the monitor 100a. Although the PC 100 and the user 200 are apart in FIG. 4, the user 200 can look at the monitor 100a of the PC 100 at any time. Because the size of the avatar hardly changes, the sequence of modes appears continuous and the user can operate the PC without a sense of incongruity.

FIG. 5 is a conceptual diagram outlining the transitions of the personal computer 100 shown in FIG. 1 from startup through input standby, conversation, timeout, and session startup standby.
<Startup>
The voice recognition function of the personal computer 100 shown in FIG. 1 is started either from Soft Navi (Software Navigator) or from an app (application software). The user 200 starts Soft Navi or the app using the mouse 107, the keyboard 106, or a touch panel (not shown).

<Input standby (Active Waiting)>
When the voice recognition function starts, the first area A1 is displayed on the screen of the display device 105. The first area A1 contains an avatar 150, a speech balloon 151 of the avatar 150, and a bar-shaped area 152. An area 153 for keyboard or mouse input is arranged in the bar-shaped area 152.

<During conversation>
When the user 200 starts a voice conversation with the personal computer 100 in the input standby state, area A2, which is larger than area A1, is displayed. Area A2 occupies the entire screen. In addition to the avatar 150, the speech balloon 151 of the avatar 150, and the bar-shaped area 152, area A2 displays various icons, for example an icon 155a representing restaurants, an icon 155b representing trains, an icon 155c representing weather, an icon 155d representing transfer guidance, and an icon 155e representing a calendar, together with descriptions 156a to 156e of the icons 155a to 155e in an itemized list.
The user 200 can ask questions, request searches, and so on as if conversing with the avatar 150 of the personal computer 100. If a predetermined time passes after the conversation has started with no further speech from the user 200, the screen display returns from area A2 to the original area A1 (session timeout).
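The session timeout can be sketched with a small class that tracks the time of the last voice input. This is an illustration under assumptions: the class and method names are hypothetical, the 30-second timeout is an arbitrary stand-in for the unspecified predetermined time, and timestamps are passed in explicitly so the example does not depend on a real clock.

```python
from typing import Optional


class Session:
    """Tracks which area is shown and when the user last spoke."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s            # assumed "predetermined time"
        self.area = "A1"                      # input standby
        self.last_voice_at: Optional[float] = None

    def on_voice(self, now: float) -> None:
        """Voice input starts or continues the conversation in A2."""
        self.area = "A2"
        self.last_voice_at = now

    def tick(self, now: float) -> None:
        """Return to A1 once the silence has lasted the whole timeout."""
        if (self.area == "A2" and self.last_voice_at is not None
                and now - self.last_voice_at >= self.timeout_s):
            self.area = "A1"                  # session timeout


session = Session(timeout_s=30.0)
session.on_voice(now=0.0)
session.tick(now=10.0)
print(session.area)   # still in conversation
session.tick(now=30.0)
print(session.area)   # timed out
```

Measuring the silence from the last voice input is one reading of the text; measuring it from the last command execution would work the same way with a different timestamp.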

Area A1, which indicates the input standby state, or area A2, which indicates the conversation state, is reduced to an icon (in the taskbar or task tray) when the minimize button 157 in the bar-shaped area 152 is clicked or tapped with the keyboard 106, the mouse 107, or a touch panel (not shown). Double-tapping the icon causes a transition to the session startup standby (Inactive Waiting) state.
As described above, according to this embodiment, the avatar is almost the same size in areas A1 and A2, which differ in size, so the sequence of modes appears continuous. This eliminates any sense of incongruity during operation.

<Program>
The information processing apparatus according to the present invention described above is realized by a program that causes a computer to execute processing. An example of the computer is a personal computer, but the present invention is not limited to this. As an example, a case where the functions of the present invention are realized by a program is described below.

For example, the following program can be given:
a program readable by a computer of an information processing apparatus that comprises input means for inputting voice information, recognition means for recognizing predetermined text information from the voice information input to the input means, and execution means for executing a predetermined command specified based on the recognized predetermined text information,
the program causing the computer to execute, on display means:
a procedure of displaying, in a first area on a screen, that input of the voice information can be accepted, in a first mode in which input of the voice information is accepted; and
a procedure of displaying the execution result of the predetermined command in a second area on the screen that is larger than the first area, in a second mode in which the predetermined command is executed.

As a result, the information processing apparatus according to the present invention can be realized anywhere, as long as there is a computer environment in which the program can be executed.
Such a program may be stored in a computer-readable storage medium.

<Storage medium>
Examples of the storage medium include computer-readable storage media such as a CD-ROM, a flexible disk (FD), and a CD-R, semiconductor memories such as flash memory, RAM, ROM, and FeRAM, and an HDD.

FD stands for flexible disk. RAM is an abbreviation for random-access memory. ROM is an abbreviation for read-only memory. FeRAM is an abbreviation for ferroelectric RAM, that is, ferroelectric memory.

As described above, the present invention provides an information processing apparatus that includes input means for inputting voice information, recognition means for recognizing predetermined text information from the voice information input to the input means, and execution means for executing a predetermined command specified based on the recognized predetermined text information. The apparatus further includes display means that, while input of voice information is being accepted, displays in a first area on the screen that input of voice information can be accepted and, when the predetermined command is executed, displays the execution result in a second area on the screen. Because the second area is larger than the first area, continuity across the series of modes is obtained. An information processing method, an information processing apparatus, and a program offering this continuity can thus be provided.
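The step in which a predetermined command is specified from recognized text can be sketched as a simple table lookup. This is a hedged illustration only: the command table `COMMANDS`, the phrases in it, and the functions `show_weather`, `mute_speaker`, and `handle_utterance` are hypothetical and do not appear in the embodiment, and speech recognition itself is assumed to have already produced the text.

```python
from typing import Callable, Dict, Optional, Tuple


def show_weather() -> str:
    # Hypothetical command with a result to display on the screen.
    return "weather forecast"


def mute_speaker() -> None:
    # Hypothetical command with no result to display on the screen.
    return None


# Assumed mapping from predetermined text information to predetermined commands.
COMMANDS: Dict[str, Callable[[], Optional[str]]] = {
    "show the weather": show_weather,
    "mute": mute_speaker,
}


def handle_utterance(recognized_text: str) -> Tuple[bool, Optional[str]]:
    """Specify a command from recognized text and execute it.

    Returns (executed, result). The result is None when the command has
    nothing to show or when no command matches the text.
    """
    command = COMMANDS.get(recognized_text.strip().lower())
    if command is None:
        return (False, None)
    return (True, command())
```

The returned result indicates whether anything needs to be shown: a non-empty result would be displayed in the larger second area, while a command such as the hypothetical `mute` would leave the display unchanged.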

The embodiment described above is one example of a preferred embodiment of the present invention. The present invention is not limited to it, and various modifications can be made without departing from the gist of the invention. For example, although a young woman is used as the avatar in the present embodiment, the present invention is not limited to this; the avatar may be a man or an animated character.

100 Personal computer (PC, information processing apparatus)
100a Monitor
101 Microphone
102, 104 Amplifier circuit
103 Speaker
105 Display device
106 Keyboard
107 Mouse
108 Optical reader
109 Control means
109a Input control means
109b Speech recognition means
109c Speech analysis means
109d Search means
109e Speech synthesis means
109f Display device control means
110 HDD
111 Network connection unit
112 I/O
113 Bus line
150 Avatar
151 Speech bubble
200 User
202 Speech signal interpretation unit
203 Client-type speech recognition unit
204 Client application unit
207 Network
208 Local content unit
209 Text reading unit
210 Client-type speech synthesis unit

Claims

1. An information processing method for executing a predetermined command specified based on predetermined text information recognized from input voice information, the method comprising:
a first mode of accepting input of the voice information and displaying, in a first area on a screen, that input of the voice information can be accepted; and
a second mode of executing the predetermined command and displaying the execution result in a second area on the screen that is larger than the first area.

2. The information processing method according to claim 1, wherein when the predetermined command whose execution result is not displayed on the screen is executed in the second mode, the display area is not changed.

3. The information processing method according to claim 1 or 2, wherein, in the first mode, an avatar that is an anthropomorphic representation of an information processing apparatus performing the information processing is displayed as if it were speaking the notice that input of the voice information can be accepted, and, in the second mode, supplementary information on the execution result is displayed as if spoken by the avatar.

4. The information processing method according to claim 3, wherein the display scale of the avatar is the same in the first mode and the second mode.

5. The information processing method according to claim 1, wherein the first area is a bar-shaped area sharing a side with at least one side of the screen having a rectangular shape.

6. The information processing method according to claim 1, wherein the second area is the entire area of the screen.

7. An information processing apparatus comprising: input means for inputting voice information; recognition means for recognizing predetermined text information from the voice information input to the input means; and execution means for executing a predetermined command specified based on the recognized predetermined text information,
the information processing apparatus further comprising display means for displaying, in a first area on a screen, that input of the voice information can be accepted, in a first mode of accepting input of the voice information, and for displaying the execution result of the predetermined command in a second area on the screen, in a second mode of executing the predetermined command,
wherein the second area is larger than the first area.

8. A program readable by a computer of an information processing apparatus that includes input means for inputting voice information, recognition means for recognizing predetermined text information from the voice information input to the input means, and execution means for executing a predetermined command specified based on the recognized predetermined text information,
the program causing the computer to make display means execute:
a procedure of displaying, in a first area on a screen, that input of the voice information can be accepted, in a first mode of accepting input of the voice information; and
a procedure of displaying the execution result of the predetermined command in a second area on the screen that is larger than the first area, in a second mode of executing the predetermined command.