JP2004301893A

JP2004301893A - Control method of voice recognition device

Info

Publication number: JP2004301893A
Application number: JP2003091561A
Authority: JP
Inventors: Eiji Ishiyama; 英二石山; Hiroshi Tanaka; 宏志田中
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To put a voice recognition device into operation without performing any complicated operation.

SOLUTION: Provided are a standby mode wherein only a specified command can be recognized as a voice and a main voice input mode wherein a normal command is recognized as a voice and a digital camera is made to execute the processing corresponding to the command. When the specified command is recognized as a voice in the standby mode, a speaker is specified by referencing a dictionary for standby mode wherein voice data of a plurality of speakers are previously stored and the main voice input mode is entered. In the main voice input mode, respective commands are recognized as voices according to a main dictionary corresponding to the specified speaker. When the mode changes from the standby mode to the main voice input mode, a proper choice between an LCD panel and a voice output part is made according to the current operation state of the digital camera to inform the speaker of the change into the main voice input mode.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

[0001]
[Technical field to which the invention belongs]
The present invention relates to a method for controlling a speech recognition apparatus that is mounted on an electronic device or the like and recognizes speech uttered by a speaker.
[0002]
[Prior art]
The speech recognition apparatus can improve the recognition performance by detecting the timing when the speaker speaks. For this reason, an electronic device or the like equipped with a voice recognition device is provided with a button switch or the like in order to instruct the start of processing related to voice recognition such as voice input start. When it is detected that the button switch is turned on by the speaker, the start of the input voice data is narrowed down by starting a process related to voice recognition such as the start of voice input.
[0003]
However, in recent years, speech recognition apparatuses have been applied to various uses, and depending on the use, it may be difficult to manually operate a button switch due to the nature of the system. To solve this problem, a speech recognition device is known that includes a motion detection unit for detecting the motion of the speaker and starts processing related to speech recognition when a predetermined motion of the speaker is detected (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
Japanese Patent Laid-Open No. 2000-338995
[0005]
[Problems to be solved by the invention]
However, with the technique described in Patent Document 1, the user is forced to perform a specific motion unrelated to voice input in order to start the voice recognition process, which undermines the key advantage of a voice recognition device, namely that voice input can be performed without any external operation.
Furthermore, since a process related to voice recognition is always started when a predetermined operation is detected, there is a problem that a voice recognition process is started even when a predetermined operation is performed unintentionally.
[0006]
The present invention has been made to solve the above problems, and its object is to provide a method for controlling a speech recognition apparatus that can effectively detect the start of speech input without requiring any operation unrelated to speech input.
[0007]
[Means for Solving the Problems]
In order to solve the above problems, the control method according to the present invention is a method for controlling a speech recognition apparatus that is mounted on an electronic device or the like, recognizes speech uttered by a speaker, and outputs the corresponding command as a result. The apparatus has a standby mode in which only a specific command can be recognized by voice and a main voice input mode in which operation commands for operating the electronic device are recognized by voice, and the method is characterized by shifting to the main voice input mode when the specific command is recognized by voice in the standby mode.
[0008]
Further, the method is characterized in that, when shifting from the standby mode to the main voice input mode, the speaker is identified by referring to a standby recognition dictionary in which voice data of a plurality of speakers is recorded in advance, and voice recognition in the main voice input mode is performed based on the recognition dictionary corresponding to the identified speaker.
[0009]
Further, the method is characterized in that, when the operation mode is switched between the standby mode and the main voice input mode, the speaker is notified that the operation mode has been switched through the optimal notification means selected based on the operation state of the electronic device.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is an external perspective view of a digital camera 10 to which the method for controlling a speech recognition apparatus according to the present invention is applied. An LCD panel 12 is provided on the rear surface of the camera body 11, and an eyepiece finder window 13 is disposed above it in the figure. An optical finder device is provided behind the eyepiece finder window 13. An eyepiece detection sensor 14 is provided to the left of the eyepiece finder window 13 in the figure, and detects that the photographer is looking into the eyepiece finder window 13. To the right of the eyepiece finder window 13 in the figure, a voice input unit 15, which is a microphone, and a slide member 16 linked to a power switch are provided. A cursor operation button 17 is provided to the right of the LCD panel 12 in the figure, and below it are an operation button 18 and a voice output unit 19, which is a speaker.
[0011]
The operation button 18 includes three buttons, a menu button 18a, an execution button 18b, and a cancel button 18c, and is pressed by the photographer when changing the setting of the digital camera 10 or the like.
[0012]
A shutter button 20 and a mode setting dial 21 are provided on the upper surface of the camera body 11. The mode setting dial 21 is rotated by a photographer when switching modes such as a photographing mode, a reproduction mode, and a setup mode. Although not shown, a camera lens, an objective finder window, and the like are provided on the front surface of the camera body 11.
[0013]
When the menu button 18a is pressed, a setting menu is displayed on the LCD panel 12. On this setting menu screen, the cursor operation button 17 is operated to move the cursor to the item to be set, for example “image size selection”, and pressing the execution button 18b displays the “image size selection” setting screen. On this setting screen, the image size setting can be changed by operating the cursor operation button 17 and the execution button 18b in the same way. In the digital camera 10 of the present embodiment, such settings can also be changed by recognizing the voice uttered by the speaker, who is the photographer.
[0014]
FIG. 2 is a block diagram showing the main part of the electrical configuration of the digital camera 10.
The system controller 31 is connected to each block in the digital camera 10 via the data bus 32 and controls the entire digital camera 10.
[0015]
An imaging unit 33 configured by an imaging device such as a CCD image sensor acquires a subject image and outputs image data to the A / D converter 34. The A / D converter 34 converts the image data from an analog signal to a digital signal, and outputs the digitized image data to the image signal processing unit 35.
[0016]
The image signal processing unit 35 is connected to the display memory 36 and the buffer memory 37 via the data bus 32, and writes image data to each of them. When the LCD panel 12 is used as an electronic viewfinder, low-resolution image data is temporarily recorded in the display memory 36; the image data is then sent to the LCD driver 39 and, after signal processing, displayed on the LCD panel 12.
[0017]
The buffer memory 37 temporarily stores captured high-resolution image data, and the image signal processing unit 35 performs various signal processing. The image data read from the buffer memory 37 is compressed by a compression method such as JPEG by the compression / decompression processing unit 40 connected via the data bus 32. The system controller 31 controls the media controller 41 to record the compressed image data on a memory card 42 that is a recording medium.
[0018]
The voice input unit 15, the voice output unit 19, and the operation input unit 44 are connected to the system controller 31 via the I / O port 43. The operation input unit 44 includes the slide member 16, the cursor operation button 17, the operation button 18, the shutter button 20, and the mode setting dial 21. The system controller 31 acquires the photographer's commands from the operation input unit 44. A built-in memory (not shown) is provided in the system controller 31, which executes various processes based on a control program stored in the built-in memory.
[0019]
Further, the eyepiece detection sensor 14 and the voice recognition device 46 are connected to the system controller 31 via the data bus 32. As shown in the block diagram of FIG. 3, the voice recognition device 46 comprises a control unit 51 that controls the entire device; a voice processing unit 52 that acquires feature amounts from the voice signal using a technique such as short-time Fourier spectrum analysis or LPC analysis; a voice recognition unit 53 that performs voice recognition from the voice feature amounts using a technique such as an HMM; a dictionary selection unit 54 that provides the voice recognition unit 53 with a voice recognition dictionary; and a dictionary storage unit 55.
[0020]
The voice recognition device 46 has two operation modes: a standby mode in which only specific commands are recognized, and a main voice input mode in which normal commands are recognized and the content of the input command is output to the system controller 31.
[0021]
The voice input unit 15 is a microphone; it converts the operator's voice into an electrical signal and converts this electrical signal from an analog signal to a digital signal. The voice processing unit 52 acquires the digitized voice data from the voice input unit 15, performs spectrum analysis on the voice data using short-time Fourier spectrum analysis or LPC analysis, and acquires a feature amount (feature parameter) for each phoneme. Power spectra, cepstrum coefficients, and the like are well known as such feature parameters, but in the present embodiment the type of feature parameter used is not limited.
[0022]
If a characteristic that is clearly different from the voice is obtained, such as a spectrum of a single sound, the voice input is determined as an invalid input, and the subsequent processing is not executed.
[0023]
The speech recognition unit 53 acquires the feature amount (feature parameter) for each phoneme from the speech processing unit 52, estimates the input speech using a time-series probability model such as an HMM, and identifies the input command by referring to the recognition dictionary stored in the dictionary storage unit 55.
[0024]
The dictionary storage unit 55 stores a standby dictionary 56 used in the standby mode, and a main dictionary (A) 57 and a main dictionary (B) 58 used in the main voice input mode. The standby dictionary 56 stores only specific commands, namely mode transition commands such as “kaishi” (開始) and “sutāto” (スタート), two Japanese words that both mean “start”. The “kaishi” command is associated with the speaker (A), and the “sutāto” command is associated with the speaker (B). The main dictionary (A) 57 and the main dictionary (B) 58 each store a plurality of commands (commands 1 to n) for executing various processes (for example, image quality setting and strobe shooting mode setting) of the device in which the voice recognition device 46 is incorporated. The main dictionary (A) is associated with the speaker (A), and the main dictionary (B) is associated with the speaker (B).
[0025]
The dictionary selection unit 54 selects the standby dictionary 56 in the standby mode. In the main voice input mode, it selects the main dictionary (A) 57 when the speaker has been identified as the speaker (A), and the main dictionary (B) 58 when the speaker has been identified as the speaker (B).
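As an illustration only, the dictionary layout and the selection rule described above might be sketched as follows. The token names `kaishi` and `sutaato` (romanizations of the two transition words), the example command strings, and the function name `select_dictionary` are assumptions made for this sketch and do not appear in the specification.

```python
# Hypothetical sketch of the standby dictionary 56 and main dictionaries 57/58.
# Mode transition command -> speaker it is associated with.
STANDBY_DICTIONARY = {"kaishi": "A", "sutaato": "B"}

# Speaker -> commands recognizable in the main voice input mode.
# The command strings are placeholders; the specification only gives
# "image quality setting" and "strobe shooting mode setting" as examples.
MAIN_DICTIONARIES = {
    "A": {"image quality", "strobe mode", "end"},
    "B": {"image quality", "strobe mode", "end"},
}


def select_dictionary(mode, speaker=None):
    """Return the set of words that may be recognized in the given mode."""
    if mode == "standby":
        # Only the mode transition commands are recognizable.
        return set(STANDBY_DICTIONARY)
    if mode == "main":
        if speaker not in MAIN_DICTIONARIES:
            raise ValueError(f"no main dictionary for speaker {speaker!r}")
        return MAIN_DICTIONARIES[speaker]
    raise ValueError(f"unknown mode {mode!r}")
```

Keeping the standby vocabulary this small is what lets the standby mode reject everything except the transition commands.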
[0026]
A control method of the voice recognition device 46 configured as described above will be described. FIG. 4 is a flowchart for explaining the voice recognition processing of the voice recognition device 46.
[0027]
The voice recognition device 46 is always active while the digital camera 10 is running, and operates in the standby mode. When voice is input to the voice input unit 15 during operation in the standby mode, the voice of the speaker, who is the photographer, is output to the voice processing unit 52 as voice data.
[0028]
The voice processing unit 52 calculates a feature amount (feature parameter) for each phoneme based on the acquired voice data. At this time, if a characteristic clearly different from speech is obtained, such as the spectrum of a single tone, the voice input is determined to be invalid, no subsequent processing is performed, and the standby mode is maintained. The feature amount is then calculated again when a new voice input is made. Otherwise, the calculated feature amount is output to the speech recognition unit 53.
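The specification does not say how a single-tone spectrum is detected. One possible check, sketched here purely as an assumption, is to compute a power spectrum and reject the input when almost all of the energy sits in one frequency bin. The function names, the `peak_ratio` threshold of 0.9, and the treatment of silence as invalid are all hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Power of each non-negative frequency bin, via a direct DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def is_invalid_input(samples, peak_ratio=0.9):
    """Return True when the frame looks like a single tone rather than speech.

    Assumption: a frame is "single tone" if one bin holds at least
    `peak_ratio` of the total power. Silence is also treated as invalid.
    """
    spectrum = power_spectrum(samples)
    total = sum(spectrum)
    if total == 0:
        return True
    return max(spectrum) / total >= peak_ratio


# A pure tone is rejected; a mixture of three tones is passed on.
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
mixture = [
    sum(math.sin(2 * math.pi * f * t / n) for f in (3, 8, 15)) for t in range(n)
]
```

A direct DFT is used only to keep the sketch dependency-free; a real device would use an FFT or the LPC analysis mentioned in the description.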
[0029]
The speech recognition unit 53 identifies the command by comparing the feature amount acquired from the speech processing unit 52 with the feature amounts of the mode transition commands ((A) “kaishi” (開始), (B) “sutāto” (スタート)) stored in the standby dictionary 56. If no mode transition command is recognized, no subsequent processing is performed, the standby mode is maintained, and the voice recognition process is performed again when a new voice input is made. Since no voice input other than the specific mode transition commands is accepted in the standby mode, the mode transition commands can be recognized with high accuracy. Further, since the transition from the standby mode to the main voice input mode is triggered by voice input, no special operation is needed to change the mode.
[0030]
When a mode transition command is recognized, the device shifts from the standby mode to the main voice input mode. As described later, this mode transition is notified to the photographer by the voice output unit 19 or the LCD panel 12. The speech recognition dictionary used in the main voice input mode is determined by which transition command was used.
For example, when the main voice input mode is entered with the command (A) “kaishi” (開始), the main dictionary (A) 57 is selected as the recognition dictionary, and when it is entered with the command (B) “sutāto” (スタート), the main dictionary (B) 58 is selected as the recognition dictionary.
[0031]
In the main voice input mode, voice input processing is executed in the same manner as in the standby mode, and if the voice processing unit 52 determines that the input is invalid, no processing is performed until a new voice is input. If the input is valid, the voice recognition unit 53 identifies the command. At this time, unlike in the standby mode, all commands registered in the main dictionary (A) 57 or the main dictionary (B) 58 are recognized. Since commands are input only in this main voice input mode, the timing at which the input voice begins can be narrowed down.
[0032]
The recognized command is output to the system controller 31 via the data bus 32, and the system controller 31 executes the processing corresponding to the command. If the recognized command relates to the control of the voice recognition device 46 itself (for example, an “end” command indicating the end of the main voice input mode), the command is output to the control unit 51 of the voice recognition device 46.
[0033]
When the “end” command is input, the control unit 51 controls the dictionary selection unit 54 to change the recognition dictionary from the main dictionary (A) 57 or the main dictionary (B) 58 to the standby dictionary 56, and the device shifts to the standby mode. If the “end” command is not recognized, a new voice input is acquired and the same processing is repeated.
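The flow of FIG. 4 described above amounts to a two-state machine. The following is a minimal sketch under the assumption that recognition has already reduced each utterance to a word string; the actual device compares feature amounts using a model such as an HMM. The class name `VoiceRecognizer`, the method `handle`, and the example command strings are hypothetical.

```python
class VoiceRecognizer:
    """Two-mode controller: standby mode and main voice input mode."""

    def __init__(self, standby_dictionary, main_dictionaries, end_command="end"):
        self.standby_dictionary = standby_dictionary  # transition word -> speaker
        self.main_dictionaries = main_dictionaries    # speaker -> set of commands
        self.end_command = end_command
        self.mode = "standby"
        self.speaker = None

    def handle(self, word):
        """Process one recognized word.

        Returns the command to forward to the system controller, or None
        when the word only changes the mode or is not recognized.
        """
        if self.mode == "standby":
            speaker = self.standby_dictionary.get(word)
            if speaker is not None:
                # Transition command recognized: enter main mode and
                # switch to the dictionary of the associated speaker.
                self.mode = "main"
                self.speaker = speaker
            return None  # everything else is ignored in standby mode

        if word == self.end_command:
            # Control command for the recognizer itself: back to standby.
            self.mode = "standby"
            self.speaker = None
            return None
        if word in self.main_dictionaries[self.speaker]:
            return word
        return None


recognizer = VoiceRecognizer(
    {"kaishi": "A", "sutaato": "B"},
    {"A": {"flash on"}, "B": {"flash off"}},
)
```

In this sketch a camera command spoken in standby mode is simply dropped, which mirrors how the description limits false triggering.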
[0034]
In the above description, the main voice input mode is ended by recognizing the “end” command. However, the present invention is not limited to this; for example, the main voice input mode may be ended by pressing the operation button 18 of the digital camera 10.
[0035]
Further, it has been described that the speaker is identified by the type of command (“kaishi” (開始) or “sutāto” (スタート)) and the corresponding main dictionary is selected, but the present invention is not limited to this. The speaker may instead be identified by storing in advance, in the standby dictionary 56, the voice feature amount of each speaker for the mode transition command and comparing it with the input voice data.
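The specification does not state how the stored per-speaker feature amounts are compared with the input. One simple possibility, shown here only as an assumed stand-in, is nearest-template matching by Euclidean distance with a rejection threshold. The function name `identify_speaker`, the two-element feature vectors, and the `max_distance` value are hypothetical.

```python
import math


def identify_speaker(features, templates, max_distance=1.0):
    """Return the speaker whose stored template is closest to `features`.

    `templates` maps speaker -> feature vector recorded in advance for the
    mode transition command. Returns None when no template is within
    `max_distance`, i.e. the utterance is not accepted.
    """
    best_speaker, best_distance = None, math.inf
    for speaker, template in templates.items():
        distance = math.dist(features, template)
        if distance < best_distance:
            best_speaker, best_distance = speaker, distance
    return best_speaker if best_distance <= max_distance else None


templates = {"A": [0.0, 0.0], "B": [3.0, 4.0]}
```

With this variant all speakers could share a single transition word, because the speaker is identified from the voice rather than from the word chosen.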
[0036]
Further, in the present embodiment, only two dictionaries are set, but the present invention is not limited to this, and the number of speakers may be one or three or more.
In this case, main dictionaries for the number of speakers (operators) are set.
[0037]
Further, the voice recognition processing unit may be omitted and its processing executed by software instead. In this case, the software program and the recognition dictionaries are stored in the built-in memory of the system controller 31, and it suffices that the program is always running while the digital camera 10 is powered on.
[0038]
Next, the notification process at the time of mode transition will be described. As shown in FIG. 5, three notification methods, “LCD panel display”, “notify by voice”, and “automatic selection”, are provided for notifying the transition to the main voice input mode, and a notification setting screen 60 is displayed on the LCD panel 12. The setting can be changed by pressing the cursor operation button 17 to move the cursor 61 to the desired notification method and then pressing the execution button 18b. FIG. 6 is a flowchart for explaining the notification process at the time of mode transition.
[0039]
When shifting from the standby mode to the main voice input mode, the system controller 31 determines whether or not the notification method is set to “voice notification”. When it is determined that “voice notification” is set, the system controller 31 controls the voice output unit 19 to output a voice. As a result, the operator is notified of the transition to the main voice input mode, and the voice notification process ends.
[0040]
If it is determined that the “voice notification” is not set, the process proceeds to the next process without outputting the voice from the voice output unit 19. Thereafter, the system controller 31 determines whether or not the notification method is set to “LCD display”. If it is determined that “LCD display” is set, the system controller 31 controls the LCD driver 39 to display a character string or the like indicating that the mode has been changed to the main voice input mode on the LCD panel 12. As a result, the operator is notified of the transition to the main voice input mode, and the voice notification process ends.
[0041]
If it is determined that “LCD display” is not set, the process proceeds to the next process without displaying on the LCD panel 12. At this time, the notification method is set to “automatic selection”. The system controller 31 determines whether the photographer is performing framing using the finder device based on the output of the eyepiece detection sensor 14.
[0042]
When it is determined that the finder device is used, the system controller 31 controls the sound output unit 19 to perform sound output. As a result, the operator is notified of the transition to the main voice input mode, and the voice notification process ends.
[0043]
If it is determined that the finder device is not used, the process proceeds to the next process. Thereafter, the system controller 31 determines whether or not the LCD panel 12 is lit. If it is determined that the LCD panel 12 is lit, the system controller 31 controls the LCD driver 39 to display on the LCD panel 12 a character string indicating the transition to the main voice input mode. As a result, the operator is notified of the transition to the main voice input mode, and the voice notification process ends.
[0044]
When it is determined that the LCD panel 12 is not lit, the system controller 31 controls the audio output unit 19 to perform audio output. As a result, the operator is notified of the transition to the main voice input mode, and the voice notification process ends.
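The decision sequence of FIG. 6 described in paragraphs [0039] to [0044] can be summarized as a short function. This is a sketch only; the function name `choose_notifier`, the setting strings, and the boolean parameter names are assumptions introduced for illustration.

```python
def choose_notifier(setting, eye_at_finder, lcd_on):
    """Return "voice" or "lcd" for notifying entry into main voice input mode.

    setting       -- "voice", "lcd", or "auto" (the user's notification setting)
    eye_at_finder -- True if the eyepiece detection sensor reports finder use
    lcd_on        -- True if the LCD panel is currently lit
    """
    if setting == "voice":
        return "voice"
    if setting == "lcd":
        return "lcd"
    # Automatic selection: the photographer cannot see the LCD panel while
    # looking through the finder, so use voice in that case.
    if eye_at_finder:
        return "voice"
    # Otherwise use the LCD panel only if it is lit; fall back to voice.
    return "lcd" if lcd_on else "voice"
```

For example, `choose_notifier("auto", eye_at_finder=False, lcd_on=True)` selects the LCD panel, while every other automatic case falls back to voice output.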
[0045]
Note that the viewfinder device may be used as an electronic viewfinder. In this case, when it is determined that the photographer is using the viewfinder device, it is possible to display on the electronic viewfinder that the main voice input mode has been entered. Further, when the main voice input mode is shifted to the standby mode, the fact may be displayed on the LCD panel 12 or may be notified by voice.
[0046]
Although the present embodiment has been described using a digital camera, the present invention is not limited thereto, and can be applied to, for example, a car navigation voice recognition device.
[0047]
[Effect of the invention]
As described above, according to the present invention, the voice recognition process is started by recognizing a specific voice command, so that malfunctions are prevented. In addition, since both the start of the voice recognition process and the processing of normal commands can be performed by voice commands, the device can be operated without the speaker being conscious of operating it, and the advantages of a voice input device can be exploited to the fullest.
[Brief description of the drawings]
FIG. 1 is a perspective view of a digital camera equipped with a voice recognition device.
FIG. 2 is a block diagram showing the electrical configuration of the digital camera.
FIG. 3 is a block diagram illustrating a configuration of a voice recognition device.
FIG. 4 is a flowchart illustrating mode transition processing.
FIG. 5 is an explanatory diagram showing a mode transition notification setting screen.
FIG. 6 is a flowchart illustrating a mode transition notification process.
[Explanation of symbols]
10 Digital camera
12 LCD panel
14 Eyepiece detection sensor
15 Voice input unit
19 Voice output unit
46 Voice recognition device
51 Control unit
52 Voice processing unit
53 Voice recognition unit
54 Dictionary selection unit
55 Dictionary storage unit
56 Standby dictionary
57 Main dictionary (A)
58 Main dictionary (B)

Claims

1. A method for controlling a speech recognition apparatus that is mounted on an electronic device or the like, recognizes speech uttered by a speaker, and outputs a corresponding command as a result,
wherein the apparatus has a standby mode in which only a specific command can be recognized by voice and a main voice input mode in which operation commands for operating the electronic device are recognized by voice, and the main voice input mode is entered when the specific command is recognized by voice in the standby mode.

2. The method for controlling a speech recognition apparatus according to claim 1, wherein, when shifting from the standby mode to the main voice input mode, a speaker is identified by referring to a standby recognition dictionary in which voice data of a plurality of speakers is recorded in advance, and speech recognition in the main voice input mode is performed based on a recognition dictionary corresponding to the identified speaker.

3. The method for controlling a speech recognition apparatus according to claim 1 or 2, wherein, when the operation mode is switched between the standby mode and the main voice input mode, the speaker is notified that the operation mode has been switched through an optimal notification means selected based on the operation state of the electronic device.