JP4197419B2

JP4197419B2 - camera

Info

Publication number: JP4197419B2
Application number: JP2002283072A
Authority: JP
Inventors: 泉三宅
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2002-09-27
Filing date: 2002-09-27
Publication date: 2008-12-17
Anticipated expiration: 2022-09-27
Also published as: JP2004120526A

Description

[0001]
[Technical field to which the invention belongs]
The present invention relates to a camera for photographing a subject.
[0002]
[Prior art]
Cameras have been proposed that recognize input voice, convert it into character data, combine the resulting character string with a photographed image, and display it as a comment on that image, so that a desired image can easily be found among many images (see, for example, Patent Document 1).
[0003]
A camera that changes its pan / tilt angle and the like by an amount corresponding to the loudness (an analog quantity) of the input voice has also been proposed. As a modification of this camera, a technique has been proposed in which default utterances of a plurality of users are registered in advance and voice recognition is performed by comparing the registered default utterances with the utterances of each of the users (see, for example, Patent Document 2).
[0004]
[Patent Document 1]
Japanese Patent Laid-Open No. 9-252453 (paragraph 0017, FIG. 1)
[Patent Document 2]
Japanese Patent Laid-Open No. 2000-284794 (paragraphs 0013 to 0023, FIG. 1, and paragraphs 0034 to 0037)
[0005]
[Problems to be solved by the invention]
In a camera that recognizes input voice and performs various controls, voice recognition can be performed with high accuracy if it uses a voice dictionary appropriate for the voice of the user operating the camera. Patent Document 1, however, does not propose such a technique. Patent Document 2 proposes performing voice recognition by comparing pre-registered default utterances with the utterances of each of a plurality of users, but it does not propose how to identify the user (speaker) who is using the camera from among those users and select a voice dictionary appropriate for that speaker's voice.
[0006]
In view of the above circumstances, an object of the present invention is to provide a camera that can select a voice dictionary appropriate for a speaker's voice with a simple operation.
[0007]
[Means for Solving the Problems]
The camera of the present invention that achieves the above object is a camera for photographing a subject, comprising:
a microphone that picks up voice;
a voice dictionary creation unit that determines whether or not a predetermined word has been input and, when it determines that the predetermined word has been input, extracts the vocal features of each speaker from the voices of a plurality of speakers input from the microphone, creates a voice dictionary for each of the plurality of speakers, and associates each voice dictionary with a symbol image for recognizing the corresponding speaker;
an image display unit that displays an image;
a voice dictionary selection unit that displays a list of the symbol images on the image display unit and, when one of the symbol images in the list is selected by an operation, selects the voice dictionary corresponding to the selected symbol image; and
a voice control unit that recognizes voice input from the microphone using the voice dictionary selected by the voice dictionary selection unit and performs control according to the recognized voice.
[0008]
The camera of the present invention extracts the vocal features of each of a plurality of speakers, creates a voice dictionary for each speaker, and associates each created voice dictionary with a symbol image for recognizing that speaker. For voice control, the user only has to select, from the symbol image list, the symbol image corresponding to the speaker who is using the camera. A voice dictionary appropriate for the speaker's voice can therefore be selected easily.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described.
[0010]
FIG. 1 is an external view of a camera according to an embodiment of the present invention. FIG. 1A is a front view, FIG. 1B is a top view, FIG. 1C is a side view, and FIG. 1D is a rear view.
[0011]
A camera 100 shown in FIGS. 1A to 1D is a digital camera that performs photographing by forming an image of a subject on an image sensor and taking in an image signal representing the subject. The camera 100 is a digital camera that recognizes input voice and performs various operations.
[0012]
As shown in FIG. 1A, a photographing zoom lens 101 and a microphone 151 for picking up voice are provided on the front of the camera 100 of the present embodiment. As will be described in detail later, the camera 100 recognizes voice input from the microphone 151 and performs voice control according to the recognized voice, for example zooming up (moving the photographing zoom lens 101 to the tele side) or zooming down (moving it to the wide side). In addition, a flash light emitting device 105 having a flash light emitting tube 105a that emits flash light is disposed on the upper portion of the camera 100.
[0013]
Further, as shown in FIG. 1D, an operation unit 120 for performing various operations when the user uses the camera 100 is provided on the back surface of the camera 100.
[0014]
The operation unit 120 includes a power switch 121 for turning on the power to operate the camera 100, a shooting / playback switching lever 122 for freely switching between shooting and playback, a shooting mode dial 123 for selecting auto shooting, manual shooting, and the like, a cross switch 124 for setting and selecting various menus and for zooming, a flash switch 125, an execution switch 126a for executing the menu item selected with the cross switch 124, and a cancel switch 126b for canceling.
[0015]
In addition, the back of the camera 100 is provided with an image display LCD 102 (corresponding to an example of the image display unit according to the present invention) for displaying photographed images, reproduced images, and the like, an operation display LCD 103 for assisting operation, and a speaker 152.
[0016]
Further, as shown in FIG. 1B, a release button 104 is provided on the upper surface of the camera 100. The release button 104 transmits an instruction to start shooting to a main CPU, described later, provided inside the camera 100. In the camera 100, the shooting / playback switching lever 122 switches freely between shooting and playback: for shooting, the user sets the shooting / playback switching lever 122 to the shooting side 122a, and for playback, to the playback side 122b.
[0017]
Furthermore, as shown in FIG. 1C, the side surface of the camera 100 is provided with a video output terminal 106, to which a cable is connected for outputting an image signal of a subject photographed by the camera 100 to a television, a projector, or the like; a USB terminal 107, to which a cable is connected for outputting such an image signal to a personal computer or the like having a Universal Serial Bus (USB) terminal and for inputting an image signal from such a personal computer or the like to the camera 100; and a DC voltage input terminal 108, to which a DC voltage from an AC adapter is input.
[0018]
FIG. 2 is a block diagram showing a circuit configuration of the camera shown in FIG. 1.
[0019]
The camera 100 includes the photographing zoom lens 101 described above, a diaphragm 131, and a CCD sensor 132, an image sensor that converts the subject image formed through the photographing zoom lens 101 and the diaphragm 131 into an analog image signal. The CCD sensor 132 generates the image signal by accumulating, for a variable charge accumulation time, the charges generated by the subject light incident on it.
[0020]
The camera 100 also includes a white balance / γ processing unit 133 that adjusts the white balance of the subject image represented by the analog image signal from the CCD sensor 132 and adjusts the slope (γ) of the straight line in the gradation characteristics of the subject image.
[0021]
The camera 100 further includes an A / D unit 134 that A / D converts the analog signal from the white balance / γ processing unit 133 into digital image data, and a buffer memory 135 that stores the image data from the A / D unit 134.
[0022]
Further, the camera 100 includes a CG (clock generator) unit 136, a photometry / ranging CPU 137, a charge / light emission control unit 138, a YC processing unit 140, and a power source 141.
[0023]
The CG unit 136 outputs a drive signal for driving the CCD sensor 132 and a control signal for controlling the white balance / γ processing unit 133 and the A / D unit 134. Further, a control signal from the photometry / ranging CPU 137 is input to the CG unit 136.
[0024]
The photometry / ranging CPU 137 performs photometry and distance measurement by driving the photographing zoom lens 101 and the diaphragm 131 by means not shown, and controls the CG unit 136 and the charge / light emission control unit 138. Further, the photometry / ranging CPU 137 performs data communication with a main CPU 145 described later.
[0025]
The charge / light emission control unit 138 receives power from the power source 141, charges a flash capacitor (not shown) so that the flash light emitting tube 105a can emit light, and controls the light emission of the flash light emitting tube 105a.
[0026]
The YC processing unit 140 reads the image data stored in the buffer memory 135 via the bus line 142, and generates a color video signal YC separated into a luminance signal (Y) and a color signal (C). The generated color video signal YC is output from the video output terminal 106 (see FIG. 1C).
[0027]
The power source 141 supplies power to each part of the camera 100.
[0028]
Further, the camera 100 includes a compression / decompression & ID extraction unit 143 and an I / F unit 144. The compression / decompression & ID extraction unit 143 reads the image data stored in the buffer memory 135 via the bus line 142, compresses it, and stores it in the memory card 200 via the I / F unit 144. When reading image data from the memory card 200, the compression / decompression & ID extraction unit 143 extracts the identification number (ID) unique to the memory card 200, reads and decompresses the image data stored in the memory card 200, and stores it in the buffer memory 135.
[0029]
The camera 100 also includes a main CPU 145, an EEPROM 146, a YC / RGB conversion unit 147, and a display driver 148.
[0030]
The main CPU 145 controls the entire camera 100.
[0031]
The EEPROM 146 stores individual data specific to the camera 100 and the like.
[0032]
The YC / RGB conversion unit 147 converts the color video signal YC generated by the YC processing unit 140 into RGB signals of three colors, and outputs them to the image display LCD 102 via the display driver 148.
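The conversion performed by the YC / RGB conversion unit 147 can be illustrated with a minimal sketch. The patent does not state which coefficients are used; the example below assumes 8-bit full-range values and the ITU-R BT.601 coefficients commonly used for JPEG images, and the function name `ycbcr_to_rgb` is hypothetical.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit full-range YCbCr pixel to 8-bit RGB.

    Assumption: BT.601 coefficients; the patent does not specify them.
    """
    def clamp(v):
        # Round to the nearest integer and limit to the 8-bit range.
        return max(0, min(255, int(round(v))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)
```

With neutral chroma (Cb = Cr = 128) the three outputs equal the luminance value, which is a quick sanity check for any coefficient set of this form.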
[0033]
The camera 100 further includes the microphone 151 described above, a filter 153, an A / D unit 154, a voice dictionary creation unit 155, a voice dictionary memory 156, a voice dictionary selection unit 157, and a voice control unit 158.
[0034]
Voices from a plurality of users (speakers) are input to the microphone 151. The voice of each speaker input from the microphone 151 is converted into an analog electrical signal and output to the filter 153.
[0035]
The filter 153 removes frequency components outside the necessary band from the analog electrical signal from the microphone 151 and outputs the result to the A / D unit 154.
[0036]
The A / D unit 154 converts the analog electrical signal from the filter 153 into a digital signal.
[0037]
Based on the digital signal from the A / D unit 154, the voice dictionary creation unit 155 extracts the vocal features of a plurality of speakers, creates a voice dictionary for each of them, associates each voice dictionary with a symbol image for recognizing the corresponding speaker, and stores them in the memory card 200. Details will be described later.
[0038]
The voice dictionary selection unit 157 displays a list of symbol images on the image display LCD 102 and, when one of the symbol images in the list is selected by operating the cross switch 124 of the operation unit 120, selects the voice dictionary corresponding to the selected symbol image. The selected voice dictionary is stored in the voice dictionary memory 156.
[0039]
The voice control unit 158 recognizes the voice input from the microphone 151 using the voice dictionary selected by the voice dictionary selection unit 157 and stored in the voice dictionary memory 156, and performs control according to the recognized voice. Here, an example of voice control will be described.
[0040]
In the present embodiment, a voiceprint obtained from the speaker's utterance of basic words is stored in the voice dictionary memory 156. The basic words may be the same as the words used when shooting with the camera 100, or may be unrelated words (for example, “a-i-u-e-o”). Examples of the words used when shooting with the camera 100 and the corresponding operations of the camera 100 are as follows.
[0041]
When “shooting” is spoken, the camera 100 performs a shutter release operation. When “zoom up” is spoken, the photographing zoom lens 101 moves to the tele side, and when “zoom down” is spoken, it moves to the wide side. When “menu open” is spoken, a setting screen is displayed on the image display LCD 102. In this state, “up” moves the menu selection up one item and “down” moves it down one item; “set” confirms the selected menu item and “cancel” cancels it. Further, “play” plays back the photographed images, “feed” advances the played-back image by one frame, and “back” returns it by one frame.
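The word-to-operation relationship listed above can be sketched as a dispatch table. This is only an illustration: the `CameraState` fields, the helper names, and the choice to ignore “up”, “down”, “set”, and “cancel” while the menu is closed are assumptions, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass
class CameraState:
    """Hypothetical camera state; none of these field names come from the patent."""
    shots: int = 0          # shutter releases performed
    zoom_step: int = 0      # positive = toward the tele side
    menu_open: bool = False
    menu_index: int = 0     # currently highlighted menu item
    chosen: int = -1        # last confirmed menu item, -1 if none
    playing: bool = False
    frame: int = 0          # frame shown during playback


def _shoot(s): s.shots += 1
def _zoom_up(s): s.zoom_step += 1
def _zoom_down(s): s.zoom_step -= 1
def _menu_open(s): s.menu_open = True
def _up(s): s.menu_index -= 1
def _down(s): s.menu_index += 1
def _set(s): s.chosen, s.menu_open = s.menu_index, False
def _cancel(s): s.menu_open = False
def _play(s): s.playing = True
def _feed(s): s.frame += 1
def _back(s): s.frame -= 1


# Recognized word -> operation, following the list in the description.
ACTIONS = {
    "shooting": _shoot, "zoom up": _zoom_up, "zoom down": _zoom_down,
    "menu open": _menu_open, "up": _up, "down": _down,
    "set": _set, "cancel": _cancel,
    "play": _play, "feed": _feed, "back": _back,
}
# Assumption: these words only apply while the setting screen is shown.
MENU_ONLY = {"up", "down", "set", "cancel"}


def execute(state, word):
    """Apply one recognized word; return False if it is unknown or not applicable."""
    action = ACTIONS.get(word)
    if action is None or (word in MENU_ONLY and not state.menu_open):
        return False
    action(state)
    return True
```

A table of this kind keeps the recognizer separate from the camera operations, so adding a command means adding one entry rather than changing the recognition code.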
[0042]
FIG. 3 is a diagram showing image files and voice dictionaries recorded on the memory card.
[0043]
As described above, the voice dictionary creation unit 155 extracts the vocal features of a plurality of speakers, creates a voice dictionary for each speaker, and associates each voice dictionary with a symbol image for recognizing that speaker. The associated voice dictionaries and symbol images are stored in the memory card 200 as shown in FIG. 3. Here, pairs such as image file 1 (a symbol image) with its associated voice dictionary 1, and image file 2 (a symbol image) with its associated voice dictionary 2, are stored.
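One way to picture the pairs held on the memory card is a small index that maps each image file to its voice dictionary. The patent does not describe the on-card format; the JSON layout, the file names, and the functions `build_index` and `dictionary_for` below are assumptions made for illustration.

```python
import json


def build_index(pairs):
    """Serialize (image file, voice dictionary file) pairs as a JSON index."""
    entries = [{"image": image, "dictionary": dictionary}
               for image, dictionary in pairs]
    return json.dumps(entries, indent=2)


def dictionary_for(index_text, image_file):
    """Return the dictionary file paired with image_file, or None if absent."""
    for entry in json.loads(index_text):
        if entry["image"] == image_file:
            return entry["dictionary"]
    return None
```

For example, an index built from `("image_file_1.jpg", "voice_dict_1.bin")` and `("image_file_2.jpg", "voice_dict_2.bin")` lets the camera look up the second dictionary from the second image alone.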
[0044]
The camera 100 according to the present embodiment extracts the vocal features of each of a plurality of speakers, creates a voice dictionary for each speaker, and associates each created voice dictionary with a symbol image for recognizing that speaker in advance. When the camera 100 is used, the symbol image corresponding to the user of the camera 100 is selected by an operation from the symbol image list, the voice dictionary corresponding to that user (speaker) is selected, and voice recognition is performed with it. Details are described below.
[0045]
FIG. 4 is a diagram showing a list of symbol images made up of the image files shown in FIG. 3.
[0046]
The symbol image list 160 consists of symbol images of a family and is displayed on the image display LCD 102. Specifically, a symbol image 161 of the “father” (corresponding to image file 1), a symbol image 162 of the “son” (corresponding to image file 2), a symbol image 163 of the “mother”, and a symbol image 164 of the “daughter” are displayed. Before such a family can operate the camera 100 by voice, voice dictionaries must first be created.
[0047]
FIG. 5 is a flowchart of the voice dictionary creation routine of the camera of this embodiment.
[0048]
Here, the creation of a voice dictionary file for the symbol image 162 of the “son” shown in FIG. 4 is described; voice dictionary files for the other symbol images 161, 163, and 164 are created in the same way.
[0049]
This voice dictionary creation routine starts when the power switch 121 is pressed to turn on the camera 100 with the shooting / playback switching lever 122 switched to the playback side 122b, the personal setting mode is selected with the cross switch 124, and the execution switch 126a is operated.
[0050]
First, in step S1, voice input is permitted and the process proceeds to step S2. In step S2, it is determined whether or not a predetermined word (here, a word uttered by “son”) has been input. If the predetermined word is not input, this step S2 is repeatedly executed until the predetermined word is input. If it is determined that a predetermined word has been input, the process proceeds to step S3.
[0051]
In step S3, the voice features are extracted and the process proceeds to step S4. In step S4, a voice dictionary file is created.
[0052]
Next, in step S5, an image file to be associated is selected. Here, as shown in FIG. 4, a symbol image list 160 is displayed on the image display LCD 102, the symbol image 162 of “son” is selected, and the execution switch 126a is operated. Then, in step S6, the voice dictionary file is associated with the image file and recorded in the memory card 200 as a recording medium, and this routine is terminated.
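Steps S2 to S6 of the creation routine can be sketched as follows. The feature extraction here (mean and peak of the samples) is a placeholder and not the patent's method, and the names `extract_features`, `create_voice_dictionary`, `is_predetermined_word`, and `choose_image`, as well as the use of a plain dict for the memory card, are assumptions.

```python
def extract_features(samples):
    """Placeholder for step S3: summarize the samples by mean and peak.

    A real camera would compute voiceprint features; this is only a stand-in.
    """
    mean = sum(samples) / len(samples)
    peak = max(abs(x) for x in samples)
    return {"mean": mean, "peak": peak}


def create_voice_dictionary(utterances, is_predetermined_word, choose_image, card):
    """Run steps S2 to S6 once and return the chosen symbol image name.

    utterances: iterable of sample lists captured after voice input is enabled (S1).
    card: dict standing in for the memory card (image name -> voice dictionary).
    """
    for samples in utterances:              # S2: repeat until the word is input
        if is_predetermined_word(samples):
            break
    else:
        raise ValueError("the predetermined word was never input")
    features = extract_features(samples)    # S3: extract the voice features
    dictionary = {"features": features}     # S4: create the voice dictionary file
    image = choose_image()                  # S5: user selects the symbol image
    card[image] = dictionary                # S6: record the association
    return image
```

Passing the word check and the image choice in as callables keeps the sketch independent of any particular recognizer or user interface.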
[0053]
FIG. 6 is a flowchart of the voice dictionary selection routine of the camera of this embodiment.
[0054]
Here, the selection of the voice dictionary for the symbol image 162 of the “son” shown in FIG. 4 is described.
[0055]
This voice dictionary selection routine starts when the power switch 121 is pressed to turn on the camera 100 with the shooting / playback switching lever 122 switched to the playback side 122b, the personal identification mode is selected with the cross switch 124, and the execution switch 126a is operated.
[0056]
First, in step S21, an image is read from the recording medium (memory card 200), and the process proceeds to step S22. In step S22, image list data (symbol image list) is created. Next, in step S23, the image list is reproduced and the process proceeds to step S24.
[0057]
In step S24, it is determined whether or not the user's image (here, the “son” image) has been selected. If it has not, step S24 is repeated until it is (specifically, the cross switch 124 is operated until the “son” image is selected). Once the user's image has been selected, operating the execution switch 126a advances the process to step S25.
[0058]
In step S25, the voice dictionary associated with the image is read. Next, in step S26, it is stored in the voice dictionary storage location (voice dictionary memory 156) of the camera body, and this routine is terminated.
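The selection routine (steps S21 to S26) can be sketched in the same spirit. The key names "left", "right", and "execute", the sorted ordering of the image list, and the use of dicts for the memory card and the voice dictionary memory are all assumptions for illustration.

```python
def select_voice_dictionary(card, key_presses, dictionary_memory):
    """Return the selected image name, or None if nothing was confirmed.

    card: dict standing in for the memory card (image name -> voice dictionary).
    key_presses: iterable of "left", "right", or "execute" (hypothetical key names).
    dictionary_memory: dict standing in for the voice dictionary memory.
    """
    images = sorted(card)                   # S21-S22: read images, build the list
    if not images:
        return None
    cursor = 0                              # S23: list shown, first image highlighted
    for key in key_presses:                 # S24: wait until an image is chosen
        if key == "right":
            cursor = (cursor + 1) % len(images)
        elif key == "left":
            cursor = (cursor - 1) % len(images)
        elif key == "execute":
            selected = images[cursor]
            dictionary_memory.clear()
            dictionary_memory.update(card[selected])   # S25-S26: load and store
            return selected
    return None
```

Clearing the dictionary memory before loading reflects the idea that only the dictionary of the current user is held in the camera body at a time.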
[0059]
FIG. 7 is a flowchart of a routine for controlling the camera by recognizing the voice using the voice dictionary stored in the voice dictionary memory of the camera of the present embodiment.
[0060]
This routine starts when the power switch 121 is pressed to turn on the camera 100 with the shooting / playback switching lever 122 switched to the shooting side 122a.
[0061]
First, in step S31, it is determined whether or not the voice input mode has been selected (selected by the cross switch 124). If the voice input mode is not selected, the process proceeds to step S37 and a normal operation is performed. If it is determined that the voice input mode has been selected, the process proceeds to step S32.
[0062]
In step S32, it is determined whether there is a voice input. If there is none, step S32 is repeated until there is. If there is a voice input, the process proceeds to step S33, where recognition is performed using the voice dictionary.
[0063]
Next, in step S34, it is determined whether or not the recognized voice is a valid command, that is, one of the utterances described above such as “shooting” and “zoom up”. If it is a valid command, the command is executed in step S35 and the process returns to step S31. Otherwise, a warning is displayed on the image display LCD 102 in step S36 and the process returns to step S31.
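The control routine of steps S31 to S37 amounts to a loop that checks the mode, waits for voice, recognizes it, and then either executes a command or warns. The sketch below models each pass as a `(voice_mode, utterance)` event and returns a log instead of driving hardware; the event model, the `recognize` callable, and the log format are assumptions.

```python
def run_voice_control(events, recognize, valid_commands):
    """Process (voice_mode, utterance) events and return a log of what happened.

    recognize: callable standing in for recognition with the selected dictionary.
    valid_commands: set of command words the camera accepts.
    """
    log = []
    for voice_mode, utterance in events:
        if not voice_mode:                  # S31 "no": go to S37
            log.append(("normal", None))    # S37: normal (non-voice) operation
            continue
        if utterance is None:               # S32: no voice input yet, keep waiting
            continue
        word = recognize(utterance)         # S33: recognize with the voice dictionary
        if word in valid_commands:          # S34: is it a valid command?
            log.append(("execute", word))   # S35: execute, then back to S31
        else:
            log.append(("warn", word))      # S36: show a warning, then back to S31
    return log
```

Returning a log makes the branch taken at each step easy to inspect, which is convenient when checking the flow against the flowchart.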
[0064]
In the present embodiment, the voice dictionary creation unit 155 has been described as associating each voice dictionary with a symbol image and recording them in the memory card 200. However, the voice dictionary creation unit according to the present invention need only associate each voice dictionary with a symbol image.
[0065]
Likewise, in the present embodiment the voice control unit 158 has been described as performing recognition using the voice dictionary stored in the voice dictionary memory 156. However, the voice control unit according to the present invention need only recognize voice using the voice dictionary selected by the voice dictionary selection unit.
[0066]
[Effect of the invention]
As described above, according to the camera of the present invention, it is possible to select a voice dictionary appropriate for the voice of the speaker with a simple operation.
[Brief description of the drawings]
FIG. 1 is an external view of a camera according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a circuit configuration of the camera shown in FIG. 1.
FIG. 3 is a diagram showing image files and voice dictionaries recorded on a memory card.
FIG. 4 is a diagram showing a list of symbol images made up of the image files shown in FIG. 3.
FIG. 5 is a flowchart of the voice dictionary creation routine of the camera of the present embodiment.
FIG. 6 is a flowchart of the voice dictionary selection routine of the camera of the present embodiment.
FIG. 7 is a flowchart of a routine of the camera of the present embodiment that recognizes voice using the voice dictionary stored in the voice dictionary memory and controls the camera.
[Explanation of symbols]
100 Camera
101 Photographing zoom lens
102 Image display LCD
103 Operation display LCD
104 Release button
105 Flash light emitting device
105a Flash light emitting tube
106 Video output terminal
107 USB terminal
108 DC voltage input terminal
120 Operation unit
121 Power switch
122 Shooting / playback switching lever
123 Shooting mode dial
124 Cross switch
125 Flash switch
126a Execution switch
126b Cancel switch
131 Diaphragm
132 CCD sensor
133 White balance / γ processing unit
134, 154 A / D unit
135 Buffer memory
136 CG unit
137 Photometry / ranging CPU
138 Charge / light emission control unit
139 Communication control unit
140 YC processing unit
141 Power source
142 Bus line
143 Compression / decompression & ID extraction unit
144 I / F unit
145 Main CPU
146 EEPROM
147 YC / RGB conversion unit
148 Driver
151 Microphone
152 Speaker
153 Filter
155 Voice dictionary creation unit
156 Voice dictionary memory
157 Voice dictionary selection unit
158 Voice control unit
160 Symbol image list
161, 162, 163, 164 Symbol image
200 Memory card

Claims

A camera for photographing a subject, comprising:
a microphone that picks up voice;
a voice dictionary creation unit that determines whether or not a predetermined word has been input and, when it determines that the predetermined word has been input, extracts the vocal features of each speaker from the voices of a plurality of speakers input from the microphone, creates a voice dictionary for each of the plurality of speakers, and associates each voice dictionary with a symbol image for recognizing the corresponding speaker;
an image display unit that displays an image;
a voice dictionary selection unit that displays a list of the symbol images on the image display unit and, when one of the symbol images in the list is selected by an operation, selects the voice dictionary corresponding to the selected symbol image; and
a voice control unit that recognizes voice input from the microphone using the voice dictionary selected by the voice dictionary selection unit and performs control according to the recognized voice.