JP2004363888A - Digital camera and image editing device using the same

Info

Publication number: JP2004363888A
Application number: JP2003159350A
Authority: JP
Inventors: Seiji Nagao; 征司長尾
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-06-04
Filing date: 2003-06-04
Publication date: 2004-12-24

Abstract

PROBLEM TO BE SOLVED: To provide a digital camera with which additional information attached to image data can be input quickly, easily, and accurately each time a picture is taken, and with which image data can be edited easily.

SOLUTION: The digital camera is equipped with a recording part 10 which encodes a subject into image data and records the image data, a data input part 9 which adds the contents of items to the image data as additional information item by item, a recording part 10 which processes input audio and records audio data, a plurality of conversion dictionary files which are classified and prepared by item and convert audio data corresponding to the contents into text data, and a speech recognition processing part 8 which converts the audio data into text data. A conversion dictionary file is selected for each item, and the audio data are converted into text data, which are added to the image data as additional information.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、デジタルカメラ及びこれを用いた画像編処理システムの改良に関する。
【０００２】
【従来の技術】
近年、アナログカメラに替わり、デジタルカメラが普及してきている。デジタルカメラには、以下に説明するような利点がある。
【０００３】
例えば、撮影した映像をデジタルカメラのＬＣＤ等の表示装置でその場で見ることができる。撮影に失敗した映像の消去を行うことができる。撮影した映像をパーソナルコンピュータに採り込んで、画像の加工、編集をデジタル処理で容易に行うことができる。パーソナルコンピュータを用いて印刷したり、デジタルファイルとして画像を送信したりすることができる。パーソナルコンピュータに撮影画像を転送することにより劣化しないデジタル画像を保存できる。また、動画や音声のようなマルチメディアの画像を扱うことのできるデジタルカメラもある。
【０００４】
このような理由で、パーソナルコンピュータの普及に伴って、デジタルカメラを利用するケースが多い。近時は、撮像素子ＣＣＤの技術の進歩に伴って、デジタルカメラはその画素数が増大し、２００万画素〜３００万画素以上の高画素のものが発売され、コンシューマー市場ばかりではなく、ビジネス市場にも幅広く、デジタルカメラが使用されるようになっている。業務用として使用するためには、撮影後の後工程を考慮して、撮影時点で様々な付加情報を入力することができるようにすることが不可欠である。
【０００５】
この業務用として使用する場合、従来、アナログカメラでは、撮影時に被写体の近くに黒板を設け、黒板に項目名とその内容を記載し、被写体と共に黒板を撮影して、被写体を分類するという作業を行っているが、この被写体の分類作業は面倒である。
【０００６】
これに対して、デジタルカメラでは、従来から、そのセットアップモードで日時の設定、撮影者の氏名等を設定して、撮影した写真画像に日付入れ、撮影者名を入れることができるものがあり、これに、加えて、撮影画像に複数項目の内容を付加情報として添付して入力することができるようにしたものもある（例えば、特許文献１参照。）
この場合に、業務用途で使用するデジタルカメラでは、各業務で使用する付加情報の項目及びその内容は一義的に業務の内容により決まっている。例えば、自動車保険の関係では、その項目は、クレーム番号、車体番号、症状コード等である。
【０００７】
また、例えば、土木関連では、図１に示すように、工事現場の撮影画像Ｇ１に、項目として、図２に示すような付加情報Ａ１として「現場名称」、「工程名称１」、「工程名称２」、「工事担当会社」、「撮影者の氏名」を設け、各項目の内容を入力することができるようにしたデジタルカメラもある。その図２では、各項目の内容の欄に、各項目に対応させて「第３橋梁現場」、「橋梁強化」、「溶接工程」、「ＸＸ土木（株）」、「△△」のデータが入力されている。
【０００８】
このように、デジタルカメラの普及に伴って、撮影された画像データに付加情報を付加してパーソナルコンピュータに転送し、付加情報Ａ１毎に画像を分類、整理するという画像編集作業、画像管理作業が非常に容易になりつつある。
【０００９】
【特許文献１】
特許第３０９２１４２号公報
【００１０】
【発明が解決しようとする課題】
ところで、小型化を求められているデジタルカメラでは、撮影時に付加情報を添付するのは非常に困難である。
【００１１】
例えば、撮影時にデジタルカメラの小さなキーを手で操作し、小さなＬＣＤ等の液晶画面を見ながら、付加データを入力することを強要するのは、困難である。
【００１２】
また、画像データに音声データを付加情報として関連づけて音声認識することも特許文献１に記載され、音声認識には各種の手法が提案されているが、不特定話者、語彙数が増大すると認識性能は劣化する。
【００１３】
すなわち、従来のデジタルカメラでは、画像データに添付される付加情報を迅速かつ手軽にしかも正確に入力し難いという不都合が残っている。
【００１４】
本発明は、上記の事情に鑑みて為されたもので、その目的とするところは、画像データに添付される付加情報を撮影毎に迅速かつ手軽にしかも正確に入力することができ、画像データの編集処理が容易なデジタルカメラ及びこれを用いた画像編集処理システムを提供することを目的とする。
【００１５】
【課題を解決するための手段】
請求項１に記載のデジタルカメラは、被写体を画像データとして符号化処理して該画像データを記録する記録部と、前記画像データに項目毎にその内容を付加情報として付加するデータ入力部と、入力された音声を処理して音声データを記録する記録部と、前記項目毎に分類して準備されかつその内容に対応する音声データをテキストデータに変換する複数個の変換辞書ファイルと、前記音声データをテキストデータに変換する音声認識処理部とを備え、項目毎に変換辞書ファイルを選択して前記音声データを前記テキストデータに変換して前記画像データに付加情報として添付することを特徴とする。
【００１６】
請求項２に記載のデジタルカメラは、前記音声データから前記テキストデータへの変換を撮影実行後に実行することを特徴とする。
【００１７】
請求項３に記載のデジタルカメラは、項目番号の入力と音声データの入力とにより自動的に音声データからテキストデータへの変換が実行されることを特徴とする。
【００１８】
請求項４に記載のデジタルカメラは、前記音声データを前記テキストデータに変換後に音声データを保存するかしないかをユーザーが選択する選択手段を有することを特徴とする。
【００１９】
請求項５に記載のデジタルカメラは、業務毎に分類された変換辞書ファイルを有することを特徴とする。
【００２０】
請求項６に記載の画像編集処理システムは、請求項１ないし請求項４のいずれか１項に記載のデジタルカメラに、前記変換辞書ファイルの音声データとテキストデータとをユーザーが入力することを特徴とする。
【００２１】
請求項７に記載の画像編集処理システムは、業務毎に分類された変換辞書ファイルを有することを特徴とする。
【００２２】
【発明の実施の形態】
図３は本発明に係わるデジタルカメラのブロック回路図を示し、この図において、１はレンズ、２はメカニカルシャッター、３はＣＣＤである。レンズ１、メカニカルシャッター２はドライバー部４によって駆動制御される。そのＣＣＤ３はＣＣＤ駆動回路部５によって駆動される。
【００２３】
ＣＣＤ３の映像出力信号は相関二重サンプリングを実行するＣＤＳ回路とアナログデジタル変換を実行するＡ／Ｄコンバータ部とを有する回路部６に入力され、アナログ・デジタル変換されて画像処理プロセッサ７に入力される。
【００２４】
画像処理プロセッサ７は、そのデジタル信号を輝度データＹ、色差データＵ、Ｖデータに変換したり、そのＹＵＶデータをＪＰＥＧ圧縮したり、画像サイズを変更したり等の各種の符号化処理を実行する機能を有する。
【００２５】
その画像処理プロセッサ７、ドライバー部４、ＣＣＤ駆動回路部５はＣＰＵ８に接続され、このＣＰＵ８はこれらの回路を制御するのに用いられる他、このデジタルカメラを統括制御するのに用いられる。このＣＰＵ８は音声認識処理部を備えている。
【００２６】
そのＣＰＵ８には操作入力部９が接続されると共に、メモリ部１０、通信部１１、メモリカードインターフェース部１２、外部センサー部１３が接続されている。外部センサー１３は被写体までの距離を測距し、その測距情報はＣＰＵ８に入力され、ＣＰＵ８はその測距情報に基づいてレンズ１の位置を制御する。
【００２７】
メモリ部１０はこのデジタルカメラによって撮像された画像データを一時的に保存すると共に、後述するメモリカード等の画像ファイルからのリードデータを一時的に保存する役割を有すると共に、画像処理プロセッサ７、ＣＰＵ８のワークメモリ部としても使用される。ここでは、そのメモリ部１０には複数個の変換辞書ファイルが保存されている。
【００２８】
メモリカードインターフェース部１２はデジタルカメラに対して着脱可能な図示を略すメモリカード（外部ストレージカードともいう）が装着されて、メモリカードとＣＰＵ８との間で画像データの授受が行われる。通信部１１はパーソナルコンピュータ等の端末処理装置１４に接続可能であり、この通信部１１は変換辞書ファイルをパーソナルコンピュータ等の端末処理装置１４から採り込んだり、デジタルカメラにより撮像された画像データを端末処理装置１４に送信したりするのに用いられる。
【００２９】
画像処理プロセッサ７は表示部１５、音声ＣＯＤＥＣ１６に接続され、表示部１５は表示コントローラ部とＬＣＤ表示装置とからなり、表示コントローラ部は画像処理プロセッサ７からの映像信号をＬＣＤ表示装置が表示可能な信号に変換し、ＬＣＤ表示装置はその画像を表示し、撮影時にはモニタリングの画像を表示したり、再生画像を表示したり、複数の撮影画像を表示するサムネイル画像表示モードも有する。
【００３０】
音声ＣＯＤＥＣ１６は、アナログデジタル変換部として機能し、入力アナログ部１７、出力アナログ部１８に接続され、入力アナログ部１７は図示を略すマイクからの音声信号を音声ＣＯＤＥＣ１６に出力し、音声ＣＯＤＥＣ１６はそのアナログ信号を音声データに変換する。その音声データは画像処理プロセッサ７に画像データに付加すべき付加情報として入力され、ＣＰＵ８によりメモリ部１０に保存される。また、そのメモリ部１０に保存されている音声データは、音声ＣＯＤＥＣ１６によってアナログ信号に変換され、出力アナログ部１８によってアナログ信号に変換され、マイクを通じて音声に変換される。なお、符号１９はカメラの電源である。
【００３１】
操作入力部９は各種のスイッチ部からなり、スイッチ部は、例えば、レリーズキー、ズーム操作キー、電源キー等の各種のキーを有し、これらの各種のキーは、変換辞書ファイルを指定するため、項目の番号を選択するため、音声データを保存するか破棄するかを選択する選択手段にも用いられる。
【００３２】
その画像データに付加すべき付加情報には複数個の項目があり、その図４（ａ）は音声データと変換すべきテキストデータとの対応関係を示す変換辞書ファイルＢ１の一例であり、左側の欄が変換後の「文字」を示し、右側の欄が音声データを示している。ここでは、項目番号「３」の「工程名称２」の変換辞書ファイルが示され、例えば「いっぱんこうてい」と発音される音声データは、文字「一般工程」というテキストデータに変換されることを示している。同様に、図４（ｂ）は、項目番号「５」の「撮影者の氏名」の変換辞書ファイルＢ２の例を示し、左側の欄が変換後の「文字」を示し、右側の欄が音声データを示し、例えば、「すずき」と発音された音声データは文字「鈴木」というテキストデータに変換されることを示している。
【００３３】
また、静止画像Ｇ１のデータの基本構造Ｇ２は、図５に示す通りであり、この例では、静止画像データには、Ｅｘｉｆ圧縮ファイルを用い、付加情報のフォーマット（付加データ構造）Ｆ１にはアプリケーションマーカセグメント５（ＡＰＰ５）を使用し、その構造は、詳細には項目エリアＧ３と内容エリアＧ４とに分割され、項目エリアＧ３は項目番号「１」から項目番号「ｎ」に分割され、内容エリアＧ４はその項目番号「１」から項目番号「ｎ」に対応して分割されている。その項目エリアＧ３はテキストデータエリアとして使用され、その内容エリアＧ４は音声データエリアとテキストデータエリアとして使用される。
【００３４】
ここでは、撮影後に音声認識変換処理を行うものとして説明する。
【００３５】
撮影画像データに付加すべき付加情報としての項目には、例えば、土木関連工事では、図２に示したように、項目番号１の「現場名称」、項目番号２の「工程名称１」、項目番号３の「工程名称２」、項目番号４の「工事担当会社」、項目番号５の「撮影者の氏名」等があるが、このうち、現場名称は工事の期間中変更することはなく、工程名称１は概括名称であるので、頻繁に変更されることは少なく、工事担当会社も固定的であるので、これらについては、被写体を撮影する前にあらかじめ画像データに関連させて付加すべき付加情報として、キー等のデータ入力部を利用してその内容をテキストデータとして入力させておくものとする。これに対して、工程名称２は詳細な工程を示すもので、頻繁に変更される可能性があり、撮影者の氏名も撮影担当が変わるたびに変更されるものである。そこで、これらについては、撮影後に音声を利用して入力することにする。
【００３６】
まず、音声データを入力すべき項目の欄、例えば、項目３の「工程名称２」と項目５の「撮影者の氏名」とに音声データを入力するときには、キー操作により「工程名称２」に対応する項目番号「３」を入力して、例えば、「ようせつこうてい」と発音する。すると、ＣＰＵ８は付加情報として、「項目３」の「工程名称２」に対応する「内容」の欄に「ようせつこうてい」という音声に対応する「音声データ」を撮影画像データに関連づけてメモリ１０に保存する。
【００３７】
次に、キー操作により「撮影者の氏名」に対応する項目番号「５」を入力して、例えば、「すずき」と発音すると、ＣＰＵ８は付加情報として、「項目５」の「撮影者の氏名」に対応する「内容」の欄に「すずき」という音声に対応する「音声データ」を撮影画像データに関連づけてメモリ１０に保存する。
【００３８】
図６（ａ）はその静止画像データＧ１に関連づけられる付加情報のうち、項目番号「３」に「ようせつこうてい」という音声データが撮影画像Ｇ１に関連づけられて保存されていると共に項目番号「５」に「すずき」という音声データが撮影画像Ｇ１に関連づけられて保存されている状態を示している。
【００３９】
ついで、音声変換処理開始のキーを操作すると、図７に示すように、メモリ部１０に保存されている撮影ファイルから、項目の番号とその番号に対応する音声データとが、順次ＣＰＵ８の音声認識処理システムに取り込まれる（Ｓ．１）。次に、音声認識処理システムは、項目の番号に対応する変換辞書ファイルＢ１を選択してロードする（Ｓ．２）。ついで、その変換辞書ファイルＢ１の中から音声データに対応する文字を探索して音声認識モジュールを用いて音声データをテキストデータに変換する（Ｓ．３）。
【００４０】
ついで、ユーザーは音声データを保存したままとするか、破棄してからテキストデータを保存するかを選択する（Ｓ．４）。音声データを保存したまま保存するを選択した場合には、音声データと共にテキストデータとが撮影ファイルに関連する付加情報として保存される（Ｓ．５）。例えば、項目番号「３」の音声変換処理の場合には、「ようせつこうてい」という音声データと共に「溶接工程」という文字がテキストデータとして項目番号「３」に対応する「工程名称２」の内容の欄に保存される。また、音声データを破棄して保存するを選択した場合には、テキストデータのみが撮影ファイルに関連する付加情報として保存される（Ｓ．６）。従って、ここでは、この場合には、「工程名称２」の内容の欄には「溶接工程」というテキストデータのみが保存されることになる。
【００４１】
音声認識処理システムは、他の項目に音声データがあるかないかを自動的に判断する（Ｓ．７）。他の項目に音声データがある場合には、ステップＳ．１に戻って、ステップＳ．１からＳ．７までの処理を繰り返す。ここでは、項目番号「５」に対応する内容の欄に音声データが関連づけられて保存されているので、Ｓ．１からＳ．７までの処理が再度実行され、「鈴木」という文字がテキストデータとして項目番号「５」に対応する「撮影者の氏名」の内容の欄に保存される。
【００４２】
すなわち、図６（ｂ）に示すように、その静止画像Ｇ１のデータに関連づけられる付加情報の音声データが変換されて、「溶接工程」という文字、「鈴木」という文字がテキストデータとして静止画像Ｇ１のデータに関連づけられて保存される。
【００４３】
また、図８に示すように、被写体の撮影後に、例えば、項目番号「３」をキーを操作して入力することにより、項目番号「３」に対応する変換辞書ファイルＢ１をＣＰＵ８にロードし、ついで、マイクを通して「ようせつこうてい」と発音することにより、その「ようせつこうてい」という音声データに対応する「溶接工程」というテキストデータを付加情報としてメモリ部１０に記憶させ、ついで、項目番号「５」をキー操作入力することにより、項目番号「５」に対応する変換辞書ファイルＢ２をＣＰＵ８にロードし、ついで、マイクを通して「すずき」と発音することにより「すずき」という音声データに対応する「鈴木」というテキストデータを静止画像Ｇ１のデータに関連づけて付加情報としてメモリ部１０に記憶させるようにしても良い。
【００４４】
更に、撮影前に画像データに関連すべき付加情報を入力する構成とすることもできる。
【００４５】
例えば、撮影前に、ユーザーは項目番号を選択して、マイクを通じて発音すると（Ｓ．１０）、その項目番号に対応する変換辞書ファイルが音声認識処理システムにロードされる（Ｓ．１１）。ＣＰＵ８はそのマイクを通じて入力された音声データをテキストデータに変換し（Ｓ．１２）、ついで、音声データを破棄するか否かを判断する（Ｓ．１３）。音声データを保存するを選択したときには、音声データと共にテキストデータを撮影前ファイルに付加情報として関連づける（Ｓ．１４）。音声データを破棄するを選択したときには、音声データを破棄してテキストデータのみが撮影前ファイルに付加情報として関連づけられる（Ｓ．１５）。その後、撮影を実行すると、その撮影前ファイルに撮影画像が採り込まれ、変換辞書ファイルが自動的にロードされて、音声データがテキストデータに自動的に変換され、その撮影済み画像ファイルに付加情報がテキストデータとして関連づけられてメモリ部１０に保存される。
【００４６】
ここでは、項目毎に分類準備された変換辞書ファイルを、ユーザがデジタルカメラのキーを操作することにより作成することにしたが、デジタルカメラとパーソナルコンピュータを接続して画像編集処理システムを構築し、そのパーソナルコンピュータ等の画像編集処理装置（エディター）を用いて項目毎の変換辞書ファイルを作成し、パーソナルコンピュータからデジタルカメラにその変換辞書ファイルを通信手段（ＵＳＢケーブル等の有線通信手段、ブルーツース等の無線通信手段）を用いて転送するようにしても良い。また、記録メディアを用いてデジタルカメラに転送しても良い。
【００４７】
また、ここでは、項目毎に分類して準備された変換辞書ファイルを作成することにしたが、業務毎にかつ項目毎に変換辞書ファイルを作成するようにしても良い。
【００４８】
【発明の効果】
請求項１〜３に記載の発明によれば、項目毎に分類して変換辞書を作成することにしたので、音声認識の処理劣化を回避しつつ付加情報の入力作業の容易化を図ることができる。
【００４９】
請求項４に記載の発明によれば、メモリ資源の活用を図ることができる。
【００５０】
請求項５に記載の発明によれば、各種の業務の付加情報の入力作業の容易化を図ることができる。
【００５１】
請求項６、請求項７に記載の発明によれば、変換辞書ファイルの作成作業が容易である。
【図面の簡単な説明】
【図１】静止画像の一例を示す図である。
【図２】図１に示す静止画像に付加される付加情報の一例を示す図である。
【図３】本発明に係わるデジタルカメラの一例を示すブロック回路図である。
【図４】項目毎の変換辞書ファイルの例を示し、（ａ）は工程名称２の変換辞書ファイルを示し、（ｂ）は撮影者の氏名の変換辞書ファイルを示す。
【図５】本発明に係わる静止画像のデータ構造の一例を示す説明図である。
【図６】静止画像に関連づけられる付加情報の一例を示す説明図であって、（ａ）は音声データが記録された付加情報の説明図であり、（ｂ）は変換辞書ファイルを用いて（ａ）に示す音声データがテキストデータに変換された付加情報の説明図である。
【図７】撮影ファイルに関連づけられた音声データの音声認識処理のフローチャート図である。
【図８】音声認識システムの概要を示す説明図である。
【図９】撮影ファイルに関連づけられる音声データの音声認識処理のフローチャート図である。
【符号の説明】
８…ＣＰＵ（音声認識処理部）
９…操作入力部（データ入力部）
１０…メモリ部（記録部）
[0001]
[Technical field of the invention]
The present invention relates to an improvement in a digital camera and an image editing processing system using the same.
[0002]
[Prior art]
In recent years, digital cameras have become widespread in place of analog cameras. Digital cameras have the advantages described below.
[0003]
For example, a captured image can be viewed on the spot on a display device such as an LCD of a digital camera. It is possible to delete an image for which shooting failed. The captured video is taken into a personal computer, and the image can be easily processed and edited by digital processing. Printing can be performed using a personal computer, and images can be transmitted as digital files. By transferring the captured image to a personal computer, a digital image that does not deteriorate can be stored. Some digital cameras can handle multimedia images such as moving images and audio.
[0004]
For these reasons, digital cameras are increasingly used as personal computers spread. Recently, with advances in CCD image sensor technology, the pixel count of digital cameras has increased, and high-resolution models with 2 to 3 million pixels or more have been released. Digital cameras are now widely used not only in the consumer market but also in the business market. For business use, it is essential that various kinds of additional information can be input at the time of shooting, in consideration of the post-processing that follows shooting.
[0005]
For such business use, the conventional practice with an analog camera has been to place a blackboard near the subject at the time of shooting, write the item names and their contents on the blackboard, and photograph the blackboard together with the subject in order to classify the subject. This classification work is troublesome.
[0006]
In contrast, some digital cameras have long allowed the date and time, the photographer's name, and the like to be set in a setup mode so that the date and the photographer's name can be stamped on the photographed image. In addition, some allow the contents of a plurality of items to be attached to a captured image as additional information (see, for example, Patent Document 1).
In this case, in a digital camera used for business purposes, the items and contents of the additional information used in each business are uniquely determined by the contents of the business. For example, in the case of automobile insurance, the items are a claim number, a vehicle body number, a symptom code, and the like.
[0007]
For example, in the civil engineering field, there is also a digital camera that provides, for a photographed image G1 of a construction site as shown in FIG. 1, the items “site name”, “process name 1”, “process name 2”, “company in charge of construction”, and “name of photographer” as additional information A1 as shown in FIG. 2, so that the contents of each item can be input. In FIG. 2, the data “third bridge site”, “bridge strengthening”, “welding process”, “XX Civil Engineering Co., Ltd.”, and “△△” are entered in the content column for the respective items.
[0008]
As described above, with the spread of digital cameras, image editing and image management work, in which additional information is added to photographed image data, the data are transferred to a personal computer, and the images are classified and organized by additional information A1, is becoming very easy.
[0009]
[Patent Document 1]
Japanese Patent No. 3092142
[0010]
[Problems to be solved by the invention]
However, with digital cameras, which are required to be compact, it is very difficult to attach additional information at the time of shooting.
[0011]
For example, it is unreasonable to require the user to enter additional data at the time of shooting by operating the small keys of a digital camera by hand while looking at a small screen such as an LCD.
[0012]
Patent Document 1 also describes associating audio data with image data as additional information and performing speech recognition on it. Various speech recognition methods have been proposed, but recognition performance degrades for unspecified speakers and as the vocabulary grows.
[0013]
That is, the conventional digital camera has a disadvantage that it is difficult to input additional information attached to image data quickly, easily, and accurately.
[0014]
The present invention has been made in view of the above circumstances, and its object is to provide a digital camera with which additional information attached to image data can be input quickly, easily, and accurately for each shot and with which image data can be edited easily, and an image editing processing system using the same.
[0015]
[Means for Solving the Problems]
The digital camera according to claim 1 comprises: a recording unit that encodes a subject as image data and records the image data; a data input unit that adds, to the image data, the content of each item as additional information; a recording unit that processes input voice and records voice data; a plurality of conversion dictionary files that are classified and prepared for each item and convert voice data corresponding to the content into text data; and a voice recognition processing unit that converts the voice data into text data. A conversion dictionary file is selected for each item, and the voice data are converted into the text data and attached to the image data as additional information.
[0016]
A digital camera according to a second aspect is characterized in that the conversion from the voice data to the text data is performed after shooting is performed.
[0017]
A digital camera according to a third aspect is characterized in that conversion of voice data to text data is automatically executed by inputting an item number and inputting voice data.
[0018]
A digital camera according to a fourth aspect is characterized by having a selection means with which the user selects whether or not to keep the voice data after the voice data have been converted into the text data.
[0019]
A digital camera according to a fifth aspect is characterized in that it has conversion dictionary files classified for each type of business.
[0020]
An image editing processing system according to a sixth aspect is characterized in that a user inputs the voice data and text data of the conversion dictionary file to the digital camera according to any one of the first to fourth aspects.
[0021]
An image editing processing system according to a seventh aspect is characterized by having conversion dictionary files classified for each type of business.
[0022]
[Embodiments of the invention]
FIG. 3 is a block circuit diagram of a digital camera according to the present invention. In this figure, reference numeral 1 denotes a lens, 2 denotes a mechanical shutter, and 3 denotes a CCD. The drive of the lens 1 and the mechanical shutter 2 is controlled by a driver unit 4. The CCD 3 is driven by a CCD drive circuit 5.
[0023]
The video output signal of the CCD 3 is input to a circuit section 6 having a CDS circuit that performs correlated double sampling and an A/D converter that performs analog-to-digital conversion; the signal is converted from analog to digital and input to an image processor 7.
[0024]
The image processor 7 has the function of performing various encoding processes, such as converting the digital signal into luminance data Y and color difference data U and V, JPEG-compressing the YUV data, and changing the image size.
[0025]
The image processor 7, the driver unit 4, and the CCD drive circuit unit 5 are connected to a CPU 8, which is used for controlling these circuits and for controlling the digital camera as a whole. The CPU 8 has a voice recognition processing unit.
[0026]
An operation input unit 9 is connected to the CPU 8, and a memory unit 10, a communication unit 11, a memory card interface unit 12, and an external sensor unit 13 are connected to the CPU 8. The external sensor 13 measures the distance to the subject, and the distance measurement information is input to the CPU 8, and the CPU 8 controls the position of the lens 1 based on the distance measurement information.
[0027]
The memory unit 10 temporarily stores image data captured by the digital camera, temporarily stores data read from image files on a memory card or the like (described later), and is also used as a work memory for the image processor 7 and the CPU 8. Here, the memory unit 10 stores a plurality of conversion dictionary files.
[0028]
A memory card (not shown; also referred to as an external storage card) that is detachable from the digital camera is inserted into the memory card interface unit 12, and image data are exchanged between the memory card and the CPU 8. The communication unit 11 can be connected to a terminal processing device 14 such as a personal computer, and is used to import conversion dictionary files from the terminal processing device 14 and to transmit image data captured by the digital camera to the terminal processing device 14.
[0029]
The image processor 7 is connected to a display unit 15 and an audio CODEC 16. The display unit 15 consists of a display controller unit and an LCD display device. The display controller unit converts the video signal from the image processor 7 into a signal that the LCD display device can display, and the LCD display device displays the image. The LCD display device displays a monitoring image at the time of shooting, displays reproduced images, and also has a thumbnail display mode that shows a plurality of captured images.
[0030]
The audio CODEC 16 functions as an analog-to-digital converter and is connected to an input analog unit 17 and an output analog unit 18. The input analog unit 17 outputs an audio signal from a microphone (not shown) to the audio CODEC 16, and the audio CODEC 16 converts that analog signal into audio data. The audio data are input to the image processor 7 as additional information to be added to the image data, and are stored in the memory unit 10 by the CPU 8. The audio data stored in the memory unit 10 are converted into an analog signal by the audio CODEC 16, output as an analog signal by the output analog unit 18, and converted into sound through the microphone. Reference numeral 19 denotes the power supply of the camera.
[0031]
The operation input unit 9 consists of various switches, which include keys such as a release key, a zoom operation key, and a power key. These keys are also used to specify a conversion dictionary file, to select an item number, and as the selection means for choosing whether to save or discard the audio data.
[0032]
The additional information to be added to the image data includes a plurality of items. FIG. 4A shows an example of a conversion dictionary file B1, which defines the correspondence between audio data and the text data into which they are converted. The left column shows the “characters” after conversion, and the right column shows the audio data. Here, the conversion dictionary file for “process name 2” of item number “3” is shown; for example, audio data pronounced “ippan koutei” are converted into the text data “一般工程” (general process). Similarly, FIG. 4B shows an example of a conversion dictionary file B2 for “name of photographer” of item number “5”. The left column shows the “characters” after conversion and the right column shows the audio data; for example, audio data pronounced “suzuki” are converted into the text data “鈴木” (Suzuki).
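The per-item dictionary idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the romanized readings stand in for recognized audio data, and the names `CONVERSION_DICTIONARIES` and `convert` are hypothetical, not part of the described camera.

```python
# Hypothetical in-memory form of the per-item conversion dictionary files.
# Keys are item numbers; each dictionary maps a recognized reading to text.
# Romanized readings stand in for audio data (an assumption for illustration).
CONVERSION_DICTIONARIES = {
    3: {  # "process name 2" (conversion dictionary file B1)
        "ippan koutei": "一般工程",     # general process
        "yousetsu koutei": "溶接工程",  # welding process
    },
    5: {  # "name of photographer" (conversion dictionary file B2)
        "suzuki": "鈴木",
    },
}


def convert(item_number, reading):
    """Look up a reading in the dictionary selected by item number.

    Returns None when the item has no dictionary or the reading is unknown.
    """
    dictionary = CONVERSION_DICTIONARIES.get(item_number, {})
    return dictionary.get(reading)
```

Because the item number selects the dictionary first, each lookup only has to distinguish the few entries that are valid for that item.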
[0033]
The basic structure G2 of the data of the still image G1 is as shown in FIG. 5. In this example, an Exif compressed file is used for the still image data, and application marker segment 5 (APP5) is used for the additional information format (additional data structure) F1. In detail, the structure is divided into an item area G3 and a content area G4. The item area G3 is divided into item numbers “1” to “n”, and the content area G4 is divided correspondingly into item numbers “1” to “n”. The item area G3 is used as a text data area, and the content area G4 is used as an audio data area and a text data area.
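As a rough sketch of how such item and content areas could be carried in an APP5 segment, the Python below packs a list of items into one segment. The marker bytes (0xFFE5) and the two-byte big-endian length field that counts itself follow the general JPEG marker segment syntax; the payload layout inside the segment, the `Item` class, and `pack_app5` are assumptions made for illustration and are not taken from FIG. 5.

```python
import struct
from dataclasses import dataclass
from typing import Optional

APP5_MARKER = b"\xff\xe5"  # JPEG application marker segment 5


@dataclass
class Item:
    number: int                    # item number "1".."n" (must fit in one byte here)
    name: str                      # item area G3: text data
    text: Optional[str] = None     # content area G4: text data
    audio: Optional[bytes] = None  # content area G4: audio data


def pack_app5(items):
    """Serialize items into one APP5 segment (payload layout is hypothetical)."""
    payload = b""
    for item in items:
        name = item.name.encode("utf-8")
        text = (item.text or "").encode("utf-8")
        audio = item.audio or b""
        # Item number (1 byte) followed by the lengths of the three fields.
        payload += struct.pack(">BHHH", item.number, len(name), len(text), len(audio))
        payload += name + text + audio
    if len(payload) > 0xFFFF - 2:
        raise ValueError("payload too large for a single marker segment")
    # The segment length field counts its own 2 bytes plus the payload.
    return APP5_MARKER + struct.pack(">H", len(payload) + 2) + payload
```

A single marker segment holds at most 65,533 payload bytes, so real audio data would likely need to be short or split across segments; that handling is omitted here.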
[0034]
Here, a description will be given assuming that voice recognition conversion processing is performed after shooting.
[0035]
The items of additional information to be added to the photographed image data include, for example in civil engineering work, “site name” (item number 1), “process name 1” (item number 2), “process name 2” (item number 3), “company in charge of construction” (item number 4), and “name of photographer” (item number 5), as shown in FIG. 2. Of these, the site name does not change during the construction period, process name 1 is a general name and seldom changes, and the company in charge of construction is also fixed. For these items, the contents are therefore entered as text data in advance, before the subject is photographed, using a data input unit such as keys, as additional information to be associated with the image data. In contrast, process name 2 indicates a detailed process and may change frequently, and the name of the photographer changes whenever the person in charge of shooting changes. These items are therefore input by voice after shooting.
[0036]
First, to input audio data into the fields of the items that take audio data, for example “process name 2” (item 3) and “name of photographer” (item 5), the user enters the item number “3” corresponding to “process name 2” by key operation and says, for example, “yousetsu koutei”. The CPU 8 then stores, in the memory 10, the “audio data” corresponding to the utterance “yousetsu koutei” in the “content” field for “process name 2” of “item 3”, as additional information associated with the captured image data.
[0037]
Next, when the user enters the item number “5” corresponding to “name of photographer” by key operation and says, for example, “suzuki”, the CPU 8 stores, in the memory 10, the “audio data” corresponding to the utterance “suzuki” in the “content” field for “name of photographer” of “item 5”, as additional information associated with the captured image data.
[0038]
FIG. 6A shows a state in which, of the additional information associated with the still image data G1, the audio data “yousetsu koutei” are stored under item number “3” and the audio data “suzuki” are stored under item number “5”, each in association with the captured image G1.
[0039]
Then, when the key for starting voice conversion processing is operated, the item numbers and the audio data corresponding to those numbers are sequentially read from the shooting file stored in the memory unit 10 into the voice recognition processing system of the CPU 8, as shown in FIG. 7 (S.1). Next, the voice recognition processing system selects and loads the conversion dictionary file B1 corresponding to the item number (S.2). It then searches the conversion dictionary file B1 for the characters corresponding to the audio data and converts the audio data into text data using the voice recognition module (S.3).
[0040]
Next, the user selects whether to keep the audio data or to discard them and save only the text data (S.4). If the user chooses to keep the audio data, the text data are saved together with the audio data as additional information associated with the shooting file (S.5). For example, in the voice conversion processing for item number “3”, the characters “溶接工程” (welding process) are saved as text data, together with the audio data “yousetsu koutei”, in the content field of “process name 2” corresponding to item number “3”. If the user chooses to discard the audio data, only the text data are saved as additional information associated with the shooting file (S.6). In that case, only the text data “溶接工程” (welding process) are saved in the content field of “process name 2”.
[0041]
The voice recognition processing system then automatically determines whether any other item has audio data (S.7). If another item has audio data, the process returns to step S.1, and steps S.1 through S.7 are repeated. Here, since audio data are stored in the content field corresponding to item number “5”, steps S.1 through S.7 are executed again, and the characters “鈴木” (Suzuki) are saved as text data in the content field of “name of photographer” corresponding to item number “5”.
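The loop of steps S.1 to S.7 can be sketched roughly as follows. This is a minimal illustration, not the camera's firmware: the dictionary-of-dictionaries data layout, the `recognize` callable (standing in for the voice recognition module), and the `keep_audio` callable (standing in for the user's key selection in S.4) are all assumptions.

```python
def convert_recorded_items(items, dictionaries, recognize, keep_audio):
    """Post-shot conversion loop following steps S.1 to S.7.

    items:        dict of item number -> {"audio": bytes or None, "text": str or None}
    dictionaries: dict of item number -> {reading: text}
    recognize:    hypothetical callable(audio, vocabulary) -> reading
    keep_audio:   hypothetical callable(item_number) -> bool (user's choice in S.4)
    """
    for number in sorted(items):                        # S.7: move on to the next item
        entry = items[number]
        audio = entry.get("audio")                      # S.1: read item number and audio
        if audio is None:
            continue                                    # items entered by key have no audio
        dictionary = dictionaries[number]               # S.2: select dictionary by item number
        reading = recognize(audio, dictionary.keys())   # S.3: recognize within that vocabulary
        entry["text"] = dictionary[reading]             # S.3: reading -> text data
        if not keep_audio(number):                      # S.4: keep or discard the audio
            entry["audio"] = None                       # S.6: text only
        # Otherwise (S.5) both audio and text remain stored.
    return items
```

In this sketch a reading that is missing from the selected dictionary raises `KeyError`; how the camera handles an unrecognized utterance is not described in the text.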
[0042]
That is, as shown in FIG. 6B, the audio data of the additional information associated with the data of the still image G1 are converted, and the characters “溶接工程” (welding process) and “鈴木” (Suzuki) are saved as text data in association with the data of the still image G1.
[0043]
Alternatively, as shown in FIG. 8, after the subject is photographed, the user may, for example, enter the item number “3” by key operation so that the conversion dictionary file B1 corresponding to item number “3” is loaded into the CPU 8, and then say “yousetsu koutei” into the microphone so that the text data “溶接工程” (welding process) corresponding to the audio data “yousetsu koutei” are stored in the memory unit 10 as additional information. The user may then enter the item number “5” by key operation so that the conversion dictionary file B2 corresponding to item number “5” is loaded into the CPU 8, and say “suzuki” into the microphone so that the text data “鈴木” (Suzuki) corresponding to the audio data “suzuki” are stored in the memory unit 10 as additional information in association with the data of the still image G1.
[0044]
Further, a configuration may be employed in which additional information to be associated with image data is input before photographing.
[0045]
For example, before shooting, the user selects an item number and speaks into the microphone (S.10), and the conversion dictionary file corresponding to that item number is loaded into the voice recognition processing system (S.11). The CPU 8 converts the audio data input through the microphone into text data (S.12) and then determines whether the audio data are to be discarded (S.13). If the user has chosen to keep the audio data, the text data are associated with the pre-shooting file as additional information together with the audio data (S.14). If the user has chosen to discard the audio data, the audio data are discarded and only the text data are associated with the pre-shooting file as additional information (S.15). When shooting is then performed, the captured image is taken into the pre-shooting file, the conversion dictionary file is automatically loaded, the audio data are automatically converted into text data, and the additional information is associated with the captured image file as text data and stored in the memory unit 10.
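The pre-shot variant (S.10 to S.15) might be sketched as below. The pre-shooting file is modeled as a plain dictionary, and `annotate_before_shot`, `capture`, and the `recognize` callable are hypothetical names introduced only for this illustration.

```python
def annotate_before_shot(pre_shot, number, audio, dictionaries, recognize, keep_audio):
    """Steps S.10 to S.15: convert one utterance and attach it to the pre-shooting file."""
    dictionary = dictionaries[number]               # S.11: load the dictionary for the item
    reading = recognize(audio, dictionary.keys())   # S.12: recognition (hypothetical callable)
    entry = {"text": dictionary[reading]}           # S.12: reading -> text data
    if keep_audio:                                  # S.13: user's choice
        entry["audio"] = audio                      # S.14: keep audio together with text
    pre_shot[number] = entry                        # S.14 / S.15: associate with the file
    return pre_shot


def capture(pre_shot, image_bytes):
    """Combine the captured image with the additional information entered beforehand."""
    return {"image": image_bytes, "additional_info": dict(pre_shot)}
```

Here the text is already available when the shutter is released, so the captured file can be stored with its additional information in one step.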
[0046]
Here, the conversion dictionary files classified and prepared for each item are created by the user operating the keys of the digital camera. Alternatively, an image editing processing system may be constructed by connecting the digital camera to a personal computer; the conversion dictionary file for each item is then created using an image editing processing device (editor) such as the personal computer and transferred from the personal computer to the digital camera by communication means (wired communication means such as a USB cable, or wireless communication means such as Bluetooth). The file may also be transferred to the digital camera using a recording medium.
[0047]
Here, conversion dictionary files classified and prepared for each item are created; however, conversion dictionary files may instead be created for each type of business and for each item.
[0048]
[Effects of the invention]
According to the first to third aspects of the present invention, since the conversion dictionaries are created separately for each item, the input of additional information can be made easier while avoiding degradation of voice recognition processing.
[0049]
According to the fourth aspect of the invention, it is possible to utilize memory resources.
[0050]
According to the fifth aspect of the present invention, it is possible to facilitate the work of inputting additional information for various tasks.
[0051]
According to the sixth and seventh aspects of the invention, it is easy to create a conversion dictionary file.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an example of a still image.
FIG. 2 is a diagram showing an example of additional information added to the still image shown in FIG. 1.
FIG. 3 is a block circuit diagram showing an example of a digital camera according to the present invention.
FIG. 4 shows examples of the conversion dictionary file for each item, where FIG. 4A shows the conversion dictionary file for process name 2 and FIG. 4B shows the conversion dictionary file for the name of the photographer.
FIG. 5 is an explanatory diagram showing an example of a data structure of a still image according to the present invention.
FIG. 6 is an explanatory diagram illustrating an example of additional information associated with a still image, where FIG. 6A shows additional information in which audio data are recorded and FIG. 6B shows the additional information after the audio data shown in FIG. 6A have been converted into text data using the conversion dictionary files.
FIG. 7 is a flowchart of a voice recognition process of voice data associated with a shooting file.
FIG. 8 is an explanatory diagram showing an outline of a speech recognition system.
FIG. 9 is a flowchart of a voice recognition process of voice data associated with a shooting file.
[Explanation of symbols]
8: CPU (speech recognition processing unit)
9: Operation input unit (data input unit)
10: Memory unit (recording unit)

Claims

1. A digital camera comprising: a recording unit that encodes a subject as image data and records the image data; a data input unit that adds, to the image data, the content of each item as additional information; a recording unit that processes input voice and records voice data; a plurality of conversion dictionary files that are classified and prepared for each item and convert voice data corresponding to the content into text data; and a voice recognition processing unit that converts the voice data into text data, wherein a conversion dictionary file is selected for each item, the voice data are converted into the text data, and the text data are attached to the image data as additional information.

2. The digital camera according to claim 1, wherein the conversion from the audio data to the text data is performed after photographing is performed.

3. The digital camera according to claim 1, wherein conversion of voice data to text data is automatically performed by inputting an item number and inputting voice data.

4. The digital camera according to claim 3, further comprising a selection unit configured to allow a user to select whether to store the voice data after converting the voice data into the text data.

5. The digital camera according to claim 3, further comprising a conversion dictionary file classified for each type of business.

6. An image editing processing system, wherein a user inputs voice data and text data of the conversion dictionary file to the digital camera according to claim 1.

7. The image editing processing system according to claim 6, comprising a conversion dictionary file classified for each type of business.