JP2007049245A

JP2007049245A - Photography instrument with voice input function

Info

Publication number: JP2007049245A
Application number: JP2005228957A
Authority: JP
Inventors: Kazuhide Mogi; 一秀茂木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2005-08-08
Filing date: 2005-08-08
Publication date: 2007-02-22

Abstract

PROBLEM TO BE SOLVED: To provide a photographing apparatus with which a photographer can easily input information and easily check the content of a photographed image.

SOLUTION: The photographing apparatus with a voice input function comprises: an imaging means that generates bitmap data from an image signal obtained by photographing and holds the bitmap data in a memory; a means for starting voice input; a voice input means that converts voice into a voice signal; a voice recognition means that analyzes the voice signal to generate kana; a means that converts the generated kana character string into text; a means that creates a title containing the converted character string and adds it to the image data; and a means that generates image data identification information and generates image data by combining the title, the image data identification information, and the bitmap data held in the memory.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a photographing apparatus with a voice input function, and a photographing method, in which a name is assigned by voice input to image data captured by a photographing apparatus typified by a digital camera.

Conventionally, when image data obtained by photographing a subject with a digital camera is recorded on a storage medium, the image data is given a name consisting of a serial number made up of letters and digits, and additional information is commonly attached as well so that the content of the photographed image can be identified easily.

For example, Patent Document 1 discloses a technique for identifying an image by combining information input by voice at the time of capture with the captured image and displaying them together. (Prior art 1)
Patent Document 1: Japanese Patent Application Laid-Open No. 2003-348410 (pages 4 to 5, FIGS. 1 to 3 and 8)
Patent Document 2 discloses a technique in which, in addition to a mechanically assigned image data name, characters are entered from an input unit of the digital camera and attached to the captured image data as a title so that the image can be identified. (Prior art 2)
Patent Document 2: Japanese Patent Application Laid-Open No. H10-187627 (pages 4 to 5, FIGS. 1 to 2)

However, prior art 1 has the drawback that the image must be displayed in order to confirm the content of the character information input by voice (that is, in order to identify the image). Prior art 2 has the problem that shooting must be interrupted while characters are entered from the input unit.

The present invention has been made in view of the prior art described above, and an object of the present invention is to provide a photographing apparatus that allows a photographer to input information easily and to check the content of a photographed image easily.

The present invention solves the above problem by the following means. That is, the invention of claim 1 is a photographing apparatus with a voice input function, comprising: a storage means that stores a dictionary file recording tabular text data in which kana character strings are associated with text, and an image data file storing image data that includes a title and image data identification information; an imaging means that generates bitmap data from an image signal obtained by photographing and holds it in a memory; a voice input start means that starts voice input; a voice input means that converts voice into a voice signal; a voice recognition means that analyzes the voice signal to generate kana; a character string conversion means that, using the generated kana character string as a key, refers to the dictionary file, presents at least one text matching the key as a conversion candidate, and converts the kana character string into the selected text; an image data title creation means that creates a title containing the converted character string and adds it to the image data; and an image data generation means that generates image data identification information and generates image data by combining the title, the generated image data identification information, and the bitmap data held in the memory.

Accordingly, the content of the image data can be understood simply by looking at the title to which the voice-input information has been added.

The invention of claim 2 is a photographing apparatus with a voice input function, comprising: a storage means that stores a dictionary file recording tabular text data in which kana character strings are associated with text, and an image data file storing image data that includes image data identification information; an imaging means that generates bitmap data from an image signal obtained by photographing and holds it in a memory; a voice input start means that starts voice input; a voice input means that converts voice into a voice signal; a voice recognition means that analyzes the voice signal to generate kana; a character string conversion means that, using the generated kana character string as a key, refers to the dictionary file, presents at least one text matching the key as a conversion candidate, and converts the kana character string into the selected text; an image data generation means that generates image data identification information and generates image data by combining the generated image data identification information and the bitmap data held in the memory; and an image data identification information generation means that generates new image data identification information by adding the converted text to the image data identification information.

The invention of claim 3 is a voice input photographing method that uses a dictionary file recording tabular text data in which kana character strings are associated with text, and an image data file storing image data that includes a title and image data identification information, the method being performed in a procedure comprising: an imaging step of generating bitmap data from an image signal obtained by photographing and holding it in a memory; a voice input start step of starting voice input; a voice input step of converting voice into a voice signal; a voice recognition step of analyzing the voice signal to generate kana; a character string conversion step of, using the generated kana character string as a key, referring to the dictionary file, presenting at least one text matching the key as a conversion candidate, and converting the kana character string into the selected text; an image data title creation step of creating a title of the image data containing the converted character string; and an image data generation step of generating image data identification information and generating image data by combining the title, the generated image data identification information, and the bitmap data held in the memory.

The invention of claim 4 is a voice input photographing method that uses a dictionary file recording tabular text data in which kana character strings are associated with text, and an image data file storing image data that includes a title and image data identification information, the method being performed in a procedure comprising: an imaging step of generating bitmap data from an image signal obtained by photographing and holding it in a memory; an image data step of generating image data that includes the bitmap data held in the memory, a generated name, and a title; a voice input start step of starting voice input; a voice input step of converting voice into a voice signal; a voice recognition step of analyzing the voice signal to generate kana; a character string conversion step of converting the generated kana into a character string; an image data generation step of generating image data identification information and generating image data by combining the generated image data identification information and the bitmap data held in the memory; and an image data identification information changing step of adding the converted text to the image data identification information to change it into new image data identification information.

The invention of claim 5 is a computer program that, when incorporated into a computer, causes the computer to operate as the photographing apparatus with a voice input function according to claim 1 or claim 2.

The invention of claim 6 is a computer-readable recording medium on which the computer program according to claim 5 is recorded.

According to the present invention, after a subject is photographed, a character string obtained by voice input is used as the title of the image data or as part of the image data identification information (for example, the image data name), so that the image can be identified easily.
In addition, the efficiency of selecting and organizing a large number of images is improved.

Embodiments of the present invention will now be described in more detail with reference to the drawings. FIG. 3 is a detailed configuration diagram of the photographing apparatus with a voice input function. The photographing apparatus 100 with a voice input function comprises a voice input start unit 110, a voice input unit 120, a voice recognition unit 130, a character string conversion unit 140, an image data title creation unit 150, an image data identification information creation unit 160, an imaging unit 170, an image data generation unit 180, and a storage unit 190. The storage unit 190 stores a dictionary file 191 and an image data file 193.

The dictionary file 191 is tabular text data in which kana character strings are associated with text (for example, hiragana, katakana, or alphanumeric characters). The image data file 193 stores image data that includes a title and image data identification information (for example, an image data name).
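The patent describes dictionary file 191 only as tabular text data pairing kana strings with text. As a minimal sketch, assuming a hypothetical tab-separated layout with one "kana<TAB>text" row per entry, the lookup performed by the character string conversion unit 140 could collect every text whose kana column exactly matches the recognized string and offer those texts as conversion candidates. The file format, the sample entries, and the function names are illustrative assumptions, not part of the disclosure.

```python
import csv
import io

# Hypothetical content of dictionary file 191: one "kana<TAB>text" row per entry.
SAMPLE_DICTIONARY = "ふじさん\t富士山\nふじさん\tフジサン\nさくら\t桜\nさくら\tサクラ\n"


def load_dictionary(source):
    """Build a kana -> [candidate texts] table from tab-separated rows."""
    table = {}
    for row in csv.reader(source, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        kana, text = row
        table.setdefault(kana, []).append(text)
    return table


def conversion_candidates(table, kana_key):
    """Return every text whose kana column matches the key exactly, in file order."""
    return table.get(kana_key, [])


dictionary = load_dictionary(io.StringIO(SAMPLE_DICTIONARY))
print(conversion_candidates(dictionary, "ふじさん"))  # ['富士山', 'フジサン']
```

Keeping candidates in file order means the photographer sees them in the order the dictionary lists them; a real device might instead rank them by frequency of use.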

The voice input start unit 110 starts voice input. The voice input unit 120 converts voice into a voice signal. The voice recognition unit 130 analyzes the voice signal and generates kana. The character string conversion unit 140, using the generated kana character string as a key, refers to the dictionary file 191, presents text matching the key as conversion candidates, and converts the kana character string into the selected text. The image data title creation unit 150 creates a title of the image data containing the converted text. The image data identification information changing unit 160 generates a new image data name by adding the converted text to predetermined image data identification information (for example, an image data name). The imaging unit 170 generates bitmap data from an image signal obtained by photographing and holds it in a memory. The image data generation unit 180 generates image data containing the bitmap data held in the memory and the generated title.

The voice input unit 120, which converts voice into a voice signal, is a microphone. The voice input start unit 110 is a push-button switch. The imaging unit 170 is a CCD. The voice recognition unit 130 (which analyzes the voice signal to generate kana), the character string conversion unit 140 (which converts the generated kana into a character string), the image data title creation unit 150 (which creates a title of the image data containing the converted character string), the image data identification information changing unit 160 (which creates an image data name containing the converted character string), and the image data generation unit 180 are computer programs. The storage unit 190 is a nonvolatile memory. The dictionary file 191 and the image data file 193 are collections of computer-readable data.

FIG. 4 is an explanatory diagram of the structure of image data. Image data 193a consists of a "document data name" part 193c, a "title" part 193d, and an "image data" part 193e. The document data name 193c is a unique character string that includes a character string generated by the apparatus 100 according to a predetermined procedure. The title 193d is a character string that contains the character string entered by voice input. The image data part 193e is bitmap data (for example, JPEG data or RGB data).
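To make the three-part structure of FIG. 4 concrete, the following sketch models image data 193a as a Python dataclass and packs the name, title, and bitmap parts into a single byte string. The length-prefixed layout, the field names, and the `to_bytes`/`from_bytes` helpers are hypothetical; the patent does not specify how the parts are serialized.

```python
import struct
from dataclasses import dataclass


@dataclass
class ImageData:
    """Illustrative model of image data 193a (not the patent's actual format)."""
    name: str      # "document data name" part 193c: unique string generated by apparatus 100
    title: str     # "title" part 193d: contains the voice-input character string
    bitmap: bytes  # "image data" part 193e: e.g. JPEG or RGB data

    def to_bytes(self) -> bytes:
        """Pack the three parts as big-endian 4-byte length prefixes plus payloads."""
        out = b""
        for part in (self.name.encode("utf-8"), self.title.encode("utf-8"), self.bitmap):
            out += struct.pack(">I", len(part)) + part
        return out

    @classmethod
    def from_bytes(cls, blob: bytes) -> "ImageData":
        """Inverse of to_bytes: read three length-prefixed parts in order."""
        parts, offset = [], 0
        for _ in range(3):
            (length,) = struct.unpack_from(">I", blob, offset)
            offset += 4
            parts.append(blob[offset:offset + length])
            offset += length
        return cls(parts[0].decode("utf-8"), parts[1].decode("utf-8"), parts[2])
```

Length prefixes are used so that a title containing any character, including separators, can be stored without escaping.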

FIG. 9 is a flowchart of the processing performed by the photographing apparatus 100 with a voice input function.
(1) The imaging unit 170 generates bitmap data from an image signal obtained by photographing and holds it in a memory. (Step S110)
(2) Immediately after photographing the subject, the photographer presses the voice input button 110 and speaks the title. (Step S111)
(3) The voice input unit 120 converts the voice into a voice signal and takes it into the apparatus. The voice recognition unit 130 analyzes the voice signal and generates a kana character string. (Step S120)
(4) The character string conversion unit 140, using the generated kana character string as a key, refers to the dictionary file 191, presents text matching the key as conversion candidates, and converts the kana character string into the text selected by the photographer. (Step S121)
(5) The image data title creation unit 150 creates a title of the image data containing the converted text. (Step S130)
(6) The image data generation unit 180 creates a unique image name and generates image data containing the created image name, the generated title, and the bitmap data held in the memory. (Step S131)
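Steps S110 to S131 can be read as a single pipeline. The sketch below strings them together under stated assumptions: the camera, microphone, and recognizer are replaced by caller-supplied stub functions, the dictionary is an in-memory mapping, the photographer's choice is a callback, and unique names come from a serial counter with a hypothetical `IMG_` prefix. Falling back to the raw kana when the dictionary has no match is also an assumption; the patent does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class ImageData:
    name: str      # unique image name (step S131)
    title: str     # title containing the selected text (step S130)
    bitmap: bytes  # bitmap data held in memory (step S110)


def record_with_voice_title(
    capture: Callable[[], bytes],        # stub standing in for imaging unit 170
    recognize_kana: Callable[[], str],   # stub standing in for units 120 and 130
    dictionary: Dict[str, List[str]],    # in-memory form of dictionary file 191
    choose: Callable[[List[str]], str],  # photographer's candidate selection
    serials: Iterator[int],              # hypothetical source of unique numbers
) -> ImageData:
    bitmap = capture()                           # S110: capture and hold bitmap data
    kana = recognize_kana()                      # S111, S120: voice -> kana string
    candidates = dictionary.get(kana) or [kana]  # S121: lookup, assumed raw-kana fallback
    text = choose(candidates)                    # S121: photographer selects a text
    title = text                                 # S130: title containing the text
    name = f"IMG_{next(serials):04d}"            # S131: unique image name
    return ImageData(name=name, title=title, bitmap=bitmap)  # S131: combine parts
```

Passing the hardware and the user choice in as functions keeps the step order visible and lets the flow be exercised without a camera.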

After step S121, the image data identification information changing unit 160 may instead be used to generate a new image data name by adding the converted text to a predetermined image data name, and the new name may be used as the image data name. In this case, the title part 193d of the image data 193a may be omitted.
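The variant in which the converted text becomes part of the image data name can be sketched as a small string operation. Here the converted text is appended to the existing name before its extension, joined by an underscore, and characters that common file systems reject are dropped. The separator, the sanitizing rule, and the function name are assumptions made for illustration.

```python
import os
import re

# Characters rejected in file names on common file systems (Windows/FAT), plus control codes.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def append_voice_text(image_name: str, text: str, sep: str = "_") -> str:
    """Return a new image data name with the converted text inserted before the extension."""
    stem, ext = os.path.splitext(image_name)
    safe = _FORBIDDEN.sub("", text).strip()
    if not safe:
        return image_name  # nothing usable to append; keep the original name
    return f"{stem}{sep}{safe}{ext}"
```

For example, `append_voice_text("IMG_0001.JPG", "富士山")` yields `IMG_0001_富士山.JPG`, so the serial part stays unique while the spoken words make the file recognizable.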

For the character string conversion, equipping the apparatus with a kana-kanji conversion function and dictionary file of the kind used in word-processing software makes it possible to convert the kana into character strings such as kanji, katakana, hiragana, and alphabetic characters.
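Full kana-kanji conversion needs a dictionary such as dictionary file 191, but one of the target scripts mentioned here, katakana, can be derived from hiragana without any lookup: in Unicode, each hiragana letter from U+3041 to U+3096 has its katakana counterpart exactly 0x60 code points higher. The helper below is a minimal sketch of that mapping only; it is not the conversion function the patent refers to.

```python
def hiragana_to_katakana(kana: str) -> str:
    """Map each hiragana letter (U+3041..U+3096) to the katakana letter 0x60 higher.

    Characters outside that range (the long-vowel mark, kanji, ASCII) pass through unchanged.
    """
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in kana
    )
```

For instance, `hiragana_to_katakana("さくら")` returns `"サクラ"`, one of the candidates a dictionary entry for さくら might also list.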

Further, the operation of setting the image file name by voice input may also be performed on image data read back from the image data file 193.
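When the name is set after the fact on image data already stored in image data file 193, the operation amounts to renaming a stored record. Assuming, purely for illustration, that each image is kept as an ordinary file, a sketch using `pathlib` might look like this. It refuses to overwrite an existing file because `Path.rename` silently replaces the target on POSIX systems; the check-then-rename sequence is not atomic, which is tolerable only on a single-user device.

```python
import tempfile
from pathlib import Path


def rename_stored_image(path: Path, text: str, sep: str = "_") -> Path:
    """Rename a stored image file so its name includes the voice-input text.

    Raises FileExistsError instead of overwriting another image.
    """
    target = path.with_name(f"{path.stem}{sep}{text}{path.suffix}")
    if target.exists():
        raise FileExistsError(target)
    path.rename(target)
    return target


if __name__ == "__main__":
    # Demonstration in a throwaway directory with a hypothetical file name.
    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "IMG_0001.JPG"
        original.write_bytes(b"\xff\xd8")
        print(rename_stored_image(original, "fuji").name)  # IMG_0001_fuji.JPG
```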

As described in detail above, according to the present invention, after a subject is photographed, a character string obtained by voice input is used as the title of the image data or as part of the image data name, which makes it possible to identify images easily.
In addition, the efficiency of selecting and organizing a large number of images is improved.

Configuration diagram of the photographing apparatus with a voice input function
Structure of image data
Flowchart of recording voice input information

Explanation of symbols

100 Photographing apparatus with voice input function
110 Voice input start means
120 Voice input means
130 Voice recognition means
140 Character string conversion means
150 Image data title creation means
160 Image data identification information changing means
170 Imaging means
180 Image data generation means
190 Storage means
191 Dictionary file
193 Image data file

Claims

Storage means for storing a dictionary file that records tabular text data associating kana character strings with text, and an image data file that stores image data including a title and image data identification information;
Imaging means for generating bitmap data from an image signal obtained by shooting and holding it in a memory;
Voice input starting means for starting voice input;
A voice input means for converting voice into a voice signal;
Voice recognition means for analyzing the voice signal and generating kana;
Character string conversion means for referring to the dictionary file using the generated kana character string as a key, presenting at least one text matching the key as a conversion candidate, and converting the kana character string into the selected text;
An image data title creating means for creating a title including the converted character string and adding the title to the image data;
Image data generating means for generating image data identification information, and generating image data by combining the title, the generated image data identification information, and the bitmap data held in the memory;
A photographing apparatus with a voice input function.

A storage means for storing a dictionary file that records text data in a tabular format in which kana character strings and text are associated, and an image data file that stores image data including image data identification information;
Imaging means for generating bitmap data from an image signal obtained by shooting and holding it in a memory;
Voice input starting means for starting voice input;
A voice input means for converting voice into a voice signal;
Voice recognition means for analyzing the voice signal and generating kana;
Character string conversion means for referring to the dictionary file using the generated kana character string as a key, presenting at least one text matching the key as a conversion candidate, and converting the kana character string into the selected text;
Image data generating means for generating image data identification information, and generating image data by combining the generated image data identification information and the bitmap data held in the memory;
Image data identification information generating means for generating new image data identification information by adding the converted text to the image data identification information;
A photographing apparatus with a voice input function.

A voice input photographing method using a dictionary file that records tabular text data in which kana character strings are associated with text, and an image data file that stores image data including a title and image data identification information,
An imaging step of generating bitmap data from an image signal obtained by shooting and holding it in a memory;
A voice input start step for starting voice input;
An audio input step for converting audio into an audio signal;
A speech recognition step of analyzing the speech signal to generate kana;
A character string conversion step of referring to the dictionary file using the generated kana character string as a key, presenting at least one text matching the key as a conversion candidate, and converting the kana character string into the selected text;
An image data title creating step for creating a title of the image data including the converted character string;
An image data generating step for generating image data identification information, and generating image data obtained by combining the title, the generated image data identification information, and the bitmap data held in the memory;
A voice input photographing method performed in a procedure comprising the above steps.

A voice input photographing method using a dictionary file that records text data in a tabular format in which kana character strings and text are associated, and an image data file that stores image data including a title and image data identification information,
An imaging step of generating bitmap data from an image signal obtained by shooting and holding it in a memory;
An image data step of generating image data that includes the bitmap data held in the memory, a generated name, and a title;
A voice input start step for starting voice input;
An audio input step for converting audio into an audio signal;
A speech recognition step of analyzing the speech signal to generate kana;
A character string conversion step of converting the generated kana into a character string;
An image data generating step for generating image data identification information and generating image data obtained by combining the generated image data identification information and the bitmap data held in the memory;
An image data identification information changing step of adding the converted text to the image data identification information and changing it to new image data identification information;
A voice input photographing method performed in a procedure comprising the above steps.

A computer program that, when incorporated into a computer, causes the computer to operate as the photographing apparatus with a voice input function according to claim 1 or claim 2.

A computer-readable recording medium on which the computer program according to claim 5 is recorded.