JP2008091978A

JP2008091978A - Imaging apparatus and image storing method

Info

Publication number: JP2008091978A
Application number: JP2006267017A
Authority: JP
Inventors: Takashi Machida; 貴志町田; Makoto Oishi; 誠大石
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2006-09-29
Filing date: 2006-09-29
Publication date: 2008-04-17

Abstract

PROBLEM TO BE SOLVED: To add data related to a voice to image data while suppressing an increase in the amount of data.

SOLUTION: During a photographing mode, a CPU 20 stores voice data in a voice memory 29. When a release button 8 is half pressed, the CPU 20 performs focus adjustment by a focus motor 32. When the release button 8 is fully pressed, photographing is performed, and photographed image data are stored in a buffer memory 24 temporarily. In this case, a feature extraction section 31 measures the amount of effective sound pressure in the voice data stored in the voice memory 29 for 2 seconds before and after with a point of time of photographing as the center. The CPU 20 adds the measured sound pressure data to the image data stored in the buffer memory 24, and stores the image data to which the sound pressure data are added in a RAM 26.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an imaging apparatus and an image storage method.

In recent years, digital cameras have become widespread; they store captured images as image data and display the stored image data on an LCD or the like, so that captured images can be viewed without being printed. Printed photographs can be classified and searched efficiently by organizing them in albums or the like. Image data captured with a digital camera, on the other hand, has the advantage of not deteriorating the way printed photographs do, but classifying and searching it takes time and effort. Techniques have therefore been proposed to make stored image data easier to classify and search (see, for example, Patent Documents 1 to 3).

Patent Document 1 proposes an imaging apparatus that associates captured image data with explanatory data describing the subject of the image data, together with a control method and a control program for the apparatus. Still image data, text data, voice data, audio data, and moving image data are described as examples of the explanatory data.

Patent Document 2 describes an electronic camera, an image display device, and an image display method in which the voice of a subject is recorded at the time of shooting and the recorded data is stored in an image file in association with information on the subject's position within the frame.

Patent Document 3 proposes a digital camera that captures sound when a picture is taken by operating the shutter button, and stores the image data and the sound data in a memory in association with each other.

Patent Document 1: JP 2003-198909 A
Patent Document 2: JP 2003-174578 A
Patent Document 3: JP 2003-283904 A

However, because Patent Documents 1 to 3 store the audio data as is, the amount of data becomes large, which reduces the amount of image data that can be stored in the memory.

The present invention has been made to solve the above problem, and its object is to provide an imaging apparatus and an image storage method that can add data relating to audio to image data while suppressing an increase in the amount of data.

To achieve the above object, the imaging apparatus of the present invention comprises shooting means for shooting and audio storage means for storing external sound as audio data, and is characterized by further comprising feature extraction means for extracting characteristic feature data from the audio data stored in the audio storage means, and associated data storage means for storing the image data captured by the shooting means in association with the feature data extracted by the feature extraction means. The feature data preferably has a smaller data amount than the audio data.

Preferably, the apparatus further comprises mode switching means for switching between a shooting mode, in which the shooting means shoots, and a control mode, in which control other than shooting is performed, and the audio storage means stores the audio data during the shooting mode and does not store it during the control mode. Examples of the control mode include a playback mode for playing back captured image data, a setting mode for making various settings of the imaging apparatus, and a power saving mode for reducing power consumption. In the playback mode, if the imaging apparatus has display means such as an LCD, the image data is reproduced and displayed on the LCD; if the imaging apparatus has no LCD, the image data is sent to a PC (personal computer) connected to the imaging apparatus and is reproduced and displayed on the PC's monitor.

Preferably, the apparatus further comprises an operation member provided so as to be operable, operation sound generating means for generating an operation sound when the operation member is operated, and operation sound stop means for stopping the operation sound generating means from generating the operation sound when the operation member is operated during the shooting mode.

It is also preferable that the apparatus comprises a shooting button that is provided so as to be pressable, causes a shooting preparation operation to be performed when half-pressed, and causes the shooting means to shoot when pressed further from the half-pressed state to a full press, and that the audio storage means starts storing the audio data in response to the shooting button being half-pressed.

Preferably, the feature extraction means extracts the feature data from the audio data stored in the period from a first predetermined time before the shooting means shoots until a second predetermined time after the shooting means shoots.

It is also preferable that the feature extraction means extracts the feature data from the audio data stored in the period from a third predetermined time before the shooting means shoots up to the time of shooting.

Preferably, the apparatus further comprises an adjustment member that moves a focus lens to adjust focus before the shooting means shoots, and the feature extraction means extracts the feature data from the audio data stored in the period from the end of focus adjustment by the adjustment member until a fourth predetermined time later.

The fourth predetermined time is preferably at least as long as the time from the end of focus adjustment by the adjustment member until the shooting means shoots.
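The three extraction periods described above can be compared side by side with a small sketch. The function names and the use of plain numbers of seconds as timestamps are assumptions made for illustration; the text only defines the periods in terms of the first to fourth predetermined times.

```python
def window_around_shot(t_shot, first_s, second_s):
    """Period from first_s before the shot to second_s after it."""
    return (t_shot - first_s, t_shot + second_s)


def window_before_shot(t_shot, third_s):
    """Period from third_s before the shot up to the shot itself."""
    return (t_shot - third_s, t_shot)


def window_after_focus(t_focus_done, t_shot, fourth_s):
    """Period starting when focus adjustment ends.

    The fourth predetermined time is stretched, if necessary, so that
    the period lasts at least until the shot is taken.
    """
    duration = max(fourth_s, t_shot - t_focus_done)
    return (t_focus_done, t_focus_done + duration)


print(window_around_shot(10, 1, 1))   # (9, 11)
print(window_before_shot(10, 2))      # (8, 10)
print(window_after_focus(10, 12, 1))  # (10, 12)
```

In the last call the nominal 1-second period would end before the shot at t = 12, so the sketch extends it to the shot time, matching the "at least" condition on the fourth predetermined time.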

Preferably, the apparatus further comprises audio data erasing means for erasing, from the audio data stored in the audio storage means, audio data for which a fifth predetermined time has elapsed.

Preferably, the feature extraction means extracts sound pressure as the feature data from the audio data stored in the audio storage means. Examples of the sound pressure include the sound pressure (N/m²) and the sound pressure level (dB).
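The two quantities named here are related by the standard definition of sound pressure level, L = 20 log10(p / p0), where p is the effective (RMS) sound pressure and p0 = 20 µPa is the conventional reference pressure in air (1 N/m² equals 1 Pa). The following sketch shows only this conversion; the function name is hypothetical, and the text does not say how the apparatus itself obtains the level.

```python
import math

# Conventional reference sound pressure in air: 20 micropascals.
P_REF_PA = 20e-6


def sound_pressure_level_db(p_rms_pa, p_ref_pa=P_REF_PA):
    """Convert an RMS sound pressure in Pa (N/m^2) to a level in dB."""
    if p_rms_pa <= 0:
        raise ValueError("RMS sound pressure must be positive")
    return 20.0 * math.log10(p_rms_pa / p_ref_pa)


# An RMS pressure of 0.2 Pa is 10,000 times the reference: about 80 dB.
print(round(sound_pressure_level_db(0.2), 6))
```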

It is also preferable that the feature extraction means detects which of a plurality of audio model data, classified in advance by audio feature, the audio data stored in the audio storage means most closely resembles, calculates the similarity level, and extracts the calculated similarity level as the feature data. The similarity level preferably includes the name of the one audio model data that the audio data most closely resembles and a numerical value expressing the similarity level as a percentage. Examples of the audio model data include in-car audio model data representing sound inside an automobile, in-train audio model data representing sound inside a train, and soccer field audio model data representing sound at a soccer field.
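As a rough illustration of a similarity level made up of a model name and a percentage, the sketch below compares a feature vector with a few model vectors. The text does not specify how similarity is computed; cosine similarity, the three-element vectors, and all names here are assumptions chosen only to make the idea concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_model(features, models):
    """Return (model name, similarity in percent) for the best match."""
    name, score = max(
        ((model_name, cosine_similarity(features, vector))
         for model_name, vector in models.items()),
        key=lambda pair: pair[1],
    )
    # Negative similarities are reported as 0 percent in this sketch.
    return name, round(max(score, 0.0) * 100, 1)


# Hypothetical model vectors; real audio model data would be far richer.
models = {
    "in_car": [1.0, 0.0, 0.0],
    "in_train": [0.0, 1.0, 0.0],
    "soccer_field": [0.0, 0.0, 1.0],
}
print(most_similar_model([0.9, 0.1, 0.0], models))
```

Storing only the returned pair with the image keeps the attached data to a short name and one number, in line with the aim of keeping the feature data smaller than the audio data.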

It is also preferable that the feature extraction means extracts, as the feature data, a preset proportion of the audio data stored in the audio storage means.

As described above, the feature data that the feature extraction means extracts from the audio data is preferably one of the following: sound pressure (sound pressure (N/m²), sound pressure level (dB), or the like), a similarity level with respect to a plurality of audio model data, or a preset proportion of the audio data. However, any characteristic data with a smaller data amount than the audio data may be used, and the choice can be changed as appropriate.

It is also preferable to provide model data storage means for storing the plurality of audio model data.

Preferably, the apparatus further comprises image data search means for searching the image data stored in the associated data storage means, on the basis of the feature data, for image data associated with specified feature data.

It is also preferable to provide display means for reproducing and displaying the image data found by the image data search means.

The imaging apparatus is preferably a digital camera.

The image storage method of the present invention is characterized by storing external sound as audio data when shooting, extracting characteristic feature data from the stored audio data, and storing the captured image data in association with the extracted feature data.

According to the imaging apparatus of the present invention, characteristic feature data is extracted from the audio data stored in the audio storage means, and the captured image data is stored in association with the extracted feature data. Compared with a type that stores the audio data as is, data relating to audio can therefore be associated with the image data while the amount of stored data is kept down.

In addition, because audio data is not stored during the control mode and is stored only during the shooting mode, the power consumed in storing audio data can be reduced.

Furthermore, because storage of the audio data starts in response to the shooting button being half-pressed, the power consumed in storing audio data can be reduced even further.

According to the image storage method of the present invention, features are extracted as feature data from the external audio data stored when shooting, and the captured image data is stored in association with the extracted feature data. Compared with a type that stores the audio data as is, data relating to audio can therefore be associated with the image data while the amount of stored data is kept down.

As shown in FIGS. 1 and 2, a digital camera 2 embodying the present invention has, on the front of a camera body 3, a lens barrel 5 incorporating a plurality of photographing lenses 4, a microphone 6 for capturing sound, and the like. The top of the camera body 3 is provided with a power button 7 that is pressed to turn the power on and off, a release button (shooting button) 8 that is pressed when shooting, a mode changeover switch (mode switching means) 9 that switches the digital camera 2 between a shooting mode for shooting and a playback mode (control mode) for reproducing and displaying captured images, and the like. The lens barrel 5 is housed inside the camera body 3 while the digital camera 2 is powered off, and protrudes from the front of the camera body 3 when the power is turned on. The mode changeover switch 9 is rotatable, and rotating it changes the mode of the digital camera 2.

The release button 8 has a two-stage press structure. Pressing the release button 8 lightly (half press) causes shooting preparation operations such as focusing to be performed. Pressing it further from this state (full press) causes the shooting operation to be performed.

The back of the camera body 3 is provided with an LCD (display means) 10 on which captured images and various setting conditions are displayed, a menu key (operation member) 11 for making various settings and switching the displayed image, a zoom button (operation member) 12 for zooming the image displayed on the LCD 10 in and out, a speaker (operation sound generating means) 13, and the like.

The LCD 10, menu key 11, zoom button 12, and speaker 13 are connected to a CPU 20 (see FIG. 3) that controls the operation of the digital camera 2. When the user operates the menu key 11 or the zoom button 12 while the digital camera 2 is in the playback mode, the CPU 20 causes the speaker 13 to emit an operation sound indicating that the key or button has been operated. While the digital camera 2 is in the shooting mode, the CPU 20 suppresses this operation sound. As a result, the operation sound is not picked up during the audio capture described later, and only sound from outside the digital camera 2 is captured. In this embodiment, the CPU 20 thus also functions as operation sound stop means that stops the speaker 13 from generating the operation sound when the menu key 11 or the zoom button 12 is operated during the shooting mode.

FIG. 3 is a block diagram showing the electrical configuration inside the digital camera 2. Behind the photographing lens 4 is a CCD 14 that captures the subject light transmitted through the photographing lens 4. A timing signal (clock signal) is input to the CCD 14 from a timing generator 19 controlled by the CPU 20. The signal output from the CCD 14 is input to a correlated double sampling circuit (CDS) 15 and is output as R, G, and B image data that accurately corresponds to the amount of charge accumulated in each cell of the CCD 14. The image data output from the CDS 15 is amplified by an amplifier (AMP) 16 and converted into digital data by an A/D converter 17.

An image input controller 18 is connected to the CPU 20 via a data bus 21 and controls the CCD 14, CDS 15, AMP 16, and A/D converter 17 according to instructions from the CPU 20. It also writes the image data output from the A/D converter 17 into a video memory 22 or a buffer memory 24.

The video memory 22 temporarily stores low-resolution image data when the LCD 10 is used as a viewfinder. The image data stored in the video memory 22 is sent to an LCD driver 23 via the data bus 21 and displayed on the LCD 10. The buffer memory 24 temporarily stores the captured high-resolution image data. The image data read from the buffer memory 24 is stored in a RAM 26 by a data reader that is driven and controlled by a memory controller 25.

While the captured high-resolution image data is held in the buffer memory 24, an image signal processing circuit 27 applies image processing such as gradation conversion, color conversion, hypertone processing that compresses the gradation of the very-low-frequency density components of the image, and hypersharpness processing that enhances sharpness while suppressing graininess.

The microphone 6 captures sound and sends the captured analog audio data to an A/D converter 28. The A/D converter 28 converts the analog audio data into digital audio data. The CPU 20 temporarily stores the digital audio data in an audio memory (audio storage means) 29 via the data bus 21. In this embodiment, the CPU 20 stores audio data in the audio memory 29 while the digital camera 2 is in the shooting mode and does not store audio data in the audio memory 29 while the digital camera 2 is in the playback mode. That is, it starts storing audio data in the audio memory 29 in response to a switch from the playback mode to the shooting mode, and stops storing audio data in response to a switch from the shooting mode to the playback mode.
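The mode-dependent behavior of this embodiment (audio is stored only in the shooting mode, and operation sounds are emitted only in the playback mode) can be summarized in a small sketch. The class and attribute names are hypothetical and stand in for the control performed by the CPU 20.

```python
from enum import Enum


class Mode(Enum):
    SHOOTING = "shooting"
    PLAYBACK = "playback"


class ModeController:
    """Tracks the camera mode and what that mode permits."""

    def __init__(self, mode=Mode.PLAYBACK):
        self.mode = mode

    def switch_to(self, mode):
        self.mode = mode

    @property
    def recording(self):
        # Audio is stored in the audio memory only while shooting.
        return self.mode is Mode.SHOOTING

    @property
    def operation_sound_enabled(self):
        # Operation sounds are suppressed while shooting so that they
        # are not picked up by the microphone.
        return self.mode is Mode.PLAYBACK


controller = ModeController()
controller.switch_to(Mode.SHOOTING)
print(controller.recording, controller.operation_sound_enabled)  # True False
```

Deriving both flags from the single mode value means they cannot drift out of step when the mode changeover switch is turned.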

A feature extraction unit (feature extraction means) 31 measures, by a known measurement method, the sound pressure level (dB) as the feature of the audio data stored in the audio memory 29. The CPU 20 adds the measured sound pressure level data to the image data stored in the buffer memory 24, and stores the image data with the sound pressure level data added in the RAM (associated data storage means) 26.

In response to the release button 8 being fully pressed, the CPU 20 outputs an extraction signal to the feature extraction unit 31 via the data bus 21. In this embodiment, in response to the extraction signal, the feature extraction unit 31 takes the moment of shooting (the moment the release button 8 is fully pressed) as a base point and measures the sound pressure level of the audio data stored in the audio memory 29 over the period from 1 second (first predetermined time) before the base point to 1 second (second predetermined time) after it (2 seconds in total). The audio memory 29 therefore needs a storage capacity of at least 2 seconds of audio data. In this embodiment, the audio memory 29 has a capacity for 5 seconds of audio data. The 1-second values above can be changed as appropriate.
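A minimal sketch of measuring a level over the 2-second window follows. The text says only that a known measurement method is used; the RMS calculation, the level expressed relative to digital full scale (rather than a calibrated sound pressure level), and all names are assumptions made for illustration.

```python
import math


def window_level_db(samples, sample_rate, shot_index,
                    before_s=1.0, after_s=1.0, full_scale=1.0):
    """RMS level, in dB relative to full scale, around the shot.

    samples     -- sequence of audio samples held in the audio memory
    shot_index  -- index of the sample recorded when the shot was taken
    """
    start = max(0, shot_index - int(before_s * sample_rate))
    end = min(len(samples), shot_index + int(after_s * sample_rate))
    window = samples[start:end]
    if not window:
        raise ValueError("no audio available in the requested window")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / full_scale)


# Toy data at 10 samples per second: silence, 2 s of signal, silence.
samples = [0.0] * 10 + [0.5] * 20 + [0.0] * 10
print(round(window_level_db(samples, 10, 20), 2))
```

Only samples 10 to 29 fall inside the window here, so the surrounding silence does not affect the result. Turning such a relative level into a sound pressure level in dB would additionally require the microphone's calibration, which the text does not give.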

The CPU 20 erases audio data in the audio memory 29 once 4 seconds (fifth predetermined time) have elapsed since it was stored. This allows the audio memory 29 to have a smaller storage capacity than a type that keeps the stored audio data permanently. In this embodiment, the CPU 20 thus also functions as audio data erasing means that erases, from the audio data stored in the audio memory 29, audio data for which 4 seconds (fifth predetermined time) have elapsed.
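The erasure of audio older than the fifth predetermined time behaves like a rolling buffer. The sketch below assumes audio arrives in timestamped chunks; the class name, the chunking, and the use of seconds as floats are illustrative choices, not details from the text.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only audio chunks stored less than retention_s seconds ago."""

    def __init__(self, retention_s=4.0):
        self.retention_s = retention_s
        self._chunks = deque()  # (timestamp in seconds, samples)

    def store(self, timestamp_s, samples):
        self._chunks.append((timestamp_s, list(samples)))
        self.erase_expired(timestamp_s)

    def erase_expired(self, now_s):
        # Oldest chunks sit at the left end of the deque.
        while self._chunks and now_s - self._chunks[0][0] >= self.retention_s:
            self._chunks.popleft()

    def timestamps(self):
        return [t for t, _ in self._chunks]


buffer = RollingAudioBuffer(retention_s=4.0)
for second in range(7):
    buffer.store(float(second), [0.0])
print(buffer.timestamps())  # [3.0, 4.0, 5.0, 6.0]
```

Because expired chunks are dropped as new ones arrive, the buffer never holds more than a few seconds of audio, which is what lets the memory stay small.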

A focus motor (adjustment member) 32 adjusts focus by moving the plurality of photographing lenses 4 before shooting, and is driven under the control of the CPU 20 via a motor driver (not shown). In response to the release button 8 being half-pressed, the CPU 20 drives the focus motor 32 and moves the photographing lenses 4 to adjust focus.

As shown in FIG. 4, when the digital camera 2 is set to the playback mode, the CPU 20 displays a display method selection screen 35 on the LCD 10. The display method selection screen 35 shows a selection comment image 40 reading "Please select an image display method". Below the selection comment image 40 are a first comment image 41 reading "Display in shooting order", a second comment image 42 reading "Display in order of sound pressure level", and a third comment image 43 reading "Search and display by specified sound pressure level". Below the third comment image 43 are an input section 44 for entering a sound pressure level and a search start button section 45.

The user operates the menu key 11 to select one of the first to third comment images 41 to 43. If the user selects the first comment image 41, the CPU 20 searches the image data stored in the RAM 26 and displays the image data on the LCD 10 in shooting order. The image data displayed on the LCD 10 is switched by operating the menu key 11.

If the user selects the second comment image 42, the CPU 20 searches the image data stored in the RAM 26 and displays the image data on the LCD 10 in order of sound pressure level (highest to lowest).

If the user selects the third comment image 43, the CPU 20 enables numerical input in the input section 44. When the user operates the menu key 11 to enter a value in the input section 44, the CPU 20 enables the search start button section 45. When the user then operates the menu key 11 to activate the search start button section 45, the CPU 20 searches the image data stored in the RAM 26 and displays the image data on the LCD 10 in order of how close the attached sound pressure level data is to the input value (the value entered in the input section 44). For example, if the sound pressure levels (dB) attached to five image data stored in the RAM 26 are 70, 75, 80, 85, and 90 and the input value is 100, the CPU 20 displays the image data in the order of sound pressure level 90, 85, 80, 75, 70.
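The two level-based display orders can be expressed as simple sorts. In this sketch each image is a (name, sound pressure level) pair; the image names and function names are hypothetical, and the handling of ties (kept in stored order) is an assumption, since the text does not address it.

```python
def order_by_level_desc(images):
    """Highest sound pressure level first."""
    return sorted(images, key=lambda item: item[1], reverse=True)


def order_by_closeness(images, target_db):
    """Level closest to the entered value first."""
    return sorted(images, key=lambda item: abs(item[1] - target_db))


images = [("img1", 70), ("img2", 75), ("img3", 80), ("img4", 85), ("img5", 90)]

print([level for _, level in order_by_closeness(images, 100)])
# [90, 85, 80, 75, 70]
print([level for _, level in order_by_level_desc(images)])
# [90, 85, 80, 75, 70]
```

With an input value of 100 the two orders coincide, as in the example in the text. An input value inside the range gives a different result: a target of 78 would list 80 first, then 75, 85, 70, and 90.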

In this embodiment, the CPU 20 also functions as image data search means that searches the image data stored in the RAM 26 for image data associated with the specified feature data (sound pressure level).

The operation of the digital camera 2 configured as described above will now be explained with reference to the flowchart of FIG. 5. To shoot with the digital camera 2, the digital camera 2 is set to the shooting mode (Y in step (hereinafter, S) 1). In the shooting mode, the CPU 20 stores in the audio memory 29 the audio data captured by the microphone 6 and converted into digital data by the A/D converter 28 (S2).

The user shoots by operating the release button 8. In response to the release button 8 being half-pressed (Y in S3), the CPU 20 performs focus adjustment with the focus motor 32 (S4). Then, in response to the release button 8 being fully pressed (Y in S5), the CPU 20 shoots, temporarily stores the captured image data in the buffer memory 24, and outputs an extraction signal to the feature extraction unit 31 via the data bus 21. In response to the extraction signal, the feature extraction unit 31 takes the moment of shooting (the moment the release button 8 is fully pressed) as a base point and measures the sound pressure level of the audio data stored in the audio memory 29 over the period from 1 second before the base point to 1 second after it (2 seconds in total) (S6). The CPU 20 adds the measured sound pressure level data to the image data stored in the buffer memory 24 (S7), and stores the image data with the sound pressure level data added in the RAM 26 (S8).
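Steps S3 to S8 can be traced with a short sketch. The event strings, the callables standing in for the camera hardware, and the dictionary used as the stored record are all hypothetical. As a simplification, a full press that was not preceded by a half press is ignored.

```python
def run_shooting_flow(button_events, capture_image, measure_level_db):
    """Process release-button events and return the stored records."""
    stored_records = []  # stands in for the RAM 26
    focused = False
    for event in button_events:
        if event == "half_press":                  # S3
            focused = True                         # S4: focus adjustment
        elif event == "full_press" and focused:    # S5
            image = capture_image()                # shoot, buffer the image
            level_db = measure_level_db()          # S6: measure the level
            record = {                             # S7: attach level data
                "image": image,
                "sound_pressure_level_db": level_db,
            }
            stored_records.append(record)          # S8: store the result
            focused = False
    return stored_records


records = run_shooting_flow(
    ["half_press", "full_press"],
    capture_image=lambda: b"image-bytes",
    measure_level_db=lambda: 72.5,
)
print(records[0]["sound_pressure_level_db"])  # 72.5
```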

Next, the flow of selecting an image display method while the digital camera 2 is in the playback mode will be explained with reference to the flowchart shown in FIG. 6.

When the digital camera 2 is set to the playback mode (Y in S1), the CPU 20 displays the display method selection screen 35 on the LCD 10 (S2). The user operates the menu key 11 to select one of the first to third comment images 41 to 43 shown on the display method selection screen 35. If the user selects the first comment image 41 (Y in S3), the CPU 20 displays the image data stored in the RAM 26 on the LCD 10 in shooting order (S4).

If the user selects the second comment image 42 (Y in S5), the CPU 20 displays the image data stored in the RAM 26 on the LCD 10 in order of sound pressure level (highest to lowest) (S6).

If the user selects the third comment image 43 (Y in S7), enters a value in the input section 44 (Y in S8), and operates the search start button section 45 (Y in S9), the CPU 20 displays the image data on the LCD 10 in order of how close the attached sound pressure level data is to the input value (the value entered in the input section 44) (S10).

In this way, the CPU 20 stores in the RAM 26 image data to which the measured sound pressure level data has been added. Compared with a type that stores the audio data itself in the RAM 26, this associates sound-related data with the image data while keeping down the amount of data stored in the RAM 26.

In addition, the image data can be displayed on the LCD 10 in order of sound pressure level, and image data whose level is close to an arbitrary sound pressure level can be searched for.

Furthermore, the CPU 20 stores audio data in the audio memory 29 only when the digital camera 2 is in the shooting mode and not when it is in the playback mode, which suppresses the power consumed in storing audio data.

Also, because the sound pressure level is measured over a 2-second period and audio data in the audio memory 29 is erased once 4 seconds have passed since it was stored, the audio memory 29 needs only enough capacity to hold 5 seconds of audio data, so its storage capacity can be kept small.
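
The erase-after-4-seconds behaviour can be pictured as a rolling buffer. The sketch below is one possible arrangement under assumptions: the class name, the chunk-with-timestamp layout, and the use of a deque are illustrative and are not taken from the source.

```python
from collections import deque


class RollingAudioMemory:
    """Hypothetical rolling store that drops audio older than `max_age` seconds."""

    def __init__(self, max_age=4.0):
        self.max_age = max_age
        self._chunks = deque()  # (timestamp_seconds, chunk) pairs, oldest first

    def store(self, timestamp, chunk):
        """Append a chunk, then erase any chunk stored more than max_age ago."""
        self._chunks.append((timestamp, chunk))
        while self._chunks and timestamp - self._chunks[0][0] > self.max_age:
            self._chunks.popleft()

    def window(self, start, end):
        """Return the chunks whose timestamps fall within [start, end]."""
        return [chunk for t, chunk in self._chunks if start <= t <= end]

    def __len__(self):
        return len(self._chunks)
```

Because old chunks are discarded on every store, the amount of audio held never grows beyond a few seconds regardless of how long the camera stays in the shooting mode.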

In the above embodiment, the sound pressure level is measured over the period from 1 second before to 1 second after the base point (2 seconds in total), where the base point is the time of shooting (the moment the release button 8 is fully pressed); however, the measurement period can be changed as appropriate.

Also, although the image data is stored in the RAM 26 in the above embodiment, the digital camera 2 may instead be provided with a memory card slot, and the image data may be stored on a memory card inserted in that slot.

Furthermore, although the digital camera 2 is provided with the LCD 10 in the above embodiment, the LCD 10 may be omitted; in that case, in the playback mode the digital camera 2 is connected to a PC, the image data stored in the RAM 26 is sent to the PC, and the image data is reproduced and displayed on the PC monitor.

FIGS. 7 and 8 show another embodiment. Components that are the same as in the embodiment shown in FIGS. 1 to 6 are given the same reference numerals, and their detailed description is abbreviated. In this embodiment, the CPU 20 starts storing audio data in the audio memory 29 in response to the release button 8 being half-pressed, and stops storing audio data in the audio memory 29 in response to the release button 8 being fully pressed. The RAM 26 stores a plurality of audio model data classified in advance by sound characteristics (for example, in-car audio model data representing sound inside an automobile, in-train audio model data representing sound inside a train, and soccer-field audio model data representing sound at a soccer field). In this embodiment, the RAM 26 also functions as model data storage means for storing the plurality of audio model data. The audio model data stored in the RAM 26 can be changed as appropriate.

When shooting with the digital camera 2, the digital camera 2 is set to the shooting mode (Y in S1). When the user half-presses the release button 8 (Y in S2), the CPU 20 performs focus adjustment with the focus motor 32 (S3) and, at the same time, starts storing in the audio memory 29 the audio data captured by the microphone 6 and converted to digital data by the A/D converter 28 (S4). The following describes the processing flow for shooting at a soccer field.

If the half-press is released after audio storage has started with the half-press of the release button 8 but before the button is fully pressed (Y in S5), the CPU 20 stops storing audio data in the audio memory 29 (S6). When the half-press is released, the audio data stored up to that point may be erased.
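
The start and stop conditions for audio storage in this embodiment can be sketched as a small state machine. The class and method names below are hypothetical, and the source does not say what happens to audio left over from an earlier, abandoned half-press; this sketch keeps it unless `erase_on_release` is set.

```python
class HalfPressRecorder:
    """Hypothetical model of audio storage gated by the release button."""

    def __init__(self, erase_on_release=False):
        self.recording = False
        self.audio = []
        self.erase_on_release = erase_on_release  # optional erase on release

    def half_press(self):
        """Half-press: start storing audio (focus adjustment runs alongside)."""
        self.recording = True

    def feed(self, chunk):
        """Store an incoming audio chunk only while recording is active."""
        if self.recording:
            self.audio.append(chunk)

    def release_half_press(self):
        """Half-press released before full press: stop storing audio."""
        self.recording = False
        if self.erase_on_release:
            self.audio.clear()

    def full_press(self):
        """Full press: stop storing and hand the stored audio to extraction.

        Returns None if no half-press is in progress (assumed behaviour).
        """
        if not self.recording:
            return None
        self.recording = False
        return list(self.audio)
```

Audio that arrives outside the half-press interval is simply ignored, which is where the power saving described for this embodiment comes from.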

If the user fully presses the release button 8 (Y in S7) without having released the half-press (N in S5), the CPU 20 performs shooting, temporarily stores the captured image data in the buffer memory 24, and outputs an extraction signal to the feature extraction unit 31 via the data bus 21. In this embodiment, in response to the extraction signal, the feature extraction unit 31 examines the audio data stored in the audio memory 29 during the period from 1 second (the third predetermined time) before the time of shooting (when the release button 8 is fully pressed and shooting is performed) up to the time of shooting. Using a known detection method (voiceprint analysis, audio frequency analysis, volume analysis, or the like), it detects which of the plurality of audio model data stored in the RAM 26 the audio data most resembles (here, the soccer-field audio model data), and using a known calculation method (voiceprint analysis, audio frequency analysis, volume analysis, or the like), it calculates the similarity level (for example, 80%) (S8). The calculated similarity level data (soccer-field audio model data, similarity level 80%) is added to the image data stored in the buffer memory 24 (S9), and the image data with the similarity level data added is stored in the RAM 26 (S10).

The similarity level data includes the name of the one audio model data that the audio data most resembles among the plurality of audio model data, and a numerical value expressing the similarity level as a percentage. For example, if the volume data stored for the soccer-field audio model data is that of loud cheering by spectators, then for shots taken at a soccer field the similarity value differs between when the cheering is loud (a high value, for example 90%) and when the cheering is quiet (a low value, for example 50%).
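
The source leaves the detection and calculation to known methods and does not name a specific one. Purely as an illustration, the sketch below assumes each recording and each model has already been reduced to a short feature vector (for example, energy per frequency band) and uses cosine similarity as a stand-in measure; the function names and the vector representation are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(features, models):
    """Return (model_name, similarity_percent) for the most similar model.

    `models` maps a model name to its feature vector. The returned pair
    mirrors the similarity level data: one model name plus a percentage.
    """
    name, score = max(
        ((n, cosine_similarity(features, m)) for n, m in models.items()),
        key=lambda pair: pair[1],
    )
    return name, round(100.0 * max(score, 0.0))
```

A recording whose features match a model exactly yields 100%, while a quieter or differently distributed recording of the same venue yields a lower percentage, which is the behaviour the cheering example describes.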

As shown in FIG. 8, when the digital camera 2 is set to the playback mode, the CPU 20 displays a display method selection screen 49 on the LCD 10. The display method selection screen 49 shows a selection comment image 50 reading "Please select an image display method", and below it a first comment image 51 reading "Display in shooting order" and a second comment image 52 reading "Display by similar audio model data".

The user operates the menu key 11 to select either the first comment image 51 or the second comment image 52. If the user selects the first comment image 51, the CPU 20 retrieves the image data stored in the RAM 26 and displays it on the LCD 10 in shooting order. The image data shown on the LCD 10 is switched by operating the menu key 11.

If the user selects the second comment image 52, the CPU 20 retrieves the image data stored in the RAM 26 and displays it on the LCD 10 grouped by similar audio model data. At this time, the CPU 20 displays the images in descending order of similarity level (%).
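
Grouping by audio model and ordering by similarity can be sketched as below. The `ClassifiedImage` record and the function name are hypothetical, and applying the descending order within each group is one reading of the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassifiedImage:
    """Hypothetical stored record carrying the similarity level data."""
    shot_index: int
    model_name: str
    similarity: int  # percent


def group_by_model(images):
    """Group images by most similar model, highest similarity first in each group."""
    groups = defaultdict(list)
    for im in images:
        groups[im.model_name].append(im)
    return {
        name: sorted(members, key=lambda im: im.similarity, reverse=True)
        for name, members in groups.items()
    }
```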

In this way, the CPU 20 starts storing audio data in the audio memory 29 in response to the release button 8 being half-pressed, which suppresses the power consumed in storing audio data.

In addition, the apparatus detects which of the audio model data stored in the RAM 26 the audio data most resembles, calculates the similarity level, and stores in the RAM 26 image data to which the calculated similarity level data has been added. The image data can therefore be classified by similar audio model data and displayed on the LCD 10 grouped by similar audio model data.

Furthermore, because the CPU 20 performs the above detection and calculation on audio data stored in the audio memory 29 during the period from 1 second before the time of shooting up to the time of shooting, continuous shooting with the digital camera 2 can also be handled.

In the above embodiment, the CPU 20 starts storing audio data in the audio memory 29 in response to the release button 8 being half-pressed, but the timing at which audio storage starts can be changed as appropriate.

Also, in the above embodiment the CPU 20 performs the detection and calculation on audio data stored in the audio memory 29 during the period from 1 second before the time of shooting up to the time of shooting, but the period of audio data used for the detection and calculation can be changed as appropriate.

Furthermore, although the audio model data is stored in the RAM 26 in the above embodiment, the digital camera 2 may instead be provided with a communication unit capable of connecting, via the Internet, to a data server that stores the audio model data. In that case, the CPU 20 connects to the data server via the communication unit and the Internet to perform the detection and calculation.

FIG. 9 shows another embodiment. Components that are the same as in the embodiment shown in FIGS. 1 to 6 are given the same reference numerals, and their detailed description is abbreviated. In this embodiment, the CPU 20 outputs an extraction signal to the feature extraction unit 31 via the data bus 21 in response to the release button 8 being fully pressed. In response to the extraction signal, the feature extraction unit 31 extracts, as feature data, for example 30% (a preset proportion) of the audio data stored in the audio memory 29 during the period (the fourth predetermined time) from the end of the focus adjustment triggered by half-pressing the release button 8 until 2 seconds after the shooting triggered by fully pressing it. This data is hereinafter called limited audio data. The feature extraction unit 31 extracts 30% of the audio data, taken in order from the highest-frequency sound, as the limited audio data.

The method of extracting limited audio data from the audio data is not restricted to extracting high-frequency sound as described above and can be changed as appropriate. For example, the stored audio data may be divided into 0.5-second segments, and the 30% of segments with the highest sound pressure level (dB) extracted. The period from which limited audio data is extracted need only run from the end of focus adjustment until a predetermined time later; it preferably covers at least the period from the end of focus adjustment until shooting, but it too can be changed as appropriate.
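
The segment-based alternative can be sketched as follows. The function names, the rounding rule for how many segments make up 30%, and the choice to return the kept segments in chronological order are assumptions; ranking by mean square is used because it gives the same ordering as ranking by level in dB.

```python
def split_segments(samples, sample_rate, seconds=0.5):
    """Divide the samples into consecutive segments of `seconds` duration."""
    size = max(1, int(sample_rate * seconds))
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def mean_square(segment):
    """Mean square of a segment; monotonic in its level in dB."""
    return sum(s * s for s in segment) / len(segment)


def limited_audio(samples, sample_rate, ratio=0.3, seconds=0.5):
    """Keep the loudest `ratio` of segments, returned in chronological order."""
    segments = split_segments(samples, sample_rate, seconds)
    if not segments:
        return []
    keep = max(1, round(len(segments) * ratio))  # assumed rounding rule
    loudest = sorted(
        range(len(segments)),
        key=lambda i: mean_square(segments[i]),
        reverse=True,
    )[:keep]
    return [segments[i] for i in sorted(loudest)]
```

With ten 0.5-second segments of steadily increasing loudness, the call returns the last three segments, so about 30% of the original data is stored with the image.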

When shooting with the digital camera 2, the digital camera 2 is set to the shooting mode (Y in S1). In the shooting mode, the CPU 20 stores in the audio memory 29 the audio data captured by the microphone 6 and converted to digital data by the A/D converter 28 (S2).

When the user half-presses the release button 8 (Y in S3), focus adjustment is performed with the focus motor 32 (S4). When the user then fully presses the release button 8 (Y in S5), the CPU 20 performs shooting, temporarily stores the captured image data in the buffer memory 24, and outputs an extraction signal to the feature extraction unit 31 via the data bus 21. In response to the extraction signal, the feature extraction unit 31 extracts limited audio data from the audio data stored in the audio memory 29 during the period from the end of focus adjustment until 2 seconds after shooting (S6).

The CPU 20 adds the extracted limited audio data to the image data stored in the buffer memory 24 (S7) and stores the image data with the limited audio data added in the RAM 26 (S8).

Also, in this embodiment, when displaying image data stored in the RAM 26 on the LCD 10, the CPU 20 reproduces the limited audio data added to that image data and outputs it from the speaker 13.

In this way, limited audio data is extracted from the audio data stored in the audio memory 29 during the period from the end of focus adjustment until 2 seconds after shooting. The audio data from which the limited audio data is extracted therefore does not contain the drive sound of the focus motor 32 during focus adjustment, so the limited audio data can be extracted solely from sound external to the digital camera 2.

Also, because the CPU 20 stores in the RAM 26 image data to which the extracted limited audio data has been added, the sound at the time of shooting can be checked by reproducing the audio data while the amount of data stored in the RAM 26 is kept lower than in a type that stores the audio data as is. In addition, the power consumed by the CPU 20 when reproducing the audio data is lower than in a type that stores the audio data as is.

FIG. 1 is a front perspective view showing a digital camera embodying the present invention.
FIG. 2 is a rear perspective view showing the digital camera.
FIG. 3 is a block diagram showing the electrical configuration inside the digital camera.
FIG. 4 is an explanatory diagram showing the display method selection screen.
FIG. 5 is a flowchart showing the processing flow during shooting.
FIG. 6 is a flowchart showing the processing flow when displaying image data.
FIG. 7 is a flowchart showing the processing flow during shooting in the embodiment in which storage of audio data in the audio memory starts in response to the release button being half-pressed.
FIG. 8 is an explanatory diagram showing the display method selection screen of the embodiment shown in FIG. 7.
FIG. 9 is a flowchart showing the processing flow during shooting in the embodiment in which limited audio data is extracted from the audio data stored in the audio memory during the period from the end of focus adjustment until 2 seconds after shooting.

Explanation of symbols

2 Digital camera
8 Release button (shooting button)
9 Mode selector switch (mode switching means)
10 LCD (display means)
11 Menu key (operation member)
12 Zoom button (operation member)
13 Speaker (operation sound generating means)
20 CPU
26 RAM (related data storage means)
29 Audio memory (audio storage means)
31 Feature extraction unit (feature extraction means)
32 Focus motor (adjustment member)

Claims

An imaging apparatus comprising photographing means for photographing and audio storage means for storing external sound as audio data, the imaging apparatus further comprising:
feature extraction means for extracting, as feature data, a characteristic from the audio data stored in the audio storage means; and
related data storage means for storing image data photographed by the photographing means and the feature data extracted by the feature extraction means in association with each other.

The imaging apparatus according to claim 1, further comprising mode switching means for switching between a shooting mode in which photographing is performed by the photographing means and a control mode in which control other than photographing is performed,
wherein the audio storage means stores the audio data during the shooting mode and does not store the audio data during the control mode.

The imaging apparatus according to claim 2, further comprising:
an operation member provided so as to be operable;
operation sound generating means for generating an operation sound when the operation member is operated; and
operation sound stop means for stopping generation of the operation sound by the operation sound generating means when the operation member is operated during the shooting mode.

The imaging apparatus according to claim 1, further comprising a shooting button that is provided so as to be pressable, causes a shooting preparation operation to be performed when half-pressed, and causes photographing by the photographing means to be performed when fully pressed by being pressed further from the half-pressed state,
wherein the audio storage means starts storing the audio data in response to the shooting button being half-pressed.

The imaging apparatus according to claim 1, wherein the feature extraction means extracts the feature data from audio data stored during a period from a first predetermined time before photographing is performed by the photographing means to a second predetermined time after photographing is performed by the photographing means.

The imaging apparatus according to claim 1, wherein the feature extraction means extracts the feature data from audio data stored during a period from a third predetermined time before photographing is performed by the photographing means until the photographing is performed.

The imaging apparatus according to claim 1, further comprising an adjustment member that performs focus adjustment by moving a focus lens before photographing by the photographing means,
wherein the feature extraction means extracts the feature data from audio data stored during a period from when the focus adjustment by the adjustment member is completed until a fourth predetermined time later.

The imaging apparatus according to claim 7, wherein the fourth predetermined time is at least the time from when the focus adjustment by the adjustment member is completed until photographing is performed by the photographing means.

The imaging apparatus according to claim 1, further comprising audio data erasing means for erasing, from the audio data stored in the audio storage means, audio data for which a fifth predetermined time has elapsed since it was stored.

The imaging apparatus according to claim 1, wherein the feature extraction means extracts, as the feature data, a sound pressure from the audio data stored in the audio storage means.

The imaging apparatus according to claim 1, wherein the feature extraction means detects which of a plurality of audio model data, classified in advance by sound characteristics, the audio data stored in the audio storage means most resembles, calculates the similarity level, and uses the calculated similarity level as the feature data.

The imaging apparatus according to any one of claims 1 to 9, wherein the feature extraction means extracts, as the feature data, a preset proportion of the audio data stored in the audio storage means.

The imaging apparatus according to claim 11, further comprising model data storage means for storing the plurality of audio model data.

The imaging apparatus according to any one of claims 1 to 13, further comprising image data search means for searching, based on the feature data, the image data stored in the related data storage means for image data associated with specified feature data.

The imaging apparatus according to claim 14, further comprising display means for reproducing and displaying the image data found by the image data search means.

The imaging apparatus according to claim 1, wherein the imaging apparatus is a digital camera.

An image storage method comprising:
storing external sound as audio data when photographing;
extracting, as feature data, a characteristic from the stored audio data; and
storing photographed image data and the extracted feature data in association with each other.