JP5168375B2

JP5168375B2 - Imaging apparatus, imaging method, and program

Info

Publication number: JP5168375B2
Application number: JP2011053660A
Authority: JP
Inventors: 陽宮田
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2011-03-11
Filing date: 2011-03-11
Publication date: 2013-03-21
Anticipated expiration: 2029-03-18
Also published as: JP2011120306A

Description

The present invention relates to an imaging apparatus, an imaging method, and a program suitable for a digital camera or the like that can record audio together with a still image at the time of shooting.

Many digital cameras provide a function that stores audio data when a still image is shot, associates the resulting still image data with the audio data, and records them as a single data file.

In this type of digital camera, audio storage starts when a still image is shot by operating the shutter key, and stops when the shutter key is operated again or when a maximum time limit (for example, 10 seconds) has elapsed. The acquired audio data is then recorded in association with the still image data in accordance with a standard such as DCF (Design rule for Camera File system).

A technique has also been proposed in which, rather than starting at the still image shooting timing as described above, audio data is continuously stored and updated in a buffer memory in advance, and the audio data for a fixed time before and after the shooting timing, centered on that timing, is recorded (see, for example, Patent Document 1).

Patent Document 1: JP 2004-297177 A

In conventional techniques, including the one described in Patent Document 1, the audio acquisition timing is determined uniformly from the shooting timing of the associated still image, that is, the timing at which the shutter button is pressed. However, the optimal timing for shooting a still image does not necessarily coincide with the optimal timing for acquiring the corresponding audio. The recording timing may well be missed, for example with part of the audio that was wanted at the time of shooting being cut off.

The present invention has been made in view of the above circumstances. Its object is to provide an imaging apparatus, an imaging method, and a program that, when shooting a still image with audio, can appropriately control the start and end timings of audio acquisition according to the situation, rather than having them determined solely by the still image shooting timing.

The present invention is an imaging apparatus, or an imaging method and program therefor, comprising: imaging means for shooting a still image; audio input means for inputting audio; registration means for registering the voiceprint of an arbitrary person in advance; audio period determination means for determining an audio acquisition start timing and an audio acquisition end timing, spanning the still image shooting timing of the imaging means, according to the period during which the voiceprint of the audio input to the audio input means matches the voiceprint of the person registered in the registration means; recording means for recording the still image shot by the imaging means and audio; recording control means for cutting out the audio input to the audio input means from the acquisition start timing to the acquisition end timing determined by the audio period determination means, and causing the recording means to record the cut-out audio in association with the still image obtained by the imaging means; and image reconstruction means for, when the recording means holds a plurality of still images each associated with audio whose voiceprint matches the voiceprint of the person registered in the registration means, combining those still images to create one new image with audio and causing the recording means to record it.

An invention according to another aspect is an imaging apparatus, or an imaging method and program therefor, comprising: imaging means for shooting a still image; audio input means for inputting audio; audio period determination means for determining an audio acquisition start timing and an audio acquisition end timing, spanning the still image shooting timing of the imaging means, according to the period during which the sound quality of the audio input to the audio input means satisfies a predetermined condition; recording means for recording the still image shot by the imaging means and audio; and recording control means for cutting out the audio input to the audio input means from the acquisition start timing to the acquisition end timing determined by the audio period determination means, and causing the recording means to record the cut-out audio in association with the still image obtained by the imaging means, wherein the audio period determination means, after detecting the voice of a specific person at the still image shooting timing, determines the audio acquisition end timing based on the period until the voice of the specific person detected at that shooting timing is no longer detected.

According to the present invention, when shooting a still image with audio, the start and end timings of audio acquisition can be controlled appropriately according to the situation, rather than being determined solely by the still image shooting timing.

FIG. 1 is a block diagram showing the functional configuration of the electronic circuit of a digital camera according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing performed when shooting a still image with audio according to the first embodiment.
FIG. 3 is a diagram explaining the acquisition and recording of audio data according to the first embodiment.
FIG. 4 is a flowchart showing the processing performed when shooting a still image with audio according to a second embodiment of the present invention.
FIG. 5 is a diagram explaining the acquisition and recording of audio data according to the second embodiment.
FIG. 6 is a diagram showing an example of a slideshow image according to the second embodiment.

(First embodiment)
A first embodiment, in which the present invention is applied to a digital camera, is described below with reference to the drawings.

FIG. 1 shows the circuit configuration of a digital camera 10 according to this embodiment. In the figure, an optical lens unit 11 disposed on the front of the camera housing forms an optical image of the subject on the imaging surface of a CCD 12, which is a solid-state image sensor.

In the monitor state, also called through-image display or live view display, the image signal obtained by the CCD 12 is sent to an image processing unit 13, which digitizes it by performing correlated double sampling, automatic gain adjustment, and A/D conversion. The image processing unit 13 then applies color process processing, including pixel interpolation and gamma correction, to the digital image data and temporarily holds the result in an image buffer 14 via a system bus SB.

The image data held in the image buffer 14 is read out to the image processing unit 13 via the system bus SB, sent on to a display unit 15 via the system bus SB again, and displayed as a through image.

A microphone 16 whose acoustic directivity is substantially equal to the shooting angle of view of the optical lens unit 11 is disposed on the front of the camera housing, like the optical lens unit 11, and picks up sound from the direction of the subject. The microphone 16 converts the input sound into an electrical signal and outputs it to an audio processing unit 17.

When recording audio, the audio processing unit 17 digitizes the audio signal input from the microphone 16 and temporarily stores it in an audio buffer 18 configured as a ring buffer. As needed, the audio processing unit 17 also cuts out part of the audio data stored in the audio buffer 18, compresses it in a predetermined data file format, for example AAC (Moving Picture Experts Group-4 Advanced Audio Coding), to create an audio data file, and sends the file to a recording medium described later.

The audio processing unit 17 also includes a sound source circuit such as a PCM sound source. During audio playback, it decompresses the audio data file sent to it, converts it to an analog signal, and drives a speaker 19 provided on the back of the digital camera 10 to output the sound.

A control unit 20 performs overall control of the above circuits. The control unit 20 consists of a CPU and is directly connected to a main memory 21 and a program memory 22. The main memory 21 consists of SDRAM (synchronous DRAM) and functions as a work memory. The program memory 22 consists of an electrically rewritable nonvolatile memory and persistently stores operation programs, including the control for the shooting mode described later, voiceprint data of specific persons, and the like.

The control unit 20 reads the necessary programs, data, and the like from the program memory 22 and controls the entire digital camera 10 while temporarily expanding and storing them in the main memory 21 as appropriate.

The control unit 20 also executes various control operations in response to key operation signals input directly from a key input unit 23. Via the system bus SB, the control unit 20 is connected not only to the image processing unit 13, the image buffer 14, and the display unit 15, but also to a lens driving unit 24, a flash driving unit 25, a CCD driver 26, a compression/decompression processing unit 27, a memory card controller 28, and a USB interface (I/F) 29.

The key input unit 23 includes, for example, a power key, a shutter key, a zoom key, a shooting mode key, a playback mode key, a menu key, cursor ("↑", "→", "↓", "←") keys, a set key, and a scene program key.

The lens driving unit 24 receives a control signal from the control unit 20 and controls the rotation of a lens stepping motor (M) 30, moving some of the lenses that make up the optical lens unit 11, specifically the focus lens and the zoom lens, individually along the optical axis.

During still image shooting, the flash driving unit 25 receives a control signal from the control unit 20 and drives a flash unit 31, which consists of a plurality of white high-intensity LEDs, to light in synchronization with the shooting timing.

The CCD driver 26 drives the CCD 12 according to the shooting conditions and other settings in effect at that time.

When an image is shot in response to operation of the shutter key of the key input unit 23, the compression/decompression processing unit 27 compresses the image data held in the image buffer 14 in a predetermined data file format. For JPEG (Joint Photographic Experts Group), for example, it applies data compression such as DCT (discrete cosine transform) and Huffman coding to create an image data file with a greatly reduced data volume. The created image data file is recorded on a memory card 32 via the system bus SB and the memory card controller 28.

In playback mode, the compression/decompression processing unit 27 receives, via the system bus SB, image data read from the memory card 32 through the memory card controller 28. It decompresses the data by reversing the procedure used for recording to obtain image data of the original size, and holds this in the image buffer 14 via the system bus SB. The display unit 15 then displays the image for playback from the image data held in the image buffer 14.

The memory card controller 28 is connected to the memory card 32 via a card connector 33. The memory card 32 is detachably mounted in the digital camera 10 and serves as its recording medium, a memory for recording image data and the like. It contains a flash memory, which is a nonvolatile memory that can be electrically rewritten in units of blocks, and a drive circuit for that memory.

The USB interface 29 handles data transmission and reception when the digital camera 10 is connected to an external device, for example a personal computer, via a USB connector 34.

Next, the operation of the above embodiment is described.
The operations described below are executed by the control unit 20 when shooting a still image in the shooting mode, after it reads the operation program and data stored in the program memory 22, together with the registered voiceprint data of the voice of a specific person, and expands and stores them in the main memory 21.

The operation programs and the like stored in the program memory 22 include not only those stored there when the digital camera 10 was shipped from the factory, but also new operation programs, data, and the like downloaded from outside via the USB connector 34 and the USB interface 29 by connecting the digital camera 10 to a personal computer, for example when the digital camera 10 is upgraded.

The shutter key, which forms part of the key input unit 23, has a two-stage operation stroke. The first-stage stroke (hereinafter "half-press") puts the camera in a shooting preparation state, in which AF (autofocus) processing and AE (automatic exposure) processing are executed and the focus position and exposure value are locked. The second-stage stroke (hereinafter "full press") executes shooting.

FIG. 2 shows the processing in the still-image-with-audio shooting mode, mainly the handling of audio data. At the start, the camera waits for the shutter key of the key input unit 23 to be half-pressed by repeatedly checking whether it has been half-pressed (step S101).

When the shutter key is half-pressed, this is detected in step S101, the focus position and exposure value are locked as described above, and circular storage of audio using the microphone 16, the audio processing unit 17, and the audio buffer 18 is started (step S102).

In this circular storage state, the storage capacity of the audio buffer 18, which is configured as a ring buffer, is used efficiently: new audio data is acquired and accumulated while old audio data is successively erased, so that the most recent fixed duration of audio, for example 30 seconds, is always held temporarily.
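
The circular storage described above can be sketched in Python as follows. This is a minimal illustration, not the camera's firmware: the class name `AudioRingBuffer`, its methods, and the `sample_rate` parameter are hypothetical, and `collections.deque` with `maxlen` stands in for the ring buffer that forms the audio buffer 18.

```python
from collections import deque


class AudioRingBuffer:
    """Holds only the most recent `seconds` of audio samples (hypothetical API)."""

    def __init__(self, seconds, sample_rate):
        # A deque with maxlen silently discards the oldest items when full,
        # which mirrors "erasing old audio data while storing new audio data".
        self._samples = deque(maxlen=seconds * sample_rate)

    def write(self, chunk):
        """Append newly digitized samples; the oldest ones drop out if needed."""
        self._samples.extend(chunk)

    def snapshot(self):
        """Return the buffered samples, oldest first."""
        return list(self._samples)


if __name__ == "__main__":
    # Tiny demo: a 3-second buffer at 2 samples per second keeps 6 samples.
    buf = AudioRingBuffer(seconds=3, sample_rate=2)
    buf.write(range(10))
    print(buf.snapshot())
```

With a capacity of six samples, writing ten samples leaves only the last six in the buffer, just as the audio buffer 18 keeps only the latest fixed duration of audio.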

While this circular storage continues, it is determined whether the shutter key of the key input unit 23 has been fully pressed (step S103). If it has not, it is then determined whether the half-press of the shutter key has been released (step S104).

If the half-press is still held, the process returns to step S103. Steps S103 and S104 are then repeated, waiting for the shutter key either to be fully pressed or to be released from the half-press.

If the half-press is released, this is detected in step S104, the circular storage of audio in the audio buffer 18 is stopped (step S105), and the process returns to step S101 to wait for the next half-press.

If the shutter key is fully pressed while steps S103 and S104 are being repeated, this is detected in step S103. The main shooting of the still image is executed with the focus position and exposure value locked at that time, and the resulting image data is compressed into a data file by the compression/decompression processing unit 27 and recorded on the memory card 32 (step S106).
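
The control flow of steps S101 to S106 can be summarized as a small state machine. The sketch below is illustrative only: the function name `run_shutter_sequence`, the key-state strings, and the action labels are assumptions, and it presumes that the shutter key is sampled at regular intervals.

```python
def run_shutter_sequence(key_states):
    """Map sampled shutter-key states to actions (hypothetical sketch).

    key_states: iterable of "none", "half", or "full".
    Returns the list of actions taken, in order.
    """
    actions = []
    buffering = False
    for state in key_states:
        if not buffering:
            # S101: wait for a half-press; S102: lock AF/AE, start buffering.
            if state == "half":
                actions.append("lock AF/AE, start buffering")
                buffering = True
        elif state == "full":
            # S103 -> S106: full press shoots and records the still image.
            actions.append("shoot and record image")
            break
        elif state == "none":
            # S104 -> S105: half-press released, stop buffering, back to S101.
            actions.append("stop buffering")
            buffering = False
        # Otherwise the half-press is still held: keep looping S103/S104.
    return actions


if __name__ == "__main__":
    print(run_shutter_sequence(["none", "half", "half", "none", "half", "full"]))
```

The sketch ignores a full press that arrives without a preceding half-press, since a two-stage key passes through the half-press position first.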

At the same time, cut-out processing is performed according to the state of the audio data stored in the audio buffer 18 at that time. Specifically, the camera waits until the audio data being stored in the audio buffer 18 has remained at or above a preset sound pressure for a preset time, for example 3 seconds or more (step S107).

Once it is determined that the audio data has remained at or above the preset sound pressure for the preset time or longer, the camera next waits until the audio data stored in the audio buffer 18 has remained below the preset sound pressure for a preset time, for example 10 seconds or more (step S108).

Once it is determined that the audio data has remained below the preset sound pressure for the preset time or longer, the end of the period in which the audio data had remained at or above the preset sound pressure for the preset time or longer is set as the end position of the audio section corresponding to the shot still image (step S109).
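
Steps S107 to S109 can be illustrated with a forward scan over sound-pressure values. The sketch below is one possible reading, not the patented implementation. It assumes one sound-pressure value per second, counts a loud run already in progress at the shot, and treats any later loud audio that arrives before the qualifying silence as extending the section. The function name and parameters are hypothetical.

```python
def find_end_position(levels, shot, threshold, min_loud=3, min_quiet=10):
    """Return the exclusive end index of the audio section, or None.

    levels: sound-pressure values, one per second (assumption).
    shot: index of the sample at the shooting timing.
    """
    # Count loud samples just before the shot so that a loud run already
    # in progress at the shooting timing is measured from its beginning.
    loud_run = 0
    j = shot - 1
    while j >= 0 and levels[j] >= threshold:
        loud_run += 1
        j -= 1

    quiet_run = 0
    qualified = False  # True once a loud run of min_loud samples is seen (S107)
    end = None
    for i in range(shot, len(levels)):
        if levels[i] >= threshold:
            loud_run += 1
            quiet_run = 0
            if loud_run >= min_loud:
                qualified = True
            if qualified:
                end = i + 1  # the section currently ends after this sample
        else:
            loud_run = 0
            if qualified:
                quiet_run += 1
                if quiet_run >= min_quiet:  # S108 satisfied
                    return end              # S109
    return None  # no qualifying silence in the available data


if __name__ == "__main__":
    levels = [0] * 12 + [5] * 4 + [0] * 2 + [5] * 3 + [0] * 12
    print(find_end_position(levels, shot=17, threshold=3))
```

In the demo data, the shot falls in a 2-second gap between two loud runs, and the section ends where the second loud run ends, before the long silence.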

With that, the circular storage of audio data in the audio buffer 18 is stopped (step S110).

Next, processing is started that examines the stored audio data in the audio buffer 18 by going back in time from before the full press of the shutter key (step S111).

In this backward examination, a portion in which the audio data remained at or above the preset sound pressure for a preset time, for example 3 seconds or more, is searched for (step S112).

Going further back from the time position at which the audio data had remained at or above the preset sound pressure for the preset time or longer, a portion in which the audio data remained below the preset sound pressure for a preset time, for example 10 seconds or more, is then searched for (step S113).

When a state in which the audio data remained below the preset sound pressure for the preset time or longer is detected, the start of the period in which the sound pressure was at or above the preset sound pressure is set as the start position of the audio section corresponding to the shot still image (step S114).
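
The backward search of steps S111 to S114 mirrors the forward scan used for the end position. The following sketch makes the same assumptions: one sound-pressure value per second, a loud run in progress at the shot is counted in full, and all names are hypothetical. It returns `None` when the buffered data contains no qualifying silence. That fallback is a choice made for this sketch and is not stated in the source.

```python
def find_start_position(levels, shot, threshold, min_loud=3, min_quiet=10):
    """Return the start index of the audio section, or None.

    levels: buffered sound-pressure values, one per second (assumption).
    shot: index of the sample at the shooting timing.
    """
    # Count loud samples from the shot onward so that a loud run that
    # spans the shooting timing is measured over its full length.
    loud_run = 0
    j = shot
    while j < len(levels) and levels[j] >= threshold:
        loud_run += 1
        j += 1

    quiet_run = 0
    qualified = False  # True once a loud run of min_loud samples is seen (S112)
    start = None
    for i in range(shot - 1, -1, -1):  # S111: walk back in time
        if levels[i] >= threshold:
            loud_run += 1
            quiet_run = 0
            if loud_run >= min_loud:
                qualified = True
            if qualified:
                start = i  # earliest loud sample found so far
        else:
            loud_run = 0
            if qualified:
                quiet_run += 1
                if quiet_run >= min_quiet:  # S113 satisfied
                    return start            # S114
    return None  # buffer exhausted without a qualifying silence


if __name__ == "__main__":
    levels = [0] * 12 + [5] * 4 + [0] * 2 + [5] * 3 + [0] * 12
    print(find_start_position(levels, shot=17, threshold=3))
```

For the demo data, the section starts at the first sample of the loud run that precedes the shot, immediately after the long leading silence.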

The start and end positions of the section of audio data stored in the audio buffer 18 have thus been determined. The audio processing unit 17 cuts out the audio data in that range, compresses it, and creates an audio data file associated with the still image recorded in step S106, which the memory card controller 28 records on the memory card 32 (step S115). This completes the series of processing in FIG. 2.

If the file name of the still image data recorded on the memory card 32 is, for example, "CIMG0001.JPG", the audio data file stored in association with it may, as noted above, be given the same file name with a different extension, for example "CIMG0001.AAC".

In this case, the shared file name "CIMG0001" together with the extensions shows that the still image data file "CIMG0001.JPG", recorded in JPEG format, and the audio data file "CIMG0001.AAC", recorded in AAC format, are associated. During playback, both files can therefore be read from the memory card 32 in parallel, and the corresponding audio data output from the speaker 19 while the still image data is displayed on the display unit 15.
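
The name-based association can be illustrated with a short Python sketch. The helper names `audio_name_for` and `paired_files` are hypothetical, and the code only shows the naming idea from the text. It does not implement the DCF standard.

```python
from pathlib import PurePosixPath


def audio_name_for(image_name):
    """Derive the audio file name by swapping the extension for .AAC."""
    return str(PurePosixPath(image_name).with_suffix(".AAC"))


def paired_files(names):
    """Return (image, audio) pairs that share a base file name."""
    present = set(names)
    pairs = []
    for name in sorted(names):
        if PurePosixPath(name).suffix.upper() == ".JPG":
            audio = audio_name_for(name)
            if audio in present:
                pairs.append((name, audio))
    return pairs


if __name__ == "__main__":
    card = ["CIMG0001.JPG", "CIMG0001.AAC", "CIMG0002.JPG"]
    print(paired_files(card))
```

Here "CIMG0002.JPG" has no matching audio file, so it would be played back as a still image without audio.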

FIG. 3 illustrates the temporary storage of audio during still image shooting and the cutting out and recording of the audio section related to the still image from the stored content. As shown in FIG. 3(A), temporary storage of audio data in the audio buffer 18 starts at timing t11, when the shutter key of the key input unit 23 is half-pressed.

Audio data then accumulates in the audio buffer 18. Once the audio buffer 18 is full, old audio data is deleted as newly input audio data is stored. As a result, the audio buffer 18 always holds audio data for a fixed duration determined by its storage capacity.

Storage of audio data in the audio buffer 18 continues even after the shutter key is fully pressed at timing t12 and the still image is shot. Recording is then ended for the time being at timing t13, when a fixed time has elapsed after the sound stopped and fell below the preset sound pressure.

From the audio data thus stored in the audio buffer 18, the audio data of the section described above, which includes the shooting timing t12, is cut out.

FIG. 3(B) shows the change in sound pressure P in a case where the sound pressure of the audio data is below the preset sound pressure level at the still image shooting timing t12, and where, both before and after that timing, the sound pressure remains at or above the preset level for the preset time or longer.

In this case, through the processing described with FIG. 2, the start of the section before the shooting timing t12 in which the sound pressure remains at or above the preset level for the preset time or longer becomes the start timing t14 of the audio section to be recorded. Likewise, the end of the section after the shooting timing t12 in which the sound pressure remains at or above the preset level for the preset time or longer becomes the end timing t15 of the audio section to be recorded.

As a result, the section from timing t14 to timing t15, which spans the full-press shooting timing t12 and includes the two sections before and after it in which the sound pressure remains at or above the preset level for the preset time or longer, is cut out as audio data and recorded in association with the still image data.

FIG. 3(C) shows the change in sound pressure P in a case where, at the still image shooting timing t12, the sound pressure of the audio data is at or above the preset sound pressure level and remains so for the preset time or longer.

In this case, through the processing described with FIG. 2, the start of the section that contains the shooting timing t12 itself, and in which the sound pressure remains at or above the preset level for the preset time or longer, becomes the start timing t16 of the audio section to be recorded. Likewise, the end of that section becomes the end timing t17 of the audio section to be recorded.

As a result, the section from timing t16 to timing t17, which contains the full-press shooting timing t12 itself and in which the sound pressure remains at or above the preset level for the preset time or longer, is cut out as audio data and recorded in association with the still image data.

In the operation example shown in FIG. 2, even if there is no sound at the moment of shooting, audio data of at least 3 seconds both before and after it is guaranteed to be recorded in association with the still image. The recorded audio data is also guaranteed to contain no gap in sound of 10 seconds or more. However, a gap in sound of up to 20 seconds around the shooting timing is tolerated.

In steps S107 and S112, the camera waits for the audio data to remain at or above the preset sound pressure for a preset time, for example 3 seconds or more. If no audio at or above the preset sound pressure has been detected after this wait has lasted, for example, 20 seconds or more, the wait may be canceled and recording of the audio data stopped.

このようにすれば、撮影タイミングとは全く関係ないタイミングで発生した、撮影タイミングでの音声とは連続性のない音声データまでもが静止画像に関連付けて記録されるのを防ぐことができる。 In this way, it is possible to prevent even audio data that is generated at a timing that is completely unrelated to the imaging timing and has no continuity with the audio at the imaging timing from being recorded in association with the still image.

同様に、上記ステップＳ１０８，Ｓ１１３ではいずれも、音声データが予め設定した音圧未満で予め設定した時間、例えば１０秒間以上継続した状態となるのを待機するものとしたが、この待機状態が、例えば３０秒以上経過しても規定状態が検出されたかった場合には、音声データの記録を停止するようにしてもよい。このようにすれば、１つの静止画像に関連付けて記録される音声データの最大長を制限し、むやみに音声データが長引いてしまうのを防止できる。 Similarly, in steps S108 and S113 above, the process waits until the audio data has stayed below the preset sound pressure for a preset time, for example 10 seconds or more. If the specified state has not been detected even after this wait has lasted, for example, 30 seconds or more, recording of the audio data may be stopped. This limits the maximum length of the audio data recorded in association with one still image and prevents the audio data from becoming unnecessarily long.
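The two bounded waits described above (a loud run of about 3 seconds with a 20 second limit, and a quiet run of about 10 seconds with a 30 second limit) share one pattern, sketched below in Python. This is an illustration under assumptions: the audio is represented as one level per frame, the timeout is counted from the start of the wait, and the name `wait_for_run` is hypothetical.

```python
from typing import Callable, Optional, Sequence


def wait_for_run(
    levels: Sequence[float],
    condition: Callable[[float], bool],
    run_frames: int,
    timeout_frames: int,
) -> Optional[int]:
    """Return the frame index at which `condition` has held for run_frames
    consecutive frames, or None if timeout_frames pass first (or data ends).
    """
    run = 0
    for i, level in enumerate(levels):
        if i >= timeout_frames:
            return None  # give up: the wait exceeded its limit
        run = run + 1 if condition(level) else 0
        if run >= run_frames:
            return i
    return None


# Example with one frame per second and a hypothetical threshold of 5:
THRESHOLD = 5.0
is_loud = lambda level: level >= THRESHOLD
is_quiet = lambda level: level < THRESHOLD
```

Calling `wait_for_run(levels, is_loud, 3, 20)` mirrors the wait of steps S107 and S112, and `wait_for_run(levels, is_quiet, 10, 30)` mirrors that of steps S108 and S113; a `None` result is the case in which recording would be stopped.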

さらに上記動作例では、待機時間を順番に、３秒、１０秒、３秒、１０秒としたが、撮影状況に合わせて各々の時間を任意に設定してもよい。 Furthermore, in the above operation example, the standby times are set to 3 seconds, 10 seconds, 3 seconds, and 10 seconds in order, but each time may be arbitrarily set according to the shooting situation.

以上詳記した如く本実施形態によれば、音声付きの静止画像撮影時に、静止画像の撮影タイミングのみによって一律に決定されるのではなく、音声の状況に対応して取得の開始及び終了の各タイミングを適切に制御することが可能となる。 As described above in detail, according to the present embodiment, at the time of still image shooting with sound, it is not determined uniformly only by the shooting timing of the still image, but each of the start and end of acquisition corresponding to the state of the sound. The timing can be appropriately controlled.

また上記実施形態では、音声バッファ１８に記憶される音声の音圧が所定レベル以上を維持する期間に基づいて音声の取得開始タイミング及び取得終了タイミングを決定するものとしたので、撮影時の雰囲気をより忠実に再現できる。 In the above embodiment, the audio acquisition start timing and acquisition end timing are determined based on the period during which the sound pressure of the audio stored in the audio buffer 18 stays at or above a predetermined level, so the atmosphere at the time of shooting can be reproduced more faithfully.

さらに上記実施形態では、図３（Ｂ）で説明したように、撮影を行なったタイミングでは適正な音圧の音声が得られなかった場合でも、その前後で適正な音圧が得られる区間までを延長して記録するものとしたので、違和感なく撮影時の状況を忠実に再現できる。 Furthermore, in the above embodiment, as described with reference to FIG. 3B, even when audio of adequate sound pressure is not obtained at the moment of shooting, the recording is extended to the sections before and after it in which adequate sound pressure is obtained, so the situation at the time of shooting can be faithfully reproduced without a sense of incongruity.

（第２の実施形態） (Second Embodiment)
以下本発明をデジタルカメラに適用した場合の第２の実施形態について図面を参照して説明する。 A second embodiment in which the present invention is applied to a digital camera will be described below with reference to the drawings.

なお、本実施形態に係るデジタルカメラ１０′の回路構成に関しては、基本的に上記図１に示した内容と同様であるため、同一部分には同一符号を用いることとし、その図示と説明とを省略する。 The circuit configuration of the digital camera 10′ according to this embodiment is basically the same as that shown in FIG. 1, so the same reference numerals are used for the same parts, and their illustration and description are omitted.

加えて本実施形態では、予め複数の人物の声紋がプログラムメモリ２２に登録可能であり、音声処理部１７が、プログラムメモリ２２に登録された人物の声紋と、マイクロホン１６から入力して音声バッファ１８に記憶する音声データ中の声紋との一致比較を行なう解析機能を有しているものとする。 In addition, in this embodiment, the voiceprints of a plurality of persons can be registered in the program memory 22 in advance, and the audio processing unit 17 has an analysis function that compares the voiceprints of the persons registered in the program memory 22 against the voiceprints in the audio data that is input from the microphone 16 and stored in the audio buffer 18.

音声処理部１７は他に、マイクロホン１６から入力される音声中に所定の音圧レベル以上で人間の声が含まれているか否かを判断する解析機能等の各種音声処理機能を有するものとする。 In addition, the audio processing unit 17 has various audio processing functions, such as an analysis function for determining whether the audio input from the microphone 16 contains a human voice at or above a predetermined sound pressure level.

図４は、音声付き静止画撮影モード時の、主として音声データの取扱いの処理内容を示すものである。 FIG. 4 mainly shows the processing contents of handling audio data in the still image shooting mode with audio.

その当初には、プログラムメモリ２２に予め登録されている人物の声紋中から、静止画像の撮影に伴う音声の記録対象となる人物の声紋を選択して指定する（ステップＳ２０１）。 Initially, a voice print of a person who is a recording target of a sound accompanying a still image shooting is selected and specified from the voice prints of the person registered in advance in the program memory 22 (step S201).

この声紋指定に際しては、そのいずれかを指定するべくキー入力部２３でのキー操作を促すガイドメッセージと共に、登録されている声紋の人物名を表示部１５で一覧表示し、所定のキー操作により人物名が選択されると、選択された人物名の声紋が指定されたものとする。 For this voiceprint designation, the names of the persons whose voiceprints are registered are listed on the display unit 15 together with a guide message prompting a key operation on the key input unit 23 to designate one of them; when a person name is selected by a predetermined key operation, the voiceprint of the selected person is treated as designated.

その後、マイクロホン１６から入力される音声に人間の声が含まれているか否かを音声処理部１７を用いて判断する（ステップＳ２０２）。 Thereafter, it is determined using the voice processing unit 17 whether or not the voice input from the microphone 16 includes a human voice (step S202).

ここで音声処理部１７は、マイクロホン１６から入力される音声の周波数スペクトルを分析し、人間の声に特有の周波数帯での検出レベルが所定の値以上得られているか否かにより音声に人間の声が含まれているか否かを判断する。 Here, the voice processing unit 17 analyzes the frequency spectrum of the voice input from the microphone 16 and determines whether or not the voice level is higher than a predetermined value in the frequency band specific to the human voice. Determine if voice is included.
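The band-level test described above can be sketched in Python as follows. This is a rough illustration only: the 300 to 3400 Hz band, the `min_level` threshold, and the function names are assumptions that do not appear in the source, and a plain discrete Fourier transform stands in for whatever spectrum analysis the audio processing unit 17 actually performs.

```python
import cmath
import math
from typing import Sequence


def band_level(
    samples: Sequence[float], sample_rate: float, low_hz: float, high_hz: float
) -> float:
    """Sum of squared DFT magnitudes for bins inside [low_hz, high_hz],
    normalized by the square of the frame length."""
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            coeff = sum(
                s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)
            )
            total += abs(coeff) ** 2
    return total / (n * n)


def contains_voice(
    samples: Sequence[float],
    sample_rate: float,
    low_hz: float = 300.0,    # assumed lower edge of the voice band
    high_hz: float = 3400.0,  # assumed upper edge of the voice band
    min_level: float = 0.01,  # assumed detection threshold
) -> bool:
    """True when the level in the assumed voice band reaches min_level."""
    return band_level(samples, sample_rate, low_hz, high_hz) >= min_level


def sine(freq_hz: float, sample_rate: float, n: int) -> list:
    """Helper for the usage example: a unit-amplitude test tone."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]
```

For a 256-sample frame at 8000 Hz, a 1000 Hz tone falls inside the assumed band and is reported as voice-like, while a 62.5 Hz tone or silence is not. A real detector would need more than band energy to separate speech from other sounds in the same band.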

上記ステップＳ２０２で、マイクロホン１６から入力される音声に人間の声が含まれていると判断するまで、同様の処理を繰返し実行することで、人間の声が入力されるのを待機する。 By repeating the same process until it is determined in step S202 that the voice input from the microphone 16 includes a human voice, the process waits for a human voice to be input.

そして、人間の声が含まれているとステップＳ２０２で判断すると、次いでマイクロホン１６、音声処理部１７、及び音声バッファ１８を用いた音声の循環記憶を開始させる（ステップＳ２０３）。 Then, when it is determined in step S202 that a human voice is included, circular storage of audio using the microphone 16, the audio processing unit 17, and the audio buffer 18 is started (step S203).

この音声の循環記憶を続行しながら、キー入力部２３のシャッタキーが操作されたか否かを判断する（ステップＳ２０４）。ここでシャッタキーが操作されたと判断するまで、同様の処理を繰返し実行することで、音声データの記憶を継続しながらシャッタキーが操作されるのを待機する。 It is determined whether or not the shutter key of the key input unit 23 has been operated while continuing the circular storage of the sound (step S204). Until it is determined that the shutter key has been operated, the same processing is repeatedly executed to wait for the operation of the shutter key while continuing to store the audio data.
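Circular storage of this kind, in which the newest audio keeps overwriting the oldest, can be sketched in Python with a bounded deque. The class name `AudioRingBuffer`, the frame representation, and the capacity are illustrative assumptions and are not taken from the source.

```python
from collections import deque
from typing import Deque, List


class AudioRingBuffer:
    """Keeps only the most recent capacity_frames audio frames."""

    def __init__(self, capacity_frames: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._frames: Deque[bytes] = deque(maxlen=capacity_frames)

    def write(self, frame: bytes) -> None:
        """Store one frame, discarding the oldest frame if the buffer is full."""
        self._frames.append(frame)

    def snapshot(self) -> List[bytes]:
        """Return the buffered frames, oldest first."""
        return list(self._frames)


# Usage: the buffer keeps accepting frames while waiting for the shutter key.
buffer = AudioRingBuffer(capacity_frames=3)
for frame in (b"f1", b"f2", b"f3", b"f4", b"f5"):
    buffer.write(frame)
```

After the five writes above only the last three frames remain, which is what allows audio from before the shutter operation to be cut out later.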

そして、シャッタキーが操作されると、ステップＳ２０４でそれを判断し、静止画の本撮影を実行し、撮影で得た画像データを圧縮伸長処理部２７でデータ圧縮してデータファイル化し、メモリカード３２に記録する（ステップＳ２０５）。 When the shutter key is operated, this is detected in step S204, the actual still image capture is performed, and the image data obtained is compressed by the compression/decompression processing unit 27, converted into a data file, and recorded on the memory card 32 (step S205).

これと共に、その時点でマイクロホン１６から入力される音声を解析する（ステップＳ２０６）。この解析の結果、現在の音声中に、上記ステップＳ２０１で指定した人物の声紋を含んでいるか否かを判断する（ステップＳ２０７）。 Along with this, the audio input from the microphone 16 at that time is analyzed (step S206). Based on this analysis, it is determined whether the current audio contains the voiceprint of the person designated in step S201 (step S207).

ここで、指定した人物の声紋を含んでいると判断した場合には、上記ステップＳ２０１での指定通り、指定した人物の声紋を録音対象として設定する（ステップＳ２０８）。 If it is determined that the voice print of the designated person is included, the voice print of the designated person is set as a recording target as designated in step S201 (step S208).

また、上記ステップＳ２０７で指定した人物の声紋を含んでいないと判断した場合には、解析したすべての人物の声紋を録音対象として設定する（ステップＳ２０９）。 If it is determined in step S207 that the voiceprint of the designated person is not included, the voiceprints of all persons found by the analysis are set as recording targets (step S209).
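The branch between steps S208 and S209 amounts to a simple fallback rule, sketched here in Python. Voiceprints are represented by plain string labels purely for illustration, and the function name `choose_recording_targets` is hypothetical.

```python
from typing import Iterable, Set


def choose_recording_targets(designated: str, detected: Iterable[str]) -> Set[str]:
    """Pick the voiceprints to track for the audio clip.

    If the designated person's voiceprint is present in the current audio,
    track only that person (step S208); otherwise fall back to every
    voiceprint found by the analysis (step S209).
    """
    detected_set = set(detected)
    if designated in detected_set:
        return {designated}
    return detected_set
```

For example, with the child designated and the voices of the child and another person A detected, only the child is tracked; if only A and B are detected, both become recording targets.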

その後、上記ステップＳ２０８またはステップＳ２０９で設定した録音対象の人物の声紋が検出できない期間が一定時間、例えば１０秒間以上あるか否かを繰返し判断する（ステップＳ２１０）。 Thereafter, it is repeatedly determined whether or not the period during which the voice print of the person to be recorded set in step S208 or step S209 cannot be detected is a certain time, for example, 10 seconds or more (step S210).

ここで録音対象の人物の声紋が検出できない期間が一定時間以上あると判断した時点で、録音対象となる人物の音声が一定時間以上途絶えたこととなるため、最後に録音対象の人物の声紋が検出できた時点を、上記撮影した静止画に対応する音声データの区間の終了位置とするよう設定する（ステップＳ２１１）。 When it is determined that the voiceprint of the person to be recorded has gone undetected for the fixed time or longer, the voice of that person has been absent for at least that time, so the point at which the voiceprint was last detected is set as the end position of the audio data section corresponding to the captured still image (step S211).

次いで、音声バッファ１８内の音声データを、上記シャッタキーが全押し操作される前から時系列に遡って、記憶状態を判定する処理を開始する（ステップＳ２１２）。 Next, a process is started that examines the stored state of the audio data in the audio buffer 18, going back in time from before the shutter key was fully pressed (step S212).

この時間を遡った判定処理過程で、指定した声紋検出が予め設定した時間、例えば３秒間以上継続した部分をサーチする（ステップＳ２１３）。 In the determination process going back this time, a search is performed for a portion where the designated voiceprint detection continues for a preset time, for example, 3 seconds or more (step S213).

サーチの結果、指定した声紋検出が予め設定した時間以上継続した状態での始端位置を、上記撮影した静止画に対応する音声データの区間の開始位置とするよう設定する（ステップＳ２１４）。 As a result of the search, the start position in a state where the specified voice print detection continues for a preset time or longer is set as the start position of the section of the audio data corresponding to the photographed still image (step S214).

これにより、音声バッファ１８に記憶されている音声データの区間の開始位置と終了位置とが決定されたことになるので、音声処理部１７が当該範囲の音声データを切り出し、データ圧縮した上で上記ステップＳ２０５で記録した静止画像に関連付けた音声データファイルを作成し、メモリカードコントローラ２８よりメモリカード３２に記録させる（ステップＳ２１５）。 Thus, since the start position and end position of the section of the audio data stored in the audio buffer 18 are determined, the audio processing unit 17 cuts out the audio data in the range and compresses the data. An audio data file associated with the still image recorded in step S205 is created and recorded in the memory card 32 by the memory card controller 28 (step S215).
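Steps S210 to S214 can be sketched in Python as two scans over per-frame detection flags. This is an illustration under assumptions, not the patented procedure itself: `detected[i]` is taken to be true when a target voiceprint is found in frame i, the backward scan starts at the shutter frame, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_end(detected: Sequence[bool], shot: int, max_gap: int) -> int:
    """Scan forward from the shot; once the target has been missing for
    max_gap consecutive frames, return the last frame where it was detected."""
    last_hit = shot
    gap = 0
    for i in range(shot, len(detected)):
        if detected[i]:
            last_hit = i
            gap = 0
        else:
            gap += 1
            if gap >= max_gap:
                break
    return last_hit


def find_start(detected: Sequence[bool], shot: int, min_run: int) -> Optional[int]:
    """Scan backward from the shot for a run of at least min_run detected
    frames and return the first frame of that run, or None if there is none."""
    run = 0
    for i in range(shot, -1, -1):
        if detected[i]:
            run += 1
        else:
            if run >= min_run:
                return i + 1  # the run began just after this gap frame
            run = 0
    return 0 if run >= min_run else None


def clip_bounds(
    detected: Sequence[bool], shot: int, min_run: int, max_gap: int
) -> Optional[Tuple[int, int]]:
    """Combine both scans into (start, end), or None if no start is found."""
    start = find_start(detected, shot, min_run)
    if start is None:
        return None
    return start, find_end(detected, shot, max_gap)
```

With one frame per second, `min_run=3` and `max_gap=10` would correspond to the 3 second and 10 second examples given in the text. The frames between the returned start and end are the range that would be cut out of the buffer and compressed in step S215.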

その後、さらにその時点でマイクロホン１６から入力される音声に人間の声が含まれているか否かを音声処理部１７を用いて判断する（ステップＳ２１６）。 Thereafter, it is further determined using the voice processing unit 17 whether or not the voice input from the microphone 16 at that time includes a human voice (step S216).

ここでマイクロホン１６から入力される音声に人間の声が含まれていると判断した場合には、続けて次の静止画像を撮影するために、上記ステップＳ２０４からの処理に戻る。 If it is determined that the voice input from the microphone 16 includes a human voice, the process returns to step S204 in order to continuously capture the next still image.

こうして、マイクロホン１６から入力される音声に人間の声が含まれている間、上記ステップＳ２０４〜Ｓ２１６までの処理を繰返し実行する。 In this way, while the human voice is included in the voice input from the microphone 16, the processes from the above steps S204 to S216 are repeatedly executed.

そして、上記ステップＳ２１６で入力される音声に人間の声が含まれていないと判断すると、以上で、音声バッファ１８への音声データの循環記憶を停止させる（ステップＳ２１７）。 If it is determined that the voice input in step S216 does not contain a human voice, the circular storage of the voice data in the voice buffer 18 is stopped (step S217).

その後、上記記録した各静止画像のデータファイルとは別に、撮影した複数の静止画像データと各静止画像データに対応して記録した音声データとに基づき、それらをスライドショーとして再生するための１つの画像データファイル、具体的には例えば拡張子が「．ＡＶＩ」で表されるモーションＪＰＥＧファイルを作成してメモリカード３２に記録させ（ステップＳ２１８）、以上で図４に係る一連の処理を終了する。 After that, separately from the recorded data files of the individual still images, a single image data file for playing them back as a slide show is created from the plurality of captured still image data and the audio data recorded for each of them, specifically, for example, a Motion JPEG file with the extension ".AVI", and is recorded on the memory card 32 (step S218). This ends the series of processes shown in FIG. 4.

図５は、静止画像撮影時の音声の一時記憶と、その記憶した内容から静止画像に関連した音声区間を切り出して記録する場合について例示する。 FIG. 5 exemplifies a case where audio is temporarily stored at the time of still image shooting and a voice section related to a still image is cut out and recorded from the stored contents.

図５（Ｂ）に示すように、人間の声が入力され、それを音声処理部１７で認識したタイミングｔ２１で、音声バッファ１８への音声データの一時記憶が開始される。但し、この時点での人の声は、予め登録されていない他人Ａのものであるため、この時点では記録の対象とは判断しない。 As shown in FIG. 5B, temporary storage of audio data in the audio buffer 18 is started at a timing t21 when a human voice is input and recognized by the audio processing unit 17. However, since the voice of the person at this time is that of another person A who has not been registered in advance, it is not determined to be a recording target at this time.

そして、さらに図５（Ｃ）に示すように登録されていない他人Ｂの声と、図５（Ｄ）に示すように登録済みであり、且つ図４の処理当初で指定した自分の子供の声とが、順次時間差をもって入力されるものとする。 Further, the voice of another person B who is not registered as shown in FIG. 5C and the voice of his child who has been registered as shown in FIG. 5D and specified at the beginning of the process shown in FIG. Are sequentially input with a time difference.

その後、タイミングｔ２２でシャッタキーが操作されて静止画像の撮影を実行した後も登録／未登録含めて人の声が入力されている間は、音声バッファ１８への音声データの記憶を継続する。そして、人の声による音声が途切れて予め設定した音圧未満となったタイミングｔ２３で録音を一旦終了する。 After that, after the shutter key is operated at timing t22 and still image shooting is performed, the voice data is continuously stored in the voice buffer 18 as long as a human voice is input including registration / unregistration. Then, the recording is temporarily terminated at a timing t23 when the voice of the human voice is interrupted and becomes less than the preset sound pressure.

こうして音声バッファ１８に記憶した音声データに対し、撮影タイミングｔ２２を含む上述した区間の音声データを切り出す。 In this way, the audio data in the above-described section including the shooting timing t22 is cut out from the audio data stored in the audio buffer 18.

この場合、図５（Ｄ）に示したように、予め指定した、登録済みである自分の子供の声が上記撮影タイミングｔ２２も含めて音声中に入っているため、音声処理部１７は図５（Ｄ）に示す子供の声を解析して認識し、指定した声紋と一致が検出された始端のタイミングｔ２４から、一致が検出できなくなる終端のタイミングｔ２５までを記録する音声区間とする。 In this case, as shown in FIG. 5 (D), since the voice of his / her child who has been specified and has been registered is included in the sound including the photographing timing t22, the sound processing unit 17 performs the processing shown in FIG. The voice of the child shown in (D) is analyzed and recognized, and the voice section is recorded from the start timing t24 at which a match with the designated voiceprint is detected to the end timing t25 at which no match can be detected.

結果として、シャッタキーの操作による撮影タイミングｔ２２を挟み、指定した声紋と一致する声が入力され始めたタイミングｔ２４から、指定した声紋と一致する声の入力が途絶えるタイミングｔ２５までの区間が、音声データとして切り出されて静止画像データと関連付けて記録される。 As a result, the interval from the timing t24 at which the voice matching the designated voiceprint starts to be input to the timing t25 at which the input of the voice matching the designated voiceprint is interrupted is sandwiched between the shooting timing t22 by the operation of the shutter key. And recorded in association with still image data.

なお、上記図５において、図５（Ｄ）で示した、予め指定した自分の子供の声の入力が検出できなかった場合には、指定した声紋ではないものの、代わって人の声である図５（Ｂ）で示した他人Ａの声、図５（Ｃ）で示した他人Ｂの声が入力されている区間を切り出し、静止画像データと関連付けて記録することとなる。 In FIG. 5, when the input of the voice of his / her child designated in advance shown in FIG. 5 (D) is not detected, the voice is not a designated voice print but is a human voice instead. The section in which the voice of the other person A shown in FIG. 5B and the voice of the other person B shown in FIG. 5C are input is cut out and recorded in association with the still image data.

図６は、上記図４の処理によりデジタルカメラ１０′の電源を投入して常に人の声が入力され続け、且つ途中で電源を切断することなく子供の運動会の様子を撮影し続けた場合に、スライドショー表示のために自動的に作成される画像データを例示するものである。 FIG. 6 shows a case where the digital camera 10 'is turned on by the process of FIG. 4 and a human voice is continuously inputted, and a child's athletic meet is continuously photographed without turning off the power. FIG. 4 illustrates image data automatically created for slide show display.

この場合、図６（Ａ）〜図６（Ｅ）で示す各種目毎に静止画像の撮影を行ない、且つ指定した子供の声が音声データ中に入っている場合にはその子供の声が検出して範囲で、また指定した子供の声が入っていない場合には、人の声が連続して入っている範囲でそれぞれ静止画像データに関連付けて音声データのファイルが作成される。 In this case, a still image is captured for each event shown in FIGS. 6A to 6E. If the designated child's voice is present in the audio data, an audio data file is created in association with the still image data over the range in which that child's voice is detected; if the designated child's voice is not present, the file is created over the range in which human voices are continuously present.

そのため、全く人の声が途絶えた時点で、上記ステップＳ２１８の処理により図６（Ａ）〜図６（Ｅ）の静止画と対応する音声とによるスライドショー用の画像データが別途作成され、メモリカード３２に記録される。 Therefore, at the point when human voices have ceased entirely, the processing of step S218 separately creates slide show image data from the still images of FIGS. 6A to 6E and their corresponding audio, and records it on the memory card 32.

このスライドショー用の画像データの再生時には、図６（Ａ）〜図６（Ｅ）の静止画と対応する音声とが循環的に、且つ再生の停止を指示するまでエンドレスで再生され続ける。 At the time of reproducing the image data for the slide show, the still images in FIGS. 6A to 6E and the corresponding audio are continuously reproduced in an endless manner until the stop of the reproduction is instructed.

以上詳記した如く本実施形態によれば、一時的に記憶する音声に対して例えば人の声など、特定の音質となる所定の条件を満足する期間に基づいて音声の取得開始タイミング及び取得終了タイミングを決定するものとしたので、被写体に適した音質を設定することでその状況に適切に対応して音声の取得の開始及び終了の各タイミングを制御することが可能となる。 As described above in detail, according to the present embodiment, the voice acquisition start timing and the acquisition end based on a period of time that satisfies a predetermined condition of a specific sound quality, such as a human voice, with respect to the temporarily stored voice. Since the timing is determined, by setting the sound quality suitable for the subject, it is possible to control the start and end timings of the sound acquisition appropriately corresponding to the situation.

加えて上記実施形態では、特定の音質として、人の声を指定するものとしたので、スナップやポートレイトといった特に被写体を人とする多くの場合に設定するための操作が容易となり、音声の取得に関する手間を大幅に簡略化できる。 In addition, in the above embodiment, since the voice of a person is specified as the specific sound quality, the operation for setting in many cases where the subject is a person such as a snap or a portrait becomes easy, and the acquisition of the voice Can be greatly simplified.

また、上記実施形態では、予め声紋を登録しておくことで、特定の人物の声が検出された場合にその声が得られる区間を音声データとして記録するよう設定することができるため、他の人の声に惑わされず、被写体に適した音声データ部分のみを音声データとして静止画像のデータに関連付けて記録することができる。 In the above embodiment, by registering a voice print in advance, when a voice of a specific person is detected, a section in which the voice can be obtained can be set to be recorded as audio data. Only the audio data portion suitable for the subject can be recorded as audio data in association with still image data without being confused by the human voice.

さらに上記実施形態は、音声データの取得終了後に、複数の静止画像データが一連の撮影過程で得られた場合には、それら音声付きの複数の静止画像のデータからスライドショー用の画像データを別途作成して記録するものとしたので、ユーザの手間を煩わせることなく手軽にスライドショーデータを作成して視聴を楽しむことができる。 Further, in the above embodiment, when a plurality of still image data is obtained in a series of shooting processes after the acquisition of the audio data, slide show image data is separately created from the plurality of still image data with sound. Therefore, it is possible to easily create slideshow data and enjoy viewing without bothering the user.

また上記第１及び第２の各実施形態では、音声バッファ１８に記憶される音声が、予め設定した欠落期間を超えることなく所定の条件を満たす期間に応じ、静止画像の撮影タイミングを含んで音声の取得開始タイミング及び取得終了タイミングを決定するものとしたので、音声バッファ１８の記憶容量を有効に活用すると共に、画像に関係ないと思われる無駄な無音部分を音声に含ませることなく、効率的で実状に即した音声データを記録させることができる。 In each of the first and second embodiments, the audio acquisition start timing and acquisition end timing are determined so as to include the still image shooting timing, according to the period in which the audio stored in the audio buffer 18 satisfies a predetermined condition without exceeding a preset dropout period. This makes effective use of the storage capacity of the audio buffer 18 and allows efficient, realistic audio data to be recorded without including useless silent portions that appear unrelated to the image.

なお、上記実施形態は本発明をデジタルカメラに適用した場合について説明したが、本発明はこれに限ることなく、ビデオムービーカメラや、カメラ機能を有する携帯電話端末、ウェブカメラを搭載したモバイルコンピュータ等、その他の各種電子機器にも同様に適用可能となる。 In the above embodiment, the present invention is applied to a digital camera. However, the present invention is not limited to this, and a video movie camera, a mobile phone terminal having a camera function, a mobile computer equipped with a web camera, etc. The present invention can be similarly applied to other various electronic devices.

その他、本発明は上述した実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。また、上述した実施形態で実行される機能は可能な限り適宜組み合わせて実施しても良い。上述した実施形態には種々の段階が含まれており、開示される複数の構成要件による適宜の組み合せにより種々の発明が抽出され得る。例えば、実施形態に示される全構成要件からいくつかの構成要件が削除されても、効果が得られるのであれば、この構成要件が削除された構成が発明として抽出され得る。 In addition, the present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the invention in the implementation stage. Further, the functions executed in the above-described embodiments may be combined as appropriate as possible. The above-described embodiment includes various stages, and various inventions can be extracted by an appropriate combination of a plurality of disclosed constituent elements. For example, even if several constituent requirements are deleted from all the constituent requirements shown in the embodiment, if an effect is obtained, a configuration from which the constituent requirements are deleted can be extracted as an invention.

１０，１０′…デジタルカメラ、１１…光学レンズユニット、１２…ＣＣＤ、１３…画像処理部、１４…画像バッファ、１５…表示部、１６…マイクロホン、１７…音声処理部、１８…音声バッファ（リングバッファ）、１９…スピーカ、２０…制御部、２１…メインメモリ、２２…プログラムメモリ、２３…キー入力部、２４…レンズ駆動部、２５…フラッシュ駆動部、２６…ＣＣＤドライバ、２７…圧縮伸長処理部、２８…メモリカードコントローラ、２９…ＵＳＢインタフェース（Ｉ／Ｆ）、３０…レンズ用ステッピングモータ（Ｍ）、３１…フラッシュ部、３２…メモリカード、３３…カードコネクタ、３４…ＵＳＢコネクタ、ＳＢ…システムバス。 DESCRIPTION OF SYMBOLS 10, 10′ ... Digital camera, 11 ... Optical lens unit, 12 ... CCD, 13 ... Image processing unit, 14 ... Image buffer, 15 ... Display unit, 16 ... Microphone, 17 ... Audio processing unit, 18 ... Audio buffer (ring buffer), 19 ... Speaker, 20 ... Control unit, 21 ... Main memory, 22 ... Program memory, 23 ... Key input unit, 24 ... Lens drive unit, 25 ... Flash drive unit, 26 ... CCD driver, 27 ... Compression/decompression processing unit, 28 ... Memory card controller, 29 ... USB interface (I/F), 30 ... Lens stepping motor (M), 31 ... Flash unit, 32 ... Memory card, 33 ... Card connector, 34 ... USB connector, SB ... System bus.

Claims

Imaging means for capturing still images;
Voice input means for inputting voice;
Registration means for registering a voice print of an arbitrary person in advance;
The voice acquisition start timing and the acquisition end including the still image shooting timing in the imaging unit according to the period in which the voice print of the voice input to the voice input unit coincides with the voice print of the person registered in the registration unit An audio period determining means for determining timing;
Recording means for recording a still image and sound taken by the imaging means;
The audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined by the audio period determination unit is cut out, and the cut out audio is recorded in the recording unit in association with the still image obtained by the imaging unit. Recording control means;
When a plurality of still images associated with a voice whose voice print matches a person's voice print registered in the registration means are recorded in the recording means, the corresponding still images are combined to create a new image with sound. And an image reconstructing unit that causes the recording unit to record the image.

Imaging means for capturing still images;
Voice input means for inputting voice;
A sound period determining means for determining a sound acquisition start timing and an acquisition end timing including a still image shooting timing in the image pickup means according to a period in which the sound quality of the sound input to the sound input means satisfies a predetermined condition;
Recording means for recording a still image and sound taken by the imaging means;
The audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined by the audio period determination unit is cut out, and the cut out audio is recorded in the recording unit in association with the still image obtained by the imaging unit. Recording control means;
wherein the sound period determining means determines the acquisition end timing based on the period from when the voice of a specific person is detected at the shooting timing of the still image until the voice of that specific person is no longer detected.

The sound period determining means includes a timing at which the sound input to the sound input means satisfies a predetermined condition without exceeding a preset missing period and includes a still image shooting timing at the image pickup means. The imaging apparatus according to claim 1, wherein an acquisition start timing and an acquisition end timing are determined.

Voice temporary storage means that continues to store while updating the voice for a predetermined time length input to the voice input means,
The sound period determining means determines the sound acquisition start timing before the still image shooting timing in the imaging means according to a period in which the sound stored in the sound temporary storage means satisfies a predetermined condition. The imaging apparatus according to claim 1, wherein the imaging apparatus is characterized in that:

The sound period determining means detects a position where the sound input to the sound input means satisfies a predetermined condition both before and after the shooting timing, and starts acquiring the sound up to the detected position. The imaging apparatus according to claim 1, wherein the timing or the acquisition end timing is extended.

An imaging unit including an imaging unit that captures still images, an audio input unit that inputs audio, a recording unit that records still images and audio captured by the imaging unit, and a registration unit that registers a voice print of an arbitrary person in advance An imaging method using an apparatus,
The voice acquisition start timing and acquisition end including the still image shooting timing in the imaging unit according to a period in which the voice voiceprint input to the voice input unit matches the voiceprint of the person registered in the registration unit An audio period determining step for determining timing;
The audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined in the audio period determination step is cut out, and the cut out audio is recorded in the recording unit in association with the still image obtained by the imaging unit. Recording control process;
When a plurality of still images associated with a voice whose voiceprint matches the voiceprint of the person registered in the registration unit are recorded by the recording control step, the corresponding still images are added together with a new voice. An image reconstructing step of creating an image and recording the image on the recording unit.

An imaging method in an imaging apparatus including an imaging unit that captures a still image, an audio input unit that inputs audio, and a recording unit that records still images and audio captured by the imaging unit,
An audio period determining step of determining an acquisition start timing and an acquisition end timing including a still image shooting timing in the imaging unit according to a period in which sound quality of the audio input to the audio input unit satisfies a predetermined condition;
The audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined in the audio period determination step is cut out, and the cut out audio is recorded in the recording unit in association with the still image obtained by the imaging unit. Recording control process;
wherein the audio period determining step determines the acquisition end timing based on the period from when the voice of a specific person is detected at the shooting timing of the still image until the voice of that specific person is no longer detected.

An imaging unit including an imaging unit that captures still images, an audio input unit that inputs audio, a recording unit that records still images and audio captured by the imaging unit, and a registration unit that registers a voice print of an arbitrary person in advance A program executed by a computer built in the apparatus, wherein the computer is
The voice acquisition start timing and acquisition end including the still image shooting timing in the imaging unit according to a period in which the voice voiceprint input to the voice input unit matches the voiceprint of the person registered in the registration unit Audio period determining means for determining timing;
The audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined by the audio period determining unit is cut out, and the cut out audio is recorded in the recording unit in association with the still image obtained by the imaging unit. Recording control means,
In the case where a plurality of still images associated with voices whose voice prints match the person's voice print registered in the registration unit are recorded in the recording unit, the corresponding still images are combined into a new image with sound. And a program that functions as image reconstruction means for recording in the recording unit.

A program executed by a computer built into an imaging apparatus including an imaging unit that captures a still image, an audio input unit that inputs audio, and a recording unit that records the still image captured by the imaging unit and audio, the program causing the computer to function as:
audio period determining means for determining an audio acquisition start timing and an acquisition end timing, including the still image shooting timing in the imaging unit, according to a period in which the sound quality of the audio input to the audio input unit satisfies a predetermined condition; and
recording control means for cutting out the audio input to the audio input unit from the acquisition start timing to the acquisition end timing determined by the audio period determining means, and recording the cut-out audio in the recording unit in association with the still image obtained by the imaging unit,
wherein the audio period determining means determines the acquisition end timing based on the period from when the voice of a specific person is detected at the shooting timing of the still image until the voice of that specific person is no longer detected.