JP2018152766A

JP2018152766A - File generation device, file generating method, and program

Info

Publication number: JP2018152766A
Application number: JP2017048468A
Authority: JP
Inventors: 亮及川; Akira Oikawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-03-14
Filing date: 2017-03-14
Publication date: 2018-09-27

Abstract

PROBLEM TO BE SOLVED: To make it possible to easily reproduce the sound corresponding to a still image when the still image is reproduced by a general reproduction apparatus. SOLUTION: The file generation device includes: first acquisition means for acquiring still image data; second acquisition means for acquiring audio data; and generation means for generating one moving image file from the still image data and the audio data so that a still image based on the still image data is reproduced during a period in which sound based on the audio data is reproduced. SELECTED DRAWING: Figure 2

Description

The present invention relates to a file generation device, a file generation method, and a program.

Some imaging apparatuses (such as digital video cameras) have a function of extracting the image data of one frame from recorded moving image data (moving image data that represents a subject image and is generated by moving image shooting) as still image data. With this function, still image data can be extracted from moving image data even if no operation for recording still image data is performed during moving image shooting, so still image data at the timing desired by the photographer can be obtained easily.

In addition, as a technique for preserving the atmosphere at the time of moving image shooting more vividly, a technique has been proposed in which, when still image data is extracted from the moving image data of a moving image with sound, desired audio data is also extracted from the audio data of that moving image (Patent Document 1). To easily reproduce sound based on the extracted audio data when reproducing a still image based on the extracted still image data, the still image data and the audio data must be associated with each other.

As one method for this association, generating a single still image file from the still image data and the audio data has been proposed. Another proposed method generates a data file of the still image data and a data file of the audio data separately and uses a management file to manage the management information that associates those data files.

Patent Document 1: JP 2006-295575 A

However, the method of generating a single still image file from still image data and audio data produces a still image file with a special file structure. Reproducing the still image and the sound from such a file therefore requires a dedicated reproduction apparatus capable of interpreting the special file structure.

A dedicated reproduction apparatus is also required when the data file of the still image data and the data file of the audio data are generated separately and the management information associating them is managed in a management file. Specifically, a dedicated reproduction apparatus capable of interpreting the management file is required.

Thus, in the related art, the sound corresponding to a still image cannot be reproduced easily when the still image is reproduced unless a dedicated reproduction apparatus is used. In other words, with a general reproduction apparatus, the sound corresponding to a still image cannot be reproduced easily when the still image is reproduced.

An object of the present invention is to provide a technique that allows the sound corresponding to a still image to be reproduced easily by a general reproduction apparatus when the still image is reproduced.

A first aspect of the present invention is a file generation device comprising:
first acquisition means for acquiring still image data;
second acquisition means for acquiring audio data; and
generation means for generating one moving image file from the still image data and the audio data so that a still image based on the still image data is reproduced during a period in which sound based on the audio data is reproduced.

A second aspect of the present invention is a file generation method comprising:
a step of acquiring still image data;
a step of acquiring audio data; and
a step of generating one moving image file from the still image data and the audio data so that a still image based on the still image data is reproduced during a period in which sound based on the audio data is reproduced.

A third aspect of the present invention is a program for causing a computer to function as each means of the file generation device described above.

According to the present invention, the sound corresponding to a still image can be reproduced easily by a general reproduction apparatus when the still image is reproduced.

FIG. 1 is a block diagram illustrating a configuration example of the imaging apparatus according to the present embodiment.
FIG. 2 is a flowchart illustrating an example of the processing flow of the imaging apparatus according to Example 1.
FIG. 3 is a diagram illustrating an example of the audio acquisition period according to Example 1.
FIG. 4 is a flowchart illustrating an example of the processing flow of the imaging apparatus according to Example 2.
FIG. 5 is a diagram illustrating a specific example of the processing of the imaging apparatus according to Example 2.
FIG. 6 is a flowchart illustrating an example of the processing flow of the imaging apparatus according to Example 3.
FIG. 7 is a diagram illustrating a specific example of the processing of the imaging apparatus according to Example 3.

Embodiments of the present invention are described below. FIG. 1 is a block diagram illustrating a configuration example of the imaging apparatus according to the present embodiment.

The photographing lens 100 guides light from the subject to the image sensor 101. A single-focus lens, a zoom lens, or the like can be used as the photographing lens 100.

The image sensor 101 converts the subject image (the optical image of the subject) formed on the image sensor 101 by the photographing lens 100 into an analog signal and outputs that analog signal. A CCD (charge-coupled device) sensor, a CMOS (complementary metal-oxide-semiconductor) sensor, or the like can be used as the image sensor 101.

The A/D converter 102 converts the analog signal output from the image sensor 101 into a digital signal (captured image data; image data representing the subject image) and outputs the captured image data.

The microphone 103 collects sound, generates an analog signal representing the collected sound, and outputs the generated analog signal.

The A/D converter 104 converts the analog signal output from the microphone 103 into a digital signal (audio data representing the collected sound) and outputs the audio data.

The microcomputer 105 controls the processing of each functional unit of the imaging apparatus.

The volatile memory 106 temporarily stores the captured image data output from the A/D converter 102, the audio data output from the A/D converter 104, and the like. The volatile memory 106 also temporarily stores data (still image data, moving image data, audio data, etc.) extracted from data files (moving image files, still image files, etc.).

Programs executed by the microcomputer 105 and the like are recorded in advance in the nonvolatile memory 111.

The still image codec 107 performs compression encoding (compression conversion) of still image data recorded in the volatile memory 106. For example, the still image codec 107 performs compression encoding that converts still image data into the JPEG format.

The moving image codec 108 performs compression encoding of moving image data recorded in the volatile memory 106, decompression decoding (decompression conversion) of moving image data recorded in the volatile memory 106, and the like. For example, the moving image codec 108 performs compression encoding that converts moving image data into the H.264 format, decompression decoding that converts moving image data from the H.264 format, and the like.

The audio codec 109 performs compression encoding of audio data recorded in the volatile memory 106, decompression decoding of audio data recorded in the volatile memory 106, and the like. For example, the audio codec 109 performs compression encoding that converts audio data into the AAC format, decompression decoding that converts audio data from the AAC format, and the like.

The file generation unit 110 generates a still image file by generating header information for the still image data compression-encoded by the still image codec 107. For example, the file generation unit 110 generates a still image file in the JPEG format. The file generation unit 110 also multiplexes the moving image data compression-encoded by the moving image codec 108 with the audio data compression-encoded by the audio codec 109, and then generates a moving image file by generating header information for the multiplexed data. For example, the file generation unit 110 generates a moving image file in the MP4 format. Here, a moving image file is a data file of a moving image with sound.

The external memory 112 stores the data files (still image files, moving image files, etc.) generated by the file generation unit 110. For example, the microcomputer 105 records a data file generated by the file generation unit 110 in the external memory 112. A CompactFlash card, a SmartMedia card, a hard disk drive, or the like can be used as the external memory 112. The external memory 112 is detachable from the imaging apparatus. A storage unit built into the imaging apparatus may be used instead of the external memory 112.

The display member 113 displays images (moving images, still images, etc.) on its screen. For example, the display member 113 displays a moving image (moving image with sound) based on a moving image file recorded in the external memory 112, a still image based on a still image file recorded in the external memory 112, and the like (reproduction processing). The imaging apparatus is provided with a speaker (not shown), and sound is emitted from the speaker. A liquid crystal monitor or the like can be used as the display member 113.

The operation member 114 accepts user operations on the imaging apparatus. Examples include a user operation for still image shooting, a user operation for moving image shooting, a user operation for switching the operation mode of the imaging apparatus, and a user operation for changing shooting parameters. Still image shooting is shooting that obtains still image data as the captured image data. Moving image shooting is shooting that obtains moving image data as the captured image data. A switch, a button, a touch panel, a controller, or the like can be used as the operation member 114. The touch panel may or may not be provided on the screen of the display member 113.

<Example 1>
Example 1 of the present invention is described below. The imaging apparatus according to this example has the configuration shown in FIG. 1. FIG. 2 is a flowchart illustrating an example of the processing flow of the imaging apparatus according to this example.

First, in S201, the microcomputer 105 reproduces a moving image with sound based on a moving image file recorded in the external memory 112. For example, the microcomputer 105 selects one of a plurality of moving image files recorded in the external memory 112 in response to a user operation on the imaging apparatus, and then reproduces the moving image with sound based on the selected moving image file. The images of the moving image with sound are displayed on the screen of the display member 113, and its sound is emitted from the speaker (not shown). The microcomputer 105 then selects one frame of the reproduced moving image with sound in response to a user operation on the imaging apparatus. Hereinafter, the moving image file whose moving image with sound was reproduced in S201 is referred to as the "reproduced moving image file", and the frame selected in S201 is referred to as the "selected frame".

Next, in S202, the microcomputer 105 determines whether recompression encoding is necessary in order to acquire (extract; cut out) the image data of the selected frame from the reproduced moving image file as still image data. If recompression encoding is determined to be unnecessary, the process proceeds to S203. If recompression encoding is determined to be necessary, the process proceeds to S210.

For example, if the image data of the selected frame in the reproduced moving image file is intra-frame compressed, the image data of the selected frame can be acquired from the reproduced moving image file as still image data without recompression encoding. In that case, recompression encoding is determined to be unnecessary. On the other hand, if the image data of the selected frame in the reproduced moving image file is inter-frame compressed, the image data of the selected frame cannot be acquired from the reproduced moving image file as still image data without recompression encoding. In that case, recompression encoding is determined to be necessary.

The method of determining whether recompression encoding is necessary is not particularly limited. For example, recompression encoding may be determined to be unnecessary when the data format (compression format) of the moving image data contained in the reproduced moving image file is the JPEG format, and necessary otherwise.
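The S202 decision described above can be sketched as follows. This is a minimal illustration, not the implementation in the publication: the `FrameCoding` enum and the function names are hypothetical, and the format check simply mirrors the JPEG example given in the text.

```python
from enum import Enum


class FrameCoding(Enum):
    """How the selected frame is compressed (hypothetical labels)."""
    INTRA = "intra"  # compressed on its own, decodable without other frames
    INTER = "inter"  # compressed relative to other frames


def needs_recompression(coding: FrameCoding) -> bool:
    """S202 sketch: only inter-frame compressed data needs recompression."""
    return coding is FrameCoding.INTER


def needs_recompression_by_format(video_format: str) -> bool:
    """Alternative S202 sketch: decide from the data format alone.

    Recompression is treated as unnecessary only for the JPEG format,
    as in the example given in the text.
    """
    return video_format.upper() != "JPEG"


def next_step(coding: FrameCoding) -> str:
    """Return the step that follows S202 in the flow of FIG. 2."""
    return "S210" if needs_recompression(coding) else "S203"
```

With these definitions, an intra-frame compressed selected frame leads to S203 (direct extraction), and an inter-frame compressed one leads to S210 (decode, then re-encode as a still image).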

In S203, the microcomputer 105 acquires the image data of the selected frame as still image data from the reproduced moving image file (from the moving image data contained in the reproduced moving image file) and records the acquired still image data in the volatile memory 106. The process then proceeds to S204.

In S210, the moving image codec 108 performs decompression decoding of the moving image data contained in the reproduced moving image file. The microcomputer 105 acquires the image data of the selected frame as still image data from the decoded moving image data. The still image codec 107 performs compression encoding of the still image data acquired by the microcomputer 105. The microcomputer 105 then records the compression-encoded still image data in the volatile memory 106. The process then proceeds to S204.

In S204, the microcomputer 105 determines the period of the audio data to be acquired from the reproduced moving image file (from the audio data contained in the reproduced moving image file). Hereinafter, the period determined in S204 is referred to as the "audio acquisition period". In this example, at least part of the period of the moving image with sound based on the reproduced moving image file is determined as the audio acquisition period.

In S205, the microcomputer 105 determines (calculates) the length (time) of the audio acquisition period and then determines whether that length is equal to or greater than a threshold. If the length of the audio acquisition period is determined to be equal to or greater than the threshold, the process proceeds to S207. If it is determined to be less than the threshold, the process proceeds to S211. The threshold compared with the length of the audio acquisition period may or may not be a fixed value determined in advance by the manufacturer. For example, the threshold may be a value specified by the user.

In S211, the microcomputer 105 issues a predetermined notification to the user. For example, the microcomputer 105 performs control to display, on the screen of the display member 113, a message indicating that the audio acquisition period is not long enough. The process then returns to S204, and the audio acquisition period is determined again. The predetermined notification is not limited to displaying a predetermined message. Displaying a predetermined icon, outputting a predetermined sound, lighting a predetermined lamp, lighting a lamp in a predetermined pattern, and the like may be performed as the predetermined notification.
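The loop formed by S204, S205, and S211 can be sketched as below. It is an illustrative sketch under assumptions: candidate periods are supplied as `(start, end)` pairs in seconds, the notification is modeled as a callback, and all names are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Period = Tuple[float, float]  # (start_seconds, end_seconds)


def choose_audio_period(
    candidates: Iterable[Period],
    threshold_s: float,
    notify: Callable[[str], None],
) -> Optional[Period]:
    """Return the first candidate period that is long enough.

    Each candidate stands for one pass through S204. A period shorter
    than the threshold triggers the S211 notification and the next
    candidate is tried. None is returned if no candidate qualifies.
    """
    for start, end in candidates:      # S204: (re)determine the period
        length = end - start           # S205: calculate its length
        if length >= threshold_s:      # S205: compare with the threshold
            return (start, end)        # continue with S207
        notify("The audio acquisition period is too short.")  # S211
    return None
```

For example, with a threshold of 2 seconds, a first candidate of 0.5 seconds produces one notification and a second candidate of 3 seconds is accepted.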

In S207, the microcomputer 105 acquires the audio data of the audio acquisition period from the reproduced moving image file (from the audio data contained in the reproduced moving image file) and records the acquired audio data in the volatile memory 106.

Next, in S208 and S209, one moving image file is generated from the still image data of the selected frame and the audio data of the audio acquisition period so that the still image based on the still image data of the selected frame is reproduced during the period in which the sound based on the audio data of the audio acquisition period is reproduced. The audio data and the still image data are read from the volatile memory 106 and used.

Specifically, in S208, the microcomputer 105 determines (calculates) the frame rate of the moving image data to be contained in the moving image file, based on the length of the audio acquisition period. In this example, the moving image file is generated so that its moving image data contains only the one frame corresponding to the still image data of the selected frame. The microcomputer 105 therefore determines a frame rate that corresponds to the reproduction time of the still image based on the still image data. For example, if the audio acquisition period is 5 seconds long, the frame rate is determined so that the reproduction time of one frame is 5 seconds. Specifically, a frame rate of 0.2 (= 1/5) fps is determined.
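The S208 calculation is simply the reciprocal of the audio acquisition period. A minimal sketch follows, where the function name is hypothetical and an exact `Fraction` is used so that a value such as 1/5 is not rounded:

```python
from fractions import Fraction


def single_frame_rate(audio_period_s) -> Fraction:
    """Frame rate (fps) at which one frame lasts the whole audio period.

    audio_period_s is the length of the audio acquisition period in
    seconds and may be an int, a float, or a Fraction.
    """
    if audio_period_s <= 0:
        raise ValueError("audio acquisition period must be positive")
    return 1 / Fraction(audio_period_s)
```

For the 5-second period in the text, `single_frame_rate(5)` gives `Fraction(1, 5)`, that is, 0.2 fps.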

Then, in S209, the microcomputer 105 reads the still image data of the selected frame and the audio data of the audio acquisition period from the volatile memory 106. The file generation unit 110 multiplexes the read data and generates, for the multiplexed data, header information that sets the frame rate of the moving image data to the frame rate determined in S208. As a result, a moving image file is generated whose moving image data contains only the one frame corresponding to the still image data of the selected frame and in which the reproduction time of that one frame is equal to the length of the audio acquisition period. The microcomputer 105 then records the generated moving image file in the external memory 112.

The processing flow of the imaging apparatus according to this example is not limited to the flow described above. For example, the processes of S206 and S211 may be omitted. The process may proceed from S205 to S207.

FIG. 3 shows examples of the audio acquisition period. In FIG. 3, the arrow on the time axis points in the direction in which time passes. FIG. 3 shows the image of each frame of the reproduced moving image file, the audio waveform of the reproduced moving image file, and the audio acquisition periods. The frame rate of the reproduced moving image file is, for example, 30 fps. In FIG. 3, image n is the image of the selected frame.

The microcomputer 105 can automatically determine a period that depends on the time position of the selected frame as the audio acquisition period. For example, the microcomputer 105 can automatically determine, as the audio acquisition period, a period s1 from a time position that is a first time before the time position of the selected frame to the time position of the selected frame. In the example of FIG. 3, the period s1 includes the time position of the selected frame, but a period that extends up to the time position of the selected frame does not have to include that time position. The first time may or may not be a fixed time determined in advance by the manufacturer. For example, the first time may be a time specified by the user.

The microcomputer 105 can also automatically determine, as the audio acquisition period, a period s2 from the time position of the selected frame to a time position that is a second time after the time position of the selected frame. In the example of FIG. 3, the period s2 does not include the time position of the selected frame, but a period that starts from the time position of the selected frame may include that time position. The second time may or may not be a fixed time determined in advance by the manufacturer. For example, the second time may be a time specified by the user.

The microcomputer 105 can also automatically determine, as the audio acquisition period, a period s3 that straddles the time position of the selected frame. The period s3 runs from a time position that is a third time before the time position of the selected frame to a time position that is a fourth time after the time position of the selected frame. In the example of FIG. 3, the center of the period s3 coincides with the time position of the selected frame. That is, the third time is equal to the fourth time. The center of an audio acquisition period that straddles the time position of the selected frame does not have to coincide with the time position of the selected frame. That is, the third time may differ from the fourth time: it may be longer or shorter than the fourth time. Each of the third time and the fourth time may or may not be a fixed time determined in advance by the manufacturer. For example, at least one of the third time and the fourth time may be a time specified by the user.

The microcomputer 105 can also determine a period s4 specified by the user as the audio acquisition period. For example, the user specifies the period s4 with the operation member 114 while checking the moving image with sound displayed on the screen of the display member 113. The period s4 is an arbitrary period. It may or may not be a period related to the selected frame.
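The automatically determined periods s1, s2, and s3 can be sketched as simple interval calculations around the time position of the selected frame. The function names are hypothetical, and clamping the result to the range of the moving image is an assumption added here that the text does not describe.

```python
from typing import Tuple

Period = Tuple[float, float]  # (start_seconds, end_seconds)


def period_before(frame_t: float, first_time: float) -> Period:
    """s1: from first_time before the selected frame up to the frame."""
    return (max(0.0, frame_t - first_time), frame_t)


def period_after(frame_t: float, second_time: float, duration: float) -> Period:
    """s2: from the selected frame to second_time after the frame."""
    return (frame_t, min(duration, frame_t + second_time))


def period_around(
    frame_t: float, third_time: float, fourth_time: float, duration: float
) -> Period:
    """s3: from third_time before to fourth_time after the selected frame."""
    start = max(0.0, frame_t - third_time)
    end = min(duration, frame_t + fourth_time)
    return (start, end)
```

For a selected frame at 10 seconds in a 60-second moving image, `period_around(10.0, 2.0, 2.0, 60.0)` gives a 4-second period centered on the frame, which matches the case in which the third time equals the fourth time.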

As described above, according to this example, a single moving image file with a general file structure is generated from the still image data of the selected frame and the audio data of the audio acquisition period so that the still image based on that still image data is reproduced during the period in which the sound based on that audio data is reproduced. A moving image with sound based on a moving image file with a general file structure can be reproduced easily by a general reproduction apparatus. Therefore, according to this example, the sound corresponding to a still image can be reproduced easily by a general reproduction apparatus when the still image is reproduced.

<Example 2>
Example 2 of the present invention is described below. The following description covers in detail the points (configuration, processing, etc.) that differ from Example 1 and omits the points that are the same as in Example 1. The imaging apparatus according to this example has the configuration shown in FIG. 1. FIG. 4 is a flowchart illustrating an example of the processing flow of the imaging apparatus according to this example.

まず、Ｓ４０１にて、音声データのバッファリングが開始される。本実施例では、撮像装置の状態が撮影可能な状態になったタイミングで、音声データのバッファリングが開始される。なお、音声データのバッファリングが開始されるタイミングは、特に限定されない。例えば、撮像装置に対するユーザ操作に応じて、音声データのバッファリングが開始されてもよい。 First, in S401, buffering of audio data is started. In this embodiment, the buffering of the audio data is started at the timing when the state of the imaging apparatus becomes ready for photographing. Note that the timing at which audio data buffering is started is not particularly limited. For example, buffering of audio data may be started in response to a user operation on the imaging apparatus.

音声データのバッファリングについて具体的に説明する。マイクロホン１０３は、集音を行うことにより、集音された音声を表すアナログ信号を生成する。Ａ／Ｄ変換器１０４は、マイクロホン１０３によって生成されたアナログ信号を、デジタル信号である音声データに変換する。マイクロコンピュータ１０５は、Ａ／Ｄ変換器１０４によって生成された音声データを、揮発性メモリ１０６に記録する。マイクロコンピュータ１０５は、Ａ／Ｄ変換器１０４によって生成された音声データを揮発性メモリ１０６から読み出し、読み出した音声データを音声コーデック１０９に供給する。音声コーデック１０９は、供給された音声データの圧縮符号化を行う。そして、マイクロコンピュータ１０５は、圧縮符号化後の音声データを、揮発性メモリ１０６に記録する。 The audio data buffering will be specifically described. The microphone 103 collects sound to generate an analog signal representing the collected sound. The A / D converter 104 converts the analog signal generated by the microphone 103 into audio data that is a digital signal. The microcomputer 105 records the audio data generated by the A / D converter 104 in the volatile memory 106. The microcomputer 105 reads the audio data generated by the A / D converter 104 from the volatile memory 106 and supplies the read audio data to the audio codec 109. The audio codec 109 performs compression encoding of the supplied audio data. The microcomputer 105 records the audio data after compression encoding in the volatile memory 106.

撮像装置では、揮発性メモリ１０６が保持可能な音声データの最大時間（上限時間；最大保持時間）が予め設定されている。ここで、音声データのバッファリングで最大保持時間よりも長い時間の音声データが揮発性メモリ１０６に記録される場合を考える。この場合には、現在のタイミングまでの最大保持時間の音声データが揮発性メモリ１０６によって保持されるように、古い音声データが揮発性メモリ１０６から削除される。なお、最大保持時間は、メーカによって予め定められた固定時間であってもよいし、そうでなくてもよい。例えば、最大保持時間は、ユーザによって指定された時間であってもよい。 In the imaging apparatus, a maximum time (upper limit time; maximum holding time) of audio data that can be held in the volatile memory 106 is set in advance. Here, consider a case where audio data having a time longer than the maximum holding time is recorded in the volatile memory 106 by buffering of the audio data. In this case, the old audio data is deleted from the volatile memory 106 so that the audio data of the maximum holding time until the current timing is held by the volatile memory 106. Note that the maximum holding time may or may not be a fixed time predetermined by the manufacturer. For example, the maximum holding time may be a time specified by the user.
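The retention rule described above (keep only the most recent audio up to the maximum holding time and delete older audio) can be sketched as a small ring buffer. This is an illustrative sketch only, not the implementation used by the apparatus: the class name `AudioRingBuffer`, the chunk granularity, and the use of integer milliseconds are all assumptions introduced here.

```python
from collections import deque


class AudioRingBuffer:
    """Hypothetical buffer that holds at most `max_hold_ms` of recent audio."""

    def __init__(self, max_hold_ms: int):
        self.max_hold_ms = max_hold_ms
        self._chunks = deque()  # (duration_ms, payload) pairs, oldest first
        self.held_ms = 0        # total duration currently held

    def push(self, duration_ms: int, payload: bytes) -> None:
        """Append a new chunk, then delete the oldest chunks beyond the limit."""
        self._chunks.append((duration_ms, payload))
        self.held_ms += duration_ms
        while self.held_ms > self.max_hold_ms:
            old_ms, _ = self._chunks.popleft()
            self.held_ms -= old_ms

    def payloads(self) -> list:
        """Return the held payloads, oldest first."""
        return [payload for _, payload in self._chunks]
```

With a 3000 ms limit and five 1000 ms chunks pushed in order, only the last three chunks remain, which mirrors the behavior described for the volatile memory 106.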

次に、Ｓ４０２にて、ユーザは、静止画撮影を行うためのユーザ操作を、操作部材１１４を用いて行う。そして、Ｓ４０３にて、静止画撮影を行うためのユーザ操作に応じて、音声データのバッファリングが停止される。次に、Ｓ４０４にて、静止画撮影を行うためのユーザ操作に応じて、静止画撮影が行われる。それにより、静止画データが取得される。 Next, in S 402, the user performs a user operation for taking a still image using the operation member 114. In step S403, the audio data buffering is stopped in response to a user operation for taking a still image. Next, in S404, still image shooting is performed in response to a user operation for performing still image shooting. Thereby, still image data is acquired.

静止画撮影について具体的に説明する。撮像素子１０１は、撮影レンズ１００によって撮像素子１０１上に結像された被写体像をアナログ信号に変換する。Ａ／Ｄ変換器１０２は、撮像素子１０１によって生成されたアナログ信号を、デジタル信号であり且つ撮影画像データである静止画データに変換する。マイクロコンピュータ１０５は、Ａ／Ｄ変換器１０２によって生成された静止画データを、揮発性メモリ１０６に記録する。マイクロコンピュータ１０５は、Ａ／Ｄ変換器１０２によって生成された静止画データを揮発性メモリ１０６から読み出し、読み出した静止画データを静止画コーデック１０７に供給する。静止画コーデック１０７は、供給された静止画データの圧縮符号化を行う。そして、マイクロコンピュータ１０５は、圧縮符号化後の静止画データを、揮発性メモリ１０６に記録する。 The still image shooting will be specifically described. The image sensor 101 converts the subject image formed on the image sensor 101 by the photographing lens 100 into an analog signal. The A/D converter 102 converts the analog signal generated by the image sensor 101 into still image data that is a digital signal and is captured image data. The microcomputer 105 records the still image data generated by the A/D converter 102 in the volatile memory 106. The microcomputer 105 reads the still image data generated by the A/D converter 102 from the volatile memory 106 and supplies the read still image data to the still image codec 107. The still image codec 107 performs compression encoding of the supplied still image data. Then, the microcomputer 105 records the still image data after compression encoding in the volatile memory 106.

次に、Ｓ４０５にて、マイクロコンピュータ１０５は、揮発性メモリ１０６に保持されている音声データの時間が最小記録時間以上であるか否かを判断する。ここで、揮発性メモリ１０６に保持されている音声データは、音声データのバッファリングによって揮発性メモリ１０６に記録された音声データである。揮発性メモリ１０６に保持されている音声データの時間が最小記録時間以上であると判断された場合には、Ｓ４０６へ処理が進められる。揮発性メモリ１０６に保持されている音声データの時間が最小記録時間未満であると判断された場合には、Ｓ４１１へ処理が進められる。なお、最小記録時間は、メーカによって予め定められた固定時間であってもよいし、そうでなくてもよい。例えば、最小記録時間は、ユーザによって指定された時間であってもよい。 In step S 405, the microcomputer 105 determines whether the time of the audio data held in the volatile memory 106 is equal to or longer than the minimum recording time. Here, the sound data held in the volatile memory 106 is sound data recorded in the volatile memory 106 by sound data buffering. If it is determined that the time of the audio data held in the volatile memory 106 is equal to or longer than the minimum recording time, the process proceeds to S406. If it is determined that the time of the audio data held in the volatile memory 106 is less than the minimum recording time, the process proceeds to S411. Note that the minimum recording time may or may not be a fixed time determined in advance by the manufacturer. For example, the minimum recording time may be a time specified by the user.

Ｓ４１１にて、Ｓ４０４の静止画撮影によって揮発性メモリ１０６に記録された静止画データから、静止画ファイルが生成される。具体的には、マイクロコンピュータ１０５は、Ｓ４０４の静止画撮影によって得られた静止画データを、揮発性メモリ１０６から読み出す。そして、ファイル生成部１１０は、読み出された静止画データから、静止画ファイルを生成する。その後、マイクロコンピュータ１０５は、生成された静止画ファイルを、外部メモリ１１２に記録する。 In S411, a still image file is generated from the still image data recorded in the volatile memory 106 by the still image shooting in S404. Specifically, the microcomputer 105 reads out the still image data obtained by the still image shooting in S 404 from the volatile memory 106. Then, the file generation unit 110 generates a still image file from the read still image data. Thereafter, the microcomputer 105 records the generated still image file in the external memory 112.

Ｓ４０６にて、マイクロコンピュータ１０５は、揮発性メモリ１０６に保持されている音声データの時間が最大記録時間以上であるか否かを判断する。ここで、揮発性メモリ１０６に保持されている音声データは、音声データのバッファリングによって揮発性メモリ１０６に記録された音声データである。揮発性メモリ１０６に保持されている音声データの時間が最大記録時間以上であると判断された場合には、Ｓ４０７へ処理が進められる。揮発性メモリ１０６に保持されている音声データの時間が最大記録時間未満であると判断された場合には、Ｓ４１２へ処理が進められる。最大記録時間は、最小記録時間よりも長く、最大保持時間以下である。なお、最大記録時間は、メーカによって予め定められた固定時間であってもよいし、そうでなくてもよい。例えば、最大記録時間は、ユーザによって指定された時間であってもよい。 In step S406, the microcomputer 105 determines whether the time of the audio data held in the volatile memory 106 is equal to or longer than the maximum recording time. Here, the sound data held in the volatile memory 106 is sound data recorded in the volatile memory 106 by sound data buffering. If it is determined that the time of the audio data held in the volatile memory 106 is equal to or longer than the maximum recording time, the process proceeds to S407. If it is determined that the time of the audio data held in the volatile memory 106 is less than the maximum recording time, the process proceeds to S412. The maximum recording time is longer than the minimum recording time and not longer than the maximum holding time. Note that the maximum recording time may or may not be a fixed time predetermined by the manufacturer. For example, the maximum recording time may be a time specified by the user.

Ｓ４１２にて、マイクロコンピュータ１０５は、音声データのバッファリングによって揮発性メモリ１０６に記録された音声データ（全体）を、揮発性メモリ１０６から読み出す。その後、Ｓ４０８へ処理が進められる。 In step S 412, the microcomputer 105 reads the audio data (entire) recorded in the volatile memory 106 by the audio data buffering from the volatile memory 106. Thereafter, the process proceeds to S408.

Ｓ４０７にて、マイクロコンピュータ１０５は、音声データのバッファリングによって揮発性メモリ１０６に記録された、バッファリングが終了したタイミングまでの最大記録時間分の音声データを、揮発性メモリ１０６から読み出す。その後、Ｓ４０８へ処理が進められる。 In step S 407, the microcomputer 105 reads, from the volatile memory 106, the audio data for the maximum recording time recorded in the volatile memory 106 by the buffering of the audio data until the timing when the buffering is completed. Thereafter, the process proceeds to S408.
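The branching of S405, S406, S407, S411, and S412 can be summarized in a few lines. The following is a sketch under stated assumptions: the function name `select_audio_ms` and the millisecond units are hypothetical, and returning `None` is merely a convenient way to represent the S411 branch, in which only a still image file is generated.

```python
from typing import Optional


def select_audio_ms(buffered_ms: int, min_record_ms: int, max_record_ms: int) -> Optional[int]:
    """Return how many trailing milliseconds of buffered audio to read.

    Returns None when no audio is used and only a still image file is generated.
    """
    if buffered_ms < min_record_ms:
        # S405 "no" -> S411: too little audio, generate a still image file
        return None
    if buffered_ms >= max_record_ms:
        # S406 "yes" -> S407: read only the most recent maximum recording time
        return max_record_ms
    # S406 "no" -> S412: read the entire buffered audio
    return buffered_ms
```

For example, with a minimum recording time of 1000 ms and a maximum recording time of 5000 ms, 500 ms of buffered audio yields no audio, 3000 ms is used in full, and 8000 ms is trimmed to the most recent 5000 ms.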

Ｓ４０８〜Ｓ４１０にて、動画ファイルが生成される。本実施例では、Ｓ４０７またはＳ４１２で読み出された音声データに基づく音声が再生される期間に、Ｓ４０４の静止画撮影によって得られた静止画データに基づく静止画が再生されるように、それらのデータから１つの動画ファイルが生成される。 In S408 to S410, a moving image file is generated. In the present embodiment, one moving image file is generated from the audio data read in S407 or S412 and the still image data obtained by the still image shooting in S404, so that the still image based on that still image data is reproduced during the period in which the sound based on that audio data is reproduced.

具体的には、Ｓ４０８にて、マイクロコンピュータ１０５は、Ｓ４０７またはＳ４１２で読み出した音声データの時間を判断（算出）する。 Specifically, in S408, the microcomputer 105 determines (calculates) the time of the audio data read in S407 or S412.

Ｓ４０９にて、マイクロコンピュータ１０５は、Ｓ４０８で判断した時間に基づいて、フレームレートを決定（算出）する。Ｓ４０９では、図２のＳ２０８の処理と同様の処理により、フレームレートが決定される。 In S409, the microcomputer 105 determines (calculates) the frame rate based on the time determined in S408. In S409, the frame rate is determined by a process similar to the process of S208 in FIG. 2.

Ｓ４１０にて、マイクロコンピュータ１０５は、Ｓ４０４の静止画撮影によって得られた静止画データを、揮発性メモリ１０６から読み出す。ファイル生成部１１０は、読み出された静止画データと、Ｓ４０７またはＳ４１２で読み出された音声データとの多重化を行う。そして、ファイル生成部１１０は、Ｓ４０９で決定されたフレームレートに動画データのフレームレートを制御するためのヘッダ情報を多重化後のデータに対して生成する。それにより、Ｓ４０４の静止画撮影によって得られた静止画データに対応する１フレームのみを動画データが含み、且つ、１フレームの再生時間がＳ４０８で判断した時間と等しい動画ファイルが生成される。その後、マイクロコンピュータ１０５は、生成された動画ファイルを、外部メモリ１１２に記録する。 In S410, the microcomputer 105 reads out the still image data obtained by the still image shooting in S404 from the volatile memory 106. The file generation unit 110 multiplexes the read still image data and the audio data read in S407 or S412. Then, the file generation unit 110 generates header information for the multiplexed data to control the frame rate of the moving image data to the frame rate determined in S409. As a result, a moving image file is generated in which the moving image data includes only one frame corresponding to the still image data obtained by the still image shooting in S404 and the reproduction time of one frame is equal to the time determined in S408. Thereafter, the microcomputer 105 records the generated moving image file in the external memory 112.
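Because the generated file contains a single frame whose reproduction time equals the audio duration, one natural reading is that the frame rate is the reciprocal of that duration. The sketch below assumes this relationship; the process of S208 is not described in this section, and the function names are hypothetical. Exact fractions are used to avoid rounding error.

```python
from fractions import Fraction


def single_frame_rate(audio_ms: int) -> Fraction:
    """Frame rate (frames per second) at which one frame lasts `audio_ms`."""
    if audio_ms <= 0:
        raise ValueError("audio duration must be positive")
    return Fraction(1000, audio_ms)


def frame_duration_ms(frame_rate: Fraction) -> Fraction:
    """Reproduction time of one frame, in milliseconds, at `frame_rate`."""
    return Fraction(1000) / frame_rate
```

Under this assumption, 5 seconds of audio gives a frame rate of 1/5 fps, and a single frame at that rate is displayed for exactly 5 seconds.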

なお、本実施例に係る撮像装置の処理フローは、上記処理フローに限られない。例えば、Ｓ４０５、Ｓ４０６、Ｓ４０７、及び、Ｓ４１１の少なくとも一部の処理が省略されてもよい。Ｓ４０４からＳ４１２へ処理が進められてよい。揮発性メモリ１０６に保持されている音声データの時間が最小記録時間以上であると判断された場合には、Ｓ４０５からＳ４１２へ処理が進められてもよい。 Note that the processing flow of the imaging apparatus according to the present embodiment is not limited to the above processing flow. For example, at least a part of the processing of S405, S406, S407, and S411 may be omitted. Processing may proceed from S404 to S412. If it is determined that the time of the audio data held in the volatile memory 106 is equal to or longer than the minimum recording time, the process may proceed from S405 to S412.

図５は、本実施例に係る撮像装置の処理の具体例を示す。図５において、時間軸の矢印の向きは、時間の経過方向である。図５には、バッファリングされた音声データの波形、静止画撮影によって取得された静止画データに基づく静止画、各種期間、及び、各種タイミングが示されている。 FIG. 5 shows a specific example of processing of the imaging apparatus according to the present embodiment. In FIG. 5, the direction of the arrow on the time axis is the elapsed time direction. FIG. 5 shows a waveform of buffered audio data, a still image based on still image data acquired by still image shooting, various periods, and various timings.

タイミングｔ１で、音声データのバッファリングが開始される（Ｓ４０１）。そして、タイミングｔ２で、静止画撮影のためのユーザ操作が行われ（Ｓ４０２）、音声データのバッファリングが停止され（Ｓ４０３）、静止画撮影（静止画データの取得）が行われる（Ｓ４０４）。 At timing t1, buffering of audio data is started (S401). At timing t2, a user operation for still image shooting is performed (S402), buffering of audio data is stopped (S403), and still image shooting (acquisition of still image data) is performed (S404).

タイミングｔ１からタイミングｔ２までの期間ｓ５に音声データのバッファリングが行われるが、期間ｓ５の時間は、最大保持時間よりも長い。そのため、音声データのバッファリングでは、現在のタイミングまでの最大保持時間の音声データが揮発性メモリ１０６によって保持されるように、古い音声データが揮発性メモリ１０６から削除される。その結果、タイミングｔ２において、タイミングｔ２までの期間ｓ５’の音声データが、揮発性メモリ１０６によって保持される。期間ｓ５’の時間は、最大保持時間と等しい。 The audio data is buffered in the period s5 from the timing t1 to the timing t2, but the period s5 is longer than the maximum holding time. Therefore, in audio data buffering, old audio data is deleted from the volatile memory 106 so that the audio data having the maximum retention time up to the current timing is held by the volatile memory 106. As a result, at the timing t2, the volatile memory 106 holds the audio data in the period s5 'up to the timing t2. The time of the period s5 'is equal to the maximum holding time.

最小記録時間は最大保持時間よりも短く、最大記録時間は最大保持時間以下である。そのため、Ｓ４０５からＳ４０６へ処理が進められ、Ｓ４０６からＳ４０７へ処理が進められる。ここでは、最大記録時間が最大保持時間と等しいとする。そのため、期間ｓ５’の音声データが揮発性メモリ１０６から取得され（Ｓ４０７）、期間ｓ５’の時間が判断され（Ｓ４０８）、期間ｓ５’の時間に基づいてフレームレートが決定される（Ｓ４０９）。そして、静止画撮影によって取得された静止画データ、揮発性メモリ１０６から取得された音声データ、及び、決定されたフレームレートから、１つの動画ファイルが生成される（Ｓ４１０）。 The minimum recording time is shorter than the maximum holding time, and the maximum recording time is not longer than the maximum holding time. Therefore, the process proceeds from S405 to S406, and from S406 to S407. Here, it is assumed that the maximum recording time is equal to the maximum holding time. Therefore, the audio data of the period s5' is acquired from the volatile memory 106 (S407), the time of the period s5' is determined (S408), and the frame rate is determined based on the time of the period s5' (S409). Then, one moving image file is generated from the still image data acquired by still image shooting, the audio data acquired from the volatile memory 106, and the determined frame rate (S410).

以上述べたように、本実施例によれば、静止画撮影を行うことにより、静止画データが取得され、静止画撮影が行われるタイミングに応じた期間に集音を行うことにより、音声データが取得される。そして、取得された音声データに基づく音声が再生される期間に、取得された静止画データに基づく静止画が再生されるように、それらのデータから、一般的なファイル構造を有する１つの動画ファイルが生成される。それにより、静止画を再生する際に当該静止画に対応する音声が一般的な再生装置で容易に再生可能となる。 As described above, according to the present embodiment, still image data is acquired by performing still image shooting, and audio data is acquired by collecting sound during a period corresponding to the timing at which the still image shooting is performed. Then, one moving image file having a general file structure is generated from those data so that the still image based on the acquired still image data is reproduced during the period in which the sound based on the acquired audio data is reproduced. Thereby, when a still image is reproduced, the sound corresponding to the still image can be easily reproduced by a general reproduction apparatus.

＜実施例３＞ <Example 3>
以下、本発明の実施例３について説明する。なお、以下では、実施例１，２と異なる点（構成、処理、等）について詳しく説明し、実施例１，２と同じ点についての説明は省略する。本実施例に係る撮像装置は、図１に示す構成を有する。図６は、本実施例に係る撮像装置の処理フローの一例を示すフローチャートである。 Embodiment 3 of the present invention will be described below. In the following, points (configuration, processing, etc.) that differ from the first and second embodiments will be described in detail, and descriptions of points that are the same as in the first and second embodiments will be omitted. The imaging apparatus according to the present embodiment has the configuration shown in FIG. 1. FIG. 6 is a flowchart illustrating an example of a processing flow of the imaging apparatus according to the present embodiment.

まず、Ｓ６０１にて、ユーザは、静止画撮影を行うためのユーザ操作を、操作部材１１４を用いて行う。次に、Ｓ６０２にて、静止画撮影を行うためのユーザ操作に応じて、静止画撮影が行われる。それにより、静止画データが取得される。そして、Ｓ６０３にて、静止画撮影を行うためのユーザ操作に応じて、音声データのバッファリングが開始される。次に、Ｓ６０４にて、音声データのバッファリングが行われる。 First, in S 601, the user performs a user operation for taking a still image using the operation member 114. Next, in S602, still image shooting is performed in response to a user operation for performing still image shooting. Thereby, still image data is acquired. In step S 603, buffering of audio data is started in response to a user operation for performing still image shooting. Next, in S604, audio data is buffered.

そして、Ｓ６０５にて、マイクロコンピュータ１０５は、揮発性メモリ１０６に保持されている音声データの時間が必要記録時間に達したか否かを判断する。ここで、揮発性メモリ１０６に保持されている音声データは、音声データのバッファリングによって揮発性メモリ１０６に記録された音声データである。揮発性メモリ１０６に保持されている音声データの時間が必要記録時間に達していないと判断された場合には、Ｓ６０４へ処理が戻される。そして、揮発性メモリ１０６に保持されている音声データの時間が必要記録時間に達するまで、音声データのバッファリングが継続される。揮発性メモリ１０６に保持されている音声データの時間が必要記録時間に達すると、Ｓ６０５からＳ６０６へ処理が進められる。なお、必要記録時間は、メーカによって予め定められた固定時間であってもよいし、そうでなくてもよい。例えば、必要記録時間は、ユーザによって指定された時間であってもよい。 In step S 605, the microcomputer 105 determines whether the time of the audio data held in the volatile memory 106 has reached the necessary recording time. Here, the sound data held in the volatile memory 106 is sound data recorded in the volatile memory 106 by sound data buffering. If it is determined that the time of the audio data held in the volatile memory 106 has not reached the required recording time, the process returns to S604. The audio data buffering is continued until the time of the audio data held in the volatile memory 106 reaches the required recording time. When the time of the audio data held in the volatile memory 106 reaches the required recording time, the process proceeds from S605 to S606. The required recording time may be a fixed time predetermined by the manufacturer or may not be so. For example, the necessary recording time may be a time specified by the user.
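The loop formed by S604 and S605 (continue buffering until the held audio reaches the required recording time) can be sketched as follows. The function name `buffer_until`, the representation of audio as `(duration_ms, payload)` chunks, and the millisecond units are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def buffer_until(chunks: Iterable[Tuple[int, bytes]], required_ms: int) -> Tuple[int, List[bytes]]:
    """Buffer audio chunks until at least `required_ms` of audio is held."""
    held_ms = 0
    buffered: List[bytes] = []
    for duration_ms, payload in chunks:
        # S604: buffer the next chunk of audio data
        buffered.append(payload)
        held_ms += duration_ms
        # S605: has the held audio reached the required recording time?
        if held_ms >= required_ms:
            break  # proceed to S606 (stop buffering)
    return held_ms, buffered
```

Because audio arrives in chunks in this sketch, the held duration may slightly exceed the required recording time; an actual apparatus could trim the surplus or use finer-grained chunks.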

Ｓ６０６にて、音声データのバッファリングが停止される。次に、Ｓ６０７にて、マイクロコンピュータ１０５は、音声データのバッファリングによって揮発性メモリ１０６に記録された音声データを、揮発性メモリ１０６から読み出す。 In S606, buffering of audio data is stopped. In step S 607, the microcomputer 105 reads out the audio data recorded in the volatile memory 106 by the audio data buffering from the volatile memory 106.

そして、Ｓ６０８〜Ｓ６１０にて、動画ファイルが生成される。本実施例では、Ｓ６０７で読み出された音声データに基づく音声が再生される期間に、Ｓ６０２の静止画撮影によって得られた静止画データに基づく静止画が再生されるように、それらのデータから１つの動画ファイルが生成される。 Then, in S608 to S610, a moving image file is generated. In the present embodiment, during the period in which the sound based on the sound data read out in S607 is reproduced, the still image based on the still image data obtained by the still image shooting in S602 is reproduced from the data. One moving image file is generated.

具体的には、Ｓ６０８にて、マイクロコンピュータ１０５は、Ｓ６０７で読み出した音声データの時間を判断（算出）する。 Specifically, in S608, the microcomputer 105 determines (calculates) the time of the audio data read out in S607.

Ｓ６０９にて、マイクロコンピュータ１０５は、Ｓ６０８で判断した時間に基づいて、フレームレートを決定（算出）する。 In S609, the microcomputer 105 determines (calculates) the frame rate based on the time determined in S608.

Ｓ６１０にて、マイクロコンピュータ１０５は、Ｓ６０２の静止画撮影によって得られた静止画データを、揮発性メモリ１０６から読み出す。ファイル生成部１１０は、読み出された静止画データと、Ｓ６０７で読み出された音声データとの多重化を行う。そして、ファイル生成部１１０は、Ｓ６０９で決定されたフレームレートに動画データのフレームレートを制御するためのヘッダ情報を多重化後のデータに対して生成する。それにより、Ｓ６０２の静止画撮影によって得られた静止画データに対応する１フレームのみを動画データが含み、且つ、１フレームの再生時間がＳ６０８で判断した時間と等しい動画ファイルが生成される。その後、マイクロコンピュータ１０５は、生成された動画ファイルを、外部メモリ１１２に記録する。 In S610, the microcomputer 105 reads the still image data obtained by the still image shooting in S602 from the volatile memory 106. The file generation unit 110 multiplexes the read still image data and the audio data read in S607. Then, the file generation unit 110 generates, for the multiplexed data, header information for controlling the frame rate of the moving image data to the frame rate determined in S609. As a result, a moving image file is generated in which the moving image data includes only one frame corresponding to the still image data obtained by the still image shooting in S602 and the reproduction time of that one frame is equal to the time determined in S608. Thereafter, the microcomputer 105 records the generated moving image file in the external memory 112.

なお、本実施例に係る撮像装置の処理フローは、上記処理フローに限られない。例えば、Ｓ６０５の処理が省略されてもよい。Ｓ６０４からＳ６０６へ処理が進められてよい。 Note that the processing flow of the imaging apparatus according to the present embodiment is not limited to the above processing flow. For example, the process of S605 may be omitted. Processing may proceed from S604 to S606.

図７は、本実施例に係る撮像装置の処理の具体例を示す。図７において、時間軸の矢印の向きは、時間の経過方向である。図７には、静止画撮影によって取得された静止画データに基づく静止画、バッファリングされた音声データの波形、各種期間、及び、各種タイミングが示されている。 FIG. 7 shows a specific example of processing of the imaging apparatus according to the present embodiment. In FIG. 7, the direction of the arrow on the time axis is the direction of time passage. FIG. 7 shows still images based on still image data acquired by still image shooting, waveforms of buffered audio data, various periods, and various timings.

タイミングｔ３で、静止画撮影のためのユーザ操作が行われ（Ｓ６０１）、静止画撮影（静止画データの取得）が行われる（Ｓ６０２）。そして、静止画撮影が完了したタイミングｔ４で、音声データのバッファリングが開始され（Ｓ６０３）、タイミングｔ５で、音声データのバッファリングが停止される（Ｓ６０６）。そのため、タイミングｔ４からタイミングｔ５までの期間ｓ６に音声データのバッファリングが行われる（Ｓ６０３〜Ｓ６０６）。期間ｓ６の時間は、必要記録時間と等しい。 At timing t3, a user operation for still image shooting is performed (S601), and still image shooting (acquisition of still image data) is performed (S602). Then, buffering of audio data is started at timing t4 when still image shooting is completed (S603), and buffering of audio data is stopped at timing t5 (S606). Therefore, audio data is buffered in a period s6 from timing t4 to timing t5 (S603 to S606). The time of the period s6 is equal to the necessary recording time.

その後、期間ｓ６の音声データが揮発性メモリ１０６から取得され（Ｓ６０７）、期間ｓ６の時間が判断され（Ｓ６０８）、期間ｓ６の時間に基づいてフレームレートが決定される（Ｓ６０９）。そして、静止画撮影によって取得された静止画データ、揮発性メモリ１０６から取得された音声データ、及び、決定されたフレームレートから、１つの動画ファイルが生成される（Ｓ６１０）。 Thereafter, the audio data of the period s6 is acquired from the volatile memory 106 (S607), the time of the period s6 is determined (S608), and the frame rate is determined based on the time of the period s6 (S609). Then, one moving image file is generated from the still image data acquired by still image shooting, the audio data acquired from the volatile memory 106, and the determined frame rate (S610).

以上述べたように、本実施例によれば、静止画撮影を行うことにより、静止画データが取得され、静止画撮影が行われるタイミングに応じた期間に集音を行うことにより、音声データが取得される。そして、取得された音声データに基づく音声が再生される期間に、取得された静止画データに基づく静止画が再生されるように、それらのデータから、一般的なファイル構造を有する１つの動画ファイルが生成される。それにより、静止画を再生する際に当該静止画に対応する音声が一般的な再生装置で容易に再生可能となる。 As described above, according to the present embodiment, still image data is acquired by performing still image shooting, and audio data is acquired by collecting sound during a period corresponding to the timing at which the still image shooting is performed. Then, one moving image file having a general file structure is generated from those data so that the still image based on the acquired still image data is reproduced during the period in which the sound based on the acquired audio data is reproduced. Thereby, when a still image is reproduced, the sound corresponding to the still image can be easily reproduced by a general reproduction apparatus.

なお、図１の各機能部は、個別のハードウェアであってもよいし、そうでなくてもよい。２つ以上の機能部の機能が、共通のハードウェアによって実現されてもよい。１つの機能部の複数の機能のそれぞれが、個別のハードウェアによって実現されてもよい。１つの機能部の２つ以上の機能が、共通のハードウェアによって実現されてもよい。また、各機能部は、ハードウェアによって実現されてもよいし、そうでなくてもよい。例えば、装置が、プロセッサと、制御プログラムが格納されたメモリとを有していてもよい。そして、装置が有する少なくとも一部の機能部の機能が、プロセッサがメモリから制御プログラムを読み出して実行することにより実現されてもよい。 Each functional unit in FIG. 1 may or may not be individual hardware. The functions of two or more functional units may be realized by common hardware. Each of a plurality of functions of one functional unit may be realized by individual hardware. Two or more functions of one functional unit may be realized by common hardware. Each functional unit may be realized by hardware or not. For example, the apparatus may include a processor and a memory in which a control program is stored. The functions of at least some of the functional units included in the apparatus may be realized by the processor reading and executing the control program from the memory.

なお、実施例１〜３はあくまで一例であり、本発明の要旨の範囲内で実施例１〜３の構成を適宜変形したり変更したりすることにより得られる構成も、本発明に含まれる。実施例１〜３の構成を適宜組み合わせて得られる構成も、本発明に含まれる。 Note that Embodiments 1 to 3 are merely examples, and configurations obtained by appropriately modifying or changing the configurations of Embodiments 1 to 3 within the scope of the gist of the present invention are also included in the present invention. Configurations obtained by appropriately combining the configurations of Embodiments 1 to 3 are also included in the present invention.

実施例１〜３では、動画ファイルを生成するファイル生成装置が撮像装置に設けられている例を説明したが、ファイル生成装置は撮像装置とは別体の装置であってもよい。例えば、ファイル生成装置は、パーソナルコンピュータ（ＰＣ）、タブレット端末、スマートフォン、等に設けられていてもよい。 In the first to third embodiments, an example in which a file generation device that generates a moving image file is provided in the imaging device has been described. However, the file generation device may be a separate device from the imaging device. For example, the file generation device may be provided in a personal computer (PC), a tablet terminal, a smartphone, or the like.

実施例１では、撮影画像データである動画データから静止画データが取得される例を説明したが、動画データは撮影画像データに限られない。例えば、動画データは、放送番組のデータ、映画のデータ、等であってもよい。また、実施例２，３では、静止画撮影によって撮影画像データである静止画データが取得される例を説明したが、静止画データは、撮影画像データに限られない。例えば、静止画データは、イラストのデータであってもよい。 In the first embodiment, an example in which still image data is acquired from moving image data that is captured image data has been described, but the moving image data is not limited to captured image data. For example, the moving image data may be broadcast program data, movie data, or the like. In the second and third embodiments, an example in which still image data that is captured image data is acquired by still image shooting has been described, but the still image data is not limited to captured image data. For example, the still image data may be illustration data.

実施例１では、音声付動画から静止画データと音声データが取得される例を説明した。実施例２，３では、静止画撮影が行われる期間の集音によって音声データが取得される例を説明した。しかしながら、音声データの取得方法はこれらに限られない。例えば、任意の音楽データの一部が、音声データとして取得されてもよい。 In the first embodiment, an example in which still image data and audio data are acquired from a moving image with audio has been described. In the second and third embodiments, the example in which audio data is acquired by collecting sound during a period in which still image shooting is performed has been described. However, the acquisition method of audio data is not limited to these. For example, a part of arbitrary music data may be acquired as audio data.

実施例２では、静止画撮影が行われるタイミングに応じた期間、すなわち集音が行われる期間として、静止画撮影が行われるタイミングまでの期間が使用される例を説明した。実施例３では、静止画撮影が行われるタイミングに応じた期間として、静止画撮影が行われるタイミングからの期間が使用される例を説明した。しかしながら、静止画撮影が行われるタイミングに応じた期間は、これらに限られない。例えば、静止画撮影が行われるタイミングに応じた期間として、静止画撮影が行われるタイミングを跨いだ期間が使用されてもよい。具体的には、図４のＳ４０１の処理とＳ４０２の処理とが行われ、Ｓ４０３の処理が省略され、Ｓ４０４の処理が行われ、Ｓ４０４から図５のＳ５０４へ処理が進められてもよい。そのようにすれば、静止画撮影が行われるタイミングに応じた期間として、静止画撮影が行われるタイミングを跨いだ期間を使用することができる。 In the second embodiment, an example has been described in which the period up to the timing at which still image shooting is performed is used as the period corresponding to the timing at which still image shooting is performed, that is, the period during which sound collection is performed. In the third embodiment, an example has been described in which the period from the timing at which still image shooting is performed is used as that period. However, the period corresponding to the timing at which still image shooting is performed is not limited to these. For example, a period straddling the timing at which still image shooting is performed may be used. Specifically, the processing in S401 and S402 in FIG. 4 may be performed, the processing in S403 may be omitted, the processing in S404 may be performed, and the processing may proceed from S404 to S504 in FIG. 5. In this way, a period straddling the timing at which still image shooting is performed can be used as the period corresponding to that timing.

静止画撮影の種類に応じて、静止画撮影が行われるタイミングに応じた期間、すなわち集音が行われる期間が、切り替えられてもよい。ここで、メカニカルシャッターを用いた静止画撮影と、電子シャッターを用いた静止画撮影とが実行可能である場合を考える。メカニカルシャッターを用いた静止画撮影では、物理的に光を遮る遮光機構を開閉させることにより、シャッターが切られる。電子シャッターを用いた静止画撮影では、撮像素子の処理を制御することによって、物理的なシャッターの動きが再現される。この場合には、メカニカルシャッターを用いた静止画撮影が行われる場合に、静止画撮影が行われるタイミングまでの期間、または、静止画撮影が行われるタイミングからの期間に集音が行われてもよい。そして、電子シャッターを用いた静止画撮影が行われる場合に、静止画撮影が行われるタイミングを跨いだ期間に集音が行われてもよい。静止画撮影の種類と集音の期間との対応関係は、特に限定されない。 The period corresponding to the timing at which still image shooting is performed, that is, the period during which sound collection is performed, may be switched according to the type of still image shooting. Here, consider a case where both still image shooting using a mechanical shutter and still image shooting using an electronic shutter can be performed. In still image shooting using a mechanical shutter, the shutter is released by opening and closing a light-blocking mechanism that physically blocks light. In still image shooting using an electronic shutter, the movement of a physical shutter is emulated by controlling the processing of the image sensor. In this case, when still image shooting using the mechanical shutter is performed, sound may be collected during the period up to the timing at which the still image shooting is performed or during the period from that timing. When still image shooting using the electronic shutter is performed, sound may be collected during a period straddling the timing at which the still image shooting is performed. The correspondence between the type of still image shooting and the sound collection period is not particularly limited.

実施例１〜３では、得られた静止画データに対応する１フレームのみを含む動画ファイルが生成される例を説明したが、動画ファイルは、各々が同じ静止画データに対応する複数フレームを含んでもよい。例えば、音声データの時間が５秒である場合には、フレームレートが３０ｆｐｓである動画ファイルとして、各々が同じ静止画データに対応する１５０（＝３０×５）フレームを含む動画ファイルが生成されてもよい。 In the first to third embodiments, an example has been described in which a moving image file including only one frame corresponding to the obtained still image data is generated; however, the moving image file may include a plurality of frames each corresponding to the same still image data. For example, when the duration of the audio data is 5 seconds, a moving image file including 150 (= 30 × 5) frames, each corresponding to the same still image data, may be generated as a moving image file having a frame rate of 30 fps.
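The arithmetic of the multi-frame variant can be checked with a short sketch. The function names below are hypothetical, and the comparison with a single frame at 1/5 fps assumes that the single-frame variant uses a frame rate equal to the reciprocal of the audio duration.

```python
from fractions import Fraction


def repeated_frame_count(audio_seconds: int, fps: int) -> int:
    """Number of identical frames needed to cover the audio at `fps`."""
    return audio_seconds * fps


def total_play_seconds(frame_count: int, fps) -> Fraction:
    """Total reproduction time of `frame_count` frames at `fps`."""
    return Fraction(frame_count) / Fraction(fps)
```

For 5 seconds of audio at 30 fps this gives 150 frames, and both 150 frames at 30 fps and a single frame at 1/5 fps reproduce for the same 5 seconds.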

静止画データと音声データから生成された動画ファイルから音声データを削除するためのユーザ操作が行われてもよい。動画ファイルから音声データを削除する指示がユーザによって行われた場合には、ファイル生成部１１０は、当該動画ファイルの生成に使用された静止画データから静止画ファイルを生成してもよい。そして、マイクロコンピュータ１０５は、外部メモリ１１２に記録された上記動画ファイルを、生成された静止画ファイルに置き換えてもよい。それにより、動画ファイルよりもデータサイズの小さい静止画ファイルを得ることができるため、外部メモリ１１２の記憶容量を効率よく利用することができる。なお、静止画データは、動画ファイルから取得されてもよいし、揮発性メモリ１０６、外部メモリ１１２、等によって保持されていてもよい。 A user operation for deleting the audio data from the moving image file generated from the still image data and the audio data may be performed. When an instruction to delete audio data from the moving image file is issued by the user, the file generation unit 110 may generate a still image file from the still image data used for generating the moving image file. Then, the microcomputer 105 may replace the moving image file recorded in the external memory 112 with the generated still image file. As a result, a still image file having a data size smaller than that of the moving image file can be obtained, so that the storage capacity of the external memory 112 can be used efficiently. Note that the still image data may be acquired from a moving image file, or may be held by the volatile memory 106, the external memory 112, or the like.

静止画データと音声データから動画ファイルを生成するか否かをユーザが選択可能であってもよい。静止画データと音声データから動画ファイルを生成することがユーザによって選択されている場合に、ファイル生成部１１０が、静止画データと音声データから動画ファイルを生成してもよい。そして、静止画データと音声データから動画ファイルを生成しないことがユーザによって選択されている場合に、ファイル生成部１１０が、静止画データから静止画ファイルを生成してもよい。 The user may be able to select whether to generate a moving image file from still image data and audio data. When the user has selected to generate a moving image file from still image data and audio data, the file generation unit 110 may generate a moving image file from the still image data and the audio data. When the user has selected not to generate a moving image file from still image data and audio data, the file generation unit 110 may generate a still image file from the still image data.

＜その他の実施例＞ <Other examples>
本発明は、上述の実施例の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

Description of Symbols
100: Shooting lens
101: Image pickup element
102: A/D converter
103: Microphone
104: A/D converter
105: Microcomputer
106: Volatile memory
107: Still image codec
108: Moving image codec
109: Audio codec
110: File generation unit
111: Nonvolatile memory
112: External memory
113: Display member
114: Operation member

Claims

A file generation apparatus comprising:
first acquisition means for acquiring still image data;
second acquisition means for acquiring audio data; and
generation means for generating one moving image file from the still image data and the audio data so that a still image based on the still image data is reproduced during a period in which sound based on the audio data is reproduced.

The file generation apparatus according to claim 1, wherein the first acquisition means acquires image data of one frame of a moving image as the still image data.

The file generation apparatus according to claim 2, wherein the moving image is a moving image with audio, and
the second acquisition means acquires audio data of at least a part of the moving image with audio.

The file generation apparatus according to claim 3, wherein the second acquisition means acquires, from the moving image with audio, audio data of a period corresponding to the time position of the frame of the still image data.

The file generation apparatus according to claim 4, wherein the period corresponding to the time position of the frame of the still image data is a period up to the time position of the frame, a period starting from the time position of the frame, or a period straddling the time position of the frame.

The file generation apparatus according to claim 3, wherein the second acquisition means acquires audio data of a period, specified by a user, of the moving image with audio.

The file generation apparatus according to claim 1, wherein the first acquisition means acquires still image data representing a subject image by performing still image shooting.

The file generation apparatus according to claim 7, wherein the second acquisition means acquires the audio data by collecting sound during a period corresponding to the timing at which the still image shooting is performed.

The file generation apparatus according to claim 8, wherein the period corresponding to the timing at which the still image shooting is performed is a period up to that timing, a period starting from that timing, or a period straddling that timing.

The file generation apparatus, wherein the first acquisition means can execute still image shooting using a mechanical shutter and still image shooting using an electronic shutter,
when still image shooting using the mechanical shutter is performed, the second acquisition means collects sound during a period up to the timing at which the still image shooting is performed or during a period starting from that timing, and
when still image shooting using the electronic shutter is performed, the second acquisition means collects sound during a period straddling the timing at which the still image shooting is performed.

The file generation apparatus according to claim 1, further comprising recording means for recording the moving image file in a storage unit,
wherein, when the user issues an instruction to delete the audio data from the moving image file,
the generation means generates a still image file from the still image data, and
the recording means replaces the moving image file recorded in the storage unit with the still image file.

The file generation apparatus according to any one of claims 1 to 11, wherein the user can select whether or not to generate the moving image file, and
the generation means generates a still image file from the still image data when the user has selected not to generate the moving image file.

The file generation apparatus according to claim 1, wherein the generation means generates a still image file from the still image data when the duration of the sound based on the audio data is less than a threshold value.

A file generation method comprising:
acquiring still image data;
acquiring audio data; and
generating one moving image file from the still image data and the audio data so that a still image based on the still image data is reproduced during a period in which sound based on the audio data is reproduced.

A program for causing a computer to function as each means of the file generation apparatus according to any one of claims 1 to 13.