JP2020191524A

JP2020191524A - Imaging apparatus, control method, and program

Info

Publication number: JP2020191524A
Application number: JP2019095296A
Authority: JP
Inventors: 勇真内藤; Yuma Naito
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2020-11-26

Abstract

To provide an imaging apparatus, a control method, and a program capable of synchronizing video data and audio data for each frame even when a capturing frame rate and a reproduction frame rate at the time of capturing a moving image are different from each other.

SOLUTION: An imaging apparatus includes: an acquisition means configured to acquire video data and audio data at a capturing frame rate; a conversion means configured to convert the audio data on the basis of a ratio between the capturing frame rate and a reproduction frame rate; and a recording means configured to record the audio data converted by the conversion means along with the video data. The conversion means calculates the ratio between the capturing frame rate and the reproduction frame rate for each frame, and converts the audio data corresponding to that frame on the basis of the calculated ratio.

SELECTED DRAWING: Figure 4

Description

The present invention relates to an imaging apparatus, a control method, and a program.

In recent years, imaging apparatuses that allow a shooting frame rate and a playback frame rate to be set for moving image shooting have come into use. When the shooting frame rate and the playback frame rate are set to the same value, the imaging apparatus can record the video data and the audio data acquired at the shooting frame rate in a state synchronized at the playback frame rate. When the two rates are set to different values, however, the amount of audio data acquired at the shooting frame rate becomes excessive or insufficient relative to the amount of video data recorded according to the set playback frame rate. Therefore, when video data and audio data are recorded according to the playback frame rate, they must be converted so that their playback times become equal, and then recorded in a state in which synchronization is maintained.

As a related technique, the technique of Patent Document 1 has been proposed. That technique divides acoustic data and moving image data into a plurality of sections and, for each section, determines a parameter time series that sets the degree of compression and expansion of the acoustic data and the moving image data. It further divides the divided acoustic data into input frames at equal intervals and determines the output frame length on the basis of the parameter time series, thereby outputting a correspondence table of synchronization points between the acoustic data before compression and expansion and the acoustic data after compression and expansion. The technique then compresses or expands both the acoustic data and the moving image data on the basis of the correspondence table.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-187013

The technique of Patent Document 1 synchronizes the acoustic data and the moving image data by compressing and expanding them on the basis of the correspondence table described above. It therefore has to output a correspondence table for each divided section and refer to that table for each section when compressing or expanding the moving image data, which complicates the processing. Moreover, because the acoustic data and the moving image data are synchronized on the basis of a per-section correspondence table, it is difficult to synchronize the audio data and the video data frame by frame.

An object of the present invention is to provide an imaging apparatus, a control method, and a program capable of synchronizing video data and audio data for each frame even when the shooting frame rate and the playback frame rate at the time of moving image shooting differ from each other.

To achieve the above object, an imaging apparatus of the present invention includes: acquisition means for acquiring video data and audio data at a shooting frame rate; conversion means for converting the audio data on the basis of a ratio between the shooting frame rate and a playback frame rate; and recording means for recording the audio data converted by the conversion means together with the video data, wherein the conversion means calculates, for each frame, the ratio between the shooting frame rate and the playback frame rate and converts the audio data corresponding to that frame on the basis of the calculated ratio.

According to the present invention, video data and audio data can be synchronized for each frame even when the shooting frame rate and the playback frame rate at the time of moving image shooting differ from each other.

FIG. 1 is a functional block diagram of the imaging apparatus of the present embodiment.
FIG. 2 is a diagram showing an example of video data and audio data when shooting is performed using the slow & fast function.
FIG. 3 is a flowchart showing the flow of the recording process during moving image shooting.
FIG. 4 is a diagram for explaining the conversion process in the case of low-speed playback.
FIG. 5 is a flowchart showing the flow of the conversion process.
FIG. 6 is a diagram for explaining the conversion process in the case of high-speed playback.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The configurations described in the following embodiments are merely examples, and the scope of the present invention is not limited by them.

FIG. 1 is a functional block diagram of the imaging apparatus 100 of the present embodiment. The imaging apparatus 100 has a function of shooting moving images and can record the data of a shot moving image as a file on the memory card 109. The imaging apparatus 100 is, for example, a digital camera or a smartphone having an imaging function. The imaging apparatus 100 of the present embodiment has a function of shooting with the shooting frame rate and the playback frame rate set to different frame rates (slow & fast function). As shown in FIG. 1, the imaging apparatus 100 includes an image sensor 101, an image sensor drive circuit 102, an image processing circuit 103, and a video data memory 104. It also includes a microphone 105, a microphone drive circuit 106, a signal processing circuit 107, an audio data memory 108, a memory card 109, a write control unit 110, and a storage buffer 111, as well as an operation unit 112, a display unit 113, a CPU 114, a file system 115, and a memory 116.

The image sensor 101 generates and outputs, by photoelectric conversion, an electric charge corresponding to the amount of incident light. The image sensor 101 is, for example, a CMOS sensor. The image sensor 101 can read out the signals of all pixels, and can also read out charges by adding specific pixels and by thinning out specific rows or columns. The image sensor drive circuit 102 drives the image sensor 101 under the control of the CPU 114. The image processing circuit 103 performs A/D conversion on the image signal from the image sensor 101, converts the color information, and applies predetermined image processing. The video data memory 104 stores the moving image video data from the image processing circuit 103. This video data may be used for display on the display unit 113 or held temporarily for the recording process.

The microphone 105 is a sound collecting means. It has, for example, a diaphragm that generates an electric signal corresponding to the amount of change in air vibration, and it has the characteristic that, even for the same amount of change, the amount of electricity generated varies with the direction from which the air vibration arrives. The microphone drive circuit 106 drives the microphone 105 under the control of the CPU 114. The signal processing circuit 107 performs A/D conversion on the electric signal from the microphone 105, converts the audio information, and applies predetermined signal processing. The audio data memory 108 temporarily stores the moving image audio data from the signal processing circuit 107 for output and for the recording process.

The write control unit 110 monitors the connection state of the memory card 109 and controls access to the memory card 109. The CPU 114 applies a predetermined conversion process to the video data stored in the video data memory 104 and to the audio data stored in the audio data memory 108. The storage buffer 111 accumulates the converted video data and audio data as moving image data.

The operation unit 112 accepts user input through a member to which a recording button is assigned or through an input device such as a touch panel, and generates a control signal corresponding to the operation. The display unit 113 displays GUI (Graphical User Interface) screens showing the live view during shooting, various setting values, and the like. The CPU 114, serving as a control means, performs the processing of the present embodiment by executing a program stored in the memory 116. The memory 116 stores various programs for realizing the functions of the imaging apparatus 100 and information on the shooting settings entered through the operation unit 112. The CPU 114 controls the operation of the imaging apparatus 100 in accordance with various control signals and programs. The file system 115 treats shooting information stored in the memory 116, such as the playback frame rate, as metadata and creates a file from it together with the video data and the audio data.

As described above, the imaging apparatus 100 of the present embodiment has a function that allows the shooting frame rate and the playback frame rate to be set individually for moving image shooting. This function is also called the slow & fast function. The shooting frame rate and the playback frame rate may therefore be set to the same value or to different values. With the slow & fast function, the shooting frame rate can also be changed at an arbitrary timing while the imaging apparatus 100 is shooting a moving image. FIG. 2 is a diagram showing an example of video data and audio data when shooting is performed using the slow & fast function. In FIG. 2, the shooting frame rate and the playback frame rate differ.

FIG. 2(A) is a conceptual diagram showing the video data 303 and the audio data 304 at the time of shooting. In FIG. 2(A), time t1 on the shooting time axis 302 is the shooting time corresponding to one frame of the video data 303. Time t1 is obtained as the reciprocal of the shooting frame rate. FIG. 2(B) is a conceptual diagram showing the video data 308 and the audio data 309 at the time of playback. In FIG. 2(B), time t2 on the playback time axis 307 is the playback time corresponding to one frame of the video data. Time t2 is obtained as the reciprocal of the playback frame rate.

The amount of data needed for one frame of video data does not change between shooting and playback, but the time for which the frame is displayed does. The amount of audio data needed for one frame, by contrast, changes between shooting and playback according to the frame rate settings. Depending on the shooting frame rate and the playback frame rate, the amount of audio data corresponding to one frame becomes excessive or insufficient. In the present embodiment, therefore, the CPU 114 converts the audio data according to the frame rates. The following describes how the audio data from shooting is converted in the case of low-speed playback (t1 < t2), in which the playback time t2 of one frame is longer than its shooting time t1. In low-speed playback, the stretch ratio (the ratio of the shooting frame rate to the playback frame rate) is greater than 1.

The flow of the recording process during moving image shooting will be described with reference to the flowchart of FIG. 3. Using the operation unit 112, the photographer individually sets the shooting frame rate and the playback frame rate, both of which are handled as shooting information. The operation unit 112, functioning as a reception means, accepts the shooting frame rate setting and the playback frame rate setting (S201). The photographer can also use the operation unit 112 to set other shooting information in the imaging apparatus 100, such as the shooting mode and the moving image size. The shooting information, including the set shooting frame rate and playback frame rate, is stored in the memory 116. Suppose, for example, that the photographer sets the shooting frame rate to 60 fps and the playback frame rate to 30 fps. In this case, the imaging apparatus 100 shoots 60 images per second and plays back 30 images per second, so the playback time is twice the shooting time. Both the shooting frame rate and the playback frame rate can be set from 1 fps upward, and no upper limit is defined.
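The relationship between the two settings in the 60 fps / 30 fps example can be checked with a short calculation. The following is a minimal sketch, not part of the disclosure: the function name `frame_times` is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def frame_times(shooting_fps: int, playback_fps: int):
    """Return (t1, t2, stretch_ratio) for the given frame rate settings.

    t1 is the shooting time of one frame, t2 is the playback time of one
    frame, and stretch_ratio is shooting_fps / playback_fps as defined in
    the description. The function name is hypothetical.
    """
    if shooting_fps < 1 or playback_fps < 1:
        raise ValueError("frame rates are settable from 1 fps upward")
    t1 = Fraction(1, shooting_fps)   # reciprocal of the shooting frame rate
    t2 = Fraction(1, playback_fps)   # reciprocal of the playback frame rate
    stretch_ratio = Fraction(shooting_fps, playback_fps)
    return t1, t2, stretch_ratio


t1, t2, stretch = frame_times(60, 30)
print(t1, t2, stretch)  # 1/60 1/30 2
```

With these settings each frame is shown for twice as long as it took to shoot, which matches the doubling of playback time described above.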

With the shooting frame rate and the playback frame rate set in the memory 116, the CPU 114 determines whether an instruction to start recording has been given through the operation unit 112 (S202). For example, when the CPU 114 detects that the recording button of the operation unit 112 has been pressed, it determines that an instruction to start recording has been given. If NO is determined in S202, the flow does not proceed to the next step. If YES is determined in S202, the CPU 114 calculates the amount of audio data corresponding to time t1, the time taken by one frame at the preset shooting frame rate on the shooting time axis 302 (S203). The CPU 114 thereby determines the shooting interval of the video data and the audio data during shooting. For example, when the set shooting frame rate is 60 fps, the imaging apparatus 100 shoots 60 images per second. The time t1 per frame, where one frame represents one image, is therefore 1/60 second, and the CPU 114 calculates the amount of audio data corresponding to that time.

The CPU 114 calculates the amount of data for time t2, the time taken by one frame at the preset playback frame rate on the playback time axis 307 (S204). The CPU 114 thereby determines the playback interval of the video data and the audio data during playback. For example, when the set playback frame rate is 30 fps, 30 images are played back per second. The time t2 per frame, where one frame represents one image, is therefore 1/30 second, and the CPU 114 calculates the amount of audio data corresponding to that time.
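The per-frame audio data amounts of S203 and S204 can be illustrated as sample counts. This is a sketch under stated assumptions: the description does not give an audio sampling rate, so 48,000 Hz is assumed purely for illustration, and `samples_per_frame` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 48_000  # assumption: the description does not state a sampling rate


def samples_per_frame(fps: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples that span one video frame at the given frame rate.

    Rounded to the nearest whole sample; the name is hypothetical.
    """
    if fps < 1:
        raise ValueError("frame rates are settable from 1 fps upward")
    return round(sample_rate_hz / fps)


shooting_amount = samples_per_frame(60)  # amount for t1 = 1/60 s (S203)
playback_amount = samples_per_frame(30)  # amount for t2 = 1/30 s (S204)
print(shooting_amount, playback_amount)  # 800 1600
```

Under this assumption, each frame is captured with 800 audio samples but needs 1600 samples on playback, which is the shortfall the conversion process has to fill.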

Having obtained the shooting interval and the playback interval, the CPU 114 then starts moving image recording on the basis of the shooting information set in the memory 116 (S205). As described above, the imaging apparatus 100 can change the shooting frame rate during moving image shooting (that is, during acquisition). The CPU 114 determines whether the photographer has instructed a change of the shooting frame rate setting through the operation unit 112 (S206). If YES is determined in S206, the CPU 114 sets the shooting frame rate in the memory 116 again in accordance with the setting change instruction accepted by the operation unit 112 (S207). The CPU 114 then recalculates the amount of data per frame from the new shooting frame rate (S208). If NO is determined in S206, the processes of S207 and S208 are not performed.

Although the imaging apparatus 100 can change the shooting frame rate during moving image shooting, a change is not applied while the video data and audio data of a frame are still being acquired. The change takes effect at the point where acquisition of one frame of video data and audio data is complete (the frame boundary).
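One way to picture this deferral is to hold a requested rate as pending until the current frame is complete. The following is only an illustrative sketch; the class and method names are hypothetical and do not appear in the description.

```python
class ShootingRateSetting:
    """Holds the active shooting frame rate and defers changes to frame boundaries."""

    def __init__(self, fps: int) -> None:
        self.active_fps = fps
        self._pending_fps = None  # change requested mid-frame, not yet applied

    def request_change(self, fps: int) -> None:
        """Record a change instruction (corresponds to a YES at S206)."""
        self._pending_fps = fps

    def on_frame_complete(self) -> int:
        """Apply any pending change once a full frame has been acquired."""
        if self._pending_fps is not None:
            self.active_fps = self._pending_fps
            self._pending_fps = None
        return self.active_fps


setting = ShootingRateSetting(60)
setting.request_change(120)
print(setting.active_fps)           # still 60 while the frame is in progress
print(setting.on_frame_complete())  # 120 from the next frame onward
```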

The video data and the audio data acquired during moving image shooting are processed by the image processing circuit 103 and the signal processing circuit 107, respectively, at each frame interval (time t1) of the shooting frame rate. The processed video data is accumulated in the video data memory 104, and the processed audio data is accumulated in the audio data memory 108 (S209). The CPU 114 determines whether the video data and audio data for time t1, the time taken by one frame at the shooting frame rate (that is, one frame of video data and audio data), have been acquired (S210). If NO is determined in S210, the flow does not proceed to the next step, and accumulation of the video data and audio data is repeated.

If YES is determined in S210, the CPU 114 converts the acquired audio data corresponding to time t1 into audio data corresponding to time t2 on the playback time axis 307 (S211). The CPU 114 accumulates the converted audio data 309 in the storage buffer 111. For the video data accumulated in the storage buffer 111, the amount of data for time t1 of one shooting frame is equal to the amount of data for time t2 of one playback frame. The CPU 114 therefore stores the video data 308 in the memory 116 one frame at a time so that each frame can be played back over time t2 when the file system 115 creates the file. The CPU 114 then stores in the memory 116 the audio data 309 that was converted in S211 and accumulated in the storage buffer 111, in an amount corresponding to time t2 of one playback frame.

The CPU 114 determines whether the operation unit 112 has accepted an instruction from the photographer to end recording (S214). If NO is determined in S214, the flow returns to S206 and moving image shooting continues. If YES is determined in S214, the CPU 114 controls the file system 115 so that the video data and audio data up to the timing of the end-of-recording instruction are made into a file. The file system 115 creates a moving image file from the moving image data (video data and audio data) recorded up to that timing, together with shooting information such as the frame rates. The resulting moving image file is recorded on the memory card 109, and the recording process ends.

Next, the conversion process of S211 will be described. The description takes as an example the case where the shooting frame rate at the time of conversion is higher than the playback frame rate, that is, low-speed playback in which the playback time is longer than the shooting time. FIG. 4 is a diagram for explaining the conversion process in the case of low-speed playback. FIG. 4(A) is a conceptual diagram showing an example of video data and audio data at the time of shooting. In FIG. 4(A), the period 505 on the shooting time axis 501 covers the video data 502 and the audio data 503 of frames 1 to 4. Each of frames 1 to 4 is delimited by an output timing 504, at which it is stored in the storage buffer 111. In FIG. 4(A), the output timings 504 are indicated by broken lines.

FIG. 4(B) is a conceptual diagram showing how the video data in the period 505, corresponding to frames 1 to 4 of FIG. 4(A), is played back. In FIG. 4(B), time t2 of one playback frame on the playback time axis 507 is longer than time t1 of one shooting frame on the shooting time axis 501. FIG. 4(C) is a conceptual diagram showing the conversion process applied to the audio data in the period 505 for recording. Each item of audio data 509 in FIG. 4(C) corresponds to an item of audio data 503 in FIG. 4(A), and the numbers inside the audio data 509 and the audio data 503 correspond to frames 1 to 4. In FIG. 4(C), a plurality of blocks 510 are cut out of one frame of audio data 509, starting at the head position 512 of the shooting frame and moving along the time axis by the shift amount 511 each time. A block 510 is a part of the audio data 509 of one frame and corresponds to partial audio data. Joining the blocks 510 yields the converted audio data 513. FIG. 4(D) is a conceptual diagram of playback of the moving image shot in the period 505. The video data and the audio data corresponding to each frame coincide at the head positions 515, which are indicated by broken lines in FIG. 4(D).

The flow of the conversion process will be described with reference to FIGS. 4 and 5. FIG. 5 is a flowchart showing the flow of the conversion process. The CPU 114 determines whether the ratio of the set shooting frame rate to the set playback frame rate is 1 (S401). If YES is determined in S401, the shooting frame rate and the playback frame rate are the same (the stretch ratio is 1). Because time t1 per frame on the shooting time axis 501 equals time t2 per frame on the playback time axis 507, the audio data does not need to be converted. In this case, the CPU 114 stores the audio data 503 accumulated in the memory 116 in the storage buffer 111 as it is (S402), and the conversion process ends. The audio data 503 stored in the storage buffer 111 is made into a file by the file system 115 together with the video data 502 and the shooting information, and is recorded on the memory card 109. As shown in FIG. 4(A), the output timings 504 of the video data and the audio data then coincide in every frame, just as they did at the time of shooting.

If NO is determined in S401, the shooting frame rate and the playback frame rate differ. In this case, the processes of S403 to S409 are executed for each frame. So that the audio data 509 can be converted frame by frame, the CPU 114 sets the size of the block 510 to be cut out of the audio data 509 on the basis of the amount of data for time t1, the time taken by one frame at the shooting frame rate (S403). This size is the cutout size. The smaller the block 510 is relative to the amount of data for time t1 per frame at the shooting frame rate, the smaller the loss of audio data between before and after conversion. It is therefore preferable for the CPU 114 to set the block 510 small, within a range (predetermined range) in which the converted audio data does not sound unnatural when played back, and to perform the cutout-based conversion a plurality of times.

The CPU 114 sets the cutout position at the start of the conversion process to the head position 512 of the shooting frame (S404). The CPU 114 then calculates the shift amount 511 from the size of the block 510 and the frame rate ratio (the ratio of the playback frame rate to the shooting frame rate). The shift amount is used in the conversion process to reset the cutout position after each cutout. For example, when the shooting frame rate is 60 fps and the playback frame rate is 30 fps, the frame rate ratio is 1/2. The frame rate ratio is the reciprocal of the stretch ratio. In this case, the CPU 114 sets the shift amount 511 to 1/2 of the size of the block 510, so the shift amount 511, expressed as a fraction of the block size, corresponds to the reciprocal of the stretch ratio. As a result, the amount of data in one frame of the converted audio data can be made twice the amount of data in one frame of audio data at the time of shooting.
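The shift amount calculation can be sketched as follows. This is an illustration only: `shift_amount` is a hypothetical name, the block size of 200 samples is an arbitrary example value, and truncating to a whole sample with a minimum of one sample is an assumption that the description does not address.

```python
from fractions import Fraction


def shift_amount(block_size: int, shooting_fps: int, playback_fps: int) -> int:
    """Shift between successive cutout positions, in samples.

    shift = block_size * (playback_fps / shooting_fps), i.e. the block size
    multiplied by the frame rate ratio (the reciprocal of the stretch ratio).
    Truncation and the one-sample minimum are assumptions.
    """
    frame_rate_ratio = Fraction(playback_fps, shooting_fps)
    return max(int(block_size * frame_rate_ratio), 1)


print(shift_amount(200, 60, 30))  # 100: half a block, so blocks overlap by half
```

When the two frame rates are equal, the same formula gives a shift equal to the block size, so consecutive blocks neither overlap nor leave gaps.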

The CPU 114 cuts out, from the audio data accumulated in the audio data memory 108, a portion the size of the block 510 starting at the current position, and stores it in the storage buffer 111 (S406). The CPU 114 then corrects the cutout position of the block 510 in the audio data 509 by the shift amount 511 (S407). As described above, when the shift amount is 1/2 the size of the block 510, the cutout position moves through the audio data 509 by half a block 510 at a time. When the shooting frame rate is higher than the playback frame rate, the amount of the converted audio data 513 is larger than that of the audio data 503 before conversion. The CPU 114 therefore corrects the cutout position so that each new cutout overlaps the block 510 already cut out of the audio data 509. When the shift amount is half the size of the block 510, each cut-out block 510 overlaps the preceding and following blocks. The converted audio data may therefore contain duplicated portions of the audio data before conversion, but this does not affect the matching of the audio data at the start position 515 of each frame of the converted audio data.

The CPU 114 determines whether converted audio data amounting to the time t2 taken by one frame at the playback frame rate has been accumulated in the memory 116 (S408). If the determination in S408 is NO, the flow returns to S406. If the determination in S408 is YES, the CPU 114 sets the write position of the storage buffer 111 to the start position of the next frame, taking into account the amount of data for the time t2 of one playback frame (S409). The conversion process then ends. Through S409, each time one frame of audio data is converted, writing can begin from the start position of the corresponding frame at the playback frame rate. In other words, the audio data at the start of each frame at the time of shooting becomes the audio data at the start of each frame at the time of playback, so video data and audio data that coincided at the start position of each frame during shooting also coincide during playback. The video data and the audio data can thereby be synchronized.
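The per-frame loop of S404 to S409 can be sketched as below. This is a simplified model under stated assumptions, not the apparatus's actual implementation: audio is a plain Python list of samples, cut-out blocks are simply concatenated without cross-fading, a block near the end of a frame may read into the following frame's samples, and silence (zeros) pads the end of the stream. All function and parameter names are hypothetical.

```python
def convert_frame(audio, frame_start, block_size, shift, out_samples):
    """Produce out_samples (one playback frame, time t2) from one shooting frame."""
    out = []
    pos = frame_start                        # S404: start at the head of the shooting frame
    while len(out) < out_samples:            # S408: stop once t2 worth of data is stored
        block = audio[pos:pos + block_size]  # S406: cut out one block
        block = block + [0] * (block_size - len(block))  # pad at stream end (assumption)
        out.extend(block)
        pos += shift                         # S407: move the cutout position by the shift
    return out[:out_samples]


def convert_stream(audio, t1_samples, t2_samples, block_size, shift):
    """Convert every shooting frame so that playback frame i starts at i * t2_samples."""
    converted = []
    for frame_start in range(0, len(audio), t1_samples):
        # S409: each frame's output is written from the next playback frame boundary.
        converted.extend(
            convert_frame(audio, frame_start, block_size, shift, t2_samples)
        )
    return converted


# Two shooting frames of 8 samples each, stretched to 16 samples per playback frame.
audio = list(range(16))
result = convert_stream(audio, t1_samples=8, t2_samples=16, block_size=4, shift=2)
print(result[:8], result[16])
```

Because each frame is converted independently and written from a playback frame boundary, the first sample of every playback frame is the first sample of the corresponding shooting frame, which is the synchronization property the paragraph describes.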

As described above, even when a shooting frame rate higher than the playback frame rate is set in the imaging apparatus 100 during movie shooting, the audio data is converted frame by frame. That is, for each frame, the CPU 114 cuts blocks (partial audio data) of a predetermined size out of the audio data and superimposes and combines them, thereby increasing the size of the audio data. As a result, even when a shooting frame rate higher than the playback frame rate is set in the imaging apparatus 100, the video data and the audio data can be synchronized to the playback frame rate for each frame.

One conceivable approach when a shooting frame rate higher than the playback frame rate is set is to stretch the audio data of a plurality of frames together. However, stretching the audio data of multiple frames together can cause a drift between the audio data and the video data, in which case the two cannot be synchronized. In the present embodiment, when a shooting frame rate higher than the playback frame rate is set, the audio data is converted frame by frame, so the audio data and the video data can be synchronized.

The example above concerns low-speed playback, in which the shooting frame rate is higher than the playback frame rate. The conversion process of FIG. 5 can also be used for high-speed playback (t1 > t2), in which the shooting frame rate is lower than the playback frame rate. In high-speed playback, the value of the stretch ratio (the ratio of the shooting frame rate to the playback frame rate) is less than 1. FIG. 6 illustrates the conversion process for high-speed playback. FIG. 6(A) is a conceptual diagram of the video data and audio data at the time of shooting, and is the same as FIG. 4(A). FIG. 6(B) is a conceptual diagram showing how the video data is played back in the period 605 corresponding to frames 1 to 4 of FIG. 4(B). FIG. 6(C) is a conceptual diagram showing the conversion, for recording, of the audio data 609 in the period 605 corresponding to frames 1 to 4 of FIG. 6(A). FIG. 6(D) is a conceptual diagram of playback of the moving image shot in the period 605 corresponding to frames 1 to 4 of FIG. 6(A).

The conversion process for high-speed playback differs from that for low-speed playback in the content of S407. In high-speed playback, the shooting frame rate is lower than the playback frame rate, so the amount of the converted audio data 613 is smaller than that of the audio data 609 before conversion. The CPU 114 therefore corrects the cutout position so that the audio data lying between the data already cut out and the next cutout position is thinned out. In the example of FIG. 6(C), after a block 610 is first cut out of the audio data 609, the cutout position for the next block 610 is corrected by a shift amount 611 that is larger than the size of the block 610. In this example, the shift amount 611 is twice the size of the block 610. That is, the shift amount 611 corresponds to the reciprocal of the value of the stretch ratio. As shown in FIG. 6(C), part of the audio data before conversion may be missing from the converted audio data, but this does not affect the matching of the audio data at the start position 615 of each frame of the converted audio data.
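The thinning behavior in high-speed playback can be sketched as follows, assuming a shift amount of twice the block size as in the FIG. 6(C) example. The function name, the list-based audio representation, and the sample counts are assumptions for illustration only.

```python
def thin_frame(audio, frame_start, frame_samples, block_size, shift):
    """Cut blocks from one shooting frame, skipping the samples between cutouts."""
    out = []
    # With shift > block_size, samples between consecutive blocks are dropped.
    for pos in range(frame_start, frame_start + frame_samples, shift):
        out.extend(audio[pos:pos + block_size])
    return out


# One shooting frame of 16 samples (t1) reduced to 8 samples (t2):
# block size 4, shift 8 (twice the block size).
audio = list(range(16))
print(thin_frame(audio, frame_start=0, frame_samples=16, block_size=4, shift=8))
```

Samples 4 to 7 and 12 to 15 are missing from the output, but the output still begins with the first sample of the shooting frame, so the frame start positions remain aligned.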

As described above, even when a shooting frame rate lower than the playback frame rate is set in the imaging apparatus 100 during movie shooting, the audio data is converted frame by frame. That is, for each frame, the CPU 114 thins out part of the audio data, thereby reducing the size of the audio data. As a result, even when a shooting frame rate lower than the playback frame rate is set in the imaging apparatus 100, the video data and the audio data can be synchronized to the playback frame rate for each frame.

In the above, the imaging apparatus 100 may transmit the video data and audio data acquired at the set shooting frame rate to an external device via a network or the like. In this case, the external device receives the video data and audio data acquired at the set shooting frame rate. The external device is assumed to be, for example, an information processing apparatus such as a personal computer. The external device may accept a playback frame rate setting, perform the conversion process described above based on the set playback frame rate, and synchronize the video data and the audio data for each frame.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist. The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Imaging apparatus
101 Image sensor
105 Microphone
114 CPU
116 Memory

Claims

An imaging apparatus comprising:
an acquisition means for acquiring video data and audio data at a shooting frame rate;
a conversion means for converting the audio data based on a ratio of the shooting frame rate to a playback frame rate; and
a recording means for recording the audio data converted by the conversion means together with the video data,
wherein the conversion means calculates the ratio of the shooting frame rate to the playback frame rate for each frame and converts the audio data corresponding to that frame based on the calculated ratio.

The imaging apparatus according to claim 1, further comprising:
a reception means for accepting an instruction to change the shooting frame rate during acquisition of video data; and
a changing means for changing the shooting frame rate setting at a frame boundary when the instruction to change the shooting frame rate is accepted during acquisition of video data,
wherein the conversion means calculates the ratio of the shooting frame rate to the playback frame rate using the changed shooting frame rate in response to the change of the shooting frame rate by the changing means.

The imaging apparatus according to claim 1, wherein, when the shooting frame rate and the playback frame rate are the same, the conversion means does not perform the conversion of the audio data based on the ratio.

The imaging apparatus according to claim 1, wherein, when the value of the ratio is greater than 1, the conversion means cuts out, from the acquired audio data, a plurality of pieces of partial audio data each smaller in size than the audio data, and superimposes and combines the plurality of pieces of partial audio data to generate audio data corresponding to one frame at the playback frame rate.

The imaging apparatus according to claim 4, wherein the conversion means, when cutting out the plurality of pieces of partial audio data from the audio data, shifts the cutout position by an amount equal to the size of the partial audio data multiplied by the reciprocal of the value of the ratio.

The imaging apparatus according to claim 1, wherein, when the value of the ratio is less than 1, the conversion means thins out, from the acquired audio data, some of the partial audio data each smaller in size than the audio data, and generates audio data corresponding to one frame at the playback frame rate.

The imaging apparatus according to any one of claims 4 to 6, wherein the duration of the generated audio data corresponding to one frame at the playback frame rate is the same as the duration of the video data corresponding to one frame at the playback frame rate.

The imaging apparatus according to any one of claims 1 to 7, wherein, when the shooting frame rate is changed while the video data and the audio data are being acquired, the conversion means applies the changed shooting frame rate from the next frame.

A control method comprising:
a step of acquiring video data and audio data at a shooting frame rate;
a step of converting the audio data based on a ratio of the shooting frame rate to a playback frame rate; and
a step of recording the converted audio data together with the video data,
wherein, in the converting step, the ratio of the shooting frame rate to the playback frame rate is calculated for each frame, and the audio data corresponding to that frame is converted based on the calculated ratio.

A program for causing a computer to execute each means according to any one of claims 1 to 8.