JP2013005418A

JP2013005418A - Imaging apparatus and reproducer

Info

Publication number: JP2013005418A
Application number: JP2011138050A
Authority: JP
Inventors: Shinji Onishi; 慎二大西
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-06-22
Filing date: 2011-06-22
Publication date: 2013-01-07
Anticipated expiration: 2031-06-22
Also published as: JP5762168B2

Abstract

PROBLEM TO BE SOLVED: To record audio from a subject outside a captured image when that audio is related to audio from a subject inside the captured image. SOLUTION: A CPU 106 takes the audio of wireless microphones 200_1 to 200_n into a RAM 109 in units of frames of the image captured by an image sensor 102. The CPU 106 determines whether each wireless microphone is in the captured image according to whether the subject associated with that microphone is in the captured image. For the audio of a wireless microphone that is in the captured image, the CPU 106 detects the audio of any wireless microphone that is not in the captured image but is correlated with it. The CPU 106 records, on a recording medium 110, the audio of the wireless microphones in the captured image together with the audio of those wireless microphones outside the captured image that are correlated with them.

Description

The present invention relates to an imaging apparatus and a playback apparatus.

Conventionally, some imaging apparatuses can be connected to a wireless microphone and have a function of recording the audio signal from the wireless microphone during moving image capture. Patent Document 1 describes a technique for extracting a person's facial features from a captured image in which the person is the subject, and performing person authentication based on the degree of coincidence with facial feature data registered in advance. When a plurality of wireless microphones can be connected, the wireless microphones of persons inside and outside the imaging angle of view can be identified by associating each pre-registered person with an individual wireless microphone and using this person authentication together.

Patent Document 2 describes a technique for identifying a wireless microphone present in a captured image (or within the imaging angle of view) and recording the audio from that wireless microphone, while not recording the audio of wireless microphones outside the captured image.

Patent Document 1: JP-A-6-259534
Patent Document 2: JP 2004-228667 A

With the technique described in Patent Document 2, when, for example, a subject with a wireless microphone inside the captured image is in conversation with a subject with a wireless microphone outside the captured image, the recorded audio does not make sense as a conversation when played back.

An object of the present invention is to eliminate this inconvenience and to provide an imaging apparatus that more appropriately records audio from a subject inside the captured image and audio from a subject outside the captured image, and a playback apparatus that appropriately reproduces such audio.

An imaging apparatus according to the present invention is an imaging apparatus having imaging means and communication means for communicating with one or more wireless microphones, and comprises: determination means for determining whether each of the wireless microphones is present in an image captured by the imaging means; correlation determination means for determining whether there is a correlation between a wireless microphone present in the captured image and a wireless microphone not present in the captured image; and recording means for recording the audio of the wireless microphone present in the captured image and the audio of a wireless microphone that is not present in the captured image but has been determined by the correlation determination means to be correlated with the wireless microphone present in the captured image.

A playback apparatus according to the present invention is a playback apparatus that plays back image/audio data in which an image and the audio of one or more wireless microphones are recorded, and comprises: determination means for determining whether each wireless microphone is present in a reproduced image; correlation determination means for determining whether there is a correlation between a wireless microphone present in the reproduced image and a wireless microphone not present in the reproduced image; and audio output means for outputting the audio of the wireless microphone present in the reproduced image and the audio of a wireless microphone that is not present in the reproduced image but has been determined by the correlation determination means to be correlated.

According to the present invention, the audio of a wireless microphone outside the captured or reproduced image is recorded or output based on whether it is correlated with a wireless microphone inside the captured or reproduced image, so that appropriate audio can be recorded and reproduced.

The drawings show, in order:
A schematic block diagram of one embodiment of the present invention.
A flowchart of audio processing in Embodiment 1.
A flowchart of audio processing in Embodiment 1.
A schematic diagram of a configuration example of the audio signal buffer.
An example of the contents of the variable indicating the state of microphone i.
A schematic diagram of an example of voice detection history data.
A flowchart of the audio block detection process.
A flowchart of the audio block detection process.
A flowchart of the correlation determination process.
A flowchart of the correlation determination process.
A flowchart of audio processing in Embodiment 2.
A flowchart of audio processing in Embodiment 2.
An example of the contents of the variable indicating the state of microphone i.
A schematic block diagram of a wireless microphone having a positioning function.
A flowchart of audio processing in Embodiment 3.
A flowchart of audio processing in Embodiment 3.
A flowchart of the correlation determination process in Embodiment 3.
A flowchart of audio processing in Embodiment 4.
A flowchart of audio processing in Embodiment 4.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a schematic block diagram of an embodiment of an imaging apparatus according to the present invention. The embodiment shown in FIG. 1 is an imaging apparatus that can record moving images and audio and can reproduce the recorded moving images and audio, that is, a so-called video camera 100. One or more wireless microphones 200 (200_1 to 200_n) can be connected to the video camera 100. The wireless microphones 200_1 to 200_n all have the same configuration.

The configuration of the video camera 100 is as follows. The optical system 101 consists of a lens group for imaging and a group of actuators that drive the lenses. The optical system 101 can change zoom and focus in accordance with instructions from the CPU 106. The image sensor 102 converts the optical image formed by the optical system 101 into an electrical signal and supplies the resulting image data to the CPU 106.

The microphone unit 103 consists of an A/D conversion circuit and a microphone that is either fixed to the housing of the video camera 100 or connected to it by wire. The microphone unit 103 captures ambient sound and supplies the corresponding audio data to the CPU 106.

The display unit 104 displays captured images, menus for user settings, and the like. Specifically, the display unit 104 is a device capable of displaying images, such as a liquid crystal display (LCD). The communication unit 105 communicates wirelessly with the wireless microphones 200_1 to 200_n.

The CPU 106 is a central control unit that supervises the processing of the entire video camera 100, and basically operates based on the programs and data stored in the ROM 108. The operation unit 107 is a means for accepting user operations; specifically, it consists of buttons, levers, a touch panel, and the like located where the user of the video camera 100 can touch them. The programs and data, such as parameters, needed for the video camera 100 to operate are written to the ROM 108 in advance. The RAM 109 is used to store the programs running on the CPU 106 and temporary data.

The recording medium 110 is used to record moving image data, audio data, and the like. Moving image data and audio data are recorded on the recording medium 110 during shooting, and the recorded moving image data and audio data are read from the recording medium 110 during playback. The recording medium 110 is, for example, a nonvolatile semiconductor memory.

The internal configuration of the wireless microphones 200_1 to 200_n is as follows. The wireless microphones 200_1 to 200_n all have the same configuration. The microphone unit 201 consists of a microphone that captures external sound and an A/D conversion circuit that converts the microphone's output audio signal into a digital signal. The microphone unit 201 supplies the captured audio data to the CPU 202.

The CPU 202 is a central control unit that supervises the overall operation of each of the wireless microphones 200_1 to 200_n, and basically operates based on the programs and data stored in the ROM 203. The programs and data, such as parameters, needed for the wireless microphones 200_1 to 200_n to operate are written to the ROM 203 in advance. The RAM 204 is used to store the programs running on the CPU 202 and temporary data. The communication unit 205 communicates wirelessly with the communication unit 105 of the video camera 100.

FIGS. 2A and 2B are operation flowcharts of the audio processing performed by the video camera 100 during recording. Because the audio processing is performed in synchronization with the recording of the moving image, the flow shown in FIGS. 2A and 2B is executed repeatedly at the frame period of the recorded moving image. For example, when recording a moving image at 30 frames per second, the CPU 106 executes the processing shown in FIGS. 2A and 2B within 1/30 second and repeats it 30 times per second. The video camera 100 (CPU 106) has a function of recognizing persons in the captured image and a function of storing pre-registered persons in association with the wireless microphones 200_1 to 200_n. The recording medium 110 functions as subject registration means in which the feature data needed to identify a person serving as a subject is registered in advance. The CPU 106 identifies subjects by referring to this feature data. In the flow shown in FIGS. 2A and 2B, the wireless microphones 200_1 to 200_n are denoted as microphone i (where i = 1 to n).

When the user starts recording with the video camera 100, the CPU 106 acquires from the microphone unit 103 an audio signal corresponding to the duration of one video frame, and records it in an audio signal buffer in the RAM 109 (S201). FIG. 3 is a schematic diagram showing the configuration of the audio signal buffer in the RAM 109. The audio signal buffer is divided into a plurality of areas. The mixed audio area ultimately stores data obtained by mixing the audio of the microphone unit 103 and the connected wireless microphones 200_1 to 200_n. The built-in microphone audio area stores audio data from the built-in microphone unit 103. The microphone[i] audio areas (where i = 1 to n) store audio data from the wireless microphones 200_1 to 200_n. The CPU 106 first stores the audio data acquired by the microphone unit 103 in both the mixed audio area and the built-in microphone audio area of the RAM 109.

The CPU 106 initializes the internal variable i to 1 (S202). The variable i identifies wireless microphone i. The CPU 106 determines whether wireless microphone i, registered in the video camera 100 in advance, can be detected via the communication unit 105 (S203). If it is not detected (S203), the CPU 106 sets the internal variable Mic[i] to "0" (S204). The internal variable Mic[i] indicates the state of microphone i. FIG. 4 is a table showing the values of the internal variable Mic[i] and their meanings.

If microphone i is detected (S203), the CPU 106 acquires from microphone i, via the communication unit 105, an audio signal corresponding to the duration of one video frame, and records it in the microphone[i] audio area in the RAM 109 (S205). Next, the CPU 106 determines whether microphone i is present in the captured image (S206). This determination can be made according to whether the person associated with microphone i has been detected in the captured image.

If microphone i is present in the captured image (S207), the CPU 106 mixes the audio data in the mixed audio area with the audio data in the microphone[i] audio area, and records the resulting audio data in the mixed audio area (S208). The CPU 106 also sets the internal variable Mic[i] to "1", the value indicating that wireless microphone i has been detected and is present in the captured image (S209).

If microphone i is not present in the captured image (S207), the CPU 106 sets the internal variable Mic[i] to "2", the value indicating that microphone i has been detected but is not present in the captured image (S210).
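The per-frame classification and mixing in steps S201 to S210 can be sketched as follows. This is only an illustrative sketch: the names `MicState`, `mix`, and `process_frame` are hypothetical, audio is modeled as plain lists of samples, and mixing is assumed to be a sample-wise sum, which the text does not specify.

```python
from enum import IntEnum


class MicState(IntEnum):
    """Values of Mic[i] as described for FIG. 4."""
    NOT_DETECTED = 0   # S204: microphone i not reachable via the communication unit
    IN_IMAGE = 1       # S209: detected and present in the captured image
    OUT_OF_IMAGE = 2   # S210: detected but not present in the captured image


def mix(a, b):
    """Assumed mixer: sample-wise sum (a real mixer would also scale or clip)."""
    return [x + y for x, y in zip(a, b)]


def process_frame(builtin, mic_audio, mics_in_image):
    """Classify each wireless microphone for one video frame.

    builtin       -- samples from the built-in microphone unit for this frame
    mic_audio     -- dict: mic index -> samples, or None if the mic is not detected
    mics_in_image -- set of mic indices whose associated person is in the image
    Returns (mixed_area, states).
    """
    mixed_area = list(builtin)          # S201: mixed area starts as built-in audio
    states = {}
    for i, samples in mic_audio.items():
        if samples is None:
            states[i] = MicState.NOT_DETECTED
        elif i in mics_in_image:
            mixed_area = mix(mixed_area, samples)   # S208
            states[i] = MicState.IN_IMAGE
        else:
            states[i] = MicState.OUT_OF_IMAGE       # audio kept but not yet mixed
    return mixed_area, states
```

At this stage only in-image microphones are mixed; out-of-image microphones keep their audio in their own area until the correlation determination decides whether to include them.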

After step S209 or S210, the CPU 106 determines whether the one-frame-long audio data recorded in the microphone[i] audio area contains speech uttered by a person (S211). A method such as that described in JP-A-2001-022367 can be used for this determination: the background noise level of the input audio frame is determined, and a threshold corresponding to that noise level is compared with the volume of the input audio frame.
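As a rough illustration of the comparison described for S211, the sketch below flags a frame as containing speech when its volume exceeds a threshold derived from the background noise level. It is not the method of JP-A-2001-022367: the RMS volume measure, the `margin` factor, and the function names are all assumptions made for this example.

```python
import math


def frame_volume(samples):
    """Volume of one frame, assumed here to be the RMS of its samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_voice(samples, noise_level, margin=2.0):
    """Return (detected, volume) for one audio frame.

    The threshold is assumed to be the background noise level times a
    fixed margin; the source only says the threshold corresponds to the
    noise level.
    """
    volume = frame_volume(samples)
    threshold = noise_level * margin
    return volume > threshold, volume
```

Returning the volume alongside the decision is a convenience so that a caller can store either the volume or zero in a per-frame history.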

The CPU 106 stores the determination result of step S211 in the RAM 109 as voice detection history data for microphone i (S212). For example, in a FIFO buffer capable of holding 10 seconds of voice detection history data, the CPU 106 stores the volume of the audio frame if speech was detected in that frame, and zero if it was not. This FIFO buffer corresponds to the voice detection history holding means.
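A minimal sketch of such a history buffer is shown below, assuming 30 frames per second and a 10-second window so that each microphone holds 300 entries. The class name `VoiceHistory` and its methods are hypothetical; `collections.deque` with `maxlen` is used because it discards the oldest entry automatically when a new one is appended.

```python
from collections import deque

FRAME_RATE = 30        # frames per second (value used in the embodiment)
HISTORY_SECONDS = 10   # length of the history window


class VoiceHistory:
    """Per-microphone FIFO of frame volumes (0.0 means no speech detected)."""

    def __init__(self, num_mics, frame_rate=FRAME_RATE, seconds=HISTORY_SECONDS):
        self.capacity = frame_rate * seconds
        # Mics are numbered 1..n; each FIFO starts filled with "no speech".
        self.fifo = {
            i: deque([0.0] * self.capacity, maxlen=self.capacity)
            for i in range(1, num_mics + 1)
        }

    def push(self, mic, detected, volume):
        """Record one frame: the volume if speech was detected, else zero."""
        self.fifo[mic].append(volume if detected else 0.0)

    def power(self, mic, t):
        """Volume at 1-based frame position t (1 = oldest, capacity = current)."""
        return self.fifo[mic][t - 1]
```

With this layout the newest frame is always at position `capacity`, and older frames shift toward position 1 as new frames arrive.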

The CPU 106 increments the variable i (S213). The CPU 106 compares the variable i with the number n of connectable wireless microphones and, if i is less than or equal to n, returns to step S203 (S214). If i exceeds n (S214), that is, if the detection processing has been completed for all connectable wireless microphones, the process proceeds to step S215.

FIG. 5 schematically shows an example of the voice detection history data recorded in step S212, for the case where the number n of connectable wireless microphones is four. In the column to the right of each wireless microphone, the positions corresponding to frames in which speech was detected are hatched. The left side of the column corresponds to frames further in the past, and the right end corresponds to the frame currently being processed. The height of each hatched portion indicates the volume of the speech detected in the corresponding frame. For wireless microphone 1, for example, speech is detected for 2 seconds starting 10 seconds before the current time, followed by 3 seconds without speech, 3 seconds with speech, and 2 seconds without speech.

After completing detection for all the wireless microphones, the CPU 106 detects the speech found in the output of each wireless microphone i as blocks (S215). In the example shown in FIG. 5, two audio blocks are detected for microphone 1: a 2-second block starting 10 seconds before the current time, and a 3-second block starting 5 seconds before the current time (following the timeline described for FIG. 5).

FIGS. 6A and 6B are operation flowcharts of the audio block detection process (S215). In the flowcharts shown in FIGS. 6A and 6B, when speech is detected continuously for 0.5 seconds or longer and is then undetected for 1 second or longer, the detected speech portion is identified as an audio block.

The CPU 106 sets the internal variable i, which indicates the wireless microphone for which audio block detection is performed, to 1 (S601). The CPU 106 sets the internal variable t, which indicates the audio frame position, to 1, and sets the internal variable bn[i], which indicates the number of audio blocks detected from the voice detection history data of microphone i, to 0 (S602). Audio frame positions are numbered so that the frame 10 seconds before the current time is number 1, and the number increases by 1 each time the frame position moves one frame period closer to the current time. Since there are 30 frames per second, the frame position number corresponding to the current time is 30 (frames) × 10 (seconds) = 300. The voice detection history data for microphone i at the frame position indicated by the internal variable t is denoted power[i][t]. power[i][t] indicates the volume, at frame position t, of the audio data acquired from microphone i.

In the processing from step S603 onward, the CPU 106 first detects the first frame of an audio block. The CPU 106 determines from the voice detection history data power[i][t] whether speech was detected for microphone i at frame position t (S603). If no speech was detected, the CPU 106 increments the internal variable t (S604) and determines whether the processing has been completed up to the frame position corresponding to the current time (S605). The value FR in step S605 is the number of frames per second. Because the video camera 100 records moving images at 30 frames per second, FR is 30.

If the processing has not yet reached the frame position corresponding to the current time (S605), the CPU 106 returns to step S603 and processes the next audio frame position. If it has (S605), the CPU 106 increments the variable i (S606). The CPU 106 then compares the internal variable i with the number of microphones n to determine whether the processing has been completed for all wireless microphones (S607). If not (S607), the CPU 106 returns to step S602 and processes the next wireless microphone. If so (S607), the CPU 106 ends the audio block detection process.

If speech has been detected at the frame position indicated by the variable t (S603), the CPU 106 initializes the internal variables ts, pw, and pc to their respective initial values (S608). Specifically, the variable ts, which indicates the candidate position of the first frame of the audio block, is set to the value of t. The variable pw, which holds the sum of the volumes contained in the audio block, is set to the volume of the frame currently being processed, that is, the value of power[i][t]. The internal variable pc, which counts the total number of frames in which speech was detected, is set to 1. The variables pw and pc are used to calculate the average volume of the detected audio block.

The CPU 106 increments the variable t (S609) and determines whether speech has been detected at frame position t (S610). If no speech has been detected (S610), the CPU 106 concludes that frame position ts is not the start of a block, proceeds to step S604, and searches again for a candidate first frame of an audio block.

If speech has been detected at frame position t (S610), the CPU 106 adds the volume of the frame currently being processed to the variable pw, which holds the running total, and increments the variable pc (S611). The CPU 106 determines whether the frame position t currently being processed is 0.5 seconds after the candidate first frame position ts of the audio block, that is, whether speech has been detected continuously for 0.5 seconds starting from frame position ts (S612). If 0.5 seconds have not yet elapsed (S612), the CPU 106 returns to step S609 and processes the next frame. If they have (S612), the CPU 106 increments the variable bn[i] and sets the variable b_start[i][bn[i]] to the value of ts (S613). b_start[i][bn[i]] indicates the first frame position of the bn[i]-th audio block for microphone i.

Once the first frame position of an audio block has been detected in steps S608 to S613, the CPU 106 detects the last frame position of that audio block. First, the CPU 106 determines whether speech has been detected at frame position t (S614). If speech has been detected (S614), the CPU 106 adds the volume of the frame currently being processed to the variable pw and increments the variable pc, which counts the total number of frames in which speech was detected (S615). Next, the CPU 106 increments the variable t (S616) and determines whether the processing has been completed up to the frame position corresponding to the current time (S617). If it has not (S617), the CPU 106 returns to step S614 and processes the next audio frame position.

If the current time has been reached (S617), the CPU 106 sets the variable b_end[i][bn[i]], which indicates the last frame position of the bn[i]-th audio block for microphone i, to the value indicating the current time (S618). The CPU 106 also sets the variable p_ave[i][bn[i]] to the result of dividing pw by pc (S618). The variable p_ave[i][bn[i]] indicates the average volume of the bn[i]-th audio block for microphone i. After step S618, the CPU 106 proceeds to step S606.

If no speech has been detected at frame position t (S614), the CPU 106 sets the variable ts to the position one frame before t, as the candidate last frame position of the audio block (S619). The CPU 106 increments the variable t (S620) and determines whether speech has been detected at frame position t (S621). If speech has been detected (S621), the CPU 106 concludes that frame position ts is not the last frame of the block, proceeds to step S615, and searches again for a candidate last frame of the audio block. If no speech has been detected at frame position t (S621), the CPU 106 determines whether t is 1 second after the value of ts, that is, whether speech has been undetected continuously for 1 second starting from frame position ts (S622). If 1 second has not yet elapsed (S622), the CPU 106 returns to step S620 and processes the next frame. If it has (S622), the CPU 106 sets the internal variable b_end[i][bn[i]], which indicates the last frame position of the bn[i]-th audio block for microphone i, to the value of ts (S623). The CPU 106 also sets the variable p_ave[i][bn[i]] to the result of dividing pw by pc (S623). As described above, the variable p_ave[i][bn[i]] indicates the average volume of the bn[i]-th audio block for microphone i.

When the final frame position of an audio block has been detected in steps S619 to S623, the CPU 106 returns to step S603 and continues with detection of the next audio block.

By performing the processing of the flowcharts shown in FIGS. 6A and 6B, the CPU 106 can detect the audio blocks in the audio acquired by each wireless microphone and obtain the average volume of each audio block.
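
The block detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the flowchart of FIGS. 6A and 6B itself: the per-frame input format (a volume, or None when no audio is detected), the frame rate parameter, and the names AudioBlock and detect_blocks are hypothetical. It is also assumed that pw and pc accumulate only over frames in which audio is detected, which this part of the description does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioBlock:
    start: int         # first frame position of the block
    end: int           # final frame position of the block (b_end)
    avg_volume: float  # average volume of the block (p_ave = pw / pc)


def detect_blocks(volumes: List[Optional[float]], fps: int = 30,
                  silence_sec: float = 1.0) -> List[AudioBlock]:
    """Split one microphone's detection history into audio blocks.

    volumes[t] is the volume at frame t, or None when no audio was detected.
    A block is closed once silence has lasted silence_sec seconds.
    """
    silence_frames = int(fps * silence_sec)
    blocks: List[AudioBlock] = []
    start: Optional[int] = None
    pw, pc, ts = 0.0, 0, 0  # volume sum, frame count, last frame with audio
    for t, v in enumerate(volumes):
        if v is not None:
            if start is None:           # a new block begins
                start, pw, pc = t, 0.0, 0
            pw += v
            pc += 1
            ts = t                      # candidate final frame
        elif start is not None and t - ts >= silence_frames:
            blocks.append(AudioBlock(start, ts, pw / pc))  # cf. S623
            start = None
    if start is not None:
        # The history ends inside a block: close it at the current time (cf. S618).
        blocks.append(AudioBlock(start, len(volumes) - 1, pw / pc))
    return blocks
```

A short silence inside a block does not split it; only a silence of at least silence_sec seconds does.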

Returning to FIGS. 2A and 2B, the processing from step S216 onward will be described. The CPU 106 initializes the internal variable i to 1 (S216). The variable i is the number that designates the wireless microphone being processed. The CPU 106 determines the state of microphone i (S217). When microphone i has not been detected by the video camera 100, or has been detected and is present in the captured image (S217), the CPU 106 increments the variable i in order to move on to the next microphone (S221).

When microphone i has been detected by the video camera 100 but is not present in the captured image (S217), the CPU 106 determines the correlation between microphone i and the other wireless microphones that have been detected by the video camera 100 and are present in the captured image (S218). The CPU 106 sets the internal variable Result to TRUE when there is a correlation and to FALSE when there is none. Details of step S218 are described later.

When there is a correlation (S219), the CPU 106 mixes the audio data in the mixed audio area of the audio signal buffer with the audio data in the microphone [i] audio area, and stores the resulting audio data in the mixed audio area (S220). The CPU 106 then increments the variable i for the next microphone (S221).

When it is determined that there is no correlation (S219), the CPU 106 increments the variable i for the next microphone without mixing (S221).

After step S221, the CPU 106 compares the variable i with the number of wireless microphones n and determines whether processing has finished for all connectable wireless microphones (S222). If it has not finished, the CPU 106 returns to step S217 and processes the next wireless microphone. If it has finished, the CPU 106 records the audio data in the mixed audio area of the audio signal buffer on the recording medium (S223), returns to step S201, and repeats the processing for the next recording frame.
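
As a rough illustration of the loop in steps S216 to S223, the sketch below adds the audio of each out-of-image microphone to the mix only when the correlation check succeeds. The function name mix_frame, the dictionary-based inputs, and the use of sample-wise addition as the mixing operation are assumptions made for this example; the description does not specify how the mixing is carried out.

```python
from typing import Callable, Dict, List


def mix_frame(mixed: List[float],
              mic_audio: Dict[int, List[float]],
              detected: Dict[int, bool],
              in_image: Dict[int, bool],
              is_correlated: Callable[[int], bool]) -> List[float]:
    """Return the mixed audio for one frame.

    mixed already holds the built-in microphone and in-image microphones.
    is_correlated(i) stands in for the correlation determination (S218).
    """
    out = list(mixed)
    for i in sorted(mic_audio):                 # i = 1 .. n (S216, S221, S222)
        if not detected[i] or in_image[i]:      # S217: nothing to add here
            continue
        if is_correlated(i):                    # S218, S219
            # S220: mix microphone i into the mixed audio area
            # (sample-wise addition is an assumption of this sketch).
            out = [a + b for a, b in zip(out, mic_audio[i])]
    return out
```

The returned list corresponds to the contents of the mixed audio area that is recorded in step S223.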

FIGS. 7A and 7B are flowcharts showing an example of the correlation determination process (S218). Following these flowcharts, the CPU 106 determines whether the microphone i being processed is correlated with any of the other wireless microphones that have been detected by the video camera 100 and are present in the captured image. As soon as a correlation with any wireless microphone is found, the correlation determination result Result is set to TRUE and the process ends. If the determination has been completed for all wireless microphones without finding a correlation, the process ends with Result still set to FALSE.

The CPU 106 initializes the internal variable Result to FALSE and the internal variable j to 1 (S701). The internal variable j is the number that identifies the wireless microphone to be checked for correlation with the microphone i currently being processed. The CPU 106 determines whether the values of j and i match (S702). If they match (S702), the CPU 106 increments j (S703) and determines whether the correlation determination has been completed for all wireless microphones (S704). If it has not been completed (S704), the CPU 106 returns to step S702 and continues the correlation determination with the next wireless microphone.

If the values of the internal variable j and the variable i do not match (S702), the CPU 106 determines whether microphone j has been detected by the video camera 100 and is present in the captured image (S705). If microphone j has not been detected by the video camera 100, or has been detected but is not present in the captured image (S705), the CPU 106 increments j as described above and proceeds to the correlation determination with the next wireless microphone (S703).

If microphone j has been detected by the video camera 100 and is present in the captured image, the CPU 106 sets the internal variable bi to 1 (S706) and the internal variable bj to 1 (S707). The variable bi is the number of an audio block detected from the audio detection history data of microphone i. The variable bj is the number of an audio block detected from the audio detection history data of microphone j. Hereinafter, the bi-th audio block for microphone i is written as "audio block (i, bi)".

The CPU 106 sets the variable dt to the difference between the first frame position of audio block (i, bi) and the first frame position of audio block (j, bj) (S708), and determines the sign of dt (S709). When dt is positive (S709), audio block (j, bj) starts earlier in time than audio block (i, bi). In the example of FIG. 5, this corresponds to the case where audio block (i, bi) is audio block 502 and audio block (j, bj) is audio block 501. When dt is negative (S709), audio block (i, bi) starts earlier in time than audio block (j, bj). In the example of FIG. 5, this corresponds to the case where audio block (i, bi) is audio block 502 and audio block (j, bj) is audio block 503. In this embodiment, to simplify the processing, the case dt = 0, in which the first frame positions of audio block (i, bi) and audio block (j, bj) coincide, is handled in the same way as the case where dt is positive.

When dt is positive or zero (S709), the CPU 106 sets dt to the difference between the first frame position of audio block (i, bi) and the final frame position of audio block (j, bj) (S710). When dt is negative (S709), the CPU 106 sets dt to the difference between the first frame position of audio block (j, bj) and the final frame position of audio block (i, bi) (S711). If audio block (i, bi) and audio block (j, bj) overlap in time, dt is zero or negative and indicates the number of overlapping frames. If they do not overlap in time, dt is positive and indicates the number of frames separating the two blocks.

After step S710 or S711, the CPU 106 determines whether the value of dt lies between the predetermined constants T1 and T2 (S712). T1 is a negative value that gives the allowable number of overlapping frames between correlated audio blocks. T2 is a positive value that gives the allowable number of separating frames between correlated audio blocks. When dt is T1 or less, the overlap exceeds the allowable number of frames; when dt is T2 or more, the separation between the audio blocks exceeds the allowable number of frames. In other words, this step checks whether the gap between the audio of the two microphones is shorter than a predetermined time and whether their overlap is longer than a predetermined time.
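
The timing test of steps S708 to S712 can be written compactly. The following is a sketch only; the function names signed_gap and timing_correlated are hypothetical, and frame positions are assumed to be plain integers.

```python
def signed_gap(start_i: int, end_i: int, start_j: int, end_j: int) -> int:
    """Gap in frames between two audio blocks (S708 to S711).

    Positive: the blocks are separated by that many frames.
    Zero or negative: the blocks overlap in time.
    """
    if start_i - start_j >= 0:      # S709: block j starts first (or together)
        return start_i - end_j      # S710
    return start_j - end_i          # S711


def timing_correlated(dt: int, t1: int, t2: int) -> bool:
    """S712: dt must lie strictly between T1 (negative) and T2 (positive)."""
    return t1 < dt < t2
```

For example, a block spanning frames 10 to 20 and a block spanning frames 0 to 8 give a gap of 2 frames, while blocks spanning 10 to 20 and 0 to 15 give -5, an overlap.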

If dt is T1 or less, or T2 or more (S712), the CPU 106 determines that there is no correlation between audio block (i, bi) and audio block (j, bj) and proceeds to step S713. The CPU 106 increments the internal variable bj (S713) and determines whether the correlation determination between audio block (i, bi) and all audio blocks of microphone j has been completed (S714). If it has not been completed, the CPU 106 returns to step S708 and performs the correlation determination between audio block (i, bi) and the next audio block of microphone j.

When the correlation determination between audio block (i, bi) and all audio blocks of microphone j has been completed (S714), the CPU 106 increments the variable bi (S715). The CPU 106 then determines whether the correlation determination between all audio blocks of microphone i and all audio blocks of microphone j has been completed (S716). If it has not been completed (S716), the CPU 106 returns to step S707 and performs the correlation determination between the next audio block of microphone i and the audio blocks of microphone j. If it has been completed (S716), the CPU 106 increments the variable j in order to perform the correlation determination between microphone i and the next wireless microphone (S703).

If dt is greater than T1 and less than T2 (S712), the CPU 106 sets the internal variable dp to the absolute difference between the average volume of audio block (i, bi) and the average volume of audio block (j, bj) (S717). The CPU 106 determines whether dp is smaller than a predetermined threshold P (S718). If dp is equal to or greater than the threshold (S718), the volume difference between the audio blocks is large, so the blocks can be judged to be uncorrelated, and the CPU 106 proceeds to step S713.

If dp is smaller than the predetermined threshold (S718), the volume difference between the audio blocks is small, so the audio of microphone i and microphone j can be judged to be correlated. In this case, the CPU 106 sets the internal variable Result to TRUE, indicating that microphone i is correlated with one of the other wireless microphones that have been detected by the video camera 100 and are present in the captured image (S719), and ends the process.

In the flowcharts shown in FIGS. 7A and 7B, the correlation is determined using both the timing and the volume of the audio blocks. However, steps S708 to S712 may be skipped so that only the processing from step S717 onward is performed. That is, the presence or absence of a correlation may be determined using only the volume of the audio blocks.
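
Putting the timing test and the volume test together, the block-by-block comparison between two microphones might look like the sketch below. The function name mics_correlated, the tuple layout (first frame, final frame, average volume), and the use_timing switch for the volume-only variant are assumptions introduced for this example.

```python
from typing import List, Tuple

Block = Tuple[int, int, float]  # (first frame, final frame, average volume)


def mics_correlated(blocks_i: List[Block], blocks_j: List[Block],
                    t1: int, t2: int, p: float,
                    use_timing: bool = True) -> bool:
    """Return True as soon as one pair of blocks is judged correlated."""
    for start_i, end_i, vol_i in blocks_i:          # loop over bi (S706, S715)
        for start_j, end_j, vol_j in blocks_j:      # loop over bj (S707, S713)
            if use_timing:
                # S708 to S711: signed gap between the two blocks
                if start_i - start_j >= 0:
                    dt = start_i - end_j
                else:
                    dt = start_j - end_i
                if not (t1 < dt < t2):              # S712: timing out of range
                    continue
            dp = abs(vol_i - vol_j)                 # S717
            if dp < p:                              # S718
                return True                         # S719: Result = TRUE
    return False                                    # Result stays FALSE
```

Calling the function with use_timing=False corresponds to the variant that judges correlation from the volume alone.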

As described above, this embodiment can record not only the audio of wireless microphones detected in the captured image but also the audio of wireless microphones that are not detected in the captured image and are correlated with that audio.

The same effect as in the first embodiment can also be achieved by processing at playback time. In this case, the video camera 100 records the audio of the built-in microphone and of all connected wireless microphones during recording, and during playback reproduces the images of the recorded image and audio data while selectively reproducing the audio. The audio acquired by each microphone during recording may be recorded as separate tracks in a single moving image file, or as an independent audio file for each microphone.

FIGS. 8A and 8B are flowcharts of the operation that, during playback, reproduces the audio of wireless microphones detected in the captured image together with the audio of wireless microphones that are not detected in the captured image and are correlated with that audio. In FIGS. 8A and 8B, steps with the same processing content as in FIGS. 2A and 2B are given the same reference numerals. In this embodiment, the video camera 100 (CPU 106) has a function of recognizing persons in the reproduced image and a function of storing pre-registered persons in association with the wireless microphones 200_1 to 200_n. It can therefore also be said that the CPU 106 records, for the audio track or audio file of each of the wireless microphones 200_1 to 200_n, which person the audio belongs to. Hereinafter, "audio track" is used to refer to either an audio track or an audio file.

When the user selects a moving image to play on the video camera 100 and starts playback, the CPU 106 opens the selected moving image file on the recording medium 110 and starts the playback process. If the audio is recorded in files separate from the moving image file, the CPU 106 also opens the audio files corresponding to the built-in microphone unit 103 and to each of the wireless microphones 200_1 to 200_n at the same time before starting the playback process.

The CPU 106 reproduces audio data equivalent to one moving image frame time from the audio track corresponding to the built-in microphone unit 103 and stores it in the audio signal buffer of the RAM 109 (S801). The audio data acquired by the built-in microphone unit 103 is stored in both the mixed audio area and the built-in microphone audio area of the audio signal buffer.

The CPU 106 initializes the internal variable i to 1 (S802). The CPU 106 reproduces audio data equivalent to one moving image frame time from the audio track corresponding to microphone i and stores it in the microphone [i] audio area of the RAM 109 (S803). Next, the CPU 106 determines whether microphone i is present in the reproduced image (S206). This determination can be made according to whether the person associated with microphone i is detected in the reproduced image.

If microphone i is present in the reproduced image (S205), the CPU 106 mixes the audio data in the mixed audio area with the audio data in the microphone [i] audio area and records the resulting audio data in the mixed audio area (S208). The CPU 106 also sets the internal variable Mic[i] to "1", the value indicating that wireless microphone i is present in the reproduced image (S804). FIG. 9 shows an example of the values set in the variable Mic[i]. If microphone i is not present in the reproduced image (S207), the CPU 106 sets the variable Mic[i] to "2", the value indicating that microphone i is not present in the reproduced image (S808).

From step S211 onward, the same processing as described in the first embodiment is performed. Finally, the CPU 106 outputs the audio data stored in the mixed audio area of the audio signal buffer to an audio output unit (not shown) (S806), returns to step S801, and repeats the processing for the next recorded frame.

As described above, during playback, the audio of wireless microphones detected in the reproduced image can be played back simultaneously with the correlated audio of wireless microphones outside the reproduced image.

When the wireless microphones have a positioning function that detects their own position coordinates, the microphone position information can be used to determine whether each wireless microphone is within the captured image. Using this determination result, the audio of wireless microphones that are not detected in the captured image but are correlated with the audio of wireless microphones detected in the captured image can be selectively recorded.

FIG. 10 is a schematic block diagram of a wireless microphone with a positioning function. The wireless microphone 1000 (1000_1 to 1000_n) includes a positioning unit 801 in addition to the functions of the wireless microphone 200. In this embodiment, the video camera 100 can acquire and record audio data and position information from up to n wireless microphones 1000_1 to 1000_n via the communication unit 105.

FIGS. 11A and 11B are flowcharts of the audio processing during recording in this embodiment. In FIGS. 11A and 11B, the same processes as in FIGS. 2A and 2B are given the same reference numerals.

When the user starts recording with the video camera 100, the CPU 106 performs the same processing as in the first embodiment, detecting the wireless microphones 1000 and determining whether each wireless microphone 1000 is present in the captured image (S203 to S207). After performing the audio processing for a detected wireless microphone 1000 (S208, S209), the CPU 106 acquires the position information of microphone i (microphone position information) (S1101).

When the detection process has finished for all connectable wireless microphones 1000 (S214), the CPU 106 performs the processing from step S216 onward. In that processing, the CPU 106 determines the correlation between the audio of a wireless microphone determined not to be present in the captured image and the audio of any wireless microphone determined to be present in the captured image (S1102). The CPU 106 records the audio data in the mixed audio area of the audio signal buffer on the recording medium (S223), returns to step S201, and repeats the processing for the next recording frame.

FIG. 12 is a detailed flowchart of the correlation determination process (S1102). The CPU 106 initializes the internal variable Result to FALSE and the internal variable j to 1 (S1201). The internal variable j is the number that identifies the wireless microphone to be checked for correlation with the microphone i currently being processed. The CPU 106 determines whether the values of j and i match (S1202). If they match (S1202), the CPU 106 increments j (S1203) and determines whether the correlation determination has been completed for all wireless microphones (S1204).

In this flowchart, as soon as a correlation with any wireless microphone is found, Result is set to TRUE and the correlation determination ends. If the correlation determination has been completed for all wireless microphones, no wireless microphone is correlated, and the determination process ends with Result still set to FALSE. If the correlation determination has not been completed for all wireless microphones (S1204), the CPU 106 returns to step S1202 and continues the correlation determination with the next wireless microphone.

If the values of j and i do not match (S1202), the CPU 106 determines whether microphone j has been detected by the video camera 100 and is present in the captured image (S1205). If microphone j has not been detected by the video camera 100, or is not present in the captured image (S1205), the CPU 106 increments j (S1203) and determines whether a correlation determination with the next wireless microphone is needed (S1204).

If microphone j has been detected by the video camera 100 and is present in the captured image (S1205), the CPU 106 calculates the distance between microphone i and microphone j and sets it in the variable d (S1206). The distance between microphone i and microphone j can be obtained with Hubeny's distance formula, using the position information of each microphone acquired in step S1101 of FIGS. 11A and 11B (the equation itself is not reproduced in this text).
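
Since the patent's own equation is not reproduced in this text, the following sketch uses the commonly cited simplified form of Hubeny's formula as a stand-in. It assumes that the microphone positions are latitude and longitude in degrees and that the WGS 84 ellipsoid is used; neither assumption is stated in the description, and the function name hubeny_distance is hypothetical.

```python
import math

# WGS 84 ellipsoid parameters (an assumption; the description names no datum).
A = 6378137.0                 # semi-major axis in meters
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared


def hubeny_distance(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Approximate distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dy = p1 - p2                                # latitude difference (rad)
    dx = math.radians(lon1) - math.radians(lon2)  # longitude difference (rad)
    my = (p1 + p2) / 2.0                        # mean latitude (rad)
    w = math.sqrt(1.0 - E2 * math.sin(my) ** 2)
    m = A * (1.0 - E2) / w ** 3                 # meridian radius of curvature
    n = A / w                                   # prime vertical radius of curvature
    return math.hypot(dy * m, dx * n * math.cos(my))
```

The result would play the role of the variable d in step S1206. The formula is an approximation intended for short distances, which fits the conversational ranges considered here.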

The CPU 106 determines whether the variable d is smaller than a threshold DT (S1207). DT is the distance within which a conversation is assumed to be possible. DT may be a predetermined fixed value, or the user may be allowed to set it. If d is equal to or greater than DT (S1207), the distance between wireless microphones i and j is large, and microphone i and microphone j can be regarded as uncorrelated. The CPU 106 therefore increments j (S1203) and determines whether the correlation determination has been completed for all wireless microphones (S1204).

If d is smaller than DT (S1207), the microphones are within conversational distance, so microphone i and microphone j can be regarded as correlated. The CPU 106 therefore sets the internal variable Result to TRUE, indicating that microphone i is correlated with one of the microphones detected in the captured image (here, microphone j) (S1208), and ends the process.
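
The distance-based determination of FIG. 12 can be sketched as a single loop over the candidate microphones. In this illustration the microphone records are hypothetical dictionaries, and positions are planar coordinates in meters measured with math.dist so that the example is self-contained; the description instead computes the distance from the position information with Hubeny's formula.

```python
import math
from typing import Dict


def correlated_by_distance(i: int, mics: Dict[int, dict],
                           distance_threshold: float) -> bool:
    """Return True if microphone i is within distance_threshold (DT)
    of any microphone that is detected and present in the image."""
    for j, mic in mics.items():                       # S1201, S1203, S1204
        if j == i:                                    # S1202
            continue
        if not (mic["detected"] and mic["in_image"]):  # S1205
            continue
        d = math.dist(mics[i]["pos"], mic["pos"])     # S1206 (planar stand-in)
        if d < distance_threshold:                    # S1207
            return True                               # S1208: Result = TRUE
    return False                                      # Result stays FALSE


mics = {
    1: {"detected": True, "in_image": False, "pos": (0.0, 0.0)},
    2: {"detected": True, "in_image": True, "pos": (3.0, 4.0)},
    3: {"detected": False, "in_image": False, "pos": (0.1, 0.0)},
}
print(correlated_by_distance(1, mics, 6.0))
```

With these sample values, microphone 3 is ignored because it is not detected, and the outcome depends only on the 5 m separation between microphones 1 and 2.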

As described above, in this embodiment the correlation is determined on the basis of whether the distance between microphones is one at which a conversation is possible. This makes it possible to record the audio of the appropriate microphones among those not detected in the captured image at the same time as the audio of the microphones detected in the captured image.

Just as the second embodiment relates to the first embodiment, the third embodiment may be modified so that the recorded audio is selectively reproduced at playback time. That is, the video camera 100 records the audio of the built-in microphone and of all connected wireless microphones during recording, and during playback reproduces the images of the recorded image and audio data while selectively reproducing the audio. The audio acquired by each microphone during recording may be recorded as separate tracks in a single moving image file, or as an independent audio file for each microphone.

FIGS. 13A and 13B are flowcharts of the operation that, during playback, reproduces the audio of wireless microphones detected in the captured image together with the audio of wireless microphones that are not detected in the captured image and are correlated with that audio. In FIGS. 13A and 13B, steps with the same processing content as in FIGS. 11A and 11B are given the same reference numerals. The video camera 100 (CPU 106) stores pre-registered persons in association with the wireless microphones 1000_1 to 1000_n, and records the position information of each of the wireless microphones 1000_1 to 1000_n in association with its audio track or audio file. The video camera 100 also has a function of detecting pre-registered persons in the reproduced image, and can detect during moving image playback whether a registered person is present in the reproduced image.

When the user selects a moving image to play on the video camera 100 and starts playback, the CPU 106 opens the selected moving image file on the recording medium 110 and starts the playback process. If the audio is recorded in files separate from the moving image file, the CPU 106 also opens the audio files corresponding to the built-in microphone unit 103 and to each of the wireless microphones 1000_1 to 1000_n at the same time before starting the playback process.

The CPU 106 reproduces, from the audio track corresponding to the built-in microphone unit 103, the audio data corresponding to one frame time of the moving image, and stores it in the audio signal buffer of the RAM 109 (S1301). The audio data acquired by the built-in microphone unit 103 is stored in both the mixed audio area and the built-in microphone audio area of the audio signal buffer.

The CPU 106 initializes the internal variable i by setting it to 1 (S1302). The CPU 106 reproduces, from the audio track corresponding to the microphone i, the audio data corresponding to one frame time of the moving image, and stores it in the microphone [i] audio area of the RAM 109 (S1303). Next, the CPU 106 determines whether or not the microphone i is present in the reproduced image (S206). This determination can be made based on whether or not the person associated with the microphone i is detected in the reproduced image.

When the microphone i is present in the reproduced image (S207), the CPU 106 mixes the audio data in the mixed audio area with the audio data in the microphone [i] audio area, and records the resulting audio data in the mixed audio area (S208). The CPU 106 also sets the internal variable Mic[i] to the value "1", which indicates that the wireless microphone i is present in the reproduced image (S1304). FIG. 9 shows an example of the values set in the variable Mic[i] here. When the microphone i is not present in the reproduced image (S207), the CPU 106 sets the variable Mic[i] to the value "2", which indicates that the microphone i is not present in the reproduced image (S1305).
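
The per-microphone steps above (S206 to S208, S1304, S1305) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the names `IN_IMAGE`, `NOT_IN_IMAGE`, `mix`, and `process_microphone`, the representation of one frame of audio as a list of float samples, and the additive mixing with clipping are all hypothetical. The text states only that the two audio areas are mixed and that Mic[i] is set to 1 or 2.

```python
from typing import Dict, List

IN_IMAGE = 1      # Mic[i] value when microphone i is in the reproduced image
NOT_IN_IMAGE = 2  # Mic[i] value when microphone i is not in the reproduced image


def mix(a: List[float], b: List[float]) -> List[float]:
    """Sum two equal-length sample lists and clip to [-1.0, 1.0] (assumed rule)."""
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]


def process_microphone(
    i: int,
    frame_audio: List[float],
    person_detected: bool,
    mixed: List[float],
    mic_flags: Dict[int, int],
) -> List[float]:
    """Handle microphone i for one video frame and return the updated mix."""
    if person_detected:                  # S206/S207: associated person is in the image
        mixed = mix(mixed, frame_audio)  # S208: mix into the mixed audio area
        mic_flags[i] = IN_IMAGE          # S1304
    else:
        mic_flags[i] = NOT_IN_IMAGE      # S1305
    return mixed
```

Audio from a microphone flagged `NOT_IN_IMAGE` is left out of the mix at this stage; whether it is added later depends on the correlation determination in the subsequent steps.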

The CPU 106 acquires the position information of the microphone i that is recorded in association with the audio track corresponding to the microphone i (S1306).

In steps S213 to S217, S1102, and S219 to S222, the same processing as described in the third embodiment is performed. Finally, the CPU 106 outputs the audio data stored in the mixed audio area of the audio signal buffer to audio output means (not shown) (S1307), returns to step S1301, and repeats the process for the next recorded frame.
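
The correlation steps are not detailed in this passage, but the claims state that a correlation is found when the distance between a wireless microphone present in the image and one not present in the image is shorter than a predetermined distance. A minimal Python sketch of that selection is given below. It assumes, purely for illustration, that the position information is available as planar coordinates in metres; the names `correlated_mics`, `positions`, and `max_distance` are hypothetical, and the flag values follow the Mic[i] convention described above.

```python
import math
from typing import Dict, List, Tuple

IN_IMAGE, NOT_IN_IMAGE = 1, 2  # Mic[i] values as described in the text


def correlated_mics(
    mic_flags: Dict[int, int],
    positions: Dict[int, Tuple[float, float]],
    max_distance: float,
) -> List[int]:
    """Return out-of-image microphones closer than max_distance to any in-image one."""
    inside = [i for i, flag in mic_flags.items() if flag == IN_IMAGE]
    outside = [j for j, flag in mic_flags.items() if flag == NOT_IN_IMAGE]
    selected = []
    for j in outside:
        # Assumed rule: one sufficiently close in-image microphone is enough.
        if any(math.dist(positions[i], positions[j]) < max_distance for i in inside):
            selected.append(j)
    return selected
```

The audio of each microphone returned here would then be mixed into the mixed audio area before the output in S1307. Real position data (for example latitude and longitude) would need a suitable distance formula in place of `math.dist`.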

(Other)
The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program. In particular, the present invention is also realized by a program for causing a computer to execute the playback process shown in the second or fourth embodiment.

Although the embodiments have been described using an imaging apparatus as an example, any apparatus may be used as long as it has an imaging function or a playback function together with an audio processing function. For example, it may be a camera, a video device, a mobile phone, a smartphone, or a personal computer.

The functions of the first to fourth embodiments may also be provided together, and the embodiments may be combined as appropriate. For example, a video camera capable of recording and reproducing moving images and audio may let the user select, at the time of shooting, whether to record the audio of all wireless microphones or to perform the processing shown in the first or third embodiment. When the audio of all wireless microphones has been recorded, the video camera is configured to perform the processing shown in the second or fourth embodiment at playback.

Claims

An imaging apparatus having imaging means and communication means for communicating with one or more wireless microphones, the imaging apparatus comprising:
determining means for determining whether each of the wireless microphones is present in a captured image of the imaging means;
correlation determining means for determining whether or not there is a correlation between a wireless microphone present in the captured image and a wireless microphone not present in the captured image; and
recording means for recording the sound of the wireless microphone present in the captured image and the sound of a wireless microphone that is not present in the captured image and is determined by the correlation determining means to be correlated with the wireless microphone present in the captured image.

The imaging apparatus according to claim 1, wherein the determining means includes:
subject registration means for registering a subject to whom one of the one or more wireless microphones is assigned; and
recognition means for recognizing, in the captured image, a subject registered in the subject registration means.

The imaging apparatus according to claim 1, wherein the correlation determining means includes:
voice detection means for detecting the presence or absence of voice from each wireless microphone;
voice detection history holding means for holding voice detection history data obtained as a result of detection by the voice detection means; and
comparison means for determining whether or not there is a correlation by comparing the voice detection history data of the wireless microphone determined to be present in the captured image with the voice detection history data of the wireless microphone determined not to be present in the captured image.

The imaging apparatus according to claim 3, wherein the comparison means compares the sound detected by the wireless microphone determined to be present in the captured image with the sound detected by the wireless microphone determined not to be present in the captured image, and determines that there is a correlation when the interval between these sounds is shorter than a predetermined time.

The imaging apparatus according to claim 3, wherein the comparison means compares the sound detected by the wireless microphone determined to be present in the captured image with the sound detected by the wireless microphone determined not to be present in the captured image, and determines that there is a correlation when the overlap of these sounds is shorter than a predetermined time.

The imaging apparatus according to claim 3, wherein the comparison means compares the volume of the wireless microphone determined to be present in the captured image with the volume of the wireless microphone determined not to be present in the captured image, and determines that there is a correlation when the difference between the volumes is smaller than a predetermined amount.

The imaging apparatus according to claim 1, wherein each wireless microphone has positioning means for acquiring current microphone position information, and
the correlation determining means
calculates the distance between the wireless microphone determined to be present in the captured image and the wireless microphone determined not to be present in the captured image from the microphone position information obtained by the positioning means of each of these wireless microphones, and
determines that there is a correlation when the calculated distance is shorter than a predetermined distance.

A playback device for playing back image/audio data in which an image and the sound of one or more wireless microphones are recorded, the playback device comprising:
determining means for determining whether or not each of the wireless microphones is present in a reproduced image;
correlation determining means for determining whether or not there is a correlation between a wireless microphone present in the reproduced image and a wireless microphone not present in the reproduced image; and
audio output means for outputting the sound of the wireless microphone present in the reproduced image and the sound of a wireless microphone that is not present in the reproduced image and is determined by the correlation determining means to be correlated.

The playback device according to claim 8, wherein the determining means includes:
subject registration means for registering a subject to whom one of the one or more wireless microphones is assigned; and
recognition means for recognizing, in the reproduced image, a subject registered in the subject registration means.

The playback device according to claim 8 or 9, wherein the correlation determining means includes:
voice detection means for detecting the presence or absence of voice from each wireless microphone;
voice detection history holding means for holding voice detection history data obtained by the voice detection means; and
comparison means for determining whether or not there is a correlation by comparing the voice detection history data of the wireless microphone determined to be present in the reproduced image with the voice detection history data of the wireless microphone determined not to be present in the reproduced image.

The playback device according to claim 10, wherein the comparison means compares the sound detected by the wireless microphone determined to be present in the reproduced image with the sound detected by the wireless microphone determined not to be present in the reproduced image, and determines that there is a correlation when the interval between these sounds is shorter than a predetermined time.

The playback device according to claim 10, wherein the comparison means compares the sound detected by the wireless microphone determined to be present in the reproduced image with the sound detected by the wireless microphone determined not to be present in the reproduced image, and determines that there is a correlation when the overlap of these sounds is shorter than a predetermined time.

The playback device according to claim 10, wherein the comparison means compares the volume of the wireless microphone determined to be present in the reproduced image with the volume of the wireless microphone determined not to be present in the reproduced image, and determines that there is a correlation when the difference between the volumes is smaller than a predetermined amount.

The playback device according to claim 8, wherein the image/audio data includes position information of each wireless microphone, and
the correlation determining means calculates the distance between the wireless microphone determined to be present in the reproduced image and the wireless microphone determined not to be present in the reproduced image from the position information of each of these wireless microphones, and determines that there is a correlation when the calculated distance is shorter than a predetermined distance.