JP2020043420A - Video audio processing system and control method of the same

Info

Publication number: JP2020043420A
Application number: JP2018167901A
Authority: JP
Inventors: 裕也藤原; Hironari Fujiwara
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-09-07
Filing date: 2018-09-07
Publication date: 2020-03-19
Anticipated expiration: 2038-09-07
Also published as: JP7337491B2

Abstract

PROBLEM TO BE SOLVED: To enable a plurality of pieces of video data and a plurality of pieces of audio data to be appropriately associated with each other. SOLUTION: The video and audio processing system includes: a plurality of imaging means (1101 to 1107) for generating a plurality of pieces of video data; audio input means (2000) for collecting sound from a plurality of places or directions and generating a plurality of pieces of audio data corresponding to the imaging means; and associating means for associating the video data with the audio data. SELECTED DRAWING: Figure 5

Description

The present invention relates to a video and audio processing system and a control method for the video and audio processing system.

In Patent Literature 1, an imaging apparatus includes an imaging unit that acquires an image, an image display unit that displays the image, and a microphone unit that converts input external sound into a plurality of audio signals and outputs them. The imaging apparatus designates a specific subject in the image displayed on the display unit and calculates the direction of that subject from the imaging unit, based on the coordinate information of the designated subject and the shooting angle of view. Based on the calculated direction, the imaging apparatus then generates, from the plurality of audio signals, a synthesized audio signal corresponding to the direction of the specific subject and outputs the synthesized audio signal to the outside.

Patent Literature 1: JP 2008-193196 A

Patent Literature 1 assumes that one microphone is provided for each camera. For a plurality of cameras to record audio together with video, a microphone must be connected to every camera that records audio, or cameras with built-in microphones must be prepared. Because preparing as many microphones as there are cameras increases cost, users end up giving up the audio recording function.

Patent Literature 1 also assumes that the positions of the imaging unit and the microphone are fixed and that the user cannot change the position of the microphone relative to the imaging unit. Consequently, when the user decides the number of imaging units and microphones and installs each at an arbitrary position, the direction of the subject from the imaging unit may differ from the direction of the subject from the microphone, depending on the installation locations. In that case, the sound collection direction cannot be specified to match the camera video, and the audio cannot be distributed.

An object of the present invention is to enable a plurality of pieces of video data and a plurality of pieces of audio data to be appropriately associated with each other.

A video and audio processing system according to the present invention includes: a plurality of imaging means that generate a plurality of pieces of video data; audio input means that collects sound from a plurality of places or a plurality of directions and generates a plurality of pieces of audio data corresponding to the plurality of imaging means; and associating means that associates the plurality of pieces of video data with the plurality of pieces of audio data.

According to the present invention, a plurality of pieces of video data and a plurality of pieces of audio data can be appropriately associated with each other.

FIG. 1 is a diagram illustrating a configuration example of a video and audio processing system.
FIG. 2 is a diagram illustrating a configuration example of a camera, a microphone, and a server device.
FIG. 3 is a diagram illustrating a CPU, a ROM, and a RAM.
FIG. 4 is a diagram illustrating how sound from a sound source reaches a microphone.
FIG. 5 is a diagram illustrating shooting ranges of cameras and sound collection ranges of a microphone.
FIG. 6 is a diagram illustrating a storage unit that stores IDs, IP addresses, and directivity conditions.
FIG. 7 is a flowchart illustrating a control method of the video and audio processing system.
FIG. 8 is a flowchart illustrating a control method of the video and audio processing system.
FIG. 9 is a diagram illustrating a method of calculating polar coordinates of a shooting range of a camera.
FIG. 10 is a diagram illustrating a directivity direction and a directivity range of a microphone.

(First Embodiment)
FIG. 1A is a diagram illustrating a configuration example of a video and audio processing system 100 according to a first embodiment of the present invention. The video and audio processing system 100 includes a camera 1101, a camera 1102, a camera 1103, a microphone 2000, a server device 3000, and a network 4000. The cameras 1101 to 1103, the microphone 2000, and the server device 3000 can communicate with one another via the network 4000. The number of microphones 2000 is smaller than the number of cameras 1101 to 1103.

The camera 1101 is a network camera that generates video data and transmits the video data via the network 4000, and corresponds to an imaging device. The cameras 1102 and 1103 are each the same as the camera 1101. Note that the video and audio processing system 100 may include further cameras in addition to the cameras 1101 to 1103. The cameras 1101 to 1103 may include a zoom drive mechanism, a pan drive mechanism, a tilt drive mechanism, and the like.

The microphone 2000 is a network microphone that receives sound and transmits it via the network 4000, and corresponds to an audio input device. FIG. 1B is a diagram illustrating a configuration example of the microphone 2000. The microphone 2000 has a plurality of microphones (sound collection units) 2011 to 2018. The number of microphones 2011 to 2018 is not limited to eight and may be larger or smaller.

The server device 3000 can communicate with the cameras 1101 to 1103 and the microphone 2000 via the network 4000. Each of the cameras 1101 to 1103, the microphone 2000, and the server device 3000 transmits various commands to the other devices via the network 4000 and, upon receiving a command, transmits a response to the device that sent it. The server device 3000 is an example of a processing device such as a personal computer (PC).

The network 4000 is composed of a plurality of routers, switches, cables, and the like that satisfy a communication standard such as Ethernet (registered trademark). The network 4000 may be of any communication standard, scale, and configuration as long as it allows communication among the cameras 1101 to 1103, the microphone 2000, and the server device 3000. For example, the network 4000 may be composed of the Internet, a wired LAN (Local Area Network), a wireless LAN (Wireless LAN), a WAN (Wide Area Network), or the like. The cameras 1101 to 1103 may be, for example, surveillance cameras that support PoE (Power over Ethernet (registered trademark)) and may be supplied with power via a LAN cable.

FIG. 2 is a diagram illustrating a configuration example of the camera 1101, the microphone 2000, and the server device 3000. The camera 1101 includes an imaging unit 1001, an image processing unit 1002, a control unit 1003, a communication unit 1004, and a storage unit 1005. The cameras 1102 and 1103 have the same configuration as the camera 1101.

The imaging unit 1001 includes a lens and an imaging element such as a CCD or CMOS sensor, captures an image of a subject at an angle of view determined by the lens settings and the like, and generates a video signal by photoelectric conversion. The image processing unit 1002 performs predetermined image processing and compression encoding on the video signal generated by the imaging unit 1001 and generates video data. The control unit 1003 controls the imaging unit 1001 and the image processing unit 1002 based on imaging conditions set by the user or determined automatically by the control unit 1003. Here, the imaging conditions include an imaging gain condition, a gamma condition, a dynamic range condition, an exposure condition, a focus condition, and the like.

The control unit 1003 includes a CPU 301, a ROM 302, and a RAM 303, as shown in FIG. 3. The control unit 1003 analyzes camera control commands received from the microphone 2000, the server device 3000, and the like via the network 4000, and performs processing according to each camera control command. For example, the control unit 1003 instructs the image processing unit 1002 to adjust image quality, issues zoom, focus control, and pan/tilt instructions, and combines and transmits audio data and video data. Using the CPU 301, the control unit 1003 performs overall control of each component of the camera 1101 and sets various parameters. The control unit 1003 also has the ROM 302 and the RAM 303 for storing data, and executes a program stored in the ROM 302 or the RAM 303. The RAM 303 has a storage area for the program executed by the control unit 1003, a work area used during program execution, a data storage area, and the like. The control unit 1003 further has a clock unit and can attach a time stamp or the like to each piece of acquired data.

The communication unit 1004 receives the audio data transmitted by the microphone 2000 via the network 4000, performs appropriate packet processing, and outputs the data to the control unit 1003. The communication unit 1004 also receives commands from the microphone 2000 and transmits responses to those commands to the microphone 2000. The communication unit 1004 transmits the video data to the server device 3000 via the network 4000. In addition, the communication unit 1004 receives camera control commands transmitted by the server device 3000, performs appropriate packet processing and the like, and outputs the commands to the control unit 1003. The communication unit 1004 transmits responses to commands received from the server device 3000 back to the server device 3000.

The storage unit 1005 stores information for associating the video data generated by the camera 1101 with the audio data generated by the microphone 2000.

Next, the configuration and function of each unit of the microphone 2000 will be described with reference to FIG. 2. The microphone 2000 includes a sound collection unit 2001, an audio processing unit 2002, a control unit 2003, a communication unit 2004, and a storage unit 2005. Note that each of the eight microphones 2011 to 2018 has at least its own sound collection unit 2001.

The sound collection unit 2001 is composed of electrodes such as a diaphragm and a fixed plate. Sound pressure vibrates the diaphragm, which changes the distance between the electrodes and hence the voltage, thereby converting sound into an electrical audio signal. The sound collection unit 2001 may also include an amplifier for amplifying the voltage of the audio signal.

The audio processing unit 2002 performs audio processing and compression encoding on the audio signal generated by the sound collection unit 2001 and generates audio data. The control unit 2003 controls the sound collection unit 2001 and the audio processing unit 2002 based on audio input conditions set by the user or determined automatically by the control unit 2003. Here, the audio input conditions include a volume gain condition, an audio frequency characteristic condition, an audio directivity direction condition, an audio directivity range condition, and the like.

The control unit 2003 includes a CPU 301, a ROM 302, and a RAM 303, as shown in FIG. 3. The control unit 2003 analyzes control commands received from the camera 1101, the server device 3000, and the like via the network 4000, and performs processing according to each control command. For example, the control unit 2003 issues control instructions specifying the transmission destination of the processed audio data. Using the CPU 301, the control unit 2003 performs overall control of each component of the microphone 2000 and sets various parameters. The control unit 2003 also has the ROM 302 and the RAM 303 for storing data, and executes a program stored in the ROM 302 or the RAM 303. The RAM 303 has a storage area for the program executed by the control unit 2003, a work area used during program execution, a data storage area, and the like. The control unit 2003 further has a clock unit and can attach a time stamp or the like to each piece of acquired data. In addition, the control unit 2003 performs directivity processing (signal processing that emphasizes sound from a target direction and suppresses sound from other directions) on the audio signals of two microphones, such as the microphones 2011 and 2012, and outputs the processed audio signal.

The directivity processing will be described with reference to FIGS. 4A and 4B. FIG. 4A illustrates sound from a sound source reaching the microphone 2011 and the microphone 2012 from the direction of an angle θ. The microphones 2011 and 2012 are arranged a distance D2 apart. In this case, the difference L between the distance from the sound source to the microphone 2011 and the distance from the sound source to the microphone 2012 is expressed by the following equation.
L = D2 × cos θ

Further, letting V be the speed of sound, the time T from when the sound from the sound source reaches the microphone 2011 until it reaches the microphone 2012 is expressed by the following equation.
T = L / V = D2 × cos θ / V

FIG. 4B shows the values of L and T, and the difference in T, for each angle θ when D2 = 50 mm and V = 346.75 m/s. For example, when θ = 0 deg, L = 50 mm and T = 144 μs. When θ = 15 deg, L = 48 mm and T = 139 μs. The difference in T between θ = 0 deg and θ = 15 deg is 5 μs.
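The values quoted from FIG. 4B can be checked with a few lines of Python. The following is only a minimal sketch of the two equations above; the function names are illustrative and do not appear in the original.

```python
import math

def path_difference_mm(d2_mm, theta_deg):
    """Path-length difference L = D2 * cos(theta), in millimetres."""
    return d2_mm * math.cos(math.radians(theta_deg))

def arrival_delay_us(d2_mm, theta_deg, v_m_per_s):
    """Arrival-time difference T = L / V, in microseconds."""
    l_m = path_difference_mm(d2_mm, theta_deg) / 1000.0  # mm -> m
    return l_m / v_m_per_s * 1e6                          # s -> us

D2_MM = 50.0        # microphone spacing used for FIG. 4B
V_M_PER_S = 346.75  # speed of sound used for FIG. 4B

for theta in (0, 15):
    print(theta,
          round(path_difference_mm(D2_MM, theta)),
          round(arrival_delay_us(D2_MM, theta, V_M_PER_S)))
```

Rounded to whole units, this reproduces L = 50 mm and T = 144 μs at 0 deg, and L = 48 mm and T = 139 μs at 15 deg.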

The control unit 2003 performs the directivity processing computation based on the time T. Consider, for example, emphasizing sound from the front direction (90 deg), that is, giving the microphone directivity toward the front. At θ = 90 deg, cos θ = 0 and therefore T = 0. In that case, the control unit 2003 performs a computation that emphasizes sound reaching the microphones 2011 and 2012 simultaneously (T = 0 μs) and suppresses sound reaching them with a time difference (T ≠ 0 μs).
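The description does not name the computation used for the directivity processing. One common technique that behaves as described is delay-and-sum beamforming, sketched below purely as an assumption. The sample rate, test tone, and function names are hypothetical and chosen so that the off-axis delay is exactly half a period of the tone.

```python
import math

def delay_and_sum(sig_a, sig_b, steer_delay_samples=0):
    """Average two channels after delaying sig_a by a whole number of samples.

    With steer_delay_samples = 0 the beam points to the front (T = 0):
    sound that reaches both microphones simultaneously adds coherently,
    while sound that arrives with a time difference partly cancels.
    """
    n = len(sig_a)
    out = []
    for i in range(n):
        j = i - steer_delay_samples
        a = sig_a[j] if 0 <= j < n else 0.0
        out.append(0.5 * (a + sig_b[i]))
    return out

def rms(sig):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in sig) / len(sig))

def tone(freq_hz, rate_hz, n, delay_samples=0):
    """Sine tone, optionally delayed by a whole number of samples."""
    return [math.sin(2 * math.pi * freq_hz * (i - delay_samples) / rate_hz)
            for i in range(n)]

RATE_HZ = 96000  # hypothetical sample rate
FREQ_HZ = 4000   # hypothetical test tone (period = 24 samples)
N = 960

# Front source: both microphones receive the same signal (T = 0).
front_out = delay_and_sum(tone(FREQ_HZ, RATE_HZ, N), tone(FREQ_HZ, RATE_HZ, N))

# Off-axis source: reaches the second microphone 12 samples (125 us) later.
side_a = tone(FREQ_HZ, RATE_HZ, N)
side_b = tone(FREQ_HZ, RATE_HZ, N, delay_samples=12)
side_out = delay_and_sum(side_a, side_b)

print(rms(front_out), rms(side_out))
```

In this contrived case the front tone passes at full level and the off-axis tone cancels almost completely. Passing `steer_delay_samples=12` instead would steer the beam toward the off-axis source. A real implementation would need fractional delays and would not cancel all frequencies equally.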

The communication unit 2004 transmits the audio data to the server device 3000 via the network 4000. The communication unit 2004 also receives control commands transmitted from the server device 3000, performs appropriate packet processing and the like, and outputs the commands to the control unit 2003. The communication unit 2004 transmits responses to commands received from the server device 3000 back to the server device 3000.

The storage unit 2005 stores information for associating the video data generated by the cameras 1101 to 1103 with the audio data generated by the microphone 2000.

Next, the configuration and function of each unit of the server device 3000 will be described with reference to FIG. 2. The server device 3000 is, for example, a general-purpose computer such as a personal computer. The server device 3000 includes a communication unit 3001, a system control unit 3002, and a storage unit 3003.

The communication unit 3001 receives video data from the cameras 1101 to 1103 and audio data from the microphone 2000 via the network 4000. The communication unit 3001 also transmits control commands to the cameras 1101 to 1103 or the microphone 2000 and receives responses to those control commands.

The system control unit 3002 includes a CPU 301, a ROM 302, and a RAM 303, as shown in FIG. 3. The system control unit 3002 generates camera control commands in response to user operations and transmits them to the cameras 1101 to 1103 via the communication unit 3001. The system control unit 3002 also stores the video data received from the cameras 1101 to 1103 via the communication unit 3001 in the storage unit 3003. Using the CPU 301, the system control unit 3002 performs overall control of each component of the server device 3000 and sets various parameters. The system control unit 3002 also has the ROM 302 and the RAM 303 for storing data, and executes a program stored in the ROM 302 or the RAM 303. The RAM 303 has a storage area for the program executed by the system control unit 3002, a work area used during program execution, a data storage area, and the like. The system control unit 3002 further has a clock unit and can attach a time stamp or the like to each piece of acquired data.

The storage unit 3003 stores the data acquired by the cameras 1101 to 1103 and the microphone 2000. The system control unit 3002 reads out and transfers the data stored in the storage unit 3003.

FIG. 5 is a diagram showing the arrangement of the cameras 1101 to 1107 and the microphone 2000 installed in a room 5000, the shooting ranges 1201 to 1207 of the cameras 1101 to 1107, and the sound collection ranges A to H of the microphone 2000. The cameras 1102 to 1107 have the same configuration as the camera 1101 and are connected to the network 4000. The number of microphones 2000 is smaller than the number of cameras 1101 to 1107. A method of determining the sound collection range in the directivity processing of the microphone 2000 will be described with reference to FIG. 5. When the cameras 1101 to 1107 are installed in the room 5000, the server device 3000 uses camera management software to display the room 5000 as viewed from above, as in FIG. 5. The server device 3000 displays the shooting ranges 1201 to 1207 of the cameras 1101 to 1107 and sets them automatically in accordance with user settings or the angles of view of the cameras 1101 to 1107. The area over which the directivity of the microphone 2000 can be set is divided into sound collection ranges A, B, C, D, E, F, G, and H, and a sound collection area can be specified by selecting any of the sound collection ranges A to H for the microphone 2000. For example, when the directivity of the microphone 2000 is set to the sound collection ranges A and B, only sound from sources located in the sound collection ranges A and B is collected. The sound collection ranges A to H correspond to the microphones 2011 to 2018. The microphone 2000 collects sound from a plurality of places or a plurality of directions and applies directivity processing to that sound, thereby generating a plurality of pieces of audio data corresponding to the plurality of cameras 1101 to 1107.
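The selection of sound collection ranges can be pictured as a simple lookup. The sketch below assumes, for illustration only, that the ranges A to H map one-to-one and in order onto the microphones 2011 to 2018. The description states only that the ranges correspond to the microphones, so the exact pairing and the names used here are hypothetical.

```python
# Assumed one-to-one, in-order pairing: A -> 2011, B -> 2012, ..., H -> 2018.
RANGE_TO_MIC = {chr(ord("A") + i): 2011 + i for i in range(8)}

def mics_for_ranges(ranges):
    """Return the microphone numbers that cover the selected sound collection ranges."""
    return [RANGE_TO_MIC[r] for r in ranges]

# Directivity set to sound collection ranges A and B, as in the example above.
print(mics_for_ranges(["A", "B"]))
```

Under this assumption, selecting ranges A and B picks the microphones 2011 and 2012.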

FIGS. 6A to 6E are diagrams showing the correspondence among IDs, IP addresses, and directivity used to associate the video data of the cameras 1101 to 1107 with the audio data of the microphone 2000. In the present embodiment, one IP address is assigned to each of the cameras 1101 to 1107, and the microphone 2000 is provided with a plurality of communication devices so that it has a plurality of IP addresses. As shown in FIG. 6A, one ID is assigned to each camera. For example, the IDs of the cameras 1101, 1102, 1103, 1104, 1105, 1106, and 1107 are 1, 2, 3, 4, 5, 6, and 7, respectively.

FIG. 6A is a table showing the correspondence between the IDs of the cameras 1101 to 1107 and their IP addresses (identification information). FIG. 6B is a table showing the directivity sound collection range of the microphone 2000 for each ID of the cameras 1101 to 1107. The directivity sound collection range is information indicating the directivity of the microphone 2000. For example, for the ID "1" of the camera 1101, the directivity sound collection ranges A and B, which correspond to the shooting range 1201 of the camera 1101, are set.

FIG. 6C is a table showing the correspondence between the IDs of the cameras 1101 to 1107 and the IP addresses (identification information) of the communication devices corresponding to the audio data. For example, the ID "1" of the camera 1101 is associated with the IP address of the communication device that outputs the audio data of the sound collection ranges A and B. Likewise, the ID "2" of the camera 1102 is associated with the IP address of the communication device that outputs the audio data of the sound collection ranges A and H. FIG. 6D is a table showing, for each ID of the cameras 1101 to 1107, the correspondence between the IP address of the camera and the IP address of the communication device that outputs the corresponding audio data. FIG. 6E shows the correspondence among the ID and IP address of a camera and the ID and IP address of the communication device that outputs the corresponding audio data, using the camera whose ID is "1" as an example.
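The tables of FIGS. 6A to 6D can be pictured as small lookup structures keyed by camera ID. The sketch below shows only the first two IDs. The IP addresses are hypothetical values from a documentation address range and are not taken from the original; the directivity entries for IDs 1 and 2 follow the examples given above.

```python
# FIG. 6A: camera ID -> camera IP address (hypothetical addresses).
CAMERA_IP = {1: "192.0.2.11", 2: "192.0.2.12"}

# FIG. 6B: camera ID -> directivity sound collection ranges of the microphone.
DIRECTIVITY = {1: ("A", "B"), 2: ("A", "H")}

# FIG. 6C: camera ID -> IP address of the communication device that outputs
# the audio data for that ID (hypothetical addresses).
AUDIO_IP = {1: "192.0.2.101", 2: "192.0.2.102"}

def build_pairs(camera_ip, audio_ip):
    """FIG. 6D: camera ID -> (camera IP address, audio device IP address)."""
    return {cam_id: (camera_ip[cam_id], audio_ip[cam_id])
            for cam_id in camera_ip if cam_id in audio_ip}

PAIRS = build_pairs(CAMERA_IP, AUDIO_IP)
print(PAIRS[1])
```

Keying every table by the camera ID is what lets the video stream and the audio stream be matched later even though they originate from different devices.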

FIGS. 7A to 7C are flowcharts illustrating a control method of the video and audio processing system 100. FIG. 7A is a flowchart illustrating a control method of the microphone 2000. In step S701, the sound collection units 2001 of the microphones 2011 to 2018 collect sound and convert it into electrical audio signals. The audio processing unit 2002 performs audio processing and compression encoding on the audio signals and generates audio data.

In step S702, the control unit 2003 refers to the table of FIG. 6B stored in the storage unit 2005 and reads out the directivity sound collection range of the microphone 2000 corresponding to the ID of each of the cameras 1101 to 1107.

In step S703, the control unit 2003 performs directivity processing on the audio data of the microphones 2011 to 2018 based on the directivity sound collection range read out in step S702 for the ID of each of the cameras 1101 to 1107, and generates audio data for each ID. For example, for the ID of the camera 1101, the control unit 2003 performs directivity processing for the sound collection ranges A and B of the microphone 2000 and generates the audio data for the ID of the camera 1101.

In step S704, the control unit 2003 refers to the table of FIG. 6C stored in the storage unit 2005 and reads out the IP address for the audio data corresponding to the ID of each of the cameras 1101 to 1107.

In step S705, the control unit 2003 associates, for each ID, the IP address read out in step S704 (the IP address of the communication device that outputs the audio data corresponding to the ID of each of the cameras 1101 to 1107) with the audio data generated for that ID in step S703. The control unit 2003 then transmits the associated IP addresses and audio data to the server device 3000.
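Steps S702 to S705 amount to one loop over the camera IDs. The sketch below is an illustration under stated assumptions. `directivity_process` is a placeholder that simply averages the channels of the selected ranges and is not the patent's directivity processing. The table contents, IP addresses, and sample values are hypothetical.

```python
def directivity_process(channels, ranges):
    """Placeholder for S703: average the channels of the selected ranges."""
    selected = [channels[r] for r in ranges]
    return [sum(samples) / len(selected) for samples in zip(*selected)]

def microphone_cycle(channels, directivity_by_id, audio_ip_by_id):
    """Run S702 to S705 once and return the packets to send to the server."""
    packets = []
    for cam_id, ranges in directivity_by_id.items():    # S702: read ranges per ID
        audio = directivity_process(channels, ranges)   # S703: per-ID audio data
        ip = audio_ip_by_id[cam_id]                     # S704: read audio IP per ID
        packets.append({"ip": ip, "audio": audio})      # S705: associate and send
    return packets

# Hypothetical per-range audio samples and tables.
channels = {"A": [1.0, 1.0], "B": [3.0, 3.0], "H": [5.0, 5.0]}
directivity_by_id = {1: ("A", "B"), 2: ("A", "H")}
audio_ip_by_id = {1: "192.0.2.101", 2: "192.0.2.102"}

packets = microphone_cycle(channels, directivity_by_id, audio_ip_by_id)
print(packets)
```

Each packet carries the IP address of the sending communication device together with the audio data, which is the pairing the server device later uses to find the matching camera.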

FIG. 7B is a flowchart illustrating a control method of the cameras 1101 to 1107. In step S711, the imaging unit 1001 of each of the cameras 1101 to 1107 captures an image of a subject and generates a video signal. The image processing unit 1002 of each of the cameras 1101 to 1107 performs image processing and compression encoding on the video signal and generates video data.

In step S712, the control unit 1003 of each of the cameras 1101 to 1107 refers to the table of FIG. 6A stored in the storage unit 1005 and reads out the IP address corresponding to its own camera ID. Each control unit 1003 then associates that IP address with the video data generated by its own camera and transmits both to the server device 3000.

FIG. 7C is a flowchart illustrating a control method of the server device 3000. In step S721, the system control unit 3002 receives, via the communication unit 3001, the IP address and video data for each ID from the cameras 1101 to 1107. The system control unit 3002 also receives from the microphone 2000, via the communication unit 3001, the audio data and the IP address of the communication device that outputs the audio data corresponding to each ID.

In step S722, the system control unit 3002 reads out, based on the table of FIG. 6D stored in the storage unit 3003, the combination of the IP address corresponding to the ID of each of the cameras 1101 to 1107 and the IP address of the communication device that outputs the corresponding audio data.

In step S723, the system control unit 3002 associates the corresponding video data and audio data with each other, based on the camera IP address and the audio data IP address that share the same ID, and records them in the storage unit 3003 as an MPEG file. The system control unit 3002 can play back the MPEG file containing the video data and audio data based on the IP address of the camera.
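The server-side association of steps S721 to S723 amounts to a join on the IP address pairs of FIG. 6D. The sketch below assumes that table is available as a dictionary from camera ID to a (camera IP, audio IP) pair. The function name and record layout are hypothetical, and writing the MPEG file is left out.

```python
from typing import Dict, Tuple


def associate_streams(video_by_ip: Dict[str, bytes],
                      audio_by_ip: Dict[str, bytes],
                      ip_pairs: Dict[int, Tuple[str, str]]) -> Dict[str, dict]:
    """Pair video and audio whose IP addresses share a camera ID (S722, S723).

    ip_pairs is a hypothetical stand-in for the table of FIG. 6D.
    The result is keyed by camera IP address, as used for playback lookup.
    """
    records = {}
    for camera_id, (camera_ip, audio_ip) in ip_pairs.items():
        # Only pair streams that were actually received in S721.
        if camera_ip in video_by_ip and audio_ip in audio_by_ip:
            records[camera_ip] = {
                "camera_id": camera_id,
                "video": video_by_ip[camera_ip],
                "audio": audio_by_ip[audio_ip],
            }
    return records
```

Keying the result by camera IP address reflects the statement that playback is selected by the IP address of the camera.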

(Second embodiment)
In the first embodiment, as shown in FIG. 7C, the server device 3000 associates the video data and audio data that have the same ID. In the second embodiment of the present invention, the cameras 1101 to 1107 associate the video data and audio data that have the same ID. The configurations and connections of the cameras 1101 to 1107, the microphone 2000, and the server device 3000 are the same as in the first embodiment, so their description is omitted. The directivity processing is also the same as in the first embodiment, so its description is omitted. The points in which this embodiment differs from the first embodiment are described below.

FIGS. 8A to 8C are flowcharts illustrating a control method of the video and audio processing system 100 according to the second embodiment of the present invention. FIG. 8A is a flowchart illustrating a control method of the microphone 2000. The microphone 2000 performs the processing of steps S801 to S804, which is the same as the processing of steps S701 to S704 in FIG. 7A. In step S805, the control unit 2003 associates the IP address of the audio data corresponding to the ID of each of the cameras 1101 to 1107, read out in step S804, with the audio data of the corresponding ID generated in step S803, and transmits them to the cameras 1101 to 1107.

FIG. 8B is a flowchart illustrating a control method of the cameras 1101 to 1107. As shown in FIG. 6E, the storage unit 1005 of the camera 1101 stores a table of the combination of the IP address corresponding to the ID of its own camera 1101 and the audio data. Similarly, the storage unit 1005 of each of the cameras 1102 to 1107 stores a table of the combination of the IP address corresponding to its own camera ID and the audio data.

In step S811, the imaging unit 1001 of each of the cameras 1101 to 1107 captures an image of a subject and generates a video signal. The image processing unit 1002 of each camera performs image processing and compression encoding on the video signal to generate video data.

In step S812, the control unit 1003 of the camera 1101 refers to the table of FIG. 6E stored in the storage unit 1005 and reads out the IP address of the communication device that outputs the audio data corresponding to the ID of its own camera 1101. Similarly, the control unit 1003 of each of the cameras 1102 to 1107 refers to the table stored in its storage unit 1005 and reads out the IP address of the communication device that outputs the audio data corresponding to its own camera ID.

In step S813, the control unit 1003 of each of the cameras 1101 to 1107 receives, from the microphone 2000, the audio data output from the communication device whose IP address was read out in step S812 as corresponding to its own camera ID. If the control unit 1003 was able to receive this audio data, it proceeds to step S814; if not, it proceeds to step S815. In the latter case, in step S815, the control unit 1003 transmits the video data generated in step S811 and the IP address corresponding to its own camera ID to the server device 3000 via the communication unit 1004.

In step S814, the control unit 1003 of each of the cameras 1101 to 1107 reads out, based on the table of FIG. 6E or the like in the storage unit 1005, the combination of the IP address corresponding to its own camera ID and the IP address of the communication device that outputs the corresponding audio data. Based on these two IP addresses, the control unit 1003 associates the video data (S811) and the audio data (S813) of its own camera ID and generates an MPEG file containing the associated video data and audio data. The control unit 1003 then proceeds to step S815. In step S815, the control unit 1003 transmits the MPEG file containing the video data and audio data of its own camera ID, together with the IP address of the video data, to the server device 3000 via the communication unit 1004.
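The branch between steps S813, S814, and S815 can be illustrated with a small function that builds what a camera would send to the server. This is a sketch under assumptions: the payload is a plain dictionary rather than a real MPEG container, and the function and key names are hypothetical.

```python
from typing import Optional


def build_camera_payload(camera_ip: str,
                         video: bytes,
                         audio: Optional[bytes]) -> dict:
    """Build the S815 payload for one camera.

    audio is None when the audio data could not be received in S813;
    in that case only the video data and the camera IP address are sent.
    """
    if audio is None:
        # S813 failed: skip S814 and send video only.
        return {"camera_ip": camera_ip, "video": video}
    # S814: associate video and audio in one container (stand-in for MPEG).
    return {"camera_ip": camera_ip,
            "container": {"video": video, "audio": audio}}
```

Either payload carries the camera IP address, so the server can look up the received data by that address in both cases.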

FIG. 8C is a flowchart illustrating a control method of the server device 3000. In step S821, the system control unit 3002 receives from the cameras 1101 to 1107, via the communication unit 3001, either an MPEG file containing the video data and audio data of each ID together with the IP address of the camera, or the video data of each ID together with the IP address of the camera. The system control unit 3002 can play back the MPEG file containing the video data and audio data, or the video data alone, based on the IP address of the camera.

In the first and second embodiments, the video data and audio data are associated using the IP addresses of the cameras 1101 to 1107 and the IP addresses of the communication devices of the microphone 2000, but the present invention is not limited to this. If the microphone 2000 does not have a plurality of communication devices, the IDs of the cameras 1101 to 1107 may simply be used as identification information and associated with the directional ranges in which sound is collected. In that case, only the tables of FIGS. 6A and 6B are needed.

(Third embodiment)
FIG. 9A is a diagram illustrating a method of calculating the polar coordinates of the shooting range 1201 of the camera 1101 relative to the position of the microphone 2000 according to the third embodiment of the present invention. The server device 3000 calculates the polar coordinates of the shooting range 1201 of the camera 1101 and, based on those polar coordinates, calculates the directivity direction and directivity range for the camera 1101. The points in which this embodiment differs from the first and second embodiments are described below.

First, in response to a user instruction, the system control unit 3002 sets the position coordinates a (Xa, Ya), the shooting direction, the shooting angle, and the shooting distance of the camera 1101. Next, in response to a user instruction, the system control unit 3002 sets the position coordinates of the microphone 2000.

Next, based on the above information, the system control unit 3002 calculates the vertex coordinates a (Xa, Ya), b (Xb, Yb), and c (Xc, Yc) of the shooting range 1201 of the camera 1101. The system control unit 3002 then converts the vertex coordinates a (Xa, Ya), b (Xb, Yb), and c (Xc, Yc), taken relative to the position coordinates of the microphone 2000, into the polar coordinates (ra, θa), (rb, θb), and (rc, θc) by the following equation.
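The equation referred to above is not reproduced in this text. A standard Cartesian-to-polar conversion consistent with the description would be the following, where the microphone position (X_m, Y_m) is a symbol introduced here for illustration and does not appear in the original.

```latex
\[
r_i = \sqrt{(X_i - X_m)^2 + (Y_i - Y_m)^2}, \qquad
\theta_i = \operatorname{atan2}\!\left(Y_i - Y_m,\; X_i - X_m\right),
\qquad i \in \{a, b, c\}
\]
```

Here r_i is the distance from the microphone 2000 to vertex i, and θ_i is the angle of the line from the microphone 2000 to that vertex.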

In the same manner, the system control unit 3002 converts the vertex coordinates of the shooting ranges 1202 to 1207 of the cameras 1102 to 1107, taken relative to the position coordinates of the microphone 2000, into polar coordinates.
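The description does not state how the vertices a, b, and c are derived from the camera settings. One plausible reading, sketched below, treats the shooting range as a triangle with its apex at the camera position and its two other vertices at the shooting distance along the edges of the shooting angle. The function name and this geometric model are assumptions, not the method of the source.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def shooting_range_vertices(position: Point,
                            direction_deg: float,
                            angle_deg: float,
                            distance: float) -> List[Point]:
    """Return vertices [a, b, c] of an assumed triangular shooting range.

    a is the camera position; b and c lie at the shooting distance along
    the two edges of the shooting angle centred on the shooting direction.
    """
    xa, ya = position
    half = angle_deg / 2.0
    edges = []
    for offset in (-half, half):
        theta = math.radians(direction_deg + offset)
        edges.append((xa + distance * math.cos(theta),
                      ya + distance * math.sin(theta)))
    return [(xa, ya), edges[0], edges[1]]
```

For a camera at the origin facing 90 degrees with a 90 degree shooting angle and a shooting distance of √2, this model places b near (1, 1) and c near (-1, 1).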

FIG. 9B is a diagram illustrating another method of calculating the polar coordinates of the shooting range 1201 of the camera 1101 relative to the microphone 2000 according to this embodiment. The server device 3000 calculates the polar coordinates of the shooting range 1201 of the camera 1101 and, based on those polar coordinates, calculates the directivity direction and directivity range for the camera 1101.

First, in response to a user instruction, the system control unit 3002 sets the coordinates of the installation area of the camera 1101. Next, in response to a user instruction, the system control unit 3002 sets the position coordinates of the microphone 2000.

Next, based on the above information, the system control unit 3002 calculates the vertex coordinates a (Xa, Ya), b (Xb, Yb), c (Xc, Yc), and d (Xd, Yd) of the shooting range 1201 of the camera 1101. Using the position coordinates of the microphone 2000 as the reference, the system control unit 3002 then converts the vertex coordinates a (Xa, Ya), b (Xb, Yb), c (Xc, Yc), and d (Xd, Yd) into the polar coordinates (ra, θa), (rb, θb), (rc, θc), and (rd, θd) by the following equation.

FIG. 10A is a diagram showing the directivity direction θ1 and the directivity range φ1 of the microphone 2000 corresponding to the shooting range 1201 of the camera 1101. After calculating the polar coordinates of FIG. 9A or 9B, the server device 3000 calculates the directivity direction θ1 and the directivity range φ1 of the microphone 2000 corresponding to the shooting range 1201 of the camera 1101.

First, the system control unit 3002 calculates the angles of the two straight lines 901 and 902 that connect the position coordinates of the microphone 2000 to the two ends of the shooting range 1201. Next, the system control unit 3002 calculates the average θ1 of the angles of the two straight lines 901 and 902 as the directivity direction of the microphone 2000 corresponding to the shooting range 1201 of the camera 1101. The system control unit 3002 then calculates the difference φ1 between the angles of the two straight lines 901 and 902 as the directivity range of the microphone 2000 corresponding to the shooting range 1201 of the camera 1101.

In the same manner, the system control unit 3002 calculates the directivity directions θ2 to θ7 and the directivity ranges φ2 to φ7 of the microphone 2000 corresponding to the shooting ranges 1202 to 1207 of the cameras 1102 to 1107.
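The calculation of the directivity direction and directivity range can be sketched as below. The sketch assumes that the straight lines 901 and 902 are the lines from the microphone to the vertices with the smallest and largest polar angle, and that the shooting range does not straddle the ±180 degree boundary. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def to_polar(point: Point, mic: Point) -> Tuple[float, float]:
    """Return (r, theta in degrees) of point relative to the microphone."""
    dx, dy = point[0] - mic[0], point[1] - mic[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


def directivity(vertices: Iterable[Point], mic: Point) -> Tuple[float, float]:
    """Return (direction, range) in degrees for one shooting range.

    direction is the average of the two extreme vertex angles and range is
    their difference, following the description of lines 901 and 902.
    """
    angles = [to_polar(vertex, mic)[1] for vertex in vertices]
    low, high = min(angles), max(angles)
    return (low + high) / 2.0, high - low
```

Running `directivity` once per camera ID and collecting the results in a dictionary would give a structure like the table of FIG. 10B.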

FIG. 10B shows a table of the directivity direction and directivity range of the microphone 2000 for each ID of the cameras 1101 to 1107. The directivity direction and directivity range are information indicating the directivity of the microphone 2000. After the calculation of FIG. 10A, the system control unit 3002 of the server device 3000 generates a table, as shown in FIG. 10B, that gives the directivity directions θ1 to θ7 and the directivity ranges φ1 to φ7 of the microphone 2000 for the IDs of the cameras 1101 to 1107. The table of FIG. 10B is used instead of the table of FIG. 6B. The server device 3000 transmits the table of FIG. 10B to the microphone 2000. The microphone 2000 stores the table of FIG. 10B in the storage unit 2005 and performs the directivity processing of step S703 in FIG. 7A based on that table.

According to the first to third embodiments, the number of microphones 2000 is smaller than the number of cameras 1101 to 1107, and the user can freely specify where the microphone 2000 and the cameras 1101 to 1107 are installed. The video and audio processing system 100 can collect audio that matches the video of the cameras 1101 to 1107 and appropriately associate (combine) the video data and audio data of each of the cameras 1101 to 1107.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Each of the above-described embodiments is merely a concrete example of carrying out the present invention, and the technical scope of the present invention should not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

1101 to 1107: cameras; 2000: microphone; 3000: server device; 4000: network

Claims

A video and audio processing system comprising:
a plurality of imaging means for generating a plurality of pieces of video data;
audio input means for collecting sound in a plurality of places or directions and generating a plurality of pieces of audio data corresponding to the plurality of imaging means; and
associating means for associating the plurality of pieces of video data with the plurality of pieces of audio data.

2. The video / audio processing system according to claim 1, wherein the audio input unit generates the plurality of audio data by performing directional processing on audio in a plurality of places or in a plurality of directions. 3.

3. The sound input unit performs directivity processing on sounds in the plurality of places or in a plurality of directions based on information indicating directivity of the sound input unit for each of the plurality of imaging units. 2. The video and audio processing system according to 1.

The video and audio processing system according to claim 2 or 3, wherein the audio input means performs directivity processing on the sound in the plurality of places or directions based on information indicating a sound collection range of the audio input means for each of the plurality of imaging means.

The video and audio processing system according to claim 2 or 3, wherein the audio input means performs directivity processing on the sound in the plurality of places or directions based on information indicating a directivity direction and a directivity range of the audio input means for each of the plurality of imaging means.

The image processing apparatus according to claim 1, further comprising: a generating unit configured to generate information indicating directivity of the voice input unit for each of the plurality of imaging units, based on a shooting range of the plurality of imaging units with respect to a position of the voice input unit. Item 4. The video / audio processing system according to item 3.

7. The video / audio processing according to claim 6, wherein the generation unit calculates a photographing range of the plurality of photographing units based on positions, photographing directions, photographing angles, and photographing distances of the plurality of photographing units. system.

7. The video and audio processing system according to claim 6, wherein the generation unit calculates an imaging range of the plurality of imaging units based on an installation area of the plurality of imaging units.

The plurality of imaging units each transmit the identification information of the plurality of video data and the plurality of video data to the association unit,
The voice input unit transmits identification information of the plurality of voice data and the plurality of voice data to the association unit,
The method according to claim 1, wherein the associating unit associates the plurality of video data with the plurality of audio data based on identification information of the plurality of video data and identification information of the plurality of audio data. The video / audio processing system according to claim 1.

The voice input unit transmits identification information of the plurality of voice data and the plurality of voice data to the association unit,
9. The apparatus according to claim 1, wherein the associating unit associates the plurality of video data generated by the imaging unit with the plurality of audio data based on identification information of the plurality of audio data. The video / audio processing system according to the item.

A plurality of imaging devices;
A voice input device,
And a processing device,
The plurality of imaging devices each include the plurality of imaging units,
The voice input device has the voice input unit,
The video / audio processing system according to claim 1, wherein the processing device includes the association unit.

The video and audio processing system according to claim 11, wherein
the plurality of imaging devices each transmit the plurality of pieces of video data and identification information of the plurality of pieces of video data to the processing device,
the audio input device transmits the plurality of pieces of audio data and identification information of the plurality of pieces of audio data to the processing device, and
the processing device associates the plurality of pieces of video data with the plurality of pieces of audio data based on the identification information of the plurality of pieces of video data and the identification information of the plurality of pieces of audio data.

A plurality of imaging devices;
A voice input device,
Each of the plurality of imaging devices has the imaging unit and the association unit,
9. The video / audio processing system according to claim 1, wherein the audio input device includes the audio input unit.

The video and audio processing system according to claim 13, wherein
the audio input device transmits the plurality of pieces of audio data and identification information of the plurality of pieces of audio data to the plurality of imaging devices, and
each of the plurality of imaging devices associates the plurality of pieces of video data with the plurality of pieces of audio data based on the identification information of the plurality of pieces of audio data.

A control method for a video and audio processing system, the method comprising:
generating a plurality of pieces of video data by a plurality of imaging means;
collecting sound in a plurality of places or directions by audio input means and generating a plurality of pieces of audio data corresponding to the plurality of imaging means; and
associating the plurality of pieces of video data with the plurality of pieces of audio data by associating means.