JP5866505B2

JP5866505B2 - Voice processing system and voice processing method

Info

Publication number: JP5866505B2
Application number: JP2015007243A
Authority: JP
Inventors: 裕隆澤; 宏之松本; 信一重永; 昭年泉; 渡辺　周一; 周一渡辺
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2012-12-27
Filing date: 2015-01-16
Publication date: 2016-02-17
Anticipated expiration: 2033-12-05
Also published as: JP2015100125A; JP2015122756A; JP5866504B2; JP5853181B2; JP2015118386A; JP2015136177A

Description

The present invention relates to a voice processing system and a voice processing method for reproducing recorded video data and audio data.

Conventionally, in monitoring systems installed in factories, stores (for example, retail stores and banks), or public places (for example, libraries), a plurality of monitoring cameras (for example, pan-tilt cameras and omnidirectional cameras) are connected over a network to achieve higher image quality and a wider angle of view for the video data (including still images and moving images; the same applies hereinafter) around the monitoring target.

In addition, because the amount of information obtainable from video-only monitoring is inherently limited, monitoring systems have appeared in recent years that place microphones alongside the monitoring cameras to obtain both video data and audio data around the monitoring target.

As prior art for obtaining audio data around a monitoring target, a sound processing apparatus is known that has an imaging unit for obtaining captured images and a plurality of microphones (sound collection units) for collecting audio data, and that uses the audio data collected by each microphone to generate audio data having directivity in a predetermined sound collection direction designated by a sound reproduction apparatus acting as a client (see, for example, Patent Document 1).

In Patent Document 1, the sound processing apparatus synthesizes the audio data collected by the plurality of sound collection units (microphones) based on a control command for a predetermined sound collection direction received in advance from a client (sound reproduction apparatus) connected via a network, generates audio data having directivity in that direction, and transmits the synthesized audio data to the client (sound reproduction apparatus).

Patent Document 1: Japanese Patent Laid-Open No. 2000-209689

When the sound processing apparatus of Patent Document 1 is applied to a manned monitoring system, if some accident occurs while captured images around the monitoring target are being recorded, the apparatus can immediately receive a designation of the sound collection direction from the client (sound reproduction apparatus) and generate audio data having directivity in that direction.

However, suppose that the sound processing apparatus of Patent Document 1 is applied to, for example, an unmanned monitoring system, and that after an accident occurs, one wishes to obtain information about the accident (for example, audio data) by reproducing the video data and audio data that were being recorded since before the accident. In this case, the location of the accident is not necessarily in the predetermined sound collection direction designated in advance by the client, so it may be difficult to obtain audio data having directivity toward the location of the accident, that is, in the desired sound collection direction. In other words, there is a problem that effective information about the accident is unlikely to be obtainable from the recorded video data and audio data.

To solve the conventional problem described above, an object of the present invention is to provide a voice processing system and a voice processing method that can emphasize and output audio data in a directivity direction toward a position corresponding to one or more designated locations on a display screen on which captured video data is displayed, and that allow the state of the parameters of the audio data to be recognized easily.

The present invention is a voice processing system including: an imaging unit that captures video; a display unit that has a rectangular display area and displays the video data captured by the imaging unit within a circular video display area smaller than the rectangular display area; a sound collection unit that includes a plurality of microphones and collects sound using the microphones; an audio output unit that outputs, as sound, the audio data collected by the sound collection unit; an operation unit that accepts a designation on the display unit; and a signal processing unit that, based on the audio data, generates or synthesizes emphasized audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to the designated location of the video data accepted by the operation unit is emphasized, and causes the audio output unit to output the emphasized audio data. In response to the operation unit accepting a designation outside the video display area, the signal processing unit displays a status display area or an adjustment operation area for the audio output, within a rectangular region whose diagonal runs from one of the four corners of the rectangular display area to the center of the video display area, so as to straddle the boundary line of the video display area.

The present invention is also a voice processing method including the steps of: capturing video in an imaging unit; collecting sound in a sound collection unit that includes a plurality of microphones; displaying the video data captured by the imaging unit within a circular video display area smaller than the rectangular display area of a display unit having that rectangular display area; displaying the video data on the display unit and causing an audio output unit to output, as sound, the audio data collected by the sound collection unit; accepting a designation on the display unit through an operation unit; generating or synthesizing, based on the audio data, emphasized audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to the designated location of the video data accepted by the operation unit is emphasized, and causing the audio output unit to output the emphasized audio data; and, in response to the operation unit accepting a designation outside the video display area, displaying a status display area or an adjustment operation area for the audio output, within a rectangular region whose diagonal runs from one of the four corners of the rectangular display area to the center of the video display area, so as to straddle the boundary line of the video display area.
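The display geometry described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Rect`, `overlay_region`, and `boundary_point` are hypothetical, and choosing the corner nearest to the designated point is an assumption (the text only says "one of the four corners"). The sketch checks whether a designated point lies outside the circular video display area, builds the rectangular region whose diagonal runs from a corner to the circle center, and finds where that diagonal crosses the circle boundary, which is a natural anchor for a box that straddles the boundary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def is_outside_video_area(x, y, cx, cy, radius):
    """True if (x, y) lies outside the circular video display area."""
    return (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2


def overlay_region(x, y, width, height, cx, cy, radius):
    """Rectangular region spanning from a display corner to the circle center.

    Returns None when the designated point is inside the video display area.
    Picking the corner nearest to the point is an assumption of this sketch.
    """
    if not is_outside_video_area(x, y, cx, cy, radius):
        return None
    corner_x = 0.0 if x < cx else float(width)
    corner_y = 0.0 if y < cy else float(height)
    return Rect(min(corner_x, cx), min(corner_y, cy),
                max(corner_x, cx), max(corner_y, cy))


def boundary_point(corner_x, corner_y, cx, cy, radius):
    """Point where the center-to-corner diagonal crosses the circle boundary."""
    dx, dy = corner_x - cx, corner_y - cy
    scale = radius / math.hypot(dx, dy)
    return (cx + dx * scale, cy + dy * scale)
```

For an 800 x 600 display with a circle of radius 300 centered at (400, 300), a click at (10, 10) yields the region from (0, 0) to (400, 300); a box centered on `boundary_point(0, 0, 400, 300, 300)` would then overlap both the inside and the outside of the video display area.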

According to the present invention, audio data in a directivity direction toward a position corresponding to one or more designated locations on a display screen on which captured video data is displayed can be emphasized and output, and the state of the parameters of the audio data can be displayed easily.

FIG. 1: (A), (B) Block diagrams showing the system configurations of the voice processing systems of the embodiments.
FIG. 2: (A) External view of a microphone array; (B) external view of the microphone array in the third embodiment; (C) diagram showing the mounting state of the microphone array and a camera.
FIG. 3: Explanatory diagram of the principle of directivity control processing using the microphone array.
FIG. 4: Flowchart explaining the operation procedure of the voice processing system during recording.
FIG. 5: Flowchart explaining the operation procedure of the voice processing system during playback when one or more designated locations are designated.
FIG. 6: Schematic diagrams showing an example of use of the voice processing system of the first embodiment: (A) one camera and one microphone array installed at separate positions on, for example, the ceiling of an indoor hall; (B) video data displayed on a display and audio data output from a speaker.
FIG. 7: Schematic diagrams showing an example of use of the voice processing system of the second embodiment: (A) two cameras, one microphone array located midway between the two cameras, and a speaker installed on, for example, the ceiling of an indoor hall; (B) video data captured by the camera 10 displayed on the display 63 and audio data output from the speaker 65; (C) video data captured by the camera 10A displayed on the display 63 and audio data output from the speaker 65.
FIG. 8: Schematic diagrams showing an example of use of the voice processing system of the fourth embodiment: (A) one camera, one microphone array, and a speaker installed on, for example, the ceiling of an indoor hall; (B) explanatory diagram outlining the operation of the voice processing system when a plurality of designated locations are designated in the video data displayed on the display.
FIG. 9: Schematic diagrams showing an example of use of the voice processing system: (A) a donut-shaped microphone array, a camera integrated with the microphone array, and a speaker installed on, for example, the ceiling of an indoor hall; (B) two persons 91 and 92 selected in the video data captured by the camera 10E; (C) video data of the two persons 91 and 92 after image conversion displayed on the display, and audio data of their conversation output from the speaker 65; (D) two persons 93 and 94 selected in the video data captured by the camera 10E; (E) video data of the two persons 93 and 94 after image conversion displayed on the display, and audio data of their conversation output from the speaker 65.
FIG. 10: (A), (B), (C) External views of other microphone arrays 20D, 20E, and 20F.
FIG. 11: Schematic diagram showing the operation of the display 63 and the speaker 65 when a plurality of designated locations are designated.
FIG. 12: Exploded perspective view of the housing structure of the microphone array of each embodiment.
FIG. 13: (A) Plan view of the housing structure of the microphone array shown in FIG. 12; (B) sectional view taken along line A-A of FIG. 13(A).
FIG. 14: Enlarged view of the main part within the dotted-line range of FIG. 13(B).
FIG. 15: (A) Perspective view showing how the punching metal cover is fixed to the main housing; (B) sectional view showing how the punching metal cover is fixed to the main housing.
FIG. 16: Schematic diagram of the microphone mounting structure.
FIG. 17: Plan view of the microphone board.
FIG. 18: (A) Diagram of a microphone board circuit in which one ripple removal circuit is provided for a plurality of microphone circuits; (B) diagram of a microphone board circuit in which a ripple removal circuit is provided for each of the plurality of microphone circuits.
FIG. 19: (A) Perspective view of the housing structure of the microphone array with an omnidirectional camera attached without a camera adapter; (B) perspective view of the housing structure of the microphone array with an outdoor omnidirectional camera attached together with a camera adapter.
FIG. 20: Exploded perspective view of the housing structure of the microphone array to which an indoor omnidirectional camera is attached.
FIG. 21: Exploded perspective view of the housing structure of the microphone array to which an outdoor omnidirectional camera is attached.
FIG. 22: (A) Side view of the housing structure of the microphone array with an outdoor omnidirectional camera attached; (B) sectional view taken along line B-B of FIG. 22(A).
FIG. 23: Enlarged view of the main part within the dotted-line range of FIG. 22.
FIG. 24: Exploded perspective view of the housing structure of the microphone array to which a lid is attached.
FIG. 25: Exploded perspective view of a housing structure attached to a ceiling using a mounting bracket.
FIG. 26: (A) Side view of the base sheet metal side fixing pin before insertion into the base sheet metal fixing hole; (B) side view of the base sheet metal side fixing pin inserted into the base sheet metal fixing hole; (C) plan view of the base sheet metal side fixing pin inserted into the base sheet metal fixing hole; (D) side view of the base sheet metal side fixing pin moved to the small-diameter hole of the base sheet metal fixing hole; (E) plan view of the base sheet metal side fixing pin moved to the small-diameter hole of the base sheet metal fixing hole.
FIG. 27: Sectional view of the housing structure of the microphone array in which the ECM recess is tapered.
FIG. 28: Sectional view of the housing structure of the microphone array with wind countermeasures.
FIG. 29: (A) Sectional view of the housing structure of the microphone array showing the relationship between the inner diameter and depth of the ECM recess; (B) sectional view of the housing structure in which the inner wall of the ECM recess is an inclined wall; (C) sectional view of the housing structure in which the inner peripheral corner of the ECM recess is a rounded (R) portion.
FIG. 30: (A) Explanatory diagram showing the isobaric surfaces of an ECM recess without a taper; (B) explanatory diagram showing the isobaric surfaces of an ECM recess with a taper.
FIG. 31: (A) Explanatory diagram of an example of use of the voice processing system of the fourth embodiment; (B) diagram showing an example of a first identification shape displayed around a first designated location and a second identification shape displayed around a second designated location, in which sound in a first directivity direction toward a first sound position corresponding to the first designated location specified by the first identification shape is emphasized and output from a first speaker, and sound in a second directivity direction toward a second sound position corresponding to the second designated location specified by the second identification shape is emphasized and output from a second speaker.
FIG. 32: Diagram showing an adjustment operation box displayed in response to a click outside the display area of the video data on the display while the video data shown in FIG. 31(B) is displayed.
FIG. 33: (A) Explanatory diagram of an example of use of the voice processing system of the fourth embodiment; (B) diagram showing an example of a first identification shape displayed around a first designated location and a second identification shape displayed around a second designated location, in which sound in a first directivity direction toward a first sound position corresponding to the first designated location specified by the first identification shape is emphasized and output from a first speaker, and sound in a second directivity direction toward a second sound position corresponding to the second designated location specified by the second identification shape is emphasized and output from a second speaker.
FIG. 34: Diagram showing the display switching between the video data captured by the omnidirectional camera and the adjustment operation box with each click outside the display area of the video data on the display while the video data shown in FIG. 31(B) is displayed.
FIG. 35: Diagram showing a status display box displayed in response to a click outside the display area of the video data on the display while the video data shown in FIG. 31(B) is displayed.
FIG. 36: (A) Explanatory diagram of an example of use of the voice processing system of the fourth embodiment; (B) diagram showing an example of a first identification shape displayed around a first designated location, a second identification shape displayed around a second designated location, a third identification shape displayed around a third designated location, and a fourth identification shape displayed around a fourth designated location, in which audio data emphasizing sound in a first directivity direction toward a first sound position corresponding to the first designated location specified by the first identification shape, audio data emphasizing sound in a second directivity direction toward a second sound position corresponding to the second designated location specified by the second identification shape, and audio data emphasizing sound in a third directivity direction toward a third sound position corresponding to the third designated location specified by the third identification shape are output from each of the first and second speakers.
FIG. 37: Diagram showing an adjustment operation box displayed in response to simultaneous pressing of a plurality of specific keys on a keyboard while the video data shown in FIG. 36(B) is displayed.
FIG. 38: Diagram showing an adjustment operation box displayed in response to a click outside the display area of the video data on the display while the video data shown in FIG. 36(B) is displayed.
FIG. 39: (A) Explanatory diagram of an example of use of the voice processing system of the fourth embodiment; (B) diagram showing an example of a first identification shape displayed around a first designated location, a second identification shape displayed around a second designated location, a third identification shape displayed around a third designated location, and a fourth identification shape displayed around a fourth designated location, in which audio data emphasizing sound in a first directivity direction toward a first sound position corresponding to the first designated location specified by the first identification shape and audio data emphasizing sound in a second directivity direction toward a second sound position corresponding to the second designated location specified by the second identification shape are synthesized and output from the first speaker, and audio data emphasizing sound in a third directivity direction toward a third sound position corresponding to the third designated location specified by the third identification shape is output from the second speaker.
FIG. 40: Diagram showing an adjustment operation box displayed in response to a touch outside the display area of the video data on a display provided with a touch panel while the video data shown in FIG. 39(B) is displayed.

Hereinafter, embodiments of the voice processing system and the voice processing method according to the present invention will be described with reference to the drawings. The voice processing system of each embodiment is applied to a monitoring system (including manned and unmanned monitoring systems) installed in a factory, a public facility (for example, a library or an event venue), or a store (for example, a retail store or a bank).

(First embodiment)
FIGS. 1(A) and 1(B) are block diagrams showing the system configurations of the voice processing systems 5A and 5B of the embodiments. The voice processing system 5A includes monitoring cameras 10 and 10A, a microphone array 20, and a voice processing device 40. The cameras 10 and 10A, the microphone array 20, and the voice processing device 40 are connected to one another via a network 30.

The voice processing system 5B includes monitoring cameras 10B and 10C, a microphone array 20A, a recorder 45A, and a PC (Personal Computer) 70. The cameras 10B and 10C, the microphone array 20A, the recorder 45A, and the PC 70 are connected to one another via a network 30A.

Hereinafter, the operation of each part of the voice processing system 5A is mainly described; for the voice processing system 5B, only the points that differ from the operation of the voice processing system 5A are described.

The cameras 10 and 10A, serving as imaging units, are monitoring cameras installed on, for example, the ceiling of a room in an event venue (see, for example, FIG. 6). They have a pan-tilt function, a zoom-in function, and a zoom-out function that can be operated remotely from a monitoring system control room (not shown) connected via the network 30, and they capture video (including still images and moving images; the same applies hereinafter) around the point (place) to be monitored. The cameras 10 and 10A record the captured video data in the recorder 45 via the network 30.

The microphone array 20, serving as a sound collection unit, is installed on, for example, the ceiling of a room in an event venue (see, for example, FIG. 6) and is a microphone unit in which a plurality of microphones 22 (see, for example, FIG. 2) are uniformly arranged. The microphone array 20 uses each microphone 22 to collect sound around the point (place) to be monitored, and records the audio data collected by each microphone 22 in the recorder 45 via the network. The structure of the microphone array 20 will be described later with reference to FIG. 2.

The voice processing device 40 includes a recorder 45, a signal processing unit 50, an operation unit 55, and a playback unit 60. The recorder 45 includes a control unit (not shown) for controlling processing such as data recording in the recorder 45, and a recording unit (not shown) for storing video data and audio data. The recorder 45 records the video data captured by the cameras 10 and 10A and the audio data collected by the microphone array 20 in association with each other.
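One way to picture "recording in association with each other" is to store both streams with capture times so that the audio matching any video frame can be looked up later. The following Python sketch is only an illustration under that assumption; the patent text in this section does not say how the association is made, and the class `Recording` and its methods are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Video frames and audio blocks stored with capture times in seconds."""
    video: list = field(default_factory=list)         # (time, frame) pairs
    audio_times: list = field(default_factory=list)   # increasing capture times
    audio_blocks: list = field(default_factory=list)  # one block per capture time

    def add_video(self, t, frame):
        self.video.append((t, frame))

    def add_audio(self, t, block):
        # Assumes blocks arrive in increasing time order.
        self.audio_times.append(t)
        self.audio_blocks.append(block)

    def audio_at(self, t):
        """Return the latest audio block captured at or before time t."""
        i = bisect.bisect_right(self.audio_times, t) - 1
        return self.audio_blocks[i] if i >= 0 else None
```

In this sketch an audio block would hold the samples of every microphone for that interval, so that directivity can still be formed in any direction at playback time.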

The signal processing unit 50 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor), and executes control processing for supervising the overall operation of each unit of the voice processing device 40, data input/output processing with the other units, data arithmetic (calculation) processing, and data storage processing.

Using the audio data recorded in the recorder 45, the signal processing unit 50 adds together the audio data collected by the individual microphones through the directivity control processing described later, and generates audio data in which directivity is formed in a specific direction, in order to emphasize (amplify) the sound (volume level) in the specific direction from the positions of the microphones 22 of the microphone array 20. The signal processing unit 50 may also use audio data transmitted from the microphone array 20 to generate audio data in which directivity is formed in a specific direction, in order to emphasize (amplify) the volume level of sound in the specific direction (directivity direction) from the microphone array 20. The specific direction is the direction from the microphone array 20 toward the position corresponding to the designated location specified through the operation unit 55, and is the direction designated by the user in order to emphasize (amplify) the volume level of the audio data.
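A common way to form directivity by adding microphone signals is delay-and-sum beamforming. The Python sketch below illustrates that general technique; it is an assumption about how the addition could work, not the directivity control processing of this patent, whose details are described later in the document. The function names, the far-field plane-wave model, whole-sample delays, and the speed-of-sound value are all choices made for this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delays(mic_positions, direction, sample_rate):
    """Whole-sample delays that align a plane wave arriving from `direction`.

    mic_positions: list of (x, y, z) coordinates in metres.
    direction: vector pointing from the array toward the sound source.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    # A microphone further along `unit` hears the wavefront earlier by
    # (p . unit) / c seconds, so it needs that much extra delay.
    lead = [sum(p_i * u_i for p_i, u_i in zip(p, unit)) / SPEED_OF_SOUND
            for p in mic_positions]
    earliest = min(lead)
    return [round((t - earliest) * sample_rate) for t in lead]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, then average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, d in zip(channels, delays):
        for n in range(d, length):
            out[n] += samples[n - d]
    return [v / len(channels) for v in out]
```

Sound arriving from the steered direction adds in phase and keeps its level, while sound from other directions is misaligned and partly averaged out. Because the delays are applied at playback time to the stored per-microphone data, the direction can be chosen after recording.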

When the video data recorded in the recorder 45 was captured by an omnidirectional camera (described later), the signal processing unit 50 converts the coordinate system of the video data recorded in the recorder 45 (for example, a two-dimensional or three-dimensional coordinate conversion among the x, y, and z axes) and displays the converted video data on the display 63 (see FIGS. 9(C) and 9(E)).

The operation unit 55 is arranged, for example, in correspondence with the screen of the display 63, and is configured using a touch panel or touch pad that accepts input from a user's finger 95 or a stylus pen. In response to a user operation, the operation unit 55 outputs to the signal processing unit 50 the coordinate data of one or more designated locations at which emphasis (amplification) of the volume level of the audio data is desired. The operation unit 55 may also be configured using a pointing device such as a mouse or a keyboard.
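As a rough illustration of how the coordinates of a designated location could be turned into a directivity direction, the Python sketch below maps a point in a circular omnidirectional image to a unit direction vector. Everything about the mapping is an assumption made for this sketch and is not stated in this section: an equidistant fisheye projection, an image edge corresponding to 90 degrees from the optical axis, and a camera coaxial with the microphone array. The function name `click_to_direction` is hypothetical.

```python
import math


def click_to_direction(x, y, cx, cy, radius, max_angle_deg=90.0):
    """Map a designated image point to a unit vector (dx, dy, dz).

    (cx, cy) and radius describe the circular image; dz is the component
    along the optical axis. Assumes an equidistant projection, where the
    distance from the image center is proportional to the off-axis angle.
    """
    off_x, off_y = x - cx, y - cy
    r = math.hypot(off_x, off_y)
    if r > radius:
        raise ValueError("designated point is outside the video display area")
    theta = math.radians(max_angle_deg) * (r / radius)  # angle from the axis
    phi = math.atan2(off_y, off_x)                      # azimuth around the axis
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

A point at the image center maps to the optical axis itself, and a point on the rim maps to a direction perpendicular to it. A real system would need the calibrated lens model and the offset between camera and microphone array, which this sketch leaves out.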

再生部６０は、ディスプレイ６３と、スピーカ６５とを含む構成であり、レコーダ４５に記録された映像データをディスプレイ６３に表示させ、更に、レコーダ４５に記録された音声データをスピーカ６５に音声出力させる。なお、ディスプレイ６３及びスピーカ６５は、再生部６０とは別々の構成としても良い。 The playback unit 60 includes a display 63 and a speaker 65. It displays the video data recorded in the recorder 45 on the display 63 and outputs the audio data recorded in the recorder 45 from the speaker 65. The display 63 and the speaker 65 may be configured separately from the playback unit 60.

表示部としてのディスプレイ６３は、カメラ１０，１０Ａによって撮像されてレコーダ４５に記録された映像データを表示する。 The display 63 as a display unit displays video data captured by the cameras 10 and 10 A and recorded in the recorder 45.

音声出力部としてのスピーカ６５は、マイクアレイ２０によって収音されてレコーダ４５に記録された音声データ、もしくはその音声データを基にして信号処理部５０にて特定方向への強調処理を行った音声データを音声出力する。 The speaker 65 as an audio output unit is audio data picked up by the microphone array 20 and recorded in the recorder 45, or audio that has been emphasized in a specific direction by the signal processing unit 50 based on the audio data. Output data as audio.

ここで、音声処理装置４０は、レコーダ４５と音声処理装置４０における他の各部とが異なる装置の構成としても良い（図１（Ｂ）参照）。具体的には、図１（Ａ）に示す音声処理装置４０は、図１（Ｂ）に示すレコーダ４５Ａと、図１（Ｂ）に示すＰＣ７０とを含む構成としても良い。即ち、ＰＣ７０は、汎用のコンピュータを用いて構成され、信号処理部７１と、ディスプレイ７３及びスピーカ７５を含む再生部７２と、操作部７８とを含む構成である。レコーダ４５Ａ及びＰＣ７０は、音声処理システム５Ａにおける音声処理装置４０に相当し、同様の機能及び動作を実現する。 Here, the voice processing device 40 may be configured as a device in which the recorder 45 and other units in the voice processing device 40 are different (see FIG. 1B). Specifically, the audio processing device 40 illustrated in FIG. 1A may include a recorder 45A illustrated in FIG. 1B and a PC 70 illustrated in FIG. That is, the PC 70 is configured using a general-purpose computer, and includes a signal processing unit 71, a playback unit 72 including a display 73 and a speaker 75, and an operation unit 78. The recorder 45A and the PC 70 correspond to the voice processing device 40 in the voice processing system 5A, and realize the same functions and operations.

また、カメラ１０Ｂ，１０Ｃ及びマイクアレイ２０Ａの機能は、それぞれ音声処理システム５Ａにおけるカメラ１０，１０Ａ及びマイクアレイ２０の機能と同一である。 The functions of the cameras 10B and 10C and the microphone array 20A are the same as the functions of the cameras 10 and 10A and the microphone array 20 in the sound processing system 5A, respectively.

なお、音声処理システム５Ａ，５Ｂに設置されるカメラの台数は、任意である。また、ネットワーク３０，３０Ａが相互に接続され、音声処理システム５Ａ−５Ｂの間においてデータの転送が可能でも良い。 Note that the number of cameras installed in the voice processing systems 5A and 5B is arbitrary. Further, the networks 30 and 30A may be connected to each other so that data can be transferred between the voice processing systems 5A and 5B.

図２（Ａ）は、マイクアレイ２０の外観図である。マイクアレイ２０は、円盤状の筐体２１に配置された複数のマイクロホン２２を含む構成である。複数のマイクロホン２２は、筐体２１の面に沿って配置され、筐体２１と同一の中心を有する小さい円状及び大きい円状の２個の同心円状に沿って配置されている。小さな円状に沿って配置された複数のマイクロホン２２Ａは、互いの間隔が狭く、高い音域に適した特性を有する。一方、大きな円状に沿って配置された複数のマイクロホン２２Ｂは、直径が大きく、低い音域に適した特性を有する。 FIG. 2A is an external view of the microphone array 20. The microphone array 20 includes a plurality of microphones 22 arranged in a disk-shaped housing 21. The plurality of microphones 22 are arranged along the surface of the housing 21, and are arranged along two concentric circles, a small circle having the same center as the housing 21 and a large circle. The plurality of microphones 22A arranged along a small circle are narrow in distance from each other and have characteristics suitable for a high sound range. On the other hand, the plurality of microphones 22B arranged along a large circle have a large diameter and characteristics suitable for a low sound range.
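The relation between microphone spacing and the sound range an array handles well can be illustrated with the half-wavelength rule of thumb for avoiding spatial aliasing. This rule is not stated in the text; it is offered here only as a plausible reading of why the narrowly spaced microphones 22A suit a high range. The spacing values and the speed of sound below are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)

def max_unaliased_frequency(spacing_m: float, speed: float = SPEED_OF_SOUND) -> float:
    """Highest frequency (Hz) for which the spacing stays within half a wavelength."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return speed / (2.0 * spacing_m)

# Hypothetical spacings: 2 cm on the small circle, 10 cm on the large circle.
inner_limit = max_unaliased_frequency(0.02)
outer_limit = max_unaliased_frequency(0.10)
```

Under these assumed values the narrowly spaced ring tolerates a higher frequency than the widely spaced ring, which matches the qualitative statement in the paragraph above.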

図２（Ｂ）は、第３の実施形態におけるマイクアレイ２０Ｃの外観とマイクアレイ２０Ｃと全方位カメラ１０Ｅ（図９（Ａ）参照）との取り付け状態とを示す図である。図２（Ｂ）に示すマイクアレイ２０Ｃは、内側に開口部２１ａが形成されたドーナツ型形状の筐体２１Ｃと、同筐体２１Ｃに一様に設けられた複数のマイクロホン２２Ｃとを含む構成である。複数のマイクロホン２２Ｃは、筐体２１Ｃに対して同心円状に沿って配置されている。 FIG. 2B is a diagram showing the appearance of the microphone array 20C in the third embodiment and how the microphone array 20C and the omnidirectional camera 10E (see FIG. 9A) are attached to each other. The microphone array 20C shown in FIG. 2B includes a donut-shaped housing 21C with an opening 21a formed on its inner side, and a plurality of microphones 22C provided uniformly on the housing 21C. The microphones 22C are arranged along a circle concentric with the housing 21C.

図２（Ｃ）では、筐体２１Ｃの開口部２１ａの内側には、図９（Ａ）に示す全方位カメラ１０Ｅが挿通した状態で取り付けられる。本実施形態では、全方位カメラ１０Ｅは、例えば魚眼レンズを搭載したカメラであり、ホールの床面の広範囲を撮像するように取り付けられている。このように、全方位カメラ１０Ｅとマイクアレイ２０Ｃとは、マイクアレイ２０Ｃの筐体２１Ｃの中心を共通とした同軸上に配置されるので、同一の座標系を用いることが可能である。 In FIG. 2C, the omnidirectional camera 10E shown in FIG. 9A is attached inside the opening 21a of the casing 21C. In the present embodiment, the omnidirectional camera 10E is, for example, a camera equipped with a fisheye lens, and is attached so as to image a wide range of the floor of the hall. As described above, the omnidirectional camera 10E and the microphone array 20C are arranged on the same axis so that the center of the casing 21C of the microphone array 20C is common, so that the same coordinate system can be used.

図３は、マイクアレイ２０を用いた指向性制御処理の原理の説明図である。図３では、遅延和方式を用いた指向性制御処理の原理について簡単に説明する。音源８０から発した音波が、マイクアレイ２０の各マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎに対し、ある一定の角度（入射角＝（９０−θ）［度］）で入射するとする。マイクアレイ２０の筐体２１の面に対し、音源８０は所定角度θの方向に配置されているとする。また、マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎ間の間隔は一定である。 FIG. 3 is an explanatory diagram of the principle of directivity control processing using the microphone array 20. FIG. 3 briefly illustrates the principle of directivity control processing based on the delay-and-sum method. Assume that sound waves emitted from the sound source 80 are incident on the microphones 22a, 22b, 22c, …, 22n−1, 22n of the microphone array 20 at a certain fixed angle (incident angle = (90 − θ) [degrees]). The sound source 80 is assumed to lie in the direction of the predetermined angle θ with respect to the surface of the housing 21 of the microphone array 20. The spacing between the microphones 22a, 22b, 22c, …, 22n−1, 22n is constant.

音源８０から発した音波は、最初にマイクロホン２２ａに到達して収音され、次にマイクロホン２２ｂに到達して収音され、次々に収音され、最後にマイクロホン２２ｎに到達して収音される。なお、マイクアレイ２０の各マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎの位置から音源８０に向かう方向は、例えば音源８０が人物の会話時の音声である場合又は周囲の音楽である場合を想定すれば、人物の会話時の音声又は周囲の音楽の音声データの音量レベルを強調（増幅）するために操作部５５から指定された所定の範囲に対応する方向と同じと考えることができる。 The sound wave emitted from the sound source 80 first reaches and is collected by the microphone 22a, then reaches and is collected by the microphone 22b, and so on in turn, until it finally reaches and is collected by the microphone 22n. If the sound source 80 is assumed to be, for example, the voice of a person in conversation or ambient music, the direction from the positions of the microphones 22a, 22b, 22c, …, 22n−1, 22n of the microphone array 20 toward the sound source 80 can be regarded as the same as the direction corresponding to the predetermined range designated through the operation unit 55 for emphasizing (amplifying) the volume level of the audio data of that conversation or music.

ここで、音波がマイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１に到達した時刻から最後に収音されたマイクロホン２２ｎに到達した時刻までには、到達時間差τ１，τ２，τ３，…，τｎ−１が生じる。このため、各々のマイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎにより収音された音声データがそのまま加算された場合には、位相がずれたまま加算されるため、音波の音量レベルが全体的に弱め合うことになってしまう。 Here, arrival time differences τ1, τ2, τ3, …, τn−1 arise between the times at which the sound wave reaches the microphones 22a, 22b, 22c, …, 22n−1 and the time at which it reaches the microphone 22n, where it is collected last. Consequently, if the audio data collected by the microphones 22a, 22b, 22c, …, 22n−1, 22n were added as is, the signals would be summed out of phase and the overall volume level of the sound would be weakened.

なお、τ１は音波がマイクロホン２２ａに到達した時刻と音波がマイクロホン２２ｎに到達した時刻との差分の時間であり、τ２は音波がマイクロホン２２ｂに到達した時刻と音波がマイクロホン２２ｎに到達した時刻との差分の時間であり、τｎ−１は音波がマイクロホン２２ｎ−１に到達した時刻と音波がマイクロホン２２ｎに到達した時刻との差分の時間である。 Note that τ1 is a difference time between the time when the sound wave reaches the microphone 22a and the time when the sound wave reaches the microphone 22n, and τ2 is the time when the sound wave reaches the microphone 22b and the time when the sound wave reaches the microphone 22n. It is a difference time, and τn-1 is a difference time between the time when the sound wave reaches the microphone 22n-1 and the time when the sound wave reaches the microphone 22n.

一方、本実施形態を含む各実施形態では、信号処理部５０は、マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎ毎に対応して設けられたＡ／Ｄ変換器５１ａ，５１ｂ，５１ｃ，…，５１ｎ−１，５１ｎ及び遅延器５２ａ，５２ｂ，５２ｃ，…，５２ｎ−１，５２ｎと、加算器５７と、を有する構成である（図３参照）。 On the other hand, in each embodiment including this embodiment, the signal processing unit 50 includes A / D converters 51a, 51b, 51c provided corresponding to the microphones 22a, 22b, 22c,..., 22n-1, 22n. ,..., 51n-1, 51n, delay units 52a, 52b, 52c,..., 52n-1, 52n, and an adder 57 (see FIG. 3).

即ち、信号処理部５０は、各マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎにより収音されたアナログの音声データを、Ａ／Ｄ変換器５１ａ，５１ｂ，５１ｃ，…，５１ｎ−１，５１ｎにおいてＡＤ変換することでデジタルの音声データを得る。更に、信号処理部５０は、遅延器５２ａ，５２ｂ，５２ｃ，…，５２ｎ−１，５２ｎにおいて、各々のマイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎにおける到達時間差に対応する遅延時間を与えて位相を揃えた後、加算器５７において遅延処理後の音声データを加算する。これにより、信号処理部５０は、各マイクロホン２２ａ，２２ｂ，２２ｃ，…，２２ｎ−１，２２ｎの設置位置からの所定角度θの方向の音声データを強調した音声データを生成することができる。例えば図３では、遅延器５２ａ，５２ｂ，５２ｃ，…，５２ｎ−１，５２ｎに設定された各遅延時間Ｄ１，Ｄ２，Ｄ３，…，Ｄｎ−１，Ｄｎは、それぞれ到達時間差τ１，τ２，τ３，…，τｎ−１に相当し、数式（１）により示される。 That is, the signal processing unit 50 obtains digital audio data by A/D-converting the analog audio data collected by the microphones 22a, 22b, 22c, …, 22n−1, 22n in the A/D converters 51a, 51b, 51c, …, 51n−1, 51n. The signal processing unit 50 then applies, in the delay units 52a, 52b, 52c, …, 52n−1, 52n, delay times corresponding to the arrival time differences at the respective microphones 22a, 22b, 22c, …, 22n−1, 22n so that the phases are aligned, and adds the delayed audio data in the adder 57. In this way, the signal processing unit 50 can generate audio data in which the sound from the direction of the predetermined angle θ, as seen from the installation positions of the microphones 22a, 22b, 22c, …, 22n−1, 22n, is emphasized. In FIG. 3, for example, the delay times D1, D2, D3, …, Dn−1, Dn set in the delay units 52a, 52b, 52c, …, 52n−1, 52n correspond to the arrival time differences τ1, τ2, τ3, …, τn−1, respectively, and are given by formula (1).

Ｌ１は、マイクロホン２２ａとマイクロホン２２ｎにおける音波到達距離の差である。Ｌ２は、マイクロホン２２ｂとマイクロホン２２ｎにおける音波到達距離の差である。Ｌ３は、マイクロホン２２ｃとマイクロホン２２ｎにおける音波到達距離の差である。Ｌｎ−１は、マイクロホン２２ｎ−１とマイクロホン２２ｎにおける音波到達距離の差である。Ｖｓは音速である。Ｌ１，Ｌ２，Ｌ３，…，Ｌｎ−１，Ｖｓは既知の値である。図３では、遅延器５２ｎに設定される遅延時間Ｄｎは０（ゼロ）である。 L1 is the difference in the sound wave arrival distance between the microphone 22a and the microphone 22n. L2 is the difference in sound wave arrival distance between the microphone 22b and the microphone 22n. L3 is a difference in sound wave arrival distance between the microphone 22c and the microphone 22n. Ln-1 is a difference in sound wave arrival distance between the microphone 22n-1 and the microphone 22n. Vs is the speed of sound. L1, L2, L3,..., Ln−1, Vs are known values. In FIG. 3, the delay time Dn set in the delay device 52n is 0 (zero).
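Formula (1) is not reproduced in this text. From the definitions given (each delay time corresponds to an arrival time difference, L1, …, Ln−1 are differences in sound wave travel distance, and Vs is the speed of sound), a form consistent with the description would be the following. The index k is introduced here only for compactness.

```latex
D_k = \tau_k = \frac{L_k}{V_s} \quad (k = 1, 2, \dots, n-1), \qquad D_n = 0
```

If, as an additional assumption, the microphones lie on a straight line with a constant spacing d (a symbol that does not appear in the text) and θ is measured from the surface of the housing 21, the travel distance differences would follow from the geometry of FIG. 3 as:

```latex
L_k = (n - k)\, d \cos\theta
```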

このように、信号処理部５０は、遅延器５２ａ，５２ｂ，５２ｃ，…，５２ｎ−１，５２ｎに設定される遅延時間Ｄ１，Ｄ２，Ｄ３，…，Ｄｎ−１，Ｄｎを変更することで、レコーダ４５に記録された音声データを用いて、マイクアレイ２０の設置位置を基準とした任意の方向の音声データを強調した音声データを生成することができ、音声処理システム５Ａ，５Ｂにおける音声データの指向性制御処理が簡易に行える。 In this way, by changing the delay times D1, D2, D3, …, Dn−1, Dn set in the delay units 52a, 52b, 52c, …, 52n−1, 52n, the signal processing unit 50 can use the audio data recorded in the recorder 45 to generate audio data in which the sound from an arbitrary direction relative to the installation position of the microphone array 20 is emphasized, so the directivity control processing of audio data in the audio processing systems 5A and 5B can be performed easily.
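The delay-and-sum processing described above can be sketched as follows. This is a minimal illustration, not the implementation of the signal processing unit 50. It assumes a uniform linear array, delays rounded to whole samples, and a single impulse as the test signal. The function names, the spacing, the sample rate and the speed of sound are all hypothetical.

```python
import math

def delays_in_samples(n_mics, spacing_m, theta_deg, sample_rate_hz, speed=343.0):
    """Integer sample delays D_1..D_n for a uniform linear array; D_n is 0."""
    cos_t = math.cos(math.radians(theta_deg))
    return [round((n_mics - k) * spacing_m * cos_t / speed * sample_rate_hz)
            for k in range(1, n_mics + 1)]

def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, then add the aligned channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for i in range(delay, length):
            out[i] += samples[i - delay]
    return out

# Four microphones, 5 cm apart, source at theta = 60 degrees, 48 kHz sampling.
delays = delays_in_samples(4, 0.05, 60.0, 48000)

# Simulated recording: microphone k hears the impulse D_k samples earlier
# than the last microphone, which hears it at sample 20.
arrival_at_last = 20
channels = []
for d in delays:
    ch = [0.0] * 40
    ch[arrival_at_last - d] = 1.0
    channels.append(ch)

aligned = delay_and_sum(channels, delays)             # impulses coincide
naive = delay_and_sum(channels, [0] * len(channels))  # impulses stay spread out
```

With the delays applied, the four impulses land on the same sample and add up. Without them, they remain spread over several samples and do not reinforce each other. Changing `theta_deg` changes only the delay values, which corresponds to steering the directivity to another direction on already recorded data.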

次に、本実施形態の音声処理システム５Ａ，５Ｂの記録時及び再生時の各動作を説明する。ここでは、音声処理システム５Ａが監視システムに適用された場合について説明する。図４は、音声処理システム５Ａの記録時の動作手順を説明するフローチャートである。 Next, each operation at the time of recording and reproduction of the audio processing systems 5A and 5B of the present embodiment will be described. Here, a case where the voice processing system 5A is applied to a monitoring system will be described. FIG. 4 is a flowchart for explaining an operation procedure during recording of the voice processing system 5A.

図４において、例えば監視システム制御室（不図示）にいるユーザからの遠隔操作により、カメラ１０，１０Ａは、監視対象の地点（場所）の周囲の映像の撮像を開始する（Ｓ１）。カメラ１０，１０Ａによる撮像の開始と同時又は略同時に、マイクアレイ２０は、監視対象の地点（場所）の周囲の音声の収音を開始する（Ｓ２）。カメラ１０，１０Ａは、撮像された映像データを、ネットワーク３０を介して接続されたレコーダ４５に転送する。マイクアレイ２０は、収音された音声データを、ネットワーク３０を介して接続されたレコーダ４５に転送する。 In FIG. 4, for example, by remote operation from a user in a monitoring system control room (not shown), the cameras 10 and 10 A start capturing an image around a monitored point (place) (S 1). Simultaneously or substantially simultaneously with the start of imaging by the cameras 10 and 10A, the microphone array 20 starts collecting sound around the point (location) to be monitored (S2). The cameras 10 and 10 A transfer the captured video data to the recorder 45 connected via the network 30. The microphone array 20 transfers the collected audio data to the recorder 45 connected via the network 30.

レコーダ４５は、カメラ１０，１０Ａから転送された映像データと、マイクアレイ２０から転送された音声データとを全て対応付けて記録媒体に格納して記録する（Ｓ３）。ユーザからの遠隔操作により、カメラ１０，１０Ａと、マイクアレイ２０とレコーダ４５との記録時の動作が終了する。 The recorder 45 associates and stores all the video data transferred from the cameras 10 and 10A and the audio data transferred from the microphone array 20 in a recording medium (S3). The operation at the time of recording with the cameras 10 and 10A, the microphone array 20, and the recorder 45 is completed by a remote operation from the user.

図５は、１つ以上の指定箇所を指定する場合における、音声処理システム５Ａ，５Ｂの再生時の動作手順を説明するフローチャートである。 FIG. 5 is a flowchart for explaining an operation procedure during reproduction of the voice processing systems 5A and 5B when one or more designated portions are designated.

図５において、音声処理装置４０のレコーダ４５は、ユーザからの直接的な操作或いは遠隔操作により再生したい映像データの指定を受け付ける（Ｓ１１）。映像データの指定には、例えば記録された日時及びカメラの種類が条件として用いられる。再生部６０は、ステップＳ１１において指定された条件に応じた映像データを再生し、ディスプレイ６３の画面に表示させる。更に、再生部６０は、再生された映像データに対応付けてレコーダ４５に格納されている音声データも再生し、スピーカ６５から音声出力させる。 In FIG. 5, the recorder 45 of the audio processing device 40 accepts designation of video data to be reproduced by direct operation or remote operation from the user (S11). For the designation of the video data, for example, recorded date and camera type are used as conditions. The reproduction unit 60 reproduces video data corresponding to the condition specified in step S 11 and displays it on the screen of the display 63. Furthermore, the reproduction unit 60 reproduces audio data stored in the recorder 45 in association with the reproduced video data, and outputs the audio from the speaker 65.

ここで、再生部６０が再生している映像データの再生中或いは一時停止中に、ユーザが、操作部５５を介して、ディスプレイ６３の画面に表示されている映像データの中で音声（音量レベル）を強調（増幅）する１つ以上の指定箇所を指定したとする。信号処理部５０は、ユーザの指定操作に応じて、映像データの内容の中で音声（音量レベル）を強調（増幅）する１つ以上の指定箇所の指定を受け付ける（Ｓ１２）。 Here, during the playback or pause of the video data being played back by the playback unit 60, the user can make a sound (volume level) in the video data displayed on the screen of the display 63 via the operation unit 55. ) Is specified at one or more specified locations to emphasize (amplify). The signal processing unit 50 accepts designation of one or more designated portions that emphasize (amplify) audio (volume level) in the contents of the video data in accordance with a user's designation operation (S12).

以下、操作部５５を介して、マイクアレイ２０，２０Ａを基準として、音声（音量レベル）を強調（増幅）する方向（指向方向）に指向性を形成するために、ユーザにより指定された指定箇所を「指定箇所」と略記する。ステップＳ１２では、例えばユーザが、ディスプレイ６３の画面を指９５でタッチすることで、ディスプレイ６３の画面に表示された映像データに対する指定箇所、又はタッチされた指定箇所を中心とする所定の矩形の音声強調範囲が指定されたとする。 In the following, the designated location designated by the user in order to form directivity in the direction (directing direction) in which the sound (volume level) is emphasized (amplified) with reference to the microphone arrays 20 and 20A via the operation unit 55. Is abbreviated as “specified location”. In step S12, for example, when the user touches the screen of the display 63 with the finger 95, the designated portion of the video data displayed on the screen of the display 63 or a predetermined rectangular sound centering on the touched designated portion is used. Assume that the highlight range is specified.
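One way to derive a rectangular emphasis range centered on a touched location is sketched below. The rectangle size, the pixel coordinate convention and the clipping to the screen edges are assumptions made for illustration; the text only states that the range is a predetermined rectangle centered on the touched location.

```python
def emphasis_rectangle(touch_x, touch_y, width, height, screen_w, screen_h):
    """Return (left, top, right, bottom) of a rectangle centered on the touch
    point, clipped to the screen bounds."""
    left = max(0, touch_x - width // 2)
    top = max(0, touch_y - height // 2)
    right = min(screen_w, touch_x + width // 2)
    bottom = min(screen_h, touch_y + height // 2)
    return left, top, right, bottom

# Hypothetical 640x480 screen and an 80x60 rectangle.
rect = emphasis_rectangle(100, 50, 80, 60, 640, 480)
near_corner = emphasis_rectangle(10, 10, 80, 60, 640, 480)
```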

信号処理部５０は、操作部５５を介して指定された１つ以上の指定箇所又は音声強調範囲を基に、マイクアレイ２０の各マイクロホン２２の位置の中心位置から１つ以上の指定箇所又は音声強調範囲の例えば中心に対応する実際の現場の各位置（各音声位置）に向かう方向（各指向方向）を、図３を参照して説明した所定角度θ１，θ２，…，θｎの方向、即ち、音声（音量レベル）を強調（増幅）する各方向（各指向方向）として算出する。更に、信号処理部５０は、現在再生部６０によって再生されている映像データと対応付けてレコーダ４５に格納されている音声データに対し、算出された所定角度θ１，θ２，…，θｎにそれぞれ指向性を形成した音声データ、即ち、所定角度θ１，θ２，…，θｎの音声（音量レベル）が強調（増幅）された音声データを生成する（Ｓ１３）。 Based on the one or more designated locations or audio emphasis ranges specified via the operation unit 55, the signal processing unit 50 calculates the directions (directivity directions) from the center of the positions of the microphones 22 of the microphone array 20 toward the actual on-site positions (sound positions) corresponding to, for example, the centers of those designated locations or audio emphasis ranges. These are calculated as the directions of the predetermined angles θ1, θ2, …, θn described with reference to FIG. 3, that is, the directions in which the sound (volume level) is to be emphasized (amplified). The signal processing unit 50 then generates, from the audio data stored in the recorder 45 in association with the video data currently being played back by the playback unit 60, audio data in which directivity is formed at each of the calculated angles θ1, θ2, …, θn, that is, audio data in which the sound (volume level) at the angles θ1, θ2, …, θn is emphasized (amplified) (S13).
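The direction calculation in step S13 can be illustrated with a small geometric sketch. It assumes that the on-site position corresponding to the designated location is already known in a common 3D coordinate system (how screen coordinates are mapped to that position is not covered here), and that the housing of the microphone array lies in a horizontal plane, so that θ is the angle between the pointing direction and that plane. The function name and the coordinates are hypothetical.

```python
import math

def pointing_direction(array_pos, target_pos):
    """Return (azimuth_deg, theta_deg) from the array center to the target.

    theta is measured from the horizontal plane of the housing."""
    dx = target_pos[0] - array_pos[0]
    dy = target_pos[1] - array_pos[1]
    dz = target_pos[2] - array_pos[2]
    horizontal = math.hypot(dx, dy)
    if horizontal == 0 and dz == 0:
        raise ValueError("target coincides with the array center")
    azimuth = math.degrees(math.atan2(dy, dx))
    theta = math.degrees(math.atan2(abs(dz), horizontal))
    return azimuth, theta

# Array on a 3 m high ceiling, target on the floor 3 m away along the x axis.
azimuth, theta = pointing_direction((0.0, 0.0, 3.0), (3.0, 0.0, 0.0))
```

For several designated locations, the same calculation would simply be repeated once per location to obtain θ1, θ2, …, θn.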

なお、本実施形態では、信号処理部５０は、マイクアレイ２０の各マイクロホン２２の位置の中心位置から１つ以上の指定箇所又は音声強調範囲の例えば中心に対応する各音声位置に向かう方向に指向性を形成した音声データを生成又は合成するが、更に、１つ以上の指定箇所又は音声強調範囲に対応する各音声位置に向かう方向（所定角度θ１，θ２，…，θｎ）から大きく外れる方向（例えば所定角度θ１，θ２，…，θｎから±５度以上外れる方向）に対する音声データを抑圧処理しても良い。 In the present embodiment, the signal processing unit 50 generates or synthesizes audio data in which directivity is formed in the direction from the center of the positions of the microphones 22 of the microphone array 20 toward each sound position corresponding to, for example, the center of the one or more designated locations or audio emphasis ranges. In addition, it may suppress audio data for directions that deviate greatly from the directions (predetermined angles θ1, θ2, …, θn) toward those sound positions (for example, directions that deviate from the predetermined angles θ1, θ2, …, θn by ±5 degrees or more).
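The optional suppression can be pictured as a direction-dependent gain. The sketch below is only an illustration under stated assumptions: it takes for granted that the direction of a sound component is already known, it compares a single angle per direction, and the attenuation value `suppressed_gain` is hypothetical. The text does not describe how the suppression is actually carried out.

```python
def direction_gain(angle_deg, target_angles_deg, tolerance_deg=5.0, suppressed_gain=0.1):
    """Return 1.0 for directions within the tolerance of any target angle,
    and the reduced gain for directions that deviate by the tolerance or more."""
    for target in target_angles_deg:
        # Smallest absolute difference between the two angles, in degrees.
        diff = abs((angle_deg - target + 180.0) % 360.0 - 180.0)
        if diff < tolerance_deg:
            return 1.0
    return suppressed_gain

kept = direction_gain(32.0, [30.0, 120.0])   # 2 degrees from the first target
suppressed = direction_gain(35.0, [30.0])    # exactly 5 degrees off
wrapped = direction_gain(358.0, [2.0])       # 4 degrees off across the 0/360 wrap
```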

再生部６０は、信号処理部５０によって１つ以上の指定箇所又は音声強調範囲に対応する各音声位置に向かう方向の音声（音量レベル）が強調（増幅）された各音声データを、ステップＳ１１の指定に応じてディスプレイ６３に表示されている映像データと同期させて、スピーカ６５から音声出力させる（Ｓ１４）。これにより、音声処理装置４０の再生時における動作は終了する。 The playback unit 60 outputs from the speaker 65 each piece of audio data in which the signal processing unit 50 has emphasized (amplified) the sound (volume level) in the direction toward each sound position corresponding to the one or more designated locations or audio emphasis ranges, in synchronization with the video data displayed on the display 63 according to the designation in step S11 (S14). This ends the playback operation of the audio processing device 40.

図６は、第１の実施形態の音声処理システム５Ａの使用形態の一例を示す模式図である。図６（Ａ）は、例えば屋内のイベント会場としてのホールの天井８５に、１台のカメラ１０と１台のマイクアレイ２０とが離れた位置に設置された様子を示す図である。 FIG. 6 is a schematic diagram illustrating an example of a usage pattern of the voice processing system 5A according to the first embodiment. FIG. 6A is a diagram showing a state in which one camera 10 and one microphone array 20 are installed at a distance from a ceiling 85 of a hall as an indoor event venue, for example.

図６（Ａ）では、２人の人物９１，９２がホールの床８７に立って会話をしている。２人の人物９１，９２から少し離れた位置には、スピーカ８２が床８７の上に接して載置されており、スピーカ８２から音楽が流れている。また、カメラ１０は、カメラ１０に予め設定された監視対象の地点（場所）の周囲にいる人物９１，９２を撮像している。更に、マイクアレイ２０は、ホール全体の音声を収音している。 In FIG. 6A, two persons 91 and 92 are standing on the floor 87 of the hall and having a conversation. A speaker 82 is placed on the floor 87 at a position slightly away from the two persons 91 and 92, and music flows from the speaker 82. Further, the camera 10 captures images of persons 91 and 92 around a monitoring target point (place) set in advance in the camera 10. Furthermore, the microphone array 20 collects the sound of the entire hall.

図６（Ｂ）は、映像データがディスプレイ６３に表示され、音声データがスピーカ６５において音声出力されている様子を示す図である。ディスプレイ６３の画面には、カメラ１０が撮像した映像データが表示されている。また、スピーカ６５からは、２人の人物９１，９２の会話又はホール内の音楽が音声出力されている。 FIG. 6B is a diagram illustrating a state in which video data is displayed on the display 63 and audio data is output as audio from the speaker 65. Video data captured by the camera 10 is displayed on the screen of the display 63. In addition, the speaker 65 outputs the conversation between the two persons 91 and 92 or the music in the hall.

ユーザは、例えばディスプレイ６３の画面に表示された２人の人物９１，９２の映像データの中央付近を指９５でタッチしたとする。タッチ点６３ａはユーザにより指定された指定箇所となる。信号処理部５０は、マイクアレイ２０によって収音された音声、即ち各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ａ又は矩形範囲６３ｂの中心に対応する音声位置に向かう指向方向（図６（Ａ）に示す符号ｅで示される方向）に指向性を形成した音声データを生成する。 For example, it is assumed that the user touches the vicinity of the center of the video data of the two persons 91 and 92 displayed on the screen of the display 63 with the finger 95. The touch point 63a is a designated location designated by the user. The signal processing unit 50 uses the sound picked up by the microphone array 20, that is, the sound data picked up by each microphone 22, from the position of each microphone 22 of the microphone array 20, and the touch point 63a designated by the user or Audio data in which directivity is formed in the directivity direction (direction indicated by the symbol e shown in FIG. 6A) toward the audio position corresponding to the center of the rectangular range 63b is generated.

即ち、信号処理部５０は、各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ａ又は矩形範囲６３ｂの中心に対応する音声位置に向かう指向方向の音声（音量レベル）を強調（増幅）した音声データを生成する。再生部６０は、信号処理部５０が生成した音声データを、カメラ１０が撮像した映像データと同期させてスピーカ６５から音声出力させる。 In other words, the signal processing unit 50 uses the audio data collected by each microphone 22 and the audio corresponding to the center of the touch point 63a or the rectangular range 63b specified by the user from the position of each microphone 22 of the microphone array 20. Sound data in which the sound (volume level) in the directivity direction toward the position is emphasized (amplified) is generated. The reproduction unit 60 outputs the audio data generated by the signal processing unit 50 from the speaker 65 in synchronization with the video data captured by the camera 10.

この結果、ユーザによって指定されたタッチ点６３ａ又は矩形範囲６３ｂにおける音声データが強調され、スピーカ６５から２人の人物９１，９２の会話（例えば図６（Ａ）に示す「Ｈｅｌｌｏ」参照）が大きな音量によって音声出力される。一方、２人の人物９１，９２に比べ、マイクアレイ２０により近い距離に載置されているがユーザによって指定されたタッチ点６３ａではないスピーカ８２から流れている音楽（図６（Ａ）に示す「♪〜」参照）は強調して音声出力されず、２人の人物９１，９２の会話に比べて小さな音量によって音声出力される。 As a result, the audio data at the touch point 63a or in the rectangular range 63b designated by the user is emphasized, and the conversation between the two persons 91 and 92 (see, for example, "Hello" in FIG. 6A) is output from the speaker 65 at a high volume. In contrast, the music (see "♪〜" in FIG. 6A) from the speaker 82, which is placed closer to the microphone array 20 than the two persons 91 and 92 but is not at the touch point 63a designated by the user, is not emphasized and is output at a lower volume than the conversation between the two persons 91 and 92.

以上により、本実施形態では、音声処理システム５Ａ又は５Ｂは、レコーダ４５に記録された映像データ及び音声データの再生中において、ユーザによって指定された任意の再生時間に対する映像中の音声データを強調して出力することができる。これにより、ユーザは、ディスプレイ６３の画面に表示された映像データを見ながら、音声データを強調したい箇所をタッチして指定するだけで、簡単にその指定箇所又は指定箇所を含む指定範囲（音声強調範囲）における音声データを強調して音声出力させることができる。このように、本実施形態の音声処理システム５Ａ又は５Ｂでは、ユーザは、カメラ１０によって撮像された映像データをディスプレイ６３にて目視しながら、自己に必要な範囲の音声情報を容易に得ることができる。 As described above, in the present embodiment, the audio processing system 5A or 5B can, during playback of the video data and audio data recorded in the recorder 45, emphasize and output the audio data in the video at any playback time designated by the user. The user therefore only has to touch the location where the audio is to be emphasized while watching the video data displayed on the screen of the display 63, and the audio data at that designated location, or in a designated range containing it (audio emphasis range), is emphasized and output. In this way, with the audio processing system 5A or 5B of the present embodiment, the user can easily obtain the audio information for the range of interest while viewing the video data captured by the camera 10 on the display 63.

例えば、本実施形態の音声処理システム５Ａ又は５Ｂは、何かしらのアクシデントが発生した場合でも、アクシデントの発生後においても、マイクアレイ２０の各マイクロホン２２の位置からアクシデントの発生地点に向かう方向に指向性を形成した音声データを生成することで、アクシデントの発生時点における会話又は音声をユーザに確認させることができる。 For example, the speech processing system 5A or 5B according to the present embodiment has directivity in the direction from the position of each microphone 22 of the microphone array 20 toward the occurrence point of the accident, even when some accident occurs or after the occurrence of the accident. By generating the voice data that forms, the user can confirm the conversation or voice at the time of occurrence of the accident.

また、本実施形態の音声処理システム５Ａ又は５Ｂは、カメラ１０とマイクアレイ２０とは、屋内のホール等の天井８５に設置されているので、ホール内の至る所を監視することが可能となる。 Further, in the audio processing system 5A or 5B of the present embodiment, since the camera 10 and the microphone array 20 are installed on the ceiling 85 such as an indoor hall, it is possible to monitor everywhere in the hall. .

（第２の実施形態） (Second Embodiment)
第１の実施形態では、カメラが１台である場合の音声処理システム５Ａの使用形態の一例を説明した。第２の実施形態では、カメラが複数台（例えば２台）である場合の音声処理システム５Ｃの使用形態の一例を説明する。 In the first embodiment, an example of the usage pattern of the audio processing system 5A when there is one camera has been described. In the second embodiment, an example of a usage pattern of the voice processing system 5C when there are a plurality of cameras (for example, two cameras) will be described.

なお、第２の実施形態の音声処理システム５Ｃでは、カメラが複数台（例えば２台）であること以外は、第１の実施形態の音声処理システム５Ａ又は５Ｂと同一の構成を有するので、第１の実施形態の音声処理システム５Ａ又は５Ｂと同一の構成要素については同一の符号を用いることで、その説明を省略する。 Note that the voice processing system 5C of the second embodiment has the same configuration as the voice processing system 5A or 5B of the first embodiment, except that there are a plurality of cameras (for example, two). The same components as those of the speech processing system 5A or 5B of the first embodiment are denoted by the same reference numerals, and the description thereof is omitted.

図７は、第２の実施形態の音声処理システム５Ｃの使用形態の一例を示す模式図である。図７（Ａ）は、例えば屋内のホールの天井８５に、２台のカメラ１０，１０Ａと、２台のカメラ１０，１０Ａの中間位置にある１台のマイクアレイ２０と、スピーカ８３とが設置された様子を示す図である。 FIG. 7 is a schematic diagram illustrating an example of a usage pattern of the voice processing system 5C of the second embodiment. FIG. 7A is a diagram showing a state in which, for example, two cameras 10 and 10A, one microphone array 20 located midway between the two cameras 10 and 10A, and a speaker 83 are installed on the ceiling 85 of an indoor hall.

また、ホールの床８７には、４人の人物９１，９２，９３，９４が立っており、人物９１と人物９２とが会話しており、人物９３と人物９４とが会話している。これら２組の間の位置には、スピーカ８２が床８７の上に載置されており、音楽が流れている。また、スピーカ８３は、人物９３と人物９４とのほぼ真上の天井８５に設置されている。 In addition, four persons 91, 92, 93, and 94 are standing on the floor 87 of the hall, the person 91 and the person 92 are talking, and the person 93 and the person 94 are talking. A speaker 82 is placed on the floor 87 at a position between these two sets, and music flows. The speaker 83 is installed on a ceiling 85 almost directly above the person 93 and the person 94.

カメラ１０は、４人の人物９１，９２，９３，９４から少し離れた位置から２人の人物９１，９２を撮像しており、マイクアレイ２０は、スピーカ８２のほぼ真上の天井８５に設置されており、ホール全体の音声を収音している。カメラ１０Ａは、４人の人物９１，９２，９３，９４から少し離れた位置から人物９３，９４を撮像している。 The camera 10 images two persons 91 and 92 from positions slightly away from the four persons 91, 92, 93 and 94, and the microphone array 20 is installed on the ceiling 85 almost directly above the speaker 82. The sound of the whole hall is collected. The camera 10 A images the persons 93 and 94 from positions slightly apart from the four persons 91, 92, 93, and 94.

図７（Ｂ）は、カメラ１０により撮像された映像データがディスプレイ６３に表示され、音声データがスピーカ６５において音声出力されている様子を示す図である。ディスプレイ６３の画面には、カメラ１０が撮像した映像データが表示されている。また、スピーカ６５からは、２人の人物９１，９２の会話又はホール内の音楽が音声出力されている。 FIG. 7B is a diagram illustrating a state in which video data captured by the camera 10 is displayed on the display 63 and audio data is output as audio from the speaker 65. Video data captured by the camera 10 is displayed on the screen of the display 63. In addition, the speaker 65 outputs the conversation between the two persons 91 and 92 or the music in the hall.

ユーザは、例えばディスプレイ６３の画面に表示された２人の人物９１，９２の映像データの中央付近を指９５でタッチしたとする。信号処理部５０は、マイクアレイ２０によって収音された音声、即ち各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ａ又は矩形範囲６３ｂの中心に対応する音声位置に向かう指向方向（図７（Ａ）に示す符号ｅで示される方向）に指向性を形成した音声データを生成する。 For example, it is assumed that the user touches the vicinity of the center of the video data of the two persons 91 and 92 displayed on the screen of the display 63 with the finger 95. The signal processing unit 50 uses the sound picked up by the microphone array 20, that is, the sound data picked up by each microphone 22, from the position of each microphone 22 of the microphone array 20, and the touch point 63a designated by the user or Audio data in which directivity is formed in the directivity direction (direction indicated by the symbol e shown in FIG. 7A) toward the audio position corresponding to the center of the rectangular range 63b is generated.

この結果、ユーザによって指定されたタッチ点６３ａ又は矩形範囲６３ｂにおける音声データが強調され、スピーカ６５から２人の人物９１，９２の会話（例えば図７（Ａ）に示す「Ｈｅｌｌｏ」参照）が大きな音量によって音声出力される。一方、２人の人物９１，９２に比べ、マイクアレイ２０により近い距離に載置されているがユーザによって指定された矩形範囲６３ｂに含まれないスピーカ８２から流れている音楽（図７（Ａ）に示す「♪〜」参照）は強調して音声出力されず、２人の人物９１，９２の会話に比べて小さな音量によって音声出力される。 As a result, the audio data at the touch point 63a or in the rectangular range 63b designated by the user is emphasized, and the conversation between the two persons 91 and 92 (see, for example, "Hello" in FIG. 7A) is output from the speaker 65 at a high volume. In contrast, the music (see "♪〜" in FIG. 7A) from the speaker 82, which is placed closer to the microphone array 20 than the two persons 91 and 92 but is not included in the rectangular range 63b designated by the user, is not emphasized and is output at a lower volume than the conversation between the two persons 91 and 92.

図７（Ｃ）は、カメラ１０Ａにより撮像された映像データがディスプレイ６３に表示され、音声データがスピーカ６５において音声出力されている様子を示す図である。ディスプレイ６３の画面には、カメラ１０Ａが撮像した映像データが表示されている。また、スピーカ６５からは、２人の人物９３，９４の会話又はホール内の音楽が音声出力されている。 FIG. 7C is a diagram illustrating a state in which video data captured by the camera 10 A is displayed on the display 63 and audio data is output as audio from the speaker 65. On the screen of the display 63, video data captured by the camera 10A is displayed. Further, the speaker 65 outputs voice of conversation between two persons 93 and 94 or music in the hall.

ユーザは、例えばディスプレイ６３の画面に表示された２人の人物９３，９４の映像データの中央付近を指９５でタッチしたとする。信号処理部５０は、マイクアレイ２０によって収音された音声、即ち各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ｃ又は矩形範囲６３ｄの中心に対応する音声位置に向かう指向方向（図７（Ａ）に示す符号ｆで示される方向）に指向性を形成した音声データを生成する。 It is assumed that the user touches the vicinity of the center of the video data of the two persons 93 and 94 displayed on the screen of the display 63 with the finger 95, for example. The signal processing unit 50 uses the sound picked up by the microphone array 20, that is, the sound data picked up by each microphone 22, from the position of each microphone 22 of the microphone array 20, and the touch point 63c specified by the user or Audio data in which directivity is formed in the directivity direction (direction indicated by the symbol f shown in FIG. 7A) toward the audio position corresponding to the center of the rectangular range 63d is generated.

即ち、信号処理部５０は、各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ｃ又は矩形範囲６３ｄの中心に対応する音声位置に向かう指向方向の音声（音量レベル）を強調（増幅）した音声データを生成する。再生部６０は、信号処理部５０が生成した音声データを、カメラ１０Ａが撮像した映像データと同期させてスピーカ６５から音声出力させる。 In other words, the signal processing unit 50 uses the audio data collected by each microphone 22 and the audio corresponding to the center of the touch point 63c or the rectangular range 63d designated by the user from the position of each microphone 22 of the microphone array 20. Sound data in which the sound (volume level) in the directivity direction toward the position is emphasized (amplified) is generated. The reproduction unit 60 outputs the audio data generated by the signal processing unit 50 from the speaker 65 in synchronization with the video data captured by the camera 10A.

As a result, the audio data at the touch point 63c or in the rectangular range 63d designated by the user is emphasized, and the conversation between the two persons 93 and 94 (see, for example, “Hi” shown in FIG. 7A) is output from the speaker 65 at a high volume. On the other hand, the music (see “♪〜” shown in FIG. 7A) coming from the speaker 82, which is placed closer to the microphone array 20 than the two persons 93 and 94 but is not included in the rectangular range 63d designated by the user, is not emphasized and is output at a lower volume than the conversation between the two persons 93 and 94.

As described above, in the present embodiment, during reproduction of the video data and audio data recorded in the recorder 45, the audio processing system 5C can emphasize and output the audio data for any reproduction time designated in the video data of whichever camera 10 or 10A the user has designated. The user therefore only has to touch the location whose sound (volume level) is to be emphasized (amplified) while viewing the video data captured by the camera 10 or 10A on the display 63, and the audio data at that designated location, or in a designated range including it, is emphasized and output. In this way, with the audio processing system 5C of the present embodiment, the user can easily obtain the audio information of the range he or she needs while viewing the video data captured by the camera 10 or 10A on the display 63.

Moreover, compared with the first embodiment, the audio processing system 5C of the present embodiment may include a plurality of cameras, yet the number of microphone arrays need not be increased to match the number of cameras. The audio processing system 5C can therefore be built at reduced cost and occupies less space. In addition, simply adding the second camera 10A to an audio processing system 5A or 5B in which the first camera 10 is already installed yields operations and effects similar to those of the audio processing system 5A or 5B of the first embodiment, which improves the expandability of the audio processing system.

(Third embodiment)
In the first and second embodiments, examples of usage were described for the audio processing system 5A or 5B in which the camera and the microphone array are installed at different places on the ceiling. The third embodiment describes an example of usage of an audio processing system 5D in which an omnidirectional camera and a microphone array are installed coaxially as a single unit.

The audio processing system 5D of the third embodiment has the same configuration as the audio processing system 5A or 5B of the first embodiment, except that the omnidirectional camera and the microphone array are installed coaxially as a single unit. Components identical to those of the audio processing system 5A or 5B of the first embodiment are therefore denoted by the same reference numerals, and their description is omitted.

FIG. 9 is a schematic diagram illustrating an example of usage of the audio processing system 5D. FIG. 9A shows a state in which, for example, a donut-shaped microphone array 20C, an omnidirectional camera 10E integrated into the microphone array 20C, and a speaker 83 are installed on the ceiling 85 of an indoor hall. In FIG. 9A, the conversation situations of the persons 91, 92, 93, and 94 and the operating states of the speakers 82 and 83 are assumed to be the same as in the second embodiment.

FIG. 9B is a diagram showing the two persons 91 and 92 being selected in the video data captured by the omnidirectional camera 10E. In FIG. 9B, the screen of the display 63 shows video data in the coordinate system of the omnidirectional camera 10E, that is, the video data captured by the omnidirectional camera 10E as it is. FIG. 9C is a diagram showing the video data of the two persons 91 and 92 after image conversion being displayed on the display while the audio data of their conversation is output from the speaker 65.

Suppose that the user touches, with the finger 95, a designated location near the upper left of the video data of the four persons 91, 92, 93, and 94 displayed on the screen of the display 63. In addition to the same operation as in the second embodiment, the signal processing unit 50 converts the coordinate system of the video data in the range denoted by the symbol g, which includes the location designated by the user, out of the wide-range video data captured by the omnidirectional camera 10E. The reproduction unit 60 displays the video data whose coordinate system has been converted by the signal processing unit 50 on the display 63 (see FIG. 9C). The range g is assumed to be generated automatically from the touch point of the finger 95. Description of the operations of the signal processing unit 50 that are the same as in the second embodiment is omitted.

As a result, the audio data in the range g designated by the user is emphasized, and the conversation between the two persons 91 and 92 (see, for example, “Hello” shown in FIG. 9A) is output from the speaker 65 at a high volume. On the other hand, the music (see “♪〜” shown in FIG. 9A) coming from the speaker 82, which is placed closer to the microphone array 20C than the two persons 91 and 92 but is not included in the location designated by the user or in the designated range g including that location, is not emphasized and is output at a lower volume than the conversation between the two persons 91 and 92.

FIG. 9D is a diagram showing the two persons 93 and 94 being selected in the video data captured by the omnidirectional camera 10E. In FIG. 9D, the screen of the display 63 shows video data in the coordinate system of the omnidirectional camera 10E, that is, the video data captured by the omnidirectional camera 10E as it is. FIG. 9E is a diagram showing the video data of the two persons 93 and 94 after image conversion being displayed on the display while the audio data of their conversation is output from the speaker 65.

Suppose that the user touches, with the finger 95, a designated location near the lower right of the video data of the four persons 91, 92, 93, and 94 displayed on the screen of the display 63. In addition to the same operation as in the second embodiment, the signal processing unit 50 converts the coordinate system of the video data in the range denoted by the symbol h, which includes the location designated by the user, out of the wide-range video data captured by the omnidirectional camera 10E. The reproduction unit 60 displays the video data whose coordinate system has been converted by the signal processing unit 50 on the display 63 (see FIG. 9E). The range h is assumed to be generated automatically from the touch point of the finger 95. Description of the operations of the signal processing unit 50 that are the same as in the second embodiment is omitted.

As a result, the audio data in the range h designated by the user is emphasized, and the conversation between the two persons 93 and 94 (see, for example, “Hi” shown in FIG. 9A) is output from the speaker 65 at a high volume. On the other hand, the music (see “♪〜” shown in FIG. 9A) coming from the speaker 82, which is placed closer to the microphone array 20C than the two persons 93 and 94 but is not included in the location designated by the user or in the designated range h including that location, is not emphasized and is output at a lower volume than the conversation between the two persons 93 and 94.

As described above, in the present embodiment, the omnidirectional camera 10E and the microphone array 20C of the audio processing system 5D are arranged coaxially, so the omnidirectional camera 10E and the microphone array 20C can share the same coordinate system. Consequently, in addition to the effects of the first and second embodiments, the audio processing system 5D makes the coordinate conversion that associates the position of a subject in the video data captured by the omnidirectional camera 10E with the direction of the voice of that subject picked up by the microphone array 20C easier than in the first and second embodiments, and reduces the load of the reproduction processing in which the reproduction unit 60 synchronizes the video data and the audio data.
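
Because the camera and the microphone array share one axis, a touched pixel in the omnidirectional image can be turned into a sound direction without any offset between two coordinate origins. The following is a minimal sketch of such a mapping. It assumes an equidistant fisheye projection (image radius proportional to the angle from the optical axis) and an optical axis pointing straight down from the ceiling; this passage does not specify the lens model, and `pixel_to_direction` and its parameters are hypothetical names.

```python
import math


def pixel_to_direction(x, y, cx, cy, radius_px, max_zenith_deg=90.0):
    """Map a touched pixel to (azimuth_deg, zenith_deg) around the shared axis.

    (cx, cy) is the image centre on the optical axis and `radius_px` is the
    radius of the image circle, which is assumed to correspond to
    `max_zenith_deg` under an equidistant projection.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r > radius_px:
        raise ValueError("touch point lies outside the image circle")
    azimuth = math.degrees(math.atan2(dy, dx)) % 360.0
    zenith = (r / radius_px) * max_zenith_deg  # assumed equidistant model
    return azimuth, zenith
```

In a layout where the camera and the microphone array are mounted apart, the same touch point would additionally need a translation between the two origins, which is the extra step the coaxial arrangement avoids.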

In addition, the audio processing system 5D converts the video data included in the location designated by the user, or in the designated range g or h including that location, into video data that matches the screen size of the display 63, so the video data captured by the omnidirectional camera 10E can be displayed with an aspect ratio that looks natural on the display 63.

また、例えばマイクアレイの形状及び構成は、上述した各実施形態のものに限られず、種々の形状及び構成を用いても良い。図１０（Ａ）〜（Ｃ）は、他のマイクアレイ２０Ｄ、２０Ｅ、２０Ｆの外観図である。 For example, the shape and configuration of the microphone array are not limited to those of the above-described embodiments, and various shapes and configurations may be used. FIGS. 10A to 10C are external views of other microphone arrays 20D, 20E, and 20F.

図１０（Ａ）に示すマイクアレイ２０Ｄでは、図２に示すマイクアレイ２０に比べ、円盤状の筐体２１Ｄの径が小さい。筐体２１Ｄの面に、複数のマイクロホン２２Ｄが円状に沿って一様に配置されている。各々のマイクロホン２２Ｄの間隔が短くなるので、マイクアレイ２０Ｄは、高い音域に適した特性を有する。 In the microphone array 20D shown in FIG. 10A, the diameter of the disk-shaped housing 21D is smaller than that of the microphone array 20 shown in FIG. A plurality of microphones 22D are uniformly arranged along a circle on the surface of the housing 21D. Since the interval between the microphones 22D is shortened, the microphone array 20D has characteristics suitable for a high sound range.
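
The link between shorter microphone spacing and higher frequencies matches the common half-wavelength rule of thumb for spatial sampling, under which a spacing d avoids spatial aliasing up to roughly c / (2d). The sketch below applies that rule to microphones spaced evenly on a circle. The radii, microphone count, and function names are illustrative assumptions, not values from the patent, and a real circular array calls for a more detailed analysis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def adjacent_spacing(radius_m, num_mics):
    """Chord length between neighbouring microphones spaced evenly on a circle."""
    return 2.0 * radius_m * math.sin(math.pi / num_mics)


def alias_free_limit_hz(spacing_m):
    """Half-wavelength rule of thumb: highest frequency without spatial aliasing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)
```

Halving the radius while keeping the microphone count halves the spacing and so doubles this frequency limit.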

また、図１０（Ｂ）に示すマイクアレイ２０Ｅでは、矩形を有する筐体２１Ｅの面に、複数のマイクロホン２２Ｅが矩形に沿って一様に配置されている。筐体２１Ｅが矩形に形成されているので、コーナー等の場所であってもマイクアレイ２０Ｅを設置し易くなる。 In the microphone array 20E shown in FIG. 10B, a plurality of microphones 22E are uniformly arranged along the rectangle on the surface of the rectangular casing 21E. Since the casing 21E is formed in a rectangular shape, the microphone array 20E can be easily installed even at a corner or the like.

また、図１０（Ｃ）に示すマイクアレイ２０Ｆでは、円盤状の筐体２１Ｆの面に、複数のマイクロホン２２Ｆが縦横に一様に配列されている。複数のマイクロホン２２Ｆが直線状に配置されているので、信号処理部５０における音声の強調処理が簡易化できる。なお、縦方向又は横方向の１列だけに、複数のマイクロホン２２Ｆが配置されても良い。 In the microphone array 20F shown in FIG. 10C, a plurality of microphones 22F are uniformly arranged vertically and horizontally on the surface of the disk-shaped casing 21F. Since the plurality of microphones 22F are arranged in a straight line, the sound enhancement processing in the signal processing unit 50 can be simplified. A plurality of microphones 22F may be arranged in only one column in the vertical direction or the horizontal direction.

また、上述した各実施形態では、ユーザがディスプレイ６３に表示されている映像データを見ながら音声の強調を所望する指定箇所又はその指定箇所を含む指定範囲を任意に指９５でタッチにより指定したが、例えば予めディスプレイ６３の画面を複数の区画（例えば、上下左右の４区画）に分割しておき、いずれか１つの区画を選択して音声を強調したい範囲としても良い。 Further, in each of the above-described embodiments, the user designates a designated portion where the user wants to emphasize audio or a designated range including the designated portion by touching with the finger 95 while viewing the video data displayed on the display 63. For example, the screen of the display 63 may be divided in advance into a plurality of sections (for example, four sections on the top, bottom, left, and right), and any one section may be selected as a range in which the voice is to be emphasized.
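
Selecting one of several predefined screen sections reduces to a grid lookup. The sketch below assumes a 2 x 2 grid for the four-section example and pixel coordinates with the origin at the top-left corner of the screen; `section_for_touch` is a hypothetical name.

```python
def section_for_touch(x, y, width, height, cols=2, rows=2):
    """Return ((row, col), (left, top, section_width, section_height)).

    The rectangle of the section containing the touch can then serve as the
    range whose sound is to be emphasized.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("touch point lies outside the screen")
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    section_w = width / cols
    section_h = height / rows
    return (row, col), (col * section_w, row * section_h, section_w, section_h)
```

The centre of the returned rectangle could then be used in the same way as a touch point when forming directivity.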

In each of the above-described embodiments, the camera records video and the display shows the recorded video data. However, the present invention is also applicable when the camera captures still images at a predetermined cycle and the display shows the still images captured at predetermined intervals, that is, when video is captured and sound is picked up in real time. In that case, the user can designate a predetermined range in the still image displayed on the screen of the display and have the sound in its vicinity emphasized.

In each of the above-described embodiments, the user touches the screen with the finger 95 to designate a range (for example, an elliptical or rectangular range) including the touch point. Alternatively, the user may designate the predetermined range by drawing a circle, a polygon, or the like with the finger 95.

また、上述した各実施形態では、信号処理部５０は、複数の指定箇所又は各々の指定箇所を含む指定範囲（音声強調範囲）の指定を、操作部５５から受け付けても良い。この場合では、信号処理部５０は、指定された各指定箇所又は指定範囲に応じて、音声データの強調処理を行う。図１１は、所定の指定箇所又は指定範囲（音声強調範囲）が複数指定された場合のディスプレイ６３及びスピーカ６５の動作を示す模式図である。なお、説明を簡単にするために、音声処理システムが用いられたカメラ及びマイクアレイの動作状況は図６に示すカメラ１０及びマイクアレイ２０の動作状況と同様とする。 In each of the above-described embodiments, the signal processing unit 50 may receive a designation of a plurality of designated locations or a designated range (speech enhancement range) including each designated location from the operation unit 55. In this case, the signal processing unit 50 performs audio data enhancement processing according to each designated location or designated range. FIG. 11 is a schematic diagram illustrating operations of the display 63 and the speaker 65 when a plurality of predetermined designated locations or designated ranges (speech enhancement ranges) are designated. In order to simplify the description, the operation status of the camera and the microphone array in which the voice processing system is used is the same as the operation status of the camera 10 and the microphone array 20 shown in FIG.

In this case, in response to the designation of two different designated locations, or of the speech enhancement ranges 63e and 63f that include them, the signal processing unit 50 generates, for output from the speaker 65, audio data in which directivity is formed in the directivity direction from the position of each microphone 22 of the microphone array 20 toward the sound position corresponding to the center of the two persons 91 and 92, and further generates audio data in which directivity is formed in the direction from the position of each microphone 22 of the microphone array 20 toward the sound position corresponding to the center of the speaker 82.

As a result, both the conversation between the two persons 91 and 92 (see “Hello” shown in FIG. 11) and the music coming from the speaker 82 (see “♪〜” shown in FIG. 11) are output at a high volume. The audio processing system can thus emphasize the sound at two or more locations on a single display.
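
One way to picture the output for several designated ranges is as a mix of the audio data enhanced for each range. This passage does not state how the signal processing unit 50 combines them, so the sketch below is only an assumption: it sums the enhanced signals sample by sample and scales the result down only if the peak would exceed full scale. `mix_enhanced` is a hypothetical name.

```python
def mix_enhanced(signals, full_scale=1.0):
    """Sum equally long enhanced signals, scaling down only to avoid clipping."""
    mixed = [sum(samples) for samples in zip(*signals)]
    peak = max((abs(v) for v in mixed), default=0.0)
    if peak > full_scale:
        mixed = [v * full_scale / peak for v in mixed]
    return mixed
```

Each input here would be the audio data formed with directivity toward one designated range, for example one signal for the conversation and one for the music.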

Next, an example of the housing structure and the circuit configuration of the microphone array 20 in each of the above-described embodiments will be described with reference to FIGS. 12 to 30.

(Microphone array housing: four-layer housing structure)
FIG. 12 is an exploded perspective view of the housing structure of the microphone array 20 of each embodiment described above. FIG. 13A is a plan view of the housing structure of the microphone array 20 shown in FIG. 12. FIG. 13B is a cross-sectional view taken along line A-A in FIG. 13A. FIG. 14 is an enlarged view of the main part within the dotted-line range in FIG. 13B.

図１２に示すマイクアレイ２０の筐体構造は、メイン筐体１０１と、パンチングメタルカバー１０３と、マイク板金１０５と、ベース板金１０７とが鉛直方向に沿って積層された構成である。メイン筐体１０１、パンチングメタルカバー１０３、マイク板金１０５、ベース板金１０７は、４層となった耐衝撃性筐体１０９（バンダル・レジスタント・ケーシング：vandal-resistant casing）を構成している。 The housing structure of the microphone array 20 shown in FIG. 12 has a configuration in which a main housing 101, a punching metal cover 103, a microphone sheet metal 105, and a base sheet metal 107 are stacked in the vertical direction. The main casing 101, the punching metal cover 103, the microphone sheet metal 105, and the base sheet metal 107 constitute an impact resistant casing 109 (vandal-resistant casing) having four layers.

メイン筐体１０１は、例えば樹脂を材料として一体に成形される。メイン筐体１０１は、環状底部１１１に複数のマイク敷設用穴１１３が同心円上に設けられて有底筒状に形成される。環状底部１１１の中央部は、カメラ取付空間１１５となる。メイン筐体１０１は、メイン筐体外周壁１１７が、図１２に示すマイクアレイ２０の筐体構造において、最大外径を有する。 The main casing 101 is integrally formed using, for example, a resin as a material. The main casing 101 is formed in a bottomed cylindrical shape in which a plurality of microphone laying holes 113 are provided concentrically on the annular bottom 111. A central portion of the annular bottom portion 111 becomes a camera mounting space 115. The main casing 101 has a main casing outer peripheral wall 117 having a maximum outer diameter in the casing structure of the microphone array 20 shown in FIG.

パンチングメタルカバー１０３は、例えば金属を材料として一体の環状に成形される。パンチングメタルカバー１０３は、メイン筐体１０１の環状底部１１１を覆うようにメイン筐体１０１に取り付けられる。パンチングメタルカバー１０３には、音波を入射させるための多数の貫通孔（図示略）が穿設されている。パンチングメタルカバー１０３の外周にはメイン筐体１０１に向かって立ち上がる起立縁部１１９が絞り加工等によって形成される。起立縁部１１９は、メイン筐体１０１の下面外周に形成される周溝１２１（図１４参照）に挿入される。起立縁部１１９には、円周方向の等間隔で複数の弾性係止爪１２３が更に上方（図１２又は図１４の上方）に向かって突出している。 The punching metal cover 103 is formed into an integral annular shape using, for example, a metal as a material. The punching metal cover 103 is attached to the main casing 101 so as to cover the annular bottom portion 111 of the main casing 101. The punching metal cover 103 has a large number of through holes (not shown) through which sound waves are incident. On the outer periphery of the punching metal cover 103, a rising edge portion 119 rising toward the main casing 101 is formed by drawing or the like. The standing edge 119 is inserted into a circumferential groove 121 (see FIG. 14) formed on the outer periphery of the lower surface of the main housing 101. A plurality of elastic locking claws 123 protrude further upward (upward in FIG. 12 or FIG. 14) at equal intervals in the circumferential direction on the standing edge 119.

図１５（Ａ）は、パンチングメタルカバー１０３をメイン筐体１０１に固定する様子を示す斜視図である。図１５（Ｂ）は、パンチングメタルカバー１０３をメイン筐体１０１に固定する様子を示す断面図である。弾性係止爪１２３は、周溝１２１の奥側に設けられている係止孔１２５ａを通して回転することで、爪係止部１２５に係止される。パンチングメタルカバー１０３は、弾性係止爪１２３を爪係止部１２５に係止することで、メイン筐体１０１に固定される。 FIG. 15A is a perspective view showing how the punching metal cover 103 is fixed to the main casing 101. FIG. 15B is a cross-sectional view showing a state in which the punching metal cover 103 is fixed to the main housing 101. The elastic locking claw 123 is locked to the claw locking portion 125 by rotating through a locking hole 125 a provided on the back side of the circumferential groove 121. The punching metal cover 103 is fixed to the main casing 101 by locking the elastic locking claws 123 to the claw locking portions 125.

マイク板金１０５は、例えば金属板をプレス加工することにより形成される。マイク板金１０５は、円環形状を周方向に四等分した形状で形成される。マイク板金１０５は、マイク板金固定ネジ（図示略）によってメイン筐体１０１に固定される。メイン筐体１０１に固定されたマイク板金１０５は、メイン筐体１０１の環状底部１１１との間に、マイク基板１２７を保持したマイク筐体１２９を挟んだ状態で保持する。 The microphone sheet metal 105 is formed, for example, by pressing a metal plate. The microphone sheet metal 105 is formed in a shape obtained by dividing an annular shape into four equal parts in the circumferential direction. The microphone sheet metal 105 is fixed to the main housing 101 by a microphone sheet metal fixing screw (not shown). The microphone sheet metal 105 fixed to the main casing 101 holds the microphone casing 129 holding the microphone substrate 127 between the annular bottom 111 of the main casing 101.

The microphone casing 129 is integrally molded from, for example, resin. The microphone casing 129 has the shape of an annulus divided into four equal parts in the circumferential direction. Four high-quality small electret condenser microphones (ECM: Electret Condenser Microphone) are mounted on the same surface of the microphone substrate 127. The microphone substrate 127 is attached to the microphone casing 129 with the ECMs 131 facing downward in FIG. 14. A rubber component is sandwiched between the microphone substrate 127 and the microphone casing 129 (see FIG. 14). One microphone substrate 127 is attached to each microphone casing 129. Accordingly, the housing structure of the microphone array 20 as a whole carries four microphone substrates 127 and a total of 16 ECMs 131.
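
The count of four substrates with four ECMs each can be turned into a set of microphone coordinates for use in array processing. The sketch below assumes that the 16 ECMs sit at equal angular steps on a single circle; this passage only states that the microphone laying holes lie on a concentric circle, so the even spacing, the radius value, and the name `ecm_positions` are assumptions for illustration.

```python
import math


def ecm_positions(radius_m, boards=4, ecms_per_board=4):
    """(x, y, z) positions of all ECMs, assumed evenly spaced on one circle."""
    total = boards * ecms_per_board
    step = 2.0 * math.pi / total
    return [(radius_m * math.cos(k * step), radius_m * math.sin(k * step), 0.0)
            for k in range(total)]
```

Consecutive groups of four entries in the returned list correspond to one quarter-annulus substrate each.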

Accordingly, in the housing structure of the microphone array 20 shown in FIG. 12, the punching metal cover 103, the main casing 101, the microphone casing 129, the microphone sheet metal 105, and the base sheet metal 107 are arranged in this order from the outside of the bottom toward the top in FIG. 12. These members form a structure that resists an external force (impact force) applied to the microphone array 20 from below in FIG. 12. For example, because the main casing 101 and the microphone casing 129 are separate bodies rather than a single integrated body, an external force (impact force) from below in FIG. 12 is dispersed, and the base sheet metal 107 can prevent deformation of the main casing 101 and the microphone casing 129. As a result, the microphone array 20 keeps its sound-pickup shape even after an external force is applied, and deterioration of its acoustic characteristics during sound pickup is prevented.

ベース板金１０７は、例えば金属の材料をプレス加工（絞り加工）することにより一体に成形される。ベース板金１０７は、環状天板部１３３を有して有底筒状に形成される。即ち、環状底部１１１の外周からはベース板金外周壁１３５が下側に曲げられている。このベース板金外周壁１３５は、大径の環状天板部１３３の素板を絞り加工することにより得られる。ベース板金外周壁１３５が絞り加工されたベース板金１０７は、他の構成部材よりも高い強度を有している。 The base sheet metal 107 is integrally formed by, for example, pressing (drawing) a metal material. The base metal plate 107 has an annular top plate portion 133 and is formed in a bottomed cylindrical shape. That is, the base sheet metal outer peripheral wall 135 is bent downward from the outer periphery of the annular bottom portion 111. The base sheet metal outer peripheral wall 135 is obtained by drawing a base plate of the large-diameter annular top plate portion 133. The base sheet metal 107 obtained by drawing the outer peripheral wall 135 of the base sheet metal has higher strength than other components.

The base sheet metal 107 is fixed to the main casing 101 with base sheet metal fixing screws (not shown). Between the base sheet metal 107 and the microphone sheet metal 105 are arranged a main board 139, on which, for example, components for controlling the processing of the microphone array 20 are mounted, and a power supply board 141, on which, for example, components for supplying power to each part of the microphone array 20 are mounted. One main board 139 and one power supply board 141 are provided in the entire housing structure of the microphone array 20 shown in FIG. 12.

マイク板金１０５からは、複数の嵌合部１４３が円周方向に等間隔で起立している。嵌合部１４３は、半径方向に離間する一対の挟持片（外側挟持片１４５、内側挟持片１４７）からなる。嵌合部１４３は、メイン筐体外周壁１１７の内側で間隙１４９を有して配置される。嵌合部１４３には、ベース板金外周壁１３５が嵌合される。つまり、図１２に示すマイクアレイ２０の筐体構造では、側部の外側から、メイン筐体外周壁１１７、間隙１４９、外側挟持片１４５、ベース板金外周壁１３５、内側挟持片１４７が順に、半径方向内側に向かって配置されている。これらの重ねられた複数の部材は、マイクアレイ２０の側部からの外力（衝撃力）に対抗する構造体を構成している。 From the microphone sheet metal 105, a plurality of fitting portions 143 stand up at equal intervals in the circumferential direction. The fitting portion 143 includes a pair of holding pieces (an outer holding piece 145 and an inner holding piece 147) that are spaced apart in the radial direction. The fitting portion 143 is disposed with a gap 149 inside the outer peripheral wall 117 of the main housing. A base sheet metal outer peripheral wall 135 is fitted to the fitting portion 143. That is, in the case structure of the microphone array 20 shown in FIG. 12, the main case outer peripheral wall 117, the gap 149, the outer holding piece 145, the base sheet metal outer peripheral wall 135, and the inner holding piece 147 are arranged in the radial direction in this order from the outer side. It is arranged toward the inside. A plurality of these stacked members constitute a structure that resists external force (impact force) from the side of the microphone array 20.

また、マイク板金１０５からは、起立して突出した当り止め部１３７があり、通常はベース板金１０７とは離れた位置にあるが、外力が加わってメイン筐体１０１が変形した場合、当り止め部１３７がベース板金１０７に当り、メイン筐体１０１に大きなひずみが生じないように働く。 Further, the microphone sheet metal 105 has a stopper portion 137 that stands up and protrudes, and is usually located away from the base sheet metal 107, but when the main casing 101 is deformed by an external force, the stopper portion 137 hits the base metal plate 107 and works so that a large distortion does not occur in the main casing 101.

(Direct mounting structure of ECM)
FIG. 16 is a schematic diagram of the ECM mounting structure. In the housing structure of the microphone array 20 shown in FIG. 12, the microphone substrate 127 is disposed below the microphone sheet metal 105, and the main board 139 and the power supply board 141 are disposed above the microphone sheet metal 105. That is, the microphone substrate 127 and the main board 139 and power supply board 141 are arranged in a two-story structure. Here, suppose that the four microphone substrates 127 are arranged in one direction around the circumference in the order of the first, second, third, and fourth microphone substrates 127. In this case, the main board 139 is connected to the first microphone substrate 127 and the fourth microphone substrate 127 by the power supply wiring 151. The first microphone substrate 127 is connected to the second microphone substrate 127. The fourth microphone substrate 127 is connected to the third microphone substrate 127.

マイク基板１２７の下面側には、ＥＣＭ１３１が取り付けられる。ＥＣＭ１３１には、一対のピン端子１５３が突出される。ＥＣＭ１３１は、それぞれのピン端子１５３が、マイク基板１２７の所定の回路に設けられた端子ピン挿入孔（図示略）に挿入され、例えば半田によって直接に接続固定される。これにより、マイク基板１２７に対するＥＣＭ１３１の薄厚化（低背化）を実現している。また、ＥＣＭ１３１のマイク基板１２７への直付けにより材料費を安価としている。 An ECM 131 is attached to the lower surface side of the microphone substrate 127. A pair of pin terminals 153 protrudes from the ECM 131. Each pin terminal 153 of the ECM 131 is inserted into a terminal pin insertion hole (not shown) provided in a predetermined circuit of the microphone substrate 127, and is directly connected and fixed by, for example, solder. Thereby, the thickness (low profile) of the ECM 131 with respect to the microphone substrate 127 is realized. In addition, the material cost is reduced by directly attaching the ECM 131 to the microphone substrate 127.

（ＡＤＣコンバータ配置）
図１７は、マイク基板１２７の平面図である。図１７に示す１つのマイク基板１２７には、４つのＥＣＭ１３１が取り付けられている。マイク基板１２７の回路（マイク基板回路）では、それぞれのＥＣＭ１３１に接続される線路長の差は音波信号における位相差を生じさせ、結果的に、この位相差が指向角のズレとなってくる。このため、それぞれのＥＣＭ１３１に接続される線路長は、できるだけ等しくする必要がある。 (ADC converter arrangement)
FIG. 17 is a plan view of the microphone substrate 127. Four ECMs 131 are attached to the one microphone substrate 127 shown in FIG. 17. In the circuit of the microphone substrate 127 (microphone substrate circuit), a difference in the lengths of the lines connected to the respective ECMs 131 causes a phase difference in the sound wave signals, and this phase difference in turn results in a deviation of the directivity angle. For this reason, the lengths of the lines connected to the respective ECMs 131 need to be as equal as possible.

そこで、マイク基板１２７では、２つのＥＣＭ１３１と１つのＡＤコンバータ１５５との組合せによりマイク基板回路が構成されている。マイク基板回路は、１つのＡＤコンバータ１５５が２つのＥＣＭ１３１の間に、それぞれのＥＣＭ１３１から等距離で配置されることで、ＡＤコンバータ１５５とＥＣＭ１３１との間のアナログ線路１５７を増幅回路を経由して最短でかつ同じ線路長となるように配線している。これにより、マイク基板回路は、マイク基板１２７におけるノイズ信号のレベルを各ＥＣＭにおいて均等にでき、かつ指向角のズレを低減できる。 Therefore, in the microphone substrate 127, a microphone substrate circuit is configured by a combination of two ECMs 131 and one AD converter 155. In the microphone substrate circuit, one AD converter 155 is arranged between two ECMs 131 at an equal distance from each ECM 131, so that the analog lines 157 between the AD converter 155 and the ECMs 131, which pass through an amplifier circuit, are wired to be as short as possible and of equal length. As a result, the microphone substrate circuit can equalize the level of the noise signal at each ECM on the microphone substrate 127 and can reduce the deviation of the directivity angle.
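To make the link between a channel mismatch, the resulting phase difference, and the pointing error concrete, the following is a minimal sketch. It is not taken from the disclosure: the two-microphone far-field model, the function names, the speed of sound, and the numeric values are all assumptions chosen for illustration, and the mismatch is treated as an abstract time skew between two channels regardless of how it arises.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roomtemperature


def phase_difference_deg(freq_hz, skew_s):
    """Phase offset (degrees) that a time skew produces at a given frequency."""
    return 360.0 * freq_hz * skew_s


def apparent_angle_error_deg(skew_s, mic_spacing_m):
    """Apparent arrival-angle shift for a two-microphone pair (far-field model).

    A plane wave from angle theta reaches the two microphones with a delay of
    d * sin(theta) / c, so an extra skew is indistinguishable from a source
    moved off broadside by asin(c * skew / d).
    """
    ratio = SPEED_OF_SOUND_M_S * skew_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp to the valid asin domain
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    skew = 10e-6     # hypothetical 10 microsecond skew between channels
    spacing = 0.05   # hypothetical 5 cm microphone spacing
    print(phase_difference_deg(1000.0, skew))
    print(apparent_angle_error_deg(skew, spacing))
```

Under these assumptions the phase error grows linearly with frequency for a fixed skew, which is one way to see why the text asks for matched line lengths.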

（マイク基板回路）
図１８（Ａ）は、複数のマイク回路１５９に対して１つのリップル除去回路１６１が設けられるマイク基板回路の図を示す。図１８（Ｂ）は、複数のマイク回路１５９のそれぞれにリップル除去回路１６１が設けられるマイク基板回路の図である。 (Microphone board circuit)
FIG. 18A shows a diagram of a microphone substrate circuit in which one ripple removing circuit 161 is provided for a plurality of microphone circuits 159. FIG. 18B is a diagram of a microphone substrate circuit in which a ripple removal circuit 161 is provided in each of the plurality of microphone circuits 159.

マイク基板１２７のマイク基板回路には、ＥＣＭが配置されたマイク回路１５９と電源基板１４１との間に、リップル除去回路１６１が設けられる。リップル除去回路１６１は、直流信号は通過させるが、特定周波数の交流信号をカットするフィルタである。リップル除去回路１６１は、図１８（Ａ）に示すように、並列接続した４つのマイク回路１５９と電源基板１４１の間に、１つ設けることができる。この場合、マイクアレイ２０の製造コストの低減が可能となる。 In the microphone substrate circuit of the microphone substrate 127, a ripple removing circuit 161 is provided between the microphone circuit 159 where the ECM is arranged and the power supply substrate 141. The ripple removal circuit 161 is a filter that passes a DC signal but cuts an AC signal having a specific frequency. As shown in FIG. 18A, one ripple removing circuit 161 can be provided between the four microphone circuits 159 connected in parallel and the power supply substrate 141. In this case, the manufacturing cost of the microphone array 20 can be reduced.

一方、リップル除去回路１６１は、図１８（Ｂ）に示すように、４つそれぞれのマイク回路１５９と電源基板１４１の間に設けてもよい。この場合、異なるＥＣＭ間の信号流入が低減され、所謂クロストーク１６３の抑制が可能となる。 On the other hand, as shown in FIG. 18B, a ripple removal circuit 161 may be provided between each of the four microphone circuits 159 and the power supply board 141. In this case, signal inflow between different ECMs is reduced, and so-called crosstalk 163 can be suppressed.
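The text says only that the ripple removal circuit 161 passes DC and cuts AC of a specific frequency; it does not give the circuit topology. As a rough illustration of that behavior, the sketch below assumes a first-order RC low-pass filter, which is a stand-in and not the disclosed circuit. The function names and component values are hypothetical.

```python
import math


def rc_lowpass_gain(freq_hz, resistance_ohm, capacitance_f):
    """Magnitude response |H(f)| of a first-order RC low-pass filter."""
    omega_rc = 2.0 * math.pi * freq_hz * resistance_ohm * capacitance_f
    return 1.0 / math.sqrt(1.0 + omega_rc ** 2)


def cutoff_frequency_hz(resistance_ohm, capacitance_f):
    """Frequency at which the gain has dropped to 1/sqrt(2) of its DC value."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


if __name__ == "__main__":
    r, c = 100.0, 100e-6  # hypothetical 100 ohm and 100 microfarad
    print(cutoff_frequency_hz(r, c))
    for f in (0.0, 50.0, 1000.0):
        print(f, rc_lowpass_gain(f, r, c))
```

In this model DC passes with unity gain and attenuation increases with frequency. Whether one such filter is shared by four microphone circuits or one is placed per circuit is the cost versus crosstalk trade-off the text describes.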

（マイクアレイとカメラとの間の構造的な隙間対策）
図１９（Ａ）は、カメラアダプタが取り付けられずに全方位カメラが取り付けられたマイクアレイ２０の筐体構造の斜視図である。図１９（Ｂ）は、屋外用全方位カメラ１６５がカメラアダプタと共に取り付けられたマイクアレイ２０の筐体構造の斜視図である。図２０は、屋内用全方位カメラ１６７が取り付けられるマイクアレイ２０の筐体構造の分解斜視図である。図２１は、屋外用全方位カメラ１６５が取り付けられるマイクアレイ２０の筐体構造の分解斜視図である。図２２（Ａ）は、屋外用全方位カメラ１６５が取り付けられたマイクアレイ２０の筐体構造の側面図である。図２２（Ｂ）は、図２２（Ａ）のＢ−Ｂ断面図である。図２３は、図２２の要部拡大図である。 (Measures for structural gaps between the microphone array and the camera)
FIG. 19A is a perspective view of the housing structure of the microphone array 20 to which an omnidirectional camera is attached without a camera adapter. FIG. 19B is a perspective view of the housing structure of the microphone array 20 to which the outdoor omnidirectional camera 165 is attached together with a camera adapter. FIG. 20 is an exploded perspective view of the housing structure of the microphone array 20 to which the indoor omnidirectional camera 167 is attached. FIG. 21 is an exploded perspective view of the housing structure of the microphone array 20 to which the outdoor omnidirectional camera 165 is attached. FIG. 22A is a side view of the housing structure of the microphone array 20 to which the outdoor omnidirectional camera 165 is attached. FIG. 22B is a cross-sectional view taken along line B-B in FIG. 22A. FIG. 23 is an enlarged view of a main part of FIG. 22.

マイクアレイ２０の筐体構造において、中央部のカメラ取付空間１１５に、例えば全方位カメラを組み込むことかできる。全方位カメラには、屋外用全方位カメラ１６５と、屋内用全方位カメラ１６７とがある。図１９（Ａ）に示すように、マイクアレイ２０の筐体構造として、例えば屋内用全方位カメラ１６７がカメラ取付空間１１５に取り付けられると、マイクアレイ２０のメイン筐体１０１と屋内用全方位カメラ１６７との間に隙間１６９が生じ、マイクアレイ２０の内部が見えてしまう。内部が見える状態は、製品としての見栄えの悪化やごみなどの進入だけでなく、マイクアレイ２０の内部空間に音が侵入して、共鳴や反射などを起こし、音響的な性能の劣化の原因となってしまう。 In the housing structure of the microphone array 20, for example, an omnidirectional camera can be incorporated in the camera mounting space 115 in the center. Omnidirectional cameras include an outdoor omnidirectional camera 165 and an indoor omnidirectional camera 167. As shown in FIG. 19A, when, for example, the indoor omnidirectional camera 167 is attached to the camera mounting space 115 in the housing structure of the microphone array 20, a gap 169 is generated between the main housing 101 of the microphone array 20 and the indoor omnidirectional camera 167, and the inside of the microphone array 20 becomes visible. A visible interior not only degrades the appearance of the product and admits dust and the like, but also lets sound enter the internal space of the microphone array 20, where it causes resonance and reflection and degrades acoustic performance.

また、全方位カメラには用途や機能によって様々なサイズがある。それぞれの全方位カメラ用に、サイズの異なるメイン筐体１０１を準備することは、製造上のコストアップが避けられない。メイン筐体１０１をひとつのサイズに固定して、全方位カメラの機種による隙間の違いを、カメラアダプタを用いて隙間を塞ぐことで、製造コストを抑えることが可能になる。 In addition, omnidirectional cameras come in various sizes depending on their applications and functions. Preparing a main casing 101 of a different size for each omnidirectional camera would inevitably increase the manufacturing cost. By fixing the main casing 101 to a single size and using a camera adapter to close the gap, whose size differs with the omnidirectional camera model, the manufacturing cost can be kept down.

そこで、図１９（Ｂ）に示すように、例えば屋外用全方位カメラ１６５がカメラ取付空間１１５に取り付けられる場合には、屋外用カメラアダプタ１７１が、屋外用全方位カメラ１６５の周囲に取り付けられる。また、図２０に示すように、屋内用全方位カメラ１６７がカメラ取付空間１１５に取り付けられる場合には、屋内用カメラアダプタ１７３が、屋内用全方位カメラ１６７の周囲に取り付けられる。屋内用カメラアダプタ１７３は、例えば樹脂を材料として筒状に形成される。屋内用カメラアダプタ１７３の下端には隙間隠し用のフランジ１７５が形成され、フランジ１７５は屋内用全方位カメラ１６７をカメラ取付空間１１５に取り付けた場合に生じる屋内用全方位カメラ１６７とメイン筐体１０１との間の隙間１６９を隠す。 Therefore, as shown in FIG. 19B, for example, when the outdoor omnidirectional camera 165 is attached to the camera mounting space 115, the outdoor camera adapter 171 is attached around the outdoor omnidirectional camera 165. As shown in FIG. 20, when the indoor omnidirectional camera 167 is attached to the camera mounting space 115, the indoor camera adapter 173 is attached around the indoor omnidirectional camera 167. The indoor camera adapter 173 is formed in a cylindrical shape using, for example, resin. A flange 175 for concealing the gap is formed at the lower end of the indoor camera adapter 173; the flange 175 hides the gap 169 that arises between the indoor omnidirectional camera 167 and the main casing 101 when the indoor omnidirectional camera 167 is attached to the camera mounting space 115.

屋内用カメラアダプタ１７３には複数の周壁弾性爪１７７が、複数の切り込み１７９内に、円周方向に沿って等間隔に形成される。屋内用カメラアダプタ１７３は、周壁弾性爪１７７を屋内用全方位カメラ１６７のカメラ筐体１８１に係止して取り付けられる。ベース板金１０７には、図２２に示す複数のカメラ固定用板金部１８３が円周方向に沿って等間隔で形成されている。カメラ固定用板金部１８３は、ダルマ穴１８５を有してカメラ取付空間１１５の上方に配置される。カメラ筐体１８１の上面には、カメラ固定用板金部１８３のダルマ穴１８５に係合する大径頭部（図示略）を有する係合ピン（図示略）が突設されている。屋内用カメラアダプタ１７３が取り付けられた屋内用全方位カメラ１６７は、カメラ取付空間１１５に挿入され、回転されることで、係合ピンがダルマ穴１８５に係合して落下が規制されて支持される。この回転位置で、屋内用全方位カメラ１６７は、カメラ回転規制ネジ（図示略）によってマイクアレイ２０のメイン筐体１０１等にロックされる。また、屋内用全方位カメラ１６７がロックされた状態では、周壁弾性爪１７７は、メイン筐体１０１の内周壁が邪魔となって、カメラ固定用板金部１８３の係止の解除が規制される。 In the indoor camera adapter 173, a plurality of peripheral wall elastic claws 177 are formed within a plurality of cuts 179 at equal intervals along the circumferential direction. The indoor camera adapter 173 is attached by locking the peripheral wall elastic claws 177 to the camera casing 181 of the indoor omnidirectional camera 167. On the base sheet metal 107, a plurality of camera fixing sheet metal portions 183 shown in FIG. 22 are formed at equal intervals along the circumferential direction. Each camera fixing sheet metal portion 183 has a dharma hole 185 and is disposed above the camera mounting space 115. An engaging pin (not shown) having a large-diameter head (not shown) that engages with the dharma hole 185 of the camera fixing sheet metal portion 183 projects from the upper surface of the camera casing 181. The indoor omnidirectional camera 167, with the indoor camera adapter 173 attached, is inserted into the camera mounting space 115 and rotated, whereby the engaging pin engages with the dharma hole 185 and the camera is supported so that it cannot fall. At this rotational position, the indoor omnidirectional camera 167 is locked to the main casing 101 or the like of the microphone array 20 by a camera rotation restricting screw (not shown). Further, while the indoor omnidirectional camera 167 is locked, the peripheral wall elastic claws 177 are blocked by the inner peripheral wall of the main casing 101, so that release of the locking at the camera fixing sheet metal portion 183 is restricted.

一方、図２１に示す屋外用カメラアダプタ１７１の外周には、先端が自由端となったバヨネット板１８７が設けられている。バヨネット板１８７の自由端には、半径方向内側に突出するアダプタ回転規制爪１８９（図２３参照）が形成されている。アダプタ回転規制爪１８９は、カメラ筐体１８１に形成されるバヨネット係合溝１９１に係合する。他の構造は、屋内用カメラアダプタ１７３と同様である。カメラ取付空間１１５に組み込まれた屋外用カメラアダプタ１７１を回転させようとすると、図２３に示すように、アダプタ回転規制爪１８９がバヨネット係合溝１９１に係合して、回転が規制される。つまり、屋外用カメラアダプタ１７１と屋外用全方位カメラ１６５との相対回転が規制される。なお、屋外用カメラアダプタ１７１のフランジ１７５には、工具挿入溝１９３が形成される。屋外用全方位カメラ１６５は、カメラ取付空間１１５に押し込まれると、回転させる手段が無くなる。そこで、工具挿入溝１９３にドライバー等を入れて回すことが可能となっている。 On the other hand, a bayonet plate 187 whose tip is a free end is provided on the outer periphery of the outdoor camera adapter 171 shown in FIG. 21. At the free end of the bayonet plate 187, an adapter rotation restricting claw 189 (see FIG. 23) protruding inward in the radial direction is formed. The adapter rotation restricting claw 189 engages with a bayonet engaging groove 191 formed in the camera casing 181. The other structures are the same as those of the indoor camera adapter 173. When an attempt is made to rotate the outdoor camera adapter 171 incorporated in the camera mounting space 115, the adapter rotation restricting claw 189 engages with the bayonet engaging groove 191 and the rotation is restricted, as shown in FIG. 23. That is, relative rotation between the outdoor camera adapter 171 and the outdoor omnidirectional camera 165 is restricted. A tool insertion groove 193 is formed in the flange 175 of the outdoor camera adapter 171. Once the outdoor omnidirectional camera 165 is pushed into the camera mounting space 115, there is no longer any means of rotating it. For this reason, a screwdriver or the like can be inserted into the tool insertion groove 193 to turn it.

（マイクアレイと全方位カメラとの別体使用時に用いられる蓋）
図２４は、蓋１９５の取り付けられるマイクアレイ２０の筐体構造の分解斜視図である。マイクアレイ２０と全方位カメラとは、例えば図７（Ａ）に示すように一体的に取り付けられて使用される場合もあるが、例えば図９（Ａ）に示すように別体で取り付けられて使用される場合もある。この場合、カメラ取付空間１１５は、図２４に示す蓋１９５によって塞がれる。蓋１９５は、例えば樹脂を材料として一体に成形される。また、蓋１９５は、金属製の蓋用板金１９７との係止構造等によって一体に組み合わせられる。蓋１９５は、蓋用板金１９７と組み合わされることで、外力（衝撃力）を蓋用板金１９７へ分散させる。これにより、蓋１９５は、蓋１９５自身の大きな変形が抑制されて、割れ等が防止される。蓋１９５は、蓋用板金１９７と組み合わされて、カメラ取付空間１１５へ挿入され、蓋用板金１９７が、全方位カメラ固定用のカメラ固定用板金部１８３に係合することで支持される。この状態で、蓋１９５は、蓋回転止ネジ１９９によってカメラ固定用板金部１８３に回転止めされて固定される。 (Lid used when using separate microphone array and omnidirectional camera)
FIG. 24 is an exploded perspective view of the housing structure of the microphone array 20 to which the lid 195 is attached. The microphone array 20 and the omnidirectional camera may be attached integrally and used together as shown in FIG. 7A, for example, but may also be attached and used separately as shown in FIG. 9A, for example. In the latter case, the camera mounting space 115 is closed by the lid 195 shown in FIG. 24. The lid 195 is integrally molded from, for example, resin. The lid 195 is also combined into one unit with a metal lid sheet metal 197 by a locking structure or the like. By being combined with the lid sheet metal 197, the lid 195 disperses external force (impact force) to the lid sheet metal 197. This suppresses large deformation of the lid 195 itself and prevents cracking and the like. The lid 195, combined with the lid sheet metal 197, is inserted into the camera mounting space 115 and is supported by the lid sheet metal 197 engaging with the camera fixing sheet metal portions 183 used for fixing the omnidirectional camera. In this state, the lid 195 is secured against rotation to the camera fixing sheet metal portion 183 by the lid rotation stop screw 199.

（取付金具）
図２５は、取付金具２０１を用いて天井に取り付けられるマイクアレイ２０の筐体構造の分解斜視図である。図２６（Ａ）は、ベース板金用固定穴２０３に差し込まれる前のベース板金側固定ピン２０５の側面図である。図２６（Ｂ）は、ベース板金用固定穴２０３に差し込まれたベース板金側固定ピン２０５の側面図である。図２６（Ｃ）は、ベース板金用固定穴２０３に差し込まれたベース板金側固定ピン２０５の平面図である。図２６（Ｄ）は、ベース板金用固定穴２０３の小径穴２０７に移動したベース板金側固定ピン２０５の側面図である。図２６（Ｅ）は、ベース板金用固定穴２０３の小径穴２０７に移動したベース板金側固定ピン２０５の平面図である。 (Mounting bracket)
FIG. 25 is an exploded perspective view of the housing structure of the microphone array 20 that is attached to the ceiling using the mounting bracket 201. FIG. 26A is a side view of the base sheet metal side fixing pin 205 before being inserted into the base sheet metal fixing hole 203. FIG. 26B is a side view of the base sheet metal side fixing pin 205 inserted into the base sheet metal fixing hole 203. FIG. 26C is a plan view of the base sheet metal side fixing pin 205 inserted into the base sheet metal fixing hole 203. FIG. 26D is a side view of the base sheet metal side fixing pin 205 moved to the small diameter hole 207 of the base sheet metal fixing hole 203. FIG. 26E is a plan view of the base sheet metal side fixing pin 205 moved to the small diameter hole 207 of the base sheet metal fixing hole 203.

耐衝撃性筐体１０９（図１２参照）は、取付金具２０１を用いて設置面の一例としての天井面（図示略）に取り付けられる。即ち、取付金具２０１は、天井面に固定され、この取付金具２０１に、筐体構造を有する耐衝撃性筐体１０９が取り付けられる。 The impact-resistant housing 109 (see FIG. 12) is attached to a ceiling surface (not shown) as an example of an installation surface using a mounting bracket 201. That is, the mounting bracket 201 is fixed to the ceiling surface, and an impact resistant casing 109 having a casing structure is mounted on the mounting bracket 201.

取付具の一例としての取付金具２０１は、図２５に示すように、円形の金具基部を有する。ただし、取付具は金属製の取付金具２０１に限定されず、取付具の材質は例えばセラミックスでも合成樹脂（例えばプラスチックまたはエラストマ）でもよい。金具基部には、ベース板金用固定穴２０３が複数（例えば３個）穿設される。ベース板金用固定穴２０３は、小径穴２０７と大径穴２０９とが接続されたダルマ形状またはヘチマ形状に形成されている。 As shown in FIG. 25, the mounting bracket 201 as an example of a mounting tool has a circular bracket base. However, the fixture is not limited to the metal fixture 201, and the material of the fixture may be, for example, ceramics or synthetic resin (for example, plastic or elastomer). A plurality of (for example, three) base metal plate fixing holes 203 are formed in the metal base. The base sheet metal fixing hole 203 is formed in a dharma shape or a loofah shape in which a small diameter hole 207 and a large diameter hole 209 are connected.

一方、天井面と対面するベース板金１０７の面には、ベース板金用固定穴２０３に対応してベース板金側固定ピン２０５が突設される。図２６（Ａ）に示すように、ベース板金側固定ピン２０５は、突出先端に大径のピン頭部２１１を有する。大径のピン頭部２１１は、大径穴２０９に挿入可能となり、小径穴２０７には離脱が規制されて係止可能となっている。 On the other hand, on the surface of the base sheet metal 107 facing the ceiling surface, base sheet metal side fixing pins 205 protrude at positions corresponding to the base sheet metal fixing holes 203. As shown in FIG. 26A, each base sheet metal side fixing pin 205 has a large-diameter pin head 211 at its protruding tip. The large-diameter pin head 211 can be inserted into the large-diameter hole 209 and can be locked in the small-diameter hole 207, where its withdrawal is restricted.

次に、耐衝撃性筐体１０９の取り付け方法を説明する。
先ず、設置面の一例としての天井面に耐衝撃性筐体１０９を取り付けるには、取付金具２０１を天井面の所定位置に天井固定ネジ（図示略）によって固定する。天井面に固定された取付金具２０１に、耐衝撃性筐体１０９を同心円状に位置合わせする。 Next, a method for attaching the impact-resistant housing 109 will be described.
First, in order to attach the impact-resistant housing 109 to a ceiling surface as an example of an installation surface, the mounting bracket 201 is fixed to a predetermined position on the ceiling surface with a ceiling fixing screw (not shown). The impact-resistant housing 109 is concentrically aligned with the mounting bracket 201 fixed to the ceiling surface.

次に、図２６（Ｂ）及び図２６（Ｃ）に示すように、ベース板金側固定ピン２０５の大径のピン頭部２１１をベース板金用固定穴２０３の大径穴２０９に挿入する（図２６（Ｂ）及び図２６（Ｃ）参照）。 Next, as shown in FIGS. 26B and 26C, the large-diameter pin head 211 of the base metal plate-side fixing pin 205 is inserted into the large-diameter hole 209 of the base metal plate fixing hole 203 (see FIG. 26 (B) and FIG. 26 (C)).

その後、図２６（Ｄ）及び図２６（Ｅ）に示すように、耐衝撃性筐体１０９を回転して、大径のピン頭部２１１を小径穴２０７に移動することで、全てのベース板金側固定ピン２０５がベース板金用固定穴２０３に同時に固定される。取付金具２０１を介して天井面に固定された耐衝撃性筐体１０９のカメラ取付空間１１５には、上述したようにして、屋外用全方位カメラ１６５や屋内用全方位カメラ１６７が、取り付けられる。 Thereafter, as shown in FIGS. 26 (D) and 26 (E), the impact-resistant housing 109 is rotated so that the large-diameter pin heads 211 move to the small-diameter holes 207, whereby all the base sheet metal side fixing pins 205 are simultaneously fixed in the base sheet metal fixing holes 203. As described above, the outdoor omnidirectional camera 165 or the indoor omnidirectional camera 167 is attached to the camera mounting space 115 of the impact-resistant housing 109 fixed to the ceiling surface via the mounting bracket 201.

このように、マイクアレイ２０の筐体構造では、取付金具２０１によって天井面に固定された耐衝撃性筐体１０９に、全方位カメラが直接取り付けられる。これにより、マイクアレイ２０の筐体構造は、マイク板金１０５の固定されているベース板金１０７に、全方位カメラが直接取り付けられるので、ＥＣＭ１３１と全方位カメラの位置精度を向上させることができる。 As described above, in the case structure of the microphone array 20, the omnidirectional camera is directly attached to the impact-resistant case 109 fixed to the ceiling surface by the mounting bracket 201. Thereby, since the omnidirectional camera is directly attached to the base sheet metal 107 to which the microphone sheet metal 105 is fixed, the housing structure of the microphone array 20 can improve the positional accuracy of the ECM 131 and the omnidirectional camera.

（反射音の抑制）
図２７は、ＥＣＭ用凹部２１３にテーパ２２３が設けられたマイクアレイ２０の筐体構造の断面図である。マイクアレイ２０の筐体構造は、図２７に示すように、ＥＣＭ用凹部２１３の内周面が、ＥＣＭ１３１に向かって縮径されるテーパ２２３となっている。テーパ２２３は、最小径がＥＣＭ１３１の挿入される緩衝材２１７の円形凸部の外径と略一致し、最大径が環状底部１１１のマイク敷設用穴１１３と略一致する。テーパ２２３が形成されたＥＣＭ用凹部２１３は、気柱の共振点が上がる。また、ＥＣＭ用凹部２１３の内周面の反射波がＥＣＭ１３１に向かわなくなる。更に、筐体横方向からの音波に乱れが無い状態でＥＣＭ１３１に届くようになる。これにより、使用可能な音域が広がり、マイクアレイ２０の収音時における音響特性が向上する。また、パンチングメタルカバー１０３と環状底部１１１の間には、風騒音を低減させるための不織布２２１が挟持されている。 (Suppression of reflected sound)
FIG. 27 is a cross-sectional view of the housing structure of the microphone array 20 in which the taper 223 is provided in the ECM recess 213. As shown in FIG. 27, in the housing structure of the microphone array 20, the inner peripheral surface of the ECM recess 213 forms a taper 223 whose diameter decreases toward the ECM 131. The minimum diameter of the taper 223 substantially matches the outer diameter of the circular convex portion of the cushioning material 217 into which the ECM 131 is inserted, and its maximum diameter substantially matches the microphone laying hole 113 in the annular bottom portion 111. In the ECM recess 213 formed with the taper 223, the resonance point of the air column rises. In addition, waves reflected from the inner peripheral surface of the ECM recess 213 are no longer directed toward the ECM 131. Furthermore, sound waves arriving from the lateral direction of the housing reach the ECM 131 without being disturbed. As a result, the usable sound range is widened, and the acoustic characteristics of the microphone array 20 during sound collection are improved. Further, a nonwoven fabric 221 for reducing wind noise is sandwiched between the punching metal cover 103 and the annular bottom portion 111.

（風対策）
図２８は、風対策の施されたマイクアレイ２０の筐体構造の断面図である。マイクアレイ２０の筐体構造は、マイク筐体１２９に、複数のＥＣＭ用凹部２１３がＥＣＭ１３１に応じて形成される。ＥＣＭ用凹部２１３は、例えば円形状に形成され、中心にＥＣＭ１３１を表出させる透孔２１５が形成される。なお、ＥＣＭ１３１は、例えば外周にゴム等の緩衝材２１７が巻かれてマイク筐体１２９に取り付けられ、ＥＣＭ１３１の先端が透孔２１５に挿入される。ＥＣＭ用凹部２１３は、環状底部１１１に形成されるマイク敷設用穴１１３と同心円状に配置される。このＥＣＭ用凹部２１３には、風対策用の吸音材２１９を充填できる。吸音材２１９の表面は、不織布２２１によって覆う。不織布２２１は、パンチングメタルカバー１０３と環状底部１１１とに挟持されている。 (Wind measures)
FIG. 28 is a cross-sectional view of the housing structure of the microphone array 20 provided with wind countermeasures. In the housing structure of the microphone array 20, a plurality of ECM recesses 213 are formed in the microphone casing 129, one for each ECM 131. Each ECM recess 213 is formed, for example, in a circular shape, with a through hole 215 at its center that exposes the ECM 131. The ECM 131 is attached to the microphone casing 129 with a cushioning material 217 such as rubber wound around its outer periphery, for example, and the tip of the ECM 131 is inserted into the through hole 215. The ECM recess 213 is arranged concentrically with the microphone laying hole 113 formed in the annular bottom portion 111. The ECM recess 213 can be filled with a sound absorbing material 219 as a wind countermeasure. The surface of the sound absorbing material 219 is covered with a nonwoven fabric 221, which is sandwiched between the punching metal cover 103 and the annular bottom portion 111.

次に、ＥＣＭ用凹部２１３の変形例を、図２９（Ａ）〜（Ｃ）を参照して説明する。図２９（Ａ）は、ＥＣＭ用凹部２１３の内径と深さとの関係を表したマイクアレイ２０の筐体構造の断面図である。図２９（Ｂ）は、ＥＣＭ用凹部２１３の内壁が傾斜壁２２５となったマイクアレイ２０の筐体構造の断面図である。図２９（Ｃ）は、ＥＣＭ用凹部２１３の内周隅部がＲ部２２７となったマイクアレイ２０の筐体構造の断面図である。 Next, modified examples of the ECM recess 213 will be described with reference to FIGS. 29A to 29C. FIG. 29A is a cross-sectional view of the housing structure of the microphone array 20 showing the relationship between the inner diameter and depth of the ECM recess 213. FIG. 29B is a cross-sectional view of the housing structure of the microphone array 20 in which the inner wall of the ECM recess 213 is an inclined wall 225. FIG. 29C is a cross-sectional view of the housing structure of the microphone array 20 in which the inner peripheral corner of the ECM recess 213 is a rounded (R) portion 227.

図２９（Ａ）に示すように、ＥＣＭ用凹部２１３の直径Ｄと深さＨは、所定の関係となることが好ましい。例えばＨ／Ｄ＜１／１０の関係を満たすことで、ＥＣＭ用凹部２１３の共振周波数近傍でピークが抑えられるため、音響性能に悪影響を与えなくなる。 As shown in FIG. 29A, it is preferable that the diameter D and the depth H of the ECM recess 213 have a predetermined relationship. For example, by satisfying the relationship of H / D <1/10, the peak can be suppressed in the vicinity of the resonance frequency of the ECM recess 213, so that the acoustic performance is not adversely affected.
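The H/D condition above is easy to check numerically. The sketch below does that and, as a separate rough illustration, estimates the resonance of the recess with the textbook quarter-wavelength formula for a tube closed at one end. That formula is an assumption brought in here, not a model given in the text; it ignores end corrections and is crude for a wide, shallow recess, so it should be read only as showing the trend that a shallower recess resonates higher. Function names and dimensions are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def satisfies_depth_ratio(depth_m, diameter_m, limit=0.1):
    """Return True when H / D is below the limit (1/10 in the text's example)."""
    if diameter_m <= 0.0 or depth_m < 0.0:
        raise ValueError("diameter must be positive and depth non-negative")
    return depth_m / diameter_m < limit


def quarter_wave_resonance_hz(depth_m):
    """Quarter-wavelength estimate f = c / (4 H) for a closed-end air column."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    return SPEED_OF_SOUND_M_S / (4.0 * depth_m)


if __name__ == "__main__":
    depth, diameter = 0.001, 0.02  # hypothetical 1 mm deep, 20 mm wide recess
    print(satisfies_depth_ratio(depth, diameter))
    print(quarter_wave_resonance_hz(depth))
```

Halving the depth doubles the estimated resonance frequency in this approximation, which is consistent with the text's aim of pushing the recess resonance away from the frequencies of interest.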

図２９（Ｂ）に示すように、ＥＣＭ用凹部２１３は、平坦な凹部底面２２９と、テーパ状の傾斜壁２２５とによって形成されてもよい。これによって、ＥＣＭ用凹部２１３の共振周波数を使用周波数帯域よりも高く出来るとともに、ＥＣＭ用凹部２１３の内周面からＥＣＭ１３１へ向かう反射波を低減させることができる。 As shown in FIG. 29B, the ECM recess 213 may be formed by a flat recess bottom surface 229 and a tapered inclined wall 225. As a result, the resonance frequency of the ECM recess 213 can be made higher than the operating frequency band, and reflected waves from the inner peripheral surface of the ECM recess 213 toward the ECM 131 can be reduced.

図２９（Ｃ）に示すように、ＥＣＭ用凹部２１３は、内周隅部をＲ部２２７としてもよい。これによっても、ＥＣＭ用凹部２１３の共振周波数を使用周波数帯域よりも高く出来るとともに、ＥＣＭ用凹部２１３の内周面からＥＣＭ１３１へ向かう反射波を低減させることができる。 As shown in FIG. 29C, the ECM recess 213 may have an inner peripheral corner as an R portion 227. Also by this, the resonance frequency of the ECM recess 213 can be made higher than the use frequency band, and the reflected wave from the inner peripheral surface of the ECM recess 213 toward the ECM 131 can be reduced.

図３０（Ａ）は、テーパ２２３を形成しないＥＣＭ用凹部２１３の等圧面を表した説明図である。図３０（Ｂ）は、テーパ２２３を形成したＥＣＭ用凹部２１３の等圧面を表した説明図である。 FIG. 30A is an explanatory view showing the isobaric surface of the ECM recess 213 in which the taper 223 is not formed. FIG. 30B is an explanatory diagram showing the isobaric surface of the ECM recess 213 in which the taper 223 is formed.

ＥＣＭ１３１の近傍の音は、例えば波動方程式による空間を伝わる音を有限要素法で解析することによってシミュレーションすることができる。この場合、ＥＣＭ用凹部２１３にテーパ２２３を設けないモデルでは、図３０（Ａ）に示すように、等圧面の間隔が、筐体表面２３１とＥＣＭ部２３３で異なる。一方、ＥＣＭ用凹部２１３にテーパ２２３を設けたモデルでは、図３０（Ｂ）に示すように、等圧面の間隔が、筐体表面２３１とＥＣＭ部２３３で同じとなる。これにより、ＥＣＭ用凹部２１３にテーパ２２３が設けられることで、ＥＣＭ１３１に向かって音波が乱れることなく届くことになる。 The sound in the vicinity of the ECM 131 can be simulated, for example, by using the finite element method to analyze sound propagating through space according to the wave equation. In this case, in a model in which the ECM recess 213 is not provided with the taper 223, the spacing of the isobaric surfaces differs between the housing surface 231 and the ECM portion 233, as shown in FIG. 30A. On the other hand, in a model in which the ECM recess 213 is provided with the taper 223, the spacing of the isobaric surfaces is the same at the housing surface 231 and the ECM portion 233, as shown in FIG. 30B. Thus, providing the taper 223 in the ECM recess 213 allows sound waves to reach the ECM 131 without being disturbed.
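The analysis referred to above is a finite element simulation of the recess geometry, which is beyond a short example. As a much simpler stand-in that only shows what numerically solving the wave equation looks like, the sketch below uses a one-dimensional finite-difference (leapfrog) scheme. It does not model the ECM recess or the taper; the grid size, pulse shape, and function name are assumptions made for illustration.

```python
import math


def simulate_wave_1d(n_cells=201, n_steps=40, courant=1.0, sigma_cells=5.0):
    """Leapfrog finite-difference solution of u_tt = c^2 u_xx.

    Starts from a Gaussian pulse at rest in the middle of the grid, holds the
    field at zero at both ends, and returns the field after n_steps steps.
    courant = c * dt / dx must lie in (0, 1] for the scheme to be stable.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant number must be in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    c2 = courant ** 2
    centre = n_cells // 2
    prev = [math.exp(-((i - centre) ** 2) / (2.0 * sigma_cells ** 2))
            for i in range(n_cells)]
    prev[0] = prev[-1] = 0.0

    # First step: half-weighted Laplacian, because the initial velocity is zero.
    curr = prev[:]
    for i in range(1, n_cells - 1):
        curr[i] = prev[i] + 0.5 * c2 * (prev[i + 1] - 2.0 * prev[i] + prev[i - 1])

    # Remaining steps: standard three-level leapfrog update.
    for _ in range(n_steps - 1):
        nxt = [0.0] * n_cells
        for i in range(1, n_cells - 1):
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + c2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
    return curr


if __name__ == "__main__":
    field = simulate_wave_1d()
    print(max(field))
```

With the default settings the initial pulse splits into two half-amplitude pulses travelling in opposite directions, the expected behavior for the one-dimensional wave equation.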

次に、上述した各実施形態のマイクアレイ２０の筐体構造の作用を説明する。
上述した各実施形態のマイクアレイ２０の筐体構造では、有底筒状に形成される樹脂製のメイン筐体１０１に、金属製のマイク板金１０５と、有底筒状の金属製のベース板金１０７が固定される。金属製のマイク板金１０５には、ベース板金１０７側に当り止め部１３７が起立している。また、メイン筐体１０１には、メイン筐体１０１を挟んでマイク板金１０５の反対側に、金属製のパンチングメタルカバー１０３が固定される。 Next, the operation of the housing structure of the microphone array 20 of each embodiment described above will be described.
In the case structure of the microphone array 20 of each of the embodiments described above, a metal microphone sheet metal 105 and a bottomed cylindrical metal base sheet metal 107 are fixed to a resin main casing 101 formed in a bottomed cylindrical shape. On the metal microphone sheet metal 105, a stopper 137 stands upright on the side facing the base sheet metal 107. A metal punching metal cover 103 is fixed to the main casing 101 on the opposite side of the microphone sheet metal 105 with the main casing 101 interposed therebetween.

上述した各実施形態のマイクアレイ２０の筐体構造は、外部からの衝撃エネルギーが、樹脂製のメイン筐体１０１を変形させることによって吸収される。メイン筐体１０１の破壊強度以上の衝撃エネルギーは、金属製のマイク板金１０５を変形させることによって吸収される。更に、マイク板金１０５を所定量以上に塑性変形させる衝撃エネルギーは、当り止め部１３７を介してベース板金１０７に加えられ、最終的にはベース板金１０７が取り付けられる建物躯体等へ逃がされる。 In the case structure of the microphone array 20 of each embodiment described above, impact energy from the outside is absorbed by deforming the resin main case 101. Impact energy exceeding the breaking strength of the main casing 101 is absorbed by deforming the metal microphone sheet metal 105. Further, the impact energy that plastically deforms the microphone sheet metal 105 to a predetermined amount or more is applied to the base sheet metal 107 through the contact stop portion 137 and is finally released to the building frame or the like to which the base sheet metal 107 is attached.
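The staged absorption described above can be sketched as a simple energy budget. This is only a toy model under strong assumptions: the text states the order of the stages but gives no figures, so the per-stage capacities in joules, the stage labels, and the function name are hypothetical, and a real impact does not split into a simple additive budget.

```python
def distribute_impact_energy(impact_j, stage_capacities_j):
    """Pass impact energy through ordered stages, each absorbing up to its capacity.

    Returns (absorbed_per_stage, residual). The residual is whatever is passed
    on beyond the last listed stage; in the text this goes through the base
    sheet metal 107 to the building frame.
    """
    remaining = impact_j
    absorbed = []
    for name, capacity in stage_capacities_j:
        taken = min(remaining, capacity)
        absorbed.append((name, taken))
        remaining -= taken
    return absorbed, remaining


if __name__ == "__main__":
    stages = [
        ("main_casing_101", 4),          # hypothetical capacity in joules
        ("microphone_sheet_metal_105", 3),
    ]
    print(distribute_impact_energy(10, stages))
```

A small impact is absorbed entirely by the first stage, while a larger one spills over to later stages in the order the text gives.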

また、上述した各実施形態のマイクアレイ２０の筐体構造では、別体の部材で作られるパンチングメタルカバー１０３、メイン筐体１０１、マイク板金１０５、ベース板金１０７が、一体に固定されて組み立てられる。このため、外部からの衝撃エネルギーは、これら部材間の間隙１４９、擦れ合いによる摩擦によっても吸収されて低減される。 Further, in the case structure of the microphone array 20 of each embodiment described above, the punching metal cover 103, the main case 101, the microphone sheet metal 105, and the base sheet metal 107, which are made as separate members, are fixed together and assembled into one unit. For this reason, impact energy from the outside is also absorbed and reduced by the gaps 149 between these members and by the friction of the members rubbing against one another.

また、上述した各実施形態のマイクアレイ２０の筐体構造は、マイク基板１２７が、パンチングメタルカバー１０３とマイク板金１０５に挟まれている。メイン基板１３９及び電源基板１４１が、マイク板金１０５とベース板金１０７に挟まれている。つまり、マイク板金１０５は、金属製のパンチングメタルカバー１０３と金属製のマイク板金１０５とが構成する導電性外殻によって電磁シールドされる。メイン基板１３９及び電源基板１４１は、金属製のマイク板金１０５と金属製のベース板金１０７とが構成する導電性外殻によって電磁シールドされる。 Further, in the case structure of the microphone array 20 of each embodiment described above, the microphone substrate 127 is sandwiched between the punching metal cover 103 and the microphone sheet metal 105. The main board 139 and the power board 141 are sandwiched between the microphone sheet metal 105 and the base sheet metal 107. That is, the microphone sheet metal 105 is electromagnetically shielded by the conductive outer shell formed by the metal punching metal cover 103 and the metal microphone sheet metal 105. The main board 139 and the power supply board 141 are electromagnetically shielded by a conductive outer shell formed by the metal microphone sheet metal 105 and the metal base sheet metal 107.

また、上述した各実施形態のマイクアレイ２０の筐体構造では、樹脂製のメイン筐体１０１と金属製のマイク板金１０５によって挟まれるマイク筐体１２９が、樹脂素材で作られている。マイク筐体１２９には、複数のマイクが固定される。マイク筐体１２９に固定されたマイクは、メイン筐体１０１の環状底部１１１に開口するマイク敷設用穴１１３を通して外部に開放される。このマイク敷設用穴１１３は、環状底部１１１を覆うパンチングメタルカバー１０３によって覆われる。 In the case structure of the microphone array 20 of each embodiment described above, the microphone case 129 sandwiched between the resin main case 101 and the metal microphone sheet metal 105 is made of a resin material. A plurality of microphones are fixed to the microphone casing 129. The microphone fixed to the microphone casing 129 is opened to the outside through a microphone laying hole 113 opened in the annular bottom portion 111 of the main casing 101. The microphone laying hole 113 is covered with a punching metal cover 103 that covers the annular bottom portion 111.

例えば、耐衝撃性筐体１０９が天井面に固定されると、パンチングメタルカバー１０３は、地面に対面する側に配置される。地面側より耐衝撃性筐体１０９に加えられる打撃等の衝撃は、先ず、パンチングメタルカバー１０３に加わる。金属製のパンチングメタルカバー１０３は、弾性限界以上の衝撃によって塑性変形し、衝撃エネルギーを吸収する。パンチングメタルカバー１０３の塑性変形によって吸収されなかった衝撃エネルギーは、メイン筐体１０１の環状底部１１１に加わる。衝撃エネルギーは、環状底部１１１を変形させるとともに、マイク板金１０５とベース板金１０７に加わる。マイク筐体１２９はマイク板金に止められているため、大きな衝撃エネルギーは加わらない。 For example, when the impact-resistant housing 109 is fixed to the ceiling surface, the punching metal cover 103 is disposed on the side facing the ground. First, an impact such as impact applied to the impact-resistant housing 109 from the ground side is applied to the punching metal cover 103. The metal punching metal cover 103 is plastically deformed by an impact exceeding the elastic limit and absorbs the impact energy. The impact energy that has not been absorbed by the plastic deformation of the punching metal cover 103 is applied to the annular bottom 111 of the main housing 101. The impact energy deforms the annular bottom portion 111 and is applied to the microphone sheet metal 105 and the base sheet metal 107. Since the microphone casing 129 is fixed to the microphone sheet metal, a large impact energy is not applied.

このときの衝撃エネルギーが、樹脂製のメイン筐体１０１の弾性限界以上であると、メイン筐体１０１は、白化や亀裂等を生じさせ、その衝撃エネルギーを吸収する。メイン筐体１０１は、白化や亀裂が生じるが、全体が完全に破壊されない限り、白化や亀裂を有したまま元の形状に復元される。つまり、メイン筐体１０１は、白化や亀裂が生じていてもマイクの音響特性に大きな影響を及ぼさない。また、塑性変形したパンチングメタルカバー１０３も、開口率が高いため、変形してもマイクの音響特性に影響を及ぼさない。このため、外部からの衝撃に対抗し、マイクの音響特性が劣化しにくい。 If the impact energy at this time is equal to or greater than the elastic limit of the resin main casing 101, the main casing 101 causes whitening, cracks, and the like, and absorbs the impact energy. The main casing 101 is whitened or cracked, but is restored to its original shape with whitening or cracking unless the entire case is completely destroyed. That is, the main casing 101 does not greatly affect the acoustic characteristics of the microphone even if whitening or cracking occurs. Further, the punching metal cover 103 that has been plastically deformed also has a high aperture ratio, so that even if it is deformed, it does not affect the acoustic characteristics of the microphone. For this reason, the acoustic characteristics of the microphone are unlikely to deteriorate against an external impact.

なお、メイン筐体１０１がアルミ製であると、パンチングメタルカバー１０３からの衝撃によって塑性変形が生じ易くなる。特にマイク周辺形状が塑性変形した場合には、音響特性が劣化する。従って、上述した各実施形態のマイクアレイ２０の筐体構造によれば、このような塑性変形による音響特性の劣化が抑制される。 If the main casing 101 is made of aluminum, plastic deformation is likely to occur due to an impact from the punching metal cover 103. In particular, when the shape around the microphone is plastically deformed, the acoustic characteristics deteriorate. Therefore, according to the housing structure of the microphone array 20 of each embodiment described above, deterioration of acoustic characteristics due to such plastic deformation is suppressed.

更に、筐体構造では、メイン筐体１０１の内側に、マイク板金１０５が配置される。マイク板金１０５からは、嵌合部１４３が起立する。嵌合部１４３は、メイン筐体外周壁１１７の内側で、間隙１４９を有して配置される。この嵌合部１４３は、半径方向（メイン筐体外周壁１１７の厚み方向）に離間する一対の挟持片を有する。嵌合部１４３の一対の挟持片の間には、ベース板金１０７のベース板金外周壁１３５が挿入して嵌められ（嵌合され）る。つまり、本筐体構造では、耐衝撃性筐体１０９の側部が、外側より、メイン筐体外周壁１１７、間隙１４９、外側挟持片１４５、ベース板金外周壁１３５、内側挟持片１４７の順で内側に重ねられて構成されている。 Further, in the case structure, the microphone sheet metal 105 is disposed inside the main case 101. A fitting portion 143 rises from the microphone sheet metal 105. The fitting portion 143 is disposed inside the main casing outer peripheral wall 117 with a gap 149. This fitting part 143 has a pair of clamping pieces spaced apart in the radial direction (the thickness direction of the main casing outer peripheral wall 117). Between the pair of sandwiching pieces of the fitting portion 143, the base sheet metal outer peripheral wall 135 of the base sheet metal 107 is inserted and fitted (fitted). That is, in this case structure, the side part of the impact-resistant case 109 is arranged in the order of the main case outer peripheral wall 117, the gap 149, the outer holding piece 145, the base sheet metal outer peripheral wall 135, and the inner holding piece 147 from the outside. It is configured to overlap.

側部の外方より耐衝撃性筐体１０９に加えられる打撃等の衝撃エネルギーは、先ず、メイン筐体外周壁１１７に加わる。メイン筐体外周壁１１７は、間隙１４９の間を弾性変形して衝撃エネルギーを吸収する。弾性限界以上の衝撃エネルギーは、嵌合部１４３に加わる。嵌合部１４３に加わる衝撃エネルギーは、外側挟持片１４５、ベース板金外周壁１３５、内側挟持片１４７を弾性変形させて吸収される。また、この嵌合部１４３に加わる衝撃エネルギーは、外側挟持片１４５とベース板金外周壁１３５、ベース板金外周壁１３５と内側挟持片１４７の摩擦によっても効果的に吸収されて低減される。 Impact energy such as impact applied to the impact-resistant casing 109 from the outside of the side portion is first applied to the outer peripheral wall 117 of the main casing. The main casing outer peripheral wall 117 elastically deforms between the gaps 149 to absorb impact energy. Impact energy exceeding the elastic limit is applied to the fitting portion 143. The impact energy applied to the fitting portion 143 is absorbed by elastically deforming the outer holding piece 145, the base metal outer peripheral wall 135, and the inner holding piece 147. Further, the impact energy applied to the fitting portion 143 is also effectively absorbed and reduced by the friction between the outer sandwiching piece 145 and the base sheet metal outer peripheral wall 135, and the base sheet metal outer peripheral wall 135 and the inner sandwiching piece 147.

従って、上述した各実施形態のマイクアレイ２０の筐体構造によれば、耐衝撃性を向上させることができる。 Therefore, according to the casing structure of the microphone array 20 of each embodiment described above, the impact resistance can be improved.

（第４の実施形態） (Fourth embodiment)
第１〜第３の各実施形態では、ディスプレイ６３，７３に表示された映像データにおいて、ユーザにより１つの指定箇所が指定された場合の音声処理システムの動作を想定して説明した。第４の実施形態では、同様にディスプレイ６３，７３に表示された映像データにおいて、ユーザにより異なる複数（例えば２つ）の指定箇所が指定された場合の音声処理システムの動作について説明する。本実施形態の音声処理システムのシステム構成は図１（Ａ）に示す音声処理システム５Ａのシステム構成と同一であるため、音声処理システム５Ａの各部の符号を参照して説明する。 In each of the first to third embodiments, the description assumed the operation of the audio processing system when one designated location is designated by the user in the video data displayed on the displays 63 and 73. The fourth embodiment describes the operation of the audio processing system when a plurality of (for example, two) different designated locations are designated by the user in the video data similarly displayed on the displays 63 and 73. Since the system configuration of the audio processing system of this embodiment is the same as that of the audio processing system 5A shown in FIG. 1A, the description refers to the reference numerals of the respective parts of the audio processing system 5A.

本実施形態の音声処理システムは、例えばディスプレイ６３，７３に表示された映像データにおいてユーザにより２つの指定箇所が指定された場合、指定された２つの指定箇所を適正に区別し、区別したことをユーザに対して視覚的に明示するために、指定箇所毎に異なる識別形状を各指定箇所の周囲に表示する。更に、本実施形態の音声処理システムは、マイクアレイ２０により収音された音声の音声データを用いて、マイクアレイ２０から各指定箇所に対応する音声位置に向かう方向に指向性をそれぞれ形成し、各識別形状に対応付けて予め規定された方法に従って、音声出力する。 When, for example, two designated locations are designated by the user in the video data displayed on the displays 63 and 73, the audio processing system of the present embodiment properly distinguishes the two designated locations and, to show the user visually that they have been distinguished, displays a different identification shape around each designated location. Furthermore, the audio processing system of the present embodiment uses the audio data of the sound picked up by the microphone array 20 to form directivity in each direction from the microphone array 20 toward the sound position corresponding to each designated location, and outputs the sound according to a method defined in advance in association with each identification shape.

図８は、第４の実施形態の音声処理システム５Ａの使用形態の一例を示す模式図である。図８（Ａ）は、例えば屋内のホールの天井８５に、１台のカメラ１０と、１台のマイクアレイ２０と、スピーカ８２とが設置された様子を示す図である。図８（Ｂ）は、ディスプレイ６３に表示された映像データの中で複数の指定箇所が指定された場合の音声処理システム５Ａの動作概要の説明図である。 FIG. 8 is a schematic diagram illustrating an example of a usage pattern of the voice processing system 5A according to the fourth embodiment. FIG. 8A is a diagram showing a state in which one camera 10, one microphone array 20, and a speaker 82 are installed on the ceiling 85 of an indoor hall, for example. FIG. 8B is an explanatory diagram of an outline of the operation of the audio processing system 5 A when a plurality of designated locations are designated in the video data displayed on the display 63.

図８（Ａ）では、２人の人物９１ａ，９２ａがホールの床８７に立って会話をしている。２人の人物９１ａ，９２ａから少し離れた位置には、スピーカ８２が床８７の上に接して載置されており、スピーカ８２から音楽が流れている。また、カメラ１０は、カメラ１０に予め設定された監視対象の地点（場所）の周囲にいる人物９１ａ，９２ａを撮像している。更に、マイクアレイ２０は、ホール全体の音声を収音している。ディスプレイ６３の画面６８には、カメラ１０が撮像した映像データが表示されている。また、スピーカ６５からは、２人の人物９１，９２の会話又はホール内の音楽が音声出力されている。 In FIG. 8A, two persons 91a and 92a are standing on the floor 87 of the hall and having a conversation. A speaker 82 is placed in contact with the floor 87 at a position slightly away from the two persons 91a and 92a, and music flows from the speaker 82. In addition, the camera 10 captures images of persons 91 a and 92 a around a monitoring target point (place) set in advance in the camera 10. Furthermore, the microphone array 20 collects the sound of the entire hall. Video data captured by the camera 10 is displayed on the screen 68 of the display 63. In addition, the speaker 65 outputs the conversation between the two persons 91 and 92 or the music in the hall.

ユーザは、例えばディスプレイ６３の画面６８に表示された２人の人物９１ａ，９２ａの頭上付近を指９５でそれぞれ連続的にタッチしたとする。タッチ点６３ａ１，６３ａ２はユーザにより指定された複数の指定箇所となる。信号処理部５０は、マイクアレイ２０によって収音された音声、即ち各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ａ１，６３ａ２に対応する各音声位置に向かう各指向方向（図８（Ａ）に示す符号ｅ１，ｅ２で示される方向）に指向性を形成した各音声データを生成して合成する。 Suppose, for example, that the user touches the vicinity of the heads of the two persons 91a and 92a displayed on the screen 68 of the display 63 with the finger 95, one after the other. The touch points 63a1 and 63a2 are the plurality of designated locations designated by the user. The signal processing unit 50 uses the sound picked up by the microphone array 20, that is, the audio data picked up by each microphone 22, to generate and synthesize audio data in which directivity is formed in each directivity direction (the directions indicated by reference signs e1 and e2 in FIG. 8A) from the position of each microphone 22 of the microphone array 20 toward each sound position corresponding to the touch points 63a1 and 63a2 designated by the user.

即ち、信号処理部５０は、各マイクロホン２２が収音した各音声データを用いて、マイクアレイ２０の各マイクロホン２２の位置から、ユーザが指定したタッチ点６３ａ１，６３ａ２に対応する各音声位置に向かう各指向方向の音声（音量レベル）を強調（増幅）した音声データを生成して合成する。再生部６０は、信号処理部５０が合成した音声データを、カメラ１０が撮像した映像データと同期させてスピーカ６５から音声出力させる。 In other words, the signal processing unit 50 uses the audio data picked up by each microphone 22 to generate and synthesize audio data in which the sound (volume level) in each directivity direction, from the position of each microphone 22 of the microphone array 20 toward each sound position corresponding to the touch points 63a1 and 63a2 designated by the user, is emphasized (amplified). The playback unit 60 outputs the audio data synthesized by the signal processing unit 50 from the speaker 65 in synchronization with the video data captured by the camera 10.

この結果、ユーザによって指定されたタッチ点６３ａ１，６３ａ２に対応する各音声位置における音声が強調され、スピーカ６５から２人の人物９１ａ，９２ａの会話（例えば図８（Ａ）に示す「Ｈｅｌｌｏ」及び「Ｈｉ！」参照）が大きな音量によって音声出力される。一方、２人の人物９１ａ，９２ａに比べ、マイクアレイ２０により近い距離に載置されているがユーザによって指定されたタッチ点６３ａ１，６３ａ２ではないスピーカ８２から流れている音楽（図８（Ａ）に示す「♪〜」参照）は強調して音声出力されず、２人の人物９１ａ，９２ａの会話に比べて小さな音量によって音声出力される。 As a result, the sound at each sound position corresponding to the touch points 63a1 and 63a2 designated by the user is emphasized, and the conversation of the two persons 91a and 92a (see, for example, “Hello” and “Hi!” shown in FIG. 8A) is output from the speaker 65 at a high volume. On the other hand, the music (see “♪〜” shown in FIG. 8A) flowing from the speaker 82, which is placed closer to the microphone array 20 than the two persons 91a and 92a but is not at the touch points 63a1 and 63a2 designated by the user, is not emphasized and is output at a lower volume than the conversation of the two persons 91a and 92a.
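The description above does not name a specific directivity-forming algorithm. As an illustration only, the following minimal Python sketch assumes a plain delay-and-sum beamformer with integer-sample delays: one beam is formed per designated sound position and the beams are then summed. The function names, the speed-of-sound constant, and the coordinate representation are all hypothetical and are not taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value (not from the patent)


def steering_delays(mic_positions, target, sample_rate):
    """Per-microphone delays (in samples) that time-align sound arriving from `target`."""
    dists = [math.dist(m, target) for m in mic_positions]
    farthest = max(dists)
    # Delay the closer microphones so every channel lines up with the farthest one.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in dists]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay and average the aligned channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]


def emphasize_positions(channels, mic_positions, targets, sample_rate):
    """Form one beam per designated sound position, then sum the beams."""
    beams = [
        delay_and_sum(channels, steering_delays(mic_positions, t, sample_rate))
        for t in targets
    ]
    return [sum(samples) for samples in zip(*beams)]
```

Sound arriving from a designated position adds coherently after alignment, while sound from elsewhere (such as the music from the speaker 82) is averaged out of phase and therefore comes out relatively quieter. A practical implementation would likely use fractional delays or frequency-domain weights instead of whole-sample shifts.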

次に、ユーザにより複数の指定箇所が指定された場合に、本実施形態の音声処理システムが、ディスプレイ６３に表示された映像データの中で、指定箇所毎に異なる識別形状を各指定箇所の周囲に表示する例、及び各識別形状に対応付けて予め規定された方法に従って音声出力する例について、図３１〜図４０を参照して詳細に説明する。なお、本実施形態の図３１〜図４０の説明を分かり易くするために、全方位カメラ１０Ｅとマイクアレイ２０Ｃとが一体として組み込まれた音声処理システム５Ｄを想定して説明する（図９（Ａ）参照）が、本実施形態の音声処理システム５Ｄでは複数（例えば２つ）のスピーカ６５Ｌ，６５Ｒが音声処理装置４０又はＰＣ７０に設けられているとする。 Next, with reference to FIGS. 31 to 40, a detailed description is given of an example in which, when a plurality of designated locations are designated by the user, the audio processing system of the present embodiment displays a different identification shape around each designated location in the video data displayed on the display 63, and of an example in which sound is output according to a method defined in advance in association with each identification shape. To make the description of FIGS. 31 to 40 easier to follow, it assumes an audio processing system 5D in which the omnidirectional camera 10E and the microphone array 20C are integrated (see FIG. 9A); in the audio processing system 5D of the present embodiment, a plurality of (for example, two) speakers 65L and 65R are assumed to be provided in the audio processing device 40 or the PC 70.

図３１（Ａ）は、第４の実施形態の音声処理システム５Ｄの使用例の説明図である。図３１（Ｂ）は、第１の指定箇所の周囲に表示される第１の識別形状９１Ｍ、第２の指定箇所の周囲に表示される第２の識別形状９２Ｍの一例を表示する様子と、第１の識別形状９１Ｍにより特定される第１の指定箇所に対応する第１の音声位置に向かう第１の指向方向の音声を強調して第１のスピーカ６５Ｌから出力する様子と、第２の識別形状９２Ｍにより特定される第２の指定箇所に対応する第２の音声位置に向かう第２の指向方向の音声を強調して第２のスピーカ６５Ｒから出力する様子とを示す図である。 FIG. 31A is an explanatory diagram of a usage example of the audio processing system 5D of the fourth embodiment. FIG. 31B shows a state in which an example of the first identification shape 91M displayed around the first designated location and the second identification shape 92M displayed around the second designated location is displayed, a state in which the sound in the first directivity direction toward the first sound position corresponding to the first designated location specified by the first identification shape 91M is emphasized and output from the first speaker 65L, and a state in which the sound in the second directivity direction toward the second sound position corresponding to the second designated location specified by the second identification shape 92M is emphasized and output from the second speaker 65R.

図３１（Ａ）では、例えば屋内のホールの天井８５に、ドーナツ型形状のマイクアレイ２０Ｃと、マイクアレイ２０Ｃと一体として組み込まれた全方位カメラ１０Ｅと、スピーカ８３とが設置されている。また、図３１（Ａ）では、４人の人物９１ａ，９２ａ，９３ａ，９４ａがホールの床８７に立って会話をしており、より具体的には人物９１ａ，９２ａが会話をしており、人物９３ａ，９４ａが会話をしている。人物９２ａ，９３ａから少し離れた位置には、スピーカ８２が床８７の上に接して載置されており、スピーカ８２から音楽が流れている。また、全方位カメラ１０Ｅは、所定の視野角内に存在する人物９１ａ，９２ａ，９３ａ，９４ａ及びスピーカ８２を撮像している。更に、マイクアレイ２０Ｃは、ホール全体の音声を収音している。ディスプレイ６３の画面６８には、全方位カメラ１０Ｅが撮像した映像データが表示されている。 In FIG. 31A, for example, a donut-shaped microphone array 20C, an omnidirectional camera 10E integrated with the microphone array 20C, and a speaker 83 are installed on the ceiling 85 of an indoor hall. Further, in FIG. 31A, four persons 91a, 92a, 93a, 94a are talking on the floor 87 of the hall, more specifically, the persons 91a, 92a are talking, Persons 93a and 94a are having a conversation. A speaker 82 is placed on the floor 87 at a position slightly away from the persons 92a and 93a, and music flows from the speaker 82. Further, the omnidirectional camera 10E images the persons 91a, 92a, 93a, 94a and the speaker 82 that exist within a predetermined viewing angle. Furthermore, the microphone array 20C collects the sound of the entire hall. On the screen 68 of the display 63, video data captured by the omnidirectional camera 10E is displayed.

（指定箇所の指定方法と指定方法に対応付けられた音声出力方法との組み合わせ） (Combinations of a method for designating designated locations and the audio output method associated with that designation method)
以下、本実施形態の音声処理システム５Ｄにおいて、ユーザの複数の指定箇所の指定方法と、指定箇所毎に表示される識別形状に対応付けられた音声出力方法との組み合わせについて、複数の例を用いて説明する。但し、以下の指定箇所の指定方法と音声出力方法との組み合わせはあくまで一例であり、各組み合わせにおいて他の指定箇所の指定方法や音声出力方法が用いて組み合わされても良い。 Hereinafter, several examples are used to describe combinations of a method by which the user designates a plurality of designated locations and an audio output method associated with the identification shape displayed for each designated location in the audio processing system 5D of the present embodiment. The combinations of designation method and audio output method given below are merely examples, and in each combination a different designation method or audio output method may be used instead.

（第１の指定方法及び音声出力方法の組み合わせ） (Combination of the first designation method and audio output method)
第１の指定方法は、例えばマウスを用いた左クリック操作及び右クリック操作により、指定箇所を指定する方法である。第１の音声出力方法は、指定箇所の一方の音声データを一方のスピーカから音声出力し、指定箇所の他方の音声データを他方のスピーカから音声出力する単純ステレオ２ｃｈ（チャンネル）出力方法である。 The first designation method designates the designated locations by, for example, a left click operation and a right click operation using a mouse. The first audio output method is a simple stereo 2ch (channel) output method in which the audio data of one designated location is output from one speaker and the audio data of the other designated location is output from the other speaker.

ユーザは、例えばディスプレイ６３の画面６８（図３１（Ｂ）参照）に表示された人物９１ａの頭上付近を操作部５５（例えばマウス）の左クリック操作により、更に、人物９２ａの頭上付近を操作部５５（例えばマウス）の右クリック操作により、それぞれ連続的に指定したとする。左クリック操作及び右クリック操作により指定された箇所は、ユーザにより指定された複数の指定箇所となる。信号処理部５０は、複数の指定箇所が指定された場合に、各指定箇所を適正に区別するために、指定箇所毎に異なる識別形状を各指定箇所の周囲に表示させる。 Suppose, for example, that the user designates, one after the other, the vicinity of the head of the person 91a displayed on the screen 68 of the display 63 (see FIG. 31B) by a left click operation of the operation unit 55 (for example, a mouse), and the vicinity of the head of the person 92a by a right click operation of the operation unit 55 (for example, a mouse). The locations designated by the left click operation and the right click operation are the plurality of designated locations designated by the user. When a plurality of designated locations are designated, the signal processing unit 50 displays a different identification shape around each designated location in order to properly distinguish the designated locations.

具体的には、信号処理部５０は、左クリック操作により指定された人物９１ａの周囲に、人物９１ａが指定されたことを視覚的に明示するための識別形状９１Ｍを表示させ、同様に、右クリック操作により指定された人物９２ａの周囲に、人物９２ａが指定されたことを視覚的に明示するための識別形状９２Ｍを表示させる。識別形状９１Ｍ，９２Ｍは、例えばそれぞれ緑色，赤色の矩形であるが、色や形状は緑色、赤色、矩形に限定されない。 Specifically, the signal processing unit 50 displays, around the person 91a designated by the left click operation, an identification shape 91M for visually indicating that the person 91a has been designated and, similarly, displays around the person 92a designated by the right click operation an identification shape 92M for visually indicating that the person 92a has been designated. The identification shapes 91M and 92M are, for example, green and red rectangles, respectively, but the color and shape are not limited to green, red, and rectangular.

また、信号処理部５０は、マイクアレイ２０Ｃによって収音された音声の音声データを用いて、マイクアレイ２０Ｃの設置位置から、ユーザが指定した２つの指定箇所に対応する各音声位置に向かう各指向方向（図３１（Ａ）に示す符号ｅ１，ｅ２で示される方向）に指向性を形成した各音声データを生成する。再生部６０は、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９１Ｍにより特定される第１の指向方向（図３１（Ａ）に示す符号ｅ１参照）の音声を強調した音声データをスピーカ６５Ｌから音声出力し、識別形状９２Ｍにより特定される第２の指向方向（図３１（Ａ）に示す符号ｅ２参照）の音声を強調した音声データをスピーカ６５Ｒから音声出力する。従って、人物９１ａの会話音声（「Ｈｅｌｌｏ」）はスピーカ６５Ｌから強調されて音声出力され、人物９２ａの会話音声（「Ｈｉ！」）はスピーカ６５Ｒから強調されて音声出力される。 In addition, the signal processing unit 50 uses the sound data of the sound collected by the microphone array 20C to direct each direction from the installation position of the microphone array 20C to each sound position corresponding to two designated locations designated by the user. Each piece of audio data in which directivity is formed in the direction (the direction indicated by reference signs e1 and e2 shown in FIG. 31A) is generated. The reproducing unit 60 synchronizes with the video data imaged by the omnidirectional camera 10E, and emphasizes the audio data in the first directivity direction (see symbol e1 shown in FIG. 31A) specified by the identification shape 91M. Is output from the speaker 65L, and the audio data emphasizing the sound in the second directivity direction (see symbol e2 shown in FIG. 31A) specified by the identification shape 92M is output from the speaker 65R. Accordingly, the conversation voice (“Hello”) of the person 91a is emphasized and output from the speaker 65L, and the conversation voice (“Hi!”) Of the person 92a is emphasized and output from the speaker 65R.
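The simple stereo 2ch routing described above can be sketched as follows. This is a minimal illustration that assumes the two directional signals have already been generated as equal-length sample lists; the function name and the (left, right) frame representation are hypothetical.

```python
def simple_stereo(beam_first, beam_second):
    """Pair samples as (left, right) frames.

    The beam for the first designated location goes only to the left channel
    (speaker 65L) and the beam for the second goes only to the right channel
    (speaker 65R).
    """
    if len(beam_first) != len(beam_second):
        raise ValueError("beams must have the same length")
    return list(zip(beam_first, beam_second))
```

Keeping the two beams on separate channels lets a listener tell the two designated persons apart by speaker position alone.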

図３２は、図３１（Ｂ）に示す映像データが表示されている状態において、ディスプレイ６３に表示された映像データの表示領域外へのクリック操作に応じて、調整用操作ボックスＯＰＢが表示される様子を示す図である。例えば、ディスプレイ６３に図３１（Ｂ）に示す映像データが表示されている場合に、ユーザが、操作部５５（例えばマウス）により、カーソルＭＰＴを映像データの表示領域外に移動させてからクリック操作（例えば右クリック操作）したとする。信号処理部５０は、ユーザのクリック操作に応じて、スピーカ６５Ｌ又は６５Ｒから音声出力される音声のパラメータ（例えば、音量レベル）を調整するための調整用操作ボックスＯＰＢを、ディスプレイ６３に表示させる。なお、調整用操作ボックスＯＰＢは、例えば音量レベルの調整に用いられるとして説明しているが、他には、音声出力時のイコライザの設定の調整や、有指向音声と無指向音声との切り替えの調整に用いられても良い。 In FIG. 32, the adjustment operation box OPB is displayed in response to a click operation outside the display area of the video data displayed on the display 63 in a state where the video data shown in FIG. 31B is displayed. It is a figure which shows a mode. For example, when the video data shown in FIG. 31B is displayed on the display 63, the user moves the cursor MPT out of the video data display area using the operation unit 55 (for example, a mouse) and then performs a click operation. (For example, right click operation). The signal processing unit 50 causes the display 63 to display an adjustment operation box OPB for adjusting an audio parameter (for example, a volume level) output from the speaker 65L or 65R according to a user's click operation. The adjustment operation box OPB has been described as being used, for example, for adjusting the volume level. However, other than that, adjustment of equalizer settings at the time of sound output and switching between directional sound and omnidirectional sound can be performed. It may be used for adjustment.

なお、ユーザが第１の識別形状９１Ｍを選択した状態で、調整用操作ボックスＯＰＢの「＋」ボタンを複数回押下すると、スピーカ６５Ｌから音声出力されている人物９１ａの会話音声が更に大きく音声出力される。一方、ユーザが第２の識別形状の９２Ｍを選択した状態で、調整用操作ボックスＯＰＢの「−」ボタンを複数回押下すると、スピーカ６５Ｒから音声出力されている人物９２ａの会話音声が更に小さく音声出力される。 When the user presses the “+” button of the adjustment operation box OPB several times with the first identification shape 91M selected, the conversation voice of the person 91a being output from the speaker 65L is output at a higher volume. Conversely, when the user presses the “−” button of the adjustment operation box OPB several times with the second identification shape 92M selected, the conversation voice of the person 92a being output from the speaker 65R is output at a lower volume.
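The per-shape volume adjustment can be modelled as a small piece of state: the currently selected identification shape and one gain value per shape. The sketch below is illustrative only; the 3 dB step per button press, the class name, and the shape identifiers are assumptions, since the description does not specify how much each press changes the volume.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

STEP_DB = 3.0  # hypothetical gain change per button press


@dataclass
class AdjustmentBox:
    """Tracks the selected identification shape and a gain (in dB) per shape."""
    gains_db: Dict[str, float] = field(default_factory=dict)
    selected: Optional[str] = None

    def select(self, shape_id: str) -> None:
        self.selected = shape_id

    def press(self, button: str) -> None:
        """Apply one '+' or '-' press to the currently selected shape."""
        if button not in ("+", "-"):
            raise ValueError("button must be '+' or '-'")
        if self.selected is None:
            return  # nothing selected, so the press has no effect
        delta = STEP_DB if button == "+" else -STEP_DB
        self.gains_db[self.selected] = self.gains_db.get(self.selected, 0.0) + delta

    def scale(self, shape_id: str, samples):
        """Return the samples of one beam scaled by that shape's gain."""
        factor = 10.0 ** (self.gains_db.get(shape_id, 0.0) / 20.0)
        return [s * factor for s in samples]
```

For example, selecting "91M" and pressing "+" twice raises only the gain applied to the beam routed to the speaker 65L, leaving the beam for "92M" unchanged.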

なお、第１の識別形状９１Ｍ、第２の識別形状９２Ｍは、両方とも実線であるが、色が異なることで区別されていたが、例えば色は同じであって実線と点線とにより区別されても良い（図３３（Ｂ）参照）。図３３（Ａ）は、第４の実施形態の音声処理システム５Ｄの使用例の説明図である。図３３（Ｂ）は、第１の指定箇所の周囲に表示される第１の識別形状９１Ｎ、第２の指定箇所の周囲に表示される第２の識別形状９２Ｎの一例を表示する様子と、第１の識別形状９１Ｎにより特定される第１の指定箇所に対応する第１の音声位置に向かう第１の指向方向の音声を強調して第１のスピーカ６５Ｌから出力する様子と、第２の識別形状９２Ｎにより特定される第２の指定箇所に対応する第２の音声位置に向かう第２の指向方向の音声を強調して第２のスピーカ６５Ｒから出力する様子とを示す図である。 The first identification shape 91M and the second identification shape 92M are both drawn with solid lines and are distinguished by their different colors, but they may instead have the same color and be distinguished by, for example, a solid line and a dotted line (see FIG. 33B). FIG. 33A is an explanatory diagram of a usage example of the audio processing system 5D of the fourth embodiment. FIG. 33B shows a state in which an example of the first identification shape 91N displayed around the first designated location and the second identification shape 92N displayed around the second designated location is displayed, a state in which the sound in the first directivity direction toward the first sound position corresponding to the first designated location specified by the first identification shape 91N is emphasized and output from the first speaker 65L, and a state in which the sound in the second directivity direction toward the second sound position corresponding to the second designated location specified by the second identification shape 92N is emphasized and output from the second speaker 65R.

なお、図３３（Ａ）は図３１（Ａ）と同様であるため、図３３（Ａ）の説明は割愛する。更に、図３１（Ｂ）では識別形状９１Ｍ，９２Ｍの色が異なっており両方とも実線であったが、図３３（Ｂ）では識別形状９１Ｎ，９２Ｎの色は同一であって、更に一方（第１の識別形状９１Ｎ）が実線であり他方（第２の識別形状９２Ｎ）が点線であること以外は、図３３（Ｂ）と図３１（Ｂ）との違いは無いので、図３３（Ｂ）の説明も割愛する。 Note that FIG. 33A is similar to FIG. 31A, and thus the description of FIG. 33A is omitted. Further, in FIG. 31B, the colors of the identification shapes 91M and 92M are different and both are solid lines. However, in FIG. 33B, the colors of the identification shapes 91N and 92N are the same, and one (first) Since there is no difference between FIG. 33 (B) and FIG. 31 (B) except that one identification shape 91N) is a solid line and the other (second identification shape 92N) is a dotted line, FIG. I will omit the explanation.

図３４は、図３１（Ｂ）に示す映像データが表示されている状態において、ディスプレイ６３に表示された映像データの表示領域外へのクリック操作毎に、全方位カメラ１０Ｅにより撮像された映像データと調整用操作ボックスＯＰＢとを切り替えて表示する様子を示す図である。例えば、ディスプレイ６３に図３１（Ｂ）に示す映像データが表示されている場合に、ユーザが、操作部５５（例えばマウス）により、カーソルＭＰＴを映像データの表示領域外に移動させてからクリック操作（例えば右クリック操作）したとする。信号処理部５０は、ユーザのクリック操作に応じて、全方位カメラ１０Ｅにより撮像された映像データの画面を調整用操作ボックスＯＰＢに切り替えてディスプレイ６３に表示させる。 FIG. 34 shows video data captured by the omnidirectional camera 10E for each click operation outside the display area of the video data displayed on the display 63 in a state where the video data shown in FIG. 31 (B) is displayed. It is a figure which shows a mode that it switches and displays operation box OPB for adjustment. For example, when the video data shown in FIG. 31B is displayed on the display 63, the user moves the cursor MPT out of the video data display area using the operation unit 55 (for example, a mouse) and then performs a click operation. (For example, right click operation). In response to the user's click operation, the signal processing unit 50 switches the screen of the video data captured by the omnidirectional camera 10E to the adjustment operation box OPB and causes the display 63 to display the screen.

反対に、ディスプレイ６３に調整用操作ボックスＯＰＢが表示されている場合に、ユーザが、操作部５５（例えばマウス）により、カーソルＭＰＴを映像データの表示領域外に移動させてからクリック操作（例えば右クリック操作）したとする。信号処理部５０は、ユーザのクリック操作に応じて、調整用操作ボックスＯＰＢを、全方位カメラ１０Ｅにより撮像された映像データの画面に切り替えてディスプレイ６３に表示させる。なお、調整用操作ボックスＯＰＢと全方位カメラ１０Ｅにより撮像された映像データの画面との切り替えは、カーソルＭＰＴの映像データの表示領域外におけるクリック操作により実行されると説明したが、クリック操作に限定されず、所定の入力操作により実行されても良い。所定の入力操作とは、例えばユーザがキーボードの異なる複数の特定キーを同時に押下した操作等である。 On the other hand, when the adjustment operation box OPB is displayed on the display 63, the user moves the cursor MPT out of the display area of the video data using the operation unit 55 (for example, a mouse) and then performs a click operation (for example, the right). Click operation). In response to the user's click operation, the signal processing unit 50 switches the adjustment operation box OPB to a screen of video data captured by the omnidirectional camera 10E and displays it on the display 63. It has been described that the switching between the adjustment operation box OPB and the screen of the video data captured by the omnidirectional camera 10E is executed by a click operation outside the display area of the video data of the cursor MPT, but is limited to the click operation. Instead, it may be executed by a predetermined input operation. The predetermined input operation is, for example, an operation in which the user simultaneously presses a plurality of specific keys with different keyboards.
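The switching behaviour of FIG. 34 amounts to a two-state toggle driven by clicks that land outside the video display area. The sketch below is one possible reading; the rectangle format `(x, y, width, height)`, the state names, and the function names are hypothetical.

```python
def inside(rect, point):
    """True if `point` (x, y) lies within `rect` (x, y, width, height)."""
    x, y, w, h = rect
    px, py = point
    return x <= px < x + w and y <= py < y + h


def next_view(current_view, video_rect, click_point):
    """Toggle between 'video' and 'adjustment_box' on a click outside the video area."""
    if inside(video_rect, click_point):
        return current_view  # clicks inside the video area designate locations instead
    return "adjustment_box" if current_view == "video" else "video"
```

A keyboard shortcut, which the text also allows as the predetermined input operation, could call the same toggle without any hit test.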

図３５は、図３１（Ｂ）に示す映像データが表示されている状態において、ディスプレイ６３に表示された映像データの表示領域外へのクリック操作に応じて、状態標示用ボックスＩＮＤが表示される様子を示す図である。例えば、ディスプレイ６３に図３１（Ｂ）に示す映像データが表示されている場合に、ユーザが、操作部５５（例えばマウス）により、カーソルＭＰＴを映像データの表示領域外に移動させてからクリック操作（例えば右クリック操作）したとする。信号処理部５０は、ユーザのクリック操作に応じて、スピーカ６５Ｌ又は６５Ｒから音声出力されている音声のパラメータ（例えば、音量レベル）の状態を標示するための状態標示用ボックスＩＮＤを、ディスプレイ６３に表示させる。 In FIG. 35, in the state where the video data shown in FIG. 31 (B) is displayed, a state indication box IND is displayed in response to a click operation outside the display area of the video data displayed on the display 63. It is a figure which shows a mode. For example, when the video data shown in FIG. 31B is displayed on the display 63, the user moves the cursor MPT out of the video data display area using the operation unit 55 (for example, a mouse) and then performs a click operation. (For example, right click operation). The signal processing unit 50 displays, on the display 63, a status indication box IND for indicating the status of a parameter (for example, a volume level) of audio output from the speaker 65L or 65R in response to a user's click operation. Display.

なお、ユーザは状態標示用ボックスＩＮＤに対して操作することはできないが、ディスプレイ６３に表示されたいずれかの識別形状がユーザにより指定されると、指定された識別形状に対応する人物の音声の音量レベルの内容が状態標示用ボックスＩＮＤにより視覚的に明示される。また、状態標示用ボックスＩＮＤの内容を変更するためには、例えばユーザが、第１の識別形状９１Ｍを選択した状態で、他の操作部（例えばキーボード）の特定キーを押下することで、スピーカ６５Ｌから音声出力されている人物９１ａの会話音声の音量レベルを大きく又は小さくした結果又はその結果に至る過程が状態標示用ボックスＩＮＤにおいて視覚的に明示される。なお、状態標示用ボックスＩＮＤは、例えば音量レベルの状態を標示するとして説明しているが、他には、音声出力時のイコライザの設定内容や、有指向音声と無指向音声との切り替えの状態の標示に用いられても良い。また、状態標示用ボックスＩＮＤは、ディスプレイ６３において常に表示されても良い。 The user cannot operate the state indication box IND directly, but when any identification shape displayed on the display 63 is designated by the user, the volume level of the voice of the person corresponding to the designated identification shape is shown visually in the state indication box IND. To change the contents of the state indication box IND, the user, for example, presses a specific key of another operation unit (for example, a keyboard) with the first identification shape 91M selected; the result of raising or lowering the volume level of the conversation voice of the person 91a being output from the speaker 65L, or the process leading to that result, is then shown visually in the state indication box IND. Although the state indication box IND is described as indicating, for example, the state of the volume level, it may also be used to indicate the equalizer settings at the time of sound output or the state of switching between directional and omnidirectional sound. The state indication box IND may also be displayed on the display 63 at all times.

（第２の指定方法及び音声出力方法の組み合わせ） (Combination of the second designation method and audio output method)
第２の指定方法は、例えばキーボードの数字キーの押下操作とマウスの左クリック操作とにより、指定箇所を指定する方法である。第２の音声出力方法は、全ての指定箇所の音声データを両方のスピーカから音声出力する合成モノラル２ｃｈ（チャンネル）出力方法である。 The second designation method designates the designated locations by, for example, pressing a numeric key on the keyboard together with a left click operation of the mouse. The second audio output method is a synthesized monaural 2ch (channel) output method in which the audio data of all designated locations is output from both speakers.

図３６（Ａ）は、第４の実施形態の音声処理システム５Ｄの使用例の説明図である。図３６（Ｂ）は、第１の指定箇所の周囲に表示される第１の識別形状９１Ｋ、第２の指定箇所の周囲に表示される第２の識別形状９２Ｋ、第３の指定箇所の周囲に表示される第３の識別形状９３Ｋ、第４の指定箇所の周囲に表示される第４の識別形状９４Ｋの一例を表示する様子と、第１の識別形状９１Ｋにより特定される第１の指定箇所に対応する第１の音声位置に向かう第１の指向方向の音声を強調した音声データと、第２の識別形状９２Ｋにより特定される第２の指定箇所に対応する第２の音声位置に向かう第２の指向方向の音声を強調した音声データと、第３の識別形状９３Ｋにより特定される第３の指定箇所に対応する第３の音声位置に向かう第３の指向方向の音声を強調した音声データとを、第１及び第２の各スピーカ６５Ｌ，６５Ｒから出力する様子を示す図である。なお、図３６（Ａ）は図３１（Ａ）と同様であるため、図３６（Ａ）の説明は割愛する。 FIG. 36A is an explanatory diagram of a usage example of the audio processing system 5D of the fourth embodiment. FIG. 36B shows a state in which an example of the first identification shape 91K displayed around the first designated location, the second identification shape 92K displayed around the second designated location, the third identification shape 93K displayed around the third designated location, and the fourth identification shape 94K displayed around the fourth designated location is displayed, and a state in which audio data emphasizing the sound in the first directivity direction toward the first sound position corresponding to the first designated location specified by the first identification shape 91K, audio data emphasizing the sound in the second directivity direction toward the second sound position corresponding to the second designated location specified by the second identification shape 92K, and audio data emphasizing the sound in the third directivity direction toward the third sound position corresponding to the third designated location specified by the third identification shape 93K are output from each of the first and second speakers 65L and 65R. Since FIG. 36A is the same as FIG. 31A, its description is omitted.

ユーザは、例えばディスプレイ６３の画面６８（図３６（Ｂ）参照）に表示された人物９１ａの頭上付近を操作部５５（例えばキーボードの数字「１」キーの押下とマウスの左クリック）の同時操作、人物９２ａの頭上付近を操作部５５（例えばキーボードの数字「２」キーの押下とマウスの左クリック）の同時操作、人物９３ａの頭上付近を操作部５５（例えばキーボードの数字「３」キーの押下とマウスの左クリック）の同時操作、人物９４ａの頭上付近を操作部５５（例えばキーボードの数字「４」キーの押下とマウスの左クリック）の同時操作により、それぞれ連続的に指定したとする。数字キーの押下と左クリックの各操作により指定された各箇所は、ユーザにより指定された複数の指定箇所となる。信号処理部５０は、複数の指定箇所が指定された場合に、各指定箇所を適正に区別するために、指定箇所毎に異なる識別形状を各指定箇所の周囲に表示させる。 Suppose, for example, that the user designates, one after the other, locations on the screen 68 of the display 63 (see FIG. 36B) by simultaneous operations of the operation unit 55: the vicinity of the head of the person 91a by pressing the number “1” key on the keyboard together with a left click of the mouse, the vicinity of the head of the person 92a by pressing the number “2” key together with a left click, the vicinity of the head of the person 93a by pressing the number “3” key together with a left click, and the vicinity of the head of the person 94a by pressing the number “4” key together with a left click. The locations designated by each combination of a numeric key press and a left click are the plurality of designated locations designated by the user. When a plurality of designated locations are designated, the signal processing unit 50 displays a different identification shape around each designated location in order to properly distinguish the designated locations.

具体的には、信号処理部５０は、数字「１」キーの押下操作と左クリック操作により指定された人物９１ａの周囲に、人物９１ａが指定されたことを視覚的に明示するための識別形状９１Ｋを表示させ、数字「２」キーの押下操作と左クリック操作により指定された人物９２ａの周囲に、人物９２ａが指定されたことを視覚的に明示するための識別形状９２Ｋを表示させ、数字「３」キーの押下操作と左クリック操作により指定された人物９３ａの周囲に、人物９３ａが指定されたことを視覚的に明示するための識別形状９３Ｋを表示させ、数字「４」キーの押下操作と左クリック操作により指定された人物９４ａの周囲に、人物９４ａが指定されたことを視覚的に明示するための識別形状９４Ｋを表示させる。識別形状９１Ｋ，９２Ｋ，９３Ｋ，９４Ｋは、例えば黒色の矩形であるが、色や形状は黒色、矩形に限定されない。 Specifically, the signal processing unit 50 visually identifies that the person 91a has been designated around the person 91a designated by the pressing operation of the number “1” key and the left click operation. 91K is displayed, and an identification shape 92K for visually indicating that the person 92a has been designated is displayed around the person 92a designated by the pressing operation of the number “2” key and the left click operation. An identification shape 93K for visually indicating that the person 93a is designated is displayed around the person 93a designated by the pressing operation of the “3” key and the left click operation, and the number “4” key is pressed. An identification shape 94K for visually indicating that the person 94a is designated is displayed around the person 94a designated by the operation and the left click operation. The identification shapes 91K, 92K, 93K, and 94K are, for example, black rectangles, but the color and shape are not limited to black and rectangles.

また、信号処理部５０は、マイクアレイ２０Ｃによって収音された音声の音声データを用いて、マイクアレイ２０Ｃの設置位置から、ユーザが指定した４つの指定箇所に対応する各音声位置に向かう各指向方向（図３６（Ａ）に示す符号ｅ１，ｅ２，ｅ３で示される方向）に指向性を形成した各音声データを生成して合成する。再生部６０は、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９１Ｋにより特定される第１の指向方向（図３６（Ａ）に示す符号ｅ１参照）の音声を強調した音声データと、識別形状９２Ｋにより特定される第２の指向方向（図３６（Ａ）に示す符号ｅ２参照）の音声を強調した音声データと、識別形状９３Ｋにより特定される第３の指向方向（図３６（Ａ）に示す符号ｅ３参照）の音声を強調した音声データとを合成した音声データを、スピーカ６５Ｌ，６５Ｒから音声出力する。従って、人物９１ａの会話音声（「Ｈｅｌｌｏ」）、人物９２ａの会話音声（「Ｈｉ！」）、人物９３ａの会話音声（「Ｇｏｏｄｍｏｒｎｉｎｇ！」）はスピーカ６５Ｌ，６５Ｒから強調されて音声出力される。なお、図３６（Ａ）では人物９４ａは声を出していない状態が図示されているので、スピーカ６５Ｌ，６５Ｒから人物９４ａの会話音声は強調して音声出力されていないが、例えば人物９４ａが声を出している場合には、人物９４ａの会話音声もスピーカ６５Ｌ，６５Ｒから音声出力される。 In addition, using the audio data of the sound picked up by the microphone array 20C, the signal processing unit 50 generates and synthesizes audio data in which directivity is formed in each directivity direction (the directions indicated by reference signs e1, e2, and e3 in FIG. 36A) from the installation position of the microphone array 20C toward each audio position corresponding to the four designated locations designated by the user. In synchronization with the video data captured by the omnidirectional camera 10E, the reproduction unit 60 outputs from the speakers 65L and 65R the audio data obtained by synthesizing three streams: audio data emphasizing the sound in the first directivity direction (see e1 in FIG. 36A) specified by the identification shape 91K, audio data emphasizing the sound in the second directivity direction (see e2 in FIG. 36A) specified by the identification shape 92K, and audio data emphasizing the sound in the third directivity direction (see e3 in FIG. 36A) specified by the identification shape 93K. Accordingly, the conversation voice of the person 91a (“Hello”), the conversation voice of the person 92a (“Hi!”), and the conversation voice of the person 93a (“Good morning!”) are emphasized and output from the speakers 65L and 65R. In FIG. 36A the person 94a is shown not speaking, so no conversation voice of the person 94a is emphasized and output from the speakers 65L and 65R; if the person 94a does speak, however, that voice is also output from the speakers 65L and 65R.
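The synthesis step described above can be illustrated with a minimal Python sketch. The text does not state how the emphasized streams are combined, so plain summation with peak scaling is an assumption here, and the function name `mix_streams` is hypothetical.

```python
import numpy as np

def mix_streams(streams):
    """Mix equal-length emphasized streams into one signal.

    Assumption: the streams are simply summed, and the sum is scaled
    down only when its peak would exceed full scale (1.0).
    """
    mixed = np.sum(np.stack([np.asarray(s, dtype=float) for s in streams]), axis=0)
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed

# Hypothetical emphasized streams for directions e1, e2, e3.
e1 = [0.5, 0.0, -0.25]
e2 = [0.25, 0.25, 0.0]
e3 = [0.0, 0.5, 0.125]
both_speakers = mix_streams([e1, e2, e3])  # same signal sent to 65L and 65R
```

In this sketch the same mixed signal would be sent to both speakers 65L and 65R, which corresponds to the output method described in this passage.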

図３７は、図３６（Ｂ）に示す映像データが表示されている状態において、キーボードの複数の特定キーの同時押下操作に応じて、調整用操作ボックスＯＰＢが表示される様子を示す図である。例えば、ディスプレイ６３に図３６（Ｂ）に示す映像データが表示されている場合に、ユーザが、操作部５５（例えばキーボードの「Ｓｈｉｆｔ」キーと数字「１」キー）の同時押下操作を行ったとする。信号処理部５０は、ユーザの同時押下操作に応じて、スピーカ６５Ｌ又は６５Ｒから音声出力される音声の音量レベルを調整するための調整用操作ボックスＯＰＢを、ディスプレイ６３に表示させる。 FIG. 37 is a diagram illustrating how the adjustment operation box OPB is displayed in response to the simultaneous pressing of a plurality of specific keys on the keyboard while the video data shown in FIG. 36B is displayed. For example, suppose that while the video data shown in FIG. 36B is displayed on the display 63, the user simultaneously presses keys on the operation unit 55 (for example, the “Shift” key and the number “1” key on the keyboard). In response to the simultaneous pressing operation, the signal processing unit 50 causes the display 63 to display the adjustment operation box OPB for adjusting the volume level of the sound output from the speaker 65L or 65R.

図３８は、図３６（Ｂ）に示す映像データが表示されている状態において、ディスプレイ６３に表示された映像データの表示領域外へのクリック操作に応じて、調整用操作ボックスＯＰＢが表示される様子を示す図である。例えば、ディスプレイ６３に図３６（Ｂ）に示す映像データが表示されている場合に、ユーザが、操作部５５（例えばマウス）により、カーソルＭＰＴを映像データの表示領域外に移動させてからクリック操作（例えば右クリック操作）したとする。信号処理部５０は、ユーザのクリック操作に応じて、スピーカ６５Ｌ又は６５Ｒから音声出力される音声の音量レベルを調整するための調整用操作ボックスＯＰＢを、ディスプレイ６３に表示させる。 FIG. 38 is a diagram illustrating how the adjustment operation box OPB is displayed in response to a click operation outside the display area of the video data displayed on the display 63 while the video data shown in FIG. 36B is displayed. For example, suppose that while the video data shown in FIG. 36B is displayed on the display 63, the user moves the cursor MPT outside the display area of the video data with the operation unit 55 (for example, a mouse) and then performs a click operation (for example, a right click). In response to the user's click operation, the signal processing unit 50 causes the display 63 to display the adjustment operation box OPB for adjusting the volume level of the sound output from the speaker 65L or 65R.

（第３の指定方法及び音声出力方法の組み合わせ） (Combination of the third designation method and audio output method)
第３の指定方法は、例えばタッチパネルが設けられたディスプレイ６３、又はタッチパネルとは異なるタッチデバイス（例えばタッチパッド）に対するユーザの指若しくはスタイラスペンによる異なる識別形状の描画操作により、指定箇所を指定する方法である。第３の音声出力方法は、ユーザにより指定された１つ又は複数の指定箇所の音声データを一方のスピーカから音声出力し、同様にユーザにより指定された１つ又は複数の指定箇所の音声データを他方のスピーカから音声出力する合成ステレオ２ｃｈ（チャンネル）出力方法である。以下、説明を分かり易くするために、タッチパネルが設けられたディスプレイ６３に対するユーザの描画操作により、指定箇所が指定されるとして説明する。 The third designation method designates a location by having the user draw a different identification shape with a finger or a stylus pen, for example on the display 63 provided with a touch panel or on a touch device other than the touch panel (for example, a touch pad). The third audio output method is a synthesized stereo 2ch (channel) output method in which the audio data of one or more designated locations designated by the user is output from one speaker, and the audio data of one or more other designated locations designated by the user is output from the other speaker. To keep the explanation simple, the following description assumes that the designated locations are designated by the user's drawing operation on the display 63 provided with a touch panel.

図３９（Ａ）は、第４の実施形態の音声処理システム５Ｄの使用例の説明図である。図３９（Ｂ）は、第１の指定箇所の周囲に表示される第１の識別形状９１Ｌ、第２の指定箇所の周囲に表示される第２の識別形状９２Ｌ、第３の指定箇所の周囲に表示される第３の識別形状９３Ｌ、第４の指定箇所の周囲に表示される第４の識別形状９４Ｌの一例を表示する様子と、第１の識別形状９１Ｌにより特定される第１の指定箇所に対応する第１の音声位置に向かう第１の指向方向の音声を強調した音声データと、第２の識別形状９２Ｌにより特定される第２の指定箇所に対応する第２の音声位置に向かう第２の指向方向の音声を強調した音声データとを合成して第１のスピーカ６５Ｌから出力する様子と、第３の識別形状９３Ｌにより特定される第３の指定箇所に対応する第３の音声位置に向かう第３の指向方向の音声を強調した音声データを第２のスピーカ６５Ｒから出力する様子を示す図である。なお、図３９（Ａ）は図３１（Ａ）と同様であるため、図３９（Ａ）の説明は割愛する。 FIG. 39A is an explanatory diagram of a usage example of the audio processing system 5D of the fourth embodiment. FIG. 39B is a diagram showing three things: an example display of the first identification shape 91L around the first designated location, the second identification shape 92L around the second designated location, the third identification shape 93L around the third designated location, and the fourth identification shape 94L around the fourth designated location; how the audio data emphasizing the sound in the first directivity direction toward the first audio position, which corresponds to the first designated location specified by the first identification shape 91L, is synthesized with the audio data emphasizing the sound in the second directivity direction toward the second audio position, which corresponds to the second designated location specified by the second identification shape 92L, and output from the first speaker 65L; and how the audio data emphasizing the sound in the third directivity direction toward the third audio position, which corresponds to the third designated location specified by the third identification shape 93L, is output from the second speaker 65R. Since FIG. 39A is the same as FIG. 31A, its description is omitted.

ユーザは、例えばディスプレイ６３の画面６８（図３９（Ｂ）参照）に表示された人物９１ａの頭上付近のタッチ及びドラッグによる丸形状の描画操作、人物９２ａの頭上付近のタッチ及びドラッグによる矩形形状の描画操作、人物９３ａの頭上付近のタッチ及びドラッグによる三角形状の描画操作、人物９４ａの頭上付近のタッチ及びドラッグによる六角形状の描画操作により、それぞれ連続的に指定したとする。タッチ及びドラッグによる各形状の描画操作により指定された各箇所は、ユーザにより指定された複数の指定箇所となる。信号処理部５０は、複数の指定箇所が指定された場合に、各指定箇所を適正に区別するために、指定箇所毎に異なる描画操作により描かれた形状を識別形状として各指定箇所の周囲に表示させる。 Suppose, for example, that the user designates locations in succession on the screen 68 of the display 63 (see FIG. 39B) by touching and dragging to draw a circle near the head of the person 91a, a rectangle near the head of the person 92a, a triangle near the head of the person 93a, and a hexagon near the head of the person 94a. The locations designated by these touch-and-drag drawing operations become the plurality of designated locations designated by the user. When a plurality of designated locations are designated, the signal processing unit 50 displays, around each designated location, the shape drawn by the corresponding drawing operation as its identification shape so that the designated locations can be properly distinguished from one another.

具体的には、信号処理部５０は、丸形状の描画操作により指定された人物９１ａの周囲に、人物９１ａが指定されたことを視覚的に明示するための識別形状９１Ｌを表示させ、矩形形状の描画操作により指定された人物９２ａの周囲に、人物９２ａが指定されたことを視覚的に明示するための識別形状９２Ｌを表示させ、三角形状の描画操作により指定された人物９３ａの周囲に、人物９３ａが指定されたことを視覚的に明示するための識別形状９３Ｌを表示させ、六角形状の描画操作により指定された人物９４ａの周囲に、人物９４ａが指定されたことを視覚的に明示するための識別形状９４Ｌを表示させる。識別形状９１Ｌ，９２Ｌ，９３Ｌ，９４Ｌは、あくまで一例であり各形状に限定されず、図３９（Ｂ）では各識別形状は点線により図示されているが、点線に限定されず、例えば実線により図示されても良い。 Specifically, the signal processing unit 50 displays an identification shape 91L around the person 91a designated by the circle drawing operation, an identification shape 92L around the person 92a designated by the rectangle drawing operation, an identification shape 93L around the person 93a designated by the triangle drawing operation, and an identification shape 94L around the person 94a designated by the hexagon drawing operation, each to show visually that the corresponding person has been designated. The identification shapes 91L, 92L, 93L, and 94L are merely examples and are not limited to these shapes. In FIG. 39B each identification shape is drawn with a dotted line, but it is not limited to a dotted line and may be drawn with, for example, a solid line.

また、信号処理部５０は、マイクアレイ２０Ｃによって収音された音声の音声データを用いて、マイクアレイ２０Ｃの設置位置から、ユーザが指定した４つの指定箇所に対応する各音声位置に向かう各指向方向（図３９（Ａ）に示す符号ｅ１，ｅ２，ｅ３で示される方向）に指向性を形成した各音声データを生成して合成する。再生部６０は、例えばディスプレイ６３の中央から左側の表示領域において描画された識別形状９１Ｌ，９２Ｌを１つの音声出力グループとしてグルーピングし、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９１Ｌにより特定される第１の指向方向（図３９（Ａ）に示す符号ｅ１参照）の音声を強調した音声データと、識別形状９２Ｌにより特定される第２の指向方向（図３９（Ａ）に示す符号ｅ２参照）の音声を強調した音声データとを合成した音声データを、スピーカ６５Ｌから音声出力する。更に、再生部６０は、例えばディスプレイ６３の中央から右側の表示領域において描画された識別形状９３Ｌを１つの音声出力グループとしてグルーピングし、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９３Ｌにより特定される第３の指向方向（図３９（Ａ）に示す符号ｅ３参照）の音声を強調した音声データを、スピーカ６５Ｒから音声出力する。従って、人物９１ａの会話音声（「Ｈｅｌｌｏ」）、人物９２ａの会話音声（「Ｈｉ！」）はスピーカ６５Ｌから強調されて音声出力され、人物９３ａの会話音声（「Ｇｏｏｄｍｏｒｎｉｎｇ！」）はスピーカ６５Ｒから強調されて音声出力される。なお、図３９（Ａ）では人物９４ａは声を出していない状態が図示されているので、スピーカ６５Ｌ，６５Ｒから人物９４ａの会話音声は強調して音声出力されていないが、例えば人物９４ａが声を出している場合には、人物９４ａの会話音声もスピーカ６５Ｌ，６５Ｒから音声出力される。 In addition, using the audio data of the sound picked up by the microphone array 20C, the signal processing unit 50 generates and synthesizes audio data in which directivity is formed in each directivity direction (the directions indicated by reference signs e1, e2, and e3 in FIG. 39A) from the installation position of the microphone array 20C toward each audio position corresponding to the four designated locations designated by the user. The reproduction unit 60 groups, for example, the identification shapes 91L and 92L drawn in the display area to the left of the center of the display 63 into one audio output group and, in synchronization with the video data captured by the omnidirectional camera 10E, outputs from the speaker 65L the audio data obtained by synthesizing the audio data emphasizing the sound in the first directivity direction (see e1 in FIG. 39A) specified by the identification shape 91L with the audio data emphasizing the sound in the second directivity direction (see e2 in FIG. 39A) specified by the identification shape 92L. Further, the reproduction unit 60 groups, for example, the identification shape 93L drawn in the display area to the right of the center of the display 63 into one audio output group and, in synchronization with the video data captured by the omnidirectional camera 10E, outputs from the speaker 65R the audio data emphasizing the sound in the third directivity direction (see e3 in FIG. 39A) specified by the identification shape 93L. Accordingly, the conversation voice of the person 91a (“Hello”) and the conversation voice of the person 92a (“Hi!”) are emphasized and output from the speaker 65L, and the conversation voice of the person 93a (“Good morning!”) is emphasized and output from the speaker 65R. In FIG. 39A the person 94a is shown not speaking, so no conversation voice of the person 94a is emphasized and output from the speakers 65L and 65R; if the person 94a does speak, however, that voice is also output from the speakers 65L and 65R.
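The left/right grouping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each identification shape is reduced to the horizontal pixel coordinate of its center, a shape exactly on the center line is routed to the right, and the streams in one group are mixed by plain summation. The function name `route_by_display_half` and all sample values are hypothetical.

```python
import numpy as np

def route_by_display_half(shapes, display_width):
    """Group emphasized streams by which half of the display holds their shape.

    shapes: non-empty list of (x_center, samples) pairs, one per identification shape.
    Returns (left, right): the mixed signals for speakers 65L and 65R.
    """
    center = display_width / 2
    n = len(shapes[0][1])
    left = np.zeros(n)
    right = np.zeros(n)
    for x_center, samples in shapes:
        target = left if x_center < center else right
        target += np.asarray(samples, dtype=float)  # in-place mix into the group
    return left, right

# Hypothetical shape centers and streams for 91L, 92L (left half) and 93L (right half).
shapes = [(100, [0.5, 0.0]), (300, [0.25, 0.25]), (900, [0.125, 0.5])]
left_65L, right_65R = route_by_display_half(shapes, display_width=1280)
```

With these sample values the two left-half streams are summed for speaker 65L and the single right-half stream goes to speaker 65R unchanged.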

また、上述した説明では、再生部６０が、ディスプレイ６３の中央からの左側の表示領域と右側の表示領域とに表示されている識別形状の集合を区分した上で音声出力グループをそれぞれ形成する場合を説明したが、このやり方に限定されない。例えば、ユーザが音声出力グループを任意に指定しても良い。例えば、第１の識別形状９１Ｌと第３の識別形状９３Ｌとがスピーカ６５Ｌから音声出力させるための１つの音声出力グループとして指定され、第２の識別形状９２Ｌがスピーカ６５Ｒから音声出力させるための１つの音声出力グループとして指定されても良い。この場合、再生部６０は、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９１Ｌにより特定される第１の指向方向（図３９（Ａ）に示す符号ｅ１参照）の音声を強調した音声データと、識別形状９３Ｌにより特定される第３の指向方向（図３９（Ａ）に示す符号ｅ３参照）の音声を強調した音声データとを合成した音声データを、スピーカ６５Ｌから音声出力する。更に、再生部６０は、全方位カメラ１０Ｅが撮像した映像データと同期させて、識別形状９２Ｌにより特定される第２の指向方向（図３９（Ａ）に示す符号ｅ２参照）の音声を強調した音声データを、スピーカ６５Ｒから音声出力する。従って、人物９１ａの会話音声（「Ｈｅｌｌｏ」）、人物９３ａの会話音声（「Ｇｏｏｄｍｏｒｎｉｎｇ！」）はスピーカ６５Ｌから強調されて音声出力され、人物９２ａの会話音声（「Ｈｉ！」）はスピーカ６５Ｒから強調されて音声出力される。 The description above covers the case in which the reproduction unit 60 forms the audio output groups by dividing the set of identification shapes into those displayed in the display area to the left of the center of the display 63 and those displayed in the display area to the right, but the grouping is not limited to this method. For example, the user may designate the audio output groups freely. For example, the first identification shape 91L and the third identification shape 93L may be designated as one audio output group to be output from the speaker 65L, and the second identification shape 92L may be designated as one audio output group to be output from the speaker 65R. In this case, in synchronization with the video data captured by the omnidirectional camera 10E, the reproduction unit 60 outputs from the speaker 65L the audio data obtained by synthesizing the audio data emphasizing the sound in the first directivity direction (see e1 in FIG. 39A) specified by the identification shape 91L with the audio data emphasizing the sound in the third directivity direction (see e3 in FIG. 39A) specified by the identification shape 93L. Further, in synchronization with the video data captured by the omnidirectional camera 10E, the reproduction unit 60 outputs from the speaker 65R the audio data emphasizing the sound in the second directivity direction (see e2 in FIG. 39A) specified by the identification shape 92L. Accordingly, the conversation voice of the person 91a (“Hello”) and the conversation voice of the person 93a (“Good morning!”) are emphasized and output from the speaker 65L, and the conversation voice of the person 92a (“Hi!”) is emphasized and output from the speaker 65R.
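A user-designated grouping of this kind can be sketched with an explicit mapping from speaker to identification shapes. The dictionary layout, the function name `route_by_user_groups`, and the sample values are assumptions made for illustration; the text does not describe how the groups are stored or how the streams are mixed.

```python
import numpy as np

def route_by_user_groups(streams, groups):
    """Mix emphasized streams per speaker according to user-designated groups.

    streams: dict mapping a shape label (e.g. "91L") to its emphasized samples.
    groups:  dict mapping a speaker label (e.g. "65L") to a list of shape labels.
    Returns a dict mapping each speaker label to its mixed signal.
    """
    n = len(next(iter(streams.values())))
    output = {}
    for speaker, shape_labels in groups.items():
        mix = np.zeros(n)
        for label in shape_labels:
            mix += np.asarray(streams[label], dtype=float)  # assumed plain summation
        output[speaker] = mix
    return output

# Hypothetical streams; 91L and 93L go to speaker 65L, 92L goes to speaker 65R.
streams = {"91L": [0.5, 0.0], "92L": [0.25, 0.25], "93L": [0.125, 0.5]}
groups = {"65L": ["91L", "93L"], "65R": ["92L"]}
per_speaker = route_by_user_groups(streams, groups)
```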

図４０は、図３９（Ｂ）に示す映像データが表示されている状態において、タッチパネルが設けられたディスプレイ６３に表示された映像データの表示領域外へのタッチに応じて、調整用操作ボックスＯＰＢが表示される様子を示す図である。例えば、タッチパネルが設けられたディスプレイ６３に図３９（Ｂ）に示す映像データが表示されている場合に、ユーザが、映像データの表示領域外をタッチしたとする。信号処理部５０は、ユーザのタッチに応じて、スピーカ６５Ｌ又は６５Ｒから音声出力される音声の音量レベルを調整するための調整用操作ボックスＯＰＢを、ディスプレイ６３に表示させる。 FIG. 40 shows an adjustment operation box OPB in response to a touch outside the display area of the video data displayed on the display 63 provided with the touch panel in a state where the video data shown in FIG. 39B is displayed. It is a figure which shows a mode that is displayed. For example, when the video data shown in FIG. 39B is displayed on the display 63 provided with the touch panel, the user touches outside the display area of the video data. The signal processing unit 50 causes the display 63 to display an adjustment operation box OPB for adjusting the volume level of the sound output from the speaker 65L or 65R according to the user's touch.

以上により、第４の実施形態では、信号処理部５０は、ディスプレイ６３に表示された映像データに対して、ユーザが異なる複数（例えば２箇所）の指定箇所を指定した場合に、映像データ中の異なる各指定箇所に、異なる識別形状（例えば識別形状９１Ｌ，９２Ｌ）を表示させる。 As described above, in the fourth embodiment, when the user designates a plurality of different designated locations (for example, two locations) for the video data displayed on the display 63, the signal processing unit 50 includes the video data in the video data. Different identification shapes (for example, identification shapes 91L and 92L) are displayed at different designated locations.

これにより、音声処理システム５Ｄは、ディスプレイ６３に表示された映像データにおいて、ユーザにより指定された異なる複数の指定箇所を区別して認識することができ、区別した各指定箇所に異なる識別形状として、例えば一方の指定箇所の周囲に丸の識別形状９１Ｌを表示し、他方の指定箇所の周囲に矩形の識別形状９２Ｌを表示することで、複数の指定箇所を区別して認識したことを視覚的にユーザに対して明示することができる。 As a result, the audio processing system 5D can distinguish and recognize the plurality of different designated locations designated by the user in the video data displayed on the display 63. By displaying a different identification shape at each distinguished location, for example the round identification shape 91L around one designated location and the rectangular identification shape 92L around the other, it can show the user visually that the designated locations have been recognized as distinct.

また、音声処理システム５Ｄには、例えば２つのスピーカが設けられ、再生部６０は、マイクアレイ２０から第１の指定箇所に対応する位置（第１の音声位置）に向かう第１の指向方向の音声を強調した第１の音声データを第１のスピーカ６５Ｌから音声出力させ、マイクアレイ２０から第２の指定箇所に対応する位置（第２の音声位置）に向かう第２の指向方向の音声を強調した第２の音声データを第２のスピーカ６５Ｒから音声出力させる。 In addition, for example, two speakers are provided in the audio processing system 5D, and the reproducing unit 60 has a first directivity direction from the microphone array 20 to a position corresponding to the first designated location (first audio position). The first voice data emphasizing the voice is outputted from the first speaker 65L, and the voice in the second directivity direction from the microphone array 20 toward the position corresponding to the second designated position (second voice position) is obtained. The emphasized second audio data is output as audio from the second speaker 65R.

これにより、音声処理システム５Ｄは、例えば２つのスピーカが設けられている場合に、指定箇所毎に、マイクアレイ２０から各指定箇所に対応する音声位置に向かう指向方向の音声を強調した各音声データを、各スピーカ６５Ｌ，６５Ｒから独立して音声出力させることができる。 Thereby, for example, when two speakers are provided, the audio processing system 5D emphasizes the sound in the direction of the direction from the microphone array 20 toward the audio position corresponding to each designated location for each designated location. Can be output from each speaker 65L, 65R independently.

以下、上述した本発明に係る音声処理システム及び音声処理方法の構成、作用及び効果を説明する。 Hereinafter, the configuration, operation, and effect of the above-described voice processing system and voice processing method according to the present invention will be described.

本発明の一実施形態は、映像を撮像する少なくとも１つの撮像部と、前記撮像部により撮像された映像データを表示する表示部と、複数のマイクロホンを含み、前記マイクロホンを用いて音声を収音する収音部と、前記収音部により収音された音声データを音声出力する音声出力部と、前記撮像部により撮像された前記映像データと、前記収音部により収音された前記音声データとを記録する記録部と、前記記録部に記録された前記映像データを前記表示部に表示させ、前記記録部に記録された前記音声データを前記音声出力部に音声出力させる再生部と、前記表示部に表示された前記映像データの１つ以上の指定箇所の指定を受け付ける操作部と、前記記録部に記録された前記音声データを基に、前記収音部から、指定された前記映像データの１つ以上の指定箇所に対応する位置に向かう指向方向の音声を強調した音声データを生成又は合成する信号処理部と、を備える音声処理システムである。 One embodiment of the present invention is an audio processing system comprising: at least one imaging unit that captures video; a display unit that displays the video data captured by the imaging unit; a sound collection unit that includes a plurality of microphones and picks up sound using the microphones; an audio output unit that outputs the audio data picked up by the sound collection unit; a recording unit that records the video data captured by the imaging unit and the audio data picked up by the sound collection unit; a reproduction unit that causes the display unit to display the video data recorded in the recording unit and causes the audio output unit to output the audio data recorded in the recording unit; an operation unit that accepts designation of one or more designated locations in the video data displayed on the display unit; and a signal processing unit that, based on the audio data recorded in the recording unit, generates or synthesizes audio data emphasizing the sound in the directivity direction from the sound collection unit toward the position corresponding to the one or more designated locations in the video data.

この構成によれば、音声処理システムは、既に記録された映像データの再生中に操作部からの所定の指定箇所の指定に応じて、マイクアレイの各マイクロホンが収音した各音声データを用いて、マイクアレイから１つ以上の指定箇所に対応する位置に向かう指向方向に指向性を形成した音声データを信号処理部において生成又は合成する。 According to this configuration, the audio processing system uses each audio data collected by each microphone of the microphone array in accordance with designation of a predetermined designated location from the operation unit during reproduction of the already recorded video data. Then, the signal processing unit generates or synthesizes audio data in which directivity is formed in a directivity direction from the microphone array toward a position corresponding to one or more designated locations.

これにより、音声処理システムは、記録された映像データ及び音声データの再生中に、指定された任意の再生時間に対する映像中の音声データを強調して出力できる。 Thereby, the audio processing system can emphasize and output the audio data in the video for the designated arbitrary reproduction time during the reproduction of the recorded video data and audio data.

また、本発明の一実施形態は、前記再生部が、前記収音部から、前記１つ以上の指定箇所に対応する位置に向かう指向方向の音声を強調した音声データを前記音声出力部に音声出力させる、音声処理システムである。 Also, in one embodiment of the present invention, the reproduction unit causes the audio output unit to output audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to the one or more specified locations is emphasized. This is a voice processing system for outputting.

これにより、音声処理システムは、信号処理部によって、マイクアレイから１つ以上の指定箇所に対応する位置に向かう指向方向に指向性を形成した音声データを音声出力することができる。 Accordingly, the audio processing system can output audio data having directivity formed in the directivity direction from the microphone array toward the position corresponding to the one or more designated locations by the signal processing unit.

また、本発明の一実施形態は、前記撮像部は全方位カメラであり、前記信号処理部は、前記全方位カメラにより撮像された前記映像データが前記表示部に表示されている間に指定された前記１つ以上の指定箇所に応じて、前記１つ以上の指定箇所を含む映像データの座標系を画像変換し、前記再生部は、前記画像変換後の映像データを前記表示部に表示させ、前記収音部から、前記１つ以上の指定箇所に対応する位置に向かう指向方向の音声を強調した音声データを音声出力させる、音声処理システムである。 In one embodiment of the present invention, the imaging unit is an omnidirectional camera, and the signal processing unit is specified while the video data captured by the omnidirectional camera is displayed on the display unit. According to the one or more designated locations, the coordinate system of the video data including the one or more designated locations is image-converted, and the playback unit causes the display unit to display the video data after the image conversion. A sound processing system for outputting sound data in which sound in a directivity direction directed to a position corresponding to the one or more designated locations is output from the sound collection unit.

この構成によれば、音声処理システムは、全方位カメラにより撮像された映像データにおける被写体の位置とマイクアレイにより収音される被写体の人物の音声の方向とを対応付けるための座標系の変換処理を容易に行うことができ、カメラにより撮像された映像データと１つ以上の指定箇所に対応する位置に向かう指向方向に指向性が形成された音声データとを再生部において同期再生処理する場合の処理負荷を軽減できる。 According to this configuration, the sound processing system performs a coordinate system conversion process for associating the position of the subject in the video data captured by the omnidirectional camera with the direction of the sound of the subject person picked up by the microphone array. Processing in the case where video data captured by a camera and audio data in which directivity is formed in a directivity direction toward a position corresponding to one or more designated locations can be easily reproduced in the reproduction unit. The load can be reduced.
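One way such a coordinate conversion could look is sketched below in Python. The text does not give the lens model or the conversion formula, so everything here is an assumption: the camera and microphone array share a downward-pointing axis, the fisheye image follows an equidistant projection (distance from the image center is proportional to the angle from the axis), and the edge of the image circle corresponds to 90 degrees. The function name `pixel_to_direction` and its parameters are hypothetical.

```python
import math

def pixel_to_direction(x, y, cx, cy, image_radius, max_angle_deg=90.0):
    """Convert a designated pixel to a directivity direction.

    (cx, cy) is the image center and image_radius the radius of the
    fisheye image circle in pixels. Returns (azimuth_deg, off_axis_deg),
    where azimuth is measured in the image plane and off_axis is the
    angle from the shared camera/microphone-array axis.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r > image_radius:
        raise ValueError("point lies outside the fisheye image circle")
    azimuth = math.degrees(math.atan2(dy, dx)) % 360.0
    off_axis = (r / image_radius) * max_angle_deg  # equidistant projection (assumed)
    return azimuth, off_axis

# A point halfway between the center and the edge, directly along +x.
azimuth, off_axis = pixel_to_direction(700, 500, cx=500, cy=500, image_radius=400)
```

Because the two devices are assumed to share one axis, the resulting angles can be used for both the image and the microphone array without a further offset, which is the simplification the coaxial arrangement is meant to provide.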

また、本発明の一実施形態は、前記撮像部と前記収音部とが、同軸上に配置される、音声処理システムである。 Another embodiment of the present invention is an audio processing system in which the imaging unit and the sound collection unit are arranged coaxially.

これにより、音声処理システムは、音声処理システムにおける全方位カメラとマイクアレイとが同一の中心軸を有するように設置されるので、全方位カメラ１０Ｅとマイクアレイ２０Ｃとの座標系を同一にすることができる。 Thereby, the audio processing system is installed so that the omnidirectional camera and the microphone array in the audio processing system have the same central axis, so that the coordinate systems of the omnidirectional camera 10E and the microphone array 20C are made the same. Can do.

また、本発明の一実施形態は、前記撮像部と前記収音部とが、室内の天井に配置される、音声処理システムである。 Another embodiment of the present invention is an audio processing system in which the imaging unit and the sound collection unit are arranged on an indoor ceiling.

これにより、音声処理システムの設置が簡易化できる。 Thereby, installation of the voice processing system can be simplified.

また、本発明の一実施形態は、前記信号処理部は、前記表示部に表示された前記映像データに対して異なる複数箇所の指定に応じて、前記映像データにおける各指定箇所に異なる識別形状を表示させる、音声処理システムである。 Further, according to an embodiment of the present invention, the signal processing unit has different identification shapes for each designated location in the video data in response to designation of different locations for the video data displayed on the display unit. This is a voice processing system to be displayed.

この構成によれば、信号処理部は、ディスプレイに表示された映像データに対して、ユーザが異なる複数（例えば２箇所）の指定箇所を指定した場合に、映像データ中の異なる各指定箇所に、異なる識別形状を表示させる。 According to this configuration, when the user designates a plurality of different designated locations (for example, two locations) for the video data displayed on the display, each signal designated in the designated location in the video data, Different identification shapes are displayed.

これにより、音声処理システムは、ディスプレイに表示された映像データにおいて、ユーザにより指定された異なる複数の指定箇所を区別して認識することができ、区別した各指定箇所に異なる識別形状として、例えば一方の指定箇所の周囲に矩形の識別形状を表示し、他方の指定箇所の周囲に丸の識別形状を表示することで、複数の指定箇所を区別して認識したことを視覚的にユーザに対して明示することができる。 As a result, the audio processing system can distinguish and recognize a plurality of different designated locations designated by the user in the video data displayed on the display. By displaying a rectangular identification shape around the specified location and displaying a round identification shape around the other specified location, it is clearly indicated to the user that a plurality of specified locations have been recognized and recognized. be able to.

また、本発明の一実施形態は、前記音声出力部が、第１の音声出力部と、第２の音声出力部とを含み、前記再生部が、前記収音部から第１の指定箇所に対応する位置に向かう第１の指向方向の音声を強調した第１の音声データを前記第１の音声出力部から音声出力させ、前記収音部から第２の指定箇所に対応する位置に向かう第２の指向方向の音声を強調した第２の音声データを前記第２の音声出力部から音声出力させる、音声処理システムである。 In one embodiment of the present invention, the audio output unit includes a first audio output unit and a second audio output unit, and the reproduction unit is moved from the sound collection unit to the first designated location. The first sound data in which the sound in the first directivity direction toward the corresponding position is emphasized is output from the first sound output unit, and the first sound data toward the position corresponding to the second designated location is output from the sound collection unit. In the speech processing system, the second speech data in which the speech in the two directivity directions is emphasized is output from the second speech output unit.

この構成によれば、音声処理システムには例えば２つのスピーカが設けられ、再生部は、マイクアレイから第１の指定箇所に対応する位置（第１の音声位置）に向かう第１の指向方向の音声を強調した第１の音声データを第１のスピーカから音声出力させ、マイクアレイから第２の指定箇所に対応する位置（第２の音声位置）に向かう第２の指向方向の音声を強調した第２の音声データを第２のスピーカから音声出力させる。 According to this configuration, the audio processing system is provided with, for example, two speakers, and the reproduction unit has a first directivity direction from the microphone array to a position corresponding to the first designated location (first audio position). The first voice data with enhanced voice is output from the first speaker, and the voice in the second directivity direction from the microphone array toward the position corresponding to the second designated position (second voice position) is emphasized. The second audio data is output from the second speaker.

これにより、音声処理システムは、例えば２つのスピーカが設けられている場合に、指定箇所毎に、マイクアレイから各指定箇所に対応する音声位置に向かう指向方向の音声を強調した各音声データを、各スピーカから独立して音声出力させることができる。 Thereby, for example, when two speakers are provided, the voice processing system, for each designated place, each voice data that emphasizes the voice in the directivity direction from the microphone array to the voice position corresponding to each designated place, Audio can be output independently from each speaker.

また、本発明の一実施形態は、前記音声出力部が、第１の音声出力部と、第２の音声出力部とを含み、前記再生部が、前記収音部から異なる複数の指定箇所に対応する位置に向かう異なる複数の指向方向の音声を強調した音声データが合成された音声データを前記第１の音声出力部から音声出力させ、前記収音部から残りの１つ以上の指定箇所に対応する位置に向かう残りの１つ以上の指向方向の音声を強調した音声データを前記第２の音声出力部から音声出力又は合成音声出力させる、音声処理システムである。 In one embodiment of the present invention, the audio output unit includes a first audio output unit and a second audio output unit, and the playback unit is provided at a plurality of designated locations different from the sound collection unit. Voice data obtained by synthesizing voice data that emphasizes voices in different directivity directions toward the corresponding position is output from the first voice output unit, and the remaining one or more designated locations are output from the sound pickup unit. In the speech processing system, speech data that emphasizes speech in one or more remaining directivity directions toward a corresponding position is output from the second speech output unit as speech or synthesized speech.

この構成によれば、音声処理システムには例えば２つのスピーカが設けられ、再生部は、マイクアレイから異なる複数の指定箇所に対応する位置（例えば第１，第２の各音声位置）に向かう第１，第２の各指向方向の音声を強調した音声データが合成された音声データを第１のスピーカから音声出力させ、更に、マイクアレイから残りの１つ以上の指定箇所に対応する位置（例えば第３の音声位置）に向かう残りの１つ以上の指向方向の音声を強調した音声データを第２のスピーカから音声出力させる。 According to this configuration, the audio processing system is provided with, for example, two speakers, and the playback unit moves from the microphone array to positions corresponding to a plurality of different designated locations (for example, first and second audio positions). Audio data obtained by synthesizing audio data that emphasizes audio in the first and second directivity directions is output from the first speaker, and the position corresponding to one or more remaining designated locations from the microphone array (for example, Audio data that emphasizes the remaining voice in one or more directivity directions toward the third audio position) is output from the second speaker.

これにより、音声処理システムは、例えば２つのスピーカが設けられている場合に、マイクアレイから複数（例えば２つ）の指向方向の音声を強調した各音声データを合成して一方のスピーカから音声出力でき、更に他の指向方向の音声を強調した音声データを他方のスピーカから音声出力できる。 As a result, when two speakers are provided, for example, the audio processing system synthesizes each audio data in which a plurality of (for example, two) directional sounds are emphasized from the microphone array and outputs the audio from one speaker. In addition, voice data in which voice in another direction is emphasized can be outputted from the other speaker.

また、本発明の一実施形態は、１つ以上の前記音声出力部を含み、前記再生部が、前記収音部から異なる複数の指定箇所に対応する位置に向かう異なる複数の指向方向の音声を強調した音声データが合成された音声データを、１つ以上の前記音声出力部から音声出力させる、音声処理システムである。 In addition, one embodiment of the present invention includes one or more audio output units, and the reproduction unit outputs audio in different directivity directions from the sound collection unit to positions corresponding to different designated locations. It is an audio processing system for outputting audio data obtained by synthesizing emphasized audio data from one or more audio output units.

この構成によれば、音声処理システムには例えば１つ以上のスピーカが設けられ、再生部は、マイクアレイから第１の指定箇所に対応する位置（第１の音声位置）に向かう第１の指向方向の音声を強調した第１の音声データと、マイクアレイから第２の指定箇所に対応する位置（第２の音声位置）に向かう第２の指向方向の音声を強調した第２の音声データと、マイクアレイから第３の指定箇所に対応する位置（第３の音声位置）に向かう第３の指向方向の音声を強調した第３の音声データとが合成された音声データを、１つ以上のスピーカから音声出力させる。 According to this configuration, the audio processing system is provided with, for example, one or more speakers, and the reproduction unit has the first directivity from the microphone array toward the position corresponding to the first designated location (first audio position). First voice data in which voice in the direction is emphasized, second voice data in which voice in the second directivity direction from the microphone array toward the position corresponding to the second designated location (second voice position) is emphasized, and , Voice data obtained by synthesizing the third voice data that emphasizes the voice in the third directivity direction from the microphone array toward the position corresponding to the third designated position (third voice position). Sound is output from the speaker.

これにより、音声処理システムは、例えば１つ以上のスピーカが設けられている場合に、マイクアレイから複数（例えば３つ）の指向方向の音声を強調した各音声データを合成してスピーカから音声出力でき、更に複数のスピーカが設けられている場合には合成された音声データを同時に音声出力できる。 As a result, the audio processing system, for example, when one or more speakers are provided, synthesizes each audio data in which a plurality of (for example, three) directional sounds are emphasized from the microphone array and outputs the audio from the speakers. In addition, when a plurality of speakers are provided, the synthesized audio data can be output simultaneously.

また、本発明の一実施形態は、前記信号処理部が、所定の入力操作又は前記表示部に表示された前記映像データの表示領域外への指定操作に応じて、前記音声出力部から音声出力された前記音声データのパラメータ調整操作用媒体を表示する、音声処理システムである。 In one embodiment of the present invention, the signal processing unit outputs audio from the audio output unit in response to a predetermined input operation or a designation operation outside the display area of the video data displayed on the display unit. The voice processing system displays a parameter adjustment operation medium for the voice data.

この構成によれば、音声処理システムは、所定の入力操作（例えばマウスの右クリック操作）又はディスプレイに表示された映像データの表示領域外への指定操作（例えばマウスの左クリック操作）により、スピーカから音声出力されている音声データのパラメータ（例えば、音量レベル）の調整操作を受け付ける調整操作用ボックスを簡易に表示することができる。 According to this configuration, the audio processing system can perform a speaker operation by performing a predetermined input operation (for example, a right click operation of the mouse) or a designation operation (for example, a left click operation of the mouse) outside the display area of the video data displayed on the display. It is possible to easily display an adjustment operation box that accepts an adjustment operation of a parameter (for example, volume level) of audio data that is output from the audio.
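The trigger for the designation operation outside the display area can be sketched as a simple hit test. This is a minimal illustration, assuming the video display area is an axis-aligned rectangle in screen pixels; the names `Rect` and `should_show_adjustment_box` are hypothetical, and the predetermined input operation (key press or mouse button) is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (assumed layout of the video area)."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        # Right and bottom edges are treated as outside the rectangle.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

def should_show_adjustment_box(video_area: Rect, x: int, y: int) -> bool:
    """Return True when a click or touch at (x, y) falls outside the video display area."""
    return not video_area.contains(x, y)

video_area = Rect(left=0, top=0, width=1280, height=720)
inside = should_show_adjustment_box(video_area, 640, 360)    # click on the video
outside = should_show_adjustment_box(video_area, 1300, 360)  # click beside the video
```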

In one embodiment of the present invention, in the audio processing system, the signal processing unit displays a medium for indicating the parameter state of the audio data output from the audio output unit, either at all times or in response to a predetermined input operation or a designation operation outside the display area of the video data displayed on the display unit.

With this configuration, either at all times or in response to a predetermined input operation (for example, a right-click of the mouse) or a designation operation outside the display area of the video data displayed on the display (for example, a left-click of the mouse), the audio processing system can easily display a state indication box that serves as an indicator of the state of a parameter (for example, the volume level) of the audio data being output from the speaker.

In one embodiment of the present invention, in the audio processing system, each time a predetermined input operation or a designation operation outside the display area of the video data displayed on the display unit is performed, the signal processing unit switches the display on the display unit between the video data captured by the imaging unit and the medium for parameter adjustment operation of the audio data output from the audio output unit.

With this configuration, each time a predetermined input operation or a designation operation outside the display area of the video data displayed on the display (for example, a left-click of the mouse) is performed, the audio processing system can easily switch the display between the video data captured by the camera and the adjustment operation box that accepts an adjustment operation for a parameter (for example, the volume level) of the audio data being output from the speaker.
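The switching behaviour described above can be sketched as a small state holder that flips between the video view and the adjustment operation box whenever a click lands outside the video display area. The class names `Rect` and `DisplayToggle`, the pixel coordinates, and the use of a rectangular video display area are assumptions made for this example and do not come from the description.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned area in display pixels (hypothetical representation)."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


class DisplayToggle:
    """Tracks whether the adjustment operation box is currently shown."""

    def __init__(self, video_area: Rect) -> None:
        self.video_area = video_area
        self.showing_adjustment_box = False

    def on_click(self, x: float, y: float) -> bool:
        # Only a click outside the video display area switches the view;
        # clicks inside it are left for designating locations in the video.
        if not self.video_area.contains(x, y):
            self.showing_adjustment_box = not self.showing_adjustment_box
        return self.showing_adjustment_box


toggle = DisplayToggle(Rect(0, 0, 640, 480))
toggle.on_click(700, 100)   # outside the video: adjustment box shown
toggle.on_click(100, 100)   # inside the video: no change
toggle.on_click(700, 100)   # outside again: back to the video view
```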

In one embodiment of the present invention, in the audio processing system, in response to a drawing operation of a predetermined shape that contains a designated location of the video data displayed on the display unit at its center, the signal processing unit generates or synthesizes audio data in which sound in the directivity direction from the sound collection unit toward the position corresponding to the designated location is emphasized.

With this configuration, by a simple drawing operation (for example, a touch operation followed by a slide operation while touching) that draws a predetermined shape (for example, a rectangle) containing a designated location of the video data displayed on the display at its center, the audio processing system can generate or synthesize audio data in which sound in the directivity direction from the microphone array toward the position corresponding to the designated location is emphasized.
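One way to read the drawing operation is that the rectangle is defined by the touch-down point and the point where the slide ends, and that the designated location is the center of that rectangle. The sketch below follows that reading; the function name `designated_point` and the two-corner representation are assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[float, float]


def designated_point(touch_down: Point, slide_end: Point) -> Point:
    """Return the center of the rectangle spanned by the two drag corners.

    Assumption: the drawn rectangle is axis-aligned and its center is
    taken as the designated location in the video data.
    """
    (x0, y0), (x1, y1) = touch_down, slide_end
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# A drag from (100, 50) to (300, 150) designates the point (200.0, 100.0).
center = designated_point((100, 50), (300, 150))
```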

In one embodiment of the present invention, in the audio processing system, in response to re-designation of an identification shape displayed for each designated location, the signal processing unit generates or synthesizes audio data in which emphasis of the sound in the directivity direction from the sound collection unit toward the position corresponding to the designated location at which the re-designated identification shape is displayed is stopped.

With this configuration, when an identification shape displayed for a designated location is designated again, the audio processing system can easily generate or synthesize audio data in which emphasis of the sound in the directivity direction from the microphone array toward the position corresponding to that designated location is stopped.
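The re-designation behaviour can be sketched as a selection list in which designating a new point adds it, while designating a point that falls on an existing identification shape removes it, so that emphasis toward that location stops. The class name `EmphasisSelection` and the circular hit test with a `hit_radius` in pixels are assumptions made for this example.

```python
from typing import List, Tuple


class EmphasisSelection:
    """Keeps the designated locations whose directions are emphasized."""

    def __init__(self, hit_radius: float = 20.0) -> None:
        # Hypothetical size of the identification shape drawn at each point.
        self.hit_radius = hit_radius
        self.points: List[Tuple[float, float]] = []

    def designate(self, x: float, y: float) -> bool:
        """Add a new point, or remove the existing one that was hit.

        Returns True if emphasis was started for a new location and
        False if emphasis was stopped for a re-designated location.
        """
        for point in self.points:
            dx, dy = point[0] - x, point[1] - y
            if dx * dx + dy * dy <= self.hit_radius ** 2:
                self.points.remove(point)
                return False
        self.points.append((x, y))
        return True


selection = EmphasisSelection()
selection.designate(100, 100)   # first designated location
selection.designate(300, 200)   # second designated location
selection.designate(105, 98)    # hits the first shape: emphasis stopped
```

After these three calls only the second location remains in `selection.points`, so only its direction would be emphasized in the next generated or synthesized audio data.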

One embodiment of the present invention is an audio processing method including: capturing video with at least one imaging unit; collecting sound with a sound collection unit including a plurality of microphones; displaying video data captured by the imaging unit on a display unit; recording the video data captured by the imaging unit and the audio data collected by the sound collection unit; displaying the recorded video data on the display unit and causing an audio output unit to output the recorded audio data; accepting designation of one or more designated locations of the video data displayed on the display unit; and generating or synthesizing, based on the recorded audio data, audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to the one or more designated locations of the video data is emphasized.

According to this method, during reproduction of already recorded video data and in response to designation of a designated location from the operation unit, the audio processing system uses the audio data collected by each microphone of the microphone array to generate or synthesize, in the signal processing unit, audio data in which directivity is formed in a directivity direction from the microphone array toward the position corresponding to the one or more designated locations.
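The reference sign list names delay units and an adder in the signal processing path, which is consistent with delay-and-sum beamforming. The following minimal sketch forms directivity toward a target position under that assumption. The function name `delay_and_sum`, the two-dimensional microphone coordinates in meters, the rounding of delays to whole samples, and the default speed of sound are all assumptions made for this example, not details taken from the description.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  mic_positions: Sequence[Point],
                  target: Point,
                  sample_rate: float,
                  speed_of_sound: float = 343.0) -> List[float]:
    """Emphasize sound arriving from `target` using per-channel delays."""
    distances = [math.dist(m, target) for m in mic_positions]
    farthest = max(distances)
    # Sound reaches nearer microphones earlier, so those channels are
    # delayed until they line up with the farthest microphone.
    delays = [round((farthest - d) / speed_of_sound * sample_rate)
              for d in distances]
    length = min(len(c) for c in channels)
    out = [0.0] * length
    for channel, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += channel[n - delay]
    # Normalize by the channel count so aligned sound keeps its amplitude.
    return [v / len(channels) for v in out]


# Two microphones 5 cm apart; an impulse from (1.0, 0.0) reaches the
# nearer microphone 5 samples earlier at this hypothetical sample rate.
rate = 34300
mics = [(0.0, 0.0), (0.05, 0.0)]
ch0 = [0.0] * 128
ch0[100] = 1.0
ch1 = [0.0] * 128
ch1[95] = 1.0
steered_on_target = delay_and_sum([ch0, ch1], mics, (1.0, 0.0), rate)
steered_elsewhere = delay_and_sum([ch0, ch1], mics, (0.0, 1.0), rate)
```

When the beam is steered at the true source position, the two impulses align and add coherently; when it is steered elsewhere they stay apart and each contributes only half amplitude, which is the emphasis effect the method relies on.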

Various embodiments have been described above with reference to the drawings, but it goes without saying that the present invention is not limited to these examples. It is clear that those skilled in the art can conceive of various changes or modifications within the scope of the claims, and it is understood that these naturally fall within the technical scope of the present invention.

The present invention is useful as an audio processing system and an audio processing method that can emphasize and output audio data in a directivity direction toward a position corresponding to one or more designated locations designated on a display screen on which captured video data is displayed, and that allow the state of a parameter of the audio data to be recognized easily.

5A, 5B, 5C, 5D Audio processing system
10, 10A, 10B, 10C Camera
10E Omnidirectional camera
20, 20A, 20C, 20D, 20E, 20F Microphone array
22, 22A, 22B, 22C, 22D, 22E, 22F, 22a, 22b, 22c, 22n-1, 22n Microphone
30, 30A Network
40 Audio processing device
45, 45A Recorder
50, 71 Signal processing unit
51a, 51b, 51c, 51n-1, 51n A/D converter
52a, 52b, 52c, 52n-1, 52n Delay unit
55, 78 Operation unit
57 Adder
60, 60A, 60B Reproduction unit
63, 73 Display
65, 75, 82, 83 Speaker
101 Main housing
103 Punching metal cover
105 Microphone sheet metal
107 Base sheet metal
111 Annular bottom
113 Microphone hole
117 Main housing outer peripheral wall
127 Microphone board
129 Microphone housing
133 Annular top plate
135 Base sheet metal outer peripheral wall
139 Main board
141 Power supply board
143 Fitting part
145 Outer clamping piece
147 Inner clamping piece
149 Gap

Claims

An audio processing system comprising:
an imaging unit that captures video;
a display unit that has a rectangular display area and displays video data captured by the imaging unit in a circular video display area smaller than the rectangular display area;
a sound collection unit that includes a plurality of microphones and collects sound using the microphones;
an audio output unit that outputs audio data collected by the sound collection unit;
an operation unit that accepts a designation on the display unit; and
a signal processing unit that, based on the audio data, generates or synthesizes emphasized audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to a designated location of the video data designated via the operation unit is emphasized, and causes the audio output unit to output the emphasized audio data,
wherein, in response to the operation unit accepting a designation outside the video display area, the signal processing unit displays a state display area or an adjustment operation area for the audio output so as to straddle the boundary line of the video display area, within a rectangular region whose diagonal runs from any one of the four corners of the rectangular display area to the center of the video display area.

The audio processing system according to claim 1,
wherein, when the operation unit accepts an audio output adjustment operation while the state display area or the adjustment operation area is displayed, the signal processing unit adjusts the audio output of the emphasized audio data to the audio output unit.

The audio processing system according to claim 2,
wherein the audio output adjustment operation includes a directivity switching operation that switches between the emphasized audio data and non-emphasized audio data for output to the audio output unit.

The audio processing system according to claim 1,
wherein, when the operation unit accepts a designation outside the video display area while the state display area or the adjustment operation area for the audio output of the emphasized audio data is displayed, the signal processing unit hides the state display area or the adjustment operation area for the audio output of the emphasized audio data.

An audio processing system comprising:
an imaging unit that captures video;
a display unit that has a rectangular display area and displays video data captured by the imaging unit in a circular video display area smaller than the rectangular display area;
a sound collection unit that includes a plurality of microphones and collects sound using the microphones;
an audio output unit that outputs audio data collected by the sound collection unit;
an operation unit that accepts a designation on the display unit; and
a signal processing unit that, based on the audio data, generates or synthesizes a plurality of emphasized audio data in which sound in the directivity directions from the sound collection unit toward positions corresponding to each of a plurality of designated locations designated via the operation unit is emphasized,
displays, in response to the operation unit accepting a designation outside the video display area, an adjustment operation area for the emphasized audio data so as to straddle the boundary line of the video display area, within a rectangular region whose diagonal runs from any one of the four corners of the rectangular display area to the center of the video display area, and
when the operation unit accepts an audio output adjustment operation while one of the plurality of designated locations is selected via the operation unit, adjusts the audio output to the audio output unit of the emphasized audio data in which sound in the directivity direction toward the position corresponding to the selected designated location is emphasized.

The audio processing system according to claim 5,
wherein the audio output adjustment operation includes a directivity switching operation that switches between the emphasized audio data and non-emphasized audio data for output to the audio output unit.

The audio processing system according to claim 5,
wherein, when the operation unit accepts a designation outside the video display area while the state display area or the adjustment operation area for the audio output of the emphasized audio data is displayed, the signal processing unit hides the state display area or the adjustment operation area for the audio output of the emphasized audio data.

An audio processing method comprising:
capturing video with an imaging unit;
collecting sound with a sound collection unit including a plurality of microphones;
displaying video data captured by the imaging unit in a circular video display area smaller than a rectangular display area of a display unit having the rectangular display area;
displaying the video data on the display unit and causing an audio output unit to output audio data collected by the sound collection unit;
accepting, via an operation unit, a designation on the display unit;
generating or synthesizing, based on the audio data, emphasized audio data in which sound in a directivity direction from the sound collection unit toward a position corresponding to a designated location of the video data designated via the operation unit is emphasized, and causing the audio output unit to output the emphasized audio data; and
displaying, in response to the operation unit accepting a designation outside the video display area, a state display area or an adjustment operation area for the audio output so as to straddle the boundary line of the video display area, within a rectangular region whose diagonal runs from any one of the four corners of the rectangular display area to the center of the video display area.

An audio processing method comprising:
capturing video with an imaging unit;
collecting sound with a sound collection unit including a plurality of microphones;
displaying video data captured by the imaging unit in a circular video display area smaller than a rectangular display area of a display unit having the rectangular display area;
displaying the video data on the display unit and causing an audio output unit to output audio data collected by the sound collection unit;
accepting, via an operation unit, a designation on the display unit;
generating or synthesizing, based on the audio data, a plurality of emphasized audio data in which sound in the directivity directions from the sound collection unit toward positions corresponding to each of a plurality of designated locations designated via the operation unit is emphasized;
displaying, in response to the operation unit accepting a designation outside the video display area, an adjustment operation area for the emphasized audio data so as to straddle the boundary line of the video display area, within a rectangular region whose diagonal runs from any one of the four corners of the rectangular display area to the center of the video display area; and
when the operation unit accepts an audio output adjustment operation while one of the plurality of designated locations is selected via the operation unit, adjusting the audio output to the audio output unit of the emphasized audio data in which sound in the directivity direction toward the position corresponding to the selected designated location is emphasized.