JP4411959B2

JP4411959B2 - Audio collection / video imaging equipment

Info

Publication number: JP4411959B2
Application number: JP2003421437A
Authority: JP
Inventors: 竜一田中
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-12-18
Filing date: 2003-12-18
Publication date: 2010-02-10
Anticipated expiration: 2023-12-18
Also published as: JP2005184386A

Description

The present invention relates to a sound collection / video imaging apparatus and an imaging condition determination method suitable for use when, for example, conference attendees in two conference rooms hold a video conference.
In particular, the present invention relates to a sound collection / video imaging apparatus that accurately selects the microphone used by the person speaking and selects, from among a plurality of imaging means, the one imaging means suited to imaging the speaker who is using the selected microphone, so that the selected speaker can be imaged appropriately.
The present invention further relates to a sound collection / video imaging apparatus and method that perform voiceprint authentication and allow the microphone installation area selected on the basis of that authentication to be imaged by the selected imaging means.

Video conference systems are used so that attendees in two conference rooms at separate locations can hold a conference. A video conference system images the attendees in each conference room with imaging means, collects their voices with microphones, transmits the captured images and the collected sound over a communication path, displays the images on the display unit of a television receiver in the other party's conference room, and outputs the sound from a speaker.

Such video conference systems encounter the problem that, in each conference room, the voice of a speaker located far from the imaging means and the microphone is difficult to pick up; as a countermeasure, a microphone is sometimes provided for each attendee. There is also the problem that the sound output from the speaker of the television receiver is difficult to hear for attendees located far from that speaker.

JP 2003-87887 A and JP 2003-87890 A disclose an audio input/output device in which microphones and a speaker are integrated and which is used, in addition to an ordinary video conference system that provides video and audio, when a video conference is held between conference rooms at separate locations. With this device, the voices of the attendees in the other party's conference room are heard clearly from the speaker, and the device is less susceptible to noise in the local conference room or places less load on the echo canceller.

For example, as described with reference to FIGS. 5 to 8, 9, and 23 of JP 2003-87887 A, the audio input/output device disclosed in that publication has a structure in which the following are arranged from bottom to top: a speaker box 5 with a built-in speaker 6; a conical reflector 4 that opens radially upward and diffuses sound; a sound shielding plate 3; and a plurality of unidirectional microphones (four in FIGS. 6 and 7, six in FIG. 23) supported by a column 8 and arranged radially at equal angles in a horizontal plane. The sound shielding plate 3 shields the microphones so that sound from the speaker box 5 below does not enter them.
The audio input/output devices disclosed in JP 2003-87887 A and JP 2003-87890 A are used as means for complementing a video conference system that provides video and audio.
JP 2003-87887 A, JP 2003-87890 A

Conventionally, the microphone that records a speaker's voice has been selected manually, either by the speaker or by a third party such as the moderator.
In addition, a single television camera has been fixed to a wall or the like, and its orientation has been adjusted on the basis of the selection data to change the shooting direction and capture video of the speaker. Such a method, however, encounters the following problems.
(1) If switching the microphone is forgotten, video of the speaker is not captured.
(2) Because moving the camera takes time, shooting cannot keep up when the speaker changes frequently.
(3) Video from a plurality of directions cannot be transmitted simultaneously.

An object of the present invention is to provide a sound collection / video imaging apparatus that accurately selects the microphone of the person speaking, further selects the imaging means corresponding to the selected microphone, and outputs the voice and video of the speaker who is talking through the selected microphone.

According to the present invention, there is provided a sound collection / video imaging apparatus comprising: a plurality of microphones arranged in a ring, each oriented radially; a plurality of first small imaging means, one provided for each of the microphones and disposed close to the corresponding microphone so as to be able to image the sound collection range of that microphone; at least one second small imaging means disposed at a predetermined position between the pairs each formed by a microphone and its corresponding first small imaging means, so as to be able to image the sound collection range of the microphone concerned; storage means that stores a first relationship between each microphone and its corresponding first small imaging means and, where a second small imaging means is located near a microphone, a second relationship between that microphone and that second small imaging means; microphone selection means that detects the sound collection signals of the plurality of microphones and selects the microphone that has detected a valid sound collection signal among them; imaging means selection means that selects, on the basis of the first relationship stored in the storage means, the first small imaging means close to the selected microphone and, when the second relationship stored in the storage means indicates a corresponding second small imaging means, also selects that second small imaging means; imaging signal selection means that selectively outputs a first imaging signal captured by the first small imaging means selected by the imaging means selection means and, when a corresponding second small imaging means exists, a second imaging signal captured by that second small imaging means; and image synthesis means that combines the first imaging signal and the second imaging signal selected by the imaging signal selection means into one image or divides them within one screen.
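The selection chain recited above (stored relationships, microphone selection, imaging means selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the table contents, the camera names, the `threshold` parameter, and the rule that the loudest microphone above the threshold counts as the one with a valid sound collection signal are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical contents of the storage means.
# First relationship: every microphone has one nearby first small camera.
FIRST_RELATION: Dict[int, str] = {mic: f"cam1_{mic}" for mic in range(1, 7)}
# Second relationship: only some microphones have a second small camera nearby.
SECOND_RELATION: Dict[int, str] = {1: "cam2_a", 2: "cam2_a", 4: "cam2_b"}


def select_microphone(levels: Sequence[float], threshold: float = 0.1) -> Optional[int]:
    """Return the 1-based number of the loudest microphone, or None.

    Assumption: a signal is "valid" when its level reaches the threshold.
    """
    if not levels:
        return None
    best = max(range(len(levels)), key=lambda i: levels[i])
    return best + 1 if levels[best] >= threshold else None


def select_cameras(mic: int) -> List[str]:
    """Pick the first camera for the microphone, plus a second one if stored."""
    cameras = [FIRST_RELATION[mic]]
    second = SECOND_RELATION.get(mic)
    if second is not None:
        cameras.append(second)
    return cameras
```

With these illustrative tables, a microphone that has no entry in the second relationship yields a single camera, while one that has an entry yields two signals for the image synthesis stage to combine or to split within one screen.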

According to the present invention, the video and voice of the speaker using the selected microphone can be switched and output automatically and quickly.
That is, according to the present invention, one of the plurality of microphones is selected for use, the imaging means that images the speaker's position at the selected microphone is selected, and the speaker using the microphone can be imaged effectively by the selected imaging means. According to the present invention, even if the speaker changes during a conference, the imaging means that shows the speaker is selected quickly and appropriately, together with rapid switching of the microphone.
Furthermore, in the present invention, because the imaging means is switched automatically in accordance with the microphone selection, there is no need to change settings manually as in the prior art, and a clear image of the speaker using the selected microphone can be shown continuously.

According to the present invention, a plurality of imaging means are provided, so that, when necessary, video of the speaker and video of directions or persons other than the speaker can be captured simultaneously and transmitted simultaneously. According to the present invention, it is also possible to show only a specific direction regardless of the speaker or, conversely, to keep video of a specific direction from being shown.
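One way to picture the two options just mentioned (showing only a specific direction, or never showing a specific direction) is as a filter applied to the cameras chosen for the current speaker. The sketch below is illustrative only; the function name, the `blocked` and `pinned` parameters, and the rule that a pinned camera overrides everything else are assumptions, not details taken from this description.

```python
from typing import AbstractSet, Iterable, List, Optional


def apply_direction_policy(
    selected: Iterable[str],
    blocked: AbstractSet[str] = frozenset(),
    pinned: Optional[str] = None,
) -> List[str]:
    """Filter the speaker-driven camera selection.

    pinned:  if set, only this camera is shown, regardless of the speaker.
    blocked: cameras whose direction must never be shown.
    """
    if pinned is not None:
        return [pinned]
    return [cam for cam in selected if cam not in blocked]
```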

Furthermore, according to the present invention, for very short utterances such as mere interjections of agreement, providing a period during which no processing is performed makes it possible to avoid selecting, each time, the camera that can shoot in that direction.
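The suppression of very short utterances can be read as a hold time before a newly active microphone takes over. The sketch below shows one possible interpretation; the class name, the `hold_s` parameter, and its default of one second are hypothetical, since the description at this point gives no specific duration or mechanism.

```python
from typing import Optional


class SpeakerSwitchGate:
    """Switch to a new microphone only after it has stayed active for hold_s seconds."""

    def __init__(self, hold_s: float = 1.0) -> None:
        self.hold_s = hold_s
        self.current: Optional[int] = None    # microphone whose camera is live
        self.candidate: Optional[int] = None  # microphone waiting for confirmation
        self.candidate_since: float = 0.0

    def update(self, active_mic: Optional[int], now: float) -> Optional[int]:
        """Feed the currently active microphone; return the confirmed one."""
        if active_mic is None or active_mic == self.current:
            # Silence, or the live speaker again: drop any pending candidate.
            self.candidate = None
            return self.current
        if active_mic != self.candidate:
            # A different microphone became active: start timing it.
            self.candidate = active_mic
            self.candidate_since = now
        if now - self.candidate_since >= self.hold_s:
            # The candidate outlasted the hold time: switch to it.
            self.current = active_mic
            self.candidate = None
        return self.current
```

In this sketch a brief interjection from another seat starts a candidate timer but is discarded as soon as the current speaker resumes, so the camera selection does not change.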

According to the present invention, the local microphone and imaging means can also be switched automatically in response to an instruction from the other party to the conference. That is, a person's voice and video can be output in accordance with the wishes of the people in the other party's conference room.

Furthermore, according to the present invention, the microphone can be selected accurately by using speaker direction detection technology and image recognition technology. On the basis of that result, the imaging means corresponding to the selected microphone can be selected.

A sound collection / video imaging apparatus according to an embodiment of the present invention is described below.
FIGS. 1(A) to 1(C) are configuration diagrams showing one example to which the sound collection / video imaging apparatus according to the embodiment of the present invention is applied.
As illustrated in FIG. 1(A), first and second sound collection / video imaging apparatuses 1A and 1B are installed in two conference rooms 901 and 902, respectively, and the apparatuses 1A and 1B are connected by a communication line 920, for example, a telephone line.

[Overview of the sound collection / video imaging apparatus]
FIG. 2 is a plan layout view of the sound collection / video imaging apparatus 1A according to the embodiment of the present invention. The first and second sound collection / video imaging apparatuses 1A and 1B have the same configuration.
Taking the first sound collection / video imaging apparatus 1A as representative, it has a first call device 10A, which corresponds to the sound collection means of the present invention, and two first television camera (TV camera) devices 40A1 and 40A2, which correspond to the imaging means of the present invention. The call device detects speech by the conference attendees, determines the speaker, and conveys the speaker's voice to the other attendees in the same conference room and to the attendees in the other party's conference room. The call device further provides imaging conditions for the TV camera devices 40A1 and 40A2 on the basis of the identified speaker.
The TV camera devices 40A1 and 40A2 automatically capture an optimal image on the basis of the imaging conditions provided.

The first sound collection / video imaging apparatus 1A may also include a television receiver 50A and/or a first projector device 60A.
The projector device 60A is, for example, a projector that uses liquid crystal as its modulation means; when various materials used in the conference are supplied from a personal computer, it projects them as images onto a screen S so that conference attendees A1 to A8 can view them.
The television receiver 50A projects the video captured by the TV camera devices 40A1 and 40A2, or the video captured by the TV camera devices 40B1 and 40B2 in the other party's conference room, onto the screen S and displays it to the conference attendees A1 to A8. The television receiver 50A may be omitted, in which case the video captured by the TV camera devices 40A1 and 40A2, or by the TV camera devices 40B1 and 40B2 in the other party's conference room, is switched with the video supplied from the personal computer and projected onto the screen S through the projector device 60A for display to the conference attendees A1 to A8. The following describes the case in which the television receiver 50A is not used and the images captured by the TV camera devices 40A1 and 40A2 are displayed by the projector device 60A.

Preferably, the call device 10A and the projector device 60A are placed on a table 911. FIG. 1(B) shows the call device 10A placed on the table 911.
As illustrated in FIG. 1(C) and FIG. 2, a plurality of conference attendees A1 to A6 (A1 to A8) are positioned around the call device 10A (six in FIG. 1(C), eight in FIG. 2).

The second sound collection / video imaging apparatus 1B, which is not illustrated, likewise has a second call device 10B and two second television camera (TV camera) devices 40B1 and 40B2.
The sound collection / video imaging apparatus 1B may also include a second projector device 60B and a television receiver 50B.
Preferably, the call device 10B and the projector device 60B are placed on a table 912 in the conference room 902.

[Call device]
The first call device 10A and the second call device 10B exchange speech over the communication line 920.
Ordinarily, a conversation over the communication line 920 takes place between one speaker and one other speaker, that is, one-to-one. The call device of this embodiment, however, allows a plurality of conference attendees in the conference rooms 901 and 902 to talk with one another over a single communication line 920. In this embodiment, to avoid overlapping voices and to allow the TV camera devices to image the speaker, the speakers at any one time (in the same time slot) are limited to one person on each side.
Details of the call device are described later.

[TV camera devices and television receiver]
For example, the TV camera devices 40A1 and 40A2 in the first sound collection / video imaging apparatus 1A image the speaker identified by the first call device 10A. For this purpose, the TV camera devices 40A1 and 40A2 have pan, tilt, and zoom functions, among others.
The video captured by the TV camera devices 40A1 and 40A2 is displayed on the projector device 60B (or the television receiver 50B) in the other party's conference room via the communication line 920.
When necessary, the video captured by the TV camera devices 40A1 and 40A2 can also be displayed on the projector device 60A (or the television receiver 50A) in the local conference room.

[Method of identifying the imaging target]
The imaging target of the TV camera devices 40A1 and 40A2 is identified using the speaker direction determined in the first call device 10A and the result of voiceprint recognition against speakers registered in advance. This is carried out in an imaging adjustment unit 36, the details of which are described later.

The second sound collection / video imaging apparatus 1B performs the same processing as the first sound collection / video imaging apparatus 1A.
In this way, in the sound collection / video imaging apparatuses 1A and 1B, the call devices 10A and 10B select (identify) the speaker and collect the selected speaker's voice. Further, the TV camera devices 40A1 and 40A2 capture video of the selected (identified) speaker on the basis of commands from the imaging adjustment unit 36.
The collected sound and the captured video are transferred to the other party's conference room, where the call device of the other party's sound collection / video imaging apparatus reproduces the sound and the projector device (or television receiver) displays the video.

Details of the call device
The configuration of the call device in the sound collection / video imaging apparatus according to the embodiment of the present invention is described with reference to FIGS. 3 to 5. The call device 10A and the second call device 10B have the same configuration.
FIG. 3 is a perspective view of a call device as one embodiment of the present invention.
FIG. 4 is a cross-sectional view of the call device illustrated in FIG. 3.
FIG. 5 is a plan view of the microphone and electronic circuit housing of the call device illustrated in FIGS. 3 and 4, taken along line X-X in FIG. 4.

As illustrated in FIG. 3, the call device has an upper cover 11, a sound reflection plate 12, a connecting member 13, a speaker housing 14, and an operation unit 15.
As illustrated in FIG. 4, the speaker housing 14 has a sound reflection surface 14a, a bottom surface 14b, and an upper sound output opening 14c. A reception/playback speaker 16 is housed in a cavity 14d, which is the space enclosed by the sound reflection surface 14a and the bottom surface 14b. The sound reflection plate 12 is positioned above the speaker housing 14, and the speaker housing 14 and the sound reflection plate 12 are joined by the connecting member 13.

A restraining member 17 passes through the connecting member 13 and restrains the span between a restraining member lower fixing portion 14e on the bottom surface 14b of the speaker housing 14 and a restraining member fixing portion 12b of the sound reflection plate 12. The restraining member 17 merely passes through a restraining member through-portion 14f of the speaker housing 14 and is not fixed there. The reason is that the speaker housing 14 vibrates when the speaker 16 operates, and this vibration should not be restrained around the upper sound output opening 14c.

Speech from a speaker in the other party's conference room leaves the upper sound output opening 14c via the reception/playback speaker 16 and diffuses in all directions, through 360 degrees about the axis C-C, along the space defined by the sound reflection surface 12a of the sound reflection plate 12 and the sound reflection surface 14a of the speaker housing 14.
As illustrated, the cross section of the sound reflection surface 12a of the sound reflection plate 12 traces a gentle trumpet-shaped arc. The sound reflection surface 12a has the illustrated cross-sectional shape through 360 degrees (in all directions) about the axis C-C.
Similarly, as illustrated, the cross section of the sound reflection surface 14a of the speaker housing 14 traces a gentle convex curve. The sound reflection surface 14a likewise has the illustrated cross-sectional shape through 360 degrees (in all directions) about the axis C-C.

The sound S emitted from the reception/playback speaker 16 leaves through the upper sound output opening 14c, passes through the sound output space of trumpet-shaped cross section defined by the sound reflection surface 12a and the sound reflection surface 14a, and diffuses in all directions through 360 degrees about the axis C-C along the surface of the table 911 on which the call device is placed, so that all of the conference attendees A1 to A6 hear it at equal volume. In this embodiment, the surface of the table 911 is also used as part of the sound propagation means.
The diffusion of the sound S output from the reception/playback speaker 16 is indicated by arrows.

The sound reflection plate 12 supports a printed circuit board 21.
As shown in the plan view of FIG. 5, the printed circuit board 21 carries the components of the microphone and electronic circuit housing 2: microphones MC1 to MC6, light-emitting diodes LED1 to LED6, a microprocessor 23, a codec (CODEC) 24, a first digital signal processor (DSP1) 25, a second digital signal processor (DSP2) 26, an A/D converter block 27, a D/A converter block 28, an amplifier block 29, and other electronic circuits. The sound reflection plate 12 thus also functions as a member that supports the microphone and electronic circuit housing 2.

A damper 18 is attached to the printed circuit board 21 to absorb vibration from the reception/playback speaker 16, so that the vibration is not transmitted through the sound reflection plate 12 into the microphones MC1 to MC6 and other components as noise. The damper 18 consists of a screw and a cushioning material, such as anti-vibration rubber, inserted between the screw and the printed circuit board 21; the cushioning material is fastened to the printed circuit board 21 by the screw. The cushioning material thus absorbs the vibration transmitted from the reception/playback speaker 16 to the printed circuit board 21, so that the microphones MC1 to MC6 are not affected by sound from the speaker 16.

Microphone arrangement
As illustrated in FIG. 5, six microphones MC1 to MC6 are positioned radially about the central axis C of the printed circuit board 21 at equal angular intervals (60 degrees in this embodiment). Each microphone is unidirectional. Their characteristics are described later.
Each of the microphones MC1 to MC6 is supported so that it can swing freely by a first microphone support member 22a and a second microphone support member 22b, both of which are flexible or elastic (to simplify the drawing, the first and second microphone support members 22a and 22b are illustrated only for the microphone MC1). In addition to the damper 18 with its cushioning material described above, which counters vibration from the reception/playback speaker 16, the flexible or elastic first and second microphone support members 22a and 22b absorb the vibration of the printed circuit board 21 excited by the reception/playback speaker 16, so that the microphones are not affected by that vibration and noise from the reception/playback speaker 16 is avoided.
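The equal-angle layout lends itself to a small geometric sketch. The code below lists the bearing of each microphone and maps an arbitrary bearing around the device to the nearest microphone. It assumes, purely for illustration, that MC1 sits at a bearing of 0 degrees and that the numbering increases with the angle; the description does not state either convention, and the function names are hypothetical.

```python
from typing import List


def microphone_bearings(count: int = 6) -> List[float]:
    """Bearings in degrees of microphones spaced at equal angles (60 degrees for six)."""
    return [i * 360.0 / count for i in range(count)]


def nearest_microphone(bearing_deg: float, count: int = 6) -> int:
    """Return the 1-based number of the microphone closest to the given bearing."""
    step = 360.0 / count
    # Normalise into [0, 360), round to the nearest slot, and wrap the last
    # slot back to the first (a bearing of 350 degrees is closest to MC1).
    return int(round((bearing_deg % 360.0) / step)) % count + 1
```

For eight attendees around a six-microphone device, as in FIG. 2, a mapping of this kind would assign each seat to the microphone whose axis is closest to it.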

As illustrated in FIG. 4, the reception/playback speaker 16 is oriented along the central axis C-C, perpendicular to the plane in which the microphones MC1 to MC6 lie (facing upward in this embodiment). With this arrangement of the reception/playback speaker 16 and the six microphones MC1 to MC6, the speaker is equidistant from each microphone, and sound from the speaker reaches the microphones MC1 to MC6 at almost the same volume and in almost the same phase. However, the configuration of the sound reflection surface 12a of the sound reflection plate 12 and the sound reflection surface 14a of the speaker housing 14 described above keeps the sound of the reception/playback speaker 16 from entering the microphones MC1 to MC6 directly. In addition, as described above, the damper 18 with its cushioning material and the flexible or elastic first and second microphone support members 22a and 22b reduce the influence of vibration from the reception/playback speaker 16.
The conference attendees A1 to A6 are usually positioned at roughly equal intervals around the full 360 degrees of the call device, near the microphones MC1 to MC6 arranged at 60-degree intervals, as in the example of FIG. 1(C). In the example illustrated in FIG. 2, eight conference attendees are positioned around the call device.

Light-emitting diodes LED1 to LED6 are arranged near the microphones MC1 to MC6 as means for announcing that the speaker has been determined (microphone selection result display means).
The light-emitting diodes LED1 to LED6 are provided so that all of the conference attendees A1 to A6 can see them even with the upper cover 11 attached. The upper cover 11 therefore has transparent windows through which the lighting state of the light-emitting diodes LED1 to LED6 can be seen. Openings could of course be provided in the upper cover 11 at the positions of the light-emitting diodes LED1 to LED6 instead, but light-transmitting windows are preferable for keeping dust out of the microphone and electronic circuit housing 2.

On the printed circuit board 21, a first digital signal processor (DSP1) 25, a second digital signal processor (DSP2) 26, and various electronic circuits 27 to 29 for performing the signal processing described later are arranged in the space other than the portions where the microphones MC1 to MC6 are located.
In the present embodiment, the DSP 25, together with the various electronic circuits 27 to 29, is used as signal processing means that performs filter processing, microphone selection processing, and the like, and the DSP 26 is used as an echo canceller.

FIG. 6 is a schematic configuration diagram of the microprocessor 23, the codec 24, the DSP 25, the DSP 26, the A/D converter block 27, the D/A converter block 28, the amplifier block 29, and other electronic circuits.
The microprocessor 23 performs overall control of the microphone/electronic circuit housing portion 2. The codec 24 compresses and encodes the audio to be transmitted to the other party's conference room.
The DSP 25 performs the various kinds of signal processing described below, such as filter processing and microphone selection processing.
The DSP 26 functions as an echo canceller.
In FIG. 6, four A/D converters 271 to 274 are shown as an example of the A/D converter block 27, two D/A converters 281 and 282 as an example of the D/A converter block 28, and two amplifiers 291 and 292 as an example of the amplifier block 29.
In addition, various other circuits of the microphone/electronic circuit housing portion 2, such as a power supply circuit, are mounted on the printed circuit board 21.

In FIG. 5, the microphones form the pairs MC1-MC4, MC2-MC5, and MC3-MC6, with the two microphones of each pair lying on a straight line at symmetrical (opposing) positions with respect to the central axis C of the printed circuit board 21. Each pair is input to one of the A/D converters 271 to 273. In the present embodiment, one A/D converter converts two channels of analog input signals into digital signals. Therefore, the detection signals of the two microphones (one pair) located on a straight line across the central axis C, for example the microphones MC1 and MC4, are input to one A/D converter and converted into digital signals. Furthermore, in the present embodiment, the difference between the sounds of two microphones located on a straight line, the loudness of the sound, and the like are referred to in order to identify the speaker whose voice is sent to the other party's conference room. Inputting the signals of two microphones located on a straight line to the same A/D converter makes their conversion timings almost identical, which brings advantages such as a small timing error when the difference between the outputs of the two microphones is taken, and simpler signal processing.
The A/D converters 271 to 274 may also be configured as A/D converters with a variable-gain amplification function.
The sound signals collected by the microphones MC1 to MC6 and converted by the A/D converters 271 to 273 are input to the DSP 25, where the various kinds of signal processing described later are performed.
As one of the processing results of the DSP 25, the result of selecting one of the microphones MC1 to MC6 is output to the light emitting diodes LED1 to LED6, which are an example of the microphone selection result display means.

The processing result of the DSP 25 is output to the DSP 26, which performs echo cancellation. The DSP 26 has, for example, an echo cancellation transmission processing unit and an echo cancellation reception unit.
The processing result of the DSP 26 is converted into analog signals by the D/A converters 281 and 282. The output of the D/A converter 281 is encoded by the codec 24 as necessary, output through the amplifier 291 to the line-out of the communication line 920 (FIG. 1(A)), and reproduced as sound by the reception/reproduction speaker 16 of the communication device installed in the other party's conference room.
Audio from the communication device installed in the other party's conference room is input through the line-in of the communication line 920 (FIG. 1(A)), converted into a digital signal by the A/D converter 274, and input to the DSP 26, where it is used for echo cancellation. The audio from the communication device installed in the other party's conference room is also applied to the speaker 16 through a path not shown and output as sound.
The output of the D/A converter 282 is output as sound from the reception/reproduction speaker 16 of this communication device through the amplifier 292. That is, through the reception/reproduction speaker 16, the conference attendees A1 to A6 can hear not only the voice of the selected speaker in the other party's conference room but also the voice of the speaker in their own conference room.

Microphones MC1 to MC6
FIG. 7 is a graph showing the directivity of each of the microphones MC1 to MC6.
The frequency characteristic and level characteristic of each unidirectional microphone change with the angle at which the speaker's voice arrives at the microphone, as illustrated in FIG. 7. The curves show the directivity when the frequency of the collected sound signal is 100 Hz, 150 Hz, 200 Hz, 300 Hz, 400 Hz, 500 Hz, 700 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 3000 Hz, 4000 Hz, 5000 Hz, and 7000 Hz. To keep the illustration simple, however, FIG. 7 shows the directivity only for the representative frequencies 150 Hz, 500 Hz, 1500 Hz, 3000 Hz, and 7000 Hz.

FIGS. 8(A) to 8(D) are graphs showing the results of analyzing the position of the sound source and the sound collection level of the microphones. They show the result of applying a fast Fourier transform (FFT) at fixed time intervals to the sound collected by each microphone when a loudspeaker is placed at a predetermined distance, for example 1.5 meters, from the communication device. The X axis represents frequency, the Y axis represents signal level, and the Z axis represents time.
A microphone having the directivity shown in FIG. 7 exhibits strong directivity toward its front. In the present embodiment, the DSP 25 exploits this characteristic in the microphone selection processing.

If omnidirectional microphones were used instead of directional microphones as in the embodiment of the present invention, each microphone would collect all the sounds around it, so the speaker's voice would be mixed with ambient noise, the S/N would be poor, and good sound could not be collected. To avoid this, the present invention improves the S/N with respect to ambient noise by collecting sound with a single directional microphone.
Directivity can also be obtained with a microphone array made up of a plurality of omnidirectional microphones. With that method, however, complicated processing is needed to align the time axes (phases) of the plurality of signals, which takes time, lowers responsiveness, and complicates the device configuration. That is, the DSP signal processing system also requires complicated signal processing. The present invention solves these problems by using the directional microphones illustrated in FIG. 6.
In addition, combining the signals of a microphone array for use as a directional sound collecting microphone has the disadvantage that the external shape is constrained by the pass frequency characteristic and becomes large. The present invention also solves this problem.

The communication device configured as described above has the following advantages.
(1) The positional relationship between the reception/reproduction speaker 16 and the even number of microphones MC1 to MC6, which are arranged radially at equal angles and equal intervals, is fixed, and the distance between them is very short. As a result, the level of the sound that travels directly from the reception/reproduction speaker 16 to the microphones MC1 to MC6 is overwhelmingly larger than, and dominates, the level of the sound that returns to the microphones by way of the conference room (room) environment. The characteristics with which sound from the speaker 16 reaches the microphones MC1 to MC6 (signal level (intensity) and frequency characteristics (frequency response, phase)) are therefore always the same. In other words, the communication device according to the embodiment of the present invention has the advantage that the transfer function is always the same.
(2) Consequently, the transfer function does not change when the microphone output sent to the other party's conference room is switched because the speaker has changed, and the gain of the microphone system need not be adjusted each time the microphone is switched. In other words, once the adjustment has been made when the communication device is manufactured, it need not be repeated.
(3) For the same reason, a single echo canceller (DSP 26) suffices even though the microphone is switched when the speaker changes. DSPs are expensive. There is no need to place a plurality of DSPs on the printed circuit board 21, on which various components are mounted and little space remains, and the space needed for DSPs on the printed circuit board 21 is small. As a result, the printed circuit board 21, and hence the communication device of the present invention, can be made small.
(4) As described above, because the transfer function between the reception/reproduction speaker 16 and the microphones MC1 to MC6 is constant, differences in the sensitivity of the microphones themselves, which can be as large as ±3 dB, can be adjusted by the microphone unit of the communication device alone. Details of the sensitivity difference adjustment will be described later.
(5) The communication device is usually placed on a round table or a polygonal table, so a speaker system has become possible in which the single reception/reproduction speaker 16 in the communication device distributes (diffuses) sound of uniform quality evenly in all directions over 360 degrees around the axis C.
(6) The sound emitted from the reception/reproduction speaker 16 travels along the surface of the round table (boundary effect), so good-quality sound reaches the conference attendees effectively, efficiently, and evenly. Toward the ceiling of the conference room, the sound is cancelled in phase by the sound from the opposite side and becomes small, so little sound is reflected from the ceiling toward the conference attendees. As a result, clear sound is delivered to the participants.
(7) The sound emitted from the reception/reproduction speaker 16 reaches all the microphones MC1 to MC6, which are arranged radially at equal angles and equal intervals, simultaneously and at the same volume, so it is easy to determine whether a sound is a speaker's voice or the received voice. As a result, erroneous determinations in the microphone selection processing are reduced. Details will be described later.
(8) An even number of microphones, for example six, are arranged radially at equal angles and equal intervals with each opposing pair on a straight line, so the level comparison for direction detection is easy.
(9) The damper 18, the microphone support members 22, and the like reduce the influence that vibration caused by the sound of the reception/reproduction speaker 16 has on the sound collection of the microphones MC1 to MC6.
(10) As illustrated in FIG. 4, the structure is such that the sound of the reception/reproduction speaker 16 does not propagate directly to the microphones MC1 to MC6. Therefore, in this communication device, the influence of noise from the reception/reproduction speaker 16 is small.

Modified Example
In the communication device described with reference to FIGS. 3 and 4, the reception/reproduction speaker 16 is placed in the lower part and the microphones MC1 to MC6 (and the related electronic circuits) in the upper part. As illustrated in FIG. 9, however, the positions of the reception/reproduction speaker 16 and the microphones MC1 to MC6 (and the related electronic circuits) can be reversed vertically. The effects described above are obtained in this case as well.

The number of microphones is not limited to six. Any even number of microphones, such as four or eight, may be arranged radially at equal angles and equal intervals around the axis C so that each pair lies on a straight line (in the same direction), as the microphones MC1 and MC4 do. The reason for the preferred arrangement, in which two microphones such as MC1 and MC4 face each other on a straight line, is to select a microphone and thereby identify the speaker.
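The pairing rule for an arbitrary even number of microphones can be illustrated with a short sketch. The function name `microphone_layout`, the `MC<n>` labels for more than six microphones, and the choice of 0 degrees for MC1 are assumptions made for illustration; the description itself only specifies equal angular spacing and opposing pairs on a straight line.

```python
def microphone_layout(num_mics):
    """Return (angles, pairs) for an even number of equally spaced microphones.

    angles maps each label to its angle in degrees around the axis C
    (MC1 is placed at 0 degrees here, which is an arbitrary assumption).
    pairs lists the microphones that face each other on a straight line.
    """
    if num_mics < 2 or num_mics % 2 != 0:
        raise ValueError("an even number of microphones is required")
    step = 360.0 / num_mics
    angles = {f"MC{i + 1}": i * step for i in range(num_mics)}
    half = num_mics // 2
    # Microphone i and microphone i + N/2 are 180 degrees apart.
    pairs = [(f"MC{i + 1}", f"MC{i + 1 + half}") for i in range(half)]
    return angles, pairs


angles, pairs = microphone_layout(6)
print(pairs)
```

With six microphones this yields the pairs MC1-MC4, MC2-MC5, and MC3-MC6, which is the grouping that shares one two-channel A/D converter per pair in the embodiment.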

Signal Processing Contents
The processing performed mainly by the first digital signal processor (DSP) 25 is described below.
FIG. 10 illustrates an outline of the processing that the DSP 25 performs in the communication device. The outline is as follows.

(1) Measurement of ambient noise
As an initial operation, the noise around the place where the communication device 10A is installed is preferably measured.
The communication device can be used in various environments (conference rooms). To select the microphone accurately and improve the performance of the communication device, the present invention measures, at the initial stage, the noise of the environment in which the communication device is installed, so that the influence of that noise can be removed from the signals collected by the microphones.
Of course, this processing can be omitted when the communication device is used repeatedly in the same conference room, the noise has already been measured, and the noise conditions do not change.
Noise can also be measured during normal operation.

(2) Selection of chairperson
When the communication device is used for a two-way conference, for example, it is useful to have a chairperson who manages the proceedings in each conference room. Therefore, in one aspect of the present invention, the chairperson is set from the operation unit 15 of the communication device at the initial stage of use. As one way of setting the chairperson, the first microphone MC1, located near the operation unit 15, is used as the chairperson's microphone. Of course, any microphone may be used as the chairperson's microphone.
This processing can be omitted when the same chairperson uses the communication device repeatedly. Alternatively, the microphone at the position where the chairperson sits may be determined in advance, in which case the chairperson need not be selected each time.
Of course, the chairperson can be selected not only in the initial state but at any time.

(3) Microphone sensitivity difference adjustment
As an initial operation, the gain of the amplification unit that amplifies the signals of the microphones MC1 to MC6, or the attenuation value of the attenuation unit, is preferably adjusted automatically so that the acoustic coupling between the reception/reproduction speaker 16 and each of the microphones MC1 to MC6 is equal.
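A minimal sketch of how such an automatic adjustment might be computed, assuming that the level of each microphone signal is measured in dB while the speaker reproduces a test tone. The function names and the -40 dB common target used here are illustrative assumptions; the actual adjustment procedure is described later in the specification and may differ.

```python
def sensitivity_corrections_db(measured_db, target_db=-40.0):
    """Gain (positive) or attenuation (negative), in dB, per microphone.

    measured_db: level of each microphone signal for the same test tone.
    target_db:   common level every channel should reach (assumed value).
    """
    return [target_db - level for level in measured_db]


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical measurement: microphones within +/-3 dB of each other.
measured = [-37.0, -43.0, -40.0, -41.5, -38.5, -40.0]
corrections = sensitivity_corrections_db(measured)
factors = [db_to_linear(g) for g in corrections]
print(corrections)
```

Because the speaker-to-microphone transfer function is the same for every microphone in this device, a single pass of this kind is enough in principle; nothing needs to be re-measured when the selected microphone changes.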

The various kinds of processing exemplified below are performed as normal processing.
(1) Microphone selection and switching processing
When a plurality of conference attendees in one conference room speak at the same time, their voices are mixed and are hard for the conference attendees A1 to A6 in the other party's conference room to understand. In the present invention, therefore, in principle only one person speaks in any given period. For this purpose, the DSP 25 performs microphone selection and switching processing.
As a result, only the speech from the selected microphone is transmitted through the communication line 920 to the communication device in the other party's conference room and output from its speaker. As described with reference to FIG. 6, the LED near the selected speaker's microphone also lights, and the selected speaker's voice can also be heard from the speaker of the communication device in the speaker's own room, so the attendees can recognize who the permitted speaker is.
The purpose of this processing is to select the signal of the unidirectional microphone facing the speaker and send a signal with a good S/N to the other party as the transmission signal.
(2) Display of the selected microphone
So that all the conference attendees A1 to A6 can easily recognize which microphone has been selected, that is, which conference attendee is permitted to speak, the microphone selection result display means, for example the corresponding one of the light emitting diodes LED1 to LED6, is lit.
(3) Determination of imaging conditions (third embodiment)
The imaging adjustment unit 36, described in the third embodiment, can determine the imaging conditions of the television camera devices 40A1 and 40A2 using the result of the microphone selection (identification) by the communication device described above.
(4) As background to the microphone selection processing described above, or to perform the microphone selection processing accurately, the various kinds of signal processing exemplified below are performed.
(a) Band separation and level conversion of the sound signals collected by the microphones
(b) Determination of the start and end of speech
This is used as the trigger for starting the selection determination of the microphone signal facing the speaker.
(c) Detection of the microphone in the speaker's direction
The sound signal collected by each microphone is analyzed to determine which microphone the speaker is using.
(d) Determination of the switching timing of the microphone in the speaker's direction, and selection and switching of the microphone signal facing the detected speaker
An instruction is given to switch to the microphone selected from the above processing results.
(e) Measurement of floor noise during normal operation

Measurement of floor (environmental) noise
This processing is divided into initial processing, performed immediately after the communication device is powered on, and normal processing.
This processing is performed under the following exemplary preconditions.

[Table 1]
(1) Conditions: measurement time and provisional threshold values
1. Test tone sound pressure: -40 dB at the microphone signal level
2. Noise measurement unit time: 10 seconds
3. Noise measurement in the normal state: the average value is calculated from the measurement results for 10 seconds, and this is repeated 10 times; the average of those values is taken as the noise level.
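The two-stage averaging in item 3 can be sketched as follows. This is only an illustration: the function name, the constant name, and the representation of each 10-second measurement as a list of level values are assumptions, and the table does not say whether the averaging is done on dB values or on linear levels, so the sketch simply averages whatever values it is given.

```python
from statistics import fmean

REPEATS = 10  # number of 10-second measurements, per Table 1


def floor_noise_level(blocks):
    """Average each 10-second block, then average the block averages.

    blocks: a sequence of REPEATS sequences, each holding the level
    values observed during one 10-second measurement unit.
    """
    if len(blocks) != REPEATS:
        raise ValueError(f"expected {REPEATS} measurement blocks")
    block_means = [fmean(block) for block in blocks]
    return fmean(block_means)


# Hypothetical data: ten blocks whose levels drift from -60 to -51.
example_blocks = [[-60.0 + i] * 4 for i in range(REPEATS)]
print(floor_noise_level(example_blocks))
```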

[Table 2]
(2) Guide to the effective distance, and thresholds, according to the difference between the floor noise level and the speech start reference level
1. 26 dB or more: 3 meters or more
Speech start detection level threshold: floor noise level + 9 dB
Speech end detection level threshold: floor noise level + 6 dB
2. 20 to 26 dB: within 3 meters
Speech start detection level threshold: floor noise level + 9 dB
Speech end detection level threshold: floor noise level + 6 dB
3. 14 to 20 dB: within 1.5 meters
Speech start detection level threshold: floor noise level + 9 dB
Speech end detection level threshold: floor noise level + 6 dB
4. 9 to 14 dB: within 1 meter
Speech start detection level threshold: (difference between the floor noise level and the speech start reference level) ÷ 2 + 2 dB
Speech end detection level threshold: speech start threshold - 3 dB
5. 9 dB or less: several tens of centimeters
Speech start detection level threshold: -3 dB
6. Difference between the floor noise level and the speech start reference level ÷ 2
Speech end detection level threshold: -3 dB
7. Same or negative: determination is not possible and selection is prohibited
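The lookup in Table 2 can be sketched as a small function. Several points here are assumptions rather than statements from the table: the thresholds are expressed as dB offsets above the floor noise level, the formula in row 4 is read as such an offset (it evaluates to +9 dB at a difference of 14 dB, which matches rows 1 to 3), and each boundary value is assigned to the higher row. The rows for differences between 0 and 9 dB are not implemented because their wording in the table is ambiguous. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechThresholds:
    effective_distance: str
    start_offset_db: Optional[float]  # offset above floor noise (assumed)
    end_offset_db: Optional[float]    # offset above floor noise (assumed)
    selectable: bool


def speech_thresholds(diff_db):
    """Map (speech start reference level - floor noise level) to thresholds."""
    if diff_db <= 0.0:
        # "Same or negative": determination impossible, selection prohibited.
        return SpeechThresholds("not determinable", None, None, False)
    if diff_db < 9.0:
        raise NotImplementedError("rows below 9 dB are ambiguous in Table 2")
    if diff_db < 14.0:
        start = diff_db / 2.0 + 2.0
        return SpeechThresholds("within 1 m", start, start - 3.0, True)
    if diff_db < 20.0:
        distance = "within 1.5 m"
    elif diff_db < 26.0:
        distance = "within 3 m"
    else:
        distance = "3 m or more"
    return SpeechThresholds(distance, 9.0, 6.0, True)


print(speech_thresholds(12.0))
```

Keeping the end threshold 3 dB below the start threshold gives the detector hysteresis, so a voice hovering near the threshold does not toggle the speech state on every sample.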

[Table 3]
(3) Noise measurement in normal processing starts when the level falls to or below the floor noise level measured at power-on + 3 dB.

Generation of signals with various frequency components by filter processing
FIG. 11 is a configuration diagram showing the filtering that the DSP 25 performs as preprocessing on the sound signals collected by the microphones. FIG. 11 shows the processing for one microphone (one channel, that is, one collected sound signal).
The sound signal collected by each microphone is processed by an analog low-cut filter 101 having a cutoff frequency of, for example, 100 Hz, which removes frequencies of 100 Hz and below, and the filtered signal is output to the A/D converter 102. The signal converted into a digital signal by the A/D converter 102 is passed through digital high-cut filters 103a to 103e (collectively 103), which have cutoff frequencies of 7.5 kHz, 4 kHz, 1.5 kHz, 600 Hz, and 250 Hz respectively and remove the high-frequency components (high-cut processing). The subtractors 104a to 104d (collectively 104) then subtract the outputs of adjacent digital high-cut filters 103a to 103e from each other.
In the embodiment of the present invention, the digital high-cut filters 103a to 103e and the subtractors 104a to 104d are actually implemented as processing in the DSP 25. The A/D converter 102 can be realized as one of the converters in the A/D converter block 27.

FIG. 12 is a frequency characteristic diagram showing the result of the filter processing described with reference to FIG. 11. In this way, a plurality of signals with different frequency components are generated from the signal collected by a single directional microphone.

Band-pass filter processing and microphone signal level conversion processing
The determination of the start and end of speech serves as one of the triggers for starting the microphone selection processing. The signals used for this determination are obtained by the band-pass filter processing and level conversion processing illustrated in FIG. 13, which the DSP 25 performs. FIG. 13 shows only one channel of the input signal processing for the six channels (CH) collected by the microphones MC1 to MC6.
The band-pass filter processing and level conversion processing unit in the DSP 25 has, for the collected sound signal of each microphone channel, band-pass filters 201a to 201f (collectively, band-pass filter block 201) with pass bands of 100 to 600 Hz, 100 to 250 Hz, 250 to 600 Hz, 600 to 1500 Hz, 1500 to 4000 Hz, and 4000 to 7500 Hz respectively, and level converters 202a to 202g (collectively, level conversion block 202) that convert the levels of the original microphone signal and of the band-passed signals.

Each of the level converters 202a to 202g has a signal absolute value processing unit 203 and a peak hold processing unit 204. As shown in the example waveform, when a negative signal (indicated by a broken line) is input, the signal absolute value processing unit 203 inverts its sign to convert it into a positive signal. The peak hold processing unit 204 holds the maximum value of the output signal of the signal absolute value processing unit 203. In the present embodiment, however, the held maximum value decreases somewhat as time passes. Of course, the peak hold processing unit 204 can be modified to reduce this decrease so that the maximum value is held for a long time.
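The level conversion described above (absolute value followed by a peak hold whose held value slowly falls) can be sketched as below. The per-sample multiplicative decay and its default value are assumptions made for illustration; the description only says that the held maximum decreases somewhat over time.

```python
def level_convert(samples, decay=0.999):
    """Rectify a signal and track its peak with a slow decay.

    samples: iterable of signed sample values for one band of one channel.
    decay:   factor applied to the held value each sample (assumed form);
             values closer to 1.0 hold the peak for longer.
    """
    held = 0.0
    levels = []
    for x in samples:
        rectified = abs(x)                   # absolute value processing (203)
        held = max(rectified, held * decay)  # peak hold with decay (204)
        levels.append(held)
    return levels


print(level_convert([0.5, -1.0, 0.25, 0.0], decay=0.5))
```

Setting `decay` to 1.0 in this sketch corresponds to the variation mentioned in the text in which the maximum value is held for a long time.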

The band-pass filters are described next. The band-pass filters used in the communication device of the sound collection / video imaging device are built from, for example, only second-order IIR high-cut filters and a low-cut filter at the microphone signal input stage.
The present embodiment exploits the fact that subtracting a high-cut-filtered signal from a signal with a flat frequency characteristic leaves a remainder that is almost equivalent to a low-cut-filtered signal.
To match the frequency-level characteristics, one extra filter that passes the whole band is needed, so the required band-pass outputs are obtained with a number of filter stages and filter coefficient sets equal to the number of required bands plus one. The pass bands required here are the six band-pass filters per microphone signal channel (CH) listed in Table 4 below.

[Table 4]
Band-pass filter characteristics
BPF1 = [100 Hz - 250 Hz] ... 201b
BPF2 = [250 Hz - 600 Hz] ... 201c
BPF3 = [600 Hz - 1.5 kHz] ... 201d
BPF4 = [1.5 kHz - 4 kHz] ... 201e
BPF5 = [4 kHz - 7.5 kHz] ... 201f
BPF6 = [100 Hz - 600 Hz] ... 201a

With this method, the IIR filter computation that the DSP 25 must perform amounts to only 6 CH (channels) x 5 (IIR filters) = 30 filters.
Compare this with a conventional band-pass filter configuration. Assuming second-order IIR filters, providing six band-pass filters for each of six microphone signals as in the present invention would, in the conventional method, require 6 x 6 x 2 = 72 IIR filter circuits. Even on the latest high-performance DSP this takes considerable program processing and affects other processing.
In the embodiment of the present invention, the 100 Hz low-cut filter is implemented as an analog filter at the input stage. The second-order IIR high-cut filters provided have five cutoff frequencies: 250 Hz, 600 Hz, 1.5 kHz, 4 kHz, and 7.5 kHz. Because the sampling frequency is 16 kHz, the high-cut filter with the 7.5 kHz cutoff is not strictly necessary. It is included to rotate the phase of the minuend deliberately, which mitigates the drop in band-pass output level caused by the phase rotation of the IIR filters during the subtraction.

FIG. 14 is a flowchart of the processing performed by the DSP 25 for the configuration illustrated in FIG. 13.

In the filter processing in the DSP 25 illustrated in FIG. 14, the first stage performs the high-cut filter processing and the second stage performs subtraction on the first-stage filter results. FIG. 12 is a conceptual frequency characteristic diagram of the signal processing results. The bracketed numbers below refer to the processing cases in FIG. 12.

First stage
[1] For the whole-band pass filter, the input signal is passed through the 7.5 kHz high-cut filter. Combined with the analog low-cut filter at the input, this filter output is a [100 Hz - 7.5 kHz] band-pass output.

[2] The input signal is passed through the 4 kHz high-cut filter. Combined with the analog low-cut filter at the input, this filter output is a [100 Hz - 4 kHz] band-pass output.

[3] The input signal is passed through the 1.5 kHz high-cut filter. Combined with the analog low-cut filter at the input, this filter output is a [100 Hz - 1.5 kHz] band-pass output.

[4] The input signal is passed through the 600 Hz high-cut filter. Combined with the analog low-cut filter at the input, this filter output is a [100 Hz - 600 Hz] band-pass output.

[5] The input signal is passed through the 250 Hz high-cut filter. Combined with the analog low-cut filter at the input, this filter output is a [100 Hz - 250 Hz] band-pass output.

Second stage
[1] Band-pass filter BPF5 = [4 kHz - 7.5 kHz]: subtracting the first-stage outputs [1] - [2] ([100 Hz - 7.5 kHz] - [100 Hz - 4 kHz]) yields the [4 kHz - 7.5 kHz] signal output.
[2] Band-pass filter BPF4 = [1.5 kHz - 4 kHz]: subtracting the first-stage outputs [2] - [3] ([100 Hz - 4 kHz] - [100 Hz - 1.5 kHz]) yields the [1.5 kHz - 4 kHz] signal output.
[3] Band-pass filter BPF3 = [600 Hz - 1.5 kHz]: subtracting the first-stage outputs [3] - [4] ([100 Hz - 1.5 kHz] - [100 Hz - 600 Hz]) yields the [600 Hz - 1.5 kHz] signal output.
[4] Band-pass filter BPF2 = [250 Hz - 600 Hz]: subtracting the first-stage outputs [4] - [5] ([100 Hz - 600 Hz] - [100 Hz - 250 Hz]) yields the [250 Hz - 600 Hz] signal output.
[5] Band-pass filter BPF1 = [100 Hz - 250 Hz]: the first-stage signal [5] is used as the output as is.
[6] Band-pass filter BPF6 = [100 Hz - 600 Hz]: the first-stage signal [4] is used as the output as is.
The above processing in the DSP 25 yields the required band-pass filter outputs.
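The two-stage scheme above (five high-cut filters, then subtraction of neighboring outputs) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the DSP 25 program: the biquad coefficients use the widely known Audio EQ Cookbook low-pass form with a Butterworth Q, which the source does not specify, and the names `high_cut_coeffs`, `biquad`, and `split_bands` are hypothetical. The 100 Hz analog low-cut filter is assumed to have been applied before sampling.

```python
import math

FS = 16000                               # sampling frequency stated in the text
CUTOFFS = [7500, 4000, 1500, 600, 250]   # first-stage outputs [1]..[5]


def high_cut_coeffs(fc, fs, q=math.sqrt(0.5)):
    """Second-order IIR high-cut (low-pass) coefficients (assumed design)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1 - cosw) / 2 / a0, (1 - cosw) / a0, (1 - cosw) / 2 / a0]
    a = [-2 * cosw / a0, (1 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """Direct form I filtering of a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def split_bands(x):
    """First stage: five high-cut filters. Second stage: subtraction."""
    o = [biquad(x, *high_cut_coeffs(fc, FS)) for fc in CUTOFFS]

    def sub(p, r):
        return [u - v for u, v in zip(p, r)]

    return {
        "FULL": o[0],              # [1] whole band
        "BPF5": sub(o[0], o[1]),   # [1] - [2]
        "BPF4": sub(o[1], o[2]),   # [2] - [3]
        "BPF3": sub(o[2], o[3]),   # [3] - [4]
        "BPF2": sub(o[3], o[4]),   # [4] - [5]
        "BPF1": o[4],              # [5] as is
        "BPF6": o[3],              # [4] as is
    }


# Example: split a 1 kHz test tone into the six bands.
signal = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(800)]
bands = split_bands(signal)
```

Only five filters run per channel, and because the bands are differences of neighboring outputs, BPF1 through BPF5 sum back to the whole-band output [1].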

In the DSP 25, the input microphone sound collection signals MIC1 to MIC6 are constantly updated as the sound pressure level of the whole band and the sound pressure levels of the six bands that passed through the band-pass filters, as shown in Table 5.

In Table 5, for example, L1-1 denotes the peak level of the sound collection signal of the microphone MC1 after it has passed through the first band-pass filter 201a.
The start and end of speech are determined using the microphone sound collection signal that has passed through the 100 Hz to 600 Hz band-pass filter 201a shown in FIG. 13 and has been converted to a sound pressure level by the level converter 202b.

In a conventional band-pass filter configuration, each band-pass filter stage is a combination of a high-pass filter and a low-pass filter, so building the 36 band-pass filters of the specification used in the present embodiment would require 72 filter circuits. The filter configuration of the embodiment of the present invention is, as described above, simpler.

Speech start / end determination processing
Based on the values output from the sound pressure level detection unit, the first digital signal processor (DSP1) 25 makes the determinations illustrated in FIG. 15. When the microphone sound collection signal level rises above the floor noise and exceeds the speech start level threshold, it determines that speech has started. While a level above the start threshold continues, it determines that speech is in progress. When the speech stops and the signal level falls below the threshold, it treats the signal as floor noise, and when the floor noise continues for the speech end determination time, for example 0.5 seconds, it determines that speech has ended.
Speech start is determined from the moment the sound pressure level data (microphone signal level (1)), which has passed through the 100 Hz to 600 Hz band-pass filter and been converted to a sound pressure level by the microphone signal conversion processing unit 202b illustrated in FIG. 13, reaches or exceeds the threshold level illustrated in FIG. 15.
To avoid malfunction caused by frequent microphone switching, the DSP 25 does not detect the next speech start until the speech end determination time, for example 0.5 seconds, has elapsed after a speech start is detected.
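The start / end determination can be pictured as a small state machine with two thresholds and an end timer. The sketch below is an assumption-laden illustration: the function name `detect_speech`, the frame-based input, and the choice of 50 frames (0.5 seconds at a hypothetical 10 ms read interval) are not taken from the source as code.

```python
def detect_speech(levels, start_thr, end_thr, hold_frames=50):
    """Return a list of ("start", index) / ("end", index) events.

    levels      : level (1) values, one per read interval (assumed 10 ms)
    start_thr   : speech start level threshold
    end_thr     : speech end level threshold
    hold_frames : speech end determination time in frames (0.5 s -> 50)
    """
    events = []
    speaking = False
    below = 0                      # consecutive frames below end_thr
    for i, level in enumerate(levels):
        if not speaking:
            if level > start_thr:
                speaking = True
                below = 0
                events.append(("start", i))
        elif level < end_thr:
            below += 1
            if below >= hold_frames:
                speaking = False
                events.append(("end", i))
        else:
            below = 0              # level recovered: restart the end timer
    return events


# 0.2 s of speech, a 50 ms dip (ignored), 50 ms of speech, then silence.
levels = [0.1] * 10 + [1.0] * 20 + [0.1] * 5 + [1.0] * 5 + [0.1] * 60
events = detect_speech(levels, start_thr=0.5, end_thr=0.3)
```

Because an end can only be declared `hold_frames` after the level drops, a new start can never be detected within 0.5 seconds of the previous start in this sketch, which matches the stated guard against frequent switching.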

Microphone selection
The DSP 25 detects the speaker direction in the two-way communication system and automatically selects the microphone signal facing the speaker on the basis of a so-called "star chart method".
FIG. 16 is a graph illustrating the operation modes of the communication device of the sound collection / video imaging device.
FIG. 17 is a flowchart showing the normal processing of the communication device.

As illustrated in FIG. 16, the communication device monitors the audio signals collected by the microphones MC1 to MC6, determines the start and end of speech, determines the speech direction, selects a microphone, and displays the result on the microphone selection result display means, for example the light emitting diodes LED1 to LED6.
The operation is described below with reference to the flowchart of FIG. 17, centering on the DSP 25 in the communication device 1. Overall control of the microphone / electronic circuit housing unit 2 is performed by the microprocessor 23, but the description focuses on the processing of the DSP 25.

Step S1: Monitoring of level-converted signals
The signals collected by the microphones MC1 to MC6 are each converted into seven types of level data in the band-pass filter block 201 and the level conversion block 202 described with reference to FIGS. 12 to 14, in particular FIG. 13, so the DSP 25 constantly monitors the seven signals for each microphone sound collection signal.
Based on the monitoring result, the DSP 25 proceeds to either the speaker direction detection processing or the speech start / end determination processing.

Step S2: Speech start / end determination processing
The DSP 25 determines the start and end of speech with reference to FIG. 15, following the method detailed further below. When the DSP 25 detects a speech start, it notifies the speaker direction determination processing of step 4 of the detection.
In the speech start / end determination of step 2, when the speech level falls below the speech end level, a timer for the speech end determination time (for example, 0.5 seconds) is started, and if the speech level remains below the speech end level for the speech end determination time, the speech is determined to have ended.
If the level rises above the speech end level within the speech end determination time, the processing waits until the level falls below the speech end level again.

Step S3: Speaker direction detection processing
The speaker direction detection processing in the DSP 25 searches for the speaker direction continuously at all times. It then supplies its data to the speaker direction determination processing of step 4.

Step S4: Speaker direction microphone switching processing
The timing determination in the speaker direction microphone switching processing of the DSP 25 uses the results of step 2 and step 3. When the speaker direction detected at that time differs from the speaker direction selected so far, it instructs the microphone signal switching processing of step 4 to select the microphone in the new speaker direction.
However, when a chairperson's microphone has been set from the operation unit 15 and the chairperson and another conference attendee speak at the same time, the chairperson's speech is given priority.
At this time, the selected microphone information is displayed on the microphone selection result display means, for example the light emitting diodes LED1 to LED6.

Step S5: Transmission of the microphone sound collection signal
The microphone signal switching processing outputs only the microphone signal selected by step 4 from among the six microphone signals, as the transmission signal, to the line-out of the communication line 920 illustrated in FIG. 6, so that it is transmitted, for example, from the first communication device 10A of the first sound collection / video imaging device 1A via the communication line 920 to the second communication device 10B of the second sound collection / video imaging device 1B on the other side.

Step S6: Determination of imaging conditions
Once the speaker has been determined by the above method, the imaging conditions for the television camera devices 40A1 and 40A2 can also be determined from the arrangement of the microphones and the positions of the conference attendees.
Preferably, the voiceprint recognition results for the conference attendees described in the second embodiment are used.
The details of this processing are described as the third embodiment.

Setting of the speech start level threshold and the speech end threshold
Process 1: Immediately after power-on, the floor noise of each microphone is measured for a predetermined time, for example 1 second.
The DSP 25 reads the peak-held level value of the sound pressure level detection unit at fixed time intervals, for example every 10 ms in the present embodiment, and takes the average of the values over a predetermined time, for example 1 minute, as the floor noise.
Based on the measured floor noise level, the DSP 25 determines the speech start detection level (floor noise + 9 dB) and the speech end detection level threshold (floor noise + 6 dB). The DSP 25 continues thereafter to read the peak-held level value of the sound pressure level detector at fixed time intervals.
When speech is determined to have ended, the DSP 25 performs floor noise measurement and updates the thresholds of the speech start and speech end detection levels.

With this method, because the floor noise level differs at each microphone position, a threshold can be set individually for each microphone, which prevents erroneous microphone selection caused by a noise source.
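As a worked illustration of process 1, the thresholds can be derived from averaged peak-hold readings as below. The sketch assumes that the level values are linear amplitudes, so that "+9 dB" and "+6 dB" become multiplications by 10^(dB/20); the source does not state the unit of the level values, and the function names are hypothetical.

```python
import math


def thresholds_from_floor(peak_levels, start_db=9.0, end_db=6.0):
    """Average peak-held readings into a floor noise level and derive thresholds.

    peak_levels : linear-amplitude readings for one microphone (assumption)
    Returns (floor, start_threshold, end_threshold).
    """
    floor = sum(peak_levels) / len(peak_levels)
    start_thr = floor * 10 ** (start_db / 20)   # floor noise + 9 dB
    end_thr = floor * 10 ** (end_db / 20)       # floor noise + 6 dB
    return floor, start_thr, end_thr


def to_db(ratio):
    """Express an amplitude ratio in decibels."""
    return 20 * math.log10(ratio)


# One microphone, two readings taken while nobody is speaking.
floor, start_thr, end_thr = thresholds_from_floor([0.01, 0.03])
```

Running this once per microphone gives each microphone its own threshold pair, which is the per-microphone property the text attributes to process 1.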

Process 2: Handling rooms with high ambient noise (large floor noise)
Process 2 is a countermeasure for cases in which the floor noise is so large that, if the threshold levels were updated automatically as in process 1, speech start and end would be hard to detect. It works as follows.
The DSP 25 determines the thresholds of the speech start detection level and the speech end detection level on the basis of a predicted floor noise level.
The DSP 25 sets the speech start threshold level higher than the speech end threshold level (for example, by 3 dB or more).
The DSP 25 reads the level value peak-held by the sound pressure level detector at fixed time intervals.

With this method, the thresholds are the same for all microphones, so a speech start can be recognized for a person with their back to the noise source and for a person elsewhere alike, provided that they speak at about the same loudness.

Speech start determination
Process 1: The output levels of the sound pressure level detectors for the six microphones are compared with the speech start level threshold, and a speech start is determined when the threshold is exceeded.
When the output levels of the sound pressure level detectors for all the microphones exceed the speech start level threshold, the DSP 25 determines that the signal comes from the reception reproduction speaker 16 and does not determine a speech start. This is because the reception reproduction speaker 16 is at the same distance from all the microphones MC1 to MC6, so its sound reaches all of them almost equally.

Process 2: With the six microphones illustrated in FIG. 5 arranged radially at equal 60-degree angles and equal spacing, three pairs of unidirectional microphones whose directivity axes point in opposite directions, 180 degrees apart, are formed (microphones MC1 and MC4, MC2 and MC5, MC3 and MC6), and the level difference between the microphone signals of each pair is used. That is, the following calculations are performed.

[Table 6]
Absolute value of (signal level of microphone 1 - signal level of microphone 4) ... [1]
Absolute value of (signal level of microphone 2 - signal level of microphone 5) ... [2]
Absolute value of (signal level of microphone 3 - signal level of microphone 6) ... [3]

The DSP 25 compares the absolute values [1], [2], and [3] with the speech start level threshold and determines a speech start when the threshold is exceeded.
With this processing, the sound from the reception reproduction speaker 16 reaches all the microphones equally and therefore cancels in the differences. Unlike process 1, the absolute values do not all exceed the speech start level threshold, so it is unnecessary to decide whether the sound came from the reception reproduction speaker 16 or from a talker.
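The opposing-pair calculation of Table 6 can be sketched as below. The function name `speech_started_by_pairs` and the example level values are illustrative assumptions; the pairing (MC1/MC4, MC2/MC5, MC3/MC6) follows the text.

```python
def speech_started_by_pairs(levels, start_thr):
    """Decide speech start from level differences of opposing microphones.

    levels : six signal levels for MC1..MC6, in that order
    Returns (started, diffs) where diffs are the values [1], [2], [3].
    """
    diffs = [abs(levels[i] - levels[i + 3]) for i in range(3)]
    return any(d > start_thr for d in diffs), diffs


# Loudspeaker sound arrives equally at every microphone: no speech start.
speaker_only, _ = speech_started_by_pairs([0.8] * 6, start_thr=0.3)

# A talker in front of MC1: MC1 is loud and the opposite MC4 is quiet.
talker, diffs = speech_started_by_pairs([0.9, 0.5, 0.2, 0.1, 0.2, 0.5], start_thr=0.3)
```

Sound that is common to both microphones of a pair drops out of the difference, which is why no separate loudspeaker check is needed in this variant.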

Speaker direction detection processing
The speaker direction is detected using the characteristics of the unidirectional microphones illustrated in FIG. 7. As FIG. 7 shows, the frequency characteristic and level characteristic of a unidirectional microphone change with the angle at which the voice arrives from the speaker. The results are illustrated in FIGS. 8(A) to 8(C), which show the result of placing a loudspeaker at a predetermined distance, for example 1.5 meters, from the communication device 10A and applying a fast Fourier transform (FFT) at fixed time intervals to the sound collected by each microphone. The X axis represents frequency, the Y axis signal level, and the Z axis time. The horizontal lines mark the cutoff frequencies of the band-pass filters, and the level of the frequency band between two lines corresponds to the data converted into sound pressure levels through the five band-pass filters by the microphone signal level conversion processing described with reference to FIGS. 11 to 14.

The determination method actually applied to detect the speaker direction in the communication device of the sound collection / video imaging device of the embodiment of the present invention is as follows.
The output level of each band-pass filter is given an appropriate weighting (for example, in 1 dB full-scale (dBFS) steps, 0 points for 0 dBFS and 3 points for -3 dBFS, or the reverse). The step of this weighting determines the resolution of the processing.
The weighting is executed at every sample clock, the weighted scores of each microphone are added and averaged over a fixed number of samples, and the microphone signal with the smallest (or largest) total score is determined to be the microphone facing the speaker. Table 7 below illustrates the result.

In the example of Table 7, the first microphone MC1 has the smallest total score, so the DSP 25 determines that the sound source (the talker) is in the direction of the first microphone MC1. The DSP 25 holds the result in the form of a sound source direction microphone number.
As described above, the DSP 25 weights the band-pass filter output levels of the frequency bands for each microphone, ranks the microphone signals within each band in order of smallest (or largest) score, and determines the microphone signal that ranks first in three or more bands to be the microphone facing the speaker. Taking the sound source (the talker) to be in the direction of the first microphone MC1, the DSP 25 creates a score table such as Table 8 below.

In practice, because of sound reflections and standing waves that depend on the characteristics of the room, the first microphone MC1 does not necessarily rank first in every band-pass filter output. If it ranks first in a majority of the five bands, however, it can be determined that the sound source (the talker) is in the direction of the first microphone MC1. The DSP 25 holds the result in the form of a sound source direction microphone number.

The DSP 25 sums the band-pass filter output level data of each microphone in the form shown in Table 9 below, determines the microphone signal with the largest level to be the microphone facing the speaker, and holds the result in the form of a sound source direction microphone number.

[Table 9]
MIC1 Level = L1-1 + L1-2 + L1-3 + L1-4 + L1-5
MIC2 Level = L2-1 + L2-2 + L2-3 + L2-4 + L2-5
MIC3 Level = L3-1 + L3-2 + L3-3 + L3-4 + L3-5
MIC4 Level = L4-1 + L4-2 + L4-3 + L4-4 + L4-5
MIC5 Level = L5-1 + L5-2 + L5-3 + L5-4 + L5-5
MIC6 Level = L6-1 + L6-2 + L6-3 + L6-4 + L6-5
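Two of the direction criteria described above, the per-band ranking (first place in three or more of the five bands) and the Table 9 level sum, can be sketched as follows. The function names and the sample level matrix are hypothetical, and "largest level wins" is assumed for the ranking, since the text allows either ordering depending on the weighting.

```python
def direction_by_sum(levels):
    """Table 9 style: sum the five band levels per microphone, pick the largest.

    levels[m][b] is the level of microphone m (0 = MC1) in band b.
    Returns the 1-based sound source direction microphone number.
    """
    totals = [sum(bands) for bands in levels]
    return max(range(len(totals)), key=totals.__getitem__) + 1


def direction_by_band_vote(levels, min_wins=3):
    """Rank microphones per band; a microphone must be first in >= min_wins bands."""
    n_mics, n_bands = len(levels), len(levels[0])
    wins = [0] * n_mics
    for b in range(n_bands):
        best = max(range(n_mics), key=lambda m: levels[m][b])
        wins[best] += 1
    top = max(range(n_mics), key=wins.__getitem__)
    return top + 1 if wins[top] >= min_wins else None


# MC1 is first in four bands; a room reflection makes MC3 first in one band.
levels = [
    [0.9, 0.8, 0.7, 0.3, 0.8],   # MC1
    [0.5, 0.6, 0.5, 0.4, 0.4],   # MC2
    [0.2, 0.3, 0.2, 0.5, 0.2],   # MC3
    [0.1, 0.1, 0.1, 0.1, 0.1],   # MC4
    [0.2, 0.2, 0.2, 0.2, 0.2],   # MC5
    [0.5, 0.5, 0.4, 0.3, 0.4],   # MC6
]
mic_by_sum = direction_by_sum(levels)
mic_by_vote = direction_by_band_vote(levels)
```

Returning `None` when no microphone reaches the majority is a design choice of this sketch; the source does not say what happens in that case.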

Speaker direction microphone switching timing determination processing
This processing is activated by the speech start determination result of step 2 in FIG. 17. When a microphone for a new speaker is detected from the speaker direction detection result of step 3 and the past selection information, the DSP 25 issues a microphone signal switching command to the microphone signal selection switching processing of step 5 and notifies the microphone selection result display means (light emitting diodes LED1 to LED6) that the speaker microphone has been switched, thereby informing the speaker that the communication device of the sound collection / video imaging device has responded to his or her speech.

To eliminate the influence of reflected sound and standing waves in a highly reverberant room, the DSP 25 prohibits the issuing of a new microphone selection command until the speech end determination time (for example, 0.5 seconds) has elapsed after the microphone was switched.
In the present embodiment, two microphone selection switching timings are provided, based on the microphone signal level conversion result of step 1 in FIG. 17 and the speaker direction detection result of step 3.

First method: when the speech start can be determined clearly
This is the case where speech from the direction of the selected microphone has ended and new speech arrives from another direction.
In this case, when speech starts after at least the speech end determination time (for example, 0.5 seconds) has elapsed since all microphone signal levels (1) and microphone signal levels (2) fell to or below the speech end threshold level, and any microphone signal level (1) reaches or exceeds the speech start threshold level, the DSP 25 determines that speech has started. Based on the sound source direction microphone number, it determines the microphone facing the speaker direction to be the valid sound collection microphone and starts the microphone signal selection switching processing of step 5.

Second method: when louder speech newly arrives from another direction while speech is continuing
In this case, the DSP 25 starts the determination processing after at least the speech end determination time (for example, 0.5 seconds) has elapsed since the speech start (the time when microphone signal level (1) reached or exceeded the threshold level).
If, before the speech end is detected, the sound source direction microphone number from step 3 changes and is judged to be stable, the DSP 25 determines that a talker at the microphone corresponding to that number is speaking more loudly than the currently selected speaker, determines that sound source direction microphone to be the valid sound collection microphone, and activates the microphone signal selection switching processing of step 5.

Selection switching processing of the microphone signal facing the detected speaker
This processing in the DSP 25 is activated by the command issued by the speaker direction microphone switching timing determination processing of step 4 in FIG. 17.
As illustrated in FIG. 18, the microphone signal selection switching processing of the DSP 25 consists of six multipliers and a six-input adder. To select a microphone signal, the DSP 25 sets the channel gain (CH Gain) of the multiplier connected to the desired microphone signal to [1] and the CH Gain of the other multipliers to [0]. The adder then sums the selected (microphone signal x [1]) and the (microphone signal x [0]) results, and the desired microphone signal is obtained at the output.

If the channel gain is switched abruptly between [1] and [0] as described above, a click may be generated by the level difference between the microphone signals being switched. Therefore, in the communication device 10A, as illustrated in FIG. 19, CH Gain is changed continuously from [1] to [0] and from [0] to [1] over a switching transition time of, for example, 10 ms so that the two signals cross-fade. This avoids clicks caused by the level difference between the microphone signals.
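The multiplier-and-adder selection with a cross-faded channel gain can be sketched as follows. This is a minimal illustration, not the DSP 25 firmware: the function names, the 16 kHz sampling rate used in the usage note, and the linear shape of the ramp are assumptions (the text only says the gain changes continuously over about 10 ms).

```python
from typing import List


def ramp_length(fs_hz: float, transition_ms: float) -> int:
    """Number of samples in the switching transition time."""
    return max(1, round(fs_hz * transition_ms / 1000.0))


def channel_gains(n_mics: int, old: int, new: int, n_samples: int,
                  ramp_len: int, max_gain: float = 1.0) -> List[List[float]]:
    """Per-sample CH Gain for each channel while switching from `old` to `new`."""
    gains = [[0.0] * n_samples for _ in range(n_mics)]
    for t in range(n_samples):
        fade_in = min((t + 1) / ramp_len, 1.0)       # assumed linear ramp up to 1
        gains[new][t] += max_gain * fade_in          # incoming microphone fades in
        gains[old][t] += max_gain * (1.0 - fade_in)  # outgoing microphone fades out
    return gains


def select_output(mics: List[List[float]], gains: List[List[float]]) -> List[float]:
    """Multiply each channel by its gain and sum them (six multipliers, one adder)."""
    n_samples = len(mics[0])
    return [sum(g[t] * m[t] for g, m in zip(gains, mics)) for t in range(n_samples)]
```

For example, at an assumed 16 kHz sampling rate, `ramp_length(16000, 10)` gives 160 samples. Passing `max_gain=0.5` corresponds to setting the maximum channel gain to a value other than [1].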

Furthermore, setting the maximum channel gain to a value other than [1], for example [0.5], makes it possible to adjust the echo cancellation processing in the DSP 25 at the subsequent stage.

As described above, the communication device in the sound collection / video imaging apparatus of the first embodiment of the present invention is not affected by noise and can be used effectively as a communication device for conferences and the like.

The communication device in the sound collection / video imaging apparatus of the first embodiment of the present invention has the following structural advantages.
(1) The positional relationship between the plurality of unidirectional microphones and the reception/reproduction speaker is fixed, and the distance between them is very short. The sound that reaches the microphones directly from the reception/reproduction speaker is therefore overwhelmingly dominant over the sound that returns to them by way of the conference room environment. As a result, the characteristics with which sound travels from the reception/reproduction speaker to the microphones, namely the signal level (intensity) and the frequency characteristics (frequency response and phase), are always the same. In other words, the communication device has the advantage that the transfer function is always the same.

(2) Consequently, the transfer function does not change when the microphone is switched, and the gain of the microphone system need not be adjusted each time the microphone is switched. In other words, once the adjustment has been made when the communication device is manufactured, it does not have to be redone.

(3) For the same reason, a single echo canceller implemented with a digital signal processor (DSP) suffices even when the microphones are switched. DSPs are expensive, and using only one also saves space on a printed circuit board that carries many components and has little room to spare.

(4) Because the transfer function between the reception/reproduction speaker and the microphones is constant, the sensitivity differences between individual microphones, which can be as large as ±3 dB, can be adjusted within the unit itself.

(5) With the communication device placed on a table, a speaker system becomes possible in which the single reception/reproduction speaker in the communication device distributes (diffuses) sound of uniform quality evenly in all directions.

(6) Sound emitted from the reception/reproduction speaker travels along the table surface (the boundary effect) and reaches the conference attendees effectively, efficiently, and evenly with good quality. Toward the ceiling of the conference room, it is phase-cancelled by the sound from the opposite side and becomes weak, so little sound is reflected from the ceiling toward the attendees. As a result, clear sound is delivered to the participants.

(7) Sound emitted from the reception/reproduction speaker reaches all of the microphones simultaneously and at the same volume, which makes it easy to determine whether a sound is a speaker's voice or received audio. As a result, erroneous decisions in the microphone selection process are reduced.

(8) Arranging an even number of microphones at equal intervals makes the level comparison for direction detection easy.

(9) A damper made of cushioning material, flexible or elastic microphone support members, and the like reduce the effect on microphone sound collection of vibration caused by the sound of the reception/reproduction speaker, which could otherwise be transmitted through the printed circuit board on which the microphones are mounted.

(10) The sound of the reception/reproduction speaker does not enter the microphones directly. This communication device is therefore little affected by noise from the reception/reproduction speaker.

The communication device in the sound collection / video imaging apparatus of the first embodiment of the present invention has the following advantages in terms of signal processing.
(a) A plurality of unidirectional microphones are arranged radially at equal intervals so that the direction of the sound source can be detected, and the microphone signal is switched so that clear sound with a good S/N ratio is collected and transmitted to the other party.
(b) Voices from speakers around the device are collected with a good S/N ratio, and the microphone facing the speaker is selected automatically.
(c) As the microphone selection method, the passband of the audio signal is divided into frequency bands and the levels in each band are compared, which simplifies the signal analysis.
(d) The microphone signal switching process is implemented as DSP signal processing, and all of the signals are cross-faded so that no click is produced at switching.
(e) The microphone selection result can be indicated on microphone selection result display means such as light emitting diodes, or reported to an external device. It can therefore also be used, for example, as speaker position information for a conference system using the television camera devices 40A1 and 40A2 illustrated in FIG. 2.

Second Embodiment
A second embodiment of the communication device of the sound collection / video imaging apparatus of the present invention will be described with reference to FIGS. 20 to 25.
Conventionally, telephones, intercoms, videophones, and the like have been used to transmit the audio of a conference or an individual's voice to a remote party. In such cases, however, the voices of people nearby and the sound from a television set are often so loud that the speaker's voice is not conveyed well to the other party. The speaker then has to go to the trouble of moving close to the microphone, raising their voice, or turning down the television each time.
Using the communication device in the sound collection / video imaging apparatus of the first embodiment eliminates noise around the communication device and allows the speaker to be identified accurately, but further improvement is desired.
To improve further on the communication device of the first embodiment, the second embodiment of the present invention performs voiceprint identification, clearly selects only the voices of speakers whose voiceprints have been registered in advance, and lowers the level of other sounds that constitute noise, thereby enabling better communication.

FIG. 20 shows the configuration of the communication device of the second embodiment of the present invention.
The communication device illustrated in FIG. 20 has a configuration similar to that of the communication device illustrated in FIG. 6, and the components that also appear in FIG. 6 bear the same reference numerals. It differs, however, in the following respects.
In the communication device of the second embodiment, variable gain amplifiers 301 to 306 are arranged between the microphones MC1 to MC6 and the A/D converters 271 to 273, a voiceprint authentication unit 32 is added, an amplifier gain adjustment unit 34 is added, and the output signal of the amplifier 291 is applied to the voiceprint authentication unit 32 in addition to being output to the LINE OUT terminal. As mentioned in the first embodiment, the A/D converters 271 to 273 can instead be configured as A/D converters with a gain-adjustable amplification function, in which case the function of the variable gain amplifiers 301 to 306 can be included in the A/D converters 271 to 273. This embodiment describes the case in which the variable gain amplifiers 301 to 306 are provided separately from the A/D converters 271 to 273.
In the second embodiment, a third amplifier 293 is also added so that the input signal from LINE IN or the signal from the amplifier 293 can be output to the recording output terminal REC OUT.

The six microphones MC1 to MC6 have the directivity illustrated in FIG. 7 and, as described with reference to FIGS. 3 to 5, are arranged at equal angles and equal intervals.
In the second embodiment as well, the A/D converters 271 to 273 are two-channel A/D converters, each of which accepts two input signals (two channels of input).
The DSP 25 performs the various processes described in the first embodiment and listed in FIG. 10, such as the microphone selection and switching process.
The second digital signal processor (DSP) 26 performs echo cancellation processing, as described in the first embodiment.

The voiceprint authentication unit 32 includes a voiceprint authentication processor P that performs voiceprint authentication processing, a dictionary memory M1 used for voiceprint processing, and a voiceprint registration memory M2 in which voiceprints are registered. The voiceprints of the people to be authenticated as speakers are registered in the voiceprint registration memory M2 in advance by the voiceprint registration device 32A. The people subject to speaker authentication are, for example, the conference attendees who use the communication device of this embodiment. The processing of the voiceprint authentication unit 32 is described in detail later.

As in the first embodiment, the DSP 25 selects one of the microphones MC1 to MC6 and outputs a microphone selection signal S251 indicating the number of the selected microphone to the microprocessor 23. The microprocessor 23 outputs the microphone selection signal S251 to the amplifier gain adjustment unit 34.
The signal of the microphone selected by the DSP 25 is applied to the DSP 26, echo-cancelled there, output to the D/A converter 282, amplified by the amplifier 292, and output from the reception/reproduction speaker 16. The conference attendees using the communication device can therefore hear, from the reception/reproduction speaker 16, the voice of the speaker using the selected microphone.

The selected audio signal S26 output from the DSP 26 to the D/A converter 282 is output to the LINE OUT terminal via the amplifier 291 and can be sent to the other party's communication device.
The selected audio signal S26 output from the DSP 26 to the D/A converter 282 is also output to the REC OUT terminal via the amplifier 293, so it can be recorded.
In addition, the selected audio signal S26 output from the DSP 26 to the D/A converter 282 is output to the voiceprint authentication unit 32 via the amplifier 291, and the voiceprint authentication unit 32 performs voiceprint authentication on it. The details of voiceprint authentication are described later. The voiceprint authentication unit 32 outputs an authentication pass signal S32 to the amplifier gain adjustment unit 34. The signal is "1" when the voiceprint of the selected audio signal S26 matches one registered in the voiceprint registration memory M2 (authentication passed) and "0" when authentication fails.

The microphone selection signal S251 from the DSP 25 is input to the amplifier gain adjustment unit 34 via the microprocessor 23. In this state, when an authentication pass signal S32 indicating a pass is input from the voiceprint authentication unit 32, the amplifier gain adjustment unit 34 raises the gain of the variable gain amplifier that receives the output of the microphone indicated by the microphone selection signal S251 (keeping the value if it is already high, or setting it to a certain high value) and lowers the gain of the other variable gain amplifiers (keeping the value if it is already low, or setting it to a certain low value).

Specifically, the amplifier gain adjustment unit 34 contains a microcomputer. The microcomputer sets a high gain setting value for the variable gain amplifier that receives the output of the microphone indicated by the microphone selection signal S251 and outputs it to that amplifier, and it sets low gain setting values for the other variable gain amplifiers and outputs them to those amplifiers. As a result, the variable gain amplifiers 301 to 306 are changed to the set gains.

For example, if the first microphone MC1 picks up only the sound from a television set and that sound is loud, MC1 is selected by the DSP 25. The DSP 25 then outputs the microphone selection signal S251, indicating that the first microphone MC1 has been selected, to the amplifier gain adjustment unit 34 via the microprocessor 23.
The television sound signal selected by the DSP 25 is input from the DSP 26, as the selected audio signal S26, to the voiceprint authentication unit 32 via the amplifier 291. Because the television sound is not registered in the voiceprint registration memory M2 of the voiceprint authentication unit 32, the selected audio signal S26 fails authentication, and an authentication pass signal S32 of "0" is output to the amplifier gain adjustment unit 34.
The amplifier gain adjustment unit 34 has already received the microphone selection signal S251 indicating that the first microphone MC1 has been selected. Because the authentication pass signal S32 it receives is "0", it sets a low gain for the variable gain amplifier 301, to which the output of the first microphone MC1 indicated by S251 is connected, and outputs that setting to the amplifier, lowering its gain. The sound signal collected by MC1 is then attenuated by the variable gain amplifier 301 before it enters the A/D converter 271, so MC1 is likely to drop out of microphone selection thereafter.

On the other hand, suppose the voiceprint of the speaker using the third microphone MC3 has been registered in advance in the voiceprint registration memory M2 of the voiceprint authentication unit 32. When the DSP 25 selects MC3, the microphone selection signal S251 indicating that MC3 has been selected is output from the DSP 25 to the amplifier gain adjustment unit 34 via the microprocessor 23, and the audio from MC3 is input to the voiceprint authentication unit 32 as the selected audio signal S26 and authenticated. Because the voiceprint is registered in the voiceprint registration memory M2, authentication passes and an authentication pass signal S32 of "1" is output.
On receiving the authentication pass signal S32 of "1", the amplifier gain adjustment unit 34 refers to the microphone selection signal S251 indicating that MC3 has been selected, sets a high gain for the variable gain amplifier 305, to which the output of MC3 is connected, and outputs that setting to the amplifier, setting its gain to a certain high value. The sound signal collected by MC3 is then boosted by the variable gain amplifier 305 and input to the A/D converter 273, and a high-level audio output is produced by the DSP 26 as the selected audio signal S26. The selected audio signal S26 is converted to an analog signal by the D/A converter 282, after which it is amplified by the amplifier 292 and output to the reception/reproduction speaker 16, amplified by the amplifier 291 and sent via LINE OUT to the other party's communication device, and input again to the voiceprint authentication unit 32 for voiceprint authentication.

When the television sound picked up by the first microphone MC1 and a voice from the third microphone MC3 are present at the same time, the DSP 25 first selects the louder of the two, which is input to the voiceprint authentication unit 32 as the selected audio signal S26.
For example, if the television sound collected by MC1 is louder than the voice from MC3, the television sound from MC1 is selected by the DSP 25 and output from the DSP 26 as the selected audio signal S26, and, as described above, it is not authenticated by the voiceprint authentication unit 32. The gain of the variable gain amplifier 301, to which the output of MC1 is connected, is therefore lowered as described above. As a result, in the next microphone selection process in the DSP 25, the signal from MC1 is not selected and the signal from MC3 is selected instead. When the signal from MC3 is output from the DSP 26 to the voiceprint authentication unit 32 as the selected audio signal S26, voiceprint authentication passes. The amplifier gain adjustment unit 34 then sets the gain of the variable gain amplifier 305, to which MC3 is connected, to a high value, so the signal collected by MC3 is boosted, output as clear audio from the reception/reproduction speaker 16 and from LINE OUT, and input to the voiceprint authentication unit 32 again.
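The select, authenticate, and adjust cycle in this example can be traced with a short simulation. It is a sketch under stated assumptions rather than the behaviour of the actual device: the gain values, the "loudest amplified level wins" selection rule, and the use of a set of registered microphone indices as a stand-in for real voiceprint authentication are all hypothetical.

```python
from typing import List, Set, Tuple

# Assumed gain settings; the text only speaks of "high" and "low" values.
HIGH_GAIN, NORMAL_GAIN, LOW_GAIN = 2.0, 1.0, 0.25


def select_microphone(levels: List[float], gains: List[float]) -> int:
    """Stand-in for the DSP 25 selection: pick the loudest amplified input."""
    amplified = [lv * g for lv, g in zip(levels, gains)]
    return max(range(len(amplified)), key=amplified.__getitem__)


def adjust_gains(gains: List[float], selected: int, passed: bool) -> List[float]:
    """Stand-in for the amplifier gain adjustment unit 34."""
    new = list(gains)
    if passed:
        # Pass: raise the selected channel and lower all the others.
        for i in range(len(new)):
            new[i] = HIGH_GAIN if i == selected else LOW_GAIN
    else:
        # Fail: lower only the channel that was selected.
        new[selected] = LOW_GAIN
    return new


def run_rounds(levels: List[float], registered: Set[int],
               rounds: int = 3) -> Tuple[List[Tuple[int, bool]], List[float]]:
    """Repeat the select, authenticate, adjust cycle and record each decision."""
    gains = [NORMAL_GAIN] * len(levels)
    history = []
    for _ in range(rounds):
        selected = select_microphone(levels, gains)
        passed = selected in registered   # placeholder for voiceprint matching
        gains = adjust_gains(gains, selected, passed)
        history.append((selected, passed))
    return history, gains
```

With a loud television at index 0 (MC1) and a registered speaker at index 2 (MC3), for instance `run_rounds([0.9, 0.05, 0.6, 0.05, 0.05, 0.05], {2})`, the first round selects MC1 and fails, after which MC3 is selected and passes.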

In this way, the voice of a speaker whose voiceprint is registered in the voiceprint registration memory M2 of the voiceprint authentication unit 32 is ultimately selected and output as a clear signal from the reception/reproduction speaker 16, to LINE OUT, and to the voiceprint authentication unit 32.
Therefore, with the communication device of the second embodiment, a clear voice conversation can easily be held with a person at a remote location, as illustrated in FIG. 1.
Even when the communication device is used in a noisy environment, for example with television sound as ambient noise, the speaker does not need to move to another position or speak especially loudly.
Nor does anyone have to go to the trouble of turning down the television each time in order to talk with the other party. In particular, because the television sound is transmitted at a suppressed level, the other party hears only clear speech and the conversation proceeds smoothly. In that sense, the communication device of the second embodiment also functions as a device that removes unwanted noise.
Of course, even if a person whose voiceprint is not registered in the voiceprint registration memory M2 of the voiceprint authentication unit 32 talks near the communication device, that voice is ultimately not selected, and only the voices of speakers with registered voiceprints are clearly selected and output.

As illustrated in FIG. 15, the DSP 25 determines that use of the selected microphone has ended when the level of the microphone output signal falls and stays low for a predetermined time.
At that point, the amplifier gain adjustment unit 34 preferably resets the gain of the variable gain amplifier corresponding to the microphone whose speech has ended to the normal gain. The end of the selection can of course be reported from the DSP 25 to the amplifier gain adjustment unit 34 via the microprocessor 23 by including it in the microphone selection signal S251.
Setting the gain of the variable gain amplifier for the deselected microphone to the same gain as the other variable gain amplifiers in this way puts all microphones on an equal footing for the next microphone selection.

The embodiment above describes the case in which the variable gain amplifiers 301 to 306 are used as the variable gain amplification means of the present invention. As mentioned above, however, variable gain A/D converters can also be used as the A/D converters 271 to 273. In that case, the variable gain amplifiers 301 to 306 are replaced with fixed gain amplifiers, and the amplifier gain adjustment unit 34 adjusts (sets) the gain of the variable gain A/D converters 271 to 273.

As a preferred example of the communication device of the present invention, the case described in the first embodiment, in which the microphones MC1 to MC6 are arranged radially at equal angles, has been described. In the second embodiment, however, the microphones MC1 to MC6 are not limited to the arrangement of the first embodiment, in which each pair of microphones, for example MC1 and MC4, face each other on a straight line, and any predetermined arrangement may be used. In that case, the DSP 25 selects, for example, the microphone that outputs the collected sound signal with the largest amplitude and indicates it with the microphone selection signal S251. The voiceprint authentication unit 32 then performs the voiceprint authentication described above.

A detailed example of the processing performed by the voiceprint authentication unit 32 will be described with reference to FIGS. 21 to 25.
In this embodiment, the conference attendees input their voices to the voiceprint registration device 32A through the microphones MC1 to MC6 in turn, and the voiceprint registration device 32A outputs each voice, together with the microphone number, to the voiceprint authentication unit 32. In this example, each attendee's voice input is assumed to be a spoken command of about 2 to 3 seconds, such as "OpenFile" or "Next", as illustrated in FIG. 21.
The voiceprint authentication processor P in the voiceprint authentication unit 32 converts the audio signal input from the voiceprint registration device 32A into a digital signal, performs speech recognition on it with reference to the dictionary recorded in the dictionary memory M1, converts it into character string data, and records it in the voiceprint registration memory M2 together with the microphone number. That is, the voiceprint authentication processor P compares the input against the character string data for voice commands stored in advance in the dictionary memory M1 and selects the entry that matches.

FIGS. 21(A) to 21(D) are timing charts illustrating the control operations performed by the voiceprint authentication unit 32.
FIG. 21(A) is a timing chart of the microphone switching signal MC_SEL. For example, #4 indicates that the fourth microphone MC4 is currently selected.
FIG. 21(B) is a timing chart of the microphone output signal. The microphone output signal is the audio signal corresponding to the microphone number indicated by the microphone switching signal MC_SEL in FIG. 21(A). It is converted to digital form by an A/D converter in the voiceprint authentication processor P and input. In this example, the microphone output signals are the audio signals of commands such as "OpenFile" and "Next".
FIG. 21(C) is a timing chart showing the processing performed by the voiceprint authentication processor P based on the information obtained in FIGS. 21(A) and 21(B). It consists of buffering each piece of audio data and performing speech recognition after buffering.
FIG. 21(D) is a timing chart of the character string data output sequentially as the result of the speech recognition processing shown in FIG. 21(C).

As illustrated in FIG. 21(A), the first microphone selected is #4, and the microphone output signal "OpenFile" from the fourth microphone is input to the voiceprint authentication processor P. The processor receives the microphone output signal after digital conversion by the A/D converter and starts buffering as illustrated in FIG. 21(C); the voice data is held in the buffer assigned to microphone number #4.

When the selected microphone subsequently changes from #4 to #1, the microphone switching signal becomes MC_SEL = 1. As shown in FIG. 21(B), the voice data of microphone number #1 corresponds to "Next". The voiceprint authentication processor P ends buffering for microphone number #4 and starts buffering for microphone number #1, while in parallel performing voice recognition processing on the voice data of microphone number #4 held in the buffer.
In the voice recognition processing, the voice data of microphone number #4 is recognized and collated with the command group of character string data stored in the dictionary memory M1; the matching entry is selected, and "OpenFile" is output as character string data, as shown in FIG. 21(D).
The same applies when the microphone number subsequently changes from #1 to #2.
The control operation outlined above is described further below with reference to flowcharts.

FIG. 22 shows the main flow of control performed by the voiceprint authentication processor P.
For example, a 20 kHz T1 timer is started, and the processing shifts to the T1 timer interrupt shown in FIG. 23 every 50 μs. If there is voice input at or above a certain level (step ST11), the processing proceeds to step ST12. This threshold level can of course be set as appropriate for the application.
Because the microphone switching signal MC_SEL is supplied to the voiceprint authentication processor P, the processor knows the microphone number (1 to 6) of any voice input that reaches the threshold in step ST11. In step ST12, therefore, it starts sampling the input voice data and holds the data in the buffer corresponding to that microphone number (1 to 6).
If there is no voice input at or above the threshold, nothing is done in step ST12.

FIG. 25 shows the interrupt flow executed when the microphone selection information changes during the main flow control shown in FIG. 22. That is, this interrupt occurs when, during the main flow (the normal control operation), the microphone number selected by the communication device changes and the change is notified through the microphone switching signal MC_SEL. In the example of FIG. 21, it corresponds to the case where voice data of microphone number 4 (MC_SEL = 4) was being sampled and stored in the buffer for microphone number 4 before the interrupt, and MC_SEL then changes from 4 to 1.
In step ST40 of FIG. 25, if the voiceprint authentication processor P was performing voice sampling, it stores no further voice data in the buffer.
In this case, the speech input from microphone number 4 is regarded as finished, and sampling is ended (step ST41).
The voice data of microphone number 4 for which sampling has ended is then subjected to voice recognition processing in the voiceprint authentication processor P (step ST42). In the example of FIG. 21, the processor recognizes the voice data of microphone number 4 as "OpenFile", and the character string data is output to the outside of the communication device 1A.
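The microphone-change interrupt of steps ST40 to ST42 can be sketched as a small per-microphone buffer manager. This is an illustrative sketch under assumptions: the class name `MicSwitchBuffer`, its methods, and the injected `recognize` callable are hypothetical, and the recognizer stands in for the voice recognition processing that the source does not detail.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MicSwitchBuffer:
    """Holds one sample buffer per microphone number (1 to 6)."""

    def __init__(self, recognize: Callable[[List[int]], str]) -> None:
        self.recognize = recognize  # hypothetical recognizer: samples -> text
        self.buffers: Dict[int, List[int]] = {n: [] for n in range(1, 7)}
        self.current_mic: Optional[int] = None

    def on_sample(self, mic: int, sample: int) -> None:
        """Store one sample in the buffer of the currently selected microphone."""
        self.current_mic = mic
        self.buffers[mic].append(sample)

    def on_mic_change(self, new_mic: int) -> Optional[Tuple[int, str]]:
        """Handle a change of MC_SEL.

        Stops storing data for the old microphone (ST40, ST41) and runs
        recognition on what was buffered for it (ST42).
        """
        old_mic = self.current_mic
        self.current_mic = new_mic
        if old_mic is None or old_mic == new_mic or not self.buffers[old_mic]:
            return None  # nothing was being sampled for another microphone
        data = self.buffers[old_mic]
        self.buffers[old_mic] = []  # free the buffer for the next utterance
        return old_mic, self.recognize(data)
```

In the FIG. 21 example, a change of MC_SEL from 4 to 1 would return the pair for microphone 4 while later samples go to the buffer for microphone 1.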

In step ST10 of FIG. 22, the T1 timer is started, and the T1 timer interrupt flow shown in FIG. 23 is started, for example, every 50 μs (20 kHz). In the T1 timer interrupt, the processor monitors every 50 μs whether there is voice input and whether it is at or above a certain level, and takes the appropriate action. First, step ST20 checks whether voice sampling was in progress.
If voice sampling was in progress, the voiceprint authentication processor P further checks whether there is voice input at or above the certain level (step ST21); if there is, the T2 timer described below is stopped. The T2 timer monitors the period without speech and, when there has been no speech for a certain time, automatically moves the processing to the next phase, voice recognition.
When there is speech, that is, voice input at or above the certain level, the speech is considered to be continuing, and the T2 timer is reset in step ST22.
When voice sampling is in progress in step ST20 but there is no voice input at or above the certain level, the current utterance may have ended, so the T2 timer is started to monitor how long the silence lasts (step ST23).
Even when there is no voice input at or above the certain level in step ST21, voice sampling continues because the speaker may resume (step ST24).

If voice sampling was not in progress in step ST20, the voiceprint authentication processor P checks in step ST25 whether there is voice input at or above the certain level, that is, whether an utterance has started. If there is such input, the processor regards the utterance as started and begins voice sampling into the buffer corresponding to the newly selected microphone (step ST26).
If there is no voice input at or above the certain level in step ST25, the voiceprint authentication processor P does nothing and waits for the next valid utterance.

In step ST23 of FIG. 23, a T2 timer of, for example, 2 Hz is started. When the set time elapses, that is, when the voiceprint authentication processor P has been sampling voice (step ST20) but there has been no voice input at or above the certain level for that time, continuing to sample would be wasteful, so the processing shifts to the T2 timer interrupt flow shown in FIG. 24.
That is, the voice sampling in progress at that time is ended (step ST30), and the processing moves to voice recognition (step ST31).
After the shift to voice recognition processing, the T2 timer is reset in step ST32 in preparation for the next utterance.
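The interplay of the T1 interrupt (steps ST20 to ST26) and the T2 timeout (steps ST30 to ST32) can be sketched as a tick-driven state machine. This is a simplified sketch under assumptions: each call to `tick` stands for one T1 interrupt, the T2 timer is modeled as a count of consecutive quiet ticks, and the class and parameter names are hypothetical.

```python
from typing import List, Optional


class UtteranceDetector:
    """Tick-driven model of the T1/T2 logic; one tick() call per T1 interrupt."""

    def __init__(self, threshold: int, silence_ticks: int) -> None:
        self.threshold = threshold          # the "certain level" for voice input
        self.silence_ticks = silence_ticks  # quiet ticks until T2 expires
        self.sampling = False
        self.silent = 0                     # stand-in for the running T2 timer
        self.buffer: List[int] = []

    def tick(self, level: int) -> Optional[List[int]]:
        """Process one input level; return the buffered utterance when T2 expires."""
        loud = level >= self.threshold
        if not self.sampling:               # ST20: not sampling
            if loud:                        # ST25 -> ST26: utterance starts
                self.sampling = True
                self.silent = 0
                self.buffer = [level]
            return None
        self.buffer.append(level)           # ST24: sampling continues
        if loud:                            # ST21 -> ST22: reset T2
            self.silent = 0
            return None
        self.silent += 1                    # ST23: T2 is running
        if self.silent < self.silence_ticks:
            return None
        utterance = self.buffer             # ST30: end sampling
        self.sampling = False
        self.silent = 0                     # ST32: reset T2
        self.buffer = []
        return utterance                    # ST31: hand over to recognition
```

A short pause inside a command does not end the utterance, because any loud tick resets the quiet-tick count before it reaches `silence_ticks`.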

According to the voiceprint authentication unit 32, even when several conference attendees issue voice commands to the communication device at the same time through their respective microphones, the sound pressure level in each band of each voice is analyzed, the main speaker is identified, and that speaker's voice signal is passed on. The voiceprint authentication unit 32 can therefore minimize the possibility of erroneous recognition even when multiple voice commands are input simultaneously, and can appropriately judge and process the voice command of the main speaker.
The voiceprint authentication processor P of the voiceprint authentication unit 32 buffers the voice command signal passed to it, performs voice recognition processing on the buffered signal, collates the result with the command character string data stored in the dictionary memory M1, and selects and processes the matching character string data.
The voiceprint authentication processor P is also notified continuously of the selected microphone number by the voiceprint registration device 32A. When the selected microphone number changes, the processor stops buffering, performs voice recognition processing on the voice signal buffered up to that point, and starts buffering the voice command signal from the new microphone number, which improves the accuracy of voice recognition.

Third Embodiment
A third embodiment of the sound collection / video imaging apparatus of the present invention will be described with reference to FIG. 2 and FIGS. 26 to 31.
The third embodiment of the present invention concerns a television conference (TV conference) system configured by adding imaging means to the communication device described above.
FIG. 2 shows the initial state of the television camera devices 40A1 and 40A2 of the sound collection / video imaging apparatus, and FIG. 31 shows the state in which the television camera devices 40A1 and 40A2 capture images according to the imaging conditions determined by the communication device and the imaging adjustment unit 36.

In conventional conference systems with cameras, the camera direction was controlled according to the number of each speaker's individual microphone or by the administrator (chairperson) of the TV conference system. With such methods, an individual microphone is needed for each speaker, which makes the system expensive, or the administrator has the burden of changing the camera imaging direction to change the imaging area every time the speaker changes.
In addition, the display of a speaker's name is normally linked to the microphone, so if a participant changes seats partway through the conference, the setting must be redone, and the procedure is complicated.
Simple systems that merely turn the camera toward the direction of a sound also exist, but the camera may turn toward a person who should not be imaged, or may respond to ambient noise, for example turning toward the projector device used in the conference because of the sound of its fan.

Using the communication device of the sound collection / video imaging apparatus described above provides various advantages, such as accurate selection of the speaker and no need to place a microphone near each conference attendee, and so remedies the problems described above.
That is, the communication device illustrated in FIG. 6 has the microphones MC1 to MC6 arranged in all directions, as illustrated in FIG. 5, and a function whereby the first digital signal processor (DSP) 25 selects the sound collection signal of the microphone facing the direction of the current main speaker; with this device, the speaker's microphone can be selected accurately. Because the microphones are arranged, for example, at equal angles, once the DSP 25 has selected a microphone it can determine the direction in which that microphone is arranged and can therefore identify the direction of the speaker.
More preferably, the communication device described as the second embodiment with reference to FIG. 20, in which the voiceprint authentication unit 32 is added to the communication device illustrated in FIG. 6, is used. With the authentication pass signal S32 output from the voiceprint authentication unit 32 and the microphone selection signal S251 output from the DSP 25 to the microprocessor 23, the speaker can be identified accurately.
As illustrated in FIG. 1(B), FIG. 2, and FIG. 31, each speaker sits in front of the corresponding microphone, so the position of the speaker corresponding to each microphone position is registered in the DSP 25 in advance. The positions and directions of the television camera devices 40A1 and 40A2 and of the speakers are also registered in the DSP 25.
Using these speaker directions and positions, the imaging area of the speaker to be captured by each of the television camera devices 40A1 and 40A2 can be determined.
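The step from a selected microphone to a speaker direction can be sketched for microphones arranged at equal angles. This is a minimal sketch under assumptions: six microphones, microphone 1 at 0 degrees, and numbering that increases with the angle are illustrative choices, and the function name is hypothetical.

```python
def mic_direction_deg(mic_number: int, mic_count: int = 6,
                      offset_deg: float = 0.0) -> float:
    """Return the direction (degrees) of a microphone on an equal-angle ring.

    Assumes microphone 1 sits at offset_deg and the numbering advances by
    360 / mic_count degrees per microphone.
    """
    if not 1 <= mic_number <= mic_count:
        raise ValueError("microphone number out of range")
    step = 360.0 / mic_count
    return (offset_deg + (mic_number - 1) * step) % 360.0
```

With six microphones the spacing is 60 degrees, so microphone 4 faces the direction opposite to microphone 1.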

In the third embodiment, therefore, as illustrated in FIGS. 26 and 27, the television camera devices 40A1 and 40A2 (collectively, the television camera device 40) serving as the imaging means of the present invention, and the imaging adjustment unit 36 serving as imaging adjustment means for adjusting the imaging conditions of the television camera device 40, are added to the communication device described with reference to FIG. 20.
FIGS. 26 and 27 are configuration diagrams of the sound collection / video imaging apparatus according to the third embodiment of the present invention. FIG. 27 shows the apparatus obtained by adding the imaging adjustment unit 36 and the television camera device 40 (television camera devices 40A1 and 40A2) to the communication device illustrated in FIG. 20. FIG. 26 shows the apparatus of FIG. 27 with the variable gain amplifiers 301 to 306 and the amplifier gain adjustment unit 34 removed.

(1) In the third embodiment of the present invention, the microphone selection processing by the DSP 25 described in the first embodiment is essential, and the imaging adjustment unit 36 controls the imaging conditions of the television camera device 40 based on the result of that processing.
(2) In a preferred form of the third embodiment, with the configuration illustrated in FIG. 26 and as in the second embodiment described with reference to FIG. 20, voiceprint authentication is performed in the voiceprint authentication unit 32 in addition to the microphone selection processing by the DSP 25, and the imaging adjustment unit 36 controls the imaging conditions of the television camera device 40 (television camera devices 40A1 and 40A2) only when the microphone selection result and the voiceprint authentication agree.
(3) In a still more preferred form of the third embodiment, with the configuration illustrated in FIG. 27 and as in the second embodiment described with reference to FIG. 20, voiceprint authentication is performed in addition to the microphone selection processing by the DSP 25, the imaging adjustment unit 36 controls the imaging conditions of the television camera device 40 (television camera devices 40A1 and 40A2) only when the microphone selection result and the voiceprint authentication agree, and the amplifier gain adjustment unit 34 also controls the gain of the variable gain amplifiers 301 to 306 as described in the second embodiment.
The basic matters of the third embodiment are described below with reference to FIGS. 26 and 27.

The imaging adjustment unit 36 contains a computer and, as illustrated in FIGS. 2 and 31, can adjust the vertical and horizontal orientation (tilt), pan, zoom, lighting conditions, and the like of each of the television camera devices 40A1 and 40A2.
For each of the television camera devices 40A1 and 40A2, imaging condition information is set in advance in the memory of the computer in the imaging adjustment unit 36: for example, first imaging condition information for imaging the direction and area MIC1AREA of the first microphone, second imaging condition information for imaging the direction and area MIC2AREA of the second microphone, and so on. This imaging condition information may preferably include the name, job title, or position of each conference attendee.
In the example illustrated in FIG. 2, the imaging adjustment unit 36 sets the initial state so that the television camera devices 40A1 and 40A2 divide the conference room into left and right halves centered on the communication device 10A and together can image all the attendees.

Each of the television camera devices 40A1 and 40A2 is configured so that, when given imaging conditions by the imaging adjustment unit 36, such as the imaging direction (vertical and horizontal), whether to zoom, and how far to zoom, it can capture images under those conditions. The image signals captured by the television camera devices 40A1 and 40A2 are displayed on the projector device 60A (or the television receiver 50A) and also on the projector device 60B (or the television receiver 50B) of the remote sound collection / video imaging apparatus.

The amplifier gain adjustment unit 34 and the imaging adjustment unit 36 receive, via the microprocessor 23, the microphone selection signal S251 indicating the number of the microphone selected by the DSP 25.
The amplifier gain adjustment unit 34 and the imaging adjustment unit 36 also receive the authentication pass signal S32. The sound collection signal selected by the DSP 25 is echo-cancelled by the DSP 26 and output as the selected voice signal S26; when the voiceprint authentication unit 32 finds that this signal matches a voiceprint registered in advance, the authentication pass signal S32 is output as "1".

The amplifier gain adjustment unit 34 sets the gain of the variable gain amplifier corresponding to the microphone indicated by the microphone selection signal S251 to a large first gain, by the method described in the second embodiment. The result is the same as described in the second embodiment.

The imaging adjustment unit 36 reads from memory the imaging condition information, set in it in advance, that corresponds to the microphone indicated by the microphone selection signal S251, and adjusts the imaging conditions of the television camera devices 40A1 and 40A2 based on that information.
For example, when the microphone selection signal S251 indicates the first microphone, the direction or orientation (vertical and horizontal) of each of the television camera devices 40A1 and 40A2 is controlled, based on the first imaging condition information, so as to image the direction and area MIC1AREA of the first microphone (FIG. 26; for example, the left side in FIG. 2). If the first imaging condition information includes zoom information, the imaging adjustment unit 36 also instructs the television camera devices 40A1 and 40A2 to zoom.
The television camera devices 40A1 and 40A2 capture images under the conditions instructed by the imaging adjustment unit 36 and send the result over a line (not shown) to the projector device of the remote sound collection / video imaging apparatus. The captured images can also be displayed on the projector device of the local sound collection / video imaging apparatus.
In this way, the projector device serving as the monitor in the room where the remote sound collection / video imaging apparatus is installed selectively displays the image of the conference attendee who spoke using the microphone selected by the DSP 25 and authenticated by the voiceprint authentication unit 32.
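The lookup of preset imaging conditions from the microphone selection signal S251, gated by the authentication pass signal S32, can be sketched as follows. The preset values, the `ImagingCondition` fields, and the `require_auth` switch (off for the basic form, on for the preferred forms) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ImagingCondition:
    pan_deg: float                 # left/right direction
    tilt_deg: float                # up/down direction
    zoom: Optional[float] = None   # None means no zoom instruction
    label: str = ""                # e.g. area name or attendee name


# Hypothetical presets keyed by microphone number (values are made up).
PRESETS: Dict[int, ImagingCondition] = {
    1: ImagingCondition(-30.0, 0.0, 2.0, "MIC1AREA"),
    2: ImagingCondition(30.0, 0.0, None, "MIC2AREA"),
}


def select_condition(s251_mic: int, s32_pass: int,
                     presets: Dict[int, ImagingCondition],
                     require_auth: bool = True) -> Optional[ImagingCondition]:
    """Return the preset for the selected microphone, or None to leave the camera as is."""
    if require_auth and s32_pass != 1:
        return None  # voiceprint not authenticated: do not move the camera
    return presets.get(s251_mic)
```

Returning `None` when the gate fails keeps the cameras at their current conditions, which is one simple way to avoid reacting to unauthenticated sound.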

The imaging adjustment unit 36 can superimpose information such as the name and position contained in the imaging condition information on the video signals captured by the television camera devices 40A1 and 40A2. As a result, the projector device serving as the monitor in the room where the communication device is installed, and the projector device of the remote party, display not only the images captured by the television camera devices 40A1 and 40A2 but also the superimposed name, position, and similar information.

Mode of Operation
The mode of operation of the sound collection / video imaging apparatus of the third embodiment will be described with reference to FIGS. 28 to 29.
1. As the initial state, the imaging adjustment unit 36 sets the television camera devices 40A1 and 40A2 to wide angle, as illustrated in FIG. 2.
2. FIG. 28, step S51: When the conference begins and someone speaks, the communication device detects the speaker's voice by the method described above.
3. Steps S52 to S53: Preferably, the voiceprint authentication unit 32 of the communication device extracts the speaker's voiceprint and performs voiceprint recognition processing. If the voiceprint is not registered in the voiceprint registration device 32A, the processing proceeds to step S60.
4. Steps S60 to S64: Processing for a new voiceprint is performed. The details are described later.
5. Step S54: The voiceprint authentication unit 32 checks whether the voiceprint is the same as the previous one, or whether the microphone that detected the sound is the same as the previous one. If the same voiceprint or the same microphone as before is selected, the processing returns to step S51.
If a voiceprint or microphone different from the previous one is selected, the processing proceeds to step S55.
6. Steps S55 to S59 and steps S60 to S64:
Before these steps are described, the processing of subroutine 1 shown in FIG. 29 and subroutine 2 shown in FIG. 30 is described.
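The branching of steps S52 to S55 can be sketched as a single decision function. The source wording for step S54 leaves open what happens when only one of the voiceprint and the microphone has changed; this sketch assumes that a change in either one leads to step S55, and that the processing returns to step S51 only when both are unchanged. The function name and the use of string identifiers for voiceprints are likewise hypothetical.

```python
from typing import Optional, Set


def next_step(voiceprint_id: str, mic: int,
              prev_voiceprint_id: Optional[str], prev_mic: Optional[int],
              registered: Set[str]) -> str:
    """Return the label of the next step in the FIG. 28 flow (assumed reading)."""
    if voiceprint_id not in registered:
        return "S60"  # new voiceprint: registration branch
    if voiceprint_id == prev_voiceprint_id and mic == prev_mic:
        return "S51"  # same speaker at the same microphone: keep listening
    return "S55"      # speaker or microphone changed: update the imaging
```

Under this reading, a registered attendee who moves to a different seat still triggers a camera update, because the microphone changes even though the voiceprint does not.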

FIG. 29, subroutine 1
Step S70: When the sound collection / video imaging apparatus is installed, the coordinate positions of the microphones of the communication device and of the television camera devices 40A1 and 40A2 are input to the imaging adjustment unit 36. When the communication device identifies the direction of the speaker's voice (the microphone position) and, for example, further identifies the speaker by voiceprint recognition, this information is used to calculate the direction and distance of that speaker from each of the television camera devices 40A1 and 40A2.

Step S71: The imaging adjustment unit 36 obtains the sound source direction detection data that the DSP 25 calculates from the two selected microphones.
As in the embodiments described above, the microphone nearest the speaker is detected, and the television camera device 40A1 or 40A2 best suited to imaging the vicinity of that microphone is selected.

In the first embodiment, the preferred form described was to use pairs of opposed microphones and select the microphone that detected the loudest sound. The present embodiment, as illustrated in FIGS. 2 and 31, also covers cases such as eight conference attendees sharing six microphones.
In such a case, the microphones and the conference attendees do not correspond one to one, so some attendees sit between two adjacent microphones. Instead of selecting only one microphone as in the first embodiment, the first microphone, which detected the loudest sound, and the second microphone, which detected the next loudest sound, are selected, and the sound source direction is detected from these two microphones. The sound source direction data can therefore be defined from the orientations (arrangement; first arrangement condition) of the two adjacent microphones.
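The selection of the two loudest microphones, and one possible way of deriving a direction from them, can be sketched as follows. The source states only that the direction is defined from the orientations of the two adjacent microphones; the level-weighted interpolation used here, the equal-angle layout with microphone 1 at 0 degrees, and all function names are assumptions made for illustration.

```python
from typing import Dict, Tuple


def two_loudest(levels: Dict[int, float]) -> Tuple[int, int]:
    """Return the numbers of the loudest and second loudest microphones."""
    ranked = sorted(levels, key=lambda m: levels[m], reverse=True)
    return ranked[0], ranked[1]


def are_adjacent(a: int, b: int, mic_count: int = 6) -> bool:
    """True when two microphones are neighbors on the ring (6 and 1 included)."""
    return (a - b) % mic_count in (1, mic_count - 1)


def source_direction_deg(levels: Dict[int, float], mic_count: int = 6) -> float:
    """Estimate the source direction by weighting the two microphone directions."""
    first, second = two_loudest(levels)
    total = levels[first] + levels[second]
    if total <= 0:
        raise ValueError("no sound detected")
    step = 360.0 / mic_count
    a1 = (first - 1) * step
    a2 = (second - 1) * step
    # Signed shortest angular difference from a1 to a2, in [-180, 180).
    diff = (a2 - a1 + 180.0) % 360.0 - 180.0
    weight = levels[second] / total  # 0.5 when both are equally loud
    return (a1 + weight * diff) % 360.0
```

Equal levels at microphones 1 and 2 give a direction midway between them, which corresponds to an attendee seated between two adjacent microphones.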

音源方向の特定と、テレビカメラ装置４０Ａ１、４０Ａ２の撮影条件との関係は、たとえば、図３１に図解したように、会議出席者Ａ１の顔の正面が撮像できるテレビカメラ装置４０Ａ２で会議出席者Ａ１の顔を撮影し、他方のテレビカメラ装置４０Ａ１で会議室の右側全体、または、議長（たとえば、会議出席者Ａ４）、または、会議出席者全員を撮影する。 The relationship between the identified sound source direction and the shooting conditions of the TV camera devices 40A1 and 40A2 is, for example, as illustrated in FIG. 31: the TV camera device 40A2, which can capture the front of the face of conference attendee A1, shoots the face of conference attendee A1, while the other TV camera device 40A1 shoots the entire right side of the conference room, the chairperson (for example, conference attendee A4), or all conference attendees.

ステップＳ７２：撮像調整部３６は音源方向検出データに変化があるか否かをチェックし、変化がなければステップＳ７１の処理に戻り、変化があれば、ステップＳ７３の処理に移行する。 Step S72: The imaging adjustment unit 36 checks whether or not there is a change in the sound source direction detection data. If there is no change, the process returns to step S71, and if there is a change, the process proceeds to step S73.

ステップＳ７３：撮像調整部３６は、隣接する２つのマイクロフォンの向き（方向）から、交点を算出する。なお、交点の算出に使用するデータは、ステップＳ７０で設定したデータを用いる。
これにより、通話装置１０Ａの中心から、発言者の位置が推定できる。 Step S73: The imaging adjustment unit 36 calculates an intersection point from the directions (directions) of two adjacent microphones. Note that the data set in step S70 is used as data used to calculate the intersection.
Thereby, the position of the speaker can be estimated from the center of the communication device 10A.

ステップＳ７４：撮像調整部３６は、算出した交点までの各テレビカメラ装置４０Ａ１、４０Ａ２の距離、上下左右方向（または上下左右向き）を算出する。なお、この距離および方向の算出に使用するデータは、ステップＳ７０で設定したデータを用いる。
ステップＳ７５、７６：撮像調整部３６は、算出した向き（方向）に各テレビカメラ装置４０Ａ１、４０Ａ２をパンさせる。その後、撮像調整部３６の処理は、呼び出された図２８のステップに次に戻る。 Step S74: The imaging adjustment unit 36 calculates the distance from each of the TV camera devices 40A1 and 40A2 to the calculated intersection, together with the vertical and horizontal direction (orientation) toward it. The data set in step S70 is used to calculate this distance and direction.
Steps S75 and S76: The imaging adjustment unit 36 pans each of the TV camera devices 40A1 and 40A2 in the calculated direction. Thereafter, the processing of the imaging adjustment unit 36 returns to the step following the call in FIG. 28.
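The distance and direction calculation of step S74 can be illustrated with a small sketch. It assumes, hypothetically, that the coordinates entered in step S70 are Cartesian (x, y, z) positions in a common room frame and that pan is measured in the horizontal plane and tilt from that plane; the function name `aim_camera` is not from the source.

```python
import math

def aim_camera(camera_pos, target_pos):
    """Return (distance, pan_deg, tilt_deg) from a camera to a target point.

    camera_pos, target_pos -- (x, y, z) coordinates in one shared frame
    (an assumed convention; the source only says coordinates are entered).
    """
    dx, dy, dz = (t - c for t, c in zip(target_pos, camera_pos))
    horizontal = math.hypot(dx, dy)                       # range in the floor plane
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)     # straight-line range
    pan_deg = math.degrees(math.atan2(dy, dx))            # left/right angle
    tilt_deg = math.degrees(math.atan2(dz, horizontal))   # up/down angle
    return distance, pan_deg, tilt_deg
```

Calling this once per camera with the estimated speaker position gives each camera its own pan and tilt target, which is the per-camera result that steps S75 and S76 then act on.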

図３０、サブルーチン２
ステップＳ８０、８１：図２８に図解したメインルーチンのサブルーチン２の結果（テレビカメラ装置４０Ａ１、４０Ａ２の撮像結果）を見る。その結果、出力がなければステップＳ８０の処理に戻り、出力があればステップＳ８２に移行する。 FIG. 30, subroutine 2
Steps S80 and 81: The results of the subroutine 2 of the main routine illustrated in FIG. 28 (imaging results of the television camera devices 40A1 and 40A2) are viewed. As a result, if there is no output, the process returns to step S80, and if there is an output, the process proceeds to step S82.

ステップＳ８２〜８４：撮像調整部３６はテレビカメラ装置４０Ａ１、４０Ａ２の撮像結果（画像）の輪郭、すなわち、会議出席者の輪郭を探し（ステップＳ８２）、その輪郭が画像の枠（フレーム）一杯になるように、テレビカメラ装置４０Ａ１、４０Ａ２に対してズーム制御を行う。上述したように、たとえば、図３１に図解したように、会議出席者Ａ１を撮影する場合は、会議出席者Ａ１の顔の正面が撮像できるテレビカメラ装置４０Ａ２で会議出席者Ａ１の顔を撮影し、ズーム処理を行う。ズーム処理後、呼び出されたメインルーチンの次のステップＳに戻る。
すなわち、テレビカメラ装置４０Ａ１、４０Ａ２の撮像結果から撮像調整部３６が話者の発言している状態を画像認識し、話者の顔の輪郭が画枠の中心になるよう、テレビカメラ装置４０Ａ１、４０Ａ２の向きをパン、チルトにて変化させ、ズームを行う。また、同時に話者の声紋を登録する。 Steps S82 to 84: The imaging adjustment unit 36 searches for the outline of the imaging results (images) of the TV camera devices 40A1 and 40A2, that is, the outline of the conference attendee (step S82), and the outline fills the frame (frame) of the image. Thus, zoom control is performed on the television camera devices 40A1 and 40A2. As described above, for example, as illustrated in FIG. 31, when the conference attendee A1 is photographed, the face of the conference attendant A1 is photographed by the TV camera device 40A2 that can capture the front face of the conference attendee A1. Perform zoom processing. After the zoom process, the process returns to the next step S of the called main routine.
That is, from the imaging results of the TV camera devices 40A1 and 40A2, the imaging adjustment unit 36 recognizes by image recognition that the speaker is speaking, changes the orientation of the TV camera devices 40A1 and 40A2 by panning and tilting so that the outline of the speaker's face is at the center of the image frame, and then zooms. At the same time, the voiceprint of the speaker is registered.
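The framing adjustment of steps S82 to S84 can be sketched as below, assuming the detected outline is available as an axis-aligned bounding box in pixel coordinates. The function name `framing_correction`, the bounding-box representation, and the `fill` parameter are illustrative assumptions, not details given in the text.

```python
def framing_correction(bbox, frame_w, frame_h, fill=1.0):
    """Compute a zoom factor and centering offset for a detected outline.

    bbox -- (x0, y0, x1, y1) of the outline in pixels (assumed representation)
    fill -- fraction of the frame the outline should occupy (hypothetical knob)
    Returns (zoom, offset_x, offset_y); offsets are outline center minus
    frame center, in pixels, i.e. how far pan/tilt must move the image.
    """
    x0, y0, x1, y1 = bbox
    box_w, box_h = x1 - x0, y1 - y0
    # Zoom until the outline touches the frame in its tighter dimension.
    zoom = fill * min(frame_w / box_w, frame_h / box_h)
    offset_x = (x0 + x1) / 2 - frame_w / 2
    offset_y = (y0 + y1) / 2 - frame_h / 2
    return zoom, offset_x, offset_y
```

Using the smaller of the two ratios keeps the whole outline inside the frame; how pixel offsets map to actual pan and tilt angles depends on the camera optics and is left out of this sketch.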

この時、撮像調整部３６がもし２人以上の画像を認識してしまった場合には、モニタ装置としてのプロジェクタ装置６０Ａにその旨を表示する。たとえば、認識した全員の顔を表示するので、話者はその中からどれが自分であるかを選択し、必要があれば、手動でパン、チルト、ズーム操作を行い、出来る限り話者一人だけが画枠に入るようにする。 At this time, if the imaging adjustment unit 36 recognizes two or more people in the image, it indicates this on the projector device 60A serving as a monitor device. For example, the faces of everyone recognized are displayed, so the speaker selects which of them is himself or herself and, if necessary, manually performs pan, tilt, and zoom operations so that, as far as possible, only the speaker is in the image frame.

ステップＳ５５〜５９：ステップＳ６０〜６４：
図２９に示したサブルーチン１および図３０に示したサブルーチン２の処理を参照して、これらの処理を述べる。
ステップＳ５５〜５６、６０〜６１：音源方向検出データをサブルーチン１に渡してテレビカメラ装置４０Ａ１、４０Ａ２のうち該当するものをパンさせる。
ステップＳ５７〜５８、６２〜６３：画像認識処理を行うサブルーチン２の処理を行う。
ステップＳ５９、６４：声紋認証部３２による声紋データと、テレビカメラ装置４０Ａ１、４０Ａ２のパン、チルト、ズームのデータを１対として、たとえば、撮像調整部３６のデータベースに保存し、次回の処理に用いる。
すなわち、話者の声紋と、その話者を明瞭に映し出すためのカメラのパン、チルト、ズームのデータを一対一で対応させ、データとして登録する。その結果、以後、話者が変わっても、話者の声紋を登録データと照合することにより、話者を明瞭に映し出すためのカメラのパン、チルト、ズーム動作が自動的に行われる。 Steps S55 to 59: Steps S60 to 64:
These processes will be described with reference to the processes of the subroutine 1 shown in FIG. 29 and the subroutine 2 shown in FIG.
Steps S55 to 56, 60 to 61: The sound source direction detection data is passed to the subroutine 1, and the corresponding one of the TV camera devices 40A1 and 40A2 is panned.
Steps S57 to 58, 62 to 63: Subroutine 2 processing for performing image recognition processing is performed.
Steps S59 and S64: The voiceprint data from the voiceprint authentication unit 32 and the pan, tilt, and zoom data of the TV camera devices 40A1 and 40A2 are stored as a pair, for example in the database of the imaging adjustment unit 36, and used in subsequent processing.
That is, a speaker's voiceprint and camera pan, tilt, and zoom data for clearly displaying the speaker are made to correspond one-to-one and registered as data. As a result, even if the speaker changes thereafter, the camera's pan, tilt and zoom operations are automatically performed to clearly show the speaker by comparing the speaker's voiceprint with the registered data.
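The one-to-one registration of voiceprint and camera settings can be pictured as a simple keyed store. The sketch below is an assumption-laden illustration: the class names `PTZ` and `PresetStore`, and the idea that a voiceprint is reduced to a string identifier, are hypothetical; the source only says that the two kinds of data are stored as a pair.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class PTZ:
    """Pan, tilt, and zoom settings for one camera (hypothetical record)."""
    pan: float
    tilt: float
    zoom: float

class PresetStore:
    """Maps a voiceprint identifier to the PTZ settings registered with it."""

    def __init__(self) -> None:
        self._presets: Dict[str, PTZ] = {}

    def register(self, voiceprint_id: str, ptz: PTZ) -> None:
        # One entry per voiceprint: a later registration replaces the earlier one.
        self._presets[voiceprint_id] = ptz

    def lookup(self, voiceprint_id: str) -> Optional[PTZ]:
        # None means the speaker is not registered, so the caller falls back.
        return self._presets.get(voiceprint_id)
```

On a later utterance, a successful match returns the stored settings so the camera can move directly without repeating the image recognition pass.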

なお、マイクロフォンの選択が適正に行われない場合、あるいは、マイクロフォンの選択が行われたが声紋認証が合格されない、デフォルト状態のときは、撮像調整部３６は、デフォルト処理を行う。そのようなデフォルト処理としては、撮像調整部３６は、図２に図解した初期状態、すなわち、各テレビカメラ装置４０Ａ１、４０Ａ２が会議室の左右を分担して撮像する撮像条件をテレビカメラ装置４０Ａ１、４０Ａ２に与える。その結果、テレビカメラ装置４０Ａ１、４０Ａ２は初期状態の映像を撮像する。 If the microphone is not selected properly, or if a microphone is selected but voiceprint authentication fails (the default state), the imaging adjustment unit 36 performs default processing. As such default processing, the imaging adjustment unit 36 gives the TV camera devices 40A1 and 40A2 the imaging conditions of the initial state illustrated in FIG. 2, in which the TV camera devices 40A1 and 40A2 each cover the left or right side of the conference room. As a result, the TV camera devices 40A1 and 40A2 capture the initial-state images.

なお、デフォルトのとき、増幅器利得調整部３４は利得可変型増幅器３０１〜３０６の利得調整を行わない。 At the default, the amplifier gain adjusting unit 34 does not adjust the gain of the variable gain amplifiers 301 to 306.

以下、発言者の撮影例を述べる。
第１マイクロフォンの方向および領域ＭＩＣ１ＡＲＥＡにいる会議出席者Ａ１が第１マイクロフォンＭＣ１を用いて発言したとする。第１マイクロフォンＭＣ１の集音信号は、Ａ／Ｄ変換器２７１でディジタル信号に変換されてＤＳＰ２５に入力され、第１実施の形態において述べた方法により選択される。このとき、ＤＳＰ２５はマイクロプロセッサ２３に第１マイクロフォンＭＣ１を選択したことを示すマイクロフォン選択信号Ｓ２５１を出力する。マイクロフォン選択信号Ｓ２５１はマイクロプロセッサ２３から撮像調整部３６に出力される。
また、ＤＳＰ２５で選択された第１マイクロフォンの集音信号はＤＳＰ２６に出力され、ＤＳＰ２６でエコーキャンセルされ、選択音声信号Ｓ２６としてＤ／Ａ変換器２８２、増幅器２９１を経由して声紋認証部３２に入力される。
声紋認証部３２は、選択音声信号Ｓ２６が声紋認証部３２内の声紋登録メモリＭ２に事前に登録されている声紋に一致するか否かを認証する。会議出席者Ａ１の声紋が声紋認証部３２の声紋登録メモリＭ２に事前に登録されていれば、声紋認証部３２から合格を示す「１」の認証合格信号Ｓ３２が増幅器利得調整部３４と撮像調整部３６に出力される。
他方、会議出席者Ａ１の声紋が声紋認証部３２の声紋登録メモリＭ２に事前に登録されていなければ、声紋認証部３２から不合格を示す「０」の認証合格信号Ｓ３２が撮像調整部３６に出力される。 An example of shooting a speaker will be described below.
Suppose that the meeting attendee A1 in the first microphone direction and area MIC1 AREA speaks using the first microphone MC1. The collected sound signal of the first microphone MC1 is converted into a digital signal by the A / D converter 271 and input to the DSP 25, and is selected by the method described in the first embodiment. At this time, the DSP 25 outputs to the microprocessor 23 a microphone selection signal S251 indicating that the first microphone MC1 has been selected. The microphone selection signal S251 is output from the microprocessor 23 to the imaging adjustment unit 36.
The collected sound signal of the first microphone selected by the DSP 25 is output to the DSP 26, echo-cancelled by the DSP 26, and input to the voiceprint authentication unit 32 as the selected sound signal S26 via the D / A converter 282 and the amplifier 291.
The voiceprint authentication unit 32 checks whether the selected sound signal S26 matches a voiceprint registered in advance in the voiceprint registration memory M2 in the voiceprint authentication unit 32. If the voiceprint of conference attendee A1 has been registered in advance in the voiceprint registration memory M2 of the voiceprint authentication unit 32, the voiceprint authentication unit 32 outputs an authentication pass signal S32 of “1”, indicating a pass, to the amplifier gain adjustment unit 34 and the imaging adjustment unit 36.
On the other hand, if the voiceprint of conference attendee A1 has not been registered in advance in the voiceprint registration memory M2 of the voiceprint authentication unit 32, the voiceprint authentication unit 32 outputs an authentication pass signal S32 of “0”, indicating a failure, to the imaging adjustment unit 36.

撮像調整部３６は、「１」の認証合格信号Ｓ３２が入力されたとき、マイクロフォン選択信号Ｓ２５１で示された第１マイクロフォンＭＣ１についての第１撮像条件情報に基づいて、テレビカメラ装置４０Ａ１、４０Ａ２を制御する。その結果、第１マイクロフォンの方向および領域ＭＩＣ１ＡＲＥＡが撮像されて、会議出席者Ａ１が撮像される。
撮像調整部３６は会議出席者Ａ１が発言をしている間、第１撮像条件情報に基づいてテレビカメラ４０で第１マイクロフォンの方向および領域ＭＩＣ１ＡＲＥＡを撮像を継続させる。 When the authentication pass signal S32 of “1” is input, the imaging adjustment unit 36 controls the TV camera devices 40A1 and 40A2 based on the first imaging condition information for the first microphone MC1 indicated by the microphone selection signal S251. As a result, the direction and area MIC1 AREA of the first microphone are imaged, and conference attendee A1 is imaged.
The imaging adjustment unit 36 continues imaging the direction of the first microphone and the area MIC1 AREA with the television camera 40 based on the first imaging condition information while the conference attendee A1 is speaking.

次に、声紋認証部３２には声紋が登録されていない、第３のマイクロフォンＭＣ３を用いた会議出席者Ａ３が発言し、ＤＳＰ２５においてその発言が選択されたとする。
ＤＳＰ２５からは、第３のマイクロフォンＭＣ３を示すマイクロフォン選択信号Ｓ２５１がマイクロプロセッサ２３を経由して撮像調整部３６に出力される。もちろん、第３のマイクロフォンＭＣ３の集音信号はＤＳＰ２６に入力されてエコーキャンセル処理され、ＤＳＰ２６として声紋認証部３２に出力される。
会議出席者Ａ３の声紋は声紋認証部３２に登録されていないから、声紋認証部３２からは、不合格を示す「０」の認証合格信号Ｓ３２が撮像調整部３６に出力される。
撮像調整部３６は、「０」の認証合格信号Ｓ３２に入力されたとき、デフォルトと判断する。デフォルトの場合の処理としては、撮像調整部３６は、たとえば、テレビカメラ装置４０Ａ１、４０Ａ２の撮像条件を継続するか、初期状態として会議室の左右かつ会議出席者全体が撮像されるようにする。 Next, it is assumed that the conference attendee A3 using the third microphone MC3 who has no voiceprint registered in the voiceprint authentication unit 32 speaks and the DSP25 selects the speech.
From the DSP 25, a microphone selection signal S251 indicating the third microphone MC3 is output to the imaging adjustment unit 36 via the microprocessor 23. Of course, the collected sound signal of the third microphone MC3 is input to the DSP 26, subjected to echo cancellation processing, and output to the voiceprint authentication unit 32 as the selected sound signal S26.
Since the voice print of the attendee A3 is not registered in the voice print authentication unit 32, the voice print authentication unit 32 outputs an authentication pass signal S32 of “0” indicating failure to the imaging adjustment unit 36.
When the authentication pass signal S32 of “0” is input, the imaging adjustment unit 36 determines that the default state applies. As default processing, the imaging adjustment unit 36, for example, either maintains the current imaging conditions of the TV camera devices 40A1 and 40A2 or returns to the initial state so that the left and right sides of the conference room and all conference attendees are imaged.
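The decision made by the imaging adjustment unit 36 in the two examples above can be condensed into a short sketch. The function name `choose_imaging_condition` and the dictionary of per-microphone conditions are hypothetical, and the sketch picks the "return to the initial state" variant of default processing; the text equally allows keeping the current conditions.

```python
def choose_imaging_condition(selected_mic, auth_passed, mic_conditions, default_condition):
    """Return the imaging condition to apply to the cameras.

    selected_mic      -- microphone id from the selection signal, or None
    auth_passed       -- True for an authentication pass signal of "1"
    mic_conditions    -- dict: microphone id -> preset imaging condition
    default_condition -- initial-state condition covering the whole room
    """
    # No valid microphone, or voiceprint not authenticated: default processing.
    if selected_mic is None or not auth_passed:
        return default_condition
    # Authenticated speaker: use the condition preset for that microphone.
    return mic_conditions.get(selected_mic, default_condition)
```

For the cases in the text, attendee A1 on microphone 1 with a registered voiceprint yields the first imaging condition, while attendee A3 on microphone 3 without a registered voiceprint yields the default.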

複数の会議出席者が同時に発言した時は、ＤＳＰ２５において音のレベルの高い方が選択され、その後は、上記の声紋認証の結果により撮像調整部３６を介してテレビカメラ装置４０Ａ１、４０Ａ２の撮像条件が制御される。 When a plurality of conference attendees speak at the same time, the DSP 25 selects the one with the higher sound level, and thereafter the imaging conditions of the TV camera devices 40A1 and 40A2 are controlled via the imaging adjustment unit 36 according to the result of the voiceprint authentication described above.

以上の処理は遠隔会議の先方の音声集音・映像撮像装置でも全く同様に行われる。
また声紋登録、認証が遠隔の先方で使えない場合、先方の会議出席者の声紋登録や会議中の声紋認証を通話装置が設置されているこちら側で行い、先方の音声集音・映像撮像装置のテレビカメラ装置の撮像条件を制御することもできる。 The above processing is performed in exactly the same way in the audio collecting / imaging device at the other end of the remote conference.
If voiceprint registration and authentication cannot be used at the remote site, the voiceprints of the remote conference attendees can be registered, and authenticated during the conference, on this side where the communication device is installed, and the imaging conditions of the TV camera devices of the remote sound collection / video imaging device can also be controlled.

第３実施の形態の音声集音・映像撮像装置を用いることにより、明瞭な音声および映像による遠隔会議の相手側に伝わるのは勿論であるが、会議出席者が発言するとその声紋が認証され、テレビカメラ装置がその声紋認証された発言者の方を向いて映すことができる。 By using the sound collection / video imaging device of the third embodiment, clear audio and video are of course conveyed to the other party of the remote conference; in addition, when a conference attendee speaks, the voiceprint is authenticated and the TV camera device can turn toward and show the speaker whose voiceprint has been authenticated.

第３実施の形態によれば、会議出席者ごとに個別のマイクロフォンを設けることも不要であるし、システム管理者、たとえば、議長によるテレビカメラ装置４０Ａ１、４０Ａ２の撮像条件の制御も不要である。 According to the third embodiment, it is not necessary to provide a separate microphone for each conference attendant, and it is not necessary to control the imaging conditions of the TV camera devices 40A1 and 40A2 by the system administrator, for example, the chairperson.

さらに会議中に会議出席者が場所を移動しても、ＤＳＰ２５におけるマイクロフォン選択処理により有効なマイクロフォンが選択されかつ声紋認証部３２における声紋の認証により、テレビカメラ装置４０Ａ１、４０Ａ２をその会議出席者のいる方向および領域に向かわせることができる。 Furthermore, even if a conference attendee moves to another place during the conference, an effective microphone is selected by the microphone selection processing in the DSP 25, and, through voiceprint authentication in the voiceprint authentication unit 32, the TV camera devices 40A1 and 40A2 can be directed toward the direction and area where that conference attendee is.

また会議中はシステム管理者（たとえば、議長）がなにもしなくても、テレビジョン受像機またはテレビジョン受像機に発言者の名前とかが自動的に表示される。 During the conference, the name of the speaker is automatically displayed on the television receiver or the television receiver without any action from the system administrator (for example, chairperson).

以上、第３実施の形態の好適な例示として、図２６および図２７を参照して、ＤＳＰ２５におけるマイクロフォン選択を行い、さらに、声紋認証部３２における声紋認証をした結果、撮像調整部３６がテレビカメラ装置４０Ａ１、４０Ａ２を撮像条件に従って制御する場合について述べたが、基本的には、ＤＳＰ２５によるマイクロフォン選択結果のみについて、撮像調整部３６によるテレビカメラ装置４０Ａ１、４０Ａ２の撮像制御を行うこともできる。 As a preferred example of the third embodiment, the case has been described, with reference to FIGS. 26 and 27, in which the DSP 25 selects a microphone, the voiceprint authentication unit 32 then performs voiceprint authentication, and, as a result, the imaging adjustment unit 36 controls the TV camera devices 40A1 and 40A2 according to the imaging conditions. Basically, however, the imaging adjustment unit 36 can also control the imaging of the TV camera devices 40A1 and 40A2 based only on the microphone selection result from the DSP 25.

第３実施の形態に実施に際しては、第１実施の形態において述べたように、マイクロフォンが等角度で放射状に配置されている場合には限定されない。マイクロフォンが等角度で放射状に配置されていない場合でも、ＤＳＰ２５は、たとえば、最大振幅を示すマイクロフォンを選択することができ、声紋認証部３２は事前に登録された声紋と一致しているか否かを認証することができる。
この場合でも、撮像調整部３６は事前に設定された撮像条件情報に基づいて、テレビカメラ装置４０Ａ１、４０Ａ２の撮像条件を制御する。 The implementation of the third embodiment is not limited to the case where the microphones are arranged radially at equal angles as described in the first embodiment. Even when the microphones are not arranged radially at an equal angle, the DSP 25 can select, for example, a microphone that exhibits the maximum amplitude, and the voiceprint authentication unit 32 determines whether or not it matches the voiceprint registered in advance. It can be authenticated.
Even in this case, the imaging adjustment unit 36 controls the imaging conditions of the television camera devices 40A1 and 40A2 based on the imaging condition information set in advance.

本発明の第３実施の形態によれば、会議中に話者が変わっても、話者を映し出すカメラの選択、及び選択されたカメラのパン、チルト、ズームが自動的に変わるため、従来のように手動でセッティングを変更する必要がなく、常に話者の明瞭な映像を映し出し続けることができる。
また、話者方向検出技術と画像認識技術を用いることで、話者を映し出すカメラのパン、チルト、ズーム動作が自動的に行われ、話者の明瞭な映像を映し出すことができる。特に、話者の声紋照合を行うことで、話者が変わる度に自動的にカメラのパン、チルト、ズーム動作が行われ、新しい話者を明瞭に撮影することが可能である。
また本発明の第３実施の形態によれば、マイクロフォンとテレビカメラ装置４０Ａ１、４０Ａ２の相対位置が厳密でなくても、上述した画像認識処理などにより実用的な画像及び音声が収録できる。 According to the third embodiment of the present invention, even if the speaker changes during the conference, the selection of the camera that displays the speaker and the pan, tilt, and zoom of the selected camera automatically change. Thus, it is not necessary to change the setting manually, and a clear image of the speaker can always be projected.
In addition, by using the speaker direction detection technology and the image recognition technology, panning, tilting, and zooming operations of a camera that displays the speaker are automatically performed, and a clear image of the speaker can be displayed. In particular, by performing speaker voiceprint matching, the camera pans, tilts, and zooms automatically whenever the speaker changes, and a new speaker can be clearly photographed.
Further, according to the third embodiment of the present invention, even if the relative positions of the microphone and the TV camera devices 40A1 and 40A2 are not strict, practical images and sounds can be recorded by the above-described image recognition processing or the like.

第４実施の形態
本発明の第４実施の形態は、上述した第３実施の形態を拡張した発明である。
（１）撮像手段の種類
第３実施の形態においては、撮像手段として２台のテレビジョンカメラ装置を用いた場合について述べたが、第４実施の形態においては、撮像手段として小型のカメラ、たとえば、ＣＣＤカメラを用いる。 Fourth Embodiment The fourth embodiment of the present invention is an invention that extends the third embodiment described above.
(1) Type of imaging means
In the third embodiment, the case where two television camera devices are used as the imaging means was described. In the fourth embodiment, a small camera, for example a CCD camera, is used as the imaging means.

（２）マイクロフォンとＣＣＤカメラの配置
図３２および図３３において、白丸で示したものがＣＣＤカメラであり、黒丸で示したものがマイクロフォンである。
図３２はマイクロフォンとＣＣＤカメラとが完全に１対１に併設されている例を示し、図３３はマイクロフォンとＣＣＤカメラとが完全に１対１には併設されていない例を示す。 (2) Arrangement of Microphone and CCD Camera In FIGS. 32 and 33, the white circle represents the CCD camera, and the black circle represents the microphone.
FIG. 32 shows an example in which the microphone and the CCD camera are completely arranged in a one-to-one relationship, and FIG. 33 shows an example in which the microphone and the CCD camera are not completely arranged in a one-to-one relationship.

（３）撮像手段の数と撮像範囲
図３２、図３３に図解したように、複数のＣＣＤカメラはそれぞれが撮像範囲が重複しない程度の間隔に配置しておく。各マイクロフォンの集音範囲と各撮像手段と撮像範囲とが完全に一致している必要はない。
すなわち、マイクロフォンの集音範囲とＣＣＤカメラの撮影範囲が異なる場合もあるので、マイクロフォンとＣＣＤカメラの位置と数量を合わせる必要はないが、マイクロフォンによって選ばれた方向を、あるＣＣＤカメラで撮影できることが必要である。 (3) Number of Imaging Means and Imaging Range As illustrated in FIGS. 32 and 33, a plurality of CCD cameras are arranged at intervals such that the imaging ranges do not overlap each other. It is not necessary for the sound collection range of each microphone, each imaging means, and the imaging range to completely match.
That is, since the sound collection range of a microphone and the shooting range of a CCD camera may differ, the positions and numbers of the microphones and CCD cameras need not match, but the direction selected via the microphones must be within the shooting range of some CCD camera.

また本実施の形態においては、マイクロフォンの集音範囲と、ＣＣＤカメラの撮像範囲との関係が明確であれば足り、マイクロフォンが第１実施の形態で述べたように、等間隔で配置されていたり、図７に図解したような指向性を持つ必要は必ずしもない。
しかしながら、第１実施の形態で述べたように、マイクロフォンが等間隔で配置されており、指向性を持つことは、処理および設計の観点で容易であり、有効である。 In the present embodiment, it is sufficient that the relationship between the microphone sound collection range and the CCD camera imaging range is clear, and the microphones are arranged at equal intervals as described in the first embodiment. The directivity as illustrated in FIG. 7 is not necessarily required.
However, as described in the first embodiment, it is easy and effective in terms of processing and design that the microphones are arranged at equal intervals and have directivity.

撮像手段としてＣＣＤカメラを用いたのは、マイクロフォンとＣＣＤカメラとを近接させてほぼ併設状態にしているので、ＣＣＤカメラの数がマイクロフォンと同数またはマイクロフォンの数に近い数になるので、比較的低価格のＣＣＤカメラを用いたのである。
本実施の形態において、テレビジョンカメラ装置のような寸法の大きな撮像手段をマイクロフォンと併設することは現実的に困難なためであり、マイクロフォンと同様に小型のＣＣＤカメラを用いた。 CCD cameras are used as the imaging means because the microphones and CCD cameras are placed close together, nearly side by side, so the number of CCD cameras is the same as or close to the number of microphones; relatively low-priced CCD cameras were therefore used.
In the present embodiment, small CCD cameras, comparable in size to the microphones, are used because it is practically difficult to install large imaging means such as television camera devices alongside the microphones.

図３４はマイクロフォンとＣＣＤカメラを選択処理する回路の部分構成図である。
マイクロフォン選択処理部２５１は、第１〜第３実施の形態に述べたＤＳＰ２５の処理のうち、マイクロフォンを選択処理する部分である。
カメラ選択処理部２５２は、本実施の形態において付加された処理をＤＳＰ２５で行う部分である。
映像切替えスイッチ回路３７は本実施の形態で付加した部分であり、複数のＣＣＤカメラで撮像した映像信号のうちの１つを、カメラ選択処理部２５２の指令に応じて選択して出力するスイッチ回路である。
カメラ選択処理部２５２は、マイクロフォン選択処理部２５１の指令に応じてＣＣＤカメラを選択する場合と、操作部１５に設けられたカメラ選択指示ボタン（図示せず）に応じてＣＣＤカメラを選択する場合とがある。カメラ選択指示ボタンの指示を用いるか、マイクロフォン選択処理部２５１の指令を用いるかについては適宜決定できる。たとえば、カメラ選択指示ボタンの指示をマイクロフォン選択処理部２５１の指令に優先させることもできるし、逆にすることもできる。あるいは、優先順序はつけず、選択指令が発せられたときに応じてその都度、ＣＣＤカメラを切り換えてもよい。
画像合成部３８は、操作部１５に設けられた画像合成指示ボタン（図示せず）に応じて複数のＣＣＤカメラで撮像した画像を合成する部分である。画像合成部３８はまた、画像を合成させるだけでなく、複数の映像信号を１まいの画面内に分割させる処理も行う。 FIG. 34 is a partial configuration diagram of a circuit for selecting and processing a microphone and a CCD camera.
The microphone selection processing unit 251 is a part that performs a microphone selection process among the processes of the DSP 25 described in the first to third embodiments.
The camera selection processing unit 252 is a part that performs processing added in the present embodiment in the DSP 25.
The video changeover switch circuit 37 is a part added in the present embodiment; it is a switch circuit that selects and outputs one of the video signals captured by the plurality of CCD cameras in accordance with a command from the camera selection processing unit 252.
The camera selection processing unit 252 selects a CCD camera either according to a command from the microphone selection processing unit 251 or according to a camera selection instruction button (not shown) provided on the operation unit 15. Whether to use the instruction of the camera selection instruction button or the command of the microphone selection processing unit 251 can be determined as appropriate. For example, the instruction of the camera selection instruction button can be given priority over the command of the microphone selection processing unit 251, or vice versa. Alternatively, no priority may be assigned, and the CCD camera may be switched each time a selection command is issued.
The image composition unit 38 is a part that combines images captured by a plurality of CCD cameras in response to an image composition instruction button (not shown) provided on the operation unit 15. In addition to combining images, the image composition unit 38 also performs processing that arranges a plurality of video signals as divisions of a single screen.
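The priority options described for the camera selection processing unit 252 can be sketched as a small policy function. The names `select_camera`, `button_first`, and `mic_first` are hypothetical labels for the two prioritized variants; the third variant in the text (no priority, switch on each command as it arrives) is event-driven and amounts to always applying the most recent request, so it is not modelled here.

```python
def select_camera(mic_request, button_request, current, policy="button_first"):
    """Choose which CCD camera to output.

    mic_request    -- camera requested via microphone selection, or None
    button_request -- camera requested via the selection button, or None
    current        -- camera currently selected (kept if nothing is requested)
    policy         -- "button_first" or "mic_first" (hypothetical labels)
    """
    if policy == "button_first":
        candidates = (button_request, mic_request)
    elif policy == "mic_first":
        candidates = (mic_request, button_request)
    else:
        raise ValueError(f"unknown policy: {policy}")
    # The first pending request in priority order wins.
    for camera in candidates:
        if camera is not None:
            return camera
    return current
```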

図３５はＣＣＤカメラとマイクロフォンとが同じ位置に併設されている場合の、ＤＳＰ２５で行う処理を示すフローチャートである。
図３６はＣＣＤカメラとマイクロフォンとが同じ位置に併設されていない場合もあることを想定したときのＤＳＰ２５で行う処理を示すフローチャートである。 FIG. 35 is a flowchart showing processing performed by the DSP 25 when the CCD camera and the microphone are installed at the same position.
FIG. 36 is a flowchart showing processing performed by the DSP 25 when it is assumed that the CCD camera and the microphone may not be provided at the same position.

第１形態の動作
図３５を参照して第１形態の動作を述べる。
ステップＳ９１においてＤＳＰ２５のマイクロフォン選択処理部２５１は上述した第１実施の形態の方法で、音声が発せられたマイクロフォンを検出した場合、ステップＳ９２において新たに検出されたマイクロフォンが前回まで選択されていたマイクロフォンと同じか否かを判断する。同じであれば、前回選択されたマイクロフォンが継続して選択され、選択されたマイクロフォンにより、上述した第１〜第３実施の形態の方法に従って話者の方向が特定できる。 Operation of the First Mode The operation of the first mode will be described with reference to FIG.
In step S91, when the microphone selection processing unit 251 of the DSP 25 detects, by the method of the first embodiment described above, a microphone at which sound is being produced, it determines in step S92 whether the newly detected microphone is the same as the microphone that had been selected up to that point. If it is the same, the previously selected microphone remains selected, and the direction of the speaker can be identified from the selected microphone according to the methods of the first to third embodiments described above.

なお、マイクロフォン選択処理部２５１は、単なる相槌など、極めて短時間の発音に対しては、マイクロフォンの選択を切り換えることなく、そして、ＣＣＤカメラを切り替える処理を行わないようにすることで、一々画像が切り替わってしまうような煩わしさ、または、不自然さを避けることができる。 Note that, for extremely short utterances such as a brief interjection of agreement, the microphone selection processing unit 251 does not switch the microphone selection and does not switch the CCD camera, which avoids the annoyance or unnaturalness of the image switching at every such utterance.
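One way to ignore very short utterances is to require a newly detected microphone to persist for a minimum number of consecutive detection frames before switching. The sketch below is a hypothetical illustration of that idea; the class name `MicSwitchDebouncer`, the frame-based counting, and the threshold `min_frames` are assumptions, as the text does not say how short utterances are recognized.

```python
class MicSwitchDebouncer:
    """Switch the selected microphone only after a sustained detection."""

    def __init__(self, min_frames):
        self.min_frames = min_frames  # consecutive frames required to switch
        self.current = None           # microphone currently selected
        self._candidate = None        # microphone waiting to be confirmed
        self._count = 0               # consecutive frames seen for candidate

    def update(self, detected_mic):
        """Feed one detection frame; return the selected microphone."""
        # Same microphone, or silence: drop any pending candidate.
        if detected_mic is None or detected_mic == self.current:
            self._candidate, self._count = None, 0
            return self.current
        # Count consecutive frames for the new candidate.
        if detected_mic == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = detected_mic, 1
        # Switch only once the candidate has lasted long enough.
        if self._count >= self.min_frames:
            self.current = detected_mic
            self._candidate, self._count = None, 0
        return self.current
```

Because the camera follows the selected microphone, holding the microphone selection steady also keeps the video from flipping on a brief interjection.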

ステップＳ９２において、ＤＳＰ２５のマイクロフォン選択処理部２５１がマイクロフォンが新しいマイクロフォンとして検出した場合、ＤＳＰ２５のマイクロフォン選択処理部２５１は上述した実施の形態に従って新しいマイクロフォンを選択し、選択されたマイクロフォンから音声が出力されるような処理を行うと同時に、カメラ選択処理部２５２に選択したマイクロフォンの情報を出力してマイクロフォンの選択切替えを行ったことを通報する。 In step S92, when the microphone selection processing unit 251 of the DSP 25 detects that the microphone is a new microphone, the microphone selection processing unit 251 of the DSP 25 selects a new microphone according to the above-described embodiment, and audio is output from the selected microphone. At the same time, information on the selected microphone is output to the camera selection processing unit 252 to notify that the selection of the microphone has been switched.

カメラ選択処理部２５２は、選択されたマイクロフォンと併設されているＣＣＤカメラをカメラ選択処理部２５２内のメモリに記憶している。そこで、カメラ選択処理部２５２は、マイクロフォン選択処理部２５１で選択切り換えたマイクロフォンに対応するＣＣＤカメラを選択するように、映像切替えスイッチ回路３７に切替え指令を出力する。
映像切替えスイッチ回路３７はカメラ選択処理部２５２の指令に応じてＣＣＤカメラの映像信号の出力の選択を切り換える。
本例では、画像合成指示ボタンから画像合成指示はない。したがって、映像切替えスイッチ回路３７から出力された映像信号は画像合成部３８を通過して、たとえば、図２に図解したテレビジョン受像機５０Ａに表示されるとともに、相手側の会議室に送出されて、相手側の会議室内のテレビジョン受像機に表示される。もちろん、選択された音声は上述した実施の形態と同じ方法で相手側会議室に送出されて、相手側会議室内の会議出席者が聞くことができる。 The camera selection processing unit 252 stores a CCD camera provided alongside the selected microphone in a memory in the camera selection processing unit 252. Therefore, the camera selection processing unit 252 outputs a switching command to the video changeover switch circuit 37 so as to select the CCD camera corresponding to the microphone selected and switched by the microphone selection processing unit 251.
The video changeover switch circuit 37 switches the selection of the video signal output of the CCD camera in accordance with a command from the camera selection processing unit 252.
In this example, there is no image composition instruction from the image composition instruction button. Therefore, the video signal output from the video changeover switch circuit 37 passes through the image synthesis unit 38 and is displayed on, for example, the television receiver 50A illustrated in FIG. 2 and sent to the conference room on the other side. Displayed on the television receiver in the other party's conference room. Of course, the selected voice is transmitted to the other party's conference room in the same manner as in the above-described embodiment, and can be heard by the conference attendee in the other party's conference room.

以上のように、第４実施の形態の第１形態によれば、マイクロフォンからの音声の迅速な選択に加えて、同時的に対応するＣＣＤカメラの迅速な選択が可能となり、選択されたマイクロフォンを使用している話者の映像と音声を迅速に切り換えて出力することができる。
特に、第４実施の形態の第１形態においては、ＣＣＤカメラにズーム機能などを付加しないで、マイクロフォンの前にいる話者の撮像を可能にしており、第３実施の形態のようにズーム機能を働かせたり、輪郭を検出して話者の適切な映像を決定する動作が不要となり、音声の切替えと同時に映像の切替えも可能となる。 As described above, according to the first mode of the fourth embodiment, in addition to the quick selection of the sound from a microphone, the corresponding CCD camera can be quickly selected at the same time, so that the video and audio of the speaker using the selected microphone can be quickly switched and output.
In particular, in the first mode of the fourth embodiment, the speaker in front of a microphone can be imaged without adding a zoom function or the like to the CCD camera. Unlike the third embodiment, there is no need to operate a zoom function or to detect contours to determine an appropriate image of the speaker, so the video can be switched at the same time as the audio.

第２形態の動作
図３６を参照して第２形態の動作を述べる。
ステップＳ９１〜９２の処理は、図３５を参照して述べた処理とほぼ同じである。ただし、本例は、図３３に図解したように、１つのマイクロフォンが選択されたとき、そのマイクロフォンの両側のＣＣＤカメラの撮像結果を合成して出力することができる。そのため、下記の処理を行う。 Operation of the Second Mode The operation of the second mode will be described with reference to FIG.
The processing in steps S91 to S92 is almost the same as the processing described with reference to FIG. However, in this example, as illustrated in FIG. 33, when one microphone is selected, the imaging results of the CCD cameras on both sides of the microphone can be synthesized and output. Therefore, the following processing is performed.

ステップＳ９３Ａにおいて、カメラ選択処理部２５２は、マイクロフォン選択処理部２５１からマイクロフォンの選択情報が入力された場合、カメラ選択処理部２５２内のメモリを検索して、選択されたマイクロフォンに隣接して併設された２個のＣＣＤカメラからの映像を合成すべきか否かを判断する。カメラ選択処理部２５２内のメモリに、選択されたマイクロフォンに対して１台のＣＣＤカメラしか指定されていない場合は、ステップＳ９３を参照して述べたと同様、そのＣＣＤカメラを選択処理する指令を映像切替えスイッチ回路３７に出力する。
カメラ選択処理部２５２内のメモリに、選択されたマイクロフォンに対して両側の２台のＣＣＤカメラが指定されている場合、カメラ選択処理部２５２は、１台のＣＣＤカメラからの撮像信号が映像切替えスイッチ回路３７で選択されてそこを通過して画像合成部３８で受信できる時間間隔で、上記２台のＣＣＤカメラの映像信号が交互に選択されて画像合成部３８に入力されるように、映像切替えスイッチ回路３７に指示する。 In step S93A, when microphone selection information is input from the microphone selection processing unit 251, the camera selection processing unit 252 searches the memory in the camera selection processing unit 252, and is adjacent to the selected microphone. It is determined whether or not the images from the two CCD cameras should be combined. When only one CCD camera is designated for the selected microphone in the memory in the camera selection processing unit 252, as described with reference to step S93, an instruction to select the CCD camera is displayed as an image. Output to the changeover switch circuit 37.
When two CCD cameras, one on each side of the selected microphone, are designated in the memory in the camera selection processing unit 252, the camera selection processing unit 252 instructs the video changeover switch circuit 37 so that the video signals of the two CCD cameras are alternately selected and input to the image composition unit 38, at a time interval that allows the imaging signal from one CCD camera to be selected by the video changeover switch circuit 37, pass through it, and be received by the image composition unit 38.
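The lookup and alternating selection in step S93A can be sketched as follows. The function name `switch_schedule`, the table layout (microphone id mapped to a tuple of camera ids), and the notion of discrete switching slots are assumptions made for illustration only.

```python
from itertools import cycle, islice

def switch_schedule(mic, mic_to_cameras, n_slots):
    """Return (camera sequence, needs_composition) for a selected microphone.

    mic_to_cameras -- dict: microphone id -> tuple of designated camera ids
                      (stands in for the memory of the camera selection unit)
    n_slots        -- number of switching intervals to schedule (hypothetical)
    """
    cameras = mic_to_cameras[mic]
    if len(cameras) == 1:
        # Single designated camera: plain selection, no composition needed.
        return [cameras[0]], False
    # Several designated cameras: alternate between them slot by slot so that
    # the image composition stage receives each signal in turn.
    return list(islice(cycle(cameras), n_slots)), True
```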

ステップＳ９４において、画像合成部３８は、カメラ選択処理部２５２から画像合成指示を受けたとき、上記所定間隔で映像切替えスイッチ回路３７から出力される連続した２つの撮像信号を画像合成部３８内のフレームメモリに記憶し、メモリに記憶された２種の映像信号を合成して１枚の画像として出力する。画像合成方法は、画像合成部３８のメモリに２つのＣＣＤカメラの組み合わせごとに事前に指定されており、その合成方法に従って画像合成部３８は２枚の映像信号を合成して１枚の画像とする。 In step S94, when the image composition unit 38 receives an image composition instruction from the camera selection processing unit 252, it stores the two consecutive imaging signals output from the video changeover switch circuit 37 at the predetermined interval in the frame memory in the image composition unit 38, combines the two video signals stored in the memory, and outputs them as one image. The image composition method is designated in advance in the memory of the image composition unit 38 for each combination of two CCD cameras, and the image composition unit 38 combines the two video signals into one image according to that method.

As described above, the image synthesis unit 38 can not only combine two video signals but also output them as an image in which one screen is divided into two.
Whether the image synthesis unit 38 combines the two video signals into one image or outputs them as a two-way split of one screen may be designated by the camera selection processing unit 252, or by the user with the image synthesis instruction button on the operation unit 15.

When two video signals are shown on a divided screen, the screen may be divided equally into two, or a so-called multi-display method may be used in which the first video signal fills the screen and the second video signal is shown in a small frame.
This display method, too, may be designated by the camera selection processing unit 252, or by the user with the image synthesis instruction button on the operation unit 15.
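The two display layouts mentioned here (equal two-way split, and full-screen main picture with a small inset) can be illustrated with a minimal sketch. It assumes frames are plain lists of pixel rows of equal size and uses simple pixel skipping to shrink an image; the function names, the inset position, and the scaling approach are illustrative choices, not details from the specification.

```python
def split_equal(frame_a, frame_b):
    """Left half shows frame_a, right half frame_b (each thinned horizontally)."""
    return [row_a[::2] + row_b[::2] for row_a, row_b in zip(frame_a, frame_b)]


def picture_in_picture(main, inset, scale=2):
    """Show main full-screen with a reduced inset in the bottom-right corner."""
    # Shrink the inset by keeping every `scale`-th row and column.
    small = [row[::scale] for row in inset[::scale]]
    out = [list(row) for row in main]  # copy so that `main` is not modified
    top = len(out) - len(small)
    left = len(out[0]) - len(small[0])
    for r, small_row in enumerate(small):
        out[top + r][left:left + len(small_row)] = small_row
    return out
```

Either function returns a frame of the same size as its inputs, so the choice between them could be driven by a single stored setting or a user button, as the text describes.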

As in the first example, the video signal output from the image synthesis unit 38 is displayed on, for example, the television receiver 50A illustrated in FIG. 2, and is also sent to the other party's conference room and displayed on the television receiver there. The selected audio is, of course, sent to the other party's conference room in the same way as in the embodiments described above, so that the attendees there can hear it.

The CCD cameras selected in connection with the selection of one microphone are not limited to the two CCD cameras adjacent to that microphone; more cameras, for example three or four, may be selected.
For example, when four CCD cameras are selected, their imaging signals can be displayed on one screen divided into four.
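A four-way split can be sketched in the same spirit. The example below assumes four equally sized frames given as lists of pixel rows and a 2 x 2 layout with pixel skipping; the layout order and the name `quad_split` are assumptions made for illustration.

```python
def quad_split(frames):
    """Tile four equally sized frames into one screen of the same size."""
    if len(frames) != 4:
        raise ValueError("expected exactly four frames")
    # Thin each frame by two in both directions so it fits a quarter screen.
    small = [[row[::2] for row in frame[::2]] for frame in frames]
    top = [left + right for left, right in zip(small[0], small[1])]
    bottom = [left + right for left, right in zip(small[2], small[3])]
    return top + bottom
```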

As described above, according to the second form of the fourth embodiment, in addition to the same effects as the first form, the video signals of one, or two or more, CCD cameras associated with the selection of one microphone can be output in a combined or divided state.

First Modification of the Fourth Embodiment
The fourth embodiment described above assumed CCD cameras without a zoom function, pan function, or the like. However, adding a zoom function to the CCD camera of this embodiment is easy, as recent camera-equipped mobile phones demonstrate.
Therefore, a zoom function or the like may be added to the CCD camera of this embodiment. For example, a zoom button may be added to the operation unit 15 so that, while the user holds the button down, the camera zooms at a predetermined speed.
That is, when a CCD camera is used, the same processes can be performed as when a television camera device is used as the imaging means in the third embodiment, except for changing the direction of the camera.
Such changes to the various imaging conditions are carried out in the same manner as in the third embodiment.
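The press-and-hold zoom behaviour can be expressed as a one-line rule: the zoom factor grows at a fixed rate for as long as the button is held, up to a limit. The sketch below assumes a linear rate and an upper bound; the parameter values and the name `zoom_after_hold` are hypothetical, since the specification gives only "a predetermined speed".

```python
def zoom_after_hold(start, hold_seconds, rate=0.5, max_zoom=4.0):
    """Zoom factor after the zoom button has been held for hold_seconds."""
    if hold_seconds < 0:
        raise ValueError("hold_seconds must be non-negative")
    # Zoom advances at a constant speed and stops at the lens limit.
    return min(max_zoom, start + rate * hold_seconds)
```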

Second Modification of the Fourth Embodiment
In the fourth embodiment, as in the third embodiment, the second embodiment can be applied; that is, the voiceprint authentication unit 32 can be added so that microphone selection is performed only for voices that have passed voiceprint authentication. This allows more accurate selection of the microphone and, with it, of the imaging means such as the CCD camera.

Although a CCD camera has been given as an example of the small imaging means in the fourth embodiment, other imaging means may also be used, such as another small camera that can be installed alongside a microphone, has the functions described above, and suits the purpose described above.

As described above, in the first form of the fourth embodiment, a small imaging means is installed alongside each microphone. Therefore, once the microphone used by the speaker has been selected, the one or two small imaging means suited to imaging that speaker, or other related small imaging means, are uniquely determined as the small imaging means (CCD cameras) installed in the vicinity of the selected microphone.
As a result, according to the first form of the fourth embodiment, in addition to quick selection of the audio from a microphone, the one or more corresponding or related small imaging means can be selected quickly. The voice of the speaker using the selected microphone, and the video of the speaker and/or video related to the speaker, can therefore be switched and output quickly.

In addition, the user does not need to press a microphone selection button.
With this method of selecting the small imaging means, it is also possible to show only a specific direction regardless of who is speaking or, conversely, never to show video of a specific direction.
Furthermore, as described above, by providing a fixed period during which no processing is performed, very short utterances such as brief interjections of agreement do not cause the camera covering that direction to be selected each time.
It is also possible to switch the microphone and small imaging means on this side automatically in response to an instruction from the other party to the conference.
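The hold-off for very short utterances and the exclusion of particular directions can be sketched as a filter on selection events. This is a simplified offline illustration: it assumes utterances are already segmented into (start, end, microphone) tuples, that each microphone id also identifies its co-located camera, and that the hold-off is a single threshold in seconds. None of these names or values come from the specification.

```python
def select_camera(events, hold_off=1.0, blocked=frozenset(), initial=None):
    """Replay utterances and return the camera selection after each one.

    events: list of (start_seconds, end_seconds, mic_id) tuples.
    blocked: microphone ids whose direction must never be shown.
    """
    current = initial
    history = []
    for start, end, mic in events:
        long_enough = (end - start) >= hold_off
        if long_enough and mic not in blocked:
            current = mic  # switch to the camera co-located with this microphone
        history.append(current)
    return history
```

A real-time implementation would have to decide while the utterance is still in progress, for example by waiting until it has lasted `hold_off` seconds before switching.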

In carrying out the present invention, the embodiments described above can be combined as appropriate.
In the first to fourth embodiments, the microphone and imaging means on this side can be switched at the request of the other party's conference room, thereby changing whose voice is collected and who is imaged. Likewise, the microphone and imaging means on the other side can be switched from this side to change whose voice is collected and who is imaged.
This allows the sound collection / video imaging apparatus to be used effectively in both directions.

FIG. 1(A) is a diagram showing an outline of a conference system as one example to which the sound collection / video imaging apparatus of the present invention is applied, FIG. 1(B) is a diagram showing how the communication device of the sound collection / video imaging apparatus in FIG. 1(A) is placed, and FIG. 1(C) is a diagram showing the arrangement of the communication device placed on a table and the conference attendees.
FIG. 2 is a plan configuration diagram of the sound collection / video imaging apparatus according to the embodiment of the present invention.
FIG. 3 is a perspective view of the communication device according to the embodiment of the present invention.
FIG. 4 is an internal cross-sectional view of the communication device illustrated in FIG. 3.
FIG. 5 is a plan view of the microphone / electronic circuit housing of the communication device illustrated in FIG. 3 with the upper cover removed.
FIG. 6 is a diagram showing the configuration and connection state of the main circuits of the microphone / electronic circuit housing of the first embodiment, including the connection state of the first digital signal processor (DSP1) and the second digital signal processor (DSP2).
FIG. 7 is a characteristic diagram of the microphone illustrated in FIG. 5.
FIGS. 8(A) to 8(D) are graphs showing the results of analyzing the directivity of a microphone having the characteristics illustrated in FIG. 6.
FIG. 9 is a partial configuration diagram of a modification of the communication device of the present invention.
FIG. 10 is a graph showing an outline of the overall processing in the first digital signal processor (DSP1).
FIG. 11 is a diagram showing the filtering processing in the communication device of the present invention.
FIG. 12 is a frequency characteristic diagram showing the processing result of FIG. 11.
FIG. 13 is a block diagram showing the band-pass filtering processing and level conversion processing of the present invention.
FIG. 14 is a flowchart showing the processing of FIG. 13.
FIG. 15 is a graph showing the processing for determining the start and end of speech in the communication device of the present invention.
FIG. 16 is a graph showing the flow of normal processing in the communication device of the present invention.
FIG. 17 is a flowchart showing the flow of normal processing in the communication device of the present invention.
FIG. 18 is a block diagram illustrating the microphone switching processing in the communication device of the present invention.
FIG. 19 is a block diagram illustrating the method of microphone switching processing in the communication device of the present invention.
FIG. 20 is a diagram showing the configuration and connection state of the main circuits of the microphone / electronic circuit housing of the second embodiment.
FIG. 21 is a graph showing the processing of the voiceprint authentication unit illustrated in FIG. 20.
FIG. 22 is a first flowchart showing the processing of the voiceprint authentication unit illustrated in FIG. 20.
FIG. 23 is a second flowchart showing the processing of the voiceprint authentication unit illustrated in FIG. 20.
FIG. 24 is a third flowchart showing the processing of the voiceprint authentication unit illustrated in FIG. 20.
FIG. 25 is a fourth flowchart showing the processing of the voiceprint authentication unit illustrated in FIG. 20.
FIG. 26 is a configuration diagram of the conference apparatus according to the third embodiment.
FIG. 27 is another configuration diagram of the conference apparatus according to the third embodiment.
FIG. 28 is a flowchart showing the operation of the third embodiment.
FIG. 29 is a flowchart showing the flow of processing from sound detection to imaging in the third embodiment (part 1).
FIG. 30 is a flowchart showing the flow of processing from sound detection to imaging in the third embodiment (part 2).
FIG. 31 is a diagram showing the imaging state of the television camera device according to the third embodiment.
FIG. 32 is a diagram showing an arrangement example of microphones and CCD cameras as the first form of the fourth embodiment.
FIG. 33 is a diagram showing an arrangement example of microphones and CCD cameras as the second form of the fourth embodiment.
FIG. 34 is a partial configuration diagram of the circuit that selects the microphone and the CCD camera in the fourth embodiment.
FIG. 35 is a flowchart showing the processing performed by the DSP for the arrangement of microphones and CCD cameras illustrated in FIG. 32.
FIG. 36 is a flowchart showing the processing performed by the DSP for the arrangement of microphones and CCD cameras illustrated in FIG. 33.

Explanation of symbols

1A, 1B .. Sound collection / video imaging apparatus
10A, 10B .. Communication device (sound collection means)
11 .. Upper cover, 12 .. Sound reflector, 13 .. Connecting member
14 .. Speaker housing, 15 .. Operation unit, 16 .. Reception playback speaker
17 .. Restraining member, 18 .. Damper
2 .. Microphone / electronic circuit housing
MC1〜MC .. Microphones
21 .. Printed circuit board, 22 .. Microphone support member
23 .. Microprocessor, 24 .. Codec
25 .. First DSP
251 .. Microphone selection processing unit
252 .. Camera selection processing unit
26 .. Second DSP
27 .. A/D converter block, 271 to 274 .. A/D converters
28 .. D/A converter block, 29 .. Amplifier block
30 .. Microphone selection result display means
301 to 306 .. Variable gain amplifiers
32 .. Voiceprint authentication unit
34 .. Amplifier gain adjustment unit
36 .. Imaging adjustment unit
37 .. Video changeover switch circuit
38 .. Image synthesis unit
40 (40A, 40B) .. Television camera device (imaging means)

Claims

A plurality of microphones arranged radially in an annular pattern;
a plurality of first small-sized imaging means, each provided in correspondence with one of the plurality of microphones and disposed close to the corresponding microphone so that the sound collection range of that microphone can be imaged;
at least one second small-sized imaging means disposed at a predetermined position between each microphone and the corresponding first small-sized imaging means so that the sound collection range of the microphone can be imaged;
storage means for storing a first relationship between each microphone and its corresponding first small-sized imaging means and, where a second small-sized imaging means is located in the vicinity of the microphone, a second relationship with that second small-sized imaging means;
microphone selection means for detecting the sound collection signals of the plurality of microphones and selecting the microphone that has detected an effective sound collection signal among the detected sound collection signals;
imaging means selection means for selecting, based on the first relationship stored in the storage means, the first small-sized imaging means close to the selected microphone and, when a second small-sized imaging means to which the second relationship stored in the storage means applies is present, selecting that second small-sized imaging means;
imaging signal selection means for selectively outputting a first imaging signal picked up by the first small-sized imaging means selected by the imaging means selection means and, when a corresponding second small-sized imaging means is present, a second imaging signal picked up by that second small-sized imaging means; and
image synthesizing means for combining the first imaging signal and the second imaging signal selected by the imaging signal selection means into one image, or dividing them within one screen.
A sound collection / video imaging apparatus comprising the foregoing.

When the imaging means selection means selects the first small-sized imaging means and the second small-sized imaging means,
the imaging signal selection means switches between the first imaging signal and the second imaging signal continuously and inputs them to the image synthesizing means, and
the image synthesizing means combines the continuously input first imaging signal and second imaging signal into one screen based on a predesignated condition, or divides the plurality of imaging signals within one screen,
The sound collection / video imaging apparatus according to claim 1.

The first small-sized imaging means is a CCD camera;
The second small-sized imaging means is a CCD camera;
The sound collection / video imaging apparatus according to claim 1 or 2.

The first small-sized imaging means and / or the second small-sized imaging means has a zoom function for performing zoom processing according to an instruction.
The sound collection / video imaging apparatus according to claim 3.

The sound collection / video imaging device further includes voiceprint authentication means for authenticating voiceprints of a plurality of speakers using the plurality of microphones,
The microphone selection means selects the microphone with the effective sound collection signal when the voice has been authenticated by the voiceprint authentication means;
The sound collection / video imaging apparatus according to claim 1.

The microphone selection means sets at least the first imaging means to a default state when the voice is not authenticated by the voiceprint authentication means;
The sound collection / video imaging apparatus according to claim 5.

As the default state, the microphone selection means leaves at least the selection of the microphone unchanged and leaves the first imaging means unchanged.
The sound collection / video imaging apparatus according to claim 6.

As the default state, the microphone selection means sets at least the first imaging means to its initial imaging condition;
The sound collection / video imaging apparatus according to claim 7.

Each of the microphones has a predetermined directivity,
The sound collection / video imaging apparatus according to claim 1.