JPH10191290A - Video camera with built-in microphone

Info

Publication number: JPH10191290A
Application number: JP8350065A
Authority: JP
Inventors: Hiroo Jofu; 浩男上符
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 1996-12-27
Filing date: 1996-12-27
Publication date: 1998-07-21

Abstract

PROBLEM TO BE SOLVED: To precisely collect the voice of a speaker by incorporating a plurality of microphones in a video camera, changing the synthesis characteristics of the microphone outputs in accordance with the swing angle and the zoom angle of the video camera, and searching for the direction of maximum output with a second beamforming circuit. SOLUTION: An array microphone output control part 50 receives a control signal corresponding to the movement of a pan-tilt-zoom camera unit 80 from a camera control part 60 and decides the parameters given to variable delay circuits 14, 24 and 34 and variable gain circuits 15, 25 and 35. The output signals are added in an adder 40, so that the voice of the speaker is precisely captured and output. At the same time, the input signals are supplied to second variable delay circuits 71, 72 and 73, and a search control part 70 sequentially sets the delay times of the variable delay circuits 71, 72 and 73 to scan the direction of maximum sensitivity of the array microphone and search for the direction of the sound source. Conversation with high clarity is thus realized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an audio/video input device used in video conference systems and the like.

[0002]

2. Description of the Related Art

Commercially available video conference systems are generally equipped with a motorized pan-tilt-zoom camera so as to satisfy both the demand to show a wide area and the demand to show a specific area enlarged. Such a camera can swing its pointing angle left, right, up, and down, and can widen or narrow the screen display range, in other words vary continuously between telephoto and wide angle. In such systems the conference participants are seated over a wide area, so it has been usual to distribute a plurality of microphones in order to pick up each participant's voice evenly. Japanese Patent Laid-Open No. 1-264487 proposes a video camera that estimates the speaker direction by comparing the sound reception levels of a plurality of microphones and automatically points the camera at the estimated speaker. A device that estimates the speaker in this way and automatically points the camera toward the speaker offers convenience to users.

[0003]

Problems to Be Solved by the Invention

In the conventional example above, at least as many microphones as there are conference participants must be installed near the participants, and the microphone positions must be set on the camera side in advance. The settings therefore had to be redone whenever the number of participants or their seating changed, which made the system extremely hard to use and inconvenienced users. In addition, because many microphones are placed on the conference table, microphone cables must be laid between them and the main unit, which spoils the appearance and creates the risk of people catching their feet on wiring on the floor.

An object of the present invention is to solve these drawbacks: to estimate the speaker without installing microphones on the conference table and to point the camera automatically in the speaker's direction. The invention further provides a function that preferentially searches speaker directions set in advance by the user. When an automatically found sound source direction does not match any preset direction, it is newly added to the set values and used in the next search. This improves the accuracy of the sound source direction search and thereby realizes a more comfortable video conference.

[0004]

Means for Solving the Problems

The above object is achieved by building a plurality of microphones into the video camera, which is an indispensable part of a video conference system, and combining the microphone output signals by beamforming. The output of each microphone is fed to two beamforming circuits. The first beamforming circuit changes its combining characteristics according to the swing angle and zoom angle of the video camera, so that the speaker is accurately captured and sound is collected with little noise and high clarity. The second beamforming circuit scans the sound source direction and searches for the direction in which the maximum reception level is obtained, in other words the direction in which the output of the beamforming microphone is greatest, and thereby detects the speaker direction. The detection result is input to the video camera control circuit, which can then point the video camera toward the speaker. Furthermore, the invention provides a function that preferentially searches speaker directions set in advance by the user. When an automatically found sound source direction does not match any preset direction, it is newly added to the set values and used in the next search, which improves the accuracy of the sound source direction search and realizes a more comfortable video conference.
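The search performed by the second beamforming circuit can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `beam_power` and `scan_for_source`, the use of whole-sample delays, and the use of summed squared output as the "reception level" are assumptions made for the example.

```python
def beam_power(channels, delays):
    """Energy of the delay-and-sum output for one candidate set of delays.

    channels: list of equal-length sample lists, one per microphone
    delays:   per-microphone delay in whole samples (non-negative)
    """
    n = len(channels[0])
    total = 0.0
    # Start where every delayed channel already has a valid sample.
    for t in range(max(delays), n):
        s = sum(x[t - d] for x, d in zip(channels, delays))
        total += s * s
    return total


def scan_for_source(channels, candidates):
    """Return the label of the candidate direction with the largest output.

    candidates: dict mapping a direction label to a tuple of delays
                (hypothetical labels; the patent does not name them).
    """
    return max(candidates, key=lambda name: beam_power(channels, candidates[name]))
```

For a sound that reaches microphone 0 first and microphone 2 last, the candidate that delays microphone 0 the most brings the three signals into phase and gives the largest output, so that candidate is reported as the speaker direction.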

[0005]

Details of the beamforming technique are given in "Multidimensional Digital Signal Processing" (Prentice Hall). With the configuration described above, the array of microphones, together with the delay circuits and adder circuit connected to it, scans the candidate speaker directions, searches for the speaker's direction, and automatically turns the video camera toward the speaker. A separate set of delay circuits, gain circuits, and an adder circuit, distinct from the delay circuits and adder circuit above, is also connected to the microphone array. Linked to the camera's swing angle and zoom angle, it delays the signals obtained from the microphones, changes their gains, and adds them, which makes it possible to change the pointing direction and the directivity of the microphones. As a result, the camera turns toward the speaking participant without any participant operating the video camera, and because sound is collected with high sensitivity, it is not affected by other speakers or by room noise. When the camera's shooting range is widened, the directivity can be widened so that the voices of all participants are collected. In this way the sound pickup characteristics can be changed easily even without an operation to change the camera direction, so a video conference that is unaffected by noise and has a sense of presence can be realized without troublesome operations.

[0006]

Embodiments of the Invention

Embodiments of the present invention are described below with reference to the drawings. Before describing the embodiments, an outline of the beamforming technique is given. Let N be the number of microphones and ri(t) the output of the i-th microphone. Each output is given an appropriate delay time τi so that the microphone outputs for a signal arriving from the desired direction are in phase, multiplied by a weighting coefficient Wi, and summed to obtain the output signal. Signals arriving from other directions are not in phase, so when summed they cancel one another and are attenuated, which yields sharp directivity. The directivity is determined by the weighting coefficients, the delay times, and the number of microphones, and adaptively changing the weighting coefficients and delay amounts allows the beam to follow the position of the signal source. The output signal y(t) is given by the following equation.

[0007]

(Equation 1)
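Equation 1 itself does not appear in this text. Based on the definitions in the preceding paragraph (N microphones with outputs ri(t), delay times τi, and weighting coefficients Wi), it presumably takes the standard delay-and-sum form below; the summation limits 1 to N are an assumption.

```latex
y(t) = \sum_{i=1}^{N} W_i \, r_i(t - \tau_i)
```

Here $W_i$, $r_i(t)$, and $\tau_i$ correspond to Wi, ri(t), and τi in the text.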

[0008]

This property can also be used to search for an unknown sound source direction: because the direction of maximum sensitivity changes with the values of τi, the sound source direction can be found by varying them.

FIG. 1 shows an example of the embodiment in which three microphones are connected. The input sound is picked up by the microphones (11, 21, 31), amplified by the microphone amplifiers (12, 22, 32), and input to the analog-to-digital converters (13, 23, 33) (hereinafter A/D). Each A/D samples and discretizes the input audio signal and converts it into a digital input audio signal. The variable delay circuits (14, 24, 34) delay the digital input audio signal by the delay time determined by the array microphone output control unit (50) and output it. The delayed signals are multiplied in the variable gain circuits (15, 25, 35) by the gains determined by the array microphone output control unit (50) and summed in the adder (40) to obtain the digital output audio signal.

At the same time, the digital input audio signals are input to a second set of variable delay circuits (71, 72, 73), separate from the variable delay circuits above. To scan the direction of maximum sensitivity of the array microphone, the search control unit (70) sets delay times in these variable delay circuits one after another and searches for the sound source direction. The direction found is passed to the camera control unit (60), which controls the camera unit so that the camera points toward the sound source.

The digital output audio signal is converted into an analog output audio signal by the digital-to-analog converter (41) (hereinafter D/A). The device may instead output only the digital output audio signal, in which case the D/A is unnecessary. One chain from microphone to variable gain circuit is required for each microphone. The array microphone output control unit (50), also referred to as the audio input control unit, receives a control signal from the camera control unit (60). This control signal corresponds to the movement of the motorized pan-tilt-zoom camera unit (80) and represents the left-right swing angle (pan) and the zoom state. Based on this control signal, the array microphone output control unit (50) determines the parameters given to the variable delay circuits (14, 24, 34) and the variable gain circuits (15, 25, 35).
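As an illustration of the signal path from the variable delay circuits through the variable gain circuits to the adder (40), the following minimal Python sketch processes already digitized samples. It is not taken from the patent: the function name `delay_and_sum`, the restriction to whole-sample delays, and the treatment of samples before the start of the recording as silence are assumptions made for the example.

```python
def delay_and_sum(channels, delays, gains):
    """Delay each channel by a whole number of samples, scale it, and add.

    channels: list of equal-length sample lists, one per microphone
    delays:   per-channel delay in samples (non-negative integers)
    gains:    per-channel gain factor
    """
    n = len(channels[0])
    out = [0.0] * n
    for x, d, g in zip(channels, delays, gains):
        # Output sample t receives input sample t - d; earlier samples
        # are treated as silence (assumption of this sketch).
        for t in range(d, n):
            out[t] += g * x[t - d]
    return out
```

With an impulse that reaches the three microphones one sample apart, delays of 2, 1, and 0 samples line the three copies up on the same output sample, whereas zero delays leave them spread over three samples.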
First, the operation of the search control unit (70) is described in detail. When sound arrives from directly in front, every microphone picks it up in the same phase. When it arrives obliquely, however, the input signal reaches each microphone at a different time, as shown in FIG. 2. Adding the microphone signals as they are then causes them to cancel one another and attenuate, because of the phase differences between them. If, as shown in FIG. 3, delay circuits are used to remove the time differences between the signals picked up by the microphones, the signal becomes equivalent to sound arriving from the front, and the array acquires high directivity in that direction. The pointing direction can be changed by controlling the delay times of these delay circuits. Using this property, the pointing direction of the microphones can be scanned across the directions of multiple speakers to find the direction with the highest reception level. The speaker direction can thus be estimated, which makes it possible to control the video camera so that it automatically turns toward the speaker.

Next, the operation of the array microphone output control unit (50) is described in detail. Its role is to determine the delay amounts of the variable delay circuits (14, 24, 34) so that the array has directivity in the desired direction, and to determine the gains of the variable gain circuits (15, 25, 35) so as to change the sharpness of the directivity.

The method of determining the delay amounts is described first. The microphone array of this embodiment is characterized by sharp directivity. The voice of a speaker in the camera's pointing direction is therefore picked up efficiently, whereas the voices of speakers outside that direction and noise are picked up poorly. The camera used in a video conference system is generally turned toward the speaker by an operator to show the speaker, so the pointing direction of the microphones need only be changed according to the camera's swing angle. The swing angle is controlled by the camera control unit (60), so the camera direction is known. Because this system handles the input audio signal as a discrete signal, the delay amounts that can be set between microphones are themselves discrete. The camera's swing angle is therefore quantized, the resulting index is used to look up a table of delay amounts prepared in advance to obtain the delay for each variable delay circuit, and the variable delay circuits are set accordingly to obtain the desired characteristics.

When there is a single speaker, the camera zooms in on that speaker, so sharp microphone directivity does not feel unnatural. When many speakers talk at the same time, however, the directivity must be weakened or their remarks cannot be picked up. The camera control unit (60) therefore has the function of controlling the sensitivity of each microphone to vary the directivity. When there are many speakers, or when all participants are to be shown without operating the camera each time, the camera's zoom is set to the wide-angle side to widen the shooting range. Here too, the zoom is controlled by the camera control unit (60), so the zoom setting is known.
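The table lookup for the delay amounts can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the microphone spacing, sampling rate, speed of sound, 10-degree quantization step, the ±60-degree range, and the sign convention (a positive pan angle means the sound reaches the higher-numbered microphone first) are all hypothetical values chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MIC_SPACING = 0.05      # m between adjacent microphones (assumed)
SAMPLE_RATE = 48000     # Hz (assumed)
NUM_MICS = 3            # as in FIG. 1
ANGLE_STEP = 10         # degrees per table entry (assumed)
MAX_ANGLE = 60          # table covers -60..+60 degrees (assumed)


def delays_for_angle(angle_deg):
    """Whole-sample delays that steer a linear array toward angle_deg."""
    step = MIC_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND * SAMPLE_RATE
    center = (NUM_MICS - 1) / 2
    raw = [round((i - center) * step) for i in range(NUM_MICS)]
    base = min(raw)
    return [r - base for r in raw]  # shift so that every delay is non-negative


# Table prepared in advance, one entry per quantized swing angle.
DELAY_TABLE = [delays_for_angle(a) for a in range(-MAX_ANGLE, MAX_ANGLE + 1, ANGLE_STEP)]


def quantize_pan(pan_deg):
    """Map a pan angle in degrees to the index of the nearest table entry."""
    clamped = max(-MAX_ANGLE, min(MAX_ANGLE, pan_deg))
    return round((clamped + MAX_ANGLE) / ANGLE_STEP)


def lookup_delays(pan_deg):
    return DELAY_TABLE[quantize_pan(pan_deg)]
```

Because the delays are whole samples, a higher sampling rate gives finer delay steps and therefore finer control of the pointing direction.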
In this embodiment the microphone placed at the center is controlled as the main unit: raising the gain of the signal from the center microphone and lowering the gains of the signals from the microphones at both ends controls how strong the directivity is. In the extreme case, using only the signal from the center microphone gives the widest directivity. As with the delays, the camera's zoom angle is quantized, the resulting index is used to look up a table of gain values prepared in advance to obtain the gain for each variable gain circuit, and the variable gain circuits are set accordingly to obtain the desired characteristics. Excellent effects are thus obtained with an extremely simple configuration. FIG. 4 shows how the directivity of the microphones of this embodiment changes when the delay amounts are changed.
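The corresponding gain lookup can be sketched in the same way. The patent gives no numerical gain values, so everything here is an assumption for illustration: the zoom state is taken as a number from 0.0 (widest) to 1.0 (most telephoto), it is quantized into five levels, and the gain of the two end microphones rises linearly with the level while the center gain stays at 1.

```python
NUM_LEVELS = 5  # number of quantized zoom steps (assumed)


def gains_for_level(level):
    """Gains for the (end, center, end) microphones at one zoom level.

    Level 0 (widest zoom): only the center microphone contributes,
    which gives the widest directivity. Highest level (telephoto):
    all three microphones contribute equally, the sharpest case here.
    """
    side = level / (NUM_LEVELS - 1)
    return [side, 1.0, side]


# Table prepared in advance, one entry per quantized zoom level.
GAIN_TABLE = [gains_for_level(k) for k in range(NUM_LEVELS)]


def quantize_zoom(zoom):
    """Map a zoom value in 0.0..1.0 to a table index."""
    z = max(0.0, min(1.0, zoom))
    return min(NUM_LEVELS - 1, int(z * NUM_LEVELS))


def lookup_gains(zoom):
    return GAIN_TABLE[quantize_zoom(zoom)]
```

A real device would presumably tune these values against measured directivity patterns; the linear ramp is only the simplest choice consistent with the description.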
FIG. 5 shows how the directivity of the microphones of this embodiment changes when the gains are changed. In general, increasing the number of microphones sharpens the directivity, and widening the microphone spacing extends the directivity to lower frequencies, but in either case the device becomes larger. Raising the A/D sampling rate improves the precision of the delay settings and hence the precision with which the pointing direction can be set.

With the control described above, changing the pointing direction of the microphones according to the camera's swing angle allows the speaker's voice to be collected efficiently. When the camera zoom is on the wide side, the directivity is widened to pick up the voices of all participants; when the zoom is on the tele side, the directivity is sharpened so that the voice of the speaker shown by the camera is picked up effectively.

Next, presetting of sound source directions and its use to improve the accuracy of the sound source direction search are described. When a video conference is held, setting speaker directions in the camera in advance allows the camera to be turned toward a speaker instantly. This so-called camera preset function is implemented in many devices. In the device of the present invention the speaker direction is estimated automatically, so it is sufficiently effective compared with conventional products even without a camera preset function. However, when speakers are close to one another, or in environments with much noise or reverberation, the direction cannot always be estimated correctly.
Therefore, a camera preset function is added as a supplementary function and combined with the sound source search function described above to realize a video conference system that is easier to use. At the start of a conference, the user can use the camera preset function to store the camera direction and the screen display range in the preset memory according to where each speaker sits. Since a speaker is expected to actually be present in each preset direction, the automatic sound source search searches the directions stored in the preset memory first, which makes the search efficient. If no significant result is obtained in the directions stored in the preset memory, the other directions are searched and a new sound source is estimated. The newly estimated direction is added to a free area of the preset memory and used in the next sound source direction search. If there is no free area, the entry holding the least frequently searched direction among those stored in the preset memory may be overwritten.
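The preset-first search and the update of the preset memory can be sketched as follows. The sketch rests on several assumptions that the patent does not specify: directions are plain numbers (degrees), a "significant" result means a reception level at or above a threshold, and "search frequency" is counted as the number of times the search settled on a direction. The names `PresetMemory` and `find_speaker` are hypothetical.

```python
class PresetMemory:
    """Fixed-size store of candidate speaker directions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.hits = {}  # direction -> times the search settled on it

    def add(self, direction):
        if direction in self.hits:
            return
        if len(self.hits) >= self.capacity:
            # No free area: overwrite the least frequently selected entry.
            least_used = min(self.hits, key=self.hits.get)
            del self.hits[least_used]
        self.hits[direction] = 0


def find_speaker(memory, level_at, all_directions, threshold):
    """Return the estimated speaker direction, or None if none is significant.

    level_at: function returning the reception level for a direction
    """
    # 1. Search the preset directions first.
    if memory.hits:
        best = max(memory.hits, key=level_at)
        if level_at(best) >= threshold:
            memory.hits[best] += 1
            return best
    # 2. Otherwise search the remaining directions.
    others = [d for d in all_directions if d not in memory.hits]
    if not others:
        return None
    best = max(others, key=level_at)
    if level_at(best) < threshold:
        return None
    memory.add(best)  # remember the new direction for the next search
    memory.hits[best] += 1
    return best
```

In this sketch a direction found outside the presets immediately becomes a preset itself, so the next search checks it before scanning the full range.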

[0009]

Effects of the Invention

According to the present invention, the speaker direction can be estimated without placing microphones near the speakers, and the camera can be swung toward the speaker. In addition, changing the pointing direction of the microphones according to the camera's swing angle allows the speaker's voice to be collected efficiently, so the device is less affected by other participants' chatter and by room noise. When the camera zoom is on the wide side, widening the microphone directivity picks up the voices of all participants; conversely, when the camera zoom is on the tele side, the directivity can be sharpened, so the voice of the speaker shown by the camera is picked up effectively. The speaker is thus accurately captured, giving conversation with little noise and high clarity, while sound is collected from a wide range when the whole scene is shown, so a sense of presence is secured as well. An extremely comfortable video conference can therefore be realized without troublesome operations.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of an embodiment according to the present invention.

FIG. 2 is an explanatory diagram of the principle (characteristics at oblique incidence without delay circuits).

FIG. 3 is an explanatory diagram of the principle (delay circuits added to give maximum sensitivity in the oblique incidence direction).

FIG. 4 is an explanatory diagram of the change in pointing direction according to the embodiment.

FIG. 5 is an explanatory diagram of the change in directivity according to the embodiment.

[Explanation of symbols]

11, 21, 31: microphones
12, 22, 32: microphone amplifiers
13, 23, 33: analog-to-digital converters
14, 24, 34: variable delay circuits
15, 25, 35: variable gain circuits
40: adder
41: digital-to-analog converter
50: array microphone output control unit
60: camera control unit
70: search control unit
71, 72, 73: variable delay circuits
74: adder
80: motorized pan-tilt-zoom camera unit

Claims

1. A video camera with a built-in microphone, incorporating a plurality of microphones and having a motorized pan-tilt-zoom function capable of changing the vertical and horizontal pointing angle and the screen display range by an external control signal, the video camera comprising a first signal processing circuit and a second signal processing circuit, wherein the first signal processing circuit includes amplifiers, analog-to-digital converters, and a first group of delay circuits and gain circuits, and a first adder circuit for adding the output signals from the gain circuits; the second signal processing circuit includes a second group of delay circuits connected to the output signals from the analog-to-digital converters in one-to-one correspondence with the microphones, and a second adder circuit for adding the output signals from the second group of delay circuits; and the first signal processing circuit and the second signal processing circuit are controlled based on the information on the pointing angle and the screen display range input by the external control signal.

2. The video camera with a built-in microphone according to claim 1, wherein the delay times of the first group of delay circuits can be set arbitrarily within a predetermined range, delay times predetermined according to the left-right swing angle of the video camera can be set individually, and the pointing direction of the microphones can thereby be set in the same direction as the swing angle.

3. The video camera with a built-in microphone according to claim 2, wherein the gain coefficients of the plurality of gain circuits can be set arbitrarily within a predetermined range, gain coefficients predetermined according to the screen display range of the video camera can be set individually, and the directivity of the microphones can thereby be widened when the screen display range of the camera is wide and, conversely, narrowed when the screen display range of the camera is narrow.

4. The video camera with a built-in microphone according to claim 3, wherein the delay times of the second group of delay circuits are set sequentially to predetermined combinations to change the pointing direction of the microphones, and the sound source direction is estimated by comparing the sound reception level for each pointing direction.

5. The video camera with a built-in microphone according to claim 4, further comprising a plurality of memories for temporarily storing the information on the pointing angle and the screen display range, wherein, for each memory, it can be selected whether the information is stored by the user by means of an external control signal or stored based on the estimated sound source direction, and wherein, when searching for the sound source direction, the pointing angle directions stored in the memories are searched preferentially to estimate the sound source direction.