JP3178509B2 - Stereo audio teleconferencing equipment

Info

Publication number: JP3178509B2
Application number: JP23623796A
Authority: JP
Inventors: 達也加藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-09-06
Filing date: 1996-09-06
Publication date: 2001-06-18
Anticipated expiration: 2016-09-06
Also published as: JPH1084539A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a stereo audio video conference apparatus, and more particularly to a stereo audio video conference apparatus that generates pseudo-stereo audio from monaural audio.

[0002]

[Prior Art] Conventionally, as shown for example in Japanese Patent Application Laid-Open No. 5-3567, a stereo audio video conference apparatus of this type is used to deliver in stereo the audio of a video conference apparatus, which has generally been monaural, so that the conference conveys a greater sense of presence.

[0003] FIG. 4 is a block diagram showing an example of a conventional stereo audio video conference apparatus. The apparatus comprises: a video input/output unit 101 comprising an imaging camera and a display; a video codec 102 that encodes and decodes video signals; a multiplexer/demultiplexer 103 that performs signal switching; a network interface 104 that connects the apparatus as a whole to the network; delays 105a and 105b, each connected to the multiplexer/demultiplexer 103; audio codecs 106a and 106b that encode and decode audio signals; audio output units 108a and 108b comprising a microphone and the like; changeover switches 110a and 110b interposed between the audio codecs 106a, 106b and the audio output units 108a, 108b; an end-to-end signal control unit 111 connected to the multiplexer/demultiplexer 103; an end-to-network signal control unit 112 connected to the network interface 104; and a system control unit 109 connected to the end-to-end signal control unit 111 and the end-to-network signal control unit 112.

[0004] Next, the operation will be described. The changeover switches 110a and 110b are switched by user operation. When the changeover switches 110a and 110b are at the positions shown by solid lines in the figure, the audio codec 106b and the audio output unit 108b are connected to each other. When the changeover switch 110a is switched from this solid-line state to the position shown by the dotted line, both audio output units 108a and 108b are connected to the audio codec 106b. When the changeover switch 110b is switched from the solid-line state to the position shown by the dotted line, both audio output units 108a and 108b are connected to the audio codec 106a.

[0005] The multiplexer/demultiplexer 103 embeds the audio data input from the audio input units 107a and 107b into a communication frame and transmits it. It also separates the plural audio data embedded in a received communication frame into individual audio data and outputs them to the audio codecs 106a and 106b.

[0006] When the changeover switches 110a and 110b are at the solid-line positions and such a communication frame arrives from the other party, the frame is processed by the multiplexer/demultiplexer 103. One audio data portion is sent to the audio output unit 108a via the audio codec 106a, and the other audio data portion is sent to the audio output unit 108b via the audio codec 106b.

[0007] Here, if one audio data is the right (R) audio and the other is the left (L) audio, the received audio is heard in stereo and the sense of presence is further enhanced. Further, if one audio data is the conference audio and the other is its interpretation, the conference audio is output from the audio output unit 108a and the interpreted audio is output from the audio output unit 108b. A participant who wants to hear the interpretation therefore need only use the audio output unit 108b. In addition, by operating the changeover switches 110a and 110b, the conference audio can be output from both audio output units 108a and 108b, or the interpreted audio can be output from both.
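The switch behaviour described for the conventional apparatus can be summarised as a small routing table. The following is only an illustrative sketch: the names `Position` and `route_outputs` are hypothetical, and the case in which both switches are at their dotted-line positions is not described in the text, so it is rejected here rather than guessed.

```python
from enum import Enum


class Position(Enum):
    SOLID = "solid"    # position drawn with a solid line in FIG. 4
    DOTTED = "dotted"  # position drawn with a dotted line in FIG. 4


def route_outputs(sw_110a: Position, sw_110b: Position) -> dict:
    """Return which audio codec feeds each audio output unit."""
    if sw_110a is Position.SOLID and sw_110b is Position.SOLID:
        # Each codec feeds its own output unit (stereo, or conference + interpretation).
        return {"108a": "106a", "108b": "106b"}
    if sw_110a is Position.DOTTED and sw_110b is Position.SOLID:
        # Both output units are connected to audio codec 106b.
        return {"108a": "106b", "108b": "106b"}
    if sw_110a is Position.SOLID and sw_110b is Position.DOTTED:
        # Both output units are connected to audio codec 106a.
        return {"108a": "106a", "108b": "106a"}
    # Assumption: the text does not describe both switches at the dotted position.
    raise ValueError("both switches dotted: not described in the text")
```

With both switches at the solid-line positions the two decoded streams stay separate, which is why this arrangement needs two audio codecs.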

[0008]

[Problems to Be Solved by the Invention] In this conventional stereo audio video conference apparatus, the audio codec that encodes and decodes audio is generally expensive because it uses special integrated circuits and the like. Moreover, two channels' worth of audio codecs, that is, twice as many, are required, so the apparatus becomes extremely expensive.

[0009] An object of the present invention is to provide an inexpensive stereo audio video conference apparatus.

[0010]

[Means for Solving the Problems] According to the present invention, there is obtained a stereo audio video conference apparatus characterized by comprising a sound field generation device that generates a stereo audio signal from a monaural audio signal, which is transferred from a partner video conference apparatus and decoded by an audio decoding device, on the basis of pan head direction information, which is transferred from the partner video conference apparatus and indicates the direction of a pan head facing the speaker in the video conference, wherein the stereo audio signal generated by the sound field generation device is reproduced in stereo by left and right speakers.

[0011] Further, according to the present invention, there is obtained a stereo audio video conference apparatus characterized in that the sound field generation device comprises: a right-channel attenuator that variably attenuates the input monaural audio signal based on the pan head direction information; a left-channel attenuator that variably attenuates the input monaural audio signal based on the pan head direction information; a right-channel delay element that variably delays the input monaural audio signal based on the pan head direction information; and a left-channel delay element that variably delays the input monaural audio signal based on the pan head direction information.

[0012]

[Embodiments of the Invention] Embodiments of the present invention will be described in detail with reference to the drawings.

[0013] The stereo audio conference apparatus of the present invention has a sound field generation device (4 in FIG. 1) that generates pseudo-stereo audio from the monaural audio of the partner video conference apparatus, decoded by an audio decoding device (3 in FIG. 1), on the basis of pan head direction information received from the partner video conference apparatus, and speakers (5-R and 5-L in FIG. 1) that reproduce the pseudo-stereo audio generated by the sound field generation device (4 in FIG. 1).

[0014] Another feature is that the pan head of the partner video conference apparatus can be remotely controlled. Specifically, the apparatus includes a pan head operating device (10 in FIG. 1) that, with knowledge of the pan head direction information indicating the current position of the partner's pan head, outputs pan head operation information for remotely controlling the pan head of the partner video conference apparatus.

[0015] A further feature is that the pan head can be controlled both by operations from the partner video conference apparatus and by operations at the local video conference apparatus. Specifically, the apparatus includes a pan head control device (8 in FIG. 1) that receives pan head operation information from the partner video conference apparatus and operation information from the local pan head operating means (16 in FIG. 2) and outputs pan head direction information, and a pan head (9 in FIG. 1) on which a video camera (12 in FIG. 1) and a microphone (14 in FIG. 1) are mounted and which operates according to the pan head direction information output by the pan head control device (8 in FIG. 1).

[0016] Pseudo-stereo audio can be generated from the monaural audio and the pan head direction information transferred from the partner video conference apparatus. As a result, the localization of the reproduced sound field shifts toward the direction in which the pan head of the partner video conference apparatus is facing, which makes it possible to hold a video conference with a rich auditory sense of presence.

[0017] In response to an operation by a video conference participant, the pan head operating device transfers pan head operation information to the pan head control device of the partner's video conference apparatus. The pan head of one video conference apparatus can therefore be remotely controlled from the other.

[0018] The pan head control device receives the operation information of the local video conference participant and the pan head operation information received from the partner video conference apparatus, and outputs pan head direction information to the pan head. The pan head can therefore be controlled not only remotely from the partner video conference apparatus but also by the participants at the local video conference apparatus.

[0019] The above embodiment of the present invention will now be described in more detail with reference to FIG. 1. The video conference apparatus 1 is the entire apparatus that realizes a video conference. The network 15 connects video conference apparatuses 1 at remote locations to each other to realize the video conference.

The video camera 12 outputs a captured image signal. The image encoding device 11 receives the captured image signal output by the video camera 12, encodes it, and outputs the encoded captured image signal to the multiplexing device 2. The microphone 14 outputs an input audio signal. The audio encoding device 13 receives the input audio signal output by the microphone, encodes it, and outputs the encoded audio signal to the multiplexing device 2.

The image decoding device 6 receives from the multiplexing device 2 the encoded captured image signal received from the partner video conference apparatus, decodes it, and outputs a reproduced image signal. The display 7 receives the reproduced image signal output by the image decoding device and displays the partner's video. The audio decoding device 3 receives from the multiplexing device 2 the encoded audio signal received from the partner video conference apparatus, decodes it, and outputs a reproduced audio signal. The sound field generation device 4 receives the reproduced audio signal output by the audio decoding device 3 and the pan head direction information received from the partner video conference apparatus, and outputs stereo audio signals. The right speaker 5-R reproduces the right stereo audio signal output by the sound field generation device 4. The left speaker 5-L reproduces the left stereo audio signal output by the sound field generation device 4.

The pan head control device 8 receives the pan head operation information received from the partner video conference apparatus, and outputs pan head direction information according to that pan head operation information and the local pan head operating means provided in the pan head control device 8. The pan head 9 carries the video camera 12 and the microphone 14 and rotates according to the pan head direction information output by the pan head control device 8. The pan head operating device 10 receives the pan head direction information received from the partner video conference apparatus and outputs pan head operation information for operating the pan head of the partner video conference apparatus.

The multiplexing device 2 receives the encoded captured image signal output by the image encoding device 11, the encoded audio signal output by the audio encoding device 13, the pan head operation information output by the pan head operating device 10, and the pan head direction information output by the pan head control device 8, and combines these signals. The combined signal is transferred to the partner's video conference apparatus via the network 15. At the same time, the multiplexing device 2 receives the combined signal sent from the partner video conference apparatus via the network 15, separates it into the encoded audio signal, the encoded captured image signal, the pan head operation information, and the pan head direction information, and outputs them.

[0020] Next, the operation of FIG. 1 will be described.

[0021] The captured image signal captured and output by the video camera 12 is input to the image encoding device 11. The image encoding device 11 encodes this captured image signal, and the encoded captured image signal is input to the multiplexing device 2. At the same time, the input audio signal picked up and output by the microphone 14 is input to the audio encoding device 13. The audio encoding device 13 encodes this input audio signal, and the encoded audio signal is likewise input to the multiplexing device 2.

[0022] The pan head operating device 10 receives from the multiplexing device 2 the pan head direction information transmitted by the pan head control device of the partner's video conference apparatus and, now knowing the direction of the partner's pan head, outputs pan head operation information to the multiplexing device 2 in order to operate the pan head of the partner video conference apparatus.

[0023] The pan head control device 8 receives the pan head operation information transmitted by the pan head operating device of the partner video conference apparatus. Then, as shown in FIG. 2, it outputs pan head direction information to the pan head 9 and to the multiplexing device 2, based on the local pan head operating means 16 provided in the pan head control device 8 and on the received pan head operation information. The pan head 9 rotates according to the pan head direction information output by the pan head control device 8. The video camera 12 and the microphone 14 are mounted on the pan head 9 and rotate together with it.

[0024] The multiplexing device 2 multiplexes the encoded captured image signal input from the image encoding device 11, the encoded audio signal input from the audio encoding device 13, the pan head operation information input from the pan head operating device, and the pan head direction information input from the pan head control device, and transmits the result to the partner video conference apparatus via the network 15.

[0025] Meanwhile, the multiplexing device 2 demultiplexes the data received from the partner video conference apparatus via the network 15 and outputs the encoded captured image signal to the image decoding device 6, the encoded audio signal to the audio decoding device 3, the pan head operation signal to the pan head control device 8, and the pan head direction signal to the sound field generation device 4 and the pan head operating device 10.
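The multiplexing and demultiplexing just described can be pictured as packing four fields into one frame and fanning them out again on reception. The sketch below is illustrative only: the `Frame` layout, the field names, and the use of a plain angle in degrees for the pan head operation and direction information are assumptions, since the text does not specify a frame format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    video: bytes          # encoded captured image signal
    audio: bytes          # encoded monaural audio signal
    pan_operation: float  # pan head operation information (assumed: requested angle, degrees)
    pan_direction: float  # pan head direction information (assumed: current angle, degrees)


def multiplex(video: bytes, audio: bytes, pan_operation: float, pan_direction: float) -> Frame:
    """Combine the four locally produced signals into one frame for the network."""
    return Frame(video, audio, pan_operation, pan_direction)


def demultiplex(frame: Frame) -> dict:
    """Route each field of a received frame to the device or devices that consume it."""
    return {
        "image_decoding_device_6": frame.video,
        "audio_decoding_device_3": frame.audio,
        "pan_head_control_device_8": frame.pan_operation,
        # The direction information goes to two consumers.
        "sound_field_generation_device_4": frame.pan_direction,
        "pan_head_operating_device_10": frame.pan_direction,
    }
```

Only one audio field is carried per frame; the stereo effect is produced at the receiver from that field together with `pan_direction`.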

[0026] The image decoding device 6 receives from the multiplexing device 2 the encoded captured video signal received from the partner video conference apparatus and decodes the video. The decoded video signal is displayed by the display 7. That is, the video captured by the video camera of the partner video conference apparatus is shown.

[0027] Similarly, the audio decoding device 3 receives from the multiplexing device 2 the encoded audio signal received from the partner video conference apparatus and decodes the audio. The decoded audio signal is output to the sound field generation device 4.

[0028] The sound field generation device 4 receives the audio signal decoded by the audio decoding device and, at the same time, the partner's pan head direction information, which is received from the partner video conference apparatus and supplied by the multiplexing device 2. From the audio signal and the pan head direction information, the sound field generation device 4 generates a pseudo-stereo audio signal. The pseudo-stereo audio signal output by the sound field generation device 4 is reproduced by the right speaker 5-R and the left speaker 5-L. That is, the audio picked up by the microphone of the partner video conference apparatus is reproduced as pseudo-stereo audio.

[0029] FIG. 2 is a block diagram explaining the operation of the pan head control device 8. Referring to FIG. 2, the pan head control device 8 normally outputs pan head direction information to the pan head 9 of the local video conference apparatus in accordance with the local pan head operating means 16 provided in the pan head control device 8. The pan head 9 then faces the direction φ of the speaker. When pan head operation information transmitted by the pan head operating device of the partner video conference apparatus is input from the multiplexing device 2, this pan head operation information is output to the pan head 9 as pan head direction information. In other words, the pan head 9 can be controlled by both the local pan head operating means 16 and the pan head operating means of the partner video conference apparatus. At the same time as this information is output to the pan head 9, it is also output to the multiplexing device 2 and transmitted to the partner video conference apparatus.
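One way to picture this dual control is a controller that accepts commands from either side and forwards the resulting direction to both the pan head and the multiplexing device. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, operation information is modelled as an absolute target angle in degrees, and the most recent command simply wins, none of which is specified in the text.

```python
class PanHeadController:
    """Illustrative model of pan head control device 8 (all names are hypothetical)."""

    def __init__(self) -> None:
        self.direction_deg = 0.0                 # current pan head direction (phi)
        self.to_pan_head: list[float] = []       # direction info sent to pan head 9
        self.to_multiplexer: list[float] = []    # direction info sent to multiplexing device 2

    def _emit(self, direction_deg: float) -> None:
        # The same direction information goes to the pan head and to the multiplexer,
        # so the partner apparatus always learns the current direction.
        self.direction_deg = direction_deg
        self.to_pan_head.append(direction_deg)
        self.to_multiplexer.append(direction_deg)

    def local_operation(self, target_deg: float) -> None:
        """Command from the local pan head operating means 16."""
        self._emit(target_deg)

    def remote_operation(self, target_deg: float) -> None:
        """Pan head operation information received from the partner apparatus."""
        self._emit(target_deg)
```

For example, calling `local_operation(10.0)` and then `remote_operation(-25.0)` leaves the pan head at -25.0 degrees, with both values having been forwarded to the multiplexer in order.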

[0030] FIG. 3 is a block diagram explaining the operation of the sound field generation device 4. Referring to FIG. 3, the variable attenuator 4ZR adjusts the audio level of the right channel. The variable attenuator 4ZL adjusts the audio level of the left channel. The variable delay element 4DR adjusts the audio delay time of the right channel. The variable delay element 4DL adjusts the audio delay time of the left channel.

[0031] Next, the operation of FIG. 3 will be described.

[0032] The audio signal decoded by the audio decoding device 3 is input to the sound field generation device 4. The input audio signal is split into two channels. On the right-channel side, attenuation and delay are applied by the variable attenuator 4ZR and the variable delay element 4DR. Likewise, on the left-channel side, attenuation and delay are applied by the variable attenuator 4ZL and the variable delay element 4DL. The amounts by which the variable attenuator 4ZR and variable delay element 4DR, and the variable attenuator 4ZL and variable delay element 4DL, are varied are determined by the pan head direction information that is transmitted by the partner video conference apparatus and input from the multiplexing device 2. The audio signal attenuated and delayed by the variable attenuator 4ZR and the variable delay element 4DR is reproduced by the speaker 5-R, and the audio signal attenuated and delayed by the variable attenuator 4ZL and the variable delay element 4DL is reproduced by the speaker 5-L.
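The signal path of FIG. 3 can be sketched on a list of samples as follows. This is a hedged illustration rather than the patented circuit: the function names are hypothetical, attenuation is assumed to be expressed in decibels and delay in whole samples, and the output is truncated to the input length for simplicity.

```python
def process_channel(mono, attenuation_db, delay_samples):
    """Attenuate and then delay one channel; the output keeps the input length."""
    if attenuation_db < 0 or delay_samples < 0:
        raise ValueError("attenuation and delay must be non-negative")
    gain = 10.0 ** (-attenuation_db / 20.0)   # assumed unit: dB of attenuation
    delayed = [0.0] * delay_samples + [gain * s for s in mono]
    return delayed[:len(mono)]


def generate_sound_field(mono, right, left):
    """Split a mono signal into two channels.

    `right` and `left` are (attenuation_db, delay_samples) pairs, standing in
    for the settings of 4ZR/4DR and 4ZL/4DL respectively.
    """
    return process_channel(mono, *right), process_channel(mono, *left)
```

Passing `(0.0, 0)` for both channels returns two copies of the input, which corresponds to both attenuators and both delay elements being set to zero.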

[0033] For example, when the pan head direction information φ is 0 degrees, if the attenuation amounts of the variable attenuators 4ZR and 4ZL are both set to 0 and the delay amounts of the variable delay elements 4DR and 4DL are also both set to 0, the sound image of the audio reproduced by the speakers 5-R and 5-L is localized at the center between the speakers. The listener can judge from the sound image localization that the pan head of the partner video conference apparatus is facing the front, and a sense of presence is obtained. When the pan head direction information φ is α degrees to the right, if the attenuation amount of the variable attenuator 4ZR is set to +Zα, the attenuation amount of the variable attenuator 4ZL is set to 0, the delay amount of the variable delay element 4DR is set to +Dα, and the delay amount of the variable delay element 4DL is set to 0, the sound image of the audio reproduced by the speakers 5-R and 5-L shifts to the right. The listener can judge from the sound image localization that the pan head of the partner video conference apparatus is facing right, and a sense of presence is obtained. Conversely, when the pan head direction information φ is β degrees to the left, if the attenuation amount of the variable attenuator 4ZR is set to 0, the attenuation amount of the variable attenuator 4ZL is set to +Zβ, the delay amount of the variable delay element 4DR is set to 0, and the delay amount of the variable delay element 4DL is set to +Dβ, the sound image of the audio reproduced by the speakers 5-R and 5-L shifts to the left. The listener can judge from the sound image localization that the pan head of the partner video conference apparatus is facing left, and a sense of presence is obtained.
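The three cases above amount to a lookup from the pan head direction φ to the settings of the four variable elements. The sketch below follows the channel assignment exactly as stated in the paragraph; everything else is an assumption made for illustration: the signed-angle convention (positive for right, negative for left), the linear dependence of Zα and Dα on the angle, and the two constants, none of which the text specifies.

```python
Z_DB_PER_DEGREE = 0.2        # hypothetical: attenuation added per degree of pan
D_SAMPLES_PER_DEGREE = 0.5   # hypothetical: delay (in samples) added per degree of pan


def sound_field_settings(phi_deg: float) -> dict:
    """Map pan head direction phi to (attenuation, delay) for each channel.

    Assumed convention: phi_deg > 0 means "to the right", phi_deg < 0 "to the left".
    "right" stands for the pair 4ZR/4DR and "left" for the pair 4ZL/4DL.
    """
    z = Z_DB_PER_DEGREE * abs(phi_deg)
    d = round(D_SAMPLES_PER_DEGREE * abs(phi_deg))
    zero = (0.0, 0)
    if phi_deg > 0:
        # phi is alpha degrees to the right: 4ZR = +Z_alpha, 4DR = +D_alpha, left side 0.
        return {"right": (z, d), "left": zero}
    if phi_deg < 0:
        # phi is beta degrees to the left: 4ZL = +Z_beta, 4DL = +D_beta, right side 0.
        return {"right": zero, "left": (z, d)}
    # phi is 0 degrees: all four elements are set to 0.
    return {"right": zero, "left": zero}
```

A real implementation would presumably calibrate the mapping for the speaker spacing and listening position rather than use fixed per-degree constants.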

[0034]

[Effects of the Invention] As described above, according to the present invention, pseudo-stereo audio is generated from monaural audio on the basis of the direction of the pan head of the partner video conference apparatus and is then reproduced. Only one channel's worth of image and audio encoding and decoding devices is therefore required, so the apparatus is inexpensive.

[Brief description of the drawings]

FIG. 1 is a diagram showing an embodiment of the present invention.

FIG. 2 is a diagram explaining the operation of the pan head control device according to the embodiment of the present invention.

FIG. 3 is a diagram explaining the operation of the sound field generation device according to the embodiment of the present invention.

FIG. 4 is a block diagram showing an example of a conventional stereo audio video conference apparatus.

[Explanation of symbols]

1: video conference apparatus
2: multiplexing device
3: audio decoding device
4: sound field generation device
4ZR, 4ZL: attenuators
4DR, 4DL: delay elements
5-R: right channel speaker
5-L: left channel speaker
6: image decoding device
7: display
8: pan head control device
9: pan head
10: pan head operating device
11: image encoding device
12: video camera
13: audio encoding device
14: microphone
15: network
16: local pan head operating means
106a, 106b: audio codecs
107a, 107b: audio input units
108a, 108b: audio output units
109: system control unit
110a, 110b: changeover switches

Claims

(57) [Claims]

1. A stereo audio video conference apparatus comprising a sound field generation device that generates a stereo audio signal from a monaural audio signal, which is transferred from a partner video conference apparatus and decoded by an audio decoding device, on the basis of pan head direction information that is transferred from the partner video conference apparatus and indicates the direction of a pan head on which a video camera is mounted and which faces the speaker in the video conference, wherein the stereo audio signal generated by the sound field generation device is reproduced in stereo by left and right speakers.

2. The stereo audio video conference apparatus according to claim 1, wherein the sound field generation device comprises: a right-channel attenuator that variably attenuates the input monaural audio signal based on the pan head direction information; a left-channel attenuator that variably attenuates the input monaural audio signal based on the pan head direction information; a right-channel delay element that variably delays the input monaural audio signal based on the pan head direction information; and a left-channel delay element that variably delays the input monaural audio signal based on the pan head direction information.