JP2001036881A

JP2001036881A - Voice transmission system and voice reproduction device

Info

Publication number: JP2001036881A
Application number: JP11202955A
Authority: JP
Inventors: Naoto Takahashi; 直人高橋
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1999-07-16
Filing date: 1999-07-16
Publication date: 2001-02-09

Abstract

PROBLEM TO BE SOLVED: To reproduce the voice of each remote party in a video conference at a prescribed sound image position. SOLUTION: Each of clients 12-18 connected to the Internet 10 in this voice transmission system has voice input/output means and image input/output means. A control server 20 controls connection, disconnection, and the like of a video conference among the clients 12-18. An image server 22 collects images from the clients 12-18 and distributes them to the clients 12-18. A voice server 24 collects voice from the clients 12-18 and distributes it to the clients 12-18. When transmitting voice data to each of the clients 12-18, the voice server 24 adds sound image localization information to the voice data. Each of the clients 12-18 reproduces the received voice at the sound image position given by the sound image localization information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to an audio transmission system and an audio reproducing apparatus, and more specifically to an audio transmission system and an audio reproducing apparatus that reproduce each transmitted voice so that its source can be easily identified.

[0002]

[Prior art] In recent years, computer communication networks typified by the Internet have become widespread, and usage charges for public communication networks and leased-line networks have fallen. Acceptance of image and voice communication over these inexpensive networks has grown, and television conference or video conference devices have spread explosively. In a video conference over the Internet or a public communication network, a personal computer serves as the main device. The fact that personal computers have become more capable even as their prices have fallen has further spurred the spread of video conference systems.

[0003] However, in the current usage environment in Japan, homes are cramped and the space allotted to each person at work is also small. Consequently, two speakers is generally the practical limit, and large-screen video conference systems are not common.

[0004] Conventionally, video conference systems used over the Internet, public networks, and leased-line networks are either one-to-one or multipoint, and both types are in wide use. The transmitting and receiving devices used in such systems are generally based on personal computers. Audio is compression-encoded in accordance with ITU-T Recommendations G.711, G.723, G.729, and the like. On the audio transmitting side, a dedicated microphone is connected to the microphone input terminal of an audio board or the like installed in the computer to input the audio signal. The input audio signal is compression-encoded in accordance with Recommendation G.711, G.723, G.729, or the like by an encoder implemented in software and/or hardware, and is sent out to a communication network such as the Internet via a network connection device (a modem, a terminal adapter, a digital service unit (DSU), or the like).

[0005] When the receiving device is based on a personal computer, the received audio is output in monaural from two speakers placed to the left and right of the monitor screen. That is, even in a multipoint video conference, the voices of multiple parties are output from the same position.

[0006] Sound image localization devices have become fairly widespread, but most were developed for audio equipment and have not been applied to communication devices such as video conference systems. Current audio equipment defines the sound image position by applying digital signal processing to the original sound to produce a multichannel signal. Sending a multichannel audio signal over a narrow-band transmission path is undesirable, so video conference systems and the like cannot adopt this approach.

[0007]

[Problem to be solved by the invention] In communication applications where audio signals are transmitted after high-efficiency compression encoding, there was originally no demand for sound image localization. However, in a multipoint video conference with several remote parties, when all voices are output from the same position it is difficult to tell which party is speaking. Moreover, when several people speak at once, it also becomes difficult to understand what is being said.

[0008] An object of the present invention is to provide an audio transmission system and an audio reproducing apparatus that eliminate these drawbacks.

[0009]

[Means for solving the problem] An audio transmission system according to the present invention comprises one or more transmitting devices that input audio information and output it to a transmission path, and a receiving device that receives the audio information from the one or more transmitting devices and reproduces the sound indicated by each piece of audio information in accordance with sound image localization information added to that audio information.

[0010] An audio transmission system according to the present invention also comprises one or more audio transmitting devices that input audio and output it to a transmission path, and an audio receiving device that receives each audio from the one or more audio transmitting devices and reproduces each audio at a sound image position designated in advance for each audio transmitting device that is its source.

[0011] An audio transmission system according to the present invention also comprises a plurality of terminal devices, each of which inputs audio information, outputs it to a transmission path, and receives and reproduces audio information from the transmission path, wherein one of the terminal devices reproduces the audio information from the other terminal devices at mutually different sound image positions.

[0012] An audio reproducing apparatus according to the present invention is connected to a transmission path, receives audio information from one or more transmitting devices connected to that transmission path, and reproduces the sound indicated by each piece of audio information in accordance with sound image localization information added to that audio information.

[0013] An audio reproducing apparatus according to the present invention reproduces audio information transmitted via a transmission path, and comprises receiving means for receiving information from the transmission path and audio reproducing means for reproducing the audio information contained in the received information at a sound image position corresponding to its source.

[0014]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0015] FIG. 1 is a schematic block diagram of an embodiment of a video conference system to which the present invention is applied. Here, the Internet is used as the communication network.

[0016] Reference numeral 10 denotes the Internet. Connected to it are clients 12, 14, 16, and 18, which are the terminal devices of the video conference; a control server 20, which controls connection, disconnection, and the like of this group's video conference; an image server 22, which collects images from the clients 12 to 18 of this group and distributes them to the clients 12 to 18; and an audio server 24, which collects audio from the clients 12 to 18 of this group and distributes it to the clients 12 to 18.

[0017] FIG. 2 shows sound image localization at the client 12. The other clients 14 to 18 are basically the same. Reference numeral 30 denotes a monitor that displays video. Its screen shows images 32, 34, and 36 of the conference participants B, C, and D (the users of the clients 14, 16, and 18, respectively) and an image 38 of user A of the client 12 (a so-called self-image). A camera 40 is placed on top of the monitor 30 and captures the face of user A of the client 12. Speakers 42 and 44 are placed to the left and right of the monitor 30.

[0018] In this embodiment, different sound image localization positions 48, 50, and 52 are set for the other participants B, C, and D, respectively, within a sound image localization area 46 that contains user A.

[0019] FIG. 3 shows a schematic configuration of the clients 12 to 18. In addition to the monitor 30, the camera 40, and the speakers 42 and 44, a keyboard 60, a terminal adapter (TA) 62 for connecting to ISDN, and a microphone 64 are connected to a PC main unit 66.

[0020] FIG. 4 is a schematic functional block diagram of each of the clients 12 to 18. Reference numeral 70 denotes a motherboard, 72 an audio input/output board, 74 an audio processing board, 76 a digital signal processing circuit (DSP), and 78 a video capture board.

[0021] The video conference operation in this embodiment will now be described.

[0022] For example, one of the clients (say, the client 12) issues a request to the control server 20 to hold a video conference with other clients (for example, the clients 14, 16, and 18). Specifically, dedicated software is started on the client 12, and the request signal is sent to the control server 20 by specifying the IP address of the control server 20. The signal reaches the control server 20 via the Internet 10.

[0023] In response to this request from the client 12, the control server 20 sends an invitation to the clients 14, 16, and 18 that were asked to join the video conference. The invitation also reaches each of the clients 14, 16, and 18 via the Internet 10. Each of the clients 14, 16, and 18 notifies the control server 20 whether it accepts or declines. Only those of the clients 14, 16, and 18 that accept join the video conference together with the client 12. Here, all four clients 12 to 18 are assumed to participate.
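The invitation exchange described in [0023] can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the embodiment: the function name `conference_members` and the `replies` mapping (client number to accept or decline) are hypothetical.

```python
def conference_members(requester, replies):
    """Return the clients that take part in the conference.

    requester: client number that asked the control server for the conference.
    replies:   hypothetical mapping {client number: True if it accepted}.
    """
    # Only invited clients that accepted join, together with the requester.
    accepted = [client for client, ok in sorted(replies.items()) if ok]
    return [requester] + accepted


if __name__ == "__main__":
    # All three invited clients accept, as assumed in the text.
    print(conference_members(12, {14: True, 16: True, 18: True}))
```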

[0024] Image distribution during the video conference is described next. The control server 20 notifies the image server 22 of the start of the multipoint video conference, specifying the participating clients 12 to 18. Each of the clients 12 to 18 captures the video from its own camera 40 with the video capture board 78, compression-encodes it, and sends it to the image server 22. The image server 22 forwards the image from each of the clients 12 to 18 via the Internet 10 to every participating client other than its source. For example, the image server 22 sends the image from the client 12 to the clients 14, 16, and 18.
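The forwarding rule in [0024] (each stream goes to every participant except its source) can be written as a one-line filter. The sketch below is illustrative only; `relay_targets` is a hypothetical name, and clients are represented simply by their reference numerals.

```python
def relay_targets(source, participants):
    """Return the clients that should receive a stream originating at `source`."""
    return [client for client in participants if client != source]


if __name__ == "__main__":
    participants = [12, 14, 16, 18]
    for source in participants:
        print(source, "->", relay_targets(source, participants))
```

The same rule applies to audio, since the text states that the audio destinations are the same as the image destinations.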

[0025] As illustrated in FIG. 2, the screen of the monitor 30 of each of the clients 12 to 18 displays the self-image and the images from the other clients. The dedicated software manages which client's image is displayed at which position.

[0026] Next, the audio signal transmission process is described. At each of the clients 12 to 18, the user's voice is picked up by the microphone 64, A/D converted by the audio input/output board 72, and encoded at 8 kbit/s by the audio processing board 74 in accordance with ITU-T Recommendation G.729. The encoded audio signal is sent to the audio server 24 via the motherboard 70, the TA 62, ISDN, and the Internet 10.
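To give a feel for the data rates involved, the following back-of-the-envelope sketch computes the size of one encoded frame and the audio payload a client receives from three remote parties. The 8 kbit/s rate comes from the text. The 10 ms frame duration is taken from the G.729 Recommendation rather than from this document, the 64 kbit/s figure assumes a single ISDN B channel, and packet header overhead is ignored.

```python
BITRATE_BPS = 8000           # G.729 rate stated in the text
FRAME_MS = 10                # G.729 frame duration (from the Recommendation, not this document)
ISDN_B_CHANNEL_BPS = 64000   # assumption: one ISDN B channel as the access link


def bits_per_frame(bitrate_bps=BITRATE_BPS, frame_ms=FRAME_MS):
    """Number of encoded bits produced for one frame of speech."""
    return bitrate_bps * frame_ms // 1000


def downstream_audio_bps(remote_clients, bitrate_bps=BITRATE_BPS):
    """Audio payload rate received by one client, ignoring packet headers."""
    return remote_clients * bitrate_bps


if __name__ == "__main__":
    print(bits_per_frame())                                  # bits in one frame
    print(downstream_audio_bps(3))                           # three remote parties
    print(downstream_audio_bps(3) <= ISDN_B_CHANNEL_BPS)     # fits in one B channel?
```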

[0027] The audio server 24 forwards the audio from each of the clients 12 to 18 via the Internet 10 to the clients other than its source. The destinations are the same as for images. In addition to the audio information from the other clients, the audio server 24 sends a sound image localization signal to each of the clients 12 to 18. For example, when sending the audio from the clients 14 to 18 to the client 12, the audio server 24 simultaneously sends the client 12 a signal specifying where within the sound image localization area of user A of the client 12 each voice is to be localized. The same applies to the other clients 14, 16, and 18.

[0028] If the meaning of the sound image localization information for the sound image localization area 46 of each of the clients 12 to 18 is agreed in advance between the audio server 24 and the clients 12 to 18, the sound image localization information can be simplified.

[0029] The case where the client 12 localizes the audio signals from the other clients 14, 16, and 18 is described with reference to FIG. 2.

[0030] The monitor screen of the client 12 is divided into four. The upper-left window displays the image 32 of the user of the client 14, the lower-left window the image 34 of the user of the client 16, the upper-right window the image 36 of the user of the client 18, and the lower-right window the image 38 of the user of the client 12. The image 38 of the user of the client 12 is the output image of the camera 40 displayed directly, without passing through the image server 22. It is displayed so that the user can check the image from the local camera 40.

[0031] On receiving the sound image localization information from the audio server 24, the client 12 determines which client's audio signal is to be localized at which position in its own sound image localization area 46. The audio of each of the clients 14 to 18 sent from the audio server 24 is then decoded by the audio processing board 74, and the digital signal processing circuit 76 processes the received audio so that it is localized at the previously determined position. The processed audio signal is supplied to the speakers 42 and 44 via the audio input/output board 72. For example, as illustrated in FIG. 2, the sound images are localized so that the user of the client 12 hears the voice of the user of the client 14 from the left, the voice of the user of the client 16 from behind, and the voice of the user of the client 18 from the right.

[0032] Which client's sound image is localized at which position may differ among the clients 12 to 18. Each client can also easily change the localization positions as it wishes.

[0033] Here, the sound image localization information and the audio processing at each client are described in detail using a simple example. First, for the compression-encoded audio signals from the clients other than the destination client, the audio server 24 decides which of left, center, and right is to be the sound image localization position, according to which of the clients 12, 14, 16, and 18 the audio came from. In the simplest case, the three clients other than the destination client are assigned left, center, and right in the order of the clients 12, 14, 16, and 18.

[0034] Two-bit sound image localization information, for example '10', '11', and '01' for the left, center, and right positions respectively, is then added, and the result is sent to each client. For example, when the audio server 24 sends audio to the client 18, it attaches '10' (left) to the audio information from the client 12, '11' (center) to the audio information from the client 14, and '01' (right) to the audio information from the client 16.
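The simple assignment rule of [0033] and [0034] can be sketched as follows. The two-bit codes and the ordering rule come from the text; the function name `assign_codes` and the dictionary representation are hypothetical choices made for this illustration.

```python
# Two-bit localization codes given in the text.
LEFT, CENTER, RIGHT = 0b10, 0b11, 0b01
POSITIONS = [LEFT, CENTER, RIGHT]


def assign_codes(destination, participants):
    """Map each source client to a localization code for one destination.

    The clients other than the destination are taken in ascending order
    (12, 14, 16, 18) and assigned left, center, and right in turn.
    """
    sources = sorted(client for client in participants if client != destination)
    if len(sources) > len(POSITIONS):
        raise ValueError("this simple scheme supports at most three sources")
    return dict(zip(sources, POSITIONS))


if __name__ == "__main__":
    codes = assign_codes(18, [12, 14, 16, 18])
    for source, code in codes.items():
        print(source, format(code, "02b"))
```

For destination 18 this reproduces the example in the text: client 12 on the left, client 14 in the center, and client 16 on the right.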

[0035] The monaural audio signal to which such sound image localization information has been added is still compression-encoded, and is decoded by the audio processing board 74 of each client as described above. The decoded audio signal is at this point still a one-channel signal, and the board converts it into a stereo signal. That is, when the sound image localization information is '10', the monaural signal is used as the left (L) signal; when it is '01', the monaural signal is used as the right (R) signal; and when it is '11', the monaural signal is divided equally, half each, between the left (L) and right (R) signals.
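The mono-to-stereo step of [0035] can be sketched as below. The mapping from code to channel follows the text; representing decoded audio as a list of float samples and the name `to_stereo` are assumptions made for the example.

```python
LEFT, CENTER, RIGHT = 0b10, 0b11, 0b01  # two-bit codes from the text


def to_stereo(samples, code):
    """Turn decoded mono samples into (left, right) channels for one code."""
    silence = [0.0] * len(samples)
    if code == LEFT:
        return list(samples), silence
    if code == RIGHT:
        return silence, list(samples)
    if code == CENTER:
        # Half of the signal goes to each channel.
        half = [0.5 * s for s in samples]
        return half, list(half)
    raise ValueError(f"unknown localization code: {code!r}")


if __name__ == "__main__":
    print(to_stereo([1.0, -2.0], CENTER))
```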

[0036] The audio information from the three clients decoded in this way is summed separately for the left (L) and right (R) signals, and the resulting left (L) and right (R) signals are time-division multiplexed and output to the audio input/output board 72. The audio input/output board 72 realigns the time-division-multiplexed left (L) and right (R) signals so that they are simultaneous, D/A converts them, and supplies the left (L) signal to the speaker 42 and the right (R) signal to the speaker 44.
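The summing and multiplexing in [0036] can be sketched as follows. The text does not specify how the left and right signals are time-division multiplexed, so sample-by-sample interleaving is assumed here; `mix`, `interleave`, and `deinterleave` are hypothetical names.

```python
def mix(stereo_streams):
    """Sum several (left, right) streams of equal length, channel by channel."""
    lefts = [left for left, _ in stereo_streams]
    rights = [right for _, right in stereo_streams]
    mixed_left = [sum(samples) for samples in zip(*lefts)]
    mixed_right = [sum(samples) for samples in zip(*rights)]
    return mixed_left, mixed_right


def interleave(left, right):
    """Assumed time-division multiplex: L0, R0, L1, R1, ..."""
    frames = []
    for l_sample, r_sample in zip(left, right):
        frames.extend((l_sample, r_sample))
    return frames


def deinterleave(frames):
    """Inverse of interleave, as the audio input/output board would do."""
    return frames[0::2], frames[1::2]


if __name__ == "__main__":
    left_talker = ([1.0, 2.0], [0.0, 0.0])
    center_talker = ([0.5, 0.5], [0.5, 0.5])
    right_talker = ([0.0, 0.0], [3.0, 4.0])
    mixed = mix([left_talker, center_talker, right_talker])
    print(interleave(*mixed))
```

A real mixer would also need to guard against clipping when several parties speak at once; that is outside what the text describes.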

[0037] In this way, even when a bandwidth-limited network such as the Internet is used and audio signals must be encoded at a high compression ratio, a multipoint video conference system can localize the sound image of each participant at a desired position. Even when several people speak at once, the speakers are easier to tell apart, and the conversation proceeds smoothly.

[0038] Of course, general-purpose window management software may be used so that the user of each of the clients 12 to 18 can place each client's image wherever they like. In that case, the sound image position of the audio from each client is preferably set according to the position of its image on the monitor screen: for example, when the image is displayed on the left side of the screen the sound is heard from the left, and when the image is displayed on the right side the sound is heard from the right. For this purpose, the audio server 24 need only send each of the clients 12 to 18 the audio from the other clients together with information identifying its source. Each of the clients 12 to 18 can then set a different sound image localization position for each source client.
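One way a client might derive a sound image position from a window position, as suggested in [0038], is a simple linear left/right pan based on the horizontal center of the window. The text only requires that left-side windows sound from the left and right-side windows from the right; the linear pan law, the function name `pan_gains`, and the pixel-based parameters are assumptions for this sketch.

```python
def pan_gains(window_center_x, screen_width):
    """Return (left_gain, right_gain) for a window at the given x position.

    Assumes a linear pan: x = 0 is fully left, x = screen_width is fully right.
    """
    if screen_width <= 0:
        raise ValueError("screen_width must be positive")
    # Clamp so that windows dragged partly off-screen still give valid gains.
    position = min(max(window_center_x / screen_width, 0.0), 1.0)
    return 1.0 - position, position


if __name__ == "__main__":
    # A window centered a quarter of the way across a 1024-pixel-wide screen.
    print(pan_gains(256, 1024))
```

Because the gains depend only on local window layout, this approach needs only the source identifier from the audio server, which matches the alternative described in the text.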

[0039]

[Effects of the invention] As can be readily understood from the above description, according to the present invention, even when several people speak at the same time, what each person says is easier to distinguish, and the conversation proceeds smoothly.

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of an embodiment of the present invention.

FIG. 2 is a schematic diagram of an example of sound image localization in this embodiment.

FIG. 3 is a schematic configuration diagram of each of the clients 12 to 18.

FIG. 4 is a schematic configuration block diagram of each of the clients 12 to 18.

[Explanation of symbols]

10: Internet
12, 14, 16, 18: Client
20: Control server
22: Image server
24: Audio server
30: Monitor
32: Image from client 14
34: Image from client 16
36: Image from client 18
38: Image of client 12
40: Camera
42, 44: Speaker
46: Sound image localization area
48, 50, 52: Sound image localization position
60: Keyboard
62: Terminal adapter
64: Microphone
70: Motherboard
72: Audio input/output board
74: Audio processing board
76: Digital signal processing circuit (DSP)
78: Video capture board

Continued from the front page

F-terms (reference): 5C064 AA02 AB04 AC06 AC16 AD02 AD09; 5D062 AA67; 5K015 AA00 AA02 AB01 AB02 JA00 JA10 JA11; 5K101 KK07 LL00 LL03 MM07 NN36 SS08 TT02

Claims

[Claims]

1. An audio transmission system comprising: one or more transmitting devices that input audio information and output the input audio information to a transmission path; and a receiving device that receives the audio information from the one or more transmitting devices and reproduces the sound indicated by each piece of audio information in accordance with sound image localization information added to that audio information.

2. The audio transmission system wherein the audio information from each of the transmitting devices is one-channel audio information, and the sound image localization information defines the sound image position when the one-channel audio information is reproduced by a plurality of speakers.

3. The audio transmission system according to claim 2, wherein each of the transmitting devices compression-encodes the one-channel audio information and outputs it to the transmission path.

4. The audio transmission system according to claim 1, wherein the transmitting device inputs image information together with the audio information, and the receiving device receives each piece of image information and reproduces it on a single monitor.

5. The audio transmission system according to claim 4, wherein the receiving device also inputs image information and reproduces the image information from each transmitting device together with its own image information on a single monitor.

6. The audio transmission system according to claim 1, further comprising an intermediary device, connected to the transmission path, that mediates audio transmission, wherein the intermediary device generates the sound image localization information.

7. An audio transmission system comprising: one or more audio transmitting devices that input audio and output the input audio to a transmission path; and an audio receiving device that receives each audio from the one or more audio transmitting devices and reproduces each audio at a sound image position designated in advance for each audio transmitting device that is the source of the audio.

8. The audio transmission system according to claim 7, wherein the audio transmitting device and the audio receiving device are video conference terminals.

9. The audio transmission system according to claim 7, further comprising a transmission mediating device that mediates audio transmission from the audio transmitting device to the audio receiving device, wherein the sound image position is defined by the mediating device.

10. An audio transmission system comprising a plurality of terminal devices, each of which inputs audio information, outputs the input audio information to a transmission path, and receives and reproduces audio information from the transmission path, wherein one of the plurality of terminal devices reproduces the audio information from the other terminal devices at mutually different sound image positions.

11. The audio transmission system according to claim 10, wherein the audio information output from each of the terminal devices is one-channel audio information, and one of the plurality of terminal devices reproduces the one-channel audio information from the other terminal devices through a plurality of speakers with a different sound image position for each terminal.

12. The audio transmission system according to claim 11, wherein each of the terminal devices compression-encodes the one-channel audio information and outputs it to the transmission path.

13. The audio transmission system according to claim 10, wherein each terminal device inputs image information together with audio information, receives the image information from each terminal, and reproduces it on a single monitor.

14. The audio transmission system according to claim 10, further comprising an intermediary device, connected to the transmission path, that mediates audio transmission, wherein the intermediary device defines the sound image position.

15. An audio reproducing apparatus connected to a transmission path, wherein the apparatus receives audio information from one or more transmitting devices connected to the transmission path and reproduces the sound indicated by each piece of audio information in accordance with sound image localization information added to that audio information.

16. The audio reproducing apparatus according to claim 15, wherein the audio information from each of the transmitting devices is one-channel audio information, and the sound image localization information defines the sound image position when the one-channel audio information is reproduced by a plurality of speakers.

17. The audio reproducing apparatus according to claim 16, wherein the one-channel audio information is compression-encoded.

18. The audio reproducing apparatus according to claim 15, wherein image information is received together with the audio information from the one or more transmitting devices, and the image information is reproduced on a single monitor.

19. The audio reproducing apparatus according to claim 15, wherein the apparatus inputs image information and reproduces the image information from each transmitting device together with its own image information on a single monitor.

20. The audio reproducing apparatus according to claim 15, wherein the sound image localization information is generated by an intermediary device that is connected to the transmission path and mediates audio transmission.

21. An audio reproducing apparatus for reproducing audio information transmitted via a transmission path, comprising: receiving means for receiving information from the transmission path; and audio reproducing means for reproducing the audio information included in the information received by the receiving means at a sound image position corresponding to its source.

22. The audio reproducing apparatus according to claim 21, wherein information indicating the sound image position of each piece of audio information is transmitted together with the audio information via the transmission path.