JP4741325B2

JP4741325B2 - Multipoint conference method and multipoint conference system

Info

Publication number: JP4741325B2
Application number: JP2005257827A
Authority: JP
Inventors: 新九郎本田; 貴之安野; 万知夫森内; 大安藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-09-06
Filing date: 2005-09-06
Publication date: 2011-08-03
Anticipated expiration: 2025-09-06
Also published as: JP2007074221A

Description

The present invention relates to a multipoint conference method and a multipoint conference system in which terminals installed at multiple locations transmit and receive video and audio to and from one another via a server.

Improvements in personal computer (PC) performance have made it possible to equip a PC with a microphone, a speaker, and a camera and to use it for bidirectional video and audio communication with other PCs over a network. Terminals specialized for video and audio communication have also been realized using these technologies. These terminals are referred to below as "clients". Furthermore, by using a server that has functions for distributing video and audio information and call control functions for delivering that information to each terminal, a plurality of clients can be connected via the server to provide video and audio communication among multiple locations.

A client that takes part in a multipoint conference encodes the video information captured by its own camera and transmits it to the communication network. It also receives, via the communication network, video information transmitted from other clients, decodes it, and displays it on a monitor. When three or more clients participate in the system, the ways in which the server can deliver to a given client the video information transmitted from the two or more other clients fall broadly into two methods. In the first method, the server selects one of the pieces of video information sent by the other clients, or combines several of them, and distributes the result as a single stream, which the destination client receives. With this method the destination client always receives and plays back video from a single stream, so even a client with modest processing capability can participate in the system. In the second method, the server distributes the video information from each of the other clients to the destination client as a separate stream, and the destination client receives the streams separately and plays back each piece of video information separately. With the second method the destination client performs several reception and playback processes at the same time, so it must be able to withstand a high load. Technology relating to conference systems in which video and audio information is transmitted and received between clients via a server has been disclosed (see Patent Document 1).
"Arcstar IP-VPN Network Image", [online], [retrieved July 20, 2005], Internet <URL: http://www.ntt-vpn.com/ip-vpn/push/tv/push_tv_net.html>

In conventional multipoint conference systems, however, a given system has used either only the first method described above or only the second method.

Accordingly, an object of the present invention is to provide a multipoint conference method and a multipoint conference system in which clients can participate regardless of whether their processing capability is high or low.

To achieve the above object, the present invention is a multipoint conference method in a multipoint conference system comprising first-type terminals, each of which can play back a plurality of received distribution videos; second-type terminals, each of which can play back only one received distribution video; and a multipoint conference processing device that distributes the video information and audio information transmitted by the first-type and second-type terminals to the other terminals. In this method, a first-type server in the multipoint conference processing device synthesizes the audio information received from each of the first-type and second-type terminals; when distributing the synthesized audio information to a first-type terminal, it deletes the audio information of the destination first-type terminal from the synthesized audio information before distribution, and when distributing the synthesized audio information to a second-type terminal, it deletes the audio information of that second-type terminal from the synthesized audio information before distribution. When distributing the video information received from each of the first-type and second-type terminals to a first-type terminal, the first-type server distributes the video information of the other first-type or second-type terminals, excluding the destination first-type terminal. The first-type server also exchanges, with each of the first-type and second-type terminals, a communication identifier that includes at least an IP address and a port number for distributing or receiving the video information and the audio information, together with distribution capability information used for transmitting and receiving the video information and the audio information. A second-type server in the multipoint conference processing device transfers the audio information addressed to a second-type terminal that was distributed by the audio distribution means. When distributing to a second-type terminal the video information received from the first-type terminals via the first-type server and the video information received from the second-type terminals, the second-type server either combines the video information received from the first-type terminals and the video information received from the second-type terminals other than the destination into one distribution video and distributes the combined video information to the destination second-type terminal, or selects a plurality of pieces from among the video information received from the first-type terminals via the first-type server and the video information received from the second-type terminals other than the destination, combines them into one distribution video, and distributes the combined video information to the destination second-type terminal.

The present invention is also a multipoint conference system comprising first-type terminals, each of which can play back a plurality of received distribution videos; second-type terminals, each of which can play back only one received distribution video; and a multipoint conference processing device that distributes the video information and audio information transmitted by the first-type and second-type terminals to the other terminals, the multipoint conference processing device consisting of a first-type server and a second-type server. The first-type server has audio distribution means, first video distribution means, and connection control means. The audio distribution means synthesizes the audio information received from each of the first-type and second-type terminals; when distributing the synthesized audio information to a first-type terminal, it deletes the audio information of the destination first-type terminal from the synthesized audio information before distribution, and when distributing the synthesized audio information to a second-type terminal, it deletes the audio information of that second-type terminal from the synthesized audio information before distribution. The first video distribution means, when distributing the video information received from each of the first-type and second-type terminals to a first-type terminal, distributes the video information of the other first-type or second-type terminals, excluding the destination first-type terminal. The connection control means exchanges, with each of the first-type and second-type terminals, a communication identifier that includes at least an IP address and a port number for distributing or receiving the video information and the audio information, together with distribution capability information used for transmitting and receiving the video information and the audio information. The second-type server has audio transfer means and second video distribution means. The audio transfer means transfers the audio information addressed to a second-type terminal that was distributed by the audio distribution means. The second video distribution means, when distributing to a second-type terminal the video information received from the first-type terminals via the first-type server and the video information received from the second-type terminals, either combines the video information received from the first-type terminals and the video information received from the second-type terminals other than the destination into one distribution video and distributes the combined video information to the destination second-type terminal, or selects a plurality of pieces from among the video information received from the first-type terminals via the first-type server and the video information received from the second-type terminals other than the destination, combines them into one distribution video, and distributes the combined video information to the destination second-type terminal.

According to the present invention, when clients that can receive and play back a plurality of videos separately and clients that can receive and play back only one video are mixed in a multipoint conference system, providing a server with a processing function that combines all or several of the videos into one makes it possible to configure a system in which even a client that can receive and play back only one video can participate. In other words, the invention can provide a multipoint conference system in which all clients can participate regardless of whether their processing capability is high or low.

A multipoint conference system according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the multipoint conference system according to the embodiment. In this figure, reference numeral 10 denotes a server group, and reference numerals 20 and 30 denote clients. Clients 20 and 30 are connected to the server group 10 via a communication network. Client 20 is a type A client; a type A client can simultaneously receive and play back separate pieces of video information distributed as a plurality of streams. The type A client 20 therefore has processing capability that can withstand a high load. Client 30 is a type B client; a type B client can receive and play back only one piece of video information distributed as a single stream. Client 30 is therefore a terminal with lower processing capability than client 20. Which type a terminal belongs to may be determined in advance from the processing speed of its CPU, its memory capacity, its software, and so on, or the user may decide the type and register it with the server group.
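As a rough illustration of the two client types, the following Python sketch models type A and type B and counts how many separate video streams each would have to receive. The class and function names are hypothetical and do not come from the patent; only the rule (type A receives one stream per other participant, type B receives a single stream) is taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ClientType(Enum):
    A = "A"  # can receive and play several video streams at the same time
    B = "B"  # can receive and play only one video stream


@dataclass(frozen=True)
class Client:
    name: str  # hypothetical label, e.g. the reference numeral in FIG. 1
    client_type: ClientType


def incoming_video_streams(client: Client, participants: List[Client]) -> int:
    """Number of separate video streams the client must receive and decode."""
    others = [p for p in participants if p.name != client.name]
    if not others:
        return 0
    # Type A gets one stream per other participant; type B always gets one.
    return len(others) if client.client_type is ClientType.A else 1


participants = [
    Client("20-1", ClientType.A),
    Client("20-2", ClientType.A),
    Client("30", ClientType.B),
]
```

With the three participants above, each type A client would decode two streams while the type B client would decode one, which is why the type B client can run on less capable hardware.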

FIG. 2 is a diagram showing the functional blocks of the clients.
This figure shows the functional blocks of the type A and type B clients. The type A client 20 has one audio processing unit 21, one video encoder 22, a plurality of video decoders 23, a camera 24, a microphone 25, and a speaker 26. The video decoders 23 of client 20 decode each of the pieces of video information received as separate streams and display them on a display or the like. The video encoder 22 encodes the video captured by the camera 24 and transmits it to the server group 10 connected to the communication network. The audio processing unit 21 encodes the audio picked up by the microphone 25 and transmits it to the server group 10 connected to the communication network, and it decodes the audio information received via the communication network and outputs it to the speaker 26.

The type B client 30 has one audio processing unit 31, one video encoder 32, one video decoder 33, a camera 34, a microphone 35, and a speaker 36. The video decoder 33 of client 30 decodes only the video information received as a single stream and displays it on a display or the like. Like the video encoder 22, the video encoder 32 encodes the video captured by the camera 34 and transmits it to the server group 10 connected to the communication network. Like the audio processing unit 21, the audio processing unit 31 encodes the audio picked up by the microphone 35 and transmits it to the server group 10 connected to the communication network, and it decodes the audio information received via the communication network and outputs it to the speaker 36.

FIG. 3 is a diagram showing the functional blocks of each server in the server group.
As this figure shows, the server group in the multipoint conference system of this embodiment consists of a conference server 11 and a video conversion server 12. In the conference server 11, the conference management function unit 111 performs management processing related to the operation of the conference system, such as starting conference processing, handling requests from clients to join a conference, and registering conference start times. The connection control function unit 112 requests from the video conversion server 12 and clients 20 and 30, and notifies to the other devices and clients, the IP addresses and port numbers that the video conversion server 12 and clients 20 and 30 use when transmitting or receiving video and audio information, as well as the distribution capability information needed when transmitting audio and video information (video: codec type (e.g., MPEG-4 SP3), bit rate, frame rate, image size, and so on; audio: codec type (e.g., G.711), bit rate, and so on).
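The information handled by the connection control function can be pictured as a small record per endpoint. The Python sketch below is only an illustration: the class names, field names, addresses, port numbers, and video parameter values are assumptions made for the example, and the helper `same_codecs` is a hypothetical check, not a negotiation procedure described in the patent. The G.711 bit rate of 64 kbit/s is the standard rate for that codec.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VideoCapability:
    codec: str
    bit_rate_kbps: int
    frame_rate_fps: int
    image_size: Tuple[int, int]  # (width, height) in pixels


@dataclass(frozen=True)
class AudioCapability:
    codec: str
    bit_rate_kbps: int


@dataclass(frozen=True)
class MediaEndpoint:
    ip_address: str
    video_port: int
    audio_port: int
    video: VideoCapability
    audio: AudioCapability


def same_codecs(a: MediaEndpoint, b: MediaEndpoint) -> bool:
    """Hypothetical check: do two endpoints announce the same codecs?"""
    return a.video.codec == b.video.codec and a.audio.codec == b.audio.codec


# Illustrative values only (192.0.2.0/24 is a documentation address range).
video = VideoCapability("MPEG-4", 384, 15, (352, 288))
audio = AudioCapability("G.711", 64)
client_30 = MediaEndpoint("192.0.2.30", 5004, 5006, video, audio)
conversion_server = MediaEndpoint("192.0.2.12", 6004, 6006, video, audio)
```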

The audio synthesis function unit 113 synthesizes the audio information received from clients 20 and 30 and, when distributing the synthesized audio information to a client 20, deletes the audio information of the destination client 20 from the synthesized audio information before distribution. For client 30, it synthesizes the audio information received from the plurality of clients 20 and transmits the result via the video conversion server 12. When distributing the video information transmitted by clients 20 and 30 to a destination client 20, the video transfer function unit 114 selects and distributes the video information transmitted by the other clients 20 and by client 30, that is, by every client other than the destination client 20. For client 30, it transmits each piece of video information received from the plurality of clients 20 via the video conversion server 12.
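The audio handling described above is commonly called a "mix-minus": every destination hears the sum of all participants except itself. The following Python sketch shows the idea on lists of integer samples, together with the matching rule for choosing which video sources go to a destination. It is a simplified sketch under stated assumptions: the function names are hypothetical, the frames are assumed to be already decoded and time-aligned, and no clipping or gain control is applied.

```python
from typing import Dict, List


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each client, return the sum of all clients' samples minus its own."""
    if not frames:
        return {}
    length = min(len(samples) for samples in frames.values())
    # Mix everyone once, then subtract each destination's own contribution.
    total = [sum(samples[i] for samples in frames.values()) for i in range(length)]
    return {
        name: [total[i] - samples[i] for i in range(length)]
        for name, samples in frames.items()
    }


def video_sources_for(destination: str, senders: List[str]) -> List[str]:
    """Video sources delivered to a destination: every sender except itself."""
    return [s for s in senders if s != destination]


frames = {"20-1": [1, 2], "20-2": [10, 20], "30": [100, 200]}
mixed = mix_minus(frames)
```

In this example the frame delivered toward client 30 contains only the audio of clients 20-1 and 20-2, which matches the statement that client 30 receives the synthesized audio of the clients 20.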

In the video conversion server 12, the connection control function unit 121 requests from, and delivers to, the conference server 11 and client 30 the IP addresses and port numbers that the video conversion server 12 itself, the conference server 11, and client 30 use when transmitting or receiving video and audio information, as well as the distribution capability information needed when transmitting audio and video information (video: codec type (e.g., MPEG-4 SP3), bit rate, frame rate, image size, and so on; audio: codec type (e.g., G.711), bit rate, and so on). The audio processing function unit 122 transmits the synthesized audio information sent from the conference server 11 to client 30 and transfers the audio information sent from client 30 to the conference server 11. The video transcoding function unit 123 selects or combines the plural pieces of video information received from the conference server 11 and distributes the result to client 30 as a single stream, and it transfers the video information sent from client 30 to the conference server 11.
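One way to picture the "select or combine into a single stream" step is as a layout computation: choose which sources to show and assign each a rectangle in the one output frame. The patent does not specify a layout, so the grid arrangement, the function name `tile_layout`, and the `max_tiles` parameter in this Python sketch are all assumptions made for illustration. Actual pixel scaling and re-encoding are omitted.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_layout(
    sources: Sequence[str],
    width: int,
    height: int,
    max_tiles: Optional[int] = None,
) -> Dict[str, Rect]:
    """Select up to max_tiles sources and place them on a near-square grid."""
    chosen = list(sources)[:max_tiles]  # a slice with None keeps every source
    if not chosen:
        return {}
    cols = math.isqrt(len(chosen) - 1) + 1  # ceil(sqrt(n)) for n >= 1
    rows = -(-len(chosen) // cols)          # ceil(n / cols)
    tile_w, tile_h = width // cols, height // rows
    return {
        name: ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i, name in enumerate(chosen)
    }


combined = tile_layout(["20-1", "20-2"], 640, 480)
selected = tile_layout(["20-1", "20-2"], 640, 480, max_tiles=1)
```

Setting `max_tiles=1` corresponds to selecting a single source, while leaving it unset corresponds to combining all sources into one picture.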

FIG. 4 is a diagram showing the state of established connections.
FIG. 5 is a first diagram showing connection establishment processing.
FIG. 6 is a second diagram showing connection establishment processing.
FIG. 4 shows the fifteen connections, a through o, that are established when three clients use the multipoint conference system: clients 20-1 and 20-2 as type A and client 30 as type B.
Here, a denotes the connection carrying the synthesized audio information transmitted from the conference server 11 to client 20-1 (downlink) and the audio information picked up by client 20-1 and transmitted from client 20-1 to the conference server 11 (uplink).
b denotes the connection (uplink) for the video information transmitted from client 20-1 to the conference server 11.
c denotes, among the connections (downlink) for video information transmitted from the conference server 11 to client 20-1, the connection that carries the video captured by client 20-2.
d denotes, among the connections (downlink) for video information transmitted from the conference server 11 to client 20-1, the connection that carries the video captured by client 30.

e denotes the connection carrying the synthesized audio information transmitted from the conference server 11 to client 20-2 (downlink) and the audio information picked up by client 20-2 and transmitted from client 20-2 to the conference server 11 (uplink).
f denotes the connection (uplink) for the video information transmitted from client 20-2 to the conference server 11.
g denotes, among the connections (downlink) for video information transmitted from the conference server 11 to client 20-2, the connection that carries the video captured by client 20-1.
h denotes, among the connections (downlink) for video information transmitted from the conference server 11 to client 20-2, the connection that carries the video captured by client 30.

i denotes the connection carrying the synthesized audio information transmitted from the conference server 11 to the video conversion server 12 (downlink) and the audio information of client 30 transmitted from the video conversion server 12 to the conference server 11 (uplink).
j denotes the connection (downlink) over which the conference server 11 transmits the video information received from client 20-1 to the video conversion server 12.
k denotes the connection (downlink) over which the conference server 11 transmits the video information received from client 20-2 to the video conversion server 12.
l denotes the connection (uplink) over which the video conversion server 12 transmits the video information received from client 30 to the conference server 11.

m denotes the connection carrying the synthesized audio information transmitted from the video conversion server 12 to client 30 (downlink) and the audio information picked up by client 30 and transmitted from client 30 to the video conversion server 12 (uplink).
n denotes the connection (downlink) for the converted video information transmitted from the video conversion server 12 to client 30.
o denotes the connection (uplink) for the video information transmitted from client 30 to the video conversion server 12.
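The fifteen connections a through o follow a regular pattern, which the Python sketch below reproduces for the configuration of FIG. 4. The `Connection` record and the function `list_connections` are hypothetical names introduced for illustration. The patent describes only the case of two type A clients and one type B client, so the behaviour of this sketch for other combinations is an extrapolation, not something the source states.

```python
from typing import List, NamedTuple, Sequence

CONF = "conference server 11"
CONV = "video conversion server 12"


class Connection(NamedTuple):
    media: str      # "audio" or "video"
    direction: str  # "up", "down", or "both", seen from the client side
    server_side: str
    client_side: str
    carries: str


def list_connections(type_a: Sequence[str], type_b: Sequence[str]) -> List[Connection]:
    everyone = list(type_a) + list(type_b)
    conns: List[Connection] = []
    for c in type_a:
        # Type A: one audio connection, one video uplink, one downlink per other client.
        conns.append(Connection("audio", "both", CONF, c, "mixed audio down, own audio up"))
        conns.append(Connection("video", "up", CONF, c, f"video of {c}"))
        for other in everyone:
            if other != c:
                conns.append(Connection("video", "down", CONF, c, f"video of {other}"))
    for c in type_b:
        # Type B: the video conversion server sits between the conference server and the client.
        conns.append(Connection("audio", "both", CONF, CONV, f"audio for and from {c}"))
        for other in everyone:
            if other != c:
                conns.append(Connection("video", "down", CONF, CONV, f"video of {other}"))
        conns.append(Connection("video", "up", CONF, CONV, f"video of {c}"))
        conns.append(Connection("audio", "both", CONV, c, "mixed audio down, own audio up"))
        conns.append(Connection("video", "down", CONV, c, "converted single-stream video"))
        conns.append(Connection("video", "up", CONV, c, f"video of {c}"))
    return conns


conns = list_connections(["client 20-1", "client 20-2"], ["client 30"])
labelled = dict(zip("abcdefghijklmno", conns))
```

For the FIG. 4 configuration the list has fifteen entries in the same order as the labels a through o, so `labelled["n"]` is the single converted video stream delivered to client 30.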

Next, the process of establishing connections between the server group and the clients is described with reference to FIGS. 4, 5, and 6.
First, the conference server 11 accepts access from clients 20-1, 20-2, and 30 in advance, and the clients are registered as participants in the conference system. For example, the conference server 11 includes a web server processing unit, and through the web page it serves, the user operating each client registers information such as the client's SIP address, the conference identification number, and the conference start time, together with information indicating whether the client is a type A or a type B client. The administrator of the multipoint conference system may instead enter this registration information into the conference server 11. As a result, for each conference identification number, registration information such as the SIP addresses, the conference start time, and the types of the clients 20 and 30 participating in the conference represented by that number is recorded in a database or the like.

Next, the conference management function unit 111 of the conference server 11 compares the time it is counting with the conference start time recorded in the database and, at the moment they match, detects the start of conference processing. The conference management function unit 111 then sends a conference holding request to the connection control function unit 112 (step S1). It also sends the connection control function unit 112 a conference participation request for each registered client (step S2). The connection control function unit 112 then starts the connection establishment processing between each registered client and the conference server 11 and video conversion server 12. The connection establishment flow is described step by step below.
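The start-time check and the two requests of steps S1 and S2 can be sketched as a small polling function. Everything concrete in this Python example is assumed for illustration: the registry layout, the conference identifier, the SIP addresses, and the function name are hypothetical. The source says the start is detected when the times match; the sketch uses "start time has been reached" so that a poll arriving slightly late still triggers the conference, which is a design assumption rather than the patent's wording.

```python
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Hypothetical registration database keyed by conference identification number.
registry: Dict[str, dict] = {
    "conf-001": {
        "start": datetime(2005, 9, 6, 10, 0),
        "clients": {
            "sip:client20-1@example.com": "A",
            "sip:client20-2@example.com": "A",
            "sip:client30@example.com": "B",
        },
    },
}


def start_due_conferences(
    registry: Dict[str, dict], now: datetime, started: Set[str]
) -> List[Tuple[str, str, str]]:
    """Return the S1/S2 requests to hand to the connection control function."""
    requests: List[Tuple[str, str, str]] = []
    for conf_id, entry in registry.items():
        if conf_id in started or now < entry["start"]:
            continue
        started.add(conf_id)  # never start the same conference twice
        requests.append(("S1 conference holding request", conf_id, ""))
        for sip_address in entry["clients"]:
            requests.append(("S2 conference participation request", conf_id, sip_address))
    return requests


started: Set[str] = set()
early = start_due_conferences(registry, datetime(2005, 9, 6, 9, 59), started)
on_time = start_due_conferences(registry, datetime(2005, 9, 6, 10, 0), started)
repeat = start_due_conferences(registry, datetime(2005, 9, 6, 10, 1), started)
```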

First, assume that client 20-1 has already established its connections. Then assume that, in step S2 above, the conference management function unit 111 has sent the connection control function unit 112 a conference participation request for client 30. In that case:
(Step S3) The connection control function unit 112 notifies the video conversion server 12 of the IP address that the video transfer function unit 114 uses when distributing video to the video conversion server 12 and of the distribution capability information (video: codec type (e.g., MPEG-4 SP3), bit rate, frame rate, image size, and so on; audio: codec type (e.g., G.711), bit rate, and so on). In response, the connection control function unit 121 of the video conversion server 12 returns to the conference server 11 the IP address it uses to exchange video and audio information with client 30, the reception port numbers on which it receives that video and audio information from client 30, and the distribution capability information it uses when exchanging video and audio information with client 30.

(Step S4) Next, the connection control function unit 112 forwards to client 30 the information returned by the video conversion server 12 in step S3. Client 30 is thereby notified of the information (IP address, reception ports, and distribution capability information) that the video conversion server 12 uses when exchanging video and audio information with client 30. Client 30 then returns to the conference server 11 the reception port numbers it uses to receive video and audio information from the video conversion server 12 and the distribution capability information it uses when exchanging video and audio information with the video conversion server 12.

(Step S5) Next, the connection control function unit 112 notifies the video conversion server 12 of the reception port number used when receiving audio information from the video conversion server 12. In reply, the connection control function unit 121 of the video conversion server 12 notifies the conference server 11 of the transmission port number used when transmitting audio information to the conference server 11.
(Step S6) Next, the connection control function unit 112 notifies the video conversion server 12 of the transmission port number used when transmitting audio information to the video conversion server 12. In reply, the connection control function unit 121 of the video conversion server 12 notifies the conference server 11 of the reception port number used when receiving audio information from the conference server 11.

(Step S7) The connection control function unit 112 then notifies the voice synthesis function unit 113 of the information received from the video conversion server 12 in step S6 (the reception port number for audio information). In reply, the voice synthesis function unit 113 transmits OK or NG information to the connection control function unit 112.

(Step S8) Next, the connection control function unit 112 notifies the video conversion server 12 of the IP address and reception port number used when receiving video information from the video conversion server 12. In reply, the connection control function unit 121 of the video conversion server 12 transmits to the conference server 11 the IP address and transmission port number used when transmitting video information to the conference server 11.
(Step S9) Next, the connection control function unit 112 transmits to the video conversion server 12 the information received from the client 30 in step S4 (the reception port numbers used when receiving video and audio, and the distribution capability information). The information that the client 30 uses when transmitting and receiving video and audio information to and from the video conversion server 12 is thereby notified to the video conversion server 12 via the conference server 11. In reply, the video conversion server 12 transmits to the conference server 11 the transmission port number used when transmitting audio information to the client 30 and the transmission port number used when transmitting video information to the client 30.

(Step S10) Next, the connection control function unit 112 notifies the video transfer function unit 114 of the information received from the video conversion server 12 in step S8 (the IP address and transmission port number that the video conversion server 12 uses when transmitting video information to the conference server 11). In reply, the video transfer function unit 114 returns OK or NG information to the connection control function unit 112.

(Step S11) Next, the connection control function unit 112 transmits to the client 20-1 distribution capability information <video: codec type (MPEG-4 SP3, etc.), bit rate, frame rate, image size, etc.> as the information used when transmitting video information. In reply, the client 20-1 transmits to the conference server 11 the reception port number used when receiving video information from the conference server 11 and the distribution capability information used when transmitting video information.
(Step S12) The connection control function unit 112 then notifies the video transfer function unit 114 of the information received from the client 20-1 in step S11 (reception port number and distribution capability information). In reply, the video transfer function unit 114 returns OK or NG information.

(Step S13) The connection control function unit 112 also transmits to the video conversion server 12 the IP address and transmission port number used when transmitting video information to the video conversion server 12, and the distribution capability information <video: codec type (MPEG-4 SP3, etc.), bit rate, frame rate, image size, etc.> used when transmitting the video. In reply, the video conversion server 12 transmits to the conference server 11 the IP address and reception port number used when receiving video information from the conference server 11, and the distribution capability information <video: codec type (MPEG-4 SP3, etc.), bit rate, frame rate, image size, etc.> used for receiving the video information.

(Step S14) The connection control function unit 112 also notifies the video transfer function unit 114 of the information received from the video conversion server 12 in step S13 (IP address, reception port number, and distribution capability information). The video transfer function unit 114 is thereby notified of the information used when transmitting video information to the video conversion server 12. In reply, the video transfer function unit 114 notifies the connection control function unit 112 of OK or NG information. With steps S2 to S14 above, the process of establishing connections between the client 30 and each server in the server group 10 is completed.
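
Several of the steps above (S7, S10, S12, S14) end with a function unit replying OK or NG to the connection control function unit 112. The text does not say what happens after an NG reply. The sketch below assumes, purely for illustration, that establishment stops at the first NG; the function name `run_notifications` and the way steps are modelled as callables are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], str]]  # (step label, notifier returning "OK" or "NG")


def run_notifications(steps: List[Step]) -> Optional[str]:
    """Run notifications in order; return the label of the first NG step.

    Returns None when every function unit answered "OK".
    Stopping at the first NG is an assumption, not stated in the source.
    """
    for label, notify in steps:
        if notify() != "OK":
            return label
    return None


# Example: the video transfer function unit rejects the S10 notification.
failed = run_notifications([
    ("S7", lambda: "OK"),
    ("S10", lambda: "NG"),
    ("S12", lambda: "OK"),
])
print(failed)
```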

Next, the conference management function unit 111 of the conference server 11 notifies the connection control function unit 112 of a conference participation request for the client 20-2, and the process of establishing a connection with the client 20-2 starts.
(Step S15) The connection control function unit 112 transmits to the client 20-2 the IP address and the reception port numbers for receiving video information and audio information. In reply, the client 20-2 transmits to the conference server 11 its reception IP address and the reception port numbers on which it receives audio information and video information.
(Step S16) Next, the connection control function unit 112 notifies the voice synthesis function unit 113 of the IP address received in step S15 and the reception port number for audio information. In reply, the voice synthesis function unit 113 transmits OK or NG information to the connection control function unit 112.
(Step S17) The connection control function unit 112 also notifies the video transfer function unit 114 of the IP address received in step S15 and the reception port number for video information. In reply, the video transfer function unit 114 transmits OK or NG information to the connection control function unit 112.

(Step S18) Next, the connection control function unit 112 transmits to the client 20-1 the IP address used when transmitting and receiving video information to and from the client 20-1, and distribution capability information <video: codec type (MPEG-4 SP3, etc.), bit rate, frame rate, image size, etc.> as the information used for transmitting and receiving that video information. In response, the client 20-1 transmits to the conference server 11 the IP address used when transmitting and receiving video information, the reception port number for video information, and distribution capability information as the information used for transmitting and receiving video information.
(Step S19) The connection control function unit 112 then notifies the video transfer function unit 114 of the reception port number for video information and the distribution capability information received from the client 20-1. In response, the video transfer function unit 114 notifies the connection control function unit 112 of OK or NG information.

(Step S20) The connection control function unit 112 also transmits to the video conversion server 12 the IP address and transmission port number used when transmitting video information to the video conversion server 12, and the distribution capability information used for transmitting and receiving that video information. In reply, the connection control function unit 121 of the video conversion server 12 transmits to the conference server 11 the IP address on which it receives the video information, the corresponding reception port, and distribution capability information.
(Step S21) The connection control function unit 112 then notifies the video transfer function unit 114 of the information received from the video conversion server 12 in step S20 (the IP address, reception port, and distribution capability information that the video conversion server 12 uses when receiving video information). In response, the video transfer function unit 114 transmits OK or NG information to the connection control function unit 112.

(Step S22) The connection control function unit 112 also transmits to the client 20-2 the IP address used when transmitting video information to the client 20-2, and distribution capability information. The client 20-2 then returns to the conference server 11 the IP address, reception port number, and distribution capability information it uses when receiving video information from the conference server 11.
(Step S23) The connection control function unit 112 notifies the video transfer function unit 114 of the IP address, reception port number, and distribution capability information received from the client 20-2, as the information used when transmitting the first video information (for example, the video information distributed from the client 20-1) to the client 20-2. The video transfer function unit 114 then returns OK or NG information.

(Step S24) The connection control function unit 112 also transmits to the client 20-2 the IP address used when transmitting video information to the client 20-2, and distribution capability information. The client 20-2 then returns to the conference server 11 the IP address, reception port number, and distribution capability information it uses when receiving video information from the conference server 11.
(Step S25) The connection control function unit 112 notifies the video transfer function unit 114 of the IP address, reception port number, and distribution capability information received from the client 20-2, as the information used when transmitting the second video information (for example, the video information distributed from the client 30) to the client 20-2. The video transfer function unit 114 then returns OK or NG information.
With the above processing, the process of establishing connections between the client 20-2 and each server in the server group 10 (a to o shown in FIG. 4) is completed. The connections between the client 20-1 and the server group 10 are established in the same manner.

When connection establishment has been completed with all clients participating in the conference, the conference server 11 notifies each client of an instruction to start distribution. The type A clients 20-1 and 20-2 then sequentially transmit to the conference server 11 the video information captured by the camera 24 and the audio information picked up by the microphone 25. The type B client 30 sequentially transmits to the video conversion server 12 the video information captured by the camera 34 and the audio information picked up by the microphone 35. Video information and audio information are then distributed through the following processing in the conference server 11 and the video conversion server 12.

<Processing in the conference server>
The voice synthesis function unit 113 of the conference server 11 receives and synthesizes the audio information transmitted from the client 20-1, the client 20-2, and the video conversion server 12 (the last being the audio information picked up at the client 30). It then transmits the synthesized audio information to the client 20-1, the client 20-2, and the video conversion server 12. When transmitting the synthesized audio information to the client 20-1, the voice synthesis function unit 113 omits from it the audio information received from the client 20-1. Similarly, when transmitting the synthesized audio information to the client 20-2, it omits the audio information received from the client 20-2, and when transmitting the synthesized audio information to the video conversion server 12, it omits the audio information received from the video conversion server 12.
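
The omission of each destination's own audio from the synthesized audio can be sketched as follows. This is a minimal illustration that assumes equal-length frames of 16-bit PCM samples and assumes the omission is done by subtracting the destination's samples from the sum of all sources; the text does not specify the mixing arithmetic, and the function name `mix_without_own` and the endpoint labels are hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_without_own(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, per destination, the sum of all other sources' samples.

    frames maps a source label to one frame of PCM samples; every frame
    is assumed to have the same length. Results are clipped to 16 bits.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    mixed = {}
    for dest, own in frames.items():
        mixed[dest] = [
            max(INT16_MIN, min(INT16_MAX, total[i] - own[i]))
            for i in range(length)
        ]
    return mixed


frames = {
    "client 20-1": [100, 200],
    "client 20-2": [10, 20],
    "video conversion server 12": [1, 2],  # audio picked up at client 30
}
out = mix_without_own(frames)
print(out["client 20-1"])  # samples from 20-2 and server 12 only
```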

The video transfer function unit 114 of the conference server 11 transfers the video information transmitted from the client 20-1, the client 20-2, and the video conversion server 12. For example, the video transfer function unit 114 transmits the video information received from the video conversion server 12 and from the client 20-2 to the client 20-1 as separate sessions. Likewise, it transmits the video information received from the video conversion server 12 and from the client 20-1 to the client 20-2 as separate sessions, and it transmits the video information received from the client 20-1 and from the client 20-2 to the video conversion server 12 as separate sessions.
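
The forwarding rule in this paragraph amounts to "every endpoint receives one session per other endpoint". A minimal sketch, in which the function name `forwarding_table` and the string labels are illustrative assumptions:

```python
from typing import Dict, List


def forwarding_table(endpoints: List[str]) -> Dict[str, List[str]]:
    """Map each destination to the sources whose video it receives.

    Each listed source corresponds to one separate session toward the
    destination; a destination never receives its own video.
    """
    return {
        dest: [src for src in endpoints if src != dest]
        for dest in endpoints
    }


endpoints = ["client 20-1", "client 20-2", "video conversion server 12"]
table = forwarding_table(endpoints)
for dest, sources in table.items():
    print(dest, "<-", sources)
```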

<Processing in the video conversion server>
The audio processing function unit 122 of the video conversion server 12 transfers the audio information received from the client 30 to the conference server 11, and transmits the synthesized audio information received from the conference server 11 to the client 30. The video transcoding function unit 123 combines the video information received from the conference server 11 in separate sessions. All of the received video information may be combined, or only video information selected in advance may be combined. The video transcoding function unit 123 then transmits the combined video information to the client 30.
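
How the streams are arranged within the combined video is not specified in the text. One possible arrangement, shown only as an assumption, is a near-square grid of equal tiles; the sketch below computes tile rectangles and supports the optional pre-selection of streams. The function name `grid_layout` and the stream labels are hypothetical, and decoding and re-encoding of the video are outside the sketch.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def grid_layout(stream_ids: List[str], width: int, height: int,
                selected: Optional[Iterable[str]] = None) -> Dict[str, Rect]:
    """Assign each stream a tile in a single width x height output frame.

    If selected is given, only those streams are placed (the case of
    combining only pre-selected video information).
    """
    chosen = None if selected is None else set(selected)
    ids = [s for s in stream_ids if chosen is None or s in chosen]
    if not ids:
        return {}
    cols = math.ceil(math.sqrt(len(ids)))
    rows = math.ceil(len(ids) / cols)
    tile_w, tile_h = width // cols, height // rows
    return {
        sid: ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i, sid in enumerate(ids)
    }


layout = grid_layout(["client 20-1", "client 20-2"], 640, 480)
print(layout)
```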

With the above processing, when type A clients 20, which can receive and play back a plurality of videos separately, and a type B client 30, which can receive and play back only one video, are mixed in the multipoint conference system, providing the video conversion server 12 with the function of combining videos into one makes it possible to configure a system in which type B clients can also participate. In other words, a multipoint conference system can be provided in which all clients can participate regardless of their processing capability.
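
The difference between the two client types can be summarized by which server a client exchanges media with and how many video streams it has to receive. A minimal sketch under that reading; the names `Route` and `route_for` are hypothetical, and the type labels "A" and "B" follow the text.

```python
from typing import NamedTuple


class Route(NamedTuple):
    upstream: str          # server the client exchanges media with
    video_streams_in: int  # number of video streams the client receives


def route_for(client_type: str, participants: int) -> Route:
    """Return the media route for a client in a conference of the given size."""
    others = max(participants - 1, 0)
    if client_type == "A":
        # Type A receives one separate stream per other participant.
        return Route("conference server 11", others)
    if client_type == "B":
        # Type B receives at most one combined stream.
        return Route("video conversion server 12", min(others, 1))
    raise ValueError(f"unknown client type: {client_type!r}")


print(route_for("A", 3))
print(route_for("B", 3))
```

For the three-client example in the text, a type A client receives two streams and the type B client receives a single combined stream.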

The conference server, the video conversion server, and the clients described above each contain a computer system. The processes described above are stored in the form of a program on a computer-readable recording medium, and the processing is carried out when a computer reads and executes this program. Here, a computer-readable recording medium means a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer via a communication line, and the computer that receives it may execute the program.

The program may implement only a part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

A block diagram showing the configuration of the multipoint conference system.
A diagram showing the functional blocks of a client.
A diagram showing the functional blocks of each server in the server group.
A diagram showing the connection established state.
A first diagram showing the connection establishment process.
A second diagram showing the connection establishment process.

Explanation of symbols

10 ... Server group
11 ... Conference server
12 ... Video conversion server
20, 30 ... Client
111 ... Conference management function unit
112, 121 ... Connection control function unit
113 ... Voice synthesis function unit
114 ... Video transfer function unit
122 ... Audio processing function unit
123 ... Video transcoding function unit

Claims

A multipoint conference method in a multipoint conference system comprising:
a first type terminal capable of reproducing each of a plurality of received distribution videos;
a second type terminal capable of processing only one received distribution video; and
a multipoint conference processing device that distributes video information and audio information transmitted by the first type terminal and the second type terminal to other terminals,
wherein a first type server in the multipoint conference processing device:
synthesizes the audio information received from each of the first type terminal and the second type terminal and, when distributing the synthesized audio information to the first type terminal, distributes it after deleting from the synthesized audio information the audio information of the first type terminal that is the distribution destination, and, when distributing the synthesized audio information to the second type terminal, distributes it after deleting from the synthesized audio information the audio information of that second type terminal;
when distributing the video information received from each of the first type terminal and the second type terminal to the first type terminal, distributes the video information of the other first type terminals or of the second type terminals, excluding the first type terminal that is the distribution destination; and
exchanges, with each of the first type terminal and the second type terminal, a communication identifier including at least an IP address and a port number for distributing or receiving the video information and the audio information, and distribution capability information used for transmitting and receiving the video information and the audio information,
and wherein a second type server in the multipoint conference processing device:
transfers the audio information destined for the second type terminal that has been distributed by the audio distribution means; and
when distributing to the second type terminal the video information received from the first type terminal or the video information received from the second type terminal, either combines the video information received from the first type terminal via the first type server and the video information received from second type terminals other than the distribution destination into one distribution video and distributes the combined video information to the second type terminal that is the distribution destination, or selects from the video information received from the first type terminal via the first type server and the video information received from second type terminals other than the distribution destination, combines the selected video information into one distribution video, and distributes the combined video information to the second type terminal that is the distribution destination.

A multipoint conference system comprising:
a first type terminal capable of reproducing each of a plurality of received distribution videos;
a second type terminal capable of processing only one received distribution video; and
a multipoint conference processing device that distributes video information and audio information transmitted by the first type terminal and the second type terminal to other terminals,
wherein the multipoint conference processing device consists of:
a first type server having:
audio distribution means for synthesizing the audio information received from each of the first type terminal and the second type terminal and, when distributing the synthesized audio information to the first type terminal, distributing it after deleting from the synthesized audio information the audio information of the first type terminal that is the distribution destination, and, when distributing the synthesized audio information to the second type terminal, distributing it after deleting from the synthesized audio information the audio information of that second type terminal;
first video distribution means for, when distributing the video information received from each of the first type terminal and the second type terminal to the first type terminal, distributing the video information of the other first type terminals or of the second type terminals, excluding the first type terminal that is the distribution destination; and
connection control means for exchanging, with each of the first type terminal and the second type terminal, a communication identifier including at least an IP address and a port number for distributing or receiving the video information and the audio information, and distribution capability information used for transmitting and receiving the video information and the audio information;
and a second type server having:
audio transfer means for transferring the audio information destined for the second type terminal that has been distributed by the audio distribution means; and
second video distribution means for, when distributing to the second type terminal the video information received from the first type terminal or the video information received from the second type terminal, either combining the video information received from the first type terminal via the first type server and the video information received from second type terminals other than the distribution destination into one distribution video and distributing the combined video information to the second type terminal that is the distribution destination, or selecting from the video information received from the first type terminal via the first type server and the video information received from second type terminals other than the distribution destination, combining the selected video information into one distribution video, and distributing the combined video information to the second type terminal that is the distribution destination.