JP4522332B2 - Audiovisual distribution system, method and program

Info

Publication number: JP4522332B2
Application number: JP2005193552A
Authority: JP
Inventors: 武井上; 宏和高橋; 鑑豊島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-07-01
Filing date: 2005-07-01
Publication date: 2010-08-11
Anticipated expiration: 2025-07-01
Also published as: JP2007013764A

Description

The present invention is used for distributing video and audio. In particular, it relates to a video/audio distribution technique in which, as in a remote conference system, many terminals are connected via a network, transmit reception requests to one another, and exchange video data or audio data.

The present invention also relates to an improvement of a prior application (Japanese Patent Application No. 2004-129060, unpublished at the time this application was filed).

In the video/audio distribution technique of the prior application, each terminal selects the video it receives as follows. Each terminal receives audio data from another terminal and calculates that terminal's volume from the audio data. It performs the same calculation on the audio data from all terminals and compares the volumes. As a result, for example, a terminal with a high volume is judged to be speaking, and video is received from that terminal and displayed. Reception of video from terminals with a low volume is stopped. In this way, the video of the speaker is displayed automatically.

FIG. 3 shows a schematic diagram of the video/audio distribution system. A plurality of terminals #1 to #3 are connected to a network 14 over which they can communicate with one another. In this system, terminals #1 to #3 exchange audio data and video data with one another to realize a video conference.

This system is characterized in that video data is exchanged selectively. A terminal continuously receives audio data from all other terminals. It calculates the volume from that audio data and compares the volumes. The terminal with the maximum volume is judged to be "speaking", and a video request is transmitted to that terminal. The speaking terminal then distributes video data, and the video of the speaker is displayed. When the speaker changes, the terminal with the maximum volume changes, so the destination of the video request changes and the displayed speaker switches as well.
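The selection rule described above can be sketched as follows. This is a minimal illustration, not the prior application's implementation: the text does not say how volume is computed, so the root-mean-square measure, the function names `rms_volume` and `select_speaker`, and the sample values are all assumptions.

```python
import math
from typing import Dict, Sequence


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square level of decoded PCM samples (an assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_speaker(decoded_audio: Dict[str, Sequence[float]]) -> str:
    """Return the terminal whose received audio has the largest volume."""
    volumes = {terminal: rms_volume(pcm) for terminal, pcm in decoded_audio.items()}
    return max(volumes, key=volumes.get)


# Terminal #1's view: decoded audio received from terminals #2 and #3.
received = {
    "#2": [0.5, -0.6, 0.4, -0.5],    # a participant is talking
    "#3": [0.01, -0.02, 0.01, 0.0],  # background noise only
}
speaker = select_speaker(received)   # terminal to which the video request is sent
```

Note that the receiving terminal needs decoded samples for every other terminal before it can compare anything, which is the cost the description goes on to discuss.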

In this way, the video of the speaker is displayed automatically. Any number of speakers may be displayed at the same time, but for simplicity only one is displayed here.

FIG. 4 shows the configuration of a terminal based on the prior application. The operation of the terminal is described with reference to FIG. 4. The large rectangle drawn with a thick line marks the scope of the software that realizes the terminal function of the video/audio distribution system. General functions such as packet transmission/reception and external input/output are normally implemented in the OS, so they are drawn outside the thick rectangle. The video decoding function and the audio decoding function could well be realized as part of the software that realizes the terminal function (that is, they could lie inside the thick rectangle), but here they are assumed to be implemented outside the terminal software.

It is also assumed that the terminal software cannot obtain volume data from the external audio decoding unit. The terminal software therefore decodes the audio internally and calculates the volume data.

First, the operation when audio data is received is described. Audio data received by the packet receiving unit 1 is sorted by the packet sorting unit 2. In this example, it is sent to the external audio decoding/output unit 3, where the audio is played back, and at the same time to the audio decoding unit 12 inside the terminal program. The volume data is drawn with two arrows because the number of terminals is assumed to be three, as in FIG. 3. With n terminals, the number of volume data items received is n-1. The decoded audio data is sent to the volume calculation unit 13 and converted into volume data. In this example, the calculation is performed twice; with n terminals, it is performed n-1 times.

Volume data is a numerical value representing the volume. The volume comparison unit 4 compares the volume data of the terminals and identifies the speaker. Because the volume data from each terminal arrives at a different time, it is recorded in the audio data storage unit 5 as necessary and used at comparison time. When the speaker changes, the video request unit 6 transmits a video request to the new speaker. As in the example of FIG. 2, processing related to stopping video is omitted.
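The interplay of the comparison unit, the storage unit, and the video request unit can be sketched as below. This is only an illustration under stated assumptions: the class name `VolumeComparator`, the method `on_volume`, and the rule "largest stored value wins" are hypothetical, and a practical system would likely add a silence threshold or smoothing that the text does not describe.

```python
from typing import Dict, List, Optional


class VolumeComparator:
    """Keeps the latest volume per terminal and reports speaker changes."""

    def __init__(self) -> None:
        self.latest: Dict[str, float] = {}   # plays the role of storage unit 5
        self.speaker: Optional[str] = None

    def on_volume(self, terminal: str, volume: float) -> Optional[str]:
        """Record one volume value; return the new speaker if it changed, else None."""
        self.latest[terminal] = volume
        loudest = max(self.latest, key=self.latest.get)
        if loudest != self.speaker:
            self.speaker = loudest
            return loudest   # the video request unit would send a request here
        return None


comparator = VolumeComparator()
video_requests: List[str] = []
# Volume values arrive one at a time, at different moments, from #2 and #3.
for terminal, volume in [("#2", 0.8), ("#3", 0.1), ("#2", 0.05), ("#3", 0.7)]:
    changed = comparator.on_volume(terminal, volume)
    if changed is not None:
        video_requests.append(changed)
```

Storing the most recent value per terminal lets each arrival be compared against the others even though they never arrive simultaneously.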

Next, the operation when video data is received is described. Video data received by the packet receiving unit 1 is sorted by the packet sorting unit 2 and sent to the video decoding/display unit 7, where the video is played back. Audio input from the microphone 8 is encoded into audio data by the audio encoding unit 10 and sent to the other terminals by the packet transmission unit 11. The encoding and transmission of video data are omitted.

The operation of the system based on the prior application is described with reference to the sequence diagram of FIG. 5. The description centers on terminal #1. Assume that, at first, the participant using terminal #2 is speaking. This sequence diagram shows only the audio data and video data that terminal #1 receives and the video requests that terminal #1 transmits.

The audio data and video data received by other terminals, and the video requests transmitted by other terminals, are omitted. As shown in the legend, the types of data transmitted and received are distinguished by different arrows. Terminal #1 receives audio data from terminals #2 and #3. The sequence diagram is drawn as if only one piece of audio data were received, but in practice a continuous stream of data is received.

Terminal #1 calculates the volume of each terminal from the audio data received from terminals #2 and #3. In this example the volume calculation is performed twice; with n terminals, it is performed n-1 times. Terminal #1 compares the calculated volumes and determines that terminal #2 is speaking. Terminal #1 then sends a video request to terminal #2. On receiving the video request, terminal #2 starts transmitting video data to terminal #1. The sequence diagram is drawn as if only one piece of video data were transmitted, but in practice data is transmitted continuously. On receiving the video data, terminal #1 displays it.

Next, assume that the participant using terminal #2 finishes speaking and the participant using terminal #3 begins to speak. Terminal #1 calculates the volume of each terminal from the audio data it is receiving from terminals #2 and #3. Here again, the volume calculation is performed twice. Terminal #1 compares the calculated volumes and determines that terminal #3 is speaking. Terminal #1 then sends a video request to terminal #3. At this point, terminal #1 may transmit a video stop request to terminal #2 to stop the video data from terminal #2. Alternatively, it may wait for the video data to stop through a mechanism such as a timeout.

To keep the sequence diagram simple, the method of stopping the video data from terminal #2 is not shown. On receiving the video request, terminal #3 starts transmitting video data to terminal #1. The sequence diagram is drawn as if only one piece of video data were transmitted, but in practice data is transmitted continuously. On receiving the video data, terminal #1 displays it. In this way, the video of the speaker is displayed and switched automatically.

Here, the problems of the prior application are reviewed. With n participants, each terminal performs volume calculation on n-1 audio data streams. The volume calculation processing amount for the entire video/audio distribution system is therefore n(n-1). In addition, as shown in FIG. 4, when the audio decoding/output unit 3 lies outside the terminal program and volume data cannot be obtained from it, audio decoding must be performed separately inside the terminal program.

That is, the prior application has the following problems. Each terminal must calculate the volume individually. For example, in a remote conference system with a dozen or more terminals, each terminal must calculate the volume for a dozen or more audio data streams, and the computational load is enormous.

However, the audio data that the terminals receive is identical, and so are the volume calculation results, so there is room to reduce the volume calculation processing in the system as a whole.

In the prior application, moreover, encoded audio data is received and the volume is calculated from that data. Calculating the volume normally requires decoding the audio data. If the terminal has an audio decoding module, decoding can be left to that module. The problem is that volume data cannot be obtained from that audio decoding module. To realize this video/audio distribution method, a separate audio decoding module must therefore be implemented solely to calculate the volume.

The present invention was made against this background. Its object is to provide a video/audio distribution system, method, and program that can reduce the volume calculation processing amount of the entire video/audio distribution system and eliminate the need to decode audio in order to calculate the volume.

In a video/audio distribution system in which a plurality of terminals connected to a network transmit reception requests to one another and exchange video data or audio data, the present invention reduces the volume calculation processing amount of the entire system and eliminates the need to decode audio in order to calculate the volume. To that end, the invention is characterized in that a first terminal comprises numerical data generation means for generating numerical data representing characteristics of its audio data, including the volume, and numerical data distribution means for distributing that numerical data to other terminals, and in that a second terminal that has received the numerical data comprises request terminal determination means for determining, based on the received numerical data, the terminal from which to request video data.

According to the present invention, the volume calculation processing amount of the entire video/audio distribution system can be reduced. In addition, audio no longer needs to be decoded in order to calculate the volume.

An embodiment of the present invention is described below. For a schematic diagram of the video/audio distribution system, refer to FIG. 3. FIG. 1 shows the configuration of a terminal according to the present invention. The operation of the terminal is described with reference to FIG. 1. The large rectangle drawn with a thick line marks the scope of the software that realizes the terminal function of the video/audio distribution system. General functions such as packet transmission/reception and external input/output are normally implemented in the OS, so they are drawn outside the thick rectangle. The video decoding function and the audio decoding function could well be realized as part of the software that realizes the terminal function (that is, they could lie inside the thick rectangle), but here they are assumed to be implemented outside the terminal software.

First, the operation when audio data is received is described. Audio data received by the packet receiving unit 1 is sorted by the packet sorting unit 2 and sent to the audio decoding/output unit 3, where the audio is played back. Next, the operation when volume data is received is described. Volume data received by the packet receiving unit 1 is sorted by the packet sorting unit 2 and sent to the volume comparison unit 4. The volume comparison unit 4 compares the volume data of the terminals and identifies the speaker.
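The routing performed by the packet sorting unit can be sketched as a dispatch on packet type. This is an illustration only: the `Packet` fields, the string type tags, and the `PacketSorter` class are hypothetical, since the description does not define a packet format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Packet:
    kind: str       # "audio", "volume" or "video" (hypothetical type tag)
    sender: str     # originating terminal, e.g. "#2"
    payload: bytes


class PacketSorter:
    """Routes each received packet to the unit registered for its kind."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Packet], None]] = {}

    def register(self, kind: str, handler: Callable[[Packet], None]) -> None:
        self.handlers[kind] = handler

    def dispatch(self, packet: Packet) -> bool:
        """Deliver the packet; return False if no unit handles this kind."""
        handler = self.handlers.get(packet.kind)
        if handler is None:
            return False
        handler(packet)
        return True


# Lists stand in for the audio decoding/output unit 3 and the volume comparison unit 4.
to_audio_output: List[Packet] = []
to_volume_comparison: List[Packet] = []

sorter = PacketSorter()
sorter.register("audio", to_audio_output.append)
sorter.register("volume", to_volume_comparison.append)

sorter.dispatch(Packet("audio", "#2", b"encoded-audio"))
sorter.dispatch(Packet("volume", "#2", b"\x00\x01"))
```

In this arrangement the audio payload goes straight to the external decoder and never has to be decoded inside the terminal software.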

Because the volume data from each terminal arrives at a different time, it is stored in the audio data storage unit 5 as necessary and used at comparison time. When the speaker changes, the video request unit 6 transmits a video request to the new speaker. As in the example of FIG. 4, processing related to stopping video is omitted.

Next, the operation when video data is received is described. Video data received by the packet receiving unit 1 is sorted by the packet sorting unit 2 and sent to the video decoding/display unit 7.

The video is then played back. Audio input from the microphone 8 is converted into volume data by the volume calculation unit 9 and sent to the other terminals. In parallel, the audio input is encoded into audio data by the audio encoding unit 10 and sent to the other terminals by the packet transmission unit 11. The encoding and transmission of video data are omitted.
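The sending side can be sketched as follows: the terminal measures its own microphone input before encoding and ships the result as a small numeric packet. The root-mean-square measure, the 6-byte wire format `"!Hf"` (a 2-byte terminal id plus a 4-byte float), and all function names are assumptions made for illustration; the description specifies none of them.

```python
import math
import struct
from typing import Sequence, Tuple

# Hypothetical wire format: unsigned 16-bit terminal id + 32-bit float, network byte order.
VOLUME_FORMAT = "!Hf"


def calculate_volume(samples: Sequence[float]) -> float:
    """Root-mean-square level of raw microphone samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def build_volume_packet(terminal_id: int, samples: Sequence[float]) -> bytes:
    """Role of the volume calculation unit 9: turn one frame into volume data."""
    return struct.pack(VOLUME_FORMAT, terminal_id, calculate_volume(samples))


def parse_volume_packet(packet: bytes) -> Tuple[int, float]:
    """Receiving side: read the volume directly, with no audio decoding."""
    terminal_id, volume = struct.unpack(VOLUME_FORMAT, packet)
    return terminal_id, volume


frame = [0.5, -0.5, 0.5, -0.5]           # one microphone frame, before encoding
packet = build_volume_packet(2, frame)   # sent alongside the encoded audio
```

Because the sender still holds the raw samples, it can compute the volume without any decoder, and receivers only need to unpack a number.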

Here, the prior application and the present invention are compared. In the prior application, with n participants, each terminal performs volume calculation on n-1 audio data streams. The volume calculation processing amount for the entire video/audio distribution system is therefore n(n-1).

In the present invention, by contrast, each terminal only has to calculate the volume of its own audio data. The volume calculation processing amount for the entire video/audio distribution system is therefore n. Furthermore, in the prior application, as shown in FIG. 4, when the audio decoding/output unit 3 lies outside the terminal program and volume data cannot be obtained from it, audio decoding must be performed separately inside the terminal program. In the present invention, no audio decoding is needed inside the terminal program. On the other hand, because the present invention distributes volume data, traffic increases. However, volume data is a numerical value and small in size, so the effect is usually minor.
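The two counts can be worked through for concrete conference sizes. The function names below are hypothetical; the formulas n(n-1) and n are the ones stated in the text.

```python
def prior_application_calculations(n: int) -> int:
    """Each of n terminals calculates the volume of the other n-1 terminals."""
    return n * (n - 1)


def invention_calculations(n: int) -> int:
    """Each of n terminals calculates only the volume of its own audio."""
    return n


for n in (3, 12):
    print(n, prior_application_calculations(n), invention_calculations(n))
```

For the three-terminal example this gives 6 calculations against 3, and for a twelve-terminal conference 132 against 12, so the saving grows linearly with the number of terminals.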

The operation of the system according to the present invention is described with reference to the sequence diagram of FIG. 2. The description centers on terminal #1. Assume that, at first, the participant using terminal #2 is speaking. This sequence diagram shows only the audio data and video data that terminal #1 receives and the video requests that terminal #1 transmits. The audio data and video data received by other terminals, and the video requests transmitted by other terminals, are omitted.

As shown in the legend, the types of data transmitted and received are distinguished by different arrows. Terminal #1 receives audio data and volume data from terminals #2 and #3. The sequence diagram is drawn as if only one piece each of audio data and volume data were received, but in practice a continuous stream of data is received. To transmit volume data, terminals #2 and #3 each calculate volume data from their own audio data. Regardless of the number of terminals n, each terminal calculates volume data only once. Terminal #1 compares the volume data received from terminals #2 and #3 and determines that terminal #2 is speaking. Terminal #1 then sends a video request to terminal #2. On receiving the video request, terminal #2 starts transmitting video data to terminal #1. The sequence diagram is drawn as if only one piece of video data were transmitted, but in practice data is transmitted continuously. On receiving the video data, terminal #1 displays it.
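One round of this sequence can be simulated for all terminals at once. This is a sketch under assumptions: the function `run_round`, the root-mean-square volume measure, and the sample values are hypothetical, and network transport is replaced by a shared dictionary.

```python
import math
from typing import Dict, Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level (assumed volume measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def run_round(mic_input: Dict[str, Sequence[float]]) -> Tuple[Dict[str, str], int]:
    """Every terminal measures its own audio once, then all terminals compare."""
    calculations = 0
    volume_data: Dict[str, float] = {}
    for terminal, samples in mic_input.items():
        volume_data[terminal] = rms(samples)   # done once, by the sending terminal
        calculations += 1

    # Each terminal compares the volume data received from all the other terminals
    # and picks the terminal to which it sends its video request.
    requests: Dict[str, str] = {}
    for terminal in mic_input:
        others = {t: v for t, v in volume_data.items() if t != terminal}
        requests[terminal] = max(others, key=others.get)
    return requests, calculations


requests, calculations = run_round({
    "#1": [0.0, 0.0],
    "#2": [0.6, -0.6],   # the participant at terminal #2 is speaking
    "#3": [0.1, -0.1],
})
```

With three terminals the round performs three volume calculations in total, and terminals #1 and #3 both direct their video requests to terminal #2.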

Next, assume that the participant using terminal #2 finishes speaking and the participant using terminal #3 begins to speak. Terminal #1 compares the volume data it is receiving from terminals #2 and #3 and determines that terminal #3 is speaking. Terminal #1 then sends a video request to terminal #3. At this point, terminal #1 may transmit a video stop request to terminal #2 to stop the video data from terminal #2. Alternatively, it may wait for the video data to stop through a mechanism such as a timeout.
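Both ways of ending the old video stream can be sketched from the video-sending terminal's point of view. The text leaves the mechanism open, so everything here is an assumption: the class `VideoSender`, the idea that a video request expires unless it is renewed, and the 5-second timeout are all hypothetical.

```python
from typing import Dict, List


class VideoSender:
    """Tracks which terminals should currently receive this terminal's video."""

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout                  # seconds a request stays valid (assumed)
        self.last_request: Dict[str, float] = {}

    def on_video_request(self, requester: str, now: float) -> None:
        """Start (or renew) transmission of video data to the requester."""
        self.last_request[requester] = now

    def on_stop_request(self, requester: str) -> None:
        """Explicit video stop request: end transmission immediately."""
        self.last_request.pop(requester, None)

    def receivers(self, now: float) -> List[str]:
        """Terminals still entitled to video; requests older than the timeout lapse."""
        expired = [r for r, t in self.last_request.items() if now - t > self.timeout]
        for r in expired:
            del self.last_request[r]
        return sorted(self.last_request)


terminal_2 = VideoSender(timeout=5.0)
terminal_2.on_video_request("#1", now=0.0)
during_speech = terminal_2.receivers(now=3.0)    # ["#1"]: video still flowing
after_timeout = terminal_2.receivers(now=6.0)    # []: request lapsed without a stop
```

An explicit stop request frees bandwidth sooner, while the timeout keeps the protocol simple and also covers a requester that disappears without notice.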

To keep the sequence diagram simple, the method of stopping the video data from terminal #2 is not shown. On receiving the video request, terminal #3 starts transmitting video data to terminal #1. The sequence diagram is drawn as if only one piece of video data were transmitted, but in practice data is transmitted continuously. On receiving the video data, terminal #1 displays it. In this way, the video of the speaker is displayed and switched automatically.

This embodiment can be implemented as a program that, when installed in a general-purpose information processing apparatus, causes that apparatus to function as the terminal of this embodiment. The program can be recorded on a recording medium and installed in the information processing apparatus from it, or installed in the information processing apparatus via a communication line, so that the apparatus functions as the terminal of this embodiment.

According to the present invention, the volume calculation processing amount of the entire video/audio distribution system can be reduced, and audio no longer needs to be decoded in order to calculate the volume. As a result, the capability required of each terminal is lower, which improves cost efficiency.

FIG. 1: Block diagram of the terminal of this embodiment.
FIG. 2: Sequence diagram showing the procedure of the video/audio distribution method of this embodiment.
FIG. 3: Diagram outlining the system.
FIG. 4: Block diagram of the terminal of the prior application.
FIG. 5: Sequence diagram showing the procedure of the video distribution method of the prior application.

Explanation of symbols

1 Packet receiving unit
2 Packet sorting unit
3 Audio decoding/output unit
4 Volume comparison unit
5 Audio data storage unit
6 Video request unit
7 Video decoding/display unit
8 Microphone
9, 13 Volume calculation unit
10 Audio encoding unit
11 Packet transmission unit
12 Audio decoding unit
14 Network
#1 to #3 Terminals

Claims

A video/audio distribution system in which a plurality of terminals connected to a network transmit reception requests to one another and exchange video data and audio data, wherein
each of the plurality of terminals comprises:
volume data generating means for generating volume data, which is numerical data representing the volume of the terminal's own audio data;
volume data distribution means for distributing the volume data to all the other terminals that are its communication partners;
volume data receiving means for receiving volume data from all the other terminals;
volume comparison means for comparing the volume data received from all the other terminals to identify the speaker's terminal; and
video request means for transmitting a video request requesting video data to the new speaker's terminal when the speaker's terminal changes,
and wherein each terminal starts transmitting video data to the requesting terminal when it receives a video request from another terminal.

A terminal applied to a video/audio distribution system in which a plurality of terminals connected to a network transmit reception requests to one another and exchange video data and audio data, the terminal comprising:
volume data generating means for generating volume data, which is numerical data representing the volume of the terminal's own audio data;
volume data distribution means for distributing the volume data to all the other terminals that are its communication partners;
volume data receiving means for receiving volume data from all the other terminals;
volume comparison means for comparing the volume data received from all the other terminals to identify the speaker's terminal; and
video request means for transmitting a video request requesting video data to the new speaker's terminal when the speaker's terminal changes,
wherein the terminal starts transmitting video data to the requesting terminal when it receives a video request from another terminal.

A video/audio distribution method in which a plurality of terminals connected to a network transmit reception requests to one another and exchange video data and audio data, wherein
each of the plurality of terminals:
generates volume data, which is numerical data representing the volume of the terminal's own audio data;
distributes the volume data to all the other terminals that are its communication partners;
receives volume data from all the other terminals;
compares the volume data received from all the other terminals to identify the speaker's terminal;
transmits a video request requesting video data to the new speaker's terminal when the speaker's terminal changes; and
starts transmitting video data to the requesting terminal when it receives a video request from another terminal.

A program that, when installed in a general-purpose information processing apparatus, causes the information processing apparatus to realize, for a video/audio distribution system in which a plurality of terminals connected to a network transmit reception requests to one another and exchange video data and audio data,
as functions corresponding to each of the plurality of terminals:
a volume data generating function of generating volume data, which is numerical data representing the volume of the terminal's own audio data;
a volume data distribution function of distributing the volume data to all the other terminals that are its communication partners;
a volume data receiving function of receiving volume data from all the other terminals;
a volume comparison function of comparing the volume data received from all the other terminals to identify the speaker's terminal;
a video request function of transmitting a video request requesting video data to the new speaker's terminal when the speaker's terminal changes; and
a function of starting transmission of video data to the requesting terminal when a video request is received from another terminal.

A recording medium readable by the information processing apparatus on which the program according to claim 4 is recorded.