JP2007158875A

JP2007158875A - Sound transmission apparatus

Info

Publication number: JP2007158875A
Application number: JP2005352930A
Authority: JP
Inventors: Kenji Fujimoto; 研司藤本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-12-07
Filing date: 2005-12-07
Publication date: 2007-06-21

Abstract

PROBLEM TO BE SOLVED: To provide a sound transmission apparatus that reduces fluctuation in sound reproduction among a plurality of locations when sound is reproduced simultaneously at those locations.

SOLUTION: A sound transmission terminal 3 transmits sound multicast data to an IP network 2, and a plurality of sound-receiving terminals 4 receive the sound multicast data through the IP network 2. The sound transmission terminal 3 is provided with a sound packet transmitting portion 12 that transmits sound packets of the sound multicast data, and a learning packet transmitting portion 13 that transmits a learning packet, having a header configuration similar to that of the sound packets, before transmission of the sound packets is started. Each sound-receiving terminal 4 is provided with a learning packet discarding portion 15 that discards the learning packet received before reception of the sound packets starts. The network path is determined by transmitting the learning packet, after which transmission of the sound packets is started.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an audio transmission apparatus that transmits digital audio data by multicast over an IP network.

Conventionally, analog audio transmission apparatuses have been the norm on premises such as buildings and large-scale facilities, but with the development of network technology, digital audio transmission apparatuses have been proposed. In a digital system, an audio transmitting terminal is typically connected to a large number of audio receiving terminals over an Ethernet (registered trademark) IP network. Audio is digitized on the transmitting side, the audio stream data is sent over the IP network to the audio receiving terminals, and each audio receiving terminal reproduces the audio and outputs it from a speaker. Using network technology brings advantages such as the ability to freely designate where audio is output.

In an audio transmission apparatus that uses an IP network, multicast transmission is advantageous for sending an audio stream to a plurality of audio receiving terminals. For example, an audio stream is multicast to many audio receiving terminals in one area, and the same audio is output throughout that area at once. Such audio transmission requires the output timings of the audio receiving terminals to be aligned.

A conventional technique for aligning the output timing of stream data transmitted over an IP network is to embed output timing information in the stream. The output timing is controlled according to this information, which makes output timing deviations between receiving terminals less likely. Such a technique is found in MPEG2 and similar standards. A technique of this kind is also disclosed in Patent Document 1, which attaches time data to video and audio stream data and synchronizes a plurality of streams based on the time data.
Patent Document 1: JP 2002-369163 A (FIG. 1, etc.)

However, in a conventional audio transmission apparatus, reproducing audio simultaneously at a plurality of audio receiving terminals requires a device that embeds output timing information, such as the time code described above, in the stream. This increases the scale of the apparatus.

This problem could be avoided by omitting the configuration that embeds a time code or the like in the stream and by not actively controlling the output timing. However, if control of the output timing is simply dropped, the output timing ends up being determined by the reception timing of the audio data. Because the reception timing differs among the audio receiving terminals, the output timing also varies, and as a result audio reproduction fluctuates (jitter) between speakers.

The difference in reception timing among the audio receiving terminals arises through the following mechanism. When transmission of an audio stream starts, data is sent while the network path for the stream has not yet been determined. While the path is being determined, the learning time of the hubs and other switches that make up the network causes the time until the network path is determined (hereinafter, the network path determination time) to differ from terminal to terminal. As a result, the timing at which reception of audio data starts differs among the terminals. This difference can shift the output timing between adjacent speakers, causing fluctuation and displacing the perceived sound position.

To prevent the fluctuation caused by the network path determination time, it is conceivable to use a protocol such as IGMP (Internet Group Management Protocol), a technique for determining the network path. In that case, however, devices that support a protocol such as IGMP are required, which increases the cost of the system. In particular, an audio transmission apparatus for a large-scale facility has an enormous number of speakers and therefore an enormous number of audio receiving terminals. Giving advanced functions to such a huge number of terminals is difficult in practice.

As described above, conventionally, unless a configuration for embedding output timing information in the stream is provided or devices supporting a protocol such as IGMP are installed, the reception timing differs among the audio receiving terminals and audio reproduction fluctuates.

The present invention has been made to solve this conventional problem, and its object is to provide an audio transmission apparatus that can reduce fluctuation in audio reproduction among a plurality of locations when audio is reproduced at those locations simultaneously.

The audio transmission apparatus of the present invention comprises an audio transmitting terminal that sends audio multicast data to an IP network, and a plurality of audio receiving terminals that receive, through the IP network, the audio multicast data transmitted from the audio transmitting terminal. The audio transmitting terminal comprises audio packet transmitting means for transmitting audio packets of the audio multicast data, and learning packet transmitting means for multicasting, before transmission of the audio packets starts, a learning packet that includes a header configuration similar to that of the audio packets. Each of the plurality of audio receiving terminals comprises learning packet discarding means for discarding the learning packet received before reception of the audio packets starts.

With this configuration, the learning packet is transmitted before the audio packets, so the network path through the IP network is determined by the learning packet, and the audio packets then travel along the path that has been determined. Because the audio packets travel along an already determined network path, fluctuation in the audio reception start time among the audio receiving terminals is reduced. Consequently, even when audio is reproduced at a plurality of locations, fluctuation in audio reproduction among those locations can be reduced.

The audio transmission apparatus of the present invention is also configured so that the time interval between transmission of the learning packet by the learning packet transmitting means and the start of transmission of the audio packets by the audio packet transmitting means can be specified.

With this configuration, taking into account that the network path determination time differs with the configuration and equipment of each network, the start time of the audio packets can be set appropriately for each network so that transmission of the audio packets starts after the network path has been determined by the learning packet. Fluctuation in audio reproduction can thus be suitably reduced.

Because the present invention transmits a learning packet before the audio packets so that the network path is determined in advance, fluctuation in the audio reception start time among the audio receiving terminals is reduced. The invention can therefore provide an audio transmission apparatus that reduces fluctuation in audio reproduction among a plurality of locations even when audio is reproduced at those locations.

Hereinafter, an audio transmission apparatus according to embodiments of the present invention will be described with reference to the drawings.

FIGS. 1 and 2 show an audio transmission/reception system that includes an audio transmission apparatus according to a first embodiment of the present invention. FIG. 1 is a block diagram, and FIG. 2 shows the overall configuration.

First, the overall configuration of FIG. 2 will be described. The audio transmission apparatus 1 includes an audio transmitting terminal 3 and a plurality of audio receiving terminals 4, which are connected by an IP network 2. The IP network 2, the audio transmitting terminal 3, and the audio receiving terminals 4 together constitute the audio transmission/reception system.

The audio transmitting terminal 3 is provided with a sound source 5. The sound source 5 is, for example, a microphone or an audio playback device, and supplies analog or digital audio. The audio playback device is, for example, a CD player. The sound source 5 may be connected to the audio transmitting terminal 3 or integrated with it. When audio is input from the sound source 5, the audio transmitting terminal 3 generates IP packets of the audio data and sends them to the IP network 2. The audio data is addressed to the plurality of audio receiving terminals 4 and is preferably transmitted by multicast.

In this embodiment, audio data transmitted by multicast is called audio multicast data, and an IP packet (IP multicast packet) of the audio multicast data is called an audio packet.

The audio packets are received by each audio receiving terminal 4, and each audio receiving terminal 4 reproduces analog audio. The audio is output from a speaker 6 provided for each audio receiving terminal 4. The speaker 6 may be separate from the audio receiving terminal 4 or integrated with it.

The audio transmission apparatus 1 of FIG. 2 is installed, for example, on the premises of a building or a large-scale commercial facility. FIG. 2 shows only some of the audio receiving terminals 4; in practice, many more are preferably provided. In a large facility, for example, the number of audio receiving terminals 4 is enormous and may reach several thousand to several tens of thousands. Although FIG. 2 shows one audio transmitting terminal 3, a plurality of audio transmitting terminals 3 may be provided.

In such an audio transmission apparatus 1, the same audio may be output throughout the facility (simultaneous broadcast) or within a selected area. In these cases, the audio output timings of audio receiving terminals 4 at adjacent locations are required to be aligned. To meet this requirement, the audio transmission apparatus 1 of this embodiment is configured as follows.

FIG. 1 is a block diagram showing the configuration of the audio transmission apparatus 1. In FIG. 1, the audio transmitting terminal 3 includes an audio input unit 11 that inputs analog or digital audio, an audio transmission unit 12 that transmits the input audio as audio multicast data, and a learning packet transmission unit 13 that transmits, before transmission of the audio multicast data starts, a learning packet having a header configuration similar to that of the audio packets of the audio multicast data. The audio input unit 11 corresponds to the sound source 5 in FIG. 2.

The audio receiving terminal 4, in turn, includes an audio reception unit 14 that receives the audio multicast data, a learning packet discard unit 15 that receives and discards the learning packet sent before reception of the audio multicast data starts, a received-data determination unit 16 that determines whether received multicast data is to be handled by the audio reception unit 14 or by the learning packet discard unit 15, and an audio output unit 17 that outputs the audio reproduced from the audio multicast data received by the audio reception unit 14. The audio output unit 17 is a speaker.

The operation of the audio transmission apparatus configured as described above will be explained with reference to FIGS. 1 to 4. FIG. 3 shows the operation of the audio transmitting terminal 3 when transmission starts. When the audio transmitting terminal 3 receives an instruction to start transmission, for example through an operation on its operation unit (not shown) (S21), the learning packet transmission unit 13 first transmits a learning packet (S22), and the audio transmission unit 12 then transmits audio packets generated from the audio input through the audio input unit 11 (S23).
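The S21 to S23 sequence can be sketched in Python as follows. This is only an illustrative sketch: the function name, the callback parameters, and the injectable `sleep` are assumptions made for the example and do not appear in the specification. The wait between S22 and S23 corresponds to the preset delay time that this embodiment describes separately.

```python
import time
from typing import Callable, Iterable


def start_transmission(
    send_learning: Callable[[], None],
    send_audio: Callable[[bytes], None],
    audio_frames: Iterable[bytes],
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run after a transmission-start instruction (S21).

    All names here are hypothetical. The callbacks stand in for the
    learning packet transmission unit 13 and the audio transmission unit 12.
    """
    send_learning()            # S22: one learning packet goes out first
    sleep(delay_s)             # wait so the network path can be determined
    sent = 0
    for frame in audio_frames:  # S23: audio packets follow on the learned path
        send_audio(frame)
        sent += 1
    return sent
```

In a real terminal the two callbacks would write to the same multicast group and port, so that the learning packet and the audio packets present the same headers to the switches.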

FIGS. 4(a) to 4(c) show examples of the audio packet and the learning packet. FIG. 4(a) shows an audio packet. The audio packet is an IP multicast packet and, as illustrated, has a MAC layer, an IP layer, a UDP layer, and an RTP layer as its header configuration. The audio packet carries audio data in its payload portion (the data portion other than the headers).

FIG. 4(b) shows a learning packet. The learning packet has the same header configuration as the audio IP multicast packet, but it is an IP packet consisting of headers only and has no payload portion. In other words, the learning packet is an audio packet with the audio data removed. Such a learning packet is generated and transmitted by the learning packet transmission unit 13. Like the audio packets, the learning packet is transmitted by multicast. FIG. 4(c) shows another example of the learning packet: as shown there, the learning packet may carry dummy data in its payload portion in place of audio data.
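The packet layouts of FIGS. 4(a) to 4(c) can be illustrated at the RTP level with the short Python sketch below. It builds only the 12-byte fixed RTP header; the MAC, IP, and UDP headers are assumed to be added by the network stack when the bytes are sent to the same multicast group and port. The function names, the payload type value 0, and the zero-filled dummy payload are assumptions chosen for illustration, not values given in the specification.

```python
import struct

RTP_VERSION = 2  # version field of the fixed RTP header


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 0) -> bytes:
    """Build a 12-byte fixed RTP header with no padding, extension, or CSRCs."""
    first = RTP_VERSION << 6          # V=2, P=0, X=0, CC=0
    second = payload_type & 0x7F      # M=0, 7-bit payload type (assumed value)
    return struct.pack(
        "!BBHII", first, second, seq & 0xFFFF,
        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF,
    )


def audio_packet(seq: int, timestamp: int, ssrc: int, audio: bytes) -> bytes:
    """FIG. 4(a): headers followed by audio data in the payload."""
    return rtp_header(seq, timestamp, ssrc) + audio


def learning_packet(seq: int, timestamp: int, ssrc: int, dummy_len: int = 0) -> bytes:
    """FIG. 4(b) when dummy_len is 0 (headers only),
    FIG. 4(c) when dummy_len > 0 (zero-filled dummy payload)."""
    return rtp_header(seq, timestamp, ssrc) + bytes(dummy_len)
```

Because both packet types share `rtp_header`, they differ only in what follows the headers, which mirrors the relationship between the audio packet and the learning packet described above.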

Next, the operation on the receiving side will be described. When an audio receiving terminal 4 receives an IP multicast packet, the packet is first supplied to the received-data determination unit 16, which determines whether the received IP multicast packet is an audio packet or a learning packet.

As its criterion, the received-data determination unit 16 determines whether the input packet data is the data received immediately after transmission starts. The packet data received immediately after transmission starts is determined to be a learning packet, and packet data received after that is determined to be audio packets. A single learning packet is sufficient.

As another criterion, a dedicated flag may be provided in the extension portion of the RTP layer in the packet header configuration. The flag is set for a learning packet. The received-data determination unit 16 refers to the flag to determine whether the received packet is a learning packet or an audio packet, and determines that the received packet is a learning packet if the flag is set.
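Both criteria can be sketched in Python as follows. The class and function names are hypothetical, and the layout of the flag (a profile identifier of 0x4C50 and bit 0 of the first extension word) is an assumption made for this example, since the specification does not define where in the RTP extension the flag is placed. Only the general RTP header extension format (X bit, 16-bit profile identifier, 16-bit length in 32-bit words) is standard.

```python
import struct

LEARNING_PROFILE_ID = 0x4C50  # hypothetical profile identifier for the extension


def has_learning_flag(packet: bytes) -> bool:
    """Return True if the RTP header extension carries the (assumed) learning flag."""
    if len(packet) < 12:
        return False
    first = packet[0]
    if not first & 0x10:              # X bit clear: no header extension present
        return False
    ext_off = 12 + 4 * (first & 0x0F)  # skip the fixed header and any CSRC entries
    if len(packet) < ext_off + 8:
        return False
    profile, length_words = struct.unpack_from("!HH", packet, ext_off)
    if profile != LEARNING_PROFILE_ID or length_words < 1:
        return False
    (flags,) = struct.unpack_from("!I", packet, ext_off + 4)
    return bool(flags & 1)            # bit 0 is the assumed learning flag


class ReceivedDataClassifier:
    """Stand-in for the received-data determination unit 16."""

    def __init__(self, use_flag: bool) -> None:
        self.use_flag = use_flag
        self.seen_first = False

    def is_learning(self, packet: bytes) -> bool:
        if self.use_flag:
            return has_learning_flag(packet)
        # First criterion: only the packet right after transmission start is learning.
        first = not self.seen_first
        self.seen_first = True
        return first
```

A receiver would pass packets for which `is_learning` returns true to the discard path and all others to the audio reception path. The first-packet criterion relies on the learning packet arriving before any audio packet, whereas the flag criterion does not depend on ordering.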

A packet determined by the received-data determination unit 16 to be a learning packet is passed to the learning packet discard unit 15, which discards the learning packet as invalid data.

A packet determined by the received-data determination unit 16 to be an audio packet is passed to the audio reception unit 14. By receiving the audio packets, the audio reception unit 14 obtains the audio multicast data and reproduces the audio data. The audio reception unit 14 generates an analog audio signal from the audio multicast data and supplies it to the audio output unit 17, which outputs the audio.

The operation of the audio transmission apparatus 1 has been described above separately for the transmitting side and the receiving side. As a whole, the audio transmission apparatus 1 operates as follows. When transmission starts, a learning packet is first multicast from the audio transmitting terminal 3 to the IP network 2 and delivered to the plurality of audio receiving terminals 4. Through the transmission of this learning packet, the components of the IP network 2, such as hubs and other switches, perform learning, and as a result the network path is determined in the IP network 2. After the network path has been determined by the learning packet, audio packets are transmitted from the audio transmitting terminal 3 to the audio receiving terminals 4. The network path is therefore already fixed when transmission of the audio packets starts. Because network components such as hubs normally keep behaving in the same way, the audio packets of the stream continue to be sent along the same path as the learning packet. Since the audio packets travel along a network path determined in advance, the audio receiving terminals 4 start receiving audio almost simultaneously, with almost no difference in reception start time. Audio output at the audio receiving terminals 4 therefore also starts almost simultaneously.

This embodiment is advantageous in the following respects. Because no output timing information is embedded in the audio data, the audio receiving terminal 4 outputs received audio data immediately. In this situation, if the learning packet of this embodiment were not transmitted, transmission of the audio packets would start before the network path is determined, and the path would be determined while the audio packets are being sent. Because the network path determination time varies among terminals, the timing at which reception of the audio packets starts would also vary among terminals, and so would the audio output timing. Variation in the network path determination time is a major cause of variation in audio output timing. In this embodiment, by contrast, the learning packet is transmitted in advance, so the network path has already been determined when transmission of the audio packets starts. This cause of variation is therefore eliminated, the reception start timing of the audio packets is aligned, and variation in audio output timing is greatly reduced.
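The effect described here can be illustrated with a toy timing model in Python. The model is a simplifying assumption introduced only for this example and is not part of the specification: each terminal's path is taken to settle a fixed time after the first multicast packet is sent at t = 0, and a terminal is taken to receive only audio packets sent at or after its path has settled. The settle times and the 20 ms packet interval are made-up figures.

```python
import math
from typing import List, Sequence


def reception_start_times(
    settle_times_ms: Sequence[float],
    audio_start_ms: float,
    packet_interval_ms: float = 20.0,
) -> List[float]:
    """Time at which each terminal receives its first audio packet (toy model).

    Audio packets are sent at audio_start_ms, audio_start_ms + interval, ...
    A terminal receives the first packet sent at or after its path has settled.
    """
    starts = []
    for settle in settle_times_ms:
        if settle <= audio_start_ms:
            starts.append(audio_start_ms)
        else:
            k = math.ceil((settle - audio_start_ms) / packet_interval_ms)
            starts.append(audio_start_ms + k * packet_interval_ms)
    return starts


def spread(times: Sequence[float]) -> float:
    """Difference between the latest and earliest reception start."""
    return max(times) - min(times)


settle = [50.0, 120.0, 260.0]  # hypothetical per-terminal path settle times (ms)

# No learning packet: the first audio packet itself triggers learning at t = 0.
without_learning = reception_start_times(settle, audio_start_ms=0.0)

# Learning packet at t = 0, audio delayed until every path has settled.
with_learning = reception_start_times(settle, audio_start_ms=300.0)
```

Under these assumptions the terminals start receiving at 60, 120, and 260 ms without a learning packet (a spread of 200 ms), and all start at 300 ms when audio is held back until after learning (a spread of 0 ms).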

In this embodiment, the delay time from transmission of the learning packet to the start of transmission of the audio packets is set in advance. The delay time is set according to the network path determination time. More specifically, it is set longer than the typical network path determination time of the IP network 2 concerned by a predetermined margin. The delay time is, for example, several hundred milliseconds.

The audio transmission apparatus 1 according to the first embodiment of the present invention has been described above. According to this embodiment, the learning packet is transmitted before the audio packets, so the network path through the IP network is determined by the learning packet, and the audio packets then travel along the path that has been determined. Because the audio packets travel along an already determined network path, the difference in audio reception start time among the audio receiving terminals is reduced, and fluctuation in the reception start time is reduced. Consequently, even when audio is reproduced at a plurality of locations, fluctuation in audio reproduction among those locations can be reduced. Fluctuation in reception start and in audio output can be minimized even with a stream format that carries no output timing information. Displacement of the perceived sound position between adjacent speakers on the receiving side is suppressed, and audio is output satisfactorily.

In this embodiment, the learning packet used to determine the network path has the same header configuration as the audio packets. The network path can therefore be determined regardless of the type of the hubs and other switches that make up the network, which is a further advantage.

According to this embodiment, output timing information need not be embedded in the audio stream, so an increase in the scale of the apparatus is avoided. In addition, because the network path is determined with a simple configuration that uses a learning packet, the terminal devices do not need to support a path-determining protocol such as IGMP. The configuration of the terminals of the audio transmission apparatus can therefore be kept simple. This is particularly advantageous for an audio transmission apparatus 1 in a large-scale facility, to which this embodiment is suited. In such an audio transmission apparatus 1, the number of speakers is enormous and may reach several thousand to several tens of thousands. When migrating from a conventional analog system to a digital system, the capability required of each terminal device must therefore be kept as low as possible. If a high level of functionality were required of every terminal device, digitizing the audio transmission apparatus would itself become difficult in practice. Against this background, this embodiment is particularly advantageous because it reduces fluctuation in audio output with the simple configuration of transmitting a learning packet in advance.

Next, FIG. 5 shows an audio transmission apparatus according to a second embodiment of the present invention. Matters common to the first embodiment are not described again.

FIG. 5 shows the configuration of the audio transmission unit 12 in more detail for the audio transmission apparatus 101 of this embodiment. The audio transmission unit 12 consists of a delay generation unit 121 and a transmission unit 122. The transmission unit 122 generates audio packets and sends them to the IP network 2. The delay generation unit 121 generates the delay time from transmission of the learning packet to transmission of the audio packets.

The delay generation unit 121 controls the delay before the transmission unit 122 generates audio packets. Under the control of the delay generation unit 121, the transmission unit 122 starts transmitting audio packets after the delay time has elapsed from transmission of the learning packet by the learning packet transmission unit 13.

The delay time generated by the delay generation unit 121 can be changed freely. It is set as follows. When an IP multicast packet is sent to the IP network 2, the network path determination time differs depending on the configuration of the network and on the devices it contains. The path determination time of the target network is therefore measured; this may be done by actually transmitting an IP multicast packet. The delay time is then set so that it is longer than the actual path determination time of the IP network 2 by a predetermined amount. The delay time that has been set is stored in the audio transmitting terminal 3 and referred to by the delay generation unit 121, which generates the stored delay time.

The delay time may be set using an operation unit provided on the voice transmission terminal 3. Alternatively, the audio transmission device 1 may measure the route determination time and set the delay time automatically according to the measurement result.

Thus, in the present embodiment, the delay time is set based on the route determination time for the actual configuration of the IP network 2, so the delay time can be kept to the necessary minimum. Setting the delay time in this way is advantageous for the following reasons. If the delay time is too short, transmission of voice packets may start before the route is determined, in which case fluctuation in audio output timing may not be sufficiently reduced. If the delay time is long, on the other hand, the start of audio output is also delayed. Moreover, the present embodiment determines the network route in advance with a simple configuration that uses learning packets and does not control the network route with certainty, so if the delay time is long, the route that was established may be lost. By controlling the delay time, the present embodiment allows the delay time to be set as small as possible within the range exceeding the route determination time, and fluctuation in audio output is suitably reduced.
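The trade-off above can be illustrated as a simple range check on a candidate delay. This is only a sketch under stated assumptions: the function name `classify_delay` and the parameter `route_hold_ms` (the time after which an established route might be lost) are hypothetical, and the description gives no value or measurement method for such a limit.

```python
def classify_delay(delay_ms: int, route_time_ms: int, route_hold_ms: int) -> str:
    """Classify a candidate delay against the two failure cases in the text.

    route_time_ms: measured route determination time.
    route_hold_ms: hypothetical time after which the established route may be lost.
    """
    if not 0 <= route_time_ms < route_hold_ms:
        raise ValueError("expected 0 <= route_time_ms < route_hold_ms")
    if delay_ms <= route_time_ms:
        # Voice packets could start before the route is determined.
        return "too_short"
    if delay_ms >= route_hold_ms:
        # Audio output starts late and the established route may already be lost.
        return "too_long"
    return "ok"
```

Within the "ok" range, the text favors the smallest delay that still exceeds the route determination time.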

The audio transmission device 1 according to the second embodiment of the present invention has been described above. According to the present embodiment, the time interval between transmission of the learning packet and the start of transmission of the voice packets (the delay time described above) can be specified. The start time of the voice packets can therefore be set individually and appropriately so that voice packet transmission begins after the network route has been determined by the learning packet, and fluctuation in audio output can be suitably reduced. Regardless of the network system, the voice multicast data can be transmitted after the network route has been determined, and the difference in absolute reception time among the receiving terminals can be minimized.

The preferred embodiments of the present invention have been described above. The present invention is not limited to these embodiments, however, and those skilled in the art can of course modify them within the scope of the present invention.

As described above, the audio transmission device according to the present invention reduces fluctuation in the time at which audio reception starts, so that fluctuation in audio reproduction among multiple locations is reduced even when audio is reproduced at multiple locations. It is therefore useful as an audio transmission device for premises broadcasting and similar applications.

Block diagram of the audio transmission device according to the first embodiment of the present invention
Diagram showing the overall configuration of the audio transmission device
Flow chart showing the operation of the voice transmission terminal
(a) Diagram showing the packet structure of a voice packet; (b) diagram showing the packet structure of a learning packet; (c) diagram showing another example of the packet structure of a learning packet
Block diagram of the audio transmission device according to the second embodiment of the present invention

Explanation of symbols

1 Audio transmission device
2 IP network
3 Voice transmission terminal
4 Voice receiving terminal
5 Sound source
6 Speaker
11 Voice input unit
12 Voice transmission unit
13 Learning packet transmission unit
14 Voice receiving unit
15 Learning packet discarding unit
16 Received data determination unit
17 Voice output unit

Claims

A voice transmission device comprising: a voice transmitting terminal that sends voice multicast data to an IP network; and a plurality of voice receiving terminals that receive, through the IP network, the voice multicast data sent from the voice transmitting terminal, wherein
the voice transmitting terminal includes voice packet transmitting means for transmitting voice packets of the voice multicast data, and learning packet transmitting means for multicasting, before transmission of the voice packets is started, a learning packet having a header structure similar to that of the voice packets, and
each of the plurality of voice receiving terminals includes learning packet discarding means for discarding the learning packet received before the start of reception of the voice packets.

The voice transmission device according to claim 1, wherein the time interval between transmission of the learning packet by the learning packet transmitting means and the start of transmission of the voice packets by the voice packet transmitting means can be specified.