JP2012124689A

JP2012124689A - Communication system, transmission side device, reception side device

Info

Publication number: JP2012124689A
Application number: JP2010273097A
Authority: JP
Inventors: Hideo Natori; 英男名取
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2010-12-08
Filing date: 2010-12-08
Publication date: 2012-06-28

Abstract

PROBLEM TO BE SOLVED: To reduce sound distortion and sound skipping when the allowable storage amount of a jitter absorption buffer, which was increased during communication network congestion, is returned to its default value after the congestion has cleared.

SOLUTION: A transmission side device 1 comprises: a silent state information generating unit 11 which generates silent state information in accordance with the volume level of voice data; a voice packet generating unit 12 which packetizes the voice data with the silent state information added; and a packet delivering unit 13 which transmits the packet to a reception side device 3. The reception side device 3 comprises: a packet receiving unit 31 which receives the packet and acquires the voice data and the silent state information; a jitter absorption buffer 33 which stores the voice data to adjust a delay time; and a jitter buffer adjusting unit 32 which increases the allowable storage amount when the storage amount exceeds the specified allowable storage amount and which, when the storage amount stays within the specified allowable storage amount for a certain time, discards the voice data indicated as silent and returns the allowable storage amount to its default value.

Description

The present invention relates to a communication system, such as a VoIP (Voice over IP) communication system equipped with a VoIP terminal adapter function for providing an IP (Internet Protocol) telephone service, and to a transmission side device and a reception side device.

In recent years, telephone services provided by telecommunications carriers have used VoIP technology, in which analog voice data is digitized, converted into IP packets, and transferred over the same packet communication network (such as the Internet) that carries data communication for PCs and the like.
VoIP technology has the advantage that no dedicated voice equipment is required in the communication network, which can be shared with the data communication network, so the service can be provided inexpensively. On the other hand, when congestion occurs in the communication network, jitter arises in the packet arrival times at the reception side device. When the reception CODEC of the reception side device then reproduces the continuous voice data, sound skipping and similar artifacts occur and the voice quality deteriorates.

Therefore, a conventional reception side device has a jitter absorption buffer in the stage preceding the reception CODEC in order to absorb jitter caused by network congestion. Received voice data is accumulated in the jitter absorption buffer up to an allowable accumulation amount and is then transferred to the reception CODEC. This suppresses underruns caused by jitter.

However, when the jitter increases further, an underrun occurs in the jitter absorption buffer and sound skipping results. Therefore, when an underrun is detected, the allowable accumulation amount of the jitter absorption buffer is increased so that subsequent underruns are suppressed and the voice quality is maintained.

Although this control maintains the voice quality, the delay grows by the amount by which the allowable accumulation amount was increased, so conversation becomes unnatural. Therefore, when the accumulation amount of the jitter absorption buffer stays within the allowable accumulation amount for a certain time, the network congestion is regarded as having cleared, and the allowable accumulation amount of the jitter absorption buffer is returned to its default value. At this time, the accumulation amount of the jitter absorption buffer is reduced by raising the reproduction speed of the reception CODEC or by unconditionally discarding data destined for the jitter absorption buffer (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Application Laid-Open No. 2004-282692

As described above, in the method disclosed in Patent Document 1, when the allowable accumulation amount of the jitter absorption buffer is returned to the default value, either the reproduction speed of the reception CODEC is raised or data destined for the jitter absorption buffer is discarded unconditionally. Both approaches, however, affect the voice quality by causing distortion or skipping in the reproduced sound.

The present invention has been made to solve the above problem. Its object is to provide a communication system, a transmission side device, and a reception side device that can suppress effects on voice quality, such as voice distortion and sound skipping, when the allowable accumulation amount of the jitter absorption buffer, which was increased when network congestion occurred, is returned to the default value for delay optimization after the congestion has cleared.

A communication system according to the present invention comprises a transmission side device that packetizes and transmits voice data, and a reception side device that receives packets from the transmission side device and converts them into voice data. The transmission side device comprises: a silence state information generation unit that generates silence state information based on the volume level of the voice data; a voice packet generation unit that packetizes the corresponding voice data with the silence state information generated by the silence state information generation unit added; and a packet sending unit that transfers the packets generated by the voice packet generation unit to the corresponding reception side device. The reception side device comprises: a packet reception unit that receives the packets transferred by the packet sending unit and acquires the voice data and the silence state information; a jitter absorption buffer that accumulates the voice data acquired by the packet reception unit and adjusts the delay time; and a jitter buffer adjustment unit that increases the allowable accumulation amount when the amount of voice data accumulated in the jitter absorption buffer exceeds the set allowable accumulation amount, and that, when the amount of voice data accumulated in the jitter absorption buffer stays within the set allowable accumulation amount for a certain time, discards voice data whose silence state information acquired by the packet reception unit indicates a silent state and returns the allowable accumulation amount to the default value.

According to the present invention, with the above configuration, effects on voice quality such as voice distortion and sound skipping can be suppressed when the allowable accumulation amount of the jitter absorption buffer, which was increased when network congestion occurred, is returned to the default value for delay optimization after the congestion has cleared.

FIG. 1 is a diagram showing the configuration of the VoIP communication system according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the jitter buffer adjustment operation in the VoIP communication system according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing the reduction of the accumulation amount of the jitter absorption buffer in Embodiment 1 of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing the configuration of a VoIP communication system according to Embodiment 1 of the present invention.
As shown in FIG. 1, the VoIP communication system comprises a transmission side device 1, a communication network 2 (for example, the Internet), and a reception side device 3.

The transmission side device 1 digitizes analog voice data received from a telephone 4 and converts it into IP packets. The transmission side device 1 comprises a transmission CODEC (silence state information generation unit) 11, a voice packet generation unit 12, and a packet sending unit 13.

The transmission CODEC 11 encodes the analog voice data from the telephone 4 into digital voice data according to a scheme defined in ITU-T G.711, G.729a, or the like.
When encoding the analog voice data, the transmission CODEC 11 also detects the volume level of the voice data and compares it with a predetermined threshold to generate silence state information. If the volume level is below the threshold, the transmission CODEC 11 judges the voice data to be silent. The silence state information generated by the transmission CODEC 11, which indicates whether the voice data is silent, is transferred to the voice packet generation unit 12 together with the digital voice data.
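The threshold comparison performed by the transmission CODEC 11 might be sketched as follows. This is only an illustration: the description does not say how the volume level is measured, so the RMS level in dBFS, the 16-bit PCM input, the threshold value, and the names `frame_level_dbfs` and `is_silent` are all assumptions.

```python
import math
from typing import Sequence

# Hypothetical threshold; the description only says "a predetermined threshold".
SILENCE_THRESHOLD_DBFS = -50.0


def frame_level_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Return the RMS level of one frame of 16-bit PCM samples in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def is_silent(samples: Sequence[int],
              threshold_dbfs: float = SILENCE_THRESHOLD_DBFS) -> bool:
    """Silence state information: True when the level is below the threshold."""
    return frame_level_dbfs(samples) < threshold_dbfs
```

A frame of 160 samples corresponds to 20 ms at 8 kHz, which is a common packetization interval, but the frame length is likewise an assumption here.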

The voice packet generation unit 12 converts the digital voice data from the transmission CODEC 11 into IP packets at fixed intervals in accordance with RTP (Real-time Transport Protocol).
When converting the digital voice data into IP packets, the voice packet generation unit 12 also converts the silence state information transferred with the digital voice data into a DSCP (DS Code Point) value and embeds it in the DS (Differentiated Services) field of the IP packet. Specifically, a packet whose silence state information indicates a silent state is marked as a low-priority packet (for example, DSCP value = 0x22), and any other packet is marked as a high-priority packet (for example, DSCP value = 0x2e).
The IP packets generated by the voice packet generation unit 12 are transferred to the packet sending unit 13.
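As a rough sketch of this marking step, the mapping from silence state information to the DS field could look like the following. The two DSCP values are the examples given in the description; the function names are hypothetical. The only outside fact relied on is that the 6-bit DSCP occupies the upper six bits of the 8-bit DS field.

```python
DSCP_SILENT = 0x22  # low-priority example value from the description
DSCP_VOICE = 0x2E   # high-priority example value from the description


def dscp_for_frame(silent: bool) -> int:
    """Choose the DSCP value that encodes the silence state information."""
    return DSCP_SILENT if silent else DSCP_VOICE


def ds_field(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the DS byte (lower 2 bits left 0)."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2
```

On platforms that support it, the resulting byte could be applied to a UDP socket through the `IP_TOS` socket option, although the description does not specify how the transmission side device writes the field.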

The packet sending unit 13 transfers the IP packets from the voice packet generation unit 12 to the corresponding reception side device 3 via the communication network 2.
The communication network 2 transfers the IP packets from the transmission side device 1 to the corresponding reception side device 3, and is the same packet communication network that carries data communication for PCs and the like.

The reception side device 3 receives IP packets from the transmission side device 1 via the communication network 2, acquires the digital voice data, converts it into analog voice data, and sends it to a telephone 5. The reception side device 3 comprises a packet reception unit 31, a jitter buffer adjustment unit 32, a jitter absorption buffer 33, and a reception CODEC 34.

The packet reception unit 31 receives IP packets from the transmission side device 1 via the communication network 2 and acquires the digital voice data. The packet reception unit 31 also examines the DSCP value of each received IP packet to acquire the silence state information. The digital voice data and silence state information acquired by the packet reception unit 31 are transferred to the jitter buffer adjustment unit 32.
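The DSCP inspection on the reception side could be sketched as below, assuming the packet reception unit has access to the raw IPv4 header (an assumption; the description does not say at which layer the value is read). The function names are hypothetical, and `DSCP_SILENT` repeats the example value 0x22 from the description.

```python
DSCP_SILENT = 0x22  # example low-priority value used to mark silent packets


def dscp_from_ipv4_header(header: bytes) -> int:
    """Extract the 6-bit DSCP from the second byte of an IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not a complete IPv4 header")
    return header[1] >> 2


def silence_from_dscp(dscp: int) -> bool:
    """Recover the silence state information from the DSCP value."""
    return dscp == DSCP_SILENT
```

Note that DSCP values may be rewritten by routers along the path, so this encoding presumes a network that preserves the marking end to end.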

The jitter buffer adjustment unit 32 adjusts the amount of voice data to be accumulated in the jitter absorption buffer 33 (the allowable accumulation amount). The default value of the allowable accumulation amount is set, for example, to a value that can absorb the average jitter of the communication network 2. When the jitter buffer adjustment unit 32 judges that jitter has increased because of congestion in the communication network 2 and has exceeded the amount of jitter that the set allowable accumulation amount can absorb, it increases the allowable accumulation amount. When the accumulation amount of the jitter absorption buffer 33 stays within the set allowable accumulation amount for a certain time, the jitter buffer adjustment unit 32 returns the allowable accumulation amount to the default value. In doing so, the jitter buffer adjustment unit 32 discards voice data whose silence state information acquired by the packet reception unit 31 indicates a silent state, thereby reducing the accumulation amount of the jitter absorption buffer 33 and returning the allowable accumulation amount to the default value.

The jitter absorption buffer 33 absorbs jitter by accumulating the digital voice data that arrives from the packet reception unit 31 via the jitter buffer adjustment unit 32, adjusting the delay time, and transferring the data to the reception CODEC 34. The jitter absorption buffer 33 starts the transfer to the reception CODEC 34 only after it has accumulated digital voice data up to the allowable accumulation amount set by the jitter buffer adjustment unit 32.
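The behavior of starting playout only once the allowable accumulation amount has been reached might be sketched as follows. This is a minimal illustration under assumptions: the amount is counted in frames rather than bytes or milliseconds, and the class and attribute names are hypothetical. Underrun handling is deliberately left to the caller.

```python
from collections import deque
from typing import Deque, Optional


class JitterBuffer:
    """Frame queue that withholds playout until it has been primed."""

    def __init__(self, allowable_frames: int) -> None:
        if allowable_frames < 1:
            raise ValueError("allowable_frames must be at least 1")
        self.allowable_frames = allowable_frames
        self._frames: Deque[bytes] = deque()
        self._primed = False

    def __len__(self) -> int:
        return len(self._frames)

    def push(self, frame: bytes) -> None:
        """Store one received frame; prime the buffer once it is full enough."""
        self._frames.append(frame)
        if len(self._frames) >= self.allowable_frames:
            self._primed = True

    def pop(self) -> Optional[bytes]:
        """Return the next frame for the decoder, or None if none is available."""
        if not self._primed or not self._frames:
            return None
        return self._frames.popleft()
```

With `JitterBuffer(3)`, the first two calls to `push` leave `pop` returning `None`; after the third frame arrives, frames are released in arrival order.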

The reception CODEC 34 decodes the digital voice data from the jitter absorption buffer 33 into analog voice data. The analog voice data decoded by the reception CODEC 34 is sent to the telephone 5.

Next, the jitter buffer adjustment operation in the VoIP communication system configured as described above will be described.
FIG. 2 is a flowchart showing the jitter buffer adjustment operation in the VoIP communication system according to Embodiment 1 of the present invention, and FIG. 3 is a diagram showing the reduction of the accumulation amount of the jitter absorption buffer 33 in Embodiment 1 of the present invention.

Note that the IP packets sent from the transmission side device 1 contain silence state information in addition to the digital voice data. That is, the transmission CODEC 11 generates silence state information based on the volume level of the voice data when it encodes the analog voice data, and the voice packet generation unit 12 adds the silence state information when it converts the digital voice data into IP packets.

In the jitter buffer adjustment operation of the VoIP communication system, as shown in FIG. 2, the jitter buffer adjustment unit 32 first judges, in the normal state, whether an underrun has occurred in the jitter absorption buffer 33 (steps ST21 and ST22). That is, when jitter increases because of congestion in the communication network 2 and IP packets continue to arrive for a certain time at a rate lower than the reproduction rate of the reception CODEC 34, the jitter buffer adjustment unit 32 judges that the amount of jitter the jitter absorption buffer 33 can absorb has been exceeded and that an underrun has occurred.

When the jitter buffer adjustment unit 32 judges in step ST22 that an underrun has occurred in the jitter absorption buffer 33, the reception side device 3 shifts to the network congestion state and increases the allowable accumulation amount of the jitter absorption buffer 33 (steps ST23 and ST24). That is, the jitter buffer adjustment unit 32 uses a PLC (Packet Loss Concealment) function, such as that defined in ITU-T G.711 Appendix I, to interpolate from the most recently reproduced voice data (when interpolation is repeated consecutively, the amplitude is reduced by 20% from the second time onward) and thereby prevent sound skipping. It then increases the allowable accumulation amount of the jitter absorption buffer 33 (for example, by the amount of data interpolated by the PLC function). In the network congestion state, voice data is interpolated by the PLC function every few packets so that voice data accumulates in the jitter absorption buffer 33 up to the increased allowable accumulation amount. Once voice data has accumulated up to the allowable accumulation amount, interpolation by the PLC function is stopped.
This processing prevents underruns in the jitter absorption buffer 33.
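The interpolation with amplitude reduction can be illustrated with a deliberately simplified stand-in. The sketch below merely repeats the last reproduced frame and is not the ITU-T G.711 Appendix I algorithm. It also assumes one reading of the description, namely that the 20% reduction compounds with each consecutive concealed frame after the first; the function name and the integer sample format are hypothetical.

```python
from typing import List

ATTENUATION = 0.8  # 20% amplitude reduction, per the description


def conceal(last_frame: List[int], consecutive: int) -> List[int]:
    """Synthesize a substitute frame from the last reproduced frame.

    consecutive is 1 for the first concealed frame after real audio,
    2 for the next one, and so on. The first repeat keeps full amplitude;
    each later repeat is attenuated by a further 20% (assumed reading).
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    gain = ATTENUATION ** (consecutive - 1)
    return [round(sample * gain) for sample in last_frame]
```

Each concealed frame played in place of a late packet lets one more real frame remain in the buffer, which is how the accumulation amount can grow toward the increased allowable accumulation amount.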

Next, the jitter buffer adjustment unit 32 detects the accumulation amount of the jitter absorption buffer 33 and judges whether it has stayed within the allowable accumulation amount for a certain time (step ST25).
When the jitter buffer adjustment unit 32 judges in step ST25 that the accumulation amount of the jitter absorption buffer 33 exceeds the allowable accumulation amount, the process returns to step ST24 and the allowable accumulation amount is increased.

On the other hand, when the jitter buffer adjustment unit 32 judges in step ST25 that the accumulation amount of the jitter absorption buffer 33 has stayed within the allowable accumulation amount for a certain time, it judges whether the accumulation amount of the jitter absorption buffer 33 is at the default value (step ST26). If the allowable accumulation amount remains increased from the default value, sound skipping is eliminated by the added accumulation, but the increased delay also persists, and the resulting lag makes conversation unnatural. The allowable accumulation amount of the jitter absorption buffer 33 is therefore returned to the default value.
When the jitter buffer adjustment unit 32 judges in step ST26 that the accumulation amount of the jitter absorption buffer 33 is at the default value, the process returns to step ST21 and the reception side device 3 shifts to the normal state.

On the other hand, when the jitter buffer adjustment unit 32 judges in step ST26 that the accumulation amount of the jitter absorption buffer 33 is not at the default value, that is, that it is larger than the default value, the reception side device 3 shifts to the delay optimization processing state in order to return the allowable accumulation amount to the default value (step ST27).

When returning the allowable accumulation amount to the default value in this delay optimization processing state, a conventional reception side device reduces the accumulation amount of the jitter absorption buffer by raising the reproduction speed of the reception CODEC or by unconditionally discarding data destined for the jitter absorption buffer, and either method causes sound distortion or sound skipping.

Therefore, the reception side device 3 according to Embodiment 1 discards silent data when returning the allowable accumulation amount to the default value in the delay optimization processing state (step ST28). That is, the packet reception unit 31 examines the DSCP value of each received IP packet and acquires the silence state information. The jitter buffer adjustment unit 32 then discards silent data based on the acquired silence state information. Voice data whose silence state information does not indicate a silent state is transferred to the jitter absorption buffer 33, as shown in FIG. 3(a). Voice data whose silence state information indicates a silent state is discarded without being transferred to the jitter absorption buffer 33, as shown in FIG. 3(b). As a result, the accumulation amount of the jitter absorption buffer 33 can be reduced, as shown in FIG. 3(c), while the effect on voice quality is kept to a minimum.
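The routing decision of step ST28 can be sketched as a small function. It assumes the accumulation amount is counted in frames and that "delay optimization is active" can be approximated by the buffer holding more than the default amount; the function name and parameters are hypothetical.

```python
from collections import deque
from typing import Deque


def route_frame(buffer: Deque[bytes], frame: bytes, silent: bool,
                default_amount: int) -> bool:
    """Store or discard one received frame during delay optimization.

    Returns True if the frame was stored, False if it was discarded.
    Silent frames are dropped only while the buffer holds more than
    default_amount frames; non-silent frames are always stored.
    """
    if silent and len(buffer) > default_amount:
        return False
    buffer.append(frame)
    return True
```

Because the reception CODEC keeps draining the buffer at its normal rate while silent frames are withheld, the accumulation amount falls without any change to the reproduced speech segments.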

Next, the jitter buffer adjustment unit 32 judges whether the accumulation amount of the jitter absorption buffer 33 is at the default value (step ST29).
When the jitter buffer adjustment unit 32 judges in step ST29 that the accumulation amount of the jitter absorption buffer 33 is at the default value, the process returns to step ST21 and the reception side device 3 shifts to the normal state.

On the other hand, when the jitter buffer adjustment unit 32 judges in step ST29 that the accumulation amount of the jitter absorption buffer 33 is not at the default value, that is, that it still exceeds the default value, the process returns to step ST28 and the discarding of silent data continues. Thereafter, silent data destined for the jitter absorption buffer 33 is discarded every few packets, and once the accumulation amount of the jitter absorption buffer 33 has returned to the default value of the allowable accumulation amount, the reception side device 3 shifts to the normal state.
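The flow of steps ST21 to ST29 can be summarized as a three-state machine. The sketch below is one possible reading of FIG. 2 as described in the text; the state names, the boolean inputs, and the function `next_state` are hypothetical, and the timing of the "certain time" check is abstracted into the `stable` flag.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()      # ST21: normal state
    CONGESTED = auto()   # ST23/ST24: allowable accumulation amount increased
    OPTIMIZING = auto()  # ST27/ST28: silent data being discarded


def next_state(state: State, *, underrun: bool, stable: bool,
               at_default: bool) -> State:
    """Return the state that follows one pass through the flowchart.

    underrun:   an underrun was detected (ST22)
    stable:     accumulation stayed within the allowable amount for the
                required time (ST25)
    at_default: accumulation amount equals the default value (ST26, ST29)
    """
    if state is State.NORMAL:
        return State.CONGESTED if underrun else State.NORMAL
    if state is State.CONGESTED:
        if not stable:
            return State.CONGESTED  # ST25 -> ST24: increase again
        return State.NORMAL if at_default else State.OPTIMIZING
    # OPTIMIZING: keep discarding silent data until back at the default
    return State.NORMAL if at_default else State.OPTIMIZING
```

For example, a call that suffers one underrun and then stabilizes with an enlarged buffer passes through NORMAL, CONGESTED, and OPTIMIZING before returning to NORMAL.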

As described above, according to Embodiment 1, the transmission side device 1 detects a silent state based on the volume level of the voice data, adds silence state information to the IP packet, and transfers the packet to the reception side device 3. The reception side device 3 acquires the added silence state information and, when the communication network 2 has recovered from congestion, discards only the silent voice data among the data destined for the jitter absorption buffer 33. With this configuration, when the allowable accumulation amount of the jitter absorption buffer 33 is returned to the default value for delay optimization after congestion recovery, voice distortion and sound skipping can be suppressed, the effect on voice quality is kept to a minimum, and the end-to-end delay can be optimized.

Embodiment 2.
In Embodiment 1, the voice packet generation unit 12 converts the silence state information into a DSCP value and embeds it in the DS field of the IP packet. Alternatively, when the silence state information indicates a silent state, the IP packet may be given a packet length that requires a padding area, and the silence state information may be embedded in the padding area.
Using the padding area to add the silence state information to the IP packet in this way can be expected to avoid placing constraints on services that use priority control based on DSCP values.

Embodiment 3.
In Embodiment 1, the voice packet generation unit 12 converts the silence state information into a DSCP value and embeds it in the DS field of the IP packet. Alternatively, when the silence state information indicates a silent state, a VLAN (Virtual Local Area Network) tag may be attached to the IP packet, and the silence state information may be converted into the COS value of the VLAN tag so that the packet is transferred as a non-priority packet.
Using the COS value of the VLAN tag to add the silence state information to the IP packet in this way can be expected to provide the same effect as Embodiment 2.
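For illustration, carrying the silence state information in the COS value of a VLAN tag could be sketched as below. The layout follows the standard IEEE 802.1Q tag (a 16-bit TPID of 0x8100 followed by a 16-bit field holding the 3-bit priority, one drop-eligible bit, and the 12-bit VLAN ID). The particular COS values chosen for silent and non-silent packets, and the function names, are assumptions; the description does not give them.

```python
import struct

TPID_8021Q = 0x8100
COS_SILENT = 1  # hypothetical non-priority value for silent packets
COS_VOICE = 5   # hypothetical priority value for non-silent packets


def vlan_tag(cos: int, vlan_id: int, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag carrying the given COS value."""
    if not 0 <= cos <= 7:
        raise ValueError("COS must fit in 3 bits")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (cos << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def cos_from_tag(tag: bytes) -> int:
    """Read the COS value back from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13
```

A reception side device using this variant would compare `cos_from_tag(tag)` with `COS_SILENT` in place of the DSCP check of Embodiment 1.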

Within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

DESCRIPTION OF SYMBOLS: 1 transmission side device; 2 communication network; 3 reception side device; 4, 5 telephone; 11 transmission CODEC (silence state information generation unit); 12 voice packet generation unit; 13 packet sending unit; 31 packet reception unit; 32 jitter buffer adjustment unit; 33 jitter absorption buffer; 34 reception CODEC.

Claims

1. A communication system comprising a transmission side device that packetizes and transmits voice data, and a reception side device that receives packets from the transmission side device and converts them into voice data, wherein
the transmission side device comprises:
a silence state information generation unit that generates silence state information based on a volume level of the voice data;
a voice packet generation unit that packetizes the corresponding voice data with the silence state information generated by the silence state information generation unit added; and
a packet sending unit that transfers the packet generated by the voice packet generation unit to the corresponding reception side device, and
the reception side device comprises:
a packet reception unit that receives the packet transferred by the packet sending unit and acquires the voice data and the silence state information;
a jitter absorption buffer that accumulates the voice data acquired by the packet reception unit and adjusts a delay time; and
a jitter buffer adjustment unit that increases a set allowable accumulation amount when the amount of voice data accumulated in the jitter absorption buffer exceeds the allowable accumulation amount, and that, when the amount of voice data accumulated in the jitter absorption buffer stays within the set allowable accumulation amount for a certain time, discards voice data whose silence state information acquired by the packet reception unit indicates a silent state and returns the allowable accumulation amount to a default value.

2. The communication system according to claim 1, wherein the voice packet generation unit converts the silence state information into a DSCP (DS Code Point) value and embeds it in the DS (Differentiated Services) field of the packet.

3. The communication system according to claim 1, wherein the voice packet generation unit embeds the silence state information in a padding area of the packet.

4. The communication system according to claim 1, wherein the voice packet generation unit attaches a VLAN (Virtual Local Area Network) tag to the packet, converts the silence state information into a COS value of the VLAN tag, and embeds it.

5. A transmission side device that packetizes and transmits voice data, comprising:
a silence state information generation unit that generates silence state information based on a volume level of the voice data;
a voice packet generation unit that packetizes the corresponding voice data with the silence state information generated by the silence state information generation unit added; and
a packet sending unit that transfers the packet generated by the voice packet generation unit to a corresponding reception side device.

6. The transmission side device according to claim 5, wherein the voice packet generation unit converts the silence state information into a DSCP value and embeds it in the DS field of the packet.

7. The transmission side device according to claim 5, wherein the voice packet generation unit embeds the silence state information in a padding area of the packet.

8. The transmission side device according to claim 5, wherein the voice packet generation unit attaches a VLAN tag to the packet, converts the silence state information into a COS value of the VLAN tag, and embeds it.

9. A reception side device that receives packets and converts them into voice data, comprising:
a packet reception unit that receives a packet to which silence state information, indicating whether the voice data is silent, was added when the voice data was packetized, and acquires the voice data and the silence state information;
a jitter absorption buffer that accumulates the voice data acquired by the packet reception unit and adjusts a delay time; and
a jitter buffer adjustment unit that increases a set allowable accumulation amount when the amount of voice data accumulated in the jitter absorption buffer exceeds the allowable accumulation amount, and that, when the amount of voice data accumulated in the jitter absorption buffer stays within the set allowable accumulation amount for a certain time, discards voice data whose silence state information acquired by the packet reception unit indicates a silent state and returns the allowable accumulation amount to a default value.