JP2000307654A

JP2000307654A - Voice packet transmitting system

Info

Publication number: JP2000307654A
Application number: JP11595799A
Authority: JP
Inventors: Toru Kikuchi; 徹菊地
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1999-04-23
Filing date: 1999-04-23
Publication date: 2000-11-02

Abstract

PROBLEM TO BE SOLVED: To correct the level discontinuity between silent and voiced sections in silence control, without impairing effective use of the transmission line, by inserting an interpolation frame generated by an interpolation frame generating means at the boundary between a voiced section and a silent section of the voice frames. SOLUTION: Based on the determination result of a silence boundary determining means, which determines the boundary between a voiced section and a silent section of received voice frames, an interpolation frame generated by an interpolation frame generating means is inserted at that boundary. In this system, a voice frame sequence stored in a voice frame buffer 303 undergoes silence control detection by a silence control frame detection unit 304; background noise frames generated by a background noise frame generation unit 111 and interpolation frames produced by a voice frame interpolation unit 305 are inserted to form the output voice frame sequence, which is then sent to a voice decompression unit 108 to restore the voice signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice packet transmission system that transmits a voice signal over a packet network, and more particularly to a voice packet transmission system that performs silence control, in which no packets are transmitted during silence.

[0002]

[Description of the Related Art] Conventionally, a voice packet transmission system that transmits a voice signal over a packet network and performs silence control detects voiced and silent sections on the transmitting side, sends no packets during silent sections, and, for example, reproduces background noise on the receiving side, thereby making effective use of the transmission line.

[0003] However, the sound pressure level at the boundary between a silent section and a voiced section changes continuously on the input side but discontinuously on the output side, so the listener perceives an unnatural break at the beginning and end of speech.

[0004] To deal with breaks and noise at the boundary between a silent section and a voiced section, several methods have been proposed: initializing the differential encoding, as disclosed in Japanese Patent Laid-Open No. 5-292121; outputting prediction coefficients as the first packet of a voiced section so that the coder states match, as disclosed in Japanese Patent Laid-Open No. 9-116571; and delaying the voice so that the silent section immediately preceding a voiced section is framed and transmitted as part of the voiced section.

[0005]

[Problems to be Solved by the Invention] However, the techniques described in Japanese Patent Laid-Open Nos. 5-292121 and 9-116571 are means for correcting decoder errors caused by discarding silent packets, and these conventional examples cannot correct the discontinuity in sound pressure level caused by silence control.

[0006] Moreover, treating the silent section preceding a voiced section as voiced works against the effective use of the transmission line that silence compression is meant to achieve. In addition, the discontinuity between the artificially generated background noise and the actually transmitted voice frames of the silent section means that silence control still sounds unnatural.

[0007] An object of the present invention is to provide a voice packet transmission system that, in silence control, corrects the level discontinuity between silent and voiced sections without impairing effective use of the transmission line, so that the beginning and end of speech do not sound unnatural.

[0008]

[Means for Solving the Problems] The present invention comprises: voice encoding means for converting a voice signal into digitized voice frames; voice decoding means for converting the voice frames into a voice signal; packet transmitting/receiving means for packetizing the voice frames and transmitting and receiving them; silence detecting means for detecting the presence or absence of voice based on the voice signal; silence control means for controlling transmission and reception of voice frames by the packet transmitting/receiving means based on the detection result of the silence detecting means; silence boundary determining means for determining the boundary between a voiced section and a silent section of the received voice frames; and interpolation frame generating means for generating at least one interpolation frame that fills in between voice frame sequences. Based on the determination result of the silence boundary determining means, the system inserts at least one interpolation frame generated by the interpolation frame generating means at the boundary between the voiced section and the silent section of the voice frames.

[0009]

[Embodiments and Examples of the Invention] [First Embodiment] FIG. 1 is a block diagram schematically showing a packet transmission system according to one embodiment of the present invention.

[0010] The embodiment includes: a voice input unit 101 that receives an analog voice signal from a microphone, a telephone line, or the like; an A/D conversion unit 102 that converts the analog voice signal into a PCM digital voice signal; a voice compression unit 103 that converts the PCM digital voice signal into compression-encoded voice frames; a packet transmission unit 104 that, among other things, assembles the voice frames into packets transferable over the packet network; a silence detection unit 105 that detects silent sections of the voice signal; a packet transmission/reception interface 106 that transmits packets to and receives packets from the packet network 120; a packet reception unit 107 that, among other things, disassembles received packets into voice frames; a background noise frame generation unit 111 that generates background noise frames; a voice decompression unit 108 that decompresses the compressed voice frames and converts them into a PCM digital voice signal; a D/A conversion unit 109 that converts the PCM digital voice signal into an analog voice signal; and a voice output unit 110 that outputs the analog voice signal to a speaker, a telephone line, or the like.

[0011] FIG. 2 is a block diagram showing the configuration of the packet transmission unit 104.

[0012] The packet transmission unit 104 includes: a voice frame buffer 201 that temporarily stores the voice frames converted by the voice compression unit; a silence control frame generation unit 204 that generates frames for silence control based on the detection result from the silence detection unit 105; a packet assembly unit 202 that obtains from the voice frame buffer the voice frames needed for packetization and assembles a packet by adding header information such as a time stamp, a sequence number, and a destination address; and a packet transmission buffer 203 that hands the assembled packets over to the packet interface (I/F) 106.

[0013] FIG. 3 is a block diagram showing the configuration of the packet reception unit 107.

[0014] The packet reception unit 107 includes: a packet reception buffer 301 that stores packets received from the packet interface 106 and, based on the header information, controls packet reordering and processing timing; a packet disassembly unit 302 that strips the header information from a packet and extracts the voice frames; a voice frame buffer 303 that stores the voice frames; a silence control frame detection unit 304 that detects silence control frames, obtains background noise frames from the background noise frame generation unit 111, and stores them in the voice frame buffer 303; and a voice frame interpolation unit 305 that predicts and calculates a voice frame at an arbitrary position from the sequence of voice frames stored in the voice frame buffer.

[0015] Next, the operation of the above embodiment will be described.

[0016] FIG. 4 is a diagram showing the voice signal 401.

[0017] As shown in FIG. 4, the voice signal 401 is converted into a voice frame sequence 402 by the voice compression unit 103. Possible voice compression schemes include the hybrid schemes specified in ITU-T Recommendations G.728, G.729, G.723.1, and the like.

[0018] FIG. 6 is a flowchart showing the encoding cycle processing in a voiced section in the above embodiment.

[0019] In the flowchart of FIG. 6, for a voice frame that has undergone voice compression (S601) in the voice compression unit 103, for example voice frame F3 of the voice frame sequence 402, the silence detection unit 105 does not detect silence (S602, No), so voice frame F3 is stored in the voice frame buffer 201 (S603).

[0020] When, as at the timing of voice frame F8 in the voice frame sequence 402, the input voice signal 401 falls below the reference value L1 and a change to silence is detected (S602, Yes), voice frame F8 is discarded (S611), the silence control frame generation unit 204 stores in the voice frame buffer 201 a silence control on-frame Fon indicating the change from a voiced section to a silent section (S612), and the state transitions to the silent section (S613).

[0021] FIG. 7 is a flowchart showing the encoding cycle processing in a silent section in the above embodiment.

[0022] First, for a voice frame that has undergone voice compression in the voice compression unit 103 (S701), for example voice frame F12 of the voice frame sequence 402, the silence detection unit 105 does not detect voice (S702, No), so the frame is not stored in the voice frame buffer 201 and is discarded (S703).

[0023] When, as at the timing of voice frame F18 in the voice frame sequence 402, the input voice signal 401 exceeds the reference value L1 and a change to a voiced section is detected (S702, Yes), the silence control frame generation unit 204 stores in the voice frame buffer 201 a silence control off-frame Foff indicating the change from a silent section to a voiced section (S711), voice frame F18 is then stored in the voice frame buffer 201 (S712), and the state transitions to the voiced section (S713).
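The transmit-side behaviour of FIGS. 6 and 7 can be read as a two-state machine. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the function name `apply_silence_control`, the `(name, level)` input format, and the choice to start in the voiced state are all hypothetical, and a single per-frame level stands in for the comparison of the input signal against the reference value L1.

```python
from typing import List, Tuple

FON = "Fon"    # silence control on-frame: voiced -> silent
FOFF = "Foff"  # silence control off-frame: silent -> voiced


def apply_silence_control(frames: List[Tuple[str, float]], l1: float) -> List[str]:
    """Return the voice frame buffer contents after silence control.

    frames: (frame_name, input_level) pairs in time order (hypothetical format).
    l1: reference level L1; a level below it is treated as silence.
    """
    buffer: List[str] = []
    voiced = True  # assumption: the stream starts in a voiced section
    for name, level in frames:
        if voiced:
            if level < l1:
                # S611 to S613: discard the frame, store Fon, go silent
                buffer.append(FON)
                voiced = False
            else:
                buffer.append(name)  # S603: store the voiced frame
        else:
            if level > l1:
                # S711 to S713: store Foff, then the frame, go voiced
                buffer.append(FOFF)
                buffer.append(name)
                voiced = True
            # otherwise S703: the frame is discarded
    return buffer
```

With this sketch, frames that fall inside a silent section never reach the buffer, so only the two marker frames are sent for the whole silent period.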

[0024] As shown in FIG. 4, the sequence 403 of voice frames and silence control frames stored in the voice frame buffer 201 is assembled into a packet sequence 404 by the packet assembly unit 202 and transferred to the packet transmission buffer 203.

[0025] FIG. 8 is a flowchart showing the packet assembly cycle processing in the above embodiment.

[0026] First, if voice frames are present in the voice frame buffer 201 (S801, Yes), the voice frames needed to assemble a packet are obtained from the voice frame buffer 201 (S802), the obtained voice frames are combined (S803), header information is added (S804), and the result is stored in the packet transmission buffer 203 (S805). If no voice frame is present in the voice frame buffer 201 (S801, No), the processing ends.

[0027] FIG. 4 shows the case where the number of frames combined is 1, that is, where voice frames and voice packets correspond one to one.
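As a rough illustration of the packet assembly steps (S802 to S805), the Python sketch below groups buffered frames into packets and attaches a header. The `Packet` class, the `assemble_packets` function, and the `(timestamp_ms, frame)` buffer format are hypothetical; the text only states that a time stamp, a sequence number, and a destination address are added. The default of one frame per packet matches the case shown in FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    sequence: int       # sequence number
    timestamp_ms: int   # time stamp of the first frame in the packet
    destination: str    # destination address
    payload: List[str]  # combined voice / silence control frames


def assemble_packets(frame_buffer: List[Tuple[int, str]],
                     destination: str,
                     frames_per_packet: int = 1) -> List[Packet]:
    """Combine buffered frames into packets and add header information."""
    packets: List[Packet] = []
    starts = range(0, len(frame_buffer), frames_per_packet)
    for seq, start in enumerate(starts):
        group = frame_buffer[start:start + frames_per_packet]  # S802
        payload = [frame for _, frame in group]                # S803
        packets.append(Packet(seq, group[0][0], destination, payload))  # S804
    return packets  # S805: handed to the packet transmission buffer
```

An empty buffer yields an empty list, which corresponds to the "S801, No" branch.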

[0028] FIG. 9 is a flowchart showing the packet transfer cycle processing in the above embodiment.

[0029] FIG. 10 is a flowchart showing the reception processing in the above embodiment.

[0030] First, a packet stored in the packet transmission buffer 203 is taken out of the buffer (S901) and, if a packet is present (S902, Yes), is transmitted to the packet network 120 via the packet interface (I/F) 106 (S903). If no packet is present (S902, No), the processing ends. A packet sent to the packet network 120 is forwarded based on its header information and, when received via the packet I/F 106, is stored in the packet reception buffer 301 in the reception processing shown in FIG. 10 (S1001).

[0031] As shown in FIG. 5, the received packet sequence 501 stored in the packet reception buffer 301 is disassembled into a received frame sequence 502 by the packet disassembly unit 302 and stored in the voice frame buffer 303.

[0032] FIG. 11 is a flowchart showing the packet disassembly cycle in the above embodiment.

[0033] First, based on the sequence number in the packet header, reversals and similar disorder in the arrival order of the received packet sequence 501 in the packet reception buffer 301 are corrected (S1101). Next, based on the time stamp in the packet header, it is determined whether the packet reception buffer 301 holds a packet that is due for disassembly (S1102). If such a packet is present (S1102, Yes), it is taken out of the packet reception buffer 301 (S1103), its header is removed (S1104), it is disassembled into frames (S1105), and the voice frames are stored in the voice frame buffer 303 (S1106). If no packet is due for disassembly (S1102, No), the processing ends.

[0034] As shown in FIG. 5, the voice frame sequence 502 stored in the voice frame buffer 303 undergoes silence control detection by the silence control frame detection unit 304; background noise frames Fs generated by the background noise frame generation unit 111 and interpolation frames Fi produced by the voice frame interpolation unit 305 are inserted, converting it into a voice frame sequence 503, which is then sent to the voice decompression unit 108 to restore the voice signal 504.

[0035] FIG. 12 is a flowchart showing the decoding cycle processing in a voiced section, for example at the timing of voice frame F6 of the voice frame sequence 502, in the above embodiment.

[0036] First, whether the current position is a boundary between a silent section and a voiced section is determined from the change between voice frames and background noise frames (S1201). In this embodiment, assuming that the replacement with an interpolation frame described later is performed for one frame, a boundary is judged from three consecutive frames consisting of a voiced frame and two background noise frames. At the timing of voice frame F6, frames F4 to F6 are all voiced frames, so it is determined that this is not a boundary (S1201, No).

[0037] Next, it is determined whether the frame is a silence control on-frame Fon (S1202). Since it is not Fon (S1202, No), the voice frame at the head of the voice frame buffer 303 is taken out (S1203) and transferred to the voice decompression unit 108 (S1204).

[0038] Because the voice frame buffer 303 holds at least the minimum number of voice frames needed for silence boundary detection, the voice frame transferred to the voice decompression unit 108 is frame F4 or earlier. In the decoding cycle processing at the timing of the silence control on-frame Fon that follows voiced frame F7 in the voice frame sequence 502, the silence control on-frame is recognized (S1202, Yes) and the state transitions to the silent section (S1211).

[0039] In the decoding cycle processing at the timings of voiced frames F2 and F18, it is determined that this is a boundary between a silent section and a voiced section (S1201, Yes), and the background noise frames located at the boundary between the voiced section and the silent section are replaced with interpolation frames Fi0 and Fi2, respectively (S1211). As one way of generating an interpolation frame, in the case of a hybrid coding scheme, the filter coefficients and noise codebook index of the voiced section can be used, and the gain coefficient can be set to a value midway between the voiced gain and the background noise gain.
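The interpolation rule suggested for hybrid codecs can be sketched in Python as follows. The `HybridFrame` fields and the function name `make_interpolation_frame` are hypothetical simplifications; real G.728, G.729, or G.723.1 frames carry quantized parameters in codec-specific formats. The arithmetic mean is one possible choice of "intermediate value" and is an assumption here.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class HybridFrame:
    filter_coeffs: Tuple[float, ...]  # synthesis filter coefficients
    codebook_index: int               # noise codebook index
    gain: float                       # gain coefficient


def make_interpolation_frame(voiced: HybridFrame, noise_gain: float) -> HybridFrame:
    """Build an interpolation frame from the adjacent voiced frame.

    The filter coefficients and codebook index are copied from the voiced
    frame; only the gain is moved halfway toward the background noise gain.
    """
    return replace(voiced, gain=(voiced.gain + noise_gain) / 2.0)
```

For a voiced frame with gain 0.8 and a background noise gain of 0.2, the interpolation frame keeps the voiced spectral shape but plays at a gain of about 0.5, which softens the level step at the boundary.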

[0040] FIG. 13 is a flowchart showing the decoding cycle processing in a silent section in the above embodiment.

[0041] First, if it is determined that this is not a boundary between a voiced section and a silent section (S1301, No) and that the frame is not a silence control off-frame Foff (S1302, No), a background noise frame Fs is stored in the voice frame buffer 303 (S1303), the voice frame at the head of the buffer is taken out (S1304) and transferred to the voice decompression unit 108 (S1305). If it is determined that this is a boundary between a voiced section and a silent section (S1301, Yes), interpolation frames Fi1 and Fi3 are generated and substituted (S1311). If a silence control off-frame Foff is detected (S1302, Yes), the state transitions to the voiced section (S1321).
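The net effect of the replacement steps (S1211 and S1311) on a frame sequence can be illustrated with the simplified Python sketch below. It works on a sequence that has already been filled with background noise frames and replaces each noise frame that directly touches a voiced frame. The function name, the string labels, and the one-frame replacement width are assumptions; the flowcharts reach the same result incrementally, one cycle at a time, using a three-frame window.

```python
from typing import List


def replace_boundary_frames(frames: List[str],
                            noise: str = "Fs",
                            interp: str = "Fi") -> List[str]:
    """Replace noise frames adjacent to voiced frames with interpolation frames."""
    out = list(frames)
    for i, frame in enumerate(frames):
        if frame != noise:
            continue  # voiced frames are passed through unchanged
        prev_voiced = i > 0 and frames[i - 1] != noise
        next_voiced = i + 1 < len(frames) and frames[i + 1] != noise
        if prev_voiced or next_voiced:
            out[i] = interp  # this noise frame sits at a boundary
    return out
```

Neighbours are checked against the original sequence rather than the output, so a replaced frame never causes the next noise frame to be replaced as well.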

[0042] [Second Embodiment] In the first embodiment, the size of the interpolation frame may be varied according to the time length of one voice frame, which is determined by the voice compression scheme. This keeps the temporal change at the boundary between silent and voiced sections at the same level and achieves stable smoothness regardless of the voice compression scheme.
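One way to read the second embodiment, in line with claim 3, is that the number of interpolation frames is chosen so that the transition lasts roughly the same time for every codec. The Python sketch below assumes a hypothetical target transition time (`transition_ms`, 30 ms by default), which is not given in the text. The frame lengths used in the usage note are those of G.729 (10 ms) and G.723.1 (30 ms).

```python
import math


def interpolation_frame_count(frame_ms: float, transition_ms: float = 30.0) -> int:
    """Number of interpolation frames needed to cover the transition time.

    frame_ms: time length of one voice frame for the codec in use.
    transition_ms: desired duration of the boundary transition (assumed value).
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    # Always insert at least one interpolation frame.
    return max(1, math.ceil(transition_ms / frame_ms))
```

Under these assumptions a codec with 10 ms frames would use three interpolation frames and a codec with 30 ms frames would use one, so both transitions span about 30 ms.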

[0043]

[Effects of the Invention] According to the present invention, it is possible to build a voice packet transmission system that, in silence control, corrects the level discontinuity between silent and voiced sections without impairing effective use of the transmission line, so that the beginning and end of speech do not sound unnatural. It is also possible to achieve stable smoothness regardless of the voice compression scheme.

[Brief description of the drawings]

FIG. 1 is a block diagram schematically showing a packet transmission system according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the packet transmission unit 104.

FIG. 3 is a block diagram showing the configuration of the packet reception unit 107.

FIG. 4 is a diagram showing the voice signal 401.

FIG. 5 is a diagram showing the voice signal 501.

FIG. 6 is a flowchart showing the encoding cycle processing in a voiced section in the above embodiment.

FIG. 7 is a flowchart showing the encoding cycle processing in a silent section in the above embodiment.

FIG. 8 is a flowchart showing the packet assembly cycle processing in the above embodiment.

FIG. 9 is a flowchart showing the packet transfer cycle processing in the above embodiment.

FIG. 10 is a flowchart showing the packet transfer cycle processing in the above embodiment.

FIG. 11 is a flowchart showing the packet disassembly cycle in the above embodiment.

FIG. 12 is a flowchart showing the decoding cycle processing in a voiced section, for example at the timing of voice frame F6 of the voice frame sequence 502, in the above embodiment.

FIG. 13 is a flowchart showing the decoding cycle processing in a silent section in the above embodiment.

[Explanation of symbols]

100: packet transmission system; 104: packet transmission unit; 105: silence detection unit; 107: packet reception unit; 111: background noise frame generation unit; 120: packet network; 401: voice signal.

Claims

[Claims]

1. A voice packet transmission system for transmitting a voice signal via a packet network, comprising: voice encoding means for converting the voice signal into digitized voice frames; voice decoding means for converting the voice frames into a voice signal; packet transmitting/receiving means for packetizing the voice frames and transmitting and receiving them; silence detecting means for detecting the presence or absence of voice based on the voice signal; silence control means for controlling transmission and reception of voice frames by the packet transmitting/receiving means based on the detection result of the silence detecting means; silence boundary determining means for determining a boundary between a voiced section and a silent section in received voice frames; and interpolation frame generating means for generating at least one interpolation frame that fills in between voice frame sequences, wherein, based on the determination result of the silence boundary determining means, at least one interpolation frame generated by the interpolation frame generating means is inserted at the boundary between the voiced section and the silent section of the voice frames.

2. The voice packet transmission system according to claim 1, further comprising background noise frame generating means for generating a background noise frame, wherein the silence control means is means for acquiring a background noise frame generated by the background noise frame generating means when the received voice frames are in a silent state, the voice decoding means is means for converting the background noise frame into a voice signal, and a background noise frame located at the boundary between the background noise frames and the voice frames is replaced with the interpolation frame generated by the interpolation frame generating means.

3. The voice packet transmission system according to claim 1, further comprising voice frame time recognition means for recognizing the time length of one voice frame, wherein the number of interpolation frames is determined on the basis of the recognized time length.