JP5056049B2

JP5056049B2 - Audio data decoding device

Info

Publication number: JP5056049B2
Application number: JP2007035664A
Authority: JP
Inventors: 伊藤　　博紀; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-02-16
Filing date: 2007-02-16
Publication date: 2012-10-24
Anticipated expiration: 2026-07-27
Also published as: JP2008033232A

Description

The present invention relates to an audio data decoding device, an audio data conversion device, and an error compensation method.

When audio data is transmitted over a circuit-switched network or a packet network, audio signals are exchanged by encoding and decoding the audio data. Known audio compression methods include, for example, ITU-T (International Telecommunication Union Telecommunication Standardization Sector) Recommendation G.711 and CELP (Code-Excited Linear Prediction).

When audio data encoded with these compression methods is transmitted, part of the audio data may be lost because of radio errors, network congestion, or similar causes. As error compensation for the lost audio data, an audio signal for the lost portion is generated from the information in the audio data received before the loss.

As related art for reducing the degradation in sound quality that accompanies generating an audio signal for lost audio data, Patent Document 1 discloses a technique that updates filter memory values using the audio frame data contained in a packet received late. That is, when a packet that was treated as lost arrives late, the audio frame data in that packet is used to update the filter memory values used by the pitch filter or by the filter representing the spectral envelope.

Patent Document 2 discloses a technique for ADPCM (Adaptive Differential Pulse Code Modulation) coding. It addresses the problem that, even when correct encoded data is received after encoded data has been lost, an unpleasant abnormal sound is output because the predictor states on the encoding side and the decoding side do not match. Specifically, for a predetermined time after packet loss transitions from "detected" to "not detected", a detection state control unit gradually attenuates the interpolation signal generated from past audio data. Because the predictor states on the encoding and decoding sides converge over time and the audio signal returns to normal, the decoded audio signal is gradually increased. As a result, no abnormal sound is output even immediately after recovery from the loss of encoded data.

Furthermore, Patent Document 3 discloses a technique that calculates linear prediction coefficients from an audio signal and generates an audio signal from those coefficients.

Patent Document 1: JP 2002-268697 A
Patent Document 2: JP 2005-274917 A
Patent Document 3: JP H11-305797 A

Conventional error compensation schemes for audio data are simple schemes that repeat a past audio waveform. Consequently, although the techniques described above have been disclosed, there is still room for improvement in sound quality.

It is therefore desirable to solve the problem that sound quality degrades when an error compensation scheme is applied to audio data.

The audio data decoding device of the present invention outputs an interpolation signal that interpolates a loss in audio data. It comprises: a loss detection unit that detects a loss and detects that the audio data corresponding to the loss has been received late; a first audio data decoder that, when a loss is detected, outputs memory such as that of a synthesis filter to a memory storage unit; a second audio data decoder that decodes the audio data corresponding to the loss using the memory, such as that of the synthesis filter, of the pre-loss audio data stored in the memory storage unit, and then generates a decoded audio signal by decoding the audio data that follows the audio data corresponding to the loss; and an audio signal output unit that outputs the audio signal while varying the ratio of that decoded audio signal to the total audio signal being output.

The parameters may also be a spectral parameter, a delay parameter, an adaptive codebook gain, a normalized residual signal, or a normalized residual signal gain.

According to the present invention, good sound quality can be expected in the interpolation signal that compensates for a loss of audio data.

Embodiments of the present invention are described below with reference to the drawings. These embodiments do not, however, limit the technical scope of the present invention.

Embodiment 1 of the present invention is described below with reference to FIGS. 1 and 2.

FIG. 1 shows the configuration of a decoding device for audio data encoded with a waveform coding method, of which G.711 is representative. The audio data decoding device of Embodiment 1 comprises a loss detector 101, an audio data decoder 102, an audio data analyzer 103, a parameter correction unit 104, an audio synthesis unit 105, and an audio signal output unit 106. Here, audio data means data obtained by encoding a series of audio, consisting of at least one audio frame.

The loss detector 101 outputs the received audio data to the audio data decoder 102, detects whether received audio data has been lost, and outputs the loss detection result to the audio data decoder 102, the parameter correction unit 104, and the audio signal output unit 106.

The audio data decoder 102 decodes the input audio data and outputs the decoded audio signal to the audio signal output unit 106 and the audio data analyzer 103.

The audio data analyzer 103 divides the decoded audio signal into frames (for example, 20 ms) and applies linear prediction analysis to each frame to extract spectral parameters representing the spectral characteristics of the audio signal. Next, it divides each frame into subframes (for example, 5 ms) and, for each subframe, extracts the adaptive codebook parameters from the past excitation signal: a delay parameter corresponding to the pitch period, and an adaptive codebook gain. It also performs pitch prediction of the audio signal of the subframe using the adaptive codebook. It then normalizes the residual signal obtained by pitch prediction and extracts a normalized residual signal and a normalized residual signal gain. Finally, it outputs the extracted spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain (hereinafter collectively called the parameters) to the parameter correction unit 104. The analyzer may be configured to extract two or more of these parameters.
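
As a rough illustration of the linear prediction analysis step, the sketch below computes prediction coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. The source does not state which analysis method the audio data analyzer 103 uses; the function names, the prediction order, and the omission of windowing are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] of the frame (no windowing, for brevity)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum(a[k] * x[n - k] for k in 1..order).
    Returns (a, prediction_error_energy); a[0] is unused and kept at 0.0.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:           # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err          # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a, err
```

For a 20 ms frame at 8 kHz sampling (160 samples), an order of around 10 is a common choice in narrowband speech coding, but the source does not fix this value.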

Based on the loss detection result input from the loss detector 101, the parameter correction unit 104 either uses the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain input from the audio data analyzer 103 as they are, or modifies them, for example by adding a random perturbation of ±1% or by progressively reducing the gain. It then outputs the resulting values to the audio synthesis unit 105. The values are modified to avoid generating an unnatural audio signal through repetition.
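
The correction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the uniform distribution of the perturbation, the rule that any name ending in "gain" is attenuated, and the decay factor of 0.9 per lost frame are all assumptions.

```python
import random


def correct_parameters(params, lost_frames, rng=random, jitter=0.01, gain_decay=0.9):
    """Return a modified copy of `params` for the next concealed frame.

    params       -- dict of parameter name -> float (hypothetical layout)
    lost_frames  -- number of consecutive frames lost so far (0 = no loss)
    jitter       -- relative size of the random perturbation (0.01 = +/-1%)
    gain_decay   -- per-frame attenuation applied to gain parameters (assumed)
    """
    if lost_frames == 0:
        return dict(params)      # no loss: use the values as they are
    corrected = {}
    for name, value in params.items():
        value *= 1.0 + rng.uniform(-jitter, jitter)
        if name.endswith("gain"):
            value *= gain_decay ** lost_frames
        corrected[name] = value
    return corrected
```

Passing a seeded `random.Random` instance as `rng` makes the perturbation reproducible, which is convenient when comparing concealment output across runs.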

The audio synthesis unit 105 generates a synthesized audio signal using the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain input from the parameter correction unit 104, and outputs it to the audio signal output unit 106.

Based on the loss detection result input from the loss detector 101, the audio signal output unit 106 outputs one of the following: the decoded audio signal input from the audio data decoder 102, the synthesized audio signal input from the audio synthesis unit 105, or a signal obtained by mixing the decoded audio signal and the synthesized audio signal at a certain ratio.

Next, the operation of the audio data decoding device of Embodiment 1 is described with reference to FIG. 2.

First, the loss detector 101 detects whether received audio data has been lost (S601). Methods for detecting a loss include treating audio data as lost when a bit error in a wireless network is detected with a CRC (Cyclic Redundancy Check) code, and treating audio data as lost when a loss in an IP (Internet Protocol) network is detected from a gap in the sequence numbers of the RTP header defined in RFC 3550 (RTP: A Transport Protocol for Real-Time Applications).
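
The sequence-number method can be sketched as below. RTP sequence numbers are 16 bits wide and wrap around, so the gap is computed modulo 65536. The function name and the policy of treating very large gaps as reordering rather than loss are assumptions; the source only says that a missing sequence number indicates a loss.

```python
def missing_sequence_numbers(prev_seq, curr_seq):
    """Return the RTP sequence numbers skipped between two received packets.

    RTP sequence numbers are 16-bit and wrap from 65535 to 0, so the gap is
    computed modulo 2**16. Gaps of half the number space or more are treated
    as reordered or duplicate packets rather than losses (an assumed policy).
    """
    gap = (curr_seq - prev_seq) % 65536
    if gap == 0 or gap >= 32768:
        return []
    return [(prev_seq + i) % 65536 for i in range(1, gap)]
```

For example, receiving sequence number 13 directly after 10 reports 11 and 12 as missing, and the wraparound case from 65534 to 1 reports 65535 and 0.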

If the loss detector 101 does not detect a loss of audio data, the audio data decoder 102 decodes the received audio data and outputs the result to the audio signal output unit 106 (S602).

If the loss detector 101 detects a loss of audio data, the audio data analyzer 103 extracts the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain from the decoded audio signal immediately preceding the loss (S603). The analysis may be performed only on the decoded audio signal immediately preceding the detected loss, or on all decoded audio signals. Next, based on the loss detection result, the parameter correction unit 104 either uses these parameters as they are or modifies them, for example by adding a random perturbation of ±1% (S604). The audio synthesis unit 105 generates a synthesized audio signal using these values (S605).

Then, based on the loss detection result, the audio signal output unit 106 outputs one of the following: the decoded audio signal input from the audio data decoder 102, the synthesized audio signal input from the audio synthesis unit 105, or a signal obtained by mixing the decoded and synthesized audio signals at a certain ratio (S606). Specifically, when no loss is detected in either the previous frame or the current frame, it outputs the decoded audio signal. When a loss is detected, it outputs the synthesized audio signal. In the frame following a detected loss, it adds the two signals so that the proportion of the synthesized audio signal is large at first and the proportion of the decoded audio signal grows as time passes. This prevents the audio signal output from the audio signal output unit 106 from becoming discontinuous.
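
The three output cases of S606 can be sketched as follows. The linear shape of the ramp and the function names are assumptions; the source only requires that the synthesized signal dominates at first and the decoded signal dominates later. The sketch also assumes that the synthesis continues into the frame after the loss, so that a synthesized frame is available for mixing.

```python
def crossfade(synthesized, decoded):
    """Mix two equal-length frames, fading from `synthesized` to `decoded`."""
    if len(synthesized) != len(decoded):
        raise ValueError("frames must have the same length")
    n = len(decoded)
    out = []
    for i in range(n):
        w = (i + 1) / n          # weight of the decoded signal, ends at 1.0
        out.append((1.0 - w) * synthesized[i] + w * decoded[i])
    return out


def select_output(decoded, synthesized, lost_now, lost_prev):
    """Choose the output frame from the loss state of this and the previous frame."""
    if lost_now:
        return list(synthesized)                 # loss: synthesized signal only
    if lost_prev:
        return crossfade(synthesized, decoded)   # frame after a loss: mix
    return list(decoded)                         # no loss: decoded signal only
```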

With the audio data decoding device of Embodiment 1, parameters are extracted for G.711, a method for which parameters have not conventionally been extracted, and these values are used for the signal that interpolates a loss of audio data. This improves the sound quality of the interpolated audio.

Embodiment 2 is described with reference to FIGS. 3 and 4. Embodiment 2 differs from Embodiment 1 in that, when a loss of audio data is detected, the device checks whether the next audio data after the loss has already been received before it outputs the audio signal that interpolates the lost portion. If the next audio data has been received, its information is used, in addition to the operation of Embodiment 1, to generate the audio signal for the lost audio data.

FIG. 3 shows the configuration of a decoding device for audio data encoded with a waveform coding method, of which G.711 is representative, as in Embodiment 1. The audio data decoding device of Embodiment 2 comprises a loss detector 201, an audio data decoder 202, an audio data analyzer 203, a parameter correction unit 204, an audio synthesis unit 205, and an audio signal output unit 206. The audio data decoder 202, the parameter correction unit 204, and the audio synthesis unit 205 operate in the same way as the audio data decoder 102, the parameter correction unit 104, and the audio synthesis unit 105 of Embodiment 1, so their description is omitted.

In addition to the operation of the loss detector 101 described in Embodiment 1, when the loss detector 201 detects a loss of audio data, it detects whether the next audio data after the loss has been received before the audio data decoder 202 outputs the audio signal that interpolates the lost portion. The loss detector 201 outputs this detection result to the audio data decoder 202, the audio data analyzer 203, the parameter correction unit 204, and the audio signal output unit 206.

In addition to the operation of the audio data analyzer 103 described in Embodiment 1, the audio data analyzer 203 generates, based on the detection result from the loss detector 201, a time-reversed version of the audio signal for the audio data that follows the detected loss. It analyzes this signal by the same procedure as in Embodiment 1 and outputs the extracted spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain to the parameter correction unit 204.

Based on the loss detection result input from the loss detector 201, the audio signal output unit 206 outputs either the decoded audio signal input from the audio data decoder 202, or a signal obtained by adding two signals so that the proportion of the synthesized audio signal generated from the parameters of the audio data preceding the detected loss is high at first, and the proportion of the time-reversed synthesized audio signal generated from the parameters of the audio data following the detected loss is high at the end.
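
The weighted addition for a lost frame in Embodiment 2 can be sketched as follows. One input is the signal synthesized forward from the parameters before the loss; the other is the signal synthesized from the time-reversed next audio data, which must be reversed again before mixing. The function name and the linear weighting are assumptions for illustration.

```python
def conceal_bidirectional(forward, backward_reversed):
    """Fill a lost frame from forward and backward extrapolations.

    forward            -- signal synthesized from the parameters before the loss
    backward_reversed  -- signal synthesized from the time-reversed next frame,
                          still running backwards in time
    """
    if len(forward) != len(backward_reversed):
        raise ValueError("frames must have the same length")
    backward = backward_reversed[::-1]   # restore forward time order
    n = len(forward)
    if n == 1:
        return [0.5 * (forward[0] + backward[0])]
    out = []
    for i in range(n):
        w = i / (n - 1)                  # 0.0 at the start, 1.0 at the end
        out.append((1.0 - w) * forward[i] + w * backward[i])
    return out
```

With this weighting the first output sample comes entirely from the pre-loss extrapolation and the last entirely from the post-loss one, so the concealed frame joins both neighbours without a jump.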

Next, the operation of the audio data decoding device of Embodiment 2 is described with reference to FIG. 4.

First, the loss detector 201 detects whether received audio data has been lost (S701). If the loss detector 201 does not detect a loss of audio data, the same operation as S602 of Embodiment 1 is performed (S702).

If the loss detector 201 detects a loss of audio data, it detects whether the next audio data after the loss has been received before the audio data decoder 202 outputs the audio signal that interpolates the lost portion (S703). If the next audio data has not been received, the same operations as S603 to S605 of Embodiment 1 are performed (S704 to S706). If the next audio data has been received, the audio data decoder 202 decodes it (S707). From this decoded next audio data, the audio data analyzer 203 extracts the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain (S708). Next, based on the loss detection result, the parameter correction unit 204 either uses these parameters as they are or modifies them, for example by adding a random perturbation of ±1% (S709). The audio synthesis unit 205 generates a synthesized audio signal using these values (S710).

Then, based on the loss detection result input from the loss detector 201, the audio signal output unit 206 outputs either the decoded audio signal input from the audio data decoder 202, or a signal obtained by adding two signals so that the proportion of the synthesized audio signal generated from the parameters of the audio data preceding the detected loss is high at first, and the proportion of the time-reversed synthesized audio signal generated from the parameters of the audio data following the detected loss is high at the end (S711).

In VoIP (Voice over IP), which has spread rapidly in recent years, received audio data is buffered to absorb fluctuations in its arrival time. According to Embodiment 2, when the audio signal for a lost portion is interpolated, using the audio data that follows the loss and is already present in the buffer improves the sound quality of the interpolation signal.

Embodiment 3 is described with reference to FIGS. 5 and 6. This embodiment concerns the decoding of audio data encoded with the CELP method. As in Embodiment 2, when a loss of audio data is detected and the audio data after the loss has been received before the first audio data decoder 302 outputs the audio signal that interpolates the lost portion, the information of that next audio data is used to generate the audio signal for the lost audio data.

FIG. 5 shows the configuration of a decoding device for audio data encoded with the CELP method. The audio data decoding device of Embodiment 3 comprises a loss detector 301, a first audio data decoder 302, a parameter interpolation unit 303, a second audio data decoder 304, and an audio signal output unit 305.

The loss detector 301 outputs the received audio data to the first audio data decoder 302 and the second audio data decoder 304, and detects whether received audio data has been lost. When it detects a loss, it detects whether the next audio data has been received before the first audio data decoder 302 outputs the audio signal that interpolates the lost portion, and outputs the detection result to the first audio data decoder 302 and the second audio data decoder 304.

When no loss is detected, the first audio data decoder 302 decodes the input audio data, outputs the decoded audio signal to the audio signal output unit 305, and outputs the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain used in decoding to the parameter interpolation unit 303. When a loss is detected and the next audio data has not been received, the first audio data decoder 302 generates an audio signal that interpolates the lost portion using the information of past audio data. The method described in Patent Document 1 can be used for this generation. In addition, the first audio data decoder 302 generates an audio signal for the lost audio data using the parameters input from the parameter interpolation unit 303 and outputs it to the audio signal output unit 305.

When a loss is detected and the next audio data has been received before the first audio data decoder 302 outputs the audio signal that interpolates the lost portion, the second audio data decoder 304 generates an audio signal for the lost audio data using the information of past audio data. The second audio data decoder 304 then decodes the next audio data using the generated audio data, extracts the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain used in that decoding, and outputs them to the parameter interpolation unit 303.

The parameter interpolation unit 303 generates parameters for the lost audio data using the parameters input from the first audio data decoder 302 and the parameters input from the second audio data decoder 304, and outputs them to the first audio data decoder 302.
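
One simple way to generate parameters for the lost frame from the frames on either side is linear interpolation, sketched below. The source does not specify the interpolation rule, so the linear form, the `weight` argument, and the dictionary layout are assumptions. Practical CELP decoders usually interpolate spectral parameters in a representation such as line spectral pairs rather than on raw prediction coefficients.

```python
def interpolate_parameters(before, after, weight=0.5):
    """Estimate the lost frame's parameters from its two neighbours.

    before, after -- dicts of parameter name -> float (hypothetical layout)
    weight        -- position of the lost frame between them (0 = before, 1 = after)
    """
    if before.keys() != after.keys():
        raise ValueError("both frames must provide the same parameters")
    return {name: (1.0 - weight) * before[name] + weight * after[name]
            for name in before}
```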

The audio signal output unit 305 outputs the decoded audio signal input from the first audio data decoder 302.

Next, the operation of the audio data decoding device of Embodiment 3 is described with reference to FIG. 6.

First, the loss detector 301 detects whether received audio data has been lost (S801). If it has not, the first audio data decoder 302 decodes the input audio data and outputs the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain used in decoding to the parameter interpolation unit 303 (S802 and S803).

If audio data has been lost, the loss detector 301 detects whether the next audio data after the loss has been received before the first audio data decoder 302 outputs the audio signal that interpolates the lost portion (S804). If the next audio data has not been received, the first audio data decoder 302 generates an audio signal that interpolates the lost portion using the information of past audio data (S805).

If the next audio data has been received, the second audio data decoder 304 generates an audio signal for the lost audio data using the information of past audio data (S806). The second audio data decoder 304 decodes the next audio data using the generated audio data, generates the spectral parameters, delay parameter, adaptive codebook gain, normalized residual signal, or normalized residual signal gain used in that decoding, and outputs them to the parameter interpolation unit 303 (S807). Next, the parameter interpolation unit 303 generates parameters for the lost audio data using the parameters input from the first audio data decoder 302 and the parameters input from the second audio data decoder 304 (S808). The first audio data decoder 302 then generates an audio signal for the lost audio data using the parameters generated by the parameter interpolation unit 303 and outputs it to the audio signal output unit 305 (S809).

The first audio data decoder 302 outputs the audio signal generated in each case to the audio signal output unit 305, and the audio signal output unit 305 outputs the decoded audio signal (S810).

In VoIP, which has spread rapidly in recent years, received audio data is buffered to absorb fluctuations in its arrival time. According to Embodiment 3, when the audio signal for a lost portion is interpolated in the CELP method, using the audio data that follows the loss and is already present in the buffer improves the sound quality of the interpolation signal.

Embodiment 4 is described with reference to FIGS. 7 and 8. In the CELP method, using an interpolation signal when audio data is lost compensates for the lost portion, but because the interpolation signal is not generated from the correct audio data, it degrades the sound quality of the audio data received afterward. Embodiment 4 therefore discloses, in addition to Embodiment 3, a technique that improves the quality of the audio signal of the audio data following the loss: when the audio data of the lost portion arrives late, after the interpolated audio signal for the lost portion has already been output, that late audio data is used.

FIG. 7 shows the configuration of a decoding device for audio data encoded with the CELP method, as in Embodiment 3. The audio data decoding device of Embodiment 4 comprises a loss detector 401, a first audio data decoder 402, a second audio data decoder 403, a memory storage unit 404, and an audio signal output unit 405.

The loss detector 401 outputs the received audio data to the first audio data decoder 402 and the second audio data decoder 403. The loss detector 401 also detects whether audio data has been lost. When it detects a loss, it detects whether the next audio data has been received, and it outputs the detection result to the first audio data decoder 402, the second audio data decoder 403, or the audio signal output unit 405. In addition, the loss detector 401 detects whether the lost audio data has been received late.
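
The three checks made by the loss detector 401 can be sketched as follows. This is only an illustration: the description does not say how loss is detected, so the use of per-packet sequence numbers, the dictionary-based receive buffer, and the names `LossDetector`, `check`, and `is_late_arrival` are all assumptions.

```python
from typing import Dict, Set, Tuple


class LossDetector:
    """Hypothetical loss detector keyed on packet sequence numbers."""

    def __init__(self) -> None:
        # Sequence numbers that were reported lost and have not arrived yet.
        self.reported_lost: Set[int] = set()

    def check(self, expected_seq: int, buffer: Dict[int, bytes]) -> Tuple[bool, bool]:
        """Return (lost, next_available) for the frame due for playout."""
        lost = expected_seq not in buffer
        # The next frame only matters when the expected one is missing.
        next_available = lost and (expected_seq + 1) in buffer
        if lost:
            self.reported_lost.add(expected_seq)
        return lost, next_available

    def is_late_arrival(self, seq: int) -> bool:
        """True if `seq` was reported lost earlier and has only now arrived."""
        if seq in self.reported_lost:
            self.reported_lost.discard(seq)
            return True
        return False


detector = LossDetector()
# Frame 5 is due, but the buffer holds only frames 4 and 6.
result_missing = detector.check(5, {4: b"a", 6: b"c"})
late_first = detector.is_late_arrival(5)   # frame 5 turns up afterwards
late_again = detector.is_late_arrival(5)   # already accounted for
result_present = detector.check(6, {6: b"c"})
```

In this sketch the pair returned by `check` stands in for the detection result passed to the decoders and the output unit, and `is_late_arrival` stands in for the late-reception check.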

When no loss is detected, the first audio data decoder 402 decodes the input audio data. When a loss is detected, the first audio data decoder 402 generates an audio signal from information in past audio data and outputs it to the audio signal output unit 405. The method described in Patent Document 1 can be used to generate this signal. In addition, the first audio data decoder 402 outputs the memory (internal state) of the synthesis filter and the like to the memory storage unit 404.

When the audio data of the lost portion arrives late, the second audio data decoder 403 decodes it using the memory of the synthesis filter and the like for the packet immediately preceding the detected loss, which is held in the memory storage unit 404, and outputs the decoded signal to the audio signal output unit 405.

Based on the loss detection result input from the loss detector 401, the audio signal output unit 405 outputs the decoded audio signal input from the first audio data decoder 402, the decoded audio signal input from the second audio data decoder 403, or an audio signal obtained by adding the two at a certain ratio.

Next, the operation of the audio data decoding device of Embodiment 4 is described with reference to FIG. 8.

First, the operation of the audio data decoding device of Embodiment 3, S801 to S810, is performed, and an audio signal that interpolates the lost audio data is output. When an audio signal is generated from past audio data in S805 and S806, the memory of the synthesis filter and the like is output to the memory storage unit 404 (S903 and S904). The loss detector 401 then detects whether the lost audio data has been received late (S905). If it has not, the audio signal generated as in Embodiment 3 is output. If it has, the second audio data decoder 403 decodes the late audio data using the memory of the synthesis filter and the like for the packet immediately preceding the detected loss, which is held in the memory storage unit 404 (S906).
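
The effect of steps S903 to S906 can be illustrated with a toy all-pole synthesis filter. This is a minimal sketch, not the decoder of the embodiment: the second-order coefficients in `LPC`, the four-sample excitation frames, and the concealment rule (an attenuated repeat of the previous excitation) are hypothetical, and a real CELP decoder also carries adaptive-codebook and other state.

```python
from typing import List, Tuple


def synthesize(excitation: List[float], lpc: List[float],
               memory: List[float]) -> Tuple[List[float], List[float]]:
    """All-pole synthesis filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k].

    `memory` holds earlier output samples, most recent first. Returns the
    output samples and the updated memory.
    """
    state = list(memory)
    out = []
    for x in excitation:
        y = x - sum(a * s for a, s in zip(lpc, state))
        out.append(y)
        state = [y] + state[:-1]
    return out, state


LPC = [-0.9, 0.2]                 # hypothetical, stable 2nd-order filter
frame0 = [1.0, 0.0, 0.0, 0.0]     # received on time
frame1 = [0.0, 1.0, 0.0, 0.0]     # lost, then arrives late
frame2 = [0.5, 0.0, 0.0, 0.0]     # frame following the loss

# First decoder: decode frame0, then snapshot the filter memory
# (the role played by the memory storage unit 404).
_, mem_after_0 = synthesize(frame0, LPC, [0.0, 0.0])
stored_memory = list(mem_after_0)

# First decoder conceals frame1 (assumed: attenuated repeat of frame0),
# which leaves its filter memory in a state the encoder never had.
concealment = [0.5 * x for x in frame0]
_, mem_concealed = synthesize(concealment, LPC, mem_after_0)
out2_first, _ = synthesize(frame2, LPC, mem_concealed)

# Second decoder: frame1 arrives late, so decode it from the stored memory,
# then decode frame2 from the resulting memory.
_, mem_correct = synthesize(frame1, LPC, stored_memory)
out2_second, _ = synthesize(frame2, LPC, mem_correct)

# Reference: what a decoder that saw no loss would produce for frame2.
_, m = synthesize(frame0, LPC, [0.0, 0.0])
_, m = synthesize(frame1, LPC, m)
out2_reference, _ = synthesize(frame2, LPC, m)
```

In this toy setting `out2_second` matches `out2_reference`, whereas `out2_first` does not, because the concealed frame left the first decoder's filter memory in a different state.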

Then, based on the loss detection result input from the loss detector 401, the audio signal output unit 405 outputs the decoded audio signal input from the first audio data decoder 402, the decoded audio signal input from the second audio data decoder 403, or an audio signal obtained by adding the two at a certain ratio (S907). Specifically, when a loss has been detected and the lost audio data arrives late, the audio signal output unit 405 initially gives a large weight to the decoded audio signal input from the first audio data decoder 402 in the audio signal for the audio data that follows the lost data. As time passes, it increases the weight of the decoded audio signal input from the second audio data decoder 403 in the summed audio signal that it outputs.
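
The weighted addition in S907 amounts to a cross-fade from the first decoder's signal to the second decoder's signal. The sketch below assumes a linear ramp over one frame; the description only says that the ratio shifts over time, so the ramp shape, its length, and the name `crossfade` are illustrative choices.

```python
from typing import List


def crossfade(first: List[float], second: List[float]) -> List[float]:
    """Blend two equally long frames, shifting weight from `first` to `second`.

    The weight of `second` rises linearly from 1/n to 1 across the frame
    (an assumed ramp; any monotonically increasing weight would fit).
    """
    if len(first) != len(second):
        raise ValueError("frames must have the same length")
    n = len(first)
    out = []
    for i, (a, b) in enumerate(zip(first, second)):
        w = (i + 1) / n                  # weight of the second decoder
        out.append((1.0 - w) * a + w * b)
    return out


# Constant test signals make the changing ratio easy to see.
blended = crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
# blended == [0.75, 0.5, 0.25, 0.0]
```

Starting with most of the weight on the first decoder keeps the output continuous with the interpolated signal that has already been played, and ending on the second decoder hands over to the correctly decoded signal.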

According to Embodiment 4, rewriting the memory of the synthesis filter and the like using the late-arriving audio data of the lost portion makes it possible to generate a correct decoded audio signal. Deliberately not outputting this correct decoded audio signal at once, and instead outputting an audio signal in which the two signals are added at a certain ratio, prevents the audio from becoming discontinuous. Furthermore, even when an interpolated signal has been used for the lost portion, rewriting the memory of the synthesis filter and the like with the late-arriving audio data and generating the decoded audio signal from it improves the sound quality after the interpolated signal.

Embodiment 4 has been described here as an addition to the interpolation of Embodiment 3, but it may also be added to a configuration that generates the interpolated signal in another way.

The audio data conversion device of Embodiment 5 is described with reference to FIGS. 9 and 10.

FIG. 9 shows the configuration of a device that converts an audio signal encoded by one audio coding scheme into another audio coding scheme. As an example, it shows a device that converts audio data encoded by a waveform coding scheme, typified by G.711, into audio data encoded by the CELP scheme. The audio data conversion device of Embodiment 5 comprises a loss detector 501, an audio data decoder 502, an audio data encoder 503, a parameter correction unit 504, and an audio data output unit 505.

The loss detector 501 outputs the received audio data to the audio data decoder 502. The loss detector 501 also detects whether the received audio data has been lost and outputs the detection result to the audio data decoder 502, the audio data encoder 503, the parameter correction unit 504, and the audio data output unit 505.

When no loss is detected, the audio data decoder 502 decodes the input audio data and outputs the decoded audio signal to the audio data encoder 503.

When no loss is detected, the audio data encoder 503 encodes the decoded audio signal input from the audio data decoder 502 and outputs the encoded audio data to the audio data output unit 505. The audio data encoder 503 also outputs the parameters obtained during encoding, namely the spectral parameter, delay parameter, adaptive codebook gain, residual signal, or residual signal gain, to the parameter correction unit 504. When a loss is detected, the audio data encoder 503 receives parameters from the parameter correction unit 504. The audio data encoder 503 holds a filter (not shown) used for parameter extraction, and it encodes the parameters received from the parameter correction unit 504 to generate audio data, updating the memory of the filter and the like as it does so. If quantization error in encoding prevents an encoded parameter value from equaling the value input from the parameter correction unit 504, the audio data encoder 503 selects the encoded value closest to the input value. The memory (not shown) of the filter used for parameter extraction and the like must be updated when the audio data is generated, so that it does not diverge from the filter memory held by the wireless communication device at the other end of the call. Finally, the audio data encoder 503 outputs the generated audio data to the audio data output unit 505.
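
The nearest-value rule for quantized parameters can be sketched as a search over the quantizer's reproduction levels. The scalar codebook below is hypothetical, and real CELP codecs often quantize several parameters jointly, so this shows only the selection principle.

```python
from typing import Sequence


def quantize_nearest(target: float, codebook: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to `target`."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - target))


# Hypothetical gain codebook; the corrected gain 0.6 is not representable,
# so the closest level, 0.5 at index 2, is chosen.
GAIN_CODEBOOK = [0.0, 0.25, 0.5, 1.0]
chosen_index = quantize_nearest(0.6, GAIN_CODEBOOK)
chosen_value = GAIN_CODEBOOK[chosen_index]
```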

The parameter correction unit 504 receives from the audio data encoder 503 the parameters obtained during encoding, namely the spectral parameter, delay parameter, adaptive codebook gain, residual signal, or residual signal gain, and stores them. Based on the loss detection result input from the loss detector 501, the parameter correction unit 504 outputs the stored pre-loss parameters to the audio data encoder 503, either unchanged or after a predetermined correction.

Based on the loss detection result received from the loss detector 501, the audio data output unit 505 outputs the audio signal received from the audio data encoder 503.

Next, the operation of the audio data conversion device of Embodiment 5 is described with reference to FIG. 10.

First, the loss detector 501 detects whether the received audio data has been lost (S1001). If the loss detector 501 detects no loss, the audio data decoder 502 generates a decoded audio signal from the received audio data (S1002). The audio data encoder 503 then encodes the decoded audio signal and outputs the parameters obtained during encoding, namely the spectral parameter, delay parameter, adaptive codebook gain, residual signal, or residual signal gain (S1003).

If the loss detector 501 detects a loss, the parameter correction unit 504 outputs the pre-loss parameters that it holds to the audio data encoder 503, either unchanged or after a predetermined correction. On receiving these parameters, the audio data encoder 503 updates the memory of the filter used for parameter extraction (S1004). The audio data encoder 503 then generates an audio signal from the parameters immediately preceding the loss (S1005).
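
The hold-and-correct behaviour of the parameter correction unit 504 can be sketched as follows. The description leaves the "predetermined correction" open, so the gain attenuation used here is only an assumed example, as are the names `CelpParams` and `ParameterCorrector`; the residual signal itself is omitted for brevity.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class CelpParams:
    """Subset of the parameters named in the description (hypothetical layout)."""
    spectrum: List[float]
    delay: int
    adaptive_gain: float
    residual_gain: float


class ParameterCorrector:
    def __init__(self, attenuation: float = 1.0) -> None:
        # attenuation == 1.0 reuses the held parameters unchanged.
        self.attenuation = attenuation
        self.held: Optional[CelpParams] = None

    def store(self, params: CelpParams) -> None:
        """Called after each normally encoded frame (S1003)."""
        self.held = params

    def for_lost_frame(self) -> CelpParams:
        """Parameters handed to the encoder when a loss is detected (S1004)."""
        if self.held is None:
            raise RuntimeError("no parameters held before the first loss")
        # Assumed correction: scale both gains, keep spectrum and delay.
        return replace(
            self.held,
            adaptive_gain=self.held.adaptive_gain * self.attenuation,
            residual_gain=self.held.residual_gain * self.attenuation,
        )


corrector = ParameterCorrector(attenuation=0.5)
last_good = CelpParams(spectrum=[0.1, 0.2], delay=40,
                       adaptive_gain=0.8, residual_gain=0.4)
corrector.store(last_good)
substitute = corrector.for_lost_frame()
```

Because the substitute frame is built directly from CELP parameters, no concealment waveform has to be synthesized and then re-analyzed by the encoder.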

Then, based on the loss detection result, the audio data output unit 505 outputs the audio signal received from the audio data encoder 503 (S1006).

According to Embodiment 5, in a device that converts data, such as a gateway, the sound quality of the interpolated signal can be improved by interpolating the lost portion using parameters and the like instead of generating the interpolated signal for the lost audio data in the waveform coding scheme. Interpolating the lost portion in this way, rather than generating the interpolated signal in the waveform coding scheme, also reduces the amount of computation.

Embodiment 5 has been described for converting audio data encoded by a waveform coding scheme, typified by G.711, into audio data encoded by the CELP scheme, but it may also convert audio data encoded by one CELP scheme into audio data encoded by another CELP scheme.

Those skilled in the art can readily implement various modifications of the embodiments described above. The present invention is therefore not limited to those embodiments and is to be interpreted in the broadest scope afforded by the claims and their equivalents.

Schematic diagram showing the configuration of the audio data decoding device of Embodiment 1 of the present invention.
Schematic diagram showing the operation of the audio data decoding device of Embodiment 1 of the present invention.
Schematic diagram showing the configuration of the audio data decoding device of Embodiment 2 of the present invention.
Schematic diagram showing the operation of the audio data decoding device of Embodiment 2 of the present invention.
Schematic diagram showing the configuration of the audio data decoding device of Embodiment 3 of the present invention.
Schematic diagram showing the operation of the audio data decoding device of Embodiment 3 of the present invention.
Schematic diagram showing the configuration of the audio data decoding device of Embodiment 4 of the present invention.
Schematic diagram showing the operation of the audio data decoding device of Embodiment 4 of the present invention.
Schematic diagram showing the configuration of the audio data conversion device of Embodiment 5 of the present invention.
Schematic diagram showing the operation of the audio data conversion device of Embodiment 5 of the present invention.

Explanation of symbols

101 Loss detector
102 Audio data decoder
103 Audio data analyzer
104 Parameter correction unit
105 Audio synthesis unit
106 Audio signal output unit
201 Loss detector
202 Audio data decoder
203 Audio data analyzer
204 Parameter correction unit
205 Audio synthesis unit
206 Audio signal output unit
301 Loss detector
302 First audio data decoder
303 Second audio data decoder
304 Parameter interpolation unit
305 Audio signal output unit
401 Loss detector
402 First audio data decoder
403 Second audio data decoder
404 Memory storage unit
405 Audio signal output unit
501 Loss detector
502 Audio data decoder
503 Audio data encoder
504 Parameter correction unit
505 Audio data output unit

Claims

An audio data decoding device that outputs an interpolation signal for interpolating a loss in audio data, comprising:
a loss detector that detects the loss and detects that the audio data corresponding to the loss has been received late;
a first audio data decoder that, when the loss is detected, outputs the memory of a synthesis filter or the like to a memory storage unit;
a second audio data decoder that decodes the audio data corresponding to the loss using the memory of the synthesis filter or the like for the audio data preceding the loss, stored in the memory storage unit, and then generates a decoded audio signal for the audio data that follows the audio data corresponding to the loss; and
an audio signal output unit that outputs the decoded audio signal while changing the ratio of the decoded audio signal to the whole output audio signal.

The audio data decoding device according to claim 1, wherein
the first audio data decoder decodes the input audio data and generates a decoded audio signal when the loss is not detected, and
the audio signal output unit outputs the audio signal while changing the ratio between the decoded audio signal generated by the first audio data decoder and the decoded audio signal generated by the second audio data decoder.