JP2016018042A

JP2016018042A - Speech decoding device, speech decoding method, speech decoding program, and communication apparatus

Info

Publication number: JP2016018042A
Application number: JP2014139817A
Authority: JP
Inventors: 大藤枝; Masaru Fujieda
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2014-07-07
Filing date: 2014-07-07
Publication date: 2016-02-01
Anticipated expiration: 2034-07-07
Also published as: JP6481271B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech decoding device, a speech decoding method, a speech decoding program, and a communication device that improve the sound quality of decoded speech produced by an MBE (Multi-Band Excitation)-based coding scheme. SOLUTION: A speech decoding device 1A decodes digital coded information encoded according to an MBE-based speech coding scheme, and comprises: MBE decoding means 12 that decodes the digital coded information to generate decoded speech; plosive detection means 13 that detects a plosive in the decoded speech; and bursting processing means 14 that makes the detected plosive burst. SELECTED DRAWING: Figure 1

Description

The present invention relates to a speech decoding apparatus, a speech decoding method, a speech decoding program, and a communication device, and is suitable, for example, for decoding a speech signal encoded with an MBE (Multi-Band Excitation)-based speech coding scheme.

Following a revision of the Radio Act prompted by concerns over growing demand for data transmission and spectrum congestion, it has been decided that simple radios will migrate completely from the conventional analog system to a digital system. In response, the Association of Radio Industries and Businesses (ARIB) established standards for the communication methods of digital simple radios (hereinafter referred to as digital radios). For the four-level FSK modulation widely adopted in specified low-power radios, the standards are the 4FSK liaison radio system for broadcasting services (STD-B54) in the broadcasting field and the narrowband digital communication system (SCPC/4-level FSK) (STD-T102) in the communications field. Both state that, as the speech coding scheme, "AMBE+2 Enhanced Half-Rate by Digital Voice System, Inc. (a U.S. company) is recommended." AMBE+2 (sometimes written AMBE++) is a trademark of Digital Voice System, Inc.

AMBE+2 has the advantages that the decoded speech is unlikely to sound unnatural even in noisy environments and that it provides stable quality even at low bit rates. Its disadvantage is that it alters the timbre of the voice; it has been reported that the result "sounds like speech with a stuffy nose" (Non-Patent Document 1).

AMBE+2 is an application of MBE (Multi-Band Excitation), one of the speech coding schemes, and AMBE stands for Advanced MBE. Besides AMBE, there is a speech coding scheme called IMBE (Improved MBE). AMBE (including AMBE+2) and IMBE are both based on MBE. In this specification, MBE, AMBE, and IMBE are collectively referred to as "MBE-based speech coding schemes." Where the text says simply "MBE speech coding scheme," the scheme in question is MBE itself.

FIG. 9 shows the configuration of the speech encoding apparatus described in Non-Patent Document 2, which conforms to the MBE coding scheme.

In FIG. 9, the speech encoding apparatus 100 includes frequency conversion means 101, initial pitch selection means 102, pitch refinement means 103, voiced envelope estimation means 104, unvoiced envelope estimation means 105, voiced/unvoiced determination means 106, voiced/unvoiced selection means 107, multiplexing means 108, and quantization means 109.

A speech signal captured by a microphone or the like and digitized by an A/D converter (not shown) (hereinafter referred to as input speech) is input to the speech encoding apparatus 100. The frequency conversion means 101 converts the input speech into a frequency spectrum by a windowed FFT (Fast Fourier Transform) applied to overlapping frames. The initial pitch selection means 102 selects a pitch period (an integer number of samples), using dynamic programming, on the criterion of minimizing the harmonic model error under the assumption that the input speech is fully voiced, and passes the resulting initial pitch to the pitch refinement means 103. Based on the input spectrum from the frequency conversion means 101, the pitch refinement means 103 updates the initial pitch, which is expressed as an integer number of samples, to a more accurate pitch period expressed as a real number of samples so that the harmonic model error becomes even smaller.
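The overlapped, windowed analysis performed by the frequency conversion means can be sketched as follows. This is only a minimal illustration: the Hann window, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate), the 80-sample hop, and the function names are assumptions rather than values taken from this passage, and a naive DFT stands in for the FFT.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n (the window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def windowed_frames(signal, frame_len, hop):
    """Yield windowed frames whose start positions are `hop` samples apart."""
    w = hann(frame_len)
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield [signal[start + i] * w[i] for i in range(frame_len)]

def dft(frame):
    """Naive O(N^2) DFT; a practical implementation would use an FFT."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def analyze(signal, frame_len=160, hop=80):
    """Return one complex spectrum per half-overlapped frame."""
    return [dft(f) for f in windowed_frames(signal, frame_len, hop)]
```

With `hop` equal to half of `frame_len`, every sample away from the signal edges is covered by exactly two frames.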

The voiced envelope estimation means 104 calculates, from the input spectrum supplied by the frequency conversion means 101 and the real-valued pitch supplied by the pitch refinement means 103, the envelope information for voiced sound that minimizes the harmonic model error. The envelope information for voiced sound consists of the power and phase of each harmonic component. The unvoiced envelope estimation means 105 assumes that each harmonic component is noise-like and, from the input spectrum and the real-valued pitch, calculates the power of each harmonic band as unvoiced envelope information. A harmonic band is the band occupied by one harmonic component in voiced sound and is defined by the real-valued pitch; adjacent harmonic bands neither overlap nor leave gaps between them. For each harmonic band defined by the real-valued pitch, the voiced/unvoiced determination means 106 determines whether the band is voiced or unvoiced, based on the harmonic model error of that band, calculated from the input spectrum and the voiced envelope information, and on the unvoiced envelope information. The voiced/unvoiced selection means 107 selects either the voiced envelope information or the unvoiced envelope information for each harmonic band according to the voiced/unvoiced information.
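The harmonic-band partition and the per-band voiced/unvoiced decision can be illustrated with the sketch below. It assumes that band l is centered on the l-th harmonic and extends half a fundamental to each side, and that a band is declared voiced when its harmonic model error is small relative to its power. The 8 kHz sampling rate, the threshold of 0.2, and the function names are hypothetical; the passage does not give the actual decision rule.

```python
def harmonic_bands(pitch_period, sample_rate=8000.0):
    """Return (low_hz, high_hz) for each harmonic band up to the Nyquist frequency.

    Band l is assumed to span (l - 0.5) * f0 to (l + 0.5) * f0, so that
    neighbouring bands touch without overlapping.
    """
    f0 = sample_rate / pitch_period  # pitch_period may be a non-integer number of samples
    nyquist = sample_rate / 2
    bands = []
    l = 1
    while (l - 0.5) * f0 < nyquist:
        bands.append(((l - 0.5) * f0, min((l + 0.5) * f0, nyquist)))
        l += 1
    return bands

def voiced_flags(model_errors, band_powers, threshold=0.2):
    """Mark a band voiced when its normalized model error is below the threshold.

    The normalization and the threshold value are illustrative assumptions.
    """
    return [
        power > 0 and err / power < threshold
        for err, power in zip(model_errors, band_powers)
    ]
```

For a pitch period of 40 samples at 8 kHz the fundamental is 200 Hz, so the first band runs from 100 Hz to 300 Hz and the second from 300 Hz to 500 Hz.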

The multiplexing means 108 combines the pitch information, the voiced/unvoiced information for each harmonic band, and the envelope information for each harmonic band into a single sequence. The quantization means 109 quantizes the coded information (for example, so that each element has its predetermined number of bits) and outputs the resulting digital speech coded information.

FIG. 10 shows the configuration of the speech decoding apparatus described in Non-Patent Document 2, which conforms to the MBE coding scheme. The speech decoding apparatus 200 shown in FIG. 10 is the counterpart of the speech encoding apparatus 100 described above and receives the digital speech coded information output by the speech encoding apparatus 100.

In FIG. 10, the speech decoding apparatus 200 includes inverse quantization means 201, demultiplexing means 202, voiced/unvoiced envelope separation means 203, harmonic oscillation means 204, interpolation means 205, noise generation means 206, frequency conversion means 207, envelope information replacement means 208, waveform restoration means 209, and addition means 210.

In FIG. 10, the inverse quantization means 201 estimates, by inverse quantization, the pre-quantization coded information from the incoming digital speech coded information. The demultiplexing means 202 demultiplexes the inverse-quantized speech coded information into pitch information, voiced/unvoiced information, and envelope information.

The voiced/unvoiced envelope separation means 203 separates the envelope information into voiced envelope information and unvoiced envelope information according to the demultiplexed voiced/unvoiced information. In the voiced envelope information, the power and phase of the unvoiced harmonic bands are zero; in the unvoiced envelope information, the power of the voiced harmonic bands is zero. Based on the pitch information and the voiced envelope information, the harmonic oscillation means 204 generates, for each harmonic component, a sinusoidal signal whose amplitude and phase follow the voiced envelope information, and sums the sinusoidal signals of all harmonic components to synthesize voiced speech. The amplitude and phase of each generated sinusoid are adjusted so that they vary continuously while following the voiced envelope information.
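As a rough sketch of the harmonic oscillation step, the function below sums one cosine per harmonic for a single frame. It deliberately omits the frame-to-frame amplitude and phase continuity described above; the 8 kHz sampling rate and the function name are assumptions made for illustration.

```python
import math

def synthesize_voiced(f0, amplitudes, phases, n_samples, sample_rate=8000.0):
    """Sum one cosine per harmonic of f0 for a single frame.

    amplitudes[l - 1] and phases[l - 1] belong to the l-th harmonic. Bands that
    were judged unvoiced carry zero amplitude and therefore contribute nothing.
    """
    out = [0.0] * n_samples
    for l, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
        if amp == 0.0:
            continue  # unvoiced band: skipped
        omega = 2 * math.pi * l * f0 / sample_rate  # radians per sample
        for n in range(n_samples):
            out[n] += amp * math.cos(omega * n + phase)
    return out
```

A complete decoder would additionally interpolate amplitude and phase between consecutive frames so that the waveform has no discontinuities at frame boundaries.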

The interpolation means 205 interpolates the unvoiced envelope information (for example, by linear interpolation) to match the frequency resolution of the frequency conversion means 207 and obtains an unvoiced amplitude spectrum. The noise generation means 206 generates white noise by any known method, and the frequency conversion means 207 transforms the white noise signal into the frequency domain with the same parameters as the frequency conversion means 101 described above to obtain a noise spectrum. The envelope information replacement means 208 multiplies the noise spectrum from the frequency conversion means 207 by the unvoiced amplitude spectrum from the interpolation means 205 to calculate an unvoiced spectrum. The waveform restoration means 209 applies an IFFT to the unvoiced spectrum with parameters corresponding to the frequency conversion means 207 and overlap-adds the results to generate unvoiced speech.
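The interpolation and envelope replacement steps might look like the sketch below. It uses piecewise-linear interpolation, which the text gives only as an example, holds the end values outside the range of band centers (an assumption), and uses hypothetical function names.

```python
def interpolate_envelope(band_centers, band_values, bin_freqs):
    """Linearly interpolate per-band values onto the FFT bin frequencies.

    Values outside the range of band_centers are held at the nearest end value;
    this edge handling is an assumption made for the sketch.
    """
    out = []
    for f in bin_freqs:
        if f <= band_centers[0]:
            out.append(band_values[0])
        elif f >= band_centers[-1]:
            out.append(band_values[-1])
        else:
            for i in range(len(band_centers) - 1):
                x0, x1 = band_centers[i], band_centers[i + 1]
                if x0 <= f <= x1:
                    t = (f - x0) / (x1 - x0)
                    out.append(band_values[i] + t * (band_values[i + 1] - band_values[i]))
                    break
    return out

def shape_noise(noise_spectrum, unvoiced_amplitude):
    """Multiply each noise bin by the interpolated unvoiced amplitude."""
    return [z * g for z, g in zip(noise_spectrum, unvoiced_amplitude)]
```

Because voiced bands have zero power in the unvoiced envelope, the shaped noise spectrum is suppressed in and around those bands.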

The addition means 210 adds the voiced speech from the harmonic oscillation means 204 and the unvoiced speech from the waveform restoration means 209 to obtain the decoded speech, and outputs it.

The above describes the configuration and operation of the speech encoding apparatus 100 and the speech decoding apparatus 200 conforming to the MBE coding scheme. The AMBE and IMBE coding schemes differ from it in how the speech parameters are estimated and in the precision and method of quantization, but they are very similar in principle. All MBE-based speech coding schemes are highly robust to noise and provide stable quality at low bit rates.

Non-Patent Document 1: "Survey and study report on frequency sharing with digital systems on frequencies for 150 MHz band analog simple radio stations," Study Group Information, Hokuriku Bureau of Telecommunications, Ministry of Internal Affairs and Communications, 2011.
Non-Patent Document 2: Daniel W. Griffin and Jae S. Lim, "Multiband Excitation Vocoder," IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-36, No. 8, pp. 1223-1235, 1988.

However, as reported in Non-Patent Document 1, MBE-based speech coding schemes have the problem that the decoded speech "sounds like speech with a stuffy nose," which impairs clarity.

In view of this problem, a speech decoding apparatus, a speech decoding method, a speech decoding program, and a communication device that improve the clarity of decoded speech are desired.

A speech decoding apparatus according to a first aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, and comprises: (1) MBE decoding means for decoding the digital speech coded information to generate decoded speech; (2) plosive detection means for detecting a plosive in the decoded speech; and (3) bursting processing means for making the detected plosive burst.

A speech decoding apparatus according to a second aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, and comprises: (1) MBE decoding means for decoding the digital speech coded information to generate decoded speech; (2) frequency-domain plosive detection means for detecting a plosive in the decoded speech in the frequency domain; (3) centroid time calculation means for calculating a centroid time by converting, sample by sample, the decoded speech in as many frames as the number of times the frequency-domain plosive detection means has consecutively detected a plosive into non-negative values, and calculating the centroid with respect to the sum of the resulting non-negative signal; (4) time-domain plosive detection means for detecting a plosive in the decoded speech in the time domain; (5) burst information selection means for selecting between the centroid time and the burst information obtained from the time-domain plosive detection means, on the basis of that burst information; (6) burst verification means for re-determining whether a plosive is present, on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and (7) bursting processing means for redesigning a predetermined, previously designed weight coefficient on the basis of weight coefficient design information obtained from the burst information selection means, taking as a reference the frame that the burst verification means has determined to be a plosive, and multiplying the decoded speech by the weight coefficient.
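The centroid time in item (3) can be read as an amplitude-weighted mean sample index. The sketch below takes that reading and uses the absolute value as the per-sample conversion to a non-negative value; both the conversion and the function name are assumptions, since this passage does not fix them (squaring would be another plausible choice).

```python
def centroid_time(samples, to_nonneg=abs):
    """Return the centroid (in samples) of the non-negative version of `samples`.

    `to_nonneg` converts each sample to a non-negative weight; abs is assumed
    here. Returns None when every weight is zero, because the centroid is then
    undefined.
    """
    weights = [to_nonneg(s) for s in samples]
    total = sum(weights)
    if total == 0:
        return None
    return sum(n * w for n, w in enumerate(weights)) / total
```

For example, `centroid_time([0, 0, -2, 2, 0])` gives 2.5, midway between the two non-zero samples; dividing by the sampling rate would convert the index to seconds.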

A speech decoding method according to a third aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, wherein: (1) MBE decoding means decodes the digital speech coded information to generate decoded speech; (2) plosive detection means detects a plosive in the decoded speech; and (3) bursting processing means makes the detected plosive burst.

A speech decoding method according to a fourth aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, wherein: (1) MBE decoding means decodes the digital speech coded information to generate decoded speech; (2) frequency-domain plosive detection means detects a plosive in the decoded speech in the frequency domain; (3) centroid time calculation means calculates a centroid time by converting, sample by sample, the decoded speech in as many frames as the number of times the frequency-domain plosive detection means has consecutively detected a plosive into non-negative values, and calculating the centroid with respect to the sum of the resulting non-negative signal; (4) time-domain plosive detection means detects a plosive in the decoded speech in the time domain; (5) burst information selection means selects between the centroid time and the burst information obtained from the time-domain plosive detection means, on the basis of that burst information; (6) burst verification means re-determines whether a plosive is present, on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and (7) bursting processing means redesigns a predetermined, previously designed weight coefficient on the basis of weight coefficient design information obtained from the burst information selection means, taking as a reference the frame that the burst verification means has determined to be a plosive, and multiplies the decoded speech by the weight coefficient.

A speech decoding program according to a fifth aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, and causes a computer to function as: (1) MBE decoding means for decoding the digital speech coded information to generate decoded speech; (2) plosive detection means for detecting a plosive in the decoded speech; and (3) bursting processing means for making the detected plosive burst.

A speech decoding program according to a sixth aspect of the present invention decodes digital coded information encoded according to an MBE-based speech coding scheme, and causes a computer to function as: (1) MBE decoding means for decoding the digital speech coded information to generate decoded speech; (2) frequency-domain plosive detection means for detecting a plosive in the decoded speech in the frequency domain; (3) centroid time calculation means for calculating a centroid time by converting, sample by sample, the decoded speech in as many frames as the number of times the frequency-domain plosive detection means has consecutively detected a plosive into non-negative values, and calculating the centroid with respect to the sum of the resulting non-negative signal; (4) time-domain plosive detection means for detecting a plosive in the decoded speech in the time domain; (5) burst information selection means for selecting between the centroid time and the burst information obtained from the time-domain plosive detection means, on the basis of that burst information; (6) burst verification means for re-determining whether a plosive is present, on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and (7) bursting processing means for redesigning a predetermined, previously designed weight coefficient on the basis of weight coefficient design information obtained from the burst information selection means, taking as a reference the frame that the burst verification means has determined to be a plosive, and multiplying the decoded speech by the weight coefficient.

A communication device according to a seventh aspect of the present invention comprises the speech decoding apparatus according to the first or second aspect of the present invention.

According to the present invention, unvoiced plosives in the decoded speech whose burst was lost through encoding are made to burst again, so that speech with improved clarity can be provided to the user.

FIG. 1 is a functional block diagram showing the configuration of the speech decoding apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example of a method of designing the weight coefficients.
FIG. 3 is a functional block diagram showing the configuration of the speech decoding apparatus according to the second embodiment.
FIG. 4 is a functional block diagram showing the configuration of the speech decoding apparatus according to the third embodiment.
FIG. 5 is a functional block diagram showing the configuration of the speech decoding apparatus according to the fourth embodiment.
FIG. 6 is a functional block diagram showing the configuration of the speech decoding apparatus according to the fifth embodiment.
FIG. 7 is a functional block diagram showing the configuration of the speech decoding apparatus according to the sixth embodiment.
FIG. 8 is a functional block diagram showing the configuration of the speech decoding apparatus according to the seventh embodiment.
FIG. 9 is a functional block diagram showing the configuration of an encoding apparatus for an MBE-based speech coding scheme.
FIG. 10 is a functional block diagram showing the configuration of a decoding apparatus for an MBE-based speech coding scheme.
FIG. 11 is a diagram showing an example waveform of an unvoiced plosive and explaining its acoustic phenomena.
FIG. 12 is a diagram showing example waveforms of the unvoiced plosive /k/ before encoding and after decoding.
FIG. 13 is a diagram showing example waveforms of the unvoiced plosive /t/ before encoding and after decoding.

(A) Reason why each embodiment can improve the clarity of decoded speech
Before describing the speech decoding apparatus of each embodiment, the reason why it can improve the clarity of speech decoded under an MBE-based speech coding scheme is explained.

First, consider why the clarity of the decoded speech is impaired. Careful comparative listening to the input speech and the decoded speech revealed that unvoiced plosives (in Japanese, for example, /k/, /t/, and /p/) had become indistinct. An unvoiced plosive is a phoneme with the characteristic acoustic phenomena shown in FIG. 11. FIG. 11 shows the waveform of the Japanese syllable "ka" (/ka/) together with an explanation of its acoustic phenomena (source: Shuichi Itabashi et al., "Speech Engineering," Morikita Publishing Co., Ltd., 1st edition, Chapter 2, p. 27, FIG. 2.13). As shown in FIG. 11, an unvoiced plosive has three states: a silent part, a burst part, and an aspiration part. Depending on the utterance, however, the aspiration part may be absent, which is especially common for /p/.

Examining the waveform of the decoded speech, however, confirms that these characteristic acoustic phenomena are impaired. FIG. 12 and FIG. 13 show waveforms for the unvoiced plosives /k/ and /t/, respectively. In each, the upper panel (FIG. 12(A), FIG. 13(A)) shows the speech waveform before encoding and the lower panel (FIG. 12(B), FIG. 13(B)) shows it after decoding. The solid line is the waveform, and the range enclosed by the broken-line rectangle is the extent of the unvoiced plosive (burst part and aspiration part). In FIG. 12 and FIG. 13, the horizontal axis is time in milliseconds and the vertical axis is a unitless amplitude value.

For both /k/ and /t/, before encoding the amplitude rises abruptly at the transition from the silent part to the burst part (this is the "burst" that gives plosives their name), whereas after decoding the amplitude rises gradually across the burst part and the aspiration part. In other words, the unvoiced plosives in the decoded speech do not burst, and the decoded speech therefore becomes indistinct.

Next, consider the mechanism by which the characteristic acoustic phenomena of unvoiced plosives are lost. The burst part and aspiration part of an unvoiced plosive together last approximately 5 to 30 ms. MBE-based speech coding schemes, by contrast, typically analyze the signal with, for example, a frame length of 20 ms and half overlap (that is, a frame period of 10 ms), and moreover do not preserve the phase information of unvoiced sound.

With such frame processing, the characteristic acoustic phenomena of an unvoiced plosive, which occur within a short time, cannot be retained in the coded information and therefore cannot be reproduced correctly in the decoded speech. As in encoding, unvoiced sound is decoded by generating noise and adding it with half overlap, so an unvoiced plosive degrades into an unvoiced sound whose amplitude changes only gradually.
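The smoothing effect of half-overlapped synthesis can be seen in a toy model of the amplitude envelope alone. In the sketch below, each frame contributes a Hann-windowed gain and the frames are overlap-added, so a gain that jumps from 0 to 1 between consecutive frames emerges as a ramp one hop long. The Hann window, the 160-sample frame, and the function name are assumptions for illustration; this is not the decoder itself.

```python
import math

def overlap_add_gain(frame_gains, frame_len):
    """Overlap-add Hann-windowed per-frame gains with half overlap.

    Models only the amplitude envelope that frame-wise synthesis can produce.
    """
    hop = frame_len // 2
    out = [0.0] * (hop * (len(frame_gains) + 1))
    for k, gain in enumerate(frame_gains):
        for i in range(frame_len):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
            out[k * hop + i] += gain * w
    return out

# The gain is 0 for two frames and then 1: a step in the coded parameters.
envelope = overlap_add_gain([0.0, 0.0, 1.0, 1.0], frame_len=160)
```

Here `envelope` stays at 0 up to sample 160, reaches 0.5 at sample 200, and reaches 1.0 only at sample 240. At an assumed 8 kHz sampling rate that rise takes 10 ms, comparable to the whole duration of a short burst.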

Next, how the present invention improves the clarity of the decoded speech is explained. In the unvoiced plosives of the decoded speech, the abrupt change in amplitude has been lost through encoding, but the envelope of the power spectrum has not. That is, if the abrupt change in amplitude is reproduced, the unvoiced plosive in the decoded speech will be perceived as a correct unvoiced plosive. The present inventor therefore reasoned that detecting unvoiced plosives by a suitable method and applying amplitude modulation to each detected plosive, so as to recreate the acoustic phenomena characteristic of an unvoiced plosive, would make the plosives clearly audible and improve the clarity of the decoded speech.

Accordingly, the present invention improves the clarity of decoded speech by detecting plosives in the decoded speech and reproducing the characteristic acoustic phenomena of unvoiced plosives by amplitude modulation.
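A minimal sketch of such amplitude modulation is shown below: samples before an assumed burst onset are attenuated and samples from the onset onward are passed unchanged, which turns a gradual rise into an abrupt one. The step-shaped weights, the gain values, and the function names are illustrative assumptions only; the weight design actually used in the embodiments is not described in this passage.

```python
def burst_weights(n_samples, onset, pre_gain=0.1, post_gain=1.0):
    """Step-shaped weights: `pre_gain` before `onset`, `post_gain` from it onward.

    All parameter values are hypothetical placeholders.
    """
    return [pre_gain if n < onset else post_gain for n in range(n_samples)]

def apply_weights(samples, weights):
    """Amplitude-modulate the decoded speech sample by sample."""
    return [s * w for s, w in zip(samples, weights)]
```

Applied to a slowly rising segment, the weights suppress the early part of the ramp so that the amplitude jumps at `onset`, approximating the silent-to-burst transition of an unvoiced plosive.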

(B) First Embodiment
Next, a first embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention is described in detail with reference to the drawings.

The speech decoding apparatus of the first embodiment decodes speech coded according to an MBE-based speech coding scheme; the same applies to the second and subsequent embodiments described later. Because the first to seventh embodiments do not deal with voiced plosives, "unvoiced plosive" is hereinafter referred to simply as "plosive" for convenience.

(B-1) Configuration of the First Embodiment
FIG. 1 is a functional block diagram showing the configuration of the speech decoding apparatus according to the first embodiment.

In FIG. 1, the speech decoding apparatus 1A according to the first embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 13, and bursting processing means 14.

The speech decoding apparatus 1A decodes digital encoded information encoded by an MBE-based speech coding scheme and improves the clarity of the resulting decoded speech. The speech decoding apparatus 1A may be configured in hardware, or may be realized by a CPU and software (a speech decoding program) executed by the CPU; whichever implementation is adopted, it can be represented functionally as shown in FIG. 1.

The opposing speech encoding apparatus has, for example, the configuration illustrated in FIG. 9, and uses its transmitting means to send digital speech encoded information (hereinafter referred to as digital encoded information), encoded in accordance with an MBE-based speech coding scheme, to a wireless line. Although this embodiment illustrates the case where the digital encoded information is transmitted by wireless communication over a wireless line, it may instead be transmitted over a wired line.

The receiving means 11 receives the digital encoded information transmitted by wireless communication and gives the obtained digital encoded information to the MBE-based decoding means 12.

The first embodiment assumes that the receiving means 11 is a digital radio, and further simplifies the processing of the receiving means 11. The digital encoded information may be acquired by any method that makes it available; for example, wired communication may be used instead of wireless communication. In addition, since packet loss can occur with any communication means, the receiving means 11 may perform processing that compensates for packet loss and give the compensated information to the MBE-based decoding means 12 as the digital encoded information. Depending on its type, the MBE-based decoding means 12 may itself include packet loss compensation, in which case prior compensation by the receiving means 11 is unnecessary.

The MBE-based decoding means 12 decodes the digital encoded information using a decoding method corresponding to the encoding method used to generate it. The decoded speech obtained by the MBE-based decoding means 12 is given to the plosive detection means 13 and the bursting processing means 14. A wide variety of decoding means can be applied as the MBE-based decoding means 12, provided that they use an MBE-based speech coding scheme. For example, the decoding apparatus (decoding method) shown in FIG. 10 may be used, or the aforementioned AMBE (including AMBE+2) or IMBE may be used.

The plosive detection means 13 acquires the decoded speech decoded by the MBE-based decoding means 12, analyzes it, and determines whether or not the current frame contains the onset of the burst portion of a plosive. The plosive detection means 13 gives the determination result (referred to as the burst truth value) to the bursting processing means 14.

As shown in FIG. 1, the plosive detection means 13 includes a short-period power calculation unit 15, a power ratio calculation unit 16, and a burst detection unit 17.

The short-period power calculation unit 15 calculates the power of the decoded speech at a period shorter than that of the MBE-based decoding means 12, and the obtained short-period power is given to the power ratio calculation unit 16.

The power ratio calculation unit 16 calculates a power ratio by dividing the short-period power given from the short-period power calculation unit 15 by a reference power determined by a predetermined rule, and the obtained power ratio is given to the burst detection unit 17. The predetermined rule that determines the reference power in the power ratio calculation unit 16 is described in detail in the section on operation.

The burst detection unit 17 generates the burst truth value by determining whether or not the given short-period power ratio is equal to or greater than a predetermined threshold, and the obtained burst truth value is given to the bursting processing means 14 as the output of the plosive detection means 13. That is, the burst truth value is set to TRUE if the short-period power ratio exceeds the threshold, and to FALSE otherwise.

If the burst truth value given from the plosive detection means 13 is true, the bursting processing means 14 applies amplitude modulation by multiplying the decoded speech given from the MBE-based decoding means 12 by predetermined weight coefficients; if the burst truth value is false, it passes the decoded speech through unchanged. In either case it outputs the result as improved speech. The amplitude modulation is processing that artificially creates a silent portion and a burst portion, and the predetermined weight coefficients used for it are calculated in advance by the design method described later.

(B-2) Operation of the First Embodiment
Next, the operation of the speech decoding apparatus 1A of the first embodiment will be described.

The opposing speech encoding apparatus transmits, over a wireless line, digital encoded information encoded in accordance with an MBE-based speech coding scheme.

The wirelessly transmitted digital encoded information is received by the receiving means 11 of the speech decoding apparatus 1A. The received digital encoded information is decoded by the MBE-based decoding means 12 using a decoding method corresponding to the MBE-based speech coding scheme, and the resulting decoded speech is given to the plosive detection means 13 and the bursting processing means 14.

In the plosive detection means 13, the short-period power calculation unit 15 calculates the power of the decoded speech at a period shorter than that of the MBE-based decoding means 12. For example, if the calculation period of the short-period power is 2 ms, five short-period powers are calculated from the decoded speech of a given 10 ms frame. The calculation period may also be 2.5 ms. The short-period powers are given to the power ratio calculation unit 16. The calculation period can be set to any length of, for example, 1 ms to 5 ms, but for convenience of calculation a length equal to the frame period divided by an integer is preferably used.
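The short-period power calculation can be sketched as follows. This is only an illustration under stated assumptions: the function name, the 8 kHz sampling rate, and the use of the mean (rather than the sum) of squared samples are not taken from the specification.

```python
def short_period_powers(frame, sub_len):
    """Split one decoded frame into consecutive blocks and return each block's mean power.

    frame   : list of samples for one frame of decoded speech
    sub_len : samples per short period (assumed to divide the frame length)
    """
    if sub_len <= 0 or len(frame) % sub_len != 0:
        raise ValueError("sub_len must be positive and divide the frame length")
    powers = []
    for start in range(0, len(frame), sub_len):
        block = frame[start:start + sub_len]
        # Mean of squared samples; the specification does not fix the power definition.
        powers.append(sum(x * x for x in block) / sub_len)
    return powers


# Assumed 8 kHz sampling: a 10 ms frame is 80 samples and a 2 ms period is 16 samples,
# so five short-period powers are obtained per frame.
example_frame = [0.0] * 64 + [1.0] * 16
print(short_period_powers(example_frame, 16))
```

With a 2.5 ms period at the same assumed sampling rate, `sub_len` would be 20 and four powers would be produced per frame.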

The power ratio calculation unit 16 calculates the power ratio by dividing the given short-period power by a reference power determined by a predetermined rule. The obtained power ratio is given to the burst detection unit 17.

The predetermined rule that determines the reference power in the power ratio calculation unit 16 differs depending on whether or not information from past frames is used.

For example, when the power ratio is calculated using only the current frame, without past frames, a method is preferably used in which the reference power is the minimum of the short-period powers from the first time in the frame up to the time immediately before the time at which the power ratio is to be obtained. With this method, however, the first power ratio in the frame cannot be calculated, so that power ratio is set to 1.

Alternatively, when the power ratio is calculated using past frames as well, a method is preferably used in which a predetermined reference time width is set and the reference power is the minimum of the short-period powers over the interval that extends back by the reference time width from immediately before the time at which the power ratio is to be obtained.
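The two reference-power rules can be sketched as below. The function names, the `EPS` floor that guards against division by zero, and the expression of the reference time width as a count of short periods (`ref_count`) are assumptions made for illustration.

```python
EPS = 1e-12  # hypothetical floor that avoids division by zero in silent passages


def ratios_current_frame(powers):
    """Rule without past frames: reference = minimum of the earlier powers in the same frame."""
    ratios = [1.0]  # the first ratio in a frame cannot be computed, so it is set to 1
    for i in range(1, len(powers)):
        reference = min(powers[:i])
        ratios.append(powers[i] / max(reference, EPS))
    return ratios


def ratios_with_history(history, powers, ref_count):
    """Rule with past frames: reference = minimum of the ref_count powers preceding each time.

    history : short-period powers of past frames, oldest first
    powers  : short-period powers of the current frame
    """
    seq = list(history) + list(powers)
    ratios = []
    for i in range(len(history), len(seq)):
        window = seq[max(0, i - ref_count):i]
        if not window:
            ratios.append(1.0)  # no earlier value is available at all
            continue
        ratios.append(seq[i] / max(min(window), EPS))
    return ratios


print(ratios_current_frame([4.0, 2.0, 8.0, 1.0]))
print(ratios_with_history([1.0, 2.0], [4.0, 0.5], ref_count=2))
```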

The burst detection unit 17 determines whether or not the given short-period power ratio is equal to or greater than a predetermined threshold, and generates the burst truth value. The obtained burst truth value is given to the bursting processing means 14 as the output of the plosive detection means 13. That is, the burst detection unit 17 sets the burst truth value to TRUE if the short-period power ratio exceeds the threshold, and to FALSE otherwise. The threshold is used to detect the transition from the silent portion to the burst portion; it is not particularly limited, but a value of about 100 to 1000 is preferably used.
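A minimal sketch of the threshold decision follows. Two points are assumptions rather than statements of the specification: the frame is flagged when any one of its short-period power ratios reaches the threshold, and the comparison is written as "greater than or equal". As a point of reference, a power ratio of 100 to 1000 corresponds to a rise of 20 dB to 30 dB.

```python
import math


def detect_burst(ratios, threshold=100.0):
    """Return True when any short-period power ratio in the frame reaches the threshold."""
    return any(r >= threshold for r in ratios)


def ratio_to_db(ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(ratio)


print(detect_burst([1.0, 0.5, 250.0]))         # one ratio is above 100
print(detect_burst([1.0, 0.5, 99.9]))          # no ratio reaches 100
print(ratio_to_db(100.0), ratio_to_db(1000.0))
```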

In the bursting processing means 14, if the given burst truth value is true, amplitude modulation is applied by multiplying the given decoded speech by predetermined weight coefficients; if the burst truth value is false, the decoded speech is passed through unchanged. The improved speech thus obtained is output to the subsequent stage.

The amplitude modulation is processing that artificially creates a silent portion and a burst portion, and the predetermined weight coefficients used for it are calculated in advance by a predetermined design method.

FIG. 2 is a diagram illustrating the concept behind the design of the weight coefficients. The amplitude is designed on a logarithmic scale (decibels) so that the result sounds natural.

First, the silent portion preceding the burst portion is suppressed so that the burst at the onset of the burst portion becomes clearer. Here, the final gain of the silent portion is set to -100 dB in order to obtain almost complete silence and make the burst more abrupt.

Next, the onset of the burst portion is amplified up to 9 dB so that it has sufficiently greater power than the original waveform. The gain is raised steeply over a short time rather than in a discrete step, so that the waveform does not become discontinuous and generate extraneous noise. Finally, since the latter half of the burst portion is the interval in which the acoustic phenomenon shifts from the burst portion to the aspiration portion or a vowel, the gain is gradually brought back toward 0 dB.

In FIG. 2, the silent portion is 10 ms, the onset of the burst portion is 5 ms, and the latter half of the burst portion is 15 ms. Given that the onset of the burst portion is to be placed in the frame in which the plosive was detected, realizing amplitude modulation with these weight coefficients requires, when the frame period is 10 ms, that at least one frame of decoded speech be stored; as a result, the output of the improved speech is delayed by one frame.
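The one-frame delay can be sketched as follows. The class name and the alignment are assumptions made for illustration: the weight curve is taken to span exactly three frames, with the silent portion applied to the stored previous frame, the onset applied from the start of the detected frame, and the remainder applied to the frame that follows.

```python
class DelayedBurster:
    """Applies a three-frame weight curve with one frame of output delay (illustrative)."""

    def __init__(self, weights, frame_len):
        if len(weights) != 3 * frame_len:
            raise ValueError("this sketch assumes the weight curve spans exactly three frames")
        self.frame_len = frame_len
        self.pre = list(weights[:frame_len])               # silent portion
        self.cur = list(weights[frame_len:2 * frame_len])  # onset and start of the latter half
        self.post = list(weights[2 * frame_len:])          # rest of the latter half
        self.held = [0.0] * frame_len       # stored previous frame, not yet output
        self.held_gain = [1.0] * frame_len  # gain already scheduled for the stored frame
        self.pending = [1.0] * frame_len    # gain scheduled for the next incoming frame

    def process(self, frame, is_burst):
        """Accept the current frame and its burst truth value; return the previous frame."""
        if len(frame) != self.frame_len:
            raise ValueError("unexpected frame length")
        out_gain = self.held_gain
        cur_gain = self.pending
        next_gain = [1.0] * self.frame_len
        if is_burst:
            # Suppress the stored frame, boost the current one, and schedule the decay.
            out_gain = [g * w for g, w in zip(out_gain, self.pre)]
            cur_gain = [g * w for g, w in zip(cur_gain, self.cur)]
            next_gain = list(self.post)
        out = [s * g for s, g in zip(self.held, out_gain)]
        self.held, self.held_gain, self.pending = list(frame), cur_gain, next_gain
        return out


# Toy example with two-sample frames and constant weights per frame.
burster = DelayedBurster([0.5, 0.5, 2.0, 2.0, 1.5, 1.5], frame_len=2)
for flag in (False, True, False, False, False):
    print(burster.process([1.0, 1.0], flag))
```

The first call returns the initial all-zero buffer, which is the one-frame delay described above; each later call returns the previous input scaled by whatever gain was scheduled for it.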

Note that the lengths of the silent portion, the onset of the burst portion, and the latter half of the burst portion need not be as shown in FIG. 2; for example, the silent portion may be 5 ms. In that case, amplitude modulation can be performed without delaying the improved speech. Moreover, a certain effect is obtained even without a silent portion, that is, even when no silent portion is created and the gain is simply raised from 0 dB to 9 dB at the onset of the burst portion, so such a design can also be applied.

FIG. 2 shows the beginning of the silent portion at 0 ms, but this is not meant to suggest that the beginning of the silent portion coincides with the beginning of a frame. As long as the condition that the burst portion must be formed in the frame in which the plosive was detected is satisfied, the weight coefficient design can be shifted freely along the time axis.
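A weight curve with the durations and levels given above can be sketched as follows. The exact shape of the curve in FIG. 2 is not reproduced in the text, so the piecewise-linear ramps in decibels, the function name, and the 8 kHz sampling rate are all assumptions. A gain of G dB is converted to a linear amplitude weight by 10^(G/20).

```python
def design_weights(fs=8000, silent_ms=10.0, onset_ms=5.0, tail_ms=15.0,
                   floor_db=-100.0, peak_db=9.0):
    """Return per-sample linear weights for the silent portion, onset, and latter half."""

    def ramp(start_db, end_db, duration_ms):
        # Linear ramp in dB that reaches end_db on its final sample.
        n = int(round(fs * duration_ms / 1000.0))
        return [start_db + (end_db - start_db) * (k + 1) / n for k in range(n)]

    gains_db = (ramp(0.0, floor_db, silent_ms)      # suppress down to the floor
                + ramp(floor_db, peak_db, onset_ms)  # steep but continuous rise
                + ramp(peak_db, 0.0, tail_ms))       # gradual return to 0 dB
    return [10.0 ** (g / 20.0) for g in gains_db]


weights = design_weights()
print(len(weights))                             # 240 samples, i.e. 30 ms at 8 kHz
print(weights[79], weights[119], weights[239])  # end of each of the three segments
```

Designing the ramps in decibels and converting afterwards keeps the perceived loudness change smooth, which is the stated reason for working on a logarithmic scale.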

(B-3) Effects of the First Embodiment
According to the first embodiment, plosives whose characteristic acoustic phenomenon has been impaired in the decoded speech of an MBE-based speech coding scheme are given a burst again, so that speech with improved clarity, and hence improved listening comfort, can be provided to the user.

(C) Second Embodiment
Next, a second embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

The first embodiment showed the case where plosives are detected on the basis of the temporal change in the amplitude of the decoded speech waveform. However, as explained with reference to FIGS. 12 and 13, the decoded speech is highly likely to have lost the information on the plosive-like temporal change in amplitude, and such plosives cannot be detected in that way. On the other hand, since MBE-based speech coding schemes encode the power of the frequency spectrum as envelope information, the power spectrum is reproduced with high fidelity.

The second embodiment therefore detects plosives using the power spectrum.

(C-1) Configuration of the Second Embodiment
FIG. 3 is a functional block diagram showing the configuration of the decoded-speech sound quality improvement apparatus 1B of the second embodiment. Components identical or corresponding to those in FIG. 1 of the first embodiment are denoted by the same reference numerals.

In FIG. 3, the decoded-speech sound quality improvement apparatus 1B of the second embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 21, and bursting processing means 14.

The second embodiment differs from the first embodiment in that plosive detection means 21 is provided in place of the plosive detection means 13.

The plosive detection means 21 analyzes the decoded speech given from the MBE-based decoding means 12 and determines whether or not the current frame contains the onset of the burst portion of a plosive; the determination result (referred to as the burst truth value) is given to the bursting processing means 14. In other words, the plosive detection means 21 detects whether or not the frame is the onset of the burst portion of a plosive by performing pattern matching on feature information of the frame's frequency characteristics.

As shown in FIG. 3, the plosive detection means 21 includes a frequency analysis unit 22 and a pattern identification unit 23.

The frequency analysis unit 22 acquires the decoded speech from the MBE-based decoding means 12, calculates the frequency spectrum of each frame, and from it calculates the power spectrum of each frame. The obtained power spectrum is given to the pattern identification unit 23. Any method may be used to calculate the power spectrum; for example, an FFT (Fast Fourier Transform), a wavelet transform, or a filter bank can be applied. A wavelet transform or a filter bank also allows the band to be divided unequally, but here equal division of the band using a filter bank is preferably used. The meaningless DC component, the band containing the pitch frequency, and the band at 3400 Hz and above, which is often suppressed by speech coding, need not be analyzed; accordingly, an 8-band filter bank with a bandwidth of 400 Hz and center frequencies of 400 Hz, 800 Hz, ..., 3200 Hz is recommended, for example.
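The eight band powers can be approximated as in the sketch below. It is a stand-in rather than the recommended filter bank: it groups the bins of a plain discrete Fourier transform, which the text lists as one applicable method, into the 400 Hz wide bands. The function name, the 8 kHz sampling rate, and the half-open band edges are assumptions.

```python
import cmath
import math


def band_powers(frame, fs=8000, centers=range(400, 3201, 400), bandwidth=400):
    """Return one power value per band by summing DFT bin powers inside each band."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT of bin k; adequate for short frames such as 80 samples.
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    powers = []
    for fc in centers:
        lo, hi = fc - bandwidth / 2, fc + bandwidth / 2
        # Bin k lies at frequency k * fs / n; the band covers lo <= f < hi.
        powers.append(sum(p for k, p in enumerate(spectrum) if lo <= k * fs / n < hi))
    return powers


# A 1200 Hz tone in a 10 ms frame at the assumed 8 kHz sampling rate.
tone = [math.sin(2 * math.pi * 1200 * t / 8000) for t in range(80)]
powers = band_powers(tone)
print(len(powers), powers.index(max(powers)))  # 8 bands; the peak is in the 1200 Hz band
```

The lowest band starts at 200 Hz and the highest ends at 3400 Hz, so the DC component and the range at 3400 Hz and above are excluded, as the text requires.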

The pattern identification unit 23 performs pattern recognition on the power spectrum given from the frequency analysis unit 22 to determine whether or not the current frame is a plosive, and takes the determination result as the burst truth value, which is given to the bursting processing means 14 as the output of the plosive detection means 21. Various methods can be applied to the pattern recognition; for example, simple pattern matching or a neural network may be chosen, but a support vector machine is preferably used.

Although the above description reads as if only the power spectrum were used, the pattern identification unit 23 may also include the power of the decoded speech itself, without band division, in the pattern recognition. Likewise, although the description so far reads as if only the power spectrum obtained from a single frame were used, the pattern identification unit 23 may use past frames, or may use future frames by delaying the output. The power values may also be normalized by an arbitrary value; for example, they may be normalized by dividing the other power values by the power of the decoded speech of the current frame.

(C-2) Operation of the Second Embodiment
Next, the operation of the speech decoding apparatus 1B according to the second embodiment will be described. Since the overall operation of the speech decoding apparatus 1B is the same as in the first embodiment, its description is omitted, and the operation of the plosive detection means 21 is described below.

The decoded speech output from the MBE-based decoding means 12 is given to the plosive detection means 21. In the plosive detection means 21, the frequency analysis unit 22 performs frequency analysis on the decoded speech to calculate the power spectrum of each frame, and the obtained per-frame power spectrum is given to the pattern identification unit 23.

In the pattern identification unit 23 of the plosive detection means 21, pattern recognition is performed on the per-frame power spectrum by a predetermined pattern recognition method to determine whether or not the current frame is a plosive, and the determination result is given to the bursting processing means 14 as the burst truth value.

Then, in the bursting processing means 14, if the given burst truth value is true, amplitude modulation is applied by multiplying the given decoded speech by predetermined weight coefficients; if the burst truth value is false, the decoded speech is passed through unchanged. The improved speech thus obtained is output to the subsequent stage.

(C-3) Effects of the Second Embodiment
According to the second embodiment, plosives whose characteristic acoustic phenomenon has been impaired in the decoded speech of an MBE-based speech coding scheme are given a burst again more reliably, so that speech with improved clarity, and hence improved listening comfort, can be provided to the user.

(D) Third Embodiment
Next, a third embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

The second embodiment described a configuration in which every frame in which a plosive is detected is given a burst. However, for a plosive whose burst portion exceeds, for example, 10 ms, the plosive may be detected in two consecutive frames. A plosive may also be detected in two or more consecutive frames simply because of a pattern recognition error. In such cases, the configuration of the second embodiment may apply the burst to a single plosive twice.

The third embodiment therefore applies the burst only once even when a plosive is detected in a plurality of consecutive frames.

(D-1) Configuration of the Third Embodiment
FIG. 4 is a functional block diagram showing the configuration of the speech decoding apparatus 1C according to the third embodiment. In FIG. 4, components identical or corresponding to those in FIG. 1 of the first embodiment and FIG. 3 of the second embodiment are denoted by the same reference numerals.

In FIG. 4, the speech decoding apparatus 1C of the third embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 21, and bursting processing means 31.

The third embodiment differs from the second embodiment in that bursting processing means 31 is provided in place of the bursting processing means 14.

The bursting processing means 31 applies a burst to the decoded speech on the basis of the burst truth value from the plosive detection means 21 and outputs the resulting improved speech.

As shown in FIG. 4, the bursting processing means 31 includes a burst verification unit 32 and an amplitude modulation unit 33.

The burst verification unit 32 compares the burst truth value given from the plosive detection means 21 with the past burst truth value and generates a corrected burst truth value in which true values never occur consecutively. The obtained corrected burst truth value is given to the amplitude modulation unit 33.

The method by which the burst verification unit 32 generates the corrected burst truth value is as follows.

First, the burst verification unit 32 stores the burst truth value of the frame one frame in the past. When the burst truth value of the current frame is newly input, the burst verification unit 32 compares the stored past burst truth value with the burst truth value of the current frame. Only when the past burst truth value is "false" and the current burst truth value is "true" does the burst verification unit 32 generate the corrected burst truth value as "true".

In all other cases, namely (a) when the past burst truth value is "false" and the current burst truth value is "false", (b) when the past burst truth value is "true" and the current burst truth value is "false", and (c) when the past burst truth value is "true" and the current burst truth value is "true", the burst verification unit 32 generates the corrected burst truth value as "false".

After generating the corrected burst truth value, the burst verification unit 32 overwrites the stored past burst truth value with the current burst truth value.
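The verification rule amounts to detecting a false-to-true transition. A minimal sketch follows; the class and method names are hypothetical, and the initial stored value of "false" is an assumption.

```python
class BurstVerifier:
    """Passes a true burst truth value only on a false-to-true transition."""

    def __init__(self):
        self.previous = False  # stored burst truth value of the frame one frame in the past

    def update(self, current):
        """Return the corrected burst truth value for the current frame."""
        corrected = current and not self.previous
        # Store the uncorrected current value, as described in the text above.
        self.previous = current
        return corrected


verifier = BurstVerifier()
raw = [False, True, True, True, False, True]
print([verifier.update(flag) for flag in raw])
```

Because the uncorrected value is what gets stored, a run of three consecutive detections yields a single corrected "true" at the start of the run.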

If the given corrected burst truth value is "true", the amplitude modulation unit 33 applies amplitude modulation by multiplying the given decoded speech by predetermined weight coefficients. If the corrected burst truth value is "false", the amplitude modulation unit 33 passes the decoded speech through unchanged. The improved speech thus obtained is output as the output of the bursting processing means 31. The operation of the amplitude modulation unit 33 is identical to that of the bursting processing means 14 of FIG. 1 in the first embodiment, except that its second input is the corrected burst truth value rather than the burst truth value.

By performing amplitude modulation on the basis of the corrected burst truth value in this way, the error of applying a burst in two or more consecutive frames no longer occurs.

(D-2) Operation of the Third Embodiment
Next, the operation of the speech decoding apparatus 1C according to the third embodiment will be described. Since the overall operation of the speech decoding apparatus 1C is the same as in the first and second embodiments, its description is omitted, and the operation of the bursting processing means 31 is described below.

The burst truth value output from the plosive detection means 21 is given to the burst verification unit 32 of the bursting processing means 31. In the burst verification unit 32, the given burst truth value of the current frame is compared with the burst truth value of the past frame. If the burst truth value of the past frame is false and that of the current frame is true, the burst verification unit 32 generates and outputs the corrected burst truth value as true. In all other cases, it generates and outputs the corrected burst truth value as false. The obtained corrected burst truth value is given to the amplitude modulation unit 33. The burst verification unit 32 then overwrites the stored burst truth value of the past frame with the burst truth value of the current frame, consistent with the configuration described above.

In the amplitude modulation unit 33 of the bursting processing means 31, if the given corrected burst truth value is true, amplitude modulation is applied by multiplying the decoded speech by predetermined weight coefficients. If the corrected burst truth value is false, the amplitude modulation unit 33 passes the decoded speech through unchanged. The improved speech thus obtained is output as the output of the bursting processing means 31.

(D-3) Effects of the Third Embodiment
According to the third embodiment, in the decoded speech of an MBE-based speech coding scheme, bursting is applied to plosives whose characteristic acoustic phenomenon has been lost, and bursting is not mistakenly applied to consecutive frames. The clarity of the decoded speech is therefore improved, and speech that is more comfortable to listen to can be provided to the user.

(E) Fourth Embodiment
Next, a fourth embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

The first to third embodiments use fixed weighting coefficients. However, the burst portion of the input speech does not always occur at a specific position within the frame. Bursting may therefore be applied to a portion that was originally silent, or to the latter half of the burst portion, in which case a sufficient bursting effect may not be obtained.

In the fourth embodiment, therefore, the weighting coefficients are translated along the time axis based on the centroid time of the decoded speech, so that bursting is correctly applied to the portion that was originally the onset of the burst portion.

The fourth embodiment is characterized by its bursting processing means. Either of the plosive detection means of the first and second embodiments can be used; the fourth embodiment illustrates the case where the plosive detection means 21 of the second embodiment is used.

(E-1) Configuration of the Fourth Embodiment
FIG. 5 is a functional block diagram showing the configuration of the speech decoding apparatus of the fourth embodiment. In FIG. 5, components identical or corresponding to those in FIG. 1 (first embodiment) and FIG. 3 (second embodiment) are denoted by the same reference numerals.

In FIG. 5, the speech decoding apparatus 1D according to the fourth embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 21, and bursting processing means 41.

The fourth embodiment differs from the second and third embodiments in that the bursting processing means 41 is provided in place of the bursting processing means 14 and 31.

The bursting processing means 41 applies bursting to the supplied decoded speech based on the supplied burst truth value and outputs the resulting improved speech.

The bursting processing means 41 includes a non-negative conversion unit 42, a centroid time calculation unit 43, and an amplitude modulation unit 44.

The decoded speech supplied to the bursting processing means 41 is passed to the non-negative conversion unit 42 and the amplitude modulation unit 44.

The non-negative conversion unit 42 converts each sample of the supplied decoded speech to a non-negative value and supplies the resulting non-negative signal to the centroid time calculation unit 43. Any conversion whose output is non-negative may be used, but the absolute value is preferred.

The centroid time calculation unit 43 calculates the time of the energy centroid of the supplied non-negative signal within the frame and supplies the resulting centroid time to the amplitude modulation unit 44. The centroid time is an original feature quantity defined by equation (1). In equation (1), t is time, T is the frame length, X(t) is the non-negative signal, and C is the centroid time; t and T are expressed in samples. For convenience, t here denotes relative time within the frame.
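
The centroid computation can be sketched in Python as follows. Equation (1) itself is not reproduced in this text, so the sketch assumes the usual first-moment form, C = sum(t * X(t)) / sum(X(t)) for t = 0 .. T-1, with X(t) taken as the absolute value of each sample. The function name `centroid_time` and the midpoint fallback for an all-zero frame are illustrative assumptions, not part of the described apparatus.

```python
from typing import Sequence


def centroid_time(frame: Sequence[float]) -> float:
    """Return an assumed centroid time C (in samples) for one frame."""
    # Non-negative conversion: the text names the absolute value as preferred.
    x = [abs(s) for s in frame]
    total = sum(x)
    if total == 0:
        # Assumption: an all-zero frame falls back to the frame midpoint.
        return (len(x) - 1) / 2.0
    # Assumed first-moment form of equation (1).
    return sum(t * v for t, v in enumerate(x)) / total


# A frame whose only non-zero sample is at index 2.
print(centroid_time([0.0, 0.0, -1.0, 0.0]))
```

Because the samples are made non-negative first, positive and negative excursions of the waveform cannot cancel each other in the sum.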

The amplitude modulation unit 44 translates the predetermined weighting coefficients along the time axis based on the centroid time supplied from the centroid time calculation unit 43. If the burst truth value supplied from the plosive detection means 21 is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 41.

The weighting coefficients are designed in the same way as for the bursting processing means 14 of the first embodiment, except that the coefficients calculated in advance are translated along the time axis based on the centroid time. The translation is performed by determining the peak position of the weighting coefficients from the centroid time. The simplest method, and therefore a preferred one, is to set the peak position equal to the centroid time. However, the centroid time tends to lie further inward than the true peak position (C approaches T/2 in equation (1)) and also tends to lie later (C approaches T in equation (1)). Taking these tendencies into account, the centroid time C' corrected by equation (2) may instead be matched to the peak position.

As described above, changing the peak position of the weighting coefficients based on the centroid time avoids the problem of a weakened bursting effect caused by applying bursting to a silent portion or to the latter half of the burst portion.
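
The translation of the weighting coefficients and the gated amplitude modulation might look like the Python sketch below. It covers only the simplest case, in which the peak is matched directly to the centroid time. The weight values, the function names, and the choice to fill vacated positions with unity gain are all assumptions made for illustration; the actual weight design belongs to the first embodiment and is not reproduced here.

```python
from typing import List, Sequence


def shift_weights(weights: Sequence[float], peak_index: int,
                  target: float) -> List[float]:
    """Translate `weights` in time so that its peak lands on `target`."""
    n = len(weights)
    # Clamp the rounded target to a valid sample index inside the frame.
    target_index = min(max(int(round(target)), 0), n - 1)
    shift = target_index - peak_index
    shifted = [1.0] * n  # assumption: vacated positions get unity gain
    for i, w in enumerate(weights):
        j = i + shift
        if 0 <= j < n:
            shifted[j] = w
    return shifted


def amplitude_modulate(frame: Sequence[float], weights: Sequence[float],
                       burst_flag: bool) -> List[float]:
    """Multiply by the weights when the flag is true, else pass through."""
    if not burst_flag:
        return list(frame)
    return [s * w for s, w in zip(frame, weights)]


# Hypothetical 4-sample envelope: attenuate before the peak, boost at it.
base = [0.5, 2.0, 1.0, 1.0]
moved = shift_weights(base, peak_index=1, target=3.0)
print(moved)
print(amplitude_modulate([1.0, 1.0, 1.0, 1.0], moved, True))
```

Coefficients that would be shifted past the frame boundary are simply dropped in this sketch.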

(E-2) Operation of the Fourth Embodiment
Next, the operation of the speech decoding apparatus 1D according to the fourth embodiment will be described. The overall operation of the speech decoding apparatus 1D is the same as in the first and second embodiments, so its description is omitted; only the operation of the bursting processing means 41 is described below.

The decoded speech output from the MBE-based decoding means 12 is supplied to the non-negative conversion unit 42 and the amplitude modulation unit 44 of the bursting processing means 41. The non-negative conversion unit 42 converts each sample of the supplied decoded speech to a non-negative value, and the resulting non-negative signal is supplied to the centroid time calculation unit 43.

The centroid time calculation unit 43 calculates, according to equation (1), the time of the energy centroid of the supplied non-negative signal within the frame, and the resulting centroid time is supplied to the amplitude modulation unit 44.

The amplitude modulation unit 44 translates the predetermined weighting coefficients along the time axis based on the supplied centroid time. If the supplied burst truth value is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 41.

(E-3) Effects of the Fourth Embodiment
According to the fourth embodiment, in the decoded speech of an MBE-based speech coding scheme, bursting is applied more reliably to plosives whose characteristic acoustic phenomenon has been lost. The clarity of the decoded speech is therefore improved, and speech that is more comfortable to listen to can be provided to the user.

(F) Fifth Embodiment
Next, a fifth embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

In the fourth embodiment, the weighting coefficients are translated along the time axis based on the centroid time within the frame of the decoded speech. This method is effective, but when plosives are detected in several consecutive frames, most of the latter half of the burst portion falls in the following frame, and the relationship between the centroid time and the peak position of the weighting coefficients may become weak.

In the fifth embodiment, therefore, the peak position of the weighting coefficients is determined from the number of frames in which plosives were detected consecutively.

(F-1) Configuration of the Fifth Embodiment
FIG. 6 is a functional block diagram showing the configuration of the speech decoding apparatus 1E of the fifth embodiment. In FIG. 6, components identical or corresponding to those in FIG. 1 (first embodiment), FIG. 3 (second embodiment), and FIG. 5 (fourth embodiment) are denoted by the same reference numerals.

In FIG. 6, the speech decoding apparatus 1E of the fifth embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 21, and bursting processing means 51. The fifth embodiment differs from the second to fourth embodiments in that the bursting processing means 51 is provided in place of the bursting processing means 14, 31, and 41.

The fifth embodiment is characterized by the bursting processing means 51. Either of the plosive detection means of the first and second embodiments can be used; the fifth embodiment illustrates the case where the plosive detection means 21 of the second embodiment is used.

The bursting processing means 51 applies bursting to the supplied decoded speech based on the supplied burst truth value and outputs the resulting improved speech.

The bursting processing means 51 includes a burst verification unit 32, a detection count calculation unit 53, a burst time estimation unit 54, and an amplitude modulation unit 55.

The decoded speech supplied to the bursting processing means 51 is passed to the amplitude modulation unit 55, and the burst truth value from the plosive detection means 21 is passed to the burst verification unit 32 and the detection count calculation unit 53.

The operation of the burst verification unit 32 is the same as in the third embodiment, so its description is omitted.

The detection count calculation unit 53 counts the number of times a plosive has been detected consecutively and supplies the resulting detection count to the burst time estimation unit 54. Specifically, the unit holds an internal detection counter. If the burst truth value supplied from the plosive detection means 21 is true, the detection counter is incremented by 1; if the burst truth value is false, the detection counter is reset to 0. The current value of the detection counter is output as the detection count.
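
The counter behaviour described above can be sketched in a few lines of Python. The class and method names are hypothetical and chosen only for illustration.

```python
class DetectionCounter:
    """Counts consecutive frames whose burst truth value is true."""

    def __init__(self) -> None:
        self.count = 0

    def update(self, burst_flag: bool) -> int:
        # Increment on a true flag, reset to zero on a false flag.
        self.count = self.count + 1 if burst_flag else 0
        return self.count


counter = DetectionCounter()
print([counter.update(f) for f in [True, True, False, True]])
```

The counter is called once per frame, so its output is the length of the current run of detected frames.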

Based on the supplied detection count, the burst time estimation unit 54 estimates at which time in the first of the frames with consecutive plosive detections the power peak of the burst portion is located, and supplies the resulting burst time to the amplitude modulation unit 55.

The algorithm of the burst time estimation unit 54 requires some care. Because bursting is applied to the frame in which the plosive was first detected, if plosives continued to be detected over a long run of frames (unrealistic, but possible in principle), the output would have to be delayed until the burst truth value became false. To prevent this, detection counts greater than a predetermined number must be ignored.

Hereinafter, the number of frames from the frame containing the onset of the burst portion to the last frame in which the latter half of the burst portion is considered to be present is called the consecutive burst frame count.

Since a burst portion lasts about 30 ms at most, it is reasonable to assume that, for a frame period of 10 ms, the consecutive burst frame count is at most about three. The burst time is therefore calculated from the consecutive burst frame count, which is obtained as follows. The detection count of the previous frame is stored. If the current detection count is 0 and the previous detection count is 1 or 2, the previous detection count is taken as the consecutive burst frame count. If the current detection count is 3, the current detection count is taken as the consecutive burst frame count. In all other cases the consecutive burst frame count is set to 0.
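
A minimal Python sketch of this rule follows. It takes the per-frame detection counts as input and reads the first case as reporting the previous count when a run of one or two frames has just ended; that reading is an assumption, as are the class name and the constant `MAX_BURST_FRAMES`, which reflects the 10 ms frame period used in the text.

```python
MAX_BURST_FRAMES = 3  # about 30 ms of burst at a 10 ms frame period


class BurstRunTracker:
    """Derives the consecutive burst frame count N from detection counts."""

    def __init__(self) -> None:
        self.prev_count = 0  # detection count of the previous frame

    def update(self, count: int) -> int:
        if count == 0 and self.prev_count in (1, 2):
            n = self.prev_count  # a run of 1 or 2 frames has just ended
        elif count == MAX_BURST_FRAMES:
            n = count            # the run has reached the assumed maximum
        else:
            n = 0                # run still growing, too long, or no run
        self.prev_count = count
        return n


tracker = BurstRunTracker()
print([tracker.update(c) for c in [1, 2, 0]])
```

A non-zero result appears only once per run, which bounds the output delay to the assumed maximum of three frames.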

The burst time is calculated from the detection count by equation (3). In equation (3), N is the consecutive burst frame count, P is the burst time, and P0 is the minimum value of the burst time. P0 may be any value of 0 or more and less than (T-1); the length of the burst onset used in designing the weighting coefficients is preferred. When N = 0, no amplitude modulation by the weighting coefficients is performed, so P need not be calculated; for example, the previous value may simply be retained.

The amplitude modulation unit 55 translates the predetermined weighting coefficients along the time axis based on the burst time supplied from the burst time estimation unit 54. If the corrected burst truth value supplied from the burst verification unit 32 is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 51.

The weighting coefficients are designed in the same way as for the bursting processing means 14 of the first embodiment, except that the coefficients calculated in advance are translated along the time axis based on the burst time. The translation is performed by matching the peak position of the weighting coefficients to the burst time.

As described above, changing the peak position of the weighting coefficients based on the number of consecutive plosive detections avoids both the problem of applying bursting to consecutive frames when plosives are detected consecutively and the problem of a weakened bursting effect caused by applying bursting to a silent portion.

(F-2) Operation of the Fifth Embodiment
Next, the operation of the speech decoding apparatus 1E according to the fifth embodiment will be described. The overall operation of the speech decoding apparatus 1E is the same as in the first to fourth embodiments, so its description is omitted; only the operation of the bursting processing means 51 is described below.

The burst truth value output from the plosive detection means 21 is supplied to the burst verification unit 32 and the detection count calculation unit 53 of the bursting processing means 51.

In the burst verification unit 32 of the bursting processing means 51, as in the third embodiment, the supplied burst truth value of the current frame is compared with the burst truth value of the past frame to generate a corrected burst truth value. The corrected burst truth value is supplied to the amplitude modulation unit 55.

The detection count calculation unit 53 of the bursting processing means 51 counts, from the burst truth value supplied for each frame, the number of times a plosive has been detected consecutively, and supplies the resulting detection count to the burst time estimation unit 54.

Based on the supplied detection count, the burst time estimation unit 54 estimates at which time in the first of the frames with consecutive plosive detections the power peak of the burst portion is located, and supplies the resulting burst time to the amplitude modulation unit 55.

The amplitude modulation unit 55 translates the predetermined weighting coefficients along the time axis based on the burst time supplied from the burst time estimation unit 54. If the corrected burst truth value supplied from the burst verification unit 32 is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 51.

(F-3) Effects of the Fifth Embodiment
According to the fifth embodiment, in the decoded speech of an MBE-based speech coding scheme, bursting is applied more reliably to plosives whose characteristic acoustic phenomenon has been lost, and bursting is not mistakenly applied to consecutive frames. The clarity of the decoded speech is therefore improved, and speech that is more comfortable to listen to can be provided to the user.

(G) Sixth Embodiment
Next, a sixth embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

In the fifth embodiment, even when plosives are detected consecutively, bursting is applied only to the frame in which the plosive was first detected, and the position at which bursting is applied can be set dynamically using the burst time estimated from the number of consecutive detections. However, that number depends strongly on the accuracy of the pattern recognition, so the estimated burst time can be unstable.

In the sixth embodiment, therefore, the burst time is estimated from both the centroid time of the fourth embodiment and the detection count of the fifth embodiment.

(G-1) Configuration of the Sixth Embodiment
FIG. 7 is a functional block diagram showing the configuration of the speech decoding apparatus 1F according to the sixth embodiment. In FIG. 7, components identical or corresponding to those in FIG. 1 and FIGS. 3 to 6 of the first to fifth embodiments are denoted by the same reference numerals.

In FIG. 7, the speech decoding apparatus 1F according to the sixth embodiment includes receiving means 11, MBE-based decoding means 12, plosive detection means 21, and bursting processing means 61. The sixth embodiment differs from the fourth and fifth embodiments in that the bursting processing means 61 is provided in place of the bursting processing means 41 and 51.

The bursting processing means 61 applies bursting to the supplied decoded speech based on the supplied burst truth value and outputs the resulting improved speech.

The bursting processing means 61 includes a burst verification unit 32, a non-negative conversion unit 42, a detection count calculation unit 53, a centroid time calculation unit 62, and an amplitude modulation unit 63.

The decoded speech supplied to the bursting processing means 61 is passed to the non-negative conversion unit 42 and the amplitude modulation unit 63, and the supplied burst truth value is passed to the burst verification unit 32 and the detection count calculation unit 53.

The operation of the non-negative conversion unit 42 is the same as in the fourth embodiment, so its description is omitted.

The operation of the detection count calculation unit 53 is the same as in the fifth embodiment, so its description is omitted.

Based on the supplied non-negative signal and detection count, the centroid time calculation unit 62 calculates the centroid time over the frames in which plosives were detected consecutively, and supplies the resulting centroid time to the amplitude modulation unit 63.

The centroid time calculation unit 43 of the fourth embodiment calculates the centroid time over a single frame, whereas the centroid time calculation unit 62 of the sixth embodiment calculates it over as many frames as the consecutive burst frame count. The centroid time C is therefore calculated by equation (4). Here t denotes time relative to the first frame in which the plosive was detected.
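
The multi-frame centroid can be sketched in Python as below. Equation (4) is not reproduced in this text, so the sketch assumes the first-moment form sum(t * X(t)) / sum(X(t)) with t running over the N concatenated frames, t = 0 .. N*T-1. The function name, the list-of-frames input, and the midpoint fallback for silent input are illustrative assumptions.

```python
from typing import Sequence


def multi_frame_centroid(frames: Sequence[Sequence[float]], n: int) -> float:
    """Assumed centroid time (in samples) over the first `n` frames."""
    # Concatenate the first n frames and make every sample non-negative.
    x = [abs(s) for frame in frames[:n] for s in frame]
    total = sum(x)
    if total == 0:
        # Assumption: silent input falls back to the midpoint of the span.
        return (len(x) - 1) / 2.0
    # t is measured from the start of the first frame in the run.
    return sum(t * v for t, v in enumerate(x)) / total


# Two 4-sample frames; the energy sits at sample 1 of the second frame.
frames = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(multi_frame_centroid(frames, 2))
```

With n = 1 the function reduces to the single-frame case of the fourth embodiment, and a result of T or more indicates that the centroid lies beyond the first frame.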

The centroid time calculation unit 62 is subject to the same algorithmic caveat as the burst time estimation unit 54 of the fifth embodiment. The consecutive burst frame count is therefore determined in the same way as in the burst time estimation unit 54.

The amplitude modulation unit 63 translates the predetermined weighting coefficients along the time axis based on the centroid time supplied from the centroid time calculation unit 62. If the corrected burst truth value supplied from the burst verification unit 32 is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 61.

The weighting coefficients are designed in the same way as for the amplitude modulation unit 44 of the fourth embodiment, except that the centroid time is not confined to the frame in which the plosive was first detected. As in the amplitude modulation unit 44 of the fourth embodiment, the simplest method, and therefore a preferred one, is to set the peak position of the weighting coefficients equal to the centroid time. However, taking into account the tendency of the centroid time to lie further inward and later than the true peak position, the centroid time C' corrected by equation (5) may instead be matched to the peak position.

As described above, when plosives are detected consecutively, the centroid time is calculated over multiple frames and the peak position of the weighting coefficients is changed based on it. This avoids both the problem of applying bursting to consecutive frames when plosives are detected consecutively and the problem of a weakened bursting effect caused by applying bursting to a silent portion.

(G-2) Operation of the Sixth Embodiment
Next, the operation of the speech decoding apparatus 1F according to the sixth embodiment will be described. The overall operation of the speech decoding apparatus 1F is the same as in the first to fifth embodiments, so its description is omitted; only the operation of the bursting processing means 61 is described below.

The burst truth value output from the plosive detection means 21 is supplied to the burst verification unit 32 and the detection count calculation unit 53 of the bursting processing means 61.

In the burst verification unit 32 of the bursting processing means 61, the supplied burst truth value of the current frame is compared with the burst truth value of the past frame to generate a corrected burst truth value. The corrected burst truth value is supplied to the amplitude modulation unit 63.

The detection count calculation unit 53 of the bursting processing means 61 counts, from the burst truth value supplied for each frame, the number of times a plosive has been detected consecutively, and supplies the resulting detection count to the centroid time calculation unit 62.

The decoded speech output from the MBE-based decoding means 12 is supplied to the non-negative conversion unit 42 and the amplitude modulation unit 63 of the bursting processing means 61.

The non-negative conversion unit 42 of the bursting processing means 61 converts each sample of the decoded speech to a non-negative value, and the resulting non-negative signal is supplied to the centroid time calculation unit 62.

The centroid time calculation unit 62 calculates the centroid time over the frames in which plosives were detected consecutively, based on the supplied non-negative signal and the detection count, and supplies the resulting centroid time to the amplitude modulation unit 63.

The amplitude modulation unit 63 translates the predetermined weighting coefficients along the time axis based on the centroid time supplied from the centroid time calculation unit 62. If the corrected burst truth value supplied from the burst verification unit 32 is true, it amplitude-modulates the supplied decoded speech by multiplying it by the translated weighting coefficients; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 61.

(G-3) Effects of the Sixth Embodiment
As described above, according to the sixth embodiment, plosives whose characteristic acoustic phenomenon has been lost in speech decoded by an MBE-based speech coding scheme are burst more reliably, without being erroneously burst several times in succession. The clarity of the decoded speech is thereby improved, and speech that is more comfortable to listen to can be provided to the user.

(H) Seventh Embodiment
Next, a seventh embodiment of the speech decoding apparatus, speech decoding method, speech decoding program, and communication device according to the present invention will be described with reference to the drawings.

In the second to sixth embodiments, plosives were detected using only the power spectrum, on the premise that the characteristic acoustic phenomenon of plosives in the decoded speech has been lost. However, not every plosive is degraded; in some cases the plosive of the input speech happens to be reproduced accurately. If the bursting process of the second to sixth embodiments is applied in such cases, the plosive in the improved speech becomes excessively strong and may sound unnatural. Moreover, in such cases, observing the time waveform rather than the power spectrum allows even the position and magnitude of the plosive's power peak to be detected more accurately.

Therefore, in the seventh embodiment, time-domain plosive detection based on observation of the time waveform and plosive detection based on pattern recognition of the power spectrum are performed in parallel, and their results are selected and used as appropriate.

(H-1) Configuration of the Seventh Embodiment
FIG. 8 is a functional block diagram showing the configuration of the speech decoding apparatus according to the seventh embodiment. In FIG. 8, components that are the same as or correspond to those in FIG. 1 and FIGS. 3 to 7 of the first to sixth embodiments are denoted by the same reference numerals.

In FIG. 8, the speech decoding apparatus 1G according to the seventh embodiment includes a receiving means 11, an MBE decoding means 12, a frequency-domain plosive detection means 71, a time-domain plosive detection means 72, and a bursting processing means 73.

The receiving means 11 and the MBE decoding means 12 are the same as those of the first embodiment, so their description is omitted.

The frequency-domain plosive detection means 71 is the same as the plosive detection means 21 of the second embodiment, so its description is omitted. Note, however, that in contrast to the time-domain plosive detection means 72 described next, the frequency-domain plosive detection means 71 detects plosives by pattern recognition of the power spectrum.

The time-domain plosive detection means 72 analyzes the time waveform of the decoded speech and extracts burst information, which collects the peak position and peak value of the power of the burst portion of a plosive. The resulting burst information is supplied to the bursting processing means 73.

The time-domain plosive detection means 72 includes a short-period power calculation unit 15, a power ratio calculation unit 16, and a burst information extraction unit 77.

The time-domain plosive detection means 72 is almost the same as the plosive detection means 13 of the first embodiment, but differs in two respects: it has a burst information extraction unit 77 in place of the burst detection unit 17, and it outputs burst information, whereas the plosive detection means 13 outputs a burst truth value.
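The short-period power and power ratio stages that feed the burst information extraction unit 77 can be sketched as below, using the reference power rule stated in the claims (the power at the start of the processing frame is its own reference; afterwards the reference is the minimum short-period power seen so far in the frame). The function names, the use of mean-square power over non-overlapping sub-blocks, and the small `eps` guard against division by zero are assumptions.

```python
from typing import List, Sequence


def short_period_powers(frame: Sequence[float], sub_len: int) -> List[float]:
    """Mean-square power of consecutive non-overlapping sub-blocks, each
    shorter than the codec's processing frame (assumed power definition)."""
    if sub_len <= 0:
        raise ValueError("sub_len must be positive")
    return [
        sum(x * x for x in frame[i:i + sub_len]) / sub_len
        for i in range(0, len(frame) - sub_len + 1, sub_len)
    ]


def power_ratios(powers: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Ratio of each short-period power to its reference power."""
    ratios: List[float] = []
    for k, p in enumerate(powers):
        # Start of frame: the power itself is the reference.
        # Later: minimum power from the frame start up to just before k.
        ref = p if k == 0 else min(powers[:k])
        ratios.append(p / max(ref, eps))  # eps is an assumed guard value
    return ratios
```

For short-period powers `[1.0, 0.5, 4.0]` the ratios are `[1.0, 0.5, 8.0]`: the jump from the quiet second block to the third block is what marks a candidate burst.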

The burst information extraction unit 77 searches for the time at which the supplied power ratio is largest and takes it as the burst time, takes the power ratio at the burst time as the burst power ratio, and combines the burst time and the burst power ratio into burst information. The resulting burst information is supplied to the bursting processing means 73 as the output of the time-domain plosive detection means 72.
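The extraction step amounts to an argmax over the power ratios. In the sketch below, the `BurstInfo` container and the function name are hypothetical, and the burst time is expressed as an index into the sequence of short-period power ratios; taking the earliest index when several ratios tie is an assumption.

```python
from typing import NamedTuple, Sequence


class BurstInfo(NamedTuple):
    burst_time: int           # index of the largest power ratio
    burst_power_ratio: float  # power ratio at that index


def extract_burst_info(power_ratios: Sequence[float]) -> BurstInfo:
    """Find the time of the maximum power ratio and bundle it with the
    ratio at that time."""
    if not power_ratios:
        raise ValueError("power_ratios must not be empty")
    # max() returns the earliest index when several ratios are equal.
    t = max(range(len(power_ratios)), key=lambda i: power_ratios[i])
    return BurstInfo(burst_time=t, burst_power_ratio=power_ratios[t])
```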

The bursting processing means 73 bursts the decoded speech supplied from the MBE decoding means 12 on the basis of the burst truth value supplied from the frequency-domain plosive detection means 71 and the burst information supplied from the time-domain plosive detection means 72, and outputs the resulting improved speech.

The bursting processing means 73 includes a non-negative conversion unit 42, a detection count calculation unit 53, a centroid time calculation unit 62, a burst verification unit 74, a burst information selection unit 75, and an amplitude modulation unit 76.

The non-negative conversion unit 42 is the same as that of the fourth embodiment, so its description is omitted.

The detection count calculation unit 53 is the same as that of the fifth embodiment, so its description is omitted.

The centroid time calculation unit 62 is the same as that of the sixth embodiment, so its description is omitted.

The burst verification unit 74 corrects the burst truth value on the basis of the supplied burst truth value and burst information to generate a corrected burst truth value, which is supplied to the amplitude modulation unit 76. The corrected burst truth value is set to true if the burst power ratio in the burst information is equal to or greater than a predetermined threshold; otherwise it is set from the burst truth value in the same way as in the burst verification unit 32 of the third embodiment.
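The decision made by the burst verification unit 74 can be sketched as follows. The fallback branch is an assumption: the behaviour of the burst verification unit 32 is not restated in this passage, so the sketch assumes, in line with the claim that the weight coefficient is applied only once for consecutive detections, that only the first frame of a run of detections stays true. The function and parameter names are hypothetical.

```python
def corrected_burst_flag(burst_flag: bool, prev_burst_flag: bool,
                         burst_power_ratio: float,
                         threshold: float) -> bool:
    """Return the corrected burst truth value for the current frame."""
    # A burst power ratio at or above the threshold forces the value true.
    if burst_power_ratio >= threshold:
        return True
    # Assumed fallback (third-embodiment style): keep only the first
    # frame of a run of consecutive frequency-domain detections.
    return burst_flag and not prev_burst_flag
```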

The burst information selection unit 75 selects either the centroid time or the burst information on the basis of the burst information, and the selected information is supplied to the amplitude modulation unit 76 as weight coefficient design information. The selection is based on the burst power ratio in the burst information: a predetermined threshold is set in advance; if the burst power ratio is greater than the threshold, the burst information is selected as the weight coefficient design information, and if it is smaller, the centroid time is selected. This threshold may be the same as or different from the threshold used in the burst verification unit 74, but a configuration in which the two thresholds are set to the same value, so that the burst verification unit 74 and the burst information selection unit 75 operate in step, is preferable.
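The selection rule can be sketched as below. The `BurstInfo` container and the function name are hypothetical. The text only covers the cases "greater than" and "smaller than" the threshold; treating a ratio exactly equal to the threshold as selecting the burst information is an assumption, chosen so that the selection stays in step with the "equal to or greater than" test of the burst verification unit 74 when both use the same threshold.

```python
from typing import NamedTuple, Union


class BurstInfo(NamedTuple):
    burst_time: int
    burst_power_ratio: float


def select_design_info(burst_info: BurstInfo, centroid_time: float,
                       threshold: float) -> Union[BurstInfo, float]:
    """Choose the weight coefficient design information."""
    # Assumption: equality is resolved in favour of the burst information.
    if burst_info.burst_power_ratio >= threshold:
        return burst_info
    return centroid_time
```

A caller can distinguish the two cases with `isinstance(result, BurstInfo)`.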

The amplitude modulation unit 76 designs a weight coefficient on the basis of the weight coefficient design information supplied from the burst information selection unit 75. If the corrected burst truth value supplied from the burst verification unit 74 is true, it multiplies the decoded speech by that weight coefficient to apply amplitude modulation; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 73.

When the weight coefficient design information is the centroid time, the amplitude modulation unit 76 operates in the same way as the amplitude modulation unit 63 of the sixth embodiment.

When the weight coefficient design information is burst information, the peak value of the weight coefficient is first corrected according to the burst power ratio, and the weight coefficient is then translated along the time axis according to the burst time, in the same way as in the bursting processing means 14 of the first embodiment. The peak value is corrected so that the combined gain of the pre-designed peak gain of the weight coefficient (9 dB in the example of FIG. 2) and the gain of the burst peak ratio (a sum on a logarithmic scale, a product on a linear scale) does not exceed the peak gain of the original weight coefficient. That is, if the burst peak ratio is 9 dB or more, the weight coefficient is always 0 dB (no bursting is applied); if the burst peak ratio is less than 9 dB, the peak gain of the weight coefficient is corrected so that the combined gain of the corrected peak gain and the burst peak ratio equals 9 dB. For example, if the burst peak ratio is 4 dB, the peak gain of the weight coefficient is corrected to 5 dB.
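The peak gain correction is simple arithmetic on a dB scale and can be sketched as follows. The function names are hypothetical. Converting a linear power ratio with 10·log10 and clamping the result so that it never exceeds the designed peak gain (relevant only if the ratio were below 0 dB) are assumptions.

```python
import math


def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio to dB (assumed 10*log10 convention)."""
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    return 10.0 * math.log10(ratio)


def corrected_peak_gain_db(burst_ratio_db: float,
                           design_peak_db: float = 9.0) -> float:
    """Peak gain left for the weight coefficient once the burst already
    present in the decoded speech is accounted for."""
    remaining = design_peak_db - burst_ratio_db
    # 0 dB floor: no bursting when the ratio reaches the designed peak.
    # Upper clamp (assumption): never exceed the designed peak gain.
    return min(design_peak_db, max(0.0, remaining))
```

With the 9 dB design value from the text, a burst peak ratio of 4 dB leaves a peak gain of 5 dB, and a ratio of 9 dB or more leaves 0 dB.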

As described above, by selectively using the result of the time-domain plosive detection means and the result of the frequency-domain plosive detection means, excessive bursting can be prevented when the characteristic acoustic phenomenon of the plosive is already reproduced in the decoded speech.

(H-2) Operation of the Seventh Embodiment
Next, the operation of the speech decoding apparatus 1G according to the seventh embodiment will be described.

The decoded speech output from the MBE decoding means 12 is supplied to the frequency-domain plosive detection means 71, the time-domain plosive detection means 72, and the bursting processing means 73.

In the frequency-domain plosive detection means 71, as in the plosive detection means 21 of the second embodiment, frequency analysis is performed on the decoded speech and pattern recognition is performed on the power spectrum of each frame to determine whether the frame contains the start of the burst portion of a plosive. The determination result is supplied as a burst truth value to the detection count calculation unit 53 and the burst verification unit 74 of the bursting processing means 73.

The time-domain plosive detection means 72 analyzes the time waveform of the supplied decoded speech and extracts burst information, which collects the peak position and peak value of the power of the burst portion of a plosive. The resulting burst information is supplied to the burst verification unit 74 and the burst information selection unit 75 of the bursting processing means 73.

Here, the burst information extraction unit 77 of the time-domain plosive detection means 72 searches for the time at which the power ratio calculated by the power ratio calculation unit 16 is largest and takes it as the burst time. It then takes the power ratio at the burst time as the burst power ratio, combines the burst time and the burst power ratio into burst information, and supplies the resulting burst information, as the output of the time-domain plosive detection means 72, to the burst verification unit 74 and the burst information selection unit 75 of the bursting processing means 73.

In the detection count calculation unit 53 of the bursting processing means 73, the number of consecutive frames in which a plosive has been detected is counted from the burst truth value given for each frame, and the resulting detection count is supplied to the centroid time calculation unit 62.

The non-negative conversion unit 42 of the bursting processing means 73 converts each sample of the supplied decoded speech into a non-negative value, and the resulting non-negative signal is supplied to the centroid time calculation unit 62.

The centroid time calculation unit 62 of the bursting processing means 73 calculates, from the supplied non-negative signal and the number of consecutive plosive detections, the centroid time within the frames in which plosives were detected consecutively, and supplies the resulting centroid time to the burst information selection unit 75.

The burst verification unit 74 corrects the burst truth value on the basis of the supplied burst truth value and burst information to generate a corrected burst truth value, which is supplied to the amplitude modulation unit 76. The corrected burst truth value is set to true if the burst power ratio in the burst information is equal to or greater than a predetermined threshold; otherwise it is set from the burst truth value in the same way as in the burst verification unit 32 of the third embodiment.

The burst information selection unit 75 selects, on the basis of the burst information from the time-domain plosive detection means 72, either the centroid time from the centroid time calculation unit 62 or that burst information. If the burst power ratio in the burst information is greater than a predetermined threshold, the burst information is selected and output to the amplitude modulation unit 76 as the weight coefficient design information; if the burst power ratio is smaller than the threshold, the centroid time is selected and output to the amplitude modulation unit 76 as the weight coefficient design information.

The amplitude modulation unit 76 designs a weight coefficient on the basis of the supplied weight coefficient design information. If the corrected burst truth value supplied from the burst verification unit 74 is true, it multiplies the supplied decoded speech by that weight coefficient to apply amplitude modulation; if the corrected burst truth value is false, it passes the decoded speech through unchanged. The resulting improved speech is output as the output of the bursting processing means 73.

(H-3) Effects of the Seventh Embodiment
As described above, according to the seventh embodiment, plosives whose characteristic acoustic phenomenon has been lost in speech decoded by an MBE-based speech coding scheme can be burst more appropriately. The clarity of the decoded speech is thereby improved, and speech that is more comfortable to listen to can be provided to the user.

(I) Other Embodiments
Various modifications have already been mentioned in the description of the above embodiments; further modifications such as the following are also possible.

Each of the above embodiments presents a single method for improving the quality of the decoded speech from the MBE decoding means. However, the apparatus may be configured to support several improvement methods so that the user can select among them.

Alternatively, instead of choosing among several improvement methods, the user may be allowed to choose whether or not the improvement method is applied at all. This choice may also be made automatically rather than by the user. For example, characteristic values of the decoded speech, such as its power or the average value of the LPC coefficients of each order, may be calculated, and whether to apply the improvement methods described in the above embodiments may be decided by comparing the calculated characteristic values with thresholds.
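One way to picture the automatic decision is the sketch below, which uses the power of the decoded speech as the characteristic value. The function names, the mean-square definition of power, and the direction of the comparison (apply the improvement when the power is at or above the threshold) are assumptions; the text only states that a characteristic value is compared with a threshold.

```python
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of the decoded speech (0.0 for empty input)."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def should_apply_improvement(samples: Sequence[float],
                             power_threshold: float) -> bool:
    """Decide automatically whether to apply the improvement method.
    Assumption: apply it when the power reaches the threshold."""
    return mean_power(samples) >= power_threshold
```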

Although the above embodiments deal with decoding speech, the technical idea of the present invention can also be applied to decoding general audio, provided the MBE-based coding is applicable to audio. The term "speech" in the claims is to be understood as also covering "audio" in such cases.

Although not mentioned in the description of the above embodiments, the elements of the speech decoding apparatus may be implemented in devices or chips in any manner. For example, the MBE decoding means 12 may be realized as an IC chip, while the plosive detection means (including the frequency-domain plosive detection means and the time-domain plosive detection means) and the bursting processing means of the above embodiments are configured as software executed by a CPU. Alternatively, the plosive detection means (including the frequency-domain plosive detection means and the time-domain plosive detection means) and the bursting processing means of the above embodiments may themselves be implemented as IC chips. The speech decoding apparatus of each of the above embodiments may be mounted on a digital radio or on a communication device connected to a wired line.

1A, 1B, 1C, 1D, 1E, 1F, 1G ... speech decoding apparatus, 11 ... receiving means, 12 ... MBE decoding means, 13, 21 ... plosive detection means, 14, 31, 41, 51, 61, 73 ... bursting processing means, 71 ... frequency-domain plosive detection means, 72 ... time-domain plosive detection means.

Claims

In a speech decoding apparatus for decoding digitally encoded information encoded according to an MBE speech encoding method,
MBE decoding means for decoding the digital audio encoding information to generate decoded audio;
A burst sound detecting means for detecting a burst sound of the decoded speech;
A speech decoding apparatus comprising: bursting processing means for bursting the detected burst sound.

The speech decoding apparatus according to claim 1, wherein the bursting processing means:
multiplies the decoded speech by a predetermined weight coefficient and outputs the result when the plosive detection means determines that the processing frame is a plosive; and
outputs the decoded speech as it is when the plosive detection means determines that the processing frame is not a plosive.

The speech decoding apparatus according to claim 2, wherein the weight coefficient:
consists of three states, namely a silent part, a burst start part, and a burst latter part;
has a value smaller than 0 dB in the silent part;
increases in the burst start part from the value of the silent part to a value larger than 0 dB; and
decreases in the burst latter part from the value larger than 0 dB reached in the burst start part to 0 dB.

The speech decoding apparatus according to claim 1, wherein the plosive detection means:
calculates power at a period shorter than the processing period of the MBE decoding means;
divides the resulting short-period power by a reference power determined by a predetermined rule; and
determines that the sound is a plosive if the resulting power ratio is equal to or greater than a predetermined threshold.

The predetermined rule is that,
for a target time at which the power ratio is calculated,
if the target time is the start of the processing frame, the short-period power at the target time is used as the reference power, and
if the target time is after the start of the processing frame, the minimum value of the short-period power from the start of the processing frame to immediately before the target time is used as the reference power. The speech decoding apparatus described.

The speech decoding apparatus according to claim 4, wherein the predetermined rule is that,
for a target time at which the power ratio is calculated, the minimum value of the short-period power from a predetermined time before the target time to immediately before the target time is used as the reference power.

The speech decoding apparatus according to claim 1, wherein the plosive detection means:
performs frequency analysis on the decoded speech from the MBE decoding means; and
determines whether or not the sound is a plosive by performing pattern recognition on the resulting power spectrum.

8. The speech decoding apparatus according to claim 7, wherein, when plosives are detected consecutively by the plosive detection means,
the bursting processing means multiplies the decoded speech by the weight coefficient only once, with reference only to the first detected frame.

The speech decoding apparatus according to claim 7, wherein the bursting processing means:
converts each sample of the decoded speech to a non-negative value;
calculates, within the frame, the centroid with respect to the sum of the resulting non-negative signal to obtain a centroid time; and
translates the weight coefficient along the time axis on the basis of the resulting centroid time and then multiplies the decoded speech by it.

9. The speech decoding apparatus according to claim 8, wherein the bursting processing means
translates the weight coefficient along the time axis on the basis of the number of times plosives are detected consecutively by the plosive detection means, and then multiplies the decoded speech by it.

The speech decoding apparatus according to claim 8, wherein the bursting processing means:
converts each sample of the decoded speech to a non-negative value; and
translates the weight coefficient along the time axis on the basis of the number of times the plosive detection means has consecutively detected a plosive and of the centroid time obtained by calculating the centroid with respect to the sum of the non-negative signal over that same number of frames, and then multiplies the decoded speech by it.

A speech decoding apparatus for decoding digitally encoded information encoded according to an MBE-based speech coding scheme, comprising:
MBE decoding means for decoding the digitally encoded information to generate decoded speech;
frequency-domain plosive detection means for detecting a plosive of the decoded speech in the frequency domain;
centroid time calculating means for calculating a centroid time by calculating the centroid with respect to the sum of a non-negative signal obtained by converting each sample of the decoded speech to a non-negative value, over the same number of frames as the number of times plosives have been detected consecutively by the frequency-domain plosive detection means;
time-domain plosive detection means for detecting a plosive of the decoded speech in the time domain;
burst information selecting means for selecting, on the basis of burst information obtained from the time-domain plosive detection means, between the centroid time and the burst information;
burst verification means for re-determining whether or not the sound is a plosive on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and
bursting processing means for redesigning a pre-designed predetermined weight coefficient on the basis of weight coefficient design information obtained from the burst information selecting means, with reference to the frame determined to be a plosive by the burst verification means, and multiplying the decoded speech by the weight coefficient.

The speech decoding apparatus according to claim 12, wherein the pre-designed predetermined weight coefficient:
consists of three states, namely a silent part, a burst start part, and a burst latter part;
has a value smaller than 0 dB in the silent part;
increases in the burst start part from the value of the silent part to a value larger than 0 dB; and
decreases in the burst latter part from the value larger than 0 dB reached in the burst start part to 0 dB.

The speech decoding apparatus according to claim 12 or 13, wherein the frequency-domain plosive detection means:
performs frequency analysis on the decoded speech from the MBE decoding means; and
determines whether or not the sound is a plosive by performing pattern recognition on the resulting power spectrum.

The speech decoding apparatus according to any one of claims 12 to 14, wherein the time-domain plosive detection means:
calculates power at a period shorter than the processing period of the MBE decoding means;
divides the resulting short-period power by a reference power determined by a predetermined rule; and
determines that the sound is a plosive if the resulting power ratio is equal to or greater than a predetermined threshold.

The predetermined rule is that,
for a target time at which the power ratio is calculated,
if the target time is the start of the processing frame, the short-period power at the target time is used as the reference power, and
16. if the target time is after the start of the processing frame, the minimum value of the short-period power from the start of the processing frame to immediately before the target time is used as the reference power. The speech decoding apparatus according to 1.

The speech decoding apparatus according to claim 15, wherein the predetermined rule is that,
for a target time at which the power ratio is calculated, the minimum value of the short-period power from a predetermined time before the target time to immediately before the target time is used as the reference power.

In a speech decoding method for decoding digitally encoded information encoded according to an MBE-based speech encoding scheme,
MBE decoding means decodes the digital audio encoding information to generate decoded audio,
The plosive detection means detects a plosive of the decoded speech,
A speech decoding method, wherein the bursting processing means bursts the detected burst sound.

A speech decoding method for decoding digitally encoded information encoded according to an MBE-based speech coding scheme, wherein:
MBE decoding means decodes the digitally encoded information to generate decoded speech;
frequency-domain plosive detection means detects a plosive of the decoded speech in the frequency domain;
centroid time calculating means calculates a centroid time by calculating the centroid with respect to the sum of a non-negative signal obtained by converting each sample of the decoded speech to a non-negative value, over the same number of frames as the number of times plosives have been detected consecutively by the frequency-domain plosive detection means;
time-domain plosive detection means detects a plosive of the decoded speech in the time domain;
burst information selecting means selects, on the basis of burst information obtained from the time-domain plosive detection means, between the centroid time and the burst information;
burst verification means re-determines whether or not the sound is a plosive on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and
bursting processing means redesigns a pre-designed predetermined weight coefficient on the basis of weight coefficient design information obtained from the burst information selecting means, with reference to the frame determined to be a plosive by the burst verification means, and multiplies the decoded speech by the weight coefficient.

A speech decoding program for decoding digitally encoded information encoded according to an MBE-based speech coding scheme, the program causing a computer to function as:
MBE decoding means for decoding the digitally encoded information to generate decoded speech;
plosive detection means for detecting a plosive of the decoded speech; and
bursting processing means for bursting the detected plosive.

A speech decoding program for decoding digitally encoded information encoded according to an MBE-based speech coding scheme, the program causing a computer to function as:
MBE decoding means for decoding the digitally encoded information to generate decoded speech;
frequency-domain plosive detection means for detecting a plosive of the decoded speech in the frequency domain;
centroid time calculating means for calculating a centroid time by calculating the centroid with respect to the sum of a non-negative signal obtained by converting each sample of the decoded speech to a non-negative value, over the same number of frames as the number of times plosives have been detected consecutively by the frequency-domain plosive detection means;
time-domain plosive detection means for detecting a plosive of the decoded speech in the time domain;
burst information selecting means for selecting, on the basis of burst information obtained from the time-domain plosive detection means, between the centroid time and the burst information;
burst verification means for re-determining whether or not the sound is a plosive on the basis of the determination result of the frequency-domain plosive detection means and the burst information; and
bursting processing means for redesigning a pre-designed predetermined weight coefficient on the basis of weight coefficient design information obtained from the burst information selecting means, with reference to the frame determined to be a plosive by the burst verification means, and multiplying the decoded speech by the weight coefficient.

A communication device comprising the speech decoding device according to claim 1.