JP4935280B2

JP4935280B2 - Speech coding apparatus, speech decoding apparatus, speech coding method, speech decoding method, and program

Info

Publication number: JP4935280B2
Application number: JP2006267244A
Authority: JP
Inventors: 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2006-09-29
Filing date: 2006-09-29
Publication date: 2012-05-23
Anticipated expiration: 2026-09-29
Also published as: JP2008089651A

Description

The present invention relates to a speech encoding device, a speech decoding device, a speech encoding method, a speech decoding method, and a program, as needed to perform analysis-synthesis speech compression and decompression.

In mobile communications such as digital mobile phones, low-bit-rate (about 8 kbps) speech compression coding methods are needed to cope with the growing number of subscribers. One example of an 8 kbps speech coding method is the one specified in ITU-T Recommendation G.729.

The speech coding method of that recommendation is basically a method that decomposes the speech signal into prediction coefficients and a residual signal by predictive analysis and then encodes them. Known forms of predictive analysis include linear predictive analysis and MLSA analysis (see, for example, Non-Patent Document 1).

Because the bit rate is constrained, for example to about 8 kbps as mentioned above, the signal must be encoded so that the code length per unit time is constant.

For that reason, coding methods based on vector quantization and the like have conventionally been adopted. In such methods, once the coding accuracy is fixed, that is, how faithfully the original information is to be reproducible, the compression ratio is fixed as well.

Fixing the coding accuracy therefore automatically makes the code length per unit time constant, which is convenient.

Seen from the other direction, it is convenient that the coding accuracy can simply be worked out backward from the bit-rate constraint.

With such a coding method, the code length can be lengthened or shortened by, for example, increasing or decreasing the order of the predictive analysis, respectively. Conventionally, the order of the predictive analysis has been fixed in advance at a specific value, the largest that satisfies the bit-rate constraint.

Non-Patent Document 1: Satoshi Imai, Kazuo Sumita, Chieko Furuichi, "Mel Log Spectrum Approximation (MLSA) Filter for Speech Synthesis", Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. J66-A, No. 2, pp. 122-129, 1983

Meanwhile, there are also coding methods, called entropy coding methods, that encode a signal by taking into account how frequently each numerical value occurs in it.

Comparing entropy coding with the above coding methods based on vector quantization and the like at equal coding accuracy, there are often periods within the duration of the speech signal being encoded in which entropy coding achieves the higher compression ratio.

In other words, using a coding method based on vector quantization or the like during such periods wastes part of the limited bit rate.

That does not mean, however, that the vector-quantization-based coding method can simply be replaced with entropy coding. The reason is as follows. With entropy coding, the compression ratio is not uniquely determined even when the coding accuracy is constant, so it can fall below the compression ratio of a coding method based on vector quantization or the like. Consequently, within the duration of the speech signal being encoded there may be periods in which the bit-rate constraint is not met, which is a problem.

In short, uniformly adopting a coding method based on vector quantization or the like fails to make full use of the available bit rate, while simply switching to entropy coding to improve the compression ratio produces periods in which the bit-rate constraint cannot be met.

The present invention was made in view of these circumstances, and its object is to provide a speech encoding device, a speech decoding device, a speech encoding method, a speech decoding method, and a program that make the most effective use of a limited bit rate.

To achieve the above object, a speech encoding device according to a first aspect of the present invention comprises:
a prediction analysis unit that decomposes a speech signal into prediction coefficients and a residual signal by predictive analysis of a predetermined order;
a gain extraction unit that obtains the gain of the residual signal;
a voiced/unvoiced discrimination and pitch extraction unit that determines whether the residual signal is voiced or unvoiced and, when it is determined to be voiced, extracts a pitch frequency from the residual signal;
an encoding unit that converts the prediction coefficients, the gain, the result of the determination, and, when a pitch frequency has been extracted as a result of the determination, the pitch frequency into an entropy code;
a control unit that determines whether the length of the entropy code exceeds an allowable length and, when it does, reduces the order of the predictive analysis in the prediction analysis unit and causes the series of encoding operations to be executed again; and
a code transmission unit that transmits the entropy code that, through the repeated execution by the encoding unit under the control of the control unit, has come to fit within the allowable length.

Where the amount of information that can be transmitted is constrained, such an encoding device can generate an encoded speech signal from which speech of the highest quality possible under that constraint can be reproduced. This is because the higher the order of the predictive analysis, the clearer the reproduced speech.

The voiced/unvoiced discrimination and pitch extraction unit preferably includes a low-pass filter that first extracts the low-frequency part of the residual signal, determines whether that low-frequency part is voiced or unvoiced, and, when it is determined to be voiced, extracts the pitch frequency from it.

The pitch frequency, the quantity that characterizes voiced sound, lies in a relatively low band, so passing the residual signal through a low-pass filter before pitch extraction improves the accuracy of the voiced/unvoiced determination.

The prediction analysis unit decomposes the speech signal into prediction coefficients and a residual signal by, for example, linear predictive analysis.

Alternatively, the prediction analysis unit decomposes the speech signal into prediction coefficients and a residual signal by, for example, MLSA (Mel Log Spectrum Approximation) analysis.

To achieve the above object, a speech decoding device according to a second aspect of the present invention comprises:
a receiving unit that receives the entropy code transmitted from the code transmission unit of the speech encoding device according to the first aspect;
a decoding unit that decodes the received entropy code to produce the prediction coefficients, the gain of the residual signal, the voiced/unvoiced determination result for the residual signal, and, in the voiced case, the pitch frequency;
a signal generation unit that, when the speech signal is unvoiced, generates as an excitation signal noise whose gain equals the residual signal gain and, when the speech signal is voiced, generates as an excitation signal a pulse train whose gain equals the residual signal gain and whose frequency equals the pitch frequency; and
a synthesis filter that restores speech by combining the prediction coefficients and the excitation signal.

When a speech decoding device receives the prediction coefficients, the gain of the residual signal, the voiced/unvoiced determination result, and, for voiced sound, the pitch frequency, a signal generation unit of the kind described above is an appropriate way to generate the excitation signal needed for speech restoration as simply and reliably as possible.

According to the present invention, speech can be encoded and decoded so as to preserve the sound quality of the original speech as far as possible without exceeding a predetermined communication capacity.

A speech encoding device and a speech decoding device according to an embodiment of the present invention are described in detail below.

FIG. 1 is a functional block diagram of the speech encoding device 111 according to this embodiment.

As illustrated, the speech encoding device 111 includes a microphone 121, an A/D conversion unit 123, a prediction analysis unit 125, a gain extraction unit 127, a low-pass filter 129, a voiced/unvoiced discrimination and pitch extraction unit 131, an encoding unit 133, a switch 135, a prediction analysis order adjustment unit 137, and a transmission unit 139. The prediction analysis unit 125 contains a prediction analysis inverse filter calculator 141.

First, speech is input to the microphone 121. This speech is an analog signal, whereas the analysis and encoding performed later are discrete processes. To prepare for them, the A/D conversion unit 123 converts the analog signal into a digital speech signal and sends it to the prediction analysis unit 125.

The prediction analysis unit 125 applies predictive analysis to the digital speech signal delivered from the A/D conversion unit 123. Linear predictive analysis, for example, is used as the predictive analysis; MLSA (Mel Log Spectrum Approximation) analysis may be used instead. Both are known techniques. The procedures for both analyses are described in detail later with reference to FIG. 4.

Put briefly, the predictive analysis performed by the prediction analysis unit 125 is a procedure that divides the digital speech signal in time and, for each time interval, calculates the prediction coefficients and the residual signal for that interval. A suitable interval length is, for example, 5 ms.

In what follows, the digital speech signal sent from the A/D conversion unit 123 to the prediction analysis unit 125 is assumed to be divided in time into $M$ time intervals, and the number of digital speech samples in each interval is denoted $l$. The digital speech signal as a whole then contains $l \times M$ samples.
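
The time division described above can be sketched as follows. This is only an illustration: the function name `split_into_frames` and the policy of dropping a trailing partial interval are assumptions, and the 8 kHz sampling frequency is the example value this description uses for the order adjustment.

```python
def split_into_frames(samples, frame_len):
    """Split samples into consecutive, non-overlapping frames of frame_len samples.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    num_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(num_frames)]


sampling_rate_hz = 8000   # example sampling frequency
frame_duration_ms = 5     # example interval length
samples_per_interval = sampling_rate_hz * frame_duration_ms // 1000   # plays the role of l

signal = [float(n) for n in range(200)]            # placeholder digital speech signal
frames = split_into_frames(signal, samples_per_interval)   # frames[i] plays the role of S_i
num_intervals = len(frames)                        # plays the role of M
```

With these example values each 5 ms interval holds 40 samples, so a 200-sample signal yields five intervals.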

Taken as a whole, the prediction analysis unit 125 decomposes the digital speech signal in each time interval, $S_i = \{s_{i,0}, \ldots, s_{i,l-1}\}$ ($0 \le i \le M-1$), into a number of prediction coefficients equal to the order of the predictive analysis and a residual signal $D_i = \{d_{i,0}, \ldots, d_{i,l-1}\}$ ($0 \le i \le M-1$).

In more detail, the prediction analysis unit 125 first calculates the prediction coefficients from the input digital speech signal. At this point the order of the predictive analysis is a predetermined initial value.

Next, the prediction analysis inverse filter calculator 141 built into the prediction analysis unit 125 calculates a prediction analysis inverse filter from those prediction coefficients. The residual signal $D_i$ ($0 \le i \le M-1$) is then obtained as the output of this inverse filter when the digital speech signal from the A/D conversion unit 123 is input to it.
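
For the linear prediction case, the two steps above (coefficients first, then the residual from the inverse filter) might look like the sketch below. It assumes the textbook autocorrelation method with the Levinson-Durbin recursion, which is one common way to do linear predictive analysis and is not necessarily the procedure of FIG. 4; all function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over t of frame[t] * frame[t - k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return coefficients a[1..order] of the predictor s[t] ~ sum_k a[k] * s[t - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:          # silent frame: leave remaining coefficients at zero
            break
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        reflection = acc / err
        new_a = a[:]
        new_a[m] = reflection
        for k in range(1, m):
            new_a[k] = a[k] - reflection * a[m - k]
        a = new_a
        err *= 1.0 - reflection * reflection
    return a[1:]


def inverse_filter(frame, coeffs):
    """Residual d[t] = s[t] - sum_k a[k] * s[t - k], assuming zeros before the frame."""
    residual = []
    for t in range(len(frame)):
        predicted = sum(c * frame[t - k - 1] for k, c in enumerate(coeffs) if t - k - 1 >= 0)
        residual.append(frame[t] - predicted)
    return residual


frame = [0.9 ** t for t in range(40)]     # toy signal that a first-order predictor fits well
order = 1
coeffs = levinson_durbin(autocorrelation(frame, order), order)
residual = inverse_filter(frame, coeffs)
```

For this toy signal the single coefficient comes out close to 0.9, and the residual carries far less energy than the input, which is the property the later steps rely on.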

The prediction coefficients are sent to the encoding unit 133 as they are.

The residual signal, on the other hand, is not passed directly to the encoding unit 133. If it were sent to the encoding unit 133 and encoded as is, the amount of information would still be too large even after encoding, which would defeat the speech compression that the speech encoding device 111 of this embodiment is meant to achieve.

The amount of information in the residual signal must therefore be reduced beforehand, by extracting as far as possible only its essential features, before it is passed to the encoding unit 133.

The residual signal $D_i$ ($0 \le i \le M-1$) generated by the prediction analysis unit 125 is passed to the gain extraction unit 127 and to the low-pass filter 129.

The gain extraction unit 127 obtains the gain, that is, the magnitude, of the residual signal. There are various ways to define this magnitude; for example, the gain $G_i$ can be taken to be a value based on the mean square of the sample values in the $i$-th time interval, as in the following equation.

$$G_i = 10 \log_{10} \left\{ \frac{d_{i,0}^{2} + \cdots + d_{i,l-1}^{2}}{l} \right\}$$

The logarithm is taken because human hearing has logarithmic sensitivity to loudness.
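
The gain formula can be written directly as a small function. The name `frame_gain_db` and the tiny floor that guards against taking the logarithm of zero for an all-zero interval are assumptions added for this sketch.

```python
import math


def frame_gain_db(residual, floor=1e-12):
    """Gain G_i = 10 * log10(mean square of the residual samples in one interval).

    Assumption: the mean square is clamped to `floor` so that silence does not
    cause log10(0).
    """
    mean_square = sum(d * d for d in residual) / len(residual)
    return 10.0 * math.log10(max(mean_square, floor))


unit_gain = frame_gain_db([1.0] * 40)     # mean square 1
louder_gain = frame_gain_db([10.0] * 40)  # mean square 100
```

A tenfold increase in amplitude raises the mean square by a factor of 100 and the gain by 20, which reflects the logarithmic scale mentioned above.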

The gain $G_i$ calculated in this way is passed to the encoding unit 133.

Meanwhile, the low-pass filter 129 extracts the low-frequency components of the residual signal, for example the components at 500 Hz to 1 kHz. This is done to improve the accuracy of the voiced/unvoiced determination that follows. In other words, components other than these low-frequency components are unnecessary for determining whether the sound is voiced or unvoiced and may even reduce the accuracy of the determination, so they are cut beforehand.
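
The description does not specify how the low-pass filter 129 is built. As one possible realization, the sketch below assumes a 31-tap Hamming-windowed sinc FIR filter with a 1 kHz cutoff at an 8 kHz sampling frequency; the filter type, tap count, and function names are all assumptions.

```python
import math


def design_lowpass(cutoff_hz, sampling_rate_hz, num_taps=31):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity gain at DC."""
    fc = cutoff_hz / sampling_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Direct-form FIR filtering, assuming zeros before the start of the signal."""
    return [
        sum(h * signal[t - k] for k, h in enumerate(taps) if t - k >= 0)
        for t in range(len(signal))
    ]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 8000
taps = design_lowpass(1000.0, fs)
low_tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
high_tone = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(400)]
# Skip the first 31 output samples so the start-up transient is not measured.
low_gain = rms(fir_filter(low_tone, taps)[31:]) / rms(low_tone[31:])
high_gain = rms(fir_filter(high_tone, taps)[31:]) / rms(high_tone[31:])
```

In this sketch a 200 Hz tone passes almost unchanged while a 3 kHz tone is strongly attenuated, which is the behavior the voiced/unvoiced determination is said to benefit from.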

By passing through the low-pass filter 129, the residual signal $D_i = \{d_{i,0}, \ldots, d_{i,l-1}\}$ ($0 \le i \le M-1$) is converted into the low-band residual signal $D_{\mathrm{Low},i} = \{d_{\mathrm{Low},i,0}, \ldots, d_{\mathrm{Low},i,l-1}\}$ ($0 \le i \le M-1$), which is passed to the voiced/unvoiced discrimination and pitch extraction unit 131.

The voiced/unvoiced discrimination and pitch extraction unit 131 sends the encoding unit 133 the result of determining whether the low-band residual signal $D_{\mathrm{Low},i}$ ($0 \le i \le M-1$) is voiced or unvoiced. When the result is voiced, it also sends the pitch frequency to the encoding unit 133 in addition to the determination result. These processes are described in detail later with reference to FIG. 5.
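
The actual determination procedure belongs to FIG. 5 and is not reproduced in this section. Purely as an illustration of what such a unit outputs, the sketch below assumes a normalized-autocorrelation peak test; the search range of 60 to 400 Hz, the 0.5 threshold, the 320-sample analysis window, and the name `detect_pitch` are all assumptions rather than details taken from this patent.

```python
import random


def detect_pitch(samples, sampling_rate_hz, min_hz=60.0, max_hz=400.0, threshold=0.5):
    """Return (voiced, pitch_hz); pitch_hz is None when the input is judged unvoiced."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False, None
    min_lag = int(sampling_rate_hz / max_hz)
    max_lag = min(int(sampling_rate_hz / min_hz), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by the signal energy.
        score = sum(samples[t] * samples[t - lag] for t in range(lag, len(samples))) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None or best_score < threshold:
        return False, None
    return True, sampling_rate_hz / best_lag


fs = 8000
pulse_train = [1.0 if t % 80 == 0 else 0.0 for t in range(320)]   # 100 Hz periodic input
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(320)]                 # aperiodic input

voiced_result = detect_pitch(pulse_train, fs)
unvoiced_result = detect_pitch(noise, fs)
```

The periodic input is reported as voiced with a pitch of 100 Hz, while the noise input has no strong autocorrelation peak and is reported as unvoiced with no pitch value, matching the rule that a pitch frequency is sent only in the voiced case.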

In this way, the gain, the voiced/unvoiced determination result, and, for voiced sound, the pitch frequency are extracted from the residual signal and sent to the encoding unit 133. Given the nature of speech signals, these extracted values and the determination result capture the essential character of the residual signal despite carrying little information. Encoding only these features of the residual signal, rather than the entire residual signal, shortens the code considerably, while the degradation of the residual signal restored by the speech decoding device described later remains small given the characteristics of hearing. Applying the above processing to the residual signal before encoding therefore makes possible the degree of speech compression that the speech encoding device 111 of this embodiment assumes, while keeping the degradation of the speech restored by the speech decoding device, relative to the original speech, within acceptable limits.

In the end, the encoding unit 133 receives the prediction coefficients from the prediction analysis unit 125, the gain from the gain extraction unit 127, and the voiced/unvoiced determination result and, for voiced sound, the pitch frequency from the voiced/unvoiced discrimination and pitch extraction unit 131. The encoding unit 133 encodes these together.

The encoding unit 133 uses entropy coding as its coding method. Entropy coding has the drawback that its compression ratio cannot be predicted, but depending on how skewed the distribution of occurrence frequencies of the elements in the data to be encoded is, it can sometimes achieve a very high compression ratio. A high compression ratio means that a signal from which higher-quality speech can be restored can be transmitted: because more of the original information can be sent, more features of the original speech can be transmitted.

Examples of entropy coding methods include Huffman coding and the range coder.
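
To make the dependence of code length on the symbol distribution concrete, the sketch below builds Huffman code lengths for two inputs of equal size. It is a generic Huffman construction, not the coder of this embodiment, and the symbol sequences are made-up examples.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from the symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, index, {sym: 0}) for index, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count1, _, depths1 = heapq.heappop(heap)
        count2, _, depths2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (count1 + count2, next_index, merged))
        next_index += 1
    return heap[0][2]


def encoded_length_bits(symbols):
    """Total number of bits needed to encode the sequence with its own Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


skewed = [0] * 13 + [1, 2, 3]     # one value dominates
uniform = [0, 1, 2, 3] * 4        # all values equally frequent
skewed_bits = encoded_length_bits(skewed)
uniform_bits = encoded_length_bits(uniform)
```

Both inputs contain 16 symbols over the same four values, yet the skewed one needs 21 bits and the uniform one 32 bits. This variation is why the code length cannot be known before the data is actually encoded.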

In speech signal communication, the amount of information that can be transmitted per unit time is generally constant. At first glance, then, entropy coding, whose compression ratio is uneven as described above, seems unsuited for use in speech signal communication.

This can be handled, however, by taking advantage of time intervals in which the compression ratio is high to transmit a high-quality speech signal, and, in time intervals in which the compression ratio is low, reducing the number of features to be encoded until the code fits within the given information capacity.

With this policy, judged over all time intervals together, using entropy coding contributes more to reproducing good-quality speech than using an ordinary coding method with a constant compression ratio.

This embodiment puts that policy into practice, and the prediction analysis order adjustment unit 137 plays a particularly important role in doing so.

As described above, the prediction analysis unit 125 performs predictive analysis on the input digital speech signal using a predetermined initial value as the order of the analysis.

In general, the higher the order of the predictive analysis, the more information is passed to the encoding unit 133 and the more faithfully the original speech signal can be reproduced.

However, as the amount of information passed to the encoding unit 133 increases, the code length after encoding certainly grows on average, even when entropy coding with its uneven compression ratio is used. For the code length to fit within the given information transmission capacity, there is therefore an upper limit on the order of the predictive analysis.

Because the compression ratio of entropy coding varies, however, this upper limit on the order is not simple to determine.

In this embodiment, the upper limit is determined on the basis of the case in which entropy coding achieves its highest compression ratio, and it is used as the predetermined initial value for the predictive analysis mentioned above.

As stated earlier, the policy adopted in this embodiment is to exploit high compression, when it is achieved, to reproduce high-quality speech and, when only low compression is achieved, to cut as little of the original information as possible so that the loss of reproduced speech quality is minimized.

Accordingly, predictive analysis is first performed with the above initial value as the order, in the hope that the interval turns out to be one with the highest compression ratio. The encoding unit 133 then actually performs entropy coding, obtains the code length, and notifies the prediction analysis order adjustment unit 137. At this point the switch 135 is open, so the entropy code generated by the encoding unit 133 is not passed to the transmission unit 139 and no code is transmitted.

The prediction analysis order adjustment unit 137 determines whether the code length notified by the encoding unit 133 satisfies the limit imposed by the given communication capacity. If it does, the unit instructs the switch 135 to permit transmission. On receiving this instruction the switch 135 closes, and the entropy code generated by the encoding unit 133 is passed to the transmission unit 139 and transmitted to the speech decoding device 211 described later.

As described above, the initial value of the order is set on the basis of the case most favorable to entropy coding, so in many cases the prediction analysis order adjustment unit 137 determines that the code length notified by the encoding unit 133 does not satisfy the limit and sends the switch 135 an instruction not to permit transmission. The switch 135 then remains open, so the entropy code is not passed to the transmission unit 139 and is not transmitted.

When the entropy code length exceeds the predetermined code length in this way, the prediction analysis order adjustment unit 137 orders the prediction analysis unit 125 to reduce the order of the predictive analysis by 1 and perform the analysis again. A lower order means less information is sent to the encoding unit 133, so the entropy code it generates is more likely than before to be no longer than the predetermined code length.

The length of the newly generated entropy code is again notified to the prediction analysis order adjustment unit 137, which makes the same determination as before and either notifies the switch 135 that transmission is permitted, or notifies the switch 135 that transmission is not permitted and has the prediction analysis unit 125 reduce the order by a further 1 and repeat the analysis.

By following this procedure, the prediction analysis order adjustment unit 137 eventually issues transmission permission to the switch 135, and the entropy code is passed to the transmission unit 139 and transmitted to the speech decoding device described later.

In this embodiment the transmission unit 139 transmits by wireless communication, but various other methods may be used, such as wired communication or a combination of wired and wireless communication.

Doing so follows the policy of this embodiment: by adopting entropy coding, the given communication capacity is used to the fullest to help reproduce high-quality speech.

For example, when the sampling frequency is 8 kHz, the initial value of the order of the predictive analysis is set to 10, and if the code overflows, the order is lowered one at a time, to 9, 8, and so on, until the target code length is reached.
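
The order adjustment loop can be sketched as follows. The callable `analyze_and_encode` stands in for the whole chain from the prediction analysis unit 125 to the encoding unit 133; the toy encoder, its bit costs, the 50-bit budget, and the error raised when no order fits are all hypothetical choices for this sketch, since the description does not say what happens in that last case.

```python
def encode_within_budget(frame, initial_order, max_bits, analyze_and_encode):
    """Lower the analysis order one step at a time until the code fits in max_bits.

    Returns (order used, code). Returning corresponds to closing the switch so
    that the code reaches the transmission unit.
    """
    order = initial_order
    while order >= 1:
        code = analyze_and_encode(frame, order)   # re-run analysis and entropy coding
        if len(code) <= max_bits:
            return order, code
        order -= 1                                # overflow: retry with one coefficient fewer
    raise ValueError("no analysis order fits within the bit budget")


def toy_analyze_and_encode(frame, order):
    """Hypothetical stand-in: 8 bits for the residual features plus 6 bits per coefficient."""
    return "0" * (8 + 6 * order)


chosen_order, chosen_code = encode_within_budget(
    frame=[0.0] * 40, initial_order=10, max_bits=50,
    analyze_and_encode=toy_analyze_and_encode,
)
```

With these made-up costs, orders 10, 9, and 8 overflow the 50-bit budget and order 7 is the first that fits, so the loop stops there. A real entropy coder would give a different length for each interval, so the chosen order would vary from interval to interval.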

図２は、本実施形態に係る音声復号装置２１１の機能構成図である。 FIG. 2 is a functional configuration diagram of the speech decoding apparatus 211 according to the present embodiment.

音声復号装置２１１は、図示するように、受信部２３１と、復号部２３３と、残差信号復元部２３５と、合成用フィルタ算出部２３７と、合成用フィルタ部２３９と、Ｄ／Ａ変換部２４１と、スピーカ２４３と、を備える。 As shown in the figure, the speech decoding apparatus 211 includes a receiving unit 231, a decoding unit 233, a residual signal restoration unit 235, a synthesis filter calculation unit 237, a synthesis filter unit 239, and a D / A conversion unit 241. And a speaker 243.

受信部２３１は、図１の音声符号化装置１１１の送信部１３９から、無線通信手段によって、予測係数と残差信号情報がまとめて符号化されたもの（エントロピ符号）を受け取り、復号部２３３に引き渡す。 The reception unit 231 receives from the transmission unit 139 of the speech encoding device 111 of FIG. 1 the one in which the prediction coefficient and the residual signal information are encoded together (entropy code) by the wireless communication unit, and sends it to the decoding unit 233. hand over.

The decoding unit 233 decodes the entropy code handed over from the reception unit 231 and generates, for each time segment, the prediction coefficients, the gain of the residual signal, the voiced/unvoiced determination result for the residual signal, and, in the voiced case, the pitch frequency. The order of the prediction analysis that the speech encoding apparatus 111 finally used, which is needed for speech synthesis, is not transmitted directly to the speech decoding apparatus 211. It can, however, be obtained by counting the number of decoded prediction coefficients.

残差信号に関して復号された情報である、ゲインと、残差信号の有声無声判別結果及び有声の場合のピッチ周波数と、は、残差信号復元部２３５に引き渡される。 The gain, which is information decoded with respect to the residual signal, and the voiced / unvoiced discrimination result of the residual signal and the pitch frequency in the case of voiced are delivered to the residual signal restoration unit 235.

残差信号復元部２３５は、元の音声の残差信号をいくつかの特徴量に集約した結果に基づいて残差信号を復元する。この意味では、残差信号復元部２３５は、疑似残差信号生成部であるともいえる。 The residual signal restoration unit 235 restores the residual signal based on the result of collecting the residual signal of the original speech into several feature amounts. In this sense, it can be said that the residual signal restoration unit 235 is a pseudo residual signal generation unit.

The pseudo residual signal generated by the residual signal restoration unit 235 is written $D'_i = \{d'_{i,0}, \ldots, d'_{i,l-1}\}$ ($0 \le i \le M-1$). The pseudo residual signal $D'_i$ is either a pulse train or noise. If the received voiced/unvoiced determination result is voiced, the residual signal restoration unit 235 generates a pulse train that has the received pitch frequency and a magnitude corresponding to the received gain. If the result is unvoiced, it generates a noise train by multiplying a sequence of magnitude-1 signal values at random time intervals, prepared in advance, by a magnitude corresponding to the received gain. The procedure for generating the pulse train or noise train is described in detail later with reference to FIG. 6. The pseudo residual signal $D'_i$ is handed to the synthesis filter unit 239 as an excitation signal.

一方、復号部２３３によって復号された予測係数は、合成用フィルタ算出部２３７に引き渡され、音声合成用のフィルタを算出するために用いられる。音声合成用のフィルタとは、該フィルタに励起用の信号を入力することにより音声信号が再生されるような性質を有するフィルタである。 On the other hand, the prediction coefficient decoded by the decoding unit 233 is transferred to the synthesis filter calculation unit 237 and used to calculate a speech synthesis filter. The filter for speech synthesis is a filter having such a property that a speech signal is reproduced by inputting an excitation signal to the filter.

合成用フィルタ算出部２３７によるフィルタ算出結果は、合成用フィルタ部２３９に送られる。合成用フィルタ部２３９は、受け取ったフィルタ算出結果に従って、自身の仕様を決定する。あるいは、合成用フィルタ算出部２３７によって、合成用フィルタ部２３９が生成されると考えてもよい。 The filter calculation result by the synthesis filter calculation unit 237 is sent to the synthesis filter unit 239. The synthesizing filter unit 239 determines its own specification according to the received filter calculation result. Alternatively, it may be considered that the synthesis filter unit 239 is generated by the synthesis filter calculation unit 237.

かかる合成用フィルタ部２３９に前述の疑似残差信号D'_iを励起用の信号として入力すれば、デジタルデータとしての音声信号が復元される。以上の音声信号復元の手順については、後に図７を参照して詳しく説明する。 When the pseudo residual signal D ′ _i is input as an excitation signal to the synthesizing filter unit 239, an audio signal as digital data is restored. The audio signal restoration procedure described above will be described in detail later with reference to FIG.

合成用フィルタ部２３９から出力された再生信号は、Ｄ／Ａ変換部２４１によりアナログ音声信号に変換された後、スピーカ２４３に伝達される。スピーカ２４３は受け取ったアナログ信号に従って実際に音声を発する。 The reproduction signal output from the synthesizing filter unit 239 is transmitted to the speaker 243 after being converted into an analog audio signal by the D / A conversion unit 241. The speaker 243 actually emits sound according to the received analog signal.

The speech encoding apparatus 111 and the speech decoding apparatus 211 described so far with reference to the functional configuration diagrams of FIGS. 1 and 2 are physically realized, for ease of use, as the speech encoding/decoding device 311 shown in FIG. 3, which integrates the functions of both. In the following description, a mobile phone is assumed as the speech encoding/decoding device 311.

音声符号化兼復号装置３１１は、図１に既に示してあるマイクロフォン１２１と、図２に既に示してあるスピーカ２４３と、を備え、さらに、アンテナ３５３と、操作キー３６３と、を備える。 The speech encoding / decoding device 311 includes the microphone 121 already shown in FIG. 1 and the speaker 243 already shown in FIG. 2, and further includes an antenna 353 and operation keys 363.

音声符号化兼復号装置３１１は、ＣＰＵ３２１と、ＲＯＭ（Read Only Memory）３２３と、記憶部３２５と、音声処理部３４１と、無線通信部３５１と、操作キー入力処理部３６１と、をさらに備え、これらはシステムバス３７１で相互に接続されている。システムバス３７１は、命令やデータを転送するための伝送経路である。 The speech encoding / decoding device 311 further includes a CPU 321, a ROM (Read Only Memory) 323, a storage unit 325, a speech processing unit 341, a wireless communication unit 351, and an operation key input processing unit 361. These are connected to each other via a system bus 371. The system bus 371 is a transmission path for transferring commands and data.

ＲＯＭ３２３には、音声符号化及び復号のための動作プログラムや、音声符号化兼復号装置３１１の全体の制御に必要なオペレーティングシステムが格納されている。 The ROM 323 stores an operation program for speech encoding and decoding and an operating system necessary for overall control of the speech encoding / decoding device 311.

本実施の形態においては、図１の予測分析部１２５、ゲイン抽出部１２７、ローパスフィルタ１２９、有声無声判別及びピッチ抽出部１３１、スイッチ１３５、予測分析次数調整部１３７、図２の残差信号復元部２３５、合成用フィルタ算出部２３７、合成用フィルタ部２３９、の機能は、図３のＣＰＵ３２１による数値処理により実現される。ＲＯＭ３２３に格納されている動作プログラムには、ＣＰＵ３２１によるかかる数値処理のためのプログラムが含まれている。 In the present embodiment, the prediction analysis unit 125, the gain extraction unit 127, the low-pass filter 129, the voiced / unvoiced discrimination / pitch extraction unit 131, the switch 135, the prediction analysis order adjustment unit 137, and the residual signal restoration of FIG. The functions of the unit 235, the synthesis filter calculation unit 237, and the synthesis filter unit 239 are realized by numerical processing by the CPU 321 in FIG. The operation program stored in the ROM 323 includes a program for such numerical processing by the CPU 321.

ＣＰＵ３２１は、ＲＯＭ３２３に格納された動作プログラムやオペレーティングシステムを実行することにより、音声を符号化又は復号する。 The CPU 321 encodes or decodes sound by executing an operation program or an operating system stored in the ROM 323.

As described above, the CPU 321 performs numerical operations in accordance with the operation program stored in the ROM 323. This requires the storage unit 325, which holds numerical sequences to be processed, for example the digital speech signal $S_i$ ($0 \le i \le M-1$), and numerical sequences produced by the processing, for example the residual signal $D_i$ ($0 \le i \le M-1$).

記憶部３２５は、ＲＡＭ（Random Access Memory）３３１と、ハードディスク３３３と、から構成されて、予測分析の次数、デジタル音声信号、予測係数、残差信号、ゲイン、有声無声判別結果、有声音のピッチ周波数、予測係数と残差信号情報がまとめて符号化されたもの、パルス列、雑音列、逆フィルタ算出結果、疑似残差信号、等を記憶する。 The storage unit 325 includes a RAM (Random Access Memory) 331 and a hard disk 333, and includes the order of prediction analysis, digital speech signal, prediction coefficient, residual signal, gain, voiced / unvoiced discrimination result, and pitch of voiced sound. The frequency, prediction coefficient and residual signal information encoded together, pulse train, noise train, inverse filter calculation result, pseudo residual signal, etc. are stored.

ＣＰＵ３２１は、レジスタ（図示せず）を内蔵しており、ＲＯＭ３２３から読み出した動作プログラムに従って、処理対象である数値列等を適宜記憶部３２５から該レジスタにロードし、ロードされた数値列等に所定の演算を施し、その結果を記憶部３２５に格納する。 The CPU 321 has a built-in register (not shown), and according to an operation program read from the ROM 323, appropriately loads a numerical sequence to be processed from the storage unit 325 into the register, and stores the predetermined numerical sequence in the loaded numerical sequence. And the result is stored in the storage unit 325.

When the speech encoding/decoding device 311 acts as the speech encoding apparatus 111 (FIG. 1), the wireless communication unit 351 and the speech processing unit 341 function as follows. Speech input to the microphone 121 is converted into a digital signal by the A/D conversion unit 123 (FIG. 1) included in the speech processing unit 341, and is then encoded by the CPU 321, the ROM 323, and the storage unit 325 through the process shown in FIG. 1. The wireless communication unit 351, functioning as the transmission unit 139 (FIG. 1), uses the antenna 353 to transmit the encoded prediction coefficients and the encoded residual signal information to the other party (another speech encoding/decoding device 311 on the receiving side).

無線通信部３５１と音声処理部３４１は、音声符号化兼復号装置３１１が音声復号装置２１１（図２）となる場合は、次のように機能する。すなわち、無線通信部３５１は受信部２３１（図２）として機能すべく、アンテナ３５３を用いて符号化予測係数及び符号化残差信号情報を受信する。受信された信号は、ＣＰＵ３２１、ＲＯＭ３２３、記憶部３２５により図２に示した過程を経てデジタル音声信号に復号される。デジタル音声信号は音声処理部３４１が備えるＤ／Ａ変換部２４１（図２）を用いてアナログ音声信号に変換され、スピーカ２４３から音声として出力される。 The wireless communication unit 351 and the speech processing unit 341 function as follows when the speech encoding / decoding device 311 is the speech decoding device 211 (FIG. 2). That is, the wireless communication unit 351 receives the encoded prediction coefficient and the encoded residual signal information using the antenna 353 so as to function as the receiving unit 231 (FIG. 2). The received signal is decoded by the CPU 321, ROM 323, and storage unit 325 into a digital audio signal through the process shown in FIG. The digital audio signal is converted into an analog audio signal using a D / A conversion unit 241 (FIG. 2) provided in the audio processing unit 341, and is output from the speaker 243 as audio.

操作キー入力処理部３６１は、操作キー３６３からの操作信号を受け付けて、操作信号に対応するキーコード信号をＣＰＵ３２１に入力する。ＣＰＵ３２１は、入力されたキーコード信号に基づいて操作内容を決定する。 The operation key input processing unit 361 receives an operation signal from the operation key 363 and inputs a key code signal corresponding to the operation signal to the CPU 321. The CPU 321 determines the operation content based on the input key code signal.

The user may be allowed to customize preset variables and the like of the speech encoding/decoding device 311 with the operation keys 363 to suit his or her own use. For example, the initial value of the prediction analysis order is in principle determined in advance based on the considerations already described and is written in the program stored in the ROM 323. However, the user may be allowed to rewrite it with the operation keys 363. If the initial value is made smaller, the advantage of entropy encoding is not fully exploited, but on average the code fits within the predetermined code length after fewer trials, so the processing speed improves and the user may perceive a call as closer to real time.

The operation keys 363 are also needed, when the speech encoding/decoding device 311 is made to function as the speech encoding apparatus 111, to enter a number (such as a telephone number) that identifies the destination device among the many other speech encoding/decoding devices 311 in circulation.

(Predictive analysis procedure)
The prediction analysis performed by the prediction analysis unit 125 of FIG. 1 is described below with reference to the flowchart shown in FIG. 4. Known forms of prediction analysis include, for example, linear prediction analysis and MLSA (Mel Log Spectrum Approximation) analysis. FIG. 4 shows both analyses together, with the latter in parentheses.

Assume that the digital speech signal (input waveform) $S_i = \{s_{i,0}, \ldots, s_{i,l-1}\}$ ($0 \le i \le M-1$) is already stored in the storage unit 325 (FIG. 3).

ＣＰＵ３２１（図３）は、内蔵のカウンタレジスタ（図示せず）を入力信号サンプルカウンタiの格納に用いることとし、初期値として、i＝0とする（図４のステップＳ４１１）。 The CPU 321 (FIG. 3) uses a built-in counter register (not shown) for storing the input signal sample counter i, and sets i = 0 as an initial value (step S411 in FIG. 4).

The CPU 321 loads the input signal samples $S_i = \{s_{i,0}, \ldots, s_{i,l-1}\}$ from the storage unit 325 (FIG. 3) into a built-in general-purpose register (not shown) (step S413 in FIG. 4).

In the case of linear prediction analysis, the CPU 321 calculates the linear prediction coefficients $A_i = \{a_{i,1}, \ldots, a_{i,n}\}$ from the input signal samples $S_i$ (step S415), where $n$ is the order of the linear prediction analysis. Any known calculation method may be used as long as it yields a residual signal that is judged sufficiently small on a predetermined measure. For example, the well-known combination of autocorrelation calculation and the Levinson-Durbin algorithm is suitable.
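The combination of autocorrelation and the Levinson-Durbin recursion mentioned above can be sketched as follows. This is a minimal illustration rather than the embodiment's actual program: the function names, the absence of windowing, and the sign convention (the predictor is taken to be $s_k \approx a_1 s_{k-1} + \cdots + a_n s_{k-n}$) are all assumptions made for the example.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_j frame[j] * frame[j + k] for k = 0..max_lag."""
    length = len(frame)
    return [sum(frame[j] * frame[j + k] for j in range(length - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the predictor
    s[k] ~ a[1]*s[k-1] + ... + a[order]*s[k-order].

    Returns (a, err): a[0] is unused padding, a[1..order] are the
    coefficients, and err is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        # Reflection coefficient for stage m.
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

Calling `levinson_durbin(autocorrelation(frame, n), n)` for one frame would give the $n$ coefficients of that frame under these assumptions.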

In the case of MLSA analysis, the CPU 321 first calculates the cepstrum $C_i = \{c_{i,0}, \ldots, c_{i,(l/2)-1}\}$ from the input signal samples $S_i$. Any known method may be used for this calculation. In every method, the procedure is broadly: take the discrete Fourier transform, take the absolute value, take the logarithm, and take the inverse discrete Fourier transform. Next, the MLSA filter coefficients $M_i = \{m_{i,0}, \ldots, m_{i,p-1}\}$ are calculated from the cepstrum $C_i$ by any known method (step S415). In MLSA analysis, $p$ corresponds to the order of the prediction analysis.
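The four cepstrum steps listed above (DFT, absolute value, logarithm, inverse DFT) can be sketched as below. The function name `real_cepstrum` and the `floor` parameter that guards against taking the logarithm of zero are assumptions of this example; the direct O(l^2) transform is used only for readability, and the subsequent conversion from cepstrum to MLSA filter coefficients is not shown.

```python
import cmath
import math


def real_cepstrum(frame, floor=1e-12):
    """Return the first len(frame) // 2 real-cepstrum values of one frame."""
    n = len(frame)
    # Step 1: discrete Fourier transform (direct form, for clarity).
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    # Steps 2 and 3: absolute value, then logarithm (floored to avoid log 0).
    log_mag = [math.log(max(abs(x), floor)) for x in spectrum]
    # Step 4: inverse DFT. The log-magnitude spectrum of a real frame is
    # real and symmetric, so the result is real up to rounding error.
    cep = [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
               for k in range(n)).real / n for t in range(n)]
    return cep[: n // 2]
```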

In the case of linear prediction analysis, the linear prediction coefficients $A_i = \{a_{i,1}, \ldots, a_{i,n}\}$ are stored in the storage unit 325 as the prediction coefficients; in the case of MLSA analysis, the MLSA filter coefficients $M_i = \{m_{i,0}, \ldots, m_{i,p-1}\}$ are stored (step S417).

Next, in the case of linear prediction analysis, the inverse linear prediction filter for prediction analysis $AIA_i$ is calculated from the linear prediction coefficients $A_i$ by any known method; in the case of MLSA analysis, the inverse MLSA filter for prediction analysis $AIM_i$ is calculated from the MLSA filter coefficients $M_i$ by any known method (step S419). These calculations correspond to those performed by the prediction analysis inverse filter calculator 141 of FIG. 1.

Passing the input signal samples $S_i = \{s_{i,0}, \ldots, s_{i,l-1}\}$ through the resulting inverse linear prediction filter $AIA_i$ or inverse MLSA filter $AIM_i$ yields the residual signal $D_i = \{d_{i,0}, \ldots, d_{i,l-1}\}$ (step S421 in FIG. 4). The residual signal $D_i$ is stored in the storage unit 325 (step S423).
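For the linear prediction case, one common form of the inverse (prediction-error) filter is $d_k = s_k - \sum_{j=1}^{n} a_j s_{k-j}$. The sketch below assumes that form, assumes samples before the start of the frame are zero, and uses the hypothetical name `prediction_residual`; the text itself leaves the filter construction to "any known method".

```python
def prediction_residual(frame, a):
    """Compute d[k] = s[k] - sum_{j=1..n} a[j] * s[k-j] for one frame.

    a[0] is unused padding and a[1..n] are the prediction coefficients.
    Samples before the start of the frame are treated as zero (assumption).
    """
    order = len(a) - 1
    residual = []
    for k, sample in enumerate(frame):
        predicted = sum(a[j] * frame[k - j]
                        for j in range(1, order + 1) if k - j >= 0)
        residual.append(sample - predicted)
    return residual
```

For a frame that decays by exactly one half per sample and a single coefficient of 0.5, everything after the first sample is predicted perfectly, so only the first residual value is nonzero.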

ここで、入力信号サンプルカウンタiがM−1に達しているか否かが判別される（ステップＳ４２５）。達していれば（ステップＳ４２５；Ｙｅｓ）、終了する。一方、達していなければ（ステップＳ４２５；Ｎｏ）、次の時間区間の入力信号サンプルについての処理を行うために、iを1だけインクリメントし（ステップＳ４２７）、ステップＳ４１３以降の処理を繰り返す。 Here, it is determined whether or not the input signal sample counter i has reached M−1 (step S425). If it has been reached (step S425; Yes), the process ends. On the other hand, if not reached (step S425; No), i is incremented by 1 (step S427) in order to perform processing on the input signal sample in the next time interval, and the processing after step S413 is repeated.

(Entropy code generation procedure)
The procedure carried out under the control of the prediction analysis order adjustment unit 137 of FIG. 1, namely prediction analysis, gain extraction, voiced/unvoiced determination and pitch extraction, trial entropy encoding, and generation of the entropy code that is actually transmitted, is described below with reference to the flowchart shown in FIG. 5. The description assumes linear prediction analysis, but the same applies when MLSA analysis is adopted.

The CPU 321 (FIG. 3) sets the input signal sample counter to $i = 0$ (step S511 in FIG. 5) and sets the order $n$ of the linear prediction coefficients $A_i$ to $n_{initial}$, a predetermined value based on the considerations already described (step S513).

Next, following the procedure shown in FIG. 4, the linear prediction coefficients $A_i = \{a_{i,1}, \ldots, a_{i,n}\}$ and the residual signal $D_i = \{d_{i,0}, \ldots, d_{i,l-1}\}$ are calculated from the input signal samples $S_i = \{s_{i,0}, \ldots, s_{i,l-1}\}$ (step S515 in FIG. 5).

The gain $G_i$ is calculated from the residual signal $D_i$ (step S517). As already described, $G_i$ is calculated, for example, as
$$G_i = 10 \log_{10}\left\{ \frac{d_{i,0}^2 + \cdots + d_{i,l-1}^2}{l} \right\}$$
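The gain formula, the mean power of the residual expressed in decibels, translates directly into code. In this sketch the name `frame_gain_db` and the `floor` guard against an all-zero frame are assumptions added for the example.

```python
import math


def frame_gain_db(residual, floor=1e-12):
    """G_i = 10 * log10(mean of squared residual samples)."""
    power = sum(d * d for d in residual) / len(residual)
    return 10.0 * math.log10(max(power, floor))
```

A frame whose samples all have magnitude 10 has mean power 100 and therefore a gain of 20 dB under this definition.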

Next, it is determined whether the speech in the $i$-th time interval is voiced or unvoiced. In other words, the question is whether the residual signal $D_i$ has the character of a pitch. (Strictly, this is the low-band residual signal $D_{Low,i}$ after passing through the low-pass filter 129 (FIG. 1), but below $D_{Low,i}$ is also written simply as $D_i$.) If the residual signal $D_i$ is periodic, it can be said to have the character of a pitch. Therefore, $D_i$ is examined for periodicity.

Any known method may be used to check for periodicity; for example, it is suitable to compute the normalized autocorrelation function and check whether it has a sufficiently large local maximum. If such a maximum exists, the signal is periodic, and the time interval $t_{MAX}$ that gives the maximum is the period. If no such maximum exists, the signal is not periodic.

The autocorrelation function $C(t)$ of the residual signal $D_i$ is
$$C(t) = d_{i,0}\, d_{i,t} + d_{i,1}\, d_{i,t+1} + \cdots + d_{i,l-1-t}\, d_{i,l-1}$$
As this equation shows, $t$ is an interval counted in elements of the residual signal $D_i$. Strictly speaking, therefore, the time interval under consideration is $t$ multiplied by the sampling interval of the elements of $D_i$, and care is needed on this point when obtaining the pitch frequency. Since the sampling interval of the elements of $D_i$ is normally constant, however, the time interval under consideration is proportional to $t$. Below, where there is no risk of confusion, that time interval is written simply as $t$.

The autocorrelation function $C(t)$ may be normalized by any method that makes its magnitude independent of the overall magnitude of the residual signal $D_i$. For example, it is suitable to define the normalization factor $REG(t)$ as
$$REG(t) = \left\{ \left(d_{i,0}^2 + \cdots + d_{i,l-1-t}^2\right) \left(d_{i,t}^2 + \cdots + d_{i,l-1}^2\right) \right\}^{0.5}$$
and the normalized autocorrelation function $C_{REG}(t)$ as
$$C_{REG}(t) = \frac{C(t)}{REG(t)}$$

The predetermined threshold $C_{th}$ may be any value that helps determine whether the normalized autocorrelation function $C_{REG}(t)$ has a clear local maximum; for example, 0.5 is suitable.

Thus, in step S519, the normalized autocorrelation function $C_{REG}(t)$ is calculated from the residual signal $D_i$, and it is determined whether a local maximum $C_{REG}(t = t_{MAX})$ satisfying $C_{REG}(t = t_{MAX}) > C_{th}$ ($= 0.5$) exists.

If it exists, the residual signal $D_i$ can be said to have the character of voiced sound (step S519; Yes), so the variable $Flag_{VorUV,i}$, which indicates voiced or unvoiced, is set to $Flag_{VorUV,i}$ = "V" (meaning voiced) and stored in the storage unit 325. In addition, the pitch frequency $Pitch_i$ is calculated as the reciprocal of $t_{MAX}$, the value of $t$ at which the normalized autocorrelation function $C_{REG}(t)$ takes its local maximum, and is stored in the storage unit 325 (step S521). The process then proceeds to step S525.

If there is no $t$ at which the normalized autocorrelation function $C_{REG}(t)$ has a local maximum satisfying $C_{REG}(t) > C_{th}$ ($= 0.5$) (step S519; No), $Flag_{VorUV,i}$ is set to "UV" (meaning unvoiced) and stored in the storage unit 325 (step S523), and the process proceeds to step S525.
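Steps S519 to S523 can be sketched as follows. The definitions of $C(t)$, $REG(t)$ and the threshold 0.5 come from the text; the function names, the lag search range `t_min..t_max`, the tie-breaking rule (the smallest lag wins), and the conversion of the lag to hertz by dividing the sampling rate by $t_{MAX}$ are assumptions made for this example.

```python
import math


def normalized_autocorrelation(d, t):
    """C_REG(t) = C(t) / REG(t) for one residual frame d at lag t."""
    n = len(d)
    c = sum(d[k] * d[k + t] for k in range(n - t))
    e_head = sum(d[k] ** 2 for k in range(n - t))   # d_0 .. d_{l-1-t}
    e_tail = sum(d[k] ** 2 for k in range(t, n))    # d_t .. d_{l-1}
    reg = math.sqrt(e_head * e_tail)
    return c / reg if reg > 0.0 else 0.0


def detect_pitch(d, sample_rate_hz, t_min, t_max, c_th=0.5):
    """Return ("V", pitch_hz) if a local maximum above c_th exists,
    otherwise ("UV", None). Requires t_min >= 1."""
    values = {t: normalized_autocorrelation(d, t)
              for t in range(t_min - 1, t_max + 2)}
    best_t, best_c = None, c_th
    for t in range(t_min, t_max + 1):
        c = values[t]
        is_local_max = c >= values[t - 1] and c >= values[t + 1]
        if is_local_max and c > best_c:
            best_t, best_c = t, c
    if best_t is None:
        return "UV", None
    # t counts samples, so the period in seconds is best_t / sample_rate_hz.
    return "V", sample_rate_hz / best_t
```

For a frame of 160 samples at 8 kHz that contains one pulse every 40 samples, the smallest lag with $C_{REG}(t) = 1$ is 40, which corresponds to 200 Hz.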

In step S525, the linear prediction coefficients $A_i$, the gain $G_i$, the voiced/unvoiced flag $Flag_{VorUV,i}$, and, if present, the pitch frequency $Pitch_i$ are entropy-encoded together by an entropy encoding method such as Huffman coding or a range coder. The code length of the generated entropy code is then calculated.

続いて、計算された符号長が、送信可能通信容量等の事情を勘案してあらかじめ定められている目標符号長以下であるか否かが判別される（ステップＳ５２７）。オーバーフローを起こしている場合、すなわち、計算された符号長が目標符号長よりも大きい場合（ステップＳ５２７；Ｎｏ）には、予測分析の次数nを1だけ減らしてから（ステップＳ５２９）、ステップＳ５１５に戻り、エントロピ符号化の試行を繰り返す。 Subsequently, it is determined whether or not the calculated code length is equal to or less than a predetermined target code length in consideration of circumstances such as transmittable communication capacity (step S527). If an overflow has occurred, that is, if the calculated code length is larger than the target code length (step S527; No), the order n of the prediction analysis is reduced by 1 (step S529), and the process goes to step S515. Return and repeat the entropy coding trial.

計算された符号長が目標符号長以下である場合（ステップＳ５２７；Ｙｅｓ）、ステップＳ５２５にて生成されたエントロピ符号が実際に送信されることになるので、それに備えて、該符号が記憶部３２５に記憶される（ステップＳ５３１）。 When the calculated code length is less than or equal to the target code length (step S527; Yes), the entropy code generated in step S525 is actually transmitted, so that the code is stored in the storage unit 325 in preparation for that. (Step S531).

続いて、iが(M−1)以上であるか否かが判別される（ステップＳ５３３）。iがM−1に達していれば（ステップＳ５３３；Ｙｅｓ）、全ての時間区間についての処理が完了したので、終了する。iがM−1に達していないのであれば（ステップＳ５３３；Ｎｏ）、次の時間区間についての処理を行うために、iを1だけ増加してから（ステップＳ５３５）、ステップＳ５１３に戻る。 Subsequently, it is determined whether or not i is equal to or greater than (M−1) (step S533). If i has reached M−1 (step S533; Yes), the processing for all the time sections is completed, and the process is terminated. If i has not reached M−1 (step S533; No), in order to perform the process for the next time interval, i is increased by 1 (step S535), and the process returns to step S513.
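The trial loop of steps S515 to S529 for a single time interval can be sketched as follows. Here `analyze` and `encode` are hypothetical callables standing in for the analysis of steps S515 to S523 and the entropy encoder of step S525; `n_min` and the exception are also assumptions, since the text does not say what happens if the code still overflows at the lowest order.

```python
def fit_order_to_budget(frame, n_initial, target_bits, analyze, encode,
                        n_min=1):
    """Lower the analysis order until the coded frame fits target_bits.

    analyze(frame, order) -> parameters to encode (hypothetical callable).
    encode(parameters)    -> bytes of entropy code (hypothetical callable).
    Returns (order, code) for the first order whose code fits.
    """
    for order in range(n_initial, n_min - 1, -1):
        code = encode(analyze(frame, order))
        if 8 * len(code) <= target_bits:   # step S527: no overflow
            return order, code             # step S531: keep this code
        # Overflow: fall through and retry with order - 1 (step S529).
    raise ValueError("frame does not fit even at the minimum order")
```

Starting from a smaller `n_initial` reduces the average number of trials at the cost of using less of the available code length, which is the trade-off the description raises for user customization.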

(Pulse train or noise train generation procedure)
The processing performed by the residual signal restoration unit 235 of FIG. 2 is described below with reference to the flowchart shown in FIG. 6.

i番目の時間区分（0≦i≦M−1）における処理について説明する。 Processing in the i-th time segment (0 ≦ i ≦ M−1) will be described.

The CPU 321 (FIG. 3) loads the gain $G_i$ and the voiced/unvoiced flag $Flag_{VorUV,i}$ from the storage unit 325 (FIG. 3) into a general-purpose register (step S611 in FIG. 6).

It is determined whether the voiced/unvoiced flag satisfies $Flag_{VorUV,i}$ = "V" (step S613), that is, whether the original residual signal $D_i$ was voiced.

If it was voiced (step S613; Yes), $Pitch_i$ should have been generated in step S521 of FIG. 5 by the voiced/unvoiced determination and pitch extraction unit 131 (FIG. 1) of the transmitting speech encoding/decoding device 311, and, after encoding, transmission, reception, and decoding, the pitch frequency $Pitch_i$ should be stored in the storage unit 325 of the receiving speech encoding/decoding device 311. $Pitch_i$ is therefore loaded (step S615).

Next, the residual signal is restored. That is, a pulse train $D'_i = \{d'_{i,0}, \ldots, d'_{i,l-1}\}$ is generated whose magnitude is the gain $G_i$ and whose repetition rate is the pitch frequency $Pitch_i$ (step S617). This is the restored residual signal. The pulse train $D'_i$ is generated assuming the same sampling interval as that of the original residual signal.

Because $D'_i$ is generated at the sampling interval of the original residual signal, each of its elements $d'_{i,0}, \ldots, d'_{i,l-1}$ in practice takes only the value 0 or $G_i$. Moreover, in this time-ordered sequence of elements, $G_i$ appears once every number of elements corresponding to the pitch period (the reciprocal of $Pitch_i$), and all other elements are 0.

If it is determined in step S613 that the original residual signal was not voiced (step S613; No), the original residual signal had been determined to be unvoiced. In that case, a sequence of signal values $D'_i = \{d'_{i,0}, \ldots, d'_{i,l-1}\}$ that is suitable as noise and reflects the gain $G_i$ is generated by the following procedure.

First, a basic noise train $R_i = \{r_{i,0}, \ldots, r_{i,l-1}\}$ is generated whose pulses have magnitude ±1 and are spaced at random time intervals (step S619).

Here, $R_i$ is generated at the same sampling interval as the original residual signal. In practice, therefore, each of its elements $r_{i,0}, \ldots, r_{i,l-1}$ is 0, +1, or −1. Moreover, in this time-ordered sequence of elements, +1 or −1 appears at random spacings counted in elements, and all other elements are 0.

Multiplying the resulting basic noise train $R_i$ by the loaded gain $G_i$ produces the noise train $D'_i = \{d'_{i,0}, \ldots, d'_{i,l-1}\}$ (step S621).

In this way, whether the original residual signal was voiced or unvoiced, the restored residual signal $D'_i = \{d'_{i,0}, \ldots, d'_{i,l-1}\}$ is generated as a pulse train or a noise train. Because it is used later to reproduce the speech signal, it is stored in the storage unit 325 (step S623).
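Steps S613 to S621 can be sketched as below. Several details are assumptions of this example and not statements of the embodiment: the name `make_excitation`, passing the pitch as a period in samples, passing a linear `amplitude` that the caller derives from the decoded gain, drawing noise spacings uniformly from 1 to `max_gap` samples, and starting the pulse train at the first sample of every frame.

```python
import random


def make_excitation(flag, amplitude, length, period_samples=None,
                    max_gap=8, rng=None):
    """Build one frame of pseudo residual signal D'_i.

    flag: "V" (voiced) or "UV" (unvoiced).
    amplitude: linear magnitude corresponding to the decoded gain.
    period_samples: pitch period in samples, required when flag == "V".
    max_gap: largest random spacing between noise pulses (assumed value).
    """
    out = [0.0] * length
    if flag == "V":
        # Pulse train: one pulse of height `amplitude` every pitch period.
        for k in range(0, length, period_samples):
            out[k] = amplitude
        return out
    # Noise train: +/-1 pulses at random spacings, scaled by the amplitude.
    rng = rng or random.Random()
    k = rng.randint(0, max_gap - 1)
    while k < length:
        out[k] = amplitude * rng.choice((1.0, -1.0))
        k += rng.randint(1, max_gap)
    return out
```

With a sampling rate of 8 kHz and a decoded pitch of 200 Hz, `period_samples` would be `round(8000 / 200)`, that is, 40.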

(Speech signal restoration procedure)
The procedure by which the synthesis filter calculation unit 237 and the synthesis filter unit 239 of FIG. 2 restore the speech signal is described below with reference to the flowchart shown in FIG. 7. The description assumes linear prediction analysis, but the procedure is the same in other cases, for example when MLSA analysis is adopted.

The CPU 321 (FIG. 3) sets the input signal sample counter in the counter register to i = 0 (step S711 in FIG. 7).

The CPU 321 loads the linear prediction coefficients A_i = {a_{i,1}, ..., a_{i,n}} from the storage unit 325 (FIG. 3) into a general-purpose register (step S713 in FIG. 7).

Next, the synthesis filter CIA_i is calculated from the linear prediction coefficients A_i by any known method (step S715). This is the task performed by the synthesis filter calculation unit 237 in FIG. 2.

Subsequently, the pseudo residual signal D'_i = {d'_{i,0}, ..., d'_{i,l-1}} is loaded and passed through the synthesis filter CIA_i, which restores the speech signal S'_i = {s'_{i,0}, ..., s'_{i,l-1}} (step S717).

The restored speech signal S'_i is stored in the storage unit 325 (step S719).

It is then determined whether the input signal sample counter i has reached M-1 (step S721). If it has (step S721; Yes), all the speech signals to be restored have been restored, and the process ends. If it has not (step S721; No), i is incremented by 1 (step S723) in order to restore the speech signal of the next time interval, and the processing from step S713 onward is repeated.
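The loop of steps S711 to S723 can be sketched in Python as follows. The description leaves the synthesis filter to "any known method", so this sketch assumes a direct-form all-pole filter with the sign convention s'[t] = d'[t] + Σ_k a_k · s'[t-k]. That convention, the carrying of filter history across time intervals, and the function names are assumptions, not details taken from the embodiment.

```python
def synthesize_frame(a, d, history=()):
    """Pass one pseudo residual frame d through an all-pole filter (step S717).

    a       : prediction coefficients a_{i,1..n} of this time interval
    d       : pseudo residual samples d'_{i,0..l-1}
    history : previous output samples, most recent first (assumed carry-over)
    """
    n = len(a)
    past = (list(history) + [0.0] * n)[:n]   # pad or trim to the current order
    out = []
    for x in d:
        y = x + sum(ak * pk for ak, pk in zip(a, past))
        out.append(y)
        past = ([y] + past)[:n]              # shift the newest sample in
    return out, past


def restore_speech(coeff_frames, residual_frames):
    """Restore every time interval i = 0 .. M-1 (steps S711 to S723)."""
    history = []
    restored = []
    for a_i, d_i in zip(coeff_frames, residual_frames):
        s_i, history = synthesize_frame(a_i, d_i, history)
        restored.append(s_i)                 # corresponds to storing S'_i
    return restored
```

Padding or trimming the history to the current order lets the sketch tolerate a prediction order that differs from one time interval to the next.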

(Example of a procedure for obtaining MLSA coefficients from the cepstrum)
FIG. 8 is a flowchart showing one concrete example of a procedure for obtaining the MLSA filter coefficients M_i = {m_{i,0}, ..., m_{i,p-1}} from the cepstrum C_i = {c_{i,0}, ..., c_{i,(l/2)-1}}. The MLSA filter coefficients are obtained by performing the calculations shown in steps S811 to S835. α is an approximation parameter; when the speech signal is sampled at 10 kHz, α = 0.35 is suitable. In addition, β = 1 - α². m_i (0 ≤ i ≤ p-1) is initialized to 0.

FIG. 9 shows an example of the configuration of an MLSA filter that uses the MLSA filter coefficients obtained in this way. P₁ to P₄ are approximation coefficients; suitable values are, for example, P₁ = 0.4999, P₂ = 0.1067, P₃ = 0.0117, and P₄ = 0.0005656.
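The description calls P₁ to P₄ only "approximation coefficients". Their values agree, to rounding, with the coefficients commonly quoted for a fourth-order modified Padé approximation of the exponential function in MLSA filter designs. Assuming that is their role in FIG. 9, the following Python sketch evaluates the rational approximation exp(w) ≈ (1 + Σ P_k w^k) / (1 + Σ P_k (-w)^k) for a scalar w. The function name `pade_exp` is hypothetical, and the sketch illustrates only the approximation itself, not the filter structure of FIG. 9.

```python
import math

# Approximation coefficients P1..P4 as given in the description.
P = (0.4999, 0.1067, 0.0117, 0.0005656)


def pade_exp(w):
    """Rational approximation of exp(w), assuming P1..P4 are Pade coefficients."""
    num = 1.0 + sum(p * w ** k for k, p in enumerate(P, start=1))
    den = 1.0 + sum(p * (-w) ** k for k, p in enumerate(P, start=1))
    return num / den


if __name__ == "__main__":
    for w in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(w, pade_exp(w), math.exp(w))
```

Under this assumption the approximation stays close to `math.exp` for arguments of moderate magnitude and degrades as |w| grows.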

The present invention is not limited to the above embodiment, and various modifications and applications are possible. The hardware configuration, block configuration, and flowcharts described above are examples and are not limiting.

For example, the speech encoding and decoding device 311 shown in FIG. 3 has been described on the assumption that it is a mobile phone, but the present invention can likewise be applied to speech processing by a PHS (Personal Handyphone System), a PDA (Personal Digital Assistant), notebook and desktop personal computers, and the like. When the present invention is applied to a personal computer, for example, adding a voice input/output device, a communication device, and the like to the personal computer gives it, as hardware, the functions of a mobile phone. Furthermore, if a computer program for causing a computer to execute the above-described processing is distributed on a recording medium or via communication, installing that program on a computer and executing it allows the computer to function as the speech encoding device or the speech decoding device according to the present invention.

In other words, the above embodiment is for illustration and does not limit the scope of the present invention. A person skilled in the art can therefore adopt embodiments in which any or all of these elements are replaced with equivalents, and such embodiments are also included in the scope of the present invention.

FIG. 1 is a functional configuration diagram of a speech encoding device provided with a prediction analysis order adjustment unit according to an embodiment of the present invention.
FIG. 2 is a functional configuration diagram of a speech decoding device according to an embodiment of the present invention.
FIG. 3 is a diagram showing the physical configuration of a speech encoding and decoding device according to an embodiment of the present invention.
FIG. 4 is a diagram showing the flow of prediction analysis.
FIG. 5 is a diagram showing the flow of entropy code generation.
FIG. 6 is a diagram showing the flow of generating a pulse train or a noise sequence.
FIG. 7 is a diagram showing the flow of restoring a speech signal.
FIG. 8 is a diagram showing an example of the flow of calculating MLSA filter coefficients.
FIG. 9 is a diagram showing an example of an MLSA filter.

Explanation of symbols

111 ... speech encoding device, 121 ... microphone, 123 ... A/D conversion unit, 125 ... prediction analysis unit, 127 ... gain extraction unit, 129 ... low-pass filter, 131 ... voiced/unvoiced discrimination and pitch extraction unit, 133 ... encoding unit, 135 ... switch, 137 ... prediction analysis order adjustment unit, 139 ... transmission unit, 141 ... inverse filter calculator for prediction analysis, 211 ... speech decoding device, 231 ... receiving unit, 233 ... decoding unit, 235 ... residual signal restoration unit, 237 ... synthesis filter calculation unit, 239 ... synthesis filter unit, 241 ... D/A conversion unit, 243 ... speaker, 311 ... speech encoding and decoding device, 321 ... CPU, 323 ... ROM, 325 ... storage unit, 331 ... RAM, 333 ... hard disk, 341 ... audio processing unit, 351 ... wireless communication unit, 353 ... antenna, 361 ... operation key input processing unit, 363 ... operation keys, 371 ... system bus

Claims

A speech encoding apparatus comprising:
a prediction analysis unit that decomposes a speech signal into a prediction coefficient and a residual signal by prediction analysis of a predetermined order;
a gain extraction unit that obtains a gain of the residual signal;
a voiced/unvoiced discrimination and pitch extraction unit that determines whether the residual signal is voiced or unvoiced and, when the residual signal is determined to be voiced, extracts a pitch frequency from the residual signal;
an encoding unit that converts into an entropy code the prediction coefficient, the gain, the determination result, and, when a pitch frequency has been extracted as a result of the determination, the pitch frequency;
a control unit that determines whether the length of the entropy code exceeds an allowable length and, when the length is determined to exceed the allowable length, reduces the order of the prediction analysis in the prediction analysis unit and causes the series of encoding operations to be executed repeatedly; and
a code transmission unit that transmits the entropy code that, through repeated execution by the encoding unit under the control of the control unit, falls within the allowable length.

The speech encoding apparatus according to claim 1, wherein the voiced/unvoiced discrimination and pitch extraction unit
comprises a low-pass filter that extracts a low-frequency portion from the residual signal in advance, and
determines whether the low-frequency portion is voiced or unvoiced and, when the low-frequency portion is determined to be voiced, extracts the pitch frequency from the low-frequency portion.

The speech encoding apparatus according to claim 1 or 2, wherein the prediction analysis unit decomposes the speech signal into a prediction coefficient and a residual signal by linear prediction analysis.

The speech encoding apparatus according to claim 1 or 2, wherein the prediction analysis unit decomposes the speech signal into a prediction coefficient and a residual signal by MLSA (Mel Log Spectrum Approximation) analysis.

A speech decoding apparatus comprising:
a receiving unit that receives an entropy code transmitted from the code transmission unit of the speech encoding apparatus according to claim 1;
a decoding unit that decodes the received entropy code and generates a prediction coefficient, a gain of the residual signal, a voiced/unvoiced discrimination result for the residual signal, and, in the voiced case, a pitch frequency;
a signal generation unit that generates, as an excitation signal, noise having a gain equal to the gain of the residual signal when the speech signal is unvoiced, and a pulse train having a gain equal to the gain of the residual signal and a frequency equal to the pitch frequency when the speech signal is voiced; and
a synthesis filter that restores speech by synthesizing the prediction coefficient and the excitation signal.

A speech encoding method comprising:
a prediction analysis step of decomposing a speech signal into a prediction coefficient and a residual signal by prediction analysis;
a gain extraction step of obtaining a gain of the residual signal;
a voiced/unvoiced discrimination and pitch extraction step of determining whether the residual signal is voiced or unvoiced and, when the residual signal is determined to be voiced, extracting a pitch frequency from the residual signal;
an encoding step of converting into an entropy code the prediction coefficient, the gain, the determination result, and, when a pitch frequency has been extracted as a result of the determination, the pitch frequency;
a code length examination step of calculating the length of the entropy code and determining whether the length exceeds an allowable length; and
a subtraction step of reducing the order of the prediction analysis in the prediction analysis step when it is determined in the code length examination step that the length of the entropy code exceeds the allowable length,
wherein the prediction analysis step, the gain extraction step, the voiced/unvoiced discrimination and pitch extraction step, the encoding step, the code length examination step, and the subtraction step are repeated until the entropy code falls within the allowable length, and the entropy code that falls within the allowable length is transmitted.

A speech decoding method comprising:
a receiving step of receiving an entropy code encoded by the speech encoding method according to claim 6;
a decoding step of decoding the received entropy code to generate a prediction coefficient, a gain of the residual signal, a voiced/unvoiced discrimination result for the residual signal, and, in the voiced case, a pitch frequency;
a signal generation step of generating, as an excitation signal, noise having a gain equal to the gain of the residual signal when the speech signal is unvoiced, and a pulse train having a gain equal to the gain of the residual signal and a frequency equal to the pitch frequency when the speech signal is voiced; and
a synthesis step of restoring speech by synthesizing the prediction coefficient and the excitation signal.

A computer program that causes a computer to execute:
a prediction analysis step of decomposing a speech signal into a prediction coefficient and a residual signal by prediction analysis;
a gain extraction step of obtaining a gain of the residual signal;
a voiced/unvoiced discrimination and pitch extraction step of determining whether the residual signal is voiced or unvoiced and, when the residual signal is determined to be voiced, extracting a pitch frequency from the residual signal;
an encoding step of converting into an entropy code the prediction coefficient, the gain, the determination result, and, when a pitch frequency has been extracted as a result of the determination, the pitch frequency;
a code length examination step of calculating the length of the entropy code and determining whether the length exceeds an allowable length;
a subtraction step of reducing the order of the prediction analysis in the prediction analysis step when it is determined in the code length examination step that the length of the entropy code exceeds the allowable length;
a re-encoding step of repeating the prediction analysis step, the gain extraction step, the voiced/unvoiced discrimination and pitch extraction step, the encoding step, the code length examination step, and the subtraction step until the entropy code falls within the allowable length; and
a code transmission step of transmitting the entropy code obtained by the re-encoding step.

A computer program that causes a computer to execute:
a receiving step of receiving an entropy code encoded by the speech encoding method according to claim 6;
a decoding step of decoding the received entropy code to generate a prediction coefficient, a gain of the residual signal, a voiced/unvoiced discrimination result for the residual signal, and, in the voiced case, a pitch frequency;
a signal generation step of generating, as an excitation signal, noise having a gain equal to the gain of the residual signal when the speech signal is unvoiced, and a pulse train having a gain equal to the gain of the residual signal and a frequency equal to the pitch frequency when the speech signal is voiced; and
a synthesis step of restoring speech by synthesizing the prediction coefficient and the excitation signal.