JP5667963B2 - Speech enhancement device, method and program thereof - Google Patents

Speech enhancement device, method and program thereof

Info

Publication number
JP5667963B2
Authority
JP
Japan
Prior art keywords
index
conversion
frequency
speech
periodicity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2011245547A
Other languages
Japanese (ja)
Other versions
JP2013101255A (en)
Inventor
歩相名 神山
水野 秀之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2011245547A
Publication of JP2013101255A
Application granted
Publication of JP5667963B2
Legal status: Active
Anticipated expiration


Description

The present invention relates to a speech enhancement device, method, and program that make speech easier to hear in environments with background noise.

In recent years, the development and spread of voice communication terminals and speech synthesis technology have increased the opportunities to listen to speech in a variety of places. Such listening often takes place not only in quiet places but also in noisy environments such as airports and station platforms. As a result, ambient noise can make the speech difficult to hear.

The simplest way to make speech easier to hear in a noisy environment is to raise the volume according to the noise level. However, if the volume is raised too far, the input to the loudspeaker becomes excessive, the speech is distorted, and the sound quality may actually deteriorate. For this reason, methods that make speech easier to hear by emphasizing only specific bands of the frequency spectrum have been studied.

One such method is to improve speech intelligibility by emphasizing the formants, which are the peaks of the speech frequency spectrum (Patent Document 1). FIG. 12 illustrates the idea disclosed in Patent Document 1; it shows the relationship between speech power and frequency before and after enhancement.

The phonemic character of speech is known to be characterized by the positions of the formants. By emphasizing only the formant regions, speech intelligibility can be improved without raising the volume excessively (see the post-enhancement characteristic in FIG. 12(b)).

Patent Document 1: Japanese Patent No. 4219898

The spectral power of speech (the density distribution of the frequency spectrum) is known to be closely related to voice quality, which listeners use to identify a speaker by ear. The conventional method emphasizes only the bands that contain formants, which changes the spectral power and therefore changes the voice quality.

The present invention has been made in view of this problem. Its object is to provide a speech enhancement device, method, and program that improve speech intelligibility without changing the volume or the voice quality.

The speech enhancement device of the present invention comprises a speech analysis unit, an aperiodicity index conversion unit, and a speech synthesis unit. The speech analysis unit takes a speech signal s(t) as input, analyzes it at intervals of p samples, and outputs, for every p samples, a fundamental frequency f_0(i), an aperiodicity index A(i,f), and a spectral power P(i,f). The aperiodicity index conversion unit converts the aperiodicity index values A(i,f) in a predetermined frequency range F_L to F_H into a converted aperiodicity index A′(i,f) that decreases as frequency increases, converts the values at frequencies above F_H into the minimum of that decreasing converted aperiodicity index A′(i,f), and outputs the result. The speech synthesis unit takes the fundamental frequency f_0(i), the spectral power P(i,f), and the converted aperiodicity index A′(i,f) as input and synthesizes the speech s′(t).

The speech enhancement device of the present invention synthesizes speech using a converted aperiodicity index A′(i,f), obtained by reducing the aperiodicity index A(i,f) of the speech signal in a predetermined frequency range as frequency increases, while leaving the spectral power P(i,f) unchanged. It can therefore improve the intelligibility of the speech signal without changing its volume or voice quality.

FIG. 1: Relationship between the aperiodicity index A(i,f) and the speech intelligibility score.
FIG. 2: Example of the functional configuration of the speech enhancement device 100 of the present invention.
FIG. 3: Operation flow of the speech enhancement device 100.
FIG. 4: Example of a speech waveform s(t).
FIG. 5: Example of the fundamental frequency of the speech waveform s(t).
FIG. 6: Fundamental frequency f_0(i) obtained by analyzing the speech waveform s(t) shown in FIG. 4.
FIG. 7: Example of the conversion function E(f).
FIG. 8: Operation flow of the conversion function definition means 21.
FIG. 9: Operation flow of the addition means 22.
FIG. 10: Converted aperiodicity index A′(i,f) obtained by analyzing the speech produced when the speech waveform s(t) shown in FIG. 4 is enhanced by the speech enhancement device 100.
FIG. 11: Aperiodicity index A(i,f) obtained by analyzing the speech waveform s(t) shown in FIG. 4.
FIG. 12: The speech enhancement idea disclosed in Patent Document 1.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings are given the same reference numerals, and their description is not repeated. Before the embodiments are described, the idea behind the invention is explained.

[Concept of the invention]
The human voice is known to be a mixture of two sounds: one produced by the periodic vibration of the vocal cords, and one, without periodic vibration, produced by turbulent airflow of the breath between the vocal cords and the lips and nostrils. The mixing ratio of these two sounds can be expressed by the aperiodicity index A(i,f) (reference: Hideki Kawahara, "High-quality VOCODER produced by auditory scene analysis: STRAIGHT", Journal of the Acoustical Society of Japan, Vol. 54, No. 7, pp. 521-526 (1998.7)).

The aperiodicity index A(i,f) is a feature that represents, for each frequency band, the proportion of the aperiodic component when the frequency spectrum of speech is regarded as the sum of a periodic component (vocal cord vibration) and an aperiodic component (turbulent breath flow). With the aim of improving speech intelligibility, an experiment focusing on this index was conducted to evaluate how easy speech is to hear in noise. White noise, crowd noise, and the sound of a passing train were each used separately as the noise environment, and the results were averaged to obtain the ease of hearing.

FIG. 1 shows the relationship between the aperiodicity index and the speech intelligibility score. The horizontal axis shows the ease of hearing obtained by subjective evaluation as a five-point score, where 1 means nothing can be understood and 5 means everything can be heard clearly. The vertical axis shows the aperiodicity index A(i,f) in dB.

The ◆ markers, which plot the aperiodicity index in the 0 to 1 kHz range against the intelligibility score, show almost no correlation between the two. The ■, ▲, ×, and * markers, which plot the aperiodicity index at frequencies of 1 kHz and above against the intelligibility score, show a strong negative correlation. Table 1 lists the correlation coefficients.

The magnitude of the correlation coefficient is large in the 1 to 8 kHz band. In other words, reducing the aperiodicity index A(i,f) in the range of 1 kHz and above makes speech easier to hear.

The present invention realizes a speech enhancement method and device that improve speech intelligibility on the basis of this new finding.

FIG. 2 shows an example of the functional configuration of the speech enhancement device 100 of the present invention, and FIG. 3 shows its operation flow. The speech enhancement device 100 comprises a speech analysis unit 10, an aperiodicity index conversion unit 20, and a speech synthesis unit 30. The functions of these units are realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute it.

The speech analysis unit 10 takes the speech signal s(t) as input, analyzes it at intervals of p samples, and outputs the fundamental frequency f_0(i), the aperiodicity index A(i,f), and the spectral power P(i,f) for every p samples (step S10). It consists of fundamental frequency analysis means 11, aperiodicity index analysis means 12, and spectral power analysis means 13. FIG. 4 shows an example of a speech signal s(t) sampled at a sampling frequency of 16 kHz. The horizontal axis of FIG. 4 is the sample time t, and the vertical axis is the amplitude s(t).

Here i (i = 0, 1, ..., [(T-1)/p], where T is the number of samples) is the analysis number (frame number) when the signal is analyzed at intervals of p samples, and t = ip. The index f (f = 0, 1, ..., N-1) is a band number: the range from 0 to the Nyquist frequency is divided into N bands, and band f covers frequencies from (f/N)·(f_s/2) Hz inclusive to ((f+1)/N)·(f_s/2) Hz exclusive. For example, with a sampling frequency f_s of 16 kHz, dividing the Nyquist frequency of 8 kHz into N = 512 bands gives a range of 0 to 15.625 Hz for band number 0, 15.625 to 31.25 Hz for band number 1, and 7984.375 to 8000 Hz for band number 511.
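The indexing described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the patent: the function names are hypothetical, and the square brackets in [(T-1)/p] are assumed to mean rounding down.

```python
def frame_starts(num_samples, p):
    """Sample index t = i * p for each frame number i = 0 .. floor((T - 1) / p)."""
    # Assumption: the brackets in [(T-1)/p] denote the floor function.
    return [i * p for i in range((num_samples - 1) // p + 1)]


def band_edges_hz(band, num_bands, fs):
    """Lower (inclusive) and upper (exclusive) edge of one band, in Hz."""
    nyquist = fs / 2.0
    return band * nyquist / num_bands, (band + 1) * nyquist / num_bands
```

With f_s = 16000 and N = 512, `band_edges_hz(64, 512, 16000)` has a lower edge of 1000 Hz, which matches the later statement that band number 64 corresponds to 1 kHz.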

The fundamental frequency f_0(i) is a feature that represents the pitch of the voice. If the period of the speech waveform is τ_0 seconds, its reciprocal 1/τ_0 Hz is the fundamental frequency. FIG. 5 illustrates the fundamental frequency f_0(i) on a time-expanded view of the speech waveform s(t) of FIG. 4. The horizontal axis of FIG. 5 is time in ms, and the vertical axis is the speech amplitude. FIG. 6 shows the fundamental frequency f_0(i) obtained by analyzing the speech signal s(t) of FIG. 4. The horizontal axis of FIG. 6 is the frame number i, and the vertical axis is the fundamental frequency f_0(i) in Hz, which represents the pitch of the voice in each frame. The fundamental frequencies in FIG. 6 are distributed between roughly 128 Hz and 230 Hz.

The aperiodicity index A(i,f) represents the proportion of the aperiodic component in each band when the frequency spectrum is regarded as the sum of a periodic component and an aperiodic component. The spectral power P(i,f) represents the strength of the frequency spectrum in each band. The speech analysis unit 10 can be built with known techniques.

The aperiodicity index conversion unit 20 converts the aperiodicity index values A(i,f) in the predetermined frequency range F_L to F_H into a converted aperiodicity index A′(i,f) that decreases as frequency increases, converts the values at frequencies above F_H into the minimum converted aperiodicity index A′(i,f), and outputs the result (step S20). Here, the minimum converted aperiodicity index A′(i,f) means, for example, the minimum value of a converted aperiodicity index A′(i,f) that decreases with a constant slope as frequency increases over the range F_L to F_H, that is, the value of the converted aperiodicity index A′(i,f) at the frequency F_H.

The aperiodicity index conversion unit 20 comprises conversion function definition means 21 and addition means 22.

The conversion function definition means 21 defines a conversion function E(f) that reduces the value of the aperiodicity index A(i,f) at the frequency f′ of each band number f. The band numbers from N·F_L/(f_s/2) inclusive to N·F_H/(f_s/2) exclusive correspond to the predetermined frequency range F_L to F_H. The function is E(f) = 0 for f′ < F_L, E(f) = −γ{(f′−F_L)/(F_H−F_L)} for F_L ≤ f′ ≤ F_H, and E(f) = −γ for F_H < f′. Here γ is the attenuation (γ > 0), and f′ is the actual frequency represented by band number f (Equation (1)).

In Equation (1), f_s is the sampling frequency and N is the number of frequency bands. For example, with F_L = 1000 Hz and F_H = 2000 Hz, the conversion function is E(f) = 0 for f′ < 1000 Hz, E(f) = −γ{(f′−1000)/1000} for 1000 Hz ≤ f′ ≤ 2000 Hz, and E(f) = −γ for f′ > 2000 Hz.
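Equations (1) and (2) are not reproduced in this text, so the following is a reconstruction from the surrounding prose rather than the patent's own typesetting. The form of f′ is an assumption: it takes the lower edge of band f, which is consistent with the band definition given earlier and with band number 64 corresponding to 1 kHz when N = 512 and f_s = 16 kHz.

```latex
% Assumed form of Equation (1): actual frequency of band number f
f' = \frac{f}{N} \cdot \frac{f_s}{2}

% Piecewise conversion function, with the middle branch as Equation (2)
E(f) =
\begin{cases}
0, & f' < F_L \\[4pt]
-\gamma \, \dfrac{f' - F_L}{F_H - F_L}, & F_L \le f' \le F_H \\[8pt]
-\gamma, & f' > F_H
\end{cases}
\qquad (\gamma > 0)
```

The three branches join continuously: the middle branch equals 0 at f′ = F_L and −γ at f′ = F_H.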

FIG. 7 shows an example of the conversion function E(f). The vertical axis of FIG. 7 is the conversion function E(f) in dB.

The example in FIG. 7 defines the decreasing conversion function E(f) with, for example, F_L = 1000 Hz and F_H = 2000 Hz. The conversion function E(f) is defined, for example, by Equation (2).

The conversion function E(f) is a function whose value becomes more negative as frequency rises. Adding the value of E(f) to the aperiodicity index A(i,f) reduces the index as frequency increases.

The operation of the conversion function definition means 21 is described in more detail with reference to the operation flow in FIG. 8. The means computes the value of the conversion function E(f) for the actual frequency f′ of every band number f. The actual frequency f′ of band number f is calculated with Equation (1) (step S211). If f′ is less than F_L Hz (Yes in step S212), the value of the conversion function is set to E(f) = 0 (step S217).

If f′ is at least F_L Hz and at most F_H Hz (step S213), the value of E(f) is obtained with Equation (2) (step S215). If f′ is greater than F_H Hz, the value is set to E(f) = −γ (step S216). Steps S211 to S216 are carried out for every band number f (the loop of steps S210 to S217). This yields the values of the conversion function E(f) shown in FIG. 7. Based on the correlation shown in FIG. 1, the attenuation γ is preferably set to a value between 5 dB and 15 dB.
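The loop over band numbers described above might look like the following Python sketch. The function and parameter names are hypothetical, and the frequency of each band is taken as its lower edge, which is an assumed reading of Equation (1).

```python
def conversion_function(num_bands, fs, f_low, f_high, gamma_db):
    """Return E(f) in dB for every band number f = 0 .. num_bands - 1."""
    values = []
    for band in range(num_bands):
        # Assumed form of Equation (1): lower edge of the band in Hz.
        freq = band * (fs / 2.0) / num_bands
        if freq < f_low:
            values.append(0.0)  # below F_L: no change
        elif freq <= f_high:
            # Linear ramp from 0 dB at F_L down to -gamma dB at F_H.
            values.append(-gamma_db * (freq - f_low) / (f_high - f_low))
        else:
            values.append(-gamma_db)  # above F_H: hold the minimum
    return values
```

For the example values in the text (N = 512, f_s = 16 kHz, F_L = 1000 Hz, F_H = 2000 Hz, γ = 15 dB), the ramp starts at band 64 and reaches −15 dB at band 128.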

The addition means 22 adds the value of the conversion function E(f) computed by the conversion function definition means 21 to the aperiodicity index A(i,f) output by the speech analysis unit 10 (step S22). FIG. 9 shows the operation flow of the addition means 22. For every frame number i, it adds the value of E(f) to A(i,f) (step S222). The addition is carried out for every band number f (the loop over f) of every frame number i (the loop over i).
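A minimal sketch of this addition step, assuming the aperiodicity index is held as a list of frames, each a list of per-band values in dB. The data layout and the function name are illustrative assumptions.

```python
def apply_conversion(aperiodicity_db, conversion_db):
    """Add E(f) to A(i, f) for every frame i and band f; all values are in dB."""
    return [
        [a + e for a, e in zip(frame, conversion_db)]  # loop over bands f
        for frame in aperiodicity_db                   # loop over frames i
    ]
```

Because E(f) does not depend on the frame number, the same list of per-band offsets is reused for every frame, and the spectral power P(i,f) is never touched.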

FIG. 10 shows the converted aperiodicity index A′(i,f) produced by the aperiodicity index conversion unit 20. The horizontal axis of FIG. 10 is the frame number i, and the vertical axis is the band number f. FIG. 10 would normally show the magnitude in grayscale, but for drawing convenience values of about −30 dB and below are shown in black. In this example, F_L = 1000 Hz, F_H = 2000 Hz, and the attenuation γ = 15 dB. The converted aperiodicity index A′(i,f) is visibly smaller for band numbers f = 64 and above, which correspond to frequencies of 1 kHz and above.

FIG. 11 shows the aperiodicity index A(i,f) before processing by the aperiodicity index conversion unit 20. The axes are the same as in FIG. 10. As FIG. 11 shows, before processing the index takes large values for band numbers f = 64 and above.

The speech synthesis unit 30 synthesizes speech using the converted aperiodicity index A′(i,f) together with the fundamental frequency f_0(i) and the spectral power P(i,f) output by the speech analysis unit 10. As the new finding in FIG. 1 shows, reducing the aperiodicity index A(i,f) as frequency increases, over a predetermined range at or above 1 kHz, makes speech easier to hear. Speech enhanced by the speech enhancement device 100 of the present invention is therefore easy to hear even in noise. In addition, because the spectral power is not changed, the speaker's voice quality does not change either.

Although the predetermined frequency range F_L to F_H has been described as 1000 Hz to 2000 Hz, an approximate range around these values may also be used. The example above reduces the converted aperiodicity index A′(i,f) within that range at a constant rate, as in Equation (2), but the invention is not limited to this embodiment. For example, the value of A′(i,f) may be reduced in steps at discrete frequencies between F_L and F_H. Setting the value of A′(i,f) in inverse proportion to the increase in frequency gives the same effect.
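The stepwise variant mentioned above is not specified further in the text. One possible reading, given purely as an illustration, replaces the linear ramp with equal-height steps; the function name, the `num_steps` parameter, and the equal spacing of the steps are all assumptions.

```python
def staircase_conversion(freq_hz, f_low, f_high, gamma_db, num_steps):
    """Stepwise alternative to the linear ramp (num_steps is hypothetical)."""
    if freq_hz < f_low:
        return 0.0
    if freq_hz >= f_high:
        return -gamma_db
    # Index of the step that contains freq_hz, counting from 0 at F_L.
    step = int((freq_hz - f_low) / (f_high - f_low) * num_steps)
    return -gamma_db * step / num_steps
```

The returned value would be added to A(i,f) in the same way as the linear E(f), so the addition step needs no change.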

The speech enhancement device 100 may be configured so that it is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

In that case, the program describing the processing can be recorded on any computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. More specifically, a hard disk drive, flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

Each means may be implemented by running a predetermined program on a computer, or at least part of the processing may be implemented in hardware.

Claims (7)

音声信号s(t)を入力として、当該音声信号をpサンプル間隔で分析を行い、上記pサンプルごとの基本周波数f(i)と、非周期性指標A(i,f)と、スペクトルパワーP(i,f)を出力する音声分析部と、
所定の周波数範囲F〜Fの非周期性指標の値A(i,f)を、周波数の増加に対して小さくなる変換後非周期性指標A′(i,f)と、当該所定の周波数Fよりも大きな周波数では上記小さくなる変換後非周期性指標A′(i,f)の最小の変換後非周期性指標A′(i,f)とに変換して出力する非周期性指標変換部と、
上記基本周波数f(i)と上記スペクトルパワーP(i,f)と上記変換後非周期性指標A′(i,f)とを入力として音声合成音s′(t)を合成する音声合成部と、
を具備する音声強調装置。
The audio signal s (t) is input, the audio signal is analyzed at p-sample intervals, the fundamental frequency f 0 (i) for each p sample, the non-periodicity index A (i, f), and the spectral power A voice analysis unit that outputs P (i, f);
A non-periodic index value A (i, f) of a predetermined frequency range F L to F H is converted into a post-conversion non-periodic index A ′ (i, f) that decreases with increasing frequency, The non-periodicity which is converted into the minimum post-conversion non-periodicity index A ′ (i, f) of the post-conversion non-periodicity index A ′ (i, f) which becomes smaller at a frequency higher than the frequency F H and output. An indicator conversion unit;
Speech synthesis for synthesizing speech synthesized sound s ′ (t) with the fundamental frequency f 0 (i), the spectrum power P (i, f), and the converted non-periodicity index A ′ (i, f) as inputs. And
A speech enhancement device comprising:
請求項1に記載した音声強調装置において、
上記非周期性指標変換部は、
所定の周波数範囲F〜Fの(N・F/f/2)以上、(N・F/f/2)(f=0,1,…,N-1、fはサンプリング周波数)未満の帯域番号fの周波数f′の上記非周期性指標A(i,f′)の値を、上記所定の周波数範囲F〜Fにおいて、減衰量γとしたときにE=−γ{(f′−F)/(F−F)}の関係で小さくする変換関数E(f)を定義する変換関数定義手段と、
上記音声分析部が出力する上記非周期性指標A(i,f)に、上記変換関数E(f)の値を加算する加算手段と、
を備えることを特徴とする音声強調装置。
The speech enhancement apparatus according to claim 1,
The aperiodic index conversion unit is
(N · F L / f s / 2) or more of a predetermined frequency range F L to F H , (N · F H / f s / 2) (f = 0, 1,..., N−1, f s are 'the non-periodic index a (i a, f' the frequency f of the sampling frequency) of less than band number f the value of), in the predetermined frequency range F L to F H, E when the attenuation gamma = A conversion function defining means for defining a conversion function E (f) to be reduced by the relationship of −γ {(f′−F L ) / (F H −F L )}
Adding means for adding the value of the conversion function E (f) to the non-periodicity index A (i, f) output by the speech analysis unit ;
A speech enhancement device comprising:
請求項1又2に記載の音声強調装置において、
上記所定の周波数範囲F〜Fは、F=1000Hz以上、F=2000Hz以下の範囲であることを特徴とする音声強調装置。
In the speech enhancement apparatus according to claim 1 or 2,
The predetermined frequency range F L to F H is, F L = 1000 Hz or higher, the audio enhancement apparatus which is a range of F H = 2000 Hz.
音声信号s(t)を入力として、当該音声信号をpサンプル間隔で分析を行い、上記pサンプルごとの基本周波数f(i)と、非周期性指標A(i,f)と、スペクトルパワーP(i,f)を出力する音声分析過程と、
所定の周波数範囲F〜Fの非周期性指標の値A(i,f)を、周波数の増加に対して小さくなる変換後非周期性指標A′(i,f)と、当該所定の周波数Fよりも大きな周波数では上記小さくなる変換後非周期性指標A′(i,f)の最小の変換後非周期性指標A′(i,f)とに変換して出力する非周期性指標変換過程と、
上記基本周波数f(i)と上記スペクトルパワーP(i,f)と上記変換後非周期性指標A′(i,f)とを入力として音声合成音s′(t)を合成する音声合成過程と、
を備える音声強調方法。
The audio signal s (t) is input, the audio signal is analyzed at p-sample intervals, the fundamental frequency f 0 (i) for each p sample, the non-periodicity index A (i, f), and the spectral power A speech analysis process for outputting P (i, f);
A non-periodic index value A (i, f) of a predetermined frequency range F L to F H is converted into a post-conversion non-periodic index A ′ (i, f) that decreases with increasing frequency, The non-periodicity which is converted into the minimum post-conversion non-periodicity index A ′ (i, f) of the post-conversion non-periodicity index A ′ (i, f) which becomes smaller at a frequency higher than the frequency F H and output. Indicator conversion process,
Speech synthesis for synthesizing speech synthesized sound s ′ (t) with the fundamental frequency f 0 (i), the spectrum power P (i, f), and the converted non-periodicity index A ′ (i, f) as inputs. Process,
A speech enhancement method comprising:
The speech enhancement method according to claim 4, wherein
the non-periodicity index conversion process includes:
a conversion function defining step of defining a conversion function E(f) that, for each band number f (f = 0, 1, ..., N−1) from (N·F_L/f_s/2) up to but not including (N·F_H/f_s/2), where f_s is the sampling frequency, reduces the value of the non-periodicity index A(i, f) at the frequency f′ of that band within the predetermined frequency range F_L to F_H according to the relationship E = −γ{(f′ − F_L)/(F_H − F_L)}, where γ is the attenuation amount; and
an adding step of adding the value of the conversion function E(f) to the non-periodicity index A(i, f) output by the speech analysis process.
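The conversion function defining step and the adding step can be illustrated with a minimal Python sketch. It is not taken from the patent text: the use of NumPy, the function names `conversion_function` and `convert_aperiodicity`, the band layout f′ = f·(f_s/2)/N, the default γ = 10, and holding the offset at −γ at and above F_H (one reading of the "minimum" value in claim 4) are all assumptions made for illustration.

```python
import numpy as np


def conversion_function(n_bands, fs, f_low=1000.0, f_high=2000.0, gamma=10.0):
    """Return the offset E(f) for band numbers f = 0 .. n_bands - 1.

    E is 0 below f_low, follows -gamma * (f' - f_low) / (f_high - f_low)
    inside the range, and is held at -gamma at and above f_high (assumption).
    """
    # Assumed band layout: band f sits at f' = f * (fs / 2) / n_bands.
    band_freqs = np.arange(n_bands) * (fs / 2.0) / n_bands
    ramp = (band_freqs - f_low) / (f_high - f_low)
    # Clipping to [0, 1] zeroes the offset below f_low and holds it above f_high.
    return -gamma * np.clip(ramp, 0.0, 1.0)


def convert_aperiodicity(aperiodicity, fs, **kwargs):
    """Add E(f) to every frame of A(i, f), given with shape (frames, bands)."""
    offsets = conversion_function(aperiodicity.shape[1], fs, **kwargs)
    # Broadcasting applies the same per-band offset to each frame i.
    return aperiodicity + offsets[np.newaxis, :]
```

A caller would pass the analysis output A(i, f) as a two-dimensional array and feed the returned array, together with f_0(i) and P(i, f), to whatever synthesizer is in use. Computing the offsets once and broadcasting them keeps the per-frame cost to a single addition.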
The speech enhancement method according to claim 4 or 5, wherein
the predetermined frequency range F_L to F_H is the range from F_L = 1000 Hz or higher to F_H = 2000 Hz or lower.
A program for causing a computer to function as the speech enhancement device according to any one of claims 1 to 3.
Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011245547A JP5667963B2 (en) 2011-11-09 2011-11-09 Speech enhancement device, method and program thereof

Publications (2)

Publication Number Publication Date
JP2013101255A JP2013101255A (en) 2013-05-23
JP5667963B2 true JP5667963B2 (en) 2015-02-12

Family

ID=48621918

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2011245547A Active JP5667963B2 (en) 2011-11-09 2011-11-09 Speech enhancement device, method and program thereof

Country Status (1)

Country Link
JP (1) JP5667963B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5745453B2 (en) * 2012-04-10 2015-07-08 日本電信電話株式会社 Voice clarity conversion device, voice clarity conversion method and program thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4468804A (en) * 1982-02-26 1984-08-28 Signatron, Inc. Speech enhancement techniques
EP1557827B8 (en) * 2002-10-31 2015-01-07 Fujitsu Limited Voice intensifier
JP4630183B2 (en) * 2005-12-08 2011-02-09 日本電信電話株式会社 Audio signal analysis apparatus, audio signal analysis method, and audio signal analysis program
JP5745453B2 (en) * 2012-04-10 2015-07-08 日本電信電話株式会社 Voice clarity conversion device, voice clarity conversion method and program thereof

Also Published As

Publication number Publication date
JP2013101255A (en) 2013-05-23

Legal Events

A621: Written request for application examination (JAPANESE INTERMEDIATE CODE: A621; effective date: 2014-01-08)
A977: Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007; effective date: 2014-09-26)
A131: Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131; effective date: 2014-10-21)
A521: Written amendment (JAPANESE INTERMEDIATE CODE: A523; effective date: 2014-11-17)
TRDD: Decision of grant or rejection written
A01: Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01; effective date: 2014-12-09)
A61: First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61; effective date: 2014-12-15)
R150: Certificate of patent or registration of utility model (Ref document number: 5667963; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150)
A521: Written amendment (JAPANESE INTERMEDIATE CODE: A523; effective date: 2015-01-09)