JPH08110796A

JPH08110796A - Voice emphasizing method and device

Info

Publication number: JPH08110796A
Application number: JP6247503A
Authority: JP
Inventors: Toshiyuki Aritsuka; 俊之在塚; Yoshito Nene; 義人禰寝
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-10-13
Filing date: 1994-10-13
Publication date: 1996-04-30

Abstract

PURPOSE: To improve the sound quality of degraded speech, compensate for hearing impairment, and enhance speech by emphasizing the temporal change of the acoustic features of speech. CONSTITUTION: A speech processing device comprises a means for inputting speech, a means for analyzing and processing the speech, and a means for reproducing and outputting the speech. It further comprises a feature calculation section that calculates an acoustic feature quantity of the speech from the speech waveform, a feature change calculation section that calculates the temporal change of the acoustic feature quantity per unit time, a temporal change modification section that modifies that temporal change, an acoustic feature modification section that modifies the acoustic feature quantity using the modified temporal change, and a waveform reconstruction section 133 that reconstructs the speech waveform from the modified acoustic feature quantity.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention: The present invention relates to an acoustic processing device, and more particularly to a speech signal processing device intended to improve the sound quality of degraded speech, to compensate for the hearing of hearing-impaired persons, and to enhance speech.

[0002]

2. Description of the Related Art: How easily speech can be heard depends on the speaker's vocal organs and manner of speaking, the listener's hearing ability, the speaking and listening environments, and changes in sound quality caused by transmission-channel constraints, information compression, and the like.
If the speaker's vocal organs are impaired, or are not used adequately during speech, the acoustic features of the speech may differ from what the speaker intended or may become unclear. When the listener's hearing is impaired, it becomes difficult to capture the acoustic features fully. If noise is present in the speaking environment, the listening environment, or the transmission system, or if the sound quality is degraded by band limitation of the transmission channel or by information compression, the acoustic features may be distorted or masked by other sounds.
Loss of intelligibility due to such factors can be recovered with techniques that process the acoustic features of speech.

[0003] One way to process acoustic features so that speech becomes easier to hear is static processing that modifies characteristics along the frequency axis, for example reshaping the frequency spectrum to suit the listener's audible level or emphasizing formant peaks, thereby increasing the clarity of the sound.

[0004] As prior art for emphasizing the temporal changes that convey the intonation of speech, there is a method that emphasizes the temporal change of vowel formants extracted by linear prediction analysis, as described in "Sound quality control by emphasizing and suppressing formant changes" (Toki and Kuwahara, Proc. Acoustical Society of Japan Meeting, Oct. 1986, pp. 145-146). There is also a method that emphasizes the onset of the sound pressure level of the speech waveform by multiplying the waveform by a window function of a specific shape, as described in "Proposal and evaluation of a speech enhancement method based on compensation for temporal masking" (Suzuki et al., IEICE Technical Report SP91-135, Mar. 1992, pp. 31-37).

[0005] For a purpose other than sound quality improvement, namely improving the accuracy of machine speech recognition, a method is known that increases phonemic distinctiveness by emphasizing the temporal change of the recognition parameters; see, for example, "Word speech recognition by spectral change emphasis" (Furui, Speech Research Group report S85-77, Jan. 1986, pp. 597-604). In this method, the time derivatives of the LPC cepstral envelope and of the power in the target time interval are approximated by second-order regression coefficients, and the regression coefficients, multiplied by suitable factors, are added to the LPC cepstral envelope and the power of the target interval, respectively, to emphasize their change over time.

[0006] As a method of making speech easier to hear by changing the speaking rate, there is prior art that changes only the speaking rate, without changing the pitch, by inserting or deleting pitch-period waveforms in voiced portions of relatively high power, as described, for example, in "Voice storage and reproduction device" (Japanese Patent Laid-Open No. 3-48300).

[0007]

Problems to Be Solved by the Invention: Of the prior techniques described above, the methods that process characteristics on the frequency axis do not emphasize changes of features along the time axis.

[0008] The method of emphasizing formant changes requires formant extraction before emphasis. Because formants are generally not prominent in unvoiced sounds, the method applies only to voiced sounds, whose formants can be extracted by ordinary means. It is therefore unsuited to emphasizing continuous speech that contains unvoiced sounds.

[0009] The method of emphasizing spectral change mainly targets the LPC cepstral envelope, but the emphasis is applied to acoustic parameters used for machine recognition, and restoring speech after emphasis is not considered. In addition, in that prior art the LPC cepstral envelope and power of the original speech in the target interval are emphasized independently for each analysis frame, so the emphasis results of previous frames are not accumulated.

[0010] The method of multiplying the waveform by a window function of a specific shape emphasizes the sound pressure level at the onset of a speech interval, but it does not emphasize the temporal change in power over the speech as a whole.

[0011] The method of changing only the speaking rate preserves the short-term time structure of the voiced waveform but breaks its continuous temporal change. As a result, speech whose rate has been changed can sound unnatural, especially in transition segments, compared with speech actually spoken by a person at that rate.

[0012]

Means for Solving the Problems: To solve the above problems, the speech enhancement method of the present invention and the device using it are provided with a feature calculation section that calculates an acoustic feature quantity of speech from the speech waveform; a feature change calculation section that calculates the temporal change of the acoustic feature quantity per unit time; a temporal change modification section that modifies that temporal change; an acoustic feature modification section that modifies the acoustic feature quantity using the modified temporal change; and a waveform reconstruction section that reconstructs a speech waveform from the modified acoustic feature quantity.

[0013] Among the acoustic features of speech, the fundamental frequency, power, and frequency spectrum change markedly over time and therefore contribute relatively strongly to sound quality. Means are provided that use these features and modify their temporal changes simultaneously, individually, or in combination.

[0014] Means are provided for modifying the acoustic feature quantity of the target time interval: the acoustic feature modification section adds the per-unit-time temporal change, as modified by the temporal change modification section, to the acoustic feature quantity one unit time before the target interval, and uses the sum as the acoustic feature quantity of the target interval.

[0015] As the acoustic feature quantity one unit time before, the quantity that was itself modified by the acoustic feature modification section one unit time earlier is used.

[0016] A speaking rate conversion section that changes the speaking rate and a speech enhancement section that modifies the temporal change are provided.

[0017] A frequency characteristic modification section that changes the frequency characteristics and a speech enhancement section that modifies the temporal change are provided.

[0018] A speaking rate conversion section that changes the speaking rate, a frequency characteristic modification section that changes the frequency characteristics, and a speech enhancement section that modifies the temporal change are provided.

[0019]

Operation: Providing a feature calculation section that calculates an acoustic feature quantity of speech from the speech waveform, a feature change calculation section that calculates its temporal change per unit time, a temporal change modification section that modifies that temporal change, an acoustic feature modification section that modifies the acoustic feature quantity using the modified temporal change, and a waveform reconstruction section that reconstructs a speech waveform from the modified acoustic feature quantity makes it possible to change how speech varies over time.

[0020] Providing means that use the fundamental frequency, power, and frequency spectrum (the acoustic features that change markedly over time and therefore contribute relatively strongly to sound quality) and modify their temporal changes simultaneously, individually, or in combination makes it possible to change the temporal variation of speech effectively.

[0021] Providing means by which the acoustic feature modification section adds the per-unit-time temporal change, as modified by the temporal change modification section, to the acoustic feature quantity one unit time before the target interval, and uses the sum as the acoustic feature quantity of the target interval, makes it possible to change the temporal variation of the acoustic features of speech.

[0022] Using, as the acoustic feature quantity one unit time before, the quantity already modified by the acoustic feature modification section one unit time earlier makes it possible to accumulate the modifications of the temporal change of speech.

[0023] Providing a speaking rate conversion section that changes the speaking rate and a speech enhancement section that modifies the temporal change makes it possible to change the speaking rate and the temporal change at the same time.

[0024] Providing a frequency characteristic modification section that changes the frequency characteristics and a speech enhancement section that emphasizes the temporal change makes it possible to change the frequency characteristics and the temporal change at the same time.

[0025] Providing a speaking rate conversion section that changes the speaking rate, a frequency characteristic modification section that changes the frequency characteristics, and a speech enhancement section that emphasizes the temporal change makes it possible to change the speaking rate, the frequency characteristics, and the temporal change at the same time.

[0026]

Embodiments: Embodiments of the present invention are described below with reference to the drawings.

[0027] FIG. 1 is a block diagram illustrating one embodiment of the speech enhancement device according to the present invention. In FIG. 1, input speech is converted into an electric signal 102 by a microphone 101 and then into a digital waveform signal 104 by an A/D converter 103. A frame processing unit 105 cuts short-time segments out of the digital waveform signal 104 at an analysis period of suitable length, using a time window of several tens of milliseconds to about one hundred milliseconds, and outputs them as a frame waveform signal 106.
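A minimal sketch of the framing step performed by the frame processing unit 105, in plain Python so it runs without extra libraries. The Hann window and the names `hann` and `frame_signal` are illustrative assumptions; the text specifies only a window of several tens of milliseconds to about 100 ms and a suitable analysis period. At an assumed 16 kHz sampling rate, for example, a 32 ms window is 512 samples and a 10 ms period is 160 samples.

```python
import math


def hann(n):
    """Periodic Hann window of length n (an assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def frame_signal(x, frame_len, hop):
    """Cut windowed short-time frames out of x, one every `hop` samples.

    Samples after the last complete frame are dropped in this sketch.
    """
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + k] * w[k] for k in range(frame_len)])
    return frames


# Example: 16 kHz input, 32 ms window (512 samples), 10 ms period (160 samples).
fs = 16000
signal = [math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(fs // 4)]
frames = frame_signal(signal, frame_len=512, hop=160)
```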

[0028] A frame power calculation unit 107 calculates the frame average power 108 of the frame waveform signal 106. A power change calculation unit 109 calculates the change 110 of the frame average power 108 per unit time. This change is expressed as the difference between the frame average power of the current analysis frame and the pre-emphasis frame average power of the previous frame. When the analysis period is short, however, a regression coefficient calculated from the frame average powers of several frames before and after may be used instead, to reduce fluctuation. A power change emphasis unit 111 multiplies the per-unit-time change 110 of the frame average power by a suitable coefficient and adds the result to the post-emphasis frame average power of the previous frame, giving the post-emphasis frame average power 112.
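The power path described above can be sketched as follows. `emphasize_track` implements the accumulating update: the per-frame change of the original (pre-emphasis) values is scaled by a gain and added to the previous frame's emphasized value, so a gain of 1 reproduces the input. `regression_slope` is one possible reading of the optional regression coefficient, namely a first-order least-squares slope over frames t-K..t+K with edge frames clamped; the order, the window width, and all names are assumptions.

```python
def frame_mean_power(frame):
    """Mean power of one frame (the role of unit 107)."""
    return sum(s * s for s in frame) / len(frame)


def regression_slope(values, t, half_width):
    """First-order regression coefficient (change per frame) over frames t-K..t+K.

    Indices beyond either end are clamped to the first/last frame (an assumption).
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    num = 0.0
    den = 0.0
    for k in range(-half_width, half_width + 1):
        idx = min(max(t + k, 0), len(values) - 1)
        num += k * values[idx]
        den += k * k
    return num / den


def emphasize_track(values, gain, deltas=None):
    """Accumulating emphasis: E[t] = E[t-1] + gain * change[t], with E[0] = values[0].

    By default change[t] is the difference of the original (pre-emphasis) values;
    pass `deltas` (for example regression slopes) to use another change estimate.
    """
    out = [values[0]]
    for t in range(1, len(values)):
        change = deltas[t] if deltas is not None else values[t] - values[t - 1]
        out.append(out[-1] + gain * change)
    return out


powers = [frame_mean_power(f) for f in ([1.0, -1.0], [2.0, -2.0], [1.0, 1.0])]
emphasized = emphasize_track(powers, gain=2.0)  # [1.0, 7.0, 1.0]
```

This sketch works on raw power values, so a large gain can drive an emphasized power below zero; the text does not address this, and a practical implementation would need a floor or a logarithmic domain (an assumption). A regression window that reaches K frames into the future also adds K frames of latency.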

[0029] A fundamental frequency extraction unit 113 extracts the fundamental frequency 114 of each analysis frame from the frame waveform signal 106 by autocorrelation analysis, as described, for example, in "Digital Speech Processing" (Furui, Tokai University Press, 1985). A fundamental frequency change calculation unit 115 calculates the change 116 of the fundamental frequency 114 per unit time. This change is expressed as the difference between the fundamental frequency of the current analysis frame and the pre-emphasis fundamental frequency of the previous frame. When the analysis period is short, a regression coefficient calculated from the fundamental frequencies of several frames before and after may be used instead, to reduce fluctuation.
When the fundamental frequency cannot be extracted, as in unvoiced sounds, either no emphasis is applied, or processing uses a fundamental frequency obtained by interpolating between the fundamental frequencies of the preceding and following voiced sounds, linearly or with a spline function or the like.
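A rough sketch of autocorrelation-based F0 extraction and of the linear-interpolation option for unvoiced frames described above. It is a textbook autocorrelation pitch picker, not necessarily the exact procedure in the cited book; the 60 to 400 Hz search range, the voicing threshold on the normalized autocorrelation peak, and the function names are assumptions. Frames without a voiced neighbor on both sides stay `None`, which corresponds to the "no emphasis" option.

```python
import math


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Autocorrelation pitch estimate in Hz, or None if the frame looks unvoiced."""
    r0 = sum(s * s for s in frame)
    if r0 == 0.0:
        return None
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak normalized peak: treat the frame as unvoiced (threshold is assumed).
    if best_lag is None or best_r / r0 < voicing_threshold:
        return None
    return fs / best_lag


def interpolate_unvoiced(f0s):
    """Linearly interpolate None entries lying between two voiced frames."""
    out = list(f0s)
    voiced = [i for i, v in enumerate(f0s) if v is not None]
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            out[i] = f0s[a] + (f0s[b] - f0s[a]) * (i - a) / (b - a)
    return out


fs = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(400)]
f0 = estimate_f0(tone, fs)  # 200.0 (best lag of 40 samples)
```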

[0030] Meanwhile, a Fourier transform unit 117 converts the frame waveform signal 106 into a frequency spectrum 118. A spectrum normalization unit 119 normalizes the frequency spectrum 118 by the frame average power 108 to obtain a normalized frequency spectrum 120. A spectral envelope calculation unit 121 calculates a spectral envelope 122 from the normalized frequency spectrum 120. In this embodiment, the Fourier cepstral envelope is used as the spectral envelope.
The Fourier cepstral envelope is obtained by computing the peak envelope of the logarithmic power spectrum through cepstral analysis; the calculation method is described, for example, in "Digital Speech Processing" (Furui, Tokai University Press, 1985). A spectral envelope change calculation unit 123 calculates the change 124 of the spectral envelope 122 per unit time. This change is expressed as the difference, for each frequency component, between the spectral envelope of the current analysis frame and the pre-emphasis spectral envelope of the previous frame. When the analysis period is short, a regression coefficient calculated from the spectral envelopes of several frames before and after may be used instead, to reduce fluctuation. A spectral change emphasis unit 125 multiplies the per-unit-time change 124 of the spectral envelope by a suitable coefficient and adds the result to the post-emphasis spectral envelope of the previous frame. It then emphasizes the normalized spectrum 120 of the current analysis frame so that its spectral envelope equals this sum, yielding a spectral-change-emphasized normalized spectrum 126.
To do this, the difference between the post-emphasis and pre-emphasis spectral envelopes is converted from the logarithmic power spectrum into a scale factor for the complex spectrum, and the real and imaginary parts of each frequency component of the normalized spectrum 120 are each multiplied by this factor, so that the spectrum is emphasized while its phase is preserved.
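The spectral path described above is sketched below with a plain O(N²) DFT so that it needs only the standard library (an FFT would be used in practice). `cepstral_envelope` uses simple low-quefrency liftering of the real cepstrum of the log power spectrum, which gives a smoothed envelope rather than the peak-following envelope the text describes; treat it as a stand-in. `emphasize_spectrum` forms the target envelope with the accumulating rule and turns the natural-log power difference into a real amplitude factor exp(Δ/2) that multiplies both the real and imaginary parts of each bin, so every bin keeps its phase. The lifter order and all names are assumptions, and power normalization (unit 119) is omitted.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) DFT; adequate for a small illustration."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def cepstral_envelope(spectrum, n_lifter, eps=1e-12):
    """Smoothed natural-log power envelope via low-quefrency liftering."""
    n = len(spectrum)
    log_pow = [math.log(abs(s) ** 2 + eps) for s in spectrum]
    cep = [c.real for c in idft(log_pow)]
    # Keep quefrencies 0..n_lifter and their mirror images; zero the rest.
    liftered = [c if (q <= n_lifter or q >= n - n_lifter) else 0.0
                for q, c in enumerate(cep)]
    return [v.real for v in dft(liftered)]


def emphasize_spectrum(spectrum, env, prev_env, prev_env_enh, gain):
    """Return (emphasized spectrum, emphasized envelope) for the current frame."""
    target = [pe + gain * (e - p) for e, p, pe in zip(env, prev_env, prev_env_enh)]
    # Log-power difference -> amplitude factor; one real factor per bin keeps phase.
    factors = [math.exp((t - e) / 2.0) for t, e in zip(target, env)]
    return [s * f for s, f in zip(spectrum, factors)], target


x = [math.sin(2 * math.pi * 3 * k / 32) + 0.5 * math.cos(2 * math.pi * 5 * k / 32)
     for k in range(32)]
spec = dft(x)
env = cepstral_envelope(spec, n_lifter=4)
```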

[0031] A fundamental frequency change emphasis unit 127 multiplies the per-unit-time change 116 of the fundamental frequency by a suitable coefficient and adds the result to the post-emphasis fundamental frequency of the previous frame to obtain the post-emphasis fundamental frequency. It then changes the fundamental frequency of the spectral-change-emphasized normalized spectrum 126 to this value, using, for example, the method described in "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones" (Charpentier and Moulines, Eurospeech 89, vol. 2, Sep. 1989, pp. 13-19), to obtain a fundamental-frequency-change-emphasized normalized spectrum 128. An inverse Fourier transform unit 129 transforms the normalized spectrum 128 back into a frame waveform signal 130. A frame waveform resynthesis unit 131 multiplies the frame waveform signal 130 by the post-emphasis frame average power 112 to resynthesize a power-change-emphasized frame waveform signal 132. A waveform reconstruction unit 133 reconstructs the frame waveform signals 132 into a continuous waveform 134, which a D/A converter 135 converts into an analog electric signal 136; this signal is output from a speaker 137 as emphasized speech.
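The last two stages (frame resynthesis with the emphasized power, then reconstruction of a continuous waveform) might look like the sketch below. Interpreting "multiplies by the post-emphasis average power" as rescaling each frame so that its mean power equals the emphasized value is an assumption, as is the choice of weighted overlap-add for unit 133; the text does not name a reconstruction method. Function names are hypothetical.

```python
import math


def rescale_to_power(frame, target_power):
    """Scale a frame so that its mean power equals target_power (assumed reading)."""
    power = sum(s * s for s in frame) / len(frame)
    if power == 0.0 or target_power <= 0.0:
        return [0.0] * len(frame)
    g = math.sqrt(target_power / power)
    return [s * g for s in frame]


def overlap_add(frames, hop, window):
    """Weighted overlap-add: sum the frames, then divide by the summed window."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    out = [0.0] * length
    wsum = [0.0] * length
    for i, frame in enumerate(frames):
        for k in range(n):
            out[i * hop + k] += frame[k]
            wsum[i * hop + k] += window[k]
    # Samples with (near-)zero window coverage cannot be recovered; emit 0 there.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, wsum)]


# Round trip without modification: framing then overlap-add restores the signal.
N, H = 16, 8
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / N) for k in range(N)]
x = [math.sin(0.3 * k) + 0.1 * k for k in range(64)]
frames = [[x[s + k] * window[k] for k in range(N)] for s in range(0, 64 - N + 1, H)]
y = overlap_add(frames, H, window)
```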

[0032] Besides the autocorrelation-based method used in the above embodiment, fundamental frequency extraction methods include cepstral analysis and counting the zero crossings of the waveform; the present speech enhancement device can be implemented with any of these methods.

[0033] Although this embodiment normalizes the frequency spectrum by the average power, it can also be normalized by the maximum frequency level.

[0034] Further, although the Fourier cepstral envelope is used as the spectral envelope, a logarithmic power spectrum, a Bark spectrum, or a mel spectrum can also be used. The Bark and mel spectra are based on the frequency resolution of human hearing, so using them allows emphasis to be performed in a way close to the frequency analysis of the human auditory system. Because the resolution of the Bark and mel spectra becomes coarser as frequency increases, each higher-order component of these spectra is assigned several points of the Fourier spectrum. In that case, the Fourier-spectrum points assigned to one order are all emphasized in equal proportion. The Bark and mel spectra can be derived from the Fourier spectrum using the frequency-axis conversion formulas or tables given, for example, in "Hearing Handbook" (ed. Namba, Nakanishiya Publishing, 1984). The complex frequency spectrum can also be used directly, instead of a spectral envelope, to emphasize the temporal change of the frequency spectrum; in that case, the temporal change of the phase is emphasized together with that of the frequency level.
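The equal-proportion rule for coarse mel bands can be illustrated as follows. The sketch swaps in the widely used formula mel = 2595·log10(1 + f/700) in place of the handbook's conversion formulas or tables, splits the mel axis into equal-width bands, and gives every FFT bin in a band that band's gain. The band count, the FFT size, and the function names are assumptions.

```python
import math


def hz_to_mel(f):
    """Common mel formula (swapped in; not taken from the cited handbook)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def bin_to_band(n_fft, fs, n_bands):
    """Map each non-negative-frequency FFT bin (0..n_fft//2) to a mel band index."""
    mel_max = hz_to_mel(fs / 2.0)
    bands = []
    for k in range(n_fft // 2 + 1):
        m = hz_to_mel(k * fs / n_fft)
        bands.append(min(int(m / mel_max * n_bands), n_bands - 1))
    return bands


def band_gains_to_bins(band_gains, bands):
    """Every FFT bin assigned to a band receives that band's gain (equal emphasis)."""
    return [band_gains[b] for b in bands]


bands = bin_to_band(n_fft=512, fs=16000, n_bands=20)
# High bands collect many more FFT bins than low bands (coarser resolution).
```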

[0035] The coefficient can be set to an optimum value for each feature quantity being modified; the values need not be the same.

[0036] Because this embodiment uses only the speech signal in the temporal neighborhood of each analysis frame, it can clearly process speech in real time, with a delay short enough that the listener does not perceive it.

[0037] FIG. 2 illustrates the method of emphasizing the temporal change of a feature quantity.

[0038] In the figure, F denotes the feature quantity of the original speech and F' the feature quantity after emphasis. F and F' are functions of time that take a discrete value at each unit time. Let F(t) be the feature quantity of the original speech at time t and F'(t) the feature quantity after emphasis, and write the slope dF(t)/dt of F(t) at time t as

[0039]

[Equation 1]

[0040] Then the emphasized feature quantity F'(t+1) at time t+1 can be expressed, using the emphasis coefficient a, by the following equation.

[0041]

[Equation 2]

[0042] Similarly, F'(t+2) is expressed by the following equation.

[0043]

[Equation 3]

[0044] Therefore,

[0045]

[Equation 4]

[0046] and the emphasis of the feature quantity accumulates. For simplicity the figure treats the feature quantity as a scalar, but the method clearly extends to vector-valued feature quantities. Here a is a real number greater than -1: a > 0 emphasizes the temporal change, -1 < a < 0 suppresses it, and a = 0 leaves the temporal change of the original speech unchanged.
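The images for [Equation 1] to [Equation 4] are not reproduced in this text. The following is a hedged reconstruction consistent with [0028] (the change is taken from pre-emphasis values and added to the previous post-emphasis value) and [0046] (a > -1, and a = 0 leaves the change unaltered). It assumes a unit time step and a multiplier of (1 + a); the published equations may be written differently.

```latex
\begin{align*}
\frac{dF(t)}{dt} &= F(t+1) - F(t) && \text{(Eq.~1, assumed forward difference)}\\
F'(t+1) &= F'(t) + (1+a)\,\frac{dF(t)}{dt} && \text{(Eq.~2)}\\
F'(t+2) &= F'(t+1) + (1+a)\,\frac{dF(t+1)}{dt} && \text{(Eq.~3)}\\
F'(t+2) &= F'(t) + (1+a)\bigl(F(t+2) - F(t)\bigr) && \text{(Eq.~4)}
\end{align*}
```

The differences telescope, so more generally F'(t+n) = F'(t) + (1+a)(F(t+n) - F(t)): with a constant coefficient, the emphasized trajectory is the original one stretched (a > 0) or compressed (-1 < a < 0) about its starting value, which is the accumulation referred to in [0046].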

[0047] FIG. 3 is a conceptual diagram of temporal change emphasis when the feature quantity is a scalar. Emphasizing the amount of temporal change turns the feature quantity F into the emphasized trajectory F'.
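The scalar case of FIG. 3 can be reproduced numerically with the sketch below. It assumes the (1 + a) form of the update, which is an inference from [0046] rather than a quoted equation, and allows a different coefficient for each frame-to-frame change, as the segment-wise variants in [0048] to [0053] require. The name `emphasize` is hypothetical.

```python
def emphasize(F, coeffs):
    """Return F' with F'(0) = F(0) and F'(t+1) = F'(t) + (1 + a_t) * (F(t+1) - F(t)).

    coeffs[t] is the emphasis coefficient a_t (> -1) applied to the change from
    frame t to frame t+1; a_t = 0 passes that change through unaltered.
    """
    if len(coeffs) < len(F) - 1:
        raise ValueError("need one coefficient per frame-to-frame change")
    out = [F[0]]
    for t in range(len(F) - 1):
        out.append(out[-1] + (1.0 + coeffs[t]) * (F[t + 1] - F[t]))
    return out


F = [0.0, 1.0, 3.0, 2.0, 2.0]
emphasized = emphasize(F, [0.5] * 4)   # a > 0: excursions grow
suppressed = emphasize(F, [-0.5] * 4)  # -1 < a < 0: excursions shrink
```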

[0048] Speech is produced by changing the sound source and the vocal tract shape so as to utter, in succession, speech segments with different manners of articulation, thereby forming the intended linguistic message. Consequently, when articulation is imprecise or listening conditions are constrained, the acoustic features of each speech segment are blurred by continuous utterance, which makes the speech hard to hear. In such cases, emphasizing continuously uttered speech so that the acoustic features of each segment approach the original features of the phoneme is expected to make it easier to hear. The boundaries between the speech segments of continuously uttered speech can be regarded as the points at which the temporal change of the acoustic feature quantity reaches a local maximum. Therefore, if local maxima of the temporal change of the acoustic feature quantity are detected and a = 0 is set only at those points, the temporal change is modified segment by segment.

[0049] FIG. 4 is a block diagram of a temporal change modification section that has a detector for local maxima of the temporal change of the acoustic feature quantity. In the figure, 401 denotes the temporal change modification section. The temporal change 402 of the acoustic feature quantity is stored in a temporal change holding unit 403. At the same time, a temporal change comparison unit 404 calculates the difference between it and the previous frame's temporal change 405 held in the holding unit, and a local maximum detection unit 407 uses this difference 406 to determine whether the temporal change is at a local maximum. Based on the determination result 408, an emphasis coefficient determination unit 409 sets the emphasis coefficient 410 to 0 if the temporal change is at a local maximum, and otherwise to a predetermined nonzero real number. A temporal change modification unit 411 modifies the temporal change 402 using the emphasis coefficient 410 to obtain the modified temporal change 412.
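A sketch of the coefficient selection in FIG. 4, with a `mode` switch that also covers the local-minimum variant of FIG. 5 ([0051]). Here `deltas[t]` is taken to be the change arriving at frame t, and its magnitude is used so that rising and falling changes are treated alike; a vector feature would use a norm instead. These choices, the edge handling, and the name `extremum_coefficients` are assumptions. The sketch runs offline on a whole sequence; because the unit in FIG. 4 holds only the previous frame's change, an online version could confirm a peak only one frame after it occurs.

```python
def extremum_coefficients(deltas, a, mode="max"):
    """Per-frame emphasis coefficients: 0 at local extrema of |delta|, else a.

    mode="max" marks segment boundaries (FIG. 4); mode="min" marks the start
    of steady portions (FIG. 5). Edge frames cannot be judged and keep a.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    mags = [abs(d) for d in deltas]
    coeffs = []
    for t, m in enumerate(mags):
        if t == 0 or t == len(mags) - 1:
            coeffs.append(a)
            continue
        prev_m, next_m = mags[t - 1], mags[t + 1]
        if mode == "max":
            is_extremum = prev_m < m >= next_m
        else:
            is_extremum = prev_m > m <= next_m
        coeffs.append(0.0 if is_extremum else a)
    return coeffs


deltas = [0.1, 0.5, 0.2, 0.1, 0.3, 0.3, 0.1]
at_maxima = extremum_coefficients(deltas, 0.5, mode="max")
at_minima = extremum_coefficients(deltas, 0.5, mode="min")
```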

[0050] On the other hand, when speech is uttered sufficiently slowly, the steady portion of a relatively long speech segment, such as a vowel, may retain the segment's original acoustic features. In such cases, the point at which the temporal change reaches a local minimum can be regarded as the start of the steady portion, and the features of the original speech can be preserved by not modifying the temporal change there.

[0051] FIG. 5 is a block diagram of a time change amount changing unit that includes a detector for minima of the temporal change of the acoustic feature amount. In the figure, reference numeral 501 denotes the time change amount changing unit. The time change amount 502 of the acoustic feature amount is stored in the time change amount holding unit 503, and the time change amount comparison unit 504 calculates its difference from the previous frame's time change amount 505 held in the holding unit. Using this difference 506, the minimum value detection unit 507 determines whether a minimum has been reached. Based on the determination result 508, the emphasis coefficient determination unit 509 sets the emphasis coefficient 510 to 0 if the time change amount is at a minimum, and otherwise to a predetermined nonzero real number. The time change amount changing unit 511 modifies the time change amount 502 using the emphasis coefficient 510 to obtain the changed time change amount 512.
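A frame-by-frame version of the FIG. 5 structure might look like the following sketch. The class name `MinimumGatedChanger`, the one-frame output delay, detection on |delta|, the default coefficient, and the rule delta' = (1 + a) * delta are all assumptions made for illustration; the patent only specifies that the coefficient is 0 at minima and a predetermined nonzero value elsewhere.

```python
class MinimumGatedChanger:
    """Streaming sketch of FIG. 5 (hypothetical class, not from the patent).

    Holds the previous frame's change (cf. holding unit 503) and the previous
    difference, so a minimum at frame t-1 is recognised when frame t arrives.
    Output is therefore delayed by one frame.
    Assumed rule: delta' = (1 + a) * delta, with a = 0 at minima of |delta|.
    """

    def __init__(self, a_default=0.5):
        self.a_default = a_default
        self.prev = None       # previous frame's time change amount (cf. 505)
        self.prev_diff = None  # previous |delta| difference (cf. 506)

    def step(self, delta):
        """Push one frame; return the modified change for the previous frame."""
        if self.prev is None:
            self.prev = delta
            return None
        diff = abs(delta) - abs(self.prev)
        # Falling into the held frame and not falling after it: a minimum.
        is_min = self.prev_diff is not None and self.prev_diff < 0 and diff >= 0
        a = 0.0 if is_min else self.a_default
        out = (1.0 + a) * self.prev
        self.prev, self.prev_diff = delta, diff
        return out

    def flush(self):
        """Emit the last held frame (it cannot be confirmed as a minimum)."""
        out = None if self.prev is None else (1.0 + self.a_default) * self.prev
        self.prev, self.prev_diff = None, None
        return out

changer = MinimumGatedChanger()
outputs = [changer.step(d) for d in [1.0, 0.4, 0.2, 0.6, 0.3]]
outputs.append(changer.flush())
print(outputs)
```

In this run the third frame (0.2) is a minimum, so it passes through unchanged while the other frames are scaled by 1.5.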

[0052] Furthermore, the absolute value of the time change amount can be calculated; when it is smaller than a predetermined value, the section is regarded as a steady section, and the features of the original sound can be preserved by leaving the temporal change unmodified.

[0053] FIG. 6 is a block diagram of a time change amount changing unit that includes a time change amount absolute value determination unit. In the figure, reference numeral 601 denotes the time change amount changing unit. The time change amount absolute value calculation unit 603 calculates the absolute value 604 of the time change amount 602 of the acoustic feature amount, and the time change amount absolute value determination unit 605 compares the absolute value 604 with a predetermined value. If the absolute value is smaller, the emphasis coefficient 608 is set to 0; otherwise it is set to a predetermined nonzero real number. The time change amount changing unit 609 modifies the time change amount 602 using the emphasis coefficient 608 to obtain the changed time change amount 610.
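The threshold test of FIG. 6 reduces to a single vectorised comparison. The sketch below assumes NumPy; the function name `threshold_gated_coefficients`, the threshold value, and `a_default` are hypothetical choices for illustration, since the patent only calls them "predetermined".

```python
import numpy as np

def threshold_gated_coefficients(delta, threshold, a_default=0.5):
    """FIG. 6 sketch: a = 0 where |delta| < threshold (steady section),
    otherwise a predetermined nonzero value (a_default is an assumption)."""
    mag = np.abs(np.asarray(delta, dtype=float))  # cf. absolute value 604
    return np.where(mag < threshold, 0.0, a_default)

a = threshold_gated_coefficients([0.05, -0.3, 0.01, 0.2], threshold=0.1)
print(a.tolist())  # [0.0, 0.5, 0.0, 0.5]
```

Unlike the extremum detectors, this test needs no memory of earlier frames, so it adds no delay.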

[0054] Normally, the temporal change of an acoustic feature amount is at a maximum at voice segment boundaries and at a minimum at the centre of each voice segment. Therefore, by varying the emphasis coefficient according to the absolute value of the time change amount, the temporal change of the emphasized acoustic feature amount can be controlled. If the coefficient is made larger as the absolute value of the time change amount increases, changes near voice segment boundaries are emphasized more and changes near segment centres less, so segment boundaries become clearer and the duration of the stationary part of the features within each segment increases. FIG. 7 is a conceptual diagram of temporal change emphasis in which the coefficient increases with the absolute value of the time change amount.

[0055] Conversely, if the coefficient is made larger as the absolute value of the time change amount decreases, changes near voice segment boundaries are emphasized less and changes near segment centres more, so the changes in the feature amount that the original sound has at segment boundaries can be preserved. FIG. 8 is a conceptual diagram of temporal change emphasis in which the coefficient increases as the absolute value of the time change amount decreases.
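The two weightings of FIGS. 7 and 8 can be sketched as a coefficient that is a monotone function of |delta|. The linear mapping, the normalisation by the largest |delta| in the utterance (which makes this an offline sketch), and the names `magnitude_dependent_coefficients`, `a_max`, and `boundary_weighted` are assumptions; the patent states only the direction of the dependence.

```python
import numpy as np

def magnitude_dependent_coefficients(delta, a_max=1.0, boundary_weighted=True):
    """Coefficient that grows (FIG. 7) or shrinks (FIG. 8) with |delta|.

    The linear mapping and the normalisation by the largest |delta| are
    assumptions made for illustration.
    """
    mag = np.abs(np.asarray(delta, dtype=float))
    peak = mag.max()
    if peak == 0.0:
        return np.zeros_like(mag)  # no change anywhere: nothing to emphasise
    r = mag / peak                 # 0 in steady parts, 1 at the largest change
    return a_max * r if boundary_weighted else a_max * (1.0 - r)

delta = [0.0, 0.5, 1.0, -0.25]
print(magnitude_dependent_coefficients(delta).tolist())                           # [0.0, 0.5, 1.0, 0.25]
print(magnitude_dependent_coefficients(delta, boundary_weighted=False).tolist())  # [1.0, 0.5, 0.0, 0.75]
```

With `boundary_weighted=True` the largest changes (segment boundaries) receive the most emphasis, as in FIG. 7; with `False` the steady centres do, as in FIG. 8.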

[0056] In general, among the segments of the various manners of articulation that make up speech, segments not accompanied by vocal cord vibration, such as voiceless consonants, are considered to have low power, and the effect of coarticulation on their frequency spectrum is considered insignificant. Moreover, the fundamental frequency cannot be extracted from unvoiced sounds because the vocal cords do not vibrate. Therefore, portions with relatively low power can be regarded as unvoiced, and when the speech power is below a suitably predetermined threshold, the amount of processing can be reduced by not performing temporal change emphasis of the frequency spectrum, the fundamental frequency, and the like. FIG. 9 shows a configuration example of a voice emphasizing device having a power determination unit. In the figure, input speech is converted into an electrical signal 902 by the microphone 901 and then into a digital waveform signal 904 by the A/D converter 903. The frame processing unit 905 cuts short-time waveforms out of the digital waveform signal 904 at an analysis period of suitable interval, using a time window of about several tens to one hundred milliseconds, to form the frame waveform signal 906. The frame power calculation unit 907 calculates the frame average power 908 of the frame waveform signal 906. The power determination unit 909 compares the frame average power 908 with a predetermined threshold and, when the power is judged to exceed the threshold, issues a spectrum and fundamental frequency emphasis command 910. When the command 910 is detected, the frame-level time change emphasis unit 911 emphasizes the spectrum and fundamental frequency of the frame waveform signal 906; when it is not detected, the spectrum and fundamental frequency of the frame waveform signal 906 are left unchanged. Average power emphasis, however, is performed regardless of the command 910. The emphasized frame waveform signal 912 is reconstructed into a continuous waveform 914 by the waveform reconstruction unit 913, converted into an analog electrical signal 916 by the D/A converter 915, and output as emphasized speech from the speaker 917.
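The framing and power gate of FIG. 9 can be sketched with NumPy as follows. The 32 ms frame falls within the "several tens to about one hundred milliseconds" range given in the text; the 10 ms hop, the Hann window, the -40 dB threshold, and the function name `power_gate` are assumptions. Only the decision corresponding to command 910 is computed; the emphasis itself, and the average power emphasis that the text applies to every frame regardless of the command, are not shown.

```python
import numpy as np

def power_gate(x, fs, frame_ms=32.0, hop_ms=10.0, threshold_db=-40.0):
    """FIG. 9 sketch: frame average power (cf. 908) and the per-frame decision
    to emphasise spectrum and F0 (cf. command 910).

    hop_ms, the Hann window and threshold_db are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])   # frame waveforms (cf. 906)
    power = np.mean(frames ** 2, axis=1)             # frame average power
    power_db = 10.0 * np.log10(power + 1e-12)        # small floor avoids log(0)
    emphasise_spectrum_f0 = power_db > threshold_db  # True -> issue the command
    return power, emphasise_spectrum_f0

fs = 8000
t = np.arange(fs // 2) / fs
x = np.concatenate([np.zeros(fs // 2), 0.5 * np.sin(2 * np.pi * 200 * t)])
power, gate = power_gate(x, fs)
print(gate[0], gate[-1])  # False True
```

Frames from the silent half fall far below the threshold and skip spectrum and F0 emphasis; frames from the tone exceed it.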

[0057] FIG. 10 is an example of speech whose temporal change has been emphasized using the present invention. Panel (a) is the spectrogram of the original utterance "pan o yaku" ("bake bread"), and panel (b) is the spectrogram of the time-change-emphasized speech. FIG. 10 shows that the temporal change of the entire spectrum is emphasized.

[0058] FIG. 11 shows a configuration example of the voice emphasizing device according to the present invention. The analog electrical signal 1101 of the input speech is level-adjusted by the amplifier 1102, passed through the filter 1103 to remove components outside the required band, converted into a digital signal by the A/D converter 1104, and then subjected to voice emphasis processing by a DSP. The voice emphasis program and data are loaded from the program memory 1106 and the data memory 1107, respectively. The emphasized digital signal is converted into an analog signal by the D/A converter 1108, components outside the required band are removed, and after level adjustment by the amplifier 1110 the result is output as the emphasized speech analog electrical signal 1111.

[0059] FIG. 12 shows an embodiment in which the voice emphasizing device of the present invention is used directly. Speech is input through the microphone 1201 and converted into an analog electrical signal, and the emphasized speech analog electrical signal is converted back into sound and output by the speaker 1202. If a person with a speech impairment uses this embodiment, the device can compensate for the impaired speech.

[0060] FIG. 13 shows an example in which the voice emphasizing device of the present invention is applied to an analog-line telephone. Speech arriving at the analog telephone 1302 over the transmission line 1301 is emphasized by the voice emphasizing device and then output from the speaker 1304 of the handset 1303. The listener's own speech is input through the microphone 1305 of the handset 1303 and passed directly to the analog telephone for transmission.

[0061] FIG. 14 shows an example in which the voice emphasizing device of the present invention is applied to a digital-line telephone. In this embodiment, the input speech is already a digital signal, so the A/D conversion process can be omitted.

[0062] FIG. 15 shows an embodiment in which the voice emphasizing device of the present invention is used as a preprocessing unit of a transmitter for television, radio, or the like.

[0063] FIG. 16 shows an embodiment in which the voice emphasizing device of the present invention is used as a post-processing unit of a receiver for television, radio, or the like.

[0064] FIG. 17 shows a configuration example in which the amount of emphasis applied to the time change amount by the voice emphasizing device of the present invention can be adjusted externally. The amount of emphasis is set with the adjustment knob 1701 via the emphasis amount controller 1702.

[0065] FIG. 18 shows a configuration example of a voice emphasizing device having a plurality of adjustment knobs, one for each acoustic feature amount. The amount of emphasis of each acoustic feature amount to be emphasized is set with the adjustment knobs 1801 via the emphasis amount controller 1802.

[0066] FIG. 19 shows a configuration example of a voice conversion device having a speech rate conversion unit and a time change emphasis unit. In this example the speech rate conversion unit forms the first stage and the time change emphasis unit the second, but the order can also be reversed, with the time change emphasis unit first and the speech rate conversion unit second.

[0067] FIG. 20 shows a configuration example of a voice conversion device having a frequency characteristic changing unit and a time change emphasis unit. In this example the frequency characteristic changing unit forms the first stage and the time change emphasis unit the second, but the order can also be reversed, with the time change emphasis unit first and the frequency characteristic changing unit second. The frequency characteristic can be changed to suit the surrounding environment and the listener's hearing, for example by the method described in "Hearing compensation device" (Japanese Patent Application No. Hei 4-254355).

[0068] FIG. 21 shows a configuration example of a voice conversion device having a frequency characteristic changing unit, a time change emphasis unit, and a speech rate conversion unit. In this example the stages are arranged in the order frequency characteristic changing unit, time change emphasis unit, speech rate conversion unit, but the order of processing can also be changed.

[0069] In the configuration examples shown in FIGS. 19 and 21, if the speech rate conversion factor is made equal to the emphasis factor applied to the time change amount in the time change emphasis unit, the speaking rate can be converted while the temporal change of the original sound is preserved.
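A toy numerical check of the equal-factor rule can be written with NumPy. It reads the conversion factor as a duration stretch factor (2 means twice as long), uses linear interpolation of a per-frame feature track as a stand-in for a real speech rate converter, and names the helper `stretch`; all three are assumptions. The check shows only that the per-frame change of the slowed track, multiplied by the same factor, equals the original per-frame change.

```python
import numpy as np

def stretch(feature, rate):
    """Resample a per-frame feature track so it lasts `rate` times longer.

    Linear interpolation stands in for a real speech rate converter
    (an assumption; the converter is not specified in this paragraph).
    """
    feature = np.asarray(feature, dtype=float)
    n = len(feature)
    m = int((n - 1) * rate) + 1
    src = np.arange(m) / rate  # source frame position of each output frame
    return np.interp(src, np.arange(n), feature)

rate = 2.0
original = 3.0 * np.arange(11)        # a steady transition: +3 per frame
slowed = stretch(original, rate)
delta_slowed = np.diff(slowed)        # the change per frame shrinks to 3 / rate
delta_restored = rate * delta_slowed  # emphasis factor equal to the conversion factor
print(delta_slowed[:3], delta_restored[:3])
```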

[0070] In general, the movement pattern of the articulators is said to be representable by a critically damped second-order system. Hence, the temporal change pattern of an acoustic feature amount arising from articulator movement is also considered approximable by a critically damped second-order model. The temporal change of an acoustic feature amount can therefore be modified by approximating it with such a model and using the model's temporal change as the temporal change of the feature. To realize this method, the feature amount change amount calculation unit of the voice emphasizing device of the present invention is provided with a maximum point detection unit and a minimum point detection unit, which sequentially detect the maxima and minima of the feature's temporal change, and a critically damped second-order model estimation unit.

[0071] FIG. 22 shows a configuration example of a time change amount changing unit that modifies the temporal change of a feature amount using a critically damped second-order model. In the figure, 2201 denotes the time change amount changing unit. The time change maximum point detection unit 2203 detects maximum points in the time series of the time change amount 2202 of the acoustic feature amount and outputs a maximum point detection signal 2204. The time change minimum point detection unit 2205 detects minimum points in the same time series and outputs a minimum point detection signal 2206. The model estimation unit 2207 estimates the parameters of a critically damped second-order model from the time series over a short section of several tens of milliseconds behind a maximum point and extrapolates the model up to the adjacent minimum behind that maximum point; likewise, it estimates the model parameters from the time series over several tens of milliseconds ahead of the maximum point and extrapolates the model up to the adjacent minimum ahead of it. This determines the changed time change amount 2208 of the feature amount. The emphasis coefficient determination unit 2209 calculates the emphasis coefficient 2210 from the difference between the changed time change amount 2208 and the time change amount of the feature amount of the original speech. The time change amount changing unit 2211 modifies the time change amount 2202 using the emphasis coefficient 2210 to obtain the changed time change amount 2212.

[0072] FIG. 23 is a conceptual diagram of modifying the temporal change of an acoustic feature amount by approximating the feature time series near each maximum point of the temporal change with a critically damped second-order model and extrapolating the model. Here, 2301 and 2303 denote minimum points of the temporal change of the feature amount, and 2302 and 2304 denote maximum points. Taking the maximum point 2302, the critically damped second-order model estimated from the several tens of milliseconds ahead of 2302 is extrapolated up to the minimum point 2303 adjacent ahead of 2302, and the result is used as the changed feature amount. Likewise, the model estimated from the several tens of milliseconds behind 2302 is extrapolated up to the minimum point 2301 adjacent behind 2302, and the result is used as the changed feature amount. In this way, the temporal change of the feature amount is modified using the critically damped second-order model. A method of estimating the parameters of a critically damped second-order model from a short feature time series is described, for example, in Akagi and Tohkura, "Spectrum target prediction model and its application to speech recognition," Computer Speech and Language, vol. 4, pp. 325-344, 1990.
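One simple way to fit and extrapolate a critically damped second-order response is sketched below with NumPy. It assumes the standard critically damped form target + (a + b*t) * exp(-t/tau), fits it by a grid search over tau with linear least squares for the remaining parameters, and uses hypothetical names (`critically_damped`, `fit_critically_damped`, the tau grid). This swapped-in fitting procedure is an illustration only and is not the estimation method of Akagi and Tohkura cited above.

```python
import numpy as np

def critically_damped(t, target, a, b, tau):
    """Critically damped second-order response: target + (a + b*t) * exp(-t/tau)."""
    return target + (a + b * t) * np.exp(-t / tau)

def fit_critically_damped(t, x, taus=np.linspace(0.005, 0.1, 96)):
    """Fit (target, a, b, tau) to a short feature track x sampled at times t (s).

    For each candidate tau the model is linear in (target, a, b), so those are
    solved by least squares; the tau with the smallest residual is kept.
    This grid-search fit is an illustrative assumption.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    best = None
    for tau in taus:
        e = np.exp(-t / tau)
        design = np.column_stack([np.ones_like(t), e, t * e])
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        err = float(np.sum((design @ coef - x) ** 2))
        if best is None or err < best[0]:
            best = (err, coef, tau)
    _, (target, a, b), tau = best
    return target, a, b, tau

# Fit on ~25 ms of 5 ms frames next to a maximum, then evaluate the model at a
# later time, e.g. the adjacent minimum, as described for FIG. 22.
t = np.arange(6) * 0.005
x = critically_damped(t, 1.0, -1.0, -20.0, 0.02)  # synthetic feature track
params = fit_critically_damped(t, x)
print(critically_damped(0.1, *params))
```

Because the model is linear in the target and the two shape coefficients once tau is fixed, only tau needs a nonlinear search, which keeps the sketch short.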

[0073] FIG. 24 is a conceptual diagram of a method of adding speech matched to a target portion of a video. When separately recorded speech is matched to a subject in a video, such as a performer, by dubbing or voice-over, the voice is usually recorded while watching the video, or the delivery is adjusted to fit within a specified target time. In this embodiment, the voice conversion method of the present invention is used to change the speaking rate of speech recorded separately from the video so that it matches the portion of the video in which the subject appears. In the figure, for example, let T1 be the duration of appearance portion 1 of a subject in the video, and let T1' be the duration of speech 1, recorded in advance to be attached to appearance portion 1. By converting the duration T1' of the added speech 1 into the duration T1 of appearance portion 1 with the voice conversion device of the present invention, the added speech 1 can be attached in alignment with appearance portion 1 of the subject.
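The duration matching in FIG. 24 comes down to one ratio, T1 / T1'. The helper below is a hypothetical sketch that computes it and rejects a non-positive recorded duration.

```python
def dubbing_rate(appearance_duration, recorded_duration):
    """Factor by which recorded speech (duration T1') must be lengthened (> 1)
    or shortened (< 1) to fill the on-screen interval T1."""
    if recorded_duration <= 0:
        raise ValueError("recorded_duration must be positive")
    return appearance_duration / recorded_duration

# Example: a 2.5 s recorded line must fit a 2.0 s appearance.
r = dubbing_rate(2.0, 2.5)
print(r)  # 0.8
```

If the original temporal change should also be kept, the earlier configuration examples suggest using the same factor as the emphasis factor of the time change emphasis unit.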

【００７４】[0074]

EFFECTS OF THE INVENTION: By providing a feature amount calculation unit that calculates an acoustic feature amount of speech from the speech waveform, a feature amount change amount calculation unit that calculates the time change amount of the acoustic feature amount per unit time, a time change amount changing unit that changes the time change amount, an acoustic feature amount changing unit that changes the acoustic feature amount using the changed time change amount, and a waveform reconstruction unit that reconstructs a speech waveform from the changed acoustic feature amount, it became possible to change the temporal change of speech.

[0075] Further, among the acoustic feature amounts of speech, the fundamental frequency, power, and frequency spectrum change markedly over time and therefore contribute relatively strongly to sound quality. By providing means for changing their temporal change simultaneously, individually, or in combination, it became possible to change the temporal change of speech effectively.

[0076] By providing means with which the acoustic feature amount changing unit sets the acoustic feature amount of the target time section to the sum of the acoustic feature amount one unit time before the target time section and the time change amount per unit time as changed by the time change amount changing unit, it became possible to change the temporal change of the acoustic features of speech.

[0077] By using, as the acoustic feature amount one unit time before, the acoustic feature amount that the acoustic feature amount changing unit had already changed one unit time earlier, it became possible to accumulate the changes to the temporal change of speech.
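The update rule of [0076] and its accumulating variant in [0077] can be sketched as a short recursion with NumPy. The function name `apply_changed_deltas`, the choice y[0] = x[0] as the initial condition, and the doubled changes in the example are assumptions for illustration.

```python
import numpy as np

def apply_changed_deltas(x, delta_changed, accumulate=True):
    """y[t] = base[t-1] + delta_changed[t].

    base is the original feature x (no accumulation) or the already changed
    y (accumulation). delta_changed[0] is ignored and y[0] = x[0]; that
    initial condition is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    if len(x) == 0:
        return y
    y[0] = x[0]
    for t in range(1, len(x)):
        base = y[t - 1] if accumulate else x[t - 1]
        y[t] = base + delta_changed[t]
    return y

x = np.array([0.0, 1.0, 3.0, 6.0])
delta = np.concatenate([[0.0], np.diff(x)])  # per-frame change: [0, 1, 2, 3]
doubled = 2.0 * delta                         # hypothetical changed time change amount
print(apply_changed_deltas(x, doubled, accumulate=False).tolist())  # [0.0, 2.0, 5.0, 9.0]
print(apply_changed_deltas(x, doubled, accumulate=True).tolist())   # [0.0, 2.0, 6.0, 12.0]
```

Without accumulation each frame is anchored to the original feature, so modifications stay local; with accumulation the modifications add up along the track.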

[0078] By providing a speech rate conversion unit that changes the speaking rate and a voice emphasizing unit that changes the temporal change, it became possible to change the speaking rate and the temporal change at the same time.

[0079] By providing a frequency characteristic changing unit that changes the frequency characteristic and a voice emphasizing unit that emphasizes the temporal change, it became possible to change the frequency characteristic and the temporal change at the same time.

[0080] By providing a speech rate conversion unit that changes the speaking rate, a frequency characteristic changing unit that changes the frequency characteristic, and a voice emphasizing unit that emphasizes the temporal change, it became possible to change the speaking rate, the frequency characteristic, and the temporal change at the same time.

[Brief description of drawings]

FIG. 1 is a block diagram illustrating an embodiment of the voice emphasizing device according to the present invention.

FIG. 2 is a conceptual diagram showing a method of emphasizing the temporal change of a feature amount.

FIG. 3 is a conceptual diagram of temporal change emphasis.

FIG. 4 is a configuration example of a time change amount changing unit having a time change maximum value detection unit.

FIG. 5 is a configuration example of a time change amount changing unit having a time change minimum value detection unit.

FIG. 6 is a configuration example of a time change amount changing unit having a time change amount absolute value determination unit.

FIG. 7 is a conceptual diagram of voice emphasis in which larger temporal changes are emphasized more.

FIG. 8 is a conceptual diagram of voice emphasis in which smaller temporal changes are emphasized more.

FIG. 9 is a configuration example of a voice emphasizing device having a power determination unit.

FIG. 10 is an example of speech whose temporal change has been emphasized using the present invention.

FIG. 11 is a configuration example of the voice emphasizing device.

FIG. 12 shows an embodiment in which the voice emphasizing device is used directly.

FIG. 13 shows an example in which the voice emphasizing device is applied to an analog-line telephone.

FIG. 14 shows an example in which the voice emphasizing device is applied to a digital-line telephone.

FIG. 15 shows an embodiment in which the voice emphasizing device is used as a preprocessing unit of a transmitter.

FIG. 16 shows an embodiment in which the voice emphasizing device is used as a post-processing unit of a receiver.

FIG. 17 is a configuration example of a voice emphasizing device with an adjustable amount of emphasis.

FIG. 18 is a configuration example of a voice emphasizing device with an adjustable amount of emphasis and a plurality of adjustment knobs.

FIG. 19 is a configuration example of a voice conversion device having a speech rate conversion unit and a time change emphasis unit.

FIG. 20 is a configuration example of a voice conversion device having a frequency characteristic changing unit and a time change emphasis unit.

FIG. 21 is a configuration example of a voice conversion device having a frequency characteristic changing unit, a time change emphasis unit, and a speech rate conversion unit.

FIG. 22 is a configuration example of a time change amount changing unit using a critically damped second-order model.

FIG. 23 is a conceptual diagram of temporal change emphasis using a critically damped second-order model.

FIG. 24 is a conceptual diagram of a method of adding speech matched to a target portion of a video.

[Explanation of symbols]

101 ... microphone, 102 ... electrical signal, 104 ... digital waveform signal, 106 ... frame waveform signal, 108 ... frame average power, 110 ... change amount of frame average power per unit time, 112 ... frame average power after emphasis, 114 ... fundamental frequency, 116 ... change amount of fundamental frequency per unit time, 118 ... frequency spectrum, 120 ... normalized frequency spectrum, 122 ... spectrum envelope, 124 ... change amount of spectrum envelope per unit time, 126 ... normalized spectrum after spectrum change emphasis, 128 ... normalized spectrum after fundamental frequency change emphasis, 130 ... frame waveform signal, 132 ... frame waveform signal after power change emphasis, 134 ... continuous waveform, 136 ... analog electrical signal, 137 ... speaker, 401 ... time change amount changing unit, 402 ... time change amount of acoustic feature amount, 405 ... time change amount of previous frame, 406 ... difference between time change amounts of current analysis frame and previous frame, 408 ... maximum value determination result, 410 ... emphasis coefficient, 412 ... changed time change amount, 501 ... time change amount changing unit, 502 ... time change amount of acoustic feature amount, 505 ... time change amount of previous frame, 506 ... difference between time change amounts of current analysis frame and previous frame, 508 ... minimum value determination result, 510 ... emphasis coefficient, 512 ... changed time change amount, 601 ... time change amount changing unit, 602 ... time change amount of acoustic feature amount, 604 ... absolute value of time change amount of acoustic feature amount, 608 ... emphasis coefficient, 610 ... changed time change amount, 901 ... microphone, 902 ... electrical signal, 904 ... digital waveform signal, 906 ... frame waveform signal, 908 ... frame average power, 910 ... spectrum and fundamental frequency emphasis command, 912 ... frame waveform signal after emphasis, 914 ... continuous waveform, 916 ... analog electrical signal, 917 ... speaker, 1101 ... analog electrical signal of input speech, 1102 ... amplifier, 1110 ... amplifier, 1111 ... emphasized speech analog electrical signal, 1201 ... microphone, 1202 ... speaker, 1301 ... transmission line, 1303 ... handset, 1304 ... speaker, 1305 ... microphone, 1701 ... adjustment knob, 1801 ... adjustment knob, 2201 ... time change amount changing unit, 2202 ... time change amount of acoustic feature amount, 2204 ... maximum point detection signal, 2206 ... minimum point detection signal, 2208 ... changed time change amount of feature amount, 2210 ... emphasis coefficient, 2212 ... changed time change amount, 2301, 2303 ... minimum points of temporal change of feature amount, 2302, 2304 ... maximum points of temporal change of feature amount.

Claims

1. A voice enhancement method, and a device using the same, the device being an acoustic processing device having at least means for inputting a voice, means for analyzing and processing the voice, and means for reproducing and outputting the voice, and comprising: a feature amount calculation unit that calculates an acoustic feature amount of the voice from the voice waveform; a feature amount change amount calculation unit that calculates the time change amount of the acoustic feature amount per unit time; a time change amount changing unit that changes the time change amount; an acoustic feature amount changing unit that changes the acoustic feature amount using the changed time change amount; and a waveform reconstruction unit that reconstructs a voice waveform from the changed acoustic feature amount.

2. The voice enhancement method according to claim 1, and a device using the same, wherein the acoustic feature amount is the fundamental frequency and/or the power and/or the short-term frequency spectrum.

3. The voice enhancement method according to claim 2, and a device using the same, wherein the frequency spectrum is normalized by the average power.

4. The voice enhancement method according to claim 2, and a device using the same, wherein the frequency spectrum is normalized by its maximum level value.
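Claims 3 and 4 name two ways of normalizing the frequency spectrum but give no formulas. The following is a minimal Python sketch under stated assumptions: the spectrum is a per-frame power spectrum (a list of non-negative values), and "average power" is taken to mean the mean squared sample value of the same analysis frame. Both assumptions and all function names are illustrative, not taken from the source.

```python
def frame_average_power(frame):
    """Mean squared sample value of one analysis frame (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)


def normalize_by_average_power(power_spectrum, frame):
    """Claim 3 sketch: divide every power-spectrum bin by the frame average power."""
    p = frame_average_power(frame)
    if p == 0.0:
        return list(power_spectrum)  # silent frame: nothing to normalize (assumption)
    return [s / p for s in power_spectrum]


def normalize_by_max_level(power_spectrum):
    """Claim 4 sketch: divide every bin by the largest bin so the peak becomes 1."""
    peak = max(power_spectrum)
    if peak == 0.0:
        return list(power_spectrum)  # all-zero spectrum is returned unchanged
    return [s / peak for s in power_spectrum]
```

Either normalization removes the overall level from the spectral shape, which presumably lets level changes be handled separately through the frame average power (reference numerals 110 and 112).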

5. The voice enhancement method according to claim 2, and a device using the same, wherein the acoustic feature amount changing unit changes the frequency spectrum based on the time change of the logarithmic power spectrum.

6. The voice enhancement method according to claim 2, and a device using the same, wherein the acoustic feature amount changing unit changes the frequency spectrum based on the time change of the cepstrum envelope.

7. The voice enhancement method according to claim 2, and a device using the same, wherein the acoustic feature amount changing unit changes the frequency spectrum based on the time change of the Bark spectrum.

8. The voice enhancement method according to claim 2, and a device using the same, wherein the acoustic feature amount changing unit changes the frequency spectrum based on the time change of the mel spectrum.

9. The voice enhancement method according to claim 2, and a device using the same, wherein the acoustic feature amount changing unit does not change the phase of the frequency spectrum.
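Claims 5 and 9 together describe changing the spectrum through the time change of its log power while leaving the phase untouched. The sketch below assumes complex FFT bins per frame, a natural-log power spectrum, a scalar enhancement coefficient `alpha`, and a small floor `EPS`; these choices and the function names are hypothetical. Claims 6 to 8 would substitute a cepstrum envelope, Bark spectrum, or mel spectrum for the log power spectrum, which this sketch does not attempt.

```python
import cmath
import math

EPS = 1e-12  # hypothetical floor that keeps log() finite for empty bins


def log_power(spectrum):
    """Natural-log power spectrum of one frame of complex FFT bins."""
    return [math.log(abs(x) ** 2 + EPS) for x in spectrum]


def enhance_spectrum_frame(cur_spec, prev_log, prev_log_out, alpha):
    """Scale the frame-to-frame change of the log power spectrum by alpha.

    cur_spec:     complex bins of the current frame
    prev_log:     original log power spectrum of the previous frame
    prev_log_out: log power spectrum actually output for the previous frame
    Returns (modified complex bins, original log power, modified log power).
    """
    cur_log = log_power(cur_spec)
    out_log = [
        po + alpha * (c - p)  # changed log power = previous output + scaled change
        for po, c, p in zip(prev_log_out, cur_log, prev_log)
    ]
    out_spec = [
        # new magnitude from the changed log power; phase copied unchanged (claim 9)
        cmath.rect(math.sqrt(math.exp(lp)), cmath.phase(x))
        for lp, x in zip(out_log, cur_spec)
    ]
    return out_spec, cur_log, out_log
```

For example, with `alpha = 2.0` and an unmodified previous frame, a bin whose power rises fourfold between frames is output with a sixteenfold rise, while every bin keeps its original phase.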

10. The voice enhancement method according to claim 1, and a device using the same, wherein the acoustic feature amount changing unit changes the acoustic feature amount of a target time section by adding the time change amount per unit time, as changed by the time change amount changing unit, to the acoustic feature amount one unit time before the target time section, and taking the sum as the acoustic feature amount of the target time section.

11. The voice enhancement method according to claim 1, and a device using the same, wherein, in the acoustic feature amount changing unit according to claim 10, the acoustic feature amount one unit time before is the acoustic feature amount as already changed, one unit time before, by the acoustic feature amount changing unit according to claim 8.
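Claims 10 and 11 rebuild the feature track by adding each changed time change amount to the value one unit time earlier; claim 11 uses the already-changed earlier value. A minimal Python sketch, assuming one scalar feature value per frame and a caller-supplied `change` function (for example `lambda d: 2 * d`); the function name and the treatment of the first frame are assumptions.

```python
def enhance_trajectory(features, change, recursive=True):
    """Rebuild a per-frame feature track from changed frame-to-frame differences.

    features:  one scalar acoustic feature per frame (e.g. frame power or F0)
    change:    maps a time change amount to its changed value
    recursive: True adds to the previous *changed* value (claim 11);
               False adds to the previous original value (claim 10 alone)
    """
    if not features:
        return []
    out = [features[0]]  # first frame has no predecessor, so it is kept (assumption)
    for t in range(1, len(features)):
        delta = features[t] - features[t - 1]  # time change amount per unit time
        base = out[t - 1] if recursive else features[t - 1]
        out.append(base + change(delta))
    return out
```

With `features = [0, 1, 3, 2]` and a doubling `change`, the recursive form yields `[0, 2, 6, 4]` (the whole contour is stretched around its starting value), whereas the non-recursive form yields `[0, 2, 5, 1]` (each step is exaggerated relative to the original previous value, so errors do not accumulate).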

12. The voice enhancement device according to claim 1, wherein the acoustic feature amount changing unit according to claim 10 or 11 further comprises a local maximum calculation unit that calculates local maxima of the time change amount of the acoustic feature amount, and does not change the acoustic feature amount when the time change amount is at a local maximum.

13. The voice enhancement device according to claim 1, wherein the acoustic feature amount changing unit according to claim 10 or 11 further comprises a local minimum calculation unit that calculates local minima of the time change amount of the acoustic feature amount, and does not change the acoustic feature amount when the time change amount is at a local minimum.

14. The voice enhancement device according to claim 1, wherein the acoustic feature amount changing unit according to claim 10 or 11 further comprises an absolute value calculation unit that calculates the absolute value of the time change amount of the acoustic feature amount, and does not change the acoustic feature amount when the absolute value is smaller than a specified value.
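Claims 12 to 14 list three conditions under which the acoustic feature amount is left unchanged. The sketch below marks those frames, assuming a strict three-point neighbourhood test for local extrema and a hypothetical `threshold` parameter for claim 14. Reference numerals 405, 406 and 408 suggest a comparison with the previous frame's time change amount, but the exact decision rule (including whether it is causal) is an assumption here, and the source does not state why extrema are exempted.

```python
def change_allowed(deltas, threshold=0.0):
    """Return one flag per frame: True where the acoustic feature may be changed.

    A frame is left unchanged when its time change amount is a strict local
    maximum (claim 12) or local minimum (claim 13) of the sequence, or when its
    absolute value is below `threshold` (claim 14).
    """
    flags = []
    last = len(deltas) - 1
    for t, d in enumerate(deltas):
        allowed = abs(d) >= threshold
        if 0 < t < last:  # end frames lack a neighbour on one side; not tested
            prev, nxt = deltas[t - 1], deltas[t + 1]
            if prev < d > nxt or prev > d < nxt:
                allowed = False
        flags.append(allowed)
    return flags
```

For `deltas = [0.0, 1.0, 3.0, 2.0, 0.5, 0.1]` with `threshold=0.2`, the result is `[False, True, False, True, True, False]`: the third frame is a local maximum and the first and last fall below the threshold.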

15. The voice enhancement method according to claim 1, and a device using the same, wherein the time change amount changing unit changes the time change amount by multiplying the time change amount of the acoustic feature amount per unit time by a coefficient larger than 1.

16. A voice enhancement method, and a device using the same, wherein the time change amount changing unit changes the time change amount by multiplying the time change amount of the acoustic feature amount per unit time by a positive coefficient smaller than 1.

17. The voice enhancement method according to claim 1, and a device using the same, wherein the time change amount changing unit according to claim 15 or 16 further comprises an absolute value calculation unit that calculates the absolute value of the time change amount of the acoustic feature amount per unit time, and changes the coefficient by which the time change amount is multiplied according to that absolute value.

18. A voice enhancement method, and a device using the same, wherein, in the time change amount changing unit according to claim 17, the larger the absolute value of the time change amount of the acoustic feature amount per unit time, the larger the coefficient by which it is multiplied.

19. A voice enhancement method, and a device using the same, wherein, in the time change amount changing unit according to claim 17, the larger the absolute value of the time change amount of the acoustic feature amount per unit time, the smaller the coefficient by which it is multiplied.

20. The voice enhancement method according to claim 1, and a device using the same, wherein the time change amount changing unit according to claim 15 or 16 uses different coefficients for different acoustic feature amounts.
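Claims 15 to 20 describe multiplying the time change amount by a coefficient that may be above or below 1, may depend on the absolute value of the change, and may differ per feature. The following is a minimal Python sketch; the linear mapping, the clipping bounds `lo`/`hi`, the `SETTINGS` table, and all numeric values are hypothetical illustrations, not values from the source.

```python
def coefficient(abs_delta, base, slope, lo=1e-3, hi=4.0):
    """Hypothetical linear map from |time change amount| to a multiplier.

    slope > 0:  larger |delta| gives a larger coefficient (claim 18)
    slope < 0:  larger |delta| gives a smaller coefficient (claim 19)
    slope == 0: constant coefficient (claims 15 and 16)
    The result is clipped to [lo, hi] so it stays positive and bounded.
    """
    return min(hi, max(lo, base + slope * abs_delta))


# Illustrative per-feature settings (claim 20); the numbers are not from the source.
SETTINGS = {
    "power": (1.5, 0.0),     # base > 1: enhance (claim 15)
    "f0": (1.0, 0.5),        # grows with |delta| (claim 18)
    "spectrum": (0.8, 0.0),  # 0 < base < 1: attenuate (claim 16)
}


def change_delta(feature, delta):
    """Multiply the time change amount by the coefficient chosen for this feature."""
    base, slope = SETTINGS[feature]
    return delta * coefficient(abs(delta), base, slope)
```

Because the coefficient depends only on the absolute value, a falling change is scaled exactly like a rising one of the same size: `change_delta("f0", -2.0)` gives `-4.0`.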

21. The voice enhancement device described, wherein the acoustic feature amount is not changed when the power is smaller than a predetermined threshold value.
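Claim 21 states only the condition: frames whose power is below a threshold are left unchanged. A minimal Python sketch, assuming samples scaled to [-1, 1], frame power expressed in dB, and a hypothetical default threshold of -40 dB; plausibly this keeps silent or noise-only frames from being enhanced, though the claim does not say so.

```python
import math


def frame_power_db(frame):
    """Frame average power in dB, for samples scaled to [-1, 1] (assumption)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")


def frames_to_change(frames, threshold_db=-40.0):
    """Claim 21 sketch: flag only frames whose power reaches the threshold."""
    return [frame_power_db(f) >= threshold_db for f in frames]
```

A frame of amplitude 0.5 (about -6 dB) is flagged for change; frames of amplitude 0.001 (-60 dB) or pure silence are not.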

22. A voice enhancement device characterized by comprising an externally adjustable adjustment unit through which the change applied to the time change amount of the acoustic feature amount of the voice is adjusted from outside the device.

23. A voice conversion method, and a device using the same, comprising a speaking rate conversion unit that changes the speaking rate and a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1.

24. A voice conversion method, and a device using the same, comprising a frequency characteristic changing unit that changes frequency characteristics and a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1.

25. A voice conversion method, and a device using the same, comprising a speaking rate conversion unit that changes the speaking rate, a frequency characteristic changing unit that changes frequency characteristics, and a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1.

26. The voice conversion method according to claim 23 or 25, and a device using the same, wherein the conversion ratio of the speaking rate in the speaking rate conversion unit is equal to the enhancement ratio of the time change in the voice enhancement unit.

27. A voice registration device characterized by using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25 to enhance a voice uttered in association with an entire image or a specific part of an image.

28. A voice registration device characterized by using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25 to enhance a voice uttered in advance in association with a motion of a moving image.

29. The voice registration device according to claim 28, wherein the speaking rate conversion unit changes the speaking rate of the uttered voice in accordance with a designated time period of the moving image.

30. A speech disorder compensation device having a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25.

31. A hearing loss compensation device having a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25.

32. A communication device having a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25.

33. A broadcasting device having a voice enhancement unit that enhances a voice using the voice enhancement method according to claim 1 or the voice conversion method according to any one of claims 23 to 25.

34. The voice enhancement method according to claim 1, and a device using the same, wherein the acoustic feature amount changing unit has a local maximum point calculation unit that detects local maximum points of the time change of the acoustic feature amount and a local minimum point calculation unit that detects local minimum points of the time change of the acoustic feature amount, calculates a function approximating the time change of the acoustic feature amount in the vicinity of a local maximum point, and, by extrapolating that function, changes the acoustic feature amount from the local minimum point of the time change preceding the local maximum point up to the next local minimum point.

35. The voice enhancement method according to claim 1, and a device using the same, wherein the acoustic feature amount changing unit has a local maximum point calculation unit that detects local maximum points of the time change of the acoustic feature amount and a local minimum point calculation unit that detects local minimum points of the time change of the acoustic feature amount; calculates a function approximating the time change of the acoustic feature amount in the vicinity after a local maximum point and, by extrapolating that function, changes the acoustic feature amount from the local maximum point up to the next local minimum point of the time change; and calculates a function approximating the time change of the acoustic feature amount in the vicinity before the local maximum point and, by extrapolating that function, changes the acoustic feature amount from the local minimum point of the time change preceding the local maximum point up to the local maximum point.

36. The voice enhancement method according to claim 34 or 35, and a device using the same, wherein the function approximating the time change of the acoustic feature amount is calculated using a critically damped second-order differential equation.
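Claim 36 names a critically damped second-order differential equation as the approximating model. For x'' + 2ωx' + ω²x = ω²u with a constant input u, the general solution is x(t) = u + (a + b·t)·e^(-ωt). The sketch below assumes ω is fixed in advance (a hypothetical choice), fits u, a and b exactly through three samples taken near a local maximum point, and then evaluates the curve beyond those samples, which is the extrapolation step claims 34 and 35 rely on. How many samples are used, how ω is chosen, and how the extrapolated curve is then applied to change the feature are not specified in the claims, so the sketch stops at fitting and extrapolating.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_critically_damped(ts, xs, omega):
    """Solve for (u, a, b) so that x(t) = u + (a + b*t) * exp(-omega*t)
    passes through exactly three samples (ts[i], xs[i]), using Cramer's rule."""
    rows = [[1.0, math.exp(-omega * t), t * math.exp(-omega * t)] for t in ts]
    det = _det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("samples do not determine a unique curve")
    params = []
    for k in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = xs[i]  # replace column k with the sample values
        params.append(_det3(m) / det)
    return tuple(params)


def evaluate(params, omega, t):
    """Evaluate the fitted critically damped response at time t (extrapolation allowed)."""
    u, a, b = params
    return u + (a + b * t) * math.exp(-omega * t)
```

Because the model is linear in u, a and b once ω is fixed, an exact three-point fit reduces to a 3x3 linear solve; with more samples, a least-squares fit of the same three parameters would be the natural extension.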