JPH086596A

JPH086596A - Voice emphasis device

Info

Publication number: JPH086596A
Application number: JP6138937A
Authority: JP
Inventors: Satoshi Furuta (古田 訓)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-06-21
Filing date: 1994-06-21
Publication date: 1996-01-12
Anticipated expiration: 2017-12-24
Also published as: JP3360423B2

Abstract

PURPOSE: To provide a voice emphasis device that improves the clarity of speech without degrading its naturalness and makes speech easier for elderly listeners to hear. The device is applicable to telephones, loudspeakers, other acoustic equipment, and voice coding/decoding devices. CONSTITUTION: The device comprises the following means:
- a voice feature parameter fluctuation amount analysis means 2, which divides the input speech into short-time blocks and analyzes, block by block, the degree of temporal fluctuation of the speech's feature parameters;
- an emphasis strength control means 3, which controls the spectral emphasis strength according to the fluctuation amount information output by means 2;
- a spectrum emphasis means 6, which performs the emphasis as directed by means 3;
- a detection means, which detects speech that the emphasis would degrade, as well as noise;
- an emphasis coefficient change means 5, which suppresses or inhibits the emphasis on the basis of one or more control signals output by the detection means.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice emphasis device for use in telephones, loudspeakers, acoustic equipment, voice coding/decoding devices, and the like. The device improves the clarity of speech while preserving its naturalness, making the speech easier to hear.

[0002]

[Prior Art] One conventional method for improving speech clarity is disclosed in, for example, JP-A-58-184200. In conversational speech, the power of consonants is considerably smaller than that of vowels, which causes listeners to misperceive consonants. To reduce such errors, the method of JP-A-58-184200 detects consonant portions and emphasizes them by raising their power.

[0003] One configuration of this conventional clarity-improvement method operates as follows.
1. The input speech is divided into short-time intervals.
2. Each interval is fed to a spectral-shape evaluation device, which evaluates the short-time spectral shape of the input speech.
3. The evaluation device generates one or more evaluation signals according to the result of that evaluation.
4. A control logic device uses the evaluation signals to generate a control signal for a spectral-shape dynamic modification device.
5. The modification device adjusts the spectral shape of each short-time interval according to that control signal and outputs the resulting speech.

[0004] A method that improves speech clarity by converting voice quality is disclosed in, for example, JP-A-01-93796. That method emphasizes transitions of the formant frequencies, which carry important speech information.

[0005] One configuration of the voice quality conversion method of JP-A-01-93796 operates as follows. The input speech is divided into short-time intervals. For each interval, an analysis unit decides between sound and silence, and between voiced and unvoiced speech. For a voiced interval, the linear prediction coefficients are computed and used to calculate resonance frequencies.

A formant frequency control unit then derives the formant frequencies from those resonance frequencies. It modifies the formant frequencies and bandwidths so that the formants' time trajectories vary more widely. A spectrum control unit changes the spectral envelope according to the time variation of the formant frequencies and the bandwidths output by the formant frequency control unit.

Once a run of voiced intervals has been converted, the adjoining unvoiced and silent intervals are joined to it and processing moves on to the next voiced run. The finally synthesized speech is then output.
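The passage above says resonance frequencies are computed from the linear prediction coefficients, but it does not say how. A standard approach finds the complex roots of the LPC polynomial A(z) and converts each root's angle to a frequency and its radius to a bandwidth. The minimal sketch below assumes the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function name `formants_from_lpc` is hypothetical, and using root-finding here is an assumption, not a detail taken from JP-A-01-93796.

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Estimate resonance frequencies and bandwidths (Hz) from LPC coefficients
    a = [1, a1, ..., ap], where A(z) = 1 + a1 z^-1 + ... + ap z^-p (assumed convention)."""
    # Multiplying A(z) by z^p gives a polynomial in z with coefficients a (descending powers),
    # so np.roots(a) returns the poles of the all-pole model 1/A(z).
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]              # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)   # pole angle -> frequency in Hz
    bws = -np.log(np.abs(roots)) * fs / np.pi      # pole radius -> bandwidth in Hz
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

For example, a single pole pair at radius 0.95 and angle 2π·500/8000 yields one resonance at 500 Hz when fs = 8000.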

[0006] A method that improves the quality of synthesized speech by exploiting its dynamic features is described in Reference 1: Ando et al., "Mel-cepstral speech taking the dynamic features of speech into account," IEICE Technical Report, SP93-65 (October 1993).

[0007] Reference 1 proposes using a two-dimensional mel cepstrum to enhance the dynamic features of speech. Its quality-improvement procedure is as follows.
1. The input speech is divided into short-time intervals.
2. A two-dimensional mel cepstrum is computed from the current frame and the frames before and after it.
3. The dynamic-feature region of this cepstrum is liftered in both the time and frequency directions with a low-band-emphasizing weighting lifter. The result is a low-band-emphasized two-dimensional mel cepstrum, in which the shape of the spectral envelope has been modified along both time and frequency.
4. With a single-pulse excitation as the source, the speech is synthesized by an MLSA (mel log spectrum approximation) filter and output.

The physical meaning of the two-dimensional mel cepstrum is explained in detail in Reference 1 and is omitted here.

[0008]

[Problems to Be Solved by the Invention] Because conventional voice emphasis devices are configured as described above, they have the following problems.

JP-A-58-184200 evaluates the short-term spectrum of the input speech and adaptively changes the spectral shape accordingly. This emphasis, however, is based on the feature parameters of each short-time interval, that is, on their instantaneous values. The method can detect and emphasize consonant intervals. It has no mechanism, however, for controlling emphasis according to the amount of variation in the feature parameters. It therefore cannot emphasize phoneme transitions, which are important for speech intelligibility.

[0009] JP-A-01-93796 has three problems.

First, emphasizing the range of temporal formant variation increases how far the formant frequencies move, but the speech at the center of each transition is not emphasized. The phoneme transitions that matter for intelligibility therefore still cannot be emphasized.

Second, the method modifies the time trajectories of the formant frequencies so that their excursion on the frequency axis becomes larger. Enlarging that excursion requires shifting the formant frequencies even in steady-state portions. This changes the formant frequencies, which carry important speaker-identity information. Clarity improves where the formant trajectories change sharply, but the output speech has an altered voice quality, its naturalness is severely impaired, and the speaker's individuality is lost.

Third, speech whose formant peak trajectories cannot be extracted, such as consonants, cannot be emphasized at all.

[0010] The synthetic-speech quality improvement method of Reference 1 emphasizes the range of spectral variation in time and frequency. The speech at the center of each change, however, is not emphasized, so phoneme transitions, which are important for intelligibility, cannot be emphasized.

The method has a further limitation. Its lifter weights can differ along the time and frequency directions, but the lifter cannot be changed according to the phonetic shape of the input speech. Nor can the method combine several emphasis schemes, whether by switching among them adaptively according to the phonetic shape of the input or by running them simultaneously for a synergistic effect. It therefore cannot provide emphasis that adapts to the phonetic shape of the input speech, or that exploits the combined effect of multiple emphasis schemes.

[0011] In all of the above methods, emphasizing the input speech can cause degradation such as excessive amplitude growth or increased distortion. None of them provides any means of suppressing such degradation, so the quality of the output speech deteriorates.

[0012] The present invention was made to solve these problems. Its object is to provide a voice emphasis device that improves speech clarity while preserving naturalness, making speech easier to hear.

[0013]

[Means for Solving the Problems] A voice emphasis device according to a first invention comprises:
- voice feature parameter variation analysis means, which analyzes the degree of temporal variation of the input speech's feature parameters for each short-time interval;
- spectral emphasis strength control means, which controls the spectral emphasis strength in response to the variation information output by the analysis means;
- spectral emphasis means, which emphasizes the spectrum of the input speech based on the spectral emphasis coefficient output by the strength control means.

[0014] A voice emphasis device according to a second invention comprises:
- voice feature parameter variation analysis means, which analyzes the degree of temporal variation of the input speech's feature parameters for each short-time interval;
- pre-emphasis coefficient control means, which controls a pre-emphasis coefficient in response to the variation information output by the analysis means;
- pre-emphasis means, which pre-emphasizes the input speech based on the pre-emphasis coefficient output by the coefficient control means.
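This section does not give the pre-emphasis filter itself. A common choice is the first-order high-pass filter y[n] = x[n] - μ·x[n-1]. The sketch below assumes that form, with one coefficient μ per short-time frame, as the pre-emphasis coefficient control means would supply it. The function name `pre_emphasize` is hypothetical. The filter state, which is the last input sample, is carried across frame boundaries.

```python
import numpy as np

def pre_emphasize(frames, mus):
    """First-order pre-emphasis y[n] = x[n] - mu * x[n-1], applied frame by frame
    with a (possibly different) coefficient mu per frame (assumed filter form)."""
    out = []
    prev = 0.0  # last input sample of the previous frame (filter state)
    for frame, mu in zip(frames, mus):
        x = np.asarray(frame, dtype=float)
        delayed = np.concatenate(([prev], x[:-1]))  # x[n-1], reaching into the previous frame
        out.append(x - mu * delayed)
        prev = x[-1]
    return out
```

Switching μ abruptly from frame to frame can make the output less smooth. One way to avoid this is to smooth the coefficients over neighbouring frames before applying them, as the eighth invention does.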

[0015] A voice emphasis device according to a third invention comprises:
- voice feature parameter variation analysis means, which analyzes the degree of temporal variation of the feature parameters of input speech divided into short-time intervals;
- spectral emphasis strength control means, which determines the spectral emphasis strength in response to first variation information output by the analysis means, and outputs a spectral emphasis coefficient;
- spectral emphasis means, which emphasizes the spectrum of the input speech according to that spectral emphasis coefficient;
- pre-emphasis coefficient control means, which determines and outputs a pre-emphasis coefficient in response to second variation information output by the analysis means;
- pre-emphasis means, which pre-emphasizes the input speech according to that pre-emphasis coefficient.

[0016] A voice emphasis device according to a fourth invention is the device of the first invention, further comprising:
- emphasis-unsuitable speech detection means, which receives the feature parameters of the input speech divided into short-time intervals, and outputs an emphasis suppression signal when distortion is expected in the output speech;
- emphasis coefficient changing means, which corrects the spectral emphasis coefficient based on the emphasis suppression signal.
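The text does not state the detection criteria or how the coefficient is corrected. The sketch below is a hypothetical illustration of the two means working together. It assumes two criteria:
- a frame whose peak is already near full scale may clip once emphasized, so emphasis is halved;
- a very low-energy frame is mostly noise, so emphasis is inhibited.

The thresholds `CLIP_LEVEL` and `NOISE_FLOOR`, the factor 0.5, and both function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical thresholds; no numeric values are given in this section.
CLIP_LEVEL = 0.9     # peak magnitude (full scale = 1.0) above which emphasis may clip
NOISE_FLOOR = 1e-4   # mean power below which a frame is treated as noise-dominated

def suppression_signal(frame):
    """Emphasis-unsuitable speech detection (illustrative):
    returns 1.0 (no suppression), 0.5 (suppress) or 0.0 (inhibit)."""
    x = np.asarray(frame, dtype=float)
    if np.mean(x ** 2) < NOISE_FLOOR:
        return 0.0   # inhibit: emphasis would mainly amplify noise
    if np.max(np.abs(x)) > CLIP_LEVEL:
        return 0.5   # suppress: emphasis could push the amplitude past full scale
    return 1.0

def change_coefficient(coef, frame):
    """Emphasis coefficient changing means: scale the spectral emphasis coefficient."""
    return coef * suppression_signal(frame)
```

An external control signal, as in the sixth invention, could be folded in as one more multiplicative factor.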

[0017] A voice emphasis device according to a fifth invention is the device of the fourth invention, in which the emphasis-unsuitable speech detection means also receives the feature parameter information of the output speech. It uses this together with the input speech's feature parameter information to control the emphasis suppression signal.

[0018] A voice emphasis device according to a sixth invention is the device of the fourth invention, in which the emphasis coefficient changing means also receives an external control signal. It uses this signal together with the emphasis suppression signal to determine the spectral emphasis coefficient.

[0019] A voice emphasis device according to a seventh invention is the device of the first invention, further comprising:
- a spectral emphasis coefficient map, which defines the relationship between the voice feature parameter variation information and the spectral emphasis coefficient;
- spectral emphasis coefficient determining means, which calculates the spectral emphasis coefficient from a provisional spectral emphasis coefficient.

[0020] A voice emphasis device according to an eighth invention is the device of the first invention, further comprising:
- a pre-emphasis coefficient map, which defines the relationship between the voice feature parameter variation information and the pre-emphasis coefficient;
- pre-emphasis coefficient determining means, which calculates the pre-emphasis coefficient from a provisional pre-emphasis coefficient.

[0021] A voice emphasis device according to a ninth invention is the device of the first invention or of any of the third to seventh inventions. In it, a constant bias value is always added to the voice feature parameter variation information, so that the spectrum of the input speech is emphasized at all times, whether or not its feature parameters vary.

[0022] A voice emphasis device according to a tenth invention is the device of the second, third, or eighth invention. In it, a constant bias value is always added to the voice feature parameter variation information, so that the input speech is pre-emphasized at all times, whether or not its feature parameters vary.

[0023] A voice emphasis device according to an eleventh invention is the device of the seventh or eighth invention, provided with a plurality of spectral emphasis coefficient maps or a plurality of pre-emphasis coefficient maps.

[0024]

[Operation] The devices of the eleven inventions operate as follows.
- First invention: the device obtains a spectral emphasis coefficient according to the amount of variation in the input speech's feature parameters, and adaptively emphasizes the input speech based on that coefficient.
- Second invention: the device obtains a pre-emphasis coefficient in the same way, and adaptively emphasizes the input speech based on it.
- Third invention: the device obtains both a spectral emphasis coefficient and a pre-emphasis coefficient in response to the variation, and adaptively emphasizes the input speech based on both.
- Fourth invention: the device suppresses or inhibits emphasis when emphasis is expected to cause an abnormal increase in amplitude or to introduce noise.
- Fifth invention: the device suppresses or inhibits emphasis based on the feature parameters of both the input speech and the output speech.
- Sixth invention: the device suppresses or inhibits emphasis using an externally input control signal in addition to the input speech's feature parameters.
- Seventh invention: the device smooths the provisional spectral emphasis coefficients, obtained from the spectral emphasis coefficient map, over the preceding and following speech frames. The result is the spectral emphasis coefficient used to emphasize the input speech.
- Eighth invention: the device smooths the provisional pre-emphasis coefficients, obtained from the pre-emphasis coefficient map, over the preceding and following frames in the same way. The result is the pre-emphasis coefficient used to emphasize the input speech.
- Ninth invention (based on the first invention or any of the third to seventh): a constant bias value is always added to the variation information, so the spectrum of the input speech is always emphasized, whether or not its feature parameters vary.
- Tenth invention (based on the second, third, or eighth invention): a constant bias value is added in the same way, so the input speech is always pre-emphasized.
- Eleventh invention (based on the seventh or eighth invention): the device has multiple spectral emphasis coefficient maps or pre-emphasis coefficient maps, so it can take the type of speech and changes in the environment into account.
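The seventh and eighth inventions derive the final coefficient by smoothing provisional map outputs over the preceding and following frames, but the smoothing rule is not specified here. The sketch below assumes a 3-tap weighted average. The weights (0.25, 0.5, 0.25) and the name `smooth_coefficients` are hypothetical. Because the average uses the following frame, it needs one frame of lookahead. Edge frames reuse their own value in place of the missing neighbour.

```python
import numpy as np

def smooth_coefficients(provisional, weights=(0.25, 0.5, 0.25)):
    """Smooth per-frame provisional coefficients over the preceding, current and
    following frames (3-tap weights are an illustrative assumption)."""
    p = np.asarray(provisional, dtype=float)
    padded = np.concatenate(([p[0]], p, [p[-1]]))  # repeat edge values
    w_prev, w_cur, w_next = weights
    return w_prev * padded[:-2] + w_cur * padded[1:-1] + w_next * padded[2:]
```

For example, an isolated provisional value of 0.8 between zeros becomes [0, 0.2, 0.4, 0.2, 0], so the emphasis ramps in and out instead of switching abruptly.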

[0025]

[Embodiments]

Embodiment 1. A first embodiment of the present invention is described below with reference to FIGS. 1 and 2. The "spectral emphasis" of the first invention includes, for example, two techniques:
- formant emphasis, in which the formant peaks of the speech are extracted by linear predictive analysis or a fast Fourier transform and then emphasized;
- band-wise emphasis, in which the signal is split into frequency bands and each band is multiplied by its own power emphasis coefficient.

This embodiment describes a configuration that uses formant emphasis based on linear predictive analysis as the spectral emphasis technique.
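The exact filter used by the formant emphasis means 6 is not given in this section. One simple LPC-based form consistent with the coefficient range described later, lower bound 0 and upper bound 0.95 to prevent oscillation, is the all-pole filter 1/A(z/γ). This filter is an assumption chosen for illustration, not taken from the patent. With γ = 0 the filter is the identity, so no emphasis is applied. As γ approaches 1, the poles move toward those of 1/A(z) and the formant peaks are sharpened. Capping γ below 1 keeps the poles inside the unit circle, assuming A(z) is minimum phase. The name `formant_emphasis` is hypothetical.

```python
import numpy as np

def formant_emphasis(x, a, gamma):
    """All-pole formant emphasis y = x / A(z/gamma) (assumed form).
    a = [1, a1, ..., ap] are LPC coefficients of A(z) = 1 + sum_k a_k z^-k."""
    gamma = float(np.clip(gamma, 0.0, 0.95))                   # range used by the map
    a_w = np.asarray(a, dtype=float) * gamma ** np.arange(len(a))  # coefficients of A(z/gamma)
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a_w)):
            if n - k >= 0:
                acc -= a_w[k] * y[n - k]   # recursive (all-pole) part
        y[n] = acc
    return y
```

In a full system, the output level would normally be renormalized after filtering, because the all-pole gain grows with γ.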

[0026] FIG. 1 shows the following components:
- 1: input speech analysis means, which calculates voice feature parameters from the input speech;
- 2: voice feature parameter variation analysis means, which analyzes the degree of change in those parameters;
- 3: formant emphasis strength control means, which controls the formant emphasis strength according to that variation;
- 4: formant emphasis coefficient map, which means 3 uses to determine the emphasis strength;
- 5: formant emphasis coefficient determining means, which determines the formant emphasis coefficient in response to means 3;
- 6: formant emphasis means, which performs formant emphasis on the input speech;
- 7: coefficient buffer, which stores the time history of the emphasis coefficients determined by means 5.

It also shows the following signals:
- 101: input speech;
- 102: voice feature parameters;
- 103: voice feature parameter variation information;
- 104: provisional formant emphasis coefficient;
- 105: formant emphasis coefficient;
- 106: output speech;
- 112: linear prediction coefficients;
- 113: formant emphasis coefficient of the previous frame.

[0027] Next, the operation is described. The input speech analysis means 1 divides the input speech 101 into short-time intervals and analyzes the input speech in each analysis interval. It calculates, for example, autocorrelation coefficients as the voice feature parameters 102. It also calculates the linear prediction coefficients 112 by the autocorrelation method or a similar technique.
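The analysis step above can be sketched with the textbook autocorrelation method. It splits the signal into short-time frames, computes the autocorrelation r[0..p] of each windowed frame, and runs the Levinson-Durbin recursion to obtain the LPC coefficients. The Hamming window, the sign convention A(z) = 1 + Σ a_k z^-k, and the function names are assumptions. The text says only "autocorrelation method or a similar technique."

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split signal x (len(x) >= frame_len) into short-time frames."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of a Hamming-windowed frame (biased estimate)."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    return np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Levinson-Durbin recursion. Returns a = [1, a1, ..., ap] for
    A(z) = 1 + sum_k a_k z^-k, and the final prediction error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:          # silent frame: leave higher-order coefficients at zero
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

As a check, the autocorrelation [1, 0.5, 0.25] of a first-order autoregressive process gives a = [1, -0.5, 0] with error power 0.75.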

[0028] The voice feature parameter variation analysis means 2 receives the voice feature parameters 102 of the current frame and past frames. It analyzes how much the parameters vary around the current frame. As the variation information 103, it calculates, for example, the absolute value of the autocorrelation coefficient and its change from the immediately preceding frame. Alternatively, the means can look ahead to obtain future voice feature parameters and use the change from the future frame to the present as the variation information 103.
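The passage above can be sketched for the 0th-order autocorrelation coefficient r0, which is the frame energy. X is |r0| for each frame. Y is the size of its change, either from the previous frame or, with lookahead, to the next frame. Taking the absolute value of the difference is an assumption, since the text says only "change amount." The name `variation_info` is hypothetical.

```python
import numpy as np

def variation_info(r0_seq, lookahead=False):
    """Per-frame variation information from the 0th-order autocorrelation r0.
    Returns X = |r0| and Y = |change in r0| (absolute difference assumed)."""
    r0 = np.asarray(r0_seq, dtype=float)
    x = np.abs(r0)
    if lookahead:
        # change from the current frame to the next (future) frame; 0 for the last frame
        y = np.abs(np.diff(r0, append=r0[-1]))
    else:
        # change from the previous frame to the current frame; 0 for the first frame
        y = np.abs(np.diff(r0, prepend=r0[0]))
    return x, y
```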

[0029] Besides the absolute value of the autocorrelation coefficient and its change from the previous frame, the variation information 103 may take other forms. Examples include the correlation, difference, derivative, or absolute value of the voice feature parameters around the current frame. Another option is the result of analyzing the distribution of a group of phonetic feature parameters with a known statistical method.
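Two of the alternative measures listed above can be illustrated on per-frame feature vectors, such as each frame's autocorrelation vector. The first is the normalized correlation between consecutive vectors. The second is the Euclidean norm of their difference. Both the choice of these two measures and the name `frame_to_frame_measures` are illustrative assumptions. The first frame is compared with itself.

```python
import numpy as np

def frame_to_frame_measures(features):
    """Normalized correlation and Euclidean difference between each feature
    vector (row) and the previous one; the first row is compared with itself."""
    f = np.asarray(features, dtype=float)
    prev = np.vstack([f[:1], f[:-1]])              # previous frame for each row
    num = np.sum(f * prev, axis=1)
    den = np.linalg.norm(f, axis=1) * np.linalg.norm(prev, axis=1)
    corr = np.divide(num, den, out=np.ones_like(num), where=den > 0)  # 1.0 if a vector is zero
    diff = np.linalg.norm(f - prev, axis=1)
    return corr, diff
```

Low correlation or a large difference indicates that the spectrum is changing, as in a phoneme transition.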

[0030] The formant emphasis strength control means 3 has a formant emphasis coefficient map 4, shown in FIG. 2. The map expresses the relationship between the voice feature parameter variation used to determine the formant emphasis strength and the formant emphasis coefficient. On map 4, the X axis is the absolute value of the 0th-order autocorrelation coefficient, and the Y axis is the difference of that coefficient from the previous frame. For each short-time interval, these two quantities arrive as the variation information 103. Together they fix a unique point on the X-Y plane, and hence a unique formant emphasis strength on the Z axis. This value is output as the provisional formant emphasis coefficient 104.

[0031] The formant emphasis coefficient map 4 is stored in a storage device such as a ROM or RAM. In this embodiment, the map is created by adjusting it step by step in preliminary experiments. First, several thresholds (Th1, Th2, ..., Thn) are set on the X axis, which represents the 0th-order autocorrelation coefficient, and several thresholds (Td1, Td2, ..., Tdn) on the Y axis, which represents the difference amount. Next, a suitable initial emphasis coefficient is assigned to each intersection of the mesh that these thresholds form on the X-Y plane, and a provisional voice emphasis device is built. Listening tests are then carried out with this device, and the formant emphasis coefficients at the intersections are adjusted repeatedly until a perceptually optimal map is obtained. If finer emphasis requires subdividing the threshold mesh on the X and Y axes, new thresholds are inserted between the initial ones and the same adjustment is applied to them. After these adjustments, the formant emphasis coefficients between intersections are obtained by linear interpolation from the values at the adjacent intersections. To prevent oscillation of the formant emphasis means 6, the formant emphasis coefficient on the map 4 is limited to an upper bound of 0.95 and a lower bound of 0.
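The map lookup of [0030] and the linear interpolation and limits of [0031] can be sketched as a two-dimensional table: rows follow the X thresholds (absolute 0th-order autocorrelation), columns follow the Y thresholds (difference from the previous frame), values between intersections are interpolated linearly along both axes, and the result is clamped to [0, 0.95]. This is a minimal sketch under stated assumptions: the function names, threshold values and coefficients are hypothetical placeholders, and clamping inputs that fall outside the threshold range to the edge of the mesh is an assumption the text does not specify.

```python
from bisect import bisect_right

# Limits stated in the text to prevent oscillation of the formant emphasis filter.
COEF_MIN, COEF_MAX = 0.0, 0.95


def _locate(thresholds, v):
    """Return (lower mesh index, fractional position) of v among sorted thresholds.
    Values outside the mesh are clamped to its edge (an assumption)."""
    v = min(max(v, thresholds[0]), thresholds[-1])
    i = bisect_right(thresholds, v) - 1
    i = min(i, len(thresholds) - 2)  # keep i + 1 valid when v equals the last threshold
    span = thresholds[i + 1] - thresholds[i]
    return i, (v - thresholds[i]) / span


def lookup_coefficient(x_thresholds, y_thresholds, grid, x, y):
    """Bilinear interpolation of grid[ix][iy] (the tuned coefficient at (Th_ix, Td_iy))
    at (x, y) = (|r0|, difference of r0 from the previous frame), then clamping."""
    ix, fx = _locate(x_thresholds, x)
    iy, fy = _locate(y_thresholds, y)
    low = grid[ix][iy] * (1 - fx) + grid[ix + 1][iy] * fx            # along X at Td_iy
    high = grid[ix][iy + 1] * (1 - fx) + grid[ix + 1][iy + 1] * fx   # along X at Td_(iy+1)
    value = low * (1 - fy) + high * fy
    return min(max(value, COEF_MIN), COEF_MAX)


if __name__ == "__main__":
    th = [0.0, 1.0, 2.0]           # hypothetical X thresholds Th1..Th3
    td = [0.0, 0.5, 1.0]           # hypothetical Y thresholds Td1..Td3
    grid = [[0.0, 0.2, 0.4],
            [0.1, 0.5, 0.8],
            [0.2, 0.7, 1.2]]       # hypothetical tuned values; 1.2 exceeds the limit
    print(lookup_coefficient(th, td, grid, 0.5, 0.25))
```

Storing only the mesh intersections and interpolating at run time keeps the ROM or RAM table small, which matches the text's description of a threshold mesh refined by listening tests.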

[0032] The interpolation between intersections of the formant emphasis coefficient map 4 may instead be curve interpolation, using for example a three-point curve approximation method. In this case too, to prevent oscillation of the formant emphasis means 6, the formant emphasis coefficient on the interpolated map is limited to an upper bound of 0.95 and a lower bound of 0.
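The text does not define the "three-point curve approximation method". One common reading, assumed here, is a quadratic curve passed through three neighbouring mesh values along one axis (Lagrange interpolation), followed by the same [0, 0.95] limit. The function name is hypothetical, and the patent may use a different three-point scheme.

```python
def three_point_interp(xs, cs, x, lo=0.0, hi=0.95):
    """Quadratic (Lagrange) interpolation through the three mesh points
    (xs[k], cs[k]), k = 0..2, clamped to [lo, hi] as the text requires."""
    x0, x1, x2 = xs
    c0, c1, c2 = cs
    # Lagrange basis polynomials: each is 1 at its own node and 0 at the other two.
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    value = c0 * l0 + c1 * l1 + c2 * l2
    return min(max(value, lo), hi)


if __name__ == "__main__":
    # Hypothetical coefficients at thresholds 0, 1 and 2 on one axis.
    print(three_point_interp((0.0, 1.0, 2.0), (0.1, 0.5, 0.7), 0.5))
```

Unlike linear interpolation, a quadratic can overshoot the mesh values between nodes, which is one reason the clamp is reapplied after interpolation.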

[0033] The coefficient buffer 7 is a temporary storage device such as a RAM, with space for the formant emphasis coefficients of the past five frames determined by the formant emphasis coefficient determining means 5. Each time a formant emphasis coefficient 105 is determined for a short time interval, it is stored and the buffer contents are updated, so that the buffer always holds the coefficients of the five most recent frames up to the current frame.

[0034] Discontinuities in the emphasis coefficient between adjacent frames cause abnormal noise. To prevent this, the formant emphasis coefficient determining means 5 smooths the formant emphasis temporary coefficient 104 determined by the formant emphasis strength control means 3: using the formant emphasis coefficients 113 of the previous frames stored in the coefficient buffer 7, it feeds five frames' worth of coefficients into a smoothing filter such as a moving average filter and outputs the result as the formant emphasis coefficient 105.
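The buffering of [0033] and the smoothing of [0034] can be sketched with a fixed-length queue and a moving average. The text leaves the exact window open; this sketch assumes it consists of the current temporary coefficient plus the four most recent final coefficients from the five-slot buffer (so the smoother is recursive), and that fewer values are averaged until the buffer fills. The class and method names are hypothetical.

```python
from collections import deque


class CoefficientSmoother:
    """Moving-average smoothing of a per-frame emphasis coefficient.

    Assumption: the window is the current temporary coefficient plus the
    (frames - 1) most recent final coefficients held in the buffer."""

    def __init__(self, frames=5):
        self.frames = frames
        self.buffer = deque(maxlen=frames)  # plays the role of coefficient buffer 7

    def smooth(self, temporary):
        previous = list(self.buffer)[-(self.frames - 1):] if self.frames > 1 else []
        window = previous + [temporary]
        final = sum(window) / len(window)   # simple moving average
        self.buffer.append(final)           # oldest entry drops out automatically
        return final


if __name__ == "__main__":
    smoother = CoefficientSmoother()
    # An abrupt jump in the temporary coefficient is spread over several frames.
    print([round(smoother.smooth(c), 3) for c in [0.0, 0.8, 0.8, 0.8, 0.8, 0.8]])
```

Because `deque(maxlen=...)` discards the oldest item on each append, the buffer always holds the most recent five coefficients without explicit bookkeeping.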

[0035] To emphasize both the formants and the valleys between them (antiformants), the formant emphasis means 6 is a pole-zero model filter (also called an ARMA filter), whose transfer function is expressed by poles and zeros, as given by the transfer function Hs(z) of Equation (1). The input voice 101 is passed through the filter of Equation (1) to perform formant emphasis, and the output voice 106 is output.

[0036]

[Equation 1]

[0037]

[Equation 2]

[0038] In Equation (1), α is obtained by smoothing the linear prediction coefficients 112 according to Equation (2), on the basis of the formant emphasis coefficient output by the formant emphasis coefficient determining means 5. In Equation (2), p is the formant emphasis coefficient and α_o is the linear prediction coefficient before smoothing. η is obtained by converting α back into autocorrelation coefficients, smoothing them with a so-called lag window (a weighting applied to the autocorrelation coefficients to stabilize the coefficients), and converting the result back into linear prediction coefficients. Equations (1) and (2) are described in detail in Ira A. Gerson et al., "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps," IEEE ICASSP (April 1990), and are not explained further here. The lag window is described in detail in Japanese Patent Laid-Open No. 52-64808 and is likewise not explained here.
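Equations (1) and (2) are not reproduced in this text, so the following is only a hedged sketch of the kind of processing [0035] and [0038] describe. It assumes Equation (2) is the usual bandwidth expansion α_i = p^i · α_o,i and Equation (1) is the pole-zero filter Hs(z) = (1 - Σ η_i z^-i) / (1 - Σ α_i z^-i); both forms, the sign convention and the function names are assumptions. The lag-window step that produces η is not implemented; η is passed in directly. Scaling α_i by p^i moves every pole of the denominator inward by the factor p, which is consistent with the text capping p at 0.95 to prevent oscillation.

```python
def bandwidth_expand(lpc, p):
    """Assumed form of Eq. (2): alpha_i = p**i * alpha_o_i for i = 1..P.
    lpc[0] holds alpha_o_1 (predictor convention x[n] ~ sum_i alpha_i x[n - i])."""
    return [a * p ** (i + 1) for i, a in enumerate(lpc)]


def pole_zero_filter(x, num, den):
    """Direct-form pole-zero (ARMA) filter with the assumed transfer function
    H(z) = (1 - sum num_i z^-i) / (1 - sum den_i z^-i) and zero initial state."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for i, b in enumerate(num, start=1):   # zeros (numerator, eta)
            if n - i >= 0:
                acc -= b * x[n - i]
        for i, a in enumerate(den, start=1):   # poles (denominator, alpha)
            if n - i >= 0:
                acc += a * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    alpha_o = [1.2, -0.5]                 # hypothetical 2nd-order LPC for one frame
    alpha = bandwidth_expand(alpha_o, 0.8)
    eta = [0.5, -0.1]                     # hypothetical stand-in for the lag-windowed eta
    impulse = [1.0] + [0.0] * 7
    print([round(v, 4) for v in pole_zero_filter(impulse, eta, alpha)])
```

A production implementation would also carry the filter state from one frame to the next; this sketch starts each call from zero state for clarity.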

[0039] Alternatively, several formant emphasis coefficient maps 4 may be prepared according to the type and number of items of voice feature parameter variation amount information; a formant emphasis temporary coefficient is determined from each map, and these are averaged to give a representative formant emphasis temporary coefficient 104.

[0040] In the embodiment described above, formant emphasis is controlled by the variation amount information of voice feature parameters such as the autocorrelation coefficient. This makes it possible to selectively emphasize the parts most important for speech intelligibility: the formants at the onset from a consonant to a vowel, where the speech spectrum changes abruptly, and the formant transitions from one vowel to another. In addition, at transitions such as those from a vowel to a consonant, controlling the emphasis so that the power at the end of the vowel decays earlier than usual prevents the preceding vowel from masking the consonant (backward masking). Furthermore, preparing several formant emphasis coefficient maps to determine the formant emphasis coefficient allows the formant emphasis to follow the variation of the voice feature parameters even more closely.

[0041] Embodiment 2. A second embodiment of the present invention is described with reference to FIGS. 3 and 4. In FIG. 3, 1 is an input voice analysis means; 2 is a voice feature parameter variation amount analysis means; 8 is a pre-emphasis coefficient control means that receives the voice feature parameter variation amount output by the voice feature parameter variation amount analysis means 2 and controls the pre-emphasis coefficient; 9 is a pre-emphasis coefficient determining means that determines the pre-emphasis coefficient in response to the pre-emphasis coefficient control means 8; 10 is a pre-emphasis coefficient map used by the pre-emphasis coefficient determining means 9 to determine the emphasis strength; and 11 is a pre-emphasis means that pre-emphasizes the input voice. 102 is a voice feature parameter, 103 is voice feature parameter variation amount information, 107 is a pre-emphasis temporary coefficient, 108 is a pre-emphasis coefficient, and 114 is the pre-emphasis coefficient of the previous frame. The input voice 101, coefficient buffer 7 and output voice 106 are the same as in Embodiment 1 and are not described again.

[0042] Next, the operation is described. The input voice analysis means 1 divides the input voice 101 into short time intervals, analyzes the input voice in each analysis interval, and calculates, for example, autocorrelation coefficients as the voice feature parameter 102.

[0043] The voice feature parameter variation amount analysis means 2 receives the voice feature parameters 102 of the current and past frames and analyzes how much the voice feature parameters vary in the neighbourhood of the current frame. For example, it calculates the absolute value of the autocorrelation coefficient and the change in the autocorrelation coefficient from the immediately preceding frame as the voice feature parameter variation amount information 103. It is also possible to look ahead to obtain future voice feature parameters and use their change from the future to the present as the voice feature parameter variation amount information 103.
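The analysis of [0042] and [0043] can be sketched as follows: split the signal into short frames, compute each frame's autocorrelation, and report the absolute 0th-order value together with its change from the previous frame, which are the two map inputs used in [0030] and [0044]. Assumptions not stated in the text: non-overlapping frames, no analysis window, the change taken as an absolute difference, a change of 0 for the first frame, and any trailing partial frame dropped. The function names are hypothetical.

```python
def autocorrelation(frame, order):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def variation_info(signal, frame_len, order=10):
    """Per frame, return (|r0|, |r0 - r0 of previous frame|)."""
    out = []
    prev_r0 = None
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        r = autocorrelation(signal[start:start + frame_len], order)
        r0 = r[0]                              # frame energy
        delta = 0.0 if prev_r0 is None else abs(r0 - prev_r0)
        out.append((abs(r0), delta))
        prev_r0 = r0
    return out


if __name__ == "__main__":
    # A step in level between the two frames shows up as a large change in r0.
    print(variation_info([1, 1, 1, 1, 2, 2, 2, 2], frame_len=4, order=1))
```

The 0th-order autocorrelation is the frame energy, so this pair of values responds strongly to onsets and offsets, which is where the text says emphasis matters most.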

[0044] The pre-emphasis coefficient control means 8 has a pre-emphasis coefficient map 10, shown in FIG. 4, which represents the relationship between the voice feature parameter variation amount used to determine the pre-emphasis strength and the pre-emphasis coefficient. On the pre-emphasis coefficient map 10, the X axis corresponds to the absolute value of the 0th-order autocorrelation coefficient, and the Y axis to the difference between the 0th-order autocorrelation coefficient and that of the previous frame. For each short time interval, these two values, input as the voice feature parameter variation amount information 103, uniquely determine a point on the X-Y plane and hence the pre-emphasis coefficient on the Z axis; this value is output as the pre-emphasis temporary coefficient 107.

[0045] The pre-emphasis coefficient map 10 is stored in a storage device such as a ROM or RAM. In this embodiment too, the map can be obtained in the same way as the formant emphasis coefficient map of Embodiment 1, by adjusting it step by step in preliminary experiments. However, to prevent abnormal noise from the pre-emphasis means 11, the pre-emphasis coefficient on the map 10 is limited to an upper bound of 0.7 and a lower bound of 0.

[0046] When the pre-emphasis coefficient map 10 is created by the method of Embodiment 1, the interpolation between intersections on the map may also use an interpolation method such as a three-point curve approximation method. In this case too, to prevent abnormal noise from the pre-emphasis means 11, the interpolated pre-emphasis coefficient is limited to an upper bound of 0.7 and a lower bound of 0.

[0047] The coefficient buffer 7 is a temporary storage device such as a RAM, with space for the pre-emphasis coefficients of the past five frames determined by the pre-emphasis coefficient determining means 9. Each time a pre-emphasis coefficient 108 is determined for a short time interval, it is stored and the buffer contents are updated, so that the buffer always holds the pre-emphasis coefficients of the five most recent frames up to the current frame.

[0048] To prevent abnormal noise caused by discontinuities in the emphasis coefficient between adjacent frames, the pre-emphasis coefficient determining means 9 smooths the pre-emphasis temporary coefficient 107 determined by the pre-emphasis coefficient control means 8: using the pre-emphasis coefficients 114 of the previous frames stored in the coefficient buffer 7, it feeds five frames' worth of coefficients into a smoothing filter such as a moving average filter and outputs the result as the pre-emphasis coefficient 108.

[0049] The pre-emphasis means 11 is a filter with the transfer function H_p(z) given by Equation (3). It receives the pre-emphasis coefficient 108, denoted p, and the input voice 101, pre-emphasizes the input voice 101, and outputs the output voice 106.
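Equation (3) is not reproduced in this text. A common first-order pre-emphasis filter, assumed here, is H_p(z) = 1 - p z^-1, that is y[n] = x[n] - p x[n-1]. The sketch below applies it to one frame with the coefficient limited to [0, 0.7] as in [0045], and carries the last input sample across frames so that the filter has no discontinuity at frame boundaries. The filter form and the function name are assumptions, not the patent's definition.

```python
def pre_emphasis(frame, p, prev_sample=0.0):
    """Assumed first-order pre-emphasis y[n] = x[n] - p * x[n - 1].

    p is clamped to [0, 0.7]; prev_sample is the last input sample of the
    previous frame. Returns (output frame, last input sample of this frame)."""
    p = min(max(p, 0.0), 0.7)
    out = []
    last = prev_sample
    for s in frame:
        out.append(s - p * last)   # boosts high frequencies relative to low ones
        last = s
    return out, last


if __name__ == "__main__":
    y1, last = pre_emphasis([1.0, 1.0, 1.0], 0.5)
    y2, _ = pre_emphasis([1.0, 1.0], 0.5, prev_sample=last)  # next frame, continuous state
    print(y1, y2)
```

With p = 0 the filter is transparent, so a coefficient of 0 from the map simply disables pre-emphasis for that frame.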

[0050]

[Equation 3]

[0051] Alternatively, several pre-emphasis coefficient maps 10 may be prepared according to the type and number of items of voice feature parameter variation amount information; a pre-emphasis temporary coefficient is determined from each map, and a representative pre-emphasis temporary coefficient 107 is obtained, for example, by averaging them.

[0052] In the embodiment described above, pre-emphasis is controlled by the variation amount information of voice feature parameters such as the autocorrelation coefficient. This makes it possible to emphasize consonant features, whose energy is small and whose power is often concentrated in the high band, at the onset from a consonant to a vowel where the voice feature parameters change abruptly, that is, in the parts most important for speech intelligibility. Furthermore, preparing several pre-emphasis coefficient maps to determine the pre-emphasis coefficient allows the pre-emphasis to track changes in the voice feature parameters more closely.

[0053] Embodiment 3. A third embodiment of the present invention is described with reference to FIG. 5. This embodiment cascades the spectrum (formant) emphasis strength control means 3, formant emphasis coefficient determining means 5 and formant emphasis means 6 of Embodiment 1 with the pre-emphasis coefficient control means 8, pre-emphasis coefficient determining means 9 and pre-emphasis means 11 of Embodiment 2. The elements in the figure have already been described in Embodiments 1 and 2 and are not described again.
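The cascade of [0053], and the remark that the order of the two stages can be swapped, can be illustrated with fixed coefficients: both stages are then linear time-invariant filters, which commute. This sketch reuses the assumed filter forms (a pole-zero filter for formant emphasis and H_p(z) = 1 - p z^-1 for pre-emphasis; the actual Equations (1) and (3) are not reproduced here) with hypothetical coefficient values. When the coefficients change from frame to frame the system is time-varying, so the two orders can differ slightly around frame boundaries.

```python
def pole_zero(x, num, den):
    """Assumed H_s(z) = (1 - sum num_i z^-i) / (1 - sum den_i z^-i), zero initial state."""
    y = []
    for n, s in enumerate(x):
        acc = s
        for i, b in enumerate(num, start=1):
            if n >= i:
                acc -= b * x[n - i]
        for i, a in enumerate(den, start=1):
            if n >= i:
                acc += a * y[n - i]
        y.append(acc)
    return y


def pre_emphasis(x, p):
    """Assumed H_p(z) = 1 - p z^-1, zero initial state."""
    return [s - p * (x[n - 1] if n > 0 else 0.0) for n, s in enumerate(x)]


def cascade(x, num, den, p, formant_first=True):
    """Formant emphasis and pre-emphasis in series, in either order."""
    if formant_first:
        return pre_emphasis(pole_zero(x, num, den), p)
    return pole_zero(pre_emphasis(x, p), num, den)


if __name__ == "__main__":
    x = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0, 0.5, 0.0]
    num, den = [0.3, -0.1], [0.6, -0.2]    # hypothetical eta and alpha values
    a = cascade(x, num, den, 0.5, formant_first=True)
    b = cascade(x, num, den, 0.5, formant_first=False)
    print(max(abs(u - v) for u, v in zip(a, b)))  # only rounding-level differences
```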

[0054] With this configuration, in addition to the effects of Embodiments 1 and 2, the synergy of formant emphasis and pre-emphasis further emphasizes spectral peaks in the high band. The effect of the present invention is unchanged if the order of formant emphasis and pre-emphasis is reversed in this embodiment.

[0055] Embodiment 4. A fourth embodiment of the present invention is described with reference to FIG. 6. In FIG. 6, 12 is an emphasis-unsuitable voice detection means, 13 is an emphasis coefficient changing means, 109 is an emphasis suppression/prohibition signal, and 110 is a changed formant emphasis coefficient.

[0056] This embodiment describes the case in which the voice emphasis device of Embodiment 1 is used as the voice emphasis device 200. The remaining elements (the input voice analysis means 1, voice feature parameter variation amount analysis means 2, formant emphasis strength control means 3, formant emphasis coefficient map 4, formant emphasis coefficient determining means 5, formant emphasis means 6, coefficient buffer 7, input voice 101, voice feature parameter 102, voice feature parameter variation amount information 103, formant emphasis temporary coefficient 104, formant emphasis coefficient 105 and output voice 106) are the same as in the preceding embodiment and are not described again.

[0057] Next, the operation is described. The emphasis-unsuitable voice detection means 12 analyzes the voice feature parameters 102 output by the input voice analysis means 1. When emphasis is predicted to distort the output voice, for example because power is concentrated in the low band, a formant bandwidth is narrower than a threshold, linear prediction was truncated at order 2 or lower, or the noise in a speech section is large (for example, when the SN ratio of the input voice to the noise is below 10 dB), it outputs the emphasis suppression/prohibition control signal 109.

[0058] The emphasis coefficient changing means 13 receives the emphasis suppression/prohibition control signal 109. To suppress formant emphasis, it weakens the emphasis by multiplying the formant emphasis coefficient 105 output by the formant emphasis coefficient determining means 5 by a correction factor smaller than 1 (0.8 here); to prohibit formant emphasis, it sets the formant emphasis coefficient to 0. After this change it outputs the formant emphasis coefficient 110. The formant emphasis means 6 then applies formant emphasis to the input voice 101 and outputs the output voice 106.
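The detection of [0057] and the coefficient change of [0058] can be sketched as two small functions: a detector that checks the listed conditions, and a changer that applies the 0.8 correction factor or sets the coefficient to 0. The text does not say which conditions lead to suppression and which to prohibition, so the sketch keeps detection separate from the choice of control value. The low-band power ratio, the bandwidth threshold of 50 Hz, the 0.8 low-band limit and all function names are hypothetical; only the 10 dB SN ratio, the order-2 limit and the factor 0.8 come from the text.

```python
import math

SUPPRESS_FACTOR = 0.8   # correction factor given in the text
SNR_LIMIT_DB = 10.0     # example SN-ratio threshold given in the text


def snr_db(speech_power, noise_power):
    """SN ratio in dB from average powers."""
    return 10.0 * math.log10(speech_power / noise_power)


def emphasis_unsuitable(snr, lpc_order, low_band_ratio, min_bandwidth_hz,
                        low_band_limit=0.8, bandwidth_limit_hz=50.0):
    """True if emphasis is predicted to distort the output voice.
    low_band_limit and bandwidth_limit_hz are hypothetical thresholds."""
    return (low_band_ratio > low_band_limit              # power concentrated in the low band
            or min_bandwidth_hz < bandwidth_limit_hz     # formant bandwidth too narrow
            or lpc_order <= 2                            # prediction truncated at order 2 or lower
            or snr < SNR_LIMIT_DB)                       # too much noise in the speech section


def change_coefficient(coef, control):
    """Apply control signal 109: None keeps coef, 'suppress' scales it, 'prohibit' zeroes it."""
    if control == "suppress":
        return coef * SUPPRESS_FACTOR
    if control == "prohibit":
        return 0.0
    return coef


if __name__ == "__main__":
    noisy = emphasis_unsuitable(snr_db(5.0, 1.0), lpc_order=10,
                                low_band_ratio=0.3, min_bandwidth_hz=120.0)
    print(noisy, change_coefficient(0.6, "suppress" if noisy else None))
```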

[0059] Suppressing or prohibiting formant emphasis prevents the emphasis of voice that could produce abnormal sounds when formant-emphasized, suppresses the emphasis of noise unrelated to the voice, and prevents the distorted impression caused by excessive emphasis.

[0060] Embodiment 5. A fifth embodiment of the present invention is described with reference to FIG. 7. As the block diagram of FIG. 7 shows, in this embodiment the emphasis-unsuitable voice detection means 12 receives not only the voice feature parameters 102 but also other parameters, for example the output voice fed back as an input, and detects excessive emphasis by analyzing them.

[0061] Using the output voice for feedback control of voice emphasis has the advantage that the emphasis can be controlled more precisely.

[0062] Embodiment 6. A sixth embodiment of the present invention is described with reference to FIG. 8. As the block diagram of FIG. 8 shows, in this embodiment an external control signal 111, such as an emphasis-strength changeover switch signal, is input to the emphasis coefficient changing means 13 from outside the voice emphasis device 201 of Embodiment 4, so that the emphasis strength can be controlled externally.

[0063] The new element in FIG. 8 is the external control signal 111. The other elements are the same as in Embodiment 4 and are not described again.

[0064] Controlling emphasis with an external control signal has the advantage that, when the voice emphasis device of the present invention is applied to a telephone, for example, the listener or the talker can freely adjust the emphasis so that the voice is easy for the listener to hear. A further advantage is that the listener can suppress or prohibit voice emphasis on judging it undesirable; conversely, a suppression or prohibition of voice emphasis can also be cancelled.

[0065] Embodiment 7. In any of Embodiments 1 to 4, the power of the current frame of the input voice 101 may, for example, be compared with that of the current frame of the output voice 106, and the output power corrected so that it is normalized to the power level of the input voice.
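The power correction of [0065] can be sketched as a per-frame gain: compute the mean power of the current input and output frames and scale the output by the square root of their ratio, so that the output frame's power equals the input frame's. The function name and the small `eps` guard for silent output frames are assumptions for illustration.

```python
import math


def normalize_power(input_frame, output_frame, eps=1e-12):
    """Scale output_frame so that its mean power equals that of input_frame."""
    p_in = sum(s * s for s in input_frame) / len(input_frame)
    p_out = sum(s * s for s in output_frame) / len(output_frame)
    if p_out < eps:                  # silent output: nothing meaningful to scale
        return list(output_frame)
    gain = math.sqrt(p_in / p_out)   # amplitude gain = sqrt of the power ratio
    return [gain * s for s in output_frame]


if __name__ == "__main__":
    # Emphasis doubled the amplitude; the correction brings it back to the input level.
    print(normalize_power([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]))
```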

[0066] Power correction makes it possible to match the power levels of the input and output voices in, for example, a system that lets the user switch between the input voice and the output voice to compare them, or a voice storage device that can emphasize only part of a voice message; this reduces the sense of unnaturalness when switching. It also suppresses amplitude increases caused by emphasis.

[0067] Embodiment 8. In Embodiments 1, 3 and 4 above, a constant bias may be added to the detected voice feature parameter variation amount information 103 so that it always holds a small value; formant emphasis is then always performed, even when the voice feature parameters do not vary.

[0068] This has the advantage that formant emphasis is applied not only at phoneme transitions but also in steady portions, such as vowels, where the parameters vary little over time.

[0069] Embodiment 9. In Embodiments 2 and 3 above, a constant bias may likewise be added to the detected voice feature parameter variation amount information 103 so that it always holds a small value; pre-emphasis is then performed even when the voice feature parameters do not vary.

[0070] This has the advantage that pre-emphasis is applied not only at phoneme transitions but also in steady portions, such as vowels, where the parameters vary little over time.

[0071] Embodiment 10. The coefficient maps of the above embodiments may be made to adapt to changes in the type of voice, changes in ambient noise, and so on. As the learning method, a technique already well known in the field, such as the LBG algorithm, may be used, or the most suitable map may be selected from among one or more maps. Furthermore, to make the voice easier for the listener to hear, for example when a small amount of emphasis is always wanted even without any variation of the voice feature parameters, any modification such as adding a small constant to all emphasis coefficients of a coefficient map may be made. These coefficient maps may also be changed in real time, independently of the operation of the voice emphasis device.

[0072] Adapting the coefficient maps has the advantage that the device can follow changes in the environment, the talker and so on, and apply the voice emphasis best suited to the voice quality of the input voice. It also enables voice emphasis that is easy for the listener to hear.

[0073]

[Effects of the Invention] Because the present invention is configured as described above, it provides the following effects.
The voice emphasis device of the present invention derives a spectrum emphasis coefficient from the variation amount information of voice feature parameters and spectrally emphasizes the input voice on the basis of this value. It therefore preserves the naturalness of the voice and maintains clarity even where the voice spectrum changes abruptly, such as at phoneme transitions.

[0074] Also, the voice emphasis device of the present invention derives a pre-emphasis coefficient from the variation amount information of voice feature parameters and pre-emphasizes the input voice on the basis of this value. It therefore preserves the naturalness and clarity of the voice even in consonant portions, where the spectrum changes abruptly and the energy is small and often concentrated in the high band.

[0075] Also, the voice emphasis device of the present invention derives both a spectrum emphasis coefficient and a pre-emphasis coefficient from the variation amount information of voice feature parameters and adaptively emphasizes the voice on the basis of both values. In addition to the effects of the voice emphasis devices of the first and second inventions, the synergy of spectrum emphasis and pre-emphasis further emphasizes the spectral peaks in the high band of the voice and improves clarity.

[0076] Also, the voice emphasis device of the present invention detects voice that could produce abnormal sounds when emphasized and suppresses or prohibits emphasis for it, so it can suppress the emphasis of noise unrelated to the voice and prevent the distorted impression caused by excessive emphasis.

[0077] Also, because the voice emphasis device of the present invention controls emphasis using output voice information fed back in addition to the input voice, a high-quality voice output with less unnaturalness can be obtained.

[0078]

Further, because speech enhancement can also be controlled by an external control signal, the enhancement processing is highly flexible.

[0079]

Further, a provisional emphasis coefficient is first obtained from the spectral emphasis coefficient map, the final coefficient value is then determined from past history information, and speech enhancement is performed with this final value. The output therefore sounds natural and is more faithful to the input speech.
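The two-stage coefficient determination in the paragraph above (provisional value from a map, final value from past history) can be sketched as below. The map breakpoints, the piecewise-linear interpolation and the first-order recursive smoothing against the previous frame's coefficient are assumptions chosen for illustration; `SPECTRAL_MAP`, `lookup_map`, `smooth` and `coefficient_track` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical spectral emphasis coefficient map: breakpoints of
# (variation amount, emphasis coefficient), linearly interpolated between.
SPECTRAL_MAP = [(0.0, 0.0), (0.5, 0.4), (1.0, 0.8), (2.0, 1.0)]


def lookup_map(emap, v):
    # Piecewise-linear lookup, clamped at both ends of the map.
    xs = [x for x, _ in emap]
    if v <= xs[0]:
        return emap[0][1]
    if v >= xs[-1]:
        return emap[-1][1]
    i = bisect_right(xs, v)  # xs[i - 1] <= v < xs[i]
    (x0, y0), (x1, y1) = emap[i - 1], emap[i]
    return y0 + (y1 - y0) * (v - x0) / (x1 - x0)


def smooth(provisional, previous, alpha=0.7):
    # First-order recursive smoothing against the previous frame's final
    # coefficient (the value a coefficient buffer would hold).
    return alpha * previous + (1.0 - alpha) * provisional


def coefficient_track(variations, emap=SPECTRAL_MAP, alpha=0.7, initial=0.0):
    # Frame by frame: provisional coefficient from the map, then smoothing.
    out, prev = [], initial
    for v in variations:
        prev = smooth(lookup_map(emap, v), prev, alpha)
        out.append(prev)
    return out
```

With `alpha = 0.7`, a sudden jump in variation raises the coefficient gradually over several frames instead of at once, which is what keeps the emphasis from sounding abrupt.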

[0080]

Further, a provisional pre-emphasis coefficient is first obtained from the pre-emphasis coefficient map, the final coefficient value is then determined from past history information, and pre-emphasis is performed with this final value. The output is therefore highly intelligible even in consonant portions of the input speech, whose energy is low and whose power is often concentrated in the high-frequency range.
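For the pre-emphasis coefficient, a different history-based rule than recursive averaging is also plausible. The sketch below uses a per-frame rate limit: the final coefficient moves toward the provisional map value by at most a fixed step each frame. Both the two-point map and the rate-limit rule are assumptions, not the patent's stated method, and `provisional_mu`, `limit_change` and `mu_track` are hypothetical names.

```python
def provisional_mu(v, v_max=1.0, mu_lo=0.3, mu_hi=0.95):
    # Hypothetical two-point pre-emphasis coefficient map: mu rises linearly
    # from mu_lo (no variation) to mu_hi (variation >= v_max).
    t = min(max(v / v_max, 0.0), 1.0)
    return mu_lo + t * (mu_hi - mu_lo)


def limit_change(provisional, previous, max_step=0.1):
    # History-based rule: the final coefficient moves at most max_step per
    # frame toward the provisional value.
    delta = max(-max_step, min(max_step, provisional - previous))
    return previous + delta


def mu_track(variations, initial=0.3):
    # Frame-by-frame final pre-emphasis coefficients.
    out, prev = [], initial
    for v in variations:
        prev = limit_change(provisional_mu(v), prev)
        out.append(prev)
    return out
```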

[0081]

Furthermore, a constant bias value is always added to the speech feature parameter variation amount information, so the input speech is spectrally emphasized whether or not its feature parameters vary. Spectral emphasis is therefore applied not only in transient portions but also in stationary portions where the speech feature parameters are stable.

[0082]

Furthermore, a constant bias value is always added to the speech feature parameter variation amount information, so the input speech is pre-emphasized whether or not its feature parameters vary. Pre-emphasis is therefore applied not only in transient portions but also in stationary portions where the speech feature parameters are stable.
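The effect of the constant bias in the two paragraphs above can be shown with a small sketch: because the bias is added before the map lookups, a stationary frame (zero variation) still receives nonzero spectral emphasis and pre-emphasis. The proportional maps, their gains and caps, the bias value, and the names `spectral_coef`, `preemph_coef` and `coefficients` are hypothetical.

```python
def spectral_coef(v, gain=0.8, c_max=1.0):
    # Hypothetical spectral emphasis map: proportional, capped at c_max.
    return min(c_max, gain * v)


def preemph_coef(v, gain=0.6, mu_max=0.95):
    # Hypothetical pre-emphasis map: proportional, capped at mu_max.
    return min(mu_max, gain * v)


def coefficients(variation, bias=0.2):
    # Add a constant bias to the variation amount before both map lookups,
    # so a stationary frame (variation == 0) still gets nonzero emphasis.
    v = variation + bias
    return spectral_coef(v), preemph_coef(v)
```

With `bias=0.0`, a stationary frame gets no emphasis at all; with the default bias it gets a small constant amount, while transient frames are still emphasized more strongly.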

[0083]

In addition, because multiple spectral emphasis coefficient maps or pre-emphasis coefficient maps are provided, the device can apply emphasis suited to the voice quality of the input speech and follow changes in speaker or environment.
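Selecting among several coefficient maps, as the paragraph above describes, needs some criterion for the current speaker. The sketch below assumes a running mean of the fundamental frequency (F0) and a 165 Hz threshold to choose between a "low_pitch" and a "high_pitch" map; the criterion, the threshold, the map values and the names `MAPS`, `lookup` and `MapSelector` are all hypothetical and not taken from the patent.

```python
# Hypothetical coefficient maps for two voice classes (values invented for
# illustration): breakpoints of (variation amount, coefficient).
MAPS = {
    "low_pitch": [(0.0, 0.1), (1.0, 0.7)],
    "high_pitch": [(0.0, 0.2), (1.0, 0.9)],
}


def lookup(emap, v):
    # Piecewise-linear lookup, clamped at both ends.
    if v <= emap[0][0]:
        return emap[0][1]
    for (x0, y0), (x1, y1) in zip(emap, emap[1:]):
        if v <= x1:
            return y0 + (y1 - y0) * (v - x0) / (x1 - x0)
    return emap[-1][1]


class MapSelector:
    # Tracks a running mean of F0 and picks a map, so the choice follows a
    # change of speaker. The F0 criterion and threshold are assumptions.
    def __init__(self, maps, threshold_hz=165.0, alpha=0.95):
        self.maps = maps
        self.threshold_hz = threshold_hz
        self.alpha = alpha
        self.mean_f0 = None

    def update(self, f0_hz):
        if f0_hz <= 0.0:  # unvoiced frame: keep the current estimate
            return
        if self.mean_f0 is None:
            self.mean_f0 = f0_hz
        else:
            self.mean_f0 = self.alpha * self.mean_f0 + (1.0 - self.alpha) * f0_hz

    def current_map(self):
        if self.mean_f0 is not None and self.mean_f0 >= self.threshold_hz:
            return self.maps["high_pitch"]
        return self.maps["low_pitch"]
```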

[Brief description of drawings]

FIG. 1 is a block diagram showing Embodiment 1 of the present invention.

FIG. 2 is a diagram showing an example of the formant emphasis coefficient map used in Embodiment 1 of the present invention.

FIG. 3 is a block diagram showing Embodiment 2 of the present invention.

FIG. 4 is a diagram showing an example of the pre-emphasis coefficient map used in Embodiment 2 of the present invention.

FIG. 5 is a block diagram showing Embodiment 3 of the present invention.

FIG. 6 is a block diagram showing Embodiment 4 of the present invention.

FIG. 7 is a block diagram showing Embodiment 5 of the present invention.

FIG. 8 is a block diagram showing Embodiment 6 of the present invention.

[Explanation of symbols]

1 Input speech analysis means
2 Speech feature parameter variation amount analysis means
3 Formant emphasis strength control means
4 Formant emphasis coefficient map
5 Formant emphasis coefficient determination means
6 Formant emphasis means
7 Coefficient buffer
8 Pre-emphasis coefficient control means
9 Pre-emphasis coefficient determination means
10 Pre-emphasis coefficient map
11 Pre-emphasis means
12 Emphasis-incompatible speech detection means
13 Emphasis coefficient changing means
101 Input speech
102 Speech feature parameter
103 Speech feature parameter variation amount information
104 Provisional formant emphasis coefficient
105 Formant emphasis coefficient
106 Output speech
107 Provisional pre-emphasis coefficient
108 Pre-emphasis coefficient
109 Emphasis suppression/inhibition signal
110 Modified formant emphasis coefficient
111 External control signal
112 Linear prediction coefficients
113 Formant emphasis coefficient of the previous frame
114 Pre-emphasis coefficient of the previous frame
200 Speech enhancement device described in Embodiment 1
201 Speech enhancement device described in Embodiment 4

Claims


1. A speech enhancement device comprising: speech feature parameter variation amount analysis means for analyzing the degree of temporal variation of a speech feature parameter of input speech divided into short time intervals; spectral emphasis strength control means for determining a spectral emphasis strength in response to the speech feature parameter variation amount information output by the speech feature parameter variation amount analysis means and outputting a spectral emphasis coefficient; and spectral emphasis means for performing spectral emphasis of the input speech according to the spectral emphasis coefficient output by the spectral emphasis strength control means.

2. A speech enhancement device comprising: speech feature parameter variation amount analysis means for analyzing the degree of temporal variation of a speech feature parameter of input speech divided into short time intervals; pre-emphasis coefficient control means for determining a pre-emphasis coefficient in response to the speech feature parameter variation amount information output by the speech feature parameter variation amount analysis means and outputting the pre-emphasis coefficient; and pre-emphasis means for pre-emphasizing the input speech according to the pre-emphasis coefficient output by the pre-emphasis coefficient control means.

3. A speech enhancement device comprising: speech feature parameter variation amount analysis means for analyzing the degree of temporal variation of a speech feature parameter of input speech divided into short time intervals; spectral emphasis strength control means for determining a spectral emphasis strength in response to first speech feature parameter variation amount information output by the speech feature parameter variation amount analysis means and outputting a spectral emphasis coefficient; spectral emphasis means for performing spectral emphasis of the input speech according to the spectral emphasis coefficient output by the spectral emphasis strength control means; pre-emphasis coefficient control means for determining a pre-emphasis coefficient in response to second speech feature parameter variation amount information output by the speech feature parameter variation amount analysis means and outputting the pre-emphasis coefficient; and pre-emphasis means for pre-emphasizing the input speech according to the pre-emphasis coefficient output by the pre-emphasis coefficient control means.

4. The speech enhancement device according to claim 1, further comprising: emphasis-incompatible speech detection means that receives the speech feature parameters of input speech divided into short time periods and outputs an emphasis suppression signal when distortion is expected in the output speech; and emphasis coefficient changing means for correcting the spectral emphasis coefficient based on the emphasis suppression signal, wherein the spectral emphasis means performs spectral emphasis of the input speech using the corrected spectral emphasis coefficient.

5. The speech enhancement device according to claim 4, wherein the emphasis-incompatible speech detection means controls the emphasis suppression signal using the speech feature parameter information of the output speech in addition to the speech feature parameter information of the input speech.

6. The speech enhancement device according to claim 4, wherein the emphasis coefficient changing means determines the spectral emphasis coefficient using an external control signal in addition to the emphasis suppression signal.

7. The speech enhancement device according to claim 1, further comprising: a spectral emphasis coefficient map that defines the relationship between the speech feature parameter variation amount information and the spectral emphasis coefficient; and spectral emphasis coefficient determination means for calculating the spectral emphasis coefficient from a provisional spectral emphasis coefficient, wherein the spectral emphasis strength control means outputs a value derived from the spectral emphasis coefficient map based on the speech feature parameter variation amount information to the spectral emphasis coefficient determination means as the provisional spectral emphasis coefficient, and the spectral emphasis coefficient determination means calculates from the provisional coefficient a spectral emphasis coefficient determined by smoothing over the preceding and following speech frames and outputs it to the spectral emphasis means.

8. The speech enhancement device according to claim 2, further comprising: a pre-emphasis coefficient map that defines the relationship between the speech feature parameter variation amount information and the pre-emphasis coefficient; and pre-emphasis coefficient determination means for calculating the pre-emphasis coefficient from a provisional pre-emphasis coefficient, wherein the pre-emphasis coefficient control means outputs a value derived from the pre-emphasis coefficient map based on the speech feature parameter variation amount information to the pre-emphasis coefficient determination means as the provisional pre-emphasis coefficient, and the pre-emphasis coefficient determination means calculates from the provisional coefficient a pre-emphasis coefficient determined by smoothing over the preceding and following speech frames and outputs it to the pre-emphasis means.

9. The speech enhancement device according to any one of claims 1 and 3 to 7, wherein a constant bias value is always added to the speech feature parameter variation amount information so that the input speech is always spectrally emphasized regardless of whether the feature parameters of the input speech vary.

10. The speech enhancement device according to claim 2, 3, or 8, wherein a constant bias value is always added to the speech feature parameter variation amount information so that the input speech is always pre-emphasized regardless of whether the feature parameters of the input speech vary.

11. The speech enhancement device according to claim 7, wherein a plurality of the spectral emphasis coefficient maps and the pre-emphasis coefficient maps are provided.