JP3353994B2

JP3353994B2 - Noise-suppressed speech analyzer, noise-suppressed speech synthesizer, and speech transmission system

Info

Publication number: JP3353994B2
Application number: JP03718594A
Authority: JP
Inventors: 文啓松岡; 純石井; 裕久田崎; 宏一白木; 訓古田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-03-08
Filing date: 1994-03-08
Publication date: 2002-12-09
Anticipated expiration: 2017-12-09
Also published as: JPH07248793A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a noise processing system that suppresses environmental noise which is superimposed on a speech signal and is not intended for storage or transmission.

[0002]

2. Description of the Related Art One noise processing method for suppressing environmental noise that is superimposed on a speech signal and is not intended for storage or transmission is described in Reference 1: Steven F. Boll, "Suppression of Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 113-120, Apr. 1979. This method estimates the noise spectrum from noise sections, that is, sections not intended for storage or transmission, and suppresses noise by subtracting that estimate from the power spectrum of every section.

[0003] FIG. 11 is a block diagram showing an example configuration of the noise processing system described in Reference 1. In FIG. 11, 2 is a voiced/noise determination means, 3 is a spectrum analysis means, 4 is a spectrum subtraction means, 5 is an average noise power spectrum calculation means, 101 is a speech signal, 102 is voiced/noise determination information, 103 is a spectrum, 104 is a noise-removed power spectrum, and 105 is an average noise power spectrum.

[0004] The operation of this example of the conventional noise processing system is described below with reference to FIG. 11. The voiced/noise determination means 2 divides the input speech signal 101 into analysis frames of a predetermined length, determines whether each analysis frame is a voiced section, that is, a speech section to be stored or transmitted, or a noise section, that is, a section not to be stored or transmitted, and outputs the result as voiced/noise determination information 102. Meanwhile, the spectrum analysis means 3 analyzes the input speech signal 101 frame by frame and outputs the power spectrum 103 of each analysis frame. When the voiced/noise determination information 102 indicates a noise frame, the average noise power spectrum calculation means 5 updates the average noise power spectrum 105 using the power spectrum 103 of that noise frame, obtained by the spectrum analysis means 3, and the past average noise power spectrum 105, and outputs the updated average noise power spectrum 105. The spectrum subtraction means 4 then calculates and outputs the noise-removed power spectrum 104 by subtracting the average noise power spectrum 105 from the power spectrum 103. When the voiced/noise determination information 102 indicates a voiced frame, the spectrum subtraction means 4 subtracts, from the power spectrum 103 of that voiced frame, the average noise power spectrum 105 accumulated up to the most recent noise frame before the current frame, multiplied by a gain that is fixed and constant over the entire band, and outputs the resulting noise-removed power spectrum 104.
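The conventional processing described above can be summarized in equation form. The symbols are assumptions introduced here for illustration and do not appear in the source: $P_t(k)$ is the power spectrum 103 of frame $t$ at frequency bin $k$, $\bar{N}_t(k)$ is the average noise power spectrum 105, $S_t(k)$ is the noise-removed power spectrum 104, and $g$ is the fixed gain. The source does not state the averaging rule, so the recursive average with smoothing constant $\lambda$ is only one plausible choice.

```latex
\begin{align*}
\bar{N}_t(k) &= \lambda\,\bar{N}_{t-1}(k) + (1-\lambda)\,P_t(k)
  && \text{(noise frame: update the average)}\\
S_t(k) &= P_t(k) - \bar{N}_t(k)
  && \text{(noise frame)}\\
S_t(k) &= P_t(k) - g\,\bar{N}_{t'}(k)
  && \text{(voiced frame; } t' \text{ is the most recent noise frame)}
\end{align*}
```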

[0005]

[Problems to Be Solved by the Invention] In the conventional noise processing system shown in FIG. 11, the noise-removed power spectrum 104 is obtained by subtracting the average noise power spectrum 105 from the power spectrum 103 of each frame on which noise not intended for storage or transmission is superimposed, and it is used as if it were the spectrum of the speech information actually intended for storage or transmission. This method can remove the noise superimposed on the speech signal 101 when the noise is stationary. In general, however, noise is non-stationary, so it is difficult for the average noise power spectrum calculation means 5 to estimate a stable average noise power spectrum 105. The non-stationary part of the noise remains as a subtraction error in the noise-removed power spectrum 104 obtained by the spectrum subtraction means 4, and as a result speech synthesized from the noise-removed power spectrum 104 sounds, perceptually, noisier rather than cleaner.

[0006] In addition, because the average noise power spectrum 105 is always subtracted at a subtraction rate that is fixed and constant over the entire band, even when the average noise power spectrum 105 is estimated well, over-subtraction in some frames or bands can severely distort the noise-removed power spectrum or introduce subtraction errors. Furthermore, when the noise-removed power spectrum 104 is used as the transmitted spectrum information of an analysis-synthesis speech coding system intended for storage or transmission, the simple modeling of the analysis-synthesis system cannot reproduce the subtraction error accurately, so the subtraction error is distorted and produces unnatural noise.

[0007] FIGS. 12 to 15 are explanatory diagrams that use the time course of the spectral envelope to show how the subtraction error is distorted when the conventional example of FIG. 11 is used together with such an analysis-synthesis speech coding system (here, a harmonic coder). A speech signal was prepared by superimposing stationary white noise (SNR 5 dB), as in FIG. 13, on the original speech signal of FIG. 12. FIG. 14 shows an example of the synthesized speech obtained by encoding and decoding this signal with the harmonic coder without the conventional preprocessing, and FIG. 15 shows an example obtained by applying the preprocessing of the conventional configuration of FIG. 11 and then encoding and decoding with the harmonic coder. In FIG. 15, the conventional preprocessing has removed the superimposed noise to some extent, but small-amplitude spectral envelope peaks appear discontinuously from frame to frame, particularly in voiced sections. Because no such discontinuity is seen in the synthesized speech of FIG. 14, where no preprocessing is applied, the subtraction error is thought to have been distorted by incorrect modeling. In FIG. 14, however, although the envelope peaks are not as severely discontinuous as in FIG. 15, the features of the original speech signal are buried in noise and cannot be said to be reproduced accurately. Thus, although the conventional noise processing system is more effective than using no noise processing, the discontinuity of the envelope peaks caused by the distortion of the subtraction error is heard as unnatural noise, which makes the system impractical.

[0008] The present invention has been made to solve these problems. One object is to obtain a noise-suppressed speech analyzer and a noise-suppressed speech synthesizer that perceptually reduce the noisiness emphasized by the subtraction error. Another object is to obtain a noise-suppressed speech analyzer, a noise-suppressed speech synthesizer, and a speech transmission system that achieve good noise suppression by subtracting noise in a way that is less prone to spectral distortion from over-subtraction and to the subtraction error.

[0009]

[Means for Solving the Problems] A noise-suppressed speech analyzer according to the present invention comprises: spectrum analysis means for spectrally analyzing an input speech signal in units of predetermined analysis frames to obtain a power spectrum; average noise power spectrum calculation means for obtaining an average noise power spectrum over a specified number of frames for the noise section frames among the analysis frames; and transmission spectrum selection and transmission means for selecting between a noise-removed power spectrum, obtained by subtracting the average noise power spectrum from the output of the spectrum analysis means, and the average noise power spectrum, and transmitting the selected spectrum as a transmission spectrum.

[0010]

[0011] Furthermore, in noise frame sections, the instantaneous noise power spectrum of the noise frame section is transmitted as the transmitted power spectrum.

[0012] A noise-suppressed speech synthesizer according to the present invention comprises: average noise power spectrum holding means that, where an input speech signal is divided into analysis frames of a predetermined length and the analysis frames are classified into voiced section frames and noise section frames, stores the average noise power spectrum obtained by spectral analysis of the noise section frames; superimposed noise synthesis means for generating synthesized sound from the average noise power spectrum; voiced section synthesized sound output means that, when the input signal is a voiced section, generates and outputs the synthesized sound of the voiced section by superimposing the speech signal of the input voiced section and the synthesized sound output by the superimposed noise synthesis means, with a predetermined superposition factor applied; and superposition factor control means that calculates and controls the superposition factor from the outputs of the voiced section synthesized sound output means and the superimposed noise synthesis means and, when the input signal is a noise section, controls the output so that the output of the superimposed noise synthesis means multiplied by the superposition factor is output as the synthesized sound of the noise section.

[0013] Alternatively, the synthesizer comprises: average noise power spectrum holding means that, where an input speech signal is divided into analysis frames of a predetermined length and the analysis frames are classified into voiced section frames and noise section frames, stores the average noise power spectrum obtained by spectral analysis of the noise section frames; superimposed noise synthesis means for generating synthesized sound from the average noise power spectrum; band-wise superposition factor control means that divides the spectrum of a voiced section frame into predetermined frequency bands and controls the superposition factor for each band; and voiced section synthesized sound output means that, when the input signal is a voiced section, generates and outputs the synthesized sound of the voiced section by superimposing the speech signal of the input voiced section and the synthesized sound output by the superimposed noise synthesis means, with the superposition factors controlled by the band-wise superposition factor control means applied.

[0014] Furthermore, when instantaneous noise power spectra are transmitted, the average noise power spectrum holding means averages them over a specified number of frames and stores the result as the average noise power spectrum, and in noise section frames the output is generated from either the transmitted instantaneous noise power spectrum or the average noise power spectrum.

[0015] Furthermore, the analyzer comprises spectrum subtraction means for obtaining the noise-removed power spectrum by subtracting, from the output of the spectrum analysis means, the output of the average noise power spectrum calculation means multiplied by a subtraction rate, and subtraction rate calculation means for determining the subtraction rate from the value of the output of the spectrum analysis means, and the noise-removed power spectrum is transmitted in voiced frame sections.

[0016] Furthermore, the subtraction rate calculation means calculates a subtraction rate for each frequency band of the output of the spectrum analysis means in voiced sections, and the spectrum subtraction means subtracts the average noise power spectrum using these band-wise subtraction rates.

[0017] Furthermore, the subtraction rate calculation means monitors the noise-removed power spectrum output in voiced sections and, when it falls to or below a predetermined threshold, causes the predetermined threshold to be output as the noise-removed power spectrum.
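As a rough illustration of band-wise subtraction combined with the threshold, the following Python sketch subtracts the average noise power spectrum with one rate per band and clamps each result to a floor. The function name, the equal-width band layout, and the idea of passing the rates in as a list are assumptions made for this example; the text above does not say how the rates themselves are computed.

```python
def subtract_bandwise(frame_power, avg_noise, band_rates, floor):
    """Subtract avg_noise from frame_power with one rate per band, then floor.

    Hypothetical helper: bins are split into len(band_rates) equal-width bands
    (an assumption), and any result at or below `floor` is replaced by `floor`.
    """
    width = len(frame_power) // len(band_rates)
    result = []
    for k, (p, n) in enumerate(zip(frame_power, avg_noise)):
        # Leftover bins at the top of the spectrum reuse the last band's rate.
        band = min(k // width, len(band_rates) - 1)
        result.append(max(p - band_rates[band] * n, floor))
    return result
```

A floor of this kind keeps the transmitted spectrum from going negative or collapsing to zero in bins where the subtraction overshoots.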

[0018] A speech transmission system according to the present invention comprises a noise-suppressed speech analyzer and a noise-suppressed speech synthesizer. The noise-suppressed speech analyzer comprises: spectrum analysis means for spectrally analyzing an input speech signal in units of predetermined analysis frames to obtain a power spectrum; average noise power spectrum calculation means for obtaining an average noise power spectrum over a specified number of frames for the noise section frames among the analysis frames; and transmission spectrum selection and transmission means for selecting between a noise-removed power spectrum, obtained by subtracting the average noise power spectrum from the output of the spectrum analysis means, and the average noise power spectrum, and transmitting the selected spectrum as a transmission spectrum. The noise-suppressed speech synthesizer comprises: average noise power spectrum holding means that, for the noise section frames of the signal transmitted for each analysis frame, stores the average noise power spectrum corresponding to the transmitted spectrum; superimposed noise synthesis means for generating synthesized sound from the average noise power spectrum; and voiced section synthesized sound output means that, when the input signal is a voiced section, generates the synthesized sound of the voiced section by superimposing the speech signal of the input voiced section and the synthesized sound output by the superimposed noise synthesis means.

[0019] Furthermore, the devices in the speech transmission system are a noise-suppressed speech analyzer that obtains the noise-removed power spectrum by subtracting the average noise power spectrum at a variable subtraction rate, and a noise-suppressed speech synthesizer that generates the synthesized sound of the voiced section by superimposing, on the synthesized sound of the input voiced section, the output of the superimposed noise synthesis means multiplied by a variable superposition factor.

[0020]

[Operation] The noise-suppressed speech analyzer according to the present invention includes transmission spectrum selection and transmission means, so that either the noise-removed power spectrum or the average noise power spectrum is selected and transmitted.

[0021]

[0022] Furthermore, the instantaneous noise power spectrum of each noise frame section is transmitted as the noise power spectrum to be transmitted.

[0023]

[0024] Furthermore, the synthesized output sound of a voiced section is obtained by superimposing, at a certain factor, sound synthesized from the noise power spectrum onto the transmitted power spectrum corresponding to the voiced sound, and in noise frame sections the synthesized sound is obtained by multiplying the average noise power spectrum by a certain factor.

[0025] Furthermore, for the synthesized sound output of a voiced section, a superposition factor is calculated for each spectrum segment obtained by dividing the transmitted voiced sound by frequency band, and the output is obtained by superimposing sound synthesized from the noise power spectrum onto the power spectrum corresponding to the voiced sound at the factor calculated for each segment.

[0026] Furthermore, the noise-removed power spectrum obtained by subtracting the average noise spectrum from the power spectrum of the voiced section frame at a variable subtraction rate is transmitted as the power spectrum of the voiced section.

[0027] Furthermore, a subtraction rate is calculated for each spectrum segment obtained by dividing the voiced sound of the voiced section frame by frequency band, and the noise-removed power spectrum obtained by subtracting the average noise spectrum from the power spectrum of the voiced section frame at the rate calculated for each segment is transmitted as the power spectrum of the voiced section.

[0028] Furthermore, the average noise spectrum is subtracted from the power spectrum of the voiced section frame at a variable subtraction rate, but the subtraction is performed so that the resulting noise-removed power spectrum stays at or above a fixed threshold, and this noise-removed power spectrum is transmitted as the power spectrum of the voiced section.

[0029] In the speech transmission system according to the present invention, the noise-suppressed speech analyzer on the transmitting side also transmits the noise power spectrum, and the noise-suppressed speech synthesizer on the receiving side outputs, as the synthesized sound of a voiced section, the voiced sound synthesized from the noise-removed power spectrum with sound synthesized from the average noise power spectrum partially superimposed on it.

[0030] Furthermore, the transmitting side generates and transmits a noise-removed power spectrum obtained by subtracting the average noise power spectrum at a variable rate, and the receiving side outputs the input voiced sound with sound synthesized from the noise power spectrum superimposed on it at a variable factor.

[0031]

[Embodiments]

Embodiment 1. This embodiment is characterized in that the analyzer on the transmitting side selects and transmits the average noise spectrum as the signal for noise sections, and the synthesizer on the receiving side partially superimposes the average noise spectrum on the synthesized sound of voiced sections before output. FIG. 1 is a block diagram of an embodiment of the noise-suppressed speech synthesizer and noise-suppressed speech analyzer according to the present invention. The new parts in the figure are the transmission spectrum selection means 6, information transmission means 7, information reception means 8, average noise power spectrum holding means 9, superimposed noise synthesis means 11, noise section synthesized sound output means 12, and voiced section synthesized sound output means 13. In addition, 106 is transmission spectrum information, 105 is the average noise power spectrum, 111 is superimposed noise, 112 is output speech, and 200 is a transmission path.

[0032] The operation of the embodiment of the speech analyzer and speech synthesizer shown in FIG. 1 is described below. The voiced/noise determination means 2 divides the input speech signal 101, which has been sampled at a predetermined sampling rate (here, 8000 Hz), into analysis frames of a predetermined length (here, 20 ms), determines whether each analysis frame is a voiced section, that is, a section to be stored or transmitted, or a noise section, that is, a section not to be stored or transmitted, and outputs the result as voiced/noise determination information 102. The voiced/noise determination means 2 is built using a method employed in known speech coding systems. At the same time, the spectrum analysis means 3 performs spectral analysis of the input speech signal 101 for each analysis frame. Here, the spectrum is analyzed with a 256-point FFT (fast Fourier transform) centered on the analysis frame, the square of the amplitude of each spectral component is calculated, and the power spectrum 103 is output.
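The framing and spectral analysis step can be sketched in Python as follows. The 8000 Hz sampling rate, 20 ms frame, and 256-point FFT come from the text; everything else is an assumption made for illustration, including the function names, the zero padding at the signal edges, the absence of a window function, and keeping only the 129 non-negative-frequency bins.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000                           # from the embodiment
FRAME_MS = 20                                   # from the embodiment
FFT_SIZE = 256                                  # from the embodiment
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def frame_window(signal, frame_index):
    """FFT_SIZE samples centered on the given frame (zero-padded at the edges)."""
    center = frame_index * FRAME_LEN + FRAME_LEN // 2
    start = center - FFT_SIZE // 2
    return [signal[i] if 0 <= i < len(signal) else 0.0
            for i in range(start, start + FFT_SIZE)]


def power_spectrum(signal, frame_index):
    """Squared magnitude of each FFT bin from 0 Hz up to the Nyquist frequency."""
    spectrum = fft(frame_window(signal, frame_index))
    return [abs(c) ** 2 for c in spectrum[: FFT_SIZE // 2 + 1]]
```

Because the 256-sample analysis span is longer than the 160-sample frame, consecutive analysis spans overlap by 96 samples in this sketch.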

[0033] When the voiced/noise determination information 102 indicates a noise frame, the average noise power spectrum calculation means 5 successively averages the power spectrum 103 of that noise frame, obtained by the spectrum analysis means 3, with the past average noise power spectrum 105 held in its buffer, updates the average noise power spectrum 105 in the buffer, and then outputs the updated average noise power spectrum 105. When the voiced/noise determination information 102 indicates a voiced frame, the spectrum subtraction means 4 calculates and outputs the noise-removed power spectrum 104 by subtracting, from the power spectrum 103 of that voiced frame, the average noise power spectrum 105 held in the buffer of the average noise power spectrum calculation means 5 multiplied by a predetermined fixed gain of about 1.0. This is computed as the difference between corresponding spectral values.
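A minimal Python sketch of the noise averaging and subtraction might look like this. The text only says the average is computed successively, so the running mean over all noise frames seen so far is an assumed averaging rule, and both function names are hypothetical.

```python
def update_average_noise(avg_noise, frame_power, count):
    """Fold one noise frame into a running mean (assumed averaging rule).

    `count` is the number of noise frames already averaged.
    Returns the new average and the new count.
    """
    new_count = count + 1
    new_avg = [a + (p - a) / new_count for a, p in zip(avg_noise, frame_power)]
    return new_avg, new_count


def subtract_noise(frame_power, avg_noise, gain=1.0):
    """Bin-by-bin difference: frame power minus gain times average noise."""
    return [p - gain * n for p, n in zip(frame_power, avg_noise)]
```

Note that a plain difference can go negative in bins where the noise estimate exceeds the frame power; this sketch leaves such values untouched.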

[0034] When the voiced/noise determination information 102 indicates a noise frame, the transmission spectrum selection means 6 switches to the average noise power spectrum 105 that has just been calculated by the average noise power spectrum calculation means 5, updated, and stored in the buffer, and outputs it as the transmission spectrum information 106. When the voiced/noise determination information 102 indicates a voiced frame, it selects the noise-removed power spectrum information 104 calculated by the spectrum subtraction means 4 and outputs it as the transmission spectrum information 106. The information transmission means 7 encodes or modulates the voiced/noise determination information 102 and the transmission spectrum information 106 of the current frame to suit the transmission format of the transmission path 200 and transmits them over the transmission path 200.
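The selection logic amounts to a two-way switch driven by the voiced/noise flag. The Python sketch below shows one way to express it; the `TransmittedFrame` type and the function name are hypothetical, and the encoding or modulation for the transmission path is deliberately left out.

```python
from typing import List, NamedTuple


class TransmittedFrame(NamedTuple):
    """Per-frame payload: determination info (102) plus spectrum info (106)."""
    is_voiced: bool
    spectrum: List[float]


def select_transmission_spectrum(is_voiced, noise_removed, avg_noise):
    """Pick the noise-removed spectrum for voiced frames, else the noise average."""
    chosen = noise_removed if is_voiced else avg_noise
    return TransmittedFrame(is_voiced, list(chosen))
```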

[0035] In the speech synthesizer on the receiving side, the information reception means 8 receives the determination information 102 and the transmission spectrum information 106 from the transmission path 200. After demodulating or decoding the received information, the information reception means 8 outputs the received voiced/noise determination information 102 and transmission spectrum information 106. In this embodiment, a harmonic coder is used as the encoding and decoding scheme for the voiced/noise determination information 102 and the transmission spectrum information 106, and a wireless channel is used as the transmission path 200.

[0036] When the voiced/noise determination information 102 received by the information reception means 8 indicates a noise frame, the average noise power spectrum holding means 9 replaces the transmission spectrum information 106 it has held until then with the newly received transmission spectrum information 106 and holds it as the average noise power spectrum 105. Its components include a buffer that stores one frame's worth of the average noise power spectrum 105.
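On the receiving side, the holding means behaves like a one-frame buffer that is overwritten only on noise frames. A small Python sketch, with a hypothetical class name and a zero-initialized buffer as an assumption:

```python
class AverageNoiseHolder:
    """One-frame buffer for the received average noise power spectrum (105)."""

    def __init__(self, n_bins):
        # Assumption: start from an all-zero spectrum until a noise frame arrives.
        self.avg_noise = [0.0] * n_bins

    def receive(self, is_voiced, spectrum):
        """Overwrite the buffer on noise frames; keep it unchanged on voiced frames."""
        if not is_voiced:
            self.avg_noise = list(spectrum)
        return self.avg_noise
```

Voiced frames therefore always see the noise spectrum from the most recent noise frame.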

【００３７】重畳雑音合成手段１１は、平均雑音パワー
スペクトル１０５を用いて重畳雑音１１１を作成し、情
報受信手段８で受信された有音／雑音判定情報１０２が
雑音フレームの場合、重畳雑音１１１の振幅を所定の減
衰率（ここでは０．８）で減衰したものを、出力合成音
１１２として出力する。一方、情報受信手段８で受信さ
れた有音／雑音判定情報１０２が有音フレームの場合、
重畳雑音１１１を、後述する有音区間合成音出力手段１
３に出力する。本実施例では平均雑音パワースペクトル
１０５を６４の帯域（サブバンド）に分割し、サブバン
ドの１／２の帯域幅をもつガウス性雑音を、各サブバン
ドの中心周波数でＡＭ変調し、別に求めた平均雑音パワ
ースペクトル１０５のサブバンドのパワー値（サブバン
ド内の各平均雑音パワースペクトル値の和）の平方根よ
り求められたサブバンドの振幅値を乗じてこの重畳雑音
１１１を生成した。これはハーモニックコーダーの無声
音の合成方法と同様のものであり、例えば、H. Carl &
B. Kolpatzik著、"Speech Coding Using Nonstationary
Sinusoidal Modelling and Narrow-Band Basis Functi
ons",(IEEE Int. Conf. Rec. on ASSP(1991)pp581-584)
に記載されている。The superimposed noise synthesizing means 11 generates superimposed noise 111 from the average noise power spectrum 105. When the voiced/noise determination information 102 received by the information receiving means 8 indicates a noise frame, it attenuates the amplitude of the superimposed noise 111 by a predetermined attenuation factor (here, 0.8) and outputs the result as the output synthesized sound 112. When the information indicates a voiced frame, it outputs the superimposed noise 111 to the voiced-section synthesized sound output means 13 described later. In this embodiment, the average noise power spectrum 105 is divided into 64 bands (sub-bands). Gaussian noise with a bandwidth of half the sub-band width is AM-modulated at the center frequency of each sub-band and multiplied by the sub-band amplitude, which is obtained as the square root of the separately computed sub-band power (the sum of the average noise power spectrum values within the sub-band), to generate the superimposed noise 111. This is similar to the unvoiced-sound synthesis method of a harmonic coder, described for example in H. Carl & B. Kolpatzik, "Speech Coding Using Nonstationary Sinusoidal Modelling and Narrow-Band Basis Functions" (IEEE Int. Conf. Rec. on ASSP (1991), pp. 581-584).
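
The sub-band noise generation described in this paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the 8 kHz sample rate, and the moving-average smoothing used as a crude stand-in for the band-limited Gaussian noise are all hypothetical, and the smoothing does not reproduce the half-sub-band bandwidth exactly.

```python
import math
import random


def subband_amplitudes(power_spectrum, n_bands=64):
    """Return sqrt(sum of power values) for each of n_bands equal sub-bands."""
    if len(power_spectrum) % n_bands != 0:
        raise ValueError("spectrum length must be a multiple of n_bands")
    width = len(power_spectrum) // n_bands
    return [math.sqrt(sum(power_spectrum[b * width:(b + 1) * width]))
            for b in range(n_bands)]


def synthesize_noise(amplitudes, n_samples, sample_rate=8000.0, smooth=8, seed=0):
    """Sum of sub-band carriers, each modulated by slowly varying Gaussian noise.

    sample_rate, smooth and seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    band_hz = (sample_rate / 2.0) / len(amplitudes)
    out = [0.0] * n_samples
    for b, amp in enumerate(amplitudes):
        centre = (b + 0.5) * band_hz  # centre frequency of sub-band b
        white = [rng.gauss(0.0, 1.0) for _ in range(n_samples + smooth - 1)]
        for n in range(n_samples):
            # Moving average as a crude low-pass filter (assumption).
            envelope = sum(white[n:n + smooth]) / smooth
            carrier = math.cos(2.0 * math.pi * centre * n / sample_rate)
            out[n] += amp * envelope * carrier
    return out
```

For a noise frame, the resulting samples would then be scaled by the attenuation factor (0.8 in this embodiment) before output.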

【００３８】有音区間合成音出力手段１３は、情報受信
手段８で受信された有音／雑音判定情報１０２が有音フ
レームの場合、まず当該フレームの伝送スペクトル情報
１０６、即ち雑音引き去りパワースペクトル１０４を用
いて、ハーモニックコーダーの合成方法を用いて合成音
を合成する。具体的には、雑音引き去りパワースペクト
ル１０４を６４のサブバンドに分割し、各サブバンドの
中心周波数の正弦波に、サブバンドの振幅値を乗じて生
成する。次に重畳雑音１１１に所定の重畳倍率（ここで
は０．５）倍したものを重畳させ、出力合成音１１２と
して出力する。When the voiced/noise determination information 102 received by the information receiving means 8 indicates a voiced frame, the voiced-section synthesized sound output means 13 first synthesizes a sound from the transmission spectrum information 106 of that frame, that is, the noise-subtracted power spectrum 104, using the synthesis method of a harmonic coder. Specifically, the noise-subtracted power spectrum 104 is divided into 64 sub-bands, and the sound is generated by multiplying a sine wave at the center frequency of each sub-band by the amplitude of that sub-band. The superimposed noise 111, scaled by a predetermined superimposition factor (here, 0.5), is then superimposed on this synthesized sound, and the result is output as the output synthesized sound 112.
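
As a rough illustration of the two output paths in Embodiment 1, the sketch below sums one sinusoid per sub-band and then either mixes in scaled noise (voiced frame) or outputs attenuated noise (noise frame). The function names and the sample rate are assumptions for this sketch; only the constants 0.8 and 0.5 come from the text.

```python
import math


def sinusoid_synthesis(amplitudes, n_samples, sample_rate=8000.0):
    """One sine wave per sub-band at its centre frequency, scaled by its amplitude."""
    band_hz = (sample_rate / 2.0) / len(amplitudes)
    return [
        sum(a * math.sin(2.0 * math.pi * (b + 0.5) * band_hz * n / sample_rate)
            for b, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


def output_frame(is_voiced, voiced_synth, superimposed_noise,
                 attenuation=0.8, factor=0.5):
    """Voiced frame: synthesized sound plus factor * noise.
    Noise frame: attenuated noise only."""
    if is_voiced:
        return [s + factor * v for s, v in zip(voiced_synth, superimposed_noise)]
    return [attenuation * v for v in superimposed_noise]
```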

【００３９】この例のように、スペクトルの引き去りに
用いた、平均雑音パワースペクトルを用いて生成した雑
音を、振幅を抑えて再度重畳させる構成をとることによ
り、引き去り誤差による、不連続な包絡ピークが埋め戻
され、連続性が増し、聴覚的なマスク効果により、不快
な雑音感を軽減する効果がある。As in this example, noise generated from the average noise power spectrum that was used for the spectral subtraction is superimposed again at reduced amplitude. This fills in the discontinuous envelope peaks caused by subtraction errors, increases continuity, and, through auditory masking, reduces the unpleasant noisy impression.

【００４０】図２は、実施例１の音声分析装置と音声合
成装置による出力を、ハーモニックコーダーの前処理及
び後処理に用いて符号化し復号した場合の合成音のスペ
クトル包絡の時間推移を説明する説明図である。図２を
観察すると、実施例１の音声分析装置と音声合成装置に
よる出力を、前処理及び後処理に用いて符号化し復号し
た場合の合成音は、図１５に見られたような小振幅の包
絡ピークの時間的な不連続は、重畳した雑音によりマス
クされている様子が分かる。また、図１４、図１５とそ
れに対する図２で示される合成音に対して、音声研究者
６名により、音質の好ましさを基準とする簡単な対比較
検査を行ったところ、選択率がそれぞれ１０．７％，４
２．９％，９６．４％となり、重畳雑音によるマスク効
果が、聴覚上良好に機能していることが分かった。FIG. 2 illustrates the time evolution of the spectral envelope of the synthesized sound obtained when the speech analyzer and speech synthesizer of Embodiment 1 are used as pre-processing and post-processing for a harmonic coder and the signal is encoded and decoded. FIG. 2 shows that, in this synthesized sound, the temporal discontinuities of small-amplitude envelope peaks seen in FIG. 15 are masked by the superimposed noise. In addition, six speech researchers carried out a simple paired-comparison test, judged on preferred sound quality, of the synthesized sounds of FIG. 14, FIG. 15, and the corresponding FIG. 2. The selection rates were 10.7%, 42.9%, and 96.4%, respectively, indicating that masking by the superimposed noise works well perceptually.

【００４１】実施例２．本実施例では、送信側の雑音パ
ワースペクトルとして平均ではなく、瞬時値を送る。一
方、受信側では、雑音区間ではこのまま受信値を合成し
て出力し、また有音区間では受信値を蓄積して平均雑音
化して有音区間の加算源とする。図３は、本実施例の音
声分析装置と音声合成装置のブロック図である。図中新
規な部分は、１４の受信側平均雑音パワースペクトル算
出手段である。その他の伝送スペクトル選択手段６、情
報伝送手段７、情報受信手段８、重畳雑音合成手段１
１、有音区間合成音出力手段１３は実施例１と同じであ
り、説明を省略する。平均雑音パワースペクトル算出手
段１４の構成は、送信側の平均雑音パワースペクトル算
出手段５と似た構成で、バッファから平均算出手段にフ
ィードバックループをかけ、平均化している。Embodiment 2. In this embodiment, the transmitting side sends the instantaneous noise power spectrum rather than its average. The receiving side synthesizes and outputs the received values as they are in noise sections, and accumulates the received values into an average noise spectrum that serves as the source of the noise added in voiced sections. FIG. 3 is a block diagram of the speech analyzer and speech synthesizer of this embodiment. The new element in the figure is the receiving-side average noise power spectrum calculating means 14. The transmission spectrum selecting means 6, information transmitting means 7, information receiving means 8, superimposed noise synthesizing means 11, and voiced-section synthesized sound output means 13 are the same as in Embodiment 1, and their description is omitted. The receiving-side average noise power spectrum calculating means 14 is configured similarly to the transmitting-side average noise power spectrum calculating means 5: a feedback loop from the buffer to the averaging means performs the averaging.

【００４２】以下、図３に示した実施例の構成の装置に
よる動作について説明する。伝送スペクトル選択手段６
では、有音／雑音判定情報１０２が雑音フレームの場
合、当該フレームのパワースペクトル１０３を選択し、
伝送スペクトル情報１０６として出力する。有音／雑音
判定情報１０２が有音フレームの場合、雑音引き去りパ
ワースペクトル情報１０４を選択し、出力する。The operation of the apparatus configured as in FIG. 3 is described below. When the voiced/noise determination information 102 indicates a noise frame, the transmission spectrum selecting means 6 selects the power spectrum 103 of that frame and outputs it as the transmission spectrum information 106. When the information indicates a voiced frame, it selects and outputs the noise-subtracted power spectrum 104.

【００４３】一方、情報受信手段８で受信された有音／
雑音判定情報１０２が雑音フレームの場合、受信側平均
雑音パワースペクトル算出手段１４は、バッファ内の過
去の平均雑音パワースペクトル１０５、及び新たに受信
された雑音フレームの伝送スペクトル情報１０６を用い
て、平均雑音パワースペクトル１０５を算出し、バッフ
ァに出力する。Meanwhile, when the voiced/noise determination information 102 received by the information receiving means 8 indicates a noise frame, the receiving-side average noise power spectrum calculating means 14 calculates the average noise power spectrum 105 from the past average noise power spectrum 105 held in the buffer and the transmission spectrum information 106 of the newly received noise frame, and writes the result to the buffer.
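
One way to picture the buffer-with-feedback averaging on the receiving side is a recursive (exponentially weighted) average. The text does not specify the averaging rule, so the class name, the weighting coefficient `alpha`, and the choice of an exponential average are all assumptions made for this sketch.

```python
class NoiseSpectrumAverager:
    """Holds the average noise power spectrum and updates it on noise frames."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha      # weight of the past average (hypothetical value)
        self.average = None     # buffer for the average noise power spectrum

    def update(self, is_noise_frame, spectrum):
        """Update the buffer on noise frames; return the current average."""
        if is_noise_frame:
            if self.average is None:
                self.average = list(spectrum)
            else:
                a = self.alpha
                self.average = [a * old + (1.0 - a) * new
                                for old, new in zip(self.average, spectrum)]
        return self.average
```

Voiced frames leave the buffer unchanged, so the most recent noise estimate remains available as the source of the superimposed noise.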

【００４４】同時に当該雑音フレームの伝送スペクトル
情報１０６、即ち当該雑音フレーム区間のパワースペク
トル１０３を用いて、重畳雑音合成手段１１において無
音区間の合成音を合成し、その振幅を所定の減衰率（こ
こでは０．８）で減衰したものを、出力合成音１１２と
して出力する。At the same time, the superimposed noise synthesizing means 11 synthesizes the sound for the silent section from the transmission spectrum information 106 of that noise frame, that is, the power spectrum 103 of the noise frame, attenuates its amplitude by a predetermined attenuation factor (here, 0.8), and outputs the result as the output synthesized sound 112.

【００４５】情報受信手段８で受信された有音／雑音判
定情報１０２が有音フレームの場合、重畳雑音合成手段
１１は、平均雑音パワースペクトル１０５を用いて重畳
雑音１１１を合成する。また、有音区間合成音出力手段
１３は、当該フレームの伝送スペクトル情報１０６、即
ち雑音引き去りパワースペクトル１０４を用いて、合成
音を合成し、重畳雑音１１１に所定の重畳倍率（ここで
は０．５）倍したものを重畳させ、出力音声１１２とし
て出力する。When the voiced/noise determination information 102 received by the information receiving means 8 indicates a voiced frame, the superimposed noise synthesizing means 11 synthesizes the superimposed noise 111 from the average noise power spectrum 105. The voiced-section synthesized sound output means 13 synthesizes a sound from the transmission spectrum information 106 of that frame, that is, the noise-subtracted power spectrum 104, superimposes on it the superimposed noise 111 scaled by a predetermined superimposition factor (here, 0.5), and outputs the result as the output speech 112.

【００４６】この例のように、スペクトルの引き去りに
用いた、平均雑音パワースペクトルから合成した合成音
を再度重畳させる構成をとることにより、引き去り誤差
による、不連続な包絡ピークが埋め戻され、連続性が増
し、聴覚的なマスク効果により、不快な雑音感を軽減す
る効果がある。また、雑音区間の出力音声を当該雑音フ
レームのスペクトルを用いて合成し、減衰して出力する
構成にしたので、雑音区間の出力音声が自然かつ雑音が
抑圧されるという効果がある。As in this example, sound synthesized from the average noise power spectrum that was used for the spectral subtraction is superimposed again. This fills in the discontinuous envelope peaks caused by subtraction errors, increases continuity, and, through auditory masking, reduces the unpleasant noisy impression. In addition, because the output in noise sections is synthesized from the spectrum of the noise frame itself and attenuated before output, the output in noise sections sounds natural while the noise is suppressed.

【００４７】実施例３．本実施例は、実施例２の受信側
の雑音区間の出力を平均雑音パワースペクトルに基づい
て合成し、出力するものである。図４は、本実施例の音
声分析装置と音声合成装置のブロック図である。図中、
音声分析装置と、音声合成装置中の有音区間の出力合成
音１１２は実施例２と同じであり、説明を省略する。Embodiment 3. In this embodiment, the noise-section output on the receiving side of Embodiment 2 is instead synthesized from the average noise power spectrum. FIG. 4 is a block diagram of the speech analyzer and speech synthesizer of this embodiment. In the figure, the speech analyzer and the output synthesized sound 112 for voiced sections in the speech synthesizer are the same as in Embodiment 2, and their description is omitted.

【００４８】以下、本実施例の構成の装置による動作に
ついて説明する。情報受信手段８で受信された有音／雑
音判定情報１０２が雑音フレームの場合、重畳雑音合成
手段１１は、平均雑音パワースペクトル１０５を用いて
重畳雑音１１１を合成し、重畳雑音１１１の振幅を所定
の減衰率（ここでは０．８）で減衰したものを、出力合
成音１１２として出力する。The operation of the apparatus of this embodiment is described below. When the voiced/noise determination information 102 received by the information receiving means 8 indicates a noise frame, the superimposed noise synthesizing means 11 synthesizes the superimposed noise 111 from the average noise power spectrum 105, attenuates its amplitude by a predetermined attenuation factor (here, 0.8), and outputs the result as the output synthesized sound 112.

【００４９】この例のように、スペクトルの引き去りに
用いた、平均雑音パワースペクトルを用いて生成した雑
音を再度重畳させる構成をとることにより、引き去り誤
差による、不連続な包絡ピークが埋め戻され、連続性が
増し、聴覚的なマスク効果により、不快な雑音感を軽減
する効果がある。また、雑音区間の出力音声を平均雑音
パワースペクトルを用いて合成し、減衰して出力する構
成にしたので、雑音区間の出力音声が平滑化され、雑音
が抑圧されるという効果がある。As in this example, noise generated from the average noise power spectrum that was used for the spectral subtraction is superimposed again. This fills in the discontinuous envelope peaks caused by subtraction errors, increases continuity, and, through auditory masking, reduces the unpleasant noisy impression. In addition, because the output in noise sections is synthesized from the average noise power spectrum and attenuated before output, the output in noise sections is smoothed and the noise is suppressed.

【００５０】実施例４．上記実施例２及び３の構成を合
わせ持ち、受信側での雑音フレーム区間に対する出力音
声の作成手段を、使用者が選択できるようにすること
も、もちろん可能である。この例のように、雑音区間の
出力音声の作成手段を、利用者が選択可能な構成とする
事により、出力される雑音の性質、程度によって、より
聴取しやすい方式を自由に選択できるという効果があ
る。Embodiment 4. It is of course also possible to combine the configurations of Embodiments 2 and 3 so that the user can select how the output speech for noise frames is generated on the receiving side. Making this selectable lets the user freely choose whichever method is easier to listen to, depending on the nature and level of the output noise.

【００５１】実施例５．上記実施例１乃至４では、情報
伝送手段７及び情報受信手段８を用いて、有音／雑音判
定情報１２及び伝送スペクトル情報１６の受け渡しを行
っていたが、これらの伝送情報を、公知の音声符号化／
復号化手段のパラメータの一部として伝送する構成も可
能である。上記実施例１乃至４は、音声の符号化／復号
化処理とは独立した構成であるので、蓄積、伝送を目的
とする音声符号化／復号化方式と自由に組み合わせがで
きる利点がある。Embodiment 5. In Embodiments 1 to 4, the voiced/noise determination information 102 and the transmission spectrum information 106 are passed using the information transmitting means 7 and the information receiving means 8, but this information can also be transmitted as part of the parameters of a known speech encoding/decoding means. Because Embodiments 1 to 4 are independent of the speech encoding/decoding process, they have the advantage that they can be freely combined with any speech encoding/decoding scheme intended for storage or transmission.

【００５２】実施例６．本実施例は、音声合成装置の有
音区間での雑音重畳に際し、更にきめ細かく有音と雑音
のパワースペクトルの平均で重畳倍率を変えようとする
例を説明する。図５は、本実施例の音声合成装置のブロ
ック図の内、重畳倍率制御手段１６の動作を説明する構
成図である。図中新規な部分は、１６の重畳倍率制御手
段、また、１６ａは重畳倍率算出手段、１６ｂは雑音パ
ワースペクトル平均算出手段、１６ｃは有音パワースペ
クトル平均算出手段である。また、１１６は重畳倍率で
ある。情報受信手段８、平均雑音パワースペクトル保持
手段９、重畳雑音合成手段１１、有音区間合成音出力手
段１３は実施例１と同じであるので説明を省略する。Embodiment 6. This embodiment describes an example in which, when noise is superimposed in voiced sections in the speech synthesizer, the superimposition factor is varied more finely according to the average power of the voiced and noise signals. FIG. 5 is a diagram, taken from the block diagram of the speech synthesizer of this embodiment, that explains the operation of the superimposition factor control means 16. The new elements in the figure are the superimposition factor control means 16, comprising the superimposition factor calculating means 16a, the noise power spectrum average calculating means 16b, and the voiced power spectrum average calculating means 16c, and the superimposition factor 116. The information receiving means 8, average noise power spectrum holding means 9, superimposed noise synthesizing means 11, and voiced-section synthesized sound output means 13 are the same as in Embodiment 1, and their description is omitted.

【００５３】以下、図５を用いて、実施例６の重畳倍率
制御手段１６の動作の説明を行う。重畳倍率制御手段１
６は、有音／雑音判定情報１０２に従い、雑音区間合成
音出力手段１２及び有音区間合成音出力手段１３からの
出力合成音１１２について、有音及び雑音区間それぞれ
の、前フレームまでの全てのフレームの合成音の平均信
号パワーを算出する。そして、雑音区間の平均信号パワ
ーに対する有音区間の平均信号パワーの比を計算し、こ
れをもとに重畳倍率１１６を出力する。有音区間合成音
出力手段１３では、重畳倍率１１６に従い、雑音引き去
りパワースペクトル１０４から合成された合成音に重畳
雑音１１１の重畳を行い、出力音声１１２を出力する。
ここで重畳倍率１１６の決定は、例えば平均信号パワー
の比が小さいときには、重畳倍率を大きくするように、
逆に平均信号パワーの比が大きいときには、重畳倍率を
小さくするようにすればよい。The operation of the superimposition factor control means 16 of Embodiment 6 is described below with reference to FIG. 5. In accordance with the voiced/noise determination information 102, the superimposition factor control means 16 calculates, for the output synthesized sound 112 from the noise-section synthesized sound output means 12 and the voiced-section synthesized sound output means 13, the average signal power of the synthesized sound over all frames up to the previous frame, separately for voiced and noise sections. It then calculates the ratio of the average signal power of the voiced sections to that of the noise sections and outputs the superimposition factor 116 based on this ratio. The voiced-section synthesized sound output means 13 superimposes the superimposed noise 111 on the sound synthesized from the noise-subtracted power spectrum 104 in accordance with the superimposition factor 116 and outputs the output speech 112. The superimposition factor 116 may be determined so that, for example, it is larger when the average signal power ratio is small and smaller when the ratio is large.
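
The text fixes only the direction of the relationship (small power ratio gives a large factor, large ratio gives a small factor). A minimal sketch of one such monotone mapping is shown below; the inverse-proportional rule, the clamping limits, and the parameter names `base`, `ref_ratio`, `lo`, and `hi` are all hypothetical choices for illustration.

```python
def mean_power(frames):
    """Average signal power over all samples in a list of frames."""
    total = sum(x * x for frame in frames for x in frame)
    count = sum(len(frame) for frame in frames)
    return total / count if count else 0.0


def superimposition_factor(voiced_frames, noise_frames,
                           base=0.5, ref_ratio=10.0, lo=0.1, hi=1.0):
    """Map the voiced/noise average power ratio to a factor in [lo, hi].

    The factor equals `base` when the ratio equals `ref_ratio` and
    decreases as the ratio grows (assumed mapping, not from the source).
    """
    p_noise = mean_power(noise_frames)
    if p_noise == 0.0:
        return lo               # no measurable noise: superimpose as little as allowed
    ratio = mean_power(voiced_frames) / p_noise
    if ratio <= 0.0:
        return hi
    return min(hi, max(lo, base * ref_ratio / ratio))
```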

【００５４】このように有音及び雑音区間それぞれの、
合成音の平均信号パワーの比によって重畳倍率を制御す
る構成にしたことにより、背景雑音レベルに応じた雑音
重畳が可能となる点で効果がある。Controlling the superimposition factor by the ratio of the average signal powers of the synthesized sound in voiced and noise sections in this way makes it possible to superimpose noise in accordance with the background noise level.

【００５５】実施例７．上記実施例６では重畳倍率１１
６を、前フレームまでの有音／雑音フレームのそれぞれ
の総平均信号パワー比で決定していたが、例えば現フレ
ームまでの総平均の比、あるいは、過去５フレームのみ
の平均の比等、任意の区間の平均を用いて計算させる構
成ももちろん可能である。このように、重畳倍率制御信
号１１６の決定に用いるフレーム区間を任意に取れる構
成にすることにより、話者の使用環境に応じた適切な重
畳倍率の制御が可能になるという効果がある。Embodiment 7. In Embodiment 6, the superimposition factor 116 is determined from the ratio of the overall average signal powers of the voiced and noise frames up to the previous frame. It is of course also possible to compute it from averages over any interval, for example the ratio of the overall averages up to the current frame, or the ratio of the averages over only the past five frames. Allowing the frame interval used to determine the superimposition factor control signal 116 to be chosen freely makes it possible to control the superimposition factor appropriately for the speaker's environment.
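
The choice of averaging interval can be illustrated with a bounded queue of per-frame powers. In this sketch the class name and interface are hypothetical; `window=5` corresponds to the "past five frames" example, and `window=None` keeps every frame, which approximates the all-frames average of Embodiment 6.

```python
from collections import deque


class WindowedPowerRatio:
    """Tracks the voiced/noise average power ratio over the last `window` frames."""

    def __init__(self, window=5):
        # deque(maxlen=None) is unbounded, i.e. it averages over all frames.
        self.voiced = deque(maxlen=window)
        self.noise = deque(maxlen=window)

    def push(self, is_voiced, frame):
        """Store the mean power of one frame in the matching history."""
        power = sum(x * x for x in frame) / len(frame)
        (self.voiced if is_voiced else self.noise).append(power)

    def ratio(self):
        """Return voiced/noise average power, or None if it is undefined."""
        if not self.voiced or not self.noise:
            return None
        p_noise = sum(self.noise) / len(self.noise)
        if p_noise == 0.0:
            return None
        return (sum(self.voiced) / len(self.voiced)) / p_noise
```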

【００５６】実施例８．本実施例は、有音区間での雑音
スペクトルの加算に際し、音声信号、雑音とともに周波
数帯域別に区分して、その区分毎に重畳倍率を変えて重
畳しようとするものである。図６は、本実施例の音声合
成装置のブロック図の内、帯域別重畳倍率制御手段１７
を主に記載し、その動作を説明する図である。図中新規
な部分は、１７の帯域別重畳倍率制御手段、１７ａ〜１
７ｄは、平均パワースペクトル算出手段、１７ｅ，１７
ｆは、帯域分離のための理想的なＢ．Ｐ．Ｆ（バンド・
パス・フィルタ）である。また、有音区間合成音出力手
段１３中の有音合成手段１３ａ、重畳手段１３ｂの他
に、１３ｃ〜１３ｆの理想的なＢ．Ｐ．Ｆ、１３ｇ〜１
３ｊの重畳手段がある。また、１１７の帯域別重畳倍率
がある。他の構成要素は既に述べた実施例のものと同様
である。Embodiment 8. In this embodiment, when the noise spectrum is added in voiced sections, both the speech signal and the noise are divided into frequency bands and superimposed with a different superimposition factor for each band. FIG. 6 is a diagram, taken from the block diagram of the speech synthesizer of this embodiment, that mainly shows the band-wise superimposition factor control means 17 and explains its operation. The new elements in the figure are the band-wise superimposition factor control means 17, the average power spectrum calculating means 17a to 17d, and the ideal band-pass filters (B.P.F.) 17e and 17f for band separation. In addition to the voiced sound synthesizing means 13a and the superimposing means 13b, the voiced-section synthesized sound output means 13 contains the ideal band-pass filters 13c to 13f and the superimposing means 13g to 13j. There are also the band-wise superimposition factors 117. The other components are the same as in the embodiments already described.

【００５７】以下、図６を用いて、実施例８の帯域別重
畳倍率制御手段１７の動作の説明を行う。帯域別重畳倍
率制御手段１７は、有音／雑音判定情報１０２に従い、
伝送スペクトル情報１０６、即ち有音フレームの場合に
は雑音引き去りパワースペクトル、雑音フレームの場合
には平均雑音パワースペクトルを取り込み、各帯域毎に
平均信号パワーの比を計算し、これをもとに帯域別重畳
倍率１１７を出力する。有音区間合成音出力手段１３で
は、帯域別重畳倍率１１７に従い、雑音引き去りパワー
スペクトル１０４から合成された合成音に重畳雑音１１
１の重畳を、帯域別に行い、出力音声１１２を出力す
る。帯域の分割数はここでは５とした。このとき帯域別
重畳倍率１１７は、各帯域の平均パワー比が小さい時、
即ち平均パワー比に差がない時には大きく、逆に比が大
きいとき、小さくなるように設定すればよい。このよう
に重畳倍率を帯域別に制御する構成にしたことにより、
ある特定の帯域にパワーの集中した背景雑音に対しても
効果的な重畳制御が可能となる効果がある。The operation of the band-wise superimposition factor control means 17 of Embodiment 8 is described below with reference to FIG. 6. In accordance with the voiced/noise determination information 102, the band-wise superimposition factor control means 17 takes in the transmission spectrum information 106, that is, the noise-subtracted power spectrum for a voiced frame and the average noise power spectrum for a noise frame, calculates the ratio of the average signal powers for each band, and outputs the band-wise superimposition factors 117 based on these ratios. The voiced-section synthesized sound output means 13 superimposes, band by band, the superimposed noise 111 on the sound synthesized from the noise-subtracted power spectrum 104 in accordance with the band-wise superimposition factors 117, and outputs the output speech 112. Here the number of bands is five. The band-wise superimposition factor 117 may be set to be large when the average power ratio of the band is small, that is, when the average powers differ little, and small when the ratio is large. Controlling the superimposition factor for each band in this way enables effective superimposition control even for background noise whose power is concentrated in a particular band.
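
A per-band version of the factor calculation might look like the sketch below. The five equal-width bands follow the text; the inverse-proportional mapping, the clamping limits, and the function names are assumptions for illustration only.

```python
def band_powers(spectrum, n_bands=5):
    """Average power in each of n_bands contiguous, roughly equal bands."""
    n = len(spectrum)
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    return [sum(spectrum[edges[i]:edges[i + 1]]) / max(1, edges[i + 1] - edges[i])
            for i in range(n_bands)]


def bandwise_factors(voiced_spectrum, noise_spectrum,
                     n_bands=5, base=0.5, lo=0.1, hi=1.0):
    """One superimposition factor per band: larger where the voiced/noise
    power ratio is small, smaller where it is large (assumed mapping)."""
    factors = []
    for p_voiced, p_noise in zip(band_powers(voiced_spectrum, n_bands),
                                 band_powers(noise_spectrum, n_bands)):
        if p_noise <= 0.0:
            factors.append(lo)
        elif p_voiced <= 0.0:
            factors.append(hi)
        else:
            factors.append(min(hi, max(lo, base * p_noise / p_voiced)))
    return factors
```

A band-dependent bias of the kind described for Embodiment 10 could be applied afterwards by multiplying each factor by a weight that grows with the band index.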

【００５８】実施例９．上記実施例６乃至８を組み合わ
せて用いることももちろん可能である。フレーム内の全
帯域の平均信号パワーと帯域別の平均信号パワーの両方
を考慮にいれて重畳倍率を制御する構成にすることで、
より安定した重畳効果が得られるという利点がある。Embodiment 9. Embodiments 6 to 8 can of course also be used in combination. Controlling the superimposition factor with both the average signal power over all bands in the frame and the band-wise average signal powers taken into account gives a more stable superimposition effect.

【００５９】実施例１０．上記実施例８乃至９の帯域別
重畳倍率制御手段１７における、帯域別重畳倍率１１７
の算出の際に、帯域別のバイアスを与えることも可能で
ある。例えば低域は重畳倍率を小さめに、高域ほど重畳
倍率を大きめに算出するという構成も可能である。この
ような構成を取ることにより、一般に推定誤差が大きい
高域の雑音に対するマスク効果が得られるという利点が
ある。Embodiment 10. A band-dependent bias can also be applied when the band-wise superimposition factor control means 17 of Embodiments 8 and 9 calculates the band-wise superimposition factors 117. For example, the factor can be calculated to be smaller in low bands and larger in higher bands. Such a configuration provides a masking effect for high-band noise, for which the estimation error is generally large.

【００６０】実施例１１．本実施例は、音声分析装置側
での雑音引き去りパワースペクトル生成の工夫をした例
を説明する。即ち実施例６の雑音重畳のための構成を、
送信側の雑音引き去り部分に適用する。図７は、請求項
７の発明の音声分析装置の一構成例を示すブロック図で
ある。図中新規な部分は、１９の信号強度比による引き
去り率算出手段、更に、詳細には１９ａの有音平均パワ
ースペクトル算出手段、１９ｂの平均雑音パワースペク
トル算出手段、１９ｃのパワー比較手段がある。また、
１１８はスペクトル引き去り率である。その他の構成要
素は、他の実施例と同じであるので説明を省略する。Embodiment 11. This embodiment describes an example in which the generation of the noise-subtracted power spectrum on the speech analyzer side is refined. That is, the configuration used for noise superimposition in Embodiment 6 is applied to the noise subtraction part on the transmitting side. FIG. 7 is a block diagram showing a configuration example of the speech analyzer of the invention of claim 7. The new elements in the figure are the signal-strength-ratio-based subtraction rate calculating means 19, which comprises the voiced average power spectrum calculating means 19a, the average noise power spectrum calculating means 19b, and the power comparing means 19c, and the spectral subtraction rate 118. The other components are the same as in the other embodiments, and their description is omitted.

【００６１】以下、図７を用いて本構成の音声分析装置
の動作について説明する。信号強度比による引き去り率
算出手段１９は、音声信号１０１、有音／雑音判定情報
１０２を入力とし、有音／雑音判定情報１０２を用い
て、有音区間及び雑音区間のそれぞれの平均信号パワー
比を求め、それを用いてスペクトル引き去り率１１８を
算出し、出力する。このとき例えば平均信号パワー比が
小さい時、即ち平均信号パワー比に差がない時には大き
く、逆に比が大きいとき、即ち雑音区間の平均信号パワ
ーが、有音区間のそれに比べ小さいときには、小さく設
定する。The operation of the speech analyzer of this configuration is described below with reference to FIG. 7. The signal-strength-ratio-based subtraction rate calculating means 19 receives the speech signal 101 and the voiced/noise determination information 102, uses the voiced/noise determination information 102 to obtain the ratio of the average signal powers of the voiced sections and the noise sections, and calculates and outputs the spectral subtraction rate 118 from this ratio. For example, the rate is set large when the average signal power ratio is small, that is, when the average signal powers differ little, and small when the ratio is large, that is, when the average signal power of the noise sections is small compared with that of the voiced sections.

【００６２】スペクトル減算手段４において、当該フレ
ームのパワースペクトル１０３より、前記信号強度比に
よる引き去り率算出手段１９で得られたスペクトル引き
去り率１１８に従って、平均雑音パワースペクトル１０
５の引き去りを行い、雑音引き去りパワースペクトル１
０４を出力する。このように有音区間及び雑音区間のそ
れぞれの平均信号パワー比を求め、それを用いてスペク
トル引き去り率１１８を算出する構成にしたことによ
り、背景雑音レベルに応じた引き去りが可能になるとい
う利点がある。The spectrum subtracting means 4 subtracts the average noise power spectrum 105 from the power spectrum 103 of the frame in accordance with the spectral subtraction rate 118 obtained by the signal-strength-ratio-based subtraction rate calculating means 19, and outputs the noise-subtracted power spectrum 104. Obtaining the ratio of the average signal powers of the voiced and noise sections and using it to calculate the spectral subtraction rate 118 in this way has the advantage of enabling subtraction in accordance with the background noise level.
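
The rate-controlled subtraction can be sketched as below. The mapping from power ratio to rate, its limits, and the flooring of negative results at zero are assumptions made for this illustration; the text only states that the rate is large for a small ratio and small for a large ratio.

```python
def subtraction_rate(power_ratio, ref_ratio=10.0, lo=0.5, hi=2.0):
    """Rate is 1.0 when power_ratio == ref_ratio and falls as the ratio grows.
    ref_ratio, lo and hi are hypothetical tuning values."""
    if power_ratio <= 0.0:
        return hi
    return min(hi, max(lo, ref_ratio / power_ratio))


def spectral_subtract(power_spectrum, avg_noise_spectrum, rate):
    """Subtract rate * average noise power from each bin.
    Negative results are floored at zero (assumption; power cannot be negative)."""
    return [max(0.0, p - rate * n)
            for p, n in zip(power_spectrum, avg_noise_spectrum)]
```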

【００６３】実施例１２．本実施例は、実施例８で示し
た、帯域別の重畳倍率制御を、送信側の引き去りに適用
した例である。即ち、雑音の引き去り率を帯域別に変え
ようとするものである。図８は、請求項８の発明の音声
分析装置の一実施例のブロック図である。図中新規な部
分は、帯域別引き去り率算出手段２０、更に詳細には２
０ａ〜２０ｆの帯域分割手段のＢ．Ｐ．Ｆ、２０ｇ〜２
０ｌの平均パワースペクトル算出手段、２０ｍ〜２０ｐ
の引き去り算出手段、２０ｑ〜２０ｖの引き去り手段、
２０ｗの出力スペクトル再生成手段がある。また、１１
８はスペクトル引き去り率である。その他の構成要素は
他の実施例と同じであるので説明を省略する。Embodiment 12. This embodiment applies the band-wise superimposition factor control of Embodiment 8 to the subtraction on the transmitting side; that is, the noise subtraction rate is varied band by band. FIG. 8 is a block diagram of one embodiment of the speech analyzer of the invention of claim 8. The new elements in the figure are the band-wise subtraction rate calculating means 20, which comprises the band-pass filters (B.P.F.) 20a to 20f serving as band dividing means, the average power spectrum calculating means 20g to 20l, the subtraction rate calculating means 20m to 20p, the subtracting means 20q to 20v, and the output spectrum regenerating means 20w, and the spectral subtraction rate 118. The other components are the same as in the other embodiments, and their description is omitted.

【００６４】以下、図８を用いて請求項８の構成の装置
の動作について説明する。帯域別引き去り率算出手段２
０は、有音／雑音判定情報１０２、当該フレームのパワ
ースペクトル１０３、及び平均雑音パワースペクトル１
０５をいくつかの帯域に分割する。具体的には、パワー
スペクトルの当該帯域区間を選択し、その区間のパワー
スペクトル値のみ処理対象とする、理想的なバンドパス
フィルタを用いる。ここでは分割数を５とした。次に、
それぞれの帯域の有音区間と雑音区間の平均パワーの比
を平均パワースペクトル算出手段２０ｇ〜２０ｌで求
め、これをもとに帯域別の引き去り率を引き去り率算出
手段２０ｍ〜２０ｐで決定し、スペクトル引き去り率を
出力する。このとき各引き去り率は例えば平均パワー比
が小さい時、即ち平均パワー比に差がない時には大き
く、逆に比が大きいとき、即ち平均雑音パワースペクト
ル１０５の当該帯域の平均パワーが、当該フレームのパ
ワースペクトル１０３のそれに比べ小さいときには、小
さく設定する。The operation of the apparatus of claim 8 is described below with reference to FIG. 8. Using the voiced/noise determination information 102, the band-wise subtraction rate calculating means 20 divides the power spectrum 103 of the frame and the average noise power spectrum 105 into several bands. Specifically, it uses ideal band-pass filters that select the relevant band of the power spectrum and process only the power spectrum values within that band. Here the number of bands is five. Next, the average power spectrum calculating means 20g to 20l obtain the ratio of the average powers of the voiced and noise sections in each band, the subtraction rate calculating means 20m to 20p determine the subtraction rate for each band from this ratio, and the spectral subtraction rate is output. Each subtraction rate is set, for example, large when the average power ratio is small, that is, when the average powers differ little, and small when the ratio is large, that is, when the average power of the band in the average noise power spectrum 105 is small compared with that in the power spectrum 103 of the frame.

【００６５】スペクトル減算手段４において、各帯域の
当該フレームのパワースペクトル１０３より、前記帯域
別引き去り率算出手段２０で得られたスペクトル引き去
り率１１８に従って、平均雑音パワースペクトル１０５
の引き去りを行い、次に、出力スペクトル再生成手段２
０ｗで各帯域を取りまとめ、雑音引き去りパワースペク
トル１０４を出力する。The spectrum subtracting means 4 subtracts the average noise power spectrum 105 from the power spectrum 103 of the frame in each band in accordance with the spectral subtraction rate 118 obtained by the band-wise subtraction rate calculating means 20. The output spectrum regenerating means 20w then recombines the bands and outputs the noise-subtracted power spectrum 104.
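
Band-wise subtraction followed by reassembly of the full spectrum can be illustrated as follows. The equal-width band split, the zero floor, and the function names are assumptions for this sketch; selecting a slice of the spectrum plays the role of the ideal band-pass filter.

```python
def band_edges(n_bins, n_bands=5):
    """Bin indices that delimit n_bands contiguous bands."""
    return [i * n_bins // n_bands for i in range(n_bands + 1)]


def bandwise_subtract(power_spectrum, avg_noise_spectrum, rates):
    """Subtract the noise spectrum with a separate rate per band, then
    concatenate the bands back into one spectrum.
    Negative results are floored at zero (assumption)."""
    edges = band_edges(len(power_spectrum), len(rates))
    out = []
    for b, rate in enumerate(rates):
        for k in range(edges[b], edges[b + 1]):
            out.append(max(0.0, power_spectrum[k] - rate * avg_noise_spectrum[k]))
    return out
```

Because each band uses its own rate, the output can be discontinuous at the band boundaries.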

【００６６】このようにスペクトル引き去り率を帯域別
に制御する構成にしたことにより、ある特定の帯域にパ
ワーの集中した背景雑音に対しても効果的な引き去りが
可能となる効果がある。図９は、３つの帯域に区分して
異なる引き去り率で引き去りパワースペクトルを得る例
を示した図である。帯域の区切りで不連続になるが、実
用上は全く問題はなく、効果的な雑音除去ができる。Controlling the spectral subtraction rate for each band in this way enables effective subtraction even for background noise whose power is concentrated in a particular band. FIG. 9 shows an example in which the spectrum is divided into three bands and the subtracted power spectrum is obtained with a different subtraction rate in each. The spectrum becomes discontinuous at the band boundaries, but this causes no problem in practice, and noise is removed effectively.

【００６７】実施例１３．上記実施例１２では、帯域毎
の引き去り率を独立に算出する構成になっていたが、帯
域毎の引き去り率にバイアスをもたせることも可能であ
る。例えば低域は引き去り率を小さめに、高域ほど引き
去り率を大きめに算出するという構成も可能である。こ
のように引き去りに帯域別のバイアスをもたせた構成に
することにより、聴感上好ましい雑音抑圧効果が得られ
るように調整しておけるという利点がある。Embodiment 13. In Embodiment 12, the subtraction rate of each band is calculated independently, but the band-wise subtraction rates can also be biased. For example, the rate can be calculated to be smaller in low bands and larger in higher bands. Giving the subtraction a band-dependent bias in this way has the advantage that it can be tuned in advance to obtain a perceptually preferable noise suppression effect.

【００６８】実施例１４．上記実施例１２乃至１３で
は、帯域別引き去り率算出手段２０を単独で用いていた
が、実施例１１の信号強度比による引き去り率算出手段
１９を組み合わせる構成も可能である。この際、スペク
トル引き去り率１１８は、全帯域に対する平均的な引き
去り率を信号強度比による引き去り率算出手段１９で算
出しておき、引き続き帯域別引き去り率算出手段２０で
個別帯域の調整を行う構成とすることが考えられる。こ
のようにフレーム内の全帯域の平均信号パワーと帯域別
の平均信号パワーの両方を考慮にいれて引き去り率を制
御する構成にすることで、より安定した引き去り効果が
得られるという利点がある。Embodiment 14. In Embodiments 12 and 13, the band-wise subtraction rate calculating means 20 is used alone, but it can also be combined with the signal-strength-ratio-based subtraction rate calculating means 19 of Embodiment 11. In that case, the spectral subtraction rate 118 may be obtained by first calculating an average subtraction rate for all bands with the signal-strength-ratio-based subtraction rate calculating means 19 and then adjusting the individual bands with the band-wise subtraction rate calculating means 20. Controlling the subtraction rate with both the average signal power over all bands in the frame and the band-wise average signal powers taken into account gives a more stable subtraction effect.

【００６９】実施例１５．図１０は、請求項９の発明の
音声分析装置の一実施例のブロック図である。図１０の
構成において、帯域別引き去り率算出手段２０中に、２
０ｘ〜２０ｚのリミッタが設けられている。他の構成要
素は、実施例１４の図８で示す要素と同じである。Embodiment 15. FIG. 10 is a block diagram of one embodiment of the speech analyzer of the invention of claim 9. In the configuration of FIG. 10, limiters 20x to 20z are provided in the band-wise subtraction rate calculating means 20. The other components are the same as those shown in FIG. 8 for Embodiment 14.

【００７０】以下、図１０を用いて請求項９の音声分析
装置の一実施例の動作について説明する。引き去り率算
出手段１８は、有音／雑音判定情報１０２、当該フレー
ムのパワースペクトル１０３、及び平均雑音パワースペ
クトル１０５とから、スペクトル引き去り率１１８を算
出し、出力する。スペクトル減算手段４において、当該
フレームのパワースペクトル１０３より、前記引き去り
率算出手段１８で得られたスペクトル引き去り率１１８
に従って、平均雑音パワースペクトル１０５の引き去り
を行い、雑音引き去りパワースペクトル１０４を出力す
る。２０ｘ〜２０ｚのリミッタにより、予め定めたしき
い値以下では引き去りが行われず、このリミッタ設定の
しきい値が出力される。The operation of one embodiment of the speech analyzer of claim 9 is described below with reference to FIG. 10. The subtraction rate calculating means 18 calculates and outputs the spectral subtraction rate 118 from the voiced/noise determination information 102, the power spectrum 103 of the frame, and the average noise power spectrum 105. The spectrum subtracting means 4 subtracts the average noise power spectrum 105 from the power spectrum 103 of the frame in accordance with the spectral subtraction rate 118 obtained by the subtraction rate calculating means 18, and outputs the noise-subtracted power spectrum 104. With the limiters 20x to 20z, no subtraction is performed below a predetermined threshold, and the threshold set in the limiter is output instead.
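
The limiter behaviour can be read as clamping the subtracted value from below. The sketch assumes a single scalar threshold applied to every bin and uses hypothetical names; the text itself places the limiters per band.

```python
def subtract_with_limiter(power_spectrum, avg_noise_spectrum, rate, threshold):
    """Rate-controlled subtraction whose result never falls below `threshold`.
    Where the difference would be at or below the threshold, the threshold
    itself is output instead."""
    out = []
    for p, n in zip(power_spectrum, avg_noise_spectrum):
        diff = p - rate * n
        out.append(diff if diff > threshold else threshold)
    return out
```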

[0071] Embodiment 16. In the above embodiment, a limiter was used to limit the subtraction amplitude. In this embodiment, by contrast, the noise-subtracted power spectrum 104 in the configuration of FIG. 8 may be fed back to the subtraction rate calculating means 20m to 20p and the rate recalculated. That is, the subtraction rate calculating means 20m to 20p detect the minimum value of the amplitude components of the noise-subtracted power spectrum 104 and, if that value is at or below a predetermined value, recalculate a corrected spectrum subtraction rate 118 from the power spectrum 103 of the current frame and the average noise power spectrum 105. This process is repeated until the minimum amplitude component of the noise-subtracted power spectrum 104 falls within the predetermined value.
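The feedback loop described above can be sketched as follows. The text does not say how the subtraction rate is corrected on each pass, so the multiplicative `shrink` step, the `max_iter` safety bound, and the name `refine_rate` are assumptions introduced purely for illustration.

```python
from typing import List, Tuple


def refine_rate(power: List[float], avg_noise: List[float], rate: float,
                min_allowed: float, shrink: float = 0.5,
                max_iter: int = 50) -> Tuple[float, List[float]]:
    """Lower the subtraction rate until the smallest residual exceeds min_allowed.

    The shrink factor is a hypothetical correction rule; the source only
    states that the rate is recalculated and the check repeated.
    """
    for _ in range(max_iter):
        residual = [p - rate * n for p, n in zip(power, avg_noise)]
        if min(residual) > min_allowed:
            return rate, residual
        rate *= shrink  # assumed correction step
    # Safety bound reached: return the residual for the last corrected rate.
    residual = [p - rate * n for p, n in zip(power, avg_noise)]
    return rate, residual


final_rate, residual = refine_rate([10.0, 4.0, 3.0], [2.0, 2.0, 2.0],
                                   rate=2.0, min_allowed=0.5)
print(final_rate, residual)
```

With these inputs the first pass leaves a negative bin, so the rate is halved once and the second pass satisfies the check.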

[0072] By detecting the minimum value of the amplitude components of the noise-subtracted power spectrum 104 in this way, and recalculating a corrected spectrum subtraction rate 118 when that value is at or below a predetermined value, a subtraction process that keeps the subtraction error to a minimum becomes possible.

[0073] Embodiment 17. The above embodiments described the speech analyzer and the speech synthesizer separately. A speech transmission system combining the two is useful in practice: namely, a speech transmission system consisting of a speech analyzer having the basic components shown in FIG. 1 and a speech synthesizer likewise having the basic components shown in FIG. 1. Such a system may be a so-called broadcast-type system with a plurality of speech synthesizers or, conversely, a system with a plurality of speech analyzers among which the receiving side switches.

[0074] With this configuration, a noise-subtracted power spectrum with a small subtraction error can be transmitted, and the masking effect of superimposition is also obtained.

[0075] Embodiment 18. In Embodiment 17 above, the superimposition factor used by the voiced-section synthesized sound output means 13 was fixed. It may instead be made variable frame by frame: the spectrum subtraction rate 118 obtained by the subtraction rate calculating means 18 is transmitted via the information transmission means 7, the transmission path 200, and the information receiving means 8, and is used to calculate the superimposition factor. It is of course also possible to combine this with the superimposition factor control means of Embodiments 6 to 10. This configuration allows the masking effect of superimposition to be controlled within an appropriate range.
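A receiver-side sketch of the variable superimposition described above follows. The text states only that the transmitted subtraction rate is used to calculate the superimposition factor; the linear mapping with `gain`, the clamping range, and the function names here are hypothetical choices for illustration, and the signals are represented as short lists of samples.

```python
from typing import List


def superimposition_factor(rate: float, gain: float = 0.5,
                           lo: float = 0.0, hi: float = 1.0) -> float:
    """Map a received subtraction rate to a superimposition factor.

    Assumed rule: factor = gain * rate, clamped to [lo, hi].
    """
    return max(lo, min(hi, gain * rate))


def superimpose(voiced: List[float], noise: List[float],
                factor: float) -> List[float]:
    """Add the synthesized noise, scaled by factor, to the voiced-section signal."""
    return [v + factor * n for v, n in zip(voiced, noise)]


factor = superimposition_factor(1.5)
mixed = superimpose([1.0, -1.0, 0.5], [0.5, 0.5, -1.0], factor)
print(factor, mixed)
```

Recomputing `factor` for every received frame gives the per-frame variability the embodiment describes; the clamp keeps the added noise within a bounded range.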

【００７６】[0076]

[Effects of the Invention] The noise-suppressed speech analyzer according to the present invention selects and transmits either the noise-subtracted power spectrum or the noise spectrum, and can therefore give the receiving side a signal from which a natural synthesized sound can be generated.

【００７７】[0077]

[0078] Furthermore, because the speech analyzer transmits the noise-subtracted power spectrum in voiced sections and the spectrum of the noise frame in noise sections, the unpleasant impression of noise on the receiving side can be further reduced section by section.

【００７９】[0079]

[0080] Furthermore, because the speech synthesizer superimposes the noise spectrum on the speech signal after multiplying it by a factor, a more finely controlled synthesized sound is obtained.

[0081] Furthermore, because the speech synthesizer superimposes the noise spectrum on the speech signal band by band after multiplying it by a factor, a more finely controlled and more intelligible synthesized sound is obtained.

[0082] Furthermore, because the speech analyzer subtracts the noise spectrum from the speech signal after multiplying it by a factor, a natural noise-subtracted power spectrum can be sent to the receiving side.

[0083] Furthermore, because the speech analyzer subtracts the noise spectrum from the speech signal band by band after multiplying it by a factor, an even more natural noise-subtracted power spectrum can be sent to the receiving side.
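Band-by-band subtraction of this kind can be sketched as follows. The band boundaries, the per-band rates, and the name `bandwise_subtract` are illustrative assumptions; how each band's rate is derived is not restated here, and no limiter is applied in this sketch.

```python
from typing import List, Tuple


def bandwise_subtract(power: List[float], avg_noise: List[float],
                      band_edges: List[Tuple[int, int]],
                      band_rates: List[float]) -> List[float]:
    """Subtract the average noise spectrum using a separate rate per band.

    band_edges -- (start, stop) bin index pairs, stop exclusive (hypothetical layout)
    band_rates -- one subtraction rate per band
    """
    if len(band_edges) != len(band_rates):
        raise ValueError("need exactly one rate per band")
    out = list(power)  # bins outside every band are left unchanged
    for (start, stop), rate in zip(band_edges, band_rates):
        for k in range(start, stop):
            out[k] = power[k] - rate * avg_noise[k]
    return out


subtracted = bandwise_subtract([8.0, 8.0, 4.0, 4.0], [2.0, 2.0, 2.0, 2.0],
                               band_edges=[(0, 2), (2, 4)],
                               band_rates=[1.0, 0.5])
print(subtracted)
```

Using a smaller rate in the weaker upper band leaves more of the original spectrum there, which is the kind of per-band adjustment the paragraph refers to.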

[0084] Furthermore, because the speech analyzer sets a threshold when subtracting the scaled noise spectrum from the speech signal and thereby prevents over-subtraction, an even more natural noise-subtracted power spectrum can be sent to the receiving side.

[0085] The speech transmission system according to the present invention consists of a speech analyzer that also sends the noise spectrum and a speech synthesizer that generates the voiced-section synthesized sound by superimposing a sound synthesized from the noise spectrum on the noise-subtracted spectrum, and can therefore transmit a natural synthesized sound.

[0086] Furthermore, because the speech transmission system consists of a speech analyzer that sends a noise-subtracted power spectrum obtained with a variable noise subtraction rate and a speech synthesizer that generates the voiced-section synthesized sound by superimposing, with a variable superimposition factor, a sound synthesized from the noise spectrum on the noise-subtracted spectrum, an even more natural synthesized sound can be transmitted.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the noise-suppressed speech analyzer and noise-suppressed speech synthesizer of Embodiment 1 of the present invention.

FIG. 2 is a diagram showing, along the time axis, the output signals of the speech analyzer and speech synthesizer of an embodiment of the present invention.

FIG. 3 is a configuration diagram of the noise-suppressed speech analyzer and noise-suppressed speech synthesizer of Embodiment 2 of the present invention.

FIG. 4 is a configuration diagram of the noise-suppressed speech analyzer and noise-suppressed speech synthesizer of Embodiment 3 of the present invention.

FIG. 5 is a detailed configuration diagram of the superimposition factor control means in the apparatus of Embodiment 6 of the present invention.

FIG. 6 is a detailed configuration diagram of the band-wise superimposition factor control means in the apparatus of Embodiment 8 of the present invention.

FIG. 7 is a configuration diagram of the speech analyzer of Embodiment 11 of the present invention.

FIG. 8 is a configuration diagram of the speech analyzer of Embodiment 12 of the present invention.

FIG. 9 is a diagram explaining the band-wise noise-subtracted spectrum.

FIG. 10 is a configuration diagram of the speech analyzer of Embodiment 15 of the present invention.

FIG. 11 is a configuration diagram of a conventional noise processing apparatus.

FIG. 12 is a diagram explaining the time transition of the spectral envelope of an original speech signal.

FIG. 13 is a diagram explaining the time transition of the spectral envelope of a signal in which white noise is superimposed on the original speech signal.

FIG. 14 is a diagram explaining the time transition of the spectral envelope of a synthesized sound signal obtained by encoding and decoding the signal of FIG. 13.

FIG. 15 is a diagram explaining the time transition of the spectral envelope of a synthesized sound signal obtained by encoding and decoding speech produced by processing the signal of FIG. 13 with a conventional noise processing apparatus.

[Explanation of symbols]

2: voiced/noise determination means
3: spectrum analysis means
4: spectrum subtraction means
5: average noise power spectrum calculating means
5a: average calculating means
5b: buffer
6a, 6b, 6c, 6d, 6e, 6f: transmission spectrum selection means
7: information transmission means
8: information receiving means
9: average noise power spectrum holding means
11: superimposed noise synthesizing means
12: noise-section synthesized sound output means
13: voiced-section synthesized sound output means
13a: voiced synthesis means
13b: superimposing means
13c, 13d, 13e, 13f: B.P.F.
13g, 13h, 13i, 13j: superimposing means
14: receiving-side average noise power spectrum calculating means
15: noise-section speech synthesis means
16: superimposition factor control means
16a: superimposition factor calculating means
16b: noise power spectrum average calculating means
16c: voiced power spectrum average calculating means
17: band-wise superimposition factor control means
17a, 17b, 17c, 17d: average power spectrum calculating means
17e, 17f: B.P.F.
19: subtraction rate calculating means based on signal intensity ratio
19a: voiced average power spectrum calculating means
19b: average noise power spectrum calculating means
19c: power ratio calculating means
20: band-wise subtraction rate calculating means
20a, 20b, 20c, 20d, 20e, 20f: B.P.F.
20m, 20n, 20p: subtraction calculating means
20q, 20r, 20s, 20t, 20u, 20v: subtraction means
20w: output spectrum regenerating means
101: speech signal
102: voiced/noise determination information
103: power spectrum
104: noise-subtracted power spectrum
105: average noise power spectrum
106: transmission spectrum information
111: superimposed noise
112: output speech
116: superimposition factor
117: band-wise superimposition factor
118: spectrum subtraction rate
200: transmission path

Continuation of front page

(72) Inventor: Koichi Shiraki (白木 宏一), c/o Mitsubishi Electric Corporation Information Systems Research Laboratory, 5-1-1 Ofuna, Kamakura
(72) Inventor: Furuta (古田 訓), c/o Mitsubishi Electric Corporation Information Systems Research Laboratory, 5-1-1 Ofuna, Kamakura

(56) References cited:
JP H3-179500 A
JP H5-136746 A
JP S60-140399 A
JP H2-278298 A
JP S54-133003 A
JP H5-56007 A

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00, 13/00; G10L 19/00, 21/02

Claims

(57) [Claims]

1. A noise-suppressed speech analyzer comprising: spectrum analysis means for spectrum-analyzing an input speech signal in units of a predetermined analysis frame to obtain a power spectrum for each frame; average noise power spectrum calculating means for obtaining, for the noise-section frames among the analysis frames, an average noise power spectrum over a specified number of frames; and transmission spectrum selection and transmission means for selecting and transmitting, as a transmission spectrum, a noise-subtracted power spectrum obtained by subtracting the average noise power spectrum from the output of the spectrum analysis means in voiced sections, and the average noise power spectrum in noise sections.

2. The noise-suppressed speech analyzer according to claim 1, wherein, in noise-section frames, the instantaneous noise power spectrum of the noise frame is transmitted.

3. A noise-suppressed speech synthesizer comprising: average noise power spectrum holding means for storing an average noise power spectrum obtained by dividing an input speech signal into analysis frames of a predetermined length, classifying the analysis frames into voiced-section frames and noise-section frames, and spectrum-analyzing the noise-section frames; superimposed noise synthesizing means for generating a synthesized sound with the average noise power spectrum as input; voiced-section synthesized sound output means for, when the input signal is a voiced section, superimposing the synthesized sound output by the superimposed noise synthesizing means, multiplied by a predetermined superimposition factor, on the speech signal of the input voiced section to generate and output a synthesized sound; and superimposition factor control means for calculating and controlling the superimposition factor from the voiced-section synthesized sound output means and the output of the superimposed noise synthesizing means and, in a noise section, performing control so that the output of the superimposed noise synthesizing means is output as the synthesized sound of the noise section.

4. A noise-suppressed speech synthesizer comprising: average noise power spectrum holding means for storing an average noise power spectrum obtained by dividing an input speech signal into analysis frames of a predetermined length, classifying the analysis frames into voiced-section frames and noise-section frames, and spectrum-analyzing the noise-section frames; superimposed noise synthesizing means for generating a synthesized sound with the average noise power spectrum as input; band-wise superimposition factor control means for dividing the spectrum of a voiced-section frame into predetermined frequency bands and controlling a superimposition factor for each band; and voiced-section synthesized sound output means for, when the input signal is a voiced section, superimposing the synthesized sound output by the superimposed noise synthesizing means, multiplied by the superimposition factor controlled by the band-wise superimposition factor control means, on the speech signal of the input voiced section to generate and output a voiced-section synthesized sound.

5. The noise-suppressed speech synthesizer according to claim 3 or claim 4, wherein, when an instantaneous noise power spectrum is transmitted, the average noise power spectrum holding means averages the noise power spectrum over a specified number of frames and stores the result as the average noise power spectrum, and, in noise-section frames, output is produced based on the instantaneous noise power spectrum or the average noise power spectrum.

6. The noise-suppressed speech analyzer according to claim 1, comprising: spectrum subtraction means for subtracting the output of the average noise power spectrum calculating means, multiplied by a subtraction rate, from the output of the spectrum analysis means to obtain the noise-subtracted power spectrum; and subtraction rate calculating means for determining the subtraction rate from the value of the output of the spectrum analysis means, wherein the noise-subtracted power spectrum is transmitted.

7. The noise-suppressed speech analyzer according to claim 1, wherein the subtraction rate calculating means calculates a subtraction rate for each frequency band of the output of the spectrum analysis means in voiced sections, and the spectrum subtraction means subtracts the average noise power spectrum using the subtraction rate of each frequency band.

8. The noise-suppressed speech analyzer according to claim 1, wherein, when the noise-subtracted power spectrum output falls to or below a predetermined threshold, the subtraction rate calculating means causes the predetermined threshold value to be output as the noise-subtracted power spectrum output.

9. A speech transmission system comprising: a noise-suppressed speech analyzer comprising spectrum analysis means for spectrum-analyzing an input speech signal in units of a predetermined analysis frame to obtain a power spectrum for each frame, average noise power spectrum calculating means for obtaining, for the noise-section frames among the analysis frames, an average noise power spectrum over a specified number of frames, and transmission spectrum selection and transmission means for selecting either a noise-subtracted power spectrum, obtained by subtracting the average noise power spectrum, or the average noise power spectrum, and transmitting it as a transmission spectrum; and a noise-suppressed speech synthesizer comprising average noise power spectrum holding means for receiving the signal transmitted for each analysis frame and storing the corresponding average noise power spectrum, superimposed noise synthesizing means for generating a synthesized sound with the average noise power spectrum as input, and voiced-section synthesized sound output means for, when the input signal is a voiced section, superimposing the synthesized sound output by the superimposed noise synthesizing means on the speech signal of the input voiced section to generate and output a voiced-section synthesized sound.

10. The speech transmission system according to claim 9, wherein the noise-suppressed speech analyzer transmits a noise-subtracted power spectrum obtained by subtracting the average noise power spectrum at a variable subtraction rate, and the noise-suppressed speech synthesizer superimposes the output of the superimposed noise synthesizing means, multiplied by a variable superimposition factor, on the speech signal of the input voiced section to generate and output a voiced-section synthesized sound.