JPH11305797A

JPH11305797A - Voice analyzing synthesizer

Info

Publication number: JPH11305797A
Application number: JP10113076A
Authority: JP
Inventors: Tomokazu Morio; 智一森尾
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1998-04-23
Filing date: 1998-04-23
Publication date: 1999-11-05
Anticipated expiration: 2018-04-23
Also published as: JP3472704B2

Abstract

PROBLEM TO BE SOLVED: To improve speech quality by providing, in the speech analyzer, a pulse analyzer that analyzes the residual signal and makes a voiced decision when the degree of pulsiveness is higher than a set threshold. SOLUTION: On the speech analyzer A side, a speech signal is input from input terminal 1. Linear prediction analyzer 2 calculates linear prediction coefficients (b), outputs them to linear prediction analysis filter 3, and also transmits them to linear prediction synthesis filter 13 of speech synthesizer B. Linear prediction analysis filter 3 receives speech signal (a) and linear prediction coefficients (b) and outputs residual signal (c). Amplifier 12 amplifies residual signal (f) output from residual signal generator 11 so that its power matches that of the residual signal, based on the residual power information (g) sent from the speech analyzer A side. Linear prediction synthesis filter 13 synthesizes speech signal (i) from the linear prediction coefficients (b) sent from the speech analyzer A side and the amplified residual signal (h), and outputs it to output terminal 14.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus that compresses speech signals for encoding and decoding.

[0002]

2. Description of the Related Art
There is a speech analysis-synthesis technique generally called a vocoder (see, for example, "Basics of Speech Information Processing", Saito and Nakata, Ohmsha, 1981). FIG. 3 is a block diagram of a conventional vocoder, in which A′ above the dotted line is the speech analyzer and B′ below the dotted line is the speech synthesizer. In speech analyzer A′, 101 is an input terminal; 102 is a linear prediction analyzer that receives speech signal a and calculates linear prediction coefficients b; 103 is a linear prediction analysis filter that receives speech signal a and linear prediction coefficients b and calculates a first residual signal c; 104 is a residual power analyzer that receives first residual signal c and calculates residual power g; and 105 is a pitch analyzer that receives first residual signal c and calculates a voiced/unvoiced decision e′ and a pitch frequency d.

[0003] In speech synthesizer B′, 111 is a residual signal generator that receives voiced/unvoiced decision e′ and pitch frequency d and generates a second residual signal f′; 112 is an amplifier that receives second residual signal f′ and residual power g and amplifies second residual signal f′; 113 is a linear prediction synthesis filter that receives the amplified second residual signal h′ and linear prediction coefficients b and generates speech signal i′; and 114 is an output terminal.

[0004] Next, the operation of the vocoder shown in FIG. 3 is briefly described. The following processing is performed frame by frame, each frame having a fixed length (for example, 5 msec). On the speech analyzer A′ side, speech signal a is input from input terminal 101. FIG. 2(1) shows an example waveform of speech signal a.
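As a minimal sketch of the frame-by-frame processing, the following Python splits a signal into 5 msec frames. The 8 kHz sampling rate is an assumption (the text gives only the frame length; 5 msec at 8 kHz yields the N = 40 samples per frame used as an example in paragraph [0021]), and the helper name `split_into_frames` is hypothetical.

```python
import numpy as np

def split_into_frames(signal, sample_rate=8000, frame_ms=5.0):
    """Split a 1-D signal into consecutive, non-overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped (a simplification).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 40 samples at 8 kHz
    n_frames = len(signal) // frame_len
    return np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)

frames = split_into_frames(np.zeros(8000))  # one second of (silent) audio
print(frames.shape)  # (200, 40)
```

Each row of `frames` would then go through the analysis steps described next.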

[0005] Linear prediction analyzer 102 calculates linear prediction coefficients b, outputs them to linear prediction analysis filter 103, and also transmits them to linear prediction synthesis filter 113 of speech synthesizer B′. Linear prediction analysis filter 103 receives speech signal a and linear prediction coefficients b and outputs first residual signal c. FIG. 2(2) shows an example of first residual signal c obtained from speech signal a of FIG. 2(1). Residual power analyzer 104 calculates residual power g of first residual signal c and transmits it to amplifier 112 of speech synthesizer B′. Pitch analyzer 105 receives first residual signal c and, based on the correlation value of first residual signal c (hereinafter called the pitch correlation value), determines whether the pitch periodicity is high. It transmits a decision result e′ (voiced if the periodicity is high, unvoiced if it is low) and pitch frequency d to residual signal generator 111 of speech synthesizer B′.
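The analysis-side steps in paragraph [0005] (linear prediction coefficients, inverse filtering to obtain the first residual signal, and residual power) can be sketched as follows. This assumes the autocorrelation method with the Levinson-Durbin recursion and defines residual power as the mean square of the residual; the document names neither choice, and the function names are hypothetical. A practical coder would typically estimate the coefficients over a span longer than one 5 msec frame.

```python
import numpy as np

def lpc_autocorrelation(frame, order=10):
    """LPC coefficients a[1..p] via the autocorrelation method and Levinson-Durbin.

    Sign convention (an assumption): prediction x_hat[n] = sum_k a[k] * x[n-k],
    so the analysis filter is A(z) = 1 - sum_k a[k] z^-k.
    """
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted input: stop the recursion
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(frame, lpc):
    """Inverse (analysis) filtering: e[n] = x[n] - sum_k a[k] * x[n-k], zero initial state."""
    x = np.asarray(frame, dtype=float)
    a_poly = np.concatenate(([1.0], -np.asarray(lpc, dtype=float)))
    return np.convolve(x, a_poly)[: len(x)]

def residual_power(residual):
    """Mean-square power of the residual (one possible definition of residual power g)."""
    return float(np.mean(np.asarray(residual, dtype=float) ** 2))

# Usage on a synthetic first-order autoregressive signal x[n] = 0.9 x[n-1] + w[n].
rng = np.random.default_rng(0)
w = rng.standard_normal(4000)
x = np.zeros_like(w)
for n in range(len(w)):
    x[n] = w[n] + (0.9 * x[n - 1] if n > 0 else 0.0)
a = lpc_autocorrelation(x, order=1)   # expected to be close to [0.9]
c = lpc_residual(x, a)                # approximately recovers w
g = residual_power(c)                 # approximately 1.0, the variance of w
```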

[0006] The pitch frequency and the pitch correlation value can be calculated with a widely used pitch search technique. The search takes the residual signal as the analysis target, varies the time interval (time shift) corresponding to the pitch frequency, and determines the pitch frequency from the time interval that maximizes the normalized autocorrelation. The normalized autocorrelation at that time interval can then be used as the pitch correlation value.
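A minimal sketch of the normalized-autocorrelation pitch search described in paragraph [0006]. The lag range (60 to 400 Hz at an assumed 8 kHz sampling rate) is an assumption, as is the use of an analysis window longer than one 5 msec frame, which is needed so that the window spans at least one pitch period. The function name is hypothetical.

```python
import numpy as np

def pitch_search(residual, sample_rate=8000, f_min=60.0, f_max=400.0):
    """Return (pitch_hz, pitch_corr): the lag maximizing the normalized autocorrelation.

    Lags are capped at len(residual) - 1, so the window should span at least one pitch period.
    """
    x = np.asarray(residual, dtype=float)
    lag_min = int(sample_rate // f_max)
    lag_max = min(int(sample_rate // f_min), len(x) - 1)
    best_lag, best_corr = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        head, tail = x[:-lag], x[lag:]          # x[n] and x[n + lag]
        denom = np.sqrt(np.dot(head, head) * np.dot(tail, tail))
        corr = np.dot(head, tail) / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag, float(best_corr)

# A 100 Hz pulse train (period 80 samples at 8 kHz) over a 50 msec window.
x = np.zeros(400)
x[::80] = 1.0
pitch_hz, pitch_corr = pitch_search(x)   # (100.0, 1.0)
```

At a pitch onset the first periods in the window are weak or irregular, so `pitch_corr` drops even though a pitch is present; this is the weakness addressed later in the document.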

[0007] On the speech synthesizer B′ side, residual signal generator 111 uses the voiced/unvoiced decision result e′ and the pitch frequency d sent from speech analyzer A′. When the speech is voiced, it generates a periodic signal (for example, a pulse train) determined by pitch frequency d; when the speech is unvoiced, it generates a noise signal (for example, white noise). This is illustrated in FIG. 4.

[0008] FIG. 4 schematically shows the residual signal generator, which consists of a periodic signal generator and a noise signal generator. The generator switches its output to a periodic signal that follows pitch frequency d when the speech is voiced, and to a noise signal when the speech is unvoiced.
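The switching residual signal generator of FIG. 4 can be sketched as below. Unit-amplitude pulses, a standard-normal noise source, an 8 kHz sampling rate, and carrying the pulse phase across frames are all assumptions made for illustration; the function name is hypothetical, and gain is applied afterwards by the amplifier.

```python
import numpy as np

def generate_excitation(voiced, pitch_hz, frame_len=40, sample_rate=8000,
                        phase=0, rng=None):
    """Second residual signal for one frame: pulse train if voiced, white noise if not.

    `phase` is the sample offset of the next pulse within this frame.
    Returns (excitation, next_phase) so the pulse train continues across frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not voiced:
        return rng.standard_normal(frame_len), 0   # noise branch resets the phase
    period = max(1, int(round(sample_rate / pitch_hz)))
    out = np.zeros(frame_len)
    pos = phase
    while pos < frame_len:
        out[pos] = 1.0                              # unit pulse every `period` samples
        pos += period
    return out, pos - frame_len

e1, ph1 = generate_excitation(True, 100.0)              # period 80: pulse at sample 0
e2, ph2 = generate_excitation(True, 100.0, phase=ph1)   # next pulse falls in a later frame
n1, _ = generate_excitation(False, 0.0, rng=np.random.default_rng(1))
```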

[0009] Based on the residual power g sent from speech analyzer A′, amplifier 112 amplifies second residual signal f′ output from residual signal generator 111 so that its power equals that of first residual signal c. This is shown schematically in FIG. 2(3). Linear prediction synthesis filter 113 synthesizes speech signal i′ from the linear prediction coefficients b sent from speech analyzer A′ and the amplified second residual signal h′, and outputs it to output terminal 114.
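The amplifier and the linear prediction synthesis filter can be sketched as follows. This assumes the all-pole form s[n] = e[n] + Σ a_k s[n−k] (the inverse of the analysis filter A(z) = 1 − Σ a_k z^−k) and defines power as the mean square; both are assumptions, and the function names are hypothetical. The filter state is returned so that consecutive frames join without discontinuities.

```python
import numpy as np

def amplify_to_power(excitation, target_power):
    """Scale the excitation so that its mean-square value equals target_power."""
    e = np.asarray(excitation, dtype=float)
    current = np.mean(e ** 2)
    return e if current == 0 else e * np.sqrt(target_power / current)

def lpc_synthesis(excitation, lpc, state=None):
    """All-pole filter 1/A(z): s[n] = e[n] + sum_k a[k] * s[n-k].

    `state` holds the last p output samples (most recent last) from the previous frame.
    """
    p = len(lpc)
    hist = np.zeros(p) if state is None else np.asarray(state, dtype=float).copy()
    out = np.empty(len(excitation))
    for n, e_n in enumerate(excitation):
        s = e_n + np.dot(lpc, hist[::-1])   # hist[::-1][k-1] is s[n-k]
        out[n] = s
        hist = np.append(hist[1:], s)       # shift in the newest output sample
    return out, hist

h = amplify_to_power([1.0, -1.0, 1.0, -1.0], 4.0)   # [2, -2, 2, -2]
speech, state = lpc_synthesis([1.0, 0.0, 0.0], [0.5])  # impulse response [1, 0.5, 0.25]
```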

[0010] Besides modeling the excitation (sound source) signal by switching between a periodic signal and a noise signal as described above, there are techniques that improve the quality of analysis-synthesis speech by modeling the excitation as a mixture of a voiced signal and an unvoiced signal (for example, "High-Quality Harmonic Coding at Very Low Bit Rates", G. Yang, H. Leich, ICASSP, 1994). The mixing ratio is controlled based on, for example, the pitch correlation value: when the periodicity is strong, more of the periodic signal is mixed in; when the periodicity is weak, more of the noise signal is mixed in.
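One simple way to realize such a mixed excitation is a linear cross-fade whose periodic weight equals the pitch correlation value clipped to [0, 1]. This is an assumed rule for illustration only; the cited harmonic coder is not reproduced here and may mix the components differently. The function name is hypothetical.

```python
import numpy as np

def mixed_excitation(pulse_train, noise, pitch_corr):
    """Mix periodic and noise excitation; the periodic weight is pitch_corr clipped to [0, 1]."""
    w = float(np.clip(pitch_corr, 0.0, 1.0))

    def unit_power(x):
        # Normalize each component to unit mean-square power before mixing.
        x = np.asarray(x, dtype=float)
        p = np.mean(x ** 2)
        return x if p == 0 else x / np.sqrt(p)

    return w * unit_power(pulse_train) + (1.0 - w) * unit_power(noise)

mix = mixed_excitation([1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0], 0.5)
# mix is [1.5, -0.5, 0.5, -0.5]
```

The overall level is still set afterwards by the amplifier, so only the relative weighting matters here.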

[0011]

PROBLEM TO BE SOLVED BY THE INVENTION
In the speech analysis-synthesis device described above, it is important to model the excitation signal appropriately. However, when the voiced/unvoiced decision is made from the pitch correlation value, the pitch correlation value is low in segments such as the onset of a periodic signal, even though a pitch frequency is present there. Such segments are judged unvoiced, and the synthesized speech becomes noise-like.

[0012]

MEANS FOR SOLVING THE PROBLEM
To solve the above problem, the speech analysis-synthesis device of the present invention consists of a speech analyzer and a speech synthesizer. The speech analyzer comprises: a linear prediction analyzer that receives a speech signal, calculates linear prediction coefficients, and outputs them to a linear prediction analysis filter and to the speech synthesizer; the linear prediction analysis filter, which receives the speech signal and the linear prediction coefficients, calculates a first residual signal, and outputs it to subsequent processing means in the speech analyzer; a residual power analyzer that receives the first residual signal, calculates the residual power, and outputs it to the speech synthesizer; and a pitch analyzer that receives the first residual signal, calculates a pitch frequency, and outputs it to a pulse analyzer. The speech synthesizer comprises: a residual signal generator that receives the voiced/unvoiced decision result from the pulse analyzer and the pitch frequency calculated by the pitch analyzer, generates a second residual signal, and outputs it to an amplifier; the amplifier, which receives the second residual signal and the residual power from the residual power analyzer and amplifies the second residual signal; and a linear prediction synthesis filter that receives the amplified second residual signal and the linear prediction coefficients output from the linear prediction analyzer, and generates and outputs a speech signal. The speech analyzer further includes the pulse analyzer, which analyzes the first residual signal and makes a voiced decision when the degree of pulsiveness is higher than a set threshold.

[0013] In another form, the speech analyzer comprises the linear prediction analyzer, the linear prediction analysis filter, the residual power analyzer, and a pitch analyzer that receives the first residual signal, calculates a voiced/unvoiced mixing ratio and the pitch frequency, and outputs them to the pulse analyzer. The speech synthesizer comprises the residual signal generator, which receives the voiced/unvoiced mixing ratio output from the pulse analyzer and the pitch frequency and generates a third residual signal, the amplifier, and the linear prediction synthesis filter. The pulse analyzer analyzes the first residual signal and changes the voiced/unvoiced mixing ratio output from the pitch analyzer according to the degree of pulsiveness.

[0014] The linear prediction analyzer calculates the linear prediction coefficients of the input speech signal. The linear prediction analysis filter receives the speech signal and the linear prediction coefficients and outputs a residual signal. The residual power analyzer calculates the power of the residual signal. The pitch analyzer receives the residual signal and outputs a pitch frequency based on the correlation value of the residual signal. The pulse analyzer determines the pulsiveness of the residual signal waveform and changes the voiced/unvoiced mixing ratio according to the degree of pulsiveness.

[0015]

EMBODIMENTS OF THE INVENTION
[Embodiment 1] FIG. 1 is a block diagram showing an embodiment of the speech analysis-synthesis device of the present invention. It differs from the prior art shown in FIG. 3 in that a pulse analyzer 6 is added on the speech analyzer A side and the output of pitch analyzer 5 is input to this pulse analyzer 6.

[0016] In speech analyzer A, 1 is an input terminal; 2 is a linear prediction analyzer that receives speech signal a and calculates linear prediction coefficients b; 3 is a linear prediction analysis filter that receives speech signal a and linear prediction coefficients b and calculates first residual signal c; 4 is a residual power analyzer that receives first residual signal c and calculates residual power g; 5 is a pitch analyzer that receives first residual signal c and calculates pitch frequency d; and 6 is a pulse analyzer that receives first residual signal c and pitch frequency d and makes voiced/unvoiced decision e based on the degree of pulsiveness.

[0017] In speech synthesizer B, 11 is a residual signal generator that receives voiced/unvoiced decision result e and pitch frequency d and generates second residual signal f; 12 is an amplifier that receives second residual signal f and residual power g and amplifies second residual signal f; 13 is a linear prediction synthesis filter that receives the amplified second residual signal h and linear prediction coefficients b and generates speech signal i; and 14 is an output terminal.

[0018] Next, the operation of the vocoder shown in FIG. 1 is described. The following processing is performed frame by frame, each frame having a fixed length (for example, 5 msec). On the speech analyzer A side, speech signal a is input from input terminal 1. FIG. 2(1) shows an example waveform of speech signal a. Linear prediction analyzer 2 calculates linear prediction coefficients b, outputs them to linear prediction analysis filter 3, and also transmits them to linear prediction synthesis filter 13 of speech synthesizer B. Linear prediction analysis filter 3 receives speech signal a and linear prediction coefficients b and outputs first residual signal c.

[0019] FIG. 2(2) shows an example of first residual signal c obtained from speech signal a of FIG. 2(1). Residual power analyzer 4 calculates residual power g of residual signal c and transmits it to amplifier 12 of speech synthesizer B. Pitch analyzer 5 receives first residual signal c, calculates pitch frequency d, and outputs it to pulse analyzer 6. Based on the correlation value of first residual signal c (hereinafter called the pitch correlation value), pulse analyzer 6 determines whether the pitch periodicity is high, and transmits a decision result e (voiced if the periodicity is high, unvoiced if it is low) and pitch frequency d to residual signal generator 11 of speech synthesizer B.

[0020] As shown in FIG. 2(2), the residual signal in a speech segment where a pitch frequency exists generally has a more pulse-like waveform than the original speech signal. Pulse analyzer 6 receives first residual signal c and calculates its degree of pulsiveness. The degree of pulsiveness can be calculated, for example, as the ratio of the maximum absolute value to the mean absolute value of the residual signal waveform within the frame, given by equation (1), or as the ratio of the maximum absolute value to the root-mean-square value of the residual signal waveform within the frame, given by equation (2). Alternatively, to bring the value into a range of 1 or less for easier handling, it can be normalized by the square root of the frame length, as in equation (3), which normalizes equation (2).

[0021]

$$Y_{\max} = \max\left(|Y_1|, |Y_2|, \ldots, |Y_N|\right)$$
$$Y_{\mathrm{ave}} = \frac{|Y_1| + |Y_2| + \cdots + |Y_N|}{N}$$
$$Y_{\mathrm{rms}} = \sqrt{\frac{Y_1^2 + Y_2^2 + \cdots + Y_N^2}{N}}$$
$$\mathrm{Pulse} = \frac{Y_{\max}}{Y_{\mathrm{ave}}} \tag{1}$$
$$\mathrm{Pulse} = \frac{Y_{\max}}{Y_{\mathrm{rms}}} \tag{2}$$
$$\mathrm{Pulse} = \frac{Y_{\max}}{Y_{\mathrm{rms}}\sqrt{N}} \tag{3}$$

Here N is the number of samples in the frame (for example, 40); Y_i (i = 1, 2, ..., N) is the amplitude of the i-th residual signal sample in the frame; |x| is the absolute value of x; max(...) is the largest of its arguments; and √x is the square root of x.
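Equations (1) to (3) translate directly into code. The function below is a minimal sketch; its name and the `measure` parameter are hypothetical, and returning 0 for an all-zero frame is an assumption, since the text does not say how silent frames are handled.

```python
import numpy as np

def pulsiveness(frame, measure=3):
    """Degree of pulsiveness of one residual frame per equation (1), (2) or (3)."""
    y = np.abs(np.asarray(frame, dtype=float))
    n = len(y)
    y_max = y.max()
    if y_max == 0.0:
        return 0.0                      # silent frame (assumed handling)
    y_ave = y.mean()                    # mean absolute value
    y_rms = np.sqrt(np.mean(y ** 2))    # root-mean-square value
    if measure == 1:
        return float(y_max / y_ave)                 # equation (1)
    if measure == 2:
        return float(y_max / y_rms)                 # equation (2)
    return float(y_max / (y_rms * np.sqrt(n)))      # equation (3), at most 1

single = np.zeros(40)
single[7] = -3.0                        # one pulse in a 40-sample frame
flat = np.ones(40)                      # equal amplitude throughout
```

For `single`, equations (1), (2) and (3) give 40, √40 and 1; for `flat`, equation (3) gives 1/√40.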

[0022] With equation (3), the most strongly pulsive case, a frame containing only a single pulse, gives a value of 1. Conversely, the least pulsive case, a frame in which every sample has the same amplitude, gives a value of 1/√N.
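The two limits stated above follow directly from equation (3). For a frame whose only nonzero sample has amplitude A > 0, and for a frame in which every sample has magnitude A:

$$\text{single pulse: } Y_{\max} = A,\quad Y_{\mathrm{rms}} = \sqrt{\frac{A^2}{N}} = \frac{A}{\sqrt{N}} \;\Rightarrow\; \mathrm{Pulse} = \frac{A}{\left(A/\sqrt{N}\right)\sqrt{N}} = 1$$
$$\text{equal amplitude: } Y_{\max} = A,\quad Y_{\mathrm{rms}} = A \;\Rightarrow\; \mathrm{Pulse} = \frac{A}{A\sqrt{N}} = \frac{1}{\sqrt{N}}$$

More generally, because Y_1² + ⋯ + Y_N² ≥ Y_max² and |Y_i| ≤ Y_max for every i, we have Y_max/√N ≤ Y_rms ≤ Y_max, so the value of equation (3) always lies between 1/√N and 1. For N = 40 the lower limit is about 0.158.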

[0023] Next, the calculated pulsiveness index is compared with a decision threshold (for example, 0.5). If the index is larger, the voiced/unvoiced decision signal input from pitch analyzer 5 is set to voiced. In this case, the pitch frequency can be set, for example, to the candidate most likely to be the pitch frequency according to the pitch correlation values (the one with the highest pitch correlation), or to a fixed value.
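A minimal sketch of the decision rule in paragraph [0023], using equation (3) as the pulsiveness index and the example threshold 0.5. The rule only overrides an unvoiced decision to voiced; it never forces a voiced frame to unvoiced. Choosing the pitch frequency for an overridden frame (highest-correlation candidate or a fixed value) is left out, and the function names are hypothetical.

```python
import numpy as np

def pulsiveness_eq3(frame):
    """Equation (3): Ymax / (Yrms * sqrt(N)); 0 for an all-zero frame (assumed)."""
    y = np.abs(np.asarray(frame, dtype=float))
    rms = np.sqrt(np.mean(y ** 2))
    return 0.0 if rms == 0.0 else float(y.max() / (rms * np.sqrt(len(y))))

def voicing_decision(frame, pitch_voiced, threshold=0.5):
    """Return (voiced, index); a pulsiveness index above threshold forces 'voiced'."""
    index = pulsiveness_eq3(frame)
    return (pitch_voiced or index > threshold), index

# An onset frame: one clear pulse that the pitch correlation alone judged unvoiced.
onset = np.zeros(40)
onset[10] = 1.0
voiced, index = voicing_decision(onset, pitch_voiced=False)   # voiced is True
```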

[0024] On the speech synthesizer B side, residual signal generator 11 uses the voiced/unvoiced decision result e and the pitch frequency d sent from speech analyzer A. When the speech is voiced, it generates a periodic signal (for example, a pulse train) determined by pitch frequency d; when the speech is unvoiced, it generates a noise signal (for example, white noise). This is illustrated in FIG. 4.

[0025] Based on the residual power g sent from speech analyzer A, amplifier 12 amplifies second residual signal f output from residual signal generator 11 so that its power equals that of first residual signal c. This is shown schematically in FIG. 2(4). The scales of the vertical and horizontal axes in FIG. 2(4) are the same as in FIGS. 2(1) to 2(3). However, because this method carries no information about where pulses are generated, the positions of the pulses within the analysis frame in FIGS. 2(3) and 2(4) differ from those in FIGS. 2(1) and 2(2). Linear prediction synthesis filter 13 synthesizes speech signal i from the linear prediction coefficients b sent from speech analyzer A and the amplified second residual signal h, and outputs it to output terminal 14.

[0026] Taking the speech signal of FIG. 2 as an example: in the prior art, the pitch correlation value is calculated to be low in the pitch onset segment, so, as shown in FIG. 2(3), the segment from 3.880 s to 3.885 s is judged unvoiced and a noise signal is generated. In the present invention, by contrast, the residual signal in this segment has a high degree of pulsiveness, so, as shown in FIG. 2(4), a voiced decision is made and a periodic signal (here, a pulse train) is generated.

[0027] [Embodiment 2] In the embodiment above, the residual signal serving as the excitation signal is represented either by a periodic signal or by a noise signal. As explained for the prior art, however, the invention can also be applied to a system that represents it as a mixture of voiced and unvoiced signals. In this case too, as in FIG. 1, the only newly added processing block is pulse analyzer 6. As in Embodiment 1, the degree of pulsiveness of the residual signal is calculated using, for example, equation (3). According to the calculated pulsiveness index, the pitch correlation value input from pitch analyzer 5 is modified and output to the speech synthesizer B side. The modification raises the pitch correlation value further the higher the degree of pulsiveness; equation (4) is one possible rule.

[0028]

$$P_{\mathrm{cor}} \leftarrow \begin{cases} \mathrm{Pulse}, & \mathrm{Pulse} > P_{\mathrm{cor}} \\ P_{\mathrm{cor}}, & \mathrm{Pulse} \le P_{\mathrm{cor}} \end{cases} \tag{4}$$

Here P_cor is the pitch correlation value, normalized to 1 or less, and Pulse is the degree of pulsiveness, obtained for example by equation (3).
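Equation (4) amounts to taking the larger of the pitch correlation value and the pulsiveness index. A minimal sketch, again using equation (3) for the index; the function name is hypothetical, and treating an all-zero frame as having pulsiveness 0 is an assumption.

```python
import numpy as np

def adjust_pitch_correlation(p_cor, frame):
    """Equation (4): raise the pitch correlation to the eq. (3) pulsiveness if that is larger."""
    y = np.abs(np.asarray(frame, dtype=float))
    rms = np.sqrt(np.mean(y ** 2))
    pulse = 0.0 if rms == 0.0 else float(y.max() / (rms * np.sqrt(len(y))))
    return max(p_cor, pulse)

onset = np.zeros(40)
onset[5] = 1.0
raised = adjust_pitch_correlation(0.2, onset)        # about 1.0: pulse-like frame
kept = adjust_pitch_correlation(0.3, np.ones(40))    # 0.3: index 1/sqrt(40) is smaller
```

The adjusted value can then drive a mixing rule such as the one sketched after paragraph [0010], so a pulse-like onset frame receives a larger periodic share.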

[0029] Thus, by modifying the pitch correlation value according to the pulsiveness of the residual signal waveform, more periodic signal (for example, pulse train) components can be generated earlier in the pitch onset segment than with the conventional method, improving sound quality.

[0030]

EFFECTS OF THE INVENTION
According to the present invention, a voiced decision is made even in segments where pitch extraction is difficult, such as the onset of the speech pitch, which realizes a speech analysis-synthesis device with improved sound quality.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the speech analysis-synthesis device of the present invention.

FIG. 2 shows speech signal waveforms related to the processing of the prior-art speech analysis-synthesis device and of the device of the present invention.

FIG. 3 is a block diagram showing an example of a prior-art speech analysis-synthesis device.

FIG. 4 is an explanatory diagram of the operation of the residual signal generator of the prior-art speech analysis-synthesis device.

[Explanation of symbols]

1, 101: Input terminal
2, 102: Linear prediction analyzer
3, 103: Linear prediction analysis filter
4, 104: Residual power analyzer
5, 105: Pitch analyzer
6: Pulse analyzer
11, 111: Residual signal generator
12, 112: Amplifier
13, 113: Linear prediction synthesis filter
14, 114: Output terminal

Claims

[Claims]

1. A speech analysis-synthesis device composed of a speech analyzer and a speech synthesizer, comprising: a speech analyzer having a linear prediction analyzer that receives a speech signal, calculates linear prediction coefficients, and outputs them to a linear prediction analysis filter and to the speech synthesizer, the linear prediction analysis filter, which receives the speech signal and the linear prediction coefficients, calculates a first residual signal, and outputs it to subsequent processing means in the speech analyzer, a residual power analyzer that receives the first residual signal, calculates a residual power, and outputs it to the speech synthesizer, and a pitch analyzer that receives the first residual signal, calculates a pitch frequency, and outputs it to a pulse analyzer; and a speech synthesizer having a residual signal generator that receives a voiced/unvoiced decision result from the pulse analyzer and the pitch frequency calculated by the pitch analyzer, generates a second residual signal, and outputs it to an amplifier, the amplifier, which receives the second residual signal and the residual power from the residual power analyzer and amplifies the second residual signal, and a linear prediction synthesis filter that receives the amplified second residual signal and the linear prediction coefficients output from the linear prediction analyzer, and generates and outputs a speech signal; wherein the speech analyzer is provided with the pulse analyzer, which analyzes the first residual signal and makes a voiced decision when the degree of pulsiveness is higher than a set threshold.

2. The speech analysis-synthesis device according to claim 1, comprising: a speech analyzer having the linear prediction analyzer, the linear prediction analysis filter, the residual power analyzer, and a pitch analyzer that receives the first residual signal, calculates a voiced/unvoiced mixing ratio and the pitch frequency, and outputs them to the pulse analyzer; and a speech synthesizer having the residual signal generator, which receives the voiced/unvoiced mixing ratio output from the pulse analyzer and the pitch frequency and generates a third residual signal, the amplifier, which amplifies the third residual signal, and the linear prediction synthesis filter; and further comprising the pulse analyzer, which analyzes the first residual signal and changes the voiced/unvoiced mixing ratio output from the pitch analyzer according to the degree of pulsiveness.