JPS59152496A

JPS59152496A - Voice analysis synthesization system

Info

Publication number: JPS59152496A
Application number: JP2607183A
Authority: JP
Inventors: 田中　啓夫; 古村　光夫
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-02-18
Filing date: 1983-02-18
Publication date: 1984-08-31
Also published as: JPH0330880B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a speech analysis and synthesis system, and in particular to one that analyzes input speech by a linear prediction method and extracts parameters from it. When the pitch period is extracted from the autocorrelation of the prediction residual, several components with the larger autocorrelation coefficients are first extracted as candidates. The pitch periods of the more reliable periods are then used as a reference for setting the pitch periods of the less reliable periods.

(B) Technical Background and Problems

In conventional speech analysis and synthesis systems based on linear prediction (including the PARCOR method), the pitch period is extracted by examining the prediction residual obtained from linear prediction. The extraction relies on the fact that the autocorrelation coefficient sequence of the residual, viewed as a function of time lag, peaks at the lag corresponding to the pitch period. Specifically, the pitch period is extracted as follows: (i) find the time lag that gives the maximum of the autocorrelation coefficient sequence of the prediction residual; (ii) examine how far that lag deviates from the pitch periods of the preceding and following frames; (iii) if the deviation is large, check whether a peak exists at a half-period or double-period position that would make the deviation small; (iv) if so, take the time lag of that peak as the pitch period.

However, with this method, when the pitch period of the original speech fluctuates, or when the pitch structure breaks down at a phoneme boundary, the autocorrelation coefficients of the prediction residual no longer show a sharp peak, and the pitch period cannot be found correctly.

(C) Object and Structure of the Invention

The present invention aims to solve the above problem. By extracting the pitch period reliably, it eliminates manual error correction and makes speech analysis and synthesis fully automatic. To that end, the speech analysis and synthesis system of the present invention analyzes input speech by a linear prediction method and converts it into parameters, extracts the pitch period using the autocorrelation coefficients of the prediction residual, and performs speech synthesis based on these results. The system comprises: a pitch period determination candidate extraction unit that examines the autocorrelation coefficient sequence of the prediction residual, or a weighted moving average of that sequence, and extracts a plurality of candidates with the larger correlation values; a time lag component extraction unit that extracts the time lag corresponding to each pitch period determination candidate; and a logical judgment processing unit that determines the pitch period from the extracted pitch period determination candidates and their time lag components. The logical judgment processing unit examines, from the value of each pitch period determination candidate, the reliability with which the corresponding time lag component can be taken as the pitch period. For periods whose reliability is at or above a threshold, it determines the pitch period using the time lag components of the candidates with reliability at or above the threshold. For periods whose reliability is at or below the threshold, it determines the pitch period so as to maintain continuity with the pitch periods already determined. The invention is described below with reference to the drawings.

(D) Embodiments of the Invention

Fig. 1 is an explanatory diagram of the concept of a frame as used in this specification; Fig. 2 is a block diagram of one embodiment of the speech analysis and synthesis system of the present invention; Figs. 3 to 7 are explanatory diagrams of the processing performed in the logical judgment processing unit shown in Fig. 2; Fig. 8 is a flowchart of one embodiment of the first-pass processing in the logical judgment processing unit; Fig. 9 is a flowchart of one embodiment of the second-pass processing in the logical judgment processing unit; Figs. 10(A) and 10(B) together form a flowchart of one embodiment of the third-pass processing in the logical judgment processing unit; Fig. 11 is a flowchart of one embodiment of the third-pass processing in the logical judgment processing unit; and Fig. 12 is a flowchart of one embodiment of the fourth-pass processing in the logical judgment processing unit.

To extract the various parameters needed for speech synthesis, the input speech is cut into segments of a predetermined duration (the frame length), as shown in Fig. 1, so that adjacent frames partly overlap. Each frame is assigned a frame number, and the parameters are extracted frame by frame.
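The framing step can be sketched as follows. This is a minimal illustration only: the function name `split_frames` and the parameters `frame_len` and `hop` are hypothetical, and the text does not state the frame length or the amount of overlap.

```python
def split_frames(x, frame_len, hop):
    """Cut signal x into numbered frames of frame_len samples.

    Successive frames start hop samples apart, so adjacent frames
    overlap by frame_len - hop samples. Frame numbers start at 1.
    A trailing partial frame is dropped (an assumption of this sketch).
    """
    if not 0 < hop <= frame_len:
        raise ValueError("hop must be in 1..frame_len")
    frames = []
    starts = range(0, len(x) - frame_len + 1, hop)
    for number, start in enumerate(starts, start=1):
        frames.append((number, x[start:start + frame_len]))
    return frames
```

With `hop` smaller than `frame_len`, each sample near a frame boundary appears in two frames, which is the partial overlap the text describes.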

Writing the input speech cut out in this way as {x(n)} = {x(0), x(1), ..., x(N-1)}, it is input and processed as shown in Fig. 2.

In Fig. 2, 1 is an input speech correlation processing unit, 2 is a linear prediction processing unit, 3 is a residual power extraction unit, 4 is a residual correlation processing unit, 5 is a moving average processing unit, 6 is a maximum value extraction processing unit, 7 is a pitch period determination candidate holding unit, 8 is a time lag component holding unit, 9 is a logical judgment processing unit, 10 is a voiced/unvoiced determination unit, 11 is a voiced-section excitation source generation unit, 12 is an unvoiced-section (including silence) excitation source generation unit, 13 is a switching processing unit, and 14 is a linear prediction synthesis unit.

For the input speech {x(n)} cut out as described above, processing unit 1 computes the autocorrelation {r_i} of the input speech. The linear prediction processing unit 2 then extracts the parameters related to the vocal tract, and the prediction residual {e(n)} is supplied to the residual power extraction unit 3, the residual correlation processing unit 4, and the voiced/unvoiced determination unit 10.
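The analysis step above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the patent: it assumes the autocorrelation method solved with the Levinson-Durbin recursion, the sign convention e(n) = x(n) + a_1 x(n-1) + ... + a_p x(n-p), and hypothetical function names. The reflection coefficients returned as `parcor` match PARCOR coefficients only up to the sign convention in use.

```python
def autocorr(x, order):
    """Autocorrelation r_0 .. r_order of one frame."""
    n = len(x)
    return [sum(x[k] * x[k + i] for k in range(n - i)) for i in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, parcor, err): predictor polynomial with a[0] == 1,
    reflection coefficients, and final prediction error power.
    """
    a = [1.0] + [0.0] * order
    parcor = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: nothing left to predict
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        parcor.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, parcor, err


def residual(x, a):
    """Prediction residual; samples before the frame are taken as zero."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]
```

The residual returned by `residual` plays the role of {e(n)}: it is what the pitch extraction units operate on, since the vocal tract resonances have been largely removed from it.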

Units 3, 4, 5, 6, 7, 8, and 9 in the figure make up the pitch period extraction section to which the present invention is applied. The pitch period extracted by this section is supplied to the voiced-section excitation source generation unit 11 at synthesis time. That unit then generates an excitation component, in the form of pulses or a triangular wave, that corresponds to the pitch period and to the power supplied by extraction unit 3.

The unvoiced-section excitation source generation unit 12 generates the power corresponding to unvoiced sections in the form of white noise.
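The two excitation sources can be sketched as follows. This is only an illustration: the function names are hypothetical, the voiced source uses an impulse train (the text also allows a triangular wave), the noise is Gaussian, and scaling each signal so that its mean square equals the given power is an assumption of the sketch.

```python
import math
import random


def voiced_excitation(n, period, power):
    """Impulse train with one pulse every `period` samples.

    The pulse amplitude is chosen so that the mean square over whole
    periods equals `power`.
    """
    amp = math.sqrt(power * period)
    return [amp if k % period == 0 else 0.0 for k in range(n)]


def unvoiced_excitation(n, power, seed=0):
    """Gaussian white noise with variance `power` (seeded for repeatability)."""
    rng = random.Random(seed)
    sigma = math.sqrt(power)
    return [rng.gauss(0.0, sigma) for _ in range(n)]
```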

The voiced/unvoiced determination unit 10 decides which sections are voiced and which are unvoiced, and at synthesis time controls the switching performed in the switching processing unit 13. The linear prediction synthesis unit 14 applies the vocal tract parameters to the output of the switching processing unit 13 and outputs the result as synthesized speech {y(n)}.

The key feature of the present invention lies in the pitch period extraction section formed by units 3, 4, 5, 6, 7, 8, and 9 of Fig. 2, which is described in detail below.

When the prediction residual {e(n)} has been extracted as shown in Fig. 2, the residual power extraction unit 3 extracts the power of the residual {e(n)}. Meanwhile, the residual correlation processing unit 4 computes the autocorrelation of the residual. That is, it extracts the sequence of autocorrelation coefficients ρ_i(m), where ρ_i(m) is the autocorrelation coefficient of the prediction residual of the m-th frame at time lag i (i = 0, 1, ..., N-1), as defined by equation (1).
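The formula of equation (1) is not reproduced in this text, so the following sketch assumes a common definition rather than the patent's own: the lag-i autocorrelation of the residual within one frame, normalized by the zero-lag energy so that the coefficient at lag 0 equals 1. The function name is hypothetical.

```python
def residual_autocorr(e):
    """Normalized autocorrelation coefficients of one residual frame.

    Returns a list rho with rho[i] for i = 0 .. N-1, where N = len(e).
    Normalizing by the frame energy is an assumption of this sketch.
    """
    n = len(e)
    energy = sum(v * v for v in e)
    if energy == 0.0:
        return [0.0] * n
    return [sum(e[k] * e[k + i] for k in range(n - i)) / energy
            for i in range(n)]
```

For a residual that is roughly a pulse train, the coefficients are large at multiples of the pulse spacing, which is the peak structure that the following units search for.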

Viewed as a sequence, the coefficients ρ_i(m) may contain unwanted extreme irregularities, so the moving average processing unit 5 takes a moving average, for example of the form

$$\rho'_i(m) = \frac{1}{3}\left(\rho_{i-1}(m) + \rho_i(m) + \rho_{i+1}(m)\right), \qquad i \in [T_{min},\, 2T_{max}] \qquad (2)$$

T_min and 2T_max are explained later. The configuration of Fig. 2 uses the moving-averaged autocorrelation coefficient ρ'_i(m), but for simplicity the following description writes ρ'_i(m) simply as ρ_i(m).
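The smoothing step can be sketched as follows, assuming an unweighted three-point average over the lag range from T_min to 2 T_max. The function name is hypothetical, and leaving lags outside the range (or without both neighbours) unchanged is a choice made for this sketch.

```python
def moving_average3(rho, t_min, t_max):
    """Three-point moving average of rho over lags t_min .. 2 * t_max."""
    out = list(rho)
    lo = max(t_min, 1)
    hi = min(2 * t_max, len(rho) - 2)
    for i in range(lo, hi + 1):
        # Always read from the unsmoothed input, never from `out`.
        out[i] = (rho[i - 1] + rho[i] + rho[i + 1]) / 3.0
    return out
```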

For each frame, the maximum value extraction processing unit 6 of Fig. 2 picks out the larger values of the coefficient ρ_i(m), for example five of them, and stores them in the pitch period determination candidate holding unit 7. It also stores the time lag i corresponding to each selected coefficient in the time lag component holding unit 8. Let [T_min, T_max] be the range of time lags i in which the pitch period is expected to lie under normal conditions. To cope with cases such as pitch pulses missing from the original speech, the search range for the time lag i of equation (1) is widened to [T_min, 2T_max]. Within this range, five candidates are selected in descending order of the autocorrelation coefficient of the prediction residual, together with their time lags. In general, this selection is expressed as

$$\rho^{(j)}(m) = \max_{\substack{i \in [T_{min},\, 2T_{max}] \\ i \ne \tau^{(k)}(m),\ 1 \le k \le j-1}} \rho_i(m), \qquad 1 \le j \le 5 \qquad (3)$$

$$\tau^{(j)}(m) = \arg\max_{\substack{i \in [T_{min},\, 2T_{max}] \\ i \ne \tau^{(k)}(m),\ 1 \le k \le j-1}} \rho_i(m), \qquad 1 \le j \le 5 \qquad (4)$$

Here arg is the function whose value is the time lag i that gives ρ^(j)(m).
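The candidate selection can be sketched as follows. The function name and the `count` parameter are hypothetical. As in the description, the sketch ranks coefficient values within the lag range and does not restrict itself to local peaks.

```python
def top_candidates(rho, t_min, t_max, count=5):
    """Return up to `count` (reliability, lag) pairs, best first.

    Lags are searched over t_min .. 2 * t_max. Each selected lag is
    excluded from later selections, which a single sort achieves.
    """
    hi = min(2 * t_max, len(rho) - 1)
    lags = sorted(range(t_min, hi + 1), key=lambda i: rho[i], reverse=True)
    return [(rho[i], i) for i in lags[:count]]
```

The first element of each pair is the value used as the reliability of the corresponding lag in the later passes.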

In this way the values (ρ^(1)(m), ρ^(2)(m), ..., ρ^(5)(m)) and (τ^(1)(m), τ^(2)(m), ..., τ^(5)(m)) are selected. Each ρ^(j)(m) can be regarded as the reliability of taking the corresponding time lag τ^(j)(m) as the pitch period.

The logical judgment processing unit 9 of Fig. 2 determines the pitch period of each frame from the selected ρ^(j)(m) and τ^(j)(m). Its processing is as shown in Fig. 3 and, as outlined in Figs. 4 to 7, consists of a first pass, a second pass, a third pass, and a fourth pass. These are described below.

Suppose that, for some input speech, plotting the frame number on the horizontal axis and the first-rank reliability value on the vertical axis gives the curve shown in Fig. 3(A). Suppose also that, as shown in Fig. 3(B), the input speech consists of a silent section, an unvoiced section, a voiced section, a silent section, a voiced section, and a silent section, in that order.

In the first pass, as shown in Fig. 3(C), the logical judgment processing unit 9 provisionally determines the pitch period of each frame lying in a section where the reliability ρ^(1)(m) exceeds the threshold θ_ρ, that is, each frame satisfying equation (5). The determination is made in chronological order, as indicated by the arrows in the figure, and then also in the direction opposite to the arrows. Fig. 3(C) represents the processing of this first pass.

Consider first the processing in chronological order. The pitch period of the leading frame is set to

$$\tau^{F}(m) = \tau^{(1)}(m) \qquad (6)$$

For each subsequent frame, the pitch period is selected from (τ^(1)(m), ..., τ^(j)(m), ..., τ^(5)(m)), trying j in ascending order, so as to satisfy the condition (called continuity) that the pitch period changes by no more than a predetermined width θ_τ:

$$\left|\tau^{F}(m) - \tau^{F}(m-1)\right| \le \theta_{\tau} \qquad (7)$$

If no candidate satisfies equation (7), the pitch period of that frame is left undetermined and processing moves on to the next frame, whose pitch period is set according to equation (6). Processing in the reverse direction is done in the same way with τ^B(m) in place of τ^F(m), after transforming the time index as

$$m' = M - m + 1$$

For a frame in which both τ^F(m) and τ^B(m) were obtained, the pitch period τ(m) is

$$\tau(m) = \min\left(\tau^{F}(m),\, \tau^{B}(m)\right) \qquad (8)$$

For a frame in which only one of them was obtained, that value is taken as the pitch period τ(m). In all other cases the pitch period τ(m) of the frame is left undetermined.
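The first pass can be sketched as follows. This is a simplified illustration: the names `track`, `first_pass`, `rho1`, `lags`, `theta_rho`, and `theta_tau` are hypothetical, the voiced/unvoiced decision is omitted, and a frame counts as reliable when its first-rank reliability exceeds the threshold.

```python
def track(lags, reliable, theta_tau):
    """One directional sweep over the frames.

    lags[m] is the rank-ordered list of candidate lags of frame m.
    Returns one lag (or None when undetermined) per frame.
    """
    out = [None] * len(lags)
    prev = None
    for m, cands in enumerate(lags):
        if not reliable[m]:
            prev = None
            continue
        if prev is None:
            out[m] = cands[0]  # start of a run: first-rank lag
        else:
            # first candidate, in rank order, within theta_tau of prev
            out[m] = next((t for t in cands if abs(t - prev) <= theta_tau), None)
        prev = out[m]
    return out


def first_pass(rho1, lags, theta_rho, theta_tau):
    """Forward and backward sweeps, merged frame by frame."""
    reliable = [r > theta_rho for r in rho1]
    fwd = track(lags, reliable, theta_tau)
    bwd = track(lags[::-1], reliable[::-1], theta_tau)[::-1]
    merged = []
    for f, b in zip(fwd, bwd):
        if f is not None and b is not None:
            merged.append(min(f, b))  # both found: keep the smaller
        else:
            merged.append(f if f is not None else b)
    return merged
```

For example, `first_pass([0.9, 0.8, 0.2, 0.85], [[40, 80], [82, 41], [10, 20], [39, 78]], 0.5, 3)` returns `[40, 41, None, 39]`: the backward sweep starts from the doubled lag 82, and taking the minimum of the two sweeps recovers the shorter period.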

The second pass examines the pitch periods determined for each frame in the first pass. Suppose that, as shown in Fig. 3(D), the pitch period remains undetermined over a gap of, for example, three frames or fewer. The pitch period of point Q (the (m+n+1)-th frame) is then chosen from the five candidates described above so that, relative to the pitch period of point P (the m-th frame), it satisfies

$$\left|\tau(m+n+1) - \tau(m)\right| \le \sqrt{n+1}\;\theta_{\tau} \qquad (9)$$

If none of the five candidates qualifies, the value τ^(1)(m)/2 (half the pitch period of the candidate selected first) is also examined.

The above processing is also tried in the reverse time direction, with the time index transformed as m' = M - m + 1. If this yields a smaller value for the pitch period of the frame at point P or point Q, that value is adopted. Figs. 4 and 5 are schematic flowcharts of the first and second passes, respectively. In the Fig. 4 flowchart for the first pass, among voiced frames with high reliability, the leading frame takes the pitch period of its first candidate, and each following frame takes the candidate that maintains continuity. In the Fig. 5 flowchart for the second pass, reliable frames separated by three frames or fewer are processed so as to maintain continuity.

The third pass handles voiced frames whose pitch period is still undetermined. As shown in Fig. 3(E), it takes the frames whose pitch period is already determined as a core and works forward or backward in time from them, as indicated by the arrows in the figure. For each frame it selects, from the five candidates (which were chosen even though their reliability is low), the one that keeps the pitch period continuous.

Fig. 6 is a schematic flowchart of the third pass. The head and tail of each group of low-reliability frames are located, and the pitch period of each frame in the group is determined while maintaining continuity. Because the peak corresponding to the true pitch period may be missing, the value τ^(1)/2 is also examined as a candidate, as in the second pass.
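The third pass can be sketched as follows. This is a reduced illustration with hypothetical names: it propagates forward and then backward from frames that already have a pitch period, and it tries half of the first-rank lag after the regular candidates. The separate cases of Fig. 6 (for example a low-reliability group that directly follows silence) are not modelled.

```python
def third_pass(tau, lags, voiced, theta_tau):
    """Fill undetermined voiced frames from already-determined neighbours.

    tau[m] is a pitch period or None; lags[m] is the rank-ordered
    candidate list of frame m.
    """
    out = list(tau)
    n = len(out)

    def pick(m, ref):
        # Regular candidates first, then half of the first-rank lag.
        options = list(lags[m]) + [lags[m][0] / 2.0]
        return next((t for t in options if abs(t - ref) <= theta_tau), None)

    for m in range(1, n):  # forward in time
        if voiced[m] and out[m] is None and out[m - 1] is not None:
            out[m] = pick(m, out[m - 1])
    for m in range(n - 2, -1, -1):  # backward in time
        if voiced[m] and out[m] is None and out[m + 1] is not None:
            out[m] = pick(m, out[m + 1])
    return out
```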

The fourth pass checks pitch continuity over all voiced frames and, where it is discontinuous, performs for example linear interpolation from the pitch periods of the preceding and following frames. Fig. 7 is a schematic flowchart for that case.
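The interpolation can be sketched as follows. The names are hypothetical, and the discontinuity test (a frame that differs from both neighbours by more than `theta_tau`) is an assumption of this sketch. Only frames with voiced neighbours on both sides are handled; the other cases of the flowchart are left out.

```python
def fourth_pass(tau, voiced, theta_tau):
    """Replace isolated pitch jumps by the midpoint of the neighbours."""
    out = list(tau)
    for m in range(1, len(out) - 1):
        if not (voiced[m - 1] and voiced[m] and voiced[m + 1]):
            continue
        prev, cur, nxt = out[m - 1], out[m], out[m + 1]
        if None in (prev, cur, nxt):
            continue
        if abs(cur - prev) > theta_tau and abs(cur - nxt) > theta_tau:
            # Linear interpolation between the two neighbouring frames.
            out[m] = (prev + nxt) / 2.0
    return out
```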

Needless to say, the constants used in the above processing, such as T_min, T_max, θ_ρ, and θ_τ, are determined beforehand in preliminary experiments.

Fig. 8 shows the details of the first-pass processing of Fig. 4. In the figure, 15 is the processing for voiced frames, 16 the processing for frames with high reliability, 17 the processing for the case where the previous frame is silent/unvoiced, 18 the processing that maintains continuity of the pitch period, and 19 the processing for the case where the previous frame has low reliability.

Fig. 9 shows the details of the second-pass processing of Fig. 5. In the figure, 20 is the processing for the case where the separation is within three frames, 21 the processing that checks the extended continuity given by equation (9), and 22 the processing that corrects the pitch period so that the extended continuity is maintained.

Figs. 10(A) and 10(B) show the details of the third-pass processing of Fig. 6. In the figure, 23 is the processing that detects a change from a silent/unvoiced frame to a low-reliability frame, and 24 is the processing that detects a change from a high-reliability frame to a low-reliability frame; together, processes 23 and 24 detect the head of a run of low-reliability frames. 25 is the processing that detects a change from a low-reliability frame to a silent/unvoiced frame. 26 corresponds to process 甲 (A) of Fig. 6, and 27 corresponds to process 乙 (B) of Fig. 6. Further, 28 is the processing that detects a change from a low-reliability frame to a high-reliability frame, 29 corresponds to process 丙 (C) of Fig. 6, and 30 corresponds to process 丁 (D) of Fig. 6.

Fig. 11 shows the details of the sub-path that carries out processes 26, 27, 29, and 30 of Fig. 10. In the figure, 31 is the processing that selects from among the candidates, 32 the processing that extracts the value τ^(1)/2 described above, 33 the processing for the case where the previous frame was silent/unvoiced, 34 the processing for the case where the previous frame was voiced, 35 the processing that searches for a candidate maintaining continuity, and 36 the processing that checks whether τ^(1)/2 maintains continuity.

Fig. 12 shows the details of the fourth-pass processing of Fig. 7. In the figure, 37 is the processing for voiced frames, 38 the processing for the case where the previous frame is silent/unvoiced, 39 the processing for the case where the following frame is also silent/unvoiced, 40 the processing for the case where the following frame is voiced, 41 the processing for the case where the previous frame is voiced, 42 the processing for the case where the following frame is silent/unvoiced, and 43 the processing for the case where the following frame is also voiced (interpolation and the like).

(E) Effects of the Invention

As described above, according to the present invention the pitch period can be selected rationally from among multiple candidates even when pitch fluctuation or dropout occurs. Manual intervention to supplement pitch period determination can be eliminated, which makes possible what amounts to full automation of the speech analysis and synthesis system.

Although the above description uses a moving average of the autocorrelation coefficient sequence of the prediction residual, the invention is not limited to this, and other weighted averages may be used.

4. Brief Description of the Drawings

Fig. 1 is an explanatory diagram of the concept of a frame as used in this specification; Fig. 2 is a block diagram of one embodiment of the speech analysis and synthesis system of the present invention; Figs. 3 to 7 are explanatory diagrams of the processing performed in the logical judgment processing unit shown in Fig. 2; Fig. 8 is a flowchart of one embodiment of the first-pass processing in the logical judgment processing unit; Fig. 9 is a flowchart of one embodiment of the second-pass processing in the logical judgment processing unit; Figs. 10(A) and 10(B) together form a flowchart of one embodiment of the third-pass processing in the logical judgment processing unit; Fig. 11 is a flowchart of one embodiment of the third-pass processing in the logical judgment processing unit; and Fig. 12 is a flowchart of one embodiment of the fourth-pass processing in the logical judgment processing unit.

In the figures, 1 is an input speech correlation processing unit, 2 is a linear prediction processing unit, 3 is a residual power extraction unit, 4 is a residual correlation processing unit, 5 is a moving average processing unit, 6 is a maximum value extraction processing unit, 7 is a pitch period determination candidate holding unit, 8 is a time lag component holding unit, 9 is a logical judgment processing unit, 10 is a voiced/unvoiced determination unit, 11 is a voiced-section excitation source generation unit, 12 is an unvoiced-section (including silence) excitation source generation unit, 13 is a switching processing unit, and 14 is a linear prediction synthesis unit.

Patent applicant: Fujitsu Limited

Claims

[Claims] A speech analysis and synthesis system in which input speech is analyzed by a linear prediction method and converted into parameters, a pitch period is extracted using autocorrelation coefficients of the prediction residual, and speech is synthesized based on these results, the system comprising: a pitch period determination candidate extraction unit that examines the autocorrelation coefficient sequence of the prediction residual, or a weighted moving average of that sequence, and extracts a plurality of candidates with the larger correlation values; a time lag component extraction unit that extracts the time lag corresponding to each pitch period determination candidate; and a logical judgment processing unit that determines the pitch period based on the pitch period determination candidates extracted by the pitch period determination candidate extraction unit and the corresponding time lag components; wherein the logical judgment processing unit examines, based on the values of the pitch period determination candidates, the reliability with which the corresponding time lag component can be set as the pitch period, determines the pitch period for periods with reliability at or above a threshold by using the time lag components corresponding to pitch period determination candidates with reliability at or above the threshold, and determines the pitch period for periods with reliability at or below the threshold so as to maintain continuity with the previously determined pitch periods.