JP2624958B2 - Speech synthesizer

Info

Publication number: JP2624958B2
Application number: JP61130712A
Authority: JP
Inventors: 邦男中島; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1986-06-05
Filing date: 1986-06-05
Publication date: 1997-06-25
Anticipated expiration: 2012-06-25
Also published as: JPS62287299A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial application field]

The present invention relates to a speech synthesizer in which speech is analyzed in advance in units such as words, phrases, and sentences and stored in the form of feature parameters, and in which those parameters are edited and synthesized according to the content of the message to be given as a response.

[Conventional technology]

FIG. 2 is a block diagram showing the basic configuration of a conventional speech synthesizer. In the figure, 1 is an analysis circuit that analyzes a speech signal a, input at registration time, in predetermined units by the PARCOR analysis method or the like and extracts feature parameters b representing the characteristics of the speech; 2 is a parameter storage circuit that stores the analysis feature parameters b output from the analysis circuit 1; 3 is an editing circuit that selects feature parameters from the parameter storage circuit 2 according to message content c input from an input device (not shown) and edits the resulting selected feature parameters d; and 4 is a synthesis circuit that generates synthesized speech f from the edited feature parameters e supplied by the editing circuit 3.
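The text names PARCOR analysis only as one possible analysis method and gives no procedure. As a rough illustration of what an analysis stage of this kind might compute for one frame, the sketch below derives reflection coefficients (which correspond to PARCOR coefficients up to a sign convention) from autocorrelation values using the Levinson-Durbin recursion. The function names, the absence of windowing, and the sign convention are assumptions made for this example, and the frame is assumed not to be silent (r[0] > 0).

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag] for one analysis frame (no windowing, for brevity)."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns k_1..k_order for the prediction-error filter
    A(z) = 1 + a_1 z^-1 + ...; some texts use the opposite sign.
    """
    a = [1.0] + [0.0] * order
    error = r[0]  # prediction error energy, assumed > 0
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / error
        ks.append(k)
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        error *= 1.0 - k * k
    return ks


# Autocorrelation of a first-order process with correlation 0.5 per lag.
ks = reflection_coefficients([1.0, 0.5, 0.25], order=2)
```

For this first-order example, only the first coefficient is non-zero, which is the expected behaviour of the recursion.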

Next, the operation will be described. First, a plurality of speech signals a corresponding to the messages to be given as responses are uttered by a single speaker and input in advance, so that the analysis circuit 1 extracts analysis feature parameters b for each predetermined unit, and these are stored in the parameter storage circuit 2. Once the desired initial registration has been made in this way, each time message content c to be answered is input, the editing circuit 3 retrieves the selected feature parameters d from the parameter storage circuit 2 according to the message content c and edits them. The synthesis circuit 4 then synthesizes the edited feature parameters e, so that synthesized speech f is output as the voice response.
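The register-then-edit flow of the conventional device can be pictured with a small data-structure sketch. Everything here is illustrative: the class `ParameterStore`, the function `edit`, the frame layout (pitch period, vocal-tract vector), and the sample values are assumptions, and the synthesis stage is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One analysis frame: (pitch period, vocal-tract vector). Hypothetical layout.
Frame = Tuple[float, Tuple[float, ...]]


@dataclass
class ParameterStore:
    """Stands in for parameter storage circuit 2 (hypothetical structure)."""
    units: Dict[str, List[Frame]] = field(default_factory=dict)

    def register(self, unit: str, frames: List[Frame]) -> None:
        # Store the analysed frames of one word or phrase under its label.
        self.units[unit] = list(frames)


def edit(store: ParameterStore, message: List[str]) -> List[Frame]:
    """Stands in for editing circuit 3: select the stored frames for each
    unit named in the message and concatenate them in order."""
    edited: List[Frame] = []
    for unit in message:
        edited.extend(store.units[unit])
    return edited


store = ParameterStore()
store.register("the time is", [(80.0, (0.5, -0.2)), (82.0, (0.4, -0.1))])
store.register("three o'clock", [(78.0, (0.1, 0.3))])
sequence = edit(store, ["the time is", "three o'clock"])
```

The resulting `sequence` is what a synthesis stage would consume; replacing one stored unit with frames from a different speaker or microphone is exactly the situation in which the mismatch discussed in this document arises.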

[Problems to be solved by the invention]

However, in a conventional speech synthesizer of this kind, when the content of the response messages changes and the stored speech feature parameters are to be partially modified or supplemented, the synthesized speech becomes extremely unnatural, with the speaker differing from part to part, unless the new material is registered by the initially registered speaker. Moreover, even when the same speaker performs the registration, differences in the equipment used, such as the microphone, or in the speaker's physical condition likewise produce partial differences in sound quality and degrade the output. Thus the conventional voice response device has the problem that it is difficult to change part of the registered speech.

The present invention has been made to solve the above problems, and its object is to provide a speech synthesizer that can produce natural synthesized speech of constant quality at synthesis time, even when the speaker at partial registration differs from the initially registered speaker, or when the same speaker uses different equipment or is in a different physical condition.

[Means for solving the problem]

The speech synthesizer according to the present invention comprises: a clustering circuit that generates pseudo-phoneme patterns of the initially registered speech based on the feature parameters stored in the parameter storage circuit at initial speech registration; a pseudo-phoneme pattern storage circuit that stores the generated pseudo-phoneme patterns; and a matching circuit that, at partial registration, compares the feature parameters obtained from the analysis circuit with the pseudo-phoneme patterns stored in the pseudo-phoneme pattern storage circuit, generates pseudo-phoneme expression feature parameters by correcting those feature parameters with the pseudo-phoneme pattern of highest similarity, and stores the pseudo-phoneme expression feature parameters in the parameter storage circuit.

[Action]

In the present invention, the clustering circuit generates pseudo-phoneme patterns of the initially registered speech from the feature parameters obtained at initial registration, and these patterns are stored in the pseudo-phoneme pattern storage circuit. At partial registration, such as addition or modification, the feature parameters obtained from the analysis circuit pass through the matching circuit, where they are corrected with the pseudo-phoneme pattern of highest similarity in the pseudo-phoneme pattern storage circuit, and are then stored in the parameter storage circuit.

[Embodiment]

An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the speech synthesizer of the embodiment. Reference numerals that also appear in the conventional example of FIG. 2 denote the same or corresponding parts, and their description is omitted.

In the figure, 5 is a clustering circuit that applies a clustering technique in the parameter space (for example, the method of Gray et al.; reference: Linde Y., Buzo A., Gray R.M., "An Algorithm for Vector Quantizer Design", IEEE Trans. Com., Vol. COM-28, No. 1, pp. 84-94 (Jan. 1980)) to clustering feature parameters g, consisting of some or all of the analysis feature parameters b stored in the parameter storage circuit 2 at initial registration, and thereby generates pseudo-phoneme patterns h that carry the characteristics of the initially registered speaker and roughly represent the individual phonemes; 6 is a pseudo-phoneme pattern storage circuit that stores the pseudo-phoneme patterns h so generated; and 7 is a matching circuit that, at partial registration such as addition or modification, compares the analysis feature parameters b from the analysis circuit 1 with the pseudo-phoneme patterns h stored in the pseudo-phoneme pattern storage circuit 6, selects the pseudo-phoneme pattern of highest similarity, corrects the analysis feature parameters b with it, and outputs the result to the parameter storage circuit 2 as pseudo-phoneme expression feature parameters k. Although not shown, the arrangement is such that, by operator action or the like, the analysis feature parameters b from the analysis circuit 1 are input directly to the parameter storage circuit 2 at initial registration and are input via the matching circuit 7 at partial registration.

Next, the operation will be described. At initial registration, speech is input in the units used for editing, such as words and phrases, converted into analysis feature parameters b by the analysis circuit 1, and stored in the parameter storage circuit 2. Here, the feature parameters generally consist of a pitch period P and a vector representing the characteristics of the vocal tract (for example, PARCOR coefficients). Below, a set of feature parameters is written as [formula not reproduced]. When initial registration is complete, clustering feature parameters g consisting of some or all of the analysis feature parameters b held in the parameter storage circuit 2 are sent to the clustering circuit 5. There, for example by the method of Gray et al. mentioned above, the pseudo-phoneme patterns h are computed: [formula not reproduced] (i = 1, 2, ..., I) for the vectors representing the vocal tract characteristics, and an average value [symbol not reproduced] for the pitch period. These are stored in the pseudo-phoneme pattern storage circuit 6.
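The clustering step can be sketched as follows. This is a simplified, LBG-style illustration (codeword splitting followed by nearest-neighbour/centroid iterations), not the exact procedure of the cited paper. The function names, the squared Euclidean distance, the additive splitting offset `eps`, the fixed iteration count, and the restriction to codebook sizes that are powers of two are all assumptions made for this example.

```python
def sq_dist(u, v):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def lloyd(vectors, codebook, iterations=20):
    """Alternate nearest-codeword assignment and centroid update."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # An empty cell keeps its previous codeword.
        codebook = [centroid(cell) if cell else word
                    for cell, word in zip(cells, codebook)]
    return codebook


def make_pseudo_phoneme_patterns(frames, size, eps=0.01):
    """frames: list of (pitch period, vocal-tract vector) pairs.
    Returns (codebook of `size` vectors, mean pitch period).
    `size` is assumed to be a power of two."""
    vectors = [x for _, x in frames]
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        split = []
        for word in codebook:
            split.append(tuple(c + eps for c in word))
            split.append(tuple(c - eps for c in word))
        codebook = lloyd(vectors, split)
    mean_pitch = sum(p for p, _ in frames) / len(frames)
    return codebook, mean_pitch


frames = [(80.0, (0.0, 0.0)), (82.0, (0.1, 0.0)),
          (90.0, (1.0, 1.0)), (88.0, (0.9, 1.0))]
patterns, mean_pitch = make_pseudo_phoneme_patterns(frames, size=2)
```

With the four sample frames, the two resulting patterns sit at the centres of the two obvious groups of vectors, and `mean_pitch` is the plain average of the four pitch periods.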

At addition or modification input, the analysis circuit 1 extracts [formula not reproduced] (where j = 1, ..., J) as the analysis feature parameters b. Here q_j is the pitch period, [symbol not reproduced] is a vector quantity representing the vocal tract characteristics, and j is an index indicating time. These analysis feature parameters b are sent to the matching circuit 7, compared with the pseudo-phoneme patterns h stored in the pseudo-phoneme pattern storage circuit 6, and converted into [formula not reproduced] as the pseudo-phoneme expression feature parameters k.

Here, l(j) denotes the index i for which the distance value [formula not reproduced] to the pattern [symbol not reproduced] (i = 1, ..., I) is smallest. α is a coefficient for correcting the pitch and is given by, for example, [formula not reproduced]. The pseudo-phoneme expression feature parameters k are stored in the parameter storage circuit 2 as the added or modified data.
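The matching step can be sketched as follows. The formulas for the distance and for α are not reproduced in this text, so both are assumptions here: the distance is taken to be squared Euclidean, and α is taken to be the ratio of the stored mean pitch period to the mean pitch period of the newly registered utterance. The function name and the sample values are likewise hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance (assumed; the source formula is missing)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def to_pseudo_phoneme_expression(frames, codebook, mean_pitch):
    """frames: list of (q_j, y_j) pairs from the newly registered utterance.
    codebook: stored pseudo-phoneme vectors; mean_pitch: stored mean pitch.
    Returns (alpha * q_j, codebook[l(j)]) for each frame j."""
    mean_q = sum(q for q, _ in frames) / len(frames)
    alpha = mean_pitch / mean_q  # assumed definition of the pitch correction
    converted = []
    for q, y in frames:
        # l(j): index of the stored pattern closest to this frame's vector.
        l_j = min(range(len(codebook)),
                  key=lambda i: sq_dist(y, codebook[i]))
        converted.append((alpha * q, codebook[l_j]))
    return converted


codebook = [(0.05, 0.0), (0.95, 1.0)]
new_frames = [(120.0, (0.2, 0.1)), (80.0, (0.8, 0.9))]
converted = to_pseudo_phoneme_expression(new_frames, codebook, mean_pitch=50.0)
```

Each output frame carries a vector drawn from the initially registered speaker's patterns and a pitch period rescaled toward that speaker's average, which is the sense in which the new material is "corrected" before it is stored.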

At synthesis time, as in the conventional device, the editing circuit 3 reads the necessary selected feature parameters d from the parameter storage circuit 2 according to the message content c and edits them, and the synthesis circuit 4 generates and outputs synthesized speech f from the edited feature parameters e.

[Effect of the invention]

As described above, the speech synthesizer of the present invention comprises: a clustering circuit that generates pseudo-phoneme patterns of the initially registered speech based on the feature parameters stored in the parameter storage circuit at initial speech registration; a pseudo-phoneme pattern storage circuit that stores the generated pseudo-phoneme patterns; and a matching circuit that, at partial registration, compares the feature parameters obtained from the analysis circuit with the pseudo-phoneme patterns stored in the pseudo-phoneme pattern storage circuit, generates pseudo-phoneme expression feature parameters by correcting those feature parameters with the pseudo-phoneme pattern of highest similarity, and stores the pseudo-phoneme expression feature parameters in the parameter storage circuit. This eliminates abrupt local changes in synthesized sound quality caused by differences in speaker, equipment, or physical condition at registration, and so provides the advantage that additions and modifications can be made easily while high quality is maintained.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a speech synthesizer according to an embodiment of the present invention, and FIG. 2 is a block diagram showing an example of a conventional speech synthesizer. 1 is an analysis circuit, 2 is a parameter storage circuit, 3 is an editing circuit, 4 is a synthesis circuit, 5 is a clustering circuit, 6 is a pseudo-phoneme pattern storage circuit, and 7 is a matching circuit. In the drawings, the same reference numerals denote the same or corresponding parts.

Claims

(57) [Claims]

A speech synthesizer having an analysis circuit that analyzes speech input at registration in predetermined units and extracts feature parameters, and a parameter storage circuit that stores the extracted feature parameters, the speech synthesizer selecting and editing feature parameters from the storage circuit according to the content of a message to be given as a response and synthesizing the edited feature parameters, characterized by comprising: a clustering circuit that generates pseudo-phoneme patterns of the initially registered speech based on the feature parameters stored in the parameter storage circuit at initial speech registration; a pseudo-phoneme pattern storage circuit that stores the generated pseudo-phoneme patterns; and a matching circuit that, at partial registration, compares the feature parameters obtained from the analysis circuit with the pseudo-phoneme patterns stored in the pseudo-phoneme pattern storage circuit, generates pseudo-phoneme expression feature parameters by correcting those feature parameters with the pseudo-phoneme pattern of highest similarity, and stores the pseudo-phoneme expression feature parameters in the parameter storage circuit.