JPH04190398A - Sound synthesizing method - Google Patents

Sound synthesizing method

Info

Publication number
JPH04190398A
JPH04190398A (application number JP2322170A)
Authority
JP
Japan
Prior art keywords
pitch
pattern
phoneme
time length
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2322170A
Other languages
Japanese (ja)
Inventor
Yoshimasa Sawada
沢田 喜正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP2322170A priority Critical patent/JPH04190398A/en
Publication of JPH04190398A publication Critical patent/JPH04190398A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To obtain a sound quality resembling recording synthesis by determining phoneme boundaries from the phoneme environments of uttered words, using phoneme duration data obtained by analyzing real speech and pitch patterns obtained by analyzing the pitch of real speech. CONSTITUTION: A speaker utters a word or sentence stored in a storage section 21, and its pitch is analyzed by a pitch analysis section 22. Phoneme duration data for each phoneme environment, obtained in advance by analyzing a large amount of real speech for the duration pattern database 4, is stored in a storage section 23. The data in the storage section 23 and the pitch pattern obtained by the pitch analysis section 22 are input to a duration normalization section 24, which determines phoneme boundaries from the phoneme environments of the uttered word. The pitch values at the points dividing each phoneme into four are stored in a pitch pattern database 6 as the pitch pattern for synthesis, and synthesized speech is obtained from this database. A sound quality resembling recording synthesis can thereby be obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Field of Industrial Application

This invention relates to a method of synthesizing speech from text containing a mixture of kanji and kana, and more particularly to a method of generating a database of speech pitch patterns.

B. Summary of the Invention

In a method of generating a database of speech pitch patterns, this invention determines phoneme boundaries from pitch patterns obtained by analyzing the pitch of uttered words and from phoneme duration data for each phoneme environment, and generates the pitch pattern database accordingly. A rule-based synthesis method with sound quality close to that of recording synthesis can thereby be obtained.

C. Prior Art

Rule-based speech synthesis is a technique for synthesizing arbitrary words, sentences, and the like as speech from kanji-kana mixed text.

FIG. 3 is an explanatory diagram showing the outline of a typical speech synthesis device.

First, text entered into the kanji-kana text input section 1 is converted into a phoneme symbol string by the Japanese-language processing section 2.

Next, prosodic patterns (a duration pattern, a pitch pattern, and an energy pattern) are generated from this phoneme symbol string.

That is, the duration pattern generation section 3 refers to the duration pattern database 4 and generates a duration pattern indicating how long each speech sound lasts.

Similarly, the pitch pattern generation section 5 refers to the pitch pattern database 6 and generates a pitch pattern indicating the pitch of the voice.

Likewise, the energy pattern generation section 7 refers to the energy pattern database 8 and generates an energy pattern indicating the intensity of the voice.

Based on the prosodic patterns obtained in this manner, the speech synthesis section 9 refers to the speech database 10 and synthesizes a speech waveform. Reference numeral 11 denotes an audio output section that outputs the synthesized speech.
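The pipeline described above — phoneme string in, duration/pitch/energy patterns out, then waveform synthesis — can be sketched as follows. This is an illustrative toy only: the lookup-table values and function names are assumptions, not anything specified in the patent.

```python
# Toy sketch of the conventional pipeline of FIG. 3: a phoneme string is
# turned into duration, pitch, and energy patterns via per-phoneme lookup
# tables, then handed to a (stubbed) waveform synthesizer.
# All table values are invented for illustration.

DURATION_DB = {"a": 0.12, "o": 0.11, "i": 0.10}     # seconds (hypothetical)
PITCH_DB    = {"a": 130.0, "o": 125.0, "i": 140.0}  # Hz (hypothetical)
ENERGY_DB   = {"a": 0.9, "o": 0.8, "i": 0.7}        # relative (hypothetical)

def generate_prosody(phonemes):
    """Build the three prosodic patterns for a phoneme string."""
    durations = [DURATION_DB[p] for p in phonemes]
    pitches   = [PITCH_DB[p] for p in phonemes]
    energies  = [ENERGY_DB[p] for p in phonemes]
    return durations, pitches, energies

def synthesize(phonemes):
    """Stub for synthesis section 9: return (total_duration, n_segments)."""
    durations, pitches, energies = generate_prosody(phonemes)
    return sum(durations), len(pitches)

total, n = synthesize(["a", "o", "i"])  # the word "aoi"
```

A real implementation would condition the lookups on phoneme environment and accent type rather than on the phoneme symbol alone; the sketch only shows the data flow between the numbered sections.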

In the speech synthesis device configured as described above, the pitch pattern generated by the pitch pattern generation section 5 with reference to the pitch pattern database 6 — for example, for the word "aoi" shown in FIG. 4 — was synthesized by giving four pitch target values per mora. For this reason, conventional speech synthesis devices suffered from monotonous and unnatural intonation.
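The conventional scheme of four pitch targets per mora can be illustrated by interpolating a frame-level contour through the targets. The target values and frame granularity below are invented for the sake of the example; the patent does not specify them.

```python
# Sketch of the conventional scheme of FIG. 4: each mora receives four
# pitch target values, and a frame-level pitch contour is obtained by
# linear interpolation between consecutive targets. Values are invented.

def contour_from_targets(targets_per_mora, frames_per_segment=5):
    """Linearly interpolate a pitch contour through the flattened targets."""
    targets = [t for mora in targets_per_mora for t in mora]
    contour = []
    for a, b in zip(targets, targets[1:]):
        for i in range(frames_per_segment):
            contour.append(a + (b - a) * i / frames_per_segment)
    contour.append(targets[-1])  # close the contour at the final target
    return contour

# Word "aoi": three morae, four (hypothetical) targets each, in Hz.
aoi_targets = [
    [120.0, 125.0, 128.0, 130.0],  # "a"
    [130.0, 127.0, 124.0, 122.0],  # "o"
    [122.0, 118.0, 112.0, 105.0],  # "i"
]
contour = contour_from_targets(aoi_targets)
```

The monotony the patent complains about comes from the targets themselves, not the interpolation: if the target positions sit at the wrong points relative to the true phoneme boundaries, the contour cannot reproduce natural intonation.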

D. Problems to Be Solved by the Invention

Although the conventional speech synthesis device described above suffers from the drawbacks noted, the sound quality of such devices has improved in recent years and is approaching a level at which clarity is no longer a problem.

In terms of sound quality, however, it still rates somewhat below so-called recording synthesis devices, which store the original sound in somewhat compressed form.

Recording synthesis, on the other hand, cannot utter arbitrary sentences; to produce a large number of sentences it requires an enormous amount of memory and therefore becomes expensive. Besides this drawback, it has the following problems.

(a) When a word in a sentence is to be replaced by another word, a new recording must be made each time, which is very cumbersome and expensive.

(b) It is not always possible to obtain new recordings from the same speaker who made the existing recordings.

(c) Even when the same speaker can be recorded, the intonation may differ from that of the words and sentences already recorded.

(d) Because of (b) and (c) above, the intonation of the sentence as a whole becomes unnatural, degrading the naturalness and clarity of the synthesized speech.

The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesis method having a sound quality close to that of recording synthesis.

E. Means for Solving the Problems

This invention concerns a method in which text input of kanji-kana mixed sentences is analyzed by a Japanese-language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are generated based on this phoneme string with reference to their respective data storage sections; and synthesized speech is generated based on these patterns. The method is characterized in that the pitch of uttered words is analyzed, and phoneme boundaries are determined from this pitch analysis pattern and the per-phoneme-environment phoneme duration data stored in the duration pattern database, thereby generating the pitch pattern database that is referred to when generating pitch patterns.

F. Operation

A speaker utters the words or sentences to be synthesized, and their pitch is analyzed. Phoneme boundaries are then determined from the phoneme environments of the uttered words, using the pitch pattern obtained from this analysis together with per-phoneme-environment phoneme duration data obtained in advance by analyzing real speech. The pitch values at the points where each phoneme is subdivided are stored as the intonation pattern for synthesis, and this pattern is used to obtain the synthesized sound.
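One plausible reading of the boundary-determination step is that the per-environment average durations are scaled so that their sum matches the measured length of the actual utterance, and boundaries are laid down cumulatively. This is an assumed reconstruction — the patent does not give the normalization formula — and the duration values are invented.

```python
# Assumed sketch of the duration-normalization step: average per-environment
# phoneme durations are scaled so their sum equals the measured utterance
# length, and phoneme boundaries are placed at the cumulative scaled
# durations. Duration values are hypothetical.

def phoneme_boundaries(phonemes, duration_db, utterance_len):
    """Return boundary end times (seconds) for each phoneme in order."""
    base = [duration_db[p] for p in phonemes]
    scale = utterance_len / sum(base)  # stretch/shrink to the real utterance
    boundaries, t = [], 0.0
    for d in base:
        t += d * scale
        boundaries.append(round(t, 6))
    return boundaries

DURATION_DB = {"a": 0.12, "o": 0.11, "i": 0.10}  # hypothetical averages (s)
bounds = phoneme_boundaries(["a", "o", "i"], DURATION_DB, utterance_len=0.42)
```

In practice the pitch analysis itself can refine these boundaries, since pitch movements tend to align with phoneme transitions; the sketch shows only the duration-driven part.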

G. Embodiment

An embodiment of this invention will now be described with reference to the drawings.

In FIG. 1, reference numeral 21 denotes a storage section for the words or sentences to be uttered. A speaker utters the words or sentences in this storage section 21, and their pitch is analyzed by the pitch analysis section 22. Reference numeral 23 denotes a storage section for per-phoneme-environment phoneme duration data; the duration data in this storage section 23 was obtained by analyzing a large amount of real speech in advance for the duration pattern database 4 shown in FIG. 3. The data in the storage section 23 and the pitch pattern shown in FIG. 2, obtained from the analysis by the pitch analysis section 22, are input to the duration normalization section 24, which determines phoneme boundaries from the phoneme environments of the uttered words.

The pitch values at the points dividing each phoneme into four are then stored in the pitch pattern database 6 as the pitch pattern for synthesis, and this database is used to obtain the synthesized sound.
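The quarter-point sampling can be sketched as follows: given phoneme boundaries and a measured pitch track f0(t), the pitch at the 1/4, 2/4, 3/4, and 4/4 points of each phoneme becomes that phoneme's database entry. The pitch track and boundary times below are made up for illustration, not measured data.

```python
# Sketch of the quarter-point sampling in the embodiment: sample a pitch
# track f0(t) at the four quarter points of each phoneme, as delimited by
# the boundaries from the duration normalization section 24. The linear
# f0 function here is a hypothetical stand-in for a measured track.

def quarter_point_pitches(boundaries, f0):
    """Sample f0 at the 1/4, 2/4, 3/4, 4/4 points of each phoneme."""
    entries = []
    start = 0.0
    for end in boundaries:
        span = end - start
        entries.append([f0(start + span * k / 4) for k in (1, 2, 3, 4)])
        start = end
    return entries

f0 = lambda t: 130.0 - 40.0 * t   # hypothetical falling pitch track (Hz)
boundaries = [0.15, 0.29, 0.42]   # hypothetical phoneme end times (s)
pattern = quarter_point_pitches(boundaries, f0)
```

Because the samples are anchored to boundaries derived from real-speech durations, the stored targets land at phonetically meaningful positions — which is exactly the improvement over the fixed four-targets-per-mora grid of FIG. 4.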

Using the pitch pattern database obtained as described above, phoneme boundaries such as those of "a", "o", and "i" shown in FIG. 2 can be recognized correctly, and the positions of the pitch target values also become accurate. The synthesized sound therefore acquires natural intonation, with a sound quality close to that of recording synthesis.

H. Effects of the Invention

As described above, according to this invention, phoneme boundaries are determined from the phoneme environments of uttered words using per-phoneme-environment phoneme duration data obtained by analyzing real speech and pitch patterns obtained by analyzing the pitch of real speech, and pitch patterns are generated accordingly. A rule-based speech synthesis method is thereby obtained that achieves a sound quality close to that of recorded synthesized speech.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic configuration diagram showing an embodiment of this invention; FIG. 2 is an explanatory diagram of a pitch pattern obtained by pitch analysis according to the embodiment; FIG. 3 is a schematic configuration diagram of a conventional rule-based speech synthesis device; and FIG. 4 is an explanatory diagram of a pitch pattern used in a conventional rule-based speech synthesis device.

21: storage section for words or sentences to be uttered; 22: pitch analysis section; 23: per-phoneme-environment phoneme duration data storage section; 24: duration normalization section; 4: duration pattern database; 6: pitch pattern database; 8: energy pattern database; 10: speech database; 11: audio output section.

Claims (1)

[Claims] (1) A speech synthesis method in which text input of kanji-kana mixed sentences is analyzed by a Japanese-language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are each generated based on this phoneme string with reference to respective databases; and synthesized speech is generated based on these generated patterns, the method being characterized in that the pitch of an uttered word is analyzed, and phoneme boundaries are determined from this pitch analysis pattern and the per-phoneme-environment phoneme duration data stored in the duration pattern database, thereby generating a pitch pattern database that is referred to when generating pitch patterns.
JP2322170A 1990-11-26 1990-11-26 Sound synthesizing method Pending JPH04190398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2322170A JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2322170A JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Publications (1)

Publication Number Publication Date
JPH04190398A true JPH04190398A (en) 1992-07-08

Family

ID=18140721

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2322170A Pending JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Country Status (1)

Country Link
JP (1) JPH04190398A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0728491A (en) * 1993-07-12 1995-01-31 Atr Jido Honyaku Denwa Kenkyusho:Kk Automatic labeling method for phoneme border


Similar Documents

Publication Publication Date Title
Isewon et al. Design and implementation of text to speech conversion for visually impaired people
US8219398B2 (en) Computerized speech synthesizer for synthesizing speech from text
JP4302788B2 (en) Prosodic database containing fundamental frequency templates for speech synthesis
US8886538B2 (en) Systems and methods for text-to-speech synthesis using spoken example
US6173262B1 (en) Text-to-speech system with automatically trained phrasing rules
US8775185B2 (en) Speech samples library for text-to-speech and methods and apparatus for generating and using same
JP3587048B2 (en) Prosody control method and speech synthesizer
JP3085631B2 (en) Speech synthesis method and system
RU61924U1 (en) STATISTICAL SPEECH MODEL
JP3576848B2 (en) Speech synthesis method, apparatus, and recording medium recording speech synthesis program
US6829577B1 (en) Generating non-stationary additive noise for addition to synthesized speech
JPH0887297A (en) Voice synthesis system
JPH08335096A (en) Text voice synthesizer
JP2001034284A (en) Voice synthesizing method and voice synthesizer and recording medium recorded with text voice converting program
JP3060276B2 (en) Speech synthesizer
JPH04190398A (en) Sound synthesizing method
Houidhek et al. Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic
Hinterleitner et al. Speech synthesis
Datta et al. Epoch Synchronous Overlap Add (ESOLA)
JP2703253B2 (en) Speech synthesizer
Dessai et al. Development of Konkani TTS system using concatenative synthesis
JPH06138894A (en) Device and method for voice synthesis
JPH09198073A (en) Speech synthesizing device
JPH01321496A (en) Speech synthesizing device
JPH08160990A (en) Speech synthesizing device