JPH04190398A - Sound synthesizing method - Google Patents

Sound synthesizing method

Info

Publication number
JPH04190398A
JPH04190398A (application number JP2322170A)
Authority
JP
Japan
Prior art keywords
pitch
pattern
phoneme
time length
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2322170A
Other languages
Japanese (ja)
Inventor
Yoshimasa Sawada
沢田 喜正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP2322170A priority Critical patent/JPH04190398A/en
Publication of JPH04190398A publication Critical patent/JPH04190398A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To obtain a sound quality resembling recording synthesis by determining phoneme boundaries from the phoneme environments of uttered words, using phoneme duration data obtained by analyzing real speech and pitch patterns obtained by analyzing the pitch of real speech. CONSTITUTION: A speaker utters a word or sentence stored in a storage section 21, and its pitch is analyzed by a pitch analysis section 22. Phoneme duration data for each phoneme environment, obtained in advance by analyzing a large amount of real speech for the duration pattern database 4, is stored in a storage section 23. The data in the storage section 23 and the pitch pattern obtained by the pitch analysis section 22 are input to a duration normalization section 24, which determines phoneme boundaries from the phoneme environments of the uttered word. The pitch values at the points dividing each phoneme into four are stored in a pitch pattern database 6 as the pitch pattern for synthesis, and synthesized speech is obtained from this database. A sound quality resembling recording synthesis can thereby be obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Field of Industrial Application

This invention relates to a method of synthesizing speech from text containing a mixture of kanji and kana, and more particularly to a method of generating a database of speech pitch patterns.

B. Summary of the Invention

In a method of generating a database of speech pitch patterns, this invention determines phoneme boundaries from pitch patterns obtained by analyzing the pitch of uttered words and from phoneme duration data for each phoneme environment, and generates the pitch pattern database accordingly. A rule-based synthesis method with sound quality close to that of recording synthesis can thereby be obtained.

C. Prior Art

Rule-based speech synthesis is a technique for synthesizing arbitrary words, sentences, and the like as speech from kanji-kana mixed text.

FIG. 3 is an explanatory diagram showing the outline of a typical speech synthesis device.

First, text entered into the kanji-kana text input section 1 is converted into a phoneme symbol string by the Japanese-language processing section 2.

Next, prosodic patterns (a duration pattern, a pitch pattern, and an energy pattern) are generated from this phoneme symbol string.

That is, the duration pattern generation section 3 refers to the duration pattern database 4 and generates a duration pattern indicating how long each speech sound lasts.

Similarly, the pitch pattern generation section 5 refers to the pitch pattern database 6 and generates a pitch pattern indicating the pitch of the voice.

Likewise, the energy pattern generation section 7 refers to the energy pattern database 8 and generates an energy pattern indicating the intensity of the voice.

Based on the prosodic patterns obtained in this manner, the speech synthesis section 9 refers to the speech database 10 and synthesizes a speech waveform. Reference numeral 11 denotes an audio output section that outputs the synthesized speech.
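The pipeline described above — phoneme string in, duration/pitch/energy patterns out, then waveform synthesis — can be sketched as follows. This is an illustrative toy only: the lookup-table values and function names are assumptions, not anything specified in the patent.

```python
# Toy sketch of the conventional pipeline of FIG. 3: a phoneme string is
# turned into duration, pitch, and energy patterns via per-phoneme lookup
# tables, then handed to a (stubbed) waveform synthesizer.
# All table values are invented for illustration.

DURATION_DB = {"a": 0.12, "o": 0.11, "i": 0.10}     # seconds (hypothetical)
PITCH_DB    = {"a": 130.0, "o": 125.0, "i": 140.0}  # Hz (hypothetical)
ENERGY_DB   = {"a": 0.9, "o": 0.8, "i": 0.7}        # relative (hypothetical)

def generate_prosody(phonemes):
    """Build the three prosodic patterns for a phoneme string."""
    durations = [DURATION_DB[p] for p in phonemes]
    pitches   = [PITCH_DB[p] for p in phonemes]
    energies  = [ENERGY_DB[p] for p in phonemes]
    return durations, pitches, energies

def synthesize(phonemes):
    """Stub for synthesis section 9: return (total_duration, n_segments)."""
    durations, pitches, energies = generate_prosody(phonemes)
    return sum(durations), len(pitches)

total, n = synthesize(["a", "o", "i"])  # the word "aoi"
```

A real implementation would condition the lookups on phoneme environment and accent type rather than on the phoneme symbol alone; the sketch only shows the data flow between the numbered sections.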

In the speech synthesis device configured as described above, the pitch pattern generated by the pitch pattern generation section 5 with reference to the pitch pattern database 6 — for example, for the word "aoi" shown in FIG. 4 — was synthesized by giving four pitch target values per mora. For this reason, conventional speech synthesis devices suffered from monotonous and unnatural intonation.
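The conventional scheme of four pitch targets per mora can be illustrated by interpolating a frame-level contour through the targets. The target values and frame granularity below are invented for the sake of the example; the patent does not specify them.

```python
# Sketch of the conventional scheme of FIG. 4: each mora receives four
# pitch target values, and a frame-level pitch contour is obtained by
# linear interpolation between consecutive targets. Values are invented.

def contour_from_targets(targets_per_mora, frames_per_segment=5):
    """Linearly interpolate a pitch contour through the flattened targets."""
    targets = [t for mora in targets_per_mora for t in mora]
    contour = []
    for a, b in zip(targets, targets[1:]):
        for i in range(frames_per_segment):
            contour.append(a + (b - a) * i / frames_per_segment)
    contour.append(targets[-1])  # close the contour at the final target
    return contour

# Word "aoi": three morae, four (hypothetical) targets each, in Hz.
aoi_targets = [
    [120.0, 125.0, 128.0, 130.0],  # "a"
    [130.0, 127.0, 124.0, 122.0],  # "o"
    [122.0, 118.0, 112.0, 105.0],  # "i"
]
contour = contour_from_targets(aoi_targets)
```

The monotony the patent complains about comes from the targets themselves, not the interpolation: if the target positions sit at the wrong points relative to the true phoneme boundaries, the contour cannot reproduce natural intonation.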

D. Problems to Be Solved by the Invention

Although the conventional speech synthesis device described above suffers from the drawbacks noted, the sound quality of such devices has improved in recent years and is approaching a level at which clarity is no longer a problem.

In terms of sound quality, however, it still rates somewhat below so-called recording synthesis devices, which store the original sound in somewhat compressed form.

Recording synthesis, on the other hand, cannot utter arbitrary sentences; to produce a large number of sentences it requires an enormous amount of memory and therefore becomes expensive. Besides this drawback, it has the following problems.

(a) When a word in a sentence is to be replaced by another word, a new recording must be made each time, which is very cumbersome and expensive.

(b) It is not always possible to obtain new recordings from the same speaker who made the existing recordings.

(c) Even when the same speaker can be recorded, the intonation may differ from that of the words and sentences already recorded.

(d) Because of (b) and (c) above, the intonation of the sentence as a whole becomes unnatural, degrading the naturalness and clarity of the synthesized speech.

The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesis method having a sound quality close to that of recording synthesis.

E. Means for Solving the Problems

This invention concerns a method in which text input of kanji-kana mixed sentences is analyzed by a Japanese-language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are generated based on this phoneme string with reference to their respective data storage sections; and synthesized speech is generated based on these patterns. The method is characterized in that the pitch of uttered words is analyzed, and phoneme boundaries are determined from this pitch analysis pattern and the per-phoneme-environment phoneme duration data stored in the duration pattern database, thereby generating the pitch pattern database that is referred to when generating pitch patterns.

F. Operation

A speaker utters the words or sentences to be synthesized, and their pitch is analyzed. Phoneme boundaries are then determined from the phoneme environments of the uttered words, using the pitch pattern obtained from this analysis together with per-phoneme-environment phoneme duration data obtained in advance by analyzing real speech. The pitch values at the points where each phoneme is subdivided are stored as the intonation pattern for synthesis, and this pattern is used to obtain the synthesized sound.
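One plausible reading of the boundary-determination step is that the per-environment average durations are scaled so that their sum matches the measured length of the actual utterance, and boundaries are laid down cumulatively. This is an assumed reconstruction — the patent does not give the normalization formula — and the duration values are invented.

```python
# Assumed sketch of the duration-normalization step: average per-environment
# phoneme durations are scaled so their sum equals the measured utterance
# length, and phoneme boundaries are placed at the cumulative scaled
# durations. Duration values are hypothetical.

def phoneme_boundaries(phonemes, duration_db, utterance_len):
    """Return boundary end times (seconds) for each phoneme in order."""
    base = [duration_db[p] for p in phonemes]
    scale = utterance_len / sum(base)  # stretch/shrink to the real utterance
    boundaries, t = [], 0.0
    for d in base:
        t += d * scale
        boundaries.append(round(t, 6))
    return boundaries

DURATION_DB = {"a": 0.12, "o": 0.11, "i": 0.10}  # hypothetical averages (s)
bounds = phoneme_boundaries(["a", "o", "i"], DURATION_DB, utterance_len=0.42)
```

In practice the pitch analysis itself can refine these boundaries, since pitch movements tend to align with phoneme transitions; the sketch shows only the duration-driven part.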

G. Embodiment

An embodiment of this invention will now be described with reference to the drawings.

In FIG. 1, reference numeral 21 denotes a storage section for the words or sentences to be uttered. A speaker utters the words or sentences in this storage section 21, and their pitch is analyzed by the pitch analysis section 22. Reference numeral 23 denotes a storage section for per-phoneme-environment phoneme duration data; the duration data in this storage section 23 was obtained by analyzing a large amount of real speech in advance for the duration pattern database 4 shown in FIG. 3. The data in the storage section 23 and the pitch pattern shown in FIG. 2, obtained from the analysis by the pitch analysis section 22, are input to the duration normalization section 24, which determines phoneme boundaries from the phoneme environments of the uttered words.

The pitch values at the points dividing each phoneme into four are then stored in the pitch pattern database 6 as the pitch pattern for synthesis, and this database is used to obtain the synthesized sound.
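The quarter-point sampling can be sketched as follows: given phoneme boundaries and a measured pitch track f0(t), the pitch at the 1/4, 2/4, 3/4, and 4/4 points of each phoneme becomes that phoneme's database entry. The pitch track and boundary times below are made up for illustration, not measured data.

```python
# Sketch of the quarter-point sampling in the embodiment: sample a pitch
# track f0(t) at the four quarter points of each phoneme, as delimited by
# the boundaries from the duration normalization section 24. The linear
# f0 function here is a hypothetical stand-in for a measured track.

def quarter_point_pitches(boundaries, f0):
    """Sample f0 at the 1/4, 2/4, 3/4, 4/4 points of each phoneme."""
    entries = []
    start = 0.0
    for end in boundaries:
        span = end - start
        entries.append([f0(start + span * k / 4) for k in (1, 2, 3, 4)])
        start = end
    return entries

f0 = lambda t: 130.0 - 40.0 * t   # hypothetical falling pitch track (Hz)
boundaries = [0.15, 0.29, 0.42]   # hypothetical phoneme end times (s)
pattern = quarter_point_pitches(boundaries, f0)
```

Because the samples are anchored to boundaries derived from real-speech durations, the stored targets land at phonetically meaningful positions — which is exactly the improvement over the fixed four-targets-per-mora grid of FIG. 4.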

Using the pitch pattern database obtained as described above, phoneme boundaries such as those of "a", "o", and "i" shown in FIG. 2 can be recognized correctly, and the positions of the pitch target values also become accurate. The synthesized sound therefore acquires natural intonation, with a sound quality close to that of recording synthesis.

H. Effects of the Invention

As described above, according to this invention, phoneme boundaries are determined from the phoneme environments of uttered words using per-phoneme-environment phoneme duration data obtained by analyzing real speech and pitch patterns obtained by analyzing the pitch of real speech, and pitch patterns are generated accordingly. A rule-based speech synthesis method is thereby obtained that achieves a sound quality close to that of recorded synthesized speech.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic configuration diagram showing an embodiment of this invention; FIG. 2 is an explanatory diagram of a pitch pattern obtained by pitch analysis according to the embodiment; FIG. 3 is a schematic configuration diagram of a conventional rule-based speech synthesis device; and FIG. 4 is an explanatory diagram of a pitch pattern used in a conventional rule-based speech synthesis device.

21: storage section for words or sentences to be uttered; 22: pitch analysis section; 23: per-phoneme-environment phoneme duration data storage section; 24: duration normalization section; 4: duration pattern database; 6: pitch pattern database; 8: energy pattern database; 10: speech database; 11: audio output section.

Claims (1)

[Claims] (1) A speech synthesis method in which text input of kanji-kana mixed sentences is analyzed by a Japanese-language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are each generated based on this phoneme string with reference to respective databases; and synthesized speech is generated based on these generated patterns, the method being characterized in that the pitch of an uttered word is analyzed, and phoneme boundaries are determined from this pitch analysis pattern and the per-phoneme-environment phoneme duration data stored in the duration pattern database, thereby generating a pitch pattern database that is referred to when generating pitch patterns.
JP2322170A 1990-11-26 1990-11-26 Sound synthesizing method Pending JPH04190398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2322170A JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2322170A JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Publications (1)

Publication Number Publication Date
JPH04190398A true JPH04190398A (en) 1992-07-08

Family

ID=18140721

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2322170A Pending JPH04190398A (en) 1990-11-26 1990-11-26 Sound synthesizing method

Country Status (1)

Country Link
JP (1) JPH04190398A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0728491A (en) * 1993-07-12 1995-01-31 Atr Jido Honyaku Denwa Kenkyusho:Kk Automatic labeling method for phoneme border


Similar Documents

Publication Publication Date Title
Isewon et al. Design and implementation of text to speech conversion for visually impaired people
US8219398B2 (en) Computerized speech synthesizer for synthesizing speech from text
JP4302788B2 (en) Prosodic database containing fundamental frequency templates for speech synthesis
US8886538B2 (en) Systems and methods for text-to-speech synthesis using spoken example
US6173262B1 (en) Text-to-speech system with automatically trained phrasing rules
US8775185B2 (en) Speech samples library for text-to-speech and methods and apparatus for generating and using same
JP3587048B2 (en) Prosody control method and speech synthesizer
JP3085631B2 (en) Speech synthesis method and system
RU61924U1 (en) STATISTICAL SPEECH MODEL
JP3576848B2 (en) Speech synthesis method, apparatus, and recording medium recording speech synthesis program
US6829577B1 (en) Generating non-stationary additive noise for addition to synthesized speech
JPH0887297A (en) Voice synthesis system
JPH08335096A (en) Text voice synthesizer
JP2001034284A (en) Voice synthesizing method and voice synthesizer and recording medium recorded with text voice converting program
JP3060276B2 (en) Speech synthesizer
JPH04190398A (en) Sound synthesizing method
Houidhek et al. Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic
Hinterleitner et al. Speech synthesis
Datta et al. Epoch Synchronous Overlap Add (ESOLA)
JP2703253B2 (en) Speech synthesizer
Dessai et al. Development of Konkani TTS system using concatenative synthesis
JPH06138894A (en) Device and method for voice synthesis
JPH09198073A (en) Speech synthesizing device
JPH01321496A (en) Speech synthesizing device
JPH08160990A (en) Speech synthesizing device