JPH04214600A

JPH04214600A - Sound synthesizing method

Info

Publication number: JPH04214600A
Application number: JP2401799A
Authority: JP
Inventors: Yoshimasa Sawada; 沢田　喜正
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1990-12-13
Filing date: 1990-12-13
Publication date: 1992-08-05

Abstract

PURPOSE: To improve the naturalness and clarity of synthesized speech by creating pitch fluctuation data from the original speech in advance in the speech synthesis section and superimposing this data on the target pitch values at the time of synthesis. CONSTITUTION: A fluctuation data section 21 is created in advance in the speech synthesis section 9. For the pitch pattern of the original speech corresponding to each synthesized sound, the average pitch over each interval of n points and the difference between each pitch and that average are stored in the fluctuation data section 21 as the fluctuation data for each frame. During speech synthesis, the superimposing section 22 superimposes the data obtained in this way on the pitch pattern generated by the pitch pattern generation section 5, which reduces the monotony of the pitch.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] This invention relates to a speech synthesis method for synthesizing speech from text written in mixed kanji and kana.

[0002]

[Prior Art] Rule-based speech synthesis is a means of synthesizing arbitrary words, sentences, and so on into speech from text written in mixed kanji and kana. FIG. 3 is an explanatory diagram outlining a general speech synthesis device. First, text entered at the text input section 1 is converted into a phoneme symbol string by the Japanese language processing section 2. Next, a prosodic pattern (duration pattern, pitch pattern, and energy pattern) is generated from this phoneme symbol string. Specifically, the duration pattern generation section 3 refers to the duration pattern database 4 and generates a duration pattern indicating the duration of the speech.

[0003]

Similarly, the pitch pattern generation section 5 refers to the pitch pattern database 6 and generates a pitch pattern indicating the pitch of the speech. Likewise, the energy pattern generation section 7 refers to the energy pattern database 8 and generates an energy pattern indicating the intensity of the speech. Based on the patterns obtained in this way, the speech synthesis section 9 refers to the speech database 10 and synthesizes a speech waveform. Reference numeral 11 denotes the speech output section, which outputs the synthesized speech.
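The flow of FIG. 3 can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the device's implementation: the Japanese language processing of section 2 is assumed to have already produced a phoneme list, each database is modeled as a dictionary keyed by phoneme, each phoneme receives a single pitch target rather than several targets per mora, and the waveform synthesis of section 9 is left out. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProsodyPatterns:
    """Prosodic patterns for one utterance, one entry per phoneme (toy model)."""
    durations: List[int]   # duration pattern: frames per phoneme
    pitch: List[float]     # pitch pattern: one target in Hz per phoneme (simplified)
    energy: List[float]    # energy pattern: relative intensity per phoneme


def generate_prosody(
    phonemes: List[str],
    duration_db: Dict[str, int],
    pitch_db: Dict[str, float],
    energy_db: Dict[str, float],
) -> ProsodyPatterns:
    """Look up each phoneme in the three pattern databases (sections 3, 5 and 7)."""
    return ProsodyPatterns(
        durations=[duration_db[p] for p in phonemes],
        pitch=[pitch_db[p] for p in phonemes],
        energy=[energy_db[p] for p in phonemes],
    )


def expand_to_frames(
    phonemes: List[str], pat: ProsodyPatterns
) -> List[Tuple[str, float, float]]:
    """Repeat each phoneme's (pitch, energy) for its duration, giving per-frame controls."""
    frames: List[Tuple[str, float, float]] = []
    for ph, dur, f0, en in zip(phonemes, pat.durations, pat.pitch, pat.energy):
        frames.extend([(ph, f0, en)] * dur)
    return frames
```

The per-frame tuples are what a waveform generator in the role of section 9 would consume together with its speech database.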

[0004]

The speech synthesis device configured as described above outputs synthesized speech through the processing shown in FIG. 3. The intonation (pitch) pattern used in this processing is the one shown in FIG. 4. That is, in FIG. 4, four pitch target values (P10 to P34) are given for each mora, the target values are joined by linear interpolation, and the resulting pitch of each frame is supplied to the speech synthesis section 9.
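The conventional interpolation described above can be sketched as follows: pitch targets at given frame positions are joined by straight lines to give a pitch value for every frame. This is a minimal sketch assuming the targets are supplied as (frame index, pitch in Hz) pairs in strictly increasing frame order; the function name and data layout are illustrative, and the source does not specify where the four targets sit within each mora.

```python
from typing import List, Tuple


def interpolate_pitch(targets: List[Tuple[int, float]]) -> List[float]:
    """Join (frame, pitch_hz) targets with straight lines, one value per frame.

    Element i of the result is the pitch of frame targets[0][0] + i.
    Frame indices must be strictly increasing.
    """
    if not targets:
        return []
    contour: List[float] = []
    for (f0, p0), (f1, p1) in zip(targets, targets[1:]):
        if f1 <= f0:
            raise ValueError("target frames must be strictly increasing")
        span = f1 - f0
        for f in range(f0, f1):
            # Linear interpolation between the two neighbouring targets.
            contour.append(p0 + (p1 - p0) * (f - f0) / span)
    contour.append(targets[-1][1])  # the last target falls on its own frame
    return contour
```

For example, `interpolate_pitch([(0, 100.0), (4, 120.0), (6, 110.0), (9, 110.0)])` yields a rise, a fall, and then a flat run of 110 Hz: whenever neighbouring targets are close in value, a long mora produces a long stretch of nearly constant pitch.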

[0005]

[Problem to be Solved by the Invention] With the intonation control method described above, the effect is small when each mora is short, but as a mora becomes longer, speech is synthesized at nearly the same pitch for a longer time. As a result, the pitch changes become monotonous, the synthesized speech tends to sound mechanical, and its clarity deteriorates. Moreover, even when the mora duration is relatively short, the pitch pattern varies little and is monotonous compared with the pitch pattern of an actual human voice.

[0006]

The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesis method that improves the naturalness and clarity of synthesized speech by imparting natural fluctuations to its pitch.

[0007]

[Means for Solving the Problems] To achieve the above object, the present invention provides a method in which text input consisting of sentences in mixed kanji and kana is analyzed by a Japanese language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are generated from this phoneme string by referring to the respective databases; and synthesized speech is generated by a speech synthesis section based on these patterns. The method is characterized in that pitch fluctuation data of the original speech is created in advance within the speech synthesis section, and this data is superimposed on the pitch target values of the pitch pattern at the time of synthesis.

[0008]

[Operation] Superimposing the fluctuation data on the pitch pattern reduces the monotony of the pitch and improves the naturalness of the synthesized speech.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. Parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted. In FIG. 1, reference numeral 21 denotes a fluctuation data section created in advance within the speech synthesis section 9. The fluctuation data section 21 is created as follows. Suppose, for example, that the pitch pattern of the original speech for the synthesized sound "a" (あ) is as shown in FIG. 2. The average pitch over each interval of n points, and the difference between each pitch and that average, are stored in the fluctuation data section 21 as the fluctuation data for each frame. Fluctuation data is obtained in the same way for every other sound and stored in the fluctuation data section 21.
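The construction of the fluctuation data can be sketched as follows. This is a minimal sketch under stated assumptions: the original pitch contour is a list of per-frame pitch values in Hz, the fluctuation of each frame is taken as that frame's pitch minus the mean of its n-point interval, and a final interval shorter than n points is averaged on its own. The source does not fix n, the sign convention, or the handling of the last interval, and the function name is hypothetical.

```python
from statistics import fmean
from typing import List, Tuple


def extract_fluctuation(pitch: List[float], n: int) -> Tuple[List[float], List[float]]:
    """Split an original pitch contour into n-point intervals.

    Returns (interval_means, per_frame_deviations), where each deviation is
    the frame's pitch minus the mean of the interval it belongs to.
    A trailing interval shorter than n is averaged on its own (assumption).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    means: List[float] = []
    deltas: List[float] = []
    for start in range(0, len(pitch), n):
        block = pitch[start:start + n]
        m = fmean(block)
        means.append(m)
        deltas.extend(p - m for p in block)
    return means, deltas
```

If the results are stored per sound, for instance in a dictionary keyed by the sound label as a hypothetical stand-in for the fluctuation data section 21, note that the deviations within each interval sum to zero: they add local variation without shifting the interval's average pitch.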

[0010]

During speech synthesis, the superimposing section 22 superimposes the data of the fluctuation data section 21, obtained as described above, on the pitch pattern generated by the pitch pattern generation section 5. This reduces the pitch monotony that was a drawback of the conventional method and improves the naturalness of the synthesized speech.
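The superposition step can be sketched as a frame-by-frame addition of stored deviations to the generated pitch contour. This sketch assumes the superposition is applied after interpolation, to every frame of the contour (the claim speaks of superimposing on the pitch target values, so applying it to the targets before interpolation is an equally plausible reading). It also assumes that when the contour is longer than the stored data, the deviations are repeated cyclically; the source does not say how length mismatches are handled, and the function name is hypothetical.

```python
from itertools import cycle, islice
from typing import List


def superimpose(contour: List[float], deltas: List[float]) -> List[float]:
    """Add per-frame pitch deviations (Hz) to a generated pitch contour (Hz).

    The deviations are repeated cyclically if the contour is longer than
    the stored data and truncated if it is shorter (assumption).
    """
    if not deltas:
        return list(contour)
    repeated = islice(cycle(deltas), len(contour))
    return [p + d for p, d in zip(contour, repeated)]
```

For example, superimposing the deviations `[-2.0, 2.0]` on the contour `[100.0, 105.0, 110.0, 115.0, 120.0]` gives `[98.0, 107.0, 108.0, 117.0, 118.0]`. Cyclic repetition is only the simplest choice; resampling the stored deviations to the target length would avoid an audible period in long segments.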

[0011]

[Effects of the Invention] As described above, according to the present invention, superimposing fluctuation data on the pitch pattern has the advantage of improving both the naturalness and the clarity of the synthesized speech.

[Brief Description of the Drawings]

[FIG. 1] Schematic configuration diagram showing an embodiment of the present invention.

[FIG. 2] Pitch pattern of the original speech for a synthesized sound.

[FIG. 3] Schematic explanatory diagram of a general speech synthesis device.

[FIG. 4] Pitch pattern of synthesized speech.

[Explanation of Reference Numerals]

5: Pitch pattern generation section
9: Speech synthesis section
11: Speech output section
21: Fluctuation data section
22: Superimposing section

Claims

[Claims]

[Claim 1] A speech synthesis method in which text input containing mixed kanji and kana is analyzed by a Japanese language processing section and converted into a phoneme string; a duration pattern, a pitch pattern, and an energy pattern are generated based on this phoneme string by referring to respective databases; and synthesized speech is generated in a speech synthesis section based on these generated patterns, the method being characterized in that pitch fluctuation data of the original speech is created in advance in the synthesis section, and this data is superimposed on the pitch target values during synthesis.