JPH06332490A

JPH06332490A - Generating method of accent component basic table for voice synthesizer

Info

Publication number: JPH06332490A
Application number: JP5117519A
Authority: JP
Inventors: Kiyoshi Ishida; 清石田; Kazuya Hasegawa; 和也長谷川
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1993-05-20
Filing date: 1993-05-20
Publication date: 1994-12-02

Abstract

PURPOSE: To faithfully express the accent information of real human speech and to achieve smooth accent control during synthesis. CONSTITUTION: For every accent control point (seven points per data item), the mean of all data belonging to a group is obtained in a first step S1. Using this mean, the standard deviation is obtained in a second step S2. The mean is then recomputed in a third step S3 using only the data that lie within the ± range defined by the mean and the standard deviation, and the results are registered in a table for each group in a fourth step S4.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method of creating an accent component basic table for a speech synthesizer. The table is referred to when the basic pattern of the accent component is generated in a rule-based speech synthesizer that, in controlling the pitch of synthesized speech, provides a plurality of pitch target values per phoneme and controls the accent width in multiple levels, thereby realizing pitch patterns closer to human utterance.

[0002]

[Prior Art] In a speech synthesizer using the rule-based synthesis method, as shown in FIG. 3, the Japanese processing unit 2 performs morphological analysis on the input text from the text input unit 1. This analysis determines pause positions and word and phrase boundaries, converts the text to its kana reading by referring to a dictionary, and assigns accents. For example, when the input text is 「今日はいい天気です」 ("The weather is nice today"), the Japanese processing result is as shown in Table 1 below.

[0003]

[Table 1]

[0004] For the text resulting from this processing, the intonation control unit 3 works as follows. The phrase pattern calculation unit 3A calculates the phrase component (the rise and fall of pitch over a stretch spoken in one breath between pauses) from the number of moras (the smallest unit of sound, represented by consonant + vowel) contained in the text. The accent pattern calculation unit 3B calculates the accent component (the rise and fall of pitch that each word has individually). The intonation control pattern is then calculated by superimposing the two components as shown in FIG. 4. The phrase component falls gradually from the high pitch at the start of an utterance to a low pitch as the subglottal pressure decreases. The accent component, as noted above, is obtained either by giving one pitch target value per analysis unit and linearly interpolating between the targets, or by giving three pitch target values per analysis unit and linearly interpolating between them.
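The superposition of a phrase component and a linearly interpolated accent component can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the straight-line shape of the phrase contour, the additive combination, and the numeric values are all hypothetical and are not taken from the patent, which shows the actual contours only in FIG. 4.

```python
from bisect import bisect_right


def linear_interpolate(targets, t):
    """Return the pitch at time t from sorted (time, pitch) targets.

    Values before the first target or after the last one are clamped.
    """
    times = [time for time, _ in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t)
    (t0, p0), (t1, p1) = targets[i - 1], targets[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def phrase_component(t, duration, start=1.0, end=0.0):
    """Hypothetical phrase contour: a straight decline from start to end."""
    return start + (end - start) * t / duration


def intonation_pattern(accent_targets, duration, n_points):
    """Superimpose (here: add) the phrase and accent components."""
    step = duration / (n_points - 1)
    return [
        phrase_component(i * step, duration)
        + linear_interpolate(accent_targets, i * step)
        for i in range(n_points)
    ]


# Three accent targets across a 2-second phrase (illustrative values only).
pattern = intonation_pattern([(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)], 2.0, 5)
```

Whether the two components are added on a linear or a logarithmic frequency scale is not stated in the text, so the plain sum above should be read as a placeholder for whichever scale an implementation uses.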

[0005] In the latter accent component calculation, target values P₁ to P₃ are assigned to the segments shown in Tables 2 and 3 below, separately for CV units (consonant + vowel) and V units (vowel). The pitch of the segments marked with * is linearly interpolated from the preceding and following data.

[0006]

[Table 2]

[0007]

[Table 3]

[0008] The speech synthesis unit 4 adjusts the pitch of each syllable according to the intonation control pattern supplied by the intonation control unit 3. It then obtains the synthesized speech as the response output of the articulation filter, using the sound source wave pattern and the articulation filter parameters associated with each syllable.

[0009] As described above, the conventional intonation control method gives intonation to synthesized speech by combining, among other things, the phrase component, which depends on the number of moras in the input text, with the accent component of each mora. Of these, the accent component is given a plurality of target pitches per mora, and the pitch change between the targets is determined by interpolation.

[0010]

[Problems to Be Solved by the Invention] Because the pitch of the accent component is determined simply from whether the accent is high or low, the pitch changes tend to be uniform and the synthesized speech tends to sound mechanical. In addition, the connections between syllables and between phrases are not considered at all. Accent changes across a mora and its neighbours can therefore lose their smoothness, and the synthesized speech sounds unnatural at the level of the syllable or of the sentence as a whole.

[0011] The present invention has been made in view of the above circumstances. Its object is to provide a method of creating an accent component basic table for a speech synthesizer in which the accent patterns of multiple data items are averaged for each of various grouped environments to create the table. A more average accent pattern is thereby extracted from a large number of utterances, so that the accent information of real human speech is expressed more faithfully and smooth accent control is achieved during synthesis.

[0012]

[Means for Solving the Problems] To achieve the above object, the first invention concerns a rule-based speech synthesis method in which, for Japanese-processed input text, an intonation control pattern of the phrase component is obtained from the number of moras in each phrase of the input text, and the intonation control pattern of the input text is obtained by superimposing an accent component produced by correcting and interpolating, according to the accent environment, the pitch target values defined for each mora in the phrase. The method comprises a basic accent pattern table that stores basic accent patterns tabulated by mora accent environment; a basic accent pattern generation processing unit that obtains from the table the basic accent pattern corresponding to the accent environment of an input mora and corrects the pitch of that mora; and a correction processing unit that corrects the result of that processing unit according to the Japanese-processed number of moras in the phrase segment and the mora position to obtain the intonation pattern of the accent component. The first invention is characterized in that, for the basic accent pattern table, the environments in which various moras are uttered are grouped, averaging is performed for each accent control point in each group, and the result is then registered in the table for each group.

[0013] The second invention is characterized in that the averaging process obtains, for each accent control point, the mean value Xave of all data belonging to the group, then obtains the standard deviation SD, and thereafter obtains the mean again using only the data lying within the range Xave ± 3SD.

[0014]

[Operation] When the accent basic table is created, the accent data to be registered for each mora environment are calculated. The patterns of the data belonging to that environment's group are averaged, and an appropriate average accent pattern is registered in the table. This improves the continuity between data and the naturalness of the synthesized speech as a whole.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. In FIG. 1, S₁ is the first step of the averaging process that is performed, once the environments in which various moras are uttered have been grouped, for each accent control point in each group (for example, seven points per data item: j = 1 to 7). In this first step S₁, the mean value Xave₁ of all data [X₁(j) to X_N(j)] belonging to the group is obtained for each accent control point by equation (1).

[0016]

[Equation 1]

[0017] Based on the mean value Xave₁ obtained in the first step S₁, the standard deviation SD is obtained in the second step S₂, using equation (2).

[0018]

[Equation 2]

[0019] Once the standard deviation SD has been obtained, the process proceeds to the third step S₃, where the mean value Xave is obtained again using only the data (N′ items) lying within the range Xave₁ ± 3SD. Equation (3) gives this mean.

[0020]

[Equation 3]
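A plausible reading of equations (1) to (3), reconstructed from the definitions in paragraphs [0015] to [0019], is given below. The summation index i, the explicit dependence of SD on j, the index set I(j), and the 1/N normalization inside the square root are assumptions; the original equation (2) may equally use 1/(N−1).

```latex
\begin{align}
X_{\mathrm{ave}1}(j) &= \frac{1}{N}\sum_{i=1}^{N} X_i(j),
  \qquad j = 1,\dots,7 \tag{1}\\[4pt]
SD(j) &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \bigl(X_i(j) - X_{\mathrm{ave}1}(j)\bigr)^{2}} \tag{2}\\[4pt]
X_{\mathrm{ave}}(j) &= \frac{1}{N'}\sum_{i \in I(j)} X_i(j),
  \qquad I(j) = \bigl\{\, i : \lvert X_i(j) - X_{\mathrm{ave}1}(j)\rvert \le 3\,SD(j) \,\bigr\},
  \quad N' = \lvert I(j)\rvert \tag{3}
\end{align}
```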

[0021] The result obtained as described above is registered in the table for each group. This is the fourth step, S₄. Thus, when the accent basic table is created, the accent data to be registered for each mora environment are first calculated. The patterns of the data belonging to that environment's group are then averaged, and a more appropriate average accent pattern is registered in the table. Once the patterns are registered in the table, using them in the configuration described next maintains the continuity between data and improves the naturalness of the synthesized speech as a whole.
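Steps S₁ to S₄ can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the dictionary layout of the groups, and the use of the population standard deviation are assumptions, while the ±3SD window and the per-control-point averaging come from the text.

```python
from statistics import fmean, pstdev


def trimmed_mean(values, k=3.0):
    """Steps S1 to S3 for a single accent control point."""
    mean1 = fmean(values)                     # S1: mean of all data
    sd = pstdev(values, mu=mean1)             # S2: population SD (assumption)
    # S3: keep only the data inside mean1 +/- k * SD, then average again.
    kept = [v for v in values if abs(v - mean1) <= k * sd]
    return fmean(kept)


def build_accent_table(groups, k=3.0):
    """Build {group key: averaged pattern} from {group key: list of patterns}.

    Each pattern is a sequence of control-point values of equal length
    (seven per data item in the embodiment).
    """
    table = {}
    for key, patterns in groups.items():
        # zip(*patterns) yields, for each control point j, X_1(j) .. X_N(j).
        table[key] = [trimmed_mean(column, k) for column in zip(*patterns)]  # S4
    return table


# Hypothetical data: 19 consistent two-point patterns plus one outlier.
patterns = [[1.0, 2.0]] * 19 + [[100.0, 2.0]]
table = build_accent_table({"example_group": patterns})
```

With the population standard deviation, no value can deviate from the mean by more than √(N−1) standard deviations, so the ±3SD window only removes data when a group holds more than ten items. This fits the patent's premise of averaging over a large number of utterances.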

[0022] FIG. 2 is a configuration diagram in which the accent patterns obtained in FIG. 1 are registered in the table 7. Parts identical to those in FIG. 3 carry the same reference numerals. In FIG. 2, the accent pattern generation processing unit 5 receives, as the Japanese processing result of the input text, the number of moras in the phrase segment and the accent pattern data of each mora (see Table 1 above). The accent pattern generation processing unit 5 processes these data with the basic accent pattern generation processing unit 6, the basic accent pattern table 7 (described in detail later), and the correction processing unit 8.

[0023] The basic accent pattern generation processing unit 6 reads from the table 7 the basic accent pattern corresponding to the accent environment of the input data. In rule-based synthesis, the value of the accent component of a given mora in a sentence is determined by taking into account the types of the sounds before and after that mora, the accent pattern, and so on. The basic accent pattern table 7 holds, for each of the various environments in which a mora can appear, the accent component values used for this determination, and these values are looked up when the accent component is determined.
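The table lookup performed by the processing unit 6 can be sketched as a keyed dictionary access. The key fields below (preceding sound type, following sound type, accent type) are hypothetical stand-ins for the "accent environment"; the text names the kinds of context considered but does not define the exact key.

```python
from typing import NamedTuple, Optional, Sequence


class AccentEnvironment(NamedTuple):
    """Hypothetical environment key; the patent does not fix these fields."""
    prev_sound: str
    next_sound: str
    accent_type: str


def lookup_basic_pattern(
    table: dict, env: AccentEnvironment
) -> Optional[Sequence[float]]:
    """Return the stored basic accent pattern for env, or None if absent."""
    return table.get(env)


# One illustrative entry with seven control-point values.
env = AccentEnvironment("vowel", "consonant", "high")
accent_table = {env: [0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.1]}
basic_pattern = lookup_basic_pattern(accent_table, env)
```

How an environment missing from the table should be handled is not described in the text, so the sketch simply returns None in that case.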

[0024]

[Effects of the Invention] As described above, according to the present invention, the accent basic table that is referred to when the basic accent component is determined is created from a speech database built from a large number of utterance examples, and the pitch pattern for speech synthesis is generated from this table so that the pitch patterns of human utterance are reflected in the synthesized sound. When the basic table is created, the accent patterns of multiple data items are averaged for each of various grouped environments, so that a more average accent pattern is extracted from a large number of utterances. As a result, the accent information of real human speech is expressed more faithfully, and smooth accent control is achieved during synthesis.

[Brief description of drawings]

FIG. 1 is a flowchart showing an embodiment of the present invention.

FIG. 2 is a configuration diagram to which the procedure of FIG. 1 is applied.

FIG. 3 is a configuration diagram of a conventional speech synthesizer using the rule-based synthesis method.

FIG. 4 is a diagram showing how intonation processing is performed.

[Explanation of symbols]

S₁: first step; S₂: second step; S₃: third step; S₄: fourth step

Claims

[Claims]

1. A method of creating an accent component basic table for a speech synthesizer, for use in a rule-based speech synthesis method in which, for Japanese-processed input text, an intonation control pattern of a phrase component is obtained from the number of moras in each phrase of the input text, and the intonation control pattern of the input text is obtained by superimposing an accent component produced by correcting and interpolating, according to the accent environment, the pitch target values defined for each mora in the phrase, the speech synthesis method comprising: a basic accent pattern table that stores basic accent patterns tabulated by mora accent environment; a basic accent pattern generation processing unit that obtains from the table the basic accent pattern corresponding to the accent environment of an input mora and corrects the pitch of that mora; and a correction processing unit that corrects the result of that processing unit according to the Japanese-processed number of moras in the phrase segment and the mora position to obtain the intonation pattern of the accent component, characterized in that, for the basic accent pattern table, the environments in which various moras are uttered are grouped, averaging is performed for each accent control point in each group, and the result is then registered in the table for each group.

2. The method of creating an accent component basic table for a speech synthesizer according to claim 1, wherein the averaging process obtains, for each accent control point, the mean value Xave of all data belonging to the group, then obtains the standard deviation SD, and thereafter obtains the mean again using only the data lying within the range Xave ± 3SD.