JP2004206144A - Fundamental frequency pattern generating method and program recording medium

Info

Publication number: JP2004206144A
Application number: JP2004079113A
Authority: JP
Inventors: Yumiko Kato; 弓子加藤; Takahiro Kamai; 孝浩釜井; Kiyo Hara; 紀代原; Kenji Matsui; 謙二松井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-11-28
Filing date: 2004-03-18
Publication date: 2004-07-22
Anticipated expiration: 2018-11-24
Also published as: JP3771565B2

Abstract

PROBLEM TO BE SOLVED: To solve the problems that, when a fundamental frequency pattern is generated, the fluctuation of the fundamental frequency within a mora cannot be determined precisely, and that prosody, as typified by accent, becomes unnatural because of distortion on the real time axis.

SOLUTION: The device includes: a phoneme time length standardized fundamental frequency database 351 that stores, for each number of moras in an accent phrase, the rise reference point of the i-th mora (the peak of the fundamental frequency pattern) and other reference points as positions relative to the time length of a phoneme of the specified mora; a fundamental frequency pattern deformation database 350 that stores the amount of fundamental frequency deformation at the peak and at the ending of an accent phrase according to the position of the accent phrase within the phrase; a time length setting part 40 that sets the time length of each phoneme by referring to a phoneme time length database 30 on the basis of the phoneme information and other information output from a character string analysis part 20; a generation part 60 that generates a fundamental frequency pattern by referring to the database 351 on the basis of the output phoneme information and the phoneme time lengths set by the time length setting part 40; and others.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

The present invention relates to a fundamental frequency pattern generation method used for speech synthesis, and to a program recording medium.

Conventional methods for generating a speech fundamental frequency pattern include one, as in Patent Document 1, that focuses on the accent type and determines the fundamental frequency pattern with a critically damped second-order linear system on the logarithmic frequency axis, taking the start point of the mora or the start point of the vowel of the mora as the reference. Another, as in Patent Document 2, determines the fundamental frequency of each mora by focusing on the accent type, the phoneme type, and the position of the mora within the word or phrase.
Patent Document 1: JP-A-5-173590
Patent Document 2: JP-A-5-88690

However, these conventional methods either cannot precisely determine the fluctuation of the fundamental frequency within a mora, or produce distortion on the real time axis because the time length differs from mora to mora. In both cases prosody, as typified by accent, becomes unnatural.

In view of the above problems of the conventional speech fundamental frequency pattern generation methods, an object of the present invention is to provide a fundamental frequency pattern generation method, and a program recording medium, capable of generating a fundamental frequency pattern that is more natural than before.

A first aspect of the present invention is a fundamental frequency pattern generation method for generating the fundamental frequency pattern of an accent phrase by using a fundamental frequency database that stores fundamental frequency patterns classified by number of phonemes and accent position, wherein,
when no fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated (hereinafter, the target accent phrase) is stored in the fundamental frequency database, and the accent position of the target accent phrase is the same as or earlier than the phoneme position following the phoneme position that contains the fundamental frequency peak stored in the fundamental frequency database,
(1) the method uses the fundamental frequency pattern stored in the fundamental frequency database that has the same accent position as the target accent phrase and that corresponds to the number of phonemes closest to that of the target accent phrase;
(2) the fundamental frequency pattern from the first phoneme to the phoneme following the accent nucleus is generated by applying the fundamental frequencies from the first phoneme to the phoneme following the accent nucleus of the stored fundamental frequency pattern;
(3) the fundamental frequencies from the second phoneme after the accent nucleus to the phoneme immediately before the accent-phrase end, the accent-phrase end being a predetermined number of phonemes not exceeding four, are generated by interpolating between the following fundamental frequencies of the stored fundamental frequency pattern: (a) those of the second phoneme after the accent nucleus and of the accent-phrase end, (b) those of the phoneme following the accent nucleus and of the accent-phrase end, (c) those of the second phoneme after the accent nucleus and of the phoneme immediately before the accent-phrase end, or (d) those of the phoneme following the accent nucleus and of the phoneme immediately before the accent-phrase end; and
(4) the fundamental frequency pattern of the accent-phrase end of the target accent phrase is generated by applying the fundamental frequencies of the accent-phrase-end phonemes of the stored fundamental frequency pattern.
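The steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes one fundamental frequency value (in Hz) per phoneme, a database keyed by `(number_of_phonemes, accent_position)` with 1-based accent positions, and interpolation variant (b) with linear interpolation. The names `nearest_pattern`, `generate_early_accent`, and `tail_len` are hypothetical.

```python
def nearest_pattern(db, n_phonemes, accent_pos):
    """Step (1): same accent position, closest number of phonemes."""
    lengths = [n for (n, a) in db if a == accent_pos]
    if not lengths:
        raise LookupError("no stored pattern with this accent position")
    best = min(lengths, key=lambda n: abs(n - n_phonemes))
    return db[(best, accent_pos)]


def generate_early_accent(db, n_phonemes, accent_pos, tail_len=2):
    """Build a per-phoneme F0 list for a phrase whose exact pattern is not stored.

    tail_len is the predetermined length of the accent-phrase end (at most 4).
    """
    if not 1 <= tail_len <= 4:
        raise ValueError("tail_len must be between 1 and 4")
    stored = nearest_pattern(db, n_phonemes, accent_pos)
    if len(stored) < accent_pos + 1 + tail_len:
        raise ValueError("stored pattern too short for head and tail")
    # Step (2): copy first phoneme .. phoneme following the accent nucleus.
    head = stored[:accent_pos + 1]
    # Step (4): copy the accent-phrase-end phonemes.
    tail = stored[-tail_len:]
    n_mid = n_phonemes - len(head) - len(tail)
    if n_mid < 0:
        raise ValueError("target phrase too short for head and tail")
    # Step (3), variant (b): interpolate linearly between the phoneme following
    # the nucleus and the first phoneme of the accent-phrase end.
    left, right = head[-1], tail[0]
    mid = [left + (right - left) * k / (n_mid + 1) for k in range(1, n_mid + 1)]
    return head + mid + tail


# Illustrative values only (not taken from the patent).
db = {(5, 2): [100.0, 140.0, 120.0, 105.0, 90.0]}
pattern = generate_early_accent(db, n_phonemes=8, accent_pos=2)
```

With the illustrative database above, the stored 5-phoneme pattern is stretched to 8 phonemes: the first three and last two values are copied and the three values in between are interpolated.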

A second aspect of the present invention is a fundamental frequency pattern generation method for generating the fundamental frequency pattern of an accent phrase by using a fundamental frequency database that stores fundamental frequency patterns classified by number of phonemes and accent position, wherein,
when no fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated (hereinafter, the target accent phrase) is stored in the fundamental frequency database, and the accent position of the target accent phrase lies after the phoneme position following the phoneme position that contains the fundamental frequency peak stored in the fundamental frequency database and before the accent-phrase end, the accent-phrase end being a predetermined number of phonemes not exceeding four,
(1) the method uses the fundamental frequency pattern stored in the fundamental frequency database that has its accent nucleus at or after the second phoneme from the stored fundamental frequency peak and before the accent-phrase end, and that corresponds to the number of phonemes closest to that of the target accent phrase;
(2) the fundamental frequency pattern from the first phoneme of the target accent phrase to the phoneme containing the fundamental frequency peak is generated by applying the fundamental frequencies from the first phoneme to the phoneme containing the fundamental frequency peak of the stored fundamental frequency pattern;
(3) the fundamental frequencies from the phoneme following the phoneme containing the fundamental frequency peak to the phoneme immediately before the accent nucleus are generated by interpolating between the following fundamental frequencies of the stored fundamental frequency pattern: (a) those of the phoneme containing the peak and of the phoneme containing the accent nucleus, (b) those of the phoneme containing the peak and of the phoneme immediately before the phoneme containing the accent nucleus, (c) those of the phoneme following the phoneme containing the peak and of the phoneme containing the accent nucleus, or (d) those of the phoneme following the phoneme containing the peak and of the phoneme immediately before the phoneme containing the accent nucleus;
(4) the fundamental frequencies of the phoneme containing the accent nucleus of the target accent phrase and of the phoneme immediately following it are generated by applying the fundamental frequencies of the phoneme containing the accent nucleus and of the phoneme immediately following it in the stored fundamental frequency pattern;
(5) the fundamental frequencies from the second phoneme after the accent nucleus to the phoneme immediately before the accent-phrase end are generated by interpolating between the following fundamental frequencies of the stored fundamental frequency pattern: (a) those of the second phoneme after the accent nucleus and of the accent-phrase end, (b) those of the phoneme following the accent nucleus and of the accent-phrase end, (c) those of the second phoneme after the accent nucleus and of the phoneme immediately before the accent-phrase end, or (d) those of the phoneme following the accent nucleus and of the phoneme immediately before the accent-phrase end; and
(6) the fundamental frequency pattern of the accent-phrase end of the target accent phrase is generated by applying the fundamental frequencies of the accent-phrase-end phonemes of the stored fundamental frequency pattern.
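The second aspect differs from the first in that two spans are interpolated: one between the peak and the accent nucleus, and one between the nucleus region and the accent-phrase end. The sketch below illustrates this under stated assumptions: one F0 value (Hz) per phoneme, 1-based phoneme positions, variant (a) for step (3), variant (b) for step (5), and linear interpolation. The stored pattern is passed in directly (step (1), the selection, is omitted), and all names (`fill`, `generate_late_accent`, `peak_pos`, `stored_accent`, `tail_len`) are hypothetical.

```python
def fill(left, right, count):
    """Return `count` values linearly interpolated strictly between left and right."""
    return [left + (right - left) * k / (count + 1) for k in range(1, count + 1)]


def generate_late_accent(stored, stored_accent, peak_pos,
                         n_phonemes, accent_pos, tail_len=2):
    """Per-phoneme F0 for a phrase whose nucleus lies between peak and phrase end."""
    if not 1 <= tail_len <= 4:
        raise ValueError("tail_len must be between 1 and 4")
    head = stored[:peak_pos]                            # step (2): first .. peak phoneme
    pair = stored[stored_accent - 1:stored_accent + 1]  # step (4): nucleus and next phoneme
    tail = stored[-tail_len:]                           # step (6): accent-phrase end
    n_before = accent_pos - 1 - peak_pos                # phonemes between peak and nucleus
    n_after = n_phonemes - tail_len - (accent_pos + 1)  # phonemes between pair and tail
    if n_before < 0 or n_after < 0:
        raise ValueError("accent position does not fit this case")
    return (head
            + fill(head[-1], pair[0], n_before)   # step (3), variant (a)
            + pair
            + fill(pair[1], tail[0], n_after)     # step (5), variant (b)
            + tail)


# Illustrative values only: 8 stored phonemes, peak at phoneme 2, nucleus at phoneme 5.
stored = [100.0, 150.0, 145.0, 140.0, 135.0, 110.0, 100.0, 85.0]
pattern = generate_late_accent(stored, stored_accent=5, peak_pos=2,
                               n_phonemes=10, accent_pos=6)
```

Only the head, the nucleus pair, and the tail are taken from the stored pattern; the stored values between them are discarded and replaced by interpolated values sized to the new phrase.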

A third aspect of the present invention is a fundamental frequency pattern generation method for generating the fundamental frequency pattern of an accent phrase by using a fundamental frequency database that stores fundamental frequency patterns classified by number of phonemes and accent position, wherein,
when no fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated (hereinafter, the target accent phrase) is stored in the fundamental frequency database, and the accent position of the target accent phrase falls within the phonemes of the accent-phrase end, the accent-phrase end being a predetermined number of phonemes not exceeding four,
(1) the method uses the fundamental frequency pattern stored in the fundamental frequency database whose accent position within the accent-phrase end is the same as the accent position within the accent-phrase end of the target accent phrase, and that corresponds to the number of phonemes closest to that of the target accent phrase;
(2) the fundamental frequency pattern from the first phoneme of the target accent phrase to the phoneme containing the fundamental frequency peak is generated by applying the fundamental frequencies from the first phoneme to the phoneme containing the fundamental frequency peak of the stored fundamental frequency pattern;
(3) the fundamental frequencies from the phoneme following the phoneme containing the fundamental frequency peak to the phoneme immediately before the accent nucleus are generated by interpolating between the following fundamental frequencies of the stored fundamental frequency pattern: (a) those of the phoneme containing the peak and of the phoneme containing the accent nucleus, (b) those of the phoneme containing the peak and of the phoneme immediately before the phoneme containing the accent nucleus, (c) those of the phoneme following the phoneme containing the peak and of the phoneme containing the accent nucleus, or (d) those of the phoneme following the phoneme containing the peak and of the phoneme immediately before the phoneme containing the accent nucleus; and
(4) the fundamental frequencies from the phoneme containing the accent nucleus of the target accent phrase to the final phoneme of the accent phrase are generated by applying the fundamental frequencies from the phoneme containing the accent nucleus to the final phoneme of the stored fundamental frequency pattern.

A fourth aspect of the present invention is a fundamental frequency pattern generation method for generating the fundamental frequency pattern of an accent phrase by using a fundamental frequency database that stores fundamental frequency patterns classified by number of phonemes and accent position, wherein,
when no fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated (hereinafter, the target accent phrase) is stored in the fundamental frequency database, and the accent type of the target accent phrase is the flat type,
(1) the method uses the fundamental frequency pattern stored in the fundamental frequency database that has a flat-type accent and that corresponds to the number of phonemes closest to that of the target accent phrase;
(2) the fundamental frequency pattern from the first phoneme to the phoneme containing the fundamental frequency peak is generated by applying the fundamental frequencies from the first phoneme to the phoneme containing the fundamental frequency peak of the stored fundamental frequency pattern;
(3) the fundamental frequencies from the phoneme following the phoneme containing the fundamental frequency peak to immediately before the accent-phrase end or the final phoneme are generated by interpolating between the following fundamental frequencies of the stored fundamental frequency pattern: (a) those of the phoneme containing the peak and of the accent-phrase end or the final phoneme, (b) those of the phoneme containing the peak and of the phoneme immediately before the accent-phrase end or the final phoneme, (c) those of the phoneme following the phoneme containing the peak and of the accent-phrase end or the final phoneme, or (d) those of the phoneme following the phoneme containing the peak and of the phoneme immediately before the accent-phrase end or the final phoneme; and
(4) the fundamental frequency pattern of the accent-phrase end or the final phoneme of the target accent phrase is generated by applying the fundamental frequencies of the accent-phrase-end phonemes or the final phoneme of the stored fundamental frequency pattern.
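The first to fourth aspects apply under different conditions on the accent position, all of them presupposing that no exactly matching pattern is stored. The sketch below shows one way the conditions could be checked. It assumes 1-based phoneme positions, the common convention that accent position 0 denotes the flat type, and an order of checks (flat type, then accent-phrase end, then near the peak) that the text does not prescribe. The function name and parameters are hypothetical.

```python
def select_case(n_phonemes, accent_pos, peak_pos, tail_len):
    """Return which aspect (1 to 4) would apply to an unstored accent phrase.

    accent_pos: 1-based accent position, 0 meaning flat type (assumed convention).
    peak_pos:   1-based position of the phoneme containing the F0 peak.
    tail_len:   predetermined number of accent-phrase-end phonemes (at most 4).
    """
    if not 0 <= accent_pos <= n_phonemes:
        raise ValueError("accent position outside the phrase")
    if not 1 <= tail_len <= 4:
        raise ValueError("tail_len must be between 1 and 4")
    if accent_pos == 0:
        return 4  # flat type
    if accent_pos > n_phonemes - tail_len:
        return 3  # accent falls within the accent-phrase end
    if accent_pos <= peak_pos + 1:
        return 1  # accent at or before the phoneme following the peak
    return 2      # accent between the peak region and the accent-phrase end


cases = [select_case(8, a, peak_pos=2, tail_len=2) for a in (0, 2, 3, 5, 7)]
```

For very short phrases the conditions of the first and third aspects can both hold; the order of the checks above resolves that overlap arbitrarily.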

A fifth aspect of the present invention is the fundamental frequency pattern generation method according to any one of the first to fourth aspects, wherein the fundamental frequency patterns are extracted from naturally uttered speech.

A sixth aspect of the present invention is the fundamental frequency pattern generation method according to any one of the first to fourth aspects, wherein the interpolation is linear interpolation.

A seventh aspect of the present invention is the fundamental frequency pattern generation method according to any one of the first to fourth aspects, wherein the fundamental frequencies from the head of the accent phrase to the fundamental frequency peak are interpolated with a fundamental frequency pattern expressed on the real time axis.

An eighth aspect of the present invention is the fundamental frequency pattern generation method according to any one of the first to fourth aspects, wherein the phoneme is a mora or a syllable.

A ninth aspect of the present invention is a program recording medium on which is recorded a program for causing a computer to execute each step of the fundamental frequency pattern generation method according to any one of the first to fourth aspects.

As is apparent from the above description, the present invention has the advantage that it can generate a fundamental frequency pattern that is more natural than before.

Embodiments of the present invention, and embodiments of other inventions related to the present invention, are described below with reference to FIGS. 1 to 20.

(Embodiment 1)
FIG. 1 is a functional block diagram of a fundamental frequency pattern generation device showing an embodiment of another invention related to the present invention. The configuration of this embodiment is described with reference to the figure.

In FIG. 1, reference numeral 10 denotes a character string input unit that inputs the character string to be synthesized into speech. Reference numeral 20 denotes a character string analysis unit that analyzes the character string input from the character string input unit 10 and outputs phoneme information of the speech to be synthesized and prosody information such as accents and pauses. Reference numeral 30 denotes a phoneme time length database that stores the time length of each phoneme for each condition, such as the speech rate and the position of the phoneme in the utterance, and 40 denotes a time length setting unit that sets the time length of each phoneme by referring to the phoneme time length database 30 on the basis of the phoneme information and prosody information output from the character string analysis unit 20. Reference numeral 50 denotes a mora time length standardized fundamental frequency database that stores, mora by mora, fundamental frequency patterns standardized by the time length of each mora for each condition of the factors that determine prosody, such as the number of moras in the accent phrase, the accent type, and the phoneme sequence. Reference numeral 60 denotes a fundamental frequency pattern generation unit that generates a fundamental frequency pattern by referring to the mora time length standardized fundamental frequency database 50 on the basis of the prosody information output from the character string analysis unit 20 and the phoneme time lengths set by the time length setting unit 40. Reference numeral 70 denotes a vocal cord vibration generation unit that generates vocal cord vibration, that is, the sound source vibration of the synthesized speech, on the basis of the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60. FIG. 2 shows an example of a fundamental frequency pattern according to the present invention.

The operation of the fundamental frequency pattern generation device configured as described above is described below.

First, the character string to be converted into speech (the character string "オンセーゴーセー" (onsē gōsē, "speech synthesis") shown in FIG. 2) is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with the phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 on the basis of the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase on the basis of the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, as shown at a) in FIG. 2, the fundamental frequency pattern of the first mora of the accent phrase is obtained from the mora time length standardized fundamental frequency database 50. Next, the mora at which the fundamental frequency reaches its maximum is identified from the number of moras and the accent type of the accent phrase, and, as shown at b) in FIG. 2, the fundamental frequency pattern of the identified mora is obtained from the mora time length standardized fundamental frequency database 50. As shown at c) and d) in FIG. 2, the fundamental frequency patterns of the accent nucleus and of the mora following the accent nucleus, and the fundamental frequency pattern of the final mora of the accent phrase, are obtained from the mora time length standardized fundamental frequency database 50. The fundamental frequency patterns e), f), and g) in FIG. 2 are then determined by linear interpolation on the real time axis between the reference moras, for example between b) and c) and between c) and d) in FIG. 2. The vocal cord vibration generation unit 70 generates the vocal cord vibration of the synthesized speech in accordance with the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
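The two operations in this paragraph, mapping a time-standardized mora pattern onto the real time axis and linearly interpolating between reference moras, can be sketched as follows. This is an illustration under assumptions: a standardized pattern is represented as `(relative_time, f0)` pairs with relative time in [0, 1], times are in milliseconds, and the interpolated span is sampled at a fixed step. The representation and the names `place_mora` and `linear_fill` are hypothetical, not taken from the patent.

```python
def place_mora(norm_pattern, start_ms, duration_ms):
    """Map (relative_time, f0) pairs of one mora onto the real time axis."""
    return [(start_ms + rel * duration_ms, f0) for rel, f0 in norm_pattern]


def linear_fill(left, right, step_ms):
    """Linearly interpolate F0 on the real time axis between two (time, f0) points.

    Returns the points strictly between left and right, spaced step_ms apart.
    """
    (t0, f0), (t1, f1) = left, right
    n = int(round((t1 - t0) / step_ms))
    if n < 1:
        raise ValueError("right point must lie after left point")
    return [(t0 + k * step_ms, f0 + (f1 - f0) * k / n) for k in range(1, n)]


# Illustrative values only: a peak mora of 120 ms and a nucleus mora of 100 ms.
peak = place_mora([(0.0, 120.0), (0.5, 150.0), (1.0, 145.0)],
                  start_ms=100, duration_ms=120)
nucleus = place_mora([(0.0, 125.0), (1.0, 100.0)], start_ms=300, duration_ms=100)
gap = linear_fill(peak[-1], nucleus[0], step_ms=10)
```

Because the stored shape is scaled by each mora's own duration, the rise and fall keep their shape within the mora, while the gap between reference moras is filled on the real time axis regardless of how many moras it spans.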

The timing and slope of the rise of the accent phrase and of the fall at the accent nucleus strongly affect the naturalness of speech. Applying fundamental frequency patterns standardized by the time length of the mora to these portions reproduces the fluctuation of the fundamental frequency within the mora in detail and achieves high naturalness. Interpolating on the real time axis over the portions that do not greatly affect perception removes the sense of discontinuity that arises when the pattern is controlled mora by mora, and also allows the fundamental frequency pattern database to be made smaller.

(Embodiment 2)
FIG. 4 is a functional block diagram of a device showing an embodiment of another invention related to the present invention. It is the same as FIG. 1 except that the mora time length standardized fundamental frequency database 50 is replaced by a vowel time length standardized fundamental frequency database 150a. For each condition of the factors that determine prosody, such as the number of moras in the accent phrase, the accent type, and the phoneme sequence, the database 150a divides the time length of the vowel part of each mora into four equal sections and stores a representative value of the fundamental frequency of each section as the value at the center point of that section.
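As a rough illustration of how entries for such a database might be prepared, the sketch below splits the F0 samples of one vowel into four equal sections and takes one representative value per section. The text does not say how the representative value is computed; the arithmetic mean used here is an assumption, and the function name is hypothetical.

```python
from statistics import mean


def section_representatives(vowel_f0, n_sections=4):
    """Split the F0 samples of one vowel into equal sections and average each.

    The mean is an assumed choice of "representative value".
    """
    n = len(vowel_f0)
    if n < n_sections:
        raise ValueError("need at least one sample per section")
    bounds = [i * n // n_sections for i in range(n_sections + 1)]
    return [mean(vowel_f0[bounds[i]:bounds[i + 1]]) for i in range(n_sections)]


# Illustrative values only: eight F0 samples (Hz) across one vowel.
reps = section_representatives([100, 102, 110, 114, 120, 124, 118, 112])
```

Each of the four values is then associated with the center point of its section, so the stored pattern is independent of the actual vowel duration.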

FIG. 3 shows an example of a fundamental frequency pattern according to the present invention. The operation is described below. First, the character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with the phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 on the basis of the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase on the basis of the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, based on the number of moras of the accent phrase, the accent type, the phoneme sequence, and so on, the following reference points are obtained from the vowel time length standardized fundamental frequency database 150a: a) the rising reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora in which the fundamental frequency takes its maximum value; b) the falling reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora carrying the accent nucleus; c) the falling reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora following the accent nucleus; d) the accent-phrase-end reference point, at the center of the second of four equal sections of the vowel-equivalent part of the final mora of the accent phrase; and e) the ending reference point, at the center of the third of four equal sections of the vowel-equivalent part of the final mora.
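The section-center positions named above follow from simple arithmetic. The sketch below is illustrative only; the function name `section_center` and its parameters are assumptions, not part of the patent.

```python
def section_center(vowel_start, vowel_duration, section, n_sections=4):
    """Return the time of the center of the `section`-th (1-based) of
    `n_sections` equal sections of a vowel-equivalent part.

    vowel_start and vowel_duration are in seconds (hypothetical inputs).
    """
    if not 1 <= section <= n_sections:
        raise ValueError("section must be between 1 and n_sections")
    # The center of section s lies at relative position (s - 0.5) / n_sections.
    return vowel_start + vowel_duration * (section - 0.5) / n_sections
```

With four sections, the third section's center lies at relative position 0.625 of the vowel-equivalent part and the second section's center at 0.375.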

Next, each reference point is set at its position relative to the vowel time length of the corresponding mora. The interval from the beginning of the accent phrase to a) the rising reference point is interpolated on the real time axis using a critically damped second-order linear system on the logarithmic frequency axis, so that a) the rising reference point becomes the maximum. Each interval between adjacent reference points from a) to d) is likewise interpolated between its two end points on the real time axis using a critically damped second-order linear system on the logarithmic frequency axis. Further, when the end of the accent phrase is also the end of the utterance, the interval between d) the accent-phrase-end reference point and e) the ending reference point is interpolated by an ending function, which is a function on the real time axis. The vocal fold vibration generation unit 70 generates the vocal fold vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
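One way to picture the interpolation between two reference points is the sketch below. It uses the standard step response of a critically damped second-order system, 1 - (1 + βx)e^(-βx), on the logarithmic frequency axis. The patent does not give the formula or any parameter value, so the rate constant `beta`, the rescaling that makes the curve pass exactly through the second point, and the function name are all assumptions.

```python
import math


def critically_damped_interp(t, t0, f0, t1, f1, beta=40.0):
    """F0 in Hz at time t between reference points (t0, f0) and (t1, f1).

    The transition is shaped by the step response of a critically damped
    second-order linear system, evaluated over real time x = t - t0 (seconds)
    and applied to log frequency.
    Assumption: the response is rescaled so the curve reaches f1 at t1.
    """
    if beta <= 0 or t1 <= t0 or not t0 <= t <= t1:
        raise ValueError("require beta > 0 and t0 <= t <= t1 with t0 < t1")

    def step(x):
        # Step response of a critically damped second-order system.
        return 1.0 - (1.0 + beta * x) * math.exp(-beta * x)

    weight = step(t - t0) / step(t1 - t0)  # 0 at t0, 1 at t1, monotonic
    log_f = math.log(f0) + (math.log(f1) - math.log(f0)) * weight
    return math.exp(log_f)
```

Because the weight depends on elapsed real time rather than on a normalized position, the slope of the transition is not stretched when a phoneme is longer or shorter; only the positions of the end points change.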

The timing of the rise of the accent phrase and of the fall at the accent nucleus, which strongly affects the naturalness of speech, is set on a time axis standardized by the vowel length of the mora concerned, so the timing of fundamental frequency fluctuations within the mora is reproduced in detail. For the slopes of the rise and fall, functions on the real time axis are used, so a smooth fundamental frequency pattern with stable rises and falls is obtained regardless of differences in time length between phonemes, and high naturalness is achieved. In addition, interpolating on the real time axis in the portions that have little effect on perception removes the sense of discontinuity that arises when control is performed mora by mora, and allows the fundamental frequency pattern database to be made smaller.

(Embodiment 3)
The functional block diagram of an apparatus according to another embodiment related to the present invention is the same as FIG. 4, except that database 150a of Embodiment 2 is replaced by a vowel time length standardized fundamental frequency database 150b, and is therefore not illustrated. For each condition of the prosodic determinants, such as the number of moras of an accent phrase, the accent type, and the phoneme sequence, database 150b stores the fundamental frequency pattern of the vowel part of each mora, standardized by the time length of that vowel part, together with the leading fundamental frequency of the accent phrase.

FIG. 5 shows an example of a fundamental frequency pattern according to the present invention.

First, a character string to be converted to speech (the character string "oNse-go-se-" shown in FIG. 5) is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with phoneme information indicating the phoneme sequence. The time length setting unit 40 refers to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, sets the vowel time length of each mora or, for a single-vowel syllable, a moraic nasal, or a long vowel, the time length of the vowel-equivalent part, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, as indicated by A in FIG. 5, the leading fundamental frequency of the accent phrase is obtained from the vowel time length standardized fundamental frequency database 150b. Next, as indicated by a) in FIG. 5, the fundamental frequency pattern of the vowel part of the first mora of the accent phrase is obtained from database 150b. In this example the first mora is a single-vowel syllable, so the fundamental frequency pattern obtained from database 150b is applied to the latter half of the time length of that mora, as shown at a) in FIG. 5. For b), c), d), e), f), g), and h), the fundamental frequency pattern of the vowel part of each mora is likewise obtained from database 150b. For b), which is a moraic nasal, and for d), f), and h), which are long vowels, the pattern obtained from database 150b is applied to the latter half of the time length of the mora, as for a). Next, the fundamental frequencies of a'), b'), d'), e'), f'), and h'), which are the first halves of single-vowel syllables, moraic nasals, and long vowels, or voiced consonants, are generated by linear interpolation on the real time axis from the preceding and following fundamental frequencies. The vocal fold vibration generation unit 70 generates the vocal fold vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.

By applying fundamental frequency patterns in which the timing and slope of the rise of the accent phrase and of the fall at the accent nucleus, which strongly affect the naturalness of speech, are standardized by the vowel length of the mora concerned, fluctuations of the fundamental frequency within the mora are reproduced in detail and high naturalness is achieved. In addition, interpolating on the real time axis in the portions that have little effect on perception removes the sense of discontinuity that arises when control is performed mora by mora, and allows the fundamental frequency pattern database to be made smaller.

(Embodiment 4)
In Embodiment 4, the vowel time length standardized fundamental frequency database 150a is a database that, for each condition of the prosodic determinants such as the number of moras of an accent phrase, the accent type, and the phoneme sequence, stores A) the leading fundamental frequency, B) the rising reference point, C) the falling reference point (accent nucleus), D) the falling reference point (immediately after the accent nucleus), E) the accent-phrase-end reference point, and F) the ending reference point, each as a position relative to the vowel time length of the mora containing that reference point. Otherwise the configuration of the apparatus is the same as in FIG. 4. FIG. 6 shows an example of a fundamental frequency pattern according to the present invention. The operation is described below.

First, a character string to be converted to speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40. First, reference points A) to F) are obtained from the vowel time length standardized fundamental frequency database 150a based on the number of moras of the accent phrase, the accent type, the phoneme sequence, and so on. Next, each reference point is set at its position relative to the vowel length of the corresponding mora. The interval from A) the leading fundamental frequency to B) the rising reference point is generated using a function on the real time axis. The fundamental frequency pattern between the reference points from B) onward is then generated by straight-line interpolation on the real time axis.
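The two steps described above, placing each reference point by its relative position and then connecting the points with straight lines on the real time axis, can be sketched as follows. The data layout (tuples of mora index, relative position, and F0 in Hz, plus a mapping from mora index to vowel start and duration) is a hypothetical one chosen for illustration.

```python
def place_reference_points(ref_points, vowel_spans):
    """Convert (mora_index, relative_pos, f0_hz) entries to (time_s, f0_hz).

    vowel_spans maps mora_index -> (vowel_start_s, vowel_duration_s).
    relative_pos is a fraction of the vowel time length (0.0 to 1.0).
    """
    placed = []
    for mora, rel, f0 in ref_points:
        start, dur = vowel_spans[mora]
        placed.append((start + rel * dur, f0))
    return sorted(placed)


def linear_f0(placed, t):
    """Straight-line interpolation on the real time axis between placed points.

    Assumes the placed points have distinct times.
    """
    for (t0, f0), (t1, f1) in zip(placed, placed[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the placed reference points")
```

Storing only relative positions keeps the database independent of the actual durations; the absolute times are computed only after the time length setting unit has fixed the durations.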

The vocal fold vibration generation unit 70 generates the vocal fold vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.

(Embodiment 5)
FIG. 7 is a functional block diagram of an apparatus according to another embodiment related to the present invention. It is the same as FIG. 4 except for two points. First, the vowel time length standardized fundamental frequency database 150a stores, for each condition of the number of moras and the accent type of an accent phrase, a) the rising reference point, b) the falling reference point (accent nucleus), c) the falling reference point (immediately after the accent nucleus), d) the accent-phrase-end reference point, and e) the ending reference point, each as a position relative to the time length of the vowel or vowel-equivalent part of the mora containing that reference point. Second, a microprosody database 250 is added. The microprosody database 250 stores the fine fluctuations of the fundamental frequency caused by phonemes or phoneme sequences as differences from the reference points stored in database 150a and from the values interpolated between those reference points, standardized by the time length of the phoneme.

FIG. 8 is a schematic diagram of the microprosody components stored in the microprosody database 250, and FIGS. 9(A) to 9(C) show an example of a fundamental frequency pattern according to the present invention.

First, a character string to be converted to speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme of each mora by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.
First, based on the number of moras and the accent type of the accent phrase, the following reference points are obtained from the vowel time length standardized fundamental frequency database: a) the rising reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora in which the fundamental frequency takes its maximum value; b) the falling reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora carrying the accent nucleus; c) the falling reference point, at the center of the third of four equal sections of the vowel-equivalent part of the mora following the accent nucleus; d) the accent-phrase-end reference point, at the center of the second of four equal sections of the vowel-equivalent part of the final mora of the accent phrase; and e) the ending reference point, at the center of the third of four equal sections of the vowel-equivalent part of the final mora.

Next, each reference point is set at its position relative to the phoneme time length of the corresponding mora. The interval from the beginning of the accent phrase to a) the rising reference point is interpolated on the real time axis by a critically damped second-order linear system on the logarithmic frequency axis, so that a) the rising reference point becomes the maximum. Each interval between adjacent reference points from a) to e) is interpolated between its two end points in the same way, producing a fundamental frequency pattern such as that in FIG. 9(A). Next, the fine fluctuation of the fundamental frequency corresponding to each phoneme is obtained from the microprosody database 250, stretched or compressed to match the time length of that phoneme, and applied as shown in FIG. 9(B). The fine fluctuations of FIG. 9(B) are added to the fundamental frequency pattern of FIG. 9(A) to generate a fundamental frequency pattern such as that in FIG. 9(C). The vocal fold vibration generation unit 70 generates the vocal fold vibration of the synthesized speech according to the fundamental frequency pattern output from the fundamental frequency pattern generation unit 60.
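The stretch-and-add step for the microprosody component might look like the sketch below. It assumes that both the base pattern and the stored fluctuation are sampled contours, that resampling is linear, and that the fluctuation is added in Hz; the patent specifies none of these details, and the function names are hypothetical.

```python
def resample_linear(values, n_out):
    """Stretch or compress a sampled contour to n_out samples (linear)."""
    if n_out < 1 or not values:
        raise ValueError("need at least one input and one output sample")
    if len(values) == 1 or n_out == 1:
        return [values[0]] * n_out
    out = []
    for j in range(n_out):
        # Map output index j onto the input index range [0, len(values) - 1].
        pos = j * (len(values) - 1) / (n_out - 1)
        i = min(int(pos), len(values) - 2)
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


def add_microprosody(base_f0, micro_template):
    """Add a duration-normalized fluctuation template to one phoneme's
    base contour (cf. FIG. 9(A) plus FIG. 9(B) giving FIG. 9(C))."""
    micro = resample_linear(micro_template, len(base_f0))
    return [b + m for b, m in zip(base_f0, micro)]
```

For example, `add_microprosody([100.0, 100.0, 100.0], [0.0, 2.0])` stretches the two-sample template to three samples and yields `[100.0, 101.0, 102.0]`.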

Setting the timing of the rise of the accent phrase and of the fall at the accent nucleus on an axis standardized by the phoneme time length of the mora concerned reproduces in detail the timing of fundamental frequency fluctuations within the mora. Adding the fine fluctuations of the fundamental frequency, which affect the naturalness and clarity of speech, then achieves high naturalness and clarity.

(Embodiment 6)
FIG. 10 is a functional block diagram of an apparatus according to an embodiment of the present invention. It is the same as FIG. 1 except that the mora time length standardized fundamental frequency database 50 is replaced by a phoneme time length standardized fundamental frequency database 351 and that a fundamental frequency pattern deformation database 350 is added. For each condition of the number of moras and the accent type of an accent phrase, database 351 stores a) the rising reference point in the i-th mora, which is the peak of the fundamental frequency pattern, b) the falling reference point (accent nucleus), c) the falling reference point (immediately after the accent nucleus), and d) the accent-phrase-end reference points in the last k moras of the accent phrase, each as a position relative to the phoneme time length of the mora containing that reference point. Database 350 stores, for each position of an accent phrase within a phrase, the amounts by which the fundamental frequencies at the peak and at the end of the accent phrase are deformed.

FIGS. 11, 12, 13, and 14 are schematic diagrams of the fundamental frequency patterns generated when the phoneme time length standardized fundamental frequency database 351 contains no fundamental frequency pattern data corresponding to the number of moras and the accent type of the accent phrase for which a fundamental frequency is to be generated. FIG. 15 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The operation is described below.

First, a character string to be converted to speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosody information indicating the number of moras and the accent type of each accent phrase together with phoneme information indicating the phoneme sequence. The time length setting unit 40 sets the time length of each phoneme by referring to the phoneme time length database 30 based on the phoneme information input from the character string analysis unit 20, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosody information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40.

First, based on the number of moras of the accent phrase, the accent type, the phoneme sequence, and so on, a) the rising reference point, b) the falling reference point, c) the falling reference point, and d) the accent-phrase-end reference point or d') the final moras are obtained from the phoneme time length standardized fundamental frequency database 351.

When the phoneme time length standardized fundamental frequency database 351 contains no fundamental frequency pattern data corresponding to the number of moras and the accent type of the accent phrase for which a fundamental frequency is to be generated, let that accent phrase have n moras and accent type m. When m is i+1 or less, as shown in FIG. 11(A), reference points a) to d) of the l-mora type-m fundamental frequency pattern whose accent type is m and whose number of moras l is closest to n are obtained from database 351. Then, as shown in FIG. 11(B), the d) obtained from database 351 is set as the reference points for the (n−k+1)-th to n-th moras of the accent phrase for which the fundamental frequency is to be generated.

When m is greater than i+1 and not greater than n−k, as shown in FIG. 12(A), reference points a) to d) of the l-mora type-j fundamental frequency pattern whose accent nucleus position j is greater than i+1 and not greater than l−k, and whose number of moras l is closest to n, are obtained from the phoneme time length standardized fundamental frequency database 351. Then, as shown in FIG. 12(B), the b) and c) obtained from database 351 are set as the reference points of the m-th and (m+1)-th moras of the accent phrase for which the fundamental frequency is to be generated, and the d) obtained from database 351 is set as the reference points for the (n−k+1)-th to n-th moras of that accent phrase.

When m is greater than n−k, as shown in FIG. 13(A), a) to d') of the l-mora type-j fundamental frequency pattern whose accent nucleus position j is greater than l−k and whose number of moras l is closest to n are obtained from the phoneme time length standardized fundamental frequency database 351. Then, as shown in FIG. 13(B), the d') obtained from database 351, which includes b) and c), is set as the reference points for the (n−k+1)-th to n-th moras of the accent phrase for which the fundamental frequency is to be generated. When the accent phrase for which the fundamental frequency is to be generated is an n-mora flat type, as shown in FIG. 14(A), a) and d) of the l-mora flat-type fundamental frequency pattern whose accent type is flat and whose number of moras l is closest to n are obtained from database 351. Then, as shown in FIG. 13(B), the d) obtained from database 351 is set as the reference points for the (n−k+1)-th to n-th moras of that accent phrase.
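The case analysis above can be summarized in a short sketch. It assumes that stored patterns are identified by pairs (l, j) of mora count and accent type, that accent type 0 denotes the flat type, and that ties in distance from n go to the first stored entry; these conventions and all names are illustrative assumptions rather than details given in the patent.

```python
def fallback_case(n, m, i, k):
    """Classify an n-mora, type-m accent phrase with no stored pattern.

    i: mora index of the pattern peak; k: number of phrase-final moras
    covered by the accent-phrase-end reference points.
    Assumption: accent type 0 denotes the flat type.
    """
    if m == 0:
        return "flat"            # cf. FIG. 14
    if m <= i + 1:
        return "early_nucleus"   # cf. FIG. 11
    if m <= n - k:
        return "middle_nucleus"  # cf. FIG. 12
    return "late_nucleus"        # cf. FIG. 13


def select_pattern(stored, n, m, i, k):
    """Pick the stored (l, j) pattern that fits the case and has l closest to n."""
    case = fallback_case(n, m, i, k)

    def fits(l, j):
        if case == "flat":
            return j == 0
        if case == "early_nucleus":
            return j == m
        if case == "middle_nucleus":
            return i + 1 < j <= l - k
        return j > 0 and j > l - k

    candidates = [(l, j) for l, j in stored if fits(l, j)]
    if not candidates:
        raise LookupError("no stored pattern fits this accent phrase")
    return min(candidates, key=lambda lj: abs(lj[0] - n))
```

For example, with i = 1 and k = 2, a 10-mora type-5 phrase falls into the middle case, and among stored patterns (6, 4), (8, 3), and (7, 2) the sketch selects (8, 3).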

Next, the fundamental frequency pattern of each accent phrase, whether obtained directly from the phoneme time length standardized fundamental frequency database 351 or generated from reference points obtained from database 351, is deformed. According to the deformation amounts stored in the fundamental frequency deformation database 350 for each position of an accent phrase within the phrase, the maximum fundamental frequency of each accent phrase and the fundamental frequencies of reference points a) to d) or d') are changed.

First, using the deformation amount for the first accent phrase stored in the fundamental frequency deformation database 350, the fundamental frequencies of b), c), and d) are changed, as shown at A) in FIG. 15, so that the difference between the fundamental frequencies of a) and d) becomes 90% of the difference obtained from the phoneme time length standardized fundamental frequency database 351. For the second accent phrase, as shown at B) in FIG. 15, the fundamental frequency of a) is changed to 75% of the value obtained from database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 70% of the difference obtained from database 351. Similarly, for the third accent phrase, as shown at C) in FIG. 15, the fundamental frequency of a) is changed to 70% of the value obtained from database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 68% of the difference obtained from database 351.

When the fundamental frequency deformation database 350 stores no deformation amount for the n-th accent phrase, the deformation amount for the accent phrase position that is smaller than n and closest to n is applied. This example shows the case where the deformation amount for the fourth accent phrase is not stored in the fundamental frequency deformation database 350.

The deformation amount of the third accent phrase, whose position is smaller than 4 and closest to 4, is applied, and the same deformation as for the third accent phrase is made, as shown at D) in FIG. 15. For the final accent phrase, which ends the phrase, the deformation amount for the final accent phrase is obtained from the fundamental frequency deformation database 350. As shown at E) in FIG. 15, the fundamental frequency of a) is changed to 48% of the value obtained from the phoneme time length standardized fundamental frequency database 351, and the fundamental frequencies of b), c), and d) are changed so that the difference between the fundamental frequencies of a) and d) becomes 60% of the difference obtained from database 351.
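The deformation amounts quoted above, together with the fallback to the nearest earlier position, can be sketched as a small lookup table. The table layout and names are hypothetical. Two further assumptions are made: the peak ratio of the first accent phrase is taken as 1.00 because the text states only its 90% range ratio, and the offsets of b) and c) from a) are scaled by the same ratio as d), in Hz.

```python
# position -> (peak ratio for a), ratio for the a)-to-d) difference)
DEFORMATION = {1: (1.00, 0.90), 2: (0.75, 0.70), 3: (0.70, 0.68)}
FINAL_DEFORMATION = (0.48, 0.60)  # final accent phrase of the phrase


def lookup_deformation(position, is_final=False):
    """Return the ratios for an accent phrase at `position` (1-based).

    If the position is not stored, fall back to the largest stored
    position that is smaller than it.
    """
    if is_final:
        return FINAL_DEFORMATION
    usable = [p for p in DEFORMATION if p <= position]
    if not usable:
        raise KeyError("no deformation amount for this position")
    return DEFORMATION[max(usable)]


def deform(points, peak_ratio, range_ratio):
    """Deform reference-point frequencies given as {'a': Hz, 'b': Hz, ...}.

    a) is scaled by peak_ratio; every other point keeps its offset
    from a), scaled by range_ratio (an assumption for b) and c)).
    """
    new_a = points["a"] * peak_ratio
    return {name: new_a - (points["a"] - f0) * range_ratio
            for name, f0 in points.items()}
```

With stored values a) = 200 Hz and d) = 100 Hz, the second accent phrase would be deformed to a) = 150 Hz and d) = 80 Hz, so that the difference is 70 Hz, which is 70% of the stored 100 Hz difference.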

Next, for each accent phrase, the fundamental frequency from the beginning of the accent phrase to a) is generated using a function on the real time axis, as in Embodiment 2 or Embodiment 4. The intervals between the reference points are then interpolated on the real time axis to generate the fundamental frequency pattern up to the end of the accent phrase.

The timing of the rise of the accent phrase and of the fall at the accent nucleus, which greatly affects the naturalness of speech, is set on a time axis standardized by the phoneme time length of the mora concerned. This yields a smooth fundamental frequency pattern with stable rises and falls, unaffected by differences in duration among phonemes, and achieves high naturalness. Furthermore, extending the fundamental frequency patterns allows the database to be reduced in size. In addition, deforming the fundamental frequency pattern according to the position of the accent phrase within the phrase gives the phrase coherence and produces natural sentence speech.

(Embodiment 7)
FIG. 17 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The configuration of the apparatus is the same as in FIG. 1. Its operation is described below.

As shown in FIG. 17, first, the fundamental frequency pattern 1711 corresponding to the number of moras and the accent type of the first accent phrase 1701 is obtained from the mora time length standardized fundamental frequency database 50 and applied.

Next, Expression 1 is derived. It gives the maximum fundamental frequency for the n-th accent phrase as a line that passes through the maximum fundamental frequency a of the first accent phrase 1701 and falls by 10% of a each time the value of i, which indicates the position of the n-th accent phrase, increases by one.
(Equation 1)
(−0.1i + 1)a ... Expression 1
Here, a is the maximum fundamental frequency of the first accent phrase 1701. The accent phrase count i indicates how many accent phrases the n-th accent phrase lies after the first accent phrase, and equals n − 1.

Furthermore, Expression 2 is derived. It gives the accent-phrase-end frequency for the n-th accent phrase as a line that passes through the accent-phrase-end frequency b of the first accent phrase 1701 and falls by 5% of b each time the value of i, which indicates the position of the n-th accent phrase, increases by one.
(Equation 2)
(−0.05i + 1)b ... Expression 2
Here, b is the frequency at the end of the first accent phrase 1701.
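Expressions 1 and 2 can be evaluated directly. The sketch below assumes frequencies in Hz and uses the hypothetical function name `accent_phrase_targets`; it covers only accent phrases that are not sentence-final.

```python
def accent_phrase_targets(a, b, n):
    """Target maximum and phrase-end frequency for the n-th accent phrase.

    a: maximum fundamental frequency of the first accent phrase
    b: accent-phrase-end frequency of the first accent phrase
    n: 1-based position of the accent phrase (not sentence-final)
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    i = n - 1  # accent phrase count used in Expressions 1 and 2
    target_max = (-0.1 * i + 1) * a   # Expression 1
    target_end = (-0.05 * i + 1) * b  # Expression 2
    return target_max, target_end
```

With a = 200 Hz and b = 100 Hz, the second accent phrase (i = 1) gets a target maximum of about 180 Hz and a target phrase-end frequency of about 95 Hz.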

Next, the fundamental frequency pattern 1712 (shown by a dotted line in the figure) corresponding to the number of moras and the accent type of the second accent phrase 1702 is obtained from the mora time length standardized fundamental frequency database 50. Since the accent phrase count i of the second accent phrase is 1, substituting it into Expression 1 gives the maximum value a₂ of the fundamental frequency pattern 1712 after deformation. In the same way, Expression 2 gives the accent-phrase-end frequency b₂ of the fundamental frequency pattern 1712 after deformation.

The fundamental frequency pattern 1712 obtained from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the post-deformation maximum value a₂ and the post-deformation accent-phrase-end frequency b₂ obtained in this way, and the deformed fundamental frequency pattern 1713 is used as the fundamental frequency pattern of the second accent phrase 1702.
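The text does not say how the pattern is deformed to hit both targets. One simple possibility, shown here purely as an assumption, is a linear map along the frequency axis that sends the original maximum to a₂ and the original phrase-end value to b₂. `deform_pattern` is a hypothetical name, and the pattern is represented as a plain list of frequency samples.

```python
def deform_pattern(pattern, target_max, target_end):
    """Linearly rescale a pattern so its maximum and last value hit the targets.

    Assumed rule, not taken from the source: f' = target_end + gain * (f - old_end),
    with gain chosen so that the old maximum maps to target_max.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    old_max = max(pattern)
    old_end = pattern[-1]
    if old_max == old_end:
        # Degenerate case: both targets cannot be met, so only shift to the end target.
        return [f - old_end + target_end for f in pattern]
    gain = (target_max - target_end) / (old_max - old_end)
    return [target_end + gain * (f - old_end) for f in pattern]
```

Because the map is linear with a positive gain whenever the targets preserve the ordering of maximum and end value, the position of the peak within the accent phrase is unchanged.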

For the n-th accent phrase as well, if it is not the final accent phrase (end of sentence), the fundamental frequency pattern corresponding to its number of moras and accent type is obtained from the mora time length standardized fundamental frequency database 50. The pattern obtained from the database 50 is then deformed so that its maximum value matches the value given by Expression 1 and its accent-phrase-end frequency matches the value given by Expression 2, and the result is used as the fundamental frequency pattern of the n-th accent phrase.

Furthermore, when the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence, the fundamental frequency pattern corresponding to its number of moras and accent type is obtained from the mora time length standardized fundamental frequency database 50. The pattern is deformed so that its maximum value matches the maximum value of the immediately preceding accent phrase reduced by 15%, and its accent-phrase-end frequency matches the accent-phrase-end frequency of the immediately preceding accent phrase reduced by 10%, and the deformed pattern is applied.
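Putting the rules of FIG. 17 together, the target values for every accent phrase in a sentence can be listed as below. This is a sketch under stated assumptions: `sentence_targets` is a hypothetical name, and a sentence consisting of a single accent phrase is assumed to use a and b unchanged, which the text does not address.

```python
def sentence_targets(a, b, num_phrases):
    """Return [(target_max, target_end), ...] for accent phrases 1..num_phrases.

    Non-final phrases follow Expressions 1 and 2. The sentence-final phrase takes
    the preceding phrase's targets reduced by 15% (maximum) and 10% (phrase end).
    """
    if num_phrases < 1:
        raise ValueError("num_phrases must be 1 or greater")
    targets = []
    for n in range(1, num_phrases + 1):
        is_final = n == num_phrases and num_phrases > 1
        if is_final:
            prev_max, prev_end = targets[-1]
            targets.append((prev_max * 0.85, prev_end * 0.90))
        else:
            i = n - 1
            targets.append(((-0.1 * i + 1) * a, (-0.05 * i + 1) * b))
    return targets
```

For a three-phrase sentence with a = 200 Hz and b = 100 Hz this gives roughly (200, 100), (180, 95) and (153, 85.5).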

If the mora time length standardized fundamental frequency database 50 contains no data for the corresponding fundamental frequency pattern, the fundamental frequency pattern of the accent phrase is generated as in Embodiment 6 and then deformed.

Setting the pattern on a time axis standardized by the mora time length of the mora concerned yields a smooth fundamental frequency pattern with stable rises and falls, unaffected by differences in duration among phonemes, and achieves high naturalness. In addition, deforming the fundamental frequency pattern according to the position of the accent phrase within the phrase gives the phrase coherence and produces natural sentence speech.

In the above embodiment, only when the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence is a frequency at a given position in the immediately preceding accent phrase taken as a reference and lowered by a given ratio. As a modification of the above embodiment, the frequency values of accent phrases other than the sentence-final one may also be compressed by the same kind of rule. In that case, as shown for example in FIG. 18, for each of the second through n-th accent phrases, excluding the sentence-final one, a value 10% below the maximum value of the immediately preceding accent phrase (for example, a₂ in the figure) and a value 5% below the accent-phrase-end frequency of the immediately preceding accent phrase (for example, b₂ in the figure) are obtained.

Then, for the second accent phrase for example, the fundamental frequency pattern 1712 obtained from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the post-deformation maximum value a₂ and the post-deformation accent-phrase-end frequency b₂ obtained in this way, and the deformed fundamental frequency pattern 1713 is used as the fundamental frequency pattern of the second accent phrase 1702. The same applies to the n-th accent phrase. When the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence, the same method as in FIG. 17 is used.
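The modification of FIG. 18 differs from Expressions 1 and 2 in that each reduction is taken relative to the immediately preceding accent phrase, so the reductions compound. A minimal sketch, assuming the first accent phrase keeps a and b unchanged and using the hypothetical name `chained_sentence_targets`:

```python
def chained_sentence_targets(a, b, num_phrases):
    """Targets where each phrase is derived from the immediately preceding one.

    Non-final phrases: previous maximum reduced by 10%, previous end by 5%.
    Sentence-final phrase: previous maximum reduced by 15%, previous end by 10%.
    """
    if num_phrases < 1:
        raise ValueError("num_phrases must be 1 or greater")
    targets = [(a, b)]
    for n in range(2, num_phrases + 1):
        prev_max, prev_end = targets[-1]
        if n == num_phrases:
            targets.append((prev_max * 0.85, prev_end * 0.90))
        else:
            targets.append((prev_max * 0.90, prev_end * 0.95))
    return targets
```

For the third accent phrase of a four-phrase sentence, this gives 0.9 × 0.9 = 81% of a, whereas Expression 1 with i = 2 gives 80% of a.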

(Embodiment 8)
FIG. 19 is a schematic diagram of the fundamental frequency pattern of a sentence generated by connecting the fundamental frequency patterns of a plurality of accent phrases. The configuration of the apparatus is the same as in FIG. 1. Its operation is described below.

As shown in FIG. 19, first, the fundamental frequency pattern 1811 corresponding to the number of moras and the accent type of the first accent phrase 1801 is obtained from the mora time length standardized fundamental frequency database 50 and applied.

Next, Expression 3 is derived. It gives the maximum fundamental frequency of an accent phrase as a function of the cumulative mora count j: a line that passes through the maximum fundamental frequency a of the first accent phrase 1801 and falls by 2% of a for each additional mora counted from the mora position containing that maximum.
(Equation 3)
(−0.02j + 1)a ... Expression 3
Here, a is the maximum fundamental frequency of the first accent phrase 1801, and the cumulative mora count j is the number of moras counted from the mora position containing the maximum fundamental frequency a of the first accent phrase (taken as the origin of the horizontal axis in the figure).

Furthermore, Expression 4 is derived. It gives the accent-phrase-end frequency as a function of the cumulative mora count j: a line that passes through the accent-phrase-end frequency b of the first accent phrase 1801 and falls by 1% of b for each additional mora counted from the mora position containing that frequency b.
(Equation 4)
(−0.01j + 1)b ... Expression 4
Here, b is the frequency at the end of the first accent phrase 1801.
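Expressions 3 and 4 replace the accent phrase count with a cumulative mora count. The sketch below takes the two mora counts as inputs and leaves the counting itself to the caller; `mora_based_targets`, `j_max` and `j_end` are hypothetical names.

```python
def mora_based_targets(a, b, j_max, j_end):
    """Target maximum and phrase-end frequency from cumulative mora counts.

    a, b:  maximum and accent-phrase-end frequency of the first accent phrase
    j_max: cumulative mora count of the mora where the phrase takes its maximum
    j_end: cumulative mora count of the mora at the end of the phrase
    """
    if j_max < 0 or j_end < 0:
        raise ValueError("mora counts must not be negative")
    target_max = (-0.02 * j_max + 1) * a  # Expression 3
    target_end = (-0.01 * j_end + 1) * b  # Expression 4
    return target_max, target_end
```

With a = 200 Hz, b = 100 Hz, a maximum at the 5th mora and a phrase end at the 8th mora, the targets are about 180 Hz and 92 Hz. Unlike Expressions 1 and 2, the reduction here grows with the length of the preceding accent phrases rather than with their number.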

Next, the fundamental frequency pattern 1812 (shown by a dotted line in the figure) corresponding to the number of moras and the accent type of the second accent phrase 1802 is obtained from the mora time length standardized fundamental frequency database 50. The mora at which this pattern takes its maximum value 1812a is found to be the j_2a-th mora from the origin mora, and substituting j_2a into Expression 3 as the cumulative mora count gives the maximum value a₂ of the fundamental frequency pattern 1812 after deformation. Likewise, the accent phrase end 1812b of the second accent phrase 1802 is found to be the j_2b-th mora from the origin mora, and substituting j_2b into Expression 4 as the cumulative mora count gives the accent-phrase-end frequency b₂ of the fundamental frequency pattern 1812 after deformation.

The fundamental frequency pattern 1812 obtained from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the post-deformation maximum value a₂ and the post-deformation accent-phrase-end frequency b₂ obtained in this way, and the result is used as the fundamental frequency pattern of the second accent phrase 1802.

For the n-th accent phrase as well, if it is not the final accent phrase (end of sentence), the fundamental frequency pattern corresponding to its number of moras and accent type is obtained from the mora time length standardized fundamental frequency database 50. The position of the mora at which the pattern takes its maximum value, counted from the origin mora, is determined and substituted into Expression 3 as the cumulative mora count to obtain the maximum value of the pattern after deformation. Likewise, the position of the accent phrase end, counted from the origin mora, is determined and substituted into Expression 4 as the cumulative mora count to obtain the accent-phrase-end frequency of the pattern after deformation.

The fundamental frequency pattern obtained from the mora time length standardized fundamental frequency database 50 is deformed so that it matches the post-deformation maximum value and the post-deformation accent-phrase-end frequency obtained in this way, and the result is used as the fundamental frequency pattern of the n-th accent phrase.

When the accent phrase for which the fundamental frequency is to be generated is at the end of the sentence, the fundamental frequency pattern corresponding to its number of moras and accent type is obtained from the mora time length standardized fundamental frequency database 50. The obtained pattern is deformed so that its maximum value matches the maximum value of the immediately preceding accent phrase reduced by 15% and its accent-phrase-end frequency matches the accent-phrase-end frequency of the immediately preceding accent phrase reduced by 10%, and the deformed pattern is applied. If the mora time length standardized fundamental frequency database 50 contains no data for the corresponding fundamental frequency pattern, the fundamental frequency pattern of the accent phrase is generated as in Embodiment 6 and then deformed.

Setting the pattern on a time axis standardized by the mora time length of the mora concerned yields a smooth fundamental frequency pattern with stable rises and falls, unaffected by differences in duration among phonemes, and achieves high naturalness. In addition, deforming the fundamental frequency pattern according to the cumulative mora position within the phrase gives the phrase coherence and produces natural sentence speech.

(Embodiment 9)
FIG. 16 is a functional block diagram of an apparatus according to an embodiment of another invention related to the present invention. It is the same as FIG. 1 except that the mora time length standardized fundamental frequency database 50 is replaced by an accent phrase position fundamental frequency database 450. For the first through third accent phrases, the database 450 stores the fundamental frequency pattern of the vowel part of each mora, standardized by the time length of the vowel part of that mora and classified by factors that determine prosody, such as whether the accent phrase is at the end of the sentence, the number of moras in the accent phrase, the accent type, and the phoneme sequence.

First, a character string to be converted into speech is input from the character string input unit 10. The character string analysis unit 20 analyzes the input character string and outputs phoneme information indicating the phoneme sequence to the time length setting unit 40. It also divides the character string into accent phrases and outputs, to the fundamental frequency pattern generation unit 60, prosodic information indicating the number of moras and the accent type of each accent phrase and the position of the accent phrase within the phrase, together with the phoneme information indicating the phoneme sequence.

Based on the phoneme information input from the character string analysis unit 20, the time length setting unit 40 refers to the phoneme time length database 30 and sets, for each mora, the vowel duration, or the duration of the vowel-equivalent part in the case of a single-vowel syllable, a moraic nasal, or a long vowel, and outputs the time length information to the fundamental frequency pattern generation unit 60. The fundamental frequency pattern generation unit 60 generates a fundamental frequency pattern for each accent phrase based on the prosodic information and phoneme information input from the character string analysis unit 20 and the time length information input from the time length setting unit 40. This example describes the generation of the fundamental frequency of a sentence composed of five accent phrases.

First, for the first accent phrase, the fundamental frequency pattern corresponding to the number of moras and the accent type of the accent phrase to be generated is obtained from the accent phrase position fundamental frequency database 450, from among the patterns for a first accent phrase that is not at the end of the sentence. Fundamental frequency patterns for the second and third accent phrases are obtained from the accent phrase position fundamental frequency database 450 in the same way.

For the fourth accent phrase, the accent phrase position fundamental frequency database 450 holds no fundamental frequency pattern for a fourth accent phrase. The pattern corresponding to the number of moras and the accent type is therefore taken from the non-sentence-final patterns of the third accent phrase, whose position is closest to that of the fourth accent phrase.

For the fifth accent phrase, which is the final accent phrase, the accent phrase position fundamental frequency database 450 likewise holds no corresponding fundamental frequency pattern. The pattern corresponding to the number of moras and the accent type is therefore taken from the sentence-final patterns of the third accent phrase, whose position is closest. Portions for which there is no fundamental frequency pattern are interpolated on the real time axis, as in Embodiment 3 or Embodiment 4, to generate the fundamental frequency pattern.
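The lookup with fallback to the nearest stored accent phrase position can be sketched as follows. The dictionary layout, the key order and the name `lookup_position_pattern` are assumptions made for illustration; the stored values would be vowel-part fundamental frequency patterns, which are left opaque here.

```python
def lookup_position_pattern(db, position, is_final, mora_count, accent_type):
    """Fetch a pattern from a stand-in for database 450.

    db maps (position, is_final, mora_count, accent_type) to a stored pattern.
    If the requested position is not stored, the nearest stored position is used.
    Returns None when no pattern matches the remaining conditions.
    """
    stored_positions = sorted({key[0] for key in db})
    if not stored_positions:
        return None
    # On a tie, min() keeps the first candidate, i.e. the smaller position.
    nearest = min(stored_positions, key=lambda p: abs(p - position))
    return db.get((nearest, is_final, mora_count, accent_type))
```

With patterns stored for positions 1 to 3 only, a request for a non-final fourth accent phrase returns the non-final pattern of the third accent phrase, and a request for a sentence-final fifth accent phrase returns the sentence-final pattern of the third accent phrase, matching the five-phrase example in the text.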

Using fundamental frequency patterns standardized by the vowel length of the mora concerned reproduces the variation of the fundamental frequency within a mora in detail. Selecting patterns according to the position of the accent phrase and whether it is at the end of the sentence reproduces the phrase-level variation of the fundamental frequency accurately. This gives the phrase coherence and produces natural sentence speech.

(Embodiment 10)
FIGS. 20(A) and 20(B) are schematic diagrams of the junction between fundamental frequency patterns when the fundamental frequency patterns of accent phrases are connected to generate a sentence. The configuration of the fundamental frequency pattern generation apparatus according to this embodiment of another invention related to the present invention is the same as in FIG. 1. Its operation is described below.

First, the fundamental frequency pattern corresponding to the number of moras and the accent type of each accent phrase for which a fundamental frequency pattern is to be generated is obtained from the mora time length standardized fundamental frequency database 50 and applied. The fundamental frequency pattern obtained from the mora time length standardized fundamental frequency database 50 is then deformed for each accent phrase by the method of Embodiment 6, Embodiment 7, or Embodiment 8.

Among the deformed fundamental frequency patterns of the accent phrases, for each n-th accent phrase that is not at the end of the sentence, the difference e) in FIG. 20 is obtained, namely the difference between the fundamental frequency of the vowel part of the final mora of that accent phrase and the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase.

When there is no pause between the n-th accent phrase and the (n+1)-th accent phrase, the following applies. If the difference e) between the fundamental frequency of the vowel part of the final mora of the n-th accent phrase and that of the vowel part of the first mora of the (n+1)-th accent phrase is 40 Hz or more, and the accent nucleus of the n-th accent phrase is not within the last three moras of the accent phrase, the fundamental frequency pattern is compressed in the frequency axis direction over the span that starts at a mora which is the first mora of the accent phrase end reference point, or a mora preceding it, and whose fundamental frequency exceeds the value obtained by subtracting 40 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and that ends at the final mora of the n-th accent phrase. The n-th and (n+1)-th accent phrases are thereby connected smoothly, as shown in f) of FIG. 20. If the difference e) is 40 Hz or more and the accent nucleus of the n-th accent phrase is within the last three moras of the accent phrase, the span starts instead at the mora of the accent nucleus, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 40 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase. The pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction, and the n-th and (n+1)-th accent phrases are connected smoothly.

When there is a pause of less than 50 msec between the n-th accent phrase and the (n+1)-th accent phrase, the following applies. If the difference e) is 50 Hz or more and the accent nucleus of the n-th accent phrase is not within the last three moras of the accent phrase, the fundamental frequency pattern is compressed in the frequency axis direction over the span that starts at a mora which is the first mora of the accent phrase end reference point, or a mora preceding it, and whose fundamental frequency exceeds the value obtained by subtracting 50 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and that ends at the final mora of the n-th accent phrase. If the difference e) is 50 Hz or more and the accent nucleus of the n-th accent phrase is within the last three moras of the accent phrase, the span starts instead at the mora of the accent nucleus, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 50 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.

When there is a pause of 50 msec or more and less than 100 msec between the n-th accent phrase and the (n+1)-th accent phrase, the following applies. If the difference e) is 70 Hz or more and the accent nucleus of the n-th accent phrase is not within the last three moras of the accent phrase, the fundamental frequency pattern is compressed in the frequency axis direction over the span that starts at a mora which is the first mora of the accent phrase end reference point, or a mora preceding it, and whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and that ends at the final mora of the n-th accent phrase. If the difference e) is 70 Hz or more and the accent nucleus of the n-th accent phrase is within the last three moras of the accent phrase, the span starts instead at the mora of the accent nucleus, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.

When there is a pause of 100 msec or more and less than 150 msec between the n-th accent phrase and the (n+1)-th accent phrase, the following applies. If the difference e) is 80 Hz or more and the accent nucleus of the n-th accent phrase is not within the last three moras of the accent phrase, the fundamental frequency pattern is compressed in the frequency axis direction over the span that starts at a mora which is the first mora of the accent phrase end reference point, or a mora preceding it, and whose fundamental frequency exceeds the value obtained by subtracting 80 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and that ends at the final mora of the n-th accent phrase. If the difference e) is 80 Hz or more and the accent nucleus of the n-th accent phrase is within the last three moras of the accent phrase, the span starts instead at the mora of the accent nucleus, or a mora preceding it, whose fundamental frequency exceeds the value obtained by subtracting 70 from the fundamental frequency of the vowel part of the first mora of the (n+1)-th accent phrase, and the pattern from that mora to the final mora of the n-th accent phrase is compressed in the frequency axis direction.

By deforming the end of the fundamental frequency pattern generated for each accent phrase according to the length of the pause before the following accent phrase, the junctions between accent phrases are smoothed and natural-sounding sentence speech can be realized.

In the above description, Embodiments 1, 3, and 4 use a straight line as the interpolation function, and Embodiment 2 uses a critically damped second-order linear system on the logarithmic frequency axis. However, a critically damped second-order linear system may be used in Embodiments 1, 3, and 4, and a straight line may be used in Embodiment 2. Other functions on the real-time axis can be used in the same way.
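The two interpolation functions named above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the time constant `beta`, and the use of the standard step response 1 - (1 + beta*t)exp(-beta*t) for the critically damped second-order system are not taken from this document.

```python
import math

def interp_linear(t, t0, f0, t1, f1):
    """Straight-line interpolation on the real-time axis (t in s, f in Hz)."""
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)

def interp_critically_damped(t, t0, f0, f1, beta=20.0):
    """Move from f0 toward f1 along the step response of a critically
    damped second-order linear system, applied on the log-frequency axis.
    beta (1/s) is an assumed time constant, not a value from the text."""
    x = max(t - t0, 0.0)
    g = 1.0 - (1.0 + beta * x) * math.exp(-beta * x)  # rises from 0 toward 1
    return math.exp(math.log(f0) + (math.log(f1) - math.log(f0)) * g)
```

Note that the second function approaches `f1` asymptotically rather than reaching it at a fixed end time, which is one practical difference from the straight line.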

In Embodiment 2, the fundamental frequency from the beginning of the accent phrase to the rise reference point is interpolated using a critically damped second-order linear system on the logarithmic frequency axis, and in Embodiment 4 it is interpolated by fitting a fundamental frequency pattern expressed on the real-time axis. However, a fundamental frequency pattern expressed on the real-time axis may be fitted in Embodiment 2, and a critically damped second-order linear system on the logarithmic frequency axis may be used in Embodiment 4.

In Embodiment 2, the vowel-time-length-standardized fundamental frequency database 150a divides the time length of the vowel part of each mora into four equal sections and stores a representative value of the fundamental frequency for each section. However, any other form may be used as long as the fundamental frequency pattern is standardized by the time length of each phoneme.

In Embodiments 2 and 5, the accent rise reference point is set at the center of the third of the four equal sections into which the vowel length of the mora concerned is divided. However, any other relative position in the latter half of the vowel may be used.

In Embodiment 5, the vowel-time-length-standardized fundamental frequency database 150a divides the time length of the vowel part of each mora into four equal sections and stores a representative value of the fundamental frequency for each section. However, any other form may be used as long as the fundamental frequency pattern is standardized by the time length of each vowel.

In Embodiments 2 and 5, the two fall reference points are set at the center of the third of the four equal sections of the vowel length of the mora corresponding to the accent nucleus, and at the center of the third of the four equal sections of the vowel length of the mora following the accent nucleus. However, any other relative positions in the latter half of the vowel may be used.

In Embodiments 2 and 5, the accent-phrase-end reference point is set at the center of the second of the four equal sections of the vowel length of the last mora of the accent phrase. However, any other relative position in the first half of the vowel may be used.

In Embodiments 2 and 5, the ending reference point is set at the center of the third of the four equal sections of the vowel length of the last mora of the utterance. However, any other relative position in the latter half of the vowel may be used.
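The reference-point positions described in the preceding notes reduce to a simple calculation on the vowel interval. The following is a minimal sketch, assuming the vowel is given by a start time and a length in seconds; the function names are hypothetical and do not come from this document.

```python
def quarter_center(vowel_start, vowel_len, quarter):
    """Time at the center of the given quarter (1..4) of a vowel."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return vowel_start + vowel_len * (quarter - 0.5) / 4.0

def rise_reference_time(vowel_start, vowel_len):
    # Center of the third quarter: 62.5% of the way through the vowel.
    return quarter_center(vowel_start, vowel_len, 3)

def phrase_end_reference_time(vowel_start, vowel_len):
    # Center of the second quarter: 37.5% of the way through the vowel.
    return quarter_center(vowel_start, vowel_len, 2)
```

Because the position is a fixed ratio of the vowel length, the reference point moves with the vowel when the phoneme durations change, which is the property the notes rely on.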

In Embodiment 5, the fundamental frequency pattern to which the microprosody is added is generated in the same manner as in Embodiment 2, but it may instead be generated as in Embodiment 1, 3, or 4.

In Embodiment 6, the fundamental frequency pattern of the accent phrase is generated in the same manner as in Embodiment 2, but it may instead be generated as in Embodiment 1, 3, or 4.

In Embodiment 6, interpolation is performed after the reference points of the fundamental frequency pattern are changed according to the deformation amount obtained from the database. However, the fundamental frequency pattern may instead be deformed after interpolation.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end is compressed to 90% for the first accent phrase, but another value in the range from 70% to less than 100% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the maximum value is compressed to 75% for the second accent phrase and to 70% for the third and nth accent phrases, but other values in the range from 50% to 90% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end is compressed to 70% for the second accent phrase and to 68% for the third and nth accent phrases, but other values in the range from 50% to 90% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the maximum value is compressed to 48% for the final accent phrase, but another value in the range from 30% to 70% may be used.

In Embodiment 6, as the deformation amount of the fundamental frequency pattern, the difference between the maximum value and the accent phrase end is compressed to 60% for the final accent phrase, but another value in the range from 40% to 80% may be used.

In Embodiment 7, the coefficient of i in Equation 1 is -0.1, but another value in the range from -0.05 to -0.4 may be used.

In Embodiment 7, the coefficient of j in Equation 2 is -0.05, but another value in the range from -0.2 up to a maximum of 0 may be used.

In Embodiments 7 and 8, for the final accent phrase, the maximum value of the fundamental frequency is set to the maximum value of the immediately preceding accent phrase lowered by 15%, but another value in the range from 10% to 40% may be used.

Likewise, the accent phrase end is set to the accent-phrase-end value of the immediately preceding accent phrase lowered by 10%, but another value in the range from 5% to 40% may be used.

In Embodiment 8, the coefficient of i in Equation 3 is -0.02, but it is not limited to this and may be another value in the range from -0.01 to -0.2.

In Embodiment 8, the coefficient of j in Equation 4 is -0.01, but it is not limited to this and may be another value in the range from -0.01 to -0.1.

In Embodiment 10, the fundamental frequency pattern obtained from the mora-time-length-standardized fundamental frequency database 50 is deformed in the same manner as in Embodiment 6, 7, or 8. However, as in Embodiment 9, the fundamental frequency pattern may instead be obtained from the accent phrase position fundamental frequency database 450 based on the position of the accent phrase.

In Embodiment 10, when there is no pause between the nth accent phrase and the (n+1)th accent phrase, the fundamental frequency pattern is deformed so that the difference between the fundamental frequency at the center of the vowel part of the last mora of the nth accent phrase and that at the center of the vowel part of the first mora of the (n+1)th accent phrase is 40 Hz or less. However, another value between 20 Hz and 60 Hz may be used.

In Embodiment 10, as the criterion for changing the fundamental frequencies of the accent-phrase fall, the accent phrase end, and the utterance ending, the duration of the pause between the nth accent phrase and the (n+1)th accent phrase is classified into four stages: less than 50 msec, 50 msec or more and less than 100 msec, 100 msec or more and less than 150 msec, and 150 msec or more. However, it may be classified into any other number of stages from 1 to 8.

In Embodiment 10, when the duration of the pause between the nth accent phrase and the (n+1)th accent phrase is 150 msec or more, the fundamental frequencies of the accent-phrase fall, the accent phrase end, and the utterance ending are not changed. However, the upper limit of the pause duration for which the change is made may be another value between 120 msec and 200 msec.
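The four-stage classification of pause duration described for Embodiment 10 can be sketched as below. The function names and the integer stage encoding (0 to 3) are assumptions made for illustration; only the boundaries of 50, 100, and 150 msec come from the text.

```python
from bisect import bisect_right

# Stage boundaries in msec, as stated for Embodiment 10.
PAUSE_BOUNDARIES_MS = [50, 100, 150]

def pause_stage(pause_ms):
    """Return 0 for <50, 1 for [50,100), 2 for [100,150), 3 for >=150 msec."""
    if pause_ms < 0:
        raise ValueError("pause duration must be non-negative")
    return bisect_right(PAUSE_BOUNDARIES_MS, pause_ms)

def needs_f0_change(pause_ms):
    """No change is made when the pause is 150 msec or longer."""
    return pause_stage(pause_ms) < 3
```

Keeping the boundaries in a list makes the variations mentioned in the text (a different number of stages, or an upper limit between 120 and 200 msec) a matter of editing one constant.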

In Embodiment 10, as the criterion for changing the fundamental frequencies of the accent-phrase fall, the accent phrase end, and the utterance ending, the duration of the pause between the nth accent phrase and the (n+1)th accent phrase is classified into four stages, and the upper limit of the difference between the fundamental frequency at the center of the vowel part of the last mora of the nth accent phrase and that at the center of the vowel part of the first mora of the (n+1)th accent phrase is set for each stage. However, the upper limit may instead be set by a linear expression in the pause duration t (Equation 5):
(Equation 5)
at + b (Hz) ... Equation 5
where 0 < a < 0.4 and 20 < b < 60.
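As a worked illustration of Equation 5, the sketch below evaluates the upper limit for a given pause duration. It assumes that t is expressed in msec, as the pause durations elsewhere in this description are; the text does not state the unit for Equation 5 explicitly. The function name and the sample values of a and b are hypothetical.

```python
def max_f0_difference(pause_ms, a, b):
    """Upper limit (Hz) of the F0 difference across an accent-phrase
    boundary, computed as a*t + b with t assumed to be in msec."""
    if not (0.0 < a < 0.4):
        raise ValueError("a must satisfy 0 < a < 0.4")
    if not (20.0 < b < 60.0):
        raise ValueError("b must satisfy 20 < b < 60")
    return a * pause_ms + b
```

With the sample values a = 0.25 and b = 40, a 100 msec pause gives an upper limit of 65 Hz, and the limit grows continuously with the pause instead of jumping at stage boundaries.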

The present invention can be implemented as a program. By recording the program on a recording medium capable of storing programs, such as a floppy (registered trademark) disk, an optical disk, an IC card, or a ROM cassette, and transferring it, the invention can easily be carried out on another independent computer system.

In the above embodiments, the phoneme of the present invention has been described mainly as corresponding to a mora, but it is not limited to this and may be, for example, a syllable. That is, the fundamental frequency database is not limited to one that stores data in mora units or phoneme units as described above. A fundamental frequency database that stores data in syllable units, or in units of the phonemes contained in a syllable, may of course be used, and the same effects as described above are obtained in that case as well. In other words, in all of the embodiments described above, the same effects are obtained even if "mora" is read as "syllable".

In the above embodiments, the case where the fundamental frequency database holds fundamental frequency patterns for up to three moras from the end has been described. Holding fundamental frequency patterns for at most four moras from the end is sufficient to obtain the effect.

As described above, the first method of the present invention is a fundamental frequency pattern generation method that uses a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the phonemes of that mora. For each of the mora containing the maximum fundamental frequency of the accent phrase, the accent nucleus and the mora following the accent nucleus, and the one or more moras at the end of the accent phrase, the fundamental frequency pattern within the mora is set by referring to the database. For sections in which the fundamental frequency is not set from the database, the fundamental frequency pattern is generated by interpolating between the fundamental frequencies set from the database with a function on the real-time axis.

The second method is a fundamental frequency pattern generation method that uses a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the phonemes of that mora. A rise reference point giving the maximum fundamental frequency of the accent phrase, fall reference points giving the fall of the accent, an accent-phrase-end reference point giving the fundamental frequency at the end of the accent phrase, and an ending reference point giving the fundamental frequency at the end of the utterance are each set at a time point that is a fixed ratio of the vowel length of the mora concerned. The fundamental frequency at each reference point is set by referring to the database, and the fundamental frequency between reference points is interpolated with a function on the real-time axis to generate the fundamental frequency pattern.

The third method is a fundamental frequency pattern generation method that uses a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the vowel or vowel-equivalent part of that mora. For each of the mora containing the maximum fundamental frequency of the accent phrase, the accent nucleus and the mora following the accent nucleus, and the one or more moras at the end of the accent phrase, the fundamental frequency pattern within the mora is set by referring to the database. For sections in which the fundamental frequency is not set from the database, the fundamental frequency pattern is generated by interpolating between the fundamental frequencies set from the database with a function on the real-time axis.

The fourth method is a fundamental frequency pattern generation method that uses a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the time length of the vowel or vowel-equivalent part of that mora. A rise reference point giving the maximum fundamental frequency of the accent phrase, fall reference points giving the fall of the accent, an accent-phrase-end reference point giving the fundamental frequency at the end of the accent phrase, and an ending reference point giving the fundamental frequency at the end of the utterance are each set at a time point that is a fixed ratio of the vowel length of the mora concerned. The fundamental frequency at each reference point is set by referring to the database, and the fundamental frequency between reference points is interpolated with a function on the real-time axis to generate the fundamental frequency pattern.

The fifth method is a fundamental frequency pattern generation method that uses a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the phoneme time length of that mora, together with a microprosody database storing, for each phoneme or phoneme sequence, the difference between the fundamental frequency standardized by the phoneme time length and the fundamental frequency pattern. The fundamental frequency pattern is generated by adding the microprosody data to, or subtracting it from, the fundamental frequency pattern obtained from the phoneme-time-length-standardized fundamental frequency database.
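The addition step of the fifth method can be sketched as an element-wise sum. This is a minimal illustration, assuming that the base pattern and the microprosody component have already been sampled on the same time grid and are given as lists of values in Hz; the function name is hypothetical, and subtraction is represented by negative microprosody values.

```python
def add_microprosody(base_f0, micro_f0):
    """Add a microprosody component (Hz offsets) to a base F0 pattern (Hz).
    Both sequences are assumed to share one time grid."""
    if len(base_f0) != len(micro_f0):
        raise ValueError("patterns must have the same number of samples")
    return [b + m for b, m in zip(base_f0, micro_f0)]
```

For example, a base pattern of 100, 110, 120 Hz combined with offsets of 0, -5, +3 Hz yields 100, 105, 123 Hz.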

The sixth method is a fundamental frequency generation method for generating a fundamental frequency pattern for each accent phrase using a phoneme-time-length-standardized fundamental frequency database storing, for each mora position of an accent phrase, a fundamental frequency pattern standardized by the phoneme time length of that mora. When no fundamental frequency pattern matching the number of moras and the accent type of the accent phrase for which the fundamental frequency is to be generated exists in the database, a fundamental frequency pattern that is in the database is used as follows. Let the accent phrase to be generated be an n-mora, type-m phrase, let the fundamental frequency pattern obtained from the database be an l-mora, type-j pattern, let i be the position of the mora containing the maximum value of the obtained pattern, and let k be the number of moras at the accent phrase end of the obtained pattern.
When m ≤ i+1, the 1st through (j+1)th moras of the obtained pattern are applied to the 1st through (m+1)th moras, the (l−k+1)th through lth moras of the obtained pattern are applied to the (n−k+1)th through nth moras, and the moras in between are interpolated on the real-time axis.
When i+1 < m ≤ n−k+1, the 1st through ith moras of the obtained pattern are applied to the 1st through ith moras, the jth and (j+1)th moras of the obtained pattern are applied to the mth and (m+1)th moras, the (l−k+1)th through lth moras of the obtained pattern are applied to the (n−k+1)th through nth moras, and the moras in between are interpolated on the real-time axis.
When m > n−k+1, the 1st through ith moras of the obtained pattern are applied to the 1st through ith moras, the jth through lth moras of the obtained pattern are applied to the mth through nth moras, and the moras in between are interpolated on the real-time axis.

The seventh method is a fundamental frequency generation method that generates a fundamental frequency pattern using a fundamental frequency database in which the fundamental frequency patterns of accent phrases are classified according to the position of the accent phrase within the phrase and whether or not it is at the end of a sentence.

The eighth method is a fundamental frequency pattern generation method that uses a fundamental frequency database storing the fundamental frequency patterns of accent phrases, and a deformation database storing deformation amounts of the fundamental frequency pattern according to the position of the accent phrase within the phrase and whether or not it is at the end of a sentence. The fundamental frequency pattern obtained from the fundamental frequency database is deformed according to the deformation amount obtained from the deformation database to generate the fundamental frequency pattern.

The ninth method is a fundamental frequency pattern generation method that uses a fundamental frequency database storing the fundamental frequency patterns of accent phrases, and deforms the fundamental frequency pattern obtained from the fundamental frequency database by a function of the position i of the accent phrase within the phrase.

The tenth method is a fundamental frequency pattern generation method that uses a fundamental frequency database storing the fundamental frequency patterns of accent phrases, and deforms the fundamental frequency pattern obtained from the fundamental frequency database, for each mora that serves as a reference for determining the pattern, by a function of the position j of that mora within the phrase.

The eleventh method is a fundamental frequency pattern generation method that generates a fundamental frequency pattern for each accent phrase and changes the characteristics of the accent fall, the accent end, and the end point of the accent phrase so that the difference between the frequency at the accent end and end point of that accent phrase and the frequency at the start point of the next accent phrase is no more than a predetermined value.

As described above, according to the present invention, fundamental frequency patterns in which the timing and slope of the rise of the accent phrase and of the fall at the accent nucleus are standardized by the vowel length of the mora concerned are applied. This reproduces the variation of the fundamental frequency within a mora in detail and achieves high naturalness. In addition, interpolating on the real-time axis where no database pattern is applied eliminates the sense of discontinuity that arises with mora-by-mora control and allows the fundamental frequency pattern database to be made smaller. Alternatively, the timing of the rise of the accent phrase and of the fall at the accent nucleus is set on a time axis standardized by the vowel length of the mora concerned, which reproduces the timing of fundamental frequency variation within the mora in detail, while the slopes of the rise and fall are given by a function on the real-time axis. This yields a smooth fundamental frequency pattern with stable rises and falls that is unaffected by differences in time length between phonemes, eliminates the sense of discontinuity that arises with mora-by-mora control, and achieves high naturalness. Furthermore, using interpolation allows the fundamental frequency pattern database to be made smaller, so the practical effect is large.

The fundamental frequency pattern generation method and the like according to the present invention have the advantage of generating fundamental frequency patterns that are more natural than those of the related art, and are useful as a fundamental frequency pattern generation method and the like.

Functional block diagram of a fundamental frequency generation device according to the present invention and/or another invention related to the present invention (hereinafter referred to as "the present invention, etc.")
Diagram showing an example of a fundamental frequency pattern generated by Embodiment 1 of the present invention, etc.
Diagram showing an example of a fundamental frequency pattern generated by Embodiment 2 of the present invention, etc.
Functional block diagram of a device showing one embodiment of the present invention, etc.
Diagram showing an example of a fundamental frequency pattern according to the present invention
Diagram showing an example of a fundamental frequency pattern according to the present invention
Functional block diagram of a device showing one embodiment of the present invention, etc.
Schematic diagram of the microprosody components stored in the microprosody database 250
(A): Diagram showing a fundamental frequency pattern generated from the fundamental frequency database of Embodiment 5. (B): Diagram showing a microprosody component obtained from the microprosody database of the same embodiment. (C): Diagram showing the fundamental frequency pattern generated by adding the pattern of FIG. 9(B) to the pattern of FIG. 9(A).
Functional block diagram of a device showing one embodiment of the present invention
(A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
(A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
(A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
(A), (B): Diagrams showing an example of a fundamental frequency pattern according to the present invention
Schematic diagram of a fundamental frequency pattern of the present invention
Functional block diagram of a device showing one embodiment of the present invention, etc.
Schematic diagram of a fundamental frequency pattern of one embodiment of the present invention, etc.
Schematic diagram of a fundamental frequency pattern of a modification of the present invention, etc.
Schematic diagram of a fundamental frequency pattern of the present invention, etc.
(A), (B): Schematic diagrams of the accent phrase junction of a fundamental frequency pattern of the present invention, etc.

Explanation of reference numerals

10 Character string input part
20 Character string analysis part
30 Phoneme time length database
40 Time length setting part
50 Mora-time-length-standardized fundamental frequency database
60 Fundamental frequency pattern generation part
70 Vocal cord vibration generation part
150, 150a, 150b Vowel-time-length-standardized fundamental frequency database
250 Microprosody database
350 Fundamental frequency pattern deformation database
450 Accent phrase position fundamental frequency database

Claims

A fundamental frequency pattern generation method for generating a fundamental frequency pattern of an accent phrase using a fundamental frequency database storing fundamental frequency patterns classified according to the number of phonemes and accent positions,
wherein, when the fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency database, and the accent position of that accent phrase is the same as or earlier than the phoneme position next to the phoneme position including the peak of the fundamental frequency stored in the fundamental frequency database,
(1) a fundamental frequency pattern stored in the fundamental frequency database is used which has the same accent position as the accent phrase for which the fundamental frequency pattern is to be generated and which corresponds to the number of phonemes closest to the number of phonemes of that accent phrase,
(2) the fundamental frequency pattern from the first phoneme to the phoneme next to the accent nucleus is generated by applying the fundamental frequencies from the first phoneme to the phoneme next to the accent nucleus of the fundamental frequency pattern stored in the fundamental frequency database,
(3) the fundamental frequencies from the second phoneme after the accent nucleus to the phoneme immediately before the end of the accent phrase, the end being a predetermined number of phonemes of 4 or less, are generated by interpolating, in the fundamental frequency pattern stored in the fundamental frequency database, between (a) the fundamental frequency of the second phoneme from the accent nucleus and that of the end of the accent phrase, or (b) the fundamental frequency of the phoneme next to the accent nucleus and that of the end of the accent phrase, or (c) the fundamental frequency of the second phoneme from the accent nucleus and that of the phoneme immediately before the end of the accent phrase, or (d) the fundamental frequency of the phoneme next to the accent nucleus and that of the phoneme immediately before the end of the accent phrase, and
(4) the fundamental frequency pattern of the end of the accent phrase for which the fundamental frequency pattern is to be generated is generated by applying the fundamental frequencies of the phonemes at the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency database.
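
The steps of the claim above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes one fundamental frequency value (in Hz) per phoneme, a 0-based accent nucleus index shared by the stored and the target phrase, variant (a) of step (3), and linear interpolation. The function name `generate_early_accent` and its parameters are hypothetical.

```python
def generate_early_accent(stored, n_target, nucleus, end_len=1):
    """Sketch of steps (2)-(4) for a nucleus at or before the phoneme after the F0 peak.

    Assumptions (not from the source): `stored` holds one F0 value in Hz per
    phoneme, `nucleus` is a 0-based index valid for both the stored and the
    target phrase, and variant (a) of step (3) is used.
    """
    if not 1 <= end_len <= 4:
        raise ValueError("the phrase end must span 1 to 4 phonemes")
    head_len = nucleus + 2  # first phoneme through the phoneme after the nucleus
    if len(stored) < head_len + end_len or n_target < head_len + end_len:
        raise ValueError("phrase too short for this case")
    head = list(stored[:head_len])                # step (2): copied unchanged
    tail = list(stored[len(stored) - end_len:])   # step (4): copied phrase end
    n_mid = n_target - head_len - end_len         # phonemes filled by step (3)
    start = stored[head_len]                      # second phoneme after the nucleus
    stop = tail[0]                                # first phoneme of the phrase end
    # Step (3): linear ramp from `start` toward `stop` (stop itself is in `tail`).
    middle = [start + (stop - start) * k / n_mid for k in range(n_mid)]
    return head + middle + tail
```

For example, stretching a six-phoneme stored pattern to eight phonemes with `generate_early_accent([100, 140, 130, 120, 110, 90], 8, nucleus=1)` keeps the first three values and the last value, and fills the four phonemes in between with 120.0, 112.5, 105.0 and 97.5.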

A fundamental frequency pattern generation method for generating a fundamental frequency pattern of an accent phrase using a fundamental frequency database storing fundamental frequency patterns classified according to the number of phonemes and accent positions,
wherein, when the fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency database, and the accent position of that accent phrase is located after the phoneme position next to the phoneme position including the peak of the fundamental frequency stored in the fundamental frequency database and before the end of the accent phrase, the end being a predetermined number of phonemes of 4 or less,
(1) a fundamental frequency pattern stored in the fundamental frequency database is used which has an accent nucleus after the second phoneme from the peak of the fundamental frequency and before the end of the accent phrase, and which corresponds to the number of phonemes closest to the number of phonemes of the accent phrase for which the fundamental frequency pattern is to be generated,
(2) the fundamental frequency pattern from the first phoneme of the accent phrase for which the fundamental frequency pattern is to be generated to the phoneme including the peak of the fundamental frequency is generated by applying the fundamental frequencies from the first phoneme to the phoneme including the peak of the fundamental frequency of the fundamental frequency pattern stored in the fundamental frequency database,
(3) the fundamental frequencies from the phoneme following the phoneme including the peak of the fundamental frequency to the phoneme immediately before the accent nucleus are generated by interpolating, in the fundamental frequency pattern stored in the fundamental frequency database, between (a) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the phoneme including the accent nucleus, or (b) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the phoneme including the accent nucleus, or (c) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the phoneme including the accent nucleus, or (d) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the phoneme including the accent nucleus,
(4) the fundamental frequencies of the phoneme including the accent nucleus of the accent phrase for which the fundamental frequency pattern is to be generated and of the phoneme immediately following it are generated by applying the fundamental frequencies of the phoneme including the accent nucleus and of the phoneme immediately after it in the fundamental frequency pattern stored in the fundamental frequency database,
(5) the fundamental frequencies from the second phoneme after the accent nucleus to the phoneme immediately before the end of the accent phrase, the end being the predetermined number of phonemes of 4 or less, are generated by interpolating, in the fundamental frequency pattern stored in the fundamental frequency database, between (a) the fundamental frequency of the second phoneme from the accent nucleus and that of the end of the accent phrase, or (b) the fundamental frequency of the phoneme next to the accent nucleus and that of the end of the accent phrase, or (c) the fundamental frequency of the second phoneme from the accent nucleus and that of the phoneme immediately before the end of the accent phrase, or (d) the fundamental frequency of the phoneme next to the accent nucleus and that of the phoneme immediately before the end of the accent phrase, and
(6) the fundamental frequency pattern of the end of the accent phrase for which the fundamental frequency pattern is to be generated is generated by applying the fundamental frequencies of the phonemes at the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency database.
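
As a rough illustration of how the five segments of this claim fit together, the following Python sketch assembles a target pattern from a stored one. It assumes one fundamental frequency value per phoneme, 0-based indices, the same peak position in both phrases, variant (c) for step (3), variant (a) for step (5), and linear interpolation. It also assumes the nucleus lies at least two phonemes before the phrase end so the copied segments do not overlap. The names `ramp` and `generate_mid_accent` are hypothetical.

```python
def ramp(start, stop, count):
    """Return `count` values moving linearly from `start` toward `stop` (stop excluded)."""
    return [start + (stop - start) * k / count for k in range(count)]


def generate_mid_accent(stored, peak, s_nuc, n_target, t_nuc, end_len=1):
    """Sketch of steps (2)-(6); `s_nuc` / `t_nuc` are the stored / target nucleus indices.

    Assumptions (not from the source): one F0 value per phoneme, 0-based
    indices, a shared `peak` index, and nucleus + 2 <= start of the phrase end.
    """
    s_end = len(stored) - end_len  # first phoneme of the stored phrase end
    t_end = n_target - end_len     # first phoneme of the target phrase end
    for nuc, end in ((s_nuc, s_end), (t_nuc, t_end)):
        if not peak + 1 < nuc <= end - 2:
            raise ValueError("nucleus must lie between the peak and the phrase end")
    head = list(stored[:peak + 1])                       # step (2): up to the peak
    # Step (3), variant (c): phoneme after the peak -> nucleus phoneme.
    before_nucleus = ramp(stored[peak + 1], stored[s_nuc], t_nuc - (peak + 1))
    nucleus_pair = list(stored[s_nuc:s_nuc + 2])         # step (4): nucleus and next
    # Step (5), variant (a): second phoneme after the nucleus -> phrase end.
    after_nucleus = ramp(stored[s_nuc + 2], stored[s_end], t_end - (t_nuc + 2))
    tail = list(stored[s_end:])                          # step (6): phrase end
    return head + before_nucleus + nucleus_pair + after_nucleus + tail
```

The copied segments (head, nucleus pair, tail) keep the shape of the naturally uttered pattern where it matters most perceptually, while only the two stretches between them are lengthened or shortened.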

A fundamental frequency pattern generation method for generating a fundamental frequency pattern of an accent phrase using a fundamental frequency database storing fundamental frequency patterns classified according to the number of phonemes and accent positions,
wherein, when the fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency database, and the accent position of that accent phrase is included in the phonemes at the end of the accent phrase, the end being a predetermined number of phonemes of 4 or less,
(1) a fundamental frequency pattern stored in the fundamental frequency database is used whose accent position within the end of the accent phrase is the same as the accent position within the end of the accent phrase for which the fundamental frequency pattern is to be generated, and which corresponds to the number of phonemes closest to the number of phonemes of that accent phrase,
(2) the fundamental frequency pattern from the first phoneme of the accent phrase for which the fundamental frequency pattern is to be generated to the phoneme including the peak of the fundamental frequency is generated by applying the fundamental frequencies from the first phoneme to the phoneme including the peak of the fundamental frequency of the fundamental frequency pattern stored in the fundamental frequency database,
(3) the fundamental frequencies from the phoneme following the phoneme including the peak of the fundamental frequency to the phoneme immediately before the accent nucleus are generated by interpolating, in the fundamental frequency pattern stored in the fundamental frequency database, between (a) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the phoneme including the accent nucleus, or (b) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the phoneme including the accent nucleus, or (c) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the phoneme including the accent nucleus, or (d) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the phoneme including the accent nucleus, and
(4) the fundamental frequencies from the phoneme including the accent nucleus of the accent phrase for which the fundamental frequency pattern is to be generated to the final phoneme of the accent phrase are generated by applying the fundamental frequencies from the phoneme including the accent nucleus to the final phoneme of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency database.
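
Step (1) of this claim amounts to a nearest-match lookup. The sketch below shows one way it could be done, under the assumption that the database is a dictionary keyed by `(number of phonemes, 0-based nucleus index)` and that ties between equally close lengths are broken in favour of the shorter pattern. Neither the data layout nor the tie-breaking rule comes from the source, and `select_pattern_for_final_accent` is a hypothetical name.

```python
def select_pattern_for_final_accent(database, n_target, nucleus_from_end):
    """Pick the stored pattern with the same nucleus offset from the phrase end
    and the closest number of phonemes.

    `nucleus_from_end` counts back from the phrase end (1 = final phoneme).
    The dict layout and tie-breaking rule are illustrative assumptions.
    """
    candidates = [
        (abs(n - n_target), n, pattern)
        for (n, nucleus), pattern in database.items()
        if n - nucleus == nucleus_from_end
    ]
    if not candidates:
        raise LookupError("no stored pattern with a matching accent position")
    # Compare only on (length difference, length) so patterns are never compared.
    return min(candidates, key=lambda c: (c[0], c[1]))[2]
```

Matching the accent position relative to the phrase end, rather than from the phrase head, keeps the final fall of the stored pattern aligned with the final fall of the target phrase.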

A fundamental frequency pattern generation method for generating a fundamental frequency pattern of an accent phrase using a fundamental frequency database storing fundamental frequency patterns classified according to the number of phonemes and accent positions,
wherein, when the fundamental frequency pattern corresponding to the number of phonemes and the accent type of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency database, and the accent type of that accent phrase is the flat type,
(1) a fundamental frequency pattern stored in the fundamental frequency database is used which has the flat type accent and corresponds to the number of phonemes closest to the number of phonemes of the accent phrase for which the fundamental frequency pattern is to be generated,
(2) the fundamental frequency pattern from the first phoneme to the phoneme including the peak of the fundamental frequency is generated by applying the fundamental frequencies from the first phoneme to the phoneme including the peak of the fundamental frequency of the fundamental frequency pattern stored in the fundamental frequency database,
(3) the fundamental frequencies from the phoneme following the phoneme including the peak of the fundamental frequency to the phoneme immediately before the end of the accent phrase or the final phoneme are generated by interpolating, in the fundamental frequency pattern stored in the fundamental frequency database, between (a) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the end of the accent phrase or the final phoneme, or (b) the fundamental frequency of the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the end of the accent phrase or the final phoneme, or (c) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the end of the accent phrase or the final phoneme, or (d) the fundamental frequency of the phoneme next to the phoneme including the peak of the fundamental frequency and that of the phoneme immediately before the end of the accent phrase or the final phoneme, and
(4) the fundamental frequency pattern of the end of the accent phrase or the final phoneme of the accent phrase for which the fundamental frequency pattern is to be generated is generated by applying the fundamental frequency of the end of the accent phrase or the final phoneme of the fundamental frequency pattern stored in the fundamental frequency database.
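
The four independent claims above cover four mutually distinguished situations that arise when no stored pattern matches the target accent phrase exactly. A minimal Python sketch of how the applicable case might be selected is given below. It assumes 0-based phoneme indices, represents the flat accent type as `nucleus=None`, and checks the conditions in claim order. These conventions and the function name `select_case` are illustrative assumptions, not part of the source.

```python
def select_case(n_phonemes, peak, nucleus, end_len=1):
    """Return which of the four generation procedures applies (1 to 4).

    Assumptions (not from the source): indices are 0-based, `nucleus` is None
    for the flat accent type, `peak` is the index of the phoneme containing
    the F0 peak, and the phrase end is the last `end_len` (1 to 4) phonemes.
    """
    if not 1 <= end_len <= 4:
        raise ValueError("the phrase end must span 1 to 4 phonemes")
    if nucleus is None:
        return 4  # flat type: no accent nucleus
    if nucleus <= peak + 1:
        return 1  # nucleus at or before the phoneme after the peak
    if nucleus < n_phonemes - end_len:
        return 2  # nucleus after that, but before the phrase end
    return 3      # nucleus inside the phrase end
```

For an eight-phoneme phrase with the peak in the second phoneme (`peak=1`) and a one-phoneme end, a nucleus at index 2 selects case 1, index 4 selects case 2, index 7 selects case 3, and a flat phrase selects case 4.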

The method according to claim 1, wherein the fundamental frequency pattern is extracted from a naturally uttered voice.

The method according to claim 1, wherein the interpolation is linear interpolation.
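
For reference, linear interpolation between two anchor phonemes can be written as follows. The symbols are assumptions introduced here for illustration: $i_s$ and $i_e$ are the positions of the start and end anchor phonemes in the target accent phrase, and $F_s$ and $F_e$ are the fundamental frequencies taken from the stored pattern at the corresponding anchors.

```latex
F_0(i) = F_s + \left(F_e - F_s\right)\,\frac{i - i_s}{i_e - i_s},
\qquad i_s \le i \le i_e
```

At $i = i_s$ this gives $F_s$ and at $i = i_e$ it gives $F_e$, so the interpolated stretch joins the copied segments on either side without a jump.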

The fundamental frequency pattern generation method according to any one of claims 1 to 4, wherein the fundamental frequency from the head of the accent phrase to the peak of the fundamental frequency is interpolated by a fundamental frequency pattern expressed on a real time axis.

The method according to claim 1, wherein the phoneme is a mora or a syllable.

A program recording medium on which a program for causing a computer to execute each step of the fundamental frequency pattern generation method according to claim 1 is recorded.