JP4630038B2

JP4630038B2 - Speech waveform database construction method, apparatus and program for implementing this method

Info

Publication number: JP4630038B2
Application number: JP2004315614A
Authority: JP
Inventors: 光昭磯貝; 秀之水野; 一則間野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-10-29
Filing date: 2004-10-29
Publication date: 2011-02-09
Anticipated expiration: 2024-10-29
Also published as: JP2006126556A

Description

The present invention relates to a speech waveform database construction method and to an apparatus and a program for implementing the method. More particularly, it relates to a method, apparatus, and program used to construct the speech waveform database of a concatenative (waveform-connection) text-to-speech synthesizer, such that a synthesizer using that database achieves high-quality speech synthesis with a short execution time and a small amount of computation.

In recent speech synthesis technology, a concatenative (waveform-connection) speech synthesis method has been proposed in which a large amount of natural speech data, from tens of minutes to tens of hours, is stored in a large-capacity storage device to form a speech waveform database. Speech waveforms of appropriate length are cut out of the database according to suitable criteria for the input text and concatenated to generate synthesized speech (see Patent Document 1). With this method, when the database contains no speech waveform of suitable pitch and length, the intonation of the generated speech may be unnatural. To give the synthesized speech the desired intonation, a method has therefore been proposed in which signal processing is applied to the speech waveforms cut out of the database to change their pitch, length, and other speech parameters (see Patent Document 1).
Patent Document 1: Japanese Patent No. 2761552

However, changing the pitch and length of a speech waveform by signal processing degrades the sound quality when the amount of change is large. Signal processing also increases both the amount of computation and the time required to generate synthesized speech, compared with simple waveform concatenation.
The present invention has been made in view of these problems. In concatenative text-to-speech synthesis, it builds the speech waveform database from natural speech waveforms together with modified speech waveforms whose speech parameters have been changed. It thereby provides a speech waveform database construction method, and an apparatus and a program implementing the method, that resolve the above problems and allow high-quality synthesized speech to be generated with a short execution time and a small amount of computation.

Claim 1: A speech waveform database construction method for constructing a speech waveform database used in a concatenative text-to-speech synthesizer is provided, the method executing: a step of changing, by signal processing, the pitch, length, power, or other speech parameters of natural speech waveforms prepared in advance, based on a change amount parameter specified within a range in which the sound quality of the speech waveform does not degrade, to newly generate prosody-modified speech waveforms; a step of generating new speech information by changing the phoneme label information indicating the utterance content and its position, the power information, the pitch information, the pitch mark information, or other speech information attached to the natural speech waveforms, in accordance with the amount of change applied in the step of generating the prosody-modified speech waveforms; and a step of generating the speech waveform database using, in addition to the natural speech waveforms and their speech information, one or more kinds of prosody-modified speech waveforms derived from those natural speech waveforms and their speech information.

Claim 2: A speech waveform database construction apparatus for constructing a speech waveform database used in a concatenative text-to-speech synthesizer is provided, the apparatus comprising: a speech waveform changing unit 4 that changes, by signal processing, the pitch, length, power, or other speech parameters of natural speech waveforms prepared in advance, based on a change amount parameter specified within a range in which the sound quality of the speech waveform does not degrade, to newly generate prosody-modified speech waveforms; a speech information changing unit 5 that generates new speech information by changing the phoneme label information indicating the utterance content and its position, the power information, the pitch information, the pitch mark information, or other speech information attached to the natural speech waveforms, in accordance with the amount of change applied in generating the prosody-modified speech waveforms; and a speech waveform database generation unit 8 that generates the speech waveform database using, in addition to the natural speech waveforms and their speech information, one or more kinds of prosody-modified speech waveforms derived from those natural speech waveforms and their speech information.

Claim 3: A speech waveform database construction program is provided that instructs a computer to execute the steps described in claim 1 in the stated order.

In the course of constructing a speech waveform database used in a concatenative text-to-speech synthesizer, the present invention adopts the following configuration.
It starts from the stage at which the natural speech waveforms for the database have been prepared, together with their speech information: phoneme label information indicating the utterance content and its position, power information, pitch information, pitch mark information, and other speech information.
The speech waveform changing unit changes the pitch, length, or other speech parameters of every natural speech waveform in the database by signal processing, based on the specified change amount parameter, and newly generates prosody-modified speech waveforms.
The speech information changing unit generates new speech information by changing the phoneme label information indicating the utterance content and its position, the power information, the pitch information, and other speech information attached to the natural speech in the database, in accordance with the amount of change applied by the speech waveform changing unit.
The speech waveform database generation unit then constructs the speech waveform database from the natural speech waveforms and their speech information together with the prosody-modified speech waveforms and their speech information. As a result, the database already contains speech waveforms with a variety of prosodic variations produced by signal processing in advance. This raises the likelihood that the database holds a speech waveform whose pitch, length, power, or other speech parameters suit the text to be synthesized, so high-quality speech with natural intonation can be synthesized.

Because the prosody changes made by the signal processing that the speech waveform changing unit applies to the natural speech waveforms stay within a range that causes no sound quality degradation, high-quality speech can be synthesized without such degradation.
In addition, because the signal processing of the speech waveforms is performed at the database construction stage, a signal processing technique that causes little sound quality degradation but requires heavy computation can be used. High-quality speech with natural intonation and no sound quality degradation can therefore be synthesized without increasing the time required to generate the synthesized speech.

The best mode for carrying out the invention is described with reference to the embodiment shown in FIG. 1.
Reference numeral 1 denotes a speech waveform database construction apparatus, which comprises a speech waveform changing unit 4, a speech information changing unit 5, modified speech waveform data 6, modified speech information 7, and a speech waveform database generation unit 8.
The speech waveform database construction apparatus 1 takes as input natural speech waveform data 2 and natural speech information 3 (the corresponding phoneme label information, power information, pitch information, pitch mark information, and other information) and constructs the speech waveform database 11. The apparatus 1 consists of a computer built from well-known hardware and a computer program, that is, software that drives its CPU. Each of the speech waveform changing unit 4, the speech information changing unit 5, and the speech waveform database generation unit 8 is implemented in hardware, in software, or in both.

The natural speech waveform data 2 is, for example, speech data of words and sentences read aloud by a narrator, A/D-converted and stored in a storage area of a storage medium such as a hard disk.
The natural speech information 3 is data attached to the natural speech waveform data 2: phoneme label information indicating the utterance content and its timing, power information, pitch information, pitch mark information, spectrum information, and other speech information. It is likewise stored in a storage area of a storage medium such as a hard disk.

The speech waveform changing unit 4 applies signal processing to all of the natural speech waveforms stored in the natural speech waveform data 2, based on the change amount parameter given to the speech waveform database construction apparatus 1, and generates modified speech waveform data 6 in which the specified amount of change has been applied to the pitch and length. One example of a signal processing method that can be used here is PSOLA (Pitch Synchronous Overlap Add). The modified speech waveform data 6 is stored in a storage area of a storage medium such as a hard disk.
The speech information changing unit 5 changes the natural speech information 3 to match the pitch and length of the speech waveform data changed by the speech waveform changing unit 4, and generates modified speech information 7. For example, when the pitch of the speech waveform data changes, the pitch information, pitch mark information, and spectrum information are changed accordingly. When the length of the speech waveform data changes, the time information of the phoneme labels and the pitch mark information are changed accordingly. The modified speech information 7 is stored in a storage area of a storage medium such as a hard disk.

Based on the data stored in the natural speech waveform data 2, the natural speech information 3, the modified speech waveform data 6, and the modified speech information 7, the speech waveform database generation unit 8 generates a speech waveform database 11 consisting of speech waveform data 9 and a speech information index 10.
The speech waveform data 9 holds the speech waveforms contained in the natural speech waveform data 2 and the modified speech waveform data 6, and is stored in a storage area of a storage medium such as a hard disk.
The speech information index 10 holds the contents of the natural speech information 3 and the modified speech information 7 as an index, so that the speech waveform data needed for synthesis can be read out of the speech waveform data 9 using the desired phoneme information and pitch information as search keys.

The processing operation is now described with reference also to FIG. 2. FIG. 2 is a flowchart showing the processing for constructing the speech waveform database.
First, in (S1), the speech waveform changing unit 4 applies signal processing to all of the natural speech waveforms stored in the natural speech waveform data 2 and changes their pitch and length, based on the change amount parameter given to the speech waveform database construction apparatus 1. The amount of change must stay within a range that causes no sound quality degradation. Because this tolerance depends on the signal processing method, no single value can be fixed in advance; it should be determined experimentally for the method in use. In this embodiment, the change amount parameter is assumed to specify a uniform 10% rise in pitch. In that case, for every natural speech waveform stored in the natural speech waveform data 2, signal processing generates a speech waveform whose pitch is uniformly raised by 10%. One example of a signal processing method that can be used here is PSOLA. For methods such as PSOLA that need pitch mark information during signal processing, the pitch mark information stored in the natural speech information 3 is read out and used. The changed speech waveform data is temporarily stored as the modified speech waveform data 6.
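The pitch mark handling in (S1) can be illustrated with a minimal sketch. It covers only one part of a PSOLA-style method, the placement of new pitch marks, and omits the windowing and overlap-add of the waveform itself. The function name `synthesis_marks` and the representation of pitch marks as sample positions are assumptions made for illustration and do not come from the patent.

```python
def synthesis_marks(analysis_marks, pitch_factor):
    """Place new pitch marks so that each local pitch period is divided by
    pitch_factor (1.10 corresponds to a 10% rise in pitch).

    analysis_marks: increasing sample positions of the original pitch marks.
    Returns a list of (new_position, index_of_nearest_original_mark); in an
    overlap-add step, the segment around that original mark would be copied
    to new_position.
    """
    if len(analysis_marks) < 2:
        raise ValueError("need at least two pitch marks")
    if pitch_factor <= 0:
        raise ValueError("pitch_factor must be positive")

    marks = []
    t = float(analysis_marks[0])
    end = analysis_marks[-1]
    i = 0  # index of the original mark nearest to t
    while t <= end:
        # Move i forward while the next original mark is at least as close.
        while (i + 1 < len(analysis_marks)
               and abs(analysis_marks[i + 1] - t) <= abs(analysis_marks[i] - t)):
            i += 1
        marks.append((round(t), i))
        # Local original period, shortened (or lengthened) by the factor.
        j = min(i, len(analysis_marks) - 2)
        period = analysis_marks[j + 1] - analysis_marks[j]
        t += period / pitch_factor
    return marks


original = [0, 100, 200, 300, 400]       # period of 100 samples
raised = synthesis_marks(original, 1.25)  # period becomes 80 samples
print(raised)
```

The new mark positions are exactly the values that the later speech information update can reuse as the modified pitch mark information.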

Next, in (S2), the speech information changing unit 5 changes the natural speech information 3 to match the pitch and length of the speech waveform data changed by the speech waveform changing unit 4. In this embodiment, because the pitch is raised by 10%, the pitch information is changed to values raised by 10%. Because a change in pitch also moves the pitch marks, the pitch mark information is updated to the new positions. For the new positions, the pitch mark position information used by the speech waveform changing unit 4 during signal processing can be reused. The changed speech information is temporarily stored as the modified speech information 7.
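The update in (S2) amounts to rescaling the stored annotations by the same factors that were applied to the waveform. The following is a minimal sketch under assumed data layouts: the `SpeechInfo` record, its field names, and `update_info` are hypothetical and are not defined in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechInfo:
    """Hypothetical container for the annotations of one utterance."""
    labels: tuple        # (phoneme, start_sec, end_sec) per phoneme
    f0_hz: tuple         # pitch (fundamental frequency) values
    pitch_marks: tuple   # pitch mark positions


def update_info(info, pitch_factor=1.0, length_factor=1.0, new_pitch_marks=None):
    """Return annotations consistent with a waveform whose pitch was scaled
    by pitch_factor and whose length was scaled by length_factor.

    new_pitch_marks: the mark positions produced during signal processing,
    reused here instead of being recomputed. If None, the marks are kept.
    """
    labels = tuple((p, s * length_factor, e * length_factor)
                   for p, s, e in info.labels)
    f0 = tuple(f * pitch_factor for f in info.f0_hz)
    marks = (tuple(new_pitch_marks) if new_pitch_marks is not None
             else info.pitch_marks)
    return SpeechInfo(labels, f0, marks)


natural = SpeechInfo(
    labels=(("a", 0.0, 0.1), ("k", 0.1, 0.15)),
    f0_hz=(200.0, 210.0),
    pitch_marks=(0.0, 0.005),
)
# The embodiment's case: pitch raised uniformly by 10%, length unchanged.
modified = update_info(natural, pitch_factor=1.10,
                       new_pitch_marks=(0.0, 0.0045))
```

Returning a new record rather than editing in place keeps the natural speech information intact, which matters because both the original and the modified annotations go into the database.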

Then, in (S3), the speech waveform database generation unit 8 constructs the speech waveform database 11 from the natural speech waveform data 2, the modified speech waveform data 6, the natural speech information 3, and the modified speech information 7. The contents of the natural speech information 3 and the modified speech information 7 are held as the speech information index 10, and pointer information to the corresponding speech waveform data is registered. When a speech synthesis engine synthesizes speech with the speech waveform database 11, it can thus search the speech waveform data 9 by the desired phoneme information, pitch information, and so on, and read out the speech waveform data needed for synthesis. In this embodiment, the speech waveform data of the finished speech waveform database 11 contains both the natural speech waveform data 2 and the modified speech waveform data 6 in which the pitch has been raised by 10%, so the database is twice as large.
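One way to picture (S3) is a flat waveform store plus an index whose entries point into it. The sketch below is an illustration only: `IndexEntry`, `build_database`, and `lookup`, as well as the nearest-pitch selection rule, are assumptions and are not specified by the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    phoneme: str
    f0_hz: float
    offset: int   # pointer: start position in the waveform store
    length: int   # number of samples


def build_database(units):
    """units: iterable of (phoneme, f0_hz, samples), covering both the
    natural and the prosody-modified speech units.
    Returns (waveform_store, index keyed by phoneme)."""
    waveform = []
    index = {}
    for phoneme, f0_hz, samples in units:
        entry = IndexEntry(phoneme, f0_hz, len(waveform), len(samples))
        waveform.extend(samples)
        index.setdefault(phoneme, []).append(entry)
    return waveform, index


def lookup(waveform, index, phoneme, target_f0_hz):
    """Read out the unit for `phoneme` whose pitch is closest to the target."""
    candidates = index.get(phoneme)
    if not candidates:
        return None
    best = min(candidates, key=lambda e: abs(e.f0_hz - target_f0_hz))
    return waveform[best.offset:best.offset + best.length]


natural_units = [("a", 200.0, [1, 2, 3])]
modified_units = [("a", 220.0, [4, 5])]   # pitch raised by 10%
store, idx = build_database(natural_units + modified_units)
print(lookup(store, idx, "a", 215.0))
```

With the modified unit in the store, a request for a pitch near 220 Hz is served by a stored waveform at synthesis time, without any signal processing in the synthesis path.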

As described above, according to the present invention, performing signal processing in advance based on the change amount parameter increases the variety of speech waveforms held in the speech waveform database. This raises the likelihood that the database contains a speech waveform whose pitch or length suits the text to be synthesized, so high-quality speech with natural intonation can be synthesized.
Because the prosody changes made by the signal processing that the speech waveform changing unit applies to the natural speech waveforms stay within a range that causes no sound quality degradation, high-quality speech can be synthesized without such degradation even though signal-processed speech waveforms are used.

In addition, because the signal processing of the speech waveforms is performed at the database construction stage, a signal processing technique that causes little sound quality degradation but requires heavy computation can be used. This makes high-quality speech synthesis with natural intonation and no sound quality degradation possible without increasing the time required to generate the synthesized speech.
The embodiment described above can also be implemented in the following variations.
For example, the change amount parameter need not specify a single change amount. Four pitch changes may be specified, such as a 5% rise, a 10% rise, a 5% fall, and a 10% fall; four sets of modified speech waveform data 6 and modified speech information 7 are then generated, one per change amount, and the speech waveform database 11 is built from them. Changes in pitch and changes in length may also be specified in combination. In these cases more pitch variations and duration variations exist for a given phoneme sequence, so the pitch transitions between speech units become smoother and the phoneme durations are more likely to sound natural, further improving the quality of the synthesized speech. The invention can also be implemented with signal processing methods other than PSOLA.
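The effect of specifying several change amounts on the number of variants and on database size can be sketched as follows. The helper names are hypothetical, and the size figure is simple arithmetic that extends the embodiment's statement that one variant doubles the database; it is not a number stated in the patent.

```python
from itertools import product


def variant_parameters(pitch_factors, length_factors=(1.0,)):
    """Enumerate (pitch_factor, length_factor) combinations, skipping the
    identity (1.0, 1.0), which is the natural speech itself."""
    return [(p, l) for p, l in product(pitch_factors, length_factors)
            if (p, l) != (1.0, 1.0)]


def database_scale(n_variants):
    """Size relative to the natural speech alone: one copy of the natural
    speech plus one copy per prosody-modified variant."""
    return 1 + n_variants


# Four pitch changes: 5% up, 10% up, 5% down, 10% down.
pitch_only = variant_parameters([1.05, 1.10, 0.95, 0.90])
print(len(pitch_only), database_scale(len(pitch_only)))

# Pitch and length changed in combination (illustrative factors).
combined = variant_parameters([1.0, 1.05, 0.95], [1.0, 1.10, 0.90])
print(len(combined), database_scale(len(combined)))
```

Since every added variant costs one more copy of the corpus in storage, the choice of change amounts trades database size against prosodic coverage.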

FIG. 1 is a diagram illustrating the embodiment. FIG. 2 is a diagram showing the operation flowchart of the embodiment.

Explanation of symbols

1 Speech waveform database construction apparatus
2 Natural speech waveform data
3 Natural speech information
4 Speech waveform changing unit
5 Speech information changing unit
6 Modified speech waveform data
7 Modified speech information
8 Speech waveform database generation unit
9 Speech waveform data
10 Speech information index
11 Speech waveform database

Claims

A speech waveform database construction method for constructing a speech waveform database used in a concatenative text-to-speech synthesizer, the method comprising:
executing a step of changing, by signal processing, the pitch, length, power, or other speech parameters of natural speech waveforms prepared in advance, based on a change amount parameter specified within a range in which the sound quality of the speech waveform does not degrade, to newly generate prosody-modified speech waveforms;
executing a step of generating new speech information by changing the phoneme label information indicating the utterance content and its position, the power information, the pitch information, the pitch mark information, or other speech information attached to the natural speech waveforms, in accordance with the amount of change applied in the step of generating the prosody-modified speech waveforms; and
executing a step of generating the speech waveform database using, in addition to the natural speech waveforms and their speech information, one or more kinds of prosody-modified speech waveforms derived from those natural speech waveforms and their speech information.

A speech waveform database construction apparatus for constructing a speech waveform database used in a concatenative text-to-speech synthesizer, the apparatus comprising:
a speech waveform changing unit that changes, by signal processing, the pitch, length, power, or other speech parameters of natural speech waveforms prepared in advance, based on a change amount parameter specified within a range in which the sound quality of the speech waveform does not degrade, to newly generate prosody-modified speech waveforms;
a speech information changing unit that generates new speech information by changing the phoneme label information indicating the utterance content and its position, the power information, the pitch information, the pitch mark information, or other speech information attached to the natural speech waveforms, in accordance with the amount of change applied in generating the prosody-modified speech waveforms; and
a speech waveform database generation unit that generates the speech waveform database using, in addition to the natural speech waveforms and their speech information, one or more kinds of prosody-modified speech waveforms derived from those natural speech waveforms and their speech information.

A speech waveform database construction program that instructs a computer to execute the steps described in claim 1 in the stated order.