JPH07319495A

JPH07319495A - Synthesis unit data generating system and method for voice synthesis device

Info

Publication number: JPH07319495A
Application number: JP6136401A
Authority: JP
Inventors: Otoya Shirotsuka; 音也城塚; Noriya Murakami; 憲也村上; Keiji Hayashi; 慶士林
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1994-05-26
Filing date: 1994-05-26
Publication date: 1995-12-08

Abstract

PURPOSE: To obtain synthesized speech with a desired voice quality in a speech synthesis device by using speech data from multiple people with different voice qualities. CONSTITUTION: Synthesis unit data obtained from the voices of multiple people are stored in a database 24. When the text to be synthesized and the desired voice quality are input, the synthesis unit data that is closest overall in phonological environment, power, pitch, and voice quality is selected for each synthesis unit making up the text. It is then checked whether the voice quality of the speaker of the selected synthesis unit data differs from the desired voice quality by more than a certain degree. If it does, voice quality conversion is applied to the selected synthesis unit data so that its voice quality matches the desired voice quality.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates generally to speech synthesis devices that automatically read digitized text aloud, used for purposes such as proofreading written text and reading by visually impaired people, and relates in particular to an improved technique for generating synthesis unit data.

[0002]

[Prior Art] In a conventional speech synthesis device, outputting high-quality synthesized speech required that each synthesis unit be given subcategories reflecting the phonological environment before and after that unit. For example, the word "Asahi" /asahi/ is written in synthesis units as /a/ /sa/ /hi/, and synthesizing it requires phoneme data for /sa/ that was used between the same preceding phoneme /a/ and following phoneme /hi/ as the /sa/ in /asahi/. The /sa/ in "Asahikawa" /asahikawa/, for instance, is in the same phonological environment as the /sa/ in /asahi/ and can be used to synthesize the word /asahi/. However, providing many subcategories for every synthesis unit required a large amount of speech data for creating the synthesis units. In addition, when the parameters required of the word to be synthesized, such as pitch and power, differ from the parameters of the synthesis unit data, the quality of the synthesized speech degrades markedly. Consequently, even synthesis units with the same phonological environment had to be held in several variants differing in power and pitch, which again required a large amount of data for creating the synthesis units.
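The /asahi/ and /asahikawa/ example can be illustrated with a minimal Python sketch. The helper name `contexts` and the list representation of a word as synthesis units are assumptions made for illustration and do not come from the patent.

```python
def contexts(units):
    """Yield (preceding, unit, following) for each synthesis unit in a word.

    None marks a word boundary. This representation is an illustrative
    assumption, not the patent's data format.
    """
    padded = [None] + list(units) + [None]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


asahi = ["a", "sa", "hi"]
asahikawa = ["a", "sa", "hi", "ka", "wa"]

# The /sa/ needed for /asahi/, together with its phonological environment.
target = next(c for c in contexts(asahi) if c[1] == "sa")

# Units recorded in /asahikawa/ that share exactly that environment.
reusable = [c for c in contexts(asahikawa) if c == target]
```

Here `reusable` contains the /sa/ of /asahikawa/, because it sits between the same preceding /a/ and following /hi/.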

[0003] Meanwhile, techniques for converting the voice quality of speech data have been reported. For example, Shin Abe, "Voice Conversion Technology," IEICE Technical Report, Vol. 93, No. 427, pp. 69-75, introduces voice quality conversion methods that focus on the sound source characteristics and the vocal tract characteristics of speech, respectively.

[0004]

[Problems to Be Solved by the Invention] For the reasons given in the preceding section, a speech synthesis device needs a large amount of speech data to create its synthesis units. One conceivable approach is to create synthesis units from speech data of multiple people recorded on various occasions and to concatenate them for synthesis. However, because voice quality differs from person to person, the quality of the synthesized speech then degrades. The speech data for creating synthesis units therefore had to be recorded from a single person over a long period, and the recording work required a great deal of labor.

[0005] Accordingly, an object of the present invention is to make it possible to select the speech data used for synthesis units from the voices of multiple people, concatenate them, and still obtain synthesized speech close to that obtained when synthesizing from the voice of a single person.

[0006]

[Means for Solving the Problems] The synthesis unit data generation system of the present invention comprises: a synthesis unit data holding unit for storing synthesis unit data obtained from the voices of multiple speakers; a synthesis unit data selection unit that receives information designating the synthesis unit and the voice quality to be used for speech synthesis and selects, from the synthesis unit data holding unit, the synthesis unit data closest to the designated synthesis unit; and a voice quality conversion unit that checks whether the voice quality of the speaker of the selected synthesis unit data differs from the designated voice quality by a predetermined degree or more and, if it does, performs voice quality conversion to bring the voice quality of the selected synthesis unit data closer to the designated voice quality.

[0007]

[Operation] Synthesis unit data obtained from the voices of multiple speakers is stored in a database in advance. When the synthesis unit and the voice quality to be used for speech synthesis are designated, the synthesis unit data closest to the designated synthesis unit is first selected from the database. Next, the degree to which the voice quality of the speaker of the selected synthesis unit data differs from the designated voice quality is checked, and if the difference is at or above a predetermined degree, voice quality conversion is applied to the synthesis unit data to bring it close to the designated voice quality. In this way, synthesis unit data with a single desired voice quality can be obtained from the speech data of various speakers with different voice qualities.

[0008]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0009] FIG. 1 shows the overall configuration of one embodiment of a sound quality conversion type speech synthesis device according to the present invention.

[0010] First, the configuration of the device is described. This sound quality conversion type speech synthesis device receives a sentence in mixed kanji and kana at input terminal 11, together with a designation of the voice quality to be synthesized (specified by gender, age group, voice pitch, and so on; hereinafter, the target voice quality). Preprocessing unit 12 analyzes the input sentence linguistically and obtains the phonological symbol sequence of each word in the sentence, the accent type of each word, and the sentence type of the input sentence. From the phonological symbol sequences, accent types, and sentence type thus obtained, selection criterion parameter setting unit 13 determines the synthesis units needed to synthesize the speech (hereinafter, target synthesis units), their phonological environment (preceding and following phonemes), and predetermined parameters (power and pitch).

[0011] Synthesis unit data selection unit 14 holds, for each synthesis unit, multiple items of synthesis unit data obtained in advance from the voices of multiple people, and selects the synthesis unit data that best matches the phonological environment and parameters of the target synthesis unit passed from selection criterion parameter setting unit 13 as well as the target voice quality. It then checks how far the voice quality of the selected synthesis unit data differs from the target voice quality and, if the difference exceeds a certain degree, converts the synthesis unit data to a voice quality similar to the target voice quality.

[0012] Synthesis unit data transformation unit 15 transforms the selected synthesis unit data to match the parameters of the target synthesis unit. Synthesis unit data joining unit 16 joins the transformed synthesis unit data to synthesize word data. Output terminal 17 converts the synthesized word data into speech and outputs it.

[0013] Next, the configuration of synthesis unit data selection unit 14, which is the distinctive feature of this speech synthesis device, is described in detail with reference to FIG. 2.

[0014] In FIG. 2, likelihood calculation unit 21 uses the synthesis unit name, phonological environment, parameters, and target voice quality of the target synthesis unit passed from selection criterion parameter setting unit 13 to determine the identification number (hereinafter, data ID) of the synthesis unit data that best matches the target synthesis unit in these respects. It does so by referring to synthesis unit data information table 22 and personal codebook storage unit 25.

[0015] As shown in FIG. 3, synthesis unit data information table 22 records, for every item of synthesis unit data stored in the system, its synthesis unit name, phonological environment (preceding phoneme, following phoneme), parameters (power, pitch), speaker identification number (hereinafter, speaker ID), and data ID. Personal codebook storage unit 25 stores, for every speaker of the synthesis unit data stored in the system, a speaker codebook that records the speaker ID together with information representing the characteristics of that speaker's voice quality.
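The table and codebook store described above might be modeled as follows. This is only a sketch: the field names, the sample values, the units of power and pitch, and the representation of a codebook as a list of feature vectors are assumptions, since the contents of FIG. 3 are not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitRecord:
    """One row of the synthesis unit data information table (hypothetical layout)."""
    unit_name: str    # synthesis unit name, e.g. "sa"
    preceding: str    # preceding phoneme
    following: str    # following phoneme
    power: float      # power parameter (unit not given in the source)
    pitch: float      # pitch parameter (unit not given in the source)
    speaker_id: str   # speaker ID
    data_id: int      # data ID used to fetch the actual unit data


# Illustrative rows only; the values are invented for this sketch.
table = [
    UnitRecord("sa", "a", "hi", 60.0, 120.0, "spk01", 1),
    UnitRecord("sa", "a", "ka", 58.0, 210.0, "spk02", 2),
    UnitRecord("hi", "sa", "ka", 55.0, 125.0, "spk01", 3),
]

# Speaker ID -> codebook of voice-quality feature vectors (assumed form).
speaker_codebooks = {
    "spk01": [(0.2, 0.1), (0.4, 0.3)],
    "spk02": [(0.8, 0.6), (0.9, 0.7)],
}

# All stored candidates for the synthesis unit /sa/.
candidates = [row for row in table if row.unit_name == "sa"]
```

Keeping the speaker ID in each row is what lets the selection step look up the speaker's voice quality information in the codebook store.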

[0016] The processing of likelihood calculation unit 21 is as follows. First, it retrieves the phonological environment and parameters of each item of synthesis unit data from synthesis unit data information table 22, and retrieves information on the voice quality of the speaker of each item from personal codebook storage unit 25. Next, it substitutes the phonological environment, parameters, and target voice quality information of the target synthesis unit passed from selection criterion parameter setting unit 13, together with the phonological environment, parameters, and voice quality information of each item of synthesis unit data just retrieved, into the evaluation function shown below to obtain the distance D between the target synthesis unit and each item of synthesis unit data.

[0017] D = W1·Pre(s1, t1) + W2·Post(s2, t2) + W3·Pow(s3, t3) + W4·Pit(s4, t4) + W5·Person(s5, t5)

Here, s1 to s5 are the phonological environment, parameters, and target voice quality information of the target synthesis unit; t1 to t5 are the phonological environment, parameters, and voice quality information of the synthesis unit data; Pre() is the distance function for the preceding phoneme; Post() is the distance function for the following phoneme; Pow() is the distance function for power; Pit() is the distance function for pitch; Person() is the distance function for voice quality information; and W1 to W5 are weighting coefficients. This evaluation function computes the distance between the target synthesis unit and the synthesis unit data for each of the preceding phoneme, following phoneme, power, pitch, and voice quality, multiplies each distance by a weighting coefficient that reflects its influence on the quality of the synthesized speech, and sums the results.
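The evaluation function can be sketched in Python as below. The patent does not define the individual distance functions or the weight values, so the choices here (0/1 mismatch for phonemes, absolute difference for power and pitch, Euclidean distance between voice-quality vectors) and all numbers are assumptions for illustration.

```python
import math


def phoneme_distance(s, t):
    """Assumed Pre()/Post(): 0 when the phonemes match, 1 otherwise."""
    return 0.0 if s == t else 1.0


def scalar_distance(s, t):
    """Assumed Pow()/Pit(): absolute difference of the two values."""
    return abs(s - t)


def voice_distance(s, t):
    """Assumed Person(): Euclidean distance between voice-quality vectors."""
    return math.dist(s, t)


def evaluate(target, candidate, weights):
    """Weighted sum D of the five component distances (W1..W5)."""
    w1, w2, w3, w4, w5 = weights
    return (
        w1 * phoneme_distance(target["pre"], candidate["pre"])
        + w2 * phoneme_distance(target["post"], candidate["post"])
        + w3 * scalar_distance(target["power"], candidate["power"])
        + w4 * scalar_distance(target["pitch"], candidate["pitch"])
        + w5 * voice_distance(target["voice"], candidate["voice"])
    )


target = {"pre": "a", "post": "hi", "power": 60.0, "pitch": 120.0, "voice": (0.0, 0.0)}
candidate = {"pre": "a", "post": "ki", "power": 62.0, "pitch": 110.0, "voice": (3.0, 4.0)}
weights = (1.0, 1.0, 0.5, 0.1, 2.0)  # hypothetical W1..W5

d = evaluate(target, candidate, weights)
```

With these illustrative inputs the five weighted terms are 0, 1, 1, 1, and 10, so `d` evaluates to 13.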

[0018] Likelihood calculation unit 21 computes this distance D for all synthesis unit data stored in the system and selects the synthesis unit data with the smallest distance D, that is, the highest degree of match (likelihood), as the data to be used for speech synthesis. It then passes the data ID and speaker ID of that synthesis unit data, together with the voice quality distance (degree of match) obtained during the calculation, to voice quality conversion unit 23.
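A minimal sketch of this selection step follows. It assumes the five component distances have already been computed for each stored item; the row layout, the function name `select_best`, and the numbers are hypothetical.

```python
def select_best(rows, weights):
    """Pick the row with the smallest weighted distance D.

    Each row is a dict with 'data_id', 'speaker_id', and 'dist', a tuple of
    the five component distances (preceding phoneme, following phoneme,
    power, pitch, voice quality). Returns the values that would be handed
    to the voice quality conversion stage.
    """
    def total(row):
        return sum(w * d for w, d in zip(weights, row["dist"]))

    best = min(rows, key=total)
    return best["data_id"], best["speaker_id"], best["dist"][4]


rows = [
    {"data_id": 1, "speaker_id": "spk01", "dist": (0.0, 1.0, 2.0, 10.0, 5.0)},
    {"data_id": 2, "speaker_id": "spk02", "dist": (0.0, 0.0, 1.0, 5.0, 8.0)},
]
weights = (1.0, 1.0, 0.5, 0.1, 1.0)  # hypothetical W1..W5

data_id, speaker_id, voice_dist = select_best(rows, weights)
```

Row 1 has a total distance of 8 and row 2 a total of 9, so row 1 is selected and its voice quality distance of 5.0 is passed on.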

[0019] Based on the data ID and speaker ID passed from likelihood calculation unit 21, voice quality conversion unit 23 retrieves the actual synthesis unit data they identify from per-speaker synthesis unit database 24. Per-speaker synthesis unit database 24 stores, for each speaker, the synthesis unit data obtained from that speaker's recorded speech. Voice quality conversion unit 23 then checks whether the degree of voice quality match passed from likelihood calculation unit 21 is lower than a predetermined threshold. If it is (that is, if the voice qualities differ considerably), it converts the voice quality of the retrieved synthesis unit data so that it becomes nearly the same as the designated voice quality.

[0020] As the voice quality conversion technique, for example, the method using speaker codebooks that take vocal tract characteristics into account, described in the "Voice Conversion Technology" report cited above, is adopted. Specifically, the codebook of the speaker corresponding to the above speaker ID (hereinafter, the original codebook) and the codebook of a speaker whose voice quality matches the target voice quality information (hereinafter, the target codebook) are retrieved from personal codebook storage unit 25, and the voice quality of the synthesis unit data is converted to the target voice quality by performing codebook mapping from the original codebook to the target codebook.
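The general idea of codebook mapping can be sketched as follows. This is a simplified illustration, not the method of the cited report: it assumes that codeword i of the original codebook corresponds directly to codeword i of the target codebook, and it treats the unit data as a list of feature frames. All names and values are hypothetical.

```python
import math


def nearest_index(frame, codebook):
    """Index of the codeword closest to the frame (vector quantization)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


def map_frames(frames, original_codebook, target_codebook):
    """Replace each frame by the target codeword paired with its nearest
    original codeword. Assumes the two codebooks are index-aligned."""
    return [target_codebook[nearest_index(f, original_codebook)] for f in frames]


original_codebook = [(0.0, 0.0), (1.0, 1.0)]
target_codebook = [(0.5, 0.0), (1.5, 1.2)]
frames = [(0.1, -0.1), (0.9, 1.2)]

converted = map_frames(frames, original_codebook, target_codebook)
```

Each input frame is first quantized against the original speaker's codebook and then re-expressed with the corresponding entry of the target speaker's codebook.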

[0021] Various other voice quality conversion techniques are also known and may be adopted instead. For example, the method described in Kasuya and Yang, "Voice Quality Conversion Considering Sound Source Characteristics" (Proceedings of the Acoustical Society of Japan, 1-6-18, pp. 225-226, 1991) can be used.

[0022] With the above processing, speech with a desired voice quality can be synthesized using speech data recorded from multiple speakers. It is therefore no longer necessary to record a large amount of speech from one person, and speech already recorded from various people can be gathered and reused.

[0023] The present invention can be implemented in various forms other than the embodiment described above. For example, by making the weighting coefficient W5 for the voice quality match score in the above evaluation function suitably large, synthesis unit data from speakers whose voice quality differs greatly can be kept from being selected. This also makes it possible to omit voice quality conversion unit 23.
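The effect of enlarging W5 can be seen in a small sketch. The component distances and weight values below are invented for illustration, and the helper `pick` is a hypothetical name.

```python
def pick(rows, weights):
    """Return the data_id of the row with the smallest weighted distance."""
    return min(
        rows,
        key=lambda r: sum(w * d for w, d in zip(weights, r["dist"])),
    )["data_id"]


rows = [
    # Perfect phonological match, but a very different voice quality.
    {"data_id": 1, "dist": (0.0, 0.0, 0.0, 0.0, 8.0)},
    # Imperfect phonological match, but a voice close to the target.
    {"data_id": 2, "dist": (1.0, 1.0, 0.0, 0.0, 1.0)},
]

small_w5 = (1.0, 1.0, 1.0, 1.0, 0.1)
large_w5 = (1.0, 1.0, 1.0, 1.0, 1.0)

chosen_small = pick(rows, small_w5)  # totals: 0.8 vs 2.1
chosen_large = pick(rows, large_w5)  # totals: 8.0 vs 3.0
```

With the small W5 the distant-voice unit wins on phonological fit. With the larger W5 the unit from the closer voice is chosen, which is why a conversion stage could then be left out.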

[0024] Synthesis unit data selection unit 14 can also be applied as a synthesis unit database creation device that creates, from the speech data of multiple speakers, a synthesis unit database consisting of synthesis unit data with the voice quality of a single speaker. When this database is used, the speech synthesis device can omit the speaker ID from the synthesis unit data information table in the configuration of the embodiment, and can also omit the voice quality conversion unit and the personal codebook storage unit. However, when selecting synthesis unit data, it is better to take into account the voice quality (speaker) match (distance) calculated when the database was created, because this avoids selecting synthesis unit data whose quality was greatly degraded by voice quality conversion at database creation. This distance should therefore be recorded in the synthesis unit data information table and introduced into the evaluation function.

[0025]

[Effects of the Invention] According to the present invention, speech data recorded from multiple people is stored as synthesis unit data, synthesis unit data close to the synthesis unit to be used for synthesis is selected, and, if its voice quality differs from the voice quality to be synthesized by more than a certain degree, the voice quality of the selected synthesis unit data is converted to the target voice quality. Synthesized speech with a desired voice quality can thus be obtained using the speech data of multiple people. It is therefore no longer necessary to record a large amount of speech data from one person, which reduces the labor of recording.

[0026] In addition, because speech data already recorded from multiple people can be reused, the invention also reduces the work of creating the synthesis unit database.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of one embodiment of the voice quality conversion type speech synthesis device of the present invention.

FIG. 2 is a block diagram showing the detailed configuration of the synthesis unit data selection unit in FIG. 1.

FIG. 3 is a diagram showing an example of the synthesis unit data information table in FIG. 2.

[Explanation of symbols]

11: input terminal
12: preprocessing unit
13: selection criterion parameter setting unit
14: synthesis unit data selection unit
15: synthesis unit transformation unit
16: synthesis unit joining unit
17: output terminal
21: likelihood calculation unit
22: synthesis unit data information table
23: voice quality conversion unit
24: per-speaker synthesis unit database
25: personal codebook storage unit

Claims

1. A synthesis unit data generation system for a speech synthesis device, comprising: a synthesis unit data holding unit for storing synthesis unit data obtained from the voices of a plurality of speakers; a synthesis unit data selection unit that receives information designating a synthesis unit to be used for speech synthesis and a voice quality, and selects, from the synthesis unit data holding unit, the synthesis unit data closest to the designated synthesis unit; and a voice quality conversion unit that checks whether the voice quality of the speaker of the selected synthesis unit data differs from the designated voice quality by a predetermined degree or more and, if it does, performs voice quality conversion to bring the voice quality of the selected synthesis unit data closer to the designated voice quality.

2. A synthesis unit data generation method for a speech synthesis device, comprising: a step of designating a synthesis unit to be used for speech synthesis and a voice quality; a step of selecting, from synthesis unit data obtained from the voices of a plurality of speakers and stored in advance, the synthesis unit data having the highest degree of match with the designated synthesis unit; and a step of checking whether the voice quality of the speaker of the selected synthesis unit data differs from the designated voice quality by a predetermined degree or more and, if it does, performing voice quality conversion to bring the voice quality of the selected synthesis unit data closer to the designated voice quality.