JP3495275B2

JP3495275B2 - Speech synthesizer

Info

Publication number: JP3495275B2
Application number: JP36981498A
Authority: JP
Inventors: 充海老原; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-12-25
Filing date: 1998-12-25
Publication date: 2004-02-09
Anticipated expiration: 2018-12-25
Also published as: JP2000194388A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesizer that converts arbitrary input text into speech.

[0002]

2. Description of the Related Art

Speech synthesis technology, in particular synthesis by rule, converts text consisting of an arbitrary character string into speech and presents it. It is used in applications such as voice-based information services, reading out e-mail, reading devices for people with disabilities, and newspaper proofreading.

[0003] The structure of a text-to-speech system based on general synthesis by rule is described in Furui, "Digital Speech Processing" (University of Tokyo Press, 1985), p. 146. The system consists of three modules: a text analysis unit, a rule-based speech synthesis unit, and a speech synthesis unit.

[0004] The text analysis unit performs morphological analysis, consulting a dictionary to divide the input Japanese text into units called morphemes. Each morpheme is assigned a reading, an accent type, a part of speech, and so on.

[0005] The rule-based speech synthesis unit in turn comprises an acoustic processing unit and a prosody generation unit. From the readings and accents obtained by the text analysis unit, the acoustic processing unit generates acoustic parameters, which consist either of features obtained by analyzing the speech waveform, such as LSP (line spectral pair) coefficients or mel-cepstra, or of the speech waveform itself. The prosody generation unit generates prosodic parameters such as pitch, pauses, and durations according to prosody rules.

[0006] The speech synthesis unit generates and outputs synthesized speech from the acoustic parameters and the prosodic parameters. When the acoustic parameters are features such as LSP, the synthesized speech is generated by an analysis-synthesis method called a vocoder; when they are speech waveforms, synthesis uses a method called PSOLA (Pitch Synchronous OverLap-and-Add).
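As a rough illustration of the overlap-add step at the core of PSOLA, the sketch below centers one-pitch waveforms on synthesis pitch marks spaced by the target pitch periods, tapers each with a Hann window, and sums the overlaps. It is a simplified sketch under stated assumptions, not a complete TD-PSOLA implementation: the pitch waveforms are assumed to be already extracted, and the names `hann` and `overlap_add` are hypothetical.

```python
import math

def hann(n):
    # Symmetric Hann window of length n; a window shorter than 2 is left flat.
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def overlap_add(pitch_waveforms, periods, length):
    """Overlap-add windowed pitch waveforms at successive synthesis pitch marks.

    pitch_waveforms: list of sample lists, one per output pitch period
    periods: spacing in samples between consecutive pitch marks
    length: total number of output samples
    """
    out = [0.0] * length
    mark = 0
    for wave, period in zip(pitch_waveforms, periods):
        win = hann(len(wave))
        start = mark - len(wave) // 2  # center the segment on the pitch mark
        for i, (s, w) in enumerate(zip(wave, win)):
            t = start + i
            if 0 <= t < length:  # drop samples that fall outside the output
                out[t] += s * w
        mark += period  # the next mark is one target pitch period later
    return out
```

Shortening or lengthening the entries of `periods` moves the pitch marks closer together or farther apart, which is how this family of methods imposes a new pitch on stored waveforms.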

[0007] Because PSOLA uses waveforms directly, it enables highly natural speech synthesis; however, waveform information is far larger than acoustic parameters such as LSP, so the required storage is large. Methods that compress the waveform information, for example by vector quantization, before storing it have therefore been studied. One such prior technique is Japanese Patent Laid-Open Publication No. 5-73100 (JP-A-5-73100), "Speech Synthesis Method and Apparatus" (hereinafter, Document 1).
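The vector quantization referred to here can be illustrated with a minimal nearest-neighbour encoder: each vector is replaced by the index of the closest code vector, and decoding is a table lookup. This sketch assumes squared Euclidean distance and a codebook stored as a list of equal-length vectors; the distortion measure actually used in Document 1 is not stated here, and `encode`/`decode` are hypothetical names.

```python
def nearest_code(vector, codebook):
    # Index of the code vector with the smallest squared Euclidean distance.
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vector, codebook[k])))

def encode(vectors, codebook):
    # Replace each vector by its code (an integer index into the codebook).
    return [nearest_code(v, codebook) for v in vectors]

def decode(codes, codebook):
    # Decoding is a plain table lookup of the stored code vectors.
    return [codebook[c] for c in codes]
```

With a codebook of, say, 250 entries, each stored vector shrinks to an index that fits in one byte, which is the source of the storage saving.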

[0008] FIG. 17 shows an example configuration of a speech synthesizer based on the method of Document 1 (hereinafter, the first conventional example). In the figure, 1 is text, 2 is a language processing unit, 3 is a phoneme string, 4 is accent information, 5 is a prosody generation unit, 6 is a pitch frequency, 11 is a speech unit database, 24 is a codebook, 10 is a unit reading unit, 7 is a speech unit, 16 is a codebook reference unit, 13 is a synthesis parameter, 14 is a synthesis unit, and 15 is synthesized speech.

[0009] The speech synthesizer of the first conventional example operates as follows. The language processing unit 2 obtains, from the input text 1, a phoneme string 3 representing its reading, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. The speech units 7 were obtained in advance by vector quantization, and each records a code string corresponding to a waveform sequence or a spectrum sequence.

[0010] The codebook 24 consists of pairs of patterns (code vectors) and codes, obtained by partitioning spectral data of speech waveforms with an existing clustering method. The code vectors may be either spectral information or the original waveform information. The codebook reference unit 16 looks up the codebook 24 with the code string of the speech unit 7 to obtain a synthesis parameter 13 consisting of spectral or waveform information. The prosody generation unit 5 generates the pitch 6 of the synthesized speech by rule from the accent information 4.

[0011] The synthesis unit 14 generates synthesized speech 15 from the synthesis parameter 13 and the pitch 6. It uses a digital-filter synthesis method if the synthesis parameter 13 is spectral information, and the PSOLA method if it is waveform information.

[0012]

Problems to Be Solved by the Invention

The first conventional technique can produce synthesized speech that is more natural than that of the vocoder approach. However, because the same pitch waveform is repeated within a frame in voiced sections, the naturalness of the synthesized speech suffers: repeating a single waveform removes the fluctuation components present in natural speech waveforms, so the synthesized sound becomes noticeably buzzy and unnatural. To address this problem, a method has also been proposed that separates the speech waveform into a stationary periodic component and a non-stationary component containing the fluctuations, and controls the two separately to generate the synthesized speech waveform. One such prior technique is disclosed in "Speech Synthesizer", Japanese Patent Application No. 4-358200 (hereinafter, Document 2).

[0013] FIG. 18 shows an example configuration of a speech synthesizer based on the method of Document 2 (hereinafter, the second conventional example). In the figure, 30 is a one-pitch waveform storage unit, 31 is a pitch storage unit, 32 is a non-stationary waveform storage unit, 33 is a one-pitch waveform, 6 is a pitch, 34 is a non-stationary waveform, 35 is a moving addition unit, 36 is stationary synthesized speech, 37 is a simple addition unit, and 15 is synthesized speech.

[0014] The speech synthesizer of the second conventional example operates as follows. The one-pitch waveform storage unit 30 stores the low-band speech waveform, obtained by band-splitting natural speech, pitch period by pitch period as one-pitch waveforms 33. The non-stationary waveform storage unit 32 stores the high-band speech waveform obtained from the same band split as a non-stationary waveform 34. The moving addition unit 35 overlap-adds the one-pitch waveform 33 at intervals of the pitch 6 obtained from the pitch storage unit 31. The simple addition unit 37 adds the stationary synthesized speech 36 produced by the moving addition unit 35 to the non-stationary waveform 34 read from the non-stationary waveform storage unit 32, and outputs the sum as synthesized speech 15.
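The signal flow of the second conventional example can be sketched as follows: the stored low-band one-pitch waveform is overlap-added at the pitch interval (the moving addition), and the stored high-band non-stationary waveform is then added sample by sample (the simple addition). The constant pitch period and the names `moving_add` and `simple_add` are simplifying assumptions for illustration.

```python
def moving_add(one_pitch, period, length):
    # Overlap-add copies of a single low-band pitch waveform every `period`
    # samples to form the stationary synthesized waveform.
    out = [0.0] * length
    for start in range(0, length, period):
        for i, s in enumerate(one_pitch):
            if start + i < length:
                out[start + i] += s
    return out

def simple_add(stationary, nonstationary):
    # Sample-by-sample sum of the stationary and stored non-stationary waveforms.
    return [a + b for a, b in zip(stationary, nonstationary)]
```

Because `nonstationary` must be as long as the output itself, the whole high-band waveform has to be stored, which is the storage cost noted for this technique.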

[0015] The second conventional technique aims at waveform-based speech synthesis that takes the fluctuation components of the speech waveform into account. However, because pitch-synchronous waveform processing is applied only in the low-frequency band, high-quality synthesized speech cannot be obtained; moreover, because the waveform information is stored in its entirety, the required storage is large.

[0016] An object of the present invention is to overcome these problems by providing a speech synthesizer that separates each one-pitch waveform into a low-frequency component and a high-frequency component and controls them separately, which makes it possible to generate the fluctuation components that affect the naturalness of synthesized speech, and that uses codebooks to achieve high-quality synthesized speech with a small storage capacity.

[0017]

Means for Solving the Problems

A speech synthesizer according to the present invention comprises: a speech unit database that stores speech units; a low-frequency component codebook, which is a vector-quantization codebook of the low-frequency component waveforms obtained when voiced speech waveforms are separated into low-frequency and high-frequency components; a high-frequency component codebook, which is a vector-quantization codebook of the corresponding high-frequency component waveforms; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads speech units from the speech unit database according to the phoneme string; a codebook reference unit that, based on the speech units read by the unit reading unit, selects a low-frequency component waveform from the low-frequency component codebook and a high-frequency component waveform from the high-frequency component codebook; an addition unit that adds the selected low-frequency and high-frequency component waveforms to obtain a synthesis parameter; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameter and the pitch frequency.

[0018] Further, the codebook reference unit selects a different high-frequency component waveform from the high-frequency component codebook for each pitch period.

[0019] Further, the addition unit is a moving addition unit that, when placing the low-frequency component waveform on the time axis, changes the position at which the high-frequency component waveform is placed relative to the reference position for each pitch period, and adds the low-frequency and high-frequency component waveforms.

[0020] Further, the addition unit is a prosody-controlled moving addition unit that varies the average change width of the placement position of the high-frequency component waveform, relative to the reference position of the low-frequency component waveform, according to the input pitch or power, and adds the low-frequency and high-frequency component waveforms.
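A minimal sketch of the moving addition of [0019] and [0020]: for each pitch period the high-frequency waveform is added at an offset from the low-frequency reference position, with a fresh offset drawn each period. The uniform integer offset within plus or minus `max_shift` and the pitch-to-width mapping in `shift_width_from_pitch` are hypothetical choices; this section does not specify how the offsets or the average change width are derived.

```python
import random

def shifted_add(low, high, offset):
    # Add `high` onto `low`, starting `offset` samples after the low-frequency
    # reference position (negative offsets start earlier); samples that fall
    # outside the one-pitch frame are dropped.
    out = list(low)
    for i, s in enumerate(high):
        t = i + offset
        if 0 <= t < len(out):
            out[t] += s
    return out

def moving_add_per_pitch(low, highs, max_shift, rng):
    # One output pitch waveform per high-frequency waveform, each with a newly
    # drawn integer offset in [-max_shift, max_shift].
    return [shifted_add(low, h, rng.randint(-max_shift, max_shift)) for h in highs]

def shift_width_from_pitch(pitch_hz, base=2, ref_hz=120.0):
    # Hypothetical mapping: the shift width grows as the pitch falls below ref_hz.
    return max(0, round(base * ref_hz / pitch_hz))
```

Passing `shift_width_from_pitch(pitch)` as `max_shift` ties the spread of the offsets to the input pitch, which is the kind of prosody control [0020] describes; a power-based mapping could be substituted in the same place.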

[0021] Further, the addition unit is an amplitude-ratio-controlled addition unit that changes, for each pitch period, the amplitude ratio between the low-frequency and high-frequency component waveforms being added, and adds the two waveforms.

[0022] Further, the addition unit is an amplitude-ratio pitch-controlled moving addition unit that changes the amplitude ratio between the low-frequency and high-frequency component waveforms according to the input pitch or power, and adds the two waveforms.
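The amplitude-ratio control of [0021] and [0022] amounts to scaling one component before the addition. The sketch below scales the high-frequency waveform by a ratio relative to the low-frequency waveform and derives that ratio from frame power through a clamped linear mapping. The mapping, its default range, and the function names are hypothetical; the text states only that the ratio varies per pitch period or with the input pitch or power.

```python
def ratio_add(low, high, ratio):
    # Sum the two components with the high-frequency waveform scaled by
    # `ratio` relative to the low-frequency waveform.
    return [a + ratio * b for a, b in zip(low, high)]

def ratio_from_power(power, p_min, p_max, r_min=0.5, r_max=1.0):
    # Hypothetical linear mapping from frame power to amplitude ratio,
    # clamped to [r_min, r_max]; the source does not give the actual rule.
    if p_max <= p_min:
        return r_max
    x = min(1.0, max(0.0, (power - p_min) / (p_max - p_min)))
    return r_min + x * (r_max - r_min)
```

Passing a different `ratio` for each pitch period gives the per-pitch variant of [0021]; computing it with `ratio_from_power` gives a power-controlled variant in the spirit of [0022].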

[0023] Another speech synthesizer according to the present invention comprises: a speech unit database with frequency information, which stores speech units together with codebook appearance frequencies; a codebook, which is a vector-quantization codebook of voiced speech waveforms; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads the speech units stored in the speech unit database with frequency information together with their appearance frequencies; a codebook reference unit that looks up the codebook for each speech unit according to the appearance frequencies to obtain a synthesis parameter; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameter and the pitch frequency.

[0024] Further, the codebook reference unit computes the appearance ratio of each code vector from the appearance frequencies recorded in the speech unit, selects several of the codes described in the speech unit in descending order of appearance ratio, looks them up in the codebook, weights each retrieved waveform according to its appearance ratio, and adds them to obtain the synthesis parameter.
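The weighted codebook lookup of [0024] can be sketched as follows: appearance counts are converted into ratios, the codes with the highest ratios are kept, and their code vectors are summed with weights proportional to those ratios. Renormalizing the weights over the selected codes so that they sum to one is an assumption, as are the dictionary layout and the name `weighted_lookup`.

```python
def weighted_lookup(freqs, codebook, n_best):
    """Combine the n_best most frequent code vectors of one speech unit.

    freqs: dict mapping code -> appearance count recorded with the speech unit
    codebook: dict (or list) mapping code -> waveform samples
    """
    total = sum(freqs.values())
    ratios = {code: count / total for code, count in freqs.items()}
    # Highest appearance ratio first; ties are broken by the smaller code.
    best = sorted(ratios, key=lambda code: (-ratios[code], code))[:n_best]
    norm = sum(ratios[code] for code in best)  # assumption: renormalize weights
    out = [0.0] * len(codebook[best[0]])
    for code in best:
        weight = ratios[code] / norm
        for i, s in enumerate(codebook[code]):
            out[i] += weight * s
    return out
```

With `n_best=1` this reduces to ordinary single-code lookup, so the number of blended codes acts as a knob between the plain codebook method and a smoother weighted average.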

[0025] Further, the codebook consists of a low-frequency component codebook, which is a vector-quantization codebook of the low-frequency component waveforms obtained when voiced speech waveforms are separated into low-frequency and high-frequency components, and a high-frequency component codebook, which is the corresponding vector-quantization codebook of the high-frequency component waveforms. The speech unit database with frequency information stores, together with each speech unit, the appearance frequencies of the low-frequency code vectors and of the high-frequency code vectors. The codebook reference unit computes the appearance ratios of the low-frequency and high-frequency code vectors from the appearance frequencies recorded in the speech unit, selects several of the codes described in the speech unit in descending order of appearance ratio, looks them up in the low-frequency and high-frequency codebooks, weights each retrieved waveform according to its appearance ratio, and adds them to obtain the synthesis parameter.

[0026] Another speech synthesizer according to the present invention comprises: a linear-sum-representation speech unit database, which stores speech unit strings as linear sums of the code vectors of a codebook; a codebook, which is a vector-quantization codebook of voiced speech waveforms; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads speech unit strings from the linear-sum-representation speech unit database according to the phoneme string; a codebook reference unit that, for each speech unit string, computes a linear sum of the waveforms obtained by looking up the codebook, using the coefficients stored in the linear-sum-representation speech unit database, to obtain a synthesis parameter; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameter and the pitch frequency.

[0027] Further, the speech synthesizer has a random number generator that generates random numbers, and the codebook reference unit is a random-number codebook reference unit that adds random numbers from the random number generator to the coefficients stored in the linear-sum-representation speech unit database and computes the linear sum of the waveforms obtained by looking up the codebook, using these coefficients, to obtain the synthesis parameter.
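A minimal sketch of the linear-sum reconstruction of [0026] with the random perturbation of [0027]: each pitch waveform is rebuilt from (code, coefficient) pairs, and when a random generator is supplied, each coefficient is jittered before summation. The uniform distribution, the `noise` amplitude, and the name `linear_sum` are assumptions for illustration only.

```python
import random

def linear_sum(terms, codebook, rng=None, noise=0.0):
    """Rebuild one pitch waveform as a weighted sum of code vectors.

    terms: list of (code, coefficient) pairs stored for the speech unit
    rng: optional random.Random; if given, each coefficient is perturbed by a
         uniform random number in [-noise, noise] (assumed distribution)
    """
    out = [0.0] * len(codebook[terms[0][0]])
    for code, coef in terms:
        if rng is not None:
            coef += rng.uniform(-noise, noise)
        for i, s in enumerate(codebook[code]):
            out[i] += coef * s
    return out
```

The variant of [0030], which perturbs only the high-frequency coefficients, would correspond to calling `linear_sum` with `rng` for the high-frequency terms only and adding the result to an unperturbed low-frequency sum.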

[0028] Further, the codebook consists of a low-frequency component codebook and a high-frequency component codebook, which are the vector-quantization codebooks of the low-frequency and high-frequency component waveforms, respectively, obtained when voiced speech waveforms are separated into low-frequency and high-frequency components. The linear-sum-representation speech unit database stores speech unit strings as linear sums of the code vectors of the low-frequency and high-frequency component codebooks, and the codebook reference unit computes a linear sum of the waveforms obtained by looking up the low-frequency and high-frequency component codebooks, using the coefficients stored in the linear-sum-representation speech unit database, to obtain the synthesis parameter.

[0029] Further, the codebook consists of a low-frequency component codebook and a high-frequency component codebook, which are the vector-quantization codebooks of the low-frequency and high-frequency component waveforms, respectively, obtained when voiced speech waveforms are separated into low-frequency and high-frequency components. The random-number codebook reference unit adds random numbers from the random number generator to the coefficients stored in the linear-sum-representation speech unit database, and computes a linear sum of the waveforms obtained by looking up the low-frequency and high-frequency component codebooks, using these coefficients, to obtain the synthesis parameter.

[0030] Furthermore, the codebook consists of a low-frequency component codebook and a high-frequency component codebook, which are the vector-quantization codebooks of the low-frequency and high-frequency component waveforms, respectively, obtained when voiced speech waveforms are separated into low-frequency and high-frequency components. The random-number codebook reference unit adds random numbers from the random number generator to the coefficients of the high-frequency component codes stored in the linear-sum-representation speech unit database, and computes a linear sum of the waveforms obtained by looking up the low-frequency and high-frequency component codebooks, using these coefficients, to obtain the synthesis parameter.

[0031]

Embodiments of the Invention

Embodiment 1. FIG. 1 is a block diagram showing a speech synthesizer of the present invention. In the figure, 1 is text, 2 is a language processing unit, 3 is a phoneme string, 4 is accent information, 5 is a prosody generation unit, 6 is a pitch frequency, 11 is a speech unit database, 8 is a low-frequency component codebook, 9 is a high-frequency component codebook, 10 is a unit reading unit, 7 is a speech unit, 12 is a codebook reference addition unit serving as both the codebook reference unit and the addition unit, 13 is a synthesis parameter, 14 is a synthesis unit, and 15 is synthesized speech.

[0032] Next, the operation will be described. The language processing unit 2 outputs a phoneme string 3 representing the reading of the input text 1, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. Each speech unit 7 read out is a unit such as a VCV (vowel-consonant-vowel) or CV (consonant-vowel) sequence; it was obtained in advance by vector quantization and records code strings corresponding to the low-frequency component waveform sequence and the high-frequency component waveform sequence, respectively. The method of creating the speech units is described later.

[0033] The low-frequency component codebook 8 consists of pairs of patterns (code vectors) and codes, obtained by partitioning spectral data of low-frequency component speech waveforms with an existing clustering method. The high-frequency component codebook 9 likewise consists of pairs of patterns (code vectors) and codes obtained by the same method from spectral data of high-frequency component speech waveforms. Each code vector of the low-frequency component codebook 8 and the high-frequency component codebook 9 is a speech waveform one pitch period long. The method of creating the codebooks is described later. The codebook reference addition unit 12 looks up the low-frequency component codebook 8 and the high-frequency component codebook 9 with the code strings of the speech unit 7 and extracts a low-frequency component waveform and a high-frequency component waveform, each one pitch period long. The lookup and selection method is also described later.
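The "existing clustering method" is not named in this section. As one common choice, the sketch below trains a codebook with plain k-means (Lloyd's algorithm) on fixed-length vectors; it assumes the one-pitch waveforms or their spectra have already been brought to a common length, which this section does not describe. An LBG-style splitting initialization would be a typical alternative, and `kmeans_codebook` is a hypothetical name.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Train a k-entry codebook with Lloyd's k-means; returns the code vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial codes
    for _ in range(iters):
        # Assignment step: each training vector joins its nearest code vector.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        # Update step: each code vector moves to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous code vector
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

Running this once on the collected low-frequency segments and once on the high-frequency segments would yield two independent codebooks, matching the separate codebooks 8 and 9.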

[0034] The unit then adds the low-frequency and high-frequency component waveforms to obtain a synthesis parameter 13 consisting of one pitch period of waveform information. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

[0035] The method of creating the low-frequency component codebook 8, the high-frequency component codebook 9, and the speech unit database 11 is described here. First, segments one pitch period long are cut out of voiced speech. Next, each one-pitch waveform is separated into a high-frequency component waveform and a low-frequency component waveform at a chosen band-split frequency. For example, for speech data sampled at 8 kHz, a low-pass filter and a high-pass filter with a 3 kHz band limit are designed, and each one-pitch waveform is passed through them to separate it into low-frequency and high-frequency component waveforms. The low-frequency and high-frequency component waveforms are then collected separately, and the low-frequency component codebook 8 and the high-frequency component codebook 9 are created by a clustering method. Both codebooks have the same size, for example 250 entries each.
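The band split in this paragraph can be sketched with a windowed-sinc FIR low-pass filter at 3 kHz for 8 kHz sampling, taking the high-frequency component as the residual (input minus low-pass output) so that the two components add back exactly to the original one-pitch waveform. The tap count, the Hamming window, and the complementary high-pass construction are assumptions; the text states only that a low-pass and a high-pass filter with a 3 kHz band limit are used.

```python
import math

def lowpass_taps(cutoff_hz, fs_hz, num_taps=31):
    # Windowed-sinc FIR low-pass (Hamming window), normalized to unit DC gain.
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]

def filter_same(signal, taps):
    # Centered ("same"-length) convolution; samples outside the segment count as zero.
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out

def split_bands(pitch_wave, cutoff_hz=3000.0, fs_hz=8000.0):
    # Low band from the FIR filter; high band as the complementary residual.
    low = filter_same(pitch_wave, lowpass_taps(cutoff_hz, fs_hz))
    high = [x - y for x, y in zip(pitch_wave, low)]
    return low, high
```

The residual construction guarantees that adding the two components restores the input, which keeps the later low-plus-high addition free of band-edge gaps; a separately designed high-pass filter would not have this property exactly.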

[0036] The method of creating the speech unit database 11 by vector quantization with these codebooks is also described. Vector quantization is performed separately for the low-frequency and high-frequency components, and the code strings obtained for each phoneme string are recorded frame by frame. For the low-frequency component, only one representative one-pitch waveform per frame is vector-quantized, whereas for the high-frequency component every pitch waveform in the frame is vector-quantized and all of the resulting codes are recorded for that frame. In this way, one speech unit takes the form shown in FIG. 2.
In this way, one speech unit becomes as shown in FIG.

[0037] The codebook reference addition unit 12 selects waveforms for each pitch period as follows. Because the speech unit database 11 holds only one low-frequency component code per frame, the waveform corresponding to that code is selected from the low-frequency component codebook 8. For the high-frequency component waveform, when a frame contains several codes, a code is chosen for each pitch period starting from the beginning of the frame, and the corresponding waveform is selected from the high-frequency component codebook 9.

[0038] Referring to FIG. 2, for the speech unit /a/, high-frequency component code 201 is selected for the first pitch period in the first frame, code 2 for the next pitch period, and code 201 again for the one after that. In the second frame, codes 102, 513, and 512 are selected in that order. For each pitch period, the sum of the low-frequency component waveform and the high-frequency component waveform forms the one-pitch waveform. This method therefore makes it possible to generate pitch waveforms that differ only in their high-frequency components, even within the same frame.
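The per-pitch selection of [0037] and [0038] can be sketched as follows: the frame's single low-frequency code is looked up once, its high-frequency codes are taken in order from the start of the frame, and each one-pitch waveform is the sum of the two components. Wrapping around to the first code when a frame needs more pitch periods than it has codes is an assumption consistent with the "201, 2, 201" example for /a/. The frame layout as a dict, the low-frequency code number, and the toy codebook contents are hypothetical; the low-frequency codes of FIG. 2 are not given in the text.

```python
from itertools import cycle, islice

def high_codes_for_frame(frame, n_pitches):
    # Take the frame's high-frequency codes in order, wrapping around if the
    # frame needs more pitch periods than it has codes (assumed behaviour).
    return list(islice(cycle(frame["high"]), n_pitches))

def pitch_waveforms(frame, n_pitches, low_book, high_book):
    # One low-frequency waveform per frame and a (possibly) different
    # high-frequency waveform per pitch period; each pitch waveform is their sum.
    low = low_book[frame["low"]]
    return [[a + b for a, b in zip(low, high_book[code])]
            for code in high_codes_for_frame(frame, n_pitches)]
```

Because only the high-frequency term changes from one pitch period to the next, consecutive pitch waveforms share their low-frequency shape while their high-frequency detail varies, which is the fluctuation effect the embodiment aims for.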

[0039] With this configuration, the present embodiment can control the strongly fluctuating high-frequency component separately from the low-frequency component. This makes it possible to generate fluctuation components in a pitch-synchronous, waveform-based synthesis method, and thus to provide more natural synthesized speech.

[0040] Embodiment 2. FIG. 3 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiment 1 are given the same reference numerals, and their description is omitted. In addition to the language processing unit 2, prosody generation unit 5, unit reading unit 10, speech unit database 11, low-frequency component codebook 8, high-frequency component codebook 9, and synthesis unit 14 of Embodiment 1, the speech synthesizer of this embodiment includes a codebook reference unit 16 and a moving addition unit 19.

【００４１】[0041]

The codebook reference unit 16 uses the code string in the input speech unit 7 to look up the low-frequency component codebook 8 and the high-frequency component codebook 9. It extracts a one-pitch low-frequency component waveform 17 and a one-pitch high-frequency component waveform 18. The moving addition unit 19 adds the high-frequency component waveform 18 at a position that is shifted, each time, relative to the reference position of the low-frequency component waveform 17. It outputs the result as synthesis parameter 13, which consists of one-pitch waveform information.

【００４２】[0042]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. Each speech unit 7 is a unit such as VCV or CV and was obtained in advance by vector quantization. It records code strings corresponding to the low-frequency component waveform series and the high-frequency component waveform series. In this embodiment, however, each frame has exactly one low-frequency code and one high-frequency code. That is, vector quantization is performed frame-synchronously on one representative waveform per frame.

【００４３】[0043]

The low-frequency component codebook 8 and the high-frequency component codebook 9 are structured and created in the same way as in Embodiment 1. The codebook reference unit 16 uses the code string in the speech unit 7 to look up the low-frequency component codebook 8 and the high-frequency component codebook 9. It extracts the one-pitch low-frequency component waveform 17 and high-frequency component waveform 18. The moving addition unit 19 adds the low-frequency component waveform 17 and the high-frequency component waveform 18 to obtain synthesis parameter 13, which consists of waveform information. The addition method is described later.

【００４４】[0044]

The prosody generation unit 5 generates the pitch 6 of the synthetic speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００４５】[0045]

The operation of the codebook reference unit 16 and the moving addition unit 19 is described here. First, for each frame of the speech unit database, the waveform corresponding to the low-frequency component code is selected from the low-frequency component codebook. Next, the high-frequency component likewise has a single code per frame, and the corresponding waveform is selected from the high-frequency component codebook.

【００４６】[0046]

The addition method is explained with reference to FIG. 4. The one-pitch low-frequency component waveforms are placed at the pitch-synchronous reference positions, at pitch intervals. In the figure, this is shown by the upper waveforms placed at intervals of the pitch periods T1, T2 and T3. The one-pitch high-frequency component waveforms, by contrast, are placed pitch-synchronously at positions shifted by a few samples from those reference positions. For example:

- the first high-frequency component waveform in the frame is placed at reference position + t1;
- the second at reference position + t2;
- the third at reference position - t3.

A waveform for one frame is obtained in this way. Because the high-frequency components are placed differently, this method produces waveform fluctuation between pitch waveforms even within the same frame.

【００４７】[0047]

According to the present embodiment, this configuration makes it possible to control the high-frequency component, which fluctuates strongly, separately from the low-frequency component. It therefore makes it possible to provide more natural synthetic speech with a pitch-synchronous, waveform-based synthesis method.

【００４８】[0048]

Embodiment 3. FIG. 5 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1 and 2 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the following units that are the same as in Embodiment 1: the language processing unit 2, prosody generation unit 5, unit reading unit 10, speech unit database 11, low-frequency component codebook 8, high-frequency component codebook 9 and synthesis unit 14. It also includes the codebook reference unit 16, which is the same as in Embodiment 2. In addition, it includes a prosody-controlled moving addition unit 20.

The prosody-controlled moving addition unit 20 adds the high-frequency component waveform 18 at a position that is shifted, each time, relative to the reference position of the low-frequency component waveform 17. It varies the average width of that shift according to the pitch 6 obtained by the prosody generation unit 5. It outputs the result as synthesis parameter 13, which consists of one-pitch waveform information.

【００４９】[0049]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. Each speech unit 7 is a unit such as VCV or CV and was obtained in advance by vector quantization. It records code strings corresponding to the low-frequency component waveform series and the high-frequency component waveform series. Each frame, however, has exactly one low-frequency code and one high-frequency code. That is, vector quantization is performed frame-synchronously on one representative waveform per frame.

【００５０】[0050]

The low-frequency component codebook 8 and the high-frequency component codebook 9 are structured and created in the same way as in Embodiment 1. The codebook reference unit 16 uses the code string in the speech unit 7 to look up the low-frequency component codebook 8 and the high-frequency component codebook 9. It extracts the one-pitch low-frequency component waveform 17 and high-frequency component waveform 18. The prosody-controlled moving addition unit 20 adds the low-frequency component waveform 17 and the high-frequency component waveform 18. While doing so, it uses the pitch 6 to control the average change width of the addition position. This yields synthesis parameter 13, which consists of one-pitch waveform information. The addition method is described later. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００５１】[0051]

The operation of the prosody-controlled moving addition unit 20 is described here. First, for each frame of the speech unit database, the waveform corresponding to the low-frequency component code is selected from the low-frequency component codebook 8. Next, the high-frequency component likewise has a single code, and the corresponding waveform is selected from the high-frequency component codebook 9.

The placement of the high-frequency component waveform in FIG. 4 is then controlled by the pitch 6 generated by the prosody generation unit 5. This placement is measured relative to the reference position of the one-pitch low-frequency component waveform. For example, when the value of pitch 6 is α [Hz], the change width between the reference position and the placement position of the high-frequency component waveform is multiplied by 1/α. This is based on the finding that the lower the pitch of a speech waveform, the larger its waveform fluctuation.

【００５２】[0052]

The addition method is explained with reference to FIG. 4. The one-pitch low-frequency component waveforms are placed at the pitch-synchronous reference positions, at pitch intervals. The one-pitch high-frequency component waveforms, by contrast, are placed pitch-synchronously at positions shifted by a few samples from those reference positions. Let k be the average change width. The offsets t1, t2 and t3 of, for example, the first, second and third high-frequency component waveforms in the frame each take a value proportional to k/α. A waveform for one pitch is obtained in this way. Because the high-frequency components are placed differently, this method produces waveform fluctuation between pitch waveforms even within the same frame.

【００５３】[0053]

According to the present embodiment, this configuration makes it possible to control the high-frequency component, which fluctuates strongly, separately from the low-frequency component. Fluctuation components can therefore be generated in a pitch-synchronous, waveform-based synthesis method. Moreover, the amount of fluctuation is controlled by the pitch, which is highly correlated with it. This makes it possible to provide more natural synthetic speech.

【００５４】[0054]

Embodiment 4. FIG. 6 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1 and 2 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the following units that are the same as in Embodiment 1: the language processing unit 2, prosody generation unit 5, unit reading unit 10, speech unit database 11, low-frequency component codebook 8, high-frequency component codebook 9 and synthesis unit 14. It also includes the codebook reference unit 16, which is the same as in Embodiment 2. In addition, it includes an amplitude ratio control addition unit 21.

The amplitude ratio control addition unit 21 adds the low-frequency component waveform 17 and the high-frequency component waveform 18 while varying the ratio of their amplitudes. It outputs the result as synthesis parameter 13, which consists of one-pitch waveform information.

【００５５】[0055]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. Each speech unit 7 is a unit such as VCV or CV and was obtained in advance by vector quantization. It records code strings corresponding to the low-frequency component waveform series and the high-frequency component waveform series. Each frame, however, has exactly one low-frequency code and one high-frequency code. That is, vector quantization is performed frame-synchronously on one representative waveform per frame.

【００５６】[0056]

The low-frequency component codebook 8 and the high-frequency component codebook 9 are structured and created in the same way as in Embodiment 1. The codebook reference unit 16 uses the code string in the speech unit 7 to look up the low-frequency component codebook 8 and the high-frequency component codebook 9. It extracts the one-pitch low-frequency component waveform 17 and high-frequency component waveform 18. The amplitude ratio control addition unit 21 adds the low-frequency component waveform 17 and the high-frequency component waveform 18, changing their amplitude ratio for each pitch. This yields synthesis parameter 13, which consists of one-pitch waveform information. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００５７】[0057]

According to the present embodiment, this configuration makes it possible to control the high-frequency component, which fluctuates strongly, separately from the low-frequency component. In a pitch-synchronous, waveform-based synthesis method, the fluctuation is controlled by changing the amplitude ratio of the high-frequency and low-frequency component waveforms for each pitch. This makes it possible to provide more natural synthetic speech.

【００５８】[0058]

Embodiment 5. FIG. 7 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1 and 2 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the following units that are the same as in Embodiment 1: the language processing unit 2, prosody generation unit 5, unit reading unit 10, speech unit database 11, low-frequency component codebook 8, high-frequency component codebook 9 and synthesis unit 14. It also includes the codebook reference unit 16, which is the same as in Embodiment 2. In addition, it includes an amplitude ratio pitch control addition unit 22.

The amplitude ratio pitch control addition unit 22 adds the low-frequency component waveform 17 and the high-frequency component waveform 18, varying the ratio of their amplitudes according to the pitch 6 obtained by the prosody generation unit 5. It outputs the result as synthesis parameter 13, which consists of one-pitch waveform information.

【００５９】[0059]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading unit 10 reads speech units 7 from the speech unit database 11 according to the phoneme string 3. Each speech unit 7 is a unit such as VCV or CV and was obtained in advance by vector quantization. It records code strings corresponding to the low-frequency component waveform series and the high-frequency component waveform series. Each frame, however, has exactly one low-frequency code and one high-frequency code. That is, vector quantization is performed frame-synchronously on one representative waveform per frame.

【００６０】[0060]

The low-frequency component codebook 8 and the high-frequency component codebook 9 are structured and created in the same way as in Embodiment 1. The codebook reference unit 16 uses the code string in the speech unit 7 to look up the low-frequency component codebook 8 and the high-frequency component codebook 9. It extracts the one-pitch low-frequency component waveform 17 and high-frequency component waveform 18. The amplitude ratio pitch control addition unit 22 adds the low-frequency component waveform 17 and the high-frequency component waveform 18, with their amplitude ratio controlled by the pitch 6. This yields synthesis parameter 13, which consists of one-pitch waveform information. The addition method is described later. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００６１】[0061]

The operation of the amplitude ratio pitch control addition unit 22 is described here. Define the following quantities:

- β (0 < β < 1): the amplitude ratio of the selected one-pitch high-frequency component waveform;
- x_l: the low-frequency component waveform;
- x_h: the high-frequency component waveform.

The waveforms are then added with the weighting given by equation (1):

$$(1-\beta)\,x_l + \beta\,x_h \qquad (1)$$

Here β is set to be inversely proportional to the value of pitch 6. This is based on the finding that the lower the pitch of a speech waveform, the larger its waveform fluctuation, so the proportion of the high-frequency component is raised for lower pitches. A waveform for one pitch is obtained in this way. That is, this method allows waveform fluctuation to be controlled by the pitch value.

【００６２】[0062]

According to the present embodiment, this configuration makes it possible to control the high-frequency component, which fluctuates strongly, separately from the low-frequency component. In a pitch-synchronous, waveform-based synthesis method, the amplitude ratio is controlled by the pitch, which is highly correlated with the fluctuation. This makes it possible to provide more natural synthetic speech.

【００６３】[0063]

Embodiment 6. FIG. 8 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiment 1 and conventional example 1 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the language processing unit 2, prosody generation unit 5 and synthesis unit 14, which are the same as in Embodiment 1. It also includes the codebook reference unit 16 and codebook 24, which are the same as in conventional example 1. In addition, it includes a speech unit database with frequency information 23 and a unit reading unit 10.

The speech unit database with frequency information 23 consists of sequences of the codebook codes recorded during the prior vector quantization, together with their frequencies of appearance. The unit reading unit 10 uses the frequencies of the codes for the phoneme string 3 to determine the code string to be written into the speech unit 7.

【００６４】[0064]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading unit 10 consults the speech unit database with frequency information 23 according to the phoneme string 3. The database 23 is structured as shown in FIG. 9, and the sequence of frequency-annotated speech units corresponding to the phoneme string 3 is read out.

The frequency information is obtained as follows. One-pitch waveforms from all phoneme sequences in the original speech database are vector-quantized against the codebook. For each phoneme sequence, the number of times each codebook code vector is selected is recorded. In the example of FIG. 9, the first frame of the speech unit /a/ has these frequencies: code 101 appears 30 times, code 100 15 times, and code 54 5 times.

【００６５】[0065]

Next, the unit reading unit 10 uniquely determines the code for each frame of the speech unit and outputs the result as the speech unit 7. The determination works as follows. The appearance rate of each code is computed from the code frequencies of the frame. If more than one code has a rate exceeding a threshold, the speech unit is written so that the code changes every pitch.

【００６６】[0066]

The codebook reference unit 16 uses the code string in the speech unit 7 to look up the codebook 24, obtaining synthesis parameter 13, which consists of waveform information. The prosody generation unit 5 generates the pitch 6 of the synthetic speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００６７】[0067]

According to the present embodiment, this configuration prepares speech units that take the frequency of appearance of codebook codes into account, in a pitch-synchronous, waveform-based synthesis method. This allows a wider variety of waveforms to be synthesized than with speech units that have a single code per frame, and makes it possible to provide more natural synthetic speech.

【００６８】[0068]

Embodiment 7. FIG. 10 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1 and 6 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the language processing unit 2, prosody generation unit 5 and synthesis unit 14, which are the same as in Embodiment 1. It also includes the codebook reference unit 16, codebook 24 and speech unit database with frequency information 23, which are the same as in Embodiment 6. In addition, it includes a unit reading weighting selection unit 25.

The unit reading weighting selection unit 25 determines weighting coefficients from the frequencies of the codes for the phoneme string 3. It outputs the code string written in the speech unit 7 together with those weights.

【００６９】[0069]

Next, the operation will be described. From the input text 1, the language processing unit 2 obtains a phoneme string 3 representing the reading, together with accent information 4. The unit reading weighting selection unit 25 consults the speech unit database with frequency information 23 according to the phoneme string 3. The database 23 is structured in the same way as in Embodiment 6. The unit reading weighting selection unit 25 selects all the codes of each frame in the speech unit. The appearance rate derived from each code's frequency is used as a weighting coefficient and applied to that code as a multiplier. The result is output as the speech unit 7.

【００７０】[0070]

The codebook reference unit 16 uses the code string in the speech unit 7 to look up the codebook 24. It multiplies the multiple pitch waveforms by the weighting coefficients obtained from the appearance rates and adds them. The resulting one-pitch waveform information forms synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthetic speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates synthetic speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

【００７１】[0071]

According to the present embodiment, this configuration performs unit selection and weighted addition that take the frequency of appearance into account. This allows more detailed waveform synthesis than speech units with a single code per frame, and makes it possible to provide more natural synthetic speech.

【００７２】[0072]

Embodiment 8. FIG. 11 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1, 6 and 7 are designated by the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment includes the following units that are the same as in Embodiment 1: the language processing unit 2, prosody generation unit 5, low-frequency component codebook 8, high-frequency component codebook 9 and synthesis unit 14. It also includes the codebook reference unit 16, codebook 24 and speech unit database with frequency information 23, which are the same as in Embodiment 6. In addition, it includes a unit reading weighting selection unit 25.

The unit reading weighting selection unit 25 determines weighting coefficients from the frequencies of the low-frequency and high-frequency component codes for the phoneme string 3. It outputs the code string written in the speech unit 7 together with those weights.

[0073] Next, the operation will be described. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading and weighting selection unit 25 refers to the speech unit database with frequency information 23 according to the phoneme string 3. This database has the same structure as in Embodiment 6. The segment reading and weighting selection unit 25 selects all of the codes for each frame of the speech unit, takes the appearance rate derived from each code's appearance frequency as that code's weighting coefficient, and applies it to the code. The result is output as the speech unit 7.
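The conversion from appearance frequency to weighting coefficient can be sketched as follows. This is a minimal illustration, assuming each frame stores a mapping from code index to the number of times that code appeared during database construction; the function names and the data layout are hypothetical, and the patent does not specify any normalization beyond using the appearance rate.

```python
def appearance_weights(code_counts):
    """Convert one frame's code counts into appearance-rate weights.

    code_counts: {code_index: appearance_count} for a single frame
    (a hypothetical layout for the frequency-information database).
    Returns {code_index: weight}; the weights sum to 1.
    """
    total = sum(code_counts.values())
    if total <= 0:
        raise ValueError("frame has no recorded code occurrences")
    # Appearance rate = occurrences of this code / all occurrences in the frame.
    return {code: count / total for code, count in code_counts.items()}


def unit_weights(unit_frames):
    """Apply appearance_weights to every frame of one speech unit."""
    return [appearance_weights(frame) for frame in unit_frames]


# Example: a frame in which code 101 occurred 3 times and code 100 once.
print(appearance_weights({101: 3, 100: 1}))  # {101: 0.75, 100: 0.25}
```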

[0074] Using the code string in the speech unit 7, the codebook reference unit 16 refers to the low-frequency component codebook 8 and the high-frequency component codebook 9. It multiplies each of the retrieved one-pitch-long low-frequency and high-frequency component waveforms by the weighting coefficient obtained from its appearance rate and adds the results. This yields the synthesis parameter 13, consisting of one-pitch-long waveform information. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.
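The weighted addition performed by the codebook reference unit 16 can be sketched as a sample-by-sample sum. This sketch assumes that every code vector in both codebooks has already been brought to a common one-pitch length and that the weights arrive as `{code_index: weight}` dictionaries. Both assumptions are illustrative and are not stated in the patent.

```python
def weighted_pitch_waveform(low_codebook, high_codebook, low_weights, high_weights):
    """Weighted sum of low- and high-frequency one-pitch code vectors.

    low_codebook, high_codebook: lists of code vectors (lists of floats),
        all assumed to share the same one-pitch length.
    low_weights, high_weights: {code_index: weight}, e.g. appearance rates.
    Returns sum_k w_k * low[k] + sum_m w_m * high[m], sample by sample.
    """
    length = len(low_codebook[0])
    out = [0.0] * length
    for codebook, weights in ((low_codebook, low_weights), (high_codebook, high_weights)):
        for code, weight in weights.items():
            vector = codebook[code]
            if len(vector) != length:
                raise ValueError("all code vectors must have the same pitch length")
            for i, sample in enumerate(vector):
                out[i] += weight * sample
    return out
```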

[0075] According to the present embodiment, performing unit selection and weighted addition that take the appearance frequency into account enables more detailed waveform synthesis than with a speech unit holding a single code per frame. In addition, separating the low-frequency and high-frequency components enables waveform synthesis that accounts for fluctuation. As a result, more natural synthesized speech can be provided.

[0076] Embodiment 9. FIG. 12 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiment 1 and Conventional Example 1 described above are given the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment comprises the language processing unit 2, prosody generation unit 5, segment reading unit 10 and synthesis unit 14 of Embodiment 1 and the codebook 24 of Conventional Example 1, together with a linear-sum representation speech unit database 26 and a codebook reference unit 16. The linear-sum representation speech unit database 26 consists of sequences in which the codebook codes recorded during prior vector quantization are expressed as linear sums. Using the code string described in the speech unit 7, the codebook reference unit 16 refers to the code vectors of the codebook 24 and outputs their linear sum as the synthesis parameter 13.

[0077] Next, the operation will be described. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading unit 10 refers to the linear-sum representation speech unit database 26 according to the phoneme string 3. The database 26 has the structure shown in FIG. 15, and the sequence of linear-sum speech units corresponding to the phoneme string 3 is read out. The linear-sum representation is created by vector-quantizing each one-pitch waveform of the speech unit database. For each waveform, two code vectors X and Y in the codebook are chosen so that (αX + βY) minimizes the quantization distortion, and the two codes and the coefficients α and β are recorded. In FIG. 15, in the first frame of the speech unit /a/, the code corresponding to X is 101, the code corresponding to Y is 100, and the coefficients α and β are 0.6 and 0.4, respectively.
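The encoding step, which chooses X, Y, α and β so that (αX + βY) minimizes the quantization distortion, can be sketched as a brute-force search over code pairs. For each candidate pair, the sketch solves the 2×2 least-squares normal equations. The exhaustive O(N²) pair search, the squared-error distortion measure and all function names are assumptions made for illustration. The patent states only the minimum-distortion criterion. A decoder that rebuilds αX + βY from a stored record is included for completeness.

```python
from itertools import combinations


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pair(target, x, y):
    """Least-squares (alpha, beta) minimising ||target - (alpha*x + beta*y)||^2.

    Solves the 2x2 normal equations by Cramer's rule; returns None when
    x and y are (nearly) collinear and the solution is not unique.
    """
    xx, yy, xy = _dot(x, x), _dot(y, y), _dot(x, y)
    tx, ty = _dot(target, x), _dot(target, y)
    det = xx * yy - xy * xy
    if abs(det) < 1e-12:
        return None
    return (tx * yy - ty * xy) / det, (ty * xx - tx * xy) / det


def encode_linear_sum(target, codebook):
    """Search all code pairs; return (code_x, code_y, alpha, beta, distortion)."""
    best = None
    for i, j in combinations(range(len(codebook)), 2):
        coeffs = fit_pair(target, codebook[i], codebook[j])
        if coeffs is None:
            continue
        alpha, beta = coeffs
        err = sum((t - alpha * a - beta * b) ** 2
                  for t, a, b in zip(target, codebook[i], codebook[j]))
        if best is None or err < best[4]:
            best = (i, j, alpha, beta, err)
    return best


def decode_linear_sum(codebook, code_x, code_y, alpha, beta):
    """Rebuild the one-pitch vector alpha*X + beta*Y from a stored record."""
    return [alpha * a + beta * b for a, b in zip(codebook[code_x], codebook[code_y])]
```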

[0078] Using the code string in the speech unit 7, the codebook reference unit 16 refers to the codebook 24 and reads out the two code vectors X and Y. It then uses the stored coefficients to compute the weighted one-pitch waveform (αX + βY), which is obtained as the synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.
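The final step places the one-pitch synthesis parameters at a spacing set by the generated pitch. It can be illustrated with a much simplified pitch-synchronous overlap-add. This is not a full TD-PSOLA implementation, which extracts windowed segments (typically about two periods long) around analysis pitch marks. The sketch instead assumes that each synthesis parameter is a one-pitch waveform, applies a Hann window to it and centres it on successive synthesis pitch marks. The function names and the windowing choice are assumptions.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def overlap_add(pitch_waveforms, periods):
    """Centre each windowed one-pitch waveform on successive pitch marks.

    pitch_waveforms: list of one-pitch waveforms (synthesis parameters).
    periods: target pitch periods in samples, one per waveform; mark k+1
        lies periods[k] samples after mark k, so shorter periods raise pitch.
    """
    if len(pitch_waveforms) != len(periods):
        raise ValueError("need one target period per waveform")
    half = max(len(w) for w in pitch_waveforms) // 2
    marks, t = [], 0
    for p in periods:
        marks.append(t)
        t += p
    out = [0.0] * (t + 2 * half + 1)  # padding so every window fits
    for wave, mark in zip(pitch_waveforms, marks):
        start = mark + half - len(wave) // 2  # offset by `half` keeps indices >= 0
        for k, (s, g) in enumerate(zip(wave, hann(len(wave)))):
            out[start + k] += s * g
    return out
```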

[0079] According to this embodiment, this configuration enables more detailed waveform synthesis than a speech unit holding a single code per frame, and thus provides more natural synthesized speech.

[0080] Embodiment 10. FIG. 13 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1 and 9 described above are given the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment comprises the language processing unit 2, prosody generation unit 5, segment reading unit 10 and synthesis unit 14 of Embodiment 1 and the codebook 24 and linear-sum representation speech unit database 26 of Embodiment 9, together with a random number generator 27 and a random-number-based codebook reference unit 29. The random-number-based codebook reference unit 29 adds the random number 28 generated by the random number generator 27 to the coefficients described in the speech unit 7 to create the synthesis parameter.

[0081] Next, the operation will be described. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading unit 10 refers to the linear-sum representation speech unit database 26 according to the phoneme string 3. The database 26 has the same structure as in Embodiment 9.

[0082] Using the code string in the speech unit 7, the random-number-based codebook reference unit 29 refers to the codebook 24 and reads out the two code vectors X and Y. It then adds the random number 28 generated by the random number generator 27 to the coefficients α and β described in the speech unit 7, obtaining new coefficients α′ and β′. From these it computes the weighted one-pitch waveform (α′X + β′Y), which is obtained as the synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.
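Perturbing the stored coefficients before forming (α′X + β′Y) can be sketched as follows. The patent does not specify the distribution or range of the random number 28. This sketch assumes independent uniform offsets in [-spread, spread] for each coefficient and uses a seeded `random.Random` so that results are repeatable. The `spread` parameter and the function names are hypothetical.

```python
import random


def perturb_coefficients(alpha, beta, spread, rng):
    """Return (alpha', beta'): each coefficient plus an independent uniform
    random offset in [-spread, spread]."""
    return (alpha + rng.uniform(-spread, spread),
            beta + rng.uniform(-spread, spread))


def perturbed_pitch_waveform(x, y, alpha, beta, spread, rng):
    """One-pitch waveform alpha'*X + beta'*Y with randomly perturbed weights."""
    a2, b2 = perturb_coefficients(alpha, beta, spread, rng)
    return [a2 * xi + b2 * yi for xi, yi in zip(x, y)]


rng = random.Random(0)  # seeded so that runs are repeatable
print(perturbed_pitch_waveform([1.0, 0.0], [0.0, 1.0], 0.6, 0.4, 0.05, rng))
```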

[0083] According to the present embodiment, this configuration enables more detailed waveform synthesis than a speech unit holding a single code per frame. The use of random numbers also introduces the effect of fluctuation, so more natural synthesized speech can be provided.

[0084] Embodiment 11. FIG. 14 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiment 1 and Conventional Example 1 described above are given the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment comprises the language processing unit 2, prosody generation unit 5, segment reading unit 10, low-frequency component codebook 8, high-frequency component codebook 9 and synthesis unit 14 of Embodiment 1, together with the linear-sum representation speech unit database 26 and codebook reference unit 16 of Embodiment 9. In this embodiment, the linear-sum representation speech unit database 26 consists of sequences in which the codes recorded during prior vector quantization are expressed as linear sums, separately for the low-frequency component codebook and the high-frequency component codebook. Using the code string described in the speech unit 7, the codebook reference unit 16 refers to the code vectors of the low-frequency component codebook 8 and the high-frequency component codebook 9 and outputs the linear sums of those code vectors as the synthesis parameter 13.

[0085] Next, the operation will be described. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading unit 10 refers to the linear-sum representation speech unit database 26 according to the phoneme string 3. The database 26 uses the structure shown in FIG. 15 for both the low-frequency component waveform and the high-frequency component waveform. Using the code string in the speech unit 7, the codebook reference unit 16 refers to the low-frequency component codebook 8 and the high-frequency component codebook 9. It computes the one-pitch-long low-frequency and high-frequency component waveforms in the weighted form (αX + βY) and obtains them as the synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

[0086] According to the present embodiment, this configuration enables more detailed waveform synthesis than a speech unit holding a single code per frame. Separating the low-frequency and high-frequency components also enables waveform synthesis that accounts for fluctuation, so more natural synthesized speech can be provided.

[0087] Embodiment 12. FIG. 16 is a block diagram showing another example of the speech synthesizer of the present invention. Components and data identical to those of Embodiments 1, 10 and 11 described above are given the same reference numerals, and their description is omitted. The speech synthesizer of this embodiment comprises the language processing unit 2, prosody generation unit 5, segment reading unit 10, synthesis unit 14, low-frequency component codebook 8 and high-frequency component codebook 9 of Embodiment 1, the linear-sum representation speech unit database 26 of Embodiment 9, and the random-number-based codebook reference unit 29 and random number generator 27 of Embodiment 10. Using the code string described in the speech unit 7, the random-number-based codebook reference unit 29 refers to the code vectors of the low-frequency component codebook 8 and the high-frequency component codebook 9. It adds the random number 28 generated by the random number generator 27 to the coefficients described in the speech unit 7 and outputs the resulting linear sums as the synthesis parameter 13.

[0088] Next, the operation will be described. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading unit 10 refers to the linear-sum representation speech unit database 26 according to the phoneme string 3. The database 26 uses the structure shown in FIG. 15 for both the low-frequency and high-frequency components. Using the code string in the speech unit 7, the random-number-based codebook reference unit 29 refers to the low-frequency component codebook 8 and the high-frequency component codebook 9 and reads out two code vectors from each codebook. It then adds the random number 28 generated by the random number generator 27 to the coefficients α and β described in the speech unit 7, obtaining new coefficients α′ and β′. It computes the one-pitch-long low-frequency and high-frequency component waveforms in the weighted form (α′X + β′Y) and obtains them as the synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.

[0089] According to this embodiment, this configuration enables more detailed waveform synthesis than a speech unit holding a single code per frame. Separating the low-frequency and high-frequency components and using random numbers introduces the effect of fluctuation, so more natural synthesized speech can be provided.

[0090] Embodiment 13. The present embodiment has substantially the same configuration as Embodiment 12. Its operation is described below. The language processing unit 2 obtains, from the input text 1, the phoneme string 3 representing the reading and the accent information 4. The segment reading unit 10 refers to the linear-sum representation speech unit database 26 according to the phoneme string 3. The database 26 uses the structure shown in FIG. 15 for both the low-frequency and high-frequency components. Using the code string in the speech unit 7, the random-number-based codebook reference unit 29 refers to the low-frequency component codebook 8 and the high-frequency component codebook 9 and reads out two code vectors from each codebook. It then adds the random number 28 generated by the random number generator 27 only to the coefficients α and β of the speech unit 7 for the high-frequency component, obtaining new coefficients α′ and β′. The one-pitch-long low-frequency component waveform is computed in the weighted form (αX + βY), the high-frequency component waveform in the form (α′X + β′Y), and both are obtained as the synthesis parameter 13. The prosody generation unit 5 generates the pitch 6 of the synthesized speech from the accent information 4 according to prosody rules. The synthesis unit 14 generates the synthesized speech 15 from the synthesis parameter 13 and the pitch 6 using the PSOLA method.
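The distinguishing step of Embodiment 13 is that only the high-frequency coefficients are randomized. A minimal sketch follows, assuming uniform offsets in [-spread, spread] as in the earlier random-coefficient example. The function returns the two band waveforms separately, because the text does not say how they are combined before PSOLA synthesis. Names and parameters are hypothetical.

```python
import random


def linear_sum(x, y, a, b):
    return [a * xi + b * yi for xi, yi in zip(x, y)]


def band_pitch_waveforms(low_xy, low_ab, high_xy, high_ab, spread, rng):
    """Return (low, high) one-pitch waveforms for one frame.

    low_xy / high_xy: (X, Y) code vectors read from the low- and
        high-frequency component codebooks.
    low_ab / high_ab: (alpha, beta) coefficients stored in the speech unit.
    Only the high-frequency coefficients receive uniform random offsets in
    [-spread, spread]; the low band keeps (alpha*X + beta*Y) unchanged.
    """
    low = linear_sum(*low_xy, *low_ab)
    ha, hb = high_ab
    high = linear_sum(*high_xy,
                      ha + rng.uniform(-spread, spread),
                      hb + rng.uniform(-spread, spread))
    return low, high
```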

[0091] According to this embodiment, this configuration enables more detailed waveform synthesis than a speech unit holding a single code per frame. The low-frequency and high-frequency components are separated, and random numbers are applied only to the high-frequency component, which is strongly affected by fluctuation. As a result, more natural synthesized speech can be provided.

[0092] In Embodiments 1 to 13, the low-frequency component codebook 8 and high-frequency component codebook 9, or the codebook 24, may use as code vectors the spectrum of the one-pitch speech waveform, or spectral parameters such as LSP or mel-cepstrum, instead of the waveform itself. The synthesis unit 14 may also perform speech synthesis by a vocoder method.

[0093] In Embodiments 1 to 5, 8 and 11 to 13, the band-limiting frequency used when creating the low-frequency component codebook 8 and the high-frequency component codebook 9 need not be fixed. It may instead be varied according to the pitch length of the original speech.
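One way to make the band-limiting frequency follow the pitch length is to define the cutoff as a fixed number of harmonics of f0. This rule is an assumption, because the patent says only that the cutoff may vary with pitch length. For a waveform that spans exactly one period of n samples, DFT bin k is the k-th harmonic, so the split reduces to zeroing bins above the cutoff harmonic. The sketch uses a plain O(n²) DFT to stay dependency-free. All names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_one_pitch(wave, fs, cutoff_harmonics=3):
    """Split one pitch period into low/high bands.

    For a waveform of exactly one period (n samples), DFT bin k is the k-th
    harmonic of f0 = fs / n, so keeping harmonics 0..cutoff_harmonics puts the
    band-limiting frequency at cutoff_harmonics * f0, which moves with the
    pitch length. Returns (low, high, cutoff_hz).
    """
    n = len(wave)
    spectrum = dft(wave)
    # Bins k and n-k carry the same harmonic; keep or drop them together.
    low_spec = [c if min(k, n - k) <= cutoff_harmonics else 0j
                for k, c in enumerate(spectrum)]
    low = idft(low_spec)
    high = [w - lo for w, lo in zip(wave, low)]
    return low, high, cutoff_harmonics * fs / n
```

With `fs = 16000`, a 16-sample period gives a 3000 Hz cutoff, while a 32-sample period (lower pitch) gives 1500 Hz for the same `cutoff_harmonics`.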

[0094] Furthermore, in Embodiments 1 to 5, 8 and 11 to 13, the low-frequency component codebook 8 and the high-frequency component codebook 9 need not have the same codebook size. They may differ, for example with the high-frequency component codebook 9 made smaller than the low-frequency component codebook 8.

[0095] Furthermore, in the high-frequency component codebook 9 of Embodiments 1 to 5, each code vector may be recorded together with a representative pitch. During codebook reference, the high-frequency component code whose representative pitch is closest to the target pitch may then be selected.
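Selecting the high-frequency code whose representative pitch is closest to the target pitch is a nearest-neighbour lookup on one scalar. The sketch assumes that the codebook is a list of `(representative_pitch, code_vector)` pairs and that absolute difference is the distance. The layout and the tie-breaking rule are illustrative choices.

```python
def select_high_code(high_codebook, target_pitch):
    """Index of the entry whose representative pitch is closest to target_pitch.

    high_codebook: list of (representative_pitch, code_vector) pairs; pitch
    units (Hz or samples) just need to match target_pitch. Ties go to the
    lower index.
    """
    if not high_codebook:
        raise ValueError("empty high-frequency codebook")
    return min(range(len(high_codebook)),
               key=lambda i: abs(high_codebook[i][0] - target_pitch))
```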

[0096] In Embodiments 1 to 5, the high-frequency component code read out by the segment reading unit 10 may also be determined randomly, such that the same code is never selected twice in a row.
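Random selection that never picks the same code twice in a row can be sketched by excluding the previous code from each draw. This sketch assumes a uniform choice among the remaining codes. The patent does not specify the distribution, and the function name is hypothetical.

```python
import random


def random_codes_without_repeats(num_codes, count, rng):
    """Draw `count` high-frequency code indices in [0, num_codes) so that no
    code is chosen twice in a row."""
    if num_codes < 1 or (num_codes < 2 and count > 1):
        raise ValueError("need at least two codes to avoid consecutive repeats")
    sequence, previous = [], None
    for _ in range(count):
        # Exclude only the immediately preceding code; earlier codes may recur.
        candidates = [c for c in range(num_codes) if c != previous]
        previous = rng.choice(candidates)
        sequence.append(previous)
    return sequence
```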

[0097] Furthermore, when the speech unit database 11 of Embodiment 1 is created, vector quantization may also be applied to the low-frequency component of every pitch waveform within each frame, and all of the resulting code strings may be described frame by frame. The segment reading unit 10 can then select the low-frequency component waveform pitch by pitch, in the same way as the high-frequency component waveform.

[0098] Furthermore, the speech unit database 11 of Embodiments 1 to 5 may omit the description of high-frequency component codes and instead contain a separate matrix that associates phoneme types with high-frequency component codes. The segment reading unit 10 then searches the matrix with the input phoneme and determines the high-frequency component code by choosing randomly among the codes associated with that phoneme.
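The phoneme-to-code matrix and the random choice among a phoneme's codes can be sketched with a dictionary. The table contents are invented purely for illustration and are not taken from the patent. A real matrix would be built from the vector-quantized training data.

```python
import random

# Hypothetical phoneme-to-code matrix; the code numbers are invented for illustration.
PHONEME_HIGH_CODES = {
    "a": [3, 7, 12],
    "i": [1, 4],
    "u": [5, 9, 10, 11],
}


def high_code_for_phoneme(phoneme, table, rng):
    """Pick one high-frequency component code at random for the given phoneme."""
    codes = table.get(phoneme)
    if not codes:
        raise KeyError(phoneme)
    return rng.choice(codes)
```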

[0099] Alternatively, the speech unit database 11 of Embodiments 1 to 5 may omit the description of high-frequency component codes and instead contain a separate matrix in which high-frequency component codes are associated with phoneme types and representative pitches. The segment reading unit 10 then searches the matrix with the input phoneme and the target pitch to determine the high-frequency component code.

[0100] Furthermore, in the speech unit database 11 of Embodiments 2 to 5, a plurality of high-frequency codes may be described for each frame, and the codebook reference unit may select a different high-frequency waveform for each pitch period.

[0101] Furthermore, the amplitude ratio in Embodiment 4 may be varied under the control of a random number generator.

[0102] The amplitude ratio control addition unit 21 of Embodiment 4 and the amplitude ratio pitch control addition unit 22 of Embodiment 5 may also determine the amplitude addition ratio according to the phoneme type of the input speech unit.

[0103] Furthermore, the prosody control moving addition unit 20 of Embodiment 3 may base its control on the power determined by the prosody rules in the prosody generation unit 5 instead of the pitch frequency 6. That is, it may be configured so that the lower the power, the larger the average variation of the addition position of the high-frequency component waveform.
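Power-controlled moving addition can be sketched in two parts. The first maps frame power to a maximum shift, with lower power giving a larger shift. The second adds the high band to the low band after a random shift within that range. The linear power-to-shift mapping, the dB range, the uniform integer shift and the circular wrap within the pitch period are all assumptions made to keep the example small. The patent specifies only the monotonic relationship between power and variation.

```python
import random


def max_shift_for_power(power_db, quiet_db=40.0, loud_db=80.0, min_shift=1, max_shift=8):
    """Map frame power to the largest allowed shift (in samples).

    Power is clamped to [quiet_db, loud_db] and interpolated linearly, so
    quieter frames get larger shifts. All constants are illustrative.
    """
    p = min(max(power_db, quiet_db), loud_db)
    frac = (loud_db - p) / (loud_db - quiet_db)  # 1.0 when quiet, 0.0 when loud
    return round(min_shift + frac * (max_shift - min_shift))


def moving_add(low, high, power_db, rng):
    """Add the high band to the low band after a random shift in [-s, s].

    The shift wraps circularly within the pitch period (a simplification).
    Returns (summed_waveform, shift_used).
    """
    s = max_shift_for_power(power_db)
    shift = rng.randint(-s, s)
    n = len(low)
    return [low[i] + high[(i - shift) % n] for i in range(n)], shift
```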

[0104] Furthermore, the amplitude ratio pitch control addition unit 22 of Embodiment 5 may base its control on the power determined by the prosody rules in the prosody generation unit 5 instead of the pitch frequency 6. That is, it may be configured so that the lower the power, the larger the amplitude ratio of the high-frequency waveform component.
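The power-dependent amplitude ratio can be sketched in the same way. A gain on the high band grows as power falls. The linear mapping and the gain range of 0.5 to 1.5 are illustrative assumptions, and the patent gives no numeric values.

```python
def high_band_ratio(power_db, quiet_db=40.0, loud_db=80.0, ratio_quiet=1.5, ratio_loud=0.5):
    """Gain applied to the high band: larger for lower power (illustrative constants)."""
    p = min(max(power_db, quiet_db), loud_db)
    frac = (loud_db - p) / (loud_db - quiet_db)  # 1.0 when quiet, 0.0 when loud
    return ratio_loud + frac * (ratio_quiet - ratio_loud)


def ratio_add(low, high, power_db):
    """Add low and high bands with the power-dependent amplitude ratio."""
    g = high_band_ratio(power_db)
    return [lo + g * hi for lo, hi in zip(low, high)]
```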

[0105] In the speech unit database with frequency information 23 of Embodiments 6 to 8, a code may also be removed from the description when its appearance frequency falls below a fixed threshold, or when its appearance frequency relative to the total appearance frequency falls below a fixed value.
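Both pruning criteria can be sketched as a filter over one frame's code counts. Either criterion may be enabled on its own, and a code is dropped if it fails any enabled criterion. The function name and parameters are hypothetical. Whether the surviving weights are renormalized afterwards is not stated in the patent.

```python
def prune_codes(code_counts, min_count=None, min_share=None):
    """Drop codes whose count is below min_count, or whose share of the
    frame's total count is below min_share. Either criterion may be used
    alone; a code failing any enabled criterion is removed."""
    total = sum(code_counts.values())
    kept = {}
    for code, count in code_counts.items():
        if min_count is not None and count < min_count:
            continue  # absolute threshold on the appearance frequency
        if min_share is not None and total > 0 and count / total < min_share:
            continue  # threshold relative to the total appearance frequency
        kept[code] = count
    return kept
```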

[0106] Furthermore, the linear-sum representation speech unit database 26 of Embodiments 9 to 13 may also use linear sums of three or more terms, rather than only two.
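Extending the two-term form to K terms amounts to a least-squares fit of the target one-pitch waveform onto K chosen code vectors. The sketch below takes the K code vectors as given, so how they are selected from the codebook is outside its scope. It solves the normal equations by Gauss-Jordan elimination in pure Python. This is an illustrative solver choice, not the patent's method.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_linear_sum(target, vectors):
    """Least-squares coefficients c minimising ||target - sum_k c[k] * vectors[k]||^2.

    Builds the normal equations G c = b (G[i][j] = <v_i, v_j>, b[i] = <v_i, target>)
    and solves them by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(vectors)
    # Augmented matrix [G | b].
    m = [[_dot(vectors[i], vectors[j]) for j in range(k)] + [_dot(vectors[i], target)]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("code vectors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def reconstruct(vectors, coeffs):
    """Weighted sum sum_k coeffs[k] * vectors[k]."""
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) for t in range(len(vectors[0]))]
```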

[0107]

EFFECTS OF THE INVENTION: The speech synthesizer according to the present invention comprises:
- a speech unit database that stores speech units;
- a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when voiced waveforms are separated into low-frequency and high-frequency components;
- a high-frequency component codebook, which is a vector quantization codebook of the corresponding high-frequency component waveforms;
- a language processing unit that obtains a phoneme string and accent information from input text;
- a segment reading unit that reads speech units from the speech unit database according to the phoneme string;
- a codebook reference unit that, based on the speech units read out, selects a low-frequency component waveform from the low-frequency component codebook and a high-frequency component waveform from the high-frequency component codebook;
- an addition unit that adds the selected low-frequency and high-frequency component waveforms to obtain a synthesis parameter;
- a prosody generation unit that generates a pitch frequency according to the accent information;
- and a synthesis unit that generates synthesized speech based on the synthesis parameter and the pitch frequency.

Therefore, in a pitch-synchronous, waveform-based synthesis method, separating the conventional waveform codebook into low-frequency and high-frequency components makes it possible to control the strongly fluctuating high-frequency component and the low-frequency component separately, so more natural synthesized speech can be provided.

[0108] Further, the codebook reference unit selects a different high-frequency component waveform from the high-frequency component codebook for each pitch period. Therefore, in a pitch-synchronous, waveform-based synthesis method, selecting a one-pitch high-frequency waveform for every pitch interval makes it possible to generate a fluctuation component, so more natural synthesized speech can be provided.

[0109] Further, the addition unit is a moving addition unit. When placing the low-frequency component waveform on the time axis, it changes the placement position of the high-frequency component waveform relative to a reference position for each pitch period, then adds the low-frequency and high-frequency component waveforms. Therefore, in a pitch-synchronous, waveform-based synthesis method, varying the placement of the high-frequency component waveform relative to the low-frequency component waveform in each pitch interval makes it possible to generate a fluctuation component, so more natural synthesized speech can be provided.

[0110] Further, the addition unit is a prosody control moving addition unit. It varies, according to the input pitch or power, the average change width of the placement position of the high-frequency component waveform relative to the reference position of the low-frequency component waveform, then adds the low-frequency and high-frequency component waveforms. Therefore, in a pitch-synchronous, waveform-based synthesis method, varying the placement of the high-frequency component waveform relative to the low-frequency component waveform in each pitch interval makes it possible to generate a fluctuation component. Controlling the amount of that variation by the pitch, which correlates strongly with fluctuation, makes it possible to provide more natural synthesized speech.

[0111] Further, the adding section is an amplitude ratio control addition unit. It changes the amplitude ratio between the low-frequency and high-frequency component waveforms for every pitch period and then adds the two waveforms. In a pitch-synchronous waveform-based synthesis method, changing the amplitude ratio of the high-frequency and low-frequency components for every pitch period therefore generates fluctuation, which makes more natural synthesized speech possible.
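One way to picture the amplitude ratio control addition unit is to scale the high-frequency waveform by a ratio that changes every pitch period and then add it to the low-frequency waveform. In this sketch the ratio scales only the high-frequency part. The uniform jitter around `base` is a hypothetical choice.

```python
import random

def ratio_add(low, high, ratio):
    """Add one pitch period's waveforms, scaling the high-frequency part by `ratio`."""
    n = max(len(low), len(high))
    low = low + [0.0] * (n - len(low))      # zero-pad to a common length
    high = high + [0.0] * (n - len(high))
    return [l + ratio * h for l, h in zip(low, high)]

def per_pitch_ratios(n_pitches, base=1.0, spread=0.2, seed=0):
    """Hypothetical per-pitch ratios: `base` jittered uniformly by +/- `spread`."""
    rng = random.Random(seed)
    return [base + rng.uniform(-spread, spread) for _ in range(n_pitches)]

def amplitude_ratio_addition(low, high, n_pitches):
    """One output frame per pitch period, each using its own amplitude ratio."""
    return [ratio_add(low, high, r) for r in per_pitch_ratios(n_pitches)]
```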

[0112] Further, the adding section is an amplitude ratio pitch control moving addition unit. It changes the amplitude ratio between the low-frequency and high-frequency component waveforms according to the input pitch or power and then adds the two waveforms. In a pitch-synchronous waveform-based synthesis method, changing the amplitude ratio of the high-frequency and low-frequency components for every pitch period therefore generates fluctuation. Controlling that ratio by the pitch, which correlates strongly with fluctuation, makes more natural synthesized speech possible.
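When the input pitch drives the ratio, each pitch period gets a ratio computed from its own F0. The power-law rule below, with parameters `f0_ref` and `alpha`, is only a hypothetical example of such a mapping. The patent says no more than that the ratio follows the input pitch or power.

```python
def ratio_from_pitch(f0_hz, f0_ref=150.0, base=1.0, alpha=-0.5):
    """Hypothetical rule: ratio = base * (f0 / f0_ref) ** alpha.

    With alpha < 0 the high-frequency share falls as pitch rises. The sign and
    size of alpha are illustrative and do not come from the patent.
    """
    return base * (f0_hz / f0_ref) ** alpha

def pitch_controlled_addition(lows, highs, f0_contour):
    """For each pitch period k: out_k = lows[k] + r_k * highs[k], r_k from F0_k."""
    frames = []
    for low, high, f0 in zip(lows, highs, f0_contour):
        r = ratio_from_pitch(f0)
        frames.append([l + r * h for l, h in zip(low, high)])
    return frames
```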

[0113] Another speech synthesizer according to the present invention comprises the following units:
- a speech unit database with frequency information, which stores speech units together with codebook appearance frequencies;
- a codebook, which is a vector quantization codebook of voiced sound waveforms;
- a language processing unit that obtains a phoneme string and accent information from input text;
- a unit reading unit that reads the speech units stored in the speech unit database with frequency information, together with their appearance frequencies;
- a codebook reference unit that, for each speech unit, refers to the codebook according to the appearance frequencies to obtain synthesis parameters;
- a prosody generation unit that generates a pitch frequency according to the accent information;
- a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency.

Speech units with a single code per frame allow only one waveform per frame. In a pitch-synchronous waveform-based synthesis method, preparing speech units that take codebook appearance frequencies into account therefore allows more varied waveform synthesis, which makes more natural synthesized speech possible.

[0114] Further, the codebook reference unit obtains synthesis parameters in four steps. It computes the appearance ratio of each code vector from the appearance frequencies described in the speech unit. It selects several codes with the highest appearance ratios among those described in the speech unit and refers to the codebook. It weights each waveform read out according to its appearance ratio. Finally it adds the weighted waveforms. In a pitch-synchronous waveform-based synthesis method, speech units that take codebook appearance frequencies into account, combined with frequency-weighted addition of one-pitch waveforms, therefore allow more detailed waveform synthesis than speech units with a single code per frame, which makes more natural synthesized speech possible.
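The frequency-weighted codebook reference for a single frame can be sketched as follows. The per-frame `{code: count}` dictionary is an assumed layout for the appearance frequencies stored in the database. Renormalizing the kept ratios so that they sum to 1 is also an assumption. The text says only that each waveform is weighted according to its appearance ratio.

```python
def weighted_codebook_lookup(counts, codebook, top_n=2):
    """Sketch of the frequency-weighted codebook reference for one frame.

    counts:   {code_index: appearance count} stored with the speech unit
    codebook: list of code vectors (one-pitch waveforms) indexed by code
    Returns the weighted sum of the top_n most frequent code vectors.
    """
    total = sum(counts.values())
    ratios = {c: n / total for c, n in counts.items()}  # appearance ratios
    # Highest ratio first; ties broken by the smaller code index.
    top = sorted(ratios, key=lambda c: (-ratios[c], c))[:top_n]
    norm = sum(ratios[c] for c in top)  # kept weights renormalized to sum to 1
    out = [0.0] * len(codebook[top[0]])
    for c in top:
        w = ratios[c] / norm
        for i, v in enumerate(codebook[c]):
            out[i] += w * v
    return out
```

Applying the same function separately to a low-frequency codebook and a high-frequency codebook and adding the two results gives the split-codebook variant described next.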

[0115] Further, when a voiced sound waveform is separated into a low-frequency component and a high-frequency component, the codebook consists of two parts: a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms, and a high-frequency component codebook, which is a vector quantization codebook of the high-frequency component waveforms. The speech unit database with frequency information stores, together with the speech units, the appearance frequencies of both the low-frequency code vectors and the high-frequency code vectors.

The codebook reference unit computes the appearance ratios of the low-frequency and high-frequency code vectors from the appearance frequencies described in the speech unit. It selects several codes with the highest appearance ratios among those described in the speech unit and refers to the low-frequency and high-frequency codebooks. It weights each waveform read out according to its appearance ratio and adds the weighted waveforms to obtain synthesis parameters.

In a pitch-synchronous waveform-based synthesis method, this arrangement prepares speech units that take codebook appearance frequencies into account separately for the high-frequency and low-frequency components, and it performs frequency-weighted addition of one-pitch waveforms. The result is more detailed waveform synthesis than speech units with a single code per frame allow. Separating the low-frequency and high-frequency components also lets the synthesis account for fluctuation, which makes more natural synthesized speech possible.

[0116] Another speech synthesizer according to the present invention comprises the following units:
- a linear sum representation speech unit database, which stores speech unit strings as linear sums of codebook code vectors;
- a codebook, which is a vector quantization codebook of voiced sound waveforms;
- a language processing unit that obtains a phoneme string and accent information from input text;
- a unit reading unit that reads a speech unit string from the linear sum representation speech unit database according to the phoneme string;
- a codebook reference unit that, for the speech unit string, computes a linear sum of the coefficients stored in the linear sum representation speech unit database and the waveforms obtained by referring to the codebook, to obtain synthesis parameters;
- a prosody generation unit that generates a pitch frequency according to the accent information;
- a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency.

In a pitch-synchronous waveform-based synthesis method, speech units expressed as linear sums of several code vectors therefore allow more detailed waveform synthesis than speech units with a single code per frame, which makes more natural synthesized speech possible.
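A speech unit stored in linear sum representation can be rebuilt frame by frame as a weighted sum of code vectors. The list of `(code_index, coefficient)` pairs used here is an assumed storage format chosen for illustration.

```python
def linear_sum(terms, codebook):
    """terms: (code_index, coefficient) pairs stored for one frame.

    Returns the sum of coefficient * codebook[code_index].
    """
    out = [0.0] * len(codebook[terms[0][0]])
    for c, a in terms:
        for i, v in enumerate(codebook[c]):
            out[i] += a * v
    return out

def reconstruct_unit(frames, codebook):
    """Rebuild every frame of a speech unit string from its linear-sum description."""
    return [linear_sum(terms, codebook) for terms in frames]
```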

[0117] Further, the synthesizer also has a random number generator that generates random numbers, and the codebook reference unit is a random-number-based codebook reference unit. This unit adds random numbers from the random number generator to the coefficients stored in the linear sum representation speech unit database. It then computes a linear sum of the resulting coefficients and the waveforms obtained by referring to the codebook, to obtain synthesis parameters. In a pitch-synchronous waveform-based synthesis method, speech units expressed as linear sums of several code vectors therefore allow more detailed waveform synthesis than speech units with a single code per frame. The random numbers add the effect of fluctuation, which makes more natural synthesized speech possible.
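The random-number-based codebook reference differs from the plain linear sum in one respect: each stored coefficient is perturbed before the sum is formed. Gaussian noise with standard deviation `sigma` is an assumed distribution, because the patent does not name one.

```python
import random

def perturbed_linear_sum(terms, codebook, rng, sigma=0.05):
    """Linear sum of code vectors with a random number added to each coefficient.

    terms: (code_index, coefficient) pairs for one frame.
    The N(0, sigma) noise is an assumption, not the patent's stated distribution.
    """
    out = [0.0] * len(codebook[terms[0][0]])
    for c, a in terms:
        a_noisy = a + rng.gauss(0.0, sigma)  # fresh random number per coefficient
        for i, v in enumerate(codebook[c]):
            out[i] += a_noisy * v
    return out
```

If one `rng` is shared across all frames, each frame receives an independent perturbation. A fixed seed keeps the output reproducible.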

[0118] Further, when a voiced sound waveform is separated into a low-frequency component and a high-frequency component, the codebook consists of two parts: a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms, and a high-frequency component codebook, which is a vector quantization codebook of the high-frequency component waveforms. The linear sum representation speech unit database stores speech unit strings as linear sums of code vectors from both codebooks. The codebook reference unit computes a linear sum of the coefficients stored in the linear sum representation speech unit database and the waveforms obtained by referring to the two codebooks, to obtain synthesis parameters.

In a pitch-synchronous waveform-based synthesis method, speech units expressed as linear sums of several code vectors therefore allow more detailed waveform synthesis than speech units with a single code per frame. Separating the low-frequency and high-frequency components also lets the synthesis account for fluctuation, which makes more natural synthesized speech possible.

[0119] Further, when a voiced sound waveform is separated into a low-frequency component and a high-frequency component, the codebook consists of two parts: a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms, and a high-frequency component codebook, which is a vector quantization codebook of the high-frequency component waveforms. The random-number-based codebook reference unit adds random numbers from the random number generator to the coefficients stored in the linear sum representation speech unit database. It then computes a linear sum of the resulting coefficients and the waveforms obtained by referring to the two codebooks, to obtain synthesis parameters.

In a pitch-synchronous waveform-based synthesis method, speech units expressed as linear sums of several code vectors therefore allow more detailed waveform synthesis than speech units with a single code per frame. Separating the low-frequency and high-frequency components and using random numbers adds the effect of fluctuation, which makes more natural synthesized speech possible.

[0120] Further, when a voiced sound waveform is separated into a low-frequency component and a high-frequency component, the codebook consists of two parts: a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms, and a high-frequency component codebook, which is a vector quantization codebook of the high-frequency component waveforms. The random-number-based codebook reference unit adds random numbers from the random number generator to the coefficients of the high-frequency component codes stored in the linear sum representation speech unit database. It then computes a linear sum of the coefficients and the waveforms obtained by referring to the two codebooks, to obtain synthesis parameters.

In a pitch-synchronous waveform-based synthesis method, speech units expressed as linear sums of several code vectors therefore allow more detailed waveform synthesis than speech units with a single code per frame. The low-frequency and high-frequency components are separated, and random numbers are applied to the high-frequency component, which is strongly affected by fluctuation. This makes more natural synthesized speech possible.
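With separate low-frequency and high-frequency codebooks, this variant adds random numbers only to the high-frequency coefficients. The sketch assumes that low-frequency and high-frequency code vectors have the same length, so they can be added sample by sample. It also assumes Gaussian noise. The helper names are hypothetical.

```python
import random

def weighted_sum(terms, book, rng=None, sigma=0.0):
    """Linear sum of code vectors; if rng is given, add N(0, sigma) to each coefficient."""
    out = [0.0] * len(book[terms[0][0]])
    for c, a in terms:
        if rng is not None:
            a += rng.gauss(0.0, sigma)
        for i, v in enumerate(book[c]):
            out[i] += a * v
    return out

def lf_hf_frame(lf_terms, hf_terms, lf_book, hf_book, rng, sigma=0.05):
    """One frame = unperturbed low-frequency sum + high-frequency sum with random coefficients."""
    low = weighted_sum(lf_terms, lf_book)               # stored coefficients used as-is
    high = weighted_sum(hf_terms, hf_book, rng, sigma)  # random numbers on HF coefficients only
    return [l + h for l, h in zip(low, high)]
```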

[Brief description of drawings]

FIG. 1 is a block diagram showing a speech synthesizer of the present invention.

FIG. 2 is a diagram showing an example of the contents of a speech unit database.

FIG. 3 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 4 is an explanatory diagram showing an example of the addition method of the moving addition unit and of the prosody control moving addition unit.

FIG. 5 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 6 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 7 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 8 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 9 is a diagram showing an example of the contents of a speech unit database with frequency information.

FIG. 10 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 11 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 12 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 13 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 14 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 15 is a diagram showing an example of the contents of a linear sum representation speech unit database.

FIG. 16 is a block diagram showing another example of the speech synthesizer of the present invention.

FIG. 17 is a block diagram showing a configuration example of a conventional speech synthesizer.

FIG. 18 is a block diagram showing another example of a conventional speech synthesizer.

[Explanation of symbols]

2 language processing unit, 5 prosody generation unit, 8 low-frequency component codebook, 9 high-frequency component codebook, 10 unit reading unit, 11 speech unit database, 12 codebook reference addition unit (codebook reference unit), 14 synthesis unit, 16 codebook reference unit, 19 moving addition unit, 20 prosody control moving addition unit, 21 amplitude ratio control addition unit, 22 amplitude ratio pitch control addition unit, 23 speech unit database with frequency information, 24 codebook, 25 unit reading weighted selection unit, 26 linear sum representation speech unit database, 27 random number generator.

Continuation of the front page
(56) References cited: JP-A-7-141000 (JP, A); JP-A-9-204192 (JP, A); JP-A-10-247097 (JP, A); JP-A-8-335096 (JP, A); JP-A-10-143196 (JP, A); JP-A-5-73100 (JP, A); JP-A-7-20894 (JP, A)
(58) Fields searched (Int.Cl.⁷, DB name): G10L 13/08; G10L 13/06

Claims

(57) [Claims]

1. A speech synthesizer comprising: a speech unit database that stores speech units; a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components; a high-frequency component codebook, which is a vector quantization codebook of the high-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads speech units from the speech unit database according to the phoneme string; a codebook reference unit that, based on the speech units read by the unit reading unit, selects a low-frequency component waveform from the low-frequency component codebook and a high-frequency component waveform from the high-frequency component codebook; an adding unit that adds the selected low-frequency component waveform and high-frequency component waveform to obtain synthesis parameters; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency.

2. The speech synthesizer according to claim 1, wherein the codebook reference section selects different high frequency component waveforms for each pitch from the high frequency component codebook.

3. The speech synthesizer according to claim 1 or 2, wherein the adding unit is a moving addition unit that, when arranging the low-frequency component waveform on the time axis, changes the arrangement position of the high-frequency component waveform relative to a reference position for each pitch period and adds the low-frequency component waveform and the high-frequency component waveform.

4. The speech synthesizer according to claim 1 or 2, wherein the adding unit is a prosody control moving addition unit that changes the average variation width of the arrangement position of the high-frequency component waveform relative to the reference position of the low-frequency component waveform according to the input pitch or power, and adds the low-frequency component waveform and the high-frequency component waveform.

5. The speech synthesizer according to claim 1 or 2, wherein the adding unit is an amplitude ratio control addition unit that changes the amplitude ratio of the low-frequency component waveform and the high-frequency component waveform to be added for each pitch period, and adds the low-frequency component waveform and the high-frequency component waveform.

6. The speech synthesizer according to claim 1 or 2, wherein the adding unit is an amplitude ratio pitch control moving addition unit that changes the amplitude ratio of the low-frequency component waveform and the high-frequency component waveform according to the input pitch or power, and adds the low-frequency component waveform and the high-frequency component waveform.

7. A speech synthesizer comprising: a speech unit database with frequency information, which stores speech units together with the appearance frequencies of the codes of a voiced sound waveform vector quantization codebook; a codebook, which is a voiced sound waveform vector quantization codebook; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads the speech units stored in the speech unit database with frequency information together with their appearance frequencies; a codebook reference unit that, for the speech units, refers to the codebook and obtains synthesis parameters based on the appearance frequencies; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency.

8. The speech synthesizer according to claim 7, wherein the codebook reference unit obtains the appearance ratio of each code vector from the appearance frequencies described in the speech unit, selects a plurality of codes with the highest appearance ratios among the codes described in the speech unit, refers to the codebook, weights each waveform read out according to its appearance ratio, and adds the weighted waveforms to obtain synthesis parameters.

9. The speech synthesizer according to claim 7, wherein the codebook consists of a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components, and a high-frequency component codebook, which is a vector quantization codebook of the corresponding high-frequency component waveforms; the speech unit database with frequency information stores, together with the speech units, the appearance frequencies of the low-frequency code vectors and of the high-frequency code vectors; and the codebook reference unit obtains the appearance ratios of the low-frequency and high-frequency code vectors from the appearance frequencies described in the speech unit, selects a plurality of codes with the highest appearance ratios among the codes described in the speech unit, refers to the low-frequency codebook and the high-frequency codebook, weights each waveform read out according to its appearance ratio, and adds the weighted waveforms to obtain synthesis parameters.

10. A speech synthesizer comprising: a linear sum representation speech unit database that stores speech unit strings as linear sums of code vectors of a codebook; a codebook, which is a voiced sound waveform vector quantization codebook; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads a speech unit string from the linear sum representation speech unit database according to the phoneme string; a codebook reference unit that, for the speech unit string, computes a linear sum of the coefficients stored in the linear sum representation speech unit database and the waveforms obtained by referring to the codebook, to obtain synthesis parameters; a prosody generation unit that generates a pitch frequency according to the accent information; a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency; and a random number generator that generates random numbers; wherein the codebook reference unit is a random-number-based codebook reference unit that adds random numbers from the random number generator to the coefficients stored in the linear sum representation speech unit database and computes a linear sum of the resulting coefficients and the waveforms obtained by referring to the codebook, to obtain synthesis parameters.

11. A speech synthesizer comprising: a linear sum representation speech unit database that stores speech unit strings as linear sums of code vectors of a codebook; a codebook, which is a voiced sound waveform vector quantization codebook; a language processing unit that obtains a phoneme string and accent information from input text; a unit reading unit that reads a speech unit string from the linear sum representation speech unit database according to the phoneme string; a codebook reference unit that, for the speech unit string, computes a linear sum of the coefficients stored in the linear sum representation speech unit database and the waveforms obtained by referring to the codebook, to obtain synthesis parameters; a prosody generation unit that generates a pitch frequency according to the accent information; and a synthesis unit that generates synthesized speech based on the synthesis parameters and the pitch frequency; wherein the codebook consists of a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components, and a high-frequency component codebook, which is a vector quantization codebook of the corresponding high-frequency component waveforms; the linear sum representation speech unit database stores speech unit strings as linear sums of code vectors of the low-frequency component codebook and the high-frequency component codebook; and the codebook reference unit computes a linear sum of the coefficients stored in the linear sum representation speech unit database and the waveforms obtained by referring to the low-frequency component codebook and the high-frequency component codebook, to obtain synthesis parameters.

12. The speech synthesizer according to claim 10, wherein the codebook consists of a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components, and a high-frequency component codebook, which is a vector quantization codebook of the corresponding high-frequency component waveforms; and the random-number-based codebook reference unit adds random numbers from the random number generator to the coefficients stored in the linear sum representation speech unit database and computes a linear sum of the resulting coefficients and the waveforms obtained by referring to the low-frequency component codebook and the high-frequency component codebook, to obtain synthesis parameters.

13. The speech synthesizer according to claim 10, wherein the codebook consists of a low-frequency component codebook, which is a vector quantization codebook of the low-frequency component waveforms obtained when a voiced sound waveform is separated into low-frequency and high-frequency components, and a high-frequency component codebook, which is a vector quantization codebook of the corresponding high-frequency component waveforms; and the random-number-based codebook reference unit adds random numbers from the random number generator to the coefficients of the high-frequency component codes stored in the linear sum representation speech unit database and computes a linear sum of the coefficients and the waveforms obtained by referring to the low-frequency component codebook and the high-frequency component codebook, to obtain synthesis parameters.