JP2007025338A - Method and device for speech synthesis, and computer program

Info

Publication number: JP2007025338A
Application number: JP2005208532A
Authority: JP
Inventors: Tsutomu Kaneyasu; 勉兼安
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2005-07-19
Filing date: 2005-07-19
Publication date: 2007-02-01

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis device, a speech synthesis method, and a computer program that can suppress deterioration in voice quality even when speech is synthesized so that the output speech of a keyword part can be distinguished from the output speech of other parts by changing the speaker, volume, pitch, or speaking rate of the speech.

SOLUTION: The speech synthesis device (100, 900) is equipped with a keyword-priority phoneme selection section (107, 109) that selects, from a corpus, candidates for the phonemes of all keywords included in a text body, in the order in which the keywords appear and in preference to candidates for the phonemes of elements other than the keywords.

COPYRIGHT: (C) 2007, JPO & INPIT

Description

The present invention relates to speech synthesis for reading a document aloud, and more particularly to a speech synthesis method, a speech synthesizer, and a computer program that emphasize keywords.

Speech synthesizers that convert a document stored on a PC (personal computer) into speech and read it aloud, based on prerecorded natural human speech or the like, are generally known. Such a speech synthesizer creates synthesized speech based on a corpus in which natural speech that can be divided into part-of-speech units is recorded.

In the speech synthesis processing performed by such a speech synthesizer, for example, morphological analysis and dependency analysis are performed on the input text, which is then converted into phoneme symbols, accent symbols, and the like.

Next, the phoneme duration (length of the voice), the fundamental frequency (pitch of the voice), the vowel-center power (loudness of the voice), and so on are estimated using the phoneme symbols, the accent symbol string, and the part-of-speech information of the input text obtained from the morphological analysis result.

Next, dynamic programming or a similar method is used to select the combination of synthesis units (phoneme segments) stored in the waveform dictionary that is closest to the estimated phoneme duration, fundamental frequency, vowel-center power, and so on, and that yields the smallest distortion when the units are concatenated. The unit selection performed here uses a measure (cost value) that matches perceptual characteristics.

Once the combination of synthesis units has been selected, speech is synthesized by concatenating the phoneme segments according to the selected combination while converting the pitch. This is the outline of the speech synthesis process.

Among such speech synthesizers, devices have been developed that can read a document aloud while emphasizing its important parts and the parts that the author particularly wants to convey to the reader (see, for example, Patent Document 1).

JP-A-10-274999 (Japanese Unexamined Patent Application Publication No. H10-274999)

However, when a speech synthesizer emphasizes important parts of a document by changing the speaker, volume, pitch, or speaking rate of the speech so that the output speech for the keyword portion can be distinguished from the output speech for the other portions, the sound quality of the emphasized portion deteriorates.

The present invention has been made in view of the above problem. Its object is to provide a new and improved speech synthesizer, speech synthesis method, and computer program that can suppress deterioration of sound quality even when speech is synthesized so that, by changing the speaker, volume, pitch, or speaking rate, the output speech for the keyword portion can be distinguished from the output speech for the other portions.

To solve the above problem, according to a first aspect of the present invention, there is provided a speech synthesis method that performs speech synthesis with emphasis on keyword portions. The speech synthesis method is characterized in that a selection process is executed that selects, from a corpus, phoneme candidates for all keywords contained in the text body, in the order in which the keywords appear and in preference to phoneme candidates for parts other than the keywords. The corpus records, for example, natural speech that can be divided at least into part-of-speech units, but is not limited to this example.

The selection process may be configured as follows. Using the combination of unit candidates that minimizes the cost value of the keyword portion, it selects the combination of unit candidates that minimizes the cost value from the start position of the text body toward the start position of the first keyword to appear. When two or more keywords exist, it selects the combination of unit candidates that minimizes the cost value from the end position of each keyword toward the start position of the following keyword.
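The selection order described above can be sketched as a two-pass dynamic-programming search. This is only an illustration under stated assumptions, not the patented implementation: the `Unit` tuple, the pitch-difference `join_cost`, and the function names `best_path` and `select_keyword_first` are all hypothetical, and the handling of the span after the last keyword is assumed to mirror the other non-keyword spans.

```python
from typing import List, Optional, Tuple

# Hypothetical unit candidate: (target_cost, pitch). Not taken from the source.
Unit = Tuple[float, float]


def join_cost(a: Unit, b: Unit) -> float:
    """Assumed concatenation cost: absolute pitch difference between units."""
    return abs(a[1] - b[1])


def best_path(cands: List[List[Unit]],
              left: Optional[Unit] = None,
              right: Optional[Unit] = None) -> List[Unit]:
    """Dynamic programming over one span of candidate lists.

    left/right are already-chosen neighbouring units (if any) whose join
    cost with the first/last unit of this span is included in the total.
    """
    if not cands:
        return []
    prev_cost = [u[0] + (join_cost(left, u) if left is not None else 0.0)
                 for u in cands[0]]
    back: List[List[int]] = []
    for i in range(1, len(cands)):
        cur_cost, cur_back = [], []
        for u in cands[i]:
            k = min(range(len(cands[i - 1])),
                    key=lambda j: prev_cost[j] + join_cost(cands[i - 1][j], u))
            cur_cost.append(prev_cost[k] + join_cost(cands[i - 1][k], u) + u[0])
            cur_back.append(k)
        back.append(cur_back)
        prev_cost = cur_cost
    if right is not None:
        prev_cost = [c + join_cost(u, right)
                     for c, u in zip(prev_cost, cands[-1])]
    k = min(range(len(prev_cost)), key=prev_cost.__getitem__)
    path = [k]
    for cur_back in reversed(back):
        k = cur_back[k]
        path.append(k)
    path.reverse()
    return [cands[i][k] for i, k in enumerate(path)]


def select_keyword_first(cands: List[List[Unit]],
                         is_keyword: List[bool]) -> List[Unit]:
    """Choose keyword units first, then fill the remaining spans around them."""
    n = len(cands)
    chosen: List[Optional[Unit]] = [None] * n
    # Split the text into maximal runs of keyword / non-keyword phonemes.
    spans, i = [], 0
    while i < n:
        j = i
        while j < n and is_keyword[j] == is_keyword[i]:
            j += 1
        spans.append((i, j, is_keyword[i]))
        i = j
    # Pass 1: keyword spans, in order of appearance, with no outside constraint.
    for s, e, kw in spans:
        if kw:
            chosen[s:e] = best_path(cands[s:e])
    # Pass 2: other spans, constrained by the keyword units already fixed.
    for s, e, kw in spans:
        if not kw:
            left = chosen[s - 1] if s > 0 else None
            right = chosen[e] if e < n else None
            chosen[s:e] = best_path(cands[s:e], left, right)
    return chosen
```

In this sketch the keyword units are never revised once chosen, so any extra concatenation cost is absorbed by the surrounding non-keyword phonemes rather than by the keyword itself.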

In the speech synthesis method, a keyword extraction process may be performed before the selection process. This process searches, from the head of the phoneme symbols, for whether the phoneme symbols of the keyword partially match within the phoneme symbols of the text body.

Based on the locations where the phoneme symbols of the keyword and of the text body partially match, the keyword extraction process may change the value of the keyword position information for each phoneme described in the prosody prediction information. Based on the prosody prediction information including the changed keyword position information, the selection process may then select, from the corpus, phoneme candidates for all keywords contained in the text body.

The value of the keyword position information of each phoneme, described in the prosody prediction information (information that predicts at least one of voice pitch, voice length, and mel-cepstrum), may be changed to indicate that the phoneme symbols of the keyword and of the text body partially match.

The speech synthesis method may be configured to further perform a weighting process that assigns, to one or more keywords in the text body, a keyword weighting coefficient indicating the degree to which the keyword portion is emphasized.

In the keyword extraction process, the keyword weighting coefficient assigned by the weighting process to one or more keywords in the text body may be acquired as keyword weighting information, and a table may be used that associates the keyword weighting information with the width to which combinations of unit candidates are narrowed.

The selection process may be configured to select combinations of unit candidates whose cost values fall within the narrowing width, based on the minimum of the cost value obtained by adding the target cost range, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity.
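One possible reading of the weighting table and the narrowing width is sketched below. The table values, the direction of the mapping (a larger coefficient giving a narrower width), and the rule "keep every candidate whose cost is within the width of the minimum cost" are assumptions made for illustration; the text above does not specify them.

```python
from typing import Dict

# Hypothetical table: keyword weighting coefficient -> narrowing width.
# The actual values and their direction are not given in the source.
WIDTH_BY_WEIGHT: Dict[int, float] = {1: 3.0, 2: 1.5, 3: 0.5}


def narrow_candidates(costs: Dict[str, float], weight: int) -> Dict[str, float]:
    """Keep the candidates whose total cost lies within the width of the minimum.

    costs maps a candidate name to its total cost (assumed here to be the sum
    of the target cost and the pitch and spectral discontinuity sub-costs).
    """
    width = WIDTH_BY_WEIGHT[weight]
    best = min(costs.values())
    return {name: c for name, c in costs.items() if c <= best + width}
```

Under these assumptions, a strongly weighted keyword admits only candidates very close to the best one, while a weakly weighted keyword leaves more candidates available for the later concatenation search.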

To solve the above problem, according to another aspect of the present invention, there is provided a speech synthesizer that performs speech synthesis with emphasis on keyword portions. The speech synthesizer is characterized by including a keyword priority phoneme selection unit that selects, from a corpus, phoneme candidates for all keywords contained in the text body, in the order in which the keywords appear and in preference to phoneme candidates for parts other than the keywords. The corpus records, for example, natural speech that can be divided at least into part-of-speech units.

The keyword priority phoneme selection unit may be configured as follows. Using the combination of unit candidates that minimizes the cost value of the keyword portion, it selects the combination of unit candidates that minimizes the cost value from the start position of the text body toward the start position of the first keyword to appear. When two or more keywords exist, it selects the combination of unit candidates that minimizes the cost value from the end position of each keyword toward the start position of the following keyword.

The speech synthesizer may further include a keyword extraction unit that searches, from the head of the phoneme symbols, for whether the phoneme symbols of the keyword partially match within the phoneme symbols of the text body.

Based on the locations where the phoneme symbols of the keyword and of the text body partially match, the keyword extraction unit may change the value of the keyword position information for each phoneme described in the prosody prediction information. Based on the prosody prediction information including the changed keyword position information, the keyword priority phoneme selection unit may then select, from the corpus, phoneme candidates for all keywords contained in the text body.

The keyword extraction unit may change the value of the keyword position information of each phoneme described in the prosody prediction information (information that predicts at least one of voice pitch, voice length, and mel-cepstrum), so that the changed value indicates that the phoneme symbols of the keyword and of the text body partially match.

The speech synthesizer may further include a weighting unit that assigns, to one or more keywords in the text body, a keyword weighting coefficient indicating the degree to which the keyword portion is emphasized.

The keyword extraction unit may be configured to acquire, as keyword weighting information, the keyword weighting coefficient assigned by the weighting unit to one or more keywords in the text body, and to use a table that associates the keyword weighting information with the width to which combinations of unit candidates are narrowed.

The keyword priority phoneme selection unit may be configured to select combinations of unit candidates whose cost values fall within the narrowing width, based on the minimum of the cost value obtained by adding the target cost range, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity.

To solve the above problem, according to another aspect of the present invention, there is provided a computer program that causes a computer to function as a speech synthesizer that performs speech synthesis with emphasis on keyword portions.

As described above, according to the present invention, even when speech is synthesized so that, by changing the speaker, volume, pitch, or speaking rate, the output speech for the keyword portion can be distinguished from the output speech for the other portions, deterioration in the sound quality of the keyword portion to be emphasized can be kept to a minimum.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the following description and the accompanying drawings, components having substantially the same functions and configurations are denoted by the same reference numerals, and redundant description is omitted.

(Speech synthesizer)
First, the speech synthesizer 100 according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of the speech synthesizer according to the first embodiment.

As shown in FIG. 1, the speech synthesizer 100 includes a text analysis unit 101, a prosody prediction unit 103, a keyword extraction unit 105, a keyword priority phoneme selection unit 107, a corpus 109, and a phoneme connection unit 111.

As shown in FIG. 1, the text analysis unit 101 receives a text body expressed in kanji and kana characters and a keyword, also expressed in kanji and kana characters, that is to be emphasized within the text body, and converts the text body and the keyword into phoneme symbols. Here, a phoneme refers to a segmentable unit such as one represented by a phonemic symbol, but is not limited to this example.

After the conversion into phoneme symbols, the text analysis unit 101 also performs morphological analysis and dependency analysis on the text body expressed in kanji and kana characters, and outputs an accent symbol string and a morphological analysis result representing the part-of-speech information of the text body.

The prosody prediction unit 103 predicts the pitch (voice pitch: fundamental frequency F0), the phoneme duration (voice length), and the mel-cepstrum representing the waveform components. The prediction is based on the phoneme symbols of the text body converted by the text analysis unit 101, the accent symbol string output from the text analysis unit 101, and the part-of-speech information of the text body obtained from the morphological analysis result. The predicted result constitutes the prosody prediction information. Details of the mel-cepstrum and related matters are described in JP-A-2003-208188.

As shown in FIG. 1, the keyword extraction unit 105 searches, in order from the head of the phoneme symbols of the text body, for whether the phoneme symbols of the keyword partially match within them.

Based on the locations where the phoneme symbols of the keyword partially match the phoneme symbols of the text body, the keyword extraction unit 105 also changes the value of the per-phoneme keyword position information described in the prosody prediction information received from the prosody prediction unit 103. The value is changed for every phoneme of the text body that is part of a match.
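The search and flag update performed by the keyword extraction unit can be illustrated with a simple sequence search. This is a minimal sketch, assuming that each phoneme is one list element and that the keyword position information is a boolean per phoneme; the function name `mark_keyword_positions` is hypothetical.

```python
from typing import List, Sequence


def mark_keyword_positions(text_phonemes: Sequence[str],
                           keywords: Sequence[Sequence[str]]) -> List[bool]:
    """Return one flag per text phoneme, True where a keyword occurrence covers it."""
    flags = [False] * len(text_phonemes)
    for kw in keywords:
        n = len(kw)
        if n == 0:
            continue
        # Scan from the head of the text and compare each window with the keyword.
        for i in range(len(text_phonemes) - n + 1):
            if list(text_phonemes[i:i + n]) == list(kw):
                for j in range(i, i + n):
                    flags[j] = True
    return flags
```

For a text such as "hajime…aoki" with the keyword "aoki", only the phonemes of the trailing "aoki" would be flagged, because the single "a" inside "hajime" does not start a full match.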

The keyword priority phoneme selection unit 107 selects phonemes from the corpus 109, using the pitch, the phoneme duration, and the mel-cepstrum predicted by the prosody prediction unit 103 as parameters of the phoneme selection process. The corpus 109 is stored in a storage means such as a hard disk drive.

In the phoneme selection process, the keyword priority phoneme selection unit 107 uses a measure (cost) that matches perceptual characteristics. The cost function, which maps observable feature quantities to psychological quantities, is constructed as the sum of five weighted sub-cost functions: a sub-cost for prosody, a sub-cost for pitch discontinuity, a sub-cost for phonetic-environment substitution, a sub-cost for spectral discontinuity, and a sub-cost for phoneme suitability (see, for example, JP-A-2003-208188).
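The weighted sum of the five sub-costs can be written out as follows. The field names and the default weights are placeholders chosen for illustration; the actual sub-cost definitions and weights are described in the cited publication and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCosts:
    """The five sub-costs named in the text (field names are illustrative)."""
    prosody: float
    pitch_discontinuity: float
    phonetic_environment: float
    spectral_discontinuity: float
    phoneme_suitability: float


# Hypothetical weights; the source does not give numeric values.
DEFAULT_WEIGHTS = SubCosts(1.0, 1.0, 1.0, 1.0, 1.0)


def total_cost(sub: SubCosts, weights: SubCosts = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the five sub-costs."""
    return (weights.prosody * sub.prosody
            + weights.pitch_discontinuity * sub.pitch_discontinuity
            + weights.phonetic_environment * sub.phonetic_environment
            + weights.spectral_discontinuity * sub.spectral_discontinuity
            + weights.phoneme_suitability * sub.phoneme_suitability)
```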

The speech synthesizer 100 is a device that can output synthesized speech based on a text body and keywords, and by outputting that synthesized speech it can read the text body aloud. More specifically, the speech synthesizer 100 may be, for example, a PC equipped with a CPU, memory, an HDD (hard disk drive), an input unit such as a mouse, and a display unit such as a liquid crystal display, but it is not limited to this example.

The display unit of the speech synthesizer 100 according to this embodiment outputs display screen data and audio data that the CPU has processed for presentation. Examples of the display unit include a TV and a liquid crystal display device. Both are equipped with speakers and can output audio, moving images, and the like in addition to still images.

The input unit consists of, for example, operation means that can receive operation instructions from the user, such as a pointing device (a mouse, trackball, trackpad, stylus pen, or joystick), a keyboard, buttons, switches, and levers, together with an input control circuit that generates an input signal and outputs it to the CPU. By operating this input unit, the user of the speech synthesizer 100 can input various data to the speech synthesizer 100 and instruct it to perform processing operations.

(Speech synthesis method)
Next, the speech synthesis method according to the first embodiment will be described with reference to FIG. 2. FIG. 2 is a flowchart outlining the speech synthesis method according to the first embodiment.

As shown in FIG. 2, first, a text body containing one or more keywords, and the one or more keywords to be emphasized, are input to the text analysis unit 101 (S201). The description here takes as an example the case where the text body and the keywords are expressed in kanji and kana characters, but it is not limited to this example.

Next, the text analysis unit 101 converts the input text body and keywords into phoneme symbols (S203).

The text analysis unit 101 performs morphological analysis and dependency analysis on the text body expressed in kanji and kana characters, and outputs an accent symbol string and a morphological analysis result representing the part-of-speech information of the text body (S203).

As shown in FIG. 2, the output that holds the phoneme symbols converted from the text body together with the morphological analysis result is defined here as, for example, the text body intermediate language, although it is not limited to this example.

Here, the phoneme symbols 501 converted from the text body ("text body phoneme symbols" in FIG. 5A) are, for example, "hajime…aoki", as shown in FIG. 5A.

The output that holds the phoneme symbols converted from a keyword is called the keyword intermediate language, although it is not limited to this example.

As shown in FIG. 5A, the phoneme symbols 502 of the keyword ("keyword phoneme symbols" in FIG. 5A) are, for example, "aoki".

Next, as shown in FIG. 2, the text analysis unit 101 determines whether the input is the text body, so that the text body can be passed to the prosody prediction unit 103 (S205).

If the determination (S205) shows that the input is the text body, the text analysis unit 101 outputs the text body intermediate language and sends it to the prosody prediction unit 103.

In that case, the text analysis unit 101 also outputs the text body intermediate language to the keyword extraction unit 105.

If, on the other hand, the input is not the text body, that is, if it is a keyword, the text analysis unit 101 sends the keyword intermediate language to the keyword extraction unit 105.

Next, the prosody prediction unit 103 predicts at least one, or all, of the pitch (voice pitch: fundamental frequency F0), the phoneme duration (voice length), and the mel-cepstrum representing the waveform components (S207).

The prosody prediction unit 103 outputs the predicted information to the keyword extraction unit 105 as prosody prediction information.

The prosody prediction information is as follows. As shown in FIG. 5A, the prosody prediction information 503 contains at least the following items for each phoneme of the phoneme symbols 501 ("hajime…aoki…" in the example of FIG. 5A; the phonemes are listed vertically within the prosody prediction information 503): "start", the start time of the phoneme; "duration", the duration of the phoneme; "pitch", one or more pitch values of the phoneme; and "Melcep", one or more mel-cepstra of the phoneme.
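A per-phoneme record with these fields could be modelled as below. This is a sketch only: the units, the representation of a mel-cepstrum as a list of coefficient vectors, the `is_keyword` field standing in for the keyword position information, and the helper `build_prosody_info` (which assumes each phoneme starts where the previous one ends) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class PhonemeProsody:
    """One row of prosody prediction information (field layout assumed)."""
    phoneme: str
    start: float                      # start time of the phoneme
    duration: float                   # duration of the phoneme
    pitch: List[float] = field(default_factory=list)          # one or more pitch values
    melcep: List[List[float]] = field(default_factory=list)   # one or more mel-cepstra
    is_keyword: bool = False          # keyword position information, initially false


def build_prosody_info(phonemes: Sequence[str],
                       durations: Sequence[float]) -> List[PhonemeProsody]:
    """Build rows whose start times accumulate the preceding durations."""
    rows: List[PhonemeProsody] = []
    t = 0.0
    for p, d in zip(phonemes, durations):
        rows.append(PhonemeProsody(phoneme=p, start=t, duration=d))
        t += d
    return rows
```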

Next, as shown in FIG. 2, based on the locations where the phoneme symbols of the keyword partially match the phoneme symbols of the text body, the keyword extraction unit 105 changes the value of the per-phoneme keyword position information described in the prosody prediction information for every phoneme symbol of the text body that is part of a match. It then outputs the resulting prosody prediction information with keyword position information to the keyword priority phoneme selection unit 107.

(Processing by the keyword extraction unit)
Next, as shown in FIG. 2, the keyword extraction unit 105 allocates and initializes an area for each phoneme in the prosody prediction information, in order to record information indicating where the keywords are located in the text body (keyword position information) (S209). In this initialization, the flag value of the keyword position information is set to false, although it is not limited to this example.

Next, the keyword extraction unit 105 sets the phoneme position indicators, which express which phoneme is currently being pointed to within the phoneme symbols of the text body and of the keywords, to the first phoneme of the text body and of each of the one or more keywords (S211).

Next, the keyword extraction unit 105 obtains the number of phonemes in the text body and in each of the keywords, and sets each keyword phoneme count to the number of phonemes of the corresponding keyword and the text phoneme count to the number of phonemes of the text body (S213).

Next, the keyword extraction unit 105 determines whether the end of the text body has been reached (S215).

If the determination (S215) shows that the end of the text body has not been reached, the keyword extraction unit 105 extracts the keyword locations in the text body (S217). The keyword extraction process (S217) is described later.

If the end of the text body has been reached, the keyword extraction unit 105 outputs, to the keyword priority phoneme selection unit 107, prosody prediction information with keyword position information, which contains the prosody prediction information and the keyword position information attached to each phoneme in it.

As shown in FIG. 4B, whether the end of the text body has been reached is judged on the basis of whether the phoneme position indicator of the text body is smaller than the text phoneme count. If the phoneme position indicator of the text body is smaller than the text phoneme count, it is judged to be the end of the text body.

In step S219 shown in FIG. 2, which is described in detail later, the keyword priority phoneme selection unit 107 selects the optimal phonemes using the corpus 109 shown in FIG. 1, starting with the keyword portions of the text body.

When the phoneme selection for the keyword portions is complete, the optimal phonemes for the parts other than the keyword portions are selected next.

Once the keyword priority phoneme selection unit 107 has performed all of the phoneme selection processing, the waveform segments can be output (S219).

Next, the phoneme connection unit 111 determines whether the phoneme of the waveform segment currently being handled is at the end of the text body (S221).

If the determination (S221) shows that the end of the text body has not been reached, the phoneme connection unit 111 connects the waveform segment currently being handled to the next waveform segment (S223).

If, on the other hand, the end of the text body has been reached (S221), the phoneme connection unit 111 outputs the synthesized speech generated by connecting the waveform segments (S225). By outputting this synthesized speech (S225), the speech synthesizer 100 can read the text body aloud while emphasizing the keywords.

(Keyword location extraction process)
Next, the keyword location extraction process (S217) shown in FIG. 2 will be described in more detail with reference to FIGS. 3, 4A, and 4B. FIGS. 3 and 4A are flowcharts outlining the keyword location extraction process according to the first embodiment, and FIG. 4B is an explanatory diagram outlining the criterion for judging whether the end of a keyword has been reached according to the first embodiment.

図３に示すように，テキスト本文の文末でなければ（Ｓ２１５），キーワード抽出部１０５は，１又は２以上のキーワードのうち，順々にキーワードを取り扱うため，現時点で取り扱うキーワードの順番が，全体のキーワード数の範囲内でおさまっているか否かを判定する（Ｓ２４０）。 As shown in FIG. 3, if the end of the text body has not been reached (S215), the keyword extraction unit 105, which handles the one or more keywords one at a time, determines whether the index of the keyword currently being handled falls within the total number of keywords (S240).

現在取り扱っているキーワードの順番は，キーワードの入力順とするが，かかる例に限定されない。 The order of keywords currently handled is the keyword input order, but is not limited to this example.

次に，キーワード抽出部１０５は，現在取り扱っているキーワード（当キーワード）の音韻位置指示子が，当キーワードの語尾を示しているかどうかを判定する（Ｓ２４１）。 Next, the keyword extraction unit 105 determines whether the phoneme position indicator of the keyword currently being handled (the current keyword) indicates the end of that keyword (S241).

当キーワードの語尾である場合（Ｓ２４１），テキスト本文の音韻位置指示子が示す位置より前の，当キーワード音韻数の数値分の音韻の，韻律予測情報内の各音韻ごとのキーワード位置情報をｔｒｕｅに設定する（図４Ａに示すステップＳ２５３）。 If the end of the current keyword has been reached (S241), the keyword position information held per phoneme in the prosodic prediction information is set to true for the phonemes that precede the position indicated by the phoneme position indicator of the text body, as many phonemes as the current keyword contains (step S253 shown in FIG. 4A).

図４Ｂに示すように，当キーワードの語尾であるか否かの判断基準は，現在取り扱っているキーワード（当キーワード）の音韻位置指示子が，当キーワードの音韻数より小さいかどうかによって，判断する。 As shown in FIG. 4B, whether the end of the current keyword has been reached is determined by whether the phoneme position indicator of the keyword currently being handled (the current keyword) is smaller than the number of phonemes of that keyword.

図３に示すように，キーワード抽出部１０５は，テキスト本文と，当キーワードの音韻位置指示子が示している，テキスト本文の音韻と，当キーワードの音韻とが，同じ音韻であるか否かを判定する（Ｓ２４３）。 As shown in FIG. 3, the keyword extraction unit 105 determines whether the phoneme of the text body and the phoneme of the current keyword, each indicated by its own phoneme position indicator, are the same phoneme (S243).

音韻が一致している場合（Ｓ２４３），当キーワードの音韻位置指示子を次の音韻に設定し（Ｓ２４７），音韻が一致していない場合（Ｓ２４３），当キーワードの音韻位置指示子を先頭音韻に設定する（Ｓ２４５）。 When the phonemes match (S243), the phoneme position indicator of the keyword is set as the next phoneme (S247). When the phonemes do not match (S243), the phoneme position indicator of the keyword is set as the first phoneme. (S245).

次に，キーワード抽出部１０５は，現在取り扱っているキーワード（当キーワード）の音韻位置指示子の指し示す位置を，当キーワードの先頭音韻に設定する（Ｓ２４５）。 Next, the keyword extraction unit 105 sets the position pointed to by the phoneme position indicator of the currently handled keyword (this keyword) as the first phoneme of the keyword (S245).

キーワード抽出部１０５は，次に，現在取り扱っているキーワード（当キーワード）の音韻位置指示子の指し示す位置を，当キーワードの音韻位置指示子が現在指し示している音韻の次の音韻に設定する（Ｓ２４７）。 Next, the keyword extraction unit 105 sets the phoneme position indicator of the keyword currently being handled (the current keyword) to the phoneme following the one it currently points to (S247).

次に，キーワード抽出部１０５は，現在取り扱っているキーワードを，キーワードの入力順に次のキーワードに変更する（Ｓ２４９）。 Next, the keyword extraction unit 105 changes the currently handled keyword to the next keyword in the keyword input order (S249).

なお，テキスト本文の文末である場合（Ｓ２１５），キーワード抽出部１０５は，キーワード位置情報付き韻律予測情報を出力する（Ｓ２５１）。 If it is the end of the text body (S215), the keyword extraction unit 105 outputs prosodic prediction information with keyword position information (S251).

さらに，図４Ａに示すように，キーワード抽出部１０５は，テキスト本文の音韻位置指示子が指し示す音韻より，１つ先行している音韻から先頭音韻の方向に向かって，当キーワード音韻数の数値分の音韻全てに対応する，韻律予測情報内の各音韻ごとのキーワード位置情報を変更する（Ｓ２５３）。なお，ｆａｌｓｅからｔｒｕｅにキーワード位置情報は変更されるが，かかる例に限定されない。 Further, as shown in FIG. 4A, the keyword extraction unit 105 changes the keyword position information held per phoneme in the prosodic prediction information for all of the phonemes, as many as the current keyword contains, counting back toward the head phoneme from the phoneme immediately preceding the one indicated by the phoneme position indicator of the text body (S253). The keyword position information is changed from false to true, but the present invention is not limited to this example.

次に，図４Ａに示すように，キーワード抽出部１０５は，図３に示すステップＳ２４５と同じ処理を実行する（Ｓ２５５）。 Next, as shown in FIG. 4A, the keyword extraction unit 105 executes the same processing as step S245 shown in FIG. 3 (S255).

また，図３に示す判定の結果（Ｓ２４０），キーワード数内でない場合，図４Ａに示すように，キーワード抽出部１０５は，当キーワードを最初のキーワードに変更する（Ｓ２５７）。 If the result of determination shown in FIG. 3 is not within the number of keywords (S240), the keyword extraction unit 105 changes the keyword to the first keyword as shown in FIG. 4A (S257).

次に，図４Ａに示すように，キーワード抽出部１０５は，テキスト本文の音韻位置指示子の指し示す位置を，テキスト本文の音韻位置指示子が現在指し示している音韻の次の音韻に設定する（Ｓ２５９）。 Next, as shown in FIG. 4A, the keyword extraction unit 105 sets the phoneme position indicator of the text body to the phoneme following the one it currently points to (S259).

また，図５Ｂには，図１に示した第１の実施の形態にかかる音声合成装置１００におけるテキスト解析部１０１〜音韻接続部１１１の各部で処理するデータの流れについて示している。 FIG. 5B shows the flow of data processed by each unit of the text analysis unit 101 to the phoneme connection unit 111 in the speech synthesizer 100 according to the first embodiment shown in FIG.

以上で，図３及び図４に示すキーワード抽出部１０５によるキーワード個所の抽出処理（Ｓ２１７）の一連の処理が終了する。 Thus, a series of processing of the keyword location extraction processing (S217) by the keyword extraction unit 105 shown in FIGS. 3 and 4 is completed.
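
The extraction steps S240 to S259 described above can be pictured with the following minimal Python sketch. It is an illustration only, not the implementation of the keyword extraction unit 105: the function name `mark_keyword_positions`, the representation of phonemes as lists of strings, and the final flush for a keyword that ends on the last phoneme of the text are all assumptions made for this example.

```python
def mark_keyword_positions(text, keywords):
    """Return one boolean per text phoneme (the keyword position information).

    text     : list of phoneme symbols of the text body
    keywords : list of keywords, each a list of phoneme symbols
    """
    flags = [False] * len(text)        # keyword position information, initially false
    cursors = [0] * len(keywords)      # one phoneme position indicator per keyword

    def flush(pos):
        # S241 / S253 / S255: a cursor that has run past the end of its keyword
        # means the keyword just matched; mark the preceding phonemes and reset.
        for k, kw in enumerate(keywords):
            if kw and cursors[k] == len(kw):
                for i in range(pos - len(kw), pos):
                    flags[i] = True
                cursors[k] = 0

    for pos, phoneme in enumerate(text):       # S259 advances the text indicator
        flush(pos)
        for k, kw in enumerate(keywords):      # S240 / S249: visit each keyword in turn
            if kw and phoneme == kw[cursors[k]]:
                cursors[k] += 1                # S247: match, move to the next phoneme
            else:
                cursors[k] = 0                 # S245: mismatch, back to the head phoneme

    flush(len(text))  # assumption: also catch a keyword ending on the last phoneme
    return flags
```

Because a mismatch simply resets the keyword cursor without re-examining the current text phoneme, this sketch can miss a keyword whose beginning overlaps a failed partial match; a production matcher would likely need a more careful fallback.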

（キーワード優先音韻選択部１０７によるキーワード優先音韻選択処理について） (Keyword priority phoneme selection processing by the keyword priority phoneme selection unit 107)
次に，図２に示すように，キーワード個所を抽出すると（Ｓ２１７），キーワード優先音韻選択部１０７によるキーワード優先音韻選択処理が実行される（Ｓ２１９）。 Next, as shown in FIG. 2, once the keyword locations have been extracted (S217), the keyword priority phoneme selection process is executed by the keyword priority phoneme selection unit 107 (S219).

上記キーワード優先音韻選択処理（Ｓ２１９）では，テキスト本文の音韻に対して，音韻の適合性に関するサブコスト値と，音韻環境代替に関するサブコスト値と，韻律に関するサブコスト値とをコーパス１０９を利用することで取得し，さらに３つのサブコスト値を足し合わせた，最小のサブコスト値（ターゲットコスト値）から，ある程度の幅を持たせた値の範囲内に含まれる，音韻を候補として選択（ターゲット選択）する。 In the keyword priority phoneme selection process (S219), a sub-cost value for phoneme suitability, a sub-cost value for phonological environment substitution, and a sub-cost value for prosody are obtained for each phoneme of the text body by using the corpus 109. The three sub-cost values are then summed, and the phonemes whose sum falls within a certain margin of the minimum sum (the target cost value) are selected as candidates (target selection).
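
Target selection as described above can be sketched as follows. This is a hedged illustration rather than the patented implementation: the `Candidate` class, its field names, and the `margin` parameter are hypothetical, and how the three sub-cost values are actually derived from the corpus 109 is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    unit_id: str          # hypothetical identifier of a corpus unit
    phoneme_cost: float   # sub-cost for phoneme suitability
    context_cost: float   # sub-cost for phonological environment substitution
    prosody_cost: float   # sub-cost for prosody

    @property
    def target_cost(self) -> float:
        # The target cost is the sum of the three sub-costs.
        return self.phoneme_cost + self.context_cost + self.prosody_cost


def target_select(candidates, margin):
    """Keep every candidate whose target cost is within `margin` of the minimum."""
    if not candidates:
        return []
    best = min(c.target_cost for c in candidates)
    return [c for c in candidates if c.target_cost <= best + margin]
```

Keeping a band of near-minimum candidates, instead of only the single best one, leaves the later dynamic programming step room to trade a slightly worse target cost for a smoother join.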

ここで，図６〜図１０を参照しながら，第１の実施の形態にかかるキーワード優先音韻選択処理（Ｓ２１９）について詳細に説明する。なお，図６は，第１の実施の形態にかかるキーワード優先音韻選択処理の概略を示す説明図であり，図７〜図１０は，第１の実施の形態にかかるキーワード優先音韻選択処理の概略を示すフローチャートである。 Here, the keyword priority phoneme selection process (S219) according to the first embodiment will be described in detail with reference to FIGS. 6 to 10. FIG. 6 is an explanatory diagram outlining the keyword priority phoneme selection process according to the first embodiment, and FIGS. 7 to 10 are flowcharts outlining the same process.

図６に示すように，キーワード優先音韻選択部１０７は，ターゲット選択で求めた音韻候補をもとに，テキスト本文の各音韻に付与されているキーワード位置情報のフラグ値がｔｒｕｅとなるキーワード開始位置から，キーワード終了位置まで，ターゲットコスト値と，ピッチの不連続に関するサブコスト値と，スペクトルの不連続に関するサブコスト値とを，足し合わせたコスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 As shown in FIG. 6, based on the phoneme candidates obtained by target selection, the keyword priority phoneme selection unit 107 uses dynamic programming to select, from the keyword start position to the keyword end position (the span over which the keyword position information flag given to each phoneme of the text body is true), the combination of unit candidates that minimizes the cost value obtained by summing the target cost value, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity.

キーワード優先音韻選択部１０７は，キーワード部分のコスト値が最小となる単位候補の組合せを用いて，テキスト本文の開始位置から，キーワード開始位置に向けて，コスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 Using the combination of unit candidates that minimizes the cost value of the keyword portion, the keyword priority phoneme selection unit 107 then uses dynamic programming to select the combination of unit candidates that minimizes the cost value from the start position of the text body toward the keyword start position.

キーワード優先音韻選択部１０７は，キーワード部分のコスト値が最小となる単位候補の組合せを用いて，キーワード終了位置から，テキスト本文の終了位置に向けて，コスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 Likewise, using the combination of unit candidates that minimizes the cost value of the keyword portion, the keyword priority phoneme selection unit 107 uses dynamic programming to select the combination of unit candidates that minimizes the cost value from the keyword end position toward the end position of the text body.

図６に示すように，キーワードが２つ以上存在する場合，キーワード部分に関するコスト値の計算は，テキスト本文の開始から終了に向けてキーワードの出現順に行う。 As shown in FIG. 6, when there are two or more keywords, the cost value for the keyword portion is calculated in the order of appearance of the keywords from the start to the end of the text body.

また，図６に示すように，キーワードが２つ以上存在する場合，キーワード部分以外のコスト値の計算方法は，テキスト本文の開始位置から，最初に出現したキーワード開始位置に向けて，コスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 Also, as shown in FIG. 6, when there are two or more keywords, the cost values for the parts other than the keyword portions are handled as follows. From the start position of the text body toward the start position of the first keyword to appear, the combination of unit candidates that minimizes the cost value is selected using dynamic programming.

図６に示すように，キーワードが２つ以上存在する場合，キーワード部分以外のコスト値の計算方法は，キーワード部分のコスト値が最小となる単位候補の組合せを用いて，出現したキーワード（当該キーワード）の終了位置から，次に出現するキーワード（後続キーワード）の開始位置に向けて，コスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 Between keywords, the combination of unit candidates that minimizes the cost value of each keyword portion is used, and the combination of unit candidates that minimizes the cost value is selected using dynamic programming from the end position of a keyword that has appeared (the keyword in question) toward the start position of the next keyword to appear (the subsequent keyword).

図６に示すように，キーワードが２つ以上存在する場合，キーワード部分以外のコスト値の計算方法は，最後に出現したキーワードの終了位置から，テキスト本文の終了位置に向けて，コスト値が最小となる単位候補の組合せを，動的計画法を用いて選択する。 Finally, from the end position of the last keyword to appear toward the end position of the text body, the combination of unit candidates that minimizes the cost value is selected using dynamic programming.
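
The ordering described with reference to FIG. 6 (keyword spans first, in order of appearance, then the remaining spans with the keyword choices held fixed) can be sketched in Python as below. All names here (`viterbi`, `keyword_priority_select`, and the `target` and `join` callbacks) are hypothetical, the concrete cost functions are left to the caller, and treating any contiguous run of true flags as a single keyword span is a simplification made for this example.

```python
def viterbi(layers, target, join, left=None, right=None):
    """Minimum-cost pick of one candidate per layer by dynamic programming.

    layers      : list of candidate lists, one list per consecutive phoneme
    target      : target(c) -> target cost of candidate c
    join        : join(a, b) -> discontinuity cost between adjacent candidates
    left, right : already-fixed neighbours of the span, or None
    """
    best = {
        c: (target(c) + (join(left, c) if left is not None else 0.0), [c])
        for c in layers[0]
    }
    for layer in layers[1:]:
        nxt = {}
        for c in layer:
            cost, path = min(
                ((pc + join(p, c), pp) for p, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            nxt[c] = (cost + target(c), path + [c])
        best = nxt

    def total(item):
        c, (cost, _) = item
        return cost + (join(c, right) if right is not None else 0.0)

    return min(best.items(), key=total)[1][1]


def keyword_priority_select(layers, flags, target, join):
    """Fix the keyword spans first, then fill the remaining spans around them."""
    n = len(layers)
    chosen = [None] * n
    spans, i = [], 0
    while i < n:                      # split into maximal runs of equal flag value
        j = i
        while j < n and flags[j] == flags[i]:
            j += 1
        spans.append((i, j, flags[i]))
        i = j
    for want in (True, False):        # pass 1: keyword spans, pass 2: everything else
        for start, end, is_keyword in spans:
            if is_keyword != want:
                continue
            left = chosen[start - 1] if start > 0 else None
            right = chosen[end] if end < n else None
            chosen[start:end] = viterbi(layers[start:end], target, join, left, right)
    return chosen
```

Because the keyword spans are optimized in isolation and then frozen, the overall path may cost more than a single global search would find; the design accepts that in exchange for the smoothest available rendering of the keywords themselves.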

また，図７に示すように，キーワード優先音韻選択部１０７は，まず，テキスト本文の音韻に対して，ターゲット選択する（Ｓ２７１）。 As shown in FIG. 7, the keyword priority phoneme selection unit 107 first selects a target for the phoneme of the text body (S271).

上記ターゲット選択すると（Ｓ２７１），キーワード優先音韻選択部１０７は，サーチ変数をテキスト本文の先頭音韻にキーワード数を０に設定する（Ｓ２７３）ことで，テキスト本文内で現在どの音韻を指し示しているかを表現したサーチ変数をテキスト本文の先頭の音韻を指し示すようにする。 Once target selection has been performed (S271), the keyword priority phoneme selection unit 107 sets the search variable to the head phoneme of the text body and sets the number of keywords to 0 (S273). The search variable, which represents the phoneme currently pointed to within the text body, thereby points to the first phoneme of the text body.

次に，キーワード優先音韻選択部１０７は，サーチ変数が指し示している音韻の位置がテキスト本文の全体の音韻数の範囲内にあるかどうかを判定する（Ｓ２７５）。つまり，図７に示すように，サーチ変数＜テキストの音韻数の関係にあるかどうかを判定する。 Next, the keyword priority phoneme selection unit 107 determines whether or not the position of the phoneme pointed to by the search variable is within the range of the total number of phonemes in the text body (S275). That is, as shown in FIG. 7, it is determined whether or not the relation of search variable <number of phonemes of text is satisfied.

上記ステップＳ２７５の判定結果，サーチ変数がテキストの音韻数より小さい場合，キーワード優先音韻選択部１０７は，サーチ変数が指し示している音韻が，キーワード内の音韻であるかどうかを判定する（Ｓ２７７）。 If the result of determination in step S275 is that the search variable is smaller than the number of phonemes in the text, the keyword priority phoneme selection unit 107 determines whether the phoneme pointed to by the search variable is a phoneme in the keyword (S277).

通常，サーチ変数が指し示している音韻が，キーワード内の音韻である場合，キーワード位置情報はｔｒｕｅである。キーワード内の音韻でない場合，キーワード位置情報はｆａｌｓｅである。 Normally, when the phoneme pointed to by the search variable is a phoneme in a keyword, the keyword position information is true. If it is not a phoneme within the keyword, the keyword position information is false.

一方，サーチ変数がテキストの音韻数より大きい場合（Ｓ２７５），図８に示す後続の処理（サーチ変数をテキスト本文の先頭音韻に設定する）が実行される。 On the other hand, if the search variable is larger than the number of phonemes of the text (S275), the subsequent processing shown in FIG. 8 (set the search variable to the first phoneme of the text body) is executed.

なお，サーチ変数が指すキーワード位置情報がｆａｌｓｅである場合（Ｓ２７７），サーチ変数を次の音韻に設定する（Ｓ２８７）。 If the keyword position information pointed to by the search variable is false (S277), the search variable is set to the next phoneme (S287).

サーチ変数が指すキーワード位置情報がｔｒｕｅである場合，サーチ変数の次の音韻のキーワード位置情報がｔｒｕｅ，又は，次の音韻があるか否かを確認する処理を実行する（Ｓ２７９）。 If the keyword position information pointed to by the search variable is true, a process of checking whether the keyword position information of the phoneme next to the search variable is true or whether there is the next phoneme is executed (S279).

図７に示すように，ステップＳ２７９では，サーチ変数が指し示している音韻の次の音韻がキーワード内の音韻であるかどうかを判定している。 As shown in FIG. 7, in step S279, it is determined whether the phoneme next to the phoneme indicated by the search variable is a phoneme in the keyword.

サーチ変数の次の音韻のキーワード位置情報がｔｒｕｅである場合，サーチ変数が指す音韻と次の音韻とのコスト値を求める（Ｓ２８１）。 If the keyword position information of the phoneme next to the search variable is true, the cost value of the phoneme pointed to by the search variable and the next phoneme is obtained (S281).

サーチ変数の次の音韻のキーワード位置情報がｆａｌｓｅである場合，キーワード数を１インクリメントする（Ｓ２８３）。 If the keyword position information of the phoneme next to the search variable is false, the number of keywords is incremented by 1 (S283).

図７に示すように，上記ステップＳ２８１では，サーチ変数が指し示している音韻のターゲット選択で求めた候補と，サーチ変数が指し示している音韻の次の音韻のターゲット選択で求めた候補との間で，ターゲットコスト値と，ピッチの不連続に関するサブコスト値と，スペクトルの不連続に関するサブコスト値とを足し合わせたコスト値を求めている。 As shown in FIG. 7, in step S281, between the candidate obtained by the target selection of the phoneme indicated by the search variable and the candidate obtained by the target selection of the phoneme next to the phoneme indicated by the search variable. The cost value obtained by adding the target cost value, the sub cost value related to the pitch discontinuity, and the sub cost value related to the spectrum discontinuity is obtained.
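
As a rough illustration of the cost value computed in step S281, the sketch below sums a target cost, a pitch-discontinuity sub-cost, and a spectral-discontinuity sub-cost for a pair of adjacent candidates. The text does not define the sub-costs themselves, so everything concrete here is an assumption: the `Unit` fields, the use of an absolute F0 difference and a Euclidean spectral distance at the join, the weights, and the choice to count only the following unit's target cost.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    target_cost: float   # summed target sub-costs from target selection
    f0_start: float      # assumed: pitch (Hz) at the start of the unit
    f0_end: float        # assumed: pitch (Hz) at the end of the unit
    spec_start: tuple    # assumed: spectral feature vector at the start
    spec_end: tuple      # assumed: spectral feature vector at the end


def pair_cost(prev: Unit, nxt: Unit, w_pitch: float = 1.0, w_spec: float = 1.0) -> float:
    """Cost of following `prev` with `nxt` (hypothetical definition)."""
    pitch_gap = abs(prev.f0_end - nxt.f0_start)           # pitch discontinuity
    spec_gap = math.dist(prev.spec_end, nxt.spec_start)   # spectral discontinuity
    return nxt.target_cost + w_pitch * pitch_gap + w_spec * spec_gap
```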

上記コスト値を求めると（Ｓ２８１），次に，サーチ変数を次の音韻に設定する（Ｓ２８７）。 When the cost value is obtained (S281), the search variable is set to the next phoneme (S287).

また，テキスト本文内でキーワードが幾つ存在するかを表現したキーワード数を１インクリメントすると（Ｓ２８３），キーワード優先音韻選択部１０７は，コスト値が最小となるパスを設定する（Ｓ２８５）。 Further, when the number of keywords expressing how many keywords exist in the text body is incremented by 1 (S283), the keyword priority phoneme selection unit 107 sets a path with the minimum cost value (S285).

上記パスを設定するステップＳ２８５では，サーチ変数が指し示している音韻の，ターゲット選択で求めた候補と，サーチ変数が指し示している音韻の次の音韻の，ターゲット選択で求めた候補との間で，ターゲットコスト値と，ピッチの不連続に関するサブコスト値と，スペクトルの不連続に関するサブコスト値とを，足し合わせたコスト値が最小となる単位候補の組合せ（パス）を求めている。 In step S285 for setting the path, between the candidate obtained by target selection of the phoneme indicated by the search variable and the candidate obtained by target selection of the phoneme next to the phoneme indicated by the search variable, A combination (path) of unit candidates that obtains the minimum cost value by adding the target cost value, the sub cost value related to the pitch discontinuity, and the sub cost value related to the spectrum discontinuity is obtained.

上記コスト値が最小となるパスを設定すると（Ｓ２８５），サーチ変数を次の音韻に設定する（Ｓ２８７）。 When the path with the minimum cost value is set (S285), the search variable is set to the next phoneme (S287).
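
The keyword counting performed during the first scan of FIG. 7 (S277, S279, S283) can be reduced to the following sketch, which increments the count whenever a flagged phoneme is followed by an unflagged phoneme or by the end of the text. The function name `count_keywords` is hypothetical, and the cost and path bookkeeping of S281 and S285 is deliberately left out.

```python
def count_keywords(flags):
    """Count keyword spans in a list of per-phoneme keyword position flags."""
    count = 0
    for i, inside in enumerate(flags):       # S275 / S287: walk the search variable
        if not inside:                       # S277: not a keyword phoneme
            continue
        has_next = i + 1 < len(flags)
        if not (has_next and flags[i + 1]):  # S279: next phoneme is outside or missing
            count += 1                       # S283: one more keyword found
    return count
```

Two keywords that are directly adjacent in the text produce one unbroken run of true flags and are therefore counted once by this sketch.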

図８に示すように，キーワード優先音韻選択部１０７は，サーチ変数をテキスト本文の先頭音韻に設定する（Ｓ２９０）。なお，上記ステップＳ２９０は，上記図７のステップＳ２７３と実質的に同様である。 As shown in FIG. 8, the keyword priority phoneme selection unit 107 sets the search variable to the head phoneme of the text body (S290). Note that step S290 is substantially the same as step S273 in FIG.

上記サーチ変数がテキスト本文の先頭音韻に設定されると（Ｓ２９０），サーチ変数が指すキーワード位置情報がｆａｌｓｅであるか否かを確認する（Ｓ２９１）。なお，上記ステップＳ２９１は，上記図７のステップＳ２７７と実質的に同様である。 When the search variable is set to the head phoneme of the text body (S290), it is confirmed whether or not the keyword position information pointed to by the search variable is false (S291). Note that step S291 is substantially the same as step S277 in FIG.

サーチ変数が指し示すキーワード位置情報がｆａｌｓｅである場合（Ｓ２９１），サーチ変数の次の音韻のキーワード位置情報がｆａｌｓｅであるか否かを確認する（Ｓ２９３）。 When the keyword position information indicated by the search variable is false (S291), it is confirmed whether the keyword position information of the phoneme next to the search variable is false (S293).

一方，サーチ変数が指し示すキーワード位置情報がｔｒｕｅである場合（Ｓ２９１），サーチ変数が指すキーワード位置情報がｔｒｕｅであるか否かを確認する（Ｓ２９８）。 On the other hand, if the keyword position information pointed to by the search variable is true (S291), it is confirmed whether the keyword position information pointed to by the search variable is true (S298).

図８に示すように，上記ステップＳ２９３では，サーチ変数が指し示している音韻の，次の音韻が，キーワード内の音韻であるかどうかを判定している。 As shown in FIG. 8, in step S293, it is determined whether or not the next phoneme of the phoneme indicated by the search variable is a phoneme in the keyword.

次に，サーチ変数の次の音韻のキーワード位置情報がｆａｌｓｅである場合（Ｓ２９３），サーチ変数が指す音韻と，次の音韻とのコスト値を求める（Ｓ２９５）。なお，当該ステップＳ２９５と，図７のコスト値を求める処理（Ｓ２８１）とは実質的に同様である。 Next, when the keyword position information of the phoneme next to the search variable is false (S293), the cost value of the phoneme pointed to by the search variable and the next phoneme is obtained (S295). Note that step S295 and the process of obtaining the cost value (S281) in FIG. 7 are substantially the same.

上記サーチ変数の次の音韻のキーワード位置情報がｔｒｕｅである場合（Ｓ２９３），コスト値が最小となるパスを設定する（Ｓ２９６）。なお，当該ステップＳ２９６と，図７の最小となるパスを設定する処理（Ｓ２８５）とは実質的に同様である。 If the keyword position information of the phoneme next to the search variable is true (S293), a path with the minimum cost value is set (S296). Note that step S296 is substantially the same as the process (S285) for setting the minimum path in FIG.

上記サーチ変数が指し示す音韻と，次の音韻とのコスト値を求めた後，サーチ変数を次の音韻に設定する（Ｓ２９７）。なお，当該ステップＳ２９７と，図７の次の音韻に設定する処理（Ｓ２８７）とは実質的に同様である。 After obtaining the cost value of the phoneme indicated by the search variable and the next phoneme, the search variable is set to the next phoneme (S297). Note that step S297 and the processing for setting the next phoneme in FIG. 7 (S287) are substantially the same.

次に，図８に示すように，キーワード優先音韻選択部１０７は，サーチ変数が指し示すキーワード位置情報がｆａｌｓｅであるか否かを確認する（Ｓ２９８）。なお，上記ステップＳ２９８は，図７のサーチ変数が指すキーワード位置情報がｆａｌｓｅであるか否かを確認する処理（Ｓ２７７）と実質的に同様である。 Next, as shown in FIG. 8, the keyword priority phoneme selection unit 107 checks whether or not the keyword position information indicated by the search variable is false (S298). Note that step S298 is substantially the same as the process (S277) for confirming whether the keyword position information indicated by the search variable in FIG. 7 is false.

上記確認の結果（Ｓ２９８），サーチ変数が指し示すキーワード位置情報がｔｒｕｅの場合，キーワード数を１デクリメント（減算）した結果が１以上であるか否かを確認する（Ｓ２９９）。 As a result of the confirmation (S298), if the keyword position information indicated by the search variable is true, it is confirmed whether or not the result of decrementing (subtracting) the number of keywords by 1 is 1 or more (S299).

一方，上記確認の結果（Ｓ２９８），サーチ変数が指し示すキーワード位置情報がｆａｌｓｅの場合，図９に示す後続の処理（Ｓ３０１）が実行される。 On the other hand, if the keyword position information indicated by the search variable is false as a result of the confirmation (S298), the subsequent processing (S301) shown in FIG. 9 is executed.

図８に示すように，上記ステップＳ２９９では，テキスト本文内に複数のキーワードがある場合，キーワードとキーワードの間に１つ以上の音韻がある場合，サーチ変数を次の音韻に設定する（Ｓ３００）。上記ステップＳ３００は，図７のステップＳ２８７と実質的に同様である。 As shown in FIG. 8, step S299 covers the case where the text body contains a plurality of keywords: if there are one or more phonemes between one keyword and the next, the search variable is set to the next phoneme (S300). Step S300 is substantially the same as step S287 in FIG. 7.

次に，図９に示すように，キーワード優先音韻選択部１０７は，まずサーチ変数が指すキーワード位置情報がｆａｌｓｅであるか否かを確認し（Ｓ３０１），次にサーチ変数が指すキーワード位置情報がｆａｌｓｅである場合，サーチ変数の次の音韻のキーワード位置情報がｆａｌｓｅであるか否かを確認する（Ｓ３０３）。なお，ステップＳ３０３は，図８のステップＳ２９３と実質的に同様である。 Next, as shown in FIG. 9, the keyword priority phoneme selection unit 107 first checks whether or not the keyword position information pointed to by the search variable is false (S301), and then the keyword position information pointed to by the search variable is If it is false, it is confirmed whether or not the keyword position information of the phoneme next to the search variable is false (S303). Step S303 is substantially the same as step S293 in FIG.

サーチ変数の次の音韻のキーワード位置情報がｆａｌｓｅである場合（Ｓ３０３），サーチ変数が指す音韻と，次の音韻とのコスト値を求める（Ｓ３０５）。 If the keyword position information of the phoneme next to the search variable is false (S303), the cost value of the phoneme pointed to by the search variable and the next phoneme is obtained (S305).

一方，サーチ変数の次の音韻のキーワード位置情報がｔｒｕｅである場合，コスト値が最小となるパスを設定する（Ｓ３０７）。 On the other hand, if the keyword position information of the phoneme next to the search variable is true, a path with the minimum cost value is set (S307).

次に，図９に示すように，キーワード優先音韻選択部１０７による上記ステップＳ３０７では，図７のコスト値が最小となるパスを設定するステップＳ２８５と実質的に同様である。 Next, as shown in FIG. 9, the above-described step S307 by the keyword priority phoneme selection unit 107 is substantially the same as the step S285 for setting the path with the minimum cost value in FIG.

キーワード優先音韻選択部１０７は，コスト値が最小となるパスを設定すると（Ｓ３０７），キーワード数を１デクリメントする（Ｓ３０９）。 When the keyword priority phoneme selection unit 107 sets a path that minimizes the cost value (S307), the keyword priority phoneme selection unit 107 decrements the number of keywords by 1 (S309).

図９に示すように，キーワード優先音韻選択部１０７による，ステップＳ３０９では，テキスト本文内に複数のキーワードがある場合，キーワード数を参照することで，キーワードとキーワードとの間の音韻選択処理が完了したことを表現するのに用いられる。例えば，キーワード数が１になるとキーワードとキーワードとの間の音韻選択処理が完了したことを示しているが，かかる例に限定されない。 As shown in FIG. 9, the number of keywords decremented in step S309 by the keyword priority phoneme selection unit 107 is used, when the text body contains a plurality of keywords, to indicate that phoneme selection for the spans between keywords has been completed. For example, a keyword count of 1 indicates that phoneme selection between keywords has been completed, but the present invention is not limited to this example.

上記ステップＳ３０９でキーワード数を１デクリメントすると，キーワード優先音韻選択部１０７は，サーチ変数を次の音韻に設定する（Ｓ３１１）。なお，上記ステップＳ３１１は，図７のステップＳ２８７と実質的に同様である。 When the number of keywords is decremented by 1 in step S309, the keyword priority phoneme selection unit 107 sets the search variable to the next phoneme (S311). Note that step S311 is substantially the same as step S287 of FIG.

また，図９に示すように，図８のステップＳ２９９の実行後，キーワード優先音韻選択部１０７は，サーチ変数が指し示すキーワード位置情報がｔｒｕｅであるか否かを確認する（Ｓ３１３）。 As shown in FIG. 9, after executing step S299 in FIG. 8, the keyword priority phoneme selection unit 107 checks whether or not the keyword position information indicated by the search variable is true (S313).

上記サーチ変数が指し示すキーワード位置情報がｔｒｕｅである場合（Ｓ３１３），サーチ変数の次の音韻のキーワード位置情報が文末であるか否かを確認する（Ｓ３１５）。 If the keyword position information pointed to by the search variable is true (S313), it is confirmed whether the keyword position information of the phoneme next to the search variable is the end of the sentence (S315).

一方，上記サーチ変数が指し示すキーワード位置情報がｆａｌｓｅである場合（Ｓ３１３），図１０に示すサーチ変数の次の音韻のキーワード位置情報が文末であるか否かを確認する処理が行われる（Ｓ３２０）。 On the other hand, if the keyword position information pointed to by the search variable is false (S313), a process is performed to check whether the keyword position information of the phoneme next to the search variable shown in FIG. 10 is the end of the sentence (S320). .

上記説明したように，図９に示す上記ステップＳ３１５では，サーチ変数が指し示している音韻の，次の音韻が，テキスト本文の文末であるか否かの判定がされる。 As described above, in step S315 shown in FIG. 9, it is determined whether or not the next phoneme of the phoneme indicated by the search variable is the end of the text body.

上記サーチ変数の次の音韻のキーワード位置情報が文末である場合（Ｓ３１５），図１０に示すように，波形セグメントを出力する（Ｓ３２９）。 When the keyword position information of the phoneme next to the search variable is the end of the sentence (S315), a waveform segment is output as shown in FIG. 10 (S329).

一方，上記サーチ変数の次の音韻のキーワード位置情報が文末でない場合（Ｓ３１５），サーチ変数を次の音韻に設定する（Ｓ３１７）。なお，上記ステップＳ３１７は，図７のステップＳ２８７と実質的に同様である。 On the other hand, if the keyword position information of the next phoneme after the search variable is not the end of the sentence (S315), the search variable is set to the next phoneme (S317). Note that step S317 is substantially the same as step S287 of FIG.

上記サーチ変数を次の音韻に設定すると（Ｓ３１７），再びサーチ変数が指し示すキーワード位置情報がｔｒｕｅであるか否かを確認する（Ｓ３１３）。以降の処理については上記説明した通りである。 When the search variable is set to the next phoneme (S317), it is confirmed again whether or not the keyword position information indicated by the search variable is true (S313). Subsequent processing is as described above.

図１０に示すキーワード優先音韻選択部１０７が行うサーチ変数の次の音韻のキーワード位置情報が文末であるか否かを確認する処理（Ｓ３２０）は，図９のステップＳ３１５と実質的に同様である。 The process (S320) for confirming whether or not the keyword position information of the phoneme next to the search variable performed by the keyword priority phoneme selection unit 107 shown in FIG. 10 is the end of the sentence is substantially the same as step S315 of FIG. .

上記確認の結果（Ｓ３２０），サーチ変数の次の音韻のキーワード位置情報が文末である場合，次に，コスト値が最小となるパスを設定する（Ｓ３２７）。なお，上記パスを設定する処理（Ｓ３２７）は，図７に示すステップＳ２８５と実質的に同様である。 As a result of the confirmation (S320), if the keyword position information of the phoneme next to the search variable is the end of the sentence, next, a path with the minimum cost value is set (S327). The process for setting the path (S327) is substantially the same as step S285 shown in FIG.

上記コスト値が最小となるパスを設定すると（Ｓ３２７），波形セグメントを出力する（Ｓ３２９）。 When the path with the minimum cost value is set (S327), a waveform segment is output (S329).

また一方で，サーチ変数の次の音韻のキーワード位置情報が文末でない場合（Ｓ３２０），キーワード優先音韻選択部１０７は，サーチ変数が指す音韻と，次の音韻とのコスト値を求める（Ｓ３２３）。なお，上記ステップＳ３２３は，図７のステップＳ２８１と実質的に同様である。 On the other hand, if the keyword position information of the phoneme next to the search variable is not the end of the sentence (S320), the keyword priority phoneme selection unit 107 obtains the cost value of the phoneme pointed to by the search variable and the next phoneme (S323). Note that step S323 is substantially the same as step S281 in FIG.

サーチ変数を次の音韻に設定すると（Ｓ３２５），再びサーチ変数の次の音韻のキーワード位置情報が文末であるか否かを確認する処理が行われる（Ｓ３２０）。以降の処理については，上記説明した通りである。 When the search variable is set to the next phoneme (S325), a process is performed again to check whether the keyword position information of the next phoneme after the search variable is the end of the sentence (S320). The subsequent processing is as described above.

図１０に示す波形セグメントを出力する処理（Ｓ３２９）は，テキスト本文に対して，音韻選択することで得ることができた波形セグメントを出力する。 The process (S329) for outputting the waveform segment shown in FIG. 10 outputs the waveform segment obtained by selecting the phoneme for the text body.

上記キーワード優先音韻選択部１０７により波形セグメントが出力されると（Ｓ３２９），音韻接続部１１１は，上記音韻選択された波形セグメントをつなぎ合わせて合成音声として出力する。上記合成音声がスピーカー等の出力部から出力されることで，音声合成装置１００は，テキスト本文のうちキーワードを強調しながら読み上げることができる。 When a waveform segment is output by the keyword priority phoneme selection unit 107 (S329), the phoneme connection unit 111 connects the waveform segments selected by the phoneme and outputs them as synthesized speech. When the synthesized speech is output from an output unit such as a speaker, the speech synthesizer 100 can read out the text while emphasizing the keyword.

なお，以上で，第１の実施の形態にかかる音声合成装置１００についての説明が終了するが，かかる音声合成装置１００によって，以下に示すような優れた効果が存在する。 This concludes the description of the speech synthesizer 100 according to the first embodiment. The speech synthesizer 100 provides the following advantageous effect.
（１）テキスト本文に含まれるキーワードを，キーワード以外の個所よりも滑らかな読み上げが可能となり，キーワード以外の個所よりも音質がよく，キーワード部分の読み上げをより強調し，より際立たせることができ，視聴者にキーワード部分をより明確に伝えることができる。 (1) The keywords contained in the text body can be read out more smoothly and with better sound quality than the other parts, so that the keyword portions are emphasized, stand out more, and are conveyed more clearly to the listener.

（音声合成装置について） (Speech synthesizer)
次に，図１１を参照しながら，第２の実施の形態にかかる音声合成装置９００について説明する。なお，図１１は，第２の実施の形態にかかる音声合成装置の概略的な構成を示すブロック図である。以下，第１の実施の形態との相違点について詳細に説明するが，その他の点については，ほぼ同様であるため詳細な説明は省略する。 Next, a speech synthesizer 900 according to the second embodiment will be described with reference to FIG. 11. FIG. 11 is a block diagram showing a schematic configuration of the speech synthesizer according to the second embodiment. The differences from the first embodiment are described in detail below; the other points are substantially the same, so their detailed description is omitted.

図１１に示すように，音声合成装置９００は，テキスト解析部１０１と，韻律予測部１０３と，キーワード重み付け部９０１と，キーワード抽出部９０５と，キーワード優先音韻選択部９０７と，コーパス１０９と，音韻接続部１１１とを備えている。 As shown in FIG. 11, the speech synthesizer 900 includes a text analysis unit 101, a prosody prediction unit 103, a keyword weighting unit 901, a keyword extraction unit 905, a keyword priority phoneme selection unit 907, a corpus 109, and a phoneme connection unit 111.

なお，第２の実施の形態に係る音声合成装置９００は，第１の実施の形態に係る音声合成装置１００と同様に，テキスト本文とキーワードを基にして合成音声を出力することが可能な装置であって，その合成音声を出力することで，テキスト本文を音声にして読み上げることが可能な装置である。 Note that the speech synthesizer 900 according to the second embodiment is an apparatus capable of outputting synthesized speech based on the text body and keywords, as with the speech synthesizer 100 according to the first embodiment. However, it is a device that can read out the text body as speech by outputting the synthesized speech.

より具体的には，音声合成装置９００は，例えば，ＣＰＵ，メモリ，ＨＤＤ（ハードディスクドライブ），マウス等に相当する入力部，液晶ディスプレイ等に相当する表示部などを備えたＰＣ等を例示することができるが，かかる例に限定されない。 More specifically, the speech synthesizer 900 may be, for example, a PC or the like equipped with a CPU, a memory, an HDD (hard disk drive), an input unit such as a mouse, and a display unit such as a liquid crystal display, but it is not limited to this example.

なお，本実施の形態にかかる音声合成装置９００に備わる表示部は，ＣＰＵにより表示可能なように処理された表示画面データと音声データを出力する。また，表示部は，例えば，ＴＶ又は液晶ディスプレイ装置などが例示され，上記双方ともにスピーカーを備えて，静止画像のほか，音声，又は動画像などを出力することが可能である。 Note that the display unit provided in the speech synthesizer 900 according to the present embodiment outputs display screen data and audio data processed so as to be displayed by the CPU. In addition, the display unit is exemplified by a TV or a liquid crystal display device, for example, and both of them are provided with a speaker, and can output a sound or a moving image in addition to a still image.

入力部は，例えば，使用者から操作指示を受けることが可能なマウス，トラックボール，トラックパッド，スタイラスペン，またはジョイスティックなどのポインティングデバイスや，キーボード，ボタン，スイッチ，レバー等の操作手段と，入力信号を生成してＣＰＵに出力する入力制御回路などから構成されている。音声合成装置９００のユーザは，この入力部を操作することにより，音声合成装置９００に対して各種のデータを入力したり処理動作を指示したりすることができる。 The input unit is, for example, a pointing device such as a mouse, a trackball, a trackpad, a stylus pen, or a joystick that can receive an operation instruction from a user, an operation means such as a keyboard, a button, a switch, and a lever, and an input It comprises an input control circuit that generates a signal and outputs it to the CPU. The user of the speech synthesizer 900 can input various data and instruct processing operations to the speech synthesizer 900 by operating this input unit.

図１１に示すキーワード重み付け部９０１は，テキスト本文内にある，複数のキーワード部分を読み上げさせる強調の度合い，つまりキーワード優先音韻選択部９０７で単位候補の組合せを絞り込むための，重み係数を決定し，キーワード重み付け情報として保持する。 The keyword weighting unit 901 shown in FIG. 11 determines the degree of emphasis for reading out a plurality of keyword parts in the text body, that is, the weight coefficient for narrowing down the combination of unit candidates by the keyword priority phoneme selection unit 907, Stored as keyword weighting information.

上記キーワード抽出部９０５は，キーワードの音韻記号と，テキスト本文内の音韻記号との部分一致している個所を基にして，キーワード強弱情報の値を，上記キーワード重み付け部９０１で求まったキーワード重み付け情報に対応する単位候補の組合せを絞り込む幅の値に変更する。なお，変更されたキーワード強弱情報は，韻律予測情報内のキーワードの先頭音韻に含んでいる。 The keyword extraction unit 905 locates the places where the phoneme symbols of a keyword match part of the phoneme symbols in the text body and, at those places, changes the value of the keyword strength information to the width used for narrowing down the combinations of unit candidates that corresponds to the keyword weighting information determined by the keyword weighting unit 901. The changed keyword strength information is held with the head phoneme of the keyword in the prosodic prediction information.

The keyword extraction unit 905 uses a table that associates keyword weighting information, which indicates the degree of emphasis with which a keyword portion is read out, with the width for narrowing down the combinations of unit candidates. The table is stored in storage means of the speech synthesizer 900, such as an HDD.
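The table lookup described above can be pictured with the following minimal sketch. The emphasis levels, the numeric widths, and the names `WIDTH_BY_WEIGHT` and `width_for_weight` are all hypothetical; the description gives no concrete values and does not state whether a stronger emphasis maps to a wider or a narrower range.

```python
# Hypothetical table: emphasis level (keyword weighting information) -> narrowing width.
# Both the levels and the widths are illustrative assumptions, not values from the source.
WIDTH_BY_WEIGHT = {0: 0.0, 1: 0.5, 2: 1.0, 3: 2.0}


def width_for_weight(weight: int) -> float:
    """Return the narrowing width associated with an emphasis level."""
    if weight not in WIDTH_BY_WEIGHT:
        raise ValueError(f"unknown keyword weight: {weight}")
    return WIDTH_BY_WEIGHT[weight]
```

In a device such a table would be read from storage rather than hard-coded; the dictionary here only stands in for that stored table.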

Based on the phoneme candidates obtained by target selection, the keyword priority phoneme selection unit 907 processes the span from each keyword start position to the keyword end position, that is, the phonemes of the text body whose keyword position information flag is true. The cost value is the sum of the target cost value, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity. Using dynamic programming, the unit selects the combinations of unit candidates whose cost value lies within the narrowing width of the minimum cost value.
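One way to realize "all combinations within a width of the minimum cost" is sketched below. It is an illustration under stated assumptions, not the procedure of the flowcharts: the function name `select_within_width`, the representation of candidates as indices, and the callback `join_cost` (standing in for the sum of the pitch and spectral sub-costs) are hypothetical. A backward dynamic-programming pass computes the cheapest completion from each candidate, and a pruned enumeration then lists every path whose total cost does not exceed the minimum plus the width.

```python
from typing import Callable, List, Sequence, Tuple

# (phoneme index i, candidate of phoneme i, candidate of phoneme i+1) -> join cost
JoinCost = Callable[[int, int, int], float]


def select_within_width(
    target_costs: Sequence[Sequence[float]],
    join_cost: JoinCost,
    width: float,
) -> List[Tuple[float, Tuple[int, ...]]]:
    """List (total cost, candidate indices) for paths within `width` of the minimum."""
    n = len(target_costs)
    if n == 0:
        return []
    if any(len(c) == 0 for c in target_costs):
        raise ValueError("every phoneme needs at least one unit candidate")

    # Backward DP: best[i][a] is the cheapest cost of finishing the span when
    # phoneme i uses candidate a (a's own target cost included).
    best: List[List[float]] = [[float(c) for c in target_costs[-1]]]
    for i in range(n - 2, -1, -1):
        nxt = best[0]
        row = [
            target_costs[i][a]
            + min(join_cost(i, a, b) + nxt[b] for b in range(len(nxt)))
            for a in range(len(target_costs[i]))
        ]
        best.insert(0, row)
    limit = min(best[0]) + width

    results: List[Tuple[float, Tuple[int, ...]]] = []

    def extend(i: int, a: int, cost_before: float, path: List[int]) -> None:
        # cost_before covers everything up to the join into candidate a.
        # Prune when even the cheapest completion would exceed the limit.
        if cost_before + best[i][a] > limit:
            return
        path.append(a)
        if i == n - 1:
            results.append((cost_before + target_costs[i][a], tuple(path)))
        else:
            base = cost_before + target_costs[i][a]
            for b in range(len(target_costs[i + 1])):
                extend(i + 1, b, base + join_cost(i, a, b), path)
        path.pop()

    for a in range(len(target_costs[0])):
        extend(0, a, 0.0, [])
    return sorted(results)
```

With a width of 0 only the minimum-cost paths remain, which corresponds to the ordinary minimum-cost selection; a larger width admits more combinations. The recursion depth equals the number of phonemes in the span, so this sketch suits short keyword spans.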

(Speech synthesis method)
Next, the speech synthesis method according to the second embodiment will be described with reference to FIGS. 12 and 13, which are flowcharts outlining the method.

As shown in FIGS. 12 and 13, the speech synthesis method according to the second embodiment differs from that of the first embodiment in that it includes keyword weighting processing (S1201) and processing for initializing the keyword strength information (S1203). In addition, the processing for extracting keyword locations (S1205) and the keyword priority phoneme selection processing (S1207) differ in content from the corresponding processing of the first embodiment; details are given later.

First, as shown in FIG. 12, a text body containing one or more keywords, and the one or more keywords to be emphasized, are input to the text analysis unit 101 (S201). The description assumes that the text body and the keywords are written in kanji and kana characters, but the invention is not limited to this example.

As shown in FIG. 12, the output that carries the phoneme symbols converted from the text body together with the morphological analysis result is referred to here as the text body intermediate language, although the invention is not limited to this example.

Next, as shown in FIG. 12, the text analysis unit 101 determines whether the input is the text body, so that the text body can be passed to the prosody prediction unit 103 (S205).

If the determination (S205) finds that the input is the text body, the text analysis unit 101 outputs the text body intermediate language and sends it to both the prosody prediction unit 103 and the keyword extraction unit 905.

If the input is not the text body, that is, if it is a keyword, the text analysis unit 101 sends the keyword intermediate language to the keyword weighting unit 901.

The prosody prediction unit 103 outputs its predictions to the keyword extraction unit 905 as prosodic prediction information.

As shown in FIG. 12, when the keyword weighting unit 901 receives the keyword intermediate language from the text analysis unit 101 (S205), it determines the degree of emphasis with which each keyword portion in the text body is read out, that is, the weighting coefficient that the keyword priority phoneme selection unit 907 uses to narrow down the combinations of unit candidates, and holds it as keyword weighting information (S1201).

Having determined the weighting coefficient and generated the keyword weighting information (S1201), the keyword weighting unit 901 outputs the keyword intermediate language and the keyword weighting information to the keyword extraction unit 905.

(Processing by the keyword extraction unit)
Next, as shown in FIG. 12, the keyword extraction unit 905 reserves and initializes an area for each phoneme in the prosodic prediction information to hold information indicating where the keywords are located in the text body (keyword position information) (S209). In this initialization the flag value of the keyword position information is set to false, but the invention is not limited to this example.

Next, the keyword extraction unit 905 initializes the keyword strength information, which expresses the width for narrowing down the combinations of unit candidates (S1203).

In step S1203, the keyword strength information is initialized for each phoneme in the prosodic prediction information. The description assumes that the value is set to 0, but the invention is not limited to this example.
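The per-phoneme fields initialized in steps S209 and S1203 could be represented as in the following sketch. The class `PhonemeEntry`, its field names, and the function `init_prosody_entries` are hypothetical; the actual prosodic prediction information also carries prosodic parameters that are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeEntry:
    """One phoneme of the prosodic prediction information (illustrative subset)."""
    symbol: str
    is_keyword: bool = False  # keyword position information flag, initialized to false (S209)
    strength: float = 0.0     # keyword strength information (narrowing width), initialized to 0 (S1203)


def init_prosody_entries(phonemes: List[str]) -> List[PhonemeEntry]:
    """Create one entry per phoneme with both keyword fields in their initial state."""
    return [PhonemeEntry(symbol=p) for p in phonemes]
```

Keeping the flag and the strength on every phoneme, rather than in a separate keyword list, matches the description in which later steps inspect the fields of the phoneme the search variable points to.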

Next, the keyword extraction unit 905 sets the phoneme position indicators, which express which phoneme is currently being pointed to within the phoneme symbols of the text body and of each keyword, to the head phoneme of the text body and of each of the one or more keywords (S211).

Next, the keyword extraction unit 905 counts the phonemes in the text body and in each keyword, and stores the counts as the text phoneme count and the respective keyword phoneme counts (S213).

Next, the keyword extraction unit 905 determines whether the end of the text body has been reached (S215).

If the determination (S215) finds that the end of the text body has not been reached, the keyword extraction unit 905 extracts the locations of the keywords in the text body (S217).

After the keyword location extraction processing (S217), the keyword strength information held in the head phoneme of each keyword in the prosodic prediction information is updated, based on the locations where the phoneme symbols of the keyword partially match the phoneme symbols in the text body. Using the table that associates keyword weighting information, which indicates the degree of emphasis with which a keyword portion is read out, with the width for narrowing down the combinations of unit candidates, the value of the keyword strength information is changed to the width that corresponds to the keyword weighting information obtained by the keyword weighting unit 901 (S2105).

The keyword extraction unit 905 outputs, to the keyword priority phoneme selection unit 907, "prosodic prediction information with keyword position information and strength information (keyword strength information)", which consists of the prosodic prediction information with keyword position information and the keyword strength information.

Once the end of the text body is confirmed (S215), the keyword priority phoneme selection processing is executed as shown in FIG. 13 (S1219).

In the keyword priority phoneme selection processing (S1219), the span from each keyword start position to the keyword end position, that is, the phonemes of the text body whose keyword position information flag is true, is processed on the basis of the phoneme candidates obtained by target selection. The cost value is the sum of the target cost value, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity. Using dynamic programming, the combinations of unit candidates whose cost value lies between the minimum cost value and that minimum plus the narrowing width are selected.

In step S1219, when two or more keywords are present, the cost values for the keyword portions may be calculated in the order in which the keywords appear, from the start of the text body toward its end.

Also in step S1219, when two or more keywords are present, the portions other than the keywords may be handled as follows: from the start position of the text body toward the start position of the first keyword, the combination of unit candidates with the minimum cost value may be selected using dynamic programming.

Likewise, for the portion between keywords, starting from the combination of unit candidates that minimizes the cost value of the keyword portion, the combination of unit candidates with the minimum cost value may be selected using dynamic programming from the end position of a keyword (the current keyword) toward the start position of the next keyword (the subsequent keyword).

Finally, from the end position of the last keyword toward the end position of the text body, the combination of unit candidates with the minimum cost value may be selected using dynamic programming.
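The processing order described above (keyword portions first, in order of appearance, then the remaining portions) can be sketched as follows. The helper names `keyword_spans` and `selection_order` and the half-open `(start, end)` span representation are assumptions made for illustration. Because only the per-phoneme flags are inspected, two keywords that are directly adjacent in the text would be merged into one span in this simplified form.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open interval [start, end) of phoneme indices


def keyword_spans(flags: List[bool]) -> List[Span]:
    """Return the maximal runs of phonemes whose keyword position flag is true."""
    spans: List[Span] = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def selection_order(flags: List[bool]) -> List[Tuple[str, Span]]:
    """Keyword spans in order of appearance, followed by the non-keyword spans."""
    kw = keyword_spans(flags)
    order: List[Tuple[str, Span]] = [("keyword", s) for s in kw]
    cursor = 0
    for start, end in kw:
        if cursor < start:  # text before the first keyword, or between two keywords
            order.append(("other", (cursor, start)))
        cursor = end
    if cursor < len(flags):  # text after the last keyword
        order.append(("other", (cursor, len(flags))))
    return order
```

Selecting the keyword spans first means their unit candidates are fixed before the surrounding text is handled, so the surrounding spans adapt to the keywords rather than the reverse.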

As shown in FIG. 13, once the keyword priority phoneme selection unit 907 has performed all of the phoneme selection processing, the waveform segments can be output (S1219).

If the end of the text body has been reached (S221), the phoneme connection unit 111 outputs the synthesized speech generated by connecting the waveform segments (S225). By outputting this synthesized speech (S225), the speech synthesizer 900 can read out the text body while emphasizing the keywords.

(Keyword location extraction processing)
Next, the keyword location extraction processing (S215, S217, S2105) according to the second embodiment shown in FIG. 12 will be described in more detail with reference to FIGS. 14 and 15, which are flowcharts outlining this processing.

As described above, before the keyword location extraction processing (S217), the keyword extraction unit 905 has already set the phoneme position indicators, which express which phoneme is currently being pointed to within the phoneme symbols of the text body and of each keyword, to the head phoneme of the text body and of each keyword.

First, as shown in FIG. 14, if the end of the text body has not been reached (S215), the keyword extraction unit 905, which handles the one or more keywords one at a time, determines whether the index of the keyword currently being handled is within the total number of keywords (S240).

Next, the keyword extraction unit 905 determines whether the phoneme position indicator of the keyword currently being handled (the current keyword) has reached the end of that keyword (S241).

If it has reached the end of the current keyword (S241), processing is performed to obtain the narrowing value for the combinations of unit candidates that corresponds to the weight of the current keyword (S1229 in FIG. 15).

If it has not reached the end of the current keyword (S241), it is determined whether the phoneme of the text body and the phoneme of the current keyword indicated by their respective phoneme position indicators match (S243).

The criterion for deciding whether the end of the current keyword has been reached is the same in the second embodiment as in the first: the decision depends on whether the phoneme position indicator of the current keyword is smaller than the number of phonemes in that keyword.

Next, the keyword extraction unit 905 sets the phoneme position indicator of the current keyword to the head phoneme of that keyword (S245).

The keyword extraction unit 905 then advances the phoneme position indicator of the current keyword to the phoneme following the one it currently points to (S247).

Next, the keyword extraction unit 905 switches the current keyword to the next keyword in the order in which the keywords were input (S249).

If the end of the text body has been reached (S215), the keyword extraction unit 905 outputs the prosodic prediction information with keyword position information and strength information (S251).

Next, as shown in FIG. 15, in step S1229 the keyword extraction unit 905 obtains, from the table that associates keyword weighting information with the width for narrowing down the combinations of unit candidates, the narrowing value that corresponds to the keyword weighting information held by the current keyword, which indicates the degree of emphasis with which the keyword portion is read out.

Next, the keyword extraction unit 905 changes the keyword position information in the prosodic prediction information for every phoneme in a run whose length equals the phoneme count of the current keyword, starting from the phoneme immediately preceding the one indicated by the phoneme position indicator of the text body and proceeding toward the head phoneme (S253). The keyword position information is changed from false to true, but the invention is not limited to this example.

After executing step S253, the keyword extraction unit 905 changes the value of the keyword strength information held in the head phoneme of the current keyword in the text body to the value for narrowing down the combinations of unit candidates (S1300).

In step S1300, the value of the keyword strength information held in the head phoneme of the current keyword in the text body is changed to the narrowing width obtained in step S1229.
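Steps S253 and S1300 can be illustrated with the sketch below. The function name `mark_keyword` and the use of two parallel lists for the flags and strengths are assumptions; `text_pos` stands for the phoneme position indicator of the text body, which at this point is taken to point just past the last phoneme of the matched keyword.

```python
from typing import List


def mark_keyword(
    flags: List[bool],
    strengths: List[float],
    text_pos: int,
    keyword_len: int,
    width: float,
) -> int:
    """Flag the keyword's phonemes (S253) and store the width on its head phoneme (S1300).

    Returns the index of the keyword's head phoneme in the text body.
    """
    start = text_pos - keyword_len
    if keyword_len <= 0 or start < 0 or text_pos > len(flags):
        raise ValueError("keyword does not fit before the text position")
    # Walk from the phoneme just before text_pos back toward the head of the text.
    for i in range(text_pos - 1, start - 1, -1):
        flags[i] = True
    strengths[start] = width
    return start
```

Only the head phoneme receives the width, so a later stage can read the strength once, at the point where it detects the start of a keyword.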

Next, as shown in FIG. 15, the keyword extraction unit 905 executes the same processing as step S245 in FIG. 14 (S255).

If the determination shown in FIG. 14 (S240) finds that the index is not within the number of keywords, the keyword extraction unit 905 switches the current keyword back to the first keyword, as shown in FIG. 15 (S257).

Next, the keyword extraction unit 905 advances the phoneme position indicator of the text body to the phoneme following the one it currently points to (S259).

This completes the series of keyword location extraction steps (S215, S217, S2105) performed by the keyword extraction unit 905 according to the second embodiment, as shown in FIGS. 14 and 15.
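The overall scan (one pointer for the text body, one pointer per keyword) can be approximated by the following simplified sketch. It is not a transcription of FIGS. 14 and 15: the function name `extract_keyword_positions` is hypothetical, the loop runs one step past the last phoneme so that a keyword ending exactly at the end of the text is still marked, and after a mismatch the keyword pointer restarts at 1 when the current text phoneme equals the keyword's first phoneme. These choices are assumptions. The matcher is naive and does not handle every self-overlapping keyword the way a full string-matching algorithm would.

```python
from typing import List, Sequence, Tuple


def extract_keyword_positions(
    text: Sequence[str],
    keywords: Sequence[Sequence[str]],
    widths: Sequence[float],
) -> Tuple[List[bool], List[float]]:
    """Return per-phoneme keyword flags and strengths for the text body."""
    if len(keywords) != len(widths):
        raise ValueError("one width is required per keyword")
    if any(len(kw) == 0 for kw in keywords):
        raise ValueError("keywords must contain at least one phoneme")

    flags = [False] * len(text)
    strengths = [0.0] * len(text)
    pointers = [0] * len(keywords)  # phoneme position indicator of each keyword

    for pos in range(len(text) + 1):  # pos: phoneme position indicator of the text
        for k, kw in enumerate(keywords):
            if pointers[k] == len(kw):
                # The keyword ended just before pos: flag its phonemes and
                # store the narrowing width on its head phoneme.
                start = pos - len(kw)
                for i in range(pos - 1, start - 1, -1):
                    flags[i] = True
                strengths[start] = widths[k]
                pointers[k] = 0
            if pos < len(text):
                if text[pos] == kw[pointers[k]]:
                    pointers[k] += 1
                else:
                    pointers[k] = 1 if text[pos] == kw[0] else 0
    return flags, strengths
```

For instance, with the phoneme sequence `a m e g a f u r u` and the keywords `a m e` and `f u r u`, the first three and the last four phonemes are flagged, and each keyword's width is stored on its first phoneme.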

(Keyword priority phoneme selection processing by the keyword priority phoneme selection unit 907)
Next, the keyword priority phoneme selection processing (S1219) according to the second embodiment will be described in detail with reference to FIGS. 16 to 20, which are flowcharts outlining this processing.

First, as shown in FIG. 16, the keyword priority phoneme selection unit 907 performs target selection for the phonemes of the text body (S271).

After target selection (S271), the keyword priority phoneme selection unit 907 sets the search variable to the head phoneme of the text body and sets the keyword count to 0 (S273). The search variable, which expresses which phoneme in the text body is currently being pointed to, thus points to the first phoneme of the text body.

Next, the keyword priority phoneme selection unit 907 determines whether the position of the phoneme indicated by the search variable is within the total number of phonemes in the text body (S275). That is, as shown in step S275 of FIG. 16, it determines whether the search variable is smaller than the number of phonemes in the text.

If step S275 finds that the search variable is smaller than the number of phonemes in the text, the keyword priority phoneme selection unit 907 determines whether the phoneme indicated by the search variable is a phoneme within a keyword (S277).

If the search variable is larger than the number of phonemes in the text (S275), the subsequent processing shown in FIG. 18, which sets the search variable to the head phoneme of the text body (S290), is executed.

If the keyword position information indicated by the search variable is false (S277), the search variable is set to the next phoneme, as shown in FIG. 17 (S287).

If the keyword position information indicated by the search variable is true, processing is executed, as shown in FIG. 17, to check whether the keyword position information indicated by the search variable is true and the keyword position information of the phoneme preceding the search variable is false (S1333).

Step S1333 in FIG. 17 determines whether the search variable is at the head phoneme of a keyword in the text body. When the keyword position information indicated by the search variable is true and that of the preceding phoneme is false, the search variable points to the head phoneme of a keyword.
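The head-of-keyword test of step S1333 reduces to a comparison of two adjacent flags, as in this sketch. The function name `is_keyword_head` is hypothetical, and treating the very first phoneme of the text as a head when its own flag is true is an assumption, since the description does not say what happens when there is no preceding phoneme.

```python
from typing import List


def is_keyword_head(flags: List[bool], i: int) -> bool:
    """True when phoneme i is flagged and the preceding phoneme is not."""
    if not 0 <= i < len(flags):
        raise IndexError("search variable is outside the text body")
    # Assumption: with no preceding phoneme (i == 0), the flag alone decides.
    return flags[i] and (i == 0 or not flags[i - 1])
```

A usage example: for the flags `[True, True, False, True]`, positions 0 and 3 are keyword heads, while position 1 is inside a keyword and position 2 is outside any keyword.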

If the keyword position information indicated by the search variable is true and that of the preceding phoneme is false (S1333), the keyword strength information held in the head phoneme of the keyword indicated by the search variable is obtained (S1334).

If, on the other hand, the keyword position information indicated by the search variable is true but that of the preceding phoneme is not false (S1333), it is checked whether the keyword position information of the phoneme following the search variable is true, or whether there is a following phoneme (S279).

In step S1334, the value of the keyword strength information, which expresses the width for narrowing down the combinations of unit candidates and is held in the head phoneme of the keyword indicated by the search variable, is obtained. Once this value has been obtained (S1334), step S279 is executed.

As shown in FIG. 17, step S279 determines whether the phoneme following the one indicated by the search variable is a phoneme within a keyword.

As shown in FIG. 17, step S281 obtains the cost value, that is, the sum of the target cost value, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity, between the candidates obtained by target selection for the phoneme indicated by the search variable and those obtained for the following phoneme.

After the keyword count, which expresses how many keywords are present in the text body, is incremented by 1 (S283), the keyword priority phoneme selection unit 907 sets the paths whose cost value falls within the range up to (cost value + keyword strength information) (S1335).

In step S1335, the cost value (the sum of the target cost value, the sub-cost value for pitch discontinuity, and the sub-cost value for spectral discontinuity) is evaluated between the candidates obtained by target selection for the phoneme indicated by the search variable and those obtained for the following phoneme. The combinations of unit candidates (paths) selected are those whose cost value does not exceed the minimum cost value plus the keyword strength information obtained in step S1334.
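For a single pair of adjacent phonemes, the selection in step S1335 can be sketched as below. Everything concrete here is an assumption: the `Unit` fields, the use of absolute differences of scalar boundary features as the pitch and spectral sub-costs, and the choice to count only the following unit's target cost. The description does not specify how these costs are computed.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A unit candidate with hypothetical scalar features."""
    name: str
    target_cost: float
    pitch: float     # assumed pitch value at the unit boundary
    spectrum: float  # assumed scalar spectral feature at the unit boundary


def pair_cost(left: Unit, right: Unit) -> float:
    """Target cost plus pitch and spectral discontinuity sub-costs (assumed forms)."""
    pitch_sub = abs(left.pitch - right.pitch)
    spectrum_sub = abs(left.spectrum - right.spectrum)
    return right.target_cost + pitch_sub + spectrum_sub


def paths_within_width(
    left_cands: Sequence[Unit], right_cands: Sequence[Unit], width: float
) -> Dict[Tuple[str, str], float]:
    """Keep the candidate pairs whose cost is at most the minimum cost plus width."""
    if not left_cands or not right_cands:
        raise ValueError("both phonemes need at least one candidate")
    costs = {
        (l.name, r.name): pair_cost(l, r)
        for l, r in product(left_cands, right_cands)
    }
    best = min(costs.values())
    return {pair: c for pair, c in costs.items() if c <= best + width}
```

The `width` argument plays the role of the keyword strength information: a value of 0 keeps only the minimum-cost pair or pairs, and larger values keep progressively more candidate pairs.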

Once the paths have been set (S1335), the search variable is set to the next phoneme (S287).

As shown in FIG. 18, the keyword priority phoneme selection unit 907 sets the search variable to the head phoneme of the text body (S290). Step S290 is substantially the same as step S273 in FIG. 16.

Once the search variable has been set to the head phoneme of the text body (S290), it is checked whether the keyword position information indicated by the search variable is false (S291). Step S291 is substantially the same as step S277 in FIG. 16.

As shown in FIG. 18, step S293 determines whether the phoneme following the one indicated by the search variable is a phoneme within a keyword.

If the keyword position information of the phoneme following the search variable is false (S293), the cost value between the phoneme indicated by the search variable and the following phoneme is obtained (S295). Step S295 is substantially the same as the cost calculation (S281) in FIG. 17.

Next, as shown in FIG. 18, the keyword priority phoneme selection unit 907 checks whether the keyword position information indicated by the search variable is false (S298). Step S298 is substantially the same as the check in FIG. 16 of whether the keyword position information indicated by the search variable is false (S277).

If this check (S298) finds that the keyword position information indicated by the search variable is false, the subsequent processing shown in FIG. 19 (S301) is executed.

As shown in FIG. 18, in step S299, when the text body contains a plurality of keywords and one or more phonemes lie between one keyword and the next, the search variable is set to the next phoneme (S300). Step S300 is substantially the same as step S287 in FIG. 17.

Next, as shown in FIG. 19, the keyword priority phoneme selection unit 907 first checks whether the keyword position information indicated by the search variable is false (S301). If it is false, the unit then checks whether the keyword position information of the phoneme following the search variable is false (S303). Step S303 is substantially the same as step S293 in FIG. 18.

As shown in FIG. 19, step S307 performed by the keyword priority phoneme selection unit 907 is substantially the same as step S285 in FIG. 7, which sets the path with the minimum cost value.

After setting the path with the minimum cost value (S307), the keyword priority phoneme selection unit 907 decrements the keyword count by 1 (S309).

As shown in FIG. 19, the keyword count decremented in step S309 is used, when the text body contains a plurality of keywords, to indicate that phoneme selection for the phonemes between one keyword and the next has been completed. For example, a keyword count of 1 indicates that the phoneme selection between keywords is complete, but the invention is not limited to this example.

After the keyword count is decremented by 1 in step S309, the keyword priority phoneme selection unit 907 sets the search variable to the next phoneme (S311). Step S311 is substantially the same as step S287 in FIG. 17.

As shown in FIG. 19, after step S299 in FIG. 18 is executed, the keyword priority phoneme selection unit 907 checks whether the keyword position information indicated by the search variable is true (S313).

If the keyword position information indicated by the search variable is false (S313), the processing shown in FIG. 20 is performed to check whether the phoneme following the search variable is the end of the text body (S320).

As described above, step S315 shown in FIG. 19 determines whether the phoneme following the one pointed to by the search variable is the end of the text body.

If the keyword position information of the phoneme following the search variable indicates the end of the sentence (S315), the waveform segments are output as shown in FIG. 20 (S329).

On the other hand, if the keyword position information of the phoneme following the search variable does not indicate the end of the sentence (S315), the search variable is set to the next phoneme (S317). Step S317 is substantially the same as step S287 of FIG. 17.

The process shown in FIG. 20, in which the keyword priority phoneme selection unit 907 checks whether the keyword position information of the phoneme following the search variable indicates the end of the sentence (S320), is substantially the same as step S315 of FIG. 9.

If, on the other hand, the keyword position information of the phoneme following the search variable does not indicate the end of the sentence (S320), the keyword priority phoneme selection unit 907 obtains the cost value between the phoneme pointed to by the search variable and the next phoneme (S323). Step S323 is substantially the same as step S281 of FIG. 17.
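As an illustration of the cost-value steps above (obtaining the cost between the phoneme pointed to by the search variable and the next phoneme, then setting the minimum-cost path), the following Python sketch runs a dynamic-programming search over unit candidates. It is a minimal sketch under stated assumptions, not the implementation of the embodiment: the `Unit` fields, the absolute-difference form of the pitch and spectral discontinuity sub-costs, and the names `join_cost` and `min_cost_path` are all hypothetical. Only the idea of summing a target cost with sub-costs for pitch discontinuity and spectral discontinuity is taken from the claims.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    """Hypothetical corpus unit candidate for one phoneme."""
    name: str
    target_cost: float  # mismatch against the predicted prosody (assumed precomputed)
    pitch_start: float  # pitch at the left edge of the unit
    pitch_end: float    # pitch at the right edge of the unit
    spec_start: float   # one-dimensional stand-in for the spectrum at the left edge
    spec_end: float     # one-dimensional stand-in for the spectrum at the right edge


def join_cost(prev: Unit, nxt: Unit) -> float:
    """Pitch plus spectral discontinuity sub-costs (absolute differences are an assumption)."""
    pitch_gap = abs(prev.pitch_end - nxt.pitch_start)
    spec_gap = abs(prev.spec_end - nxt.spec_start)
    return pitch_gap + spec_gap


def min_cost_path(candidates: Sequence[Sequence[Unit]]) -> Tuple[float, List[Unit]]:
    """Return the minimum total cost and one unit per phoneme achieving it."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every phoneme needs at least one unit candidate")

    # costs[j]: best accumulated cost of any path ending in candidate j.
    costs = [u.target_cost for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index for phoneme i

    for i in range(1, len(candidates)):
        prev_units = candidates[i - 1]
        new_costs: List[float] = []
        pointers: List[int] = []
        for unit in candidates[i]:
            best = min(
                range(len(prev_units)),
                key=lambda p: costs[p] + join_cost(prev_units[p], unit),
            )
            new_costs.append(
                costs[best] + join_cost(prev_units[best], unit) + unit.target_cost
            )
            pointers.append(best)
        costs = new_costs
        back.append(pointers)

    # Trace the cheapest path back from the last phoneme.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [candidates[-1][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        path.append(candidates[i][j])
    path.reverse()
    return total, path
```

In this sketch the search covers a single span of phonemes. A keyword-priority variant would presumably call it once per span, keyword spans first.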

The waveform segment output process shown in FIG. 20 (S329) outputs the waveform segments obtained by performing phoneme selection on the text body.

When the keyword priority phoneme selection unit 907 outputs the waveform segments (S329), the phoneme connection unit 111 concatenates the selected waveform segments and outputs them as synthesized speech. By outputting the synthesized speech from an output unit such as a speaker, the speech synthesizer 100 can read the text body aloud while emphasizing the keywords.
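The order in which spans of the text body are handled (keyword parts first, in order of appearance, then the remaining parts) can be sketched from the per-phoneme keyword position information. The following Python fragment is a hedged illustration only: representing the keyword position information as a list of booleans and the function name `spans_by_priority` are assumptions made for this example, and two directly adjacent keywords would be merged into one span under this simplified representation.

```python
from typing import List, Tuple

Span = Tuple[int, int, bool]  # (start index, end index exclusive, is_keyword)


def spans_by_priority(keyword_flags: List[bool]) -> List[Span]:
    """Split phoneme indices into runs and list keyword runs before the others.

    keyword_flags[i] is True when phoneme i belongs to a keyword
    (a hypothetical stand-in for the keyword position information).
    """
    spans: List[Span] = []
    start = 0
    for i in range(1, len(keyword_flags) + 1):
        # Close the current run at the end of the list or when the flag changes.
        if i == len(keyword_flags) or keyword_flags[i] != keyword_flags[start]:
            spans.append((start, i, keyword_flags[start]))
            start = i

    keyword_spans = [s for s in spans if s[2]]
    other_spans = [s for s in spans if not s[2]]
    # Keyword spans keep their order of appearance and come first.
    return keyword_spans + other_spans
```

Each returned span could then be passed to a unit-selection search, and the resulting waveform segments reordered by start index before concatenation.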

This concludes the description of the speech synthesizer 900 according to the second embodiment. The speech synthesizer 900 provides the following advantageous effect.
(1) By smoothing the connection with the preceding and following parts when a keyword part is emphasized, and by adjusting the degree to which that keyword part is emphasized, the text can be read aloud naturally even when the keyword part is emphasized.

The series of processes described above can be performed either by dedicated hardware or by software. When the series of processes is performed by software, a program constituting the software is installed in an information processing apparatus such as a general-purpose computer or a microcomputer, causing it to function as the speech synthesizer 100 or the speech synthesizer 900.

The program can be recorded in advance, in executable form, on a recording medium built into the computer, such as a hard disk or ROM.

Alternatively, the program is not limited to a hard disk drive and can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, CD-ROM (Compact Disc Read Only Memory), MO (Magneto Optical) disk, DVD (Digital Versatile Disc), magnetic disk, or semiconductor memory. Such a removable recording medium can be provided as so-called packaged software.

In addition to being installed on the computer from a removable recording medium as described above, the program can be transferred wirelessly to the computer from a download site via an artificial satellite for digital satellite broadcasting, or transferred by wire to the computer via a network such as a LAN (Local Area Network) or the Internet. The computer can then install the program transferred in this way in a storage means such as a built-in hard disk.

In this specification, the processing steps describing a program for causing a computer to perform various processes need not necessarily be processed in time series in the order described in the flowcharts, and also include processes executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer or may be processed in a distributed manner by a plurality of computers.

Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is obvious that a person skilled in the art can conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention.

In the above embodiments, the units of the speech synthesizer 100 and the speech synthesizer 900 (the text analysis unit 101 through the keyword priority phoneme selection unit 907) have been described as consisting of hardware, but the present invention is not limited to this example. For example, at least one of the units may be a program composed of one or more modules or components.

A block diagram showing a schematic configuration of the speech synthesizer according to the first embodiment.
A flowchart showing an outline of the speech synthesis method according to the first embodiment.
A flowchart showing an outline of the keyword location extraction process according to the first embodiment.
A flowchart showing an outline of the keyword location extraction process according to the first embodiment.
An explanatory diagram showing an outline of the criteria for judging whether a position is the ending of a keyword according to the first embodiment.
An explanatory diagram showing a schematic structure of the various kinds of information according to the present embodiment.
An explanatory diagram schematically showing the flow of data in the speech synthesizer according to the first embodiment.
An explanatory diagram showing an outline of the keyword priority phoneme selection process according to the first embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the first embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the first embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the first embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the first embodiment.
A block diagram showing a schematic configuration of the speech synthesizer according to the second embodiment.
A flowchart showing an outline of the speech synthesis method according to the second embodiment.
A flowchart showing an outline of the speech synthesis method according to the second embodiment.
A flowchart showing an outline of the keyword location extraction process according to the second embodiment.
A flowchart showing an outline of the keyword location extraction process according to the second embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the second embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the second embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the second embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the second embodiment.
A flowchart showing an outline of the keyword priority phoneme selection process according to the second embodiment.

Explanation of symbols

100 Speech synthesizer
101 Text analysis unit
103 Prosody prediction unit
105, 905 Keyword extraction unit
107, 907 Keyword priority phoneme selection unit
109 Corpus
111 Phoneme connection unit
901 Keyword weighting unit

Claims

A speech synthesis method for synthesizing speech while emphasizing a keyword part, wherein:
a selection process is performed in which phoneme candidates for all keywords included in a text body are selected from a corpus, in order of appearance of the keywords, in preference to phoneme candidates for parts other than the keywords.

The selection process uses the combination of unit candidates that minimizes the cost value of the keyword part, and selects the combination of unit candidates that minimizes the cost value from the start position of the text body to the start position of the first keyword;
When there are two or more keywords, the combination of unit candidates that minimizes the cost value is selected from the end position of a keyword toward the start position of the subsequent keyword. The speech synthesis method described.

In the speech synthesis method, before the selection process, a keyword extraction process is further performed that searches, from the head of the phoneme symbols, for whether the phoneme symbols of the keyword partially match the phoneme symbols of the text body. The speech synthesis method according to claim 1 or 2.

The keyword extraction process changes the value of the keyword position information for each phoneme described in the prosodic prediction information, based on the location where the phoneme symbols of the keyword and the phoneme symbols of the text body partially match;
Based on the prosodic prediction information including the changed keyword position information, the selection process selects, from the corpus, phoneme candidates for all keywords included in the text body. The speech synthesis method according to claim 1.

By changing the value of the keyword position information of each phoneme described in the prosodic prediction information, which is information for predicting at least one of voice pitch, voice length, and mel cepstrum, it is indicated that the phoneme symbols of the keyword and the phoneme symbols of the text body partially match. The speech synthesis method according to claim 1.

In the speech synthesis method, a weighting process is further performed that assigns, to one or more keywords in the text body, a keyword weighting coefficient indicating the degree to which the keyword part is emphasized. The speech synthesis method according to any one of claims 1 to 5.

In the keyword extraction process, the keyword weighting coefficient assigned in the weighting process is acquired as keyword weighting information for one or more keywords in the text body, and a table is used in which the keyword weighting information is associated with a width for narrowing down the combinations of unit candidates. The speech synthesis method according to claim 4.

In the selection process, taking as a reference the minimum of the cost value obtained by adding the target cost width, the sub-cost value relating to pitch discontinuity, and the sub-cost value relating to spectral discontinuity, a combination of unit candidates whose cost value falls within the width for narrowing down the combinations of unit candidates is selected. The speech synthesis method according to claim 7.

A speech synthesizer that synthesizes speech while emphasizing a keyword part, comprising:
a keyword priority phoneme selection unit that selects, from a corpus, phoneme candidates for all keywords included in a text body, in order of appearance of the keywords, in preference to phoneme candidates for parts other than the keywords.

The keyword priority phoneme selection unit uses the combination of unit candidates that minimizes the cost value of the keyword part, and selects the combination of unit candidates that minimizes the cost value from the start position of the text body to the start position of the first keyword;
When there are two or more keywords, the combination of unit candidates that minimizes the cost value is selected from the end position of a keyword toward the start position of the subsequent keyword. The speech synthesizer described.

The speech synthesizer further includes a keyword extraction unit that searches, from the head of the phoneme symbols, for whether the phoneme symbols of the keyword partially match the phoneme symbols of the text body. The speech synthesizer according to claim 9 or 10.

The keyword extraction unit changes the value of the keyword position information for each phoneme described in the prosodic prediction information, based on the location where the phoneme symbols of the keyword and the phoneme symbols of the text body partially match;
The keyword priority phoneme selection unit selects, from the corpus, phoneme candidates for all keywords included in the text body, based on the prosodic prediction information including the changed keyword position information. The speech synthesizer according to any one of claims 9 to 11.

The keyword extraction unit changes the value of the keyword position information of each phoneme described in the prosodic prediction information, which is information for predicting at least one of voice pitch, voice length, and mel cepstrum, and the changed value of the keyword position information indicates that the phoneme symbols of the keyword and the phoneme symbols of the text body partially match. The speech synthesizer according to claim 9.

10. The speech synthesizer further includes a weighting unit that assigns a keyword weighting coefficient indicating a degree of emphasizing the keyword part to one or more keywords in the text body. The speech synthesizer in any one of -13.

The keyword extraction unit acquires, as keyword weighting information, the keyword weighting coefficient assigned by the weighting unit for one or more keywords in the text body, and uses a table in which the keyword weighting information is associated with a width for narrowing down the combinations of unit candidates. The speech synthesizer according to claim 12.

The keyword priority phoneme selection unit, taking as a reference the minimum of the cost value obtained by adding the target cost width, the sub-cost value relating to pitch discontinuity, and the sub-cost value relating to spectral discontinuity, selects a combination of unit candidates whose cost value falls within the width for narrowing down the combinations of unit candidates. The speech synthesizer according to claim 15.

A computer program that causes a computer to function as the speech synthesizer according to any one of claims 9 to 16, which synthesizes speech while emphasizing a keyword part.