JPH064090A

JPH064090A - Method and device for text speech conversion

Info

Publication number: JPH064090A
Application number: JP4158158A
Authority: JP
Inventors: Jinichi Chiba; 仁一千葉; Hiroshi Hamada; 洋浜田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-06-17
Filing date: 1992-06-17
Publication date: 1994-01-14

Abstract

PURPOSE: To clearly express the intention expressed in a text by extracting an emphasized portion of the text and outputting the speech of the extracted portion with emphasis.

CONSTITUTION: An emphasis detection unit 3 detects emphasis information marked in the text output from a text input unit 1 and outputs it to an emphasis level conversion unit 4. A parameter conversion value determination unit 5 converts the emphasis level output by the emphasis level conversion unit 4 into physical parameters and outputs the resulting parameter conversion values, together with emphasis position information, to an emphasis processing unit 7. A text-to-speech conversion unit 6 collates the text output from a text memory 2 with a prestored dictionary to assign reading and prosodic information, and concatenates prestored speech units according to the reading to generate speech parameters. The emphasis processing unit 7 applies emphasis processing to the speech parameters using the emphasis position information and parameter conversion values from the parameter conversion value determination unit 5.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a text-to-speech conversion method and device for synthesizing speech from text. In particular, it relates to a text-to-speech conversion method and device that extract an emphasized portion marked in the text and output the speech of the extracted portion with emphasis, thereby clearly expressing in speech the intention expressed in the text.

[0002]

[Prior Art] With advances in speech processing technology, synthesized-speech devices have come into use in cash service machines at banks, post offices, and other cash-handling locations, in parking ticket issuing machines at parking lots, and in other guidance devices. However, synthesized speech is not widely adopted in information services such as telephone services. This is because conventional rule-synthesized speech has a constant, average loudness and speaking rate, so it often sounds monotonous and does not convey the intention sufficiently.

[0003] In written expression, the writer's intention can be emphasized and conveyed to the reader by using various typefaces, brackets, and other devices. Conventional text-to-speech conversion methods ignored changes of typeface within the text. Brackets in the text were either read aloud literally as 「かっこ」 ("bracket") or ignored. Consequently, adding brackets or changing the typeface in order to emphasize an intention had no effect on text-to-speech conversion. Needless to say, it is desirable that a portion marked as emphasized in the text be output as emphasized speech.

[0004]

[Problem to Be Solved by the Invention] The present invention solves the problems described above by providing a text-to-speech conversion method and device that extract an emphasized portion marked in the text and output the speech of the extracted portion with emphasis, thereby clearly expressing in speech the intention expressed in the text.

[0005]

[Means for Solving the Problem] In a text-to-speech conversion method that collates text with a prestored dictionary to assign reading information and prosodic information, and concatenates prestored speech units on the basis of the reading information to output speech corresponding to the text, the invention provides a method that extracts an emphasized portion marked in the text and outputs speech in which the extracted portion is emphasized.

The invention further provides a method that extracts the emphasized portion from a change of typeface in the text, determines the strength of emphasis for the speech of the extracted portion from a predetermined correspondence between typeface types and speech emphasis levels, and outputs speech in which the extracted portion is emphasized with the determined strength.

The invention also provides a method that extracts the emphasized portion using brackets in the text, determines the strength of emphasis for the speech of the extracted portion from a predetermined correspondence between bracket types and speech emphasis levels, and outputs speech in which the extracted portion is emphasized with the determined strength.

Finally, the invention provides a text-to-speech conversion device comprising: a text input unit 1; a text memory 2 that stores the text output from the text input unit 1; an emphasis detection unit 3 that detects emphasis information consisting of emphasis position information and notation information for an emphasized portion marked in the text; an emphasis level conversion unit 4 that converts the emphasis information into an emphasis level; a parameter conversion value determination unit 5 that converts the emphasis level into physical parameters; a text-to-speech conversion unit 6 that converts the text output from the text memory 2 into speech parameters; an emphasis processing unit 7 that applies emphasis processing to the speech parameters on the basis of the emphasis position information and parameter conversion values output from the parameter conversion value determination unit 5; and a speech output unit 8 that performs electroacoustic conversion of the speech parameters output from the emphasis processing unit 7.

[0006]

[Embodiment] An embodiment of the invention is described with reference to FIG. 1. Reference numeral 1 denotes a text input unit. Text may be input through the text input unit 1 in various ways, for example by typing new text on a keyboard or by reading text already stored on a hard disk or floppy disk. The text input through the text input unit 1 is then stored in the text memory 2. Text that has been input can also be edited using a display, keyboard, mouse, or other input device.

[0007] The text memory 2 stores the text output from the text input unit 1 and outputs it to both the text-to-speech conversion unit 6 and the emphasis detection unit 3. The emphasis detection unit 3 detects the emphasis information marked in the text and outputs it to the emphasis level conversion unit 4. The emphasis information consists of emphasis position information and information on the notation used to mark the emphasized portion.

[0008] The method of detecting emphasis information is now described. First, the case in which typeface is used as the notation for emphasis, and emphasis information is detected from the positions where the typeface changes, is described with reference to FIG. 2(a). Typeface changes are detected by comparing the typeface of the input text with the standard typeface. The position at which a typeface different from the standard typeface first appears and the position at which that typeface ends are output to the emphasis level conversion unit 4 as emphasis position information, together with the detected typeface as type information.

FIG. 2(a) is an example of text input in which emphasized portions are extracted from typeface changes. In the input text 「関東地方は日中は曇ですが、夕方６時頃から雨が降り出すでしょう。」 ("The Kanto region will be cloudy during the day, but rain will begin to fall from around 6 p.m."), the portion 「関東地方は」 ("the Kanto region") is in italics and the portion 「夕方６時頃から」 ("from around 6 p.m.") is in bold. When this example is input, the non-standard typeface "italic" is first detected at 「関」 and continues through 「は」, so the positions of 「関」 and 「は」 become emphasis position information. Next, the non-standard typeface "bold" is detected at 「夕」 and continues through 「ら」, so the positions of 「夕」 and 「ら」 become emphasis position information. The first emphasis position with typeface type "italic" and the second emphasis position with typeface type "bold" are each output to the emphasis level conversion unit 4.
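The typeface-change detection described in [0008] can be sketched as a single pass over the characters. This is a minimal illustration, not the patent's implementation: the `(character, typeface)` pair representation, the label `"regular"` for the standard typeface, and the names `tag` and `detect_font_emphasis` are all assumptions made for the example.

```python
from typing import List, Tuple

STANDARD = "regular"  # assumed label for the standard typeface


def tag(text: str, font: str) -> List[Tuple[str, str]]:
    """Helper for building test input: pair every character with a typeface."""
    return [(ch, font) for ch in text]


def detect_font_emphasis(chars: List[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, font) for each run of non-standard typeface.

    start and end are inclusive character indices, i.e. the first and last
    characters of the run, matching the convention used in the description.
    """
    spans = []
    run_start = None
    run_font = None
    for i, (_, font) in enumerate(chars):
        if font != run_font:
            # Typeface changed: close the current run if it was emphasized.
            if run_font is not None and run_font != STANDARD:
                spans.append((run_start, i - 1, run_font))
            run_start, run_font = i, font
    # Close a run that extends to the end of the text.
    if run_font is not None and run_font != STANDARD:
        spans.append((run_start, len(chars) - 1, run_font))
    return spans
```

Applied to the FIG. 2(a) sentence, the first span would cover 「関」 through 「は」 with type "italic" and the second 「夕」 through 「ら」 with type "bold".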

[0009] Next, the case in which brackets are used as the notation for emphasis, and the emphasized portion is detected from the positions of brackets in the text, is described with reference to FIG. 2(b). The position immediately after a detected bracket and the position immediately before the next detected bracket are output to the emphasis level conversion unit 4 as emphasis position information, together with the bracket type as type information. If bracket positions and bracket types do not correspond, for example when the opening and closing brackets are of different types or only one bracket of a pair is present, the brackets are either ignored or a message is shown on the display.

FIG. 2(b) is an example of text input in which emphasized portions are extracted according to bracket type. In the input text 「最近の“ＣＤラジカセ”には、『リモコン』が付いています。」 ("Recent “CD radio cassette players” come with a 『remote control』."), the portion 「ＣＤラジカセ」 is enclosed in “ ” and the portion 「リモコン」 is enclosed in 『』. When this example is input, the bracket “ is detected first, followed by the bracket ”. The position of 「Ｃ」, immediately after “, and the position of 「セ」, immediately before ”, become emphasis position information. Next, the bracket 『 is detected, followed by the bracket 』. The position of 「リ」, immediately after 『, and the position of 「ン」, immediately before 』, become emphasis position information. The first emphasis position with bracket type “ ” and the second emphasis position with bracket type 『』 are each output to the emphasis level conversion unit 4.
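The bracket-based detection in [0009], including the handling of mismatched or unpaired brackets, might look like the following sketch. The set of recognized bracket pairs, the function name `detect_bracket_emphasis`, and the choice to collect warnings in a list (standing in for "a message on the display") are assumptions; nested brackets are not handled, since the description does not discuss them.

```python
from typing import List, Tuple

# Assumed opening -> closing pairs. The description names only the pairs
# U+201C/U+201D (curly double quotes) and the white corner brackets.
BRACKET_PAIRS = {"\u201c": "\u201d", "『": "』", "「": "」", "（": "）"}
CLOSERS = set(BRACKET_PAIRS.values())


def detect_bracket_emphasis(text: str) -> Tuple[List[Tuple[int, int, str]], List[str]]:
    """Return (spans, warnings).

    Each span is (start, end, kind): the inclusive indices of the first and
    last characters inside a matched pair, plus the pair itself as a string.
    Unpaired or mismatched brackets yield a warning and no span.
    """
    spans: List[Tuple[int, int, str]] = []
    warnings: List[str] = []
    open_pos, open_ch = None, None
    for i, ch in enumerate(text):
        if ch in BRACKET_PAIRS:
            if open_ch is not None:
                warnings.append(f"unclosed {open_ch!r} at {open_pos}")
            open_pos, open_ch = i, ch
        elif ch in CLOSERS:
            if open_ch is not None and BRACKET_PAIRS[open_ch] == ch:
                if i - open_pos > 1:  # ignore empty pairs
                    spans.append((open_pos + 1, i - 1, open_ch + ch))
            else:
                warnings.append(f"unmatched {ch!r} at {i}")
            open_pos, open_ch = None, None
    if open_ch is not None:
        warnings.append(f"unclosed {open_ch!r} at {open_pos}")
    return spans, warnings
```

For the FIG. 2(b) sentence this yields one span from 「Ｃ」 to 「セ」 and one from 「リ」 to 「ン」, with no warnings.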

[0010] The emphasis level conversion unit 4 converts the emphasis position information and notation information output from the emphasis detection unit 3 into an emphasis level. Any conversion method may be used, such as a conversion table that associates notations with emphasis levels, as long as it yields a correspondence between the notation and the emphasis level. The resulting emphasis level is output to the parameter conversion value determination unit 5 together with the emphasis position information.

[0011] First, conversion using a table that associates typeface types with emphasis levels is described with reference to FIG. 3(a). The typeface type output from the emphasis detection unit 3 is converted into an emphasis level, which is output to the parameter conversion value determination unit 5 together with the emphasis position information. FIG. 3(a) shows an example of such a conversion table. A higher number indicates stronger emphasis. In this example the correspondence between typeface types and emphasis levels has five levels, but any number of levels may be used. If a typeface that is not in the conversion table is detected, it is mapped to, for example, the middle level 3. In the text input example of FIG. 2(a), the typeface information "italic" and "bold" output from the emphasis detection unit 3 is converted according to the table of FIG. 3(a): "italic" becomes emphasis level 1 and "bold" becomes emphasis level 3.

[0012] Next, conversion using a table that associates bracket types with emphasis levels is described with reference to FIG. 3(b). The bracket type output from the emphasis detection unit 3 is converted into an emphasis level, which is output to the parameter conversion value determination unit 5 together with the emphasis position information. FIG. 3(b) shows an example of such a conversion table. A higher number indicates stronger emphasis. In this example the correspondence between bracket types and emphasis levels has five levels, but any number of levels may be used. If a bracket that is not in the conversion table is detected, it may be mapped to a suitable emphasis level. In the text input example of FIG. 2(b), the bracket type information “ ” and 『』 output from the emphasis detection unit 3 is converted according to the table of FIG. 3(b): “ ” becomes emphasis level 2 and 『』 becomes emphasis level 4.
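The two lookup tables of [0011] and [0012] reduce to dictionary lookups with a fallback. The sketch below includes only the four entries that the text actually states; the remaining rows of FIG. 3 are not reproduced in the text and are deliberately left out. The default of 3 for unlisted brackets is an assumption (the text says only "a suitable level"), as are the function names.

```python
# Entries of FIG. 3(a) that are stated in the text.
FONT_LEVELS = {"italic": 1, "bold": 3}
DEFAULT_FONT_LEVEL = 3  # stated: unlisted typefaces map to the middle level

# Entries of FIG. 3(b) that are stated in the text
# (curly double quotes U+201C/U+201D and white corner brackets).
BRACKET_LEVELS = {"\u201c\u201d": 2, "『』": 4}
DEFAULT_BRACKET_LEVEL = 3  # assumption: the text only says "a suitable level"


def font_to_level(font: str) -> int:
    """Map a typeface type to an emphasis level (higher means stronger)."""
    return FONT_LEVELS.get(font, DEFAULT_FONT_LEVEL)


def bracket_to_level(kind: str) -> int:
    """Map a bracket pair, given as opening plus closing character, to a level."""
    return BRACKET_LEVELS.get(kind, DEFAULT_BRACKET_LEVEL)
```

Keeping the tables as plain data makes it straightforward to change the number of levels, which the description explicitly allows.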

[0013] The parameter conversion value determination unit 5 converts the emphasis level output by the emphasis level conversion unit 4 into physical parameters and outputs the resulting parameter conversion values to the emphasis processing unit 7 together with the emphasis position information. Any conversion method may be used, such as a conversion table that associates emphasis levels with parameter conversion values, as long as it yields a correspondence between emphasis levels and speech parameters. For the parameter conversion, the method disclosed in "Speech Synthesizer" (Japanese Patent Application No. Hei 4-26800), for example, can be used. FIG. 4 shows an example of a parameter conversion table keyed by emphasis level. The example in the figure converts five emphasis levels into speech parameters; it is sufficient to prepare speech parameter conversion values corresponding to the emphasis levels used by the emphasis level conversion unit 4.

[0014] In the text input example of FIG. 2(a), emphasis level 1 output from the emphasis level conversion unit 4 is converted by the table of FIG. 4 into the following parameter conversion values: volume increased by 3 dB, fundamental frequency multiplied by 1.03, speaking rate multiplied by 0.85, a 200 ms pause inserted before the emphasized portion, and a 150 ms pause inserted after it. Emphasis level 3 is converted by the table of FIG. 4 into: volume increased by 6 dB, fundamental frequency multiplied by 1.05, speaking rate multiplied by 0.75, a 330 ms pause inserted before the emphasized portion, and a 210 ms pause inserted after it.

[0015] In the text input example of FIG. 2(b), emphasis level 2 output from the emphasis level conversion unit 4 is converted by the table of FIG. 4 into the following parameter conversion values: volume increased by 4 dB, fundamental frequency multiplied by 1.04, speaking rate multiplied by 0.80, a 280 ms pause inserted before the emphasized portion, and a 180 ms pause inserted after it. Emphasis level 4 is converted into: volume increased by 7 dB, fundamental frequency multiplied by 1.10, speaking rate multiplied by 0.70, a 400 ms pause inserted before the emphasized portion, and a 240 ms pause inserted after it.
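The values quoted in [0014] and [0015] can be collected into a table keyed by emphasis level. The sketch below holds only levels 1 to 4, because the text does not give the level 5 row of FIG. 4. The class and field names are illustrative, and reading the dB figures as amplitude gain (a factor of 10^(dB/20)) is an assumption; the description does not say how the volume increase is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmphasisParams:
    gain_db: float        # volume increase in dB
    f0_scale: float       # multiplier for the fundamental frequency
    rate_scale: float     # multiplier for the speaking rate (< 1 is slower)
    pause_before_ms: int  # pause inserted before the emphasized portion
    pause_after_ms: int   # pause inserted after the emphasized portion

    @property
    def amplitude_gain(self) -> float:
        """Linear amplitude factor, assuming gain_db is an amplitude gain."""
        return 10 ** (self.gain_db / 20)


# Rows of FIG. 4 as quoted in the description (level 5 is not given there).
PARAM_TABLE = {
    1: EmphasisParams(3, 1.03, 0.85, 200, 150),
    2: EmphasisParams(4, 1.04, 0.80, 280, 180),
    3: EmphasisParams(6, 1.05, 0.75, 330, 210),
    4: EmphasisParams(7, 1.10, 0.70, 400, 240),
}
```

Under the amplitude-gain assumption, the 6 dB increase of level 3 roughly doubles the signal amplitude.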

[0016] Reference numeral 6 denotes the text-to-speech conversion unit, which collates the text output from the text memory 2 with a prestored dictionary to assign readings and prosodic information, concatenates prestored speech units according to the readings, and generates speech parameters. Any suitable previously proposed method may be chosen for converting text into readings. However, because the emphasis processing described later must be applied, the synthesis method should allow loudness, fundamental frequency, and speaking rate to be controlled, as LPC synthesis and LSP synthesis do. The speech parameters generated here are output to the emphasis processing unit 7. For the text input example of FIG. 2(a), 「関東地方は日中は曇ですが、夕方６時頃から雨が降り出すでしょう。」 is converted into speech parameters, but the typeface changes are not. For the text input example of FIG. 2(b), 「最近のＣＤラジカセには、リモコンが付いています。」 is converted into speech parameters, but the brackets are not.

[0017] The emphasis processing unit 7 applies emphasis processing to the speech parameters output from the text-to-speech conversion unit 6, using the emphasis position information and parameter conversion values output from the parameter conversion value determination unit 5. General techniques are used: for example, the fundamental frequency is changed by altering the interval of the excitation pulses used for synthesis, and the speaking rate is changed by altering the synthesis frame period. Because the speech parameters of the emphasized portion have been modified, discontinuities arise between them and the parameters of the surrounding speech. To remove these, the parameters before and after the emphasized portion are smoothed before being output to the speech output unit 8.
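One way to picture the modification and boundary smoothing in [0017] is to scale a per-frame parameter track (for example fundamental frequency or amplitude) inside the emphasized span and ramp the multiplier linearly back to 1.0 over a few neighbouring frames. This is only a sketch under stated assumptions: the description does not specify the smoothing method, so the linear ramp, the frame-list representation, the `ramp` width, and the name `scale_track` are all choices made for illustration. Speaking-rate changes and pause insertion are not covered.

```python
from typing import List


def scale_track(values: List[float], start: int, end: int,
                factor: float, ramp: int = 3) -> List[float]:
    """Scale values[start..end] (inclusive) by `factor`.

    On each side of the span the multiplier is ramped linearly between 1.0
    and `factor` over `ramp` frames, so the track has no abrupt step at the
    span boundaries.
    """
    out = list(values)
    for i in range(len(values)):
        if start <= i <= end:
            weight = 1.0                                   # inside the span
        elif start - ramp <= i < start:
            weight = (i - (start - ramp) + 1) / (ramp + 1)  # ramp up
        elif end < i <= end + ramp:
            weight = ((end + ramp) - i + 1) / (ramp + 1)    # ramp down
        else:
            weight = 0.0                                   # untouched frame
        out[i] = values[i] * (1.0 + weight * (factor - 1.0))
    return out
```

With a flat 100 Hz track and a factor of 1.05, frames inside the span rise to 105 Hz, the three frames on either side take intermediate values, and frames further away are unchanged.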

[0018] For the text input example of FIG. 2(a), emphasis processing is applied to the speech parameters for 「関東地方は日中は曇ですが、夕方６時頃から雨が降り出すでしょう。」 output from the text-to-speech conversion unit 6, using the emphasis positions and parameter conversion values output from the parameter conversion value determination unit 5. From the first emphasis position and its parameter conversion values, 「関東地方は」 is processed with the parameter conversion values of emphasis level 1 to produce emphasized speech parameters. From the next emphasis position and its parameter conversion values, 「夕方６時頃から」 is processed with the parameter conversion values of emphasis level 3 to produce emphasized speech parameters. Portions outside the emphasis positions keep the speech parameters output from the text-to-speech conversion unit 6 unchanged; no emphasis processing is applied to them.

[0019] For the text input example of FIG. 2(b), emphasis processing is applied to the speech parameters for 「最近のＣＤラジカセには、リモコンが付いています。」 output from the text-to-speech conversion unit 6, using the emphasis positions and parameter conversion values output from the parameter conversion value determination unit 5. From the first emphasis position and its parameter conversion values, 「ＣＤラジカセ」 is processed with the parameter conversion values of emphasis level 2 to produce emphasized speech parameters. From the next emphasis position and its parameter conversion values, 「リモコン」 is processed with the parameter conversion values of emphasis level 4 to produce emphasized speech parameters. Portions outside the emphasis positions keep the speech parameters output from the text-to-speech conversion unit 6 unchanged; no emphasis processing is applied to them.

[0020] Reference numeral 8 denotes the speech output unit. The speech output unit 8 performs digital-to-analog conversion on the digital speech signal, that is, the speech parameters output from the emphasis processing unit 7, and outputs speech through a loudspeaker or other electroacoustic transducer. The text input examples of FIG. 2(a) and FIG. 2(b) each contain two emphasized portions, but emphasized speech can be output in the same way when the text contains more. The conversion between typeface or bracket type and emphasis level was set to five levels here, but any number of levels may be used. When emphasis marked in the text is converted into an emphasis level, it is sufficient that the parameter conversion values corresponding to that level can be varied. For example, when conversion tables are used, the parameter conversion table in the parameter conversion value determination unit 5 should contain parameter conversion values for the same number of emphasis levels as the emphasis level conversion table in the emphasis level conversion unit 4.

[0021]

[Effects of the Invention] As described above, the text-to-speech conversion method and device of the invention can output speech that reflects the brackets and typeface types in the input text. As a result, they can produce speech that makes clear the keywords and topical focus that the author expressed in the text. For text that has already been written, expressive speech can be output by the simple operation of enclosing the portion to be emphasized in brackets or changing its typeface. With the text-to-speech conversion scheme of the invention, the intention and feelings of the text's author can be reflected in synthesized speech, which tends to be monotonous, and the range of applications for synthesized speech is expected to widen.

[Brief description of drawings]

[FIG. 1] A block diagram of an embodiment of the invention.

[FIG. 2] Diagrams showing examples of text input from which emphasized portions are detected: (a) shows an example in which emphasized portions are detected from typeface change positions, and (b) shows an example in which emphasized portions are detected from bracket positions.

[FIG. 3] Emphasis level conversion tables: (a) is the emphasis level conversion table for typeface types, and (b) is the emphasis level conversion table for bracket types.

[FIG. 4] The parameter conversion table for emphasis levels.

[Explanation of symbols]

1: text input unit
2: text memory
3: emphasis detection unit
4: emphasis level conversion unit
5: parameter conversion value determination unit
6: text-to-speech conversion unit
7: emphasis processing unit
8: speech output unit

Claims

1. A text-to-speech conversion method that collates text with a prestored dictionary to assign reading information and prosodic information, and concatenates prestored speech units on the basis of the reading information to output speech corresponding to the text, characterized in that an emphasized portion marked in the text is extracted and speech in which the extracted portion is emphasized is output.

2. The text-to-speech conversion method according to claim 1, characterized in that the emphasized portion is extracted from a change of typeface in the text, the strength of emphasis for the speech of the extracted portion is determined from a predetermined correspondence between typeface types and speech emphasis levels, and speech in which the extracted portion is emphasized with the determined strength is output.

3. The text-to-speech conversion method according to claim 1, characterized in that the emphasized portion is extracted using brackets in the text, the strength of emphasis for the speech of the extracted portion is determined from a predetermined correspondence between bracket types and speech emphasis levels, and speech in which the extracted portion is emphasized with the determined strength is output.

4. A text-to-speech conversion device characterized by comprising: a text input unit; a text memory that stores the text output from the text input unit; an emphasis detection unit that detects emphasis information consisting of emphasis position information and notation information for an emphasized portion marked in the text; an emphasis level conversion unit that converts the emphasis information into an emphasis level; a parameter conversion value determination unit that converts the emphasis level into physical parameters; a text-to-speech conversion unit that converts the text output from the text memory into speech parameters; an emphasis processing unit that applies emphasis processing to the speech parameters on the basis of the emphasis position information and parameter conversion values output from the parameter conversion value determination unit; and a speech output unit that performs electroacoustic conversion of the speech parameters output from the emphasis processing unit.