JPH09265299A

JPH09265299A - Text-to-speech device

Info

Publication number: JPH09265299A
Application number: JP7472296A
Authority: JP
Inventors: Mitsuo Furumura; 光夫古村; Tomoki Hamagami; 知樹濱上
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 1996-03-28
Filing date: 1996-03-28
Publication date: 1997-10-07

Abstract

(57) [Abstract]
[Problem] In text-to-speech devices, it is difficult for the user to control the reading operation at will.
[Solution] A display input device 2 is used in which a transparent touch panel (input device 12) is integrally mounted on the screen of a display (display device 10). The operator performs a tracing operation, running a pointer such as a finger over the input device 12 along a character string displayed on the display device 10. From the input signal 14 that the input device 12 produces during the trace, an input signal processing unit 42 derives pointer position data 50, moving speed data 52, and pressure data 54. These are associated, respectively, with the phrase to be read, the speech rate, and the volume (or accent strength). A rule synthesizing unit 8 synthesizes the designated phrase (bunsetsu) using the speech rate and volume parameters. As a result, highly intelligible synthesized speech that matches the operator's intention is easily obtained.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a text-to-speech device having an easy-to-operate man-machine interface.

[0002]

2. Description of the Related Art
Text-to-speech devices have long been used in fields such as printing and publishing for editing work, that is, for detecting and correcting errors such as typographical errors and omitted characters in text. For example, Japanese Unexamined Patent Publication No. 59-127148 describes a device that reads text aloud so that input text can be checked and corrected.

[0003] Conventional devices are operated with input devices, such as a keyboard and a mouse, that are separate from the display device. These input devices are used for character entry and also for setting conditions such as the speech rate and the loudness and strength of the output voice. They are likewise used to specify that sentences be skipped, gone back over, or repeated. Such operations are performed from the keyboard, or by using the mouse to move an on-screen cursor to the desired position. In other words, a specified range of text is basically read out unilaterally under fixed conditions. To stop the reading or change the reading conditions, the user must separately operate an input device such as the keyboard.

[0004] The prosodic features of a sentence vary widely with word structure, sentence pattern, and so on. Synthesis by rule is an advanced synthesis method that expresses these prosodic features through rules, so it is well suited to reading text aloud in a natural-sounding voice. A voice can be analyzed into three components: timbre, pitch (the height of the fundamental frequency), and intensity. Its frequency spectrum consists of a component at the fundamental frequency and components at integer multiples of it (harmonics). The timbre depends on the envelope shape of this spectrum. The pitch is the fundamental frequency and determines how high the voice sounds, and its variation over time is represented by a pitch pattern. The intensity corresponds to the amplitude of the voice.
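The three-component model in [0004] can be illustrated with a minimal additive-synthesis sketch. In this model, pitch is the fundamental frequency, timbre is the spectral envelope over the harmonics, and intensity is the amplitude. This sketch is not the synthesis method of the patent. The function `harmonic_tone`, its parameters, and the 16 kHz sample rate are assumptions chosen only for illustration.

```python
import math
from typing import Callable, List


def harmonic_tone(
    f0: float,                           # pitch: fundamental frequency in Hz
    envelope: Callable[[float], float],  # timbre: weight given to a harmonic at frequency f
    amplitude: float,                    # intensity: peak amplitude of the output
    sample_rate: int = 16000,
    duration: float = 0.1,
) -> List[float]:
    """Sum the harmonics k*f0 below the Nyquist frequency, weight each by the
    spectral envelope, and scale the result so its peak equals `amplitude`."""
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    nyquist = sample_rate / 2
    harmonics = []
    k = 1
    while k * f0 < nyquist:
        harmonics.append((k * f0, envelope(k * f0)))
        k += 1
    n_samples = round(sample_rate * duration)
    raw = [
        sum(w * math.sin(2 * math.pi * f * n / sample_rate) for f, w in harmonics)
        for n in range(n_samples)
    ]
    peak = max((abs(x) for x in raw), default=0.0) or 1.0
    # Normalizing means loudness depends only on `amplitude`, not on f0 or the envelope.
    return [amplitude * x / peak for x in raw]
```

Each parameter controls one component. Changing only `f0` moves the pitch, changing only `envelope` changes the timbre, and changing only `amplitude` changes the intensity.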

[0005]

Problems to be Solved by the Invention
Synthesized speech produced by synthesis by rule is generally less intelligible than natural speech. Listening for long periods to speech read out unilaterally by the device therefore demands great concentration and is tiring for the user. Intelligibility would improve if the user could start and stop the reading and change its conditions at will. With the conventional input devices described above, however, these operations are difficult or require a high level of skill. As a result, visually impaired users and users unfamiliar with computer operation have found text-to-speech devices extremely difficult to operate.

[0006] An object of the present invention is to solve the above problems by providing a text-to-speech device with an easy-to-operate man-machine interface. Such a device improves work efficiency in conventional applications such as proofreading support. It also assists with reading and similar activities for visually impaired persons who have difficulty reading displayed text.

[0007]

Means for Solving the Problems
A text-to-speech device according to the present invention comprises four means:
- display input means that displays text data on a screen as a character string and outputs position information representing the contact point of a pointing object on the screen;
- trace specifying means that identifies, based on the position information, the portion of the character string traced by the pointing object;
- voice synthesizing means that synthesizes, based on the text data, the voice corresponding to that portion of the character string; and
- voice output means that outputs the synthesized voice.

[0008] According to the present invention, when a character string displayed on the screen of the display input means is traced, the display input means detects the traced position and outputs it as position information. This position information is associated with the text data, and the traced portion of the character string is synthesized and output as speech. The pointing object that traces the screen is, for example, the operator's finger or a pointer such as a pen. The display input means detects the indicated position through, for example, contact pressure from a finger, electrostatic interaction, or optical interaction. It can be realized with a touch panel, for example. Tracing acts directly on the target character string and is something people do every day, so the operator can control the device freely.

[0009] In the text-to-speech device according to the present invention, the voice synthesizing means includes two further means. Speed detecting means determines the moving speed of the contact point based on the position information. Speech rate control means controls the speech rate of the voice according to that moving speed. With this arrangement, the operator can obtain the desired speech rate naturally by adjusting how fast the pointing object moves, which makes the output voice more intelligible.

[0010] In the text-to-speech device according to the present invention, the display input means also outputs pressure information representing the pressure received from the pointing object. The voice synthesizing means includes volume control means that controls the volume of the voice according to this pressure information. Because the display input means can detect the pressure exerted by the pointing object, the operator can make the synthesized voice louder or softer by pressing harder or more lightly on the screen. This also makes the output voice more intelligible.

[0011]

BEST MODE FOR CARRYING OUT THE INVENTION
A text-to-speech device according to an embodiment of the present invention is described below with reference to the drawings. The device has a display input device that integrates a display for showing text and an input device for detecting pointing. Its basic function is that, when a character string displayed on this device is traced with a finger or the like, the traced portion is read aloud by speech synthesis.

FIG. 1 is a principle diagram illustrating this basic function. It shows the screen of a display on which a transparent touch panel is integrally mounted as the display input device. A character string is displayed over several lines on the screen. When finger 1 traces a line of the character string horizontally to the right while touching the transparent touch panel, the device outputs synthesized speech corresponding to the traced characters. The specific configuration of a device that implements this function is described below.

[0012] FIG. 2 is a block diagram of the text-to-speech device according to this embodiment. The device comprises four blocks: a display input device 2, a reading interface driver 4, a language processing unit 6, and a rule synthesizing unit 8.

[0013] The display input device 2 comprises a display device 10 and an input device 12. The display device 10 displays the text data output from the reading interface driver 4. The input device 12 is formed integrally with the display device 10 and detects the point at which a pointing object touches the screen. Specifically, this device uses a display with a transparent touch panel on its screen as the display input device 2: the display is the display device 10 and the touch panel is the input device 12. The touch panel outputs an input signal 14 that represents the position of the point touched by a pointing object (pointer), such as a finger or a pen, together with the contact pressure. The display input device 2 may instead use a tablet as the input device in place of the touch panel. A tablet is a device that detects, by electromagnetic, optical, or similar means, the position indicated by an input pen called a stylus.

[0014] The reading interface driver 4 sits between the display input device 2, the language processing unit 6, and the rule synthesizing unit 8. It performs a group of processes among them, such as data conversion and computation. The basic data in this device is the text data stored in a text buffer 20. This text data is either entered from an input device such as a keyboard (not shown) or read from a recording medium on which electronic text data is recorded. The text data in the text buffer 20 is used by the language processing unit 6 and a display processing unit 22.

[0015] The language processing unit 6 linguistically analyzes the Japanese text from the text buffer 20 and generates a phoneme symbol string. This string takes information representing the "reading" (pronunciation) of the text and adds the prosodic information that speech synthesis requires. That information includes phrase (bunsetsu) boundaries, accent positions, pause positions, and vowel devoicing. The phoneme symbol string is stored in a phoneme symbol string buffer 24.
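The phrase-level prosodic information described in [0015] could be held in a structure like the one below. The field names are hypothetical, and so is the textual notation. In that notation, an apostrophe follows the accented mora, parentheses mark devoiced morae, and a slash marks a pause. The patent does not specify the format of the phoneme symbol string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhraseSymbols:
    """One phrase (bunsetsu) entry in a hypothetical phoneme symbol string buffer."""
    text_start: int            # offset of the phrase in the text buffer
    text_end: int              # exclusive end offset in the text buffer
    morae: List[str]           # the reading, split into morae
    accent: Optional[int]      # 1-based mora carrying the accent nucleus; None if unaccented
    pause_after: bool = False  # whether a pause follows this phrase
    devoiced: List[int] = field(default_factory=list)  # 0-based indices of morae with devoiced vowels

    def render(self) -> str:
        """Serialize the entry in the made-up notation described above."""
        parts = []
        for i, mora in enumerate(self.morae):
            s = f"({mora})" if i in self.devoiced else mora
            if self.accent is not None and i + 1 == self.accent:
                s += "'"
            parts.append(s)
        return "".join(parts) + (" /" if self.pause_after else "")
```

Each entry keeps the text offsets of its phrase, so a traced position in the text can be mapped back to the phrase that should be spoken.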

[0016] The rule synthesizing unit 8 performs synthesis by rule based on the parameters output from the reading interface driver 4. Details are described later.

[0017] The reading interface driver 4 is described next. The display processing unit 22 handles output to the display input device 2. It reads from the text buffer 20 the text data corresponding to, for example, one screen of the display device 10, converts it into text screen data 26, and outputs it to the display device 10. The display processing unit 22 may also output display range information 28 to the phoneme symbol output processing unit 40, which is described later. This information indicates which part of the text buffer 20 is displayed at which location on the display device 10. The phoneme symbol output processing unit 40 may then use it in the trace identification process.

[0018] An input signal processing unit 42 contains a position determiner 44, a speed determiner 46, and a pressure determiner 48. These receive the input signal 14 from the display input device 2. For the point at which the pointer touches the display device 10, they output position data 50, moving speed data 52, and pressure data 54, respectively. The on-screen coordinates and pressure value of the contact point are extracted from the position and contact pressure information in the input signal 14. The moving speed is computed from how these coordinates change over time.

The position determiner 44 could output the pointer coordinates themselves as the position data 50. In this device, however, it uses the contact pressure information to detect when the pointer touches the screen. While the pointer remains in continuous contact, it extracts only the displacement along the line (the horizontal direction) and outputs that as the position data 50. Even if the pointer also moves vertically, the vertical coordinate of the position data 50 stays fixed. This is because it is hard for people to trace a line of small characters without straying from it. The function is particularly useful when the device is operated by visually impaired or elderly persons. The vertical coordinate is changed by lifting the pointer from the screen and touching another line.

The input signal processing unit 42 may notify the operator when the pointer strays from the line. For example, it may sound an alarm or fade out the synthesized speech generated from the traced characters. In some cases a button is provided on the screen and operated by double-clicking it or the like, as described later. In that case the position data 50, the pressure data 54, and the other output data of the input signal processing unit 42 are passed to a text-to-speech device control unit (not shown). There they are used, for example, to determine whether the button has been operated.
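The processing in [0018] can be sketched as a small filter over touch-panel samples. Several details are assumptions for illustration: the sample format, the `contact_threshold` used to decide that the pointer is touching, and the choice to report unsigned horizontal speed. A real panel driver would supply its own event format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class TouchSample:
    t: float         # timestamp in seconds
    x: float         # horizontal coordinate in pixels
    y: float         # vertical coordinate in pixels
    pressure: float  # contact pressure reported by the panel (arbitrary units)


@dataclass
class PointerData:
    x: float         # position data: horizontal component only
    row_y: float     # vertical coordinate, frozen at touch-down
    speed: float     # moving speed data: horizontal pixels per second
    pressure: float  # pressure data


def process_input(samples: Iterable[TouchSample],
                  contact_threshold: float = 0.05) -> Iterator[Optional[PointerData]]:
    """Yield PointerData while the pointer touches the screen and None while it is lifted."""
    row_y: Optional[float] = None
    prev: Optional[TouchSample] = None
    for s in samples:
        if s.pressure < contact_threshold:
            # Lifted: position output stops and the row lock is released.
            row_y, prev = None, None
            yield None
            continue
        if row_y is None:
            row_y = s.y  # lock the line at touch-down; later vertical drift is ignored
        speed = 0.0
        if prev is not None and s.t > prev.t:
            speed = abs(s.x - prev.x) / (s.t - prev.t)
        prev = s
        yield PointerData(x=s.x, row_y=row_y, speed=speed, pressure=s.pressure)
```

Lifting the pointer and touching down again is the only way to change `row_y`, which matches the line-locking behavior described above.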

[0019] The position data 50 is input to the phoneme symbol output processing unit 40. From this data, the unit determines where in the text data held in the text buffer 20 the portion traced by the pointer on the display device 10 lies. For this purpose it obtains, as needed, information such as the display range information 28 from the display processing unit 22. Alternatively, it may obtain the display range designation that the text-to-speech device control unit (not shown) gives to the display processing unit 22. It then fetches the phoneme symbol string for the traced phrase from the phoneme symbol string buffer 24 and outputs it to the rule synthesizing unit 8. In this device, the phoneme symbol output processing unit 40 also sends a timing signal 60 to a speech rate control unit 62 when the contact point moves from one phrase to the next.
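Below is a minimal sketch of the trace identification in [0019]. It assumes a fixed-pitch text layout, with the display range information giving the text buffer offset and length of each displayed line. The layout parameters (`char_width`, `line_height`) and the sorted list of phrase start offsets are hypothetical inputs. The patent does not specify how the mapping is computed.

```python
from bisect import bisect_right
from typing import List, Optional


def text_offset(x: float, row_y: float, line_starts: List[int], line_lengths: List[int],
                char_width: float, line_height: float) -> Optional[int]:
    """Map a screen point to an offset in the text buffer, or None if it is off the text."""
    row = int(row_y // line_height)
    if row < 0 or row >= len(line_starts):
        return None
    col = int(x // char_width)
    if col < 0 or col >= line_lengths[row]:
        return None
    return line_starts[row] + col


def phrase_index(offset: int, phrase_starts: List[int]) -> int:
    """Index of the phrase containing `offset`; phrase_starts is sorted and begins at 0."""
    return bisect_right(phrase_starts, offset) - 1
```

A change in `phrase_index` between successive samples is the point at which the timing signal 60 would be raised.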

[0020] When the pointer is lifted from the screen, the input signal processing unit 42 stops outputting the position data 50. In response, the phoneme symbol output processing unit 40 stops outputting phoneme symbol strings, and the speech synthesis that uses them in the rule synthesizing unit 8 is interrupted. Skip reading therefore works as follows: the operator lifts the pointer from the line being traced and touches another line. The pointer can also be moved in the reverse direction, that is, leftward from one phrase into the preceding phrase. In that case the phoneme symbol output processing unit 40 outputs the phoneme symbol string of the current phrase and then that of the preceding phrase. In other words, moving the pointer backward produces reverse reading in phrase units. Reverse reading by phrase is easier to understand than reverse reading by character, which makes it easy to go back and reread.
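The behavior in [0020] reduces to reacting to phrase transitions. This covers stopping when the pointer is lifted, skipping by touching another line, and reading backward in whole phrases. In this sketch, `phrase_track` is an assumed per-sample list of phrase indices, with None while the pointer is lifted. `phoneme_buffer` stands in for the phoneme symbol string buffer 24, and the "<stop>" marker is hypothetical.

```python
from typing import Iterable, List, Optional


def reading_output(phrase_track: Iterable[Optional[int]], phoneme_buffer: List[str]) -> List[str]:
    """Turn a per-sample track of phrase indices (None = pointer lifted) into the sequence
    of phoneme strings sent to the synthesizer, with "<stop>" marking an interruption."""
    out: List[str] = []
    current: Optional[int] = None
    for p in phrase_track:
        if p is None:
            if current is not None:
                out.append("<stop>")  # lifting the pointer interrupts synthesis
            current = None
        elif p != current:
            # Forward, backward, or skipped-to phrase: read it as one whole unit.
            out.append(phoneme_buffer[p])
            current = p
    return out
```

Each transition emits the whole phrase. A leftward move therefore reads the preceding phrase as a unit instead of reversing individual characters.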

[0021] In the simplest form, the speech rate control unit 62 could tie the speech rate for each traced portion directly to the pointer's moving speed at that position. However, a pointer is not always easy to move smoothly, and its speed usually rises and falls regardless of the text. When the pointer is pressed hard against the screen, friction increases and large speed changes tend to occur in short cycles. If these text-unrelated fluctuations were passed directly to the speech rate, the synthesized speech would tend to become hard to follow.

This device therefore receives the phrase transition timing signal 60 from the phoneme symbol output processing unit 40. It uses that signal to change the speech rate only between phrases and keeps the rate constant within a phrase. The speech rate of the next phrase is determined from the moving speed data 52 at the moment the timing signal 60 arrives. It is then output to the rule synthesizing unit 8 as a speech rate parameter. Averaging the speed over a suitable time window is also effective, because it cancels large, short-period speed fluctuations that are unpleasant to hear.
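One way to implement the control in [0021] is sketched below. Speed samples are averaged over a short sliding window. The average is converted to a speech rate only when the phrase transition timing signal arrives, so the rate stays constant within each phrase. The window length, the linear mapping to morae per second, and the clamping limits are illustrative assumptions, not values given in the patent.

```python
from collections import deque
from typing import Deque, Tuple


class SpeechRateController:
    """Hold the speech rate constant within a phrase and update it only at phrase boundaries."""

    def __init__(self, window: float = 0.3, px_per_mora: float = 20.0,
                 min_rate: float = 4.0, max_rate: float = 12.0, initial_rate: float = 8.0):
        self.window = window            # averaging window in seconds (assumed)
        self.px_per_mora = px_per_mora  # assumed pixels of travel per spoken mora
        self.min_rate = min_rate        # clamp limits in morae per second (assumed)
        self.max_rate = max_rate
        self.rate = initial_rate
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp, speed)

    def add_speed(self, t: float, speed: float) -> None:
        """Record one moving-speed sample and drop samples older than the window."""
        self._samples.append((t, speed))
        while self._samples and self._samples[0][0] < t - self.window:
            self._samples.popleft()

    def on_phrase_boundary(self) -> float:
        """Call on each timing signal: fix and return the rate for the next phrase."""
        if self._samples:
            avg = sum(s for _, s in self._samples) / len(self._samples)
            self.rate = min(self.max_rate, max(self.min_rate, avg / self.px_per_mora))
        return self.rate
```

Calls to `add_speed` between boundaries never change `rate`. A brief speed spike can only nudge the average, and only at the next boundary.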

[0022] A volume control unit 64 determines the volume of the indicated phrase from the pressure data 54 and outputs it to the rule synthesizing unit 8 as a volume parameter. This device adjusts the volume using the contact pressure information in the input signal 14. A configuration that instead adjusts the strength of the accent within the phrase is also possible.
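Below is a sketch of the pressure-to-volume mapping in [0022]. The patent does not state the mapping. Here it is assumed to be linear in decibels between hypothetical pressure limits, with clamping so that very light or very hard presses stay within a usable range. A decibel scale is a common choice because perceived loudness grows roughly logarithmically with amplitude.

```python
def volume_db_from_pressure(pressure: float, p_min: float = 0.1, p_max: float = 1.0,
                            gain_min_db: float = -12.0, gain_max_db: float = 6.0) -> float:
    """Map contact pressure linearly onto a gain in dB, clamped at both ends (assumed limits)."""
    if p_max <= p_min:
        raise ValueError("p_max must exceed p_min")
    frac = (pressure - p_min) / (p_max - p_min)
    frac = min(1.0, max(0.0, frac))
    return gain_min_db + frac * (gain_max_db - gain_min_db)


def db_to_amplitude_scale(db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)
```

The same normalized pressure `frac` could drive an accent-strength factor instead, which would give the alternative configuration mentioned above.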

[0023] The rule synthesizing unit 8 receives three inputs: the phoneme symbol string from the phoneme symbol output processing unit 40, the speech rate parameter from the speech rate control unit 62, and the volume parameter from the volume control unit 64. It differs from conventional units in that its inputs include a speech rate parameter and a volume parameter that the operator can change at will by operating the pointer.

[0024] A typical speech synthesizer first linguistically analyzes the text data. It then adds phoneme durations based on a fixed speech rate to produce a phoneme symbol string. The speech rate is therefore fixed at the value given as a parameter at the outset. Changing it requires operating a keyboard or similar device, so it has been difficult to change the rate freely and at the right moment while text is being read. Likewise, the range of variation of the source amplitude pattern generated from the phoneme symbol string is normally set by a volume parameter fixed at the outset. As with the speech rate, this has been difficult to change freely.

[0025] In contrast, the rule synthesizing unit 8 of this device can freely change phoneme durations phrase by phrase, using a speech rate parameter that the pointer operation can change for each phrase. Next, three patterns are generated from the phoneme symbol string, each according to rules or the like:
- a source amplitude pattern, representing changes in voice intensity;
- a pitch pattern, representing changes in the fundamental frequency; and
- a spectrum pattern (formant pattern), the envelope shape of the frequency spectrum, representing timbre.
These patterns are then combined to produce the synthesized speech. At this stage the device can freely change the range of variation of the source amplitude pattern, using a volume parameter that the pointer operation can change at will.
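The two operator-controlled parameters in [0025] could enter the synthesis step as sketched below. The speech rate parameter rescales the rule-derived phoneme durations of a phrase. The volume parameter rescales the range of its source amplitude pattern. Scaling every phoneme uniformly is a simplifying assumption, and a practical synthesizer might treat vowels, consonants, and pauses differently. The pitch and spectrum patterns are left untouched in this sketch, and the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def apply_phrase_parameters(
    durations_ms: List[float],       # rule-derived phoneme durations for one phrase
    amplitude_pattern: List[float],  # source amplitude contour for the phrase, 0..1
    base_rate: float,                # speech rate the rule durations assume (e.g. morae/s)
    rate: float,                     # operator-controlled speech rate for this phrase
    gain: float,                     # operator-controlled linear volume for this phrase
) -> Tuple[List[float], List[float]]:
    """Scale durations by base_rate / rate and the amplitude contour by gain."""
    if rate <= 0 or base_rate <= 0:
        raise ValueError("rates must be positive")
    scale = base_rate / rate  # faster speech means shorter phonemes
    new_durations = [d * scale for d in durations_ms]
    new_amplitude = [a * gain for a in amplitude_pattern]  # widens or narrows the range
    return new_durations, new_amplitude
```

Because the function is called once per phrase, the new rate and gain take effect only at phrase boundaries.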

[0026] As described above, this device produces speech output through the extremely simple act of tracing a character string with a pointer. Even visually impaired persons and persons unfamiliar with operating machines can therefore use it. The operator can freely adjust the speech rate and the volume (or accent strength) according to how well the speech is being understood. The operator can also go back or skip ahead, so the synthesized speech is highly intelligible. Using this device to support proofreading therefore improves work efficiency.

[0027] Synthesis by rule as described above cannot currently guarantee 100% correct "readings" and "accents". This device therefore provides a phrase correction mode. Suppose the operator notices that the device has misread a phrase or given it the wrong accent. In this mode, the device presents pronunciation candidates for that phrase in synthesized speech. The operator selects the correct pronunciation, and the device learns it.

The phrase correction mode is activated, for example, when the operator double-clicks with the pointer on the phrase to be corrected. The device announces by sound that the mode has been activated, and may also indicate this on the screen. It then presents the reading candidates for the phrase one after another in synthesized speech. For this purpose the screen may be divided into three large areas. These act as contact switches that make the device speak the next candidate, the current candidate, and the previous candidate, respectively. The operator uses these switches to select candidates and have the device pronounce them. When the candidates are exhausted, the device tells the operator both on the screen and by synthesized speech.

When the operator finds the intended candidate, the operator double-clicks a contact switch placed where it is easy to find, for example at the top edge, the bottom edge, or a corner of the screen. That candidate is then fixed as the reading of the phrase. Once the reading is fixed, the device enters a routine that determines the accent in the same way. After a reading or accent is corrected, the contents of the phoneme symbol string buffer 24 or the data of the language processing unit 6 are updated. From then on, the phrase is synthesized with the corrected reading and accent.
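The candidate-selection flow of the phrase correction mode in [0027] can be modeled as a small state machine. In this sketch the three touch areas are assumed to run left to right as previous, current, and next. The double-click interval of 0.4 s is also an arbitrary assumption, and spoken output is represented by returned strings. None of these details come from the patent.

```python
from typing import List, Optional

NO_MORE = "<no more candidates>"  # hypothetical marker; the device would also show this on screen


def area_for_x(x: float, screen_width: float) -> str:
    """Assumed layout: the screen is split into three equal vertical bands."""
    third = screen_width / 3
    if x < third:
        return "prev"
    return "current" if x < 2 * third else "next"


def is_double_click(tap_times: List[float], max_interval: float = 0.4) -> bool:
    """True if the last two taps were close enough together to count as a double-click."""
    return len(tap_times) >= 2 and tap_times[-1] - tap_times[-2] <= max_interval


class PhraseCorrection:
    """Browse reading candidates for one phrase and confirm one of them."""

    def __init__(self, candidates: List[str]):
        if not candidates:
            raise ValueError("at least one candidate is required")
        self.candidates = candidates
        self.index = 0
        self.confirmed: Optional[str] = None

    def touch(self, area: str) -> str:
        """Return the text the device would speak for a touch in `area`."""
        if area == "next":
            if self.index + 1 >= len(self.candidates):
                return NO_MORE
            self.index += 1
        elif area == "prev":
            if self.index == 0:
                return NO_MORE
            self.index -= 1
        elif area != "current":
            raise ValueError(f"unknown area: {area}")
        return self.candidates[self.index]

    def confirm(self) -> str:
        """Called on a double-click of the confirm switch: fix the current candidate."""
        self.confirmed = self.candidates[self.index]
        return self.confirmed
```

As an example, the candidates for 一日 might include ichinichi, tsuitachi, and hitohi. The same class could be reused for the accent-selection routine that follows, with accent variants as the candidates.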

[0028] In this way, the pronunciation of a phrase can be corrected on the screen with simple operations. These use the same pointer that had been tracing the character string. There is no need to switch from the pointer to another input means such as a keyboard or mouse, so the work is efficient. Avoiding such a switch is especially convenient for visually impaired persons. The contact switches set up on the screen in the phrase correction mode also help. They are either large enough that the operator need only locate them roughly, or placed at the screen edges where they are easy to find. As a result, they are easier to operate than other input means such as an ordinary keyboard.

[0029]

Effects of the Invention
With the text-to-speech device of the present invention, operations that previously required a keyboard or mouse can be performed with a pointer such as a finger or pen. The position, movement, and pressure of the pointer on the screen that displays the text being read control the device in a natural, highly intuitive way. Such an easy man-machine interface is particularly effective for visually impaired users and users unfamiliar with computer operation. Reading and accent errors that arise in speech synthesis can also be corrected by pointer operations on the screen. Because the reading and accent candidates are presented in synthesized speech, even users without linguistic or phonetic knowledge can easily make corrections by choosing among them.

[Brief description of drawings]

FIG. 1 is a principle diagram illustrating the basic function of the text-to-speech device according to this embodiment.

FIG. 2 is a block diagram of the text-to-speech device according to this embodiment.

[Explanation of symbols]

2 display input device, 4 reading interface driver, 6 language processing unit, 8 rule synthesizing unit, 10 display device, 12 input device, 14 input signal, 20 text buffer, 24 phoneme symbol string buffer, 40 phoneme symbol output processing unit, 42 input signal processing unit, 44 position determiner, 46 speed determiner, 48 pressure determiner, 50 position data, 52 moving speed data, 54 pressure data, 62 speech rate control unit, 64 volume control unit.

Continuation of the front page
(51) Int.Cl.⁶ / Identification code / JPO file number / FI / Technical indication: G10L 3/00, G06F 15/20 568A

Claims

[Claims]

1. A text-to-speech device comprising: display input means for displaying text data on a screen as a character string and outputting position information indicating a contact point of a pointing object on the screen; trace specifying means for specifying, based on the position information, the portion of the character string traced by the pointing object; voice synthesizing means for synthesizing, based on the text data, a voice corresponding to that portion of the character string; and voice output means for outputting the synthesized voice.

2. The text-to-speech device according to claim 1, wherein the voice synthesizing means comprises speed detecting means for determining a moving speed of the contact point based on a change over time of the position information, and speech rate control means for controlling a speech rate of the voice according to the moving speed.

3. The text-to-speech device according to claim 1, wherein the display input means outputs, together with the position information, pressure information indicating the pressure received from the pointing object, and the voice synthesizing means has volume control means for controlling a volume of the voice according to the pressure information.