JPH10222192A - Substitute reader

Info

Publication number: JPH10222192A
Application number: JP9025302A
Authority: JP
Inventors: Masaaki Kato; 匡朗加藤
Original assignee: NIPPON SHOKO FUAINANSU KK
Current assignee: NIPPON SHOKO FUAINANSU KK
Priority date: 1997-02-07
Filing date: 1997-02-07
Publication date: 1998-08-21

Abstract

PROBLEM TO BE SOLVED: To provide a substitute reader that can generate synthesized voice corresponding to text data by using the voice of an arbitrary voice input person. SOLUTION: A speech recognition unit 22 analyzes the voice that an arbitrary voice input person has input through a microphone 21 and extracts phoneme data. The phoneme data are stored in a phoneme data storage unit 13. A speech synthesis unit 10 performs text synthesis using text data read from a CD-ROM by a CD-ROM drive 11 and the phoneme data stored in the phoneme data storage unit 13, and outputs the synthesized voice from a speaker 14. The text can therefore be read aloud in the voice of the arbitrary voice input person.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a substitute reading device that reads given text data aloud in the voice of an arbitrary voice input person.

[0002]

[Prior art] In the field of speech synthesis, a technique called text synthesis has long been known, in which speech synthesis is performed according to given text data so that synthesized sound corresponding to the text data is output. In text synthesis, phoneme data is provided for each speech unit (for example, a, i, u, ...); known forms of phoneme data include combinations of multiple parameters and PCM sound sources. The synthesized sound is generated by applying the phoneme data to each speech unit of the text data. The phoneme data determines the voice quality of the synthesized sound. Some systems also use prosody data, which gives intonation according to word units or the connections between words, when generating the synthesized sound.
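The per-unit lookup described above can be illustrated with a minimal sketch. It assumes the PCM form of phoneme data, with each speech unit mapped to a short list of samples; the table contents and the function name `synthesize` are hypothetical, and prosody is ignored.

```python
# Hypothetical phoneme data: each speech unit maps to a few PCM samples.
PHONEME_DATA = {
    "a": [0, 3, 5, 3],
    "i": [0, 2, 4, 2],
    "u": [0, 1, 2, 1],
}


def synthesize(units, phoneme_data):
    """Concatenate the stored samples of each speech unit, in text order."""
    samples = []
    for unit in units:
        if unit not in phoneme_data:
            raise KeyError(f"no phoneme data registered for unit {unit!r}")
        samples.extend(phoneme_data[unit])
    return samples


# Text data already split into speech units (an assumption of this sketch).
waveform = synthesize(["a", "i", "u"], PHONEME_DATA)
```

Replacing the table with one recorded from a different person changes the voice quality of the output without touching the text, which is the property the invention relies on.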

[0003]

[Problems to be solved by the invention] In devices that perform text synthesis as described above, the phoneme data is generally created by the manufacturer and supplied on a suitable storage medium such as a ROM or a magnetic disk. Consequently, the user can generate synthesized sound only within the range of voice qualities that the manufacturer has prepared.

[0004] The present invention has been made in view of the above circumstances, and its object is to provide a substitute reading device that can generate synthesized sound corresponding to text data using the voice of an arbitrary voice input person.

[0005]

[Means for solving the problems] The invention of claim 1 comprises: text input means for inputting text data; speech recognition means for extracting phoneme data of a voice input person by sampling the voice that an arbitrary voice input person has input through a voice input device; a phoneme data storage unit that stores the phoneme data extracted by the speech recognition means for each speech unit; and speech synthesis means that synthesizes speech corresponding to the text data, based on the text data input by the text input means and the phoneme data stored in the phoneme data storage unit, and outputs the synthesized sound from a voice output device. With this configuration, phoneme data is extracted from the voice that the voice input person inputs through the voice input device and is stored in the phoneme data storage unit, so the voice of any voice input person can be generated as synthesized sound. As a result, the text corresponding to the text data can be read aloud in the voices of various people.

[0006] The invention of claim 2 adds, to the invention of claim 1, instruction means that prompts the voice input person to speak for each speech unit to be stored in the phoneme data storage unit. With this configuration, the instruction means indicates how the phoneme data is to be input, so appropriate phoneme data can be provided.

The invention of claim 3, in the invention of claim 1 or claim 2, includes a reading device that reads text data from a recording medium on which text data associated with the content of a separately prepared print medium is recorded, and uses the reading device as the text input means. With this configuration, a print medium on which the text is printed is paired with a recording medium on which the text data is recorded, and the text data on the recording medium can be read aloud in the voice of an arbitrary voice input person while the user looks at the print medium. For example, if a children's picture book is prepared as the print medium and phoneme data of a parent's voice is registered in the phoneme data storage unit, the content of the picture book can be read aloud in the parent's voice while the child looks at it, so the child can enjoy the picture book while listening to a familiar voice.

[0007] The invention of claim 4, in the invention of claim 1 or claim 2, includes display means capable of displaying characters and graphics, and a reading device that reads image data and text data from a recording medium on which image data of still or moving images to be displayed on the display means and text data related to the image data are recorded; the reading device also serves as the text input means. With this configuration, the user can hear the text data in the voice of the voice input person while watching the image shown on the display means. Therefore, if the displayed images are aimed at children, like a picture book, and phoneme data of a parent's voice is registered in the phoneme data storage unit, the child can watch the images while listening to the parent's voice, which sounds natural to the child. Electronic picture books that display picture-book-like images on a display screen and interactively switch images and generate sounds related to the images have recently come onto the market; the invention of claim 4 differs from such products in that it produces the voice of an arbitrary voice input person.

[0008] The invention of claim 5, in the invention of any of claims 1 to 4, includes a pitch control unit that controls the pitch of the synthesized sound according to an external signal. With this configuration the pitch can be controlled. Beyond simply raising or lowering the sound, if pitch data that controls the pitch is associated with the text data, musical intervals can be assigned to the text data and a song can be output.

[0009]

BEST MODE FOR CARRYING OUT THE INVENTION

(Embodiment 1) This embodiment shows an example in which the text data is stored on a CD-ROM as the storage medium. An IC card, a flexible disk, or the like may also be used as the storage medium.

[0010] As shown in FIG. 1, a CD-ROM drive 11 is provided as a reading device that reads the text data recorded on the CD-ROM, and the text data read by the CD-ROM drive 11 is input to the speech synthesis unit 10 through the text selection unit 12. The speech synthesis unit 10 performs text synthesis corresponding to the text data using the phoneme data registered in the phoneme data storage unit 13, and outputs the synthesized sound from a speaker 14 serving as the voice output device. In other words, the speech synthesis unit 10 generates the synthesized sound by applying phoneme data to each speech unit in the text data. The speech synthesis unit 10 includes a dictionary for generating prosody data through semantic analysis of the text data, and the prosody data determines the intonation. This kind of text synthesis (rule-based synthesis from text data) is well known and is not described in detail here.

[0011] The text selection unit 12 selects which of two inputs is passed to the speech synthesis unit 10: the text data that the CD-ROM drive 11 has read from the CD-ROM, or the text data of messages prepared in advance in the message storage unit 15. When the message storage unit 15 is selected, the various messages can be output from the speaker 14 as speech.
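The role of the text selection unit 12 can be pictured as a two-way switch in front of the speech synthesis unit 10. The following is only an illustrative sketch; the names `TextSource` and `select_text` and the sample strings are hypothetical and do not appear in the specification.

```python
from enum import Enum


class TextSource(Enum):
    CD_ROM = "cd_rom"      # text data read by the CD-ROM drive 11
    MESSAGE = "message"    # text data held in the message storage unit 15


def select_text(source, cd_rom_text, message_text):
    """Return the text data to hand to the speech synthesis unit 10."""
    if source is TextSource.MESSAGE:
        return message_text
    return cd_rom_text


story = "Once upon a time"
prompt = "Please repeat after me"
selected = select_text(TextSource.MESSAGE, story, prompt)
```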

[0012] The present invention is characterized in that the phoneme data recorded in the phoneme data storage unit 13 is created by an arbitrary voice input person. For this purpose the device includes a microphone 21, serving as the voice input device through which the voice input person inputs voice, and a speech recognition unit 22 that analyzes the voice input through the microphone 21 and generates phoneme data. The speech recognition unit 22 samples the voice input through the microphone 21 and extracts from the sampled voice the parameters required for rule-based synthesis. These parameters are registered in the phoneme data storage unit 13 as phoneme data.

[0013] Phoneme data must be registered for each speech unit. The speech units are the 50 sounds of the Japanese syllabary (a, i, u, ...) and, where needed, additional information such as phoneme chains (vowel-consonant-vowel connections). Using such additional information as phoneme data improves the clarity of the synthesized sound. Because the voice input from the microphone 21 must be registered in association with the speech unit it represents, the voice input person has to input voice according to a predetermined procedure. One conceivable procedure is to read a predetermined text aloud at specified timings, but the work would have to be repeated several times until speech units and phoneme data are matched correctly, which makes registration of the phoneme data tedious.

[0014] In this embodiment, therefore, messages that give instructions for the phoneme data registration procedure are registered in the message storage unit 15, so that phoneme data can be registered relatively easily by following the spoken messages. For example, the sound that the voice input person should pronounce is first output from the speaker 14 and the person is then asked to repeat it, which makes it easy to associate the speech unit with the phoneme data. In addition, appropriate phoneme data can be set by having the same sound pronounced several times, discarding abnormal values, and taking the average. The message storage unit 15 is thus the main component of the instruction means. The required phoneme data is registered in the phoneme data storage unit 13 in advance as a default, and this default phoneme data is used when generating the synthesized sound for the messages stored in the message storage unit 15.
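The step of repeating a sound, discarding abnormal values, and averaging can be sketched as follows. The specification does not say how an abnormal value is detected, so this sketch assumes one possible rule: a take is rejected if any parameter deviates from the per-parameter median by more than a relative tolerance. The function name `robust_average`, the tolerance, and the parameter values are all hypothetical.

```python
from statistics import mean, median


def robust_average(utterances, tolerance=0.2):
    """Average parameter vectors after dropping outlying takes.

    A take is dropped if any parameter differs from that parameter's
    median across takes by more than `tolerance` times the median.
    """
    medians = [median(column) for column in zip(*utterances)]
    kept = [
        take for take in utterances
        if all(abs(x - m) <= tolerance * abs(m) for x, m in zip(take, medians))
    ]
    if not kept:
        raise ValueError("all takes rejected; the sound must be repeated")
    return [mean(column) for column in zip(*kept)]


# Four takes of the same speech unit; the last one is abnormal.
takes = [
    [100.0, 10.0],
    [102.0, 10.0],
    [98.0, 10.0],
    [150.0, 30.0],
]
phoneme_data = {"a": robust_average(takes)}
```

Using the median as the reference keeps a single bad take from shifting the acceptance window, which a mean-based reference would not guarantee.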

[0015] The units described above are controlled by a control unit 20 consisting of a microcomputer, which directs the timing of phoneme data creation and of speech synthesis from text data. When phoneme data is created, the control unit 20 instructs the message storage unit 15, the text selection unit 12, and the speech synthesis unit 10 so that speech is synthesized from the text data of the messages stored in the message storage unit 15, and it instructs the phoneme data storage unit 13 to store the parameters obtained by the speech recognition unit 22 in association with the corresponding speech units.

[0016] Even when phoneme data has been registered in the phoneme data storage unit 13 as described above, some of it may fail to produce clear synthesized sound. Therefore, once the necessary phoneme data has been registered in the phoneme data storage unit 13, that phoneme data is used to generate synthesized sound for the required messages stored in the message storage unit 15. The voice input person checks whether every message is clear and, if any message is unclear, operates the operation unit 23 to indicate which one. Phoneme data is then re-registered for the speech units contained in that message.
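The re-registration step amounts to mapping each unclear message back to the speech units it contains. A minimal sketch, assuming each check message is stored together with its list of speech units (the table `MESSAGE_UNITS`, its romanized contents, and the function name are hypothetical):

```python
def units_to_reregister(unclear_message_ids, message_units):
    """Collect, without duplicates, the speech units of the unclear messages."""
    units = []
    for message_id in unclear_message_ids:
        for unit in message_units[message_id]:
            if unit not in units:
                units.append(unit)
    return units


# Hypothetical check messages, already split into speech units.
MESSAGE_UNITS = {
    1: ["ko", "n", "ni", "chi", "wa"],
    2: ["a", "ri", "ga", "to", "u"],
    3: ["sa", "yo", "u", "na", "ra"],
}

# The voice input person marked messages 2 and 3 as unclear.
todo = units_to_reregister([2, 3], MESSAGE_UNITS)
```

Only the units in `todo` need to be recorded again; phoneme data for units that appear solely in clear messages is left as registered.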

[0017] As described above, once phoneme data is stored in the phoneme data storage unit 13, synthesized sound corresponding to text data can be generated using phoneme data based on the voice of the voice input person, so the text data can be read aloud in that person's voice. If the CD-ROM used as the storage medium is sold together with a print medium whose content corresponds to the text data recorded on the CD-ROM, the text can be read aloud in the voice of the voice input person while the user looks at the print medium. For example, if the print medium is a picture book and text data corresponding to its content is stored on the CD-ROM, the text of the picture book can be read aloud in the voice of the voice input person; a parent can register his or her voice for a children's picture book and have it read to the child in the parent's voice. Unlike with a tape recorder, the parent does not need to record the full text. The manufacturer prepares the text data, and the parent only registers the sounds needed to generate the phoneme data. Moreover, the same phoneme data can be used even when the text data changes, so the work required of the voice input person is far less than when that person records every one of several texts.

[0018] In the example above, all of the messages stored in the message storage unit 15 are given as synthesized sound, but display means capable of displaying at least characters may be provided and some of the messages shown on it. That is, the display means may be used as the instruction means to give messages in text form. Such messages can be used when phoneme data is created.

[0019] The embodiment above uses parameters for rule-based synthesis as the phoneme data, but the voice input from the microphone 21 and sampled by the speech recognition unit 22 can also be A/D converted and used as the phoneme data. That is, a so-called PCM sound source is used as the phoneme data. Part of the above configuration may be implemented with dedicated hardware, or the configuration may be realized without dedicated hardware by combining software with a general-purpose computer.

[0020] (Embodiment 2) In Embodiment 1 the text data is supplied on a storage medium and a separately prepared print medium is associated with that text data. If a CD-ROM or similar medium is used, however, a sufficient amount of image data can be stored alongside the text data. This embodiment therefore shows an example in which a CD-ROM is used as the storage medium and holds image data of still or moving images together with text data associated with that image data. Image data resembling a children's picture book, of the kind known as an electronic picture book, can be used.

[0021] In this embodiment, as shown in FIG. 2, a display device 16 is provided as display means for displaying the image data among the data read by the CD-ROM drive 11 serving as the reading device. The display device 16 is a CRT or liquid crystal display capable of displaying characters and graphics. The images shown on the display device 16 preferably change interactively. That is, a pointing device such as a mouse is provided in the operation unit 23, or a touch panel is overlaid on the screen of the display device 16, so that a desired position on the screen can be indicated; depending on the indicated position, various processes are performed, such as changing part of the image, outputting a sound associated with that position, or switching scenes. The sound output here is synthesized sound generated by the speech synthesis unit 10 using the phoneme data stored in the phoneme data storage unit 13. The text data used by the speech synthesis unit 10 is read from the CD-ROM according to the position indicated on the screen. In this embodiment the text may also be displayed on the screen of the display device 16 as needed.
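Selecting text data from the indicated screen position can be sketched as a simple hit test. The specification does not define how positions are associated with text on the CD-ROM, so the rectangular `Region` records, their coordinates, and the sample text below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular screen area associated with a piece of text data."""
    x: int
    y: int
    width: int
    height: int
    text: str

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def text_for_position(regions, px, py):
    """Return the text of the first region containing the point, else None."""
    for region in regions:
        if region.contains(px, py):
            return region.text
    return None


# Hypothetical scene: two characters drawn on a 640x480 screen.
scene = [
    Region(0, 0, 320, 480, "The rabbit says hello."),
    Region(320, 0, 320, 480, "The turtle waves back."),
]
spoken = text_for_position(scene, 400, 100)
```

The returned text would then be passed to the speech synthesis unit 10, so the same registered phoneme data serves every region.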

[0022] This embodiment differs from Embodiment 1 in that image data is stored on the CD-ROM together with the text data and displayed on the display device 16; the other components bearing the same reference numerals function as in Embodiment 1.

(Embodiment 3) As shown in FIG. 3, this embodiment provides a pitch control unit 17 that can adjust the pitch of the phoneme data read from the phoneme data storage unit 13 (that is, the pitch of the synthesized sound); the pitch of the phoneme data is specified by the control unit 20. The pitch can be set in steps through the operation unit 23, or pitch information can be stored on the CD-ROM together with the text data so that the pitch is changed according to the pitch information associated with the text data.

[0023] In the former case, the clarity of the synthesized sound can be adjusted by changing the pitch. In the latter case, if musical intervals are given as the pitch information, the text is given a melody and the device can be made to sing. The other components bearing the same reference numerals function as in Embodiment 1.
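Pitch information attached to text can be illustrated by pairing each speech unit with an interval and converting it to a target frequency. The specification does not fix a representation for the pitch information; this sketch assumes semitone offsets from a base pitch in twelve-tone equal temperament, and the names, base frequency, and melody are hypothetical.

```python
def pitch_to_frequency(base_hz, semitones):
    """Frequency that lies `semitones` above `base_hz` in equal temperament."""
    return base_hz * 2 ** (semitones / 12)


def assign_pitches(units_with_intervals, base_hz):
    """Attach a target frequency to every speech unit of the lyrics."""
    return [
        (unit, pitch_to_frequency(base_hz, semitones))
        for unit, semitones in units_with_intervals
    ]


# Hypothetical lyrics: each speech unit carries its interval in semitones.
lyrics = [("sa", 0), ("ku", 0), ("ra", 2)]
melody = assign_pitches(lyrics, base_hz=220.0)
```

A pitch control unit along these lines would shift each unit's phoneme data to its target frequency before output, while the operation-unit case corresponds to applying one fixed offset to every unit.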

[0024]

[Effects of the invention] The invention of claim 1 comprises: text input means for inputting text data; speech recognition means for extracting phoneme data of a voice input person by sampling the voice that an arbitrary voice input person has input through a voice input device; a phoneme data storage unit that stores the phoneme data extracted by the speech recognition means for each speech unit; and speech synthesis means that synthesizes speech corresponding to the text data, based on the text data input by the text input means and the phoneme data stored in the phoneme data storage unit, and outputs the synthesized sound from a voice output device. Because phoneme data is extracted from the voice that the voice input person inputs through the voice input device and is stored in the phoneme data storage unit, the voice of any voice input person can be generated as synthesized sound, with the advantage that the text corresponding to the text data can be read aloud in the voice of the voice input person.

[0025] Where, as in the invention of claim 2, instruction means is added that prompts the voice input person to speak for each speech unit to be stored in the phoneme data storage unit, the instruction means indicates how the phoneme data is to be input, with the advantage that appropriate phoneme data can be provided. Where, as in the invention of claim 3, a reading device is provided that reads text data from a recording medium on which text data associated with the content of a separately prepared print medium is recorded, and the reading device is used as the text input means, a print medium on which the text is printed can be paired with a recording medium on which the text data is recorded, and the text data on the recording medium can be read aloud in the voice of an arbitrary voice input person while the user looks at the print medium.

[0026] Where, as in the invention of claim 4, display means capable of displaying characters and graphics is provided together with a reading device that reads image data and text data from a recording medium on which image data of still or moving images to be displayed on the display means and text data related to the image data are recorded, and the reading device also serves as the text input means, the user can hear the text data in the voice of the voice input person while watching the image shown on the display means.

[0027] Where, as in the invention of claim 5, a pitch control unit is provided that controls the pitch of the synthesized sound according to an external signal, the pitch can be controlled. Beyond simply raising or lowering the sound, if pitch data that controls the pitch is associated with the text data, musical intervals can be assigned to the text data, with the advantage that a song can also be output.

[Brief description of the drawings]

FIG. 1 is a block diagram showing Embodiment 1 of the present invention.

FIG. 2 is a block diagram showing Embodiment 2 of the present invention.

FIG. 3 is a block diagram showing Embodiment 3 of the present invention.

[Explanation of symbols]

10: Speech synthesis unit
11: CD-ROM drive
12: Text selection unit
13: Phoneme data storage unit
14: Speaker
15: Message storage unit
16: Display device
17: Pitch control unit
20: Control unit
21: Microphone
22: Speech recognition unit
23: Operation unit

Claims

1. A substitute reading device comprising: text input means for inputting text data; speech recognition means for extracting phoneme data of a voice input person by sampling voice that an arbitrary voice input person has input through a voice input device; a phoneme data storage unit that stores the phoneme data extracted by the speech recognition means for each speech unit; and speech synthesis means for synthesizing speech corresponding to the text data, based on the text data input by the text input means and the phoneme data stored in the phoneme data storage unit, and for outputting the synthesized sound from a voice output device.

2. The substitute reading device according to claim 1, further comprising an instruction means for instructing a procedure for storing phoneme data based on the voice of the voice input person in the phoneme data storage unit.

3. The substitute reading device according to claim 1 or 2, further comprising a reading device that reads text data from a recording medium on which text data associated with the content of a separately prepared print medium is recorded, wherein the reading device is used as the text input means.

4. The substitute reading device according to claim 1 or 2, further comprising: display means capable of displaying characters and graphics; and a reading device that reads image data and text data from a recording medium on which image data of still or moving images to be displayed on the display means and text data related to the image data are recorded, wherein the reading device also serves as the text input means.

5. The substitute reading device according to any one of claims 1 to 4, further comprising a pitch control unit that controls the pitch of the synthesized sound according to an external signal.