JP3404776B2

JP3404776B2 - Signal playback device

Info

Publication number: JP3404776B2
Application number: JP32248692A
Authority: JP
Inventors: 健三赤桐; 芳明及川; 敬一山田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-11-06
Filing date: 1992-11-06
Publication date: 2003-05-12
Anticipated expiration: 2018-05-12
Also published as: JPH06149291A

Description

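[Detailed Description of the Invention]

[0001] [Field of Industrial Application] The present invention relates to a device that reproduces sound, including speech, or to a signal reproducing device that reproduces sound together with image information such as map information. Sound reproduction in the present invention preferably uses synthesis by rule. The signal reproducing device of the present invention is suitably applied to, for example, guidance information devices for vehicles such as automobiles.

[0002] [Prior Art] Devices that reproduce speech and other sounds, and vehicle guidance information devices that add reproduction of map information to such sound reproduction and use it to guide a traveling vehicle, have been proposed. Because of the various constraints of being mounted on a vehicle, such devices require reduced storage capacity and efficient encoding and decoding, and therefore use high-efficiency codes.

[0003] [Problems to Be Solved by the Invention] In conventional vehicle guidance information devices, however, the bit rate can be reduced only to about several kilobits per second (kbps) at best. Even with high-efficiency coding, a considerably large storage capacity is still required when a large amount of sound signals is encoded.

[0004] An object of the present invention is to reduce the storage capacity of the storage device that stores the signals for reproduction, in a signal reproducing device capable of reproducing sound, or sound together with image information such as maps. A further object of the present invention is to provide a signal reproducing device that can shorten the reproduction time.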
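[0005] [Means for Solving the Problems] A signal reproducing device according to a first aspect of the present invention, which solves the above problems, comprises: recording means on which are recorded sound information, including phonetic symbol information for speech synthesis that has undergone analysis processing for reproduction in advance and segments for sound synthesis, and text information corresponding to the sound information; selection information recording means for recording selection information for a limited subset of the phonetic symbol information recorded on the recording means; first reading means for reading the sound information from the recording means on the basis of the selection information; sound synthesis means for producing a sound reproduction signal from the sound information thus read; means for reproducing the sound reproduction signal thus produced; second reading means for reading the text information; and display means for displaying the text information.

[0006] [0007] [0008] [0009] [0010] [0011]

[0012] As for the voiced part of the speech unit data, an impulse corresponding to one pitch period and a unit response waveform, extracted from the voiced portion of real speech by complex cepstrum analysis, form one pair, and one item of speech unit data consists of as many such pairs as there are pitch periods required. As for the unvoiced part of the speech unit data, the waveform of the unvoiced portion of real speech is cut out and stored as is.

[0013] [Operation] The sound symbol information obtained from the sound text is computed beforehand by sentence analysis and speech synthesis rule processing, and the phonetic symbol information thus computed is stored in the storage device of the signal reproducing device. As a result, the required storage capacity can be reduced.

[0014] Furthermore, the device has no sentence analysis unit or dictionary for converting sound text into sound symbols, preferably phonetic symbol information, and reproduces the stored phonetic symbol information directly. The time needed for sentence analysis is therefore eliminated and the reproduction time is shortened. Speech synthesis segments may also be stored and used for speech reproduction.

[0015] In addition to the speech reproduction described above, images such as maps can also be reproduced. Moreover, when the user makes an input in response to speech output, the input can be associated with the speech output that prompted it.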
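[0016] [Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of a sound reproducing device that reproduces synthesized sound from sound information such as speech, as a first embodiment of the signal reproducing device of the present invention. This embodiment illustrates the case where the sound reproduced is speech; that is, the first embodiment relates to a speech reproducing device. The speech reproducing device 101 shown in FIG. 1 has a recording device 102 that stores phonetic symbol information 103, a readout circuit 104 that reads the phonetic symbol information 103 from the recording device 102, and a speech synthesizer 105. A speech output device such as a loudspeaker is connected downstream of the speech synthesizer 105 but is not shown. Such output devices are likewise omitted from the drawings of the devices described below.

[0017] In the first embodiment, the sentence analysis unit that derives phonetic symbols from text is not built into the speech reproducing device 101. Instead, phonetic symbols that have already been analyzed are recorded in the recording device 102 as the phonetic symbol information 103. That is, the phonetic symbols are processed offline beforehand by a sentence analysis unit, and the analyzed speech information is stored in the recording device 102. Since no analysis time is needed at playback, the reproduction time is shortened. Details of the analysis are described later.

[0018] The recording device 102 is preferably a read-only optical disc device, a large-capacity, compact read-only storage device of the kind exemplified by Compact Disc (CD) and Mini-Disc (MD) devices. Alternatively, a magneto-optical disc device, typified by recordable disc devices such as the MD, a recordable disc device based on a different principle such as a phase-change optical disc device, or a semiconductor memory such as ROM, RAM, or flash memory may be used. A magneto-optical disc device records a digital "1" or "0" by heating a minute area of the disc with a light spot so that it is easily magnetized and then applying a magnetic field. A phase-change disc device records information by using a light spot to induce a phase change in a metal film. Flash memory stores information by accumulating charge in, or removing it from, an insulating film on the gate of a FET, thereby turning the FET on or off. Its features include a small and economical recording area, high-speed operation, and no need for power to retain the stored state. Compared with optical disc devices, however, it has the disadvantages of smaller storage capacity and higher cost.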
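[0019] The offline creation of the phonetic symbol information 103 stored in the recording device 102 is described in detail with reference to FIG. 2. FIG. 2 shows the schematic configuration of a speech synthesizer 201, which as a whole is implemented as an arithmetic processing unit and is divided into a speech unit storage section 202, a sentence analysis section 203, a speech synthesis rule section 204, and a speech synthesis section 205. Whereas the speech reproducing device 101 of FIG. 1 is mounted on a vehicle or the like and operates online in real time, the speech synthesizer 201 of FIG. 2 performs its processing offline in advance, and the phonetic symbol information 103 it generates is stored in the recording device 102.

[0020] The sentence analysis section 203 analyzes text input (sentences and the like expressed as a sequence of characters) received from a given input device against a given dictionary, converts it into a kana character string, and then breaks it down into words and phrases (bunsetsu). Unlike English, Japanese is written without spaces between words, so an expression such as 「米国産業界」 ("US industry") can be segmented in two ways: 「米国／産業・界」 or 「米／国産／業界」. Consulting the dictionary, the sentence analysis section 203 uses the way words connect and the statistical properties of words to break the text input, for example 「米国産業界」, into words and phrases and thereby detect the word and phrase boundaries. Having also detected the basic accent of each word, the sentence analysis section 203 outputs the results to the speech synthesis rule section 204.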
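The segmentation ambiguity of 「米国産業界」 can be illustrated with a minimal sketch. It is not the method of the sentence analysis section 203, which the description does not specify in detail. The toy lexicon, its cost values (lower meaning more plausible), and the function names are all assumptions made for illustration.

```python
# Hypothetical toy lexicon: word -> cost (lower = statistically more plausible).
LEXICON = {"米国": 1.0, "産業界": 1.0, "米": 2.0, "国産": 2.0, "業界": 2.0}


def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def best_segmentation(text, lexicon):
    """Pick the candidate with the lowest total word cost."""
    candidates = segmentations(text, lexicon)
    return min(candidates, key=lambda words: sum(lexicon[w] for w in words))


candidates = segmentations("米国産業界", LEXICON)
best = best_segmentation("米国産業界", LEXICON)
```

With this lexicon the enumeration yields both readings quoted in the text, and the cost comparison selects 米国 / 産業界. A real analyzer would also weigh how adjacent words connect, which this sketch omits.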
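[0021] The speech synthesis rule section 204 processes the detection results of the sentence analysis section 203 and the text input according to predetermined phonological rules based on the characteristics of Japanese. Classified by linguistic properties, natural Japanese speech can be divided into roughly 100 units of utterance. For example, the word 「さくら」 (sakura) divides into three CV (consonant-vowel) units: "sa" + "ku" + "ra". In addition, when Japanese words are joined, the initial syllable of the following word may become voiced (sequential voicing, or rendaku), and a ga-row sound in a non-initial position may become nasalized, so the pronunciation differs from that of the word in isolation. The speech synthesis rule section 204 applies phonological rules set according to these characteristics and converts the text input from the sentence analysis section 203 into a phoneme symbol string, for example the sequence "sa" + "ku" + "ra" above. On the basis of this phoneme symbol string, the speech synthesis rule section 204 then loads the data for each speech unit from the speech unit storage section 202.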
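As a rough illustration of the conversion into CV units and of sequential voicing, the following sketch may help. The two tables are deliberately tiny and hypothetical, the function names are assumptions, and whether voicing applies to a given compound is left to the caller, because the description does not give the actual phonological rules.

```python
# Hypothetical, deliberately incomplete kana -> CV unit table.
KANA_TO_CV = {"さ": "sa", "く": "ku", "ら": "ra", "は": "ha", "な": "na", "ひ": "hi"}

# Hypothetical voicing table for the first unit of a following word.
VOICED = {"ka": "ga", "ku": "gu", "sa": "za", "ha": "ba", "hi": "bi"}


def to_cv_units(kana):
    """Convert a kana string into a list of CV units."""
    units = []
    for ch in kana:
        if ch not in KANA_TO_CV:
            raise ValueError(f"no CV unit known for {ch!r}")
        units.append(KANA_TO_CV[ch])
    return units


def join_with_voicing(first, second):
    """Join two words, voicing the first unit of the second word if possible."""
    if not second:
        return list(first)
    head = VOICED.get(second[0], second[0])
    return list(first) + [head] + list(second[1:])


sakura = to_cv_units("さくら")
hanabi = join_with_voicing(to_cv_units("はな"), to_cv_units("ひ"))
```

Here `sakura` reproduces the "sa" + "ku" + "ra" example from the text, and `hanabi` shows "hana" + "hi" becoming "ha" + "na" + "bi".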
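[0022] The speech synthesizer 201 produces synthesized sound by a waveform editing technique. The data loaded from the speech unit storage section 202 consist of the waveform data used to generate the synthesized sound of each CV unit. The speech unit data used for waveform synthesis are organized as follows. For the voiced part, an impulse corresponding to one pitch period and a unit response waveform, extracted from the voiced portion of real speech by complex cepstrum analysis, form one pair, and one item of speech unit data holds as many such pairs as there are pitch periods required. For the unvoiced part, the waveform of the unvoiced portion of real speech is cut out and stored as is. Accordingly, when the speech unit is a CV unit and its consonant C is unvoiced, one item of speech unit data consists of the cut-out unvoiced waveform plus multiple impulse and unit-response pairs; when the consonant C is voiced, it consists of the impulse and unit-response pairs alone.

[0023] The speech synthesis rule section 204 concatenates the speech unit data loaded from the speech unit storage section 202 in the order given by the text input (the result is hereinafter called the synthesized waveform data), which yields a synthesized speech waveform that reads out the text input without intonation. The speech synthesis rule section 204 also divides the text input from the sentence analysis section 203 into segments of suitable length according to predetermined prosodic rules and detects the breaks, that is, the pauses. For example, when the sentence 「きれいな花を山田さんからもらいました」 ("I received beautiful flowers from Mr. Yamada") is entered as shown in FIG. 3(A), it is broken down into 「きれいな」, 「はなを」, 「やまださんから」, and 「もらいました」 as shown in FIG. 3(B), and a pause is detected between 「はなを」 and 「やまださんから」.

[0024] The speech synthesis rule section 204 further detects the accent of each phrase on the basis of the prosodic rules and the basic accent of each word. The accent of a single Japanese phrase can be expressed perceptually on two levels, high and low, taking the kana character as the unit (hereinafter called a mora). Phrases can then be distinguished by accent position, which depends on their content. For example, 端 (edge), 箸 (chopsticks), and 橋 (bridge) are all two-mora words read "hashi", and are classified respectively as type 0 (no accent), type 1 (accent on the first mora), and type 2 (accent on the second mora). In this embodiment, as illustrated in FIG. 3(C), the speech synthesis rule section 204 classifies the phrases of the text input as type 1, type 2, type 0, and type 4, and detects accents and pauses phrase by phrase.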
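The two-level description of accent types can be sketched as follows. The mapping from accent type to a high/low pattern follows the usual textbook convention for Tokyo-dialect accent, which is an assumption here; the description only states that two levels and a type number are used. The function name is hypothetical.

```python
def accent_levels(num_morae, accent_type):
    """Return a list of "H"/"L" levels, one per mora.

    accent_type 0 means no accent; accent_type n >= 1 places the accent
    on the n-th mora (assumed Tokyo-dialect convention).
    """
    if num_morae < 1 or not 0 <= accent_type <= num_morae:
        raise ValueError("accent_type must lie between 0 and num_morae")
    if accent_type == 0:
        # Unaccented: low start, then high to the end.
        return ["L"] + ["H"] * (num_morae - 1)
    if accent_type == 1:
        # Accent on the first mora: high start, then low.
        return ["H"] + ["L"] * (num_morae - 1)
    # Accent on a later mora: low start, high up to the accent, then low.
    return ["L"] + ["H"] * (accent_type - 1) + ["L"] * (num_morae - accent_type)


hashi_chopsticks = accent_levels(2, 1)   # 箸, type 1
hana_o = accent_levels(3, 2)             # はなを, type 2
moraimashita = accent_levels(6, 4)       # もらいました, type 4
```

Under this convention a type 0 word and a type 2 word of two morae have the same pattern in isolation; they differ only in the level of a following particle, which is why the sketch is applied to whole phrases.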
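[0025] On the basis of the detected accents and pauses, the speech synthesis rule section 204 then generates a basic pitch pattern representing the intonation of the whole text input. Although the accent of a Japanese phrase can be expressed perceptually on two levels, the actual intonation, as illustrated in FIG. 3(D), descends gradually from the accent position. Furthermore, when phrases are strung together into a sentence, the intonation declines gradually from one pause toward the next, as also illustrated in FIG. 3(D). Based on these characteristics, the speech synthesis rule section 204 generates a parameter representing the intonation for each mora of the text input and then sets further parameters by interpolation between morae so that the intonation varies smoothly, as in human speech. The speech synthesis rule section 204 thus combines the per-mora parameters and the interpolated parameters in the order given by the text input to obtain a pitch pattern (FIG. 3(F)) representing the intonation of speech that reads out the text input.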
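A pitch pattern of this kind can be sketched under simple assumptions: one target frequency per mora derived from its high/low level, a constant decline per mora toward the next pause, and linear interpolation between neighboring morae. The frequencies, the decline rate, the choice of linear interpolation, and the function names are all illustrative assumptions, not values taken from the description.

```python
def mora_targets(levels, high_hz=140.0, low_hz=110.0, decline_hz=3.0):
    """One pitch target per mora: level frequency minus a gradual decline."""
    return [
        (high_hz if level == "H" else low_hz) - decline_hz * index
        for index, level in enumerate(levels)
    ]


def interpolate(targets, steps):
    """Linearly interpolate `steps` points per interval between mora targets."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not targets:
        return []
    contour = []
    for start, end in zip(targets, targets[1:]):
        for k in range(steps):
            contour.append(start + (end - start) * k / steps)
    contour.append(targets[-1])
    return contour


targets = mora_targets(["L", "H", "L"])
pitch_pattern = interpolate(targets, steps=2)
```

For the three-mora pattern low, high, low, the targets are 110, 137, and 104 Hz, and the interpolated contour passes through one intermediate value between each pair.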
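[0026] The speech synthesis section 205 performs waveform synthesis on the basis of the synthesized waveform data and the pitch pattern to generate the synthesized sound, as follows. In the voiced portions of the synthesized speech, the impulses in the synthesized waveform data are arranged according to the pitch pattern, and the unit response waveform corresponding to each impulse is superimposed on that impulse. In the unvoiced portions, the cut-out waveform in the synthesized waveform data is used directly as the desired synthesized waveform. This yields synthesized sound whose intonation follows the changes in the pitch pattern. Because impulses are used as the sound source information, stretching or shrinking the pitch period of the synthesized sound has almost no effect on the sound source information. Even when the pitch pattern changes greatly, the spectral envelope is not distorted, and high-quality arbitrary synthesized sound close to the human voice is obtained.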
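The superposition step can be sketched as an overlap-add of scaled unit responses at impulse positions spaced by the pitch periods. This is a minimal sketch under assumptions: samples are plain Python floats, each impulse is represented only by its amplitude, pitch periods are given in samples, and the function names are hypothetical.

```python
def synthesize_voiced(pairs, pitch_periods):
    """Overlap-add unit responses at impulse positions.

    pairs: list of (impulse_amplitude, unit_response_samples).
    pitch_periods: spacing in samples between successive impulses,
                   so it must hold len(pairs) - 1 values.
    """
    if not pairs:
        return []
    if len(pitch_periods) != len(pairs) - 1:
        raise ValueError("need one pitch period between each pair of impulses")
    positions = [0]
    for period in pitch_periods:
        positions.append(positions[-1] + period)
    length = max(pos + len(resp) for pos, (_, resp) in zip(positions, pairs))
    out = [0.0] * length
    for pos, (amplitude, response) in zip(positions, pairs):
        for i, sample in enumerate(response):
            out[pos + i] += amplitude * sample
    return out


def synthesize_cv_unit(unvoiced, pairs, pitch_periods):
    """Unvoiced part is copied as is; the voiced part follows it."""
    return list(unvoiced) + synthesize_voiced(pairs, pitch_periods)


response = [1.0, 0.5, 0.25]
voiced = synthesize_voiced([(1.0, response), (0.5, response)], [2])
unit = synthesize_cv_unit([0.1, -0.1], [(1.0, response), (0.5, response)], [2])
```

Changing the values in `pitch_periods` moves the impulses closer together or further apart without altering the unit responses themselves, which is the property the paragraph relies on when it says the spectral envelope is not distorted.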
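[0027] In the above configuration, text input received from a given input device is analyzed by the sentence analysis section 203 against the dictionary, and word and phrase boundaries and basic accents are detected. The speech synthesis rule section 204 processes these detection results according to the phonological rules described above and generates synthesized waveform data representing speech that reads out the text input without intonation. It also processes the same detection results according to the prosodic rules and generates a pitch pattern representing the intonation of the whole text input. The pitch pattern is output together with the synthesized waveform data to the speech synthesis section 205, which generates the synthesized sound from them. With this configuration, each item of speech unit data used for synthesis holds, for its voiced portion, the required number of pairs of an impulse as sound source information and a unit response waveform as vocal tract characteristics, and, for its unvoiced portion, a waveform cut out of real speech. High-quality synthesized speech close to the human voice can therefore be generated freely, without distortion of the spectral envelope caused by changes in the pitch pattern.

[0028] In the first embodiment described above, waveform synthesis is performed by superimposing unit response waveforms on impulses used as sound source information. If the impulses are regarded as ideal impulses, however, the desired synthesized sound may instead be generated by arranging the unit response waveforms directly in accordance with the pitch pattern, without first arranging impulses at pitch-period intervals and superimposing on them. Also, although the speech unit storage section 202 holds speech unit data in CV units in the above embodiment, the data may be held in other speech units such as CVC units.

[0029] The phonetic symbol information 103 shown in FIG. 1 is the phoneme symbol string produced by the speech synthesis rule section 204 with phrase accent and pause information added, and is stored in the recording device 102. The readout circuit 104 reads the phonetic symbol information 103 from the recording device 102. The speech synthesizer 105 has functions equivalent to those of the speech synthesis section 205 shown in FIG. 2 and produces synthesized speech output from the phonetic symbol information 103 read by the readout circuit 104.

[0030] According to the first embodiment, the phonetic symbol information 103 stored in the recording device 102 has already undergone sentence analysis in the speech synthesizer 201 of FIG. 2, so the storage capacity of the recording device 102 can be reduced. Moreover, because the speech information is stored in analyzed form as the phonetic symbol information 103, no sentence analysis is needed at playback and the reproduction time is shortened. Although the first embodiment has been described for speech, this embodiment is not limited to speech and can reproduce sound in general. For non-speech sound, processing by sentence analysis and speech synthesis rules is not necessarily required.

[0031] The speech reproducing device of the first embodiment can be mounted on a vehicle to provide reproduced speech, or reproduced speech and sound, as traffic guidance information for vehicles such as automobiles. Reducing the storage capacity of the recording device 102 and shortening the reproduction time are of practical benefit particularly in vehicles, where space, cost, and power supply capacity are limited and fast operation is nevertheless required.

[0032] The type of recording device 102 is chosen in view of the storage capacity needed for the phonetic symbol information 103, the required playback speed, and in some cases the possibility of revision. Semiconductor memory is preferable for speed but may be limited in capacity and cost. CDs, MDs, and the like are slower to play back but pose few problems in cost. For in-vehicle use, where the device is subject to vibration, semiconductor memory, which is unaffected by vibration, is preferable as the recording device 102.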
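[0033] FIG. 4 shows the whole of a speech reproducing device 401 that outputs synthesized sound and associated text information, as a second embodiment of the signal reproducing device of the present invention. The speech reproducing device 401 has a storage device 402, a first readout circuit 405, a speech synthesizer 406, a second readout circuit 407, and a display device 408. The storage device 402 stores phonetic symbol information 404 and text information 403 associated with it. The text information 403 is text expressed in ASCII code or the like. The speech reproducing device 401 reproduces each pair of phonetic symbol information 404 and text information 403 at almost the same time.

[0034] The phonetic symbol information 404 and the text information 403 may represent exactly the same sentence, but they may also differ in that the text information 403 shown on the display device 408 is in written style while the speech output by the speech synthesizer 406 from the phonetic symbol information 404 is in spoken style. It is also useful to have the text display and the speech convey different things. For example, the text display can show detailed figures while the speech gives only a general explanation.

[0035] The first readout circuit 405 is the same as the readout circuit 104 of FIG. 1 and reads the phonetic symbol information 404 from the storage device 402. The second readout circuit 407 reads the text information 403 from the storage device 402. The speech synthesizer 406 is the same as the speech synthesizer 105 of FIG. 1. The display device 408 displays the text information read by the second readout circuit 407; a liquid crystal display, a cathode-ray tube display, or the like can be used. To output each pair of phonetic symbol information 404 and text information 403 at almost the same time, address information indicating where the paired phonetic symbol information is recorded can be attached to each item of text information in the storage device 402. Alternatively, each pair can be recorded in contiguous recording areas of the storage device 402, which has the advantage that no time is lost on addressing.

[0036] FIG. 5 shows a third embodiment of the signal reproducing device of the present invention, to which map information is newly added. This reproducing device 501 has a storage device 502 that stores map information 509 in addition to phonetic symbol information 504, corresponding to the phonetic symbol information 404, and text information 503, corresponding to the text information 403. A first readout circuit 505, corresponding to the first readout circuit 405, reads the phonetic symbol information 504 from the storage device 502 and outputs it to a speech synthesizer 506, corresponding to the speech synthesizer 406. A second readout circuit 507 reads the map information 509 as well as the text information 503 from the storage device 502 and outputs them to a display device 508, corresponding to the display device 408. To record the map information 509 in the storage device 502, it suffices to know the signal value, "1" or "0", of each coordinate point, together with the position at which text is to be displayed and the content of that text. This applies when the map information 509 is monochrome; for color, three signal values representing R, G, and B are given for each coordinate point. Outputting guidance speech, a guidance map image, and text simultaneously from the map information 509, phonetic symbol information 504, and text information 503 stored in the storage device 502 makes for a desirable vehicle guidance information device.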
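The storage needed for such a map can be estimated with simple arithmetic. In this sketch the 320 x 240 resolution is a hypothetical example, one bit per signal value is an assumption for the color case (the description does not state the bit depth of the R, G, and B values), and the function names are illustrative.

```python
def map_storage_bits(width, height, color=False, bits_per_value=1):
    """Bits needed for a raster map: 1 value per point, or 3 (R, G, B) for color."""
    values_per_point = 3 if color else 1
    return width * height * values_per_point * bits_per_value


def pack_monochrome(rows):
    """Pack rows of 0/1 signal values into bytes, most significant bit first."""
    bits = [value for row in rows for value in row]
    packed = bytearray((len(bits) + 7) // 8)
    for index, value in enumerate(bits):
        if value:
            packed[index // 8] |= 0x80 >> (index % 8)
    return bytes(packed)


mono_bytes = map_storage_bits(320, 240) // 8
color_bytes = map_storage_bits(320, 240, color=True) // 8
packed = pack_monochrome([[1, 0, 0, 0], [0, 0, 0, 1]])
```

Under these assumptions a monochrome map of that size occupies 9,600 bytes and a color map 28,800 bytes, before any text labels and their display positions are added.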
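[0037] FIG. 6 shows a fourth embodiment of the signal reproducing device of the present invention. The reproducing device 601 has a storage device 602, a first readout circuit 605, a speech synthesizer 606, a second readout circuit 607, and a display device 608. The description of FIG. 2 did not state where the contents of the speech unit storage section 202 are recorded. Because the amount of such data grows as sound quality is improved, in the reproducing device 601 of FIG. 6 it is desirable to record these data on a disc device or the like with a low recording cost per unit of information and to retrieve them as needed. Synthesizing speech in this way takes time, but starting the synthesis ahead of time avoids any practical inconvenience. In the reproducing device 601 of FIG. 6, the storage device 602 stores speech synthesis segments 610 in addition to phonetic symbol information 604, text information 603, and map information 609. The first readout circuit 605 reads the phonetic symbol information 604 and the speech synthesis segments 610, and the speech synthesizer 606 synthesizes speech from the phonetic symbol information 604 using the speech synthesis segments 610. The map information 609 and text information 603 are reproduced on the display device 608 in the same way as in the third embodiment described with reference to FIG. 5. Storing the speech synthesis segments 610 in semiconductor memory is preferable for shortening the reproduction time.

[0038] FIG. 7 shows a fifth embodiment of the signal reproducing device of the present invention. The reproducing device 701 shown in FIG. 7 has a storage device 702, a first readout circuit 705, a speech synthesizer 706, a second readout circuit 707, and a display device 708. The storage device 702 stores phonetic symbol information 704, speech synthesis segments 710, text information 703, and map information 709. These components are the same as in the reproducing device 601 illustrated in FIG. 6. The reproducing device 701 is further provided with an internal control device 712, a discrimination device 711, and an input device 710, and an external control device 713 is connected to it. The input device 710 lets the user give instructions to the system after perceiving the system's speech output or text output; in the simplest case it is a switch corresponding to yes/no. The discrimination device 711 associates the system's speech output or text output with the input signal from the input device.

[0039] As shown in FIG. 8, this association can be obtained by using a window on the time axis. The discrimination device 711 passes signals from the input device 710 to the internal control device 712 or the external control device 713 only while the input time window is open and rejects them at all other times. The input time window remains open for a while, for example up to about 10 seconds, after the presentation of the text information 703 or of the speech from the phonetic symbol information 704 has ended, because the user's decision and action may lag behind the presentation of the text or speech. It follows, of course, that the user's next input must be separated from the previous one by at least this extension time. In a vehicle guidance information device the system does not need to operate very quickly, so as to reduce the burden on the user, and an extension of up to about 10 seconds is therefore acceptable.

[0040] The internal control device 712 accepts from the discrimination device 711 the input signals from the input device 710 that have been judged valid, determines the next operation, and controls the storage device 702. For example:
(1) The system asks the user by speech whether to display text information related to the map information being displayed.
(2) The user answers yes with the switch.
(3) The discrimination device 711 learns the duration of the utterance from a signal from the speech synthesizer 706, creates the input time window, and outputs a yes signal to the internal control device 712.
(4) The internal control device 712 outputs the corresponding text information 703 from the storage device 702.
The external control device 713 is a circuit that controls parts of the reproducing device 701 other than the storage device 702, for example the speaking rate of the speech synthesizer 706.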
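The gating performed with the input time window can be sketched as follows. This is only an illustration: times are seconds on an arbitrary clock, the 10-second extension is taken from the "about 10 seconds" mentioned in the description, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputWindow:
    """Time window opened for one prompt (speech or text presentation)."""
    prompt_start: float   # when the presentation begins, in seconds
    prompt_end: float     # when the presentation ends, in seconds
    extension: float = 10.0  # how long the window stays open afterwards

    def accepts(self, timestamp):
        """True while the window is open, including the extension."""
        return self.prompt_start <= timestamp <= self.prompt_end + self.extension


def route_input(window, timestamp, answer):
    """Pass the answer on only if it arrives inside the window."""
    return answer if window.accepts(timestamp) else None


window = InputWindow(prompt_start=0.0, prompt_end=3.0)
in_time = route_input(window, 8.0, "yes")
too_late = route_input(window, 13.5, "yes")
```

A yes pressed 5 seconds after the 3-second prompt is passed on, while one pressed 10.5 seconds after it is rejected, matching the behavior attributed to the discrimination device in steps (2) and (3).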
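[0041] FIG. 9 shows the recording layout when a disc such as a CD is used as the storage device 702. In FIG. 9(A), reference numeral 901 denotes a storage area for text information 703, and 902 denotes the storage area for the phonetic symbol information 704 that is output at almost the same time as that text information. FIG. 9(B) shows a contrasting layout in which the storage areas for text information 703 and those for phonetic symbol information 704 are grouped in separate locations: the text information 703 is stored in the areas denoted 903 and 904, and the phonetic symbol information 704 in the areas denoted 905 and 906.

[0042] FIG. 10 shows a sixth embodiment of the signal reproducing device of the present invention, one that is particularly suited to use as a vehicle guidance information device. This vehicle guidance information device 1001 has a storage device 1002, a first readout circuit 1005, a speech synthesizer 1006, a buffer memory 1016, a second readout circuit 1007, a display device 1008, an input device 1010, a discrimination device 1011, a first control device 1012, and a second control device 1013. Except for the addition of the buffer memory 1016, this configuration is similar to that of the reproducing device 701 and external control device 713 illustrated in FIG. 7, and its basic processing is the same as that of the signal reproducing device of FIG. 7.

[0043] The vehicle guidance information device 1001 further has a preset input device 1014, and the first control device 1012 is provided with a selection information storage device 1015. The preset input device 1014 is an input device for selection signals, used to select in advance a set of speech items to be reproduced in view of the route (course) the vehicle will take when the vehicle guidance information device 1001 is used. The selection information storage device 1015 holds the address information of the phonetic symbols corresponding to the inputs, so that a set of speech items matching the context can be selected according to the input signals from the preset input device 1014 and the input device 1010. The buffer memory 1016, provided between the second readout circuit 1007 and the display device 1008, stores the set of phonetic symbols matching the context and has it output as speech via the speech synthesizer 1006. With this configuration, the driver of a vehicle equipped with the vehicle guidance information device 1001 can obtain speech output that takes the vehicle's course into account from the speech synthesizer 1006, and graphic information from the display device 1008.

[0044] [Effects of the Invention] According to the present invention, speech whose content is related to the displayed text can be obtained at a bit rate on the order of several hundred bps. Also, because sentence analysis and speech synthesis rule processing are performed offline, the reproduction time is shortened. Furthermore, the online system can be built without a sentence analysis unit or dictionary, so the signal reproducing device can be manufactured at low cost. Finally, when the user makes an input in response to speech output, the input can be associated with that speech output.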
Device that reproduces sound and image information such as map information
To a signal reproducing device. For the sound reproduction of the present invention,
Suitably, a rule combining scheme is used. Signal reproducing device of the present invention
Is used, for example, in guidance information devices for vehicles such as automobiles.
It is preferably applied. [0002] 2. Description of the Related Art Conventionally, devices for reproducing voice and other sounds have been used.
Or the reproduction of map information for such sound reproduction.
Vehicle guidance information device used for vehicle guidance
Has been proposed. In such a vehicle guidance information device,
However, due to various restrictions on mounting a vehicle,
There is a demand for reduced coding and efficient encoding and decoding.
And a high efficiency code is used. [0003] SUMMARY OF THE INVENTION However, the conventional
In vehicle guidance information devices, the degree of reduction in storage capacity
Is only a few kilobits per second (Kbps) at most
Even if high-efficiency coding is applied, a large amount of audio signals can be encoded.
Requires still a significant amount of storage
There is a problem that there is. An object of the present invention is to reproduce sound or reproduce sound.
Possible signal reproduction to reproduce image information such as maps in addition to raw
In storage, the storage capacity of the storage device that stores the signal for reproduction
Is to reduce. The purpose of the present invention is to
An object of the present invention is to provide a signal reproducing apparatus capable of shortening the time. [0005] [MEANS FOR SOLVING THE PROBLEMS] The present invention for solving the above problems.
The signal reproducing apparatus according to the first aspect of the present invention has an analysis processing for reproduction in advance.
Phonetic symbol information and sound for intelligent speech synthesis
Acoustic information including the synthesis unit and corresponding to the acoustic information
Recording means for recording text information;In the above recording means
Limited voice utterances among the recorded phonetic symbol information
Selection information recording means for recording selection information of phonetic information,
Based on the above selection informationThe sound information from the recording means
First reading means for readingReadacoustic
Sound synthesis means for generating a sound reproduction signal from information;
CreatedMeans for reproducing a sound reproduction signal;
Second reading means for reading the text information, and the text
Display means for displaying information.Be preparedCharacterized by
You. [0006] [0007] [0008] [0009] [0010] [0011] For the voiced part of the voice unit data, the actual sound
Extraction of voiced parts using complex cepstrum analysis
Was issued. Impulse and unit response wave corresponding to one pitch
Form as one set, this set as one voice unit data
It consists of only the required pitches stored, and also has phoneme units.
For the unvoiced part of the data, the waveform of the unvoiced part of the real voice
It is cut out and stored as it is. [0013] [Action] The acoustic symbol information obtained from the acoustic text is pre-
Sentences are analyzed and calculated by speech synthesis rule processing.
The voice pronunciation calculated in this way is stored in the storage device of the signal reproduction device.
The symbol information is stored. As a result, the required storage capacity
Can be reduced. [0014] In addition, from the acoustic text to the acoustic symbol, preferably
Is a sentence analyzer and dictionary that converts to phonetic symbol information
Without having to use the phonetic phonetic symbol information stored in this way.
Playback reduces the time required for sentence analysis and plays back
Time can be reduced. However, it also stores speech synthesis segments
You can use this segment to play back audio
You. [0015] In addition to the above audio reproduction, reproduction of images such as maps.
Can also be performed. Also, if the user responds to the audio output
The correspondence between audio output and input when inputting
Wear. [0016] Embodiments of the present invention will be described below in detail with reference to the drawings.
Will be described. FIG. 1 shows a signal reproducing apparatus according to a first embodiment of the present invention.
Sound reproduction that reproduces synthesized sounds from acoustic information such as various voices
It is a block diagram of an apparatus. In this embodiment, the sound
In particular, a case where sound is reproduced will be exemplified. That is,
The first embodiment relates to a device for reproducing sound. Shown in FIG.
The voice reproducing device 101 stores the phonetic symbol information 103.
Recording device 102, voice pronunciation from the recording device 102
A readout circuit 104 for reading out the symbol information 103;
And a voice synthesizer 105. Voice synthesizer 105
An audio output device such as a speaker is connected to the subsequent stage
However, it is not shown. Also, in the device described below
Also, illustration of an audio output device such as a speaker is omitted. In a first embodiment of the present invention, a text
From the sentence analysis unit until the phonetic symbols are obtained
The phonetic transcription that is not built in the device 101 but has been analyzed in advance
Is recorded in the recording device 102 as phonetic symbol information 103.
Record it. In other words, phonetic symbols are pre-
The speech information processed and analyzed in advance by the sentence analysis unit
The information is stored in the storage device 102. As a result, during playback
Since the analysis time is not required, the reproduction time can be reduced.
Details of the analysis will be described later. As the recording device 102, Compact is used.
Disc (CD) device, Mini-Disc (MD)
Large-capacity, small-sized read-only record as exemplified in devices
A read-only optical disk device as a storage device is suitable.
Alternatively, as the above-mentioned storage device, a recording device such as an MD device
Separate from magneto-optical disk drive represented by raw disk drive
Disk device for recording and playback using various principles, for example,
Disk device such as phase change type optical disk device, ROM
Also uses semiconductor memory such as RAM flash memory
sell. Magneto-optical disk drives use a light spot to
Increase the temperature of the minimal area to make it easy to magnetize,
Multiply by “1” or “0” to store the digital
It is a recording medium. The phase-change type disk drive uses a light spot
A method of recording information by causing a phase change in the metal film by using
It is a recording medium. Flash memory on FET gate
Turns FET on by accumulating or removing electric charge in insulating film
.A recording medium for storing data off, and an area for recording information
Is small and economical, can operate at high speed,
It has features such as the fact that no power supply is
You. However, compared to optical disk devices, the storage capacity is
It has the disadvantage of being less expensive. [0019] stored in the phonetic symbol information 103
Details of offline creation of phonetic symbol information 103
This will be described with reference to FIG. FIG. 2 shows an arithmetic processing unit as a whole.
1 shows a schematic configuration of a speech synthesizer 201 having an arrangement configuration;
The speech synthesizer 201 includes a speech unit storage unit 202, a sentence
Chapter analysis unit 203, speech synthesis rule unit 204, and speech synthesis
Unit 205. Audio reproduction device 10 shown in FIG.
1 is installed in a vehicle and operates online in real time
In contrast, the speech synthesizer 201 shown in FIG.
Phonetic symbols reproduced and processed in advance offline
The information 103 is stored in the recording device 102. The sentence analysis unit 203 receives a message from a predetermined input device.
Entered text input (sentences represented by a series of characters
Etc.)] based on a predetermined "dictionary"
After analyzing the text and converting it to a kana character string, words and phrases
Decompose each time. In Japanese, words are like English
For example, "U.S. industry
Analyzing words like "world", "US / industry / world"
It can be divided into two types, such as "US / Domestic / Industry". Sentence
The chapter analysis unit 203 refers to the “dictionary” and
Utilize continuity and statistical properties of words
For example, if you enter the word "US industry"
Break down each word and phrase, and use this to determine the boundaries of words and phrases.
To detect. The sentence analysis unit 203 thus performs, for each word,
After detecting basic accents, sound the detected
Output to the composition rule unit 204. [0021] The speech synthesis rule unit 204 provides
According to the predetermined "phonological rules" set based on the sentence,
Processes the above detection result and text input of the analysis unit 203
I do. Japanese natural speech is based on linguistic characteristics
To distinguish them, divide them into about 100 utterance units
Can be. For example, the word "Sakura"
When divided into three places, three CVs of "sa" + "ku" + "ra"
Can be divided into units. Furthermore, in Japanese, the words
If they are consecutive, the initial syllable of the succeeding word becomes muddy.
(Ie, it becomes turbid)
It is said that the utterance changes from the case of a single word due to sound
There is a feature. The speech synthesis rule unit 204 uses these Japanese
Processing so that "phonological rules" are set according to the characteristics of
In accordance with the rules, the text
The input is "phonemic symbol string", for example, "sa" +
Convert to a symbol consisting of a continuous column of "ku" + "ra"
You. Further, the speech synthesis rule unit 204 sets the “phonological symbol
Of each voice unit from the voice unit storage unit 202 based on the
Load the data for The voice synthesizer 201 uses a waveform editing technique.
To produce a synthesized sound. Voice unit memory
The data loaded from the unit 202 is expressed in units of each CV.
Composed of waveform data used to generate synthesized sounds
Is done. The voice unit data used for waveform synthesis is as follows.
It consists of such a structure. The voiced part of the voice unit data is the actual voice
Of complex voiced parts using complex cepstrum analysis
Impulse and unit response waveform corresponding to one pitch
As one set, this set is required as one audio unit data
Audio data.
For unvoiced parts of data, cut off the waveform of unvoiced parts of real voice.
It consists of things that you take out and store. Therefore, the audio unit
If the data is in CV units, one voice unit CV
When the consonant part C is a voiceless consonant, cut out the unvoiced part
Waveform and multiple sets of impulse and unit response waveform
One voice unit data is constituted by
Impulse when consonant part C of unit CV is a voiced consonant
One voice by only multiple sets consisting of
Unit data is configured. The speech synthesis rule unit 204 includes a speech unit storage unit.
Text input of voice unit data loaded from 202
(Hereinafter, this data is referred to as synthesized waveform data.)
), And without text into the text,
It is possible to obtain a synthesized speech waveform whose power is read aloud. Further
The speech synthesis rule unit 204 performs the
The text input in the text analysis unit 203
Divide by length to detect breaks (i.e., poses)
Put out. As a result, as shown in FIG.
For example, as a text input, use the sentence
If you input "I got it,"
As such, the text input is “clean”,
"," "From Yamada", "I got it"
After that, between “Hana-no” and “Yamada-san”,
Is detected. Further, the speech synthesis rule unit 204 outputs
Rules and the basic accent of each word
To detect accents. Japanese phrase alone Accen
Is a unit of kana characters intuitively (hereinafter referred to as mora)
Can be expressed in two levels, high and low. this
Sometimes, depending on the content of the phrase, the accent position of the phrase
Can be distinguished. For example, ends, chopsticks, bridges
The gap is also a two-mora word that says “hashi”.
Type 0 without cents, accent position in first mora
There is a certain type 1 and a type 2 where the accent position is in the second mora
Can be classified. In this embodiment, the speech synthesis rule
As illustrated in FIG. 3 (C), the rule unit 204 performs the “text
The input clauses are called mora type 1, type 2, type 0, type 4
Classify and detect accents and poses on a phrase basis
You. Further, the speech synthesis rule section 204
Text input based on the
Generate a basic pitch pattern representing the intonation of the body. Japanese
In, the accent of the clause is intuitively at two levels
The actual intonation, while it can be represented, is illustrated in FIG.
As shown in the figure, the accent position gradually decreases.
There is a sign. Furthermore, in Japanese, clauses are
When one sentence comes, it follows the pose and goes to the next pose
Therefore, as illustrated in Fig. 3 (D), the intonation gradually decreases.
There is a feature to do. The speech synthesis rule section 204
Based on the characteristics of Japanese language
After generating the parameters to represent for each mora, a human utters
Mode so that the intonation changes smoothly as if
The parameters are set by interpolation between lines. From the above, the sound
The voice synthesis rule unit 204 determines the order according to the text input,
Combining the parameters of each mora and the interpolated parameters
(Hereinafter referred to as pitch pattern), read text input
Pitch pattern showing the intonation of the raised voice (Fig. 3 (F))
Get. The speech synthesizing unit 205 generates synthesized waveform data and
Performs waveform synthesis processing based on the pitch pattern
Generates a sound. This waveform synthesizing process is described below.
Perform by law. In the voiced part of the synthesized speech, the synthesized waveform
The impulse in the data is arranged based on the pitch pattern,
Unit response corresponding to each of the arranged impulses
The waveform is superimposed on each impulse. Also, the unvoiced part of the synthesized speech
Minute, the cut-out waveform in the synthesized waveform data is
The desired synthesized speech waveform is left as it is. This makes the pitch
Obtain a synthetic sound with varying intonation following the pattern change
be able to. Therefore, in the synthesized sound,
The pitch period of synthesized sound expands and contracts due to the use of impulses
Even so, there is almost no effect on sound source information,
Even when the switch pattern changes greatly
High quality that is similar to human voice without distortion of the envelope
A high quality arbitrary synthesized sound is obtained. In the above configuration, a predetermined input device
The input text input is sent to the
Is analyzed with reference to the dictionary of
This accent is detected. Word and phrase boundaries and bases
The detection result of the accent is obtained by the speech synthesis rule unit 204.
Processed according to the prescribed phonological rules described above, without inflection
Synthetic waveform data representing the speech read out of text input in the state
Data is generated. Plus words, phrase boundaries and basics
The result of accent detection is output by the speech synthesis rule
It is processed according to certain prosodic rules, and suppresses the entire text input.
A pitch pattern representing the fried fish is generated. Pitch pattern
Is output to the speech synthesizer 205 together with the synthesized waveform data.
Here, based on the pitch pattern and the synthesized waveform data,
Thus, a synthesized sound is generated. According to the above configuration, audio
In each of the audio unit data used for
Impulse and vocal tract as sound source information in voiced part
It has necessary sets of unit response waveforms as characteristics,
In the voice part, by having a cutout waveform of the real voice
Therefore, the distortion of the spectral envelope due to the change in pitch pattern
High quality synthesized sound that is close to human voice without causing distortion
Voices can be generated arbitrarily. In the first embodiment described above, the sound source information
Superimposed on the unit response waveform using impulse
The waveform is synthesized by
Is regarded as an ideal impulse,
Without superimposing them at pitch period intervals,
Position response waveforms so that they correspond to the pitch pattern.
Thus, a desired synthesized sound may be generated. Further
In the above-described embodiment, the voice unit storage unit 202
And holds voice unit data in CV units.
Is not only CV unit, but also another voice unit such as CVC unit
May hold the data. The phonetic symbol information 103 shown in FIG.
Sentence in the phoneme symbol string created by the speech synthesis rule unit 204 described above.
Recorded with knot accent and pause information added
Stored in device 102. The readout circuit 104 emits voice
The sound from the recording device 102 storing the phonetic symbol information 103
The voice pronunciation symbol information 103 is read. Voice synthesizer 105
Has a function equivalent to that of the speech synthesis unit 205 shown in FIG.
And the phonetic symbol information read out by the reading circuit 104
Based on 103, a synthesized speech output is obtained. According to the first embodiment, the recording device 102
The phonetic symbol information 103 to be remembered is the voice shown in FIG.
Since the sentence has been analyzed in advance by the synthesizer 201,
The storage capacity of the recording device 102 can be reduced. Ma
In the first embodiment, in the speech synthesizer 201,
The speech information analyzed in advance is the phonetic symbol information 10
3 is stored in the recording device 102,
There is no need to analyze the text and the playback time can be reduced. What
Note that although the first embodiment has been illustrated with the reproduction of speech, this embodiment can cover not only speech but sound in general. Unlike speech, general sound does not need processing according to sentence analysis and speech synthesis rules. The audio reproducing apparatus according to the first embodiment can be mounted on a vehicle, and the reproduced speech, or the reproduced speech and sound, can serve as traffic guidance information for vehicles such as automobiles. In particular, where space, price, and power capacity are limited and quick operation is required, as in a vehicle, reducing the storage capacity of the recording device 102 and shortening the playback time bring real benefit. The type of the recording device 102 is decided in consideration of the storage capacity needed for the phonetic symbol information 103, the speed of playback, and in some cases the possibility of rewriting. From the standpoint of speed and the like, semiconductor memory is preferable, but it may be restricted by capacity and price. Media such as CD and MD are slower to play back but are less restricted in terms of price. When the device is mounted on a vehicle, which is subject to vibration, a semiconductor memory that is not affected by vibration is preferable as the recording device 102.
FIG. 4 shows, as a second embodiment of the signal reproducing apparatus according to the present invention, an audio reproducing device 401 that outputs synthesized speech together with the associated text information. The audio reproducing device 401 has a storage device 402, a first readout circuit 405, a speech synthesizer 406, a second readout circuit 407, and a display device 408. The storage device 402 stores phonetic symbol information 404 and the text information 403 associated with it. The text information 403 is text represented in ASCII code or the like. The audio reproducing device 401 reproduces the paired phonetic symbol information 404 and text information 403 at almost the same time.
The phonetic symbol information 404 and the text information 403 may be exactly the same sentence, or they may differ in that the text information 403 shown on the display device 408 is in written language while the phonetic symbol information 404 output from the speech synthesizer 406 is in spoken language. It is also effective to present different content in the text display and the voice playback. For example, the text display shows the numbers while the voice gives only an overall explanation. The first readout circuit 405 is the same as the readout circuit 104 shown in FIG. 1 and reads the phonetic symbol information 404. The second readout circuit 407 reads the text information 403 from the storage device 402. The speech synthesizer 406 is the same as the speech synthesizer 105 shown in FIG. 1. The display device 408 displays the text information 403 read by the second readout circuit 407. A liquid crystal display, a cathode ray tube display, or the like can be used as the display device 408. To output the paired phonetic symbol information 404 and text information 403 at almost the same time, address information indicating where the paired phonetic symbol information is recorded can be attached to each piece of text information in the storage device 402. Alternatively, the paired phonetic symbol information 404 and text information 403 can be recorded in contiguous recording areas in the storage device 402.
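
The two pairing schemes (address information attached to each text entry, and contiguous recording of each pair) can be sketched as follows. The byte layout, the 2-byte length prefixes, and all function names are assumptions for illustration only; the patent does not specify a recording format.

```python
import struct

def read_pair_by_address(text_area, symbol_area, text_index):
    """Scheme 1: each text entry carries the address (offset, length) of its symbols."""
    text, (offset, length) = text_area[text_index]
    symbols = symbol_area[offset:offset + length]
    return text, symbols.decode("ascii")

def pack_contiguous(pairs):
    """Scheme 2: record each text and its symbols back to back, length-prefixed."""
    out = bytearray()
    for text, symbols in pairs:
        t, s = text.encode("ascii"), symbols.encode("ascii")
        out += struct.pack(">HH", len(t), len(s)) + t + s
    return bytes(out)

def read_contiguous(blob):
    """Read pairs sequentially; no separate address lookup is needed."""
    pos, pairs = 0, []
    while pos < len(blob):
        tlen, slen = struct.unpack_from(">HH", blob, pos)
        pos += 4
        text = blob[pos:pos + tlen].decode("ascii")
        pos += tlen
        symbols = blob[pos:pos + slen].decode("ascii")
        pos += slen
        pairs.append((text, symbols))
    return pairs
```

In the first scheme the reader must follow an address from the text area into a separate symbol area; in the second, one sequential read yields both members of a pair.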
Contiguous recording has the advantage that no dead time is needed for address operations.
FIG. 5 shows a third embodiment of the present invention, to which map information is newly added. This playback device 501 has a storage device 502 that stores map information 509 in addition to phonetic symbol information 504, which corresponds to the phonetic symbol information 404, and text information 503, which corresponds to the text information 403. The first readout circuit 505, which corresponds to the first readout circuit 405, reads the phonetic symbol information 504 from the storage device 502 and outputs it to the speech synthesizer 506, which corresponds to the speech synthesizer 406. The second readout circuit 507 reads the text information 503 and the map information 509 from the storage device 502 and outputs them to the display device 508, which corresponds to the display device 408. To record the map information 509 in the storage device 502, it is enough to know the signal value, "1" or "0", of each coordinate point, the position at which text is to be displayed, and the text content. This applies when the map information 509 is black and white; when the map information 509 is in color, it is enough to give signal values for the three components R, G, and B at each coordinate point. From the three kinds of information stored in the storage device 502 (the map information 509, the phonetic symbol information 504, and the text information 503), speech is output along with pictures and text as a map to guide the vehicle, which makes the device desirable as a vehicle guidance information device.
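
As a rough illustration of the storage cost implied by this map representation, the sketch below counts the bytes needed for a map bitmap. It assumes one bit per coordinate point for black and white and one bit for each of R, G, and B for color. The patent says only that a signal value is given for each component, so the one-bit color depth, the function name, and the 320 x 240 map size used below are assumptions.

```python
def map_bitmap_bytes(width, height, color=False, bits_per_component=1):
    """Bytes needed to store one signal value per coordinate point (x3 for RGB)."""
    components = 3 if color else 1
    total_bits = width * height * components * bits_per_component
    return (total_bits + 7) // 8  # round up to whole bytes

mono_bytes = map_bitmap_bytes(320, 240)               # black and white
color_bytes = map_bitmap_bytes(320, 240, color=True)  # R, G, B
```

Under these assumptions a 320 x 240 map takes 9,600 bytes in black and white and 28,800 bytes in color, before the text positions and text content are added.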
FIG. 6 shows a fourth embodiment of the signal reproducing apparatus according to the present invention. The playback device 601 has a storage device 602, a first readout circuit 605, a speech synthesizer 606, a second readout circuit 607, and a display device 608. The description given with reference to FIG. 2 did not say where the speech unit storage unit 202 is recorded. Improving the sound quality increases the amount of data for the speech units, so in the playback device 601 shown in FIG. 6 it is desirable to keep them on a low-cost disk drive or the like and retrieve them as needed. Synthesizing the speech then takes time, but this causes no practical inconvenience if reproduction is started ahead of time. In the playback device 601 shown in FIG. 6, the storage device 602 stores speech synthesis segments 610 in addition to phonetic symbol information 604, text information 603, and map information 609. The first readout circuit 605 reads the speech synthesis segments 610 together with the phonetic symbol information 604, and the speech synthesizer 606 synthesizes speech from the phonetic symbol information 604 using the speech synthesis segments 610. The reproduction of the map information 609 and the text information 603 on the display device 608 is the same as in the embodiment described earlier with reference to the drawings. Storing the speech synthesis segments 610 in semiconductor memory is suitable for faster reproduction.
FIG. 7 shows a fifth embodiment of the signal reproducing apparatus according to the present invention. The playback device 701 shown in FIG. 7 has a storage device 702, a first readout circuit 705, a speech synthesizer 706, a second readout circuit 707, and a display device 708.
The storage device 702 stores phonetic symbol information 704, speech synthesis segments 710, text information 703, and map information 709. This configuration is the same as that of the playback device 601 shown in FIG. 6. The playback device 701 shown in FIG. 7 is further provided with an internal control device 712, a discriminating device 711, and an input device 710. In addition, an external control device 713 is connected to the playback device 701. The input device 710 is a device with which the user, having perceived the system's speech output or text information output, gives instructions to the system.
A switch for yes/no responses, for example, corresponds to this. The discriminating device 711 associates the system's speech output or text information output with the input signal from the input device. As shown in FIG. 8, the correspondence can be established by using a window on the time axis. With this input time window, the discriminating device 711 passes a signal from the input device 710 on to the internal control device 712 or the external control device 713 only while the window is open and rejects it at all other times. The input time window stays open for a while, for example up to about 10 seconds, after the presentation of the text information 703 or the output of the phonetic symbol information 704 has ended. This is because the user's judgment and action may lag behind the presentation of the text or speech. As a consequence, of course, successive inputs from the user must be separated by at least this extra time. In a vehicle guidance information device the system does not need to be very fast, since a slower pace lightens the burden on the user, so this extra time of up to about 10 seconds is acceptable.
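
The input time window can be sketched as follows. The 10-second grace period follows the "about 10 seconds" stated in the description. The class and function names, and the choice to open the window when the output starts, are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class InputTimeWindow:
    """Window on the time axis during which user input is accepted."""
    output_start: float   # seconds; when speech or text presentation begins (assumed opening point)
    output_end: float     # seconds; when the presentation ends
    grace: float = 10.0   # extra time kept open after the presentation ends

    def is_open(self, t: float) -> bool:
        return self.output_start <= t <= self.output_end + self.grace

def discriminate(window, input_time, signal):
    """Pass the signal on only while the window is open; otherwise reject it (None)."""
    return signal if window.is_open(input_time) else None

window = InputTimeWindow(output_start=0.0, output_end=3.0)
accepted = discriminate(window, 8.0, "yes")    # within the grace period
rejected = discriminate(window, 13.5, "yes")   # after the window has closed
```

A signal that arrives while the window is open is attributed to the output that opened it; anything else is dropped rather than forwarded to the control devices.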
The internal control device 712 receives from the discriminating device 711 the input signals from the input device 710 that were judged valid, decides the next operation, and controls the storage device 702. For example:
(1) The system asks the user whether to display the text information related to the displayed map information.
(2) The user sends a yes response using the switch.
(3) The discriminating device 711 learns the duration of the utterance from the signal from the speech synthesizer 706, creates an input time window, and outputs the yes signal to the internal control device 712.
(4) The internal control device 712 controls the storage device 702 so that the corresponding text information 703 is output.
The external control device 713 is a circuit that controls the parts of the playback device 701 other than the storage device 702, for example the speaking rate of the speech synthesizer 706.
FIG. 9 shows the recording state when a disk such as a CD is used as the storage device 702. In FIG. 9(A), reference numeral 901 indicates a storage area of the text information 703.
Reference numeral 902 indicates the storage area of the phonetic symbol information 704 that is presented at about the same time as the text information 703. In contrast to the storage method of FIG. 9(A), FIG. 9(B) shows an example in which the storage areas of the text information 703 and the storage areas of the phonetic symbol information 704 are placed at different positions. The text information 703 is stored in the areas indicated by reference numerals 903 and 904, and the phonetic symbol information 704 is stored in the areas indicated by reference numerals 905 and 906.
FIG. 10 shows a sixth embodiment of the signal reproducing apparatus of the present invention. The signal reproducing apparatus shown in FIG. 10 is an example suited to use as a vehicle guidance information device. This vehicle guidance information device 1001 has a storage device 1002, a first readout circuit 1005, a speech synthesizer 1006, a buffer memory 1016, a second readout circuit 1007, a display device 1008, an input device 1010, a discriminating device 1011, a first control device 1012, and a second control device 1013. Apart from the addition of the buffer memory 1016, this configuration is similar to that of the playback device 701 and the external control device 713 shown in FIG. 7, and its basic processing operation is the same as that of the signal reproducing apparatus illustrated in FIG. 7. The vehicle guidance information device 1001 further has a preset input device 1014, and the first control device 1012 is provided with a selection information storage device 1015.
The preset input device 1014 is a selection signal input device with which a set of playback speech is selected in advance, taking into account the route (course) the vehicle will follow when the vehicle guidance information device 1001 is used. The selection information storage device 1015 is an information storage device for selecting a set of playback speech according to the context given by the input signals from the preset input device 1014 and the input device 1010, and it holds the corresponding phonetic symbol address information.
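
The role of the selection information can be sketched as a lookup from a preset route to the addresses of the phonetic symbol entries relevant to it. The route names, the table contents, and the function name are hypothetical; the patent does not describe the format of the selection information.

```python
# Hypothetical contents of selection information storage device 1015:
# preset route name -> addresses of the phonetic symbol entries relevant to it.
SELECTION_INFO = {
    "route_a": [0, 2],
    "route_b": [1],
}

def select_playback_set(selection_info, route, storage):
    """Return only the stored symbol entries selected for the preset route."""
    return [storage[addr] for addr in selection_info.get(route, [])]

# Assumed symbol strings standing in for stored phonetic symbol information.
storage = ["mi gi e", "hi da ri e", "ma ssu gu"]
route_a_set = select_playback_set(SELECTION_INFO, "route_a", storage)
```

Only the selected entries need to be read out, which is how a limited subset of the recorded phonetic symbol information is chosen for a given course.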
The buffer memory 1016, provided between the second readout circuit 1007 and the display device 1008, stores the set of phonetic symbols that matches the text, and the speech is output via the speech synthesizer 1006. With this configuration, the driver of the vehicle on which the vehicle guidance information device 1001 is mounted receives output that takes the vehicle's course into account from the speech synthesizer 1006 and from the display device 1008.
[0044] According to the present invention, speech whose content relates to the displayed text can be obtained at a bit rate of about several hundred bps. According to the present invention, sentence analysis and speech synthesis rule processing are performed in advance, so the playback time can be shortened. Moreover, because an online system can be configured without a sentence analysis unit or a dictionary, a signal reproducing device can be manufactured at particularly low cost. According to the present invention, a speech output can also be associated with the input that a user makes in response to it.
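
To put the bit-rate figure in perspective, the sketch below compares the storage needed for one hour of speech at two rates. The values 300 bps and 4,000 bps are assumed examples chosen from the ranges "several hundred bps" and "several kbps" mentioned in the description; they are not figures from the patent.

```python
def storage_bytes(bit_rate_bps, seconds):
    """Bytes needed to hold `seconds` of output at a constant bit rate."""
    return bit_rate_bps * seconds // 8

one_hour = 3600
symbol_bytes = storage_bytes(300, one_hour)    # assumed "several hundred bps"
coded_bytes = storage_bytes(4000, one_hour)    # assumed "several kbps"
```

With these assumed rates, one hour of output needs 135,000 bytes as phonetic symbols against 1,800,000 bytes as coded audio.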

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a configuration diagram of an audio reproducing device as a first embodiment of the signal reproducing apparatus of the present invention.
FIG. 2 is a processing diagram of the speech synthesizer that performs the text-to-speech synthesis processing whose results are stored in the storage device of the audio reproducing device shown in FIG. 1.
FIG. 3 is a diagram explaining the generation of the basic pitch pattern in the speech synthesizer shown in FIG. 2.
FIG. 4 is a configuration diagram of an audio reproducing device as a second embodiment of the signal reproducing apparatus of the present invention.
FIG. 5 is a configuration diagram of an audio reproducing device as a third embodiment of the signal reproducing apparatus of the present invention.
FIG. 6 is a configuration diagram of an audio reproducing device as a fourth embodiment of the signal reproducing apparatus of the present invention.
FIG. 7 is a configuration diagram of an audio reproducing device as a fifth embodiment of the signal reproducing apparatus of the present invention.
FIG. 8 is a diagram explaining the operation of the discriminating device shown in FIG. 7.
FIG. 9 is a diagram illustrating how the phonetic symbol information and the text information are stored when a disk device is used as the storage device in the audio reproducing device of the present invention.
FIG. 10 is a configuration diagram of a vehicle guidance information device as a sixth embodiment of the signal reproducing apparatus of the present invention.
[Explanation of Reference Codes]
101, 401, 501, 601, 701: playback device
102, 402, 502, 602, 702, 1002: recording device
103, 404, 504, 604, 704, 1004: phonetic symbol information
403, 503, 603, 703, 1003: text information
104, 405, 505, 605, 705, 1005: first readout circuit
105, 406, 506, 606, 706, 1006: speech synthesizer
407, 507, 607, 707, 1007: second readout circuit
408, 508, 608, 708, 1008: display device
509, 609, 709, 1009: map information
610: speech synthesis segment
201: speech synthesizer
202: speech unit storage unit
203: sentence analysis unit
204: speech synthesis rule unit
205: speech synthesis unit
710: input device
711: discriminating device
712: internal control device
713: external control device
1001: vehicle guidance information device
1014: preset input device
1015: selection information storage device
1016: buffer memory

Continuation of the front page
(56) References: JP-A-3-160500 (JP, A); JP-A-3-257485 (JP, A); JP-A-3-87799 (JP, A); JP-A-2-66598 (JP, A); JP-A-2-234199 (JP, A); JP-A-63-225299 (JP, A); JP-A-4-280300 (JP, A); JP-A-4-199196 (JP, A); JP-A-4-218871 (JP, A); JP-A-3-259200 (JP, A); JP-A-3-259196 (JP, A)
(58) Fields investigated (Int. Cl.⁷, DB name): G10L 13/08; G10L 13/00

Claims

(57) [Claims] [Claim 1] A signal reproducing apparatus comprising: recording means in which are recorded sound information, including speech phonetic symbol information for speech synthesis that has been subjected in advance to analysis processing for reproduction and sound synthesis segments, and text information corresponding to the sound information; selection information recording means for recording selection information on limited phonetic symbol information among the phonetic symbol information recorded in the recording means; first reading means for reading the sound information from the recording means based on the selection information; sound synthesizing means for generating a sound reproduction signal from the read sound information; means for reproducing the generated sound reproduction signal; second reading means for reading the text information; and display means for displaying the text information.