JP4014361B2

JP4014361B2 - Speech synthesis apparatus, speech synthesis method, and computer-readable recording medium recording speech synthesis program

Info

Publication number: JP4014361B2
Application number: JP2001022729A
Authority: JP
Inventors: 和正本田; 智一森尾; 浩幸勘座
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2001-01-31
Filing date: 2001-01-31
Publication date: 2007-11-28
Anticipated expiration: 2021-01-31
Also published as: JP2002229578A

Abstract

PROBLEM TO BE SOLVED: To solve the problem that a voice synthesizing device which reads a hypertext aloud cannot read aloud information at a link destination connected in a network state by a hyperlink. SOLUTION: This voice synthesizing device has a means which extracts link destination information according to property information showing a link destination in an input hypertext, a means which obtains the hypertext corresponding to the link destination information, a means which generates a read-aloud text at the link destination according to the obtained hypertext, and a means which synthesizes a voice of the generated read-aloud text at the link destination.

Description

【０００１】
【発明の属する技術分野】
本発明は、音声合成装置及び音声合成方法並びに音声合成プログラムを記録したコンピュータ読み取り可能な記録媒体に関する。特に、ハイパーテキストの読み上げを行う音声合成装置及び音声合成方法並びに音声合成プログラムを記録したコンピュータ読み取り可能な記録媒体に関する。
【０００２】
【従来の技術】
従来より、漢字かな混じり文の文章を読み上げるテキスト音声合成技術が知られている(古井著「ディジタル音声処理」(東海大学出版会))。
【０００３】
一方、近年、インターネットの発展とともに、ＷＷＷ(ＷｏｒｌｄＷｉｄｅＷｅｂ)における情報の記述書式としてＩＳＯ／ＩＥＣ１５４４５で規定されているＨＴＭＬ(ＨｉｐｅｒＴｅｘｔＭａｒｋｕｐＬａｎｇｕａｇｅ)で記述されたハイパーテキストを読み上げるハイパーテキスト読上装置が提案されている(特開平８−８３０８９号公報)。このハイパーテキスト読上装置によれば、テキスト中からハイパーリンクの部分を検出し、そのハイパーリンクの部分の読上音声と通常の部分の読上音声とを変化させて出力することにより、ハイパーリンクが設定されている部分を音声で知覚させることができる。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記ハイパーテキスト読上装置は、テキスト中のハイパーリンクの存在を読上音声の違いからユーザに知覚させることが可能であるが、リンク先の内容についてまでは読み上げられないため、そのリンク先にどのような情報が存在しているかをユーザに知らせることはできなかった。
【０００５】
本発明は、ハイパーテキストを読み上げる音声合成装置において、ハイパーリンクによりネットワーク状に繋がれているリンク先の情報を読み上げ可能な音声合成装置を提供することを主な目的とする。
【０００６】
【課題を解決するための手段】
本発明は、ハイパーテキストで示される情報を表示する表示手段と、入力ハイパーテキスト中のリンク先を示す属性情報を解析すると共に、この解析結果に従って上記入力ハイパーテキスト中からインターネット上のハイパーテキストの接続先を表すリンク先を示すリンク先情報を抽出するリンク先情報抽出手段と、上記リンク先情報に対応するインターネット上のハイパーテキストを上記リンク先情報が示すインターネット上のリンク先から取得した後、当該ハイパーテキストに含まれると共に、読み上げテキストであることを表すマークアップ情報に基づいて、当該ハイパーテキストからリンク先の読み上げハイパーテキストを抽出するリンク先読み上げテキスト抽出手段と、上記抽出されたリンク先の読み上げハイパーテキストに基づいてリンク先の読み上げテキストを生成するリンク先読み上げテキスト生成手段と、上記入力ハイパーテキストに対する上記表示手段による表示処理を維持させる一方、上記インターネット上から取得したハイパーテキストから抽出および生成されたリンク先の読み上げテキストを音声合成する音声合成手段と、を有することを特徴とする音声合成装置を提供する。
【０００７】
本発明によれば、入力されたハイパーテキスト中のリンク先の情報を取得し、このリンク先情報に対応するインターネット上のハイパーテキストを上記リンク先情報が示すインターネット上のリンク先から取得した後、当該ハイパーテキストから、マークアップ情報に基づいてリンク先の読み上げテキストを生成できるため、入力されたハイパーテキスト中のリンク先のハイパーテキストから抽出および生成された読み上げテキストを読み上げることが可能である。その際に、上記入力ハイパーテキストに対する上記表示手段による表示処理を維持させるので、リンク先の表示後に元の本文の表示に戻るという煩わしい作業を回避することが可能になる。さらに、予め上記リンク先のハイパーテキスト中に読み上げるべきテキストを指定するマークアップ情報が挿入されている場合には、その指定された部分のテキストだけを読み上げるようにすることができ、余分な情報を読み上げなくて済む。
【０００８】
また、本発明は、上記入力ハイパーテキストに基づいて本文の読み上げテキストを生成する本文読み上げテキスト生成手段と、上記生成された本文の読み上げテキストと上記生成されたリンク先の読み上げテキストとを切り替えて音声合成する切替音声合成手段と、をさらに有することを特徴とする音声合成装置を提供する。
【０００９】
本発明によれば、本文の読み上げテキストとリンク先の読み上げテキストとを切り替えて出力することができるため、入力されたハイパーテキストで示される情報のうち本文に関する読み上げテキスト情報とリンク先に関する読み上げテキスト情報とを、ユーザが希望する通りの順序で音声合成することが可能である。
【００１０】
また、本発明は、所望のリンク先の入力を受け付ける入力手段をさらに有し、上記入力手段でリンク先の入力を受け付けた場合には、上記切替音声合成手段によって該リンク先の読み上げテキストを音声合成する一方、上記表示手段で上記本文の読み上げテキストを表示することを特徴とする音声合成装置を提供する。
【００１１】
本発明によれば、ユーザが所望のリンク先を入力した場合、そのリンク先の情報を音声合成する一方、現在表示されている本文情報をそのまま表示し続けることができるため、リンク先の表示後に元の本文の表示に戻るという煩わしい作業を回避することが可能である。
【００１２】
また、本発明は、上記入力手段によるリンク先の入力は、カーソルを含むポインタが、上記表示されている本文の読み上げテキスト中におけるリンク先を表す文字又は画像を含む表示情報に移動することによって行われることを特徴とする音声合成装置を提供する。
【００１３】
また、本発明は、上記リンク先の読み上げテキストの音声合成が要求された場合に、上記リンク先の読み上げテキストに対応する情報を表示させるか否かを選択する選択手段をさらに有することを特徴とする音声合成装置を提供する。
【００１４】
本発明によれば、現在表示されているＷｅｂページを目で見ながら、その表示を他のＷｅｂページに切り替えることなくリンク先の情報を音声合成によって取得することができると共に、ユーザが希望するときに現在音声合成されている情報に対応する表示に切り替えることが可能である。
【００１５】
また、本発明は、入力ハイパーテキストで示される情報を表示するステップと、入力ハイパーテキスト中のリンク先を示す属性情報を解析すると共に、この解析結果に従って上記入力ハイパーテキスト中からインターネット上のハイパーテキストの接続先を表すリンク先を示すリンク先情報を抽出するステップと、上記リンク先情報に対応するインターネット上のハイパーテキストを上記リンク先情報が示すインターネット上のリンク先から取得した後、当該ハイパーテキストに含まれると共に、読み上げテキストであることを表すマークアップ情報に基づいて、当該ハイパーテキストからリンク先の読み上げハイパーテキストを抽出するステップと、上記抽出されたリンク先の読み上げハイパーテキストに基づいてリンク先の読み上げテキストを生成するステップと、上記入力ハイパーテキストに対する上記表示処理を維持させる一方、上記インターネット上から取得したハイパーテキストから抽出および生成されたリンク先の読み上げテキストを音声合成するステップと、を有することを特徴とする音声合成方法を提供する。
【００１６】
本発明によれば、入力されたハイパーテキスト中のリンク先の情報を取得し、このリンク先情報に対応するインターネット上のハイパーテキストを上記リンク先情報が示すインターネット上のリンク先から取得した後、当該ハイパーテキストから、マークアップ情報に基づいてリンク先の読み上げテキストを生成できるため、入力されたハイパーテキスト中のリンク先のハイパーテキストから抽出および生成された読み上げテキストを読み上げることが可能である。その際に、上記入力ハイパーテキストに対する表示処理を維持させるので、リンク先の表示後に元の本文の表示に戻るという煩わしい作業を回避することが可能になる。さらに、予め上記リンク先のハイパーテキスト中に読み上げるべきテキストを指定するマークアップ情報が挿入されている場合には、その指定された部分のテキストだけを読み上げるようにすることができ、余分な情報を読み上げなくて済む。
【００１７】
また、本発明は、上記音声合成方法の各ステップをコンピュータに実行させるための音声合成プログラムを記録したコンピュータ読み取り可能な記録媒体を提供する。
【００１８】
本発明によれば、入力されたハイパーテキスト中のリンク先の情報が示すインターネット上のリンク先から取得したハイパーテキストからリンク先の読み上げテキストを生成できるため、入力されたハイパーテキスト中のリンク先のハイパーテキストから抽出および生成された読み上げテキストを読み上げることが可能である。
【００１９】
また、上記本発明は、上記リンク先のテキストの読み上げテキストを生成するリンク先読み上げテキスト生成手段を有して、上記取得したハイパーテキストの一部又は全部を加工してリンク先の読み上げテキストを生成するようにすることが望ましい。
【００２０】
上記構成によれば、テキストの要約技術を用いて、取得したハイパーテキストの一部又は全部の情報を加工するため、リンク先の情報を要約したテキストを生成することができ、リンク先の情報量が多い場合でも適切な情報を音声合成することが可能になる。
【００２１】
【発明の実施の形態】
（実施の形態１）
図１は、本発明に係る音声合成装置の一形態を示す機能ブロック図である。この音声合成装置１００は、ハイパーテキスト解釈部１０１、読み上げテキスト生成部１０２、言語処理部１０３、音声合成部１０４、音声出力部１０５で構成される。
【００２２】
本実施の形態では、音声合成装置１００は汎用のパーソナルコンピュータに内蔵され、ネットワークを介して受信されるハイパーテキストを音声合成して出力する場合について説明を行うが、本発明はこれに限定されるものではなく、入力されたハイパーテキストに対して音声合成を行うような装置全般に対して適用可能である。
【００２３】
図１に示すように、音声合成装置１００にはハイパーテキストが入力される。このハイパーテキストはＷＷＷサーバでのドキュメントを記述するための言語として広く知られているＨＴＭＬ等のマークアップ言語で記述されている。また、このハイパーテキストには、インターネット上のリソースのロケーションを指し示すための情報であるＵＲＬ(ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ)や文章等が埋め込められている。一般に、インターネット上でブラウザを使用して様々なサーバに接続できるようにすることをリンクと呼び、特にハイパーテキストを使用して接続先を指定することをハイパーリンクと呼ぶ。以下の説明では、特に断らない限り、ハイパーリンクのことを単にリンクと呼ぶこととする。また、ハイパーテキストに埋め込められたＵＲＬにより指定される接続先のことをリンク先と呼ぶこととする。
【００２４】
ハイパーテキスト解釈部１０１は、入力されたハイパーテキスト中に記述された属性情報を解析する。解析の結果、入力ハイパーテキストの本文の内容を示す情報(以下、本文の読み上げテキストという)は言語処理部１０３に出力される一方、リンク先を示すＵＲＬ(以下、リンク先情報という)は文字列を要素とする配列で表され、読み上げテキスト生成部１０２に出力される。
【００２５】
読み上げテキスト生成部１０２は、ハイパーテキスト解釈部１０１から出力されたリンク先情報で示されるリンク先にネットワークを介して接続し、リンク先に記録されている情報(以下、リンク情報という)を示すハイパーテキストを取得して、そのハイパーテキスト中の属性情報に基づいてリンク先の読み上げテキストを生成し、言語処理部１０３に出力する。
【００２６】
言語処理部１０３は、ハイパーテキスト解釈部１０１の出力する本文の読み上げテキスト及び読み上げテキスト生成部１０２の出力するリンク先の読み上げテキストを入力とし、それらが示すテキスト情報を解析して、読み上げるテキスト情報を表す文字列と、その文字列に対応する音の高さと継続時間の数値とをそれぞれ記憶した配列とを音声合成部１０４に出力する。
【００２７】
音声合成部１０４は、言語処理部１０３の出力した情報から音声波形を表す時系列の数値の配列を生成し音声出力部１０５へ出力する。
【００２８】
音声出力部１０５は、スピーカなどの音声出力装置を有し、音声合成部１０４から出力された合成波形をＤ／Ａ変換して出力する。
【００２９】
図２は、本実施の形態に係る音声合成装置１００にハイパーテキストが入力され、ハイパーテキスト解釈部１０１及び読み上げテキスト生成部１０２において、そのハイパーテキストから読み上げるテキスト情報を生成する際の具体例を説明するための概念図である。
【００３０】
図２(ａ)は、入力されるハイパーテキストの一例である。図２(ａ)において、＜ａｈｒｅｆ＝…＞から＜／ａ＞までがリンクの記述であり、＜ａｈｒｅｆ＝…＞及び＜／ａ＞がリンクを示す属性情報である。また、“ｈｏｋｋａｉｄｏ．ｈｔｍｌ”及び“ｔｏｕｈｏｋｕ．ｈｔｍｌ”がリンク先情報である。この例では、このハイパーテキストは、北海道の天気の情報及び東北の天気の情報にリンクされていることが示されている。
【００３１】
図２(ｂ)は、読み上げテキスト生成部１０２において生成されるリンク先の読み上げテキストの一例である。このリンク先の読み上げテキストは、リンク先(ｈｏｋｋａｉｄｏ．ｈｔｍｌ)のサーバに蓄積されている情報であり、ハイパーテキスト解釈部１０１において、リンクを示す属性情報(＜ａｈｒｅｆ＝…＞及び＜／ａ＞)からリンク先情報(ｈｏｋｋａｉｄｏ．ｈｔｍｌ)を抽出し、読み上げテキスト生成部１０２において、そのリンク先に接続して、そのリンク先情報に対応するハイパーテキストを取得する。その後、その取得したハイパーテキスト中の属性情報に基づいてリンク先の読み上げテキストを生成する。この場合、ハイパーテキスト中の全ての属性情報に対応する情報をリンク先の読み上げテキストとして生成するようにしてもよいし、ハイパーテキスト中のリンクを示す属性情報以外で示される属性情報に対応する情報をリンク先の読み上げテキストとして生成するようにしておいてもよい。
【００３２】
図２(ｃ)は、ハイパーテキスト解釈部１０１において生成される本文の読み上げテキストの一例である。この本文の読み上げテキストは、ハイパーテキスト内のリンクを示す属性情報以外で示される属性情報で示される情報である。
（実施の形態２）図３は、本発明に係る音声合成装置の別の形態を示す機能ブロック図である。図１と同一の構成要素には同一の符号を付し、同じ説明は省略する。
【００３３】
読み上げテキスト管理部１０６は、ハイパーテキスト解釈部１０１及び読み上げテキスト生成部１０２から入力されたテキスト情報を適切な順序で言語処理部１０３に出力するように管理する。
【００３４】
図４は読み上げテキスト管理部１０６における処理の一例を示すフローチャートである。図４に従って読み上げテキスト管理部１０６の処理を説明すると以下の通りである。
【００３５】
ハイパーテキスト解釈部１０１及び読み上げテキスト生成部１０２から読み上げテキスト管理部１０６にテキスト情報が入力されると、テキスト情報を構成するそれぞれの文字列を区別して記憶する(Ｓ１０１)。すなわち、ハイパーテキスト解釈部１０１から入力された本文の読み上げテキストの文字列を配列Ｈに、読み上げテキスト生成部１０２から入力されたリンク先の読み上げテキストの文字列を配列Ｙにそれぞれ記憶する。このとき、ハイパーテキスト解釈部１０１から入力された文字列の数をＮ、読み上げテキスト生成部１０２から入力された文字列の数をＭとした場合、配列Ｈ、Ｙには、それぞれ、Ｈ［１］、Ｈ［２］、…Ｈ［Ｎ］、Ｙ［１］、Ｙ［２］…、Ｙ［Ｍ］が記憶される。
【００３６】
次に、ループの制御変数ｉに１を代入し(Ｓ１０２)、Ｈ［ｉ］を言語処理部１０３に出力する(Ｓ１０３)。さらに次に、制御変数ｉに１を加え(Ｓ１０４)、ｉ＜Ｎの条件を満足するかどうかを判定する(Ｓ１０５)。条件を満足する場合は、Ｓ１０３に戻り、ハイパーテキスト解釈部１０１から入力された文字列を出力し終えるまで処理を繰り返す。出力し終えると、制御変数ｉを１に初期化し直し(Ｓ１０６)、Ｙ［ｉ］を言語処理部１０３に出力する(Ｓ１０７)。さらに次に、制御変数ｉに１を加え(Ｓ１０８)、ｉ＜Ｍの条件を満足するかどうかを判定する(Ｓ１０９)。条件を満足する場合は、Ｓ１０７に戻り、読み上げテキスト生成部１０２から入力された文字列を出力し終えるまで処理を繰り返す。
【００３７】
以上の処理により、まず、本文の読み上げテキストを音声合成し、その次にリンク先の読み上げテキストを音声合成することが可能である。
【００３８】
本実施の形態では、まず本文の読み上げテキストを出力し、その次にリンク先の読み上げテキストを出力するようにテキスト情報を管理しているが、順番を逆にして出力したり、それぞれを交互に出力するようにしても良い。すなわち、読み上げテキスト管理部１０６において、本文の読み上げテキストの出力とリンク先の読み上げテキストの出力とを切り替え可能に設定したり、本文の読み上げテキスト又はリンク先の読み上げテキストを出力するか否かを制限したりするようにしてもよい。
（実施の形態３）図５は、本発明に係る音声合成装置の別の形態を示す機能ブロック図である。図１と同一の構成要素には同一の符号を付し、同じ説明は省略する。
【００３９】
テキスト要約部１０７は、読み上げテキスト生成部１０２からリンク先の読み上げテキストを受け取り、そのテキスト情報を要約して、その要約情報を読み上げテキスト生成部１０２に返す。テキスト情報の要約については、既存のテキスト要約技術を用いて行えば良い。
【００４０】
このような構成により、入力されたハイパーテキスト中に設定されているリンク先の情報を要約したテキストを生成することができるため、リンク先の情報量が多い場合でも適切な情報を音声合成することが可能である。
（実施の形態４）図６は、本発明に係る音声合成装置の別の形態を示す機能ブロック図である。図１と同一の構成要素には同一の符号を付し、同じ説明は省略する。
【００４１】
テキスト翻訳部１０８は、読み上げテキスト生成部１０２からリンク先の読み上げテキストを受け取り、そのテキスト情報を予め設定しておいたユーザの使用言語に翻訳して、その翻訳情報を読み上げテキスト生成部１０２に返す。テキスト情報の翻訳については、既存の自然言語自動翻訳技術(長尾真監修「日本語情報処理」社団法人電子通信学会)を用いて行えば良い。
【００４２】
このような構成により、入力されたハイパーテキスト中に設定されているリンク先の情報を予めユーザが設定しておいた使用言語に翻訳することができるため、ユーザが理解できない言語の情報でもユーザが理解できる言語で音声合成することが可能である。
（実施の形態５）図７は、本発明に係る音声合成装置の別の形態を示す機能ブロック図である。図１と同一の構成要素には同一の符号を付し、同じ説明は省略する。
【００４３】
読み上げテキスト抽出部１０９は、読み上げテキスト生成部１０２からリンク情報を受け取り、そのリンク情報にテキスト読み上げ用の属性情報がある場合には、その部分を抽出して、その読み上げ情報を読み上げテキスト生成部１０２に返す。
【００４４】
テキスト読み上げ用の属性情報について、図８を参照して説明すると以下の通りである。
【００４５】
図８は、テキスト読み上げ用の属性情報が設定されたハイパーテキストからその部分を抽出する場合の一例を説明するための図である。図８(ａ)中の＜ｓｐｅａｋ＞がテキスト読み上げ用の属性情報を示しており、＜ｓｐｅａｋ＞(テキスト読み上げ開始タグ)と＜／ｓｐｅａｋ＞(テキスト読み上げ終了タグ)とに挟まれたテキスト情報が、図８(ｂ)のように読み上げ情報として抽出される。
【００４６】
このように、予めリンク情報の中に読み上げるべき情報が指定されている場合は、その部分だけを読み上げるようにすることができ、余分な情報を読み上げなくて済む。
（実施の形態６）図９は、本発明に係る音声合成装置の別の形態を示す機能ブロック図である。図１と同一の構成要素には同一の符号を付し、同じ説明は省略する。
【００４７】
表示部１１２は、液晶ディスプレイなどの表示デバイスを備え、ハイパーテキストで示される情報を表示するものである。なお、表示する際には既存のＷＷＷブラウザなどのハイパーテキストを表示させるためのソフトウエアが利用される。
【００４８】
入力部１１１は、ユーザ所望のリンク先の入力を受け付ける。そのリンク先を示す文字列は読み上げテキスト生成部１０２及びハイパーテキスト入力部１１０に出力される。この入力部１１１は、キーボードやマウスなどの入力デバイスを備えており、表示部１１２に表示されるＷＷＷブラウザ画面を介してキーボードでＵＲＬをタイプしたり、マウスでリンク先を表す文字や画像をクリックすることにより、対応する文字列を読み上げテキスト生成部１０２及びハイパーテキスト管理部１１０に出力させる。
【００４９】
ハイパーテキスト管理部１１０は、ＷＷＷブラウザを介して表示部１１２に表示させるＷｅｂページの基となるハイパーテキストを決定し出力する。例えば、本発明に係る音声合成装置を搭載したパソコンを起動した場合などの初期状態では、ＷＷＷブラウザで予め設定されているスタートページを表示部１１２に表示させるように決定する。また、入力部１１１又は読み上げテキスト生成部１０２からリンク先を示す文字列が入力されると、その文字列に対応するリンク先の情報を表すハイパーテキストを取得して、そのハイパーテキストを表示部１１２及びハイパーテキスト解釈部１０１に出力する。
【００５０】
また、現在表示部１１２に表示されているＷｅｂページにおいて、ユーザがリンク先を指定した場合、現在表示されているＷｅｂページはそのまま表示を維持する一方、指定されたリンク先の情報を読み上げるようにするか、又は、現在表示されているＷｅｂページをリンク先のＷｅｂページに切り替えると共に、指定されたリンク先の情報を読み上げるようにするかをユーザが選択可能なインタフェースを提供することもできる。後者の場合は、例えば、ページの切り替えを行うボタンを表す画像を表示部１１２に表示させておき、入力部１１１でそのボタンが選択されると、ハイパーテキスト管理部１１０において、現在読み上げられている文章に対応するリンク先のハイパーテキストを取得し、そのハイパーテキストが示す情報を表示部１１２に出力するようにしておけば良い。
【００５１】
このような構成により、現在表示されているＷｅｂページを目で見ながら、その表示を他のＷｅｂページに切り替えることなくリンク先の情報を音声合成によって取得することができると共に、ユーザが希望するときに現在音声合成されている情報に対応する表示に切り替えることができる。
【００５２】
また、入力部１１１において、マウスでリンク先を表す文字や画像にカーソルを移動させた時に、リンク先の文字列を読み上げテキスト生成部１０２に出力するようにしても良い。このような構成により、現在表示中のＷｅｂページはそのままで、指定されたリンク先の情報を音声合成により読み上げることができる。
【００５３】
また、現在音声合成で読み上げているリンク先に対応する文字列を読み上げテキスト生成部１０２から表示部１１２に出力して、そのリンク先に対応する文字を、例えば点滅させるなど他の文字とは変化させて表示させることにより、現在音声合成で読み上げているリンク先をユーザに知らせるようにすることができる。なお、表示の変化のさせ方については、点滅させる以外に、例えば、リンク先の文字や画像の付近にスピーカの画像を表示させても良い。
【００５４】
また、図１０に示すように、リンク先表示部１１３を設け、そこに、音声で読み上げを行うテキストに対応するリンク先を表す文字列を表示するようにしておいても良い。
【００５５】
なお、本実施の形態では、入力部１１１において、マウスでリンク先をクリックした場合は、リンク先の文字列をハイパーテキスト管理部１１０に出力し、カーソルをリンク先に移動した場合は、リンク先の文字列を読み上げテキスト生成部１０２に出力するようにしているが、それらの出力先は逆にしても良いし、両方に出力するようにしても良い。
（実施の形態７）本実施の形態に係る音声合成装置は、上述の実施の形態３〜６に係る音声合成装置を組み合わせた構成、すなわち、実施の形態１に係る音声合成装置の各構成に加えて、テキスト要約部１０７とテキスト翻訳部１０８と読み上げテキスト抽出部１０９とハイパーテキスト管理部１１０と入力部１１１と表示部１１２とリンク先表示部とを備える構成の音声合成装置(図示せず)である。本実施の形態では、この音声合成装置が搭載されたパソコンにおいて、ＷＷＷブラウザが起動してスタートページが表示される場合の読み上げテキスト生成部１０２における処理について説明する。
【００５６】
図１１は、読み上げテキスト生成部１０２における処理の一例を示すフローチャートである。図１１に従って読み上げテキスト生成部１０２の処理を説明すると以下の通りである。
【００５７】
ＷＷＷブラウザが起動すると、スタートページに対応するハイパーテキストがハイパーテキスト管理部１１０に入力されて表示部１１２でスタートページが表示されると共に、ハイパーテキスト解釈部１０１でハイパーテキスト中の属性情報が解析されてリンク先情報が読み上げテキスト生成部１０２に入力される。
【００５８】
リンク先情報は文字列を要素とする配列Ａで表され、要素数をＮとすると、Ａ［１］、Ａ［２］、…、Ａ［Ｎ］と表される。リンク先情報が読み上げテキスト生成部１０２に入力されると、ループの制御変数ｉに１が代入される(Ｓ２０１)。次に、Ａ［ｉ］に対応するリンク先情報から読み上げテキストを生成し、リンク先を表す文字列と対応させて記憶させる(Ｓ２０２)。Ｓ２０２における詳細な処理については後述する。
【００５９】
さらに次に、制御変数ｉに１を加え(Ｓ２０３)、ｉ＜Ｎの条件を満足するかどうかを判定する(Ｓ２０４)。条件を満足する場合は、Ｓ２０２に戻り、リンク先情報の文字列の最後(Ａ［Ｎ］)まで処理を繰り返す。
【００６０】
Ｓ２０４において、リンク先情報の最後の文字まで処理が終了すると、次に入力部１１１からのデータを受け付ける。入力部１１１からリンク先を指定する文字列が入力されると、その指定されたリンク先に対応する読み上げテキストを表す文字列を言語処理部１０３に出力し、リンク先を表す文字列をリンク先表示部１１３に表示させる。また、入力部１１１から表示ページの切り替え指示が入力されると、リンク先表示部１１３に表示されている文字列が読み上げテキスト生成部１０２からハイパーテキスト管理部１１０に出力され、リンク先のページが表示部１１２に表示される。また、Ｓ２０５において、入力部１１１からの入力がなければ、入力があるまで待つ。
【００６１】
図１２は、上述のＳ２０２における処理の詳細なフローチャートである。
【００６２】
リンク先情報を表す文字列Ａ［ｉ］で指定されるリンク先から、リンク情報を取得しバッファに記憶しておく。なお、リンク情報は、周知のＨＴＴＰプロトコル(ハイパーテキスト転送プロトコル)などを利用して取得することができる。
【００６３】
次に、取得したリンク情報に、例えば＜ｓｐｅａｋ＞などで示されるテキスト読み上げ用の属性情報があるかどうかについて判断され(Ｓ３０２)、テキスト読み上げ用の属性情報がある場合は、Ｓ３０６に進み、テキスト読み上げ用の情報を抽出し、その情報を読み上げテキスト情報としてリンク先を表す文字列と対応させて記憶する。
【００６４】
Ｓ３０２においてテキスト読み上げ用の属性情報がない場合は、Ｓ３０３に進みリンク情報の翻訳が必要であるかどうかについて判断する。リンク情報のハイパーテキストのヘッダ部分に記述されている言語とユーザの使用言語とが比較され、異なっていれば翻訳が必要と判断して、リンク情報を翻訳する(Ｓ３０４)。一致していれば翻訳は不要と判断して、Ｓ３０５に進む。
【００６５】
Ｓ３０５では、リンク情報を要約して、その情報を読み上げテキスト情報として、リンク先を表す文字列と対応させて記憶する。
【００６６】
なお、上述の実施の形態１〜実施の形態７における音声合成装置の各構成を任意に組み合わせた音声合成装置を提供することも可能である。このように各構成を任意に組み合わせた場合でも、それぞれの実施の形態で説明した同様の作用効果を奏することは明らかである。
【００６７】
また、上述の各実施の形態における処理の一部又は全部をコンピュータによる処理に適した命令の順番付けられた列からなるもの(プログラム)として提供することも可能である。また、そのプログラムのインストール、実行、プログラムの流通のために、そのプログラムを記録したコンピュータ読取可能な記録媒体として提供することも可能である。
【００６８】
【発明の効果】
本発明では、ハイパーテキストを読み上げる音声合成装置において、ハイパーリンクによりネットワーク状に繋がれているリンク先のハイパーテキストから抽出および生成された読み上げテキストを、本文の読み上げテキストを表示した状態で、読み上げ可能な音声合成装置を提供することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態１に係る音声合成装置の機能ブロック図である。
【図２】本発明の実施の形態１に係る音声合成装置の入力となるハイパーテキストと、そのハイパーテキストから生成されるテキスト情報の概念図である。
【図３】本発明の実施の形態２に係る音声合成装置の機能ブロック図である。
【図４】本発明の実施の形態２に係る音声合成装置のテキスト管理部の処理を示すフローチャートである。
【図５】本発明の実施の形態３に係る音声合成装置の機能ブロック図である。
【図６】本発明の実施の形態４に係る音声合成装置の機能ブロック図である。
【図７】本発明の実施の形態５に係る音声合成装置の機能ブロック図である。
【図８】本発明の実施の形態５に係る音声合成装置の読み上げテキスト抽出部の動作を示す概念図である。
【図９】本発明の実施の形態６に係る音声合成装置の機能ブロック図である。
【図１０】本発明の実施の形態６に係る音声合成装置の機能ブロック図である。
【図１１】本発明の実施の形態７に係る音声合成装置の読み上げテキスト生成部の処理を示すフローチャートである。
【図１２】本発明の実施の形態７に係る音声合成装置の読み上げテキスト生成部の処理を示すフローチャートである。
【符号の説明】
１００音声合成装置
１０１ハイパーテキスト解釈部
１０２読み上げテキスト生成部
１０３言語処理部
１０４音声合成部
１０５音声出力部
１０６読み上げテキスト管理部
１０７テキスト要約部
１０８テキスト翻訳部
１０９読み上げテキスト抽出部
１１０ハイパーテキスト管理部
１１１入力部
１１２表示部
１１３リンク先表示部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a speech synthesizer, a speech synthesis method, and a computer-readable recording medium on which a speech synthesis program is recorded. In particular, the present invention relates to a speech synthesizer that reads out hypertext, a speech synthesis method, and a computer-readable recording medium that records a speech synthesis program.
[0002]
[Prior art]
Conventionally, text-to-speech synthesis techniques for reading aloud text written in mixed kanji and kana have been known (Furui, "Digital Speech Processing", Tokai University Press).
[0003]
On the other hand, in recent years, with the development of the Internet, a hypertext reading device has been proposed that reads aloud hypertext written in HTML (HyperText Markup Language), which is defined by ISO/IEC 15445 as a format for describing information on the WWW (World Wide Web) (JP-A-8-83089). This hypertext reading device detects hyperlink portions in the text and outputs the reading voice for those portions differently from the reading voice for ordinary portions, so that the user can perceive by voice which portions have hyperlinks set.
[0004]
[Problems to be solved by the invention]
However, although the above hypertext reading device can make the user perceive the presence of a hyperlink in the text from the difference in the reading voice, it does not read out the content of the link destination, and so it could not inform the user what kind of information exists at that link destination.
[0005]
The main object of the present invention is to provide a speech synthesizer for reading out hypertext that can read out the information at link destinations connected in a network by hyperlinks.
[0006]
[Means for Solving the Problems]
The present invention provides a speech synthesizer comprising: display means for displaying information indicated by hypertext; link destination information extracting means for analyzing attribute information indicating a link destination in input hypertext and, according to the analysis result, extracting from the input hypertext link destination information indicating a link destination that represents a connection destination of hypertext on the Internet; link destination read-out text extracting means for acquiring the hypertext on the Internet corresponding to the link destination information from the link destination on the Internet indicated by the link destination information, and then extracting link destination read-out hypertext from that hypertext based on markup information that is included in that hypertext and indicates text to be read out; link destination read-out text generating means for generating link destination read-out text based on the extracted link destination read-out hypertext; and speech synthesis means for synthesizing speech from the link destination read-out text extracted and generated from the hypertext acquired from the Internet, while maintaining the display processing by the display means for the input hypertext.
[0007]
According to the present invention, information on a link destination in the input hypertext is acquired, the hypertext on the Internet corresponding to this link destination information is acquired from the link destination on the Internet indicated by the link destination information, and link destination read-out text can then be generated from that hypertext based on the markup information. The read-out text extracted and generated from the hypertext at the link destination in the input hypertext can therefore be read aloud. Because the display processing by the display means for the input hypertext is maintained during this, the troublesome task of returning to the display of the original main body after displaying the link destination can be avoided. Furthermore, if markup information designating the text to be read out has been inserted in the link destination hypertext in advance, only the designated portion of the text can be read out, so that superfluous information need not be read out.
[0008]
The present invention also provides a speech synthesizer further comprising: main-body read-out text generating means for generating read-out text of the main body based on the input hypertext; and switching speech synthesis means for switching between the generated main-body read-out text and the generated link destination read-out text and synthesizing speech.
[0009]
According to the present invention, the output can be switched between the main-body read-out text and the link destination read-out text. Therefore, of the information indicated by the input hypertext, the read-out text information on the main body and the read-out text information on the link destinations can be synthesized into speech in the order the user desires.
[0010]
The present invention also provides a speech synthesizer further comprising input means for accepting input of a desired link destination, wherein, when the input means accepts input of a link destination, the switching speech synthesis means synthesizes speech from the read-out text of that link destination while the display means displays the main-body read-out text.
[0011]
According to the present invention, when the user inputs a desired link destination, the information at that link destination is synthesized into speech while the currently displayed main-body information continues to be displayed as it is. It is therefore possible to avoid the troublesome task of returning to the display of the original main body after displaying the link destination.
[0012]
The present invention also provides a speech synthesizer wherein the input of the link destination by the input means is performed by moving a pointer, including a cursor, onto display information, including characters or an image, that represents the link destination in the displayed main-body read-out text.
[0013]
The present invention also provides a speech synthesizer further comprising selection means for selecting, when speech synthesis of the link destination read-out text is requested, whether or not to display information corresponding to the link destination read-out text.
[0014]
According to the present invention, while looking at the currently displayed Web page, the user can obtain the information at a link destination by speech synthesis without switching the display to another Web page, and can switch to the display corresponding to the information currently being synthesized whenever the user wishes.
[0015]
The present invention also provides a speech synthesis method comprising the steps of: displaying information indicated by input hypertext; analyzing attribute information indicating a link destination in the input hypertext and, according to the analysis result, extracting from the input hypertext link destination information indicating a link destination that represents a connection destination of hypertext on the Internet; acquiring the hypertext on the Internet corresponding to the link destination information from the link destination on the Internet indicated by the link destination information, and then extracting link destination read-out hypertext from that hypertext based on markup information that is included in that hypertext and indicates text to be read out; generating link destination read-out text based on the extracted link destination read-out hypertext; and synthesizing speech from the link destination read-out text extracted and generated from the hypertext acquired from the Internet, while maintaining the display processing for the input hypertext.
[0016]
According to the present invention, information on a link destination in the input hypertext is acquired, the hypertext on the Internet corresponding to this link destination information is acquired from the link destination on the Internet indicated by the link destination information, and link destination read-out text can then be generated from that hypertext based on the markup information. The read-out text extracted and generated from the hypertext at the link destination in the input hypertext can therefore be read aloud. Because the display processing for the input hypertext is maintained during this, the troublesome task of returning to the display of the original main body after displaying the link destination can be avoided. Furthermore, if markup information designating the text to be read out has been inserted in the link destination hypertext in advance, only the designated portion of the text can be read out, so that superfluous information need not be read out.
[0017]
The present invention also provides a computer-readable recording medium on which a speech synthesis program for causing a computer to execute each step of the speech synthesis method is recorded.
[0018]
According to the present invention, link destination read-out text can be generated from the hypertext acquired from the link destination on the Internet indicated by the link destination information in the input hypertext, so that the read-out text extracted and generated from the hypertext at the link destination in the input hypertext can be read aloud.
[0019]
Further, the present invention desirably has link destination read-out text generating means for generating read-out text of the link destination text, and generates the link destination read-out text by processing part or all of the acquired hypertext.
[0020]
According to the above configuration, part or all of the information in the acquired hypertext is processed using a text summarization technique, so that text summarizing the link destination information can be generated, and appropriate information can be synthesized into speech even when the amount of information at the link destination is large.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
FIG. 1 is a functional block diagram showing an embodiment of a speech synthesizer according to the present invention. The speech synthesizer 100 includes a hypertext interpretation unit 101, a read-out text generation unit 102, a language processing unit 103, a speech synthesis unit 104, and a speech output unit 105.
[0022]
In the present embodiment, a case is described in which the speech synthesizer 100 is built into a general-purpose personal computer and synthesizes and outputs speech from hypertext received via a network. However, the present invention is not limited to this and is applicable to any device that performs speech synthesis on input hypertext.
[0023]
As shown in FIG. 1, hypertext is input to the speech synthesizer 100. This hypertext is described in a markup language such as HTML which is widely known as a language for describing a document on a WWW server. Further, in this hypertext, a URL (Uniform Resource Locator) that is information for indicating the location of a resource on the Internet, a sentence, and the like are embedded. In general, making it possible to connect to various servers using a browser on the Internet is called a link, and in particular, designating a connection destination using hypertext is called a hyperlink. In the following description, hyperlinks are simply called links unless otherwise specified. A connection destination specified by a URL embedded in hypertext is called a link destination.
[0024]
The hypertext interpretation unit 101 analyzes attribute information described in the input hypertext. As a result of the analysis, information indicating the content of the main body of the input hypertext (hereinafter referred to as the main-body read-out text) is output to the language processing unit 103, while the URLs indicating link destinations (hereinafter referred to as link destination information) are represented as an array whose elements are character strings and are output to the read-out text generation unit 102.
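The split performed by the hypertext interpretation unit 101 can be sketched roughly as follows. This is a minimal illustration using Python's standard `html.parser`, not the patented implementation; the class name `HypertextInterpreter`, the function `interpret`, and the sample hypertext are assumptions made for this example (FIG. 2(a) itself is not reproduced in the text).

```python
from html.parser import HTMLParser


class HypertextInterpreter(HTMLParser):
    """Hypothetical stand-in for unit 101: separates body text from link URLs."""

    def __init__(self):
        super().__init__()
        self.body_text = []      # main-body read-out text, one string per text run
        self.link_targets = []   # link destination information (array of strings)
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.link_targets.append(href)
                self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        text = data.strip()
        # Text inside <a>...</a> belongs to the link, not to the main body.
        if text and not self._in_link:
            self.body_text.append(text)


def interpret(hypertext):
    """Return (main-body strings, link destination strings)."""
    parser = HypertextInterpreter()
    parser.feed(hypertext)
    parser.close()
    return parser.body_text, parser.link_targets


# Assumed sample input, loosely modelled on the weather example of FIG. 2.
SAMPLE = (
    "<html><body><p>Today's weather</p>"
    '<a href="hokkaido.html">Hokkaido</a>'
    '<a href="touhoku.html">Tohoku</a>'
    "</body></html>"
)
body, links = interpret(SAMPLE)
```

Here `body` would go to the language processing stage and `links` to the read-out text generation stage.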
[0025]
The read-out text generation unit 102 connects via the network to the link destination indicated by the link destination information output from the hypertext interpretation unit 101, acquires the hypertext representing the information recorded at the link destination (hereinafter referred to as link information), generates link destination read-out text based on the attribute information in that hypertext, and outputs it to the language processing unit 103.
[0026]
The language processing unit 103 receives as input the main-body read-out text output by the hypertext interpretation unit 101 and the link destination read-out text output by the read-out text generation unit 102, analyzes the text information they represent, and outputs to the speech synthesis unit 104 a character string representing the text information to be read out, together with arrays storing the pitch and duration values corresponding to that character string.
[0027]
The speech synthesis unit 104 generates, from the information output by the language processing unit 103, a time-series array of numerical values representing the speech waveform, and outputs it to the speech output unit 105.
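To make the data flowing between units 103, 104 and 105 concrete, the sketch below turns per-unit pitch and duration arrays into a time-series array of samples. It is only a placeholder tone generator (one sine segment per unit), not a speech synthesis method described in the patent; the sample rate, the function name `synthesize`, and the pitch and duration values are all assumptions.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed value


def synthesize(pitches_hz, durations_s, sample_rate=SAMPLE_RATE):
    """Return a list of samples in [-1, 1], one sine segment per (pitch, duration)."""
    samples = []
    for pitch, duration in zip(pitches_hz, durations_s):
        count = round(duration * sample_rate)
        for n in range(count):
            # Placeholder waveform: a pure tone at the requested pitch.
            samples.append(math.sin(2.0 * math.pi * pitch * n / sample_rate))
    return samples


# Hypothetical output of the language processing stage for a three-unit string.
text = "abc"
pitches = [120.0, 140.0, 110.0]   # Hz, assumed
durations = [0.10, 0.20, 0.10]    # seconds, assumed
wave = synthesize(pitches, durations)
```

A real synthesizer would replace the sine segment with its own waveform generation; the point here is only the shape of the interface (string plus parallel arrays in, sample array out).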
[0028]
The speech output unit 105 has an audio output device such as a speaker, and D/A-converts and outputs the synthesized waveform output from the speech synthesis unit 104.
[0029]
FIG. 2 is a conceptual diagram for explaining a specific example in which hypertext is input to the speech synthesizer 100 according to the present embodiment and the hypertext interpretation unit 101 and the read-out text generation unit 102 generate the text information to be read out from that hypertext.
[0030]
FIG. 2(a) shows an example of input hypertext. In FIG. 2(a), the span from <a href=...> to </a> is the description of a link, and <a href=...> and </a> are the attribute information indicating the link. Further, "hokkaido.html" and "touhoku.html" are the link destination information. In this example, the hypertext is shown to be linked to weather information for Hokkaido and weather information for Tohoku.
[0031]
FIG. 2(b) is an example of the link destination read-out text generated by the read-out text generation unit 102. This link destination read-out text is information stored on the server of the link destination (hokkaido.html). The hypertext interpretation unit 101 extracts the link destination information (hokkaido.html) from the attribute information indicating the link (<a href=...> and </a>), and the read-out text generation unit 102 connects to that link destination and acquires the hypertext corresponding to the link destination information. It then generates the link destination read-out text based on the attribute information in the acquired hypertext. In this case, information corresponding to all the attribute information in the hypertext may be generated as the link destination read-out text, or only information corresponding to attribute information other than the attribute information indicating links may be generated as the link destination read-out text.
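The two variants described for the link destination (read everything, or skip text that sits inside links) might look like the following sketch. The network is replaced by an injected `fetch` callable so the example runs offline; the URLs, the page content in `FAKE_WEB`, and the names `generate_link_readout` and `_TextCollector` are hypothetical. Resolving a relative target such as `hokkaido.html` against the page's own URL is an assumption about how a real client would behave, done here with the standard `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _TextCollector(HTMLParser):
    """Collects text runs, optionally skipping text inside <a>...</a>."""

    def __init__(self, include_link_text):
        super().__init__()
        self.include_link_text = include_link_text
        self.parts = []
        self._link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and (self.include_link_text or self._link_depth == 0):
            self.parts.append(text)


def generate_link_readout(base_url, link_target, fetch, include_link_text=True):
    """Fetch the linked hypertext and reduce it to plain read-out text."""
    url = urljoin(base_url, link_target)   # resolve the relative link target
    collector = _TextCollector(include_link_text)
    collector.feed(fetch(url))
    collector.close()
    return " ".join(collector.parts)


# Stand-in for the network: maps URLs to hypertext (hypothetical content).
FAKE_WEB = {
    "http://example.com/weather/hokkaido.html":
        "<html><body><p>Hokkaido: snow.</p>"
        '<a href="index.html">Back</a></body></html>',
}
BASE = "http://example.com/weather/index.html"

full = generate_link_readout(BASE, "hokkaido.html", FAKE_WEB.__getitem__)
without_links = generate_link_readout(
    BASE, "hokkaido.html", FAKE_WEB.__getitem__, include_link_text=False)
```

In a real client, `fetch` would perform an HTTP GET; keeping it as a parameter also makes the generation step easy to test.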
[0032]
FIG. 2(c) is an example of the main-body read-out text generated by the hypertext interpretation unit 101. This main-body read-out text is the information indicated by attribute information other than the attribute information indicating links in the hypertext.
(Embodiment 2) FIG. 3 is a functional block diagram showing another embodiment of the speech synthesizer according to the present invention. The same components as those in FIG. 1 are denoted by the same reference numerals, and the same description is omitted.
[0033]
The read-out text management unit 106 manages the text information input from the hypertext interpretation unit 101 and the read-out text generation unit 102 so that it is output to the language processing unit 103 in an appropriate order.
[0034]
FIG. 4 is a flowchart showing an example of processing in the read-out text management unit 106. The processing of the read-out text management unit 106 is described below with reference to FIG. 4.
[0035]
When text information is input from the hypertext interpretation unit 101 and the read-out text generation unit 102 to the read-out text management unit 106, the character strings constituting the text information are stored separately (S101). That is, the character strings of the main-body read-out text input from the hypertext interpretation unit 101 are stored in array H, and the character strings of the link destination read-out text input from the read-out text generation unit 102 are stored in array Y. If the number of character strings input from the hypertext interpretation unit 101 is N and the number input from the read-out text generation unit 102 is M, arrays H and Y store H[1], H[2], ..., H[N] and Y[1], Y[2], ..., Y[M], respectively.
[0036]
Next, 1 is assigned to the loop control variable i (S102), and H[i] is output to the language processing unit 103 (S103). Next, 1 is added to the control variable i (S104), and it is determined whether or not the condition i < N is satisfied (S105). If the condition is satisfied, the process returns to S103, and the process is repeated until all the character strings input from the hypertext interpretation unit 101 have been output. When the output is completed, the control variable i is reinitialized to 1 (S106), and Y[i] is output to the language processing unit 103 (S107). Next, 1 is added to the control variable i (S108), and it is determined whether or not the condition i < M is satisfied (S109). If the condition is satisfied, the process returns to S107, and the process is repeated until all the character strings input from the read-out text generation unit 102 have been output.
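The flow of S102 to S109 can be sketched as two index loops. The function name `output_in_order` and the `emit` callback (standing in for the language processing unit 103) are assumptions for illustration. Note one deliberate choice: the sketch tests `i <= N` and `i <= M` so that every stored string is output, which is the stated intent; read literally, a test of i < N made after the increment would appear to stop one string short.

```python
def output_in_order(H, Y, emit):
    """Emit all main-body strings, then all link destination strings."""
    N, M = len(H), len(Y)

    i = 1                    # S102
    while i <= N:            # plays the role of the S105 test (see note above)
        emit(H[i - 1])       # S103; Python lists are 0-indexed
        i += 1               # S104

    i = 1                    # S106
    while i <= M:            # plays the role of the S109 test
        emit(Y[i - 1])       # S107
        i += 1               # S108


# Hypothetical strings; `sent` records what would reach unit 103.
sent = []
output_in_order(["Today's weather"], ["Hokkaido: snow.", "Tohoku: rain."], sent.append)
```

Unlike the flowchart, which outputs once before testing, the `while` form also copes with an empty array.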
[0037]
Through the above processing, the main-body read-out text is synthesized into speech first, followed by the link destination read-out text.
[0038]
In this embodiment, the text information is managed so that the main-body read-out text is output first and the link destination read-out text is output next, but they may be output in the reverse order or alternately. That is, the read-out text management unit 106 may be configured so that output can be switched between the main-body read-out text and the link destination read-out text, or so that output of the main-body read-out text or the link destination read-out text can be restricted.
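The alternative orderings and restrictions mentioned here could be expressed as a single selectable mode. The mode names and the function `order_readout` below are invented for this sketch; the patent does not specify how the choice is represented.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel so that empty strings are not dropped by mistake


def order_readout(body, links, mode="body_first"):
    """Return the strings in the order they would be sent for synthesis."""
    if mode == "body_first":
        return list(chain(body, links))
    if mode == "links_first":
        return list(chain(links, body))
    if mode == "alternate":
        pairs = zip_longest(body, links, fillvalue=_MISSING)
        return [t for pair in pairs for t in pair if t is not _MISSING]
    if mode == "body_only":      # link destination output restricted
        return list(body)
    if mode == "links_only":     # main-body output restricted
        return list(links)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical inputs.
alternating = order_readout(["b1", "b2", "b3"], ["l1"], mode="alternate")
reversed_order = order_readout(["b1"], ["l1", "l2"], mode="links_first")
```

With arrays of unequal length, the alternating mode simply continues with whichever array still has strings left.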
(Embodiment 3) FIG. 5 is a functional block diagram showing another embodiment of the speech synthesizer according to the present invention. The same components as those in FIG. 1 are denoted by the same reference numerals, and the same description is omitted.
[0039]
The text summarization unit 107 receives the link-destination read-out text from the read-out text generation unit 102, summarizes the text information, and returns the summary to the read-out text generation unit 102. The text information may be summarized using an existing text summarization technique.
[0040]
With such a configuration, text that summarizes the link destination information set in the input hypertext can be generated, so that even when the amount of link destination information is large, an appropriate amount of information can be synthesized into speech.
(Embodiment 4) FIG. 6 is a functional block diagram showing another embodiment of the speech synthesizer according to the present invention. The same components as those in FIG. 1 are denoted by the same reference numerals, and the same description is omitted.
[0041]
The text translation unit 108 receives the link-destination read-out text from the read-out text generation unit 102, translates the text information into the user's preset language, and returns the translation to the read-out text generation unit 102. The text information may be translated using an existing automatic natural-language translation technique (supervised by Makoto Nagao, “Japanese Information Processing”, The Institute of Electronics and Communication Engineers).
[0042]
With this configuration, the link destination information set in the input hypertext can be translated into the language that the user has set in advance, so that speech can be synthesized in a language the user understands.
(Embodiment 5) FIG. 7 is a functional block diagram showing another embodiment of the speech synthesizer according to the present invention. The same components as those in FIG. 1 are denoted by the same reference numerals, and the same description is omitted.
[0043]
The read-out text extraction unit 109 receives link information from the read-out text generation unit 102 and, when the link information includes attribute information for text reading, extracts that part and returns it to the read-out text generation unit 102 as read-out information.
[0044]
The attribute information for text-to-speech will be described with reference to FIG.
[0045]
FIG. 8 is a diagram for explaining an example of extracting a part from hypertext in which attribute information for text reading is set. In FIG. 8A, <speak> is the attribute information for text reading, and the text information enclosed between <speak> (the text reading start tag) and </speak> (the text reading end tag) is extracted as read-out information, as shown in FIG. 8B.
[0046]
In this way, when information to be read out is designated in advance in the link information, only that portion can be read out, and unnecessary information need not be read out.
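The extraction described for FIG. 8 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `extract_speak_text` and the sample page are hypothetical, and it assumes the `<speak>` tag is written without attributes and is not nested. Tags that may appear inside the enclosed text are left as-is.

```python
import re
from typing import List

# Assumption: the start tag is literally <speak> and the end tag </speak>.
SPEAK_PATTERN = re.compile(r"<speak>(.*?)</speak>", re.IGNORECASE | re.DOTALL)


def extract_speak_text(hypertext: str) -> List[str]:
    """Return the text of every <speak>...</speak> section, in document order."""
    return [part.strip() for part in SPEAK_PATTERN.findall(hypertext)]


# Hypothetical link-destination page with one section marked for reading.
sample = (
    "<html><body><h1>Shop</h1>"
    "<speak>Open daily from 9 to 5.</speak>"
    "<p>Map</p></body></html>"
)
readout_parts = extract_speak_text(sample)
```

A regular expression is enough for this flat case; a page with nested or attribute-bearing tags would call for a real HTML parser instead.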
(Embodiment 6) FIG. 9 is a functional block diagram showing another embodiment of the speech synthesizer according to the present invention. The same components as those in FIG. 1 are denoted by the same reference numerals, and the same description is omitted.
[0047]
The display unit 112 includes a display device such as a liquid crystal display, and displays information indicated by hypertext. When displaying, software for displaying hypertext such as an existing WWW browser is used.
[0048]
The input unit 111 receives input of a link destination desired by the user. The character string indicating the link destination is output to the read-out text generation unit 102 and the hypertext management unit 110. The input unit 111 includes input devices such as a keyboard and a mouse. When a URL is typed with the keyboard on the WWW browser screen displayed on the display unit 112, or a character or image representing a link destination is clicked with the mouse, the corresponding character string is output to the read-out text generation unit 102 and the hypertext management unit 110.
[0049]
The hypertext management unit 110 determines and outputs the hypertext underlying the Web page displayed on the display unit 112 via the WWW browser. For example, in an initial state, such as when a personal computer equipped with the speech synthesizer according to the present invention is started, it determines that the start page set in advance in the WWW browser is to be displayed on the display unit 112. Further, when a character string indicating a link destination is input from the input unit 111 or the read-out text generation unit 102, the hypertext representing the link destination information corresponding to that character string is acquired and output to the display unit 112 and the hypertext interpretation unit 101.
[0050]
Further, when the user designates a link destination on the web page currently displayed on the display unit 112, an interface may be provided that lets the user select between two behaviors: keeping the current web page displayed while the information of the designated link destination is read out, or switching the display to the linked web page while the information of that page is read out. In the latter case, for example, an image representing a page-switching button is displayed on the display unit 112. When the button is selected through the input unit 111, the hypertext management unit 110 acquires the hypertext of the link destination corresponding to the text currently being read out and outputs the information indicated by that hypertext to the display unit 112.
[0051]
With such a configuration, link destination information can be obtained by speech synthesis without switching the currently displayed web page to another web page, and, when the user so desires, the display can be switched to the page corresponding to the information currently being synthesized.
[0052]
In addition, when the input unit 111 moves the cursor to a character or image representing a link destination with the mouse, the link destination character string may be output to the read-out text generation unit 102. With such a configuration, it is possible to read out the information of the designated link destination by speech synthesis without changing the currently displayed Web page.
[0053]
Also, the character string corresponding to the link destination currently being read out by speech synthesis may be output from the read-out text generation unit 102 to the display unit 112, and the characters corresponding to that link destination may be displayed differently from other characters, for example by blinking. This notifies the user of which link destination is currently being read out. Besides blinking, the display may be changed by, for example, showing an image of a speaker near the linked character or image.
[0054]
Also, as shown in FIG. 10, a link destination display unit 113 may be provided, and a character string representing a link destination corresponding to a text to be read out by voice may be displayed there.
[0055]
In the present embodiment, when a link destination is clicked with the mouse in the input unit 111, the link destination character string is output to the hypertext management unit 110, and when the cursor is moved onto a link destination, the character string is output to the read-out text generation unit 102. These output destinations may be reversed, or the character string may be output to both.
(Embodiment 7) The speech synthesizer according to the present embodiment combines the configurations of the speech synthesizers according to Embodiments 3 to 6 described above. That is, in addition to the components of the speech synthesizer according to Embodiment 1, it includes a text summarization unit 107, a text translation unit 108, a read-out text extraction unit 109, a hypertext management unit 110, an input unit 111, a display unit 112, and a link destination display unit 113 (configuration not shown). The present embodiment describes the processing in the read-out text generation unit 102 when a WWW browser is started and a start page is displayed on a personal computer equipped with this speech synthesizer.
[0056]
FIG. 11 is a flowchart illustrating an example of processing in the read-out text generation unit 102. The processing of the text-to-speech generation unit 102 will be described with reference to FIG.
[0057]
When the WWW browser is activated, hypertext corresponding to the start page is input to the hypertext management unit 110 and the start page is displayed on the display unit 112, and attribute information in the hypertext is analyzed by the hypertext interpretation unit 101. The link destination information is input to the read-out text generation unit 102.
[0058]
The link destination information is represented by an array A whose elements are character strings. If the number of elements is N, the link destination information is represented as A[1], A[2], ..., A[N]. When the link destination information is input to the read-out text generation unit 102, 1 is assigned to the loop control variable i (S201). Next, read-out text is generated from the link destination information corresponding to A[i] and stored in association with the character string representing the link destination (S202). The detailed processing in S202 will be described later.
[0059]
Next, 1 is added to the control variable i (S203), and it is determined whether the condition i ≤ N is satisfied (S204). If the condition is satisfied, the process returns to S202, and the process is repeated up to the last character string of the link destination information (A[N]).
[0060]
When, in S204, processing has been completed up to the last element of the link destination information, data from the input unit 111 is accepted next. When a character string designating a link destination is input from the input unit 111, the character string of the read-out text corresponding to the designated link destination is output to the language processing unit 103, and the character string representing the link destination is displayed on the link destination display unit 113. When a display page switching instruction is input from the input unit 111, the character string displayed on the link destination display unit 113 is output from the read-out text generation unit 102 to the hypertext management unit 110, and the link destination page is displayed on the display unit 112. In S205, if there is no input from the input unit 111, the process waits until there is an input.
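The input handling after S204 can be sketched in Python as follows. This is an illustrative sketch only: the class `LinkReadoutController`, its method names, and the example URL are assumptions. The `speak` callback stands in for output to the language processing unit 103, `show_page` for output to the hypertext management unit 110, and the `current_link` attribute for the string shown on the link destination display unit 113.

```python
from typing import Callable, Dict, Optional


class LinkReadoutController:
    """Dispatch user input to speech output or to a page switch."""

    def __init__(
        self,
        readout_by_link: Dict[str, str],    # table built by repeating S202
        speak: Callable[[str], None],       # stand-in for unit 103
        show_page: Callable[[str], None],   # stand-in for unit 110
    ) -> None:
        self.readout_by_link = readout_by_link
        self.speak = speak
        self.show_page = show_page
        self.current_link: Optional[str] = None  # stand-in for unit 113

    def on_link_designated(self, link: str) -> None:
        """Read out the stored text for the link; the displayed page is unchanged."""
        text = self.readout_by_link.get(link)
        if text is None:
            return  # assumption: links without stored text are ignored
        self.speak(text)
        self.current_link = link

    def on_switch_page(self) -> None:
        """Switch the display to the link whose text was most recently read out."""
        if self.current_link is not None:
            self.show_page(self.current_link)


# Hypothetical usage with lists recording what would be spoken and shown.
spoken, shown = [], []
controller = LinkReadoutController(
    {"http://example.com/a": "Summary of page A"}, spoken.append, shown.append
)
controller.on_link_designated("http://example.com/a")
controller.on_switch_page()
```

Because the read-out texts are prepared before any input arrives, designating a link only requires a table lookup.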
[0061]
FIG. 12 is a detailed flowchart of the process in S202 described above.
[0062]
Link information is acquired from the link destination specified by the character string A[i] representing the link destination information and is stored in a buffer. The link information can be acquired using the well-known HTTP (Hypertext Transfer Protocol) or the like.
[0063]
Next, it is determined whether or not the acquired link information includes attribute information for text reading, indicated by, for example, <speak> (S302). If it does, the process proceeds to S306, where the read-out information is extracted and stored as read-out text information in association with the character string representing the link destination.
[0064]
If there is no text-to-speech attribute information in S302, the process proceeds to S303 to determine whether or not the link information needs to be translated. The language described in the header portion of the hypertext of the link information is compared with the language used by the user. If they are different, it is determined that translation is necessary, and the link information is translated (S304). If they match, it is determined that translation is unnecessary, and the process proceeds to S305.
[0065]
In S305, the link information is summarized and stored as read-out text information in association with a character string representing the link destination.
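The branching of S302 to S306 can be sketched in Python as follows. This is a sketch under several assumptions, not the method of the embodiment: the function `build_readout_text` and the stub functions are hypothetical, the page language is assumed to be given by a `lang` attribute on the `<html>` tag (the text only says it is described in the header portion), translation is skipped when no language is found, and the hypertext is assumed to have been acquired already. The `translate` and `summarize` callables stand in for the text translation unit 108 and the text summarization unit 107.

```python
import re
from typing import Callable

SPEAK = re.compile(r"<speak>(.*?)</speak>", re.IGNORECASE | re.DOTALL)
# Assumption: the language is declared as <html lang="...">.
LANG = re.compile(r"<html[^>]*\blang=[\"']?([A-Za-z-]+)", re.IGNORECASE)


def build_readout_text(
    link_hypertext: str,
    user_lang: str,
    translate: Callable[[str, str], str],
    summarize: Callable[[str], str],
) -> str:
    """Produce read-out text for one link destination."""
    marked = SPEAK.findall(link_hypertext)
    if marked:                                    # S302 -> S306
        return " ".join(part.strip() for part in marked)
    text = link_hypertext
    match = LANG.search(link_hypertext)
    page_lang = match.group(1).lower() if match else None
    if page_lang is not None and page_lang != user_lang.lower():  # S303
        text = translate(text, user_lang)         # S304
    return summarize(text)                        # S305


# Hypothetical stand-ins for units 108 and 107.
def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_summarize(text: str) -> str:
    return text[:40]


marked_page = '<html lang="en"><speak> Hello </speak></html>'
marked_result = build_readout_text(marked_page, "ja", fake_translate, fake_summarize)
```

Stripping the remaining markup before translation and summarization is omitted here for brevity.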
[0066]
It is also possible to provide a speech synthesizer that arbitrarily combines the configurations of the speech synthesizers in the first to seventh embodiments. Thus, it is obvious that even when the respective configurations are arbitrarily combined, the same operational effects described in the respective embodiments can be obtained.
[0067]
It is also possible to provide part or all of the processing in each of the above-described embodiments as a program consisting of an ordered sequence of instructions suitable for processing by a computer. The program may also be provided on a computer-readable recording medium on which it is recorded, for installation, execution, and distribution of the program.
[0068]
[Effects of the Invention]
According to the present invention, a speech synthesizer that reads out hypertext can read out the read-out text extracted and generated from the hypertext of a link destination connected in a network by a hyperlink, while the input hypertext remains displayed.
[Brief description of the drawings]
FIG. 1 is a functional block diagram of a speech synthesizer according to Embodiment 1 of the present invention.
FIG. 2 is a conceptual diagram of hypertext to be input to the speech synthesizer according to Embodiment 1 of the present invention and text information generated from the hypertext.
FIG. 3 is a functional block diagram of a speech synthesizer according to Embodiment 2 of the present invention.
FIG. 4 is a flowchart showing processing of a text management unit of the speech synthesizer according to Embodiment 2 of the present invention.
FIG. 5 is a functional block diagram of a speech synthesizer according to Embodiment 3 of the present invention.
FIG. 6 is a functional block diagram of a speech synthesizer according to Embodiment 4 of the present invention.
FIG. 7 is a functional block diagram of a speech synthesizer according to Embodiment 5 of the present invention.
FIG. 8 is a conceptual diagram showing the operation of the text-to-speech extraction unit of the speech synthesizer according to Embodiment 5 of the present invention.
FIG. 9 is a functional block diagram of a speech synthesizer according to Embodiment 6 of the present invention.
FIG. 10 is a functional block diagram of a speech synthesizer according to Embodiment 6 of the present invention.
FIG. 11 is a flowchart showing processing of a text-to-speech generation unit of the speech synthesizer according to Embodiment 7 of the present invention.
FIG. 12 is a flowchart showing processing of a text-to-speech generation unit of the speech synthesizer according to Embodiment 7 of the present invention.
[Explanation of symbols]
100 Speech synthesizer
101 Hypertext interpretation unit
102 Read-out text generation unit
103 Language processing unit
104 Speech synthesis unit
105 Audio output unit
106 Read-out text management unit
107 Text summarization unit
108 Text translation unit
109 Read-out text extraction unit
110 Hypertext management unit
111 Input unit
112 Display unit
113 Link destination display unit

Claims

Display means for displaying information indicated by hypertext;
Link destination information extraction means for analyzing attribute information indicating a link destination in input hypertext and, according to the analysis result, extracting from the input hypertext link destination information indicating a link destination that represents a hypertext connection destination on the Internet;
Link-destination read-out text extraction means for, after acquiring the hypertext on the Internet corresponding to the link destination information from the link destination on the Internet indicated by the link destination information, extracting link-destination read-out hypertext from that hypertext based on markup information that is included in the hypertext and indicates that it is read-out text;
Link-destination read-out text generation means for generating link-destination read-out text based on the extracted link-destination read-out hypertext;
Speech synthesis means for synthesizing speech from the link-destination read-out text extracted and generated from the hypertext acquired from the Internet, while maintaining the display processing by the display means for the input hypertext;
A speech synthesizer characterized by comprising:

Read-out text generation means for generating read-out text based on the input hypertext;
Switching speech synthesis means for synthesizing speech while switching between the generated read-out text and the generated link-destination read-out text;
The speech synthesizer according to claim 1, further comprising:

It further has an input means for receiving an input of a desired link destination,
When an input of a link destination is received by the input means, the link-destination read-out text is synthesized into speech by the switching speech synthesis means while the read-out text is displayed by the display means. The speech synthesizer according to claim 2.

The input of the link destination by the input means is performed by moving a pointer, including a cursor, onto display information, including a character or an image, that represents the link destination within the displayed read-out text. The speech synthesizer according to claim 3.

The speech synthesizer according to claim 2 or claim 3, further comprising selection means for selecting whether or not to display information corresponding to the link-destination read-out text when speech synthesis of the link-destination read-out text is requested.

Displaying information indicated by input hypertext;
Analyzing attribute information indicating a link destination in the input hypertext, and extracting link destination information indicating a link destination representing a hypertext connection destination on the Internet from the input hypertext according to the analysis result;
After acquiring the hypertext on the Internet corresponding to the link destination information from the link destination on the Internet indicated by the link destination information, the hypertext is included in the hypertext and based on markup information indicating that it is a read-out text Extracting the hypertext to be read from the hypertext,
Generating link-to-speech text based on the extracted link-to-speech hypertext,
Maintaining the display processing for the input hypertext while synthesizing the read-out text of the link destination extracted and generated from the hypertext obtained from the Internet;
A speech synthesis method characterized by comprising:

A computer-readable recording medium recording a voice synthesis program for causing a computer to execute each step of the voice synthesis method according to claim 6.