JP2001282268A

JP2001282268A - Speech data delivery system

Info

Publication number: JP2001282268A
Application number: JP2000092788A
Authority: JP
Inventors: Eiji Mitsuya; 英司三ツ矢; Daisuke Yoshida; 大介吉田; Patrick Dabin; ダビン・パトリック; Yuji Mogi; 祐治茂木
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2000-03-30
Filing date: 2000-03-30
Publication date: 2001-10-12

Abstract

PROBLEM TO BE SOLVED: To let homepage visitors and telephone users listen to information such as weather forecasts as speech, without requiring each Web server or telephone server to install its own speech synthesizer. SOLUTION: A speech synthesis server 12 receives text data such as weather forecasts from an information source 14. It generates synthesized voice data from that text using, for example, CHATR, and delivers the data to Web servers 16 and telephone servers 18, which are the delivery destinations. Each server can therefore offer diversified services without installing a separate speech synthesizer and without increasing running costs.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a voice data distribution system. In particular, it relates to distributing information as voice data to a WWW server, a telephone server, or the like. Examples of such information are weather forecasts, stock price information, road information, race results (horse races, bicycle races, and so on), and professional baseball bulletins.

[0002]

2. Description of the Related Art
Conventionally, data or information such as that described above can be obtained by accessing a web server (homepage) via the Internet.

[0003]

[Problems to be Solved by the Invention] In all conventional systems, however, such data and information are provided as text. The user must read them, so some inconvenience remains. This could be solved by providing the data and information by voice from the homepage.

[0004] To transmit voice data from a homepage, however, the web server must be equipped with a voice conversion function. This increases the financial burden on the homepage operator.

[0005] Therefore, a main object of the present invention is to provide a novel voice data distribution system that distributes voice data to web servers and the like.

[0006]

[Means for Solving the Problems] A voice data distribution system according to the present invention comprises three elements:
- receiving means for receiving text data from an information source;
- voice synthesizing means for generating synthesized voice data based on the text data;
- distribution means for distributing the synthesized voice data to a distribution destination.

[0007] Preferably, the voice synthesizing means performs speech synthesis using CHATR, a natural-utterance speech waveform concatenation method. For this purpose, the voice synthesizing means includes a speech waveform database and a speech unit selection unit. The selection unit reads speech waveform signals from the database according to index information corresponding to the phoneme string of the text data.

[0008] The synthesized voice data generated in this way is preferably delivered to the storage system of a computer on a network, such as a web server or a telephone server.

[0009]

[Operation] Text data is received from an information source and stored in the text data area of a memory. Examples are weather forecasts, stock price information, road information, race results (horse races, bicycle races, and so on), and professional baseball bulletins. As instructed by a scheduler, the voice synthesizing means reads the text data from the text data area. It uses CHATR to generate the corresponding synthesized voice data and stores the result in the voice data area of the memory. Also as instructed by the scheduler, the distribution means reads the voice data from the voice data area. It then transmits the data to the distribution destination designated by the scheduler, such as a web server.
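The scheduler-driven flow described above (receive into a text area, synthesize into a voice area, deliver on command) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The names `Memory`, `receive`, `run_schedule` and the `(action, item_id, destination)` job format are hypothetical.
- The `synthesize` and `send` callables stand in for CHATR and the real transport.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical job format: (action, item_id, destination); destination is None for "synthesize".
Job = Tuple[str, str, Optional[str]]


@dataclass
class Memory:
    """Stand-in for the memory: a text data area and a voice data area keyed by item id."""
    text_area: Dict[str, str] = field(default_factory=dict)
    voice_area: Dict[str, bytes] = field(default_factory=dict)


def receive(memory: Memory, item_id: str, text: str) -> None:
    """Receiving means: store text arriving from an information source."""
    memory.text_area[item_id] = text


def run_schedule(
    memory: Memory,
    jobs: Iterable[Job],
    synthesize: Callable[[str], bytes],
    send: Callable[[str, bytes], None],
) -> None:
    """Execute scheduler jobs in order: synthesize stored text, or deliver stored voice data."""
    for action, item_id, destination in jobs:
        if action == "synthesize":
            memory.voice_area[item_id] = synthesize(memory.text_area[item_id])
        elif action == "deliver":
            if destination is None:
                raise ValueError("a deliver job needs a destination")
            send(destination, memory.voice_area[item_id])
        else:
            raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    mem = Memory()
    receive(mem, "weather", "Sunny, high of 25 degrees")
    sent = []
    run_schedule(
        mem,
        [("synthesize", "weather", None), ("deliver", "weather", "web_server_16")],
        synthesize=lambda text: text.encode("utf-8"),  # placeholder for a real synthesizer
        send=lambda dest, audio: sent.append((dest, audio)),
    )
    print(sent)
```

The job list keeps the ordering decision in one place, the scheduler, while the storage areas decouple reception, synthesis, and delivery in time.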

[0010] Voice synthesizing means using CHATR synthesizes voice data from the actual speech waveforms of an arbitrary person, for example a celebrity or an announcer. Voice data can therefore be distributed in whichever human voice the distribution destination desires.

[0011]

[Effects of the Invention] According to the present invention, a homepage operator or the like can output information such as weather forecasts as voice data from its homepage simply by receiving that voice data. This is very convenient for users. Moreover, distribution destinations such as homepage operators need not install their own speech synthesizers, so they can provide diverse services at low operating cost.

[0012] In addition, using CHATR makes it possible to distribute extremely natural voice data synthesized in an arbitrary person's voice.

[0013] The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments with reference to the drawings.

[0014]

[Embodiment] Referring to FIG. 1, the voice data distribution system 10 of this embodiment includes a speech synthesis server 12. The server receives text data from the following sources (hereinafter sometimes collectively referred to as the "information source 14"):
- weather forecasts from a weather forecast source 141;
- stock price information from a stock price information source 142;
- road information from a road information source 143;
- race results from a race result source 144;
- professional baseball bulletins from a professional baseball bulletin source 145.

These information sources 14 are typically information provider companies. Where possible, they may instead be public bodies such as the Japan Meteorological Agency or the Japan Highway Public Corporation.

[0015] In this embodiment, the speech synthesis server 12 performs speech synthesis using CHATR, although other speech synthesis methods are not excluded. The speech synthesis server 12 converts the text data received from the information source 14 into voice data. It then supplies or distributes that data to the WWW (web) server 16 and the telephone server 18, which are referred to as "distribution destinations".

[0016] As one example of the distribution method, in this embodiment the speech synthesis server 12 writes directly into a predetermined storage area of the servers 16 and 18. This area lies in each server's storage system (not shown), which includes, for example, a hard disk. The storage system of each server 16 and 18 has a fixed area reserved for the synthesized voice data.
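Writing directly into a fixed area raises one practical point: the destination server may read a file while it is still being written. The sketch below shows one common way to avoid this, writing to a temporary file in the same directory and then renaming it. Assumptions:
- The destination's storage is reachable as a local path `storage_root` (for example a mounted share).
- The area names in `FIXED_AREAS` and the function name are hypothetical.
- The rename technique is a general practice, not something the patent specifies.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical fixed areas, relative to each destination's storage root.
FIXED_AREAS: Dict[str, str] = {
    "web_server_16": "voice_data",
    "telephone_server_18": "voice_data",
}


def write_to_fixed_area(storage_root: str, destination: str, name: str, audio: bytes) -> Path:
    """Write synthesized voice data into the destination's predetermined storage area.

    The data goes to a temporary file in the same directory first and is then
    renamed, so a reader on the destination never sees a half-written file.
    """
    area = Path(storage_root) / FIXED_AREAS[destination]
    area.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=area, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio)
        final_path = area / name
        os.replace(tmp_path, final_path)  # atomic on POSIX when on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```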

[0017] Each web server 16 operates a homepage. When a user accesses the homepage via the computer network 20, the user can listen to the synthesized voice data distributed from the speech synthesis server 12.

[0018] Similarly, the telephone server 18 provides mobile-phone and other telephone services. When a user accesses the telephone server 18 via the telephone carrier 22, the user can listen by telephone to the synthesized voice data from the speech synthesis server 12.

[0019] In practice, a suitable data compression technique is applied when the synthesized voice data is distributed from the speech synthesis server 12 to the servers 16 and 18, so compressed voice data is what gets distributed. FIG. 1 shows distribution over dedicated lines, but the data may of course be distributed via the computer network 20 instead. In that case, appropriate measures such as encryption and passwords are obviously required.
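The packaging step described above can be sketched in a few lines. The substitutions below are assumptions made for illustration only:
- `zlib` is a generic stand-in for whatever audio compression is actually used; a real system would more likely use a speech codec.
- An HMAC over a shared secret key is one possible stand-in for the "password" setting.

The HMAC authenticates the sender and detects tampering, but it does not encrypt. Confidentiality over a public network would need a separate layer such as TLS, which is not shown. The function names are hypothetical.

```python
import hashlib
import hmac
import zlib

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def pack_for_delivery(audio: bytes, shared_key: bytes) -> bytes:
    """Compress voice data and prepend an HMAC-SHA256 tag computed with a shared key."""
    body = zlib.compress(audio, 9)
    tag = hmac.new(shared_key, body, hashlib.sha256).digest()
    return tag + body


def unpack_delivery(packet: bytes, shared_key: bytes) -> bytes:
    """Verify the tag in constant time, then decompress; raise ValueError on mismatch."""
    tag, body = packet[:TAG_SIZE], packet[TAG_SIZE:]
    expected = hmac.new(shared_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return zlib.decompress(body)
```

The receiving server would call `unpack_delivery` with the same key before writing the audio to its storage.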

[0020] The block diagram of FIG. 2 shows the speech synthesis server 12 in detail. The server consists of four computers:
- a receiving computer 24, which receives text data from the information source 14 shown in FIG. 1;
- a synthesis computer 26, which performs speech synthesis with CHATR (described in some detail later);
- a distribution computer 28, which distributes the synthesized voice data created by the synthesis computer 26 to each of the servers 16 and 18 (FIG. 1);
- a scheduler 30, also a computer, whose instructions the computers 24, 26, and 28 follow when performing the above operations.

[0021] The text data received by the receiving computer 24 is stored in a text data area 32a of a memory 32, such as a hard disk. The synthesized voice data created by the synthesis computer 26 is stored in a voice data area 32b of the same memory 32. Accordingly, as instructed by the scheduler 30, the synthesis computer 26 reads the designated text data from the text data area 32a and writes the synthesized voice data to the voice data area 32b. The distribution computer 28 then, as instructed by the scheduler 30, reads the designated voice data from the voice data area 32b and transmits it to the designated destination.

[0022] In the embodiment of FIG. 2 the synthesis server 12 is shown as consisting of several computers, but it may of course be implemented on a single computer.

[0023] As described in detail in, for example, JP-A-10-49193 [G10L 5/04, 3/00], the synthesis computer 26 can create synthesized voice data from speech actually uttered by an arbitrary person.

[0024] The synthesis computer 26 shown in FIG. 2 can be represented functionally as in FIG. 3. Referring to FIG. 3, the synthesis computer 26 includes a speech waveform database 34. This database stores actual speech waveform signal data of the person whose voice is to be output as synthesized speech. A speech analysis unit 38 analyzes the speech waveform signal data in the database 34 to generate phoneme symbol sequences, perform phoneme alignment, and extract feature parameters. During this analysis, the speech analysis unit 38 receives from the database 34 the speech waveform signal data corresponding to the phoneme transcriptions (orthographic text) in a text database 36. In the phoneme alignment process, it determines the start and end points of each phoneme using phoneme HMMs (Hidden Markov Models) 40.
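The paragraph above says only that the phoneme HMMs 40 determine each phoneme's start and end points. A widely used way to do this is Viterbi forced alignment, sketched below in deliberately simplified form. The simplifications are assumptions:
- The input is a left-to-right model with one state per phoneme and no skips.
- Transition probabilities are treated as uniform and ignored.
- Per-frame log-likelihoods `loglik[t][j]` (frame `t` under the `j`-th phoneme of the transcription) have already been computed from the phoneme HMMs.

The function name is hypothetical.

```python
from typing import List, Tuple


def forced_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) for each phoneme of the transcription, in order.

    loglik[t][j] is the log-likelihood of frame t under the j-th phoneme.
    Every phoneme must occupy at least one frame, and phonemes appear in order.
    """
    num_frames, num_phones = len(loglik), len(loglik[0])
    if num_frames < num_phones:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = float("-inf")
    # score[t][j]: best log score aligning frames 0..t with frame t assigned to phoneme j
    score = [[neg_inf] * num_phones for _ in range(num_frames)]
    back = [[0] * num_phones for _ in range(num_frames)]  # phoneme assigned to frame t-1
    score[0][0] = loglik[0][0]
    for t in range(1, num_frames):
        for j in range(num_phones):
            stay = score[t - 1][j]                                 # remain in phoneme j
            advance = score[t - 1][j - 1] if j > 0 else neg_inf   # move on from j-1
            if stay >= advance:
                score[t][j], back[t][j] = stay, j
            else:
                score[t][j], back[t][j] = advance, j - 1
            score[t][j] += loglik[t][j]
    # Trace back from the last phoneme at the last frame.
    path = [num_phones - 1]
    j = num_phones - 1
    for t in range(num_frames - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    segments = []
    for j in range(num_phones):
        frames = [t for t, p in enumerate(path) if p == j]
        segments.append((frames[0], frames[-1]))
    return segments
```

Frame indices convert to start times by multiplying by the analysis frame shift. A full HMM aligner would use several states per phoneme and real transition probabilities.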

[0025] The feature vectors, or feature parameters, extracted by the speech analysis unit 38 are stored in a feature parameter memory 42. The feature parameters include the following information:
- the phoneme label;
- the start time (start position) of the phoneme in each file of the speech waveform signal database 34;
- the fundamental frequency;
- the phoneme duration;
- power, stress, and accent;
- the position relative to prosodic boundaries;
- the spectral tilt.

This information is stored for each file identified by an index number.
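One entry of the feature parameter memory described above can be modelled as a simple record keyed by its index number. The sketch below makes these assumptions, none of which the patent specifies:
- The field names, units, and integer encodings of stress and accent are illustrative.
- `build_index` and `candidates_for` are hypothetical helpers showing how a unit selector might look up entries by index number or by phoneme label.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UnitFeatures:
    """One entry of the feature parameter memory (field names are illustrative)."""
    index: int               # index number identifying the entry
    phoneme: str             # phoneme label
    file_id: str             # waveform file in the speech waveform database
    start_time: float        # start of the phoneme within that file, seconds
    f0: float                # fundamental frequency, Hz
    duration: float          # phoneme duration, seconds
    power: float             # e.g. energy in dB (assumed unit)
    stress: int              # 0 = unstressed, 1 = stressed (assumed encoding)
    accent: int              # 0 = unaccented, 1 = accented (assumed encoding)
    boundary_position: int   # position relative to the nearest prosodic boundary
    spectral_tilt: float     # spectral slope


def build_index(records: Iterable[UnitFeatures]) -> Dict[int, UnitFeatures]:
    """Map index numbers to entries, rejecting duplicate index numbers."""
    table: Dict[int, UnitFeatures] = {}
    for rec in records:
        if rec.index in table:
            raise ValueError(f"duplicate index number {rec.index}")
        table[rec.index] = rec
    return table


def candidates_for(table: Dict[int, UnitFeatures], phoneme: str) -> List[UnitFeatures]:
    """All stored entries carrying the given phoneme label, ordered by index number."""
    return [table[i] for i in sorted(table) if table[i].phoneme == phoneme]
```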

[0026] A weight coefficient learning unit 44 determines optimal weight coefficients by learning. The goal is to select from the speech waveform database 34 the samples best suited to the acoustic and prosodic environment of a given target utterance. To do so, it is first necessary to determine which features contribute, and by how much, depending on differences in phonemic and prosodic environment. This is because which feature parameters matter changes with the nature of the phoneme:
- the fundamental frequency is extremely effective for selecting voiced sounds but almost meaningless for selecting unvoiced sounds;
- the influence of the acoustic features of a fricative varies with the types of the preceding and following phonemes.

Therefore, the weight coefficient learning unit 44 automatically determines how much weight to give each feature in order to select the optimal phoneme, for example by linear regression analysis. The resulting weight coefficient vector is stored in a memory 46 and used by a speech unit selection unit 48.
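The paragraph above names linear regression as one way to learn the feature weights. The sketch below assumes the following setup:
- For pairs of database units, per-feature differences (f0 difference, duration difference, and so on) have been measured.
- An observed acoustic distance between the paired units has also been measured.
- The weights are the least-squares coefficients that predict the acoustic distance from the feature differences.

Fitting separately per phoneme class (for example voiced vs. unvoiced) would give class-dependent weights, as the paragraph motivates. This is an illustrative formulation, and `fit_weights` is a hypothetical name. It uses plain Python normal equations to stay dependency-free.

```python
from typing import List


def fit_weights(diffs: List[List[float]], acoustic: List[float]) -> List[float]:
    """Least-squares weights w minimising sum_i (acoustic[i] - sum_k w[k] * diffs[i][k])**2."""
    if not diffs or len(diffs) != len(acoustic):
        raise ValueError("need one acoustic distance per row of feature differences")
    k = len(diffs[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    A = [[sum(row[p] * row[q] for row in diffs) for q in range(k)] for p in range(k)]
    b = [sum(row[p] * y for row, y in zip(diffs, acoustic)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("feature differences are linearly dependent")
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w
```

With real data one would typically also constrain the weights to be non-negative or use a numerical library's least-squares solver. Neither is shown here.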

[0027] The speech unit selection unit 48 receives the text data (input phoneme string) that the receiving computer 24 stored in the memory 32. It selects speech units based on that phoneme string and outputs the corresponding index numbers of the speech waveform database 34. Specifically, for the phoneme string to be converted into voice data, the unit searches for the phoneme candidate sequence with the minimum total cost. That cost has two parts:
- a target cost, the approximation cost between each target phoneme and a candidate phoneme;
- a concatenation cost, the approximation cost between adjacent candidates to be concatenated.

The unit then supplies the index numbers of that sequence, together with the start position (time) and duration of each phoneme, to a speech synthesis unit 50.
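The minimum-cost search described above has the structure of a shortest path through a lattice of candidates. A standard way to solve it is Viterbi dynamic programming, which is the technique sketched below. This is a generic illustration, not ATR's code. The cost functions, the representation of targets and units, and the name `select_units` are assumptions supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Target = TypeVar("Target")
Unit = TypeVar("Unit")


def select_units(
    targets: Sequence[Target],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[Target, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, List[Unit]]:
    """Viterbi search for the candidate sequence with minimum total target + concatenation cost."""
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    # cost[c]: best total cost of any path ending in candidate c at the current position
    cost = [target_cost(targets[0], u) for u in candidates[0]]
    backptrs: List[List[int]] = []  # backptrs[i-1][c]: best predecessor of candidate c at position i
    for i in range(1, len(targets)):
        prev_units = candidates[i - 1]
        new_cost, ptrs = [], []
        for u in candidates[i]:
            # Cheapest way to arrive at u from any candidate at position i - 1.
            totals = [cost[j] + concat_cost(prev_units[j], u) for j in range(len(prev_units))]
            best = min(range(len(totals)), key=totals.__getitem__)
            new_cost.append(totals[best] + target_cost(targets[i], u))
            ptrs.append(best)
        cost = new_cost
        backptrs.append(ptrs)
    # Trace back from the cheapest final candidate.
    c = min(range(len(cost)), key=cost.__getitem__)
    total = cost[c]
    path = [c]
    for ptrs in reversed(backptrs):
        c = ptrs[c]
        path.append(c)
    path.reverse()
    return total, [candidates[i][c] for i, c in enumerate(path)]
```

As a usage example, a unit can be modelled as `(index, f0, position_in_database)`. Take `target_cost` as a scaled f0 difference, and let `concat_cost` return 0 when two units were adjacent in the original recording. The search then prefers contiguous stretches of natural speech, which is the intuition behind concatenation costs. Run time is proportional to the number of positions times the square of the candidates per position.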

[0028] The speech synthesis unit 50 accesses the speech waveform signal database 34 using the index numbers and each phoneme's start position (time) and duration. It outputs the digital speech waveform signal data of the selected phoneme candidates, which is stored in the voice data area 32b shown in FIG. 2.
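The waveform retrieval step described above amounts to cutting each selected phoneme out of its source recording and joining the pieces. The sketch assumes the following:
- Each index number has already been resolved to a hypothetical `(file_id, start_time_s, duration_s)` tuple.
- The database holds integer PCM samples per file at a single known sample rate.

`concatenate_units` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

# (file_id, start_time_s, duration_s) for each selected unit, in utterance order
Selection = Tuple[str, float, float]


def concatenate_units(
    database: Dict[str, Sequence[int]],
    selections: Sequence[Selection],
    sample_rate: int,
) -> List[int]:
    """Cut each selected phoneme out of its source waveform and join the pieces in order."""
    output: List[int] = []
    for file_id, start_s, dur_s in selections:
        wave = database[file_id]
        begin = round(start_s * sample_rate)
        end = begin + round(dur_s * sample_rate)
        if begin < 0 or end > len(wave):
            raise ValueError(f"segment at {start_s}s lasting {dur_s}s exceeds file {file_id}")
        output.extend(wave[begin:end])
    return output
```

This is plain butt-joining. A production synthesizer would typically smooth the joins, for example with short cross-fades, which is not shown here.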

[0029] In this way, the synthesis computer 26 converts text data into synthesized voice data using CHATR. As described above, CHATR uses the speech waveform signal data of an actual speaker stored in the speech waveform signal database 34. It can therefore synthesize far more natural speech than conventional synthesis driven by pulse trains and white noise.

[0030] On the other hand, with CHATR, the speech waveform signal data of celebrities or other well-known people could be used without permission unless that data is strictly managed. CHATR can build a speech waveform signal database from a certain amount of a person's natural voice. If CHATR-based synthesis computers were installed at arbitrary web servers or telephone servers, they could be used for far more than the data that the speech synthesis server 12 (FIG. 1) of this embodiment is intended to provide. The risk of unauthorized use of other people's voices would then increase dramatically.

[0031] Therefore, in the present invention, the synthesized voice data is provided from the synthesis server 12 to the destinations that need it. Even when CHATR-based speech synthesis is used, the speech waveform signal data can thus be managed strictly and centrally, so such unauthorized use can be completely avoided. In addition, each server can expect lower initial and running costs, because it need not install and operate its own synthesis computer.

[0032] The above description assumes a homepage (WWW) server or a telephone server as the distribution destination. If the destination is a broadcasting station, the distributed synthesized voice data can be broadcast as is.

[Brief description of the drawings]

FIG. 1 is an illustrative view showing one embodiment of the present invention.

FIG. 2 is a block diagram showing the speech synthesis server of the embodiment in FIG. 1.

FIG. 3 is a functional block diagram showing the synthesis computer of the embodiment in FIG. 2.

[Explanation of symbols]

10 ... Voice data distribution system
12 ... Speech synthesis server
14 ... Information source
16 ... WWW (web) server
18 ... Telephone server
26 ... Synthesis computer

Continuation of front page
(72) Inventor: Patrick Dabin, c/o ATR Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto
(72) Inventor: Yuji Mogi, c/o ATR Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto
F-terms (reference): 5D045 AA09 AB01 AB26; 5K015 GA00; 9A001 HH18 JJ25

Claims

[Claims]

1. A voice data distribution system comprising: receiving means for receiving text data from an information source; voice synthesizing means for generating synthesized voice data based on the text data; and distribution means for distributing the synthesized voice data to a distribution destination.

2. The voice data distribution system according to claim 1, wherein the voice synthesizing means includes a speech waveform signal database and speech unit selecting means for reading speech waveform signal data from the speech waveform signal database according to index information corresponding to a phoneme string of the text data.

3. The voice data distribution system according to claim 1, wherein the synthesized voice data generated by the voice synthesizing means is distributed to a storage system of a computer on a network.

4. The voice data distribution system according to claim 1, wherein the distribution destination includes a web server.

5. The voice data distribution system according to claim 1, wherein the distribution destination includes a telephone server.