JPH10222343A - Information communication system, information processor, information communicating method and information processing method

Info

Publication number: JPH10222343A
Application number: JP9083123A
Authority: JP
Inventors: Nobuhide Yamazaki; 信英山崎
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1996-12-04
Filing date: 1997-04-01
Publication date: 1998-08-21

Abstract

PROBLEM TO BE SOLVED: To maintain high quality in voice synthesis by obtaining an optimum correspondence between utterance (speech) information and voice-tone information without fixing that correspondence. SOLUTION: In a host device 1, a DB 11 stores file information including utterance information in which voice loudness, voice pitch, or both are associated with time differences and are discretized so as not to depend on the time differences between phonemes and so as to have relative levels. In a terminal device 2, a voice-tone storage part 21 stores, for each type of voice tone, voice-tone data representing acoustic parameters for each speech segment unit such as a phoneme. The host device 1 transfers the file information to the terminal device 2 in response to a request from the terminal device 2. The terminal device 2 reads from the voice-tone storage part 21 the voice-tone data designated by the utterance information in the transferred file information, and performs voice synthesis based on that voice-tone data and the utterance information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information communication system, and a method therefor, that reproduce media information such as voice by communicating information between devices over a communication network such as the Internet. It also relates to an information processing apparatus, and a method therefor, that create and edit the information used to reproduce media information such as voice by communicating information between devices over a communication network such as the Internet.

[0002]

[Prior Art] On the Internet, which has grown remarkably in recent years, voice is conveyed from a server to a client by compressing it and transferring it in the form of waveform data (.wav or .au).

[0003] On the Internet, users tend to avoid downloading home pages that involve a large amount of transfer. Making it possible to transfer waveform data, which is large in size, with only a small amount of transfer is therefore the key to popularizing voice communication.

[0004] One technique for solving this transfer-volume problem in voice communication is disclosed, for example, in Japanese Examined Patent Publication No. 5-52520. That publication discloses a technique in which voice is divided into sound source information and vocal tract information associated with that sound source information, and the associated sound source information and vocal tract information are combined at the time of voice synthesis.

[0005]

[Problems to Be Solved by the Invention] However, because the Internet is a communication network used by an unspecified large number of users, the usual pattern of use is that a client freely accesses sound source information, that is, utterance information, on a server and retrieves it. In that case, it is unknown whether the vocal tract information prepared on the client, that is, the voice-tone information, matches the utterance information that was freely accessed.

[0006] Accordingly, if the speaker on whom the voice-tone information is based and the speaker on whom the utterance information is based are the same person, and the condition of the voice was the same when the voice-tone information and the utterance information were created, voice synthesis reproduces the voice without any problem. If the person or the condition differs, however, the amplitude is specified as an absolute amplitude level and the voice pitch is specified as an absolute pitch frequency, so the amplitude pattern inherent in the voice-tone information is not reflected, and there was a concern that an unsuitable voice would be reproduced at the time of voice synthesis.

[0007] An object of the present invention is to obtain an information communication system capable of maintaining high quality in voice synthesis by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.

[0008] Another object of the present invention is to obtain an information processing apparatus with which the information for maintaining high voice-synthesis quality in the above information communication system can be created and edited easily.

[0009] Another object of the present invention is to obtain an information communication method capable of maintaining high quality in voice synthesis by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.

[0010] Another object of the present invention is to obtain an information processing method with which the information for maintaining high voice-synthesis quality in the above information communication method can be created and edited easily.

[0011]

[Means for Solving the Problems] To solve the problems described above and achieve the objects, the information communication system according to the invention of claim 1 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information in which voice loudness, voice pitch, or both are associated with their time differences, the loudness, the pitch, or both being discretized so as not to depend on the time differences between phonemes and so as to have relative levels; and first communication means for transferring the utterance information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; selecting means for selecting one item of voice-tone data from among the plural types of voice-tone data stored in the voice-tone storage means, based on the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous along the time axis, based on the voice loudness, the voice pitch, or both, and their time differences, contained in the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the selecting means.

[0012] According to the invention of claim 1, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from voice loudness and voice pitch that do not depend on phonemes, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected based on the utterance information. Voice can therefore be reproduced in a suitable voice tone without being restricted to a particular voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.
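The development of a continuous prosody pattern from discrete, relative-level points can be illustrated with a minimal Python sketch. The function name `expand_prosody`, the `(time_difference_ms, relative_level)` event layout, the `reference` value supplied by the selected voice tone, and the use of linear interpolation are all assumptions made for illustration; this passage does not state how the developing means interpolates or scales the levels.

```python
def expand_prosody(events, reference, step_ms=10):
    """Expand discrete (time_difference_ms, relative_level) events into a
    continuous contour sampled every step_ms milliseconds.

    Hypothetical sketch: each relative level is scaled by `reference`, a value
    assumed to come from the selected voice tone (for example its base pitch
    in Hz). Time differences after the first event are assumed to be positive.
    """
    if len(events) < 2:
        raise ValueError("need at least two events to interpolate")

    # Accumulate time differences into absolute times.
    times, levels = [], []
    elapsed = 0
    for dt_ms, relative_level in events:
        elapsed += dt_ms
        times.append(elapsed)
        levels.append(relative_level)

    contour = []
    seg = 0
    for now in range(times[0], times[-1] + 1, step_ms):
        # Advance to the segment that contains the current sample time.
        while seg < len(times) - 2 and now > times[seg + 1]:
            seg += 1
        t0, t1 = times[seg], times[seg + 1]
        frac = (now - t0) / (t1 - t0)
        # Linear interpolation between neighbouring relative levels (assumed).
        level = levels[seg] + frac * (levels[seg + 1] - levels[seg])
        contour.append((now, reference * level))
    return contour


# The same utterance information rendered with two different voice tones.
events = [(0, 1.0), (100, 1.5), (100, 0.5)]
low_voice = expand_prosody(events, reference=120.0, step_ms=50)
high_voice = expand_prosody(events, reference=240.0, step_ms=50)
```

Because the events carry only relative levels, the same utterance information yields contours of identical shape for both voice tones, each scaled to that voice tone's own reference. This is one way to read the statement that the pitch pattern does not deviate even though the correspondence is not fixed.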

[0013] The information communication system according to the invention of claim 2 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information in which voice loudness, voice pitch, or both are associated with their time differences and with a voice-tone type, the loudness, the pitch, or both being discretized so as not to depend on the time differences between phonemes and so as to have relative levels; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; selecting means for selecting, from among the plural types of voice-tone data stored in the voice-tone storage means, the voice-tone data corresponding to the voice-tone type in the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous along the time axis, based on the voice loudness, the voice pitch, or both, and their time differences, contained in the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the selecting means.

[0014] According to the invention of claim 2, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from voice loudness and voice pitch that do not depend on phonemes, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected by the information indicating the voice-tone type in the utterance information. Voice can therefore be reproduced in the optimum voice tone, designated directly from among plural types of voice tone, without being restricted to a particular voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.

[0015] The information communication system according to the invention of claim 3 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information in which voice loudness, voice pitch, or both are associated with their time differences and with a voice-tone attribute, the loudness, the pitch, or both being discretized so as not to depend on the time differences between phonemes and so as to have relative levels; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme in association with information indicating the attribute of that voice tone; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; collating means for collating the information indicating the voice-tone attribute in the utterance information in the file information received by the second communication means with the information indicating the attributes of the various voice tones stored in the voice-tone storage means, thereby obtaining voice-tone similarities; selecting means for selecting, based on the similarities obtained by the collating means, the voice-tone data with the highest similarity from among the plural types of voice-tone data stored in the voice-tone storage means; developing means for developing a prosody pattern that is continuous along the time axis, based on the voice loudness, the voice pitch, or both, and their time differences, contained in the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the selecting means.

[0016] According to the invention of claim 3, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from voice loudness and voice pitch that do not depend on phonemes, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected by similarity using the information indicating the voice-tone attribute in the utterance information. Voice can therefore be reproduced in the voice tone with the highest similarity without using an unsuitable voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.
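The collation and selection by similarity described above might be sketched in Python as follows. The attribute names (`sex`, `age`), the voice-tone names, and the similarity measure (the fraction of requested attributes that match exactly) are hypothetical; this passage does not define which attributes exist or how the similarity is computed.

```python
def similarity(wanted, candidate):
    """Fraction of the requested attributes that the candidate matches exactly.

    Hypothetical measure chosen only for illustration.
    """
    if not wanted:
        return 0.0
    hits = sum(1 for key, value in wanted.items() if candidate.get(key) == value)
    return hits / len(wanted)


def select_most_similar(wanted, store):
    """Return the name of the stored voice tone with the highest similarity.

    On a tie, the first entry in the store's iteration order is returned.
    """
    return max(store, key=lambda name: similarity(wanted, store[name]["attributes"]))


# Hypothetical voice-tone store on the second communication device.
store = {
    "voice_a": {"attributes": {"sex": "female", "age": "adult"}},
    "voice_b": {"attributes": {"sex": "male", "age": "adult"}},
}

# Attributes carried in the utterance information; no stored voice tone
# matches them all, so the closest one is chosen.
chosen = select_most_similar({"sex": "male", "age": "child"}, store)
```

A weighted or numeric distance could replace the exact-match count without changing the overall flow of collating first and then selecting the maximum.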

[0017] The information communication system according to the invention of claim 4 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information in which voice loudness, voice pitch, or both are associated with their time differences, a voice-tone type, and a voice-tone attribute, the loudness, the pitch, or both being discretized so as not to depend on the time differences between phonemes and so as to have relative levels; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme in association with information indicating the attribute of that voice tone; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; search means for searching the types of the various voice tones stored in the voice-tone storage means for the voice-tone type in the utterance information in the file information received by the second communication means; first selecting means for selecting, when the search by the search means obtains the voice-tone type in the utterance information, the voice-tone data corresponding to the obtained voice-tone type from among the various voice-tone data stored in the voice-tone storage means; collating means for collating, when the search by the search means does not obtain the voice-tone type in the utterance information, the information indicating the voice-tone attribute in the utterance information stored in the utterance information storage means with the information indicating the attributes of the various voice tones stored in the voice-tone storage means, thereby obtaining voice-tone similarities; second selecting means for selecting, based on the similarities obtained by the collating means, the voice-tone data with the highest similarity from among the plural types of voice-tone data stored in the voice-tone storage means; developing means for developing a prosody pattern that is continuous along the time axis, based on the voice loudness, the voice pitch, or both, and their time differences, contained in the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the first or second selecting means.

[0018] According to the invention of claim 4, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from voice loudness and voice pitch that do not depend on phonemes, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected by the information indicating the voice-tone type or attribute in the utterance information. Even when the directly designated voice tone is not available, voice can therefore be reproduced in the voice tone with the highest similarity without using an unsuitable voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.
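The two-stage selection of claim 4, a direct search by voice-tone type followed by a fallback to attribute similarity, can be outlined with the following Python sketch. The dictionary keys (`kind`, `attributes`), the voice-tone names, and the exact-match similarity measure are hypothetical and serve only to show the control flow.

```python
def similarity(wanted, candidate):
    """Fraction of the requested attributes matched exactly (hypothetical measure)."""
    if not wanted:
        return 0.0
    hits = sum(1 for key, value in wanted.items() if candidate.get(key) == value)
    return hits / len(wanted)


def choose_voice_tone(utterance, store):
    """Pick a voice tone for the received utterance information.

    First stage: use the designated voice-tone type if the store has it.
    Second stage: otherwise fall back to the stored voice tone whose
    attributes are most similar to those in the utterance information.
    """
    kind = utterance.get("kind")
    if kind in store:
        return kind
    wanted = utterance.get("attributes", {})
    return max(store, key=lambda name: similarity(wanted, store[name]["attributes"]))


# Hypothetical voice-tone store on the second communication device.
store = {
    "announcer": {"attributes": {"sex": "male", "age": "adult"}},
    "child": {"attributes": {"sex": "female", "age": "child"}},
}

direct = choose_voice_tone({"kind": "announcer", "attributes": {}}, store)
fallback = choose_voice_tone(
    {"kind": "robot", "attributes": {"sex": "female", "age": "child"}}, store
)
```

Here `direct` is found by type alone, while `fallback` is chosen by similarity because no voice tone named `robot` is stored on the receiving side.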

[0019] The information communication system according to the invention of claim 5 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information that contains phonemes and prosody as information; and first communication means for transferring the utterance information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; selecting means for selecting one item of voice-tone data from among the plural types of voice-tone data stored in the voice-tone storage means, based on the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous along the time axis, based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the selecting means.

[0020] According to the invention of claim 5, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from the utterance information in the file information, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected based on the utterance information. Voice can therefore be reproduced in a suitable voice tone without being restricted to a particular voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.

[0021] The information communication system according to the invention of claim 6 is an information communication system in which a first communication device and a second communication device are connected to a communication network and communicate information with each other through that network. The first communication device has: file information storage means for storing file information including utterance information that contains phonemes, prosody, and a voice-tone type as information; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: voice-tone storage means for storing, for each type of voice tone, voice-tone data representing acoustic parameters for each segment unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and for receiving the file information subsequently transferred by the first communication means; selecting means for selecting, from among the plural types of voice-tone data stored in the voice-tone storage means, the voice-tone data corresponding to the voice-tone type in the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous along the time axis, based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the voice-tone data selected by the selecting means.

[0022] According to the invention of claim 6, file information including utterance information is transferred from the first communication device to the second communication device. In the second communication device, a temporally continuous prosody pattern is developed from the utterance information in the file information, and a voice waveform is generated based on that prosody pattern and the voice-tone data selected by the information indicating the voice-tone type in the utterance information. Voice can therefore be reproduced in the optimum voice tone, designated directly from among plural types of voice tone, without being restricted to a particular voice tone, and no deviation arises in the voice-pitch pattern at the time of waveform synthesis. In this way, high voice-synthesis quality can be maintained by obtaining an optimum correspondence between utterance information and voice-tone information without fixing that correspondence.

【００２３】請求項７の発明に係る情報通信システム
は、第１通信装置と第２通信装置とを通信網に接続し、
この通信網を介して第１通信装置と第２通信装置間で情
報通信を行う情報通信システムにおいて、第１通信装置
は、音韻と韻律と声色の属性とを情報として含む発話情
報を含むファイル情報を記憶するファイル情報記憶手段
と、第２通信装置の要求に応じてファイル情報記憶手段
に記憶されたファイル情報を第２通信装置へ転送する第
１通信手段と、を有し、第２通信装置は、声色の種類別
に、音韻等の素片単位毎の音響パラメータを表す声色デ
ータとその声色の属性を示す情報とを対応させて記憶す
る声色記憶手段と、第１通信装置に対してファイル情報
記憶手段に記憶されているファイル情報の転送を要求
し、その後に第１通信手段により転送されてくるファイ
ル情報を受信する第２通信手段と、第２通信手段により
受信されたファイル情報の内の発話情報の内の声色の属
性を示す情報と声色記憶手段に記憶されている各種声色
の属性を示す情報とを照合して声色の類似度を求める照
合手段と、照合手段により求められた類似度に基づいて
声色記憶手段に記憶されている複数種の声色データの内
から類似度の最も高い声色データを選定する選定手段
と、発話情報基づいて時間軸方向で連続した韻律パター
ンを展開する展開手段と、展開手段により展開された韻
律パターンと選定手段により選定された声色データとに
基づいて音声波形を生成する音声再生手段と、を有した
ことを特徴とする。An information communication system according to the invention of claim 7 connects a first communication device and a second communication device to a communication network and performs information communication between them via that network. The first communication device has: file information storage means for storing file information that includes utterance information containing phonemes, prosody, and timbre attributes; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: timbre storage means for storing, for each timbre type, timbre data representing acoustic parameters for each segment unit such as a phoneme, in association with information indicating the attributes of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and then receiving the file information transferred by the first communication means; matching means for comparing the timbre-attribute information in the utterance information of the received file information with the attribute information of the timbres stored in the timbre storage means to obtain a timbre similarity; selecting means for selecting, based on the similarity obtained by the matching means, the timbre data with the highest similarity from among the plural types of timbre data stored in the timbre storage means; expanding means for expanding a prosody pattern that is continuous along the time axis based on the utterance information; and speech reproducing means for generating a speech waveform based on the prosody pattern expanded by the expanding means and the timbre data selected by the selecting means.
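
The matching and selecting means of claim 7 can be pictured with a minimal Python sketch. The attribute names, the scoring rule in `similarity` (fraction of requested attributes that agree exactly), and the contents of `TIMBRE_STORE` are all assumptions made for illustration; the text only requires that a similarity be computed and that the timbre data with the highest similarity be chosen.

```python
def similarity(wanted: dict, candidate: dict) -> float:
    """Hypothetical score: fraction of requested attributes that match exactly."""
    if not wanted:
        return 0.0
    hits = sum(1 for key, value in wanted.items() if candidate.get(key) == value)
    return hits / len(wanted)


def select_timbre(wanted: dict, store: dict) -> str:
    """Return the name of the stored timbre whose attributes score highest."""
    return max(store, key=lambda name: similarity(wanted, store[name]))


# Hypothetical timbre storage on the receiving device: timbre type -> attributes.
TIMBRE_STORE = {
    "timbre_a": {"sex": "female", "age_band": "adult", "pitch_band": "high"},
    "timbre_b": {"sex": "male", "age_band": "adult", "pitch_band": "low"},
    "timbre_c": {"sex": "male", "age_band": "child", "pitch_band": "high"},
}

# Attributes carried in the utterance information; no stored timbre matches exactly.
requested = {"sex": "male", "age_band": "senior", "pitch_band": "low"}
best = select_timbre(requested, TIMBRE_STORE)
```

In this sketch ties are resolved in favour of the first stored entry, which is a side effect of `max` rather than anything the claim specifies.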

【００２４】この請求項７の発明によれば、第１通信装
置から第２通信装置へ発話情報を含むファイル情報を転
送し、第２通信装置において、ファイル情報の内の発話
情報で時間的に連続した韻律パターンを展開し、その韻
律パターンと発話情報中の声色の属性を示す情報で類似
度によって選定された声色データとに基づいて音声波形
を生成するようにしたので、不適な声色を使用せずに類
似度の最も高い声色で音声再生ができ、かつ波形合成時
に声の高さのパターンにずれが生じることはなく、この
ように、発話情報と声色情報との対応関係を固定しなく
ても最適の対応関係を得ることで音声合成の高い品質を
維持することが可能である。According to the invention of claim 7, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information and generates a speech waveform from that prosody pattern and the timbre data selected, by similarity, using the timbre-attribute information in the utterance information. Speech can therefore be reproduced with the most similar timbre rather than an unsuitable one, and the pitch pattern is not displaced during waveform synthesis. Because the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, high speech synthesis quality can be maintained.

【００２５】請求項８の発明に係る情報通信システム
は、第１通信装置と第２通信装置とを通信網に接続し、
この通信網を介して第１通信装置と第２通信装置間で情
報通信を行う情報通信システムにおいて、第１通信装置
は、音韻と韻律と声色の種類と声色の属性とを情報とし
て含む発話情報を含むファイル情報を記憶するファイル
情報記憶手段と、第２通信装置の要求に応じてファイル
情報記憶手段に記憶されたファイル情報を第２通信装置
へ転送する第１通信手段と、を有し、第２通信装置は、
声色の種類別に、音韻等の素片単位毎の音響パラメータ
を表す声色データとその声色の属性を示す情報とを対応
させて記憶する声色記憶手段と、第１通信装置に対して
ファイル情報記憶手段に記憶されているファイル情報の
転送を要求し、その後に第１通信手段により転送されて
くるファイル情報を受信する第２通信手段と、第２通信
手段により受信されたファイル情報の内の発話情報の内
の声色の種類を声色記憶手段に記憶されている各種声色
の種類から検索する検索手段と、検索手段の検索により
発話情報の内の声色の種類が取得できた場合には、その
取得できた声色の種類に該当する声色データを声色記憶
手段に記憶されている各種声色データの内から選定する
第１選定手段と、検索手段の検索により発話情報の内の
声色の種類が取得できなかった場合には、発話情報記憶
手段に記憶されている発話情報の内の声色の属性を示す
情報と声色記憶手段に記憶されている各種声色の属性を
示す情報とを照合して声色の類似度を求める照合手段
と、照合手段により求められた類似度に基づいて声色記
憶手段に記憶されている複数種の声色データの内から類
似度の最も高い声色データを選定する第２選定手段と、
発話情報に基づいて時間軸方向で連続した韻律パターン
を展開する展開手段と、展開手段により展開された韻律
パターンと前記第１又は第２選定手段により選定された
声色データとに基づいて音声波形を生成する音声再生手
段と、を有したことを特徴とする。An information communication system according to the invention of claim 8 connects a first communication device and a second communication device to a communication network and performs information communication between them via that network. The first communication device has: file information storage means for storing file information that includes utterance information containing phonemes, prosody, a timbre type, and timbre attributes; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device. The second communication device has: timbre storage means for storing, for each timbre type, timbre data representing acoustic parameters for each segment unit such as a phoneme, in association with information indicating the attributes of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and then receiving the file information transferred by the first communication means; search means for searching the timbre types stored in the timbre storage means for the timbre type given in the utterance information of the file information received by the second communication means; first selecting means for selecting, when the search by the search means finds the timbre type given in the utterance information, the timbre data corresponding to that timbre type from among the timbre data stored in the timbre storage means; matching means for comparing, when the search by the search means does not find the timbre type given in the utterance information, the timbre-attribute information in the utterance information stored in the utterance information storage means with the attribute information of the timbres stored in the timbre storage means to obtain a timbre similarity; second selecting means for selecting, based on the similarity obtained by the matching means, the timbre data with the highest similarity from among the plural types of timbre data stored in the timbre storage means; expanding means for expanding a prosody pattern that is continuous along the time axis based on the utterance information; and speech reproducing means for generating a speech waveform based on the prosody pattern expanded by the expanding means and the timbre data selected by the first or second selecting means.
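
The two-stage selection of claim 8 (look up the designated timbre type first, fall back to attribute similarity only if it is not stored) might be sketched as follows. The store layout, the `choose_timbre` and `count_matches` helpers, and the example timbre names are hypothetical; the similarity function is passed in as a parameter because the text does not fix how similarity is scored.

```python
from typing import Callable, Optional


def choose_timbre(
    timbre_type: Optional[str],
    attributes: dict,
    store: dict,
    similarity: Callable[[dict, dict], float],
) -> str:
    """Pick a timbre by type if it is stored, otherwise by attribute similarity."""
    # Search: is the designated timbre type available on this device?
    if timbre_type is not None and timbre_type in store:
        return timbre_type  # first selection: direct hit
    # Matching and second selection: fall back to the most similar stored timbre.
    return max(store, key=lambda name: similarity(attributes, store[name]["attributes"]))


def count_matches(wanted: dict, candidate: dict) -> float:
    """Hypothetical similarity: number of attributes that agree exactly."""
    return float(sum(1 for k, v in wanted.items() if candidate.get(k) == v))


# Hypothetical local timbre storage; "data" stands in for acoustic parameters.
store = {
    "narrator": {"attributes": {"sex": "male", "age_band": "adult"}, "data": b"..."},
    "child": {"attributes": {"sex": "female", "age_band": "child"}, "data": b"..."},
}
wanted = {"sex": "female", "age_band": "child"}

direct = choose_timbre("narrator", wanted, store, count_matches)     # type is stored
fallback = choose_timbre("announcer", wanted, store, count_matches)  # type is missing
```

Note that a direct hit wins even when another stored timbre has closer attributes, which mirrors the order of the search means and the two selecting means in the claim.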

【００２６】この請求項８の発明によれば、第１通信装
置から第２通信装置へ発話情報を含むファイル情報を転
送し、第２通信装置において、ファイル情報の内の発話
情報で時間的に連続した韻律パターンを展開し、その韻
律パターンと発話情報中の声色の種類や属性を示す情報
で選定された声色データとに基づいて音声波形を生成す
るようにしたので、直接的に指定した声色がなくても不
適な声色を使用せずに類似度の最も高い声色で音声再生
ができ、かつ波形合成時に声の高さのパターンにずれが
生じることはなく、このように、発話情報と声色情報と
の対応関係を固定しなくても最適の対応関係を得ること
で音声合成の高い品質を維持することが可能である。According to the invention of claim 8, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information and generates a speech waveform from that prosody pattern and the timbre data selected by the timbre-type and timbre-attribute information in the utterance information. Even when the directly designated timbre is not available, speech can be reproduced with the most similar timbre rather than an unsuitable one, and the pitch pattern is not displaced during waveform synthesis. Because the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, high speech synthesis quality can be maintained.

【００２７】請求項９の発明に係る情報通信システム
は、請求項３、４、７、８のいずれか一つに記載の発明
において、属性を示す情報を性別、年齢、声の高さの基
準、明瞭度、及び自然度の内のいずれか一つ、もしくは
その二つ以上の組み合わせにしたので、発話情報記憶手
段の属性と声色記憶手段の属性との照合対象がパラメー
タ化され、これによって、声色の選定を容易にすること
が可能である。In an information communication system according to the invention of claim 9, in the invention of any one of claims 3, 4, 7, and 8, the attribute information is any one of sex, age, pitch reference, clarity, and naturalness, or a combination of two or more of them. The items compared between the attributes in the utterance information storage means and the attributes in the timbre storage means are thereby parameterized, which makes timbre selection easier.
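
As an illustration of how the five attributes of claim 9 could be parameterized for comparison, the sketch below stores them in a small record and scores two records against each other. The value ranges (a 1 to 10 scale for clarity and naturalness), the normalizing constants, and the equal weighting are assumptions of this sketch and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimbreAttributes:
    sex: str             # e.g. "male" or "female"
    age: int             # years
    pitch_ref_hz: float  # pitch reference
    clarity: int         # assumed 1..10 scale
    naturalness: int     # assumed 1..10 scale


def attribute_similarity(a: TimbreAttributes, b: TimbreAttributes) -> float:
    """Hypothetical similarity in [0, 1]; 1.0 means identical attributes."""
    parts = [
        1.0 if a.sex == b.sex else 0.0,
        1.0 - min(abs(a.age - b.age) / 50.0, 1.0),                     # assumed 50-year span
        1.0 - min(abs(a.pitch_ref_hz - b.pitch_ref_hz) / 200.0, 1.0),  # assumed 200 Hz span
        1.0 - abs(a.clarity - b.clarity) / 9.0,
        1.0 - abs(a.naturalness - b.naturalness) / 9.0,
    ]
    return sum(parts) / len(parts)  # equal weights, also an assumption


wanted = TimbreAttributes("female", 30, 220.0, 8, 7)
near = TimbreAttributes("female", 35, 210.0, 7, 7)
far = TimbreAttributes("male", 60, 110.0, 5, 4)
```

With every attribute reduced to a number in this way, choosing a timbre becomes a matter of comparing scores.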

【００２８】請求項１０の発明に係る情報通信システム
は、請求項１〜８のいずれか一つに記載の発明におい
て、ファイル情報記憶手段が声の高さの基準を示す第１
情報を発話情報に含めて記憶し、声色記憶手段が声の高
さの基準を示す第２情報を声色データに含めて記憶して
おり、音声再生手段が第１情報による声の高さの基準を
第２情報による声の高さの基準によってシフトすること
で音声再生時の声の高さの基準を決定することを特徴と
する。An information communication system according to the invention of claim 10 is characterized in that, in the invention of any one of claims 1 to 8, the file information storage means stores first information indicating a pitch reference as part of the utterance information, the timbre storage means stores second information indicating a pitch reference as part of the timbre data, and the speech reproducing means determines the pitch reference for speech reproduction by shifting the pitch reference given by the first information according to the pitch reference given by the second information.

【００２９】この請求項１０の発明によれば、音声再生
時に発話情報記憶手段の声の高さの基準を声色記憶手段
の声の高さの基準によってシフトするようにしたので、
個々の声の高さは音韻の時間区分に関係なくそのシフト
した声の高さの基準に従って相対的に変化し、このた
め、声の高さの基準が声色側に近づくことから、音声の
品質を一層向上させることが可能である。According to the invention of claim 10, during speech reproduction the pitch reference of the utterance information storage means is shifted according to the pitch reference of the timbre storage means. Each individual pitch then changes relative to the shifted reference, regardless of the time divisions of the phonemes. Because the pitch reference moves toward that of the timbre, speech quality can be improved further.
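
A minimal sketch of the pitch-reference shift follows. The text does not give a formula here, so two things are assumed: the shift is modelled as moving the utterance-side reference toward the timbre-side reference by a `weight` between 0 and 1, and the relative pitch levels in the utterance information are treated as ratios to the reference. The function names are hypothetical.

```python
def shift_reference(utterance_ref_hz: float, timbre_ref_hz: float, weight: float = 1.0) -> float:
    """Move the utterance-side pitch reference toward the timbre-side reference.

    weight=1.0 adopts the timbre reference outright; weight=0.0 leaves it unchanged.
    """
    return utterance_ref_hz + weight * (timbre_ref_hz - utterance_ref_hz)


def absolute_pitches(relative_levels: list, reference_hz: float) -> list:
    """Turn relative pitch levels (assumed to be ratios to the reference) into Hz."""
    return [level * reference_hz for level in relative_levels]


# The utterance was authored around 200 Hz; the selected timbre is centred on 120 Hz.
playback_ref = shift_reference(utterance_ref_hz=200.0, timbre_ref_hz=120.0)
contour_hz = absolute_pitches([1.0, 1.5, 0.5], playback_ref)
```

Because only the reference moves, the relative shape of the contour is preserved regardless of where the phoneme boundaries fall.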

【００３０】請求項１１の発明に係る情報通信システム
は、請求項１〜８のいずれか一つに記載の発明におい
て、ファイル情報記憶手段が声の高さの基準を示す第１
情報を発話情報に含めて記憶し、音声再生手段が、声の
高さの基準を示す第２情報を任意に入力する入力手段を
有し、第１情報による声の高さの基準を入力手段により
入力された第２情報による声の高さの基準によってシフ
トすることで音声再生時の声の高さの基準を決定するこ
とを特徴とする。An information communication system according to the invention of claim 11 is characterized in that, in the invention of any one of claims 1 to 8, the file information storage means stores first information indicating a pitch reference as part of the utterance information, and the speech reproducing means has input means for freely entering second information indicating a pitch reference and determines the pitch reference for speech reproduction by shifting the pitch reference given by the first information according to the pitch reference given by the second information entered through the input means.

【００３１】この請求項１１の発明によれば、音声再生
時に発話情報記憶手段の声の高さの基準を任意の声の高
さの基準によってシフトするようにしたので、個々の声
の高さは音韻の時間区分に関係なくそのシフトした声の
高さの基準に従って相対的に変化し、このため、シフト
量に従って意図する声質に近づける等、声色の加工が可
能である。According to the invention of claim 11, during speech reproduction the pitch reference of the utterance information storage means is shifted according to an arbitrary pitch reference. Each individual pitch then changes relative to the shifted reference, regardless of the time divisions of the phonemes, so the timbre can be processed, for example by bringing it closer to an intended voice quality according to the amount of shift.

【００３２】請求項１２の発明に係る情報通信システム
は、請求項１０又は１１に記載の発明において、第１及
び第２情報による声の高さの基準を声の高さの平均周波
数、最大周波数、または最小周波数にしたので、声の高
さの基準が取りやすくなる。In an information communication system according to the invention of claim 12, in the invention of claim 10 or 11, the pitch reference given by the first and second information is the average frequency, maximum frequency, or minimum frequency of the voice pitch, which makes the pitch reference easy to establish.
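
The three candidate references named in claim 12 can be computed directly from a pitch contour. In the sketch below, `f0_hz` is assumed to be a list of fundamental-frequency values for voiced frames only, and `pitch_reference` and `to_relative` are hypothetical helper names.

```python
from statistics import mean


def pitch_reference(f0_hz: list, kind: str = "mean") -> float:
    """Return the average, maximum, or minimum frequency of a pitch contour."""
    reducers = {"mean": mean, "max": max, "min": min}
    if kind not in reducers:
        raise ValueError(f"unknown reference kind: {kind!r}")
    return float(reducers[kind](f0_hz))


def to_relative(f0_hz: list, reference_hz: float) -> list:
    """Express each pitch value as a ratio to the chosen reference."""
    return [f / reference_hz for f in f0_hz]


f0 = [100.0, 200.0, 150.0]
ref_mean = pitch_reference(f0, "mean")
ref_max = pitch_reference(f0, "max")
ref_min = pitch_reference(f0, "min")
relative_to_max = to_relative(f0, ref_max)
```

Whichever statistic is chosen, both the utterance information and the timbre data would need to use the same one for the two references to be comparable.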

【００３３】請求項１３の発明に係る情報通信システム
は、請求項１〜８のいずれか一つに記載の発明におい
て、第２通信装置において記録媒体より声色データを読
み出して声色記憶手段に記憶するようにしたので、記録
媒体を通して声色の種類にバリエーションを与えること
ができ、音声再生時に最適の声色を適用させることが可
能である。In an information communication system according to the invention of claim 13, in the invention of any one of claims 1 to 8, the second communication device reads timbre data from a recording medium and stores it in the timbre storage means. The recording medium can thus add variety to the available timbre types, so that the optimum timbre can be applied during speech reproduction.

【００３４】請求項１４の発明に係る情報通信システム
は、請求項１〜８のいずれか一つに記載の発明におい
て、第２通信装置において通信回線を介して外部装置よ
り声色データを受信してその声色データを声色記憶手段
に記憶するようにしたので、通信回線を通して声色の種
類にバリエーションを与えることができ、音声再生時に
最適の声色を適用させることが可能である。In an information communication system according to the invention of claim 14, in the invention of any one of claims 1 to 8, the second communication device receives timbre data from an external device via a communication line and stores that timbre data in the timbre storage means. The communication line can thus add variety to the available timbre types, so that the optimum timbre can be applied during speech reproduction.

【００３５】請求項１５の発明に係る情報通信システム
は、請求項１〜８のいずれか一つに記載の発明におい
て、発話情報にファイル情報の内の他の情報による動作
と音声再生手段による動作とを同期させる制御情報を含
め、音声再生手段が音声再生時に発話情報に含まれる制
御情報に従ってファイル情報の内の他の情報の動作に同
期して動作するようにしたので、音声と他のメディアと
の表現が融合して表現力を強化することが可能である。In an information communication system according to the invention of claim 15, in the invention of any one of claims 1 to 8, the utterance information includes control information that synchronizes the operation of the speech reproducing means with operation based on other information in the file information, and during speech reproduction the speech reproducing means operates in synchronization with the operation of that other information in accordance with the control information contained in the utterance information. Speech and other media are thereby combined, which strengthens expressive power.
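
One way to picture control information embedded in utterance information is as marker items interleaved with the speech items, which the player hands to another component as it reaches them. The item layout (`kind`, `text`, `target`, `action`) and the `play` function are purely hypothetical; the text does not describe the actual encoding at this point.

```python
def play(items: list, speak, trigger) -> None:
    """Walk the utterance items in order, speaking text and firing sync markers."""
    for item in items:
        if item["kind"] == "control":
            # Synchronization point: notify the component handling the other media.
            trigger(item["target"], item["action"])
        else:
            speak(item["text"])


# Hypothetical utterance information with one embedded control item.
items = [
    {"kind": "speech", "text": "Hello"},
    {"kind": "control", "target": "image", "action": "show_next"},
    {"kind": "speech", "text": "Next topic"},
]

log = []
play(
    items,
    speak=lambda text: log.append(("speech", text)),
    trigger=lambda target, action: log.append((target, action)),
)
```

Here the callbacks only record what happened; in a real player they would drive the synthesizer and, for example, an image display.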

【００３６】請求項１６の発明に係る情報通信システム
は、請求項１５記載の発明において、他の情報を画像情
報、楽曲情報等にしたので、音声と画像、音楽等との表
現が融合して表現力を強化することが可能である。In an information communication system according to the invention of claim 16, in the invention of claim 15, the other information is image information, music information, or the like, so that speech is combined with images, music, and so on, which strengthens expressive power.

【００３７】請求項１７の発明に係る情報処理装置は、
請求項１〜８のいずれか一つに記載の情報通信システム
で使用する発話情報を作成編集する情報処理装置であっ
て、自然音声を入力する音声入力手段と、音声入力手段
により入力された自然音声に基づいて発話情報を作成す
る作成手段と、第１通信装置に対して作成手段により作
成された発話情報を含むファイル情報の登録を要求し、
その後に作成された発話情報を含むファイル情報を第１
通信装置へ転送して第１通信装置のファイル情報記憶手
段に登録する登録転送手段と、を備えたことを特徴とす
る。An information processing apparatus according to the invention of claim 17 creates and edits the utterance information used in the information communication system of any one of claims 1 to 8, and comprises: speech input means for inputting natural speech; creating means for creating utterance information based on the natural speech input through the speech input means; and registration transfer means for requesting the first communication device to register file information including the utterance information created by the creating means, and then transferring that file information to the first communication device for registration in the file information storage means of the first communication device.

【００３８】この請求項１７の発明によれば、入力され
た自然音声に基づいて、声の大きさと声の高さとのいず
れか一方、もしくはその両方を、音韻間の時間差に依存
せず、かつ相対的なレベルをもつように離散させて発話
情報を作成し、これを第１通信装置に転送してファイル
情報記憶手段に登録するようにしたので、音韻の時間差
から独立した任意の時点に声の大きさや声の高さを与え
ることが可能である。According to the invention of claim 17, utterance information is created from the input natural speech by making either or both of loudness and pitch discrete so that they do not depend on the time difference between phonemes and have relative levels, and the result is transferred to the first communication device and registered in the file information storage means. Loudness and pitch can therefore be assigned at arbitrary points in time, independent of the time differences between phonemes.

【００３９】請求項１８の発明に係る情報処理装置は、
請求項１７記載の発明において、請求項１０又は１１に
記載の情報通信システムで使用する発話情報を作成編集
する情報処理装置であって、作成手段が声の高さの基準
を示す第１情報を発話情報に含めて作成するようにした
ので、発話情報の中に声の高さの基準を与えることが可
能である。An information processing apparatus according to the invention of claim 18 is, in the invention of claim 17, an information processing apparatus for creating and editing utterance information used in the information communication system of claim 10 or 11, in which the creating means includes first information indicating a pitch reference in the utterance information it creates. A pitch reference can thus be provided within the utterance information.

【００４０】請求項１９の発明に係る情報処理装置は、
請求項１７記載の発明において、作成手段が各情報を任
意に変更する変更手段を含むようにしたので、音声の品
質を高めるための情報の変更が可能になる。In an information processing apparatus according to the invention of claim 19, in the invention of claim 17, the creating means includes changing means for freely changing each item of information, so that information can be changed to improve speech quality.

【００４１】請求項２０の発明に係る情報処理装置は、
請求項１７記載の発明において、請求項１５又は１６に
記載の情報通信システムで使用する発話情報を作成編集
する情報処理装置であって、作成手段が発話情報を作成
する際に制御情報を発話情報に含めるようにしたので、
他の情報による動作に音声合成の動作を同期させる情報
を発話情報の中に与えることが可能である。An information processing apparatus according to the invention of claim 20 is, in the invention of claim 17, an information processing apparatus for creating and editing utterance information used in the information communication system of claim 15 or 16, in which the creating means includes the control information in the utterance information when creating it. Information that synchronizes the speech synthesis operation with operation based on other information can thus be provided within the utterance information.

【００４２】請求項２１の発明に係る情報通信方法は、
第１通信装置と第２通信装置とを通信網に接続し、第１
通信装置において、声の大きさと声の高さとのいずれか
一方、もしくはその両方とその時間差とを対応させ、声
の大きさと声の高さとのいずれか一方、もしくはその両
方を、音韻間の時間差に依存せず、かつ相対的なレベル
をもつように離散させてなる発話情報を含むファイル情
報をファイル情報記憶部に予め記憶しておくと共に、第
２通信装置において、色の種類別に音韻等の素片単位毎
の音響パラメータを表す声色データを声色記憶部に予め
記憶しておき、通信網を介して第１通信装置と第２通信
装置間で情報通信を行うことでファイル情報記憶部に記
憶されているファイル情報の内の発話情報と声色記憶部
に記憶されている声色データとに基づいて音声合成する
情報通信方法であって、第１通信装置に対する第２通信
装置の要求に応じて前記ファイル情報記憶手段に記憶さ
れた発話情報を第２通信装置へ転送する転送工程と、第
２通信装置において転送工程により転送されてきたファ
イル情報の内の発話情報に基づいて声色記憶部に記憶さ
れている複数種の声色データの内から一つの声色データ
を選定する選定工程と、第２通信装置において発話情報
に含まれる声の大きさと声の高さとのいずれか一方、も
しくはその両方とその時間差とに基づいて時間軸方向で
連続した韻律パターンを展開する展開工程と、第２通信
装置において展開工程により展開された韻律パターンと
選定工程により選定された声色データとに基づいて音声
波形を生成する音声再生工程と、を含むことを特徴とす
る。An information communication method according to the invention of claim 21 is a method in which a first communication device and a second communication device are connected to a communication network; the first communication device stores in advance, in a file information storage unit, file information including utterance information in which either or both of loudness and pitch are associated with their time differences and are made discrete so that they do not depend on the time difference between phonemes and have relative levels; the second communication device stores in advance, in a timbre storage unit, timbre data representing, for each timbre type, acoustic parameters for each segment unit such as a phoneme; and speech is synthesized from the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit through information communication between the first and second communication devices via the communication network. The method includes: a transfer step of transferring the utterance information stored in the file information storage means to the second communication device in response to a request from the second communication device to the first communication device; a selecting step, in the second communication device, of selecting one item of timbre data from among the plural types of timbre data stored in the timbre storage unit based on the utterance information in the file information transferred by the transfer step; an expanding step, in the second communication device, of expanding a prosody pattern that is continuous along the time axis based on either or both of the loudness and pitch contained in the utterance information and their time differences; and a speech reproducing step, in the second communication device, of generating a speech waveform based on the prosody pattern expanded by the expanding step and the timbre data selected by the selecting step.
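
The expanding step turns discrete pitch or loudness events, each stored with the time difference from the previous event, into a pattern that is continuous along the time axis. The sketch below assumes linear interpolation between events and a fixed frame interval; neither is stated in the text, and `expand_prosody` and the event layout are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def expand_prosody(events: list, frame_ms: int, duration_ms: int) -> list:
    """Expand discrete (delta_ms, level) events into one level per frame.

    Each delta is the time since the previous event, so event times need not
    coincide with phoneme boundaries. Deltas after the first must be positive.
    Linear interpolation between events is an assumption of this sketch.
    """
    times = list(accumulate(delta for delta, _ in events))  # absolute times in ms
    levels = [level for _, level in events]
    pattern = []
    for t in range(0, duration_ms + 1, frame_ms):
        i = bisect_right(times, t)
        if i == 0:                # before the first event: hold its level
            pattern.append(levels[0])
        elif i == len(times):     # at or after the last event: hold its level
            pattern.append(levels[-1])
        else:                     # between two events: interpolate linearly
            t0, t1 = times[i - 1], times[i]
            v0, v1 = levels[i - 1], levels[i]
            pattern.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return pattern


# Relative pitch levels at 0 ms, 100 ms, and 300 ms, stored as time differences.
pitch_events = [(0, 1.0), (100, 2.0), (200, 1.0)]
pitch_pattern = expand_prosody(pitch_events, frame_ms=50, duration_ms=300)
```

The same routine could be run once for pitch and once for loudness, since the claim allows either or both to be carried in the utterance information.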

【００４３】この請求項２１の発明によれば、第１通信
装置から第２通信装置へ発話情報を含むファイル情報を
転送し、第２通信装置において、音韻に依存しない声の
大きさや声の高さで時間的に連続した韻律パターンを展
開し、その韻律パターンと発話情報に基づき選定された
声色データとに基づいて音声波形を生成する工程にした
ので、特定の声色に限定しなくても適した声色で音声再
生ができ、かつ波形合成時に声の高さのパターンにずれ
が生じることはなく、このように、発話情報と声色情報
との対応関係を固定しなくても最適の対応関係を得るこ
とで音声合成の高い品質を維持することが可能である。According to the invention of claim 21, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device expands a temporally continuous prosody pattern from loudness and pitch values that do not depend on phonemes and generates a speech waveform from that prosody pattern and the timbre data selected on the basis of the utterance information. Speech can therefore be reproduced with a suitable timbre without being limited to one specific timbre, and the pitch pattern is not displaced during waveform synthesis. Because the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, high speech synthesis quality can be maintained.

【００４４】請求項２２の発明に係る情報通信方法は、
第１通信装置と第２通信装置とを通信網に接続し、第１
通信装置において、声の大きさと声の高さとのいずれか
一方、もしくはその両方と、その時間差と、声色の種類
とを対応させ、声の大きさと声の高さとのいずれか一
方、もしくはその両方を、音韻間の時間差に依存せず、
かつ相対的なレベルをもつように離散させてなる発話情
報を含むファイル情報をファイル情報記憶部に予め記憶
しておくと共に、第１通信装置において、声色の種類別
に音韻等の素片単位毎の音響パラメータを表す声色デー
タを声色記憶部に予め記憶しておき、通信網を介して第
１通信装置と第２通信装置間で情報通信を行うことでフ
ァイル情報記憶部に記憶されているファイル情報の内の
発話情報と声色記憶部に記憶されている声色データとに
基づいて音声合成する情報通信方法であって、第１通信
装置に対する第２通信装置の要求に応じてファイル情報
記憶手段に記憶されたファイル情報を第２通信装置へ転
送する転送手段と、第２通信装置において転送工程によ
り転送されてきたファイル情報の内の発話情報の内の声
色の種類に対応する声色データを声色記憶部に記憶され
ている複数種の声色データの内から選定する選定工程
と、第２通信装置において発話情報に含まれる声の大き
さと声の高さとのいずれか一方、もしくはその両方とそ
の時間差とに基づいて時間軸方向で連続した韻律パター
ンを展開する展開工程と、第２通信装置において展開工
程により展開された韻律パターンと選定工程により選定
された声色データとに基づいて音声波形を生成する音声
再生工程と、を含むことを特徴とする。An information communication method according to the invention of claim 22 is a method in which a first communication device and a second communication device are connected to a communication network; the first communication device stores in advance, in a file information storage unit, file information including utterance information in which either or both of loudness and pitch are associated with their time differences and with a timbre type, and are made discrete so that they do not depend on the time difference between phonemes and have relative levels; the first communication device stores in advance, in a timbre storage unit, timbre data representing, for each timbre type, acoustic parameters for each segment unit such as a phoneme; and speech is synthesized from the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit through information communication between the first and second communication devices via the communication network. The method includes: transfer means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device to the first communication device; a selecting step, in the second communication device, of selecting, from among the plural types of timbre data stored in the timbre storage unit, the timbre data corresponding to the timbre type given in the utterance information of the file information transferred by the transfer step; an expanding step, in the second communication device, of expanding a prosody pattern that is continuous along the time axis based on either or both of the loudness and pitch contained in the utterance information and their time differences; and a speech reproducing step, in the second communication device, of generating a speech waveform based on the prosody pattern expanded by the expanding step and the timbre data selected by the selecting step.

【００４５】この請求項２２の発明によれば、第１通信
装置から第２通信装置へ発話情報を含むファイル情報を
転送し、第２通信装置において、音韻に依存しない声の
大きさや声の高さで時間的に連続した韻律パターンを展
開し、その韻律パターンと発話情報中の声色の種類を示
す情報で選定された声色データとに基づいて音声波形を
生成する工程にしたので、特定の声色に限定しなくても
複数種の声色から直接的に指定した最適の声色で音声再
生ができ、かつ波形合成時に声の高さのパターンにずれ
が生じることはなく、このように、発話情報と声色情報
との対応関係を固定しなくても最適の対応関係を得るこ
とで音声合成の高い品質を維持することが可能である。According to the invention of claim 22, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device expands a temporally continuous prosody pattern from loudness and pitch values that do not depend on phonemes and generates a speech waveform from that prosody pattern and the timbre data selected by the timbre-type information in the utterance information. Speech can therefore be reproduced with the optimum timbre, designated directly from among plural timbres, without being limited to one specific timbre, and the pitch pattern is not displaced during waveform synthesis. Because the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, high speech synthesis quality can be maintained.

【００４６】請求項２３の発明に係る情報通信方法は、
第１通信装置と第２通信装置とを通信網に接続し、第１
通信装置において、声の大きさと声の高さとのいずれか
一方、もしくはその両方と、その時間差と、声色の属性
とを対応させ、声の大きさと声の高さとのいずれか一
方、もしくはその両方を、音韻間の時間差に依存せず、
かつ相対的なレベルをもつように離散させてなる発話情
報を含むファイル情報をファイル情報記憶部に予め記憶
しておくと共に、第２通信装置において、声色の種類別
に、音韻等の素片単位毎の音響パラメータを表す声色デ
ータとその声色の属性を示す情報とを対応させて声色記
憶部に予め記憶しておき、通信網を介して第１通信装置
と第２通信装置間で情報通信を行うことでファイル情報
記憶部に記憶されているファイル情報の内の発話情報と
声色記憶部に記憶されている声色データとに基づいて音
声合成する情報通信方法であって、第１通信装置に対す
る第２通信装置の要求に応じてファイル情報記憶部に記
憶されたファイル情報を第２通信装置へ転送する転送工
程と、第２通信装置において転送工程により転送されて
きたファイル情報の内の発話情報の内の声色の属性を示
す情報と声色記憶部に記憶されている各種声色の属性を
示す情報とを照合して声色の類似度を求める照合工程
と、第２通信装置において照合工程により求められた類
似度に基づいて声色記憶部に記憶されている複数種の声
色データの内から類似度の最も高い声色データを選定す
る選定工程と、第２通信装置において発話情報に含まれ
る声の大きさと声の高さとのいずれか一方、もしくはそ
の両方とその時間差とに基づいて時間軸方向で連続した
韻律パターンを展開する展開工程と、第２通信装置にお
いて展開工程により展開された韻律パターンと選定工程
により選定された声色データとに基づいて音声波形を生
成する音声再生工程と、を含むことを特徴とする。An information communication method according to the invention of claim 23 is a method in which a first communication device and a second communication device are connected to a communication network; the first communication device stores in advance, in a file information storage unit, file information including utterance information in which either or both of loudness and pitch are associated with their time differences and with timbre attributes, and are made discrete so that they do not depend on the time difference between phonemes and have relative levels; the second communication device stores in advance, in a timbre storage unit, for each timbre type, timbre data representing acoustic parameters for each segment unit such as a phoneme in association with information indicating the attributes of that timbre; and speech is synthesized from the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit through information communication between the first and second communication devices via the communication network. The method includes: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a matching step, in the second communication device, of comparing the timbre-attribute information in the utterance information of the file information transferred by the transfer step with the attribute information of the timbres stored in the timbre storage unit to obtain a timbre similarity; a selecting step, in the second communication device, of selecting, based on the similarity obtained by the matching step, the timbre data with the highest similarity from among the plural types of timbre data stored in the timbre storage unit; an expanding step, in the second communication device, of expanding a prosody pattern that is continuous along the time axis based on either or both of the loudness and pitch contained in the utterance information and their time differences; and a speech reproducing step, in the second communication device, of generating a speech waveform based on the prosody pattern expanded by the expanding step and the timbre data selected by the selecting step.

【００４７】この請求項２３の発明によれば、第１通信
装置から第２通信装置へ発話情報を含むファイル情報を
転送し、第２通信装置において、音韻に依存しない声の
大きさや声の高さで時間的に連続した韻律パターンを展
開し、その韻律パターンと発話情報中の声色の属性を示
す情報で類似度によって選定された声色データとに基づ
いて音声波形を生成する工程にしたので、不適な声色を
使用せずに類似度の最も高い声色で音声再生ができ、か
つ波形合成時に声の高さのパターンにずれが生じること
はなく、このように、発話情報と声色情報との対応関係
を固定しなくても最適の対応関係を得ることで音声合成
の高い品質を維持することが可能である。According to the invention of claim 23, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device expands a temporally continuous prosody pattern from loudness and pitch values that do not depend on phonemes and generates a speech waveform from that prosody pattern and the timbre data selected, by similarity, using the timbre-attribute information in the utterance information. Speech can therefore be reproduced with the most similar timbre rather than an unsuitable one, and the pitch pattern is not displaced during waveform synthesis. Because the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, high speech synthesis quality can be maintained.

【００４８】 The information communication method according to the invention of claim 24 connects a first communication device and a second communication device to a communication network. In the first communication device, file information is stored in advance in a file information storage unit. This file information includes utterance information in which voice loudness, voice pitch, or both are associated with their time differences, a voice tone type, and voice tone attributes, and in which the loudness and/or pitch values are placed discretely so that they do not depend on the time differences between phonemes and carry relative levels. In the second communication device, voice tone data representing the acoustic parameters of each segment unit, such as a phoneme, is stored in advance in a voice tone storage unit for each voice tone type, in association with information indicating the attributes of that voice tone. Speech is synthesized from the utterance information in the stored file information and the stored voice tone data through information communication between the first and second communication devices over the communication network. The method comprises: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a search step of searching, in the second communication device, the voice tone types stored in the voice tone storage unit for the voice tone type given in the utterance information of the transferred file information; a first selection step of selecting, in the second communication device, when the search step finds the voice tone type given in the utterance information, the voice tone data corresponding to that type from the voice tone data stored in the voice tone storage unit; a matching step of obtaining, in the second communication device, when the search step does not find the voice tone type given in the utterance information, the voice tone similarity by matching the voice tone attribute information in the utterance information stored in the file information storage unit against the attribute information of the voice tones stored in the voice tone storage unit; a second selection step of selecting, in the second communication device, the voice tone data with the highest similarity from the plural types of voice tone data stored in the voice tone storage unit, based on the similarity obtained in the matching step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous along the time axis, based on the voice loudness, the voice pitch, or both, and on their time differences, as contained in the utterance information; and a speech reproduction step of generating, in the second communication device, a speech waveform based on the prosody pattern expanded in the expansion step and the voice tone data selected in the first or second selection step.
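The search, first selection, matching, and second selection steps above can be illustrated with a minimal sketch. The class name `VoiceTone`, the dictionary keys `voice_kind` and `voice_attributes`, and the exact-match similarity metric are all assumptions made for illustration; the specification does not fix a data layout or a similarity formula at this point.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceTone:
    kind: str                   # voice tone type (hypothetical identifier)
    attributes: Dict[str, str]  # voice tone attributes (hypothetical keys)


def similarity(wanted: Dict[str, str], stored: Dict[str, str]) -> float:
    """Assumed metric: fraction of requested attributes that match exactly."""
    if not wanted:
        return 0.0
    hits = sum(1 for key, value in wanted.items() if stored.get(key) == value)
    return hits / len(wanted)


def select_voice_tone(utterance: dict, store: List[VoiceTone]) -> VoiceTone:
    # Search step and first selection step: look up the requested type.
    for tone in store:
        if tone.kind == utterance.get("voice_kind"):
            return tone
    # Matching step and second selection step: fall back to the stored
    # voice tone whose attributes are most similar (store must be non-empty).
    wanted = utterance.get("voice_attributes", {})
    return max(store, key=lambda tone: similarity(wanted, tone.attributes))
```

In this sketch the attribute comparison runs only when the requested type is absent from the terminal's store, which mirrors the order of the steps in the method.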

【００４９】 According to the invention of claim 24, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from voice loudness and voice pitch values that do not depend on phonemes, and generates a speech waveform from that prosody pattern and the voice tone data selected using the voice tone type and attribute information in the utterance information. Even when the directly specified voice tone is not available, speech can be reproduced with the most similar voice tone rather than an unsuitable one, and no misalignment arises in the pitch pattern during waveform synthesis. In this way, high speech synthesis quality can be maintained by obtaining the optimum correspondence without fixing the correspondence between utterance information and voice tone information.

【００５０】 The information communication method according to the invention of claim 25 connects a first communication device and a second communication device to a communication network. In the first communication device, file information including utterance information that contains phonemes and prosody as information is stored in advance in a file information storage unit. In the second communication device, voice tone data representing the acoustic parameters of each segment unit, such as a phoneme, is stored in advance in a voice tone storage unit for each voice tone type. Speech is synthesized from the utterance information in the stored file information and the stored voice tone data through information communication between the first and second communication devices over the communication network. The method comprises: a transfer step of transferring the utterance information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selection step of selecting, in the second communication device, one item of voice tone data from the plural types of voice tone data stored in the voice tone storage unit, based on the utterance information in the file information transferred in the transfer step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous along the time axis based on the utterance information; and a speech reproduction step of generating, in the second communication device, a speech waveform based on the prosody pattern expanded in the expansion step and the voice tone data selected in the selection step.

【００５１】 According to the invention of claim 25, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected on the basis of the utterance information. Speech can therefore be reproduced with a suitable voice tone without being restricted to one particular voice tone, and no misalignment arises in the pitch pattern during waveform synthesis. In this way, high speech synthesis quality can be maintained by obtaining the optimum correspondence without fixing the correspondence between utterance information and voice tone information.
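The expansion of discrete prosody values into a temporally continuous pattern can be sketched as follows. The text does not state how the values between the discrete points are filled in, so linear interpolation, the 10 ms frame interval, and the function name `expand_prosody` are assumptions chosen only to make the idea concrete.

```python
from bisect import bisect_right


def expand_prosody(points, frame_ms=10):
    """Expand sparse (time_ms, level) points into one level per frame.

    `points` must be sorted by strictly increasing time. Linear
    interpolation between neighbouring points is an assumption.
    """
    times = [t for t, _ in points]
    pattern = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_right(times, t) - 1      # last point at or before t
        if i >= len(points) - 1:
            pattern.append(float(points[-1][1]))
        else:
            (t0, v0), (t1, v1) = points[i], points[i + 1]
            pattern.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        t += frame_ms
    return pattern
```

Because the control points carry their own times, the resulting contour does not depend on where phoneme boundaries fall in whichever voice tone is selected.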

【００５２】 The information communication method according to the invention of claim 26 connects a first communication device and a second communication device to a communication network. In the first communication device, file information including utterance information that contains phonemes, prosody, and a voice tone type as information is stored in advance in a file information storage unit. In the second communication device, voice tone data representing the acoustic parameters of each segment unit, such as a phoneme, is stored in advance in a voice tone storage unit for each voice tone type. Speech is synthesized from the utterance information in the stored file information and the stored voice tone data through information communication between the first and second communication devices over the communication network. The method comprises: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selection step of selecting, in the second communication device, the voice tone data corresponding to the voice tone type given in the utterance information of the transferred file information from the plural types of voice tone data stored in the voice tone storage unit; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous along the time axis based on the utterance information; and a speech reproduction step of generating, in the second communication device, a speech waveform based on the prosody pattern expanded in the expansion step and the voice tone data selected in the selection step.

【００５３】 According to the invention of claim 26, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected using the voice tone type information in the utterance information. Speech can therefore be reproduced with the optimum voice tone, specified directly from among plural types, without being restricted to one particular voice tone, and no misalignment arises in the pitch pattern during waveform synthesis. In this way, high speech synthesis quality can be maintained by obtaining the optimum correspondence without fixing the correspondence between utterance information and voice tone information.

【００５４】 The information communication method according to the invention of claim 27 connects a first communication device and a second communication device to a communication network. In the first communication device, file information including utterance information that contains phonemes, prosody, and voice tone attributes as information is stored in advance in a file information storage unit. In the second communication device, voice tone data representing the acoustic parameters of each segment unit, such as a phoneme, is stored in advance in a voice tone storage unit for each voice tone type, in association with information indicating the attributes of that voice tone. Speech is synthesized from the utterance information in the stored file information and the stored voice tone data through information communication between the first and second communication devices over the communication network. The method comprises: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a matching step of obtaining, in the second communication device, the voice tone similarity by matching the voice tone attribute information in the utterance information of the transferred file information against the attribute information of the voice tones stored in the voice tone storage unit; a selection step of selecting, in the second communication device, the voice tone data with the highest similarity from the plural types of voice tone data stored in the voice tone storage unit, based on the similarity obtained in the matching step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous along the time axis based on the utterance information; and a speech reproduction step of generating, in the second communication device, a speech waveform based on the prosody pattern expanded in the expansion step and the voice tone data selected in the selection step.

【００５５】 According to the invention of claim 27, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected by similarity using the voice tone attribute information in the utterance information. Speech can therefore be reproduced with the most similar voice tone, without using an unsuitable one, and no misalignment arises in the pitch pattern during waveform synthesis. In this way, high speech synthesis quality can be maintained by obtaining the optimum correspondence without fixing the correspondence between utterance information and voice tone information.

【００５６】 The information communication method according to the invention of claim 28 connects a first communication device and a second communication device to a communication network. In the first communication device, file information including utterance information that contains phonemes, prosody, a voice tone type, and voice tone attributes as information is stored in advance in a file information storage unit. In the second communication device, voice tone data representing the acoustic parameters of each segment unit, such as a phoneme, is stored in advance in a voice tone storage unit for each voice tone type, in association with information indicating the attributes of that voice tone. Speech is synthesized from the utterance information in the stored file information and the stored voice tone data through information communication between the first and second communication devices over the communication network. The method comprises: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a search step of searching, in the second communication device, the voice tone types stored in the voice tone storage unit for the voice tone type given in the utterance information of the transferred file information; a first selection step of selecting, in the second communication device, when the search step finds the voice tone type given in the utterance information, the voice tone data corresponding to that type from the voice tone data stored in the voice tone storage unit; a matching step of obtaining, in the second communication device, when the search step does not find the voice tone type given in the utterance information, the voice tone similarity by matching the voice tone attribute information in the utterance information stored in the file information storage unit against the attribute information of the voice tones stored in the voice tone storage unit; a second selection step of selecting, in the second communication device, the voice tone data with the highest similarity from the plural types of voice tone data stored in the voice tone storage unit, based on the similarity obtained in the matching step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous along the time axis based on the utterance information; and a speech reproduction step of generating, in the second communication device, a speech waveform based on the prosody pattern expanded in the expansion step and the voice tone data selected in the first or second selection step.

【００５７】 According to the invention of claim 28, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected using the voice tone type and attribute information in the utterance information. Even when the directly specified voice tone is not available, speech can be reproduced with the most similar voice tone rather than an unsuitable one, and no misalignment arises in the pitch pattern during waveform synthesis. In this way, high speech synthesis quality can be maintained by obtaining the optimum correspondence without fixing the correspondence between utterance information and voice tone information.

【００５８】 In the information communication method according to the invention of claim 29, which depends on any one of claims 23, 24, 27, and 28, the attribute information is any one of gender, age, voice pitch reference, clarity, and naturalness, or a combination of two or more of them. The items to be matched between the attributes held by the utterance information storage means and the attributes held by the voice tone storage means are thereby parameterized, which makes voice tone selection easy.
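One way to picture the parameterized attribute matching is a score computed over the five named attributes. The field names, the numeric ranges (age in years, pitch reference in Hz, clarity and naturalness in the range 0 to 1), the normalization constants, and the equal weighting below are all assumptions; the text only names the attributes and does not define how they are compared.

```python
def attribute_similarity(a, b):
    """Return a score in [0, 1]; 1 means identical attributes.

    Assumed encoding: gender as a label, age in years, pitch_hz as the
    voice pitch reference, clarity and naturalness as values in [0, 1].
    """
    score = 0.0
    score += 1.0 if a["gender"] == b["gender"] else 0.0
    # Differences of 50 years or 200 Hz (assumed caps) count as fully different.
    score += 1.0 - min(abs(a["age"] - b["age"]) / 50.0, 1.0)
    score += 1.0 - min(abs(a["pitch_hz"] - b["pitch_hz"]) / 200.0, 1.0)
    score += 1.0 - abs(a["clarity"] - b["clarity"])
    score += 1.0 - abs(a["naturalness"] - b["naturalness"])
    return score / 5.0
```

A terminal could evaluate such a score against every stored voice tone and keep the highest-scoring one, which corresponds to selecting the voice tone data with the highest similarity.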

【００５９】 In the information communication method according to the invention of claim 30, which depends on any one of claims 21 to 28, the file information storage unit stores first information indicating a voice pitch reference as part of the utterance information, the voice tone storage unit stores second information indicating a voice pitch reference as part of the voice tone data, and the speech reproduction step determines the pitch reference used at speech reproduction by shifting the pitch reference given by the first information according to the pitch reference given by the second information.

【００６０】 According to the invention of claim 30, at speech reproduction the pitch reference of the utterance information storage means is shifted according to the pitch reference of the voice tone storage means. Individual pitch values then vary relatively, following the shifted pitch reference, regardless of the time divisions of the phonemes. Because the pitch reference moves toward that of the voice tone, speech quality can be improved further.
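The pitch reference shift can be sketched numerically. The text says only that the reference from the first information is shifted by the reference from the second information; treating the shift as multiplication by the ratio of the two references, and the function name `shift_pitch`, are assumptions.

```python
def shift_pitch(contour_hz, utterance_ref_hz, voice_ref_hz):
    """Rescale a pitch contour so its reference matches the voice tone's.

    The ratio-based shift is an assumed interpretation; an additive
    offset in Hz would be another possible reading.
    """
    ratio = voice_ref_hz / utterance_ref_hz
    return [f * ratio for f in contour_hz]
```

For example, a contour of 100 Hz and 200 Hz authored against a 100 Hz reference becomes 150 Hz and 300 Hz when played with a voice tone whose reference is 150 Hz, so the relative movement of the pitch is preserved.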

【００６１】 In the information communication method according to the invention of claim 31, which depends on any one of claims 21 to 28, the file information storage unit stores first information indicating a voice pitch reference as part of the utterance information, the speech reproduction step includes an input step in which second information indicating a voice pitch reference is entered arbitrarily, and the pitch reference used at speech reproduction is determined by shifting the pitch reference given by the first information according to the pitch reference given by the second information entered in the input step.

【００６２】 According to the invention of claim 31, at speech reproduction the pitch reference of the utterance information storage means is shifted according to an arbitrary pitch reference. Individual pitch values then vary relatively, following the shifted pitch reference, regardless of the time divisions of the phonemes. The voice tone can therefore be processed, for example brought closer to an intended voice quality according to the amount of shift.

【００６３】 In the information communication method according to the invention of claim 32, which depends on claim 30 or 31, the pitch reference given by the first and second information is the average frequency, the maximum frequency, or the minimum frequency of the voice pitch, which makes the pitch reference easy to obtain.
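The three candidate references can be computed directly from a pitch contour, as in the sketch below. The convention that a value of 0 marks an unvoiced frame, the `mode` parameter, and the function name are assumptions for illustration.

```python
def pitch_reference(contour_hz, mode="mean"):
    """Return the average, maximum, or minimum pitch of the voiced frames."""
    voiced = [f for f in contour_hz if f > 0]   # assume 0 marks unvoiced frames
    if not voiced:
        raise ValueError("contour has no voiced frames")
    if mode == "mean":
        return sum(voiced) / len(voiced)
    if mode == "max":
        return max(voiced)
    if mode == "min":
        return min(voiced)
    raise ValueError("mode must be 'mean', 'max', or 'min'")
```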

【００６４】 In the information communication method according to the invention of claim 33, which depends on any one of claims 21 to 28, the second communication device reads voice tone data from a recording medium and stores it in the voice tone storage unit. The recording medium thus adds variety to the available voice tone types, so that the optimum voice tone can be applied at speech reproduction.

【００６５】 In the information communication method according to the invention of claim 34, which depends on any one of claims 21 to 28, the second communication device receives voice tone data from an external device over a communication line and stores that data in the voice tone storage unit. The communication line thus adds variety to the available voice tone types, so that the optimum voice tone can be applied at speech reproduction.

【００６６】 In the information communication method according to the invention of claim 35, which depends on any one of claims 21 to 28, the utterance information includes control information for synchronizing the operation driven by other information in the file information with the operation of the speech reproduction step, and at speech reproduction the speech reproduction step operates in synchronization with the operation of that other information in accordance with the control information contained in the utterance information. Speech and other media are thereby combined, which strengthens expressive power.

【００６７】 In the information communication method according to the invention of claim 36, which depends on claim 35, the other information is image information, music information, or the like. Speech is thereby combined with images, music, and so on, which strengthens expressive power.
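The role of the synchronization control information can be pictured as timed markers embedded in the utterance information. The marker fields `at_ms`, `target`, and `action`, the polling loop, and both function names are hypothetical; the text does not describe the format of the control information.

```python
def due_events(markers, prev_ms, now_ms):
    """Return control markers whose time falls in (prev_ms, now_ms]."""
    return [m for m in markers if prev_ms < m["at_ms"] <= now_ms]


def play(markers, total_ms, step_ms=100):
    """Step through playback time and log which media action fires when."""
    log = []
    prev = -1
    for now in range(0, total_ms + 1, step_ms):
        for m in due_events(markers, prev, now):
            log.append((now, m["target"], m["action"]))
        prev = now
    return log
```

In a real player the log entries would instead trigger the image display or music playback, so that those media advance in step with the synthesized speech.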

【００６８】 The information processing method according to the invention of claim 37 is a method in which a third communication device connected to the communication network creates and edits the utterance information used in the information communication method of any one of claims 21 to 28. The method comprises: a speech input step of inputting natural speech; a creation step of creating utterance information based on the natural speech input in the speech input step; and a registration transfer step of requesting the first communication device to register file information including the utterance information created in the creation step, and then transferring that file information to the first communication device for registration in the file information storage unit of the first communication device.

【００６９】 According to the invention of claim 37, utterance information is created from the input natural speech by placing voice loudness, voice pitch, or both discretely, so that the values do not depend on the time differences between phonemes and carry relative levels, and this utterance information is transferred to the first communication device and registered in the file information storage means. Voice loudness and voice pitch can therefore be assigned at arbitrary points in time, independent of the time differences between phonemes.
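The creation of discrete, relative-level prosody points from natural speech can be sketched as below. The input is assumed to be a per-frame pitch contour already extracted from the recording; the semitone quantization relative to a reference frequency, the 10 ms frame interval, the treatment of 0 as unvoiced, and the function name `discretize` are all assumptions, since the text does not specify how the levels are derived.

```python
import math


def discretize(contour_hz, ref_hz, frame_ms=10):
    """Reduce a per-frame pitch contour to sparse (time_ms, level) points.

    Levels are semitone offsets from ref_hz (an assumed relative scale).
    A point is emitted whenever the level changes, so point times follow
    the pitch movement rather than phoneme boundaries.
    """
    points = []
    last = None
    for i, f in enumerate(contour_hz):
        if f <= 0:                      # assume 0 marks an unvoiced frame
            continue
        level = round(12 * math.log2(f / ref_hz))
        if level != last:
            points.append((i * frame_ms, level))
            last = level
    return points
```

The resulting points are the kind of sparse, phoneme-independent data that the terminal would later expand into a continuous prosody pattern.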

【００７０】 The information processing method according to the invention of claim 38, which depends on claim 37, creates and edits the utterance information used in the information communication method of claim 30 or 31. The creation step includes first information indicating a voice pitch reference in the utterance information it creates, so a pitch reference can be provided within the utterance information.

【００７１】 In the information processing method according to the invention of claim 39, which depends on claim 37, the creation step includes a change step in which each item of information can be changed arbitrarily, so information can be changed to improve speech quality.

【００７２】 The information processing method according to the invention of claim 40, which depends on claim 37, creates and edits the utterance information used in the information communication method of claim 35 or 36. The creation step includes the control information in the utterance information when creating it, so information that synchronizes the speech synthesis operation with the operation driven by other information can be provided within the utterance information.

【００７３】[0073]

【発明の実施の形態】 DESCRIPTION OF THE PREFERRED EMBODIMENTS A preferred embodiment of the invention is described in detail below with reference to the accompanying drawings. The embodiment described below takes the Internet as an example.

【００７４】図１はこの発明に係る情報通信システムの
一実施の形態を示す構成図である。この情報通信システ
ムは、ホスト装置１（請求項１〜８の第１通信装置）と
複数の端末装置とをＩＳＤＮ網等の通信網ＮＥＴに接続
し、その通信網ＮＥＴを介してホスト装置１と各端末装
置間で情報通信を行う構成である。図１には、複数の端
末装置の代表として、端末装置２（請求項１〜８の第２
通信装置）をひとつ例に挙げる。FIG. 1 is a configuration diagram showing an embodiment of the information communication system according to the present invention. In this system, a host device 1 (the first communication device of claims 1 to 8) and a plurality of terminal devices are connected to a communication network NET such as an ISDN network, and information is communicated between the host device 1 and each terminal device via the communication network NET. FIG. 1 shows one terminal device 2 (the second communication device of claims 1 to 8) as a representative of the plurality of terminal devices.

【００７５】ホスト装置１は、通信網ＮＥＴに接続され
る通信部１０（請求項１〜８の第１通信手段）、データ
ベース（以下にＤＢと称する）１１（請求項１〜８のフ
ァイル情報記憶手段）、制御部１２（請求項１〜８の第
１通信手段、請求項９の登録手段）等から構成されてい
る。The host device 1 includes a communication unit 10 (the first communication means of claims 1 to 8) connected to the communication network NET, a database (hereinafter, DB) 11 (the file information storage means of claims 1 to 8), a control unit 12 (the first communication means of claims 1 to 8 and the registration means of claim 9), and the like.

【００７６】通信部１０は通信網ＮＥＴを介して端末装
置２との情報通信（音声通信含む）を制御するユニット
であり、ＤＢ１１は端末装置２や自装置で作成された発
話情報を含むファイル情報をファイル単位で登録するメ
モリである。制御部１２は、端末装置２のファイル登録
要求に応じてファイルを受信してＤＢ１１に登録した
り、端末装置２の要求に応じて所要のファイル情報をＤ
Ｂ１１から読み出して端末装置２へ転送する等の制御を
実行する。The communication unit 10 is a unit that controls information communication (including voice communication) with the terminal device 2 via the communication network NET. The DB 11 is a memory in which file information including utterance information, created on the terminal device 2 or on the host device itself, is registered file by file. The control unit 12 performs control such as receiving a file in response to a file registration request from the terminal device 2 and registering it in the DB 11, and reading the required file information from the DB 11 and transferring it to the terminal device 2 in response to a request from the terminal device 2.
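The register-and-transfer role of the control unit 12 can be sketched as a small in-memory store. This is only an illustration: the class name `HostDB`, its methods, and the sample record are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class HostDB:
    """In-memory stand-in for DB 11: one file-information record per file name."""
    files: dict = field(default_factory=dict)

    def register(self, name: str, file_info: dict) -> None:
        # Models the control unit 12 registering a file received from a terminal.
        self.files[name] = file_info

    def fetch(self, name: str) -> dict:
        # Models reading the requested file information for transfer to a terminal.
        return self.files[name]


db = HostDB()
# Hypothetical record: placeholder bytes stand in for real utterance and image data.
db.register("A", {"utterance": b"", "image": b"", "program": "<html></html>"})
reply = db.fetch("A")
```

A real host would sit behind the communication unit 10 and handle requests arriving over the network; that layer is omitted here.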

【００７７】上述した発話情報とは、声の大きさと声の
高さとのいずれか一方、もしくはその両方と、その時間
差と、声色の種類とを対応させ、声の大きさと声の高さ
とのいずれか一方、もしくはその両方を、音韻間の時間
差に依存せず、かつ相対的なレベルをもつように離散さ
せてなる情報である。The utterance information mentioned above associates voice loudness, voice pitch, or both with their time differences and with the type of timbre, and is discretized so that the loudness, the pitch, or both do not depend on the time differences between phonemes and carry relative levels.

【００７８】端末装置２は、通信網ＮＥＴに接続される
通信部２０（請求項１〜８の第２通信手段、請求項１７
の登録転送手段）、声色記憶部２１（請求項１〜８の声
色記憶手段）、アプリケーション記憶部２２、スピーカ
２３（請求項１〜８の音声再生手段）、制御部２４（請
求項１〜８の第２通信手段、選定手段、展開手段、及び
音声再生手段、請求項１７の作成手段、及び登録転送手
段）、表示部２５（請求項１７の作成手段）等から構成
されている。The terminal device 2 includes a communication unit 20 (the second communication means of claims 1 to 8 and the registration transfer means of claim 17) connected to the communication network NET, a timbre storage unit 21 (the timbre storage means of claims 1 to 8), an application storage unit 22, a speaker 23 (the voice reproduction means of claims 1 to 8), a control unit 24 (the second communication means, selection means, expansion means, and voice reproduction means of claims 1 to 8, and the creation means and registration transfer means of claim 17), a display unit 25 (the creation means of claim 17), and the like.

【００７９】通信部２０は通信網ＮＥＴを介してホスト
装置１との情報通信（音声通信含む）を制御するユニッ
トであり、声色記憶部２１は声色データを記憶するメモ
リである。上記声色データとは、声色の種類別に音韻等
の素片単位毎の音響パラメータを表すデータである。The communication unit 20 is a unit that controls information communication (including voice communication) with the host device 1 via the communication network NET, and the timbre storage unit 21 is a memory that stores timbre data. Timbre data is data representing, for each type of timbre, the acoustic parameters of each segment unit such as a phoneme.

【００８０】また、アプリケーション記憶部２２は、音
声処理ＰＭ（プログラムメモリ）２２１を有しており、
通信網ＮＥＴや、ＦＤ（フロッピーディスク）、ＣＤ
（コンパクトディスク）−ＲＯＭ等の記録媒体を通して
このナレーション処理ＰＭ２２１のプログラムの追加、
変更、削除等の操作が可能である。The application storage unit 22 has a voice processing PM (program memory) 221. Programs in this narration processing PM 221 can be added, changed, or deleted via the communication network NET or a recording medium such as an FD (floppy disk) or a CD (compact disc)-ROM.

【００８１】このナレーション処理ＰＭ２２１は、図１
６に示したフローチャートに従うファイル転送処理、図
１７及び図１８に示したフローチャートに従う再生処
理、図２３に示したフローチャートに従う発話情報作成
処理、図２４に示したフローチャートに従う新規作成処
理、図２５に示したフローチャートに従う割り込み再生
処理、図３４に示したフローチャートに従う編集処理、
図３５に示したフローチャートに従うファイル登録処理
等を実行するためのプログラムが格納されている。The narration processing PM 221 stores programs for executing the file transfer processing according to the flowchart shown in FIG. 16, the reproduction processing according to the flowcharts shown in FIGS. 17 and 18, the utterance information creation processing according to the flowchart shown in FIG. 23, the new creation processing according to the flowchart shown in FIG. 24, the interrupt reproduction processing according to the flowchart shown in FIG. 25, the editing processing according to the flowchart shown in FIG. 34, the file registration processing according to the flowchart shown in FIG. 35, and the like.

【００８２】図１６のファイル転送処理とは、端末装置
２がホスト装置１に対して所望の発話情報を含むファイ
ル情報を要求し、ホスト装置１より転送されているファ
イル情報を受け取って音声再生等の出力処理を実行する
処理である。The file transfer processing of FIG. 16 is processing in which the terminal device 2 requests file information containing the desired utterance information from the host device 1, receives the file information transferred from the host device 1, and executes output processing such as voice reproduction.

【００８３】図１７及び図１８の再生処理とは、上記フ
ァイル転送処理における音声再生の動作を具体的に示す
処理である。The reproduction process shown in FIGS. 17 and 18 is a process specifically showing the operation of audio reproduction in the file transfer process.

【００８４】図２３の発話情報作成処理とは、自然音声
に基づいて声色データを含まない、離散的な韻律を示す
発話情報の新規作成、編集、及びファイル登録を含む処
理である。The utterance information creation processing of FIG. 23 comprises new creation, editing, and file registration of utterance information that is derived from natural speech, contains no timbre data, and represents discrete prosody.

【００８５】図２４の新規作成処理とは、上記発話情報
作成処理における新規作成動作を具体的に示す処理であ
る。The new creation process of FIG. 24 is a process specifically showing a new creation operation in the above utterance information creation process.

【００８６】図２５の割り込み再生処理とは、上記新規
作成処理または編集処理による動作中に再生要求があっ
た場合の音声再生動作を具体的に示す処理である。The interrupt reproduction process of FIG. 25 is a process specifically showing a sound reproduction operation when a reproduction request is made during the operation of the above-mentioned new creation processing or editing processing.

【００８７】図３４の編集処理とは、上記発話情報作成
処理における編集動作を具体的に示す処理であり、その
編集対象は既に作成済みのファイル（発話情報）とな
る。The editing process in FIG. 34 is a process specifically showing the editing operation in the above-mentioned utterance information creating process, and the editing target is a file (utterance information) that has already been created.

【００８８】図３５のファイル登録処理とは、上記発話
情報作成処理におけるファイル登録動作を具体的に示す
処理である。すなわち、ファイル登録処理とは、端末装
置２がホスト装置１に対して所望のファイル情報を登録
要求し、ホスト装置１へ転送して登録する処理である。The file registration process shown in FIG. 35 is a process specifically showing a file registration operation in the utterance information creation process. That is, the file registration process is a process in which the terminal device 2 requests the host device 1 to register desired file information, and transfers the file information to the host device 1 for registration.

【００８９】スピーカ２３は、上述した再生処理や割り
込み再生処理で発話情報と声色データとの波形合成によ
り再生される合成音声等を出力する音声出力ユニットで
ある。The speaker 23 is a sound output unit that outputs a synthesized voice or the like reproduced by synthesizing the waveform of the utterance information and the voice color data in the above-described reproduction processing or interrupt reproduction processing.

【００９０】表示部２５は、発話情報のファイル作成、
転送、登録等の処理時の表示画面を形成するＬＣＤ、Ｃ
ＲＴ等の表示ユニットである。The display unit 25 is a display unit such as an LCD or CRT that forms the display screens used during processing such as file creation, transfer, and registration of utterance information.

【００９１】次に、ホスト装置１によるファイル情報の
管理形態について詳述する。Next, the manner in which the host device 1 manages file information will be described in detail.

【００９２】図２はホスト装置１のＤＢ１１のメモリ構
成例を示す図である。FIG. 2 is a diagram showing an example of a memory configuration of the DB 11 of the host device 1.

【００９３】ＤＢ１１は、図２に示すように、ファイル
Ａ，Ｂ，Ｃ…に対応させて発話情報を含むファイル情報
を記憶している。例えば、ファイルＡのファイル情報は
発話情報（ヘッダ情報ＨＤＲＡ及び発声情報ＰＲＳ
Ａ）、画像情報ＩＭＧＡ、プログラム情報ＰＲＯＡを対
応付けて記憶されている。同様に、ファイルＢのファイ
ル情報は発話情報（ヘッダ情報ＨＤＲＢ及び発声情報Ｐ
ＲＳＢ）、画像情報ＩＭＧＢ、プログラム情報ＰＲＯＢ
を対応付けて記憶され、ファイルＣのファイル情報は発
話情報（ヘッダ情報ＨＤＲＣ及び発声情報ＰＲＳＣ）、
画像情報ＩＭＧＣ、プログラム情報ＰＲＯＣを対応付け
て記憶されている。なお、システムとしてインターネッ
トを例に挙げていることから、ファイル情報Ａ，Ｂ，Ｃ
…において、プログラム情報ＰＲＯＡ，ＰＲＯＢ，ＰＲ
ＯＣ…はＨＴＭＬ言語で記述された情報であり、ホーム
ページ等を作成する。As shown in FIG. 2, the DB 11 stores file information including utterance information in association with files A, B, C, and so on. For example, the file information of file A is stored with utterance information (header information HDRA and vocalization information PRSA), image information IMGA, and program information PROA associated with one another. Similarly, the file information of file B associates utterance information (header information HDRB and vocalization information PRSB), image information IMGB, and program information PROB, and the file information of file C associates utterance information (header information HDRC and vocalization information PRSC), image information IMGC, and program information PROC. Because the Internet is taken as the example system, the program information PROA, PROB, PROC, ... in the file information A, B, C, ... is written in HTML and creates a home page or the like.

【００９４】図３は発話情報の内のヘッダ情報の一例を
示す図、図４は発話情報の内の発声情報の構成例を示す
図、図５は発声情報内の発声イベントの構成例を示す
図、図６はベロシティのレベル内容を説明する図、そし
て、図７は発声情報内の制御イベントの構成例を示す図
である。FIG. 3 is a diagram showing an example of the header information within the utterance information, FIG. 4 is a diagram showing a configuration example of the vocalization information within the utterance information, FIG. 5 is a diagram showing a configuration example of an utterance event within the vocalization information, FIG. 6 is a diagram explaining the velocity levels, and FIG. 7 is a diagram showing a configuration example of a control event within the vocalization information.

【００９５】ここで、ファイルＡの発話情報を例に挙げ
て説明する。図３には、ファイルＡのヘッダ情報ＨＤＲ
Ａが示されている。このヘッダ情報ＨＤＲＡは、音韻群
ＰＧ、言語コードＬＧ、時間分解能ＴＤ、声色指定デー
タＶＰ、ピッチ基準データＰＢ、及び音量基準データＶ
Ｂより構成される。The utterance information of file A is used as the example here. FIG. 3 shows the header information HDRA of file A. The header information HDRA consists of a phoneme group PG, a language code LG, a time resolution TD, timbre designation data VP, pitch reference data PB, and volume reference data VB.

【００９６】音韻群ＰＧ，言語コードＬＧはそれぞれ後
述の音韻部４２（図８参照）の音韻群，言語コードを指
定するためのデータであり、このデータによって音声合
成に使用する音韻テーブルが特定される。The phoneme group PG and the language code LG are data that designate, respectively, the phoneme group and the language code of the phoneme unit 42 (see FIG. 8) described later; together they identify the phoneme table used for speech synthesis.

【００９７】時間分解能データＴＤは音韻間の時間差の
基本単位時間を指定するデータである。声色指定データ
ＶＰは音声合成時に使用する後述の声色部２１１（図８
参照）のファイルを指定（選択）するためのデータであ
り、このデータによって音声合成時に使用する声色の種
類すなわち声色データが特定される。The time resolution data TD designates the basic unit of time for the time differences between phonemes. The timbre designation data VP designates (selects) the file of the timbre unit 211 (see FIG. 8), described later, that is used during speech synthesis; this data identifies the type of timbre, that is, the timbre data, used for synthesis.

【００９８】ピッチ基準データＰＢは基準となる声の高
さ（ピッチ周波数）を定義するデータである。なお、ピ
ッチ基準の一例として平均ピッチを採用するが、その平
均ピッチの他に、ピッチの最大周波数や最小周波数等の
ように別の基準を適用してもよい。音声波形の合成時に
は、例えばこのピッチ基準データＰＢによるピッチを基
準にして上下１オクターブづつの範囲で変化が可能であ
る。The pitch reference data PB is data for defining a reference voice pitch (pitch frequency). Although an average pitch is used as an example of the pitch standard, another standard such as a maximum frequency or a minimum frequency of the pitch may be applied in addition to the average pitch. At the time of synthesizing the audio waveform, for example, the pitch can be changed in the range of one octave up and down with reference to the pitch based on the pitch reference data PB.

【００９９】音量基準データＶＢは全体の音量の基準を
指定するデータである。The volume reference data VB is data for specifying the reference of the overall volume.
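As a reading aid, the six header fields described above can be gathered into one record. The sketch below is an assumption-laden illustration: the class name, field names, Python types, and sample values are hypothetical, and the text does not state field widths or the unit of TD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderInfo:
    phoneme_group: int      # PG: selects the phoneme group
    language_code: int      # LG: selects the language
    time_resolution: int    # TD: basic unit time for time differences (unit not stated)
    timbre_id: int          # VP: selects the timbre data used for synthesis
    pitch_reference: float  # PB: reference voice pitch, e.g. an average pitch
    volume_reference: int   # VB: reference for the overall volume


# Hypothetical values for file A's header HDRA.
hdr_a = HeaderInfo(phoneme_group=1, language_code=3, time_resolution=10,
                   timbre_id=2, pitch_reference=120.0, volume_reference=64)
```

Making the record immutable reflects that the header sets the defaults for a whole file; mid-stream changes are carried by control events rather than by editing the header.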

【０１００】図４には、ファイルＡの発声情報ＰＲＳＡ
が示されている。発声情報ＰＲＳＡは、時間差データＤ
Ｔとイベントデータ（発声イベントＰＥ、又は制御イベ
ントＣＥ）とを交互に対応付けた構造であり、音韻間の
時間差に依存しないものである。FIG. 4 shows the vocalization information PRSA of file A. The vocalization information PRSA has a structure in which time difference data DT and event data (an utterance event PE or a control event CE) alternate, and it does not depend on the time differences between phonemes.

【０１０１】時間差データＤＴはイベントデータ間の時
間差を指定するデータである。この時間差データＤＴに
より示される時間差の単位は、発話情報の内のヘッダ情
報の時間分解能ＴＤで指定される。Time difference data DT is data for specifying a time difference between event data. The unit of the time difference indicated by the time difference data DT is specified by the time resolution TD of the header information in the speech information.
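Because each DT value is a count of TD units between events, absolute event times follow from a running sum. The sketch below assumes that TD is given in milliseconds and that each DT precedes its event; neither detail is stated in the text, and the function name is hypothetical.

```python
def absolute_times(deltas, time_resolution_ms):
    """Turn DT values (in units of TD) into absolute event times in milliseconds."""
    times, now = [], 0
    for dt in deltas:
        now += dt * time_resolution_ms  # advance by DT units of TD
        times.append(now)
    return times


# Hypothetical stream: the DT values preceding four events, with TD = 10 ms.
print(absolute_times([0, 5, 5, 12], 10))  # [0, 50, 100, 220]
```

Storing differences rather than absolute times keeps the values small and lets one header field rescale the timing of the whole file.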

【０１０２】イベントデータの内の発声イベントＰＥ
は、発声する音韻、相対的に声高さを指定する声の高
さ、相対的に声の強さを指定するベロシティ等で構成さ
れるデータである。Among the event data, the utterance event PE is data composed of the phoneme to be uttered, a voice pitch that specifies the pitch in relative terms, a velocity that specifies the voice strength in relative terms, and the like.

【０１０３】イベントデータの内の制御イベントＣＥ
は、上述した発声イベントＰＥで指定するパラメータ以
外の制御として途中で音量変更等を行うために指定する
データである。Among the event data, the control event CE is data specified to perform control other than the parameters given by the utterance event PE described above, such as changing the volume partway through.

【０１０４】次に、発声イベントＰＥについて図５及び
図６を参照して詳述する。Next, the utterance event PE is described in detail with reference to FIGS. 5 and 6.

【０１０５】発声イベントＰＥは、図５に示した如く、
音韻イベントＰＥ１、ピッチイベントＰＥ２、及びベロ
シティイベントＰＥ３の３種類である。As shown in FIG. 5, there are three types of utterance event PE: the phoneme event PE1, the pitch event PE2, and the velocity event PE3.

【０１０６】音韻イベントＰＥ１は、識別情報Ｐ１、声
の強さ（ベロシティ）、及び音韻コードＰＨを対応付け
た構造を有し、音韻の指定と同時に声の強さを指定する
イベントである。The phoneme event PE1 has a structure that associates identification information P1, a voice strength (velocity), and a phoneme code PH, and is an event that specifies the voice strength at the same time as the phoneme.

【０１０７】音韻イベントＰＥ１の先頭に付加された識
別情報Ｐ１は、イベントの種類が発声イベントＰＥの中
の音韻イベントＰＥ１であることを示す。The identification information P1 added to the head of the phoneme event PE1 indicates that the event type is the phoneme event PE1 in the utterance event PE.

【０１０８】声の強さＶＬは音量（ベロシティ）を指定
するデータであり、感覚的な声の強さとして指定する。The voice strength VL is data that specifies the volume (velocity), and it is specified as a perceived voice strength.

【０１０９】この声の強さＶＬを例えば３ビットの８値
に区分して、各値に楽音の記号を対応付けてみると、図
６に示したように、値“０”、値“１”…値“７”には
それぞれ無音、ピアニッシモ（ｐｐｐ）…フォルテッシ
モ（ｆｆｆ）が対応する。If this voice strength VL is divided into, for example, eight values (3 bits) and each value is matched with a musical dynamics mark, then, as shown in FIG. 6, the values "0", "1", ... "7" correspond to silence, pianissimo (ppp), ... fortissimo (fff), respectively.

【０１１０】実際の声の強さＶＬの値と物理的な声の強
さとは音声合成において声色データに依存することか
ら、例えば母音「ア」と母音「イ」の各声の強さＶＬの
値をいずれも標準値に設定しておけばよく、その標準値
を用いれば声色データによって物理的な声の強さは母音
「イ」よりも母音「ア」の方を大きくすることができ
る。なお、一般的に、母音「ア」の平均的な振幅パワー
は母音「イ」の振幅パワーよりも大きくなる。Because the relation between the value of the voice strength VL and the actual physical voice strength depends on the timbre data used in speech synthesis, the voice strength VL of, for example, the vowel "a" (ア) and the vowel "i" (イ) can both simply be set to the standard value; with that standard value, the timbre data makes the physical voice strength of the vowel "a" larger than that of the vowel "i". In general, the average amplitude power of the vowel "a" is larger than that of the vowel "i".

【０１１１】音韻コードＰＨは、前述の各音韻テーブル
（図１０、図１１、及び図１２参照）にある音韻コード
を指定するデータである。この実施の形態では、音韻コ
ードは１バイトデータとなる。The phoneme code PH is data for specifying a phoneme code in each of the phoneme tables (see FIGS. 10, 11, and 12). In this embodiment, the phoneme code is 1-byte data.

【０１１２】ピッチイベントＰＥ２は、識別情報Ｐ２と
声の高さ（ピッチ）ＰＴとを対応付けた構造を有し、任
意の時点での声の高さを指定するイベントである。この
ピッチイベントＰＥ２は、声の高さを音韻とは独立（音
韻間の時間差に依存せず）に指定できると共に、声の高
さをひとつの音韻の時間区分内にも細かい時間間隔で指
定できる。これらの指定及びその操作は高品位の韻律生
成に必須条件となる。The pitch event PE2 has a structure in which the identification information P2 is associated with the voice pitch (pitch) PT, and is an event for designating the voice pitch at an arbitrary time. In the pitch event PE2, the pitch of the voice can be specified independently of the phonemes (without depending on the time difference between the phonemes), and the pitch of the voice can be specified at a fine time interval within a time segment of one phoneme. . These designations and their operations are essential conditions for generating high-quality prosody.

【０１１３】ピッチイベントＰＥ２の先頭に付加された
識別情報Ｐ２は、イベントの種類が発声イベントＰＥの
内のピッチイベントであることを示す。The identification information P2 added to the head of the pitch event PE2 indicates that the event type is a pitch event in the utterance event PE.

【０１１４】声の高さＰＴは絶対的な声の高さを示すも
のでなく、ヘッダ情報の内のピッチ基準データＰＢで示
されるピッチ基準を基準（中心）に相対的に指定される
データである。The voice pitch PT does not express an absolute pitch; it is data specified relative to the pitch reference given by the pitch reference data PB in the header information, which serves as the center.

【０１１５】この声の高さＰＴを１バイトデータにした
場合には、０〜２５５の階調で表される、ピッチ基準を
基準に上下１オクターブの範囲で値が指定される。声の
高さＰＴを例えばピッチ周波数ｆ［Ｈｚ］で定義する
と、次式（１）の通りである。When the voice pitch PT is 1-byte data, its value is specified in 256 steps, 0 to 255, over a range of one octave above and below the pitch reference. Defined in terms of the pitch frequency f [Hz], for example, the voice pitch PT is given by the following equation (1).

【０１１６】すなわち、ｆ＝ＰＢＶ・（（ＰＴ／２５６）²＋０．５・（ＰＴ／２５６）＋０．５）・・・（１）である。ここで、ＰＢＶはピッチ基準データＰＢで指定
されるピッチ基準の値（単位はＨｚ）である。That is, f = PBV · ((PT/256)² + 0.5 · (PT/256) + 0.5) ... (1), where PBV is the pitch reference value (in Hz) specified by the pitch reference data PB.
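Equation (1) can be checked numerically with a short function. The function name and the 200 Hz reference are hypothetical choices made for illustration; only the formula itself comes from the text.

```python
def pitch_to_hz(pt: int, pbv: float) -> float:
    """Equation (1): f = PBV * ((PT/256)^2 + 0.5*(PT/256) + 0.5)."""
    x = pt / 256
    return pbv * (x * x + 0.5 * x + 0.5)


# With a hypothetical pitch reference of 200 Hz:
print(pitch_to_hz(0, 200.0))    # 100.0, one octave below the reference
print(pitch_to_hz(128, 200.0))  # 200.0, the reference itself
print(pitch_to_hz(255, 200.0))  # just under 400 Hz, about one octave above
```

The midpoint PT = 128 lands exactly on the reference, and the two ends of the byte range land one octave below and just short of one octave above it, which matches the stated range.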

【０１１７】また逆に、次式（２）に従ってピッチ周波
数ｆからピッチ基準ＰＴの値を求めることができる。こ
の式（２）は以下の通りである。Conversely, the value of the voice pitch PT can be obtained from the pitch frequency f according to the following equation (2). Equation (2) is as follows.

【０１１８】すなわち、ＰＴ＝６４・（√（（１６・ｆ／ＰＢＶ）−７）−１）・・・（２）である。That is, PT = 64 · (√(16 · f/PBV − 7) − 1) ... (2).
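Equation (2) is the inverse of the quadratic in equation (1), and a short function makes that easy to verify. The rounding to an integer and the clamping to 0..255 are assumptions added here for a 1-byte PT; the text gives only the formula. The function name is hypothetical.

```python
import math


def hz_to_pitch(f: float, pbv: float) -> int:
    """Equation (2): PT = 64 * (sqrt(16*f/PBV - 7) - 1), rounded and clamped to 0..255."""
    pt = 64 * (math.sqrt(16 * f / pbv - 7) - 1)
    return max(0, min(255, round(pt)))  # assumption: PT is stored in one byte


# With a hypothetical pitch reference of 200 Hz:
print(hz_to_pitch(200.0, 200.0))  # 128
print(hz_to_pitch(100.0, 200.0))  # 0
print(hz_to_pitch(400.0, 200.0))  # 255 (the formula yields 256, clamped here)
```

The square root is only defined for f at or above 7/16 of PBV, so a caller would need to keep f within the one-octave range around the reference before converting.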

【０１１９】ベロシティイベントＰＥ３は、識別情報Ｐ
３と声の強さＶＬとを対応付けた構造を有し、任意の時
点での声の強さを指定するイベントである。このベロシ
ティイベントＰＥ３は、声の強さを音韻とは独立（音韻
間の時間差に依存せず）に指定できると共に、声の強さ
をひとつの音韻の時間区分内にも細かい時間間隔で指定
できる。これらの指定及びその操作は高品位の韻律生成
に必須条件となる。The velocity event PE3 has a structure that associates identification information P3 with a voice strength VL, and is an event that specifies the voice strength at an arbitrary point in time. With the velocity event PE3, the voice strength can be specified independently of the phonemes (without depending on the time differences between phonemes), and it can also be specified at fine time intervals within the time span of a single phoneme. These specifications and operations are essential for generating high-quality prosody.

【０１２０】声の強さＶＬは、基本的には音韻毎に指定
するが、長音化など１つの音韻の途中で声の強さ（ベロ
シティ）を変化させる場合には、音韻とは独立して、任
意の時点にベロシティイベントＰＥ３を適宜追加指定す
る。The voice strength VL is basically specified for each phoneme, but when the voice strength (velocity) is to change partway through a single phoneme, as when a sound is prolonged, velocity events PE3 are additionally specified at arbitrary points in time as needed, independently of the phonemes.

【０１２１】次に、制御イベントＣＥについて図７を参
照して詳述する。Next, the control event CE will be described in detail with reference to FIG.

【０１２２】制御イベントＣＥは、音量イベントＣＥ１
（図７（Ａ）参照）とピッチ基準イベントＣＥ２（図７
（Ｂ）参照）とを定義するイベントである。The control events CE are events that define the volume event CE1 (see FIG. 7(A)) and the pitch reference event CE2 (see FIG. 7(B)).

【０１２３】音量イベントＣＥ１は、識別情報Ｃ１と音
量データＶＢＣとを対応付けた構造を有し、ヘッダ情報
ＨＤＲＡで指定される音量基準データＶＢを途中で変更
するように指定するイベントである。The volume event CE1 has a structure that associates identification information C1 with volume data VBC, and is an event that specifies a change, partway through, of the volume reference data VB given in the header information HDRA.

【０１２４】すなわち、これは全体の音量レベルを上下
に操作する際に使用されるイベントであって、時間軸方
向で次の音量イベントＣＥ１により音量が指定されるま
では音量基準がヘッダ情報ＨＤＲＡで指定される音量基
準データＶＢから指定の音量データＶＢＣに置き換わ
る。In other words, this event is used to raise or lower the overall volume level: the volume reference switches from the volume reference data VB given in the header information HDRA to the specified volume data VBC, and stays there until the next volume event CE1 along the time axis specifies a new volume.

【０１２５】音量イベントＣＥ１の先頭に付加された識
別情報Ｃ１は、制御イベントの種類である音量を示す。The identification information C1 added to the head of the volume event CE1 indicates that the type of control event is volume.

【０１２６】ピッチ基準イベントＣＥ２は、識別情報Ｃ
２とピッチ基準データＰＢＣとを対応付けた構造を有
し、ヘッダ情報ＨＤＲＡで指定されるピッチ基準データ
ＰＢで指定可能な声の高さの範囲を超える場合に指定す
るイベントである。The pitch reference event CE2 has a structure that associates identification information C2 with pitch reference data PBC, and is an event specified when the voice pitch would exceed the range that can be specified with the pitch reference data PB given in the header information HDRA.

【０１２７】すなわち、これは全体のピッチ基準を上下
に操作する際に使用されるイベントであって、時間軸方
向で次のピッチ基準イベントＣＥ２によりピッチ基準が
指定されるまではピッチ基準がヘッダ情報ＨＤＲＡで指
定されるピッチ基準データＰＢから指定のピッチ基準デ
ータＰＢＣに置き換わる。以降、声の高さはピッチ基準
データＰＢＣを中心に上下１オクターブの範囲で変化す
る。In other words, this event is used to raise or lower the overall pitch reference: the pitch reference switches from the pitch reference data PB given in the header information HDRA to the specified pitch reference data PBC, and stays there until the next pitch reference event CE2 along the time axis specifies a new pitch reference. From then on, the voice pitch varies within one octave above and below the pitch reference data PBC.
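The way control events override the header defaults until the next control event of the same kind can be sketched as a single pass over the event list. The tuple layout, the string tags, and the function name below are hypothetical; the patent's actual encoding uses identification information C1 and C2 at the head of each event.

```python
def apply_control_events(events, header_volume, header_pitch_ref):
    """Walk an event list and report the references in force at each utterance event."""
    volume, pitch_ref = header_volume, header_pitch_ref  # header values VB and PB
    in_force = []
    for kind, value in events:
        if kind == "CE1":        # volume event: VBC replaces the volume reference
            volume = value
        elif kind == "CE2":      # pitch reference event: PBC replaces the pitch reference
            pitch_ref = value
        else:                    # any utterance event (PE1, PE2, PE3)
            in_force.append((kind, volume, pitch_ref))
    return in_force


# Hypothetical stream with one volume change and one pitch reference change.
events = [("PE1", "a"), ("CE1", 90), ("PE1", "i"), ("CE2", 300.0), ("PE2", 128)]
result = apply_control_events(events, header_volume=64, header_pitch_ref=200.0)
for row in result:
    print(row)
```

Each utterance event is paired with whatever references were most recently set, so the first event sees the header values and later events see the overrides.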

【０１２８】次に、端末装置２について詳述する。図８
は端末装置２の内部構成を示すブロック図である。Next, the terminal device 2 is described in detail. FIG. 8 is a block diagram showing the internal configuration of the terminal device 2.

【０１２９】端末装置２は、制御部２４、キー入力部２
９（請求項１１の入力手段、請求項１７の作成手段、請
求項１９の変更手段）、アプリケーション記憶部２２、
声色記憶部２１、ＤＢ２６、元波形記憶部２７、マイク
２８（請求項１７の音声入力手段）、スピーカ２３、表
示部２５、インタフェース（Ｉ／Ｆ）３０、ＦＤドライ
ブ３１、ＣＤ−ＲＯＭドライブ３２、通信部２０等のユ
ニットを備えている。The terminal device 2 includes units such as the control unit 24, a key input unit 29 (the input means of claim 11, the creation means of claim 17, and the changing means of claim 19), the application storage unit 22, the timbre storage unit 21, a DB 26, an original waveform storage unit 27, a microphone 28 (the voice input means of claim 17), the speaker 23, the display unit 25, an interface (I/F) 30, an FD drive 31, a CD-ROM drive 32, and the communication unit 20.

【０１３０】制御部２４は、バスＢＳに結合される各ユ
ニットを制御する中央処理ユニットである。この制御部
２４は、キー入力部２のキー操作の検出、アプリケーシ
ョンの実行、声色，音韻，発話にかかる情報の追加・削
除、発話情報の作成・送受信、元波形データの記憶、各
種表示画面の形成等の動作を制御する。The control unit 24 is a central processing unit that controls each unit connected to the bus BS. It controls operations such as detecting key operations on the key input unit 29, executing applications, adding and deleting information on timbres, phonemes, and utterances, creating, sending, and receiving utterance information, storing original waveform data, and forming the various display screens.

【０１３１】この制御部２４は、ＣＰＵ２４１、ＲＯＭ
２４２、ＲＡＭ２４３等を備えている。ＣＰＵ２４１は
ＲＯＭ２４２に格納されたＯＳプログラムやアプリケー
ション記憶部２２に格納されたアプリケーションプログ
ラム（音声処理ＰＭ（プログラムメモリ）３１等）に従
って動作する。The control unit 24 includes a CPU 241, a ROM 242, a RAM 243, and the like. The CPU 241 operates according to the OS program stored in the ROM 242 and the application programs (the voice processing PM (program memory) 31 and so on) stored in the application storage unit 22.

【０１３２】ＲＯＭ２４２はＯＳ（オペレーティングシ
ステム）プログラム等を格納している記録媒体であり、
ＲＡＭ２４３は、上述した各種プログラムのワークエリ
アとして使用するメモリであり、送受信データを一時格
納する際にも使用される。The ROM 242 is a recording medium storing an OS (operating system) program and the like.
The RAM 243 is a memory used as a work area for the various programs described above, and is also used when temporarily storing transmission / reception data.

【０１３３】キー入力部２９は発話情報に関するファイ
ル作成・送受信・ファイリング、声色記憶部２１のファ
イル送受信・ファイリング等の指示を制御部２４にキー
信号として検出させるための各種キーやマウス等の入力
デバイスを具備している。The key input unit 29 has input devices, such as various keys and a mouse, through which instructions such as file creation, transmission, reception, and filing for utterance information, and file transmission, reception, and filing for the timbre storage unit 21, are detected by the control unit 24 as key signals.

【０１３４】アプリケーション記憶部２２は、ナレーシ
ョン処理ＰＭ２２１等のアプリケーションプログラムを
格納した記録媒体である。このアプリケーション記憶部
２２については、通信網ＮＥＴや、ＦＤ（フロッピーデ
ィスク）、ＣＤ（コンパクトディスク）−ＲＯＭ等の他
の記録媒体を通してこのナレーション処理ＰＭ２２１の
プログラムの追加、変更、削除等の操作が可能である。The application storage unit 22 is a recording medium that stores application programs such as the narration processing PM 221. Programs in the narration processing PM 221 can be added, changed, or deleted in the application storage unit 22 via the communication network NET or another recording medium such as an FD (floppy disk) or a CD (compact disc)-ROM.

【０１３５】このナレーション処理ＰＭ２２１は、図１
６に示したフローチャートに従うファイル転送処理、図
１７及び図１８に示したフローチャートに従う再生処
理、図２３に示したフローチャートに従う発話情報作成
処理、図２４に示したフローチャートに従う新規作成処
理、図２５に示したフローチャートに従う割り込み再生
処理、図３４に示したフローチャートに従う編集処理、
図３５に示したフローチャートに従うファイル登録処理
等を実行するためのプログラムが格納されている。The narration processing PM 221 stores programs for executing the file transfer processing according to the flowchart shown in FIG. 16, the reproduction processing according to the flowcharts shown in FIGS. 17 and 18, the utterance information creation processing according to the flowchart shown in FIG. 23, the new creation processing according to the flowchart shown in FIG. 24, the interrupt reproduction processing according to the flowchart shown in FIG. 25, the editing processing according to the flowchart shown in FIG. 34, the file registration processing according to the flowchart shown in FIG. 35, and the like.

【０１３６】図１６のファイル転送処とは、端末装置２
がホスト装置１に対して所望のファイル情報（発話情
報、画像情報等を含む）を要求し、ホスト装置１より転
送されてくるファイル情報を受け取って音声、画像等の
再生を実行する処理である。The file transfer processing of FIG. 16 is processing in which the terminal device 2 requests the desired file information (including utterance information, image information, and the like) from the host device 1, receives the file information transferred from the host device 1, and reproduces the voice, images, and the like.

【０１３７】図１７及び図１８の再生処理とは、上記フ
ァイル転送処理における音声や画像の再生動作を具体的
に示す処理である。The reproduction process shown in FIGS. 17 and 18 is a process specifically showing a reproduction operation of voice and image in the file transfer process.

【０１３８】図２３の発話情報作成処理とは、音韻等の
素片単位毎の音響パラメータを表す声色データを含まな
い自然音声に基づく発話情報（図３〜図７参照）の新規
作成、編集、及びファイリングを含む処理である。The utterance information creation processing of FIG. 23 comprises new creation, editing, and filing of utterance information (see FIGS. 3 to 7) that is based on natural speech and contains no timbre data, that is, no data representing the acoustic parameters of each segment unit such as a phoneme.

【０１３９】図２４の新規作成処理とは、上記発話情報
作成処理における新規作成動作を具体的に示す処理であ
る。The new creation process in FIG. 24 is a process specifically showing a new creation operation in the utterance information creation process.

【０１４０】図２５の割り込み再生処理とは、上記新規
作成処理または編集処理による動作中に再生要求があっ
た場合の音声再生動作を具体的に示す処理である。The interrupt reproduction process shown in FIG. 25 is a process specifically showing an audio reproduction operation when a reproduction request is made during the operation of the above-mentioned new creation processing or editing processing.

【０１４１】図３４の編集処理とは、上記発話情報作成
処理における編集動作を具体的に示す処理であり、その
編集対象は既に作成済みのファイルの発話情報となる。The editing process in FIG. 34 is a process specifically showing the editing operation in the above-mentioned utterance information creating process, and the editing target is the utterance information of a file that has already been created.

【０１４２】図３５のファイル登録処理とは、端末装置
２がホスト装置１に対して所望のファイル情報を登録要
求してホスト装置１へ転送する処理である。The file registration process shown in FIG. 35 is a process in which the terminal device 2 requests the host device 1 to register desired file information and transfers it to the host device 1.

【０１４３】声色記憶部２１は、各種声色を示す声色デ
ータを記憶する記録媒体であり、声色部２１１と音韻部
２１２とで構成される。声色部２１１は声色の種類別に
音韻等の素片単位毎の音響パラメータを表す声色データ
を選択可能に格納しており（図９参照）、音韻部２１２
は各言語の属する音韻群毎に音韻と音韻コードとを対応
させた音韻テーブルを格納している（図１０〜図１３参
照）。The timbre storage unit 21 is a recording medium that stores timbre data representing various timbres, and consists of a timbre unit 211 and a phoneme unit 212. The timbre unit 211 stores, in selectable form, timbre data representing the acoustic parameters of each segment unit such as a phoneme for each type of timbre (see FIG. 9), and the phoneme unit 212 stores phoneme tables that associate phonemes with phoneme codes for each phoneme group to which a language belongs (see FIGS. 10 to 13).

【０１４４】声色部２１１、音韻部２１２のいずれも、
通信回線ＬＮや、ＦＤ、ＣＤ−ＲＯＭ等の記録媒体を通
して声色データや音韻テーブル等を追加したり、キー入
力部２９のキー操作によって削除することが可能であ
る。For both the timbre unit 211 and the phoneme unit 212, timbre data, phoneme tables, and the like can be added via the communication line LN or a recording medium such as an FD or CD-ROM, and deleted by key operations on the key input unit 29.

【０１４５】ＤＢ２６はファイル単位で発話情報を記憶
している。この発話情報とは、離散的な音韻と離散的な
韻律情報（音韻群、発声や発声制御の時間差、声の高
さ、声の大きさ）とで構成される発声情報と、その発声
情報を定義するヘッダ情報（言語、時間分解能、声色指
定、基準となる声の大きさを示すピッチ基準、基準とな
る音量を示す音量基準）とを含んだ情報である。The DB 26 stores utterance information file by file. This utterance information contains vocalization information, made up of discrete phonemes and discrete prosody information (phoneme group, time differences for utterance and utterance control, voice pitch, and voice strength), and header information that defines the vocalization information (language, time resolution, timbre designation, a pitch reference indicating the reference voice pitch, and a volume reference indicating the reference volume).

【０１４６】音声再生時には、発話情報に基づいて離散
的な韻律を連続的な韻律パターンに展開し、ヘッダ情報
に基づく音声の声色を示す声色データと波形合成するこ
とにより音声を再生することができる。During voice reproduction, the discrete prosody is expanded into a continuous prosody pattern based on the utterance information and waveform-synthesized with the timbre data for the timbre designated in the header information, which reproduces the voice.
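Expanding discrete prosody into a continuous pattern amounts to filling in values between the specified points. This passage does not say how the expansion is performed, so the sketch below assumes plain linear interpolation purely for illustration; the function name and sample points are hypothetical.

```python
def expand_pitch(points, step):
    """Linearly interpolate (time, value) pitch points onto a regular time grid."""
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        t = t0
        while t < t1:
            # Assumption: straight-line interpolation between neighbouring points.
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
            t += step
    out.append(points[-1][1])  # close the contour on the final point
    return out


# Hypothetical pitch points at 0, 40 and 60 ms, sampled every 10 ms.
contour = expand_pitch([(0, 100), (40, 140), (60, 120)], 10)
print(contour)
```

A synthesizer would feed such a contour, together with a matching loudness contour, into waveform synthesis with the selected timbre data; a smoother interpolation could be substituted without changing the stored utterance information.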

【０１４７】元波形記憶部２７は発話情報のファイルを
作成するための自然音声を波形データの状態で記憶する
記録媒体である。マイク２８は発話情報のファイルを作
成する等の処理で自然音声を入力する音声入力ユニット
である。The original waveform storage section 27 is a recording medium for storing natural speech for creating a file of speech information in the form of waveform data. The microphone 28 is a voice input unit for inputting natural voice by performing processing such as creating a file of speech information.

【０１４８】スピーカ２３は、上述した再生処理や割り
込み再生処理で再生される合成音声等の音声を出力する
音声出力ユニットである。The speaker 23 is a sound output unit that outputs a sound such as a synthesized sound reproduced in the above-described reproduction processing or interrupt reproduction processing.

【０１４９】表示部２５は発話情報のファイル作成・送
受信・ファイリング等の処理に関連する表示画面を形成
するＬＣＤ、ＣＲＴ等の表示ユニットである。The display unit 25 is a display unit such as an LCD and a CRT which forms a display screen related to processing such as file creation / transmission / reception / filing of speech information.

【０１５０】インタフェース３０はバスＢＳとＦＤドラ
イブ３１やＣＤ−ＲＯＭドライブ３２との間でデータ授
受を行うユニットである。ＦＤドライブ３１は着脱自在
のＦＤ３１ａ（記録媒体）を装着してデータを読み出し
たり書き込む動作を実施する。ＣＤ−ＲＯＭドライブ３
２は着脱自在のＣＤ−ＲＯＭ３２ａ（記録媒体）を装着
してデータを読み出す動作を実施する。The interface 30 is a unit that exchanges data between the bus BS and the FD drive 31 or the CD-ROM drive 32. The FD drive 31 reads and writes data on a removable FD 31a (recording medium) loaded into it. The CD-ROM drive 32 reads data from a removable CD-ROM 32a (recording medium) loaded into it.

【０１５１】なお、ＦＤ３１ａやＣＤ−ＲＯＭ３２ａに
声色データ、音韻テーブル、アプリケーションプログラ
ム等の情報を記憶させておけば、声色記憶部２１、アプ
リケーション記憶部２２等の記憶内容を更新することが
可能である。If information such as timbre data, phoneme tables, and application programs is stored on the FD 31a or the CD-ROM 32a, the stored contents of the timbre storage unit 21, the application storage unit 22, and so on can be updated.

[0152] The communication unit 20 is connected to the communication line LN and communicates with external devices over that line.

[0153] Next, the voice color storage unit 21 is described in detail. FIG. 9 shows an example of the memory configuration of the voice color section 211 of the voice color storage unit 21. As shown in FIG. 9, the voice color section 211 is a memory that stores voice color data VD1, VD2, ... in association with selection numbers 1, 2, .... The available voice colors include male, female, child, adult, and husky voices. Each of the voice color data VD1, VD2, ... includes pitch reference data PB1, PB2, ... indicating the reference pitch of that voice.

[0154] Each voice color data item contains acoustic parameters for every synthesis unit (for example, CVC). Suitable acoustic parameters include LSP parameters, cepstra, and one-pitch-period waveform data.

[0155] Next, the phoneme section 212 is described. FIG. 10 shows an example of the memory configuration of the phoneme section 212 of the voice color storage unit 21; FIG. 11 shows an example of the memory configuration of the voiced phoneme table 33A of the Japanese phoneme table; FIG. 12 shows an example of the memory configuration of the devoiced phoneme table 33B of the Japanese phoneme table; and FIG. 13 shows the correspondence between phonemes and phoneme codes for each language code in the phoneme section 212.

[0156] As shown in FIG. 10, the phoneme section 212 is a memory that stores a phoneme table 212A, which associates a phoneme group with the language code of each language such as English, German, and Japanese, and a phoneme table 212B, which gives the correspondence between phonemes and phoneme codes for each phoneme group.

[0157] A language code is assigned to each language, so languages and language codes correspond one to one. For example, English is assigned language code "1", German "2", and Japanese "3".

[0158] The phoneme group designates the phoneme table to be used for each language. For English and German, for example, the phoneme group designates address ADR1 in the phoneme table 212B, and the Latin phoneme table is used. For Japanese, address ADR2 in the phoneme table 212B is designated, and the Japanese phoneme table is used.

[0159] In more detail, Latin-based languages such as English and German use the phoneme level as the unit of speech, so a single set of phoneme codes serves the characters of several languages. In a language such as Japanese, by contrast, phoneme codes and characters correspond almost one to one.

[0160] The phoneme table 212B is tabular data giving the correspondence between phoneme codes and phonemes. One such table is provided for each phoneme group. For example, the phoneme table for the Latin group (English, German), called the Latin phoneme table, is stored at address ADR1 in memory, and the Japanese phoneme table is stored at address ADR2.

[0161] For example, the phoneme table for Japanese (located at address ADR2) consists of the voiced phoneme table 33A and the devoiced phoneme table 33B, as shown in FIGS. 11 and 12.

[0162] In the voiced phoneme table 33A shown in FIG. 11, voiced phoneme codes are associated with voiced phonemes (characters, represented by character codes). A voiced phoneme code is one byte long; for example, the voiced phoneme character "ア" corresponds to the voiced phoneme code 03h (h denotes hexadecimal).

[0163] A phoneme written as a ka-row character with a small circle "。" at its upper right indicates that the character is pronounced as a nasalized voiced sound (bidakuon). For example, voiced phoneme codes 13h to 17h correspond to the nasalized forms of the characters "カ" to "コ".

[0164] In the devoiced phoneme table 33B shown in FIG. 12, devoiced phoneme codes are associated with devoiced phonemes (characters, represented by character codes). In this embodiment the devoiced phoneme code is also one byte long; for example, the devoiced phoneme character "カ" (written "Ｕカ") corresponds to the devoiced phoneme code A0h. A devoiced phoneme is written with "Ｕ" placed before the character.

[0165] For example, for Japanese, whose language code is "3", the Japanese phoneme table at address ADR2 is used. As in the example of FIG. 13, the characters "ア", "カ", and "ヘ" are then associated with phoneme codes 03h, 09h, and 39h, respectively.

[0166] When the language is English or German, the Latin phoneme table at address ADR1 is used. As in the example of FIG. 13, the English phonemes "a" and "i" are associated with phoneme codes 39h and 05h, respectively, and the German phonemes "a" and "i" are likewise associated with phoneme codes 39h and 05h.

[0167] Thus, as in the example of FIG. 13, phonemes common to English and German, such as "a" and "i", are given the common phoneme codes 39h and 05h.
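The two-stage lookup described above (language code to phoneme group, then phoneme to phoneme code) can be sketched roughly as follows. This is only an illustration: the dictionary layout and the names `LANGUAGE_TO_GROUP`, `PHONEME_TABLES`, and `phoneme_code` are assumptions, and the tables hold only the few entries quoted in the text.

```python
# Phoneme table 212A (sketch): language code -> address of the phoneme group's table.
# 1 = English, 2 = German, 3 = Japanese, as in the description.
LANGUAGE_TO_GROUP = {1: "ADR1", 2: "ADR1", 3: "ADR2"}

# Phoneme table 212B (sketch): one table per phoneme group, phoneme -> 1-byte code.
PHONEME_TABLES = {
    "ADR1": {"a": 0x39, "i": 0x05},                # Latin phoneme table
    "ADR2": {"ア": 0x03, "カ": 0x09, "ヘ": 0x39},  # Japanese phoneme table (voiced entries)
}


def phoneme_code(language_code: int, phoneme: str) -> int:
    """Resolve a phoneme to its code via the phoneme group of the given language."""
    group_address = LANGUAGE_TO_GROUP[language_code]
    return PHONEME_TABLES[group_address][phoneme]
```

Because English and German share the same group address, `phoneme_code(1, "a")` and `phoneme_code(2, "a")` resolve through the same table, which is the sharing the text describes.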

[0168] Next, the DB 26 is described. FIG. 14 shows an example of the memory configuration of the DB 26 of the terminal device 2.

[0169] As shown in FIG. 14, the DB 26 stores file information, including speech information, for files A, D, .... For example, the file information of file A has already been received from the host device 1 and is stored with its speech information (header information HDRA and utterance information PRSA), image information IMGA, and program information PROA associated with one another. Likewise, the file information of file D is stored with its speech information (header information HDRD and utterance information PRSD), image information IMGD, and program information PROD associated with one another. Because the Internet is taken as the example system, the program information PROA, PROD, ... in the file information is written in HTML and is used to build a homepage or the like.

[0170] Next, speech synthesis is described. FIG. 15 is a block diagram conceptually illustrating the voice reproduction processing of this embodiment.

[0171] The voice reproduction processing is performed by the CPU 241 of the control unit 24. The CPU 241 takes in the speech information sequentially and generates synthesized waveform data through the prosody pattern expansion processing PR1 and the synthesized waveform generation processing PR2.

[0172] The prosody pattern expansion processing PR1 receives the utterance information contained in the speech information of the file information received from the host device 1, or of the file information designated for reading from the DB 26, and expands the time difference data DT, pitch PT, and voice strength VL in the utterance events PE into prosody patterns that are continuous on the time axis. Because the utterance events PE have three event patterns, as described earlier, the pitch and loudness of the voice are specified at time differences independent of the phonemes.
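As a rough illustration of what expanding discrete events into a continuous pattern could look like, the sketch below accumulates the time differences into absolute times and samples the values at a fixed step. The text does not say how values between events are obtained, so the linear interpolation, the function name `expand_pattern`, and the `step_ms` parameter are all assumptions made for this example.

```python
from bisect import bisect_right


def expand_pattern(events, time_resolution_ms, step_ms=10):
    """Expand discrete (dt_ticks, value) events into a sampled continuous pattern.

    dt_ticks is the time difference from the previous event, in units of the
    time resolution. Linear interpolation between events is an assumption.
    """
    times, values, t = [], [], 0
    for dt_ticks, value in events:
        t += dt_ticks * time_resolution_ms   # relative time difference -> absolute time
        times.append(t)
        values.append(value)

    pattern = []
    ms = times[0]
    while ms <= times[-1]:
        i = bisect_right(times, ms) - 1      # last event at or before this instant
        if i >= len(times) - 1:
            pattern.append(float(values[-1]))
        else:
            frac = (ms - times[i]) / (times[i + 1] - times[i])
            pattern.append(values[i] + frac * (values[i + 1] - values[i]))
        ms += step_ms
    return pattern
```

The same routine could be applied separately to pitch events and to velocity events, since each carries its own time differences.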

[0173] The voice color storage unit 21 selects voice color data according to the phoneme group PG, voice color designation data VP, and pitch reference data PB specified in the header information of the file information received from the host device 1 or stored in the DB 26, and supplies pitch shift data, which determines the pitch value, to the synthesized waveform generation processing PR2. The time difference, pitch, and velocity are determined as values relative to the time resolution TD, the pitch reference data PB, and the volume reference data VB, respectively.

[0174] The synthesized waveform generation processing PR2 obtains the phoneme sequence and the duration of each phoneme from the phoneme codes PH and the time difference data DT, and stretches or compresses the acoustic parameters of the synthesis units that the voice color data provides for that phoneme sequence to fit those durations.
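A minimal sketch of these two steps follows, under stated assumptions: each phoneme's duration is taken to be the gap to the next phoneme's onset, and the stretching is plain nearest-neighbour resampling of parameter frames. Neither detail is given in the text, and the function names are hypothetical.

```python
def phoneme_durations(events, time_resolution_ms):
    """events: list of (phoneme_code, dt_ticks), dt being the gap to the previous onset.

    Returns (code, duration_ms) for every phoneme except the last, whose end
    is not determined by the onsets alone.
    """
    onsets, t = [], 0
    for code, dt_ticks in events:
        t += dt_ticks * time_resolution_ms
        onsets.append((code, t))
    return [(code, nxt - start) for (code, start), (_, nxt) in zip(onsets, onsets[1:])]


def stretch_frames(frames, target_len):
    """Stretch or compress a sequence of acoustic-parameter frames to target_len frames."""
    if target_len <= 0 or not frames:
        return []
    n = len(frames)
    return [frames[min(n - 1, i * n // target_len)] for i in range(target_len)]
```

A real synthesizer would more likely interpolate parameters than repeat or drop frames; the resampling here only shows where the duration enters.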

[0175] The synthesized waveform generation processing PR2 then performs speech synthesis based on the acoustic parameters and on the temporally continuous pitch and velocity patterns obtained by the prosody pattern expansion processing PR1, yielding synthesized waveform data.

[0176] The actual physical pitch frequency is determined by the pattern obtained in the prosody pattern expansion processing PR1 together with the shift data.

[0177] The synthesized waveform data is converted from digital to analog by the D/A converter 34, which is not shown in FIG. 8, and is then output as sound by the speaker 23 that follows it.

[0178] Next, the operation is described.

[0179] First, file transfer is described. FIG. 16 is a flowchart of the file transfer processing of this embodiment, and FIGS. 17 and 18 are flowcharts of the reproduction processing. FIGS. 19 to 22 show the state transitions of the display screen in response to operations during reproduction processing.

[0180] In this file transfer, the terminal device 2 downloads the desired file information from the host device 1 and reproduces its audio and images.

[0181] Specifically, while the host device 1 and terminal device 2 are communicating, a desired file is first selected at the terminal device 2 by key operation on the key input unit 29 (step T1). For this selection, the host device 1 transfers a list of selectable files during the communication, and the terminal device 2 displays that list on the display unit 25.

[0182] A transfer (download) of the file selected in step T1 is then requested from the host device 1 (step T2). This request is issued as part of the file selection described above.

[0183] When the host device 1 receives any request from the terminal device 2, it accepts the request (step H1) and determines its content (step H2).

[0184] If the request is determined to be a file transfer request (step H3), processing proceeds to step H4 and the file transfer processing is executed; otherwise (step H3), processing proceeds to whatever other processing the determination calls for.

[0185] In the file transfer processing of step H4, the file requested by the terminal device 2 is read from the DB 11 and transferred to the terminal device 2. As for audio, only the speech information that the terminal device 2 needs for voice reproduction is transferred. Because the audio information contains no voice color data, the amount of data transferred is small.
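The host-side flow of steps H1 to H4 amounts to a small request dispatcher. The sketch below is illustrative only: the request format, the `handle_request` name, and the contents of the stand-in `DB11` dictionary are assumptions, not details taken from the description.

```python
def handle_request(request, db11):
    """Steps H1-H4 in miniature: accept a request, classify it, serve file transfers."""
    kind = request.get("type")                       # H2: determine the request content
    if kind == "file_transfer":                      # H3: is it a file transfer request?
        file_info = db11[request["file"]]            # H4: read the file from DB 11
        return {"status": "ok", "file_info": file_info}
    return {"status": "other", "type": kind}         # hand off to other processing


# Hypothetical DB 11 contents. Note that no voice color data is stored or sent:
# the terminal supplies the voice color from its own storage unit 21.
DB11 = {
    "A": {"header": "HDRA", "utterance": "PRSA", "image": "IMGA", "program": "PROA"},
}
```

Keeping voice color data out of the transferred record is what keeps the payload small in this scheme.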

[0186] When the terminal device 2 finishes receiving (downloading) the desired file (step T3), processing proceeds to step T4 and the reproduction processing is executed.

[0187] The reproduction processing reproduces audio and images based on the file information downloaded from the host device 1. If an event is input during reproduction processing and the event is the selection of another file (step T5), processing returns to step T2 and the file transfer request is issued again. If the event is an instruction to end operation (step T6), processing ends. If the event is an instruction for some other processing, that processing is executed.

[0188] The reproduction processing of step T4 is now described with reference to FIG. 15 and FIGS. 17 and 18.

[0189] In the reproduction processing, the reproduction operation starts in accordance with the program information in the transferred file information. First, in step T401, the image information in the file information is read and an image (here, a picture of the Awa Odori dance) is displayed on the display unit 25, as shown in FIG. 19. Because the file information contains speech information, a narration control (hereinafter NC) window 250 is also displayed, as shown in FIG. 19.

[0190] The NC window 250 consists of a stop button 251, a play button 252, a pause button 253, and a fast-forward button 254, and can be moved to any position on the screen by operating the key input unit 29.

[0191] The play button 252 is a soft switch that starts reproduction of the narration (the synthesized speech obtained by generating a speech waveform from the speech information). The fast-forward button 254 is a soft switch that fast-forwards the narration reproduction position in terms of address.

[0192] The stop button 251 is a soft switch that stops the narration reproduction or fast-forward operation started by the play button 252 or fast-forward button 254.

[0193] The pause button 253 is a soft switch that, during narration reproduction, pauses the narration reproduction position at its current address.

[0194] In the next step T402, the speech information in the file information is read and analyzed. First, the voice color designation data VP in the header information of the speech information is referenced to determine whether a voice color has been designated (step T403).

[0195] If a voice color is designated, processing proceeds to step T404; if none is designated, processing proceeds to step T406.

[0196] In step T404, the voice color designated by the voice color designation data VP is searched for in the voice color section 211 of the voice color storage unit 21 to determine whether it is available there.

[0197] If the designated voice color is available, processing proceeds to step T405; if it is not, processing proceeds to step T406.

[0198] In step T405, the voice color found in the voice color storage unit 21 is set as the voice color for voice reproduction. Processing then proceeds to step T407.

[0199] In step T406, which is reached when the header information designates no voice color or the designated voice color is not available in the voice color section 211, the pitch reference data PB in the header information is compared with the pitch reference data PB1, PB2, ... to find the closest reference value, and the voice color with that nearest pitch reference is set as the voice color for voice reproduction. Processing then proceeds to step T407.
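The selection logic of steps T403 to T406 can be condensed into a short function. This is a sketch under assumptions: the header is modeled as a dictionary with an optional `VP` key and a `PB` key, and the voice color section 211 as a dictionary keyed by selection number; the name `select_voice_color` is hypothetical.

```python
def select_voice_color(header, voice_colors):
    """Pick the selection number of the voice color to use for reproduction.

    header: dict with optional "VP" (designated selection number) and "PB" (pitch reference).
    voice_colors: {selection_no: {"PB": pitch_reference, ...}} standing in for section 211.
    """
    vp = header.get("VP")
    if vp is not None and vp in voice_colors:    # T403 and T404 succeed -> T405
        return vp
    # T406: no designation, or the designated voice color is not installed.
    # Fall back to the voice color whose pitch reference is nearest to the header's PB.
    return min(voice_colors, key=lambda no: abs(voice_colors[no]["PB"] - header["PB"]))
```

For example, with installed voice colors at 120 Hz and 230 Hz, a header asking for a missing voice color with `PB` of 200 Hz would fall back to the 230 Hz one.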

[0200] In step T407, the pitch of the voice for speech synthesis may be set through the key input unit 29. This setting is optional. If it is not made, the pitch reference of the voice color section 211 is used; if it is made, the set value is used as the reference value in place of the pitch reference data in the voice color data.

[0201] Processing then proceeds to step T408 and waits for an event input. The possible events include presses of the NC window 250 buttons, designation of another file, and an end instruction.

[0202] For example, when the on-screen cursor (not shown) is moved to the position marked X1 in FIG. 20 and the play button 252 is operated, processing proceeds to step T410 and executes the speech synthesis processing described with FIG. 15 (that is, narration reproduction). In the example of FIG. 20, the narration "徳島の阿波踊りは、世界的にも有名な踊りです。" ("The Awa Odori of Tokushima is a dance famous around the world.") is reproduced from the speaker 23 to accompany the picture of the Awa Odori displayed on the display unit 25.

[0203] If, during speech synthesis, the pitch references of the speech information and the voice color data differ, the voice color storage unit 21 supplies pitch shift data indicating the amount of shift to the synthesized waveform generation processing PR2. PR2 changes the pitch reference according to this pitch shift data, so the pitch of the voice is shifted to match that of the voice color.

[0204] This pitch shift is explained concretely. Suppose the average pitch frequency is used as the pitch reference, and the average pitch frequency of the speech information is 200 Hz while that of the voice color data is 230 Hz. Speech synthesis is then performed with the overall pitch multiplied by 230/200. This produces speech at a pitch suited to the voice color data and improves the voice quality.
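The arithmetic of this example is simple enough to check directly. The sketch below computes the scaling factor 230/200 = 1.15 and applies it to a pitch contour; the function names and the sample contour are illustrative assumptions.

```python
def pitch_shift_ratio(speech_pb_hz, voice_pb_hz):
    """Factor by which the whole pitch contour is scaled to suit the voice color."""
    return voice_pb_hz / speech_pb_hz


def shift_contour(contour_hz, ratio):
    """Scale every pitch value of a contour by the same ratio."""
    return [f * ratio for f in contour_hz]


ratio = pitch_shift_ratio(200.0, 230.0)           # 230/200 = 1.15
shifted = shift_contour([180.0, 200.0, 220.0], ratio)
```

Scaling by a constant ratio preserves the shape of the intonation while moving its average from the speech information's reference to the voice color's reference.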

[0205] The pitch reference need not be expressed as a frequency; another representation, such as the period, may be used instead.

[0206] Once speech synthesis has started in step T410, processing immediately returns to step T408 and waits for the next event input.

[0207] Suppose voice reproduction has started in step T410 and the narration has been reproduced as far as "徳島の阿波踊りは、" when the pause button 253 at the position marked X2 in FIG. 21 is operated (step T413). Processing then proceeds to step T414 and pauses narration reproduction at the position of the "、".

[0208] Processing then returns to step T408 and waits for the next event input. If the pause button 253 is operated again, or the play button 252 is operated, the event is judged in step T409 to be a voice reproduction event, and in the following step T410 the narration resumes from just after the pause position. That is, as shown in FIG. 22, the narration "世界的にも有名な踊りです。" is reproduced.

[0209] If the stop button 251 is operated during narration reproduction (step T411), processing proceeds to step T412, which stops the narration and, even if reproduction was only partway through, resets the next reproduction position to the beginning of the narration.

[0210] If the fast-forward button 254 is operated during narration reproduction or while waiting for an event (step T415), processing proceeds to step T416, which either speeds up the narration being reproduced or fast-forwards the narration reproduction position on the memory counter.
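The button behavior described for the NC window 250 can be summarized as a small state machine. The sketch below is a simplification under assumptions: the class name, the state names, and the integer `position` counter are hypothetical, and fast-forward is modeled only as advancing the position, not as faster speech.

```python
class NarrationControl:
    """Minimal model of the NC window buttons (251 stop, 252 play, 253 pause, 254 fast-forward)."""

    def __init__(self, length):
        self.length = length        # total narration length in arbitrary units
        self.position = 0           # current reproduction position
        self.state = "stopped"

    def play(self):                 # T409 -> T410: start, or resume after a pause
        self.state = "playing"

    def pause(self):                # T413 -> T414; pressing pause again resumes
        if self.state == "playing":
            self.state = "paused"
        elif self.state == "paused":
            self.state = "playing"

    def stop(self):                 # T411 -> T412: stop and rewind to the beginning
        self.state = "stopped"
        self.position = 0

    def fast_forward(self, amount): # T415 -> T416: advance the position counter
        self.position = min(self.length, self.position + amount)
```

The key distinction the text draws is preserved: pause keeps the position so that reproduction resumes from it, whereas stop always resets the position to the start.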

[0211] If, while waiting for an event, an instruction to request other file information from the host device 1 or an instruction to end this processing is input, processing returns from the reproduction processing to the file transfer processing (main processing).

[0212] Next, file processing in the terminal device 2 is described. FIG. 23 is a flowchart of the speech information creation processing of this embodiment; FIG. 24 is a flowchart of the new creation processing; FIG. 25 is a flowchart of the interrupt reproduction processing; FIGS. 26 to 33 show the state transitions of the operation screen during new creation processing; and FIG. 34 is a flowchart of the editing processing.

[0213] This file processing includes speech information creation processing, interrupt reproduction processing, reproduction processing, and so on. The speech information creation processing comprises new creation processing and editing processing.

[0214] In the speech information creation processing shown in FIG. 23, a process is first selected by key operation on the key input unit 29 (step S1). The selected process is then identified. If new creation is selected (step S2), processing proceeds to step S3 and executes the new creation processing (see FIG. 24). If editing is selected (step S4), processing proceeds to step S5 and executes the editing processing (see FIG. 34).

[0215] When either the new creation processing (step S3) or the editing processing (step S5) finishes, processing proceeds to step S6 and checks for an end instruction. If an end instruction has been given, processing ends; otherwise it returns to step S1.

[0216] The new creation processing is now described with reference to FIGS. 26 to 33. First, the header information and utterance information that make up the speech information are initialized, as is the creation screen used for file creation (step S101).

[0217] Next, natural speech is newly input through the microphone 28, or a file of original speech information (waveform data) already registered in the original waveform storage unit 27 is opened (step S102), and the original waveform is displayed on the creation screen (step S103). When natural speech is newly input, it is analyzed, digitized by the D/A converter 34, and then displayed on the display unit 25 as waveform data.

[0218] As shown in FIG. 26, the creation screen on the display unit 25 consists of a phoneme display window 25A, an original waveform display window 25B, a synthesized waveform display window 25C, a pitch display window 25D, a velocity display window 25E, an original speech play/stop button 25F, a synthesized speech play/stop button 25G, a pitch reference setting scale 25H, and so on.

[0219] On this creation screen, the original waveform produced by voice input or by opening a file is displayed in the original waveform display window 25B, as shown in FIG. 26.

[0220] In the next step S104, labels that divide the phonemes in time are placed manually so that the duration of each phoneme can be set against the original waveform displayed in the original waveform display window 25B. To place a label, the user operates the key input unit 29 to move the on-screen cursor into the synthesized waveform display window 25C, located below the original waveform display window 25B, and designates the desired position. An input device such as a mouse makes designating label positions easier.

[0221] FIG. 27 shows an example in which 11 labels have been placed in the synthesized waveform display window 25C. Each label is extended into the windows above and below the synthesized waveform display window 25C, namely the phoneme display window 25A, the original waveform display window 25B, the pitch display window 25D, and the velocity display window 25E, so that the parameters are aligned on the time axis.

【０２２２】入力された自然音声の言語が例えば日本語
であった場合には、続くステップＳ１０５において、日
本語の音韻（文字）が音韻表示ウインド２５Ａに入力さ
れる。この場合にもラベル付与と同様にキー入力部２９
を用いてマニュアル操作によって音韻入力が行われ、各
音韻は音韻表示ウインド２５Ａ内のラベルで仕切られた
スペースに設定される。If the language of the input natural speech is, for example, Japanese, then in the subsequent step S105 Japanese phonemes (characters) are input to the phoneme display window 25A. As with labeling, the phonemes are input manually using the key input unit 29, and each phoneme is set in a space delimited by labels in the phoneme display window 25A.

【０２２３】図２８には、時間軸上、先頭から「ヨ」、
「ロ」、「Ｕシ」、「イ」、「デ」、「Ｕ
ス」、「、」、「カ」の順で音韻入力された一例が示さ
れている。この入力された音韻の内、「Ｕシ」及び「Ｕ
ス」は無音声化音韻を示し、その他は有音声化音韻を示
す。FIG. 28 shows an example in which the phonemes "ヨ" (yo), "ロ" (ro), "Ｕシ", "イ" (i), "デ" (de), "Ｕス", "、", and "カ" (ka) have been input in this order from the start of the time axis. Of these input phonemes, "Ｕシ" and "Ｕス" denote devoiced phonemes, and the others denote voiced phonemes.

【０２２４】続くステップＳ１０６では、元波形表示ウ
インド２５Ｂに表示されている元波形のピッチ分析が行
われる。In the following step S106, pitch analysis of the original waveform displayed on the original waveform display window 25B is performed.

【０２２５】図２９には、ピッチ表示ウインド２５Ｄに
表示されピッチ分析後の元波形のピッチパターンＷ１
（図２９中、実線部分）と合成波形のピッチパターンＷ
２（図２９中、ラベル位置に丸で結ばれる波線部分）と
が例えば異なる色で示されている。In FIG. 29, the pitch pattern W1 of the original waveform after pitch analysis (solid line in FIG. 29) and the pitch pattern W2 of the synthesized waveform (broken-line portion joined by circles at the label positions in FIG. 29) are displayed in the pitch display window 25D, for example in different colors.

【０２２６】続くステップＳ１０７では、ピッチ調整が
行われる。このピッチ調整には、ピッチラベルの追加、
時間軸方向の移動、削除にそれぞれ伴ってピッチ値の追
加、移動（時間軸方向やレベル方向）、削除の操作等が
含まれている。In the subsequent step S107, pitch adjustment is performed. This pitch adjustment includes operations such as adding, moving (in the time-axis direction or the level direction), and deleting pitch values, which accompany the addition, time-axis movement, and deletion of pitch labels, respectively.

【０２２７】このピッチ調整とは、具体的には、元波形
のピッチパターンをユーザが目視により参照して、合成
波形のピッチパターンＷ２をマニュアル操作で設定する
ものであり、その際、元波形のピッチパターンＷ１は固
定である。合成波形のピッチパターンＷ２は時間軸上の
ラベル位置の点ピッチで指定され、音韻の時間区分に依
存しない時間差をもつラベル間を直線で補間したもので
ある。Specifically, in this pitch adjustment the user visually refers to the pitch pattern of the original waveform and manually sets the pitch pattern W2 of the synthesized waveform; the pitch pattern W1 of the original waveform remains fixed. The pitch pattern W2 of the synthesized waveform is specified by point pitches at label positions on the time axis, and is obtained by linearly interpolating between labels whose time differences do not depend on the time division of the phonemes.
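
The linear interpolation between pitch labels described above can be illustrated as follows. This is a minimal sketch, assuming that the label points are held as a list of (time, pitch) tuples sorted by strictly increasing time; the function name `pitch_at` and this representation are hypothetical and do not appear in the specification.

```python
from bisect import bisect_right


def pitch_at(points, t):
    """Return the pitch at time t by linear interpolation between label points.

    points: list of (time, pitch) tuples with strictly increasing times
            (a hypothetical representation of the pitch labels).
    Outside the labelled range, the nearest end point's pitch is held.
    """
    if not points:
        raise ValueError("at least one pitch point is required")
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    # Index of the first label strictly later than t.
    i = bisect_right(times, t)
    (t0, p0), (t1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
```

Because the label times are independent of the phoneme boundaries, any number of points may fall inside one phoneme.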

【０２２８】ピッチラベルの調整では、図３０に示した
ように、音韻を仕切るラベル間にさらにラベルを追加す
ることができる。追加の操作は、ピッチ表示ウインド２
５Ｄ内に直接マウス等でＤ１，Ｄ３，Ｄ４，Ｄ５に示し
たようにラベル位置を指定すればよい。このようにして
新たに付与されたピッチは隣り合うピッチと直線で結ば
れるので、ひとつの音韻の中に所望のピッチ変化を与え
ることができ、理想とする韻律に加工することが容易で
ある。In adjusting the pitch labels, further labels can be added between the labels that delimit the phonemes, as shown in FIG. 30. To add a label, the label position is designated directly in the pitch display window 25D with a mouse or the like, as indicated by D1, D3, D4, and D5. Because each newly added pitch is connected to its neighboring pitches by straight lines, a desired pitch change can be given within a single phoneme, which makes it easy to shape the prosody as desired.

【０２２９】また、移動の操作では、ピッチ表示ウイン
ド２５Ｄ内に直接マウス等でＤ２に示したようにピッチ
ラベルの移動先を指定すればよい。このピッチラベルの
移動でも、ピッチは隣り合うピッチと直線で結ばれるの
で、ひとつの音韻の中に所望のピッチ変化を与えること
ができ、理想とする韻律に加工することが容易である。In the movement operation, the destination of the pitch label may be directly designated in the pitch display window 25D by a mouse or the like as indicated by D2. Even in the movement of the pitch label, the pitch is connected to the adjacent pitch by a straight line, so that a desired pitch change can be given in one phoneme, and it is easy to process into the ideal prosody.

【０２３０】なお、ピッチラベルからピッチを削除して
も同様に、ピッチはその削除したピッチを除く隣り合う
ピッチと直線で結ばれるので、ひとつの音韻の中に所望
のピッチ変化を与えることができ、理想とする韻律に加
工することが容易である。Similarly, when a pitch is deleted from the pitch labels, the pitches adjacent to the deleted one are connected to each other by a straight line, so a desired pitch change can still be given within a single phoneme, and it is easy to shape the prosody as desired.
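
The add, move, and delete operations on pitch labels can be sketched as follows. This is only an illustration under the assumption that the labels are kept as a time-sorted list of (time, pitch) tuples; the function names are hypothetical. In each case the neighboring points are simply reconnected by straight segments.

```python
def add_point(points, t, pitch):
    """Return a new time-sorted list with a pitch label added at time t."""
    return sorted(points + [(t, pitch)])


def move_point(points, index, new_t, new_pitch):
    """Return a new list with one label moved in the time and/or level direction."""
    remaining = points[:index] + points[index + 1:]
    return sorted(remaining + [(new_t, new_pitch)])


def delete_point(points, index):
    """Return a new list without the label at the given index."""
    return points[:index] + points[index + 1:]


def segments(points):
    """Return the straight segments joining each pair of adjacent labels."""
    return list(zip(points, points[1:]))
```

After `delete_point`, `segments` yields a single segment joining the two former neighbors of the deleted label.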

【０２３１】この場合には、発声イベントＰＥ１が設定
される。In this case, an utterance event PE1 is set.

【０２３２】続くステップＳ１０８では、ピッチまで調
整した段階での合成波形が生成され、例えば図３１に示
したように、合成波形表示ウインド２５Ｃに表示形成さ
れる。このとき、ベロシティは未設定のため、図３１に
示したようにベロシティ表示ウインド２５Ｅにはプレー
ンなベロシティが表示される。In the following step S108, a composite waveform at the stage adjusted to the pitch is generated, and is displayed and formed on the composite waveform display window 25C, for example, as shown in FIG. At this time, since the velocity is not set, a plain velocity is displayed in the velocity display window 25E as shown in FIG.

【０２３３】また、ステップＳ１０８において合成波形
を表示した段階で元音声と合成音声とを比較再生させる
ことが可能である。この段階では、合成させる声色の種
類はデフォルトの声色とする。Once the synthesized waveform has been displayed in step S108, the original voice and the synthesized voice can be reproduced for comparison. At this stage, the default timbre is used for synthesis.

【０２３４】元音声を再生させる場合には、元音声再生
／停止ボタン２５Ｆを操作し、その再生を停止させる場
合には、もう一度元音声再生／停止ボタン２５Ｆを操作
すればよい。また、合成音声を再生させる場合には、合
成音声再生／停止ボタン２５Ｇを操作し、その再生を停
止させる場合には、もう一度合成音声再生／停止ボタン
２５Ｇを操作すればよい。When the original sound is reproduced, the original sound reproduction / stop button 25F is operated, and when the reproduction is stopped, the original sound reproduction / stop button 25F is operated again. When the synthesized voice is reproduced, the synthesized voice reproduction / stop button 25G is operated, and when the reproduction is stopped, the synthesized voice reproduction / stop button 25G is operated again.

【０２３５】以上の再生処理は、新規作成処理もしくは
後述の編集処理の中で割り込み再生処理として実行され
る。その詳細は図２５に示した動作である。すなわち、
ステップＳ２０１において、まず、元音声再生／停止ボ
タン２５Ｆもしくは合成音声再生／停止ボタン２５Ｇの
操作に応じてまず再生対象が元音声かそれとも合成音声
か判別される。The reproduction processing described above is executed as interrupt reproduction processing within the new creation processing or the editing processing described later. Its details are shown in FIG. 25. That is, in step S201, it is first determined, according to the operation of the original voice play/stop button 25F or the synthesized voice play/stop button 25G, whether the reproduction target is the original voice or the synthesized voice.

【０２３６】そして、元音声という判別結果が得られた
場合には（ステップＳ２０２）、処理はステップＳ２０
３に移行して元波形により元音声を再生出力し、一方、
合成音声という判別結果が得られた場合には（ステップ
Ｓ２０２）、処理はステップＳ２０４に移行して合成波
形により合成音声を再生出力する。この後、処理は新規
作成処理の割り込み時点の動作に戻る。If the determination result is the original voice (step S202), the processing proceeds to step S203, where the original voice is reproduced and output from the original waveform. If the determination result is the synthesized voice (step S202), the processing proceeds to step S204, where the synthesized voice is reproduced and output from the synthesized waveform. The processing then returns to the point at which the new creation processing was interrupted.

【０２３７】さて、新規作成処理の説明に戻り、続くス
テップＳ１０９において音韻の音量を表すベロシティが
マニュアル操作で調整される。このベロシティの調整
は、図３２に示したように、予め決められた段階の範囲
（例えば１６段）で行われる。Now, returning to the description of the new creation processing, in step S109, the velocity representing the volume of the phoneme is manually adjusted. This velocity adjustment is performed within a predetermined range of steps (for example, 16 steps) as shown in FIG.

【０２３８】このベロシティ調整についても、前述のピ
ッチ調整と同様に、音韻間の時間区分に依存せず、時間
軸上で音韻の時間差よりもさらに細かく、任意の時点で
声の強さに変化を与えることができる。As with the pitch adjustment described above, this velocity adjustment does not depend on the time division between phonemes; the voice intensity can be changed at any point on the time axis, at a resolution finer than the time difference between phonemes.

【０２３９】例えば、図３２に示されるベロシティ表示
ウインド２５Ｅ中の音韻“カ”の時間区分のベロシティ
Ｅ１を、図３３に示した如く、ベロシティＥ１１とＥ１
２とに細分化することができる。このベロシティ調整
も、ピッチ調整の場合と同様に、ベロシティ表示ウイン
ド２５Ｅに対してキー入力部２９の操作で設定する。For example, the velocity E1 in the time division of the phoneme "カ" (ka) in the velocity display window 25E shown in FIG. 32 can be subdivided into velocities E11 and E12, as shown in FIG. 33. As with the pitch adjustment, this velocity adjustment is set by operating the key input unit 29 on the velocity display window 25E.
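
The subdivision of one velocity segment into two (E1 into E11 and E12) can be sketched as follows. This is a minimal sketch, assuming that velocity segments are held as (start, end, level) tuples and that the 16 steps mentioned as an example are numbered 0 to 15; both the representation and the numbering are assumptions made for illustration.

```python
VELOCITY_LEVELS = 16  # example number of steps given in the text


def clamp_velocity(level):
    """Clamp a level to the assumed range 0 .. VELOCITY_LEVELS - 1."""
    return max(0, min(VELOCITY_LEVELS - 1, int(level)))


def split_segment(segments, index, split_time, first_level, second_level):
    """Split one (start, end, level) segment into two at split_time.

    The split point need not coincide with a phoneme boundary.
    """
    start, end, _ = segments[index]
    if not (start < split_time < end):
        raise ValueError("split_time must lie strictly inside the segment")
    new_parts = [
        (start, split_time, clamp_velocity(first_level)),
        (split_time, end, clamp_velocity(second_level)),
    ]
    return segments[:index] + new_parts + segments[index + 1:]
```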

【０２４０】このベロシティ調整後に再び合成音声の再
生が操作されると、音韻の時間差に依存しない時間差で
音声の強さが変化してプレーンなベロシティ状態に比べ
て音声に抑揚を付加することができる。なお、声の強さ
（ベロシティ）の時間区分については、ピッチ調整によ
り得られたピッチラベルの時間区分に同期させるように
してもよい。When reproduction of the synthesized voice is operated again after this velocity adjustment, the voice intensity changes at time intervals that do not depend on the time differences between phonemes, so that intonation is added to the voice compared with the plain velocity state. The time division of the voice intensity (velocity) may also be synchronized with the time division of the pitch labels obtained by the pitch adjustment.

【０２４１】この後、ステップＳ１１０において新規作
成処理の終了操作が判別され、もし終了操作が実行され
た場合には、処理はステップＳ１１７に移行して新規フ
ァイリング処理を実行する。この新規ファイリング処理
では、ファイル名が入力され、そのファイル名に対応さ
せて新規作成ファイルがＤＢ２６に記憶される。そのフ
ァイル名が「Ａ」であれば、図１４に示した如くヘッダ
情報ＨＤＲＡ及び発声情報ＰＲＳＡの形で発話情報が記
憶される。Thereafter, in step S110 it is determined whether an end operation for the new creation processing has been performed; if so, the processing proceeds to step S117, where new filing processing is executed. In the new filing processing, a file name is input and the newly created file is stored in the DB 26 under that file name. If the file name is "A", the utterance information is stored in the form of header information HDRA and vocalization information PRSA, as shown in FIG. 14.

【０２４２】また、ステップＳ１１０において終了操作
はなく、ベロシティの変更（ステップＳ１１１）、ピッ
チの変更（ステップＳ１１２）、音韻の変更（ステップ
Ｓ１１３）、ラベルの変更（ステップＳ１１４）、声色
設定の変更（ステップＳ１１５）のいずれかの操作が判
別されると、処理は各変更要求に応じた処理に移行す
る。If no end operation is detected in step S110 and any one of a velocity change (step S111), a pitch change (step S112), a phoneme change (step S113), a label change (step S114), or a timbre setting change (step S115) is detected, the processing proceeds to the processing corresponding to that change request.

【０２４３】すなわち、ベロシティの変更であれば（ス
テップＳ１１１）、処理はステップＳ１０９に戻ってベ
ロシティの値を音韻単位でマニュアル操作に応じて変更
する。また、ピッチの変更であれば（ステップＳ１１
２）、処理はステップＳ１０７に戻ってピッチの値をラ
ベル単位でマニュアル操作に応じて変更（追加、削除含
む）する。That is, for a velocity change (step S111), the processing returns to step S109, and the velocity value is changed in phoneme units according to the manual operation. For a pitch change (step S112), the processing returns to step S107, and the pitch value is changed (including addition and deletion) in label units according to the manual operation.

【０２４４】また、音韻の変更であれば（ステップＳ１
１３）、処理はステップＳ１０５に戻って音韻をマニュ
アル操作に応じて変更する。また、ラベルの変更であれ
ば（ステップＳ１１４）、処理はステップＳ１０４に戻
ってラベルをマニュアル操作に応じて変更する。なお、
このラベル変更とピッチ変更では、その変更後にピッチ
間隔に応じて合成波形のピッチパターンＷ２が変化す
る。For a phoneme change (step S113), the processing returns to step S105, and the phoneme is changed according to the manual operation. For a label change (step S114), the processing returns to step S104, and the label is changed according to the manual operation. After a label change or a pitch change, the pitch pattern W2 of the synthesized waveform changes according to the new pitch intervals.

【０２４５】また、声色設定の変更であれば（ステップ
Ｓ１１５）、処理はステップＳ１１６に移行してマニュ
アル操作に応じて所望の声色の種類に設定変更する。こ
の声色設定の変更により再び合成音声を再生すると、音
声の特徴が変わることから、例えば自然音声が男性の声
色であっても声色の変更により女性等の声色に変化させ
ることができる。For a timbre setting change (step S115), the processing proceeds to step S116, and the setting is changed to the desired timbre type according to the manual operation. When the synthesized voice is reproduced again after this change, the characteristics of the voice change; for example, even if the natural speech was in a male timbre, it can be changed to a female or other timbre.

【０２４６】なお、ステップＳ１０９の処理の後に終了
操作が検出されず、パラメータの変更操作も検出されな
い間は、ステップＳ１１５より再びステップＳ１１０に
戻る処理が繰り返し実行される。It should be noted that as long as no end operation is detected after the process of step S109 and no parameter change operation is detected, the process of returning from step S115 to step S110 is repeatedly executed.

【０２４７】また、各パラメータの変更は変更指定され
たパラメータの変更だけを実行する。例えば、ラベル変
更によりステップＳ１０４の処理が終了すると、以降の
ステップＳ１０５〜ステップＳ１０９までの処理がパス
スルーされ、ステップＳ１１０より処理が再開する。Further, the change of each parameter only executes the change of the parameter designated to be changed. For example, when the processing in step S104 ends due to a label change, the processing in steps S105 to S109 is passed through, and the processing is restarted from step S110.
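
The control flow of steps S110 to S117 can be summarized by the following sketch. It only traces which step labels are visited for a given sequence of user operations; the operation names ("velocity", "pitch", and so on) and the function `run_creation_loop` are hypothetical, while the step labels follow the description above.

```python
# Step that handles each kind of change request, per the description above.
CHANGE_STEPS = {
    "velocity": "S109",
    "pitch": "S107",
    "phoneme": "S105",
    "label": "S104",
    "timbre": "S116",
}


def run_creation_loop(operations):
    """Return the sequence of step labels visited for the given operations.

    Only the designated parameter is changed; the intermediate steps are
    passed through and the loop resumes at S110.
    """
    trace = []
    for op in operations:
        trace.append("S110")      # check for an end operation
        if op == "end":
            trace.append("S117")  # new filing processing
            return trace
        step = CHANGE_STEPS.get(op)
        if step is not None:
            trace.append(step)    # change only the requested parameter
    return trace
```

For example, a label change followed by a pitch change and then an end operation visits S110, S104, S110, S107, S110, and S117 in that order.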

【０２４８】続いて、編集処理について図３４を参照し
て説明する。この編集処理は既に作成済みのファイルに
対してパラメータの追加、変更、削除を操作する処理で
あり、基本的には新規作成処理の変更ステップと同様の
処理を実行する。Next, the editing process will be described with reference to FIG. This editing process is a process for operating addition, change, and deletion of a parameter to a file that has already been created, and basically executes the same process as the change step of the new creation process.

【０２４９】すなわち、この編集処理では、まず、ステ
ップＳ３０１において、編集対象となるファイルがＤＢ
２６のファイルリストを参照して選択操作される。そし
て、前述した新規作成処理と同様の作成画面が表示部２
５に表示形成される。That is, in this editing processing, the file to be edited is first selected in step S301 by referring to the file list of the DB 26. A creation screen similar to that of the new creation processing described above is then displayed on the display unit 25.

【０２５０】この編集処理では、編集対象となる元の合
成波形が今度は元波形として扱われるので、その元波形
が元波形表示ウインド２５Ｂに表示形成される。In this editing processing, the original synthesized waveform to be edited is now treated as the original waveform, and the original waveform is displayed and formed on the original waveform display window 25B.

【０２５１】続くステップＳ３０２において、編集操作
が入力される。この入力は前述の新規作成処理の変更操
作に相当するものである。At the following step S302, an editing operation is input. This input corresponds to the change operation of the above-described new creation processing.

【０２５２】この編集操作により、ラベルの変更（ステ
ップＳ３０３）、音韻の変更（ステップＳ３０５）、ピ
ッチの変更（ステップＳ３０７）、ベロシティの変更
（ステップＳ３０９）、声色設定の変更（ステップＳ３
１１）のいずれかの操作が判別されると、処理は各変更
要求に応じた処理に移行する。When this editing operation is determined to be any one of a label change (step S303), a phoneme change (step S305), a pitch change (step S307), a velocity change (step S309), or a timbre setting change (step S311), the processing proceeds to the processing corresponding to that change request.

【０２５３】すなわち、ラベルの変更であれば（ステッ
プＳ３０３）、処理はステップＳ３０４に移行してラベ
ルをマニュアル操作に応じて変更する。なお、この編集
処理でもラベル変更とピッチ変更では、その変更に応じ
て合成波形のピッチパターンＷ２が変化する。That is, for a label change (step S303), the processing proceeds to step S304, and the label is changed according to the manual operation. In this editing processing as well, a label change or a pitch change causes the pitch pattern W2 of the synthesized waveform to change accordingly.

【０２５４】また、音韻の変更であれば（ステップＳ３
０５）、処理はステップＳ３０６に移行して音韻をマニ
ュアル操作に応じて変更する。また、ピッチの変更であ
れば（ステップＳ３０７）、処理はステップＳ３０８に
移行してピッチの値をラベル単位でマニュアル操作に応
じて変更（追加、削除含む）する。For a phoneme change (step S305), the processing proceeds to step S306, and the phoneme is changed according to the manual operation. For a pitch change (step S307), the processing proceeds to step S308, and the pitch value is changed (including addition and deletion) in label units according to the manual operation.

【０２５５】また、ベロシティの変更であれば（ステッ
プＳ３０９）、処理はステップＳ３１０に移行してベロ
シティの値を音韻単位でマニュアル操作に応じて変更す
る。また、声色設定の変更であれば（ステップＳ３１
１）、処理はステップＳ３１２に移行してマニュアル操
作に応じて所望の声色の種類に設定変更する。For a velocity change (step S309), the processing proceeds to step S310, and the velocity value is changed in phoneme units according to the manual operation. For a timbre setting change (step S311), the processing proceeds to step S312, and the setting is changed to the desired timbre type according to the manual operation.

【０２５６】また、ステップＳ３０２の編集操作で終了
操作が実行された場合には、処理はステップＳ３１３に
移行して操作終了を確認した後、さらにステップＳ３１
４に移行する。このステップＳ３１４では、編集ファイ
リング処理が実行され、その際に新規ファイルとしての
登録や既存のファイルへの上書きが任意に選択できる。If an end operation is performed as the editing operation in step S302, the processing proceeds to step S313 to confirm the end of the operation, and then proceeds to step S314. In step S314, editing filing processing is executed, in which the user can freely choose between registering the result as a new file and overwriting the existing file.

【０２５７】なお、各パラメータの変更後に、処理は再
びステップＳ３０２に戻り、パラメータの変更操作を続
行することができる。After each parameter is changed, the process returns to step S302, and the parameter changing operation can be continued.

【０２５８】次に、ファイル登録について説明する。図
３５はこの実施の形態によるファイル登録処理を説明す
るフローチャートである。Next, file registration will be described. FIG. 35 is a flowchart for explaining a file registration process according to this embodiment.

【０２５９】このファイル登録では、端末装置２がホス
ト装置１に対して所望のファイル情報をアップロードし
て発話情報を登録する処理が実行される。In the file registration, a process is executed in which the terminal device 2 uploads desired file information to the host device 1 and registers utterance information.

【０２６０】具体的には、ホスト装置１と端末装置２と
の通信時に、まず、端末装置２において、作成済みのフ
ァイルがキー入力部２９のキー操作によって選択される
（ステップＴ１１）。このステップＴ１１によるファイ
ル選択では、ＤＢ２６に記憶されたファイルを一覧表示
して選択すればよい。Specifically, at the time of communication between the host device 1 and the terminal device 2, first, in the terminal device 2, a created file is selected by a key operation of the key input unit 29 (step T11). In the file selection in step T11, the files stored in the DB 26 may be displayed in a list and selected.

【０２６１】その後、ステップＴ１１で選択されたファ
イルの転送（アップロード）がホスト装置１に対して要
求される（ステップＴ１２）。この要求処理は、上述し
たファイル選択に伴って動作する。Thereafter, transfer (upload) of the file selected in step T11 is requested to the host device 1 (step T12). This request process operates according to the above-described file selection.

【０２６２】ホスト装置１では、端末装置２より何らか
の要求があると、その要求が受け付けられ（前述のファ
イル転送と同様のステップＨ１）、その要求の内容が判
別される（前述のファイル転送と同様のステップＨ
２）。When the host device 1 receives any request from the terminal device 2, it accepts the request (step H1, as in the file transfer described above) and determines the content of the request (step H2, as in the file transfer described above).

【０２６３】そして、ファイル登録要求という判別結果
が得られた場合には（ステップＨ５）、処理はステップ
Ｈ６に移行してファイル登録要求の受け付け許可を端末
装置２に対して応答する。また、ステップＨ５でファイ
ル登録要求という判別結果が得られなかった場合には、
処理は判別結果に応じたその他の処理に移行する。If the request is determined to be a file registration request (step H5), the processing proceeds to step H6, and the host device 1 responds to the terminal device 2 with permission to accept the file registration request. If the request is not determined to be a file registration request in step H5, the processing proceeds to other processing according to the determination result.

【０２６４】端末装置２では、ホスト装置１より受け付
け許可が得られると、登録すべきファイル情報をＤＢ２
６より読み出してホスト装置１にファイル転送する処理
が実行される。When the terminal device 2 obtains acceptance permission from the host device 1, it reads the file information to be registered from the DB 26 and transfers the file to the host device 1.

【０２６５】そして、ホスト装置１では、登録要求され
たファイルの受信（ダウンロード）が完了すると（ステ
ップＨ７）、処理がステップＨ８に移行してＤＢ１１へ
の登録処理を実行する。When the reception (download) of the file for which registration has been requested is completed (step H7), the host device 1 shifts the processing to step H8 and executes the registration processing in the DB 11.
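
The exchange between the terminal device 2 and the host device 1 during file registration (steps T11, T12, and H1 to H8) can be sketched as follows. This is a simplified in-process model, not a network implementation; the class names, the request dictionary, and the status strings are all hypothetical.

```python
class Host:
    """Stands in for the host device 1; db11 models the DB 11."""

    def __init__(self):
        self.db11 = {}

    def handle_request(self, request):
        # H1/H2: accept the request and determine its content.
        if request.get("type") == "register":
            # H5/H6: registration request recognised, reply with permission.
            return {"status": "accepted"}
        return {"status": "other"}

    def receive_file(self, name, data):
        # H7/H8: reception complete, register the file in the DB 11.
        self.db11[name] = data


class Terminal:
    """Stands in for the terminal device 2; db26 models the DB 26."""

    def __init__(self, db26):
        self.db26 = db26

    def register(self, host, name):
        # T11: select an already created file from the DB 26.
        if name not in self.db26:
            raise KeyError(name)
        # T12: request the upload of the selected file.
        reply = host.handle_request({"type": "register", "name": name})
        if reply["status"] != "accepted":
            return False
        # Permission obtained: read the file and transfer it to the host.
        host.receive_file(name, self.db26[name])
        return True
```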

【０２６６】このように、ホスト装置１へのファイル登
録が完了すると、ＤＢ１１に登録されたファイルを通信
網ＮＥＴに接続される他の端末装置からアクセスするこ
とができ、その際に前述したファイル転送が適用され
る。When file registration in the host device 1 is completed in this way, the file registered in the DB 11 can be accessed from other terminal devices connected to the communication network NET, using the file transfer described above.

【０２６７】以上説明したように、この実施の形態によ
れば、ホスト装置１から端末装置２へ発話情報を含むフ
ァイル情報を転送し、端末装置２において、音韻に依存
しない声の大きさや声の高さで時間的に連続した韻律パ
ターンを展開し、その韻律パターンと発話情報中の声色
の種類を示す情報で選定された声色データとに基づいて
音声波形を生成するようにしたので、特定の声色に限定
しなくても複数種の声色から直接的に指定した最適の声
色で音声再生ができ、かつ波形合成時に声の高さのパタ
ーンにずれが生じることはなく、このように、発話情報
と声色情報との対応関係を固定しなくても最適の対応関
係を得ることで音声合成の高い品質を維持することが可
能である。As described above, according to this embodiment, file information including utterance information is transferred from the host device 1 to the terminal device 2; the terminal device 2 develops a temporally continuous prosody pattern from voice volume and voice pitch values that do not depend on the phonemes, and generates a speech waveform based on that prosody pattern and on the timbre data selected by the information in the utterance information that indicates the timbre type. Speech can therefore be reproduced with the optimum timbre directly designated from among a plurality of timbre types, without being restricted to a specific timbre, and no shift arises in the voice pitch pattern during waveform synthesis. In this way, the optimum correspondence is obtained without fixing the correspondence between utterance information and timbre information, so that high speech synthesis quality can be maintained.

【０２６８】また、音声再生時に発話情報の声の高さの
基準を声色部２１１の声の高さの基準によってシフトす
るようにしたので、個々の声の高さは音韻の時間区分に
関係なくそのシフトした声の高さの基準に従って相対的
に変化し、このため、声の高さの基準が声色側に近づく
ことから、音声の品質を一層向上させることが可能であ
る。In addition, since the voice pitch reference of the utterance information is shifted according to the voice pitch reference of the timbre section 211 at the time of voice reproduction, each individual pitch changes relatively according to the shifted reference, regardless of the time division of the phonemes. The voice pitch reference thus approaches that of the timbre, so the voice quality can be further improved.

【０２６９】また、音声再生時に発話情報の声の高さの
基準を任意の声の高さの基準によってシフトするように
したので、個々の声の高さは音韻の時間区分に関係なく
そのシフトした声の高さの基準に従って相対的に変化
し、このため、シフト量に従って意図する声質に近づけ
る等、声色の加工が可能である。Further, since the voice pitch reference of the utterance information is shifted according to an arbitrary voice pitch reference at the time of voice reproduction, each individual pitch changes relatively according to the shifted reference, regardless of the time division of the phonemes. The timbre can therefore be processed, for example by bringing it closer to an intended voice quality according to the shift amount.

【０２７０】また、声の高さの基準を声の高さの平均周
波数、最大周波数、または最小周波数にしたので、声の
高さの基準が取りやすくなる。Further, since the reference of the voice pitch is set to the average frequency, the maximum frequency, or the minimum frequency of the voice pitch, the reference of the voice pitch can be easily set.
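
The idea of taking a pitch reference (average, maximum, or minimum frequency) and shifting it at reproduction time can be illustrated as follows. This is a minimal sketch, assuming that each pitch is stored as a ratio to the reference; the specification states only that the pitches are relative levels, so the ratio representation and all function names here are assumptions.

```python
from statistics import mean


def pitch_reference(frequencies_hz, mode="mean"):
    """Return the pitch reference as the mean, maximum, or minimum frequency."""
    if mode == "mean":
        return mean(frequencies_hz)
    if mode == "max":
        return max(frequencies_hz)
    if mode == "min":
        return min(frequencies_hz)
    raise ValueError("mode must be 'mean', 'max', or 'min'")


def to_relative(frequencies_hz, reference_hz):
    """Express each pitch as a ratio to the reference (assumed representation)."""
    return [f / reference_hz for f in frequencies_hz]


def shift_to_reference(relative_pitches, new_reference_hz):
    """Re-anchor the relative pitches on a different reference."""
    return [r * new_reference_hz for r in relative_pitches]
```

With a contour of 100, 150, and 200 Hz and a mean reference of 150 Hz, shifting to a 300 Hz reference gives approximately 200, 300, and 400 Hz, so the shape of the contour is preserved while its level follows the new reference.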

【０２７１】また、端末装置２において記録媒体より声
色データを読み出して声色部２１１に記憶するようにし
たので、記録媒体を通して声色の種類にバリエーション
を与えることができ、音声再生時に最適の声色を適用さ
せることが可能である。In addition, since the terminal device 2 reads timbre data from a recording medium and stores it in the timbre section 211, the variety of timbre types can be extended through the recording medium, and the optimum timbre can be applied at the time of voice reproduction.

【０２７２】また、端末装置２において通信回線ＬＮを
介して外部装置より声色データを受信してその声色デー
タを声色部２１１に記憶するようにしたので、通信回線
ＬＮを通して声色の種類にバリエーションを与えること
ができ、音声再生時に最適の声色を適用させることが可
能である。Further, since the terminal device 2 receives timbre data from an external device via the communication line LN and stores it in the timbre section 211, the variety of timbre types can be extended through the communication line LN, and the optimum timbre can be applied at the time of voice reproduction.

【０２７３】また、端末装置２において、入力された自
然音声に基づいて、声の大きさと声の高さとのいずれか
一方、もしくはその両方を、音韻間の時間差に依存せ
ず、かつ相対的なレベルをもつように離散させて発話情
報を作成し、これをホスト装置１に転送してＤＢ１１に
登録するようにしたので、音韻の時間差から独立した任
意の時点に声の大きさや声の高さを与えることが可能で
ある。In addition, the terminal device 2 creates utterance information from the input natural speech such that one or both of the voice volume and the voice pitch are made discrete so as not to depend on the time differences between phonemes and so as to have relative levels, and this utterance information is transferred to the host device 1 and registered in the DB 11. Voice volume and voice pitch can therefore be given at arbitrary points in time, independent of the time differences between phonemes.

【０２７４】また、発話情報の作成の際に、声の高さの
基準を発話情報に含めて作成するようにしたので、発話
情報の中に声の高さの基準を与えることが可能である。In addition, since the voice pitch reference is included in the utterance information when it is created, a voice pitch reference can be provided within the utterance information.

【０２７５】また、発話情報の作成の際に、各パラメー
タを任意に変更するようにしたので、音声の品質を高め
るための情報の変更が可能である。Further, since each parameter is arbitrarily changed at the time of creating the utterance information, it is possible to change the information for improving the voice quality.

【０２７６】次に、前述の実施の形態の変形例について
説明する。Next, a modified example of the above embodiment will be described.

【０２７７】変形例１では、前述の実施の形態の新規作
成処理を変形したものなので、その新規作成処理につい
て以下に説明する。In the first modification, since the new creation processing of the above-described embodiment is modified, the new creation processing will be described below.

【０２７８】図３６はこの実施の形態の変形例１による
要部を示すブロック図である。この変形例による装置
は、前述の端末装置２（図８参照）に音声認識部３５を
追加した構成であり、バスＢＳに結合される。FIG. 36 is a block diagram showing the main part of Modification 1 of this embodiment. The device according to this modification has a configuration in which a speech recognition unit 35, coupled to the bus BS, is added to the terminal device 2 described above (see FIG. 8).

【０２７９】そして、この音声認識部３５は、マイク２
８を用いて入力した自然音声に基づいて音声認識を行
い、その認識結果を制御部２４に供給する。制御部２４
では、供給された認識結果から文字コード（前述の音韻
テーブル対応）に変換する処理が実行される。The speech recognition unit 35 performs speech recognition on the natural speech input through the microphone 28 and supplies the recognition result to the control unit 24. The control unit 24 converts the supplied recognition result into character codes (corresponding to the phoneme table described above).

【０２８０】続いてこの変形例の主要な動作について説
明する。図３７は変形例１による新規作成処理を説明す
るフローチャートである。Next, main operations of the modification will be described. FIG. 37 is a flowchart illustrating a new creation process according to the first modification.

【０２８１】この変形例１による新規作成処理では、前
述したステップＳ１０１（図２４参照）と同様に、ま
ず、発話情報を構成するヘッダ情報と発声情報とが初期
化されると共に、ファイル作成に使用する作成画面も初
期化される（ステップＳ５０１）。In the new creation processing according to Modification 1, as in step S101 described above (see FIG. 24), the header information and the vocalization information that constitute the utterance information are first initialized, and the creation screen used for file creation is also initialized (step S501).

【０２８２】次に、マイク２８を用いて新規に自然音声
が入力されると（ステップＳ５０２）、作成画面の元波
形表示ウインド２５Ｂに元波形が表示される（ステップ
Ｓ５０３）。Next, when a new natural voice is input using the microphone 28 (step S502), the original waveform is displayed in the original waveform display window 25B of the creation screen (step S503).

【０２８３】なお、作成画面は、前述の実施の形態と同
様（図１７参照）、表示部２５上に、音韻表示ウインド
２５Ａ、元波形表示ウインド２５Ｂ、合成波形表示ウイ
ンド２５Ｃ、ピッチ表示ウインド２５Ｄ、ベロシティ表
示ウインド２５Ｅ、元音声再生／停止ボタン２５Ｆ、合
成音声形再生／停止ボタン２５Ｇ、ピッチ基準設定目盛
り２５Ｈ等で構成される。As in the embodiment described above (see FIG. 17), the creation screen on the display unit 25 comprises a phoneme display window 25A, an original waveform display window 25B, a synthesized waveform display window 25C, a pitch display window 25D, a velocity display window 25E, an original voice play/stop button 25F, a synthesized voice play/stop button 25G, a pitch reference setting scale 25H, and the like.

【０２８４】この変形例では、音声認識部３５において
音声入力による元波形に基づく音声認識が実行され、音
韻が一括して取得される（ステップＳ５０３）。In this modified example, the speech recognition unit 35 performs speech recognition based on the original waveform by speech input, and acquires phonemes collectively (step S503).

【０２８５】続くステップＳ５０４では、その取得され
た音韻と元波形とに基づいて音韻表示ウインド２５Ａに
音韻が自動的に割り付けられ、その際、ラベルが付与さ
れる。この場合、音韻名（文字）とその音韻の時間間隔
（時間軸上の範囲）が求められる。In the subsequent step S504, phonemes are automatically allocated to the phoneme display window 25A based on the acquired phonemes and the original waveform, and labels are assigned at the same time. In this step, each phoneme name (character) and the time interval of that phoneme (its range on the time axis) are obtained.

【０２８６】さらに、ステップＳ５０５においてピッチ
（ピッチ基準含む）とベロシティが元波形より抽出さ
れ、続くステップＳ５０６において音韻に対応させて抽
出されたピッチ、ベロシティがそれぞれピッチ表示ウイ
ンド２５Ｄ、ベロシティ表示ウインド２５Ｅに表示され
る。なお、ピッチ基準については、例えば、ピッチ周波
数の最小値の２倍に設定する方法がある。Further, in step S505 the pitch (including the pitch reference) and the velocity are extracted from the original waveform, and in the subsequent step S506 the extracted pitch and velocity are displayed, in correspondence with the phonemes, in the pitch display window 25D and the velocity display window 25E, respectively. One way to set the pitch reference is, for example, to set it to twice the minimum pitch frequency.
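
The example rule for the pitch reference (twice the minimum pitch frequency) can be written as the following sketch. Treating a value of 0 as an unvoiced frame to be skipped is an assumption made here for illustration, as is the function name.

```python
def default_pitch_reference(pitch_hz):
    """Return twice the minimum pitch frequency of the extracted contour.

    Frames with a pitch of 0 are treated as unvoiced and ignored
    (an assumption; the specification does not say how they are handled).
    """
    voiced = [f for f in pitch_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the pitch contour")
    return 2 * min(voiced)
```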

【０２８７】この後、各パラメータとデフォルトの声色
データとに基づいて音声波形の生成が行われ、合成波形
表示ウインド２５Ｃに表示形成される（ステップＳ５０
７）。Thereafter, a speech waveform is generated based on the parameters and the default timbre data, and is displayed in the synthesized waveform display window 25C (step S507).

【０２８８】この後、ステップＳ５０８において新規作
成処理の終了操作が判別され、もし終了操作が実行され
た場合には、処理はステップＳ５１３に移行して新規フ
ァイリング処理を実行する。この新規ファイリング処理
では、ファイル名が入力され、そのファイル名に対応さ
せて新規作成ファイルがＤＢ２６に記憶される。Thereafter, the end operation of the new creation process is determined in step S508, and if the end operation is executed, the process shifts to step S513 to execute the new filing process. In the new filing process, a file name is input, and a newly created file is stored in the DB 26 in correspondence with the file name.

【０２８９】また、ステップＳ５０８において終了操作
はなく、ベロシティ、ピッチ、音韻、ラベルのいずれか
のパラメータの変更操作が判別されると（ステップＳ５
０９）、処理はステップＳ５１０に移行して変更対象の
パラメータに対して変更処理を実施する。If no end operation is detected in step S508 and an operation to change any of the velocity, pitch, phoneme, or label parameters is detected (step S509), the processing proceeds to step S510, and the parameter in question is changed.

【０２９０】また、ステップＳ５１１において声色設定
の変更が判別されると、処理はステップＳ５１２に移行
して声色設定の変更が行われる。If a change of the timbre setting is detected in step S511, the processing proceeds to step S512, and the timbre setting is changed.

【０２９１】なお、ステップＳ５０８において終了操作
が検出されず、ステップＳ５０９やステップＳ５１１に
おいてパラメータの変更操作が検出されない間は、ステ
ップＳ５０８、Ｓ５０９、及びＳ５１２の処理が繰り返
し実行される。Note that while the end operation is not detected in step S508 and the parameter changing operation is not detected in step S509 or step S511, the processing of steps S508, S509, and S512 is repeatedly executed.

【０２９２】この変形例１のように、自然音声の入力
後、一旦合成波形までを自動的に求めてから、個々のパ
ラメータを変更するようにしても、前述の実施の形態と
同様に、高い品質で音声再生を維持できる実用的な音声
合成を実現することが可能である。Even when, as in Modification 1, everything up to the synthesized waveform is first obtained automatically after the natural speech is input and the individual parameters are changed afterwards, practical speech synthesis that maintains high-quality voice reproduction can be realized, as in the embodiment described above.

【０２９３】また、変形例２として、一度音声合成した
後に、元波形と合成波形の振幅パターンを比較して、合
成波形が元波形の振幅に合うようにベロシティ値を最適
化するようにしてもよく、さらに音声の品質を向上でき
る。As Modification 2, after speech has been synthesized once, the amplitude patterns of the original waveform and the synthesized waveform may be compared, and the velocity values may be optimized so that the synthesized waveform matches the amplitude of the original waveform; this further improves the voice quality.
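
One possible form of the velocity optimization in Modification 2 is sketched below. It assumes that amplitude is compared segment by segment using the RMS value, that amplitude scales linearly with the velocity level, and that levels are limited to 0 to 15; none of these details is given in the specification, and the function names are hypothetical.

```python
def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def optimize_velocities(velocities, original_segments, synthesized_segments,
                        max_level=15):
    """Rescale each velocity so the synthesized amplitude matches the original.

    Assumes amplitude is proportional to the velocity level (an assumption).
    """
    result = []
    for level, orig, synth in zip(velocities, original_segments,
                                  synthesized_segments):
        synth_rms = rms(synth)
        if synth_rms == 0:
            result.append(level)  # nothing to compare against; keep as is
            continue
        scaled = round(level * rms(orig) / synth_rms)
        result.append(max(0, min(max_level, scaled)))
    return result
```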

【０２９４】また、変形例３として、発話情報で指定す
る声色データが声色部にない場合に発話情報の特徴（声
色属性）に類似した特徴（声色属性）をもつ声色を声色
部から選定して音声合成を行うようにしてもよい。As Modification 3, when the timbre data designated by the utterance information is not present in the timbre section, a timbre whose features (timbre attributes) are similar to the features (timbre attributes) of the utterance information may be selected from the timbre section and used for speech synthesis.

【０２９５】以下に変形例３を具体的に説明する。図３
８は変形例３によるヘッダ情報の構成例を示す図、図３
９は図３８に示したヘッダ情報中の声色属性の構成例を
示す図、図４０は変形例３による声色部の構成例を示す
図、及び図４１は図４０に示した声色部中の声色属性の
構成例を示す図である。Modification 3 is described in detail below. FIG. 38 is a diagram showing a configuration example of header information according to Modification 3, FIG. 39 is a diagram showing a configuration example of the timbre attribute in the header information shown in FIG. 38, FIG. 40 is a diagram showing a configuration example of the timbre section according to Modification 3, and FIG. 41 is a diagram showing a configuration example of the timbre attribute in the timbre section shown in FIG. 40.

【０２９６】この変形例３では、図３８及び図４０に示
したように、共通のフォーマットをもつ声色属性情報が
発話情報内のヘッダ情報と声色部２１３とに用意され
る。In Modification 3, as shown in FIGS. 38 and 40, voice attribute information having a common format is provided both in the header information within the utterance information and in the voice tone section 213.

【０２９７】発話情報内のヘッダ情報ＨＤＲＸには、前
述の実施の形態に適用されるヘッダ情報に対して新しい
パラメータとして声色属性情報ＡＴが付加される。[0297] To the header information HDRX in the speech information, voice attribute information AT is added as a new parameter to the header information applied to the above-described embodiment.

【０２９８】この声色属性情報ＡＴは、図３９に示した
ように、性別データＳＸ、年齢データＡＧ、ピッチ基準
ＰＢ、明瞭度ＣＬ、及び自然度ＮＴを対応付けた構造を
有している。As shown in FIG. 39, the voice attribute information AT has a structure in which gender data SX, age data AG, pitch reference PB, clarity CL, and naturalness NT are associated with each other.

【０２９９】同様に、声色部２１３には、前述の実施の
形態に適用される声色部２１１に対して新しいパラメー
タとして声色データに対応させて声色属性情報ＡＴｎ
（ｎは自然数）が付加される。Similarly, in the voice tone section 213, voice attribute information ATn (n is a natural number) is added in association with each item of voice tone data, as a new parameter relative to the voice tone section 211 used in the embodiment described above.

【０３００】この声色属性情報ＡＴｎは、図４１に示し
たように、性別データＳＸｎ、年齢データＡＧｎ、ピッ
チ基準ＰＢｎ、明瞭度ＣＬｎ、及び自然度ＮＴｎを対応
付けた構造を有している。As shown in FIG. 41, the voice attribute information ATn has a structure in which gender data SXn, age data AGn, pitch reference PBn, clarity CLn, and naturalness NTn are associated with each other.

【０３０１】声色属性の各項目は、声色属性情報ＡＴ，
ＡＴｎに共通して、性別：−１／１（男／女）年齢：０〜ピッチ基準（平均ピッチ）：１００〜３００［Ｈｚ］明瞭度：１〜１０（度数が高くなると明瞭度がアップす
る）自然度：１〜１０（度数が高くなると自然度がアップす
る）によって定義される。なお、明瞭度と自然度は感覚的な
レベルを示すものである。Each item of the voice attribute is defined in the same way for the voice attribute information AT and ATn:
Gender: -1 / 1 (male / female)
Age: 0 or greater
Pitch reference (average pitch): 100 to 300 [Hz]
Clarity: 1 to 10 (a higher value means higher clarity)
Naturalness: 1 to 10 (a higher value means higher naturalness)
Clarity and naturalness indicate perceptual levels.
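The attribute record defined above could be held in a small validated structure, sketched below. The class name, field names, and the choice to raise on out-of-range values are hypothetical; only the five items and their ranges come from the text, and the sample values are those the description later gives as an example for AT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceAttributes:
    """Hypothetical container for one voice attribute record (AT or ATn)."""
    gender: int             # SX: -1 = male, 1 = female
    age: int                # AG: 0 or greater
    pitch_reference: float  # PB: average pitch in Hz, 100 to 300
    clarity: int            # CL: 1 to 10
    naturalness: int        # NT: 1 to 10

    def __post_init__(self):
        # Reject values outside the ranges listed in the definition.
        if self.gender not in (-1, 1):
            raise ValueError("gender must be -1 (male) or 1 (female)")
        if self.age < 0:
            raise ValueError("age must be 0 or greater")
        if not 100 <= self.pitch_reference <= 300:
            raise ValueError("pitch reference must be within 100 to 300 Hz")
        for name in ("clarity", "naturalness"):
            if not 1 <= getattr(self, name) <= 10:
                raise ValueError(f"{name} must be within 1 to 10")


# Example record: female, 25 years, 200 Hz, clarity 5, naturalness 5.
AT = VoiceAttributes(gender=1, age=25, pitch_reference=200, clarity=5, naturalness=5)
```

Because the header side and the voice tone section share one format, the same structure would serve for both AT and each ATn.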

【０３０２】次に、変形例３の主要な動作について説明
する。図４２は変形例３による新規作成処理の主要な処
理を説明するフローチャートであり、図４３は変形例３
による再生処理を説明するフローチャートである。Next, the main operations of Modification 3 will be described. FIG. 42 is a flowchart explaining the main part of the new creation processing according to Modification 3, and FIG. 43 is a flowchart explaining the reproduction processing according to Modification 3.

【０３０３】新規作成処理については、全体の流れが前
述の実施の形態による新規作成処理（図２４参照）と同
様のため、相違する部分についてのみ説明する。[0303] Since the entire flow of the new creation processing is the same as that of the new creation processing according to the above-described embodiment (see Fig. 24), only different parts will be described.

【０３０４】図２４に示した処理の流れでは、新規作成
が終了すると、処理はステップＳ１１０からステップＳ
１１７へ移行するが、この変形例３では、図４２に示し
たように、処理はステップＳ１１８に移行して声色属性
設定を実行する。この後に、処理はステップＳ１１７の
ファイリング処理を実行する。In the processing flow shown in FIG. 24, when new creation is finished, the processing proceeds from step S110 to step S117. In Modification 3, however, as shown in FIG. 42, the processing proceeds to step S118, where voice attribute setting is executed. The filing processing of step S117 is executed after that.

【０３０５】ステップＳ１１８では、前述の声色属性情
報ＡＴが作成され、ヘッダ情報ＨＤＲＸへ組み込まれ
る。ここでは、一例として、性別：１（女性）年齢：２５（歳）ピッチ基準（平均ピッチ）：２００［Ｈｚ］明瞭度：５（普通）自然度：５（普通）が声色属性情報ＡＴに設定されるものとする。In step S118, the voice attribute information AT described above is created and incorporated into the header information HDRX. Here, as an example, the following values are assumed to be set in the voice attribute information AT:
Gender: 1 (female)
Age: 25 (years)
Pitch reference (average pitch): 200 [Hz]
Clarity: 5 (normal)
Naturalness: 5 (normal)

【０３０６】次に、再生処理について説明する。この説
明の前に、声色部２１３の声色属性情報ＡＴｎの各項目
の内容について一例を示す。Next, the reproduction processing will be described. Before that, an example of the contents of each item of the voice attribute information ATn in the voice tone section 213 is given.

【０３０７】声色属性情報ＡＴ１の場合、一例として、性別：−１（男性）年齢：３５（歳）ピッチ基準（平均ピッチ）：１４０［Ｈｚ］明瞭度：７（やや高い）自然度：５（普通）とする。For the voice attribute information AT1, the values are, as an example:
Gender: -1 (male)
Age: 35 (years)
Pitch reference (average pitch): 140 [Hz]
Clarity: 7 (slightly high)
Naturalness: 5 (normal)

【０３０８】また、声色属性情報ＡＴ２の場合、一例と
して、性別：１（女性）年齢：２０（歳）ピッチ基準（平均ピッチ）：２００［Ｈｚ］明瞭度：５（普通）自然度：５（普通）とする。For the voice attribute information AT2, the values are, as an example:
Gender: 1 (female)
Age: 20 (years)
Pitch reference (average pitch): 200 [Hz]
Clarity: 5 (normal)
Naturalness: 5 (normal)

【０３０９】この図４３に示した再生処理も、前述の実
施の形態による再生処理（図１７及び図１８参照）と全
体的な流れが共通のため、相違する部分についてのみ説
明する。The reproduction process shown in FIG. 43 has the same general flow as the reproduction process according to the above-described embodiment (see FIGS. 17 and 18), and therefore only different parts will be described.

【０３１０】ステップＳ４０２において指定の声色がな
いという判断結果が得られた場合には、処理はステップ
Ｓ４０７に移行する。このステップＳ４０７では、発話
情報の声色属性情報ＡＴと声色部２１３に記憶されてい
る各声色属性情報ＡＴｎとを照合する処理が実行され
る。[0310] If it is determined in step S402 that there is no designated voice tone, the process proceeds to step S407. In this step S407, a process of collating the voice attribute information AT of the utterance information with each voice attribute information ATn stored in the voice part 213 is executed.

【０３１１】この照合には、照合対象となる各項目の値
の差分をとり、これに重み付けして自乗し、各項目の結
果を加算する方法（ユークリッド距離）や、絶対値の重
み付け加算等の方法がある。Methods for this collation include taking the difference between the values of each item to be compared, weighting and squaring it, and summing the results over all items (Euclidean distance), as well as a weighted sum of absolute differences.

【０３１２】例えば、ユークリッド距離（ＤＳｎ）の算
出方法を適用した場合について説明する。その際に使用
する重み付けを、一例として、性別：２０年齢：１ピッチ基準（平均ピッチ）：１明瞭度：５自然度：３とする。As an example, consider the case where the Euclidean distance (DSn) calculation is applied. The weights used are, for example: gender: 20, age: 1, pitch reference (average pitch): 1, clarity: 5, naturalness: 3.

【０３１３】そこで、声色属性情報ＡＴとＡＴ１との照
合では、ＤＳ１＝（（−１−１）＊２０）²＋（（３５−２５）＊１）²＋（（１４０−２００）＊１）²＋（（７−５）＊５）²＋（（５−５）＊３）²＝５４００となり、声色属性情報ＡＴとＡＴ２との照合では、ＤＳ２＝（（１−１）＊２０）²＋（（２０−２５）＊１）²＋（（２３０−２００）＊１）²＋（（４−５）＊５）²＋（（７−５）＊３）²＝９８６となる。The collation of the voice attribute information AT with AT1 therefore gives
DS1 = ((-1 - 1) * 20)² + ((35 - 25) * 1)² + ((140 - 200) * 1)² + ((7 - 5) * 5)² + ((5 - 5) * 3)² = 1600 + 100 + 3600 + 100 + 0 = 5400,
and the collation of AT with AT2 gives
DS2 = ((1 - 1) * 20)² + ((20 - 25) * 1)² + ((230 - 200) * 1)² + ((4 - 5) * 5)² + ((7 - 5) * 3)² = 0 + 25 + 900 + 25 + 36 = 986.

【０３１４】したがって、ステップＳ４０８において、
ＤＳ２＜ＤＳ１の関係となり、距離の短い声色属性情報
ＡＴ２に対応して記憶される声色データＶＤ２が声色属
性の類似度が最も高い声色の種類として選定される。Therefore, in step S408, DS2 < DS1, and the voice tone data VD2 stored in association with the voice attribute information AT2, which has the shorter distance, is selected as the voice tone type with the highest similarity of voice attributes.
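The collation of steps S407 and S408 can be sketched as follows. The sketch evaluates the terms exactly as they are written in the DS1 and DS2 expressions, so the AT2 record uses the values that appear in the DS2 terms (age 20, pitch 230, clarity 4, naturalness 7). The dictionary keys and the function name are hypothetical.

```python
# Example weights from the text: gender 20, age 1, pitch 1, clarity 5, naturalness 3.
WEIGHTS = {"gender": 20, "age": 1, "pitch": 1, "clarity": 5, "naturalness": 3}


def weighted_squared_distance(target, candidate, weights=WEIGHTS):
    """Sum over all items of ((candidate - target) * weight) squared."""
    return sum(((candidate[k] - target[k]) * w) ** 2 for k, w in weights.items())


# Header-side record AT of the utterance information.
AT = {"gender": 1, "age": 25, "pitch": 200, "clarity": 5, "naturalness": 5}

# Records on the voice tone section side, keyed by their voice tone data.
CANDIDATES = {
    "VD1": {"gender": -1, "age": 35, "pitch": 140, "clarity": 7, "naturalness": 5},
    "VD2": {"gender": 1, "age": 20, "pitch": 230, "clarity": 4, "naturalness": 7},
}

distances = {name: weighted_squared_distance(AT, attrs)
             for name, attrs in CANDIDATES.items()}
best = min(distances, key=distances.get)  # shortest distance = highest similarity
```

Evaluating the terms this way gives 5400 for DS1 and 986 for DS2, so the shortest distance belongs to VD2. The heavy gender weight (20) means a gender mismatch alone contributes 1600, which dominates small differences in the other items.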

【０３１５】なお、この変形例３については、直接声色
の種類を指定した後に声色属性で声色を選定するように
していたが、直接声色の種類を指定せず、声色属性だけ
を用いて類似度から声色データを選定するようにしても
よい。In Modification 3, the voice tone is selected by voice attribute after the voice tone type has been designated directly. However, the voice tone data may instead be selected by similarity using the voice attributes alone, without directly designating the voice tone type.
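The two selection modes described for Modification 3 (direct designation with an attribute-based fallback, or attributes alone) can be sketched in one function. This is an illustration under assumptions: the store layout and names are hypothetical, and the weighted sum of absolute differences is used here as the alternative collation method the text mentions.

```python
def weighted_abs_distance(target, candidate, weights):
    """Weighted sum of absolute differences over all attribute items."""
    return sum(abs(candidate[k] - target[k]) * w for k, w in weights.items())


def select_voice_tone(store, attributes, weights, designated=None):
    """Return the name of the voice tone data to use.

    store maps a voice tone name to its attribute record. If a voice tone is
    designated and present, it is used directly; otherwise the record closest
    to `attributes` is chosen. Passing designated=None selects by attributes
    alone. An empty store raises ValueError (from min).
    """
    if designated is not None and designated in store:
        return designated
    return min(store, key=lambda name: weighted_abs_distance(
        attributes, store[name], weights))


weights = {"gender": 20, "age": 1, "pitch": 1, "clarity": 5, "naturalness": 3}
store = {
    "VD1": {"gender": -1, "age": 35, "pitch": 140, "clarity": 7, "naturalness": 5},
    "VD2": {"gender": 1, "age": 20, "pitch": 200, "clarity": 5, "naturalness": 5},
}
wanted = {"gender": 1, "age": 25, "pitch": 200, "clarity": 5, "naturalness": 5}
```

With these sample records, a designated name that exists is returned as is, while a missing or omitted designation falls back to the closest record.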

【０３１６】この変形例３によれば、音韻に依存しない
声の大きさや声の高さで時間的に連続した韻律パターン
を展開し、その韻律パターンと発話情報中の声色の属性
を示す情報で類似度によって選定された声色データとに
基づいて音声波形を生成するようにしたので、不適な声
色を使用せずに類似度の最も高い声色で音声再生がで
き、かつ音声波形の生成時に声の高さのパターンにずれ
が生じることはなく、これによって、音声を高い品質で
再生することが可能である。According to Modification 3, a temporally continuous prosody pattern is expanded from voice loudness and voice pitch that do not depend on phonemes, and a speech waveform is generated from that prosody pattern and the voice tone data selected by similarity using the information in the utterance information that indicates the voice tone attributes. Speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated, so speech can be reproduced with high quality.

【０３１７】また、音韻に依存しない声の大きさや声の
高さで時間的に連続した韻律パターンを展開し、その韻
律パターンと発話情報中の声色の種類や属性を示す情報
で選定された声色データとに基づいて音声波形を生成す
るようにしたので、直接的に指定した声色がなくても不
適な声色を使用せずに類似度の最も高い声色で音声再生
ができ、かつ音声波形の生成時に声の高さのパターンに
ずれが生じることはなく、これによって、音声を高い品
質で再生することが可能である。Further, a temporally continuous prosody pattern is expanded from voice loudness and voice pitch that do not depend on phonemes, and a speech waveform is generated from that prosody pattern and the voice tone data selected by the information in the utterance information that indicates the voice tone type and attributes. Even when the directly designated voice tone is not available, speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated, so speech can be reproduced with high quality.

【０３１８】次に、変形例４について説明する。この変
形例４は、前述の実施の形態で使用した制御イベントに
バリエーションを与えたものである。Next, Modification 4 will be described. Modification 4 adds variations to the control events used in the embodiment described above.

【０３１９】変形例４による制御イベントＣＥについて
詳述する。図４４は変形例４による制御イベントの構成
例を示す図である。The control event CE according to the fourth modification will be described in detail. FIG. 44 is a diagram illustrating a configuration example of a control event according to the fourth modification.

【０３２０】変形例４では、制御イベントＣＥに、新た
にポーズイベントＣＥ３と完了イベントＣＥ４とが追加
される。In the modification 4, a pause event CE3 and a completion event CE4 are newly added to the control event CE.

【０３２１】ポーズイベントＣＥ３は、識別情報Ｃ３と
ポーズイベントデータＰＳＥとを対応付けた構造を有
し、ナレーション再生を任意の時点で一旦ポーズさせる
ためのイベントである。The pause event CE3 has a structure in which identification information C3 is associated with pause event data PSE, and is an event for temporarily pausing narration reproduction at an arbitrary point.

【０３２２】すなわち、このポーズイベントＣＥ３は、
他の制御イベントＣＥ１，ＣＥ２，ＣＥ４と同様に発声
データ中に組み込めるイベントであり、ナレーション再
生をこのイベントの出現でポーズ状態にする。このポー
ズ状態の解除については、他の情報（画像表示など）に
よる動作に同期させて行うようにする。That is, like the other control events CE1, CE2, and CE4, the pause event CE3 can be embedded in the utterance data, and narration reproduction enters the paused state when this event appears. The paused state is released in synchronization with an operation driven by other information (such as image display).

【０３２３】ポーズイベントＣＥ３の先頭に付加された
識別情報Ｃ３は、制御イベントの種類であるポーズを示
す。The identification information C3 placed at the head of the pause event CE3 indicates "pause" as the type of control event.

【０３２４】完了イベントＣＥ４は、識別情報Ｃ４と完了
イベントデータＣＯＥとを対応付けた構造を有し、ナレ
ーション再生がどこまで再生されたのかを外部の上位ア
プリケーション等に知らせるイベントである。The completion event CE4 has a structure in which identification information C4 is associated with completion event data COE, and is an event that informs an external higher-level application or the like of how far the narration has been reproduced.

【０３２５】すなわち、この完了イベントＣＥ４は、他
の制御イベントＣＥ１，ＣＥ２，ＣＥ３と同様に発声デ
ータ中に組み込めるイベントであり、ナレーション再生
完了をこのイベントの出現で外部の上位アプリケーショ
ンに知らせる。That is, like the other control events CE1, CE2, and CE3, the completion event CE4 can be embedded in the utterance data, and its appearance notifies the external higher-level application that narration reproduction has completed.

【０３２６】完了イベントＣＥ４の先頭に付加された識
別情報Ｃ４は、制御イベントの種類である完了を示す。[0326] The identification information C4 added to the head of the completion event CE4 indicates completion, which is the type of control event.
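The two added control events can be pictured as small tagged records embedded in the utterance data. The sketch below is an assumption about representation only: the text gives the identifiers (C3, C4) and payload names (PSE, COE) but not their binary layout, so the payload is left as opaque bytes and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class ControlKind(Enum):
    PAUSE = "C3"        # identification information C3 (pause event CE3)
    COMPLETION = "C4"   # identification information C4 (completion event CE4)


@dataclass(frozen=True)
class ControlEvent:
    kind: ControlKind
    data: bytes = b""   # PSE or COE payload; layout not given in the text


# Utterance data is modeled as a list of narration strings and control events.
Item = Union[str, ControlEvent]


def split_at_pauses(items: List[Item]) -> List[List[Item]]:
    """Group utterance data into the runs that play between pause events."""
    segments: List[List[Item]] = [[]]
    for item in items:
        if isinstance(item, ControlEvent) and item.kind is ControlKind.PAUSE:
            segments.append([])   # playback halts here until released
        else:
            segments[-1].append(item)
    return segments
```

Splitting at pause events shows which parts of the narration play uninterrupted; completion events stay inside their segment because they only report progress.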

【０３２７】ここで、変形例４の再生処理について説明
する。図４５は変形例４による再生処理を説明するフロ
ーチャートであり、図４６〜図４８は再生処理時の表示
画面の状態遷移を示す図である。[0327] Here, the reproduction process of Modification 4 will be described. FIG. 45 is a flowchart for explaining the reproduction process according to the fourth modification, and FIGS. 46 to 48 are diagrams showing the state transition of the display screen during the reproduction process.

【０３２８】なお、説明上、プログラム情報において
は、画像情報は第１画像、第２画像の順に表示する工程
でプログラミングされ、発話情報は、第１画像の表示開
始と共に第１ナレーションを再生し、その後に完了イベ
ント、ポーズイベントによって第２ナレーションの再生
をウェイトさせ、第２画像の表示開始と共に第２ナレー
ションを再生することで画像とナレーションとの同期を
とるようにプログラミングされている。For the purposes of this description, the program information is assumed to be programmed as follows. The image information displays the first image and then the second image, in that order. The utterance information reproduces the first narration when display of the first image starts, then uses a completion event and a pause event to make reproduction of the second narration wait, and reproduces the second narration when display of the second image starts, thereby synchronizing the images with the narration.

【０３２９】この変形例４では、再生の開始に従い、ま
ず、ファイル情報の内の画像情報に基づいて図４６に示
した如く第１画像（一例として日本の絵）が表示され
（ステップＴ５０１）、続いてファイル情報の内の発話
情報が解析される（ステップＴ５０２）。In Modification 4, when reproduction starts, the first image (a picture of Japan, as an example) is first displayed as shown in FIG. 46 on the basis of the image information in the file information (step T501), and the utterance information in the file information is then analyzed (step T502).

【０３３０】その解析結果に基づいて図４６に示した如
くスピーカ２３より「日本は島国です。」のように第１
ナレーションの再生が開始される（ステップＴ５０
３）。この場合にも、前述した実施の形態と同様に、Ｎ
Ｃウインド２５０は表示部２５に画像と共に表示され
る。On the basis of the analysis result, reproduction of the first narration, such as "Japan is an island country.", is started from the speaker 23 as shown in FIG. 46 (step T503). In this case as well, the NC window 250 is displayed on the display unit 25 together with the image, as in the embodiment described above.

【０３３１】この変形例４では、第１ナレーションの再
生が開始された後に、第１ナレーションの完了を示す完
了イベントや前述のイベント（ＮＣウインド２５０の操
作、他のファイル情報の要求指示、本処理の終了指示な
ど）の検出が行われる（ステップＴ５０４、ステップＴ
５０６）In Modification 4, after reproduction of the first narration has started, detection is performed for a completion event indicating completion of the first narration and for the events described earlier (operation of the NC window 250, a request for other file information, an instruction to end this processing, and so on) (steps T504 and T506).

【０３３２】ステップＴ５０６によりイベント入力が検
出された場合には、処理はステップＴ５０７に移行す
る。このステップＴ５０７において、前述の実施の形態
と同様にＮＣウインド２５０の操作によるナレーション
再生のためのイベント入力であれば、処理はさらにステ
ップＴ５０８に移行して、再生、停止、一時停止、又は
早送りの制御を実行する。また、ナレーション再生のた
めのイベント入力でない場合には、処理はこの再生処理
から抜け、図１６に示したファイル転送処理（メイン処
理）に戻る。[0332] If an event input is detected in step T506, the process proceeds to step T507. If it is determined in step T507 that the event is an event input for narration playback by operating the NC window 250 as in the above-described embodiment, the process further proceeds to step T508 to perform playback, stop, pause, or fast forward. Execute control. If the event is not an event input for narration playback, the process exits from this playback process and returns to the file transfer process (main process) shown in FIG.

【０３３３】また、第１ナレーションの再生の終了が発
話情報の発声イベントにある完了イベントによって検知
されると（ステップＴ５０４）、この完了イベントに続
くポーズイベントが検出されることになり（ステップＴ
５０５）、このタイミングで図４７に示した如く第２画
像（一例として富士山の絵）が表示部２５に表示される
（ステップＴ５０９）。When the end of reproduction of the first narration is detected through the completion event among the utterance events of the utterance information (step T504), the pause event following that completion event is detected (step T505), and at this timing the second image (a picture of Mt. Fuji, as an example) is displayed on the display unit 25 as shown in FIG. 47 (step T509).

【０３３４】この第２画像の表示開始のタイミングで発
話情報（発声データ）に基づいて「富士山は日本一高い
山です。」（図４８参照）のように第２ナレーションの
再生が開始され（ナレーション再生の再開）、第２画像
の表示と第２ナレーションの再生との同期がとられる
（ステップＴ５０２、ステップＴ５０３）。At the timing at which display of the second image starts, reproduction of the second narration, such as "Mt. Fuji is the highest mountain in Japan." (see FIG. 48), is started on the basis of the utterance information (utterance data); that is, narration reproduction is resumed. The display of the second image and the reproduction of the second narration are thus synchronized (steps T502 and T503).
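The image and narration sequence walked through above can be condensed into a minimal sketch. It simplifies the flow considerably: utterance data is a list of tagged tuples, and the pause is released by a callback that shows the second image. The tuple tags, callback names, and `run_demo` are all hypothetical, and user input handling (steps T506 to T508) is omitted.

```python
def play(utterance, speak, notify_completion, wait_for_resume):
    """Walk the utterance data, acting on narration and control events."""
    for kind, payload in utterance:
        if kind == "speak":
            speak(payload)            # narration reproduction
        elif kind == "completion":
            notify_completion()       # completion event: report progress
        elif kind == "pause":
            wait_for_resume()         # pause event: block until released
        else:
            raise ValueError(f"unknown event kind: {kind}")


def run_demo():
    """Reproduce the two-image example and return the order of actions."""
    log = []

    def show(image):
        log.append(("image", image))

    utterance = [
        ("speak", "Japan is an island country."),
        ("completion", None),
        ("pause", None),
        ("speak", "Mt. Fuji is the highest mountain in Japan."),
    ]
    show("first image")  # first image is displayed before narration starts
    play(
        utterance,
        speak=lambda text: log.append(("narration", text)),
        notify_completion=lambda: log.append(("completed", None)),
        # Here the pause is released once the second image has been shown.
        wait_for_resume=lambda: show("second image"),
    )
    return log
```

Running `run_demo()` yields the first image, the first narration, the completion notice, the second image, and then the second narration, in that order.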

【０３３５】なお、上述した変形例４では、完了イベン
トとポーズイベントとをペアで使用する例を示していた
が、それぞれのイベントを単独で使用するようにしても
よい。[0335] In the above-described modification 4, an example in which the completion event and the pause event are used as a pair has been described. However, each event may be used independently.

【０３３６】すなわち、ナレーション再生の間に、完了
イベントの出現をナレーションの再生位置を知る基点と
して再生処理の上位のアプリケーションに渡すことで、
上位のアプリケーションは他の動作との同期をとるよう
にしてもよい。この場合、完了イベントはナレーション
再生の時間軸上で任意の時点（他の動作との同期を取る
べき箇所）に組み込めばよい。That is, during narration reproduction, the appearance of a completion event may be passed to the higher-level application of the reproduction processing as a reference point for knowing the narration reproduction position, so that the higher-level application can synchronize with other operations. In this case, the completion event may be embedded at any point on the time axis of the narration reproduction (wherever synchronization with another operation is required).

【０３３７】また、ポーズイベントにおいては、例えば
ナレーションの一文毎に発声データ中にポーズイベント
を組み込むことで、ナレーション再生のポーズの解除を
前述の画像表示とは異なってキー入力部２９の操作に同
期させることもできる。As for the pause event, embedding a pause event in the utterance data after each sentence of the narration, for example, allows release of the narration pause to be synchronized with operation of the key input unit 29 rather than with the image display described above.

【０３３８】このように、変形例４によれば、発話情報
にファイル情報の内の画像情報による動作と音声再生に
よる動作とを同期させる制御イベントを含め、音声再生
を発話情報に含まれる制御イベントに従ってファイル情
報の内の画像情報の動作に同期して動作するようにした
ので、音声と他のメディアとの表現が融合して表現力を
強化することが可能である。As described above, according to the fourth modification, the utterance information includes the control event for synchronizing the operation based on the image information in the file information and the operation based on the audio reproduction, and the control event included in the utterance information includes the audio reproduction. , The operation is performed in synchronization with the operation of the image information in the file information, so that the expression of sound and other media can be fused to enhance the expressive power.

【０３３９】なお、ファイル情報には画像情報の他に楽
曲情報等を加えてもよく、これによって、音声と画像の
他に音楽等の表現が融合して表現力を強化することが可
能である。[0339] Note that music information or the like may be added to the file information in addition to the image information. This makes it possible to enhance the expressive power by combining expressions of music and the like in addition to the sound and the image. .

【０３４０】また、この制御イベントは発話情報を作成
する際に発話情報に含めるようにしたので、他の情報に
よる動作に音声合成の動作を同期させる情報を発話情報
の中に与えることが可能である。Since this control event is included in the utterance information when the utterance information is created, information for synchronizing the operation of speech synthesis with the operation based on other information can be given in the utterance information. is there.

【０３４１】さて、前述の実施の形態及び各変形例にお
いて、音韻に依存しない声の高さや声の強さの指定によ
り声色データを選定するようにしていたが、声色データ
の選定だけに着目すると、声の高さや声の強さは音韻に
依存しなくても、声色部２１１（声色部２１３）の中で
音声合成する発話情報に最適の声色データを選定するこ
とができる。この程度において音声を高い品質で再生す
ることが可能である。By the way, in the above-described embodiment and each of the modified examples, the timbre data is selected by designating the voice pitch and the voice intensity that do not depend on the phoneme. Even if the pitch and strength of the voice do not depend on the phoneme, it is possible to select the optimum voice data for the speech information to be synthesized in the voice section 211 (voice section 213). At this level, audio can be reproduced with high quality.

【０３４２】[0342]

【発明の効果】以上説明したように、請求項１の発明に
よれば、第１通信装置から第２通信装置へ発話情報を含
むファイル情報を転送し、第２通信装置において、音韻
に依存しない声の大きさや声の高さで時間的に連続した
韻律パターンを展開し、その韻律パターンと発話情報に
基づき選定された声色データとに基づいて音声波形を生
成するようにしたので、特定の声色に限定しなくても適
した声色で音声再生ができ、かつ音声波形の生成時に声
の高さのパターンにずれが生じることはなく、このよう
に、発話情報と声色情報との対応関係を固定しなくても
最適の対応関係を得ることで音声合成の高い品質を維持
することが可能な情報通信システムが得られるという効
果を奏する。As described above, according to the invention of claim 1, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a speech waveform from that prosody pattern and the voice tone data selected on the basis of the utterance information. Speech can therefore be reproduced in a suitable voice tone without being restricted to a specific voice tone, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４３】請求項２の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、音韻に依存しない声の大き
さや声の高さで時間的に連続した韻律パターンを展開
し、その韻律パターンと発話情報中の声色の種類を示す
情報で選定された声色データとに基づいて音声波形を生
成するようにしたので、特定の声色に限定しなくても複
数種の声色から直接的に指定した最適の声色で音声再生
ができ、かつ音声波形の生成時に声の高さのパターンに
ずれが生じることはなく、このように、発話情報と声色
情報との対応関係を固定しなくても最適の対応関係を得
ることで音声合成の高い品質を維持することが可能な情
報通信システムが得られるという効果を奏する。According to the invention of claim 2, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a speech waveform from that prosody pattern and the voice tone data selected by the information in the utterance information that indicates the voice tone type. Speech can therefore be reproduced in the optimum voice tone directly designated from among a plurality of voice tones, without being restricted to a specific voice tone, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４４】請求項３の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、音韻に依存しない声の大き
さや声の高さで時間的に連続した韻律パターンを展開
し、その韻律パターンと発話情報中の声色の属性を示す
情報で類似度によって選定された声色データとに基づい
て音声波形を生成するようにしたので、不適な声色を使
用せずに類似度の最も高い声色で音声再生ができ、かつ
音声波形の生成時に声の高さのパターンにずれが生じる
ことはなく、このように、発話情報と声色情報との対応
関係を固定しなくても最適の対応関係を得ることで音声
合成の高い品質を維持することが可能な情報通信システ
ムが得られるという効果を奏する。According to the invention of claim 3, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a speech waveform from that prosody pattern and the voice tone data selected by similarity using the information in the utterance information that indicates the voice tone attributes. Speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４５】請求項４の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、音韻に依存しない声の大き
さや声の高さで時間的に連続した韻律パターンを展開
し、その韻律パターンと発話情報中の声色の種類や属性
を示す情報で選定された声色データとに基づいて音声波
形を生成するようにしたので、直接的に指定した声色が
なくても不適な声色を使用せずに類似度の最も高い声色
で音声再生ができ、かつ音声波形の生成時に声の高さの
パターンにずれが生じることはなく、このように、発話
情報と声色情報との対応関係を固定しなくても最適の対
応関係を得ることで音声合成の高い品質を維持すること
が可能な情報通信システムが得られるという効果を奏す
る。According to the invention of claim 4, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a speech waveform from that prosody pattern and the voice tone data selected by the information in the utterance information that indicates the voice tone type and attributes. Even when the directly designated voice tone is not available, speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４６】請求項５の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、ファイル情報の内の発話情
報で時間的に連続した韻律パターンを展開し、その韻律
パターンと発話情報に基づき選定された声色データとに
基づいて音声波形を生成するようにしたので、特定の声
色に限定しなくても適した声色で音声再生ができ、かつ
音声波形の生成時に声の高さのパターンにずれが生じる
ことはなく、このように、発話情報と声色情報との対応
関係を固定しなくても最適の対応関係を得ることで音声
合成の高い品質を維持することが可能な情報通信システ
ムが得られるという効果を奏する。According to the invention of claim 5, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected on the basis of the utterance information. Speech can therefore be reproduced in a suitable voice tone without being restricted to a specific voice tone, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４７】請求項６の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、ファイル情報の内の発話情
報で時間的に連続した韻律パターンを展開し、その韻律
パターンと発話情報中の声色の種類を示す情報で選定さ
れた声色データとに基づいて音声波形を生成するように
したので、特定の声色に限定しなくても複数種の声色か
ら直接的に指定した最適の声色で音声再生ができ、かつ
音声波形の生成時に声の高さのパターンにずれが生じる
ことはなく、このように、発話情報と声色情報との対応
関係を固定しなくても最適の対応関係を得ることで音声
合成の高い品質を維持することが可能な情報通信システ
ムが得られるという効果を奏する。According to the invention of claim 6, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected by the information in the utterance information that indicates the voice tone type. Speech can therefore be reproduced in the optimum voice tone directly designated from among a plurality of voice tones, without being restricted to a specific voice tone, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４８】請求項７の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、ファイル情報の内の発話情
報で時間的に連続した韻律パターンを展開し、その韻律
パターンと発話情報中の声色の属性を示す情報で類似度
によって選定された声色データとに基づいて音声波形を
生成するようにしたので、不適な声色を使用せずに類似
度の最も高い声色で音声再生ができ、かつ音声波形の生
成時に声の高さのパターンにずれが生じることはなく、
このように、発話情報と声色情報との対応関係を固定し
なくても最適の対応関係を得ることで音声合成の高い品
質を維持することが可能な情報通信システムが得られる
という効果を奏する。According to the invention of claim 7, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected by similarity using the information in the utterance information that indicates the voice tone attributes. Speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３４９】請求項８の発明によれば、第１通信装置か
ら第２通信装置へ発話情報を含むファイル情報を転送
し、第２通信装置において、ファイル情報の内の発話情
報で時間的に連続した韻律パターンを展開し、その韻律
パターンと発話情報中の声色の種類や属性を示す情報で
選定された声色データとに基づいて音声波形を生成する
ようにしたので、直接的に指定した声色がなくても不適
な声色を使用せずに類似度の最も高い声色で音声再生が
でき、かつ音声波形の生成時に声の高さのパターンにず
れが生じることはなく、このように、発話情報と声色情
報との対応関係を固定しなくても最適の対応関係を得る
ことで音声合成の高い品質を維持することが可能な情報
通信システムが得られるという効果を奏する。According to the invention of claim 8, file information including utterance information is transferred from the first communication device to the second communication device. The second communication device expands a temporally continuous prosody pattern from the utterance information in the file information, and generates a speech waveform from that prosody pattern and the voice tone data selected by the information in the utterance information that indicates the voice tone type and attributes. Even when the directly designated voice tone is not available, speech is therefore reproduced in the most similar voice tone rather than an unsuitable one, and no deviation arises in the voice pitch pattern when the speech waveform is generated. An optimum correspondence between utterance information and voice tone information is thus obtained without fixing that correspondence, which yields an information communication system that can maintain high speech synthesis quality.

【０３５０】請求項９の発明によれば、請求項３、４、
７、８のいずれか一つに記載の発明において、属性を示
す情報を性別、年齢、声の高さの基準、明瞭度、及び自
然度の内のいずれか一つ、もしくはその二つ以上の組み
合わせにしたので、発話情報記憶手段の属性と声色記憶
手段の属性との照合対象がパラメータ化され、これによ
って、声色の選定を容易にすることが可能な情報通信シ
ステムが得られるという効果を奏する。According to the invention of claim 9, in the invention according to any one of claims 3, 4, 7, and 8, the information indicating the attribute is any one of gender, age, voice pitch reference, clarity, and naturalness, or a combination of two or more of them. The items compared between the attribute in the utterance information storage means and the attribute in the voice tone storage means are thereby parameterized, which yields an information communication system in which selection of the voice tone is made easier.

【０３５１】請求項１０の発明によれば、請求項１〜８
のいずれか一つに記載の発明において、音声再生時に発
話情報記憶手段の声の高さの基準を声色記憶手段の声の
高さの基準によってシフトするようにしたので、個々の
声の高さは音韻の時間区分に関係なくそのシフトした声
の高さの基準に従って相対的に変化し、このため、声の
高さの基準が声色側に近づくことから、音声の品質を一
層向上させることが可能な情報通信システムが得られる
という効果を奏する。According to the invention of claim 10, in the invention according to any one of claims 1 to 8, the voice pitch reference of the utterance information storage means is shifted by the voice pitch reference of the voice tone storage means at the time of speech reproduction. Each voice pitch therefore changes relative to the shifted voice pitch reference regardless of the time divisions of the phonemes, and the voice pitch reference moves closer to that of the voice tone, which yields an information communication system that can further improve speech quality.

【０３５２】請求項１１の発明によれば、請求項１〜８
のいずれか一つに記載の発明において、音声再生時に発
話情報記憶手段の声の高さの基準を任意の声の高さの基
準によってシフトするようにしたので、個々の声の高さ
は音韻の時間区分に関係なくそのシフトした声の高さの
基準に従って相対的に変化し、このため、シフト量に従
って意図する声質に近づける等、声色の加工が可能な情
報通信システムが得られるという効果を奏する。According to the invention of claim 11, in the invention according to any one of claims 1 to 8, the voice pitch reference of the utterance information storage means is shifted by an arbitrary voice pitch reference at the time of speech reproduction. Each voice pitch therefore changes relative to the shifted voice pitch reference regardless of the time divisions of the phonemes, which yields an information communication system in which the voice tone can be processed, for example brought closer to an intended voice quality according to the amount of shift.

[0353] According to the invention of claim 12, in the invention of claim 10 or 11, the voice-pitch reference given by the first and second information is the average frequency, the maximum frequency, or the minimum frequency of the voice pitch. An information communication system in which the voice-pitch reference is easy to determine is therefore obtained.
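The pitch-reference shift described for claims 10 to 12 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the use of frequencies in hertz, and the choice to shift by a frequency ratio (rather than an additive offset) are assumptions made purely for illustration.

```python
from statistics import mean


def pitch_reference(freqs_hz, kind="average"):
    """Return a pitch reference: the average, maximum, or minimum frequency."""
    if kind == "average":
        return mean(freqs_hz)
    if kind == "maximum":
        return max(freqs_hz)
    if kind == "minimum":
        return min(freqs_hz)
    raise ValueError(f"unknown reference kind: {kind}")


def shift_to_reference(pitches_hz, utterance_ref_hz, target_ref_hz):
    """Shift every pitch so the utterance reference lands on the target reference.

    Assumption: the shift is a constant frequency ratio, so the relative
    movement between successive pitches is preserved.
    """
    ratio = target_ref_hz / utterance_ref_hz
    return [p * ratio for p in pitches_hz]


# Hypothetical data: pitches recorded in the utterance information and the
# pitch reference stored with the selected timbre.
utterance_pitches = [100.0, 200.0, 300.0]
utterance_ref = pitch_reference(utterance_pitches, "average")
shifted = shift_to_reference(utterance_pitches, utterance_ref, 150.0)
```

Passing an arbitrary target reference instead of the timbre's own reference corresponds to the timbre-processing case of claim 11.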

[0354] According to the invention of claim 13, in the invention of any one of claims 1 to 8, the second communication device reads timbre data from a recording medium and stores it in the timbre storage means. The types of timbre can therefore be varied by way of the recording medium, and an information communication system capable of applying the most suitable timbre at the time of voice reproduction is obtained.

[0355] According to the invention of claim 14, in the invention of any one of claims 1 to 8, the second communication device receives timbre data from an external device via a communication line and stores that timbre data in the timbre storage means. The types of timbre can therefore be varied by way of the communication line, and an information communication system capable of applying the most suitable timbre at the time of voice reproduction is obtained.

[0356] According to the invention of claim 15, in the invention of any one of claims 1 to 8, the utterance information includes control information for synchronizing the operation based on other information in the file information with the operation of the voice reproduction means, and at the time of voice reproduction the voice reproduction means operates in synchronization with the operation of that other information according to the control information included in the utterance information. The expression of voice is thereby fused with that of other media, and an information communication system with enhanced expressive power is obtained.
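As a rough illustration of control information embedded in the utterance information, the following Python sketch walks a single time-ordered event list and triggers another medium whenever a control event is reached. The event tuple layout, the event kinds `"vocalize"` and `"control"`, and the handler dictionary are hypothetical; the specification's actual event format is not reproduced here.

```python
def reproduce(events, media_handlers):
    """Process vocalization and control events in time order.

    events: list of (time_ms, kind, payload) tuples (hypothetical layout).
    media_handlers: maps a target medium name to a callable taking a command.
    Returns the ordered timeline of what was voiced or triggered.
    """
    timeline = []
    # sorted() is stable, so events at the same time keep their listed order.
    for time_ms, kind, payload in sorted(events, key=lambda e: e[0]):
        if kind == "vocalize":
            timeline.append((time_ms, "voice", payload))
        elif kind == "control":
            target, command = payload
            media_handlers[target](command)  # e.g. show an image, start music
            timeline.append((time_ms, target, command))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return timeline


shown = []
demo_events = [
    (500, "vocalize", "ko"),
    (0, "control", ("image", "show:slide1")),
    (0, "vocalize", "a"),
]
demo_timeline = reproduce(demo_events, {"image": shown.append})
```

Keeping the control events in the same stream as the vocalization events is one simple way to make the voice and the other medium share a single clock.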

[0357] According to the invention of claim 16, in the invention of claim 15, the other information is image information, music information, or the like. The expression of voice is thereby fused with that of images, music, and so on, and an information communication system with enhanced expressive power is obtained.

[0358] According to the invention of claim 17, utterance information is created from input natural speech by making one or both of voice loudness and voice pitch discrete so that they do not depend on the time difference between phonemes and have relative levels, and this utterance information is transferred to the first communication device and registered in the file information storage means. An information processing apparatus is therefore obtained that can assign voice loudness and voice pitch at arbitrary points in time, independent of the time difference between phonemes.
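One way to picture "discrete values with relative levels" is to quantize measured loudness (or pitch) samples into a small number of levels relative to the range observed in the utterance. The Python sketch below does this under stated assumptions: the sample layout `(time_ms, value)`, the default of 16 levels, and min-max scaling are illustrative choices, not the procedure defined in the specification.

```python
def to_relative_levels(samples, levels=16):
    """Quantize (time_ms, value) samples into integer levels 0..levels-1.

    The time stamps are kept as given, so an event can sit at any point in
    time and need not coincide with a phoneme boundary.
    """
    if not samples:
        return []
    values = [v for _, v in samples]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat contour carries no relative movement; use the lowest level.
        return [(t, 0) for t, _ in samples]
    scale = (levels - 1) / (hi - lo)
    return [(t, round((v - lo) * scale)) for t, v in samples]


# Hypothetical loudness measurements taken at arbitrary times.
loudness_events = to_relative_levels([(0, 0.0), (130, 0.5), (400, 1.0)], levels=3)
```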

[0359] According to the invention of claim 18, in the invention of claim 17, the information processing apparatus creates and edits utterance information for use in the information communication system of claim 10 or 11, and the creation means creates the utterance information so as to include first information indicating the voice-pitch reference. An information processing apparatus capable of providing a voice-pitch reference within the utterance information is therefore obtained.

[0360] According to the invention of claim 19, in the invention of claim 17, the creation means includes changing means for arbitrarily changing each item of information. An information processing apparatus in which information can be changed to improve voice quality is therefore obtained.

[0361] According to the invention of claim 20, the creation means includes control information in the utterance information when creating it. An information processing apparatus is therefore obtained that can embed, in the utterance information, information for synchronizing the speech synthesis operation with an operation based on other information.

[0362] According to the invention of claim 21, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a voice waveform based on that prosody pattern and on timbre data selected based on the utterance information. Voice can therefore be reproduced with a suitable timbre without being limited to a specific timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.
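The step of developing a temporally continuous prosody pattern from discrete events can be sketched as interpolation over time. The Python example below uses linear interpolation between `(time_ms, level)` events sampled at a fixed frame interval; the interpolation method, the frame length, and the function name are assumptions for illustration, and the specification may develop the pattern differently.

```python
def develop_prosody(events, frame_ms=10):
    """Expand discrete (time_ms, level) events into one value per frame.

    Values between two events are linearly interpolated; the event times are
    independent of phoneme boundaries. Times are integers in milliseconds.
    """
    if not events:
        return []
    events = sorted(events)
    end = events[-1][0]
    pattern = []
    i = 0
    t = events[0][0]
    while t <= end:
        # Advance to the last event at or before the current frame time.
        while i + 1 < len(events) and events[i + 1][0] <= t:
            i += 1
        t0, v0 = events[i]
        if i + 1 < len(events):
            t1, v1 = events[i + 1]
            pattern.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            pattern.append(float(v0))
        t += frame_ms
    return pattern


pitch_pattern = develop_prosody([(0, 0), (20, 10)], frame_ms=10)
```

Because the pattern is computed on its own time axis, it stays the same whichever timbre data is later combined with it to generate the waveform.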

[0363] According to the invention of claim 22, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a voice waveform based on that prosody pattern and on timbre data selected by information indicating the type of timbre in the utterance information. Voice can therefore be reproduced with the optimum timbre, directly specified from among a plurality of timbres, without being limited to a specific timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0364] According to the invention of claim 23, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a voice waveform based on that prosody pattern and on timbre data selected according to similarity, using information indicating timbre attributes in the utterance information. Voice can therefore be reproduced with the timbre of highest similarity without using an unsuitable timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0365] According to the invention of claim 24, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from voice loudness and voice pitch that do not depend on phonemes, and generates a voice waveform based on that prosody pattern and on timbre data selected by information indicating the type and attributes of the timbre in the utterance information. Even when the directly specified timbre is not available, voice can therefore be reproduced with the timbre of highest similarity without using an unsuitable timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0366] According to the invention of claim 25, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from the utterance information in the file information, and generates a voice waveform based on that prosody pattern and on timbre data selected based on the utterance information. Voice can therefore be reproduced with a suitable timbre without being limited to a specific timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0367] According to the invention of claim 26, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from the utterance information in the file information, and generates a voice waveform based on that prosody pattern and on timbre data selected by information indicating the type of timbre in the utterance information. Voice can therefore be reproduced with the optimum timbre, directly specified from among a plurality of timbres, without being limited to a specific timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0368] According to the invention of claim 27, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from the utterance information in the file information, and generates a voice waveform based on that prosody pattern and on timbre data selected according to similarity, using information indicating timbre attributes in the utterance information. Voice can therefore be reproduced with the timbre of highest similarity without using an unsuitable timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0369] According to the invention of claim 28, file information including utterance information is transferred from the first communication device to the second communication device, and the second communication device develops a temporally continuous prosody pattern from the utterance information in the file information, and generates a voice waveform based on that prosody pattern and on timbre data selected by information indicating the type and attributes of the timbre in the utterance information. Even when the directly specified timbre is not available, voice can therefore be reproduced with the timbre of highest similarity without using an unsuitable timbre, and no deviation arises in the voice-pitch pattern when the voice waveform is generated. In this way, an information communication method is obtained that maintains high speech synthesis quality by obtaining the optimum correspondence between utterance information and timbre information without fixing that correspondence.

[0370] According to the invention of claim 29, in the invention of any one of claims 23, 24, 27, and 28, the information indicating the attributes is any one of gender, age, voice-pitch reference, clarity, and naturalness, or a combination of two or more of them. The items to be collated between the attributes in the utterance information storage means and the attributes in the timbre storage means are thus parameterized, and an information communication method in which the timbre can be selected easily is obtained.
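Selecting a timbre by type first and by attribute similarity as a fallback can be sketched as follows. This Python example is illustrative only: the `TimbreAttributes` fields mirror the attributes named above, but the distance formula, its weights, and the value scales (clarity and naturalness on a 0 to 10 scale) are assumptions, not the collation procedure of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimbreAttributes:
    gender: str
    age: int
    pitch_reference_hz: float
    clarity: int       # assumed scale: 0 (low) to 10 (high)
    naturalness: int   # assumed scale: 0 (low) to 10 (high)


def distance(wanted, candidate):
    """Smaller is more similar. The weights are arbitrary illustrative choices."""
    d = 0.0 if wanted.gender == candidate.gender else 1.0
    d += abs(wanted.age - candidate.age) / 100.0
    d += abs(wanted.pitch_reference_hz - candidate.pitch_reference_hz) / 100.0
    d += abs(wanted.clarity - candidate.clarity) / 10.0
    d += abs(wanted.naturalness - candidate.naturalness) / 10.0
    return d


def select_timbre(wanted_type, wanted_attrs, stored):
    """Use the directly specified timbre type if stored, else the most similar."""
    if wanted_type in stored:
        return wanted_type
    return min(stored, key=lambda name: distance(wanted_attrs, stored[name]))


# Hypothetical contents of the terminal's timbre storage.
stored_timbres = {
    "A": TimbreAttributes("female", 25, 220.0, 8, 7),
    "B": TimbreAttributes("male", 40, 120.0, 6, 6),
}
wanted = TimbreAttributes("male", 35, 130.0, 7, 6)
fallback_choice = select_timbre("C", wanted, stored_timbres)
direct_choice = select_timbre("A", wanted, stored_timbres)
```

Parameterizing the attributes this way reduces selection to a numeric comparison, which is the practical benefit the paragraph above points to.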

[0371] According to the invention of claim 30, in the invention of any one of claims 21 to 28, the voice-pitch reference of the utterance information storage means is shifted according to the voice-pitch reference of the timbre storage means at the time of voice reproduction. Each voice pitch therefore changes relative to the shifted reference, regardless of the time segmentation of the phonemes. Because the voice-pitch reference moves toward that of the timbre, an information communication method capable of further improving voice quality is obtained.

[0372] According to the invention of claim 31, in the invention of any one of claims 21 to 28, the voice-pitch reference of the utterance information storage means is shifted according to an arbitrary voice-pitch reference at the time of voice reproduction. Each voice pitch therefore changes relative to the shifted reference, regardless of the time segmentation of the phonemes. As a result, an information communication method is obtained in which the timbre can be processed, for example brought closer to an intended voice quality according to the shift amount.

[0373] According to the invention of claim 32, in the invention of claim 30 or 31, the voice-pitch reference given by the first and second information is the average frequency, the maximum frequency, or the minimum frequency of the voice pitch. An information communication method in which the voice-pitch reference is easy to determine is therefore obtained.

[0374] According to the invention of claim 33, in the invention of any one of claims 21 to 28, the second communication device reads timbre data from a recording medium and stores it in the timbre storage unit. The types of timbre can therefore be varied by way of the recording medium, and an information communication method capable of applying the most suitable timbre at the time of voice reproduction is obtained.

[0375] According to the invention of claim 34, in the invention of any one of claims 21 to 28, the second communication device receives timbre data from an external device via a communication line and stores that timbre data in the timbre storage unit. The types of timbre can therefore be varied by way of the communication line, and an information communication method capable of applying the most suitable timbre at the time of voice reproduction is obtained.

[0376] According to the invention of claim 35, in the invention of any one of claims 21 to 28, the utterance information includes control information for synchronizing the operation based on other information in the file information with the operation of the voice reproduction step, and at the time of voice reproduction the voice reproduction step operates in synchronization with the operation of that other information according to the control information included in the utterance information. The expression of voice is thereby fused with that of other media, and an information communication method with enhanced expressive power is obtained.

[0377] According to the information communication method of the invention of claim 36, in the invention of claim 35, the other information is image information, music information, or the like. The expression of voice is thereby fused with that of media such as images and music, and an information communication method with enhanced expressive power is obtained.

[0378] According to the invention of claim 37, utterance information is created from input natural speech by making one or both of voice loudness and voice pitch discrete so that they do not depend on the time difference between phonemes and have relative levels, and this utterance information is transferred to the first communication device and registered in the file information storage means. An information processing method is therefore obtained that can assign voice loudness and voice pitch at arbitrary points in time, independent of the time difference between phonemes.

[0379] According to the information processing method of the invention of claim 38, in the invention of claim 37, the information processing method creates and edits utterance information for use in the information communication method of claim 30 or 31, and the creation step creates the utterance information so as to include first information indicating the voice-pitch reference. An information processing method capable of providing a voice-pitch reference within the utterance information is therefore obtained.

[0380] According to the invention of claim 39, in the invention of claim 37, the creation step includes a changing step of arbitrarily changing each item of information. An information processing method in which information can be changed to improve voice quality is therefore obtained.

[0381] According to the invention of claim 40, the creation step includes control information in the utterance information when creating it. An information processing method is therefore obtained that can embed, in the utterance information, information for synchronizing the speech synthesis operation with an operation based on other information.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing one embodiment of an information communication system according to the present invention.

FIG. 2 is a diagram showing an example of the memory configuration of the DB of the host device according to this embodiment.

FIG. 3 is a diagram showing an example of the header information in the utterance information according to this embodiment.

FIG. 4 is a diagram showing a configuration example of the vocalization information in the utterance information.

FIG. 5 is a diagram showing a configuration example of a vocalization event in the vocalization information.

FIG. 6 is a diagram explaining the contents of the velocity levels.

FIG. 7 is a diagram showing a configuration example of a control event in the vocalization information.

FIG. 8 is a block diagram showing one embodiment of a terminal device according to the present invention.

FIG. 9 is a diagram showing an example of the memory configuration of the timbre section of the timbre storage unit according to this embodiment.

FIG. 10 is a diagram showing an example of the memory configuration of the phoneme section of the timbre storage unit according to this embodiment.

FIG. 11 is a diagram showing an example of the memory configuration of the voiced phoneme table of the Japanese phoneme table.

FIG. 12 is a diagram showing an example of the memory configuration of the devoiced phoneme table of the Japanese phoneme table.

FIG. 13 is a diagram explaining the correspondence between phonemes and phoneme codes for each language code in the phoneme section.

FIG. 14 is a diagram showing an example of the memory configuration of the DB according to this embodiment.

FIG. 15 is a block diagram conceptually explaining the voice reproduction process according to this embodiment.

FIG. 16 is a flowchart explaining the file transfer process according to this embodiment.

FIG. 17 is a flowchart explaining the reproduction process according to this embodiment.

FIG. 18 is a flowchart explaining the reproduction process according to this embodiment.

FIG. 19 is a diagram showing one state transition of the display screen during the reproduction process according to this embodiment.

FIG. 20 is a diagram showing another state transition of the display screen during the reproduction process according to this embodiment.

FIG. 21 is a diagram showing still another state transition of the display screen during the reproduction process according to this embodiment.

FIG. 22 is a diagram showing still another state transition of the display screen during the reproduction process according to this embodiment.

FIG. 23 is a flowchart explaining the utterance information creation process according to this embodiment.

FIG. 24 is a flowchart explaining the new creation process according to this embodiment.

FIG. 25 is a flowchart explaining the interrupt reproduction process according to this embodiment.

FIG. 26 is a diagram showing one state transition of the operation screen during the new creation process according to this embodiment.

FIG. 27 is a diagram showing another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 28 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 29 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 30 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 31 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 32 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 33 is a diagram showing still another state transition of the operation screen during the new creation process according to this embodiment.

FIG. 34 is a flowchart explaining the editing process according to this embodiment.

FIG. 35 is a flowchart explaining the file registration process according to this embodiment.

FIG. 36 is a block diagram showing the main part of Modification 1 of this embodiment.

FIG. 37 is a flowchart explaining the new creation process according to Modification 1 of this embodiment.

FIG. 38 is a diagram showing a configuration example of the header information according to Modification 3 of this embodiment.

FIG. 39 is a diagram showing a configuration example of the timbre attributes in the header information shown in FIG. 38.

FIG. 40 is a diagram showing a configuration example of the timbre section according to Modification 3 of this embodiment.

FIG. 41 is a diagram showing a configuration example of the timbre attributes in the timbre section shown in FIG. 40.

FIG. 42 is a flowchart explaining the main steps of the new creation process according to Modification 3 of this embodiment.

FIG. 43 is a flowchart explaining the reproduction process according to Modification 3 of this embodiment.

FIG. 44 is a diagram showing a configuration example of a control event according to Modification 4 of this embodiment.

FIG. 45 is a flowchart explaining the reproduction process according to Modification 4 of this embodiment.

FIG. 46 is a diagram showing one state transition of the display screen during the reproduction process according to Modification 4 of this embodiment.

FIG. 47 is a diagram showing another state transition of the display screen during the reproduction process according to Modification 4 of this embodiment.

FIG. 48 is a diagram showing still another state transition of the display screen during the reproduction process according to Modification 4 of this embodiment.

[Explanation of symbols]

1 host device
2 terminal device
10 communication unit
11 DB
12 control unit
20 communication unit
21 timbre storage unit
22 application storage unit
23 speaker
24 control unit
25 display unit
26 DB
28 microphone
29 key input unit
31a FD
32a CD-ROM
35 speech recognition unit
211 timbre section
212 phoneme section
213 phoneme section
241 CPU

Continued from the front page. (51) Int. Cl.⁶: G10L 5/04. FI: G10L 5/04 E.

Claims

1. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information in which one or both of voice volume and voice pitch are associated with a time difference, the one or both of voice volume and voice pitch being discretized so as to have relative levels without depending on the time difference between phonemes; and first communication means for transferring the utterance information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; selecting means for selecting one item of timbre data from the plural types of timbre data stored in the timbre storage means, based on the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous in the time axis direction, based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.
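
As an illustration only, the developing means of claim 1 can be pictured as turning a short list of discrete (time difference, relative level) events into a contour sampled along the time axis. The sketch below is not taken from the specification: the names `ProsodyEvent` and `expand_prosody`, the millisecond unit, the 10 ms sampling step, and the use of linear interpolation between events are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProsodyEvent:
    dt_ms: int    # time difference from the previous event (unit is an assumption)
    level: float  # relative level of pitch or volume at this event


def expand_prosody(events: List[ProsodyEvent], step_ms: int = 10) -> List[float]:
    """Expand discrete events into a contour sampled every step_ms.

    Linear interpolation between neighbouring events is an assumption
    chosen for simplicity; the claim only requires a continuous pattern.
    """
    if not events:
        return []
    # Convert per-event time differences into absolute times.
    times = []
    t = 0
    for ev in events:
        t += ev.dt_ms
        times.append(t)
    contour = []
    idx = 0
    for now in range(times[0], times[-1] + 1, step_ms):
        # Advance to the last event whose time is not after 'now'.
        while idx + 1 < len(times) and times[idx + 1] <= now:
            idx += 1
        if idx + 1 == len(times):
            contour.append(events[idx].level)
        else:
            t0, t1 = times[idx], times[idx + 1]
            frac = (now - t0) / (t1 - t0)
            step = events[idx + 1].level - events[idx].level
            contour.append(events[idx].level + frac * step)
    return contour


if __name__ == "__main__":
    demo = [ProsodyEvent(0, 0.0), ProsodyEvent(20, 2.0), ProsodyEvent(20, 0.0)]
    print(expand_prosody(demo))
```

Because the levels are relative and the events carry only time differences, the same event list could in principle be rendered against any timbre data chosen on the receiving side.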

2. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information in which one or both of voice volume and voice pitch are associated with a time difference and a type of timbre, the one or both of voice volume and voice pitch being discretized so as to have relative levels without depending on the time difference between phonemes; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; selecting means for selecting, from the plural types of timbre data stored in the timbre storage means, the timbre data corresponding to the type of timbre in the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous in the time axis direction, based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.

3. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information in which one or both of voice volume and voice pitch are associated with a time difference and an attribute of timbre, the one or both of voice volume and voice pitch being discretized so as to have relative levels without depending on the time difference between phonemes; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme in association with information indicating the attribute of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; collating means for collating the information indicating the timbre attribute in the utterance information in the file information received by the second communication means with the information indicating the attributes of the various timbres stored in the timbre storage means, to obtain a similarity of timbre; selecting means for selecting, based on the similarity obtained by the collating means, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage means; developing means for developing a prosody pattern that is continuous in the time axis direction, based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.

4. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information in which one or both of voice volume and voice pitch are associated with a time difference, a type of timbre, and an attribute of timbre, the one or both of voice volume and voice pitch being discretized so as to have relative levels without depending on the time difference between phonemes; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme in association with information indicating the attribute of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; searching means for searching the various types of timbre stored in the timbre storage means for the type of timbre in the utterance information in the file information received by the second communication means; first selecting means for selecting, when the type of timbre in the utterance information is found by the search of the searching means, the timbre data corresponding to the found type of timbre from the various timbre data stored in the timbre storage means; collating means for collating, when the type of timbre in the utterance information is not found by the search of the searching means, the information indicating the timbre attribute in the utterance information stored in the file information storage means with the information indicating the attributes of the various timbres stored in the timbre storage means, to obtain a similarity of timbre; second selecting means for selecting, based on the similarity obtained by the collating means, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage means; developing means for developing a prosody pattern that is continuous in the time axis direction, based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the first or second selecting means.
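
The two-stage selection recited in claim 4 (search by timbre type first, then fall back to the most similar timbre by attribute) can be sketched as follows. This is a hedged illustration, not the patented implementation: `TimbreEntry`, `similarity`, and `select_timbre` are hypothetical names, attributes are assumed to be pre-normalized to the range 0 to 1, and the similarity measure (one minus the mean absolute difference, with a missing attribute counted as a full mismatch) is an assumption, since the claims do not fix a particular measure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TimbreEntry:
    timbre_type: str
    attributes: Dict[str, float]  # e.g. gender, age; 0..1 encoding is an assumption


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Return 1.0 for identical attribute sets and lower values otherwise."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    distance = 0.0
    for key in keys:
        if key in a and key in b:
            distance += abs(a[key] - b[key])
        else:
            distance += 1.0  # attribute missing on one side: full mismatch
    return 1.0 - distance / len(keys)


def select_timbre(store: Dict[str, TimbreEntry],
                  requested_type: str,
                  requested_attrs: Dict[str, float]) -> TimbreEntry:
    """Pick timbre data by type if present, else by highest attribute similarity."""
    if not store:
        raise ValueError("timbre store is empty")
    # First selection: the requested timbre type is stored locally.
    if requested_type in store:
        return store[requested_type]
    # Second selection: collate attributes and take the most similar entry.
    return max(store.values(),
               key=lambda entry: similarity(requested_attrs, entry.attributes))


if __name__ == "__main__":
    store = {
        "male_adult": TimbreEntry("male_adult", {"gender": 0.0, "age": 0.5}),
        "female_child": TimbreEntry("female_child", {"gender": 1.0, "age": 0.1}),
    }
    print(select_timbre(store, "unknown", {"gender": 1.0, "age": 0.2}).timbre_type)
```

The fallback means a terminal that lacks the exact timbre named in the file can still reproduce the utterance with the closest timbre it holds.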

5. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information that contains phonemes and prosody as information; and first communication means for transferring the utterance information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; selecting means for selecting one item of timbre data from the plural types of timbre data stored in the timbre storage means, based on the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous in the time axis direction based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.

6. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information that contains phonemes, prosody, and a type of timbre as information; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; selecting means for selecting, from the plural types of timbre data stored in the timbre storage means, the timbre data corresponding to the type of timbre in the utterance information in the file information received by the second communication means; developing means for developing a prosody pattern that is continuous in the time axis direction based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.

7. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information that contains phonemes, prosody, and an attribute of timbre as information; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme in association with information indicating the attribute of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; collating means for collating the information indicating the timbre attribute in the utterance information in the file information received by the second communication means with the information indicating the attributes of the various timbres stored in the timbre storage means, to obtain a similarity of timbre; selecting means for selecting, based on the similarity obtained by the collating means, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage means; developing means for developing a prosody pattern that is continuous in the time axis direction based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the selecting means.

8. An information communication system in which a first communication device and a second communication device are connected to a communication network and information is communicated between the first communication device and the second communication device via the communication network, wherein the first communication device comprises: file information storage means for storing file information including utterance information that contains phonemes, prosody, a type of timbre, and an attribute of timbre as information; and first communication means for transferring the file information stored in the file information storage means to the second communication device in response to a request from the second communication device, and wherein the second communication device comprises: timbre storage means for storing, for each type of timbre, timbre data representing acoustic parameters for each unit such as a phoneme in association with information indicating the attribute of that timbre; second communication means for requesting the first communication device to transfer the file information stored in the file information storage means and thereafter receiving the file information transferred by the first communication means; searching means for searching the various types of timbre stored in the timbre storage means for the type of timbre in the utterance information in the file information received by the second communication means; first selecting means for selecting, when the type of timbre in the utterance information is found by the search of the searching means, the timbre data corresponding to the found type of timbre from the various timbre data stored in the timbre storage means; collating means for collating, when the type of timbre in the utterance information is not found by the search of the searching means, the information indicating the timbre attribute in the utterance information stored in the file information storage means with the information indicating the attributes of the various timbres stored in the timbre storage means, to obtain a similarity of timbre; second selecting means for selecting, based on the similarity obtained by the collating means, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage means; developing means for developing a prosody pattern that is continuous in the time axis direction based on the utterance information; and voice reproducing means for generating a voice waveform based on the prosody pattern developed by the developing means and the timbre data selected by the first or second selecting means.

9. The information communication system according to any one of claims 3, 4, 7, and 8, wherein the information indicating the attribute is any one of gender, age, a voice pitch reference, intelligibility, and naturalness, or a combination of two or more thereof.

10. The information communication system according to any one of claims 1 to 8, wherein the file information storage means stores first information indicating a voice pitch reference in the utterance information, the timbre storage means stores second information indicating a voice pitch reference in the timbre data, and the voice reproducing means determines the voice pitch reference used at the time of voice reproduction by shifting the voice pitch reference based on the first information in accordance with the voice pitch reference based on the second information.

11. The information communication system according to any one of claims 1 to 8, wherein the file information storage means stores first information indicating a voice pitch reference in the utterance information, and the voice reproducing means has input means for arbitrarily inputting second information indicating a voice pitch reference, and determines the voice pitch reference used at the time of voice reproduction by shifting the voice pitch reference based on the first information in accordance with the voice pitch reference based on the second information input by the input means.

12. The information communication system according to claim 10, wherein the voice pitch reference based on the first and second information is an average frequency, a maximum frequency, or a minimum frequency of the voice pitch.
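
A rough sketch of the pitch-reference shift described in claims 10 to 12 follows. The claims state only that the reference from the utterance information is shifted in accordance with the reference from the timbre data (or from user input), and that the reference may be an average, maximum, or minimum frequency. Treating the shift as a multiplicative ratio applied to a contour in hertz is an assumption made here for illustration, and `pitch_reference` and `shift_pitch` are hypothetical names.

```python
from typing import List


def pitch_reference(contour_hz: List[float], kind: str = "average") -> float:
    """Compute a pitch reference of the kind named in claim 12."""
    if not contour_hz:
        raise ValueError("contour is empty")
    if kind == "average":
        return sum(contour_hz) / len(contour_hz)
    if kind == "maximum":
        return max(contour_hz)
    if kind == "minimum":
        return min(contour_hz)
    raise ValueError(f"unknown reference kind: {kind}")


def shift_pitch(contour_hz: List[float],
                utterance_ref_hz: float,
                timbre_ref_hz: float) -> List[float]:
    """Shift a contour authored around utterance_ref_hz onto timbre_ref_hz.

    A ratio-based shift is an assumption; it keeps relative intervals
    between points of the contour unchanged.
    """
    if utterance_ref_hz <= 0 or timbre_ref_hz <= 0:
        raise ValueError("pitch references must be positive")
    ratio = timbre_ref_hz / utterance_ref_hz
    return [f * ratio for f in contour_hz]


if __name__ == "__main__":
    contour = [100.0, 200.0, 300.0]
    ref = pitch_reference(contour, "average")
    print(shift_pitch(contour, ref, 100.0))
```

With a shift of this kind, utterance information recorded from one speaker can be reproduced in the natural register of whichever timbre the terminal selects.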

13. The information communication system according to any one of claims 1 to 8, wherein the second communication device further includes a removable recording medium storing timbre data, reads the timbre data from the recording medium, and stores the read timbre data in the timbre storage means.

14. The information communication system according to any one of claims 1 to 8, wherein the second communication device receives timbre data from an external device via a communication line and stores the received timbre data in the timbre storage means.

15. The information communication system according to claim 1, wherein the utterance information includes control information for synchronizing an operation based on other information in the file information with the operation of the voice reproducing means, and the voice reproducing means, during voice reproduction, operates in synchronization with the operation based on the other information in the file information in accordance with the control information included in the utterance information.

16. The information communication system according to claim 15, wherein said other information is image information, music information and the like.

17. An information processing apparatus for creating and editing utterance information used in the information communication system according to any one of claims 1 to 8, comprising: voice input means for inputting natural voice; creating means for creating the utterance information based on the natural voice input by the voice input means; and registration transfer means for requesting the first communication device to register file information including the utterance information created by the creating means, transferring the file information including the created utterance information to the first communication device, and registering the file information in the file information storage means of the first communication device.

18. The information processing apparatus according to claim 17, for creating and editing utterance information used in the information communication system according to claim 10 or 11, wherein the creating means creates the utterance information so as to include the first information indicating a voice pitch reference.

19. The information processing apparatus according to claim 17, wherein said creating means includes a changing means for arbitrarily changing said information.

20. The information processing apparatus according to claim 17, for creating and editing utterance information used in the information communication system according to claim 15 or 16, wherein the creating means includes the control information in the utterance information when creating the utterance information.

21. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information including utterance information, in which one or both of voice volume and voice pitch are associated with a time difference and the one or both of voice volume and voice pitch are discretized so as to have relative levels without depending on the time difference between phonemes, is stored in advance in a file information storage unit of the first communication device; timbre data representing, for each type of timbre, acoustic parameters for each unit such as a phoneme is stored in advance in a timbre storage unit of the second communication device; and information is communicated between the first communication device and the second communication device via the communication network so that voice is synthesized based on the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the utterance information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selecting step of selecting, in the second communication device, one item of timbre data from the plural types of timbre data stored in the timbre storage unit based on the utterance information in the file information transferred in the transfer step; a developing step of developing, in the second communication device, a prosody pattern that is continuous in the time axis direction based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and a voice reproducing step of generating, in the second communication device, a voice waveform based on the prosody pattern developed in the developing step and the timbre data selected in the selecting step.

22. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information including utterance information, in which one or both of voice volume and voice pitch are associated with a time difference and a type of timbre and the one or both of voice volume and voice pitch are discretized so as to have relative levels without depending on the time difference between phonemes, is stored in advance in a file information storage unit of the first communication device; timbre data representing, for each type of timbre, acoustic parameters for each unit such as a phoneme is stored in advance in a timbre storage unit of the second communication device; and information is communicated between the first communication device and the second communication device via the communication network so that voice is synthesized based on the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selecting step of selecting, in the second communication device, the timbre data corresponding to the type of timbre in the utterance information in the file information transferred in the transfer step from the plural types of timbre data stored in the timbre storage unit; a developing step of developing, in the second communication device, a prosody pattern that is continuous in the time axis direction based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and a voice reproducing step of generating, in the second communication device, a voice waveform based on the prosody pattern developed in the developing step and the timbre data selected in the selecting step.

23. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information including utterance information, in which one or both of voice volume and voice pitch are associated with a time difference and an attribute of timbre and the one or both of voice volume and voice pitch are discretized so as to have relative levels without depending on the time difference between phonemes, is stored in advance in a file information storage unit of the first communication device; timbre data representing, for each type of timbre, acoustic parameters for each unit such as a phoneme is stored in advance, in association with information indicating the attribute of that timbre, in a timbre storage unit of the second communication device; and information is communicated between the first communication device and the second communication device via the communication network so that voice is synthesized based on the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a collating step of collating, in the second communication device, the information indicating the timbre attribute in the utterance information in the file information transferred in the transfer step with the information indicating the attributes of the various timbres stored in the timbre storage unit, to obtain a similarity of timbre; a selecting step of selecting, in the second communication device, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage unit based on the similarity obtained in the collating step; a developing step of developing, in the second communication device, a prosody pattern that is continuous in the time axis direction based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and a voice reproducing step of generating, in the second communication device, a voice waveform based on the prosody pattern developed in the developing step and the timbre data selected in the selecting step.

24. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information including utterance information, in which one or both of voice volume and voice pitch are associated with a time difference, a type of timbre, and an attribute of timbre and the one or both of voice volume and voice pitch are discretized so as to have relative levels without depending on the time difference between phonemes, is stored in advance in a file information storage unit of the first communication device; timbre data representing, for each type of timbre, acoustic parameters for each unit such as a phoneme is stored in advance, in association with information indicating the attribute of that timbre, in a timbre storage unit of the second communication device; and information is communicated between the first communication device and the second communication device via the communication network so that voice is synthesized based on the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a searching step of searching, in the second communication device, the various types of timbre stored in the timbre storage unit for the type of timbre in the utterance information in the file information transferred in the transfer step; a first selecting step of selecting, in the second communication device, when the type of timbre in the utterance information is found by the search in the searching step, the timbre data corresponding to the found type of timbre from the various timbre data stored in the timbre storage unit; a collating step of collating, in the second communication device, when the type of timbre in the utterance information is not found by the search in the searching step, the information indicating the timbre attribute in the utterance information stored in the file information storage unit with the information indicating the attributes of the various timbres stored in the timbre storage unit, to obtain a similarity of timbre; a second selecting step of selecting, in the second communication device, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage unit based on the similarity obtained in the collating step; a developing step of developing, in the second communication device, a prosody pattern that is continuous in the time axis direction based on the one or both of voice volume and voice pitch included in the utterance information and the time difference thereof; and a voice reproducing step of generating, in the second communication device, a voice waveform based on the prosody pattern developed in the developing step and the timbre data selected in the first or second selecting step.

25. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information including utterance information that contains phonemes and prosody as information is stored in advance in a file information storage unit of the first communication device; timbre data representing, for each type of timbre, acoustic parameters for each unit such as a phoneme is stored in advance in a timbre storage unit of the second communication device; and information is communicated between the first communication device and the second communication device via the communication network so that voice is synthesized based on the utterance information in the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the utterance information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selecting step of selecting, in the second communication device, one item of timbre data from the plural types of timbre data stored in the timbre storage unit based on the utterance information in the file information transferred in the transfer step; a developing step of developing, in the second communication device, a prosody pattern that is continuous in the time axis direction based on the utterance information; and a voice reproducing step of generating, in the second communication device, a voice waveform based on the prosody pattern developed in the developing step and the timbre data selected in the selecting step.

26. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information containing utterance information, which includes phonemes, prosody, and a timbre type as information, is stored in advance in a file information storage unit of the first communication device; timbre data representing acoustic parameters for each segment unit, such as a phoneme, is stored in advance for each timbre type in a timbre storage unit of the second communication device; and voice is synthesized, through information communication between the first communication device and the second communication device via the communication network, based on the utterance information of the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a selection step of selecting, in the second communication device, the timbre data corresponding to the timbre type in the utterance information of the file information transferred in the transfer step from the plural types of timbre data stored in the timbre storage unit; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous in the time-axis direction based on the utterance information; and a voice reproduction step of generating, in the second communication device, a voice waveform based on the prosody pattern expanded in the expansion step and the timbre data selected in the selection step.

27. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information containing utterance information, which includes phonemes, prosody, and a timbre attribute as information, is stored in advance in a file information storage unit of the first communication device; timbre data representing acoustic parameters for each segment unit, such as a phoneme, and information indicating an attribute of the timbre are stored in advance, in association with each other and for each timbre type, in a timbre storage unit of the second communication device; and voice is synthesized, through information communication between the first communication device and the second communication device via the communication network, based on the utterance information of the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a collation step of collating, in the second communication device, the information indicating the timbre attribute in the utterance information of the file information transferred in the transfer step with the information indicating the attribute of each timbre stored in the timbre storage unit, to determine the similarity of the timbres; a selection step of selecting, in the second communication device, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage unit, based on the similarity determined in the collation step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous in the time-axis direction based on the utterance information; and a voice reproduction step of generating, in the second communication device, a voice waveform based on the prosody pattern expanded in the expansion step and the timbre data selected in the selection step.

28. An information communication method in which a first communication device and a second communication device are connected to a communication network; file information containing utterance information, which includes phonemes, prosody, a timbre type, and a timbre attribute as information, is stored in advance in a file information storage unit of the first communication device; timbre data representing acoustic parameters for each segment unit, such as a phoneme, and information indicating an attribute of the timbre are stored in advance, in association with each other and for each timbre type, in a timbre storage unit of the second communication device; and voice is synthesized, through information communication between the first communication device and the second communication device via the communication network, based on the utterance information of the file information stored in the file information storage unit and the timbre data stored in the timbre storage unit, the method comprising: a transfer step of transferring the file information stored in the file information storage unit to the second communication device in response to a request from the second communication device to the first communication device; a search step of searching, in the second communication device, the various timbre types stored in the timbre storage unit for the timbre type in the utterance information of the file information transferred in the transfer step; a first selection step of selecting, in the second communication device, when the timbre type in the utterance information is found by the search in the search step, the timbre data corresponding to the found timbre type from the various timbre data stored in the timbre storage unit; a collation step of collating, in the second communication device, when the timbre type in the utterance information is not found by the search in the search step, the information indicating the timbre attribute in the utterance information with the information indicating the attributes of the various timbres stored in the timbre storage unit, to determine the similarity of the timbres; a second selection step of selecting, in the second communication device, the timbre data having the highest similarity from the plural types of timbre data stored in the timbre storage unit, based on the similarity determined in the collation step; an expansion step of expanding, in the second communication device, a prosody pattern that is continuous in the time-axis direction based on the utterance information; and a voice reproduction step of generating, in the second communication device, a voice waveform based on the prosody pattern expanded in the expansion step and the timbre data selected in the first or second selection step.
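
The search, collation, and two selection steps of this claim amount to an exact lookup by timbre type with a fallback to the most similar timbre. The following is a minimal sketch only, assuming that timbre attributes are held as key-value pairs and that similarity is simply the number of matching attribute values; the claim does not define the similarity measure, and the names `similarity`, `select_timbre`, and the dictionary layout are hypothetical.

```python
def similarity(wanted, candidate):
    """Count attributes whose values match (an assumed, deliberately simple measure)."""
    return sum(1 for key, value in wanted.items() if candidate.get(key) == value)


def select_timbre(utterance, timbre_store):
    """Return the timbre type to use for synthesis.

    utterance:    {"timbre_type": str, "attributes": dict}
    timbre_store: {timbre_type: {"attributes": dict, "data": object}}
    """
    if not timbre_store:
        raise ValueError("timbre store is empty")
    wanted_type = utterance.get("timbre_type")
    # Search step and first selection step: exact match on the timbre type.
    if wanted_type in timbre_store:
        return wanted_type
    # Collation step and second selection step: pick the most similar timbre.
    wanted_attrs = utterance.get("attributes", {})
    return max(
        timbre_store,
        key=lambda name: similarity(wanted_attrs, timbre_store[name]["attributes"]),
    )
```

With this arrangement a terminal that lacks the requested timbre still produces speech, using the stored timbre whose attributes are closest to those named in the utterance information.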

29. The information communication method according to any one of claims 23, 24, 27, and 28, wherein the information indicating the attribute is any one of gender, age, voice pitch reference, clarity, and naturalness, or a combination of two or more thereof.

30. The information communication method according to any one of claims 21 to 28, wherein the file information storage unit stores, in the utterance information, first information indicating a voice pitch reference; the timbre storage unit stores second information indicating a voice pitch reference; and the voice reproduction step determines the voice pitch reference at the time of voice reproduction by shifting the voice pitch reference given by the first information in accordance with the voice pitch reference given by the second information.

31. The information communication method according to any one of claims 21 to 28, wherein the file information storage unit stores, in the utterance information, first information indicating a voice pitch reference; the method further comprises an input step of arbitrarily inputting second information indicating a voice pitch reference; and the voice reproduction step determines the voice pitch reference at the time of voice reproduction by shifting the voice pitch reference given by the first information in accordance with the voice pitch reference given by the second information input in the input step.

32. The information communication method according to claim 30, wherein the voice pitch reference given by each of the first and second information is an average frequency, a maximum frequency, or a minimum frequency of the voice pitch.
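
The pitch reference shift can be illustrated as moving a pitch contour so that its reference value changes from the one recorded in the utterance information to the one held with the timbre data. The following is a minimal sketch only, assuming an additive shift in hertz; the claims do not state whether the shift is additive or proportional, and the names `pitch_reference` and `shift_pitch` are hypothetical.

```python
def pitch_reference(contour_hz, kind="average"):
    """Compute a pitch reference as the average, maximum, or minimum frequency."""
    if not contour_hz:
        raise ValueError("contour is empty")
    if kind == "average":
        return sum(contour_hz) / len(contour_hz)
    if kind == "maximum":
        return max(contour_hz)
    if kind == "minimum":
        return min(contour_hz)
    raise ValueError(f"unknown reference kind: {kind!r}")


def shift_pitch(contour_hz, first_ref_hz, second_ref_hz):
    """Shift the contour so its reference moves from first_ref_hz to second_ref_hz.

    An additive offset is an assumption made for this sketch.
    """
    offset = second_ref_hz - first_ref_hz
    return [f + offset for f in contour_hz]
```

For example, a contour recorded around an average of 200 Hz and reproduced with a timbre whose average is 120 Hz is lowered by 80 Hz throughout, which preserves the intonation while matching the pitch range of the selected timbre.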

33. The information communication method according to any one of claims 21 to 28, further comprising, in the second communication device, connecting a detachable recording medium on which timbre data is recorded, reading the timbre data from the recording medium, and storing the read timbre data in the timbre storage unit.

34. The information communication method according to any one of claims 21 to 28, further comprising, in the second communication device, receiving timbre data from an external device via a communication line and storing the received timbre data in the timbre storage unit.

35. The information communication method according to any one of claims 21 to 28, wherein the utterance information includes control information for synchronizing an operation based on other information in the file information with the operation of the voice reproduction step, and the voice reproduction step operates, during voice reproduction, in synchronization with the operation based on the other information in the file information in accordance with the control information included in the utterance information.

36. The information communication method according to claim 35, wherein the other information is image information, music information, or the like.
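
Synchronization by control information can be pictured as control items embedded in the phoneme sequence that fire when playback reaches their position. The following is a minimal sketch only, assuming that each phoneme carries a duration in milliseconds and that control items occupy no time; the item layout and the name `schedule` are hypothetical and are not taken from the claims.

```python
def schedule(utterance_items):
    """Return (start_ms, kind, payload) tuples in playback order.

    utterance_items: list of ("phoneme", symbol, duration_ms) or ("control", action).
    Control items take no time and fire at the current playback position.
    """
    timeline = []
    clock_ms = 0
    for item in utterance_items:
        if item[0] == "phoneme":
            _, symbol, duration_ms = item
            timeline.append((clock_ms, "speak", symbol))
            clock_ms += duration_ms
        elif item[0] == "control":
            # For example, display an image or start music at this instant.
            timeline.append((clock_ms, "trigger", item[1]))
        else:
            raise ValueError(f"unknown item kind: {item[0]!r}")
    return timeline
```

Because the trigger times are derived from the phoneme durations, the image or music operation stays aligned with the speech even if the durations in the utterance information are edited.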

37. An information processing method for creating and editing, by a third communication device connected to the communication network, utterance information to be used in the information communication method according to claim 21, the method comprising: a voice input step of inputting natural voice; a creating step of creating the utterance information based on the natural voice input in the voice input step; and a registration and transfer step of requesting the first communication device to register file information containing the utterance information created in the creating step, transferring that file information to the first communication device, and registering it in the file information storage unit of the first communication device.

38. The information processing method according to claim 37, for creating and editing utterance information used in the information communication method according to claim 30 or 31, wherein the creating step creates the utterance information with the first information indicating a voice pitch reference included in the utterance information.

39. The information processing method according to claim 37, wherein the creating step includes a changing step of arbitrarily changing the information.

40. The information processing method according to claim 37, for creating and editing utterance information used in the information communication method according to claim 35 or 36, wherein the creating step includes the control information in the utterance information when creating the utterance information.