JP2007171765A

JP2007171765A - Speech database producing device, speech database, speech segment restoring device, speech database producing method, speech segment restoring method, and program

Info

Publication number: JP2007171765A
Application number: JP2005371774A
Authority: JP
Inventors: Kunihiro Suga; 邦博須賀
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-12-26
Filing date: 2005-12-26
Publication date: 2007-07-05
Anticipated expiration: 2025-12-26
Also published as: JP4816067B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech database producing device and the like that effectively protect speech data while allowing data representing speech to be supplied freely.
SOLUTION: A speech segment registration unit R acquires speech segment data representing a speech segment and obtains three kinds of data: the data length of the speech segment data, the speaking speed of the speech segment, and the time variation in frequency of its pitch component. It extracts one piece of data from each kind (three pieces in total), performs an operation on them to produce an encryption key, encrypts the speech segment data with this key to obtain encrypted speech segment data, and stores the encrypted speech segment data in a speech segment database D together with speech segment reading data. A main body unit M reads the encrypted speech segment data from the speech segment database D, produces the encryption key from the three pieces of data among the data stored in the speech segment database D, restores the speech segment data by decrypting the encrypted speech segment data with that key, and uses the restored data.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a voice database manufacturing device, a voice database, a sound piece restoring device, a voice database manufacturing method, a sound piece restoring method, and a program.

One technique for synthesizing speech is the so-called recording-and-editing method, which is used in station voice guidance systems, in-vehicle navigation devices, and the like.
In the recording-and-editing method, each word is associated in advance with voice data representing speech that reads the word aloud. A sentence to be synthesized is divided into words, and the voice data associated with those words are retrieved and joined together (see, for example, Patent Document 1).
Patent Document 1: JP 10-49193 A
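The recording-and-editing method described above can be illustrated with a minimal sketch. The table `VOICE_DATA`, its contents, and the function name `synthesize` are hypothetical; short byte strings stand in for recorded waveforms, and whitespace splitting stands in for real word segmentation.

```python
# Hypothetical word-to-recording table; the byte strings stand in for audio.
VOICE_DATA = {
    "next": b"\x01\x02",
    "station": b"\x03\x04\x05",
    "tokyo": b"\x06",
}


def synthesize(sentence: str) -> bytes:
    """Split the sentence into words and join the recording stored for each."""
    words = sentence.lower().split()  # assumption: whitespace-delimited words
    return b"".join(VOICE_DATA[word] for word in words)
```

A word that has no recording raises `KeyError` in this sketch; a practical system would need a fallback for such words.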

One conceivable way to change the speaker of the synthesized speech obtained by the recording-and-editing method, or otherwise to diversify the synthesized speech, is to record the voice data on removable media (portable recording media) and to swap, as needed, among a plurality of removable media on which mutually different voice data are recorded.

However, voice data recorded on removable media is vulnerable to unauthorized copying and other unauthorized use. One conceivable solution is to encrypt the voice data before recording it on the removable medium.

If, however, the voice data is encrypted with an encryption key and recorded on a removable medium, and a speech synthesis device is to use that voice data, the device must be able to learn the encryption key needed to decrypt the encrypted voice data.

If, for example, the device stores the encryption key in advance, whoever manufactures the removable media holding the voice data must know that key (in other words, speech synthesis cannot be performed normally with removable media holding voice data encrypted with a different key), and the free supply of voice data is hindered.

If, on the other hand, the encryption key is simply supplied to the device separately, the key can easily leak to parties from whom it is meant to be kept secret, and the encrypted voice data cannot be protected effectively.

If the encryption key is itself encrypted before being supplied to the device, the device must be able to decrypt the key. The question of how to let the device securely learn the conditions for decrypting the key then raises the same problem as for the conditions for decrypting the voice data.

The present invention has been made in view of the above circumstances, and its object is to provide a voice database manufacturing device, a voice database, a sound piece restoring device, a voice database manufacturing method, a sound piece restoring method, and a program that effectively protect voice data while allowing data representing speech to be supplied freely.

To achieve the above object, a voice database manufacturing device according to a first aspect of the present invention is characterized by comprising:
data acquisition means for acquiring sound piece data representing a sound piece;
encryption key generation means for generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and for generating an encryption key using a predetermined plurality of items of the generated data;
encryption means for generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
sound piece storage means for storing the generated encrypted sound piece data in a storage area of an external storage device.
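The four means of the first aspect can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a key derivation or a cipher, so SHA-256 and a repeating-key XOR are toy stand-ins chosen for brevity (the XOR is not a secure cipher), and the names `derive_key`, `xor_cipher`, and `register_piece`, the dictionary layout, and the choice of the first pitch sample are all hypothetical.

```python
import hashlib
from typing import Dict, Sequence


def derive_key(data_length: int, speed_ms: int, pitch_sample: int) -> bytes:
    """Combine three feature values into a key (SHA-256 is an assumption)."""
    material = f"{data_length}:{speed_ms}:{pitch_sample}".encode("ascii")
    return hashlib.sha256(material).digest()


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: applying it twice with the same key restores data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def register_piece(database: Dict[str, dict], reading: str, pcm: bytes,
                   speed_ms: int, pitch_curve: Sequence[int]) -> None:
    """Derive a key from features of the piece, encrypt it, and store it."""
    # Assumption: the first pitch sample is the item taken from the pitch data.
    key = derive_key(len(pcm), speed_ms, pitch_curve[0])
    database[reading] = {
        "encrypted": xor_cipher(pcm, key),
        "length": len(pcm),
        "speed_ms": speed_ms,
        "pitch_curve": list(pitch_curve),
    }
```

Because the key is computed from values stored alongside the encrypted data, no separate key has to be delivered to the device that later decrypts it.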

From among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, the encryption key generation means may treat, as the predetermined plurality of data, a plurality of items determined by the value of identification data uniquely assigned to the storage device or to an external device that decrypts the encrypted sound piece data, and may generate the encryption key from them.

The encryption key generation means may generate the encryption key by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of identification data uniquely assigned to the storage device or to an external device that decrypts the encrypted sound piece data.
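One way to picture an operation whose rule is determined by identification data is a small table of combining rules indexed by the device ID. The three rules, the 32-bit masking, and the name `key_for_device` are hypothetical; the text does not specify any particular rule.

```python
from typing import Callable, Sequence

# Hypothetical combining rules; each maps three feature values to a 32-bit key.
RULES: Sequence[Callable[[int, int, int], int]] = (
    lambda a, b, c: (a + b + c) & 0xFFFFFFFF,
    lambda a, b, c: (a ^ b ^ c) & 0xFFFFFFFF,
    lambda a, b, c: (a * 31 + b * 17 + c) & 0xFFFFFFFF,
)


def key_for_device(device_id: int, a: int, b: int, c: int) -> int:
    """Pick the rule from the device ID, then apply it to the feature values."""
    rule = RULES[device_id % len(RULES)]
    return rule(a, b, c)
```

With this arrangement the same three feature values yield different keys on devices whose IDs select different rules.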

The data representing characteristics of the sound piece data may consist of, for example, data representing the data length of the sound piece data.

The data representing characteristics of the sound piece represented by the sound piece data may consist of, for example, data representing the utterance speed of the sound piece or data representing the time variation of the frequency of the pitch component of the sound piece.

A voice database according to a second aspect of the present invention is a voice database that stores encrypted sound piece data, which corresponds to sound piece data representing a sound piece in encrypted form,
characterized in that the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated from a predetermined plurality of items among data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data.

The encrypted sound piece data may correspond, for example, to the sound piece data encrypted using an encryption key generated by taking, as the predetermined plurality of data, a plurality of items that are determined by the value of identification data uniquely assigned to the voice database itself or to an external device that decrypts the encrypted sound piece data, from among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data.

The encrypted sound piece data may also correspond, for example, to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of identification data uniquely assigned to the voice database itself or to an external device that decrypts the encrypted sound piece data.

A sound piece restoring device according to a third aspect of the present invention is configured to be connectable to an external voice database that stores encrypted sound piece data, which corresponds to sound piece data representing a sound piece in encrypted form, and that further stores data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and is characterized by comprising:
selection means for selecting, from among the encrypted sound piece data stored in the voice database, the data to be used for restoring a sound piece; and
decryption means for decrypting the encrypted sound piece data selected by the selection means,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated from a predetermined plurality of items among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, and
the decryption means acquires the predetermined plurality of data from the voice database, generates the encryption key using the acquired data, and decrypts the encrypted sound piece data using the generated encryption key.
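The selection and decryption means of the third aspect can be sketched as below. The key derivation (SHA-256) and the cipher (repeating-key XOR) are toy stand-ins, not methods named in the text, and the names `derive_key`, `xor_cipher`, `restore_piece`, and the dictionary layout are hypothetical. The point illustrated is that the restoring side rebuilds the key from data already held in the database.

```python
import hashlib
from typing import Dict


def derive_key(data_length: int, speed_ms: int, pitch_sample: int) -> bytes:
    """Rebuild the key from stored feature values (SHA-256 is an assumption)."""
    material = f"{data_length}:{speed_ms}:{pitch_sample}".encode("ascii")
    return hashlib.sha256(material).digest()


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher used for both encryption and decryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def restore_piece(database: Dict[str, dict], reading: str) -> bytes:
    """Select an entry, regenerate its key from stored features, and decrypt."""
    entry = database[reading]  # selection step
    key = derive_key(entry["length"], entry["speed_ms"], entry["pitch_curve"][0])
    return xor_cipher(entry["encrypted"], key)  # decryption step
```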

Identification data unique to the voice database may be assigned to the voice database. In this case, the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated by taking, as the predetermined plurality of data, a plurality of items that are determined by the value of the identification data uniquely assigned to the voice database, from among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, and
the decryption means may acquire from the voice database the plurality of items so determined, generate the encryption key using the acquired data, and decrypt the encrypted sound piece data using the generated encryption key.
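Choosing which feature items feed the key according to the database's identification data might look like the following sketch. The four feature names, the use of pairs, and the name `select_features` are hypothetical; the text only states that the selected items are determined by the ID value.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical pool of feature items stored with each sound piece.
FEATURE_NAMES = ("length", "speed_ms", "pitch_first", "pitch_last")
# All two-item selections, in a fixed order shared by both sides.
PAIRS = list(combinations(FEATURE_NAMES, 2))


def select_features(database_id: int, features: Dict[str, int]) -> Tuple[int, ...]:
    """Return the feature values that the database ID designates as key inputs."""
    names = PAIRS[database_id % len(PAIRS)]
    return tuple(features[name] for name in names)
```

Both the registering side and the restoring side must use the same ordering of selections, otherwise the regenerated key will not match.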

Where identification data unique to the voice database is assigned to the voice database, the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of the identification data uniquely assigned to the voice database, and
the decryption means may generate the encryption key by applying, to the predetermined plurality of data acquired from the voice database, an operation that follows the rule determined by the value of the identification data uniquely assigned to the voice database, and may decrypt the encrypted sound piece data using the generated encryption key.

The sound piece restoring device may further comprise synthesis means for generating data representing synthesized speech by joining together the sound piece data decrypted by the decryption means.

Where the sound piece restoring device further comprises such synthesis means, identification data unique to the synthesis means may be assigned to the synthesis means.
In this case, the encrypted sound piece data may correspond to the sound piece data encrypted using an encryption key generated by taking, as the predetermined plurality of data, a plurality of items that are determined by the value of the identification data uniquely assigned to the synthesis means, from among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, and
the decryption means may acquire from the voice database the plurality of items so determined, generate the encryption key using the acquired data, and decrypt the encrypted sound piece data using the generated encryption key.

Where the sound piece restoring device further comprises such synthesis means and identification data unique to the synthesis means is assigned to it, the encrypted sound piece data may correspond to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of the identification data uniquely assigned to the synthesis means, and
the decryption means may generate the encryption key by applying, to the predetermined plurality of data acquired from the voice database, an operation that follows the rule determined by the value of the identification data uniquely assigned to the synthesis means, and may decrypt the encrypted sound piece data using the generated encryption key.

A voice database manufacturing method according to a fourth aspect of the present invention is characterized by:
acquiring sound piece data representing a sound piece;
generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and generating an encryption key using a predetermined plurality of items of the generated data;
generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
storing the generated encrypted sound piece data in a storage area of an external storage device.

A sound piece restoring method according to a fifth aspect of the present invention is a sound piece restoring method that uses a voice database storing encrypted sound piece data, which corresponds to sound piece data representing a sound piece in encrypted form, and further storing data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and is characterized by comprising:
a selection step of selecting, from among the encrypted sound piece data stored in the voice database, the data to be used for restoring a sound piece; and
a decryption step of decrypting the encrypted sound piece data selected in the selection step,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated from a predetermined plurality of items among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, and
in the decryption step, the predetermined plurality of data is acquired from the voice database, the encryption key is generated using the acquired data, and the encrypted sound piece data is decrypted using the generated encryption key.

A program according to a sixth aspect of the present invention is characterized by causing a computer to function as:
data acquisition means for acquiring sound piece data representing a sound piece;
encryption key generation means for generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and for generating an encryption key using a predetermined plurality of items of the generated data;
encryption means for generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
sound piece storage means for storing the generated encrypted sound piece data in a storage area of an external storage device.

A program according to a seventh aspect of the present invention is a program for causing a computer, connectable to an external voice database that stores encrypted sound piece data, which corresponds to sound piece data representing a sound piece in encrypted form, and that further stores data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, to function as:
selection means for selecting, from among the encrypted sound piece data stored in the voice database, the data to be used for restoring a sound piece; and
decryption means for decrypting the encrypted sound piece data selected by the selection means,
characterized in that the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated from a predetermined plurality of items among the data representing characteristics of the sound piece data and/or the data representing characteristics of the sound piece represented by the sound piece data, and
the decryption means acquires the predetermined plurality of data from the voice database, generates the encryption key using the acquired data, and decrypts the encrypted sound piece data using the generated encryption key.

According to the present invention, a voice database manufacturing device, a voice database, a sound piece restoring device, a voice database manufacturing method, a sound piece restoring method, and a program are realized that effectively protect voice data while allowing data representing speech to be supplied freely.

Embodiments of the present invention are described below with reference to the drawings, taking a speech synthesis system as an example.
FIG. 1 shows the configuration of a speech synthesis system according to an embodiment of the present invention. As illustrated, the speech synthesis system comprises a main unit M, a sound piece registration unit R, and a sound piece database D.

The main unit M comprises a language processing unit 1, a general word dictionary 2, a user word dictionary 3, a rule synthesis processing unit 4, a sound piece editing unit 5, a search unit 6, a decoding unit 7, and a speech speed conversion unit 8.
Of these, the rule synthesis processing unit 4 comprises an acoustic processing unit 41, a search unit 42, a decompression unit 43, and a waveform database 44.
The sound piece editing unit 5 comprises a morphological analysis unit 51, a matching sound piece determination unit 52, a prosody prediction unit 53, and an output synthesis unit 54.

The language processing unit 1, the acoustic processing unit 41, the search unit 42, the decompression unit 43, the sound piece editing unit 5, the search unit 6, the decoding unit 7, and the speech speed conversion unit 8 each comprise a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), a memory that stores the program executed by the processor, and the like, and each performs the processing described later.

A single processor may perform some or all of the functions of the language processing unit 1, the acoustic processing unit 41, the search unit 42, the decompression unit 43, the sound piece editing unit 5, the search unit 6, the decoding unit 7, and the speech speed conversion unit 8. Thus, for example, the processor that performs the function of the decompression unit 43 may also perform the function of the decoding unit 7, or one processor may perform the functions of the acoustic processing unit 41, the search unit 42, and the decompression unit 43 together.

The general word dictionary 2 is a non-volatile memory such as a PROM (Programmable Read Only Memory) or a hard disk device. In the general word dictionary 2, words and the like that include ideographic characters (for example, kanji) and phonograms (for example, kana or phonetic symbols) representing the readings of those words are stored in advance in association with each other by the manufacturer of the speech synthesis system or the like.

The user word dictionary 3 comprises a rewritable non-volatile memory such as an EEPROM (Electrically Erasable/Programmable Read Only Memory) or a hard disk device, and a control circuit that controls the writing of data to this non-volatile memory. A processor may perform the function of this control circuit, and a processor that performs some or all of the functions of the language processing unit 1, the acoustic processing unit 41, the search unit 42, the decompression unit 43, the sound piece editing unit 5, the search unit 6, the decoding unit 7, and the speech speed conversion unit 8 may also perform the function of the control circuit of the user word dictionary 3.
The user word dictionary 3 acquires from outside, in accordance with user operations, words and the like that include ideographic characters and phonograms representing their readings, and stores them in association with each other. It is sufficient for the user word dictionary 3 to store words and the like that are not stored in the general word dictionary 2, together with the phonograms representing their readings.

The waveform database 44 is a non-volatile memory such as a PROM or a hard disk device. In the waveform database 44, phonograms and compressed waveform data are stored in advance in association with each other by the manufacturer of the speech synthesis system or the like. The compressed waveform data is obtained by entropy-coding segment waveform data, which represents the segments that make up the phoneme denoted by the phonogram (that is, one cycle, or some other predetermined number of cycles, of the speech waveform making up one phoneme). The segment waveform data before entropy coding may be, for example, PCM digital data.

A single non-volatile memory may perform some or all of the functions of the non-volatile memories of the general word dictionary 2, the user word dictionary 3, the waveform database 44, and the matching sound piece determination unit 52.

The sound piece database D is a non-volatile memory such as a PROM or a hard disk device, and is configured so that it can be detachably connected to the main unit M and also to the sound piece registration unit R.

The sound piece database D stores, for example, data having the data structure shown in FIG. 2. That is, as illustrated, the data stored in the sound piece database D is divided into four parts: a header part HDR, an index part IDX, a directory part DIR, and a data part DAT.

Data is stored in the sound piece database D, for example, in advance by the manufacturer of the speech synthesis system and/or by the sound piece registration unit R performing the operation described later.

The header part HDR stores data identifying the sound piece database D and data indicating the data amounts of the index part IDX, the directory part DIR, and the data part DAT, the data format, the attribution of copyright and the like.

The data part DAT stores a plurality of pieces of encrypted sound piece data, each obtained by encrypting sound piece data representing the waveform of a sound piece.
A sound piece is one continuous section of speech containing one or more phonemes, and usually consists of a section corresponding to one word or to several words. A sound piece may include a conjunction. The sound pieces represented by the encrypted sound piece data stored in one sound piece database are assumed to have been uttered by the same speaker.
The sound piece data before encryption may be in the same format as the segment waveform data before it is encoded to generate the compressed waveform data described above (for example, PCM digital data).

For each piece of encrypted sound piece data, the directory part DIR stores the following data in association with each other:
(A) data representing the phonetic characters that indicate the reading of the sound piece represented by this encrypted sound piece data (sound piece reading data);
(B) data representing the head address of the storage location where this encrypted sound piece data is stored;
(C) data representing the data length of this encrypted sound piece data;
(D) data representing the utterance speed of the sound piece represented by this encrypted sound piece data, that is, its time length when played back (speed initial value data); and
(E) data representing the time variation of the frequency of the pitch component of this sound piece (pitch component data).
(Addresses are assumed to be assigned to the storage area of the sound piece database D.)
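As an illustration only, one directory entry holding items (A) to (E) could be modeled as below. The field names, the unit of the speed value, and the numeric values for (D) and (E) are assumptions made for this sketch; only the reading, the head address, and the data length are taken from the FIG. 2 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """One directory record; field names are hypothetical."""
    reading: str            # (A) sound piece reading data (phonetic characters)
    address: int            # (B) head address of the encrypted sound piece data
    length: int             # (C) data length in bytes
    speed_ms: int           # (D) speed initial value; milliseconds is an assumed unit
    pitch_slope: float      # (E) gradient alpha of the pitch approximation [Hz/s]
    pitch_intercept: float  # (E) intercept beta of the pitch approximation [Hz]


# Reading, address, and length follow FIG. 2; the other three values are invented.
saitama = DirectoryEntry("サイタマ", 0x001A36A6, 0x1410, 1200, -12.5, 180.0)

# First address past the stored data, useful when reading the piece from DAT.
end_address = saitama.address + saitama.length
```

A fixed record like this makes it straightforward to keep the entries sorted by reading, as the directory part requires.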

FIG. 2 illustrates a case in which, as data included in the data part DAT, encrypted sound piece data of 1410h bytes representing the waveform of a sound piece whose reading is "サイタマ" (Saitama) is stored at a logical position starting at address 001A36A6h. (In this specification and the drawings, a number with the suffix "h" is a hexadecimal number.)

Each piece of encrypted sound piece data corresponds to the original sound piece data encrypted with an encryption key. This encryption key is generated as follows: one predetermined item is selected from the entire set of (C) data (specifically, for example, the item associated with the encrypted sound piece data of a predetermined reading), one predetermined item is likewise selected from the entire set of (D) data, and one predetermined item is selected from the entire set of (E) data; the key is the sum of the values of these three selected items. The three selected items need not be associated with the same encrypted sound piece data. The encryption may be performed by any symmetric-key cryptographic technique; specifically, for example, it may be performed by a technique conforming to DES (Data Encryption Standard).
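The key generation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of plain integers for all three values (in particular for the pitch component data), and the packing of the sum into 8 bytes for a DES-sized key are all assumptions.

```python
def derive_key(lengths, speeds, pitch_values, selection):
    """Sum one selected value from each of the (C), (D), and (E) data sets.

    `selection` is a hypothetical triple of indices; the three indices
    may point at different sound pieces.
    """
    i, j, k = selection
    return lengths[i] + speeds[j] + pitch_values[k]


def to_key_bytes(key: int) -> bytes:
    """Pack the sum into 8 bytes (the DES key size); this packing is an assumption."""
    return (key & (2**64 - 1)).to_bytes(8, "big")


lengths = [0x1410, 0x0A20]   # (C) values, invented apart from 1410h
speeds = [1200, 800]         # (D) values, invented
pitch_values = [180, 150]    # (E) values reduced to integers, invented

key = derive_key(lengths, speeds, pitch_values, (0, 1, 1))
key_bytes = to_key_bytes(key)
```

Because the inputs are all stored in the directory part, the same function can be run again at restoration time without the key itself ever being stored.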

Of the sets of data (A) to (E) described above, at least the (A) data (that is, the sound piece reading data) is stored in the storage area of the sound piece database D in a state sorted according to an order determined by the phonetic characters that the sound piece reading data represents (for example, if the phonetic characters are kana, arranged in ascending address order according to the Japanese syllabary order).
The pitch component data may consist, for example, as illustrated, of data indicating the values of the intercept β and the gradient α of a linear function obtained when the frequency of the pitch component of the sound piece is approximated by a linear function of the time elapsed from the head of the sound piece. (The unit of the gradient α may be, for example, [hertz/second], and the unit of the intercept β may be, for example, [hertz].)
The pitch component data is also assumed to include data (not shown) indicating whether the sound piece represented by the encrypted sound piece data is nasalized and whether it is devoiced.
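The linear approximation of the pitch frequency can be illustrated with an ordinary least-squares fit. The text does not say how the approximation is computed, so least squares is an assumption here, as are the function name and the sample values.

```python
def fit_pitch_line(times, freqs):
    """Least-squares fit of freqs ~= alpha * t + beta; returns (alpha, beta)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(freqs) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, freqs))
    alpha = cov / var_t               # gradient [Hz/s]
    beta = mean_f - alpha * mean_t    # intercept [Hz] at the head of the sound piece
    return alpha, beta


# Invented pitch track: starts at 180 Hz and falls by 20 Hz per second.
times = [0.0, 0.1, 0.2, 0.3]
freqs = [180.0, 178.0, 176.0, 174.0]
alpha, beta = fit_pitch_line(times, freqs)
```

Storing only α and β keeps the directory entry small while still describing the overall pitch contour of the sound piece.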

The index part IDX stores data for specifying the approximate logical position of data in the directory part DIR on the basis of the sound piece reading data. Specifically, for example, assuming that the sound piece reading data represents kana, each kana character is stored in association with data (a directory address) indicating the range of addresses in which the sound piece reading data whose first character is that kana character is located.
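A two-step lookup of this kind might look like the sketch below: the index narrows the search to the range for the first kana character, and a binary search then finds the reading within that range. List positions stand in for directory addresses, and the names, the sample readings, and the use of code-point order as the sort order are assumptions.

```python
import bisect


def find_entry(index, readings, target):
    """Return the position of `target` in the sorted `readings`, or None."""
    span = index.get(target[0])
    if span is None:
        return None            # no reading starts with this kana character
    lo, hi = span              # half-open range of directory positions
    pos = bisect.bisect_left(readings, target, lo, hi)
    return pos if pos < hi and readings[pos] == target else None


# Directory readings, sorted; positions stand in for directory addresses.
readings = ["サイタマ", "サカ", "トウキョウ", "トチギ"]
# Index part: first kana character -> range of directory positions.
index = {"サ": (0, 2), "ト": (2, 4)}

hit = find_entry(index, readings, "トチギ")
miss = find_entry(index, readings, "サド")
```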

As illustrated, the sound piece registration unit R includes a recorded sound piece data set storage unit 10, a sound piece database creation unit 11, and a compression unit 12.

The recorded sound piece data set storage unit 10 is configured from a rewritable nonvolatile memory such as a hard disk device.
The recorded sound piece data set storage unit 10 stores, in advance and in association with each other, phonetic characters representing the reading of each sound piece and sound piece data representing the waveform obtained by recording a person actually uttering that sound piece. This data is stored by, for example, the manufacturer of the speech synthesis system. The sound piece data may be, for example, PCM digital data.

The sound piece database creation unit 11 and the compression unit 12 are each configured from a processor such as a CPU and a memory that stores a program to be executed by the processor, and perform the processing described later in accordance with this program.

A single processor may perform some or all of the functions of the sound piece database creation unit 11 and the compression unit 12. A processor that performs some or all of the functions of the language processing unit 1, the acoustic processing unit 41, the search unit 42, the decompression unit 43, the sound piece editing unit 5, the search unit 6, the decoding unit 7, and the speech rate conversion unit 8 may additionally perform the functions of the sound piece database creation unit 11 and the compression unit 12. A processor that performs the functions of the sound piece database creation unit 11 and the compression unit 12 may also serve as the control circuit of the recorded sound piece data set storage unit 10.

Next, the operation of this speech synthesis system will be described.

(Operation of the sound piece registration unit)
First, the operation of the sound piece registration unit R will be described.
When a sound piece is registered in the sound piece database D, the sound piece database creation unit 11 first, with the sound piece database D connected to the sound piece registration unit R, reads phonetic characters and sound piece data associated with each other from the recorded sound piece data set storage unit 10. It then specifies the data length of this sound piece data, the utterance speed of the speech that the sound piece data represents, and the time variation of the frequency of its pitch component.

The data length and the utterance speed of the sound piece data may be specified, for example, by counting the number of samples in the sound piece data.
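For PCM data, both quantities follow directly from the sample count, as the small sketch below shows. The sampling rate, the sample width of 2 bytes, and the function name are assumptions; the text only states that the samples are counted.

```python
def length_and_duration(num_samples, sample_rate_hz, bytes_per_sample=2):
    """Return (data length in bytes, playback duration in seconds) of PCM data."""
    data_length = num_samples * bytes_per_sample
    duration_s = num_samples / sample_rate_hz
    return data_length, duration_s


# 2568 samples of 16-bit PCM at an assumed 16 kHz sampling rate.
data_length, duration_s = length_and_duration(2568, 16000)
```

With these assumed parameters the data length comes to 1410h bytes, matching the size in the FIG. 2 example, although the actual sample format behind that figure is not stated.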

The time variation of the frequency of the pitch component, on the other hand, may be specified, for example, by applying cepstrum analysis to the sound piece data. Specifically, for example, the waveform represented by the sound piece data is divided into many small portions on the time axis, and the intensity of each small portion is converted to a value substantially equal to the logarithm of its original value (the base of the logarithm is arbitrary). The spectrum of each converted small portion (that is, the cepstrum) is then obtained by the fast Fourier transform (or by any other technique that generates data representing the result of Fourier-transforming a discrete variable). Finally, the smallest of the frequencies at which this cepstrum takes a local maximum is specified as the frequency of the pitch component in that small portion.
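For orientation, the sketch below shows the conventional cepstrum method of pitch estimation for a single frame: take the log-magnitude spectrum, transform it back, and pick the strongest peak within a plausible range of pitch periods. It is a textbook variant and may differ in detail from the procedure described above; the search range of 50 to 400 Hz, the naive transform, and all names are assumptions.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for one short frame."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstral_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the pitch frequency [Hz] of one frame from its real cepstrum."""
    spectrum = dft(frame)
    # Small offset avoids log(0) for empty spectral bins.
    log_magnitude = [math.log(abs(c) + 1e-12) for c in spectrum]
    cepstrum = [c.real for c in dft(log_magnitude, inverse=True)]
    q_lo = int(sample_rate / f_max)                         # shortest period, in samples
    q_hi = min(int(sample_rate / f_min), len(frame) // 2)   # longest period, in samples
    best_q = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / best_q


# Synthetic frame: one impulse every 80 samples at 8 kHz, i.e. a 100 Hz pitch.
frame = [1.0 if n % 80 == 0 else 0.0 for n in range(512)]
pitch_hz = cepstral_pitch(frame, 8000)
```

Running this per frame along the sound piece yields a pitch track whose change over time can then be summarized as the pitch component data.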

Good results can be expected if the time variation of the frequency of the pitch component is specified from pitch waveform data obtained by first converting the sound piece data in accordance with, for example, the technique disclosed in Japanese Patent Laid-Open No. 2003-108172. Specifically, the sound piece data is filtered to extract a pitch signal; on the basis of the extracted pitch signal, the waveform represented by the sound piece data is divided into sections of unit pitch length; and for each section, the phase shift is specified from its correlation with the pitch signal and the phases of the sections are aligned. This converts the sound piece data into a pitch waveform signal. The resulting pitch waveform signal is then treated as sound piece data and subjected to cepstrum analysis or the like to specify the time variation of the frequency of the pitch component.

Next, the sound piece database creation unit 11 generates three items of data indicating the specified data length, the specified utterance speed, and the specified time variation of the frequency of the pitch component. These serve, respectively, as the (C) data described above, the speed initial value data (that is, the (D) data), and the pitch component data (that is, the (E) data).

When generation of the (C), (D), and (E) data has been completed for all sound pieces to be registered in the sound piece database, the sound piece database creation unit 11 selects, from the entire set of generated (C) data, the one item generated from the sound piece data of a predetermined reading; likewise selects, from the entire set of generated (D) data, the one item generated from the sound piece data of a predetermined reading; and further selects, from the entire set of generated (E) data, the one item generated from the sound piece data of a predetermined reading. It then generates data representing the sum of the values of these three selected items, that is, the encryption key. The sound piece database creation unit 11 supplies the sound piece data read from the recorded sound piece data set storage unit 10, together with the generated encryption key, to the compression unit 12.
Using the encryption key supplied from the sound piece database creation unit 11, the compression unit 12 encrypts the sound piece data supplied from the sound piece database creation unit 11 by a symmetric-key cryptographic technique to create encrypted sound piece data, and returns it to the sound piece database creation unit 11.
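The round trip of encrypting with the derived key and later restoring with the same key can be sketched as below. The cipher here is deliberately a stand-in: it XORs the data with a SHA-256 based keystream, which is neither DES nor suitable for real protection, and is used only because it needs nothing beyond the standard library. The function name and the sample data are hypothetical.

```python
import hashlib


def keystream_xor(data: bytes, key: int) -> bytes:
    """Stand-in symmetric cipher (not DES): XOR with a SHA-256 counter keystream.

    Applying it twice with the same non-negative integer key restores the input.
    """
    key_bytes = key.to_bytes((key.bit_length() + 7) // 8 or 1, "big")
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key_bytes + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


pcm = bytes(range(100))                 # invented stand-in for PCM sound piece data
encrypted = keystream_xor(pcm, 6086)    # registration side: encrypt with the sum key
restored = keystream_xor(encrypted, 6086)   # restoring side: same key, same routine
wrong = keystream_xor(encrypted, 6087)      # a different sum does not restore the data
```

The point of the sketch is the symmetry: any party that can recompute the same sum from the stored (C), (D), and (E) data can restore the sound piece data.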

The way in which the predetermined items of (C), (D), and (E) data are designated is arbitrary, as long as it is done in such a manner that users of the speech synthesis system are not told which items are the predetermined ones (provided, however, that the designation must not be conditional on an item taking a particular value). Accordingly, the readings of the encrypted sound piece data with which the three items used to generate the encryption key are associated may be designated directly, or the positions of these three items from the head of the storage area of the sound piece database D may be designated. Alternatively, the sound piece database creation unit 11 may calculate a value by substituting certain predetermined data into a predetermined hash function, or by applying some other predetermined operation, and use the three items uniquely determined by that value to generate the encryption key.
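The hash-based variant could, for instance, map some predetermined data to three directory positions as sketched below. SHA-256, the way the digest is split, the seed value, and the function name are all assumptions; the text only requires that the resulting value uniquely determine the three items.

```python
import hashlib


def select_indices(seed: bytes, num_entries: int):
    """Map predetermined data to three directory positions, for (C), (D), and (E)."""
    digest = hashlib.sha256(seed).digest()
    return tuple(
        int.from_bytes(digest[4 * n:4 * n + 4], "big") % num_entries
        for n in range(3)
    )


# Hypothetical predetermined data known to both the registering and restoring sides.
selection = select_indices(b"predetermined-data", 1000)
```

Since the selection depends only on positions and not on the stored values, it satisfies the condition that no item is required to take a particular value.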

When the sound piece data has been encrypted and is returned from the compression unit 12 as encrypted sound piece data, the sound piece database creation unit 11 writes this encrypted sound piece data into the storage area of the sound piece database D as data constituting the data part DAT.

The sound piece database creation unit 11 also writes the phonetic characters that it read from the recorded sound piece data set storage unit 10, which indicate the reading of the sound piece represented by the written encrypted sound piece data, into the storage area of the sound piece database D as sound piece reading data (that is, the (A) data described above).
It further specifies the head address of the written encrypted sound piece data within the storage area of the sound piece database D, and writes this address into the storage area of the sound piece database D as the (B) data described above.
It also writes into the storage area of the sound piece database D the (C), (D), and (E) data generated from the sound piece data that corresponds to the written encrypted sound piece data.

(Operation of the main unit)
Next, the operation of the main unit M will be described. The description below first assumes that the language processing unit 1 has acquired, from outside, free text data describing a sentence (free text) that contains ideographic characters and that the user has prepared as the text for which the speech synthesis system is to synthesize speech.

The language processing unit 1 may acquire the free text data by any method. For example, it may acquire the data from an external device or a network via an interface circuit (not shown), or it may read the data, via a recording medium drive device (not shown), from a recording medium (for example, a flexible disk or a CD-ROM) set in that drive device.

Alternatively, the processor performing the function of the language processing unit 1 may hand over text data used in other processing that it is executing to the processing of the language processing unit 1 as free text data.
One conceivable example of such other processing is processing that causes the processor to function as an agent device: it acquires speech data representing speech, applies speech recognition to the speech data to identify the words the speech represents, identifies the content of the speaker's request from the identified words, and identifies and executes the processing needed to satisfy that request.

On acquiring the free text data, the language processing unit 1 specifies, for each ideographic character contained in the free text, the phonetic characters representing its reading by searching the general word dictionary 2 and the user word dictionary 3. It then replaces the ideographic character with the specified phonetic characters. The language processing unit 1 supplies the phonetic character string obtained by replacing all ideographic characters in the free text with phonetic characters to the acoustic processing unit 41 of the rule synthesis processing unit 4.

When supplied with the phonetic character string from the language processing unit 1, the acoustic processing unit 41 instructs the search unit 42 to search, for each phonetic character contained in the string, for the waveform of the segments constituting the phoneme that the phonetic character represents. The acoustic processing unit 41 also supplies the phonetic character string to the prosody prediction unit 53 of the sound piece editing unit 5.

In response to this instruction, the search unit 42 searches the waveform database 44 and retrieves the compressed waveform data that matches the content of the instruction. It then supplies the retrieved compressed waveform data to the decompression unit 43.

The decompression unit 43 restores the compressed waveform data supplied from the search unit 42 to the segment waveform data as it was before compression, and returns it to the search unit 42. The search unit 42 supplies the segment waveform data returned from the decompression unit 43 to the acoustic processing unit 41 as the search result.

Meanwhile, the prosody prediction unit 53, having been supplied with the phonetic character string from the acoustic processing unit 41, analyzes the string on the basis of a prosody prediction technique such as the "Fujisaki model" or "ToBI (Tone and Break Indices)". It thereby predicts the prosody (accent, intonation, stress, phoneme duration, and so on) of the speech that the phonetic character string represents, and generates prosody prediction data representing the prediction result. It then supplies this prosody prediction data to the acoustic processing unit 41.

When supplied with the segment waveform data from the search unit 42 and with the prosody prediction data from the prosody prediction unit 53, the acoustic processing unit 41 uses the supplied segment waveform data to generate speech waveform data representing the waveform of the speech represented by each phonetic character contained in the phonetic character string supplied by the language processing unit 1.

Specifically, for example, the acoustic processing unit 41 specifies, on the basis of the prosody prediction data supplied from the prosody prediction unit 53, the duration of the phoneme constituted by the segment that each piece of segment waveform data supplied from the search unit 42 represents. It then finds the integer closest to the value obtained by dividing the specified phoneme duration by the duration of the segment represented by that segment waveform data, and generates the speech waveform data by concatenating that number of copies of the segment waveform data.
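The repetition step can be sketched as follows, with a segment represented as a list of samples. The function name, the sampling rate, and the sample values are assumptions, and Python's built-in `round` is used for "closest integer" (the text does not say how exact ties are resolved).

```python
def repeat_segment(segment, phoneme_duration_s, sample_rate_hz):
    """Concatenate copies of one segment to approximate the predicted phoneme duration."""
    segment_duration_s = len(segment) / sample_rate_hz
    count = round(phoneme_duration_s / segment_duration_s)   # closest integer
    return segment * count


# One pitch cycle of 80 samples at an assumed 8 kHz rate lasts 10 ms.
segment = [0] * 80
waveform = repeat_segment(segment, 0.123, 8000)   # predicted phoneme duration: 123 ms
```

Here the ratio is about 12.3, so twelve copies are joined and the generated speech waveform data lasts 120 ms.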

The acoustic processing unit 41 may not only determine the duration of the speech represented by the speech waveform data on the basis of the prosody prediction data, but may also process the segment waveform data constituting the speech waveform data so that the speech it represents has an intensity, intonation, and so on that match the prosody indicated by the prosody prediction data.

Alternatively, the waveform database 44 may store a plurality of pieces of compressed waveform data representing a plurality of segments that constitute the same phoneme but differ from one another in intensity and/or intonation. In this case, instead of processing the segment waveform data, the acoustic processing unit 41 may select, from the segment waveform data supplied from the search unit 42, the data representing segments whose intensity, intonation, and so on match the prosody indicated by the prosody prediction data supplied from the prosody prediction unit 53, and use it to generate speech waveform data representing speech whose intensity, intonation, and so on match that prosody.
Alternatively, the search unit 42 may retrieve, from the compressed waveform data that matches the content of the instruction from the acoustic processing unit 41, only the data representing segments whose intensity, intonation, and so on match the prosody indicated by the prosody prediction data supplied from the prosody prediction unit 53.

The acoustic processing unit 41 then supplies the generated speech waveform data to the output synthesis unit 54 of the sound piece editing unit 5 in an order that follows the arrangement of the phonetic characters in the phonetic character string supplied from the language processing unit 1.

When supplied with the speech waveform data from the acoustic processing unit 41, the output synthesis unit 54 concatenates the data in the order in which it was supplied and outputs the result as data representing synthesized speech (synthesized speech data). This synthesized speech, synthesized on the basis of the free text data, corresponds to speech synthesized by the rule-based synthesis method.

The output synthesis unit 54 may output the synthesized speech data by any method. For example, it may reproduce the synthesized speech that the data represents via a D/A (digital-to-analog) converter and a speaker (not shown). It may also send the data to an external device or a network via an interface circuit (not shown), or write it, via a recording medium drive device (not shown), to a recording medium set in that drive device. The processor performing the function of the output synthesis unit 54 may also hand the synthesized speech data over to other processing that it is executing.

Next, suppose that the acoustic processing unit 41 has acquired data representing a phonetic character string distributed from outside (distribution character string data). (The acoustic processing unit 41 may acquire the distribution character string data by any method; for example, it may do so in the same way that the language processing unit 1 acquires free text data.)

In this case, the acoustic processing unit 41 handles the phonetic character string represented by the distribution character string data in the same way as a phonetic character string supplied from the language processing unit 1. As a result, the search unit 42 retrieves the compressed waveform data representing the segments that constitute the phonemes represented by the phonetic characters in that string, and the decompression unit 43 restores the segment waveform data as it was before compression. Meanwhile, the prosody prediction unit 53 analyzes the phonetic character string represented by the distribution character string data on the basis of a prosody prediction technique, and thereby generates prosody prediction data representing the predicted prosody of the speech that the string represents. The acoustic processing unit 41 then generates, from the restored segment waveform data and the prosody prediction data, speech waveform data representing the waveform of the speech represented by each phonetic character in the string. The output synthesis unit 54 concatenates the generated speech waveform data in an order that follows the arrangement of the phonetic characters in the string and outputs the result as synthesized speech data. This synthesized speech data, synthesized on the basis of the distribution character string data, also represents speech synthesized by the rule-based synthesis method.

When synthesized speech data is synthesized on the basis of distribution character string data, too, the acoustic processing unit 41 may not only determine the duration of the speech represented by the speech waveform data on the basis of the prosody prediction data, but may also process the segment waveform data constituting the speech waveform data so that the speech it represents has an intensity, intonation, and so on that match the prosody indicated by the prosody prediction data. Alternatively, the waveform database 44 may store a plurality of pieces of compressed waveform data representing a plurality of segments that constitute the same phoneme but differ from one another in intensity and/or intonation. In this case, instead of processing the segment waveform data, the acoustic processing unit 41 may select, from the segment waveform data supplied from the search unit 42, the data representing segments whose intensity, intonation, and so on match the prosody indicated by the prosody prediction data supplied from the prosody prediction unit 53, and use it to generate speech waveform data representing speech whose intensity, intonation, and so on match that prosody. Alternatively, the search unit 42 may retrieve, from the compressed waveform data that matches the content of the instruction from the acoustic processing unit 41, only the data representing segments whose intensity, intonation, and so on match the prosody indicated by the prosody prediction data supplied from the prosody prediction unit 53.

Next, suppose that, with the sound piece database D connected to the main unit M, the sound piece editing unit 5 has acquired fixed message data, utterance speed data, and collation level data.
The fixed message data is data representing a fixed message as an ideographic character string. Specifically, for example, if the main unit M forms part of a navigation device mounted on a vehicle, it is data representing a message or the like that the navigation device is to utter for navigation purposes.
The utterance speed data is data indicating a designated value for the utterance speed of the fixed message represented by the fixed message data (a designated value for the length of time taken to utter the fixed message).
The collation level data is data designating the search condition for the search processing, described later, that the search unit 6 performs. In the following it is assumed to take one of the values "1", "2", or "3", with "3" indicating the strictest search condition.

The matching sound piece determination unit 52 may acquire the standard message data, the utterance speed data, and the collation level data by any method; for example, it may acquire them in the same way that the language processing unit 1 acquires free text data.

When the standard message data, the utterance speed data, and the collation level data are supplied to the sound piece editing unit 5, the morphological analysis unit 51 of the sound piece editing unit 5 applies morphological analysis by a known method to the standard message data, thereby replacing the ideographic character string constituting the standard message data with a phonetic character string. It then supplies the resulting phonetic character string to the matching sound piece determination unit 52.

When supplied with the phonetic character string from the morphological analysis unit 51, the matching sound piece determination unit 52 instructs the search unit 6 to retrieve all encrypted sound piece data associated with a phonetic character string that matches the supplied one.

In response to the instruction from the matching sound piece determination unit 52, the search unit 6 searches the sound piece database D, retrieves the matching encrypted sound piece data together with the above-described sound piece reading data, speed initial value data, and pitch component data associated with it, and supplies the retrieved encrypted sound piece data to the decryption unit 7. When a plurality of encrypted sound piece data correspond to a common phonetic character or phonetic character string, all of them are retrieved as candidates for the data to be used in speech synthesis. If, on the other hand, there is a sound piece for which no encrypted sound piece data could be retrieved, the search unit 6 generates data identifying that sound piece (hereinafter called missing part identification data).

The search unit 6 also generates the same encryption key as the one used to generate the encrypted sound piece data stored in the sound piece database D, and supplies it to the decryption unit 7.
Specifically, the search unit 6 extracts a predetermined one item from each of the above-described data (C), (D), and (E) stored in the sound piece database D, three items in total, and generates data representing the sum of the three extracted values. This reproduces the encryption key used to generate the encrypted sound piece data, and the search unit 6 supplies this key to the decryption unit 7.
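
The key regeneration described above can be sketched as follows. This is a minimal illustration only: it assumes that the data (C), (D), and (E) are available as lists of integers (data lengths, speed initial values, and pitch component values) and that the "predetermined" positions are fixed indices. The name `derive_key` and the parameter `picks` are hypothetical; the actual positions are deliberately not disclosed.

```python
from typing import Sequence, Tuple


def derive_key(data_lengths: Sequence[int],
               speed_values: Sequence[int],
               pitch_values: Sequence[int],
               picks: Tuple[int, int, int] = (0, 0, 0)) -> int:
    """Sum one predetermined item from each of (C), (D), and (E).

    `picks` holds the (hypothetical) secret positions shared by the
    registration side and the main unit.
    """
    i, j, k = picks
    return data_lengths[i] + speed_values[j] + pitch_values[k]


# Both sides run the same computation over the same stored data,
# so the key itself never has to be handed over separately.
key_at_registration = derive_key([1200, 900], [150, 140], [220, 180], picks=(1, 0, 1))
key_at_main_unit = derive_key([1200, 900], [150, 140], [220, 180], picks=(1, 0, 1))
assert key_at_registration == key_at_main_unit
```

Because the inputs are read from the database itself, exchanging the database changes the inputs and therefore the derived key.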

Using the encryption key supplied from the search unit 6, the decryption unit 7 restores the encrypted sound piece data supplied from the search unit 6 to the sound piece data as it was before compression, and returns it to the search unit 6.

The search unit 6 supplies the sound piece data returned from the decryption unit 7, together with the retrieved sound piece reading data, speed initial value data, and pitch component data, to the speech speed conversion unit 8 as the search result. If it has generated missing part identification data, it also supplies that data to the speech speed conversion unit 8.

Meanwhile, the matching sound piece determination unit 52 instructs the speech speed conversion unit 8 to convert the sound piece data supplied to it so that the time length of the sound piece represented by that data matches the speed indicated by the utterance speed data supplied to the sound piece editing unit 5.

In response to the instruction from the matching sound piece determination unit 52, the speech speed conversion unit 8 converts the sound piece data supplied from the search unit 6 so that it conforms to the instruction, and supplies the result to the matching sound piece determination unit 52. Specifically, for example, it may divide the sound piece data supplied from the search unit 6 into sections each representing an individual phoneme; for each section, identify the portions that represent the segments constituting the phoneme of that section; and adjust the length of the section by duplicating one or more of the identified portions and inserting them into the section, or by removing one or more of them from the section, so that the number of samples in the sound piece data as a whole corresponds to a time length matching the speed specified by the matching sound piece determination unit 52. For each section, the speech speed conversion unit 8 may determine how many segment portions to insert or remove such that the ratio of time lengths among the phonemes represented by the sections does not substantially change. This allows finer adjustment of the speech than simply joining phonemes together for synthesis.
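
The duplication and removal of segments can be sketched as follows. This is an illustrative sketch, not the method of the embodiment: it assumes each phoneme section is a list of equally long segments, measures length in segments rather than samples, and applies one common ratio to every section so that the proportions between phonemes are roughly preserved. The names `resize_section` and `convert_speed` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resize_section(segments: Sequence[T], ratio: float) -> List[T]:
    """Lengthen or shorten one phoneme section by repeating or dropping segments."""
    n_in = len(segments)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in * ratio))
    # Spread the output positions evenly over the input segments:
    # ratio > 1 repeats some segments, ratio < 1 skips some.
    return [segments[i * n_in // n_out] for i in range(n_out)]


def convert_speed(sections: Sequence[Sequence[T]], target_len: int) -> List[List[T]]:
    """Scale every section by the same ratio so the total approaches target_len."""
    total = sum(len(s) for s in sections)
    if total == 0:
        return []
    ratio = target_len / total
    return [resize_section(s, ratio) for s in sections]


stretched = convert_speed([["a", "b"], ["c", "d", "e", "f"]], target_len=12)
```

In this example the two sections grow from 2 and 4 segments to 4 and 8, so their 1:2 proportion is unchanged.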

The speech speed conversion unit 8 also supplies the sound piece reading data and the pitch component data supplied from the search unit 6 to the matching sound piece determination unit 52 and, if missing part identification data has been supplied from the search unit 6, supplies that data to the matching sound piece determination unit 52 as well.

If no utterance speed data has been supplied to the matching sound piece determination unit 52, the matching sound piece determination unit 52 may instruct the speech speed conversion unit 8 to supply the sound piece data to it without conversion, and the speech speed conversion unit 8 may respond by supplying the sound piece data received from the search unit 6 unchanged to the matching sound piece determination unit 52.

When supplied with the sound piece data, the sound piece reading data, and the pitch component data from the speech speed conversion unit 8, the matching sound piece determination unit 52 selects from the supplied sound piece data, one per sound piece, the sound piece data representing a waveform that can approximate the waveform of each sound piece constituting the standard message. The matching sound piece determination unit 52 sets the condition a waveform must satisfy to count as close to a sound piece of the standard message in accordance with the collation level data supplied to the sound piece editing unit 5.

Specifically, the matching sound piece determination unit 52 first supplies, for example, the phonetic character string obtained by converting the standard message data to the prosody prediction unit 53 and instructs it to predict the prosody of the standard message represented by that string. Following this instruction, the prosody prediction unit 53 predicts the prosody of the standard message by applying analysis based on the prosody prediction method described above, generates prosody prediction data representing the prediction result, and returns it to the matching sound piece determination unit 52.

On acquiring the prosody prediction data, the matching sound piece determination unit 52 proceeds, for example, as follows.
(1) When the value of the collation level data is “1”, it selects all the sound piece data supplied from the speech speed conversion unit 8 (that is, all sound piece data whose reading matches a sound piece in the standard message) as being close to the waveform of the sound piece in the standard message.

(2) When the value of the collation level data is “2”, it selects sound piece data as being close to the waveform of a sound piece in the standard message only if the data satisfies condition (1) (that is, the phonetic characters representing the reading match) and, in addition, there is a correlation of at least a predetermined strength between the content of the pitch component data, which represents the change over time of the frequency of the pitch component of the sound piece data, and the predicted accent (so-called prosody) of the sound piece contained in the standard message (for example, the time difference between the accent positions is no more than a predetermined amount). The predicted accent of a sound piece in the standard message can be identified from the prosody prediction result for the standard message; the matching sound piece determination unit 52 may, for example, interpret the position at which the frequency of the pitch component is predicted to be highest as the predicted accent position. For the accent position of the sound piece represented by the sound piece data, it may, for example, identify the position at which the frequency of the pitch component is highest on the basis of the pitch component data and interpret that position as the accent position. Prosody prediction may be performed on the sentence as a whole, or the sentence may be divided into predetermined units and the prediction performed on each unit.

(3) When the value of the collation level data is “3”, it selects sound piece data as being close to the waveform of a sound piece in the standard message only if the data satisfies condition (2) (that is, the phonetic characters representing the reading and the accent both match) and, in addition, the presence or absence of nasalization and devoicing in the speech represented by the sound piece data matches the prosody prediction result for the standard message. The matching sound piece determination unit 52 may determine the presence or absence of nasalization and devoicing in the speech represented by the sound piece data on the basis of the pitch component data supplied from the speech speed conversion unit 8.
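
The three collation levels can be sketched as cumulative checks. This is a simplified illustration under several assumptions: the pitch component data is modeled as a list of per-frame frequencies, the accent position is taken as the frame with the highest frequency (as the text suggests), and nasalization and devoicing are modeled as precomputed flags rather than derived from the pitch component data. All names, and the threshold `max_shift`, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    reading: str
    pitch: List[float]   # pitch-component frequency per frame
    nasalized: bool      # assumed to be precomputed
    devoiced: bool       # assumed to be precomputed


@dataclass
class Prediction:
    reading: str
    pitch: List[float]   # predicted pitch contour for the same sound piece
    nasalized: bool
    devoiced: bool


def accent_position(contour: List[float]) -> int:
    """Treat the frame with the highest pitch frequency as the accent position."""
    return max(range(len(contour)), key=lambda i: contour[i])


def satisfies(c: Candidate, p: Prediction, level: int, max_shift: int = 1) -> bool:
    """Level 1: reading; level 2: plus accent position; level 3: plus nasalization/devoicing."""
    if c.reading != p.reading:
        return False
    if level >= 2 and abs(accent_position(c.pitch) - accent_position(p.pitch)) > max_shift:
        return False
    if level >= 3 and (c.nasalized, c.devoiced) != (p.nasalized, p.devoiced):
        return False
    return True
```

Each level includes the checks of the levels below it, which mirrors the way condition (2) presupposes (1) and condition (3) presupposes (2).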

If a plurality of sound piece data satisfy the condition it has set for a single sound piece, the matching sound piece determination unit 52 narrows them down to one in accordance with a condition stricter than the one it set.

Specifically, for example, if the set condition corresponds to the collation level value “1” and a plurality of sound piece data satisfy it, the unit selects those that also satisfy the search condition corresponding to the collation level value “2”; if a plurality are still selected, it further selects from among them those that also satisfy the search condition corresponding to the collation level value “3”, and so on. If a plurality of sound piece data still remain after narrowing with the search condition corresponding to the value “3”, the remainder may be narrowed down to one by any criterion.
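
The narrowing procedure can be sketched as below. This is an illustrative sketch with two labeled assumptions: when a stricter level would eliminate every remaining candidate, the previous candidates are kept (the text does not say what happens in that case), and the "any criterion" tie-break is simply the first remaining candidate. The name `choose_one` is hypothetical; `checks[k]` stands for the test added at collation level `k + 1`.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def choose_one(candidates: Sequence[T],
               checks: Sequence[Callable[[T], bool]],
               level: int) -> Optional[T]:
    """Apply levels 1..level, then use stricter levels only to break ties."""
    pool: List[T] = [c for c in candidates if all(check(c) for check in checks[:level])]
    for check in checks[level:]:
        if len(pool) <= 1:
            break
        stricter = [c for c in pool if check(c)]
        if stricter:  # assumption: never narrow down to nothing
            pool = stricter
    # Assumption: the "any criterion" tie-break is "take the first one".
    return pool[0] if pool else None
```

A return value of `None` corresponds to a sound piece for which no suitable data could be selected.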

The matching sound piece determination unit 52 then supplies the sound piece data selected as satisfying the condition corresponding to the value of the collation level data to the output synthesis unit 54.
However, if there is a sound piece for which it could not select, from the sound piece data supplied from the speech speed conversion unit 8, any sound piece data satisfying the condition corresponding to the value of the collation level data, the matching sound piece determination unit 52 decides to treat that sound piece as one for which the search unit 6 could not retrieve encrypted sound piece data (that is, as a sound piece indicated by the above-described missing part identification data).

If missing part identification data has also been supplied from the speech speed conversion unit 8, or if there is a sound piece for which no sound piece data satisfying the condition corresponding to the value of the collation level data could be selected, the matching sound piece determination unit 52 extracts from the standard message data the phonetic character string representing the reading of the sound piece indicated by the missing part identification data (including any sound piece for which no sound piece data satisfying the condition could be selected), supplies it to the acoustic processing unit 41, and instructs the acoustic processing unit 41 to synthesize the waveform of that sound piece.

On receiving the instruction, the acoustic processing unit 41 handles the phonetic character string supplied from the matching sound piece determination unit 52 in the same way as a phonetic character string represented by distribution character string data. As a result, the search unit 42 retrieves the compressed waveform data representing the segments that constitute the phonemes represented by the phonetic characters in the string, and the decompression unit 43 restores the segment waveform data as it was before compression. Meanwhile, the prosody prediction unit 53 generates prosody prediction data representing the predicted prosody of the sound piece represented by the string. The acoustic processing unit 41 then generates speech waveform data representing the waveform of the speech represented by each phonetic character in the string on the basis of the restored segment waveform data and the prosody prediction data, and supplies the generated speech waveform data to the output synthesis unit 54.

The matching sound piece determination unit 52 may instead supply to the acoustic processing unit 41 the portion, corresponding to the sound piece indicated by the missing part identification data, of the prosody prediction data that the prosody prediction unit 53 has already generated and supplied to it. In this case, the acoustic processing unit 41 need not have the prosody prediction unit 53 predict the prosody of that sound piece again. This yields more natural speech than when prosody prediction is performed separately for each small unit such as a sound piece.

When the output synthesis unit 54 has been supplied with sound piece data from the matching sound piece determination unit 52 and with speech waveform data generated from segment waveform data from the acoustic processing unit 41, it adjusts the number of segment waveform data contained in each supplied speech waveform data so that the time length of the speech represented by that speech waveform data is consistent with the utterance speed of the sound piece represented by the sound piece data supplied from the matching sound piece determination unit 52.

Specifically, the output synthesis unit 54 may, for example, identify the ratio by which the time length of the phoneme represented by each of the above-described sections in the sound piece data supplied from the matching sound piece determination unit 52 has increased or decreased relative to its original time length, and increase or decrease the number of segment waveform data in each speech waveform data so that the time length of the phonemes represented by the speech waveform data supplied from the acoustic processing unit 41 changes by that ratio. To identify the ratio, the output synthesis unit 54 may, for example, acquire from the search unit 6 the original sound piece data from which the sound piece data supplied by the matching sound piece determination unit 52 was generated, and identify, one in each of these two sound piece data, sections representing the same phoneme. It may then take, as the ratio of change in phoneme time length, the ratio of the number of segments in the section identified in the sound piece data supplied by the matching sound piece determination unit 52 to the number of segments in the section identified in the sound piece data acquired from the search unit 6.
If the time length of the phonemes represented by the speech waveform data already matches the speed of the sound piece represented by the sound piece data supplied from the matching sound piece determination unit 52, the output synthesis unit 54 need not adjust the number of segment waveform data in the speech waveform data.
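
As a small worked illustration of this ratio, the sketch below computes the change in segment count for one phoneme section and applies it to a synthesized waveform. The function names are hypothetical, and rounding to the nearest whole segment (with a floor of one) is an assumption made for the example.

```python
def length_ratio(original_segments: int, converted_segments: int) -> float:
    """Ratio by which a phoneme section changed during speech speed conversion."""
    if original_segments <= 0:
        raise ValueError("the original section must contain at least one segment")
    return converted_segments / original_segments


def adjusted_segment_count(current_segments: int, ratio: float) -> int:
    """Number of segments a rule-synthesized phoneme should have after scaling."""
    return max(1, round(current_segments * ratio))


# A section that grew from 8 to 12 segments was stretched by 1.5,
# so a synthesized phoneme of 10 segments is stretched to 15.
ratio = length_ratio(8, 12)
new_count = adjusted_segment_count(10, ratio)
```

When the ratio is 1.0 the count is unchanged, which corresponds to the case where no adjustment is needed.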

The output synthesis unit 54 then joins the speech waveform data whose number of segment waveform data has been adjusted and the sound piece data supplied from the matching sound piece determination unit 52 in the order in which the corresponding sound pieces or phonemes appear in the standard message indicated by the standard message data, and outputs the result as data representing synthesized speech.

If the data supplied from the speech speed conversion unit 8 contains no missing part identification data, the sound piece data selected by the sound piece editing unit 5 may be joined immediately, without instructing the acoustic processing unit 41 to synthesize any waveform, in the order in which the phonetic character strings appear in the standard message indicated by the standard message data, and output as data representing synthesized speech.
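
The overall assembly step can be sketched as follows: recorded sound pieces are used where one was selected, and rule-based synthesis fills the gaps. This is a simplified illustration that assumes waveforms are plain lists of samples and that sound pieces are keyed by their reading. The names `assemble_message`, `selected`, and `synthesize` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def assemble_message(readings: Sequence[str],
                     selected: Dict[str, List[float]],
                     synthesize: Callable[[str], List[float]]) -> List[float]:
    """Join waveforms in message order, synthesizing any missing sound piece."""
    output: List[float] = []
    for reading in readings:
        wave = selected.get(reading)
        if wave is None:
            # Missing part: fall back to rule-based synthesis for this reading.
            wave = synthesize(reading)
        output.extend(wave)
    return output
```

If every reading has a selected sound piece, `synthesize` is never called, which corresponds to the case with no missing part identification data.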

In the speech synthesis system of the embodiment of the invention described above, sound piece data representing the waveforms of sound pieces, which can be units larger than a phoneme, are joined naturally by the recording-and-editing method on the basis of the prosody prediction result, and speech reading out the standard message is synthesized. Sound pieces for which no appropriate sound piece data could be selected are instead synthesized by the rule-based synthesis method, using compressed waveform data representing segments, which are units smaller than a phoneme.

The encrypted sound piece data is decrypted using an encryption key consisting of data that the main unit M extracts from a predetermined portion, whose position is not disclosed to users, of the other data stored in the sound piece database D. The sound piece registration unit R or its user therefore does not need to supply the encryption key to the main unit M separately, and so the risk of the key leaking to persons from whom it is to be kept secret is low.

Furthermore, the encrypted sound piece data is decrypted using an encryption key extracted from data stored in the same sound piece database D that stores the encrypted sound piece data. Therefore, even if the sound piece database D connected to the main unit M is exchanged for another sound piece database D, the main unit M correctly decrypts the encrypted sound piece data stored in the new sound piece database D.

On the other hand, even for sound pieces having the same reading, the data length of the data representing the sound piece, the utterance speed of the sound piece, and the change over time of the frequency of its pitch component take different values from speaker to speaker when the sound pieces are uttered by different speakers.
Consequently, even if the three items of data used to generate the encryption key are extracted from the data (C), (D), and (E) of a plurality of sound piece databases D under identical conditions, the encryption keys generated for the respective databases differ from one another as long as the speakers of the sound pieces represented by the encrypted sound piece data they store differ. The encryption key is therefore specific to each sound piece database D, and the risk that an encryption key generated for one sound piece database D is misused to decrypt encrypted sound piece data in another sound piece database D is largely avoided.

The configuration of this speech synthesis system is not limited to the one described above.
For example, the nonvolatile memory constituting the sound piece database D may be a recording medium, such as a CD-RW (Compact Disc ReWritable), that requires a recording medium drive device (for example, a CD-RW drive device) for access. In this case, however, the main unit M and the sound piece registration unit R each include a recording medium drive device that accesses the recording medium. The recording medium drive device of the sound piece registration unit R records the data supplied from the sound piece database creation unit 11 on the recording medium set in it, and the recording medium drive device of the main unit M reads data from the recording medium set in it and supplies the data to the search unit 6.

The sound piece database D itself need not store the header part HDR, the index part IDX, or the directory part DIR; some or all of the header part HDR, the index part IDX, and the directory part DIR may be stored on an external computer connected to an external network such as the Internet.
In this case, specifically, the sound piece database creation unit 11 of the sound piece registration unit R and the search unit 6 of the main unit M may each include, for example, a communication control device such as a modem. The sound piece database creation unit 11 accesses the computer via the network and uploads some or all of the data belonging to the header part HDR, the index part IDX, and the directory part DIR to it, while the search unit 6 acquires the uploaded data by accessing the computer via the network.

The memory constituting the decryption unit 7 of the main unit M may store an identification code uniquely assigned to that main unit M, and the sound piece registration unit R and the main unit M may use this identification code in generating the encryption key.
Specifically, the sound piece database creation unit 11 of the sound piece registration unit R may acquire the identification code of the main unit M, extract from the data (C) it has generated a predetermined one item determined by the value of the identification code, likewise extract from the data (D) and the data (E) it has generated a predetermined one item each determined by the value of the identification code, and generate the encryption key using the three extracted items.
The search unit 6 may likewise extract, from each of the data (C), (D), and (E) stored in the sound piece database D, a predetermined one item determined by the value of the identification code of the main unit M that it holds, three items in total, and generate the encryption key using the three extracted items.
Here, the “predetermined one item determined by the value of the identification code” may be, for example, the item indicated by the value obtained by substituting the value of the identification code (or of data containing the identification code in a predetermined format) into a predetermined hash function, or by applying some other predetermined operation to the value of the identification code.
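
One way to let an identification code decide which items are extracted is sketched below. This is an assumption-laden illustration: the text only calls for "a predetermined hash function", so the use of SHA-256, the field labels `b"C"`, `b"D"`, and `b"E"`, and the names `pick_index` and `derive_key_for_unit` are all choices made for this example. The three picked values are summed, following the basic embodiment.

```python
import hashlib
from typing import Sequence


def pick_index(identification_code: bytes, field: bytes, count: int) -> int:
    """Map an identification code to a position within one data field."""
    digest = hashlib.sha256(field + b":" + identification_code).digest()
    return int.from_bytes(digest[:8], "big") % count


def derive_key_for_unit(identification_code: bytes,
                        data_lengths: Sequence[int],
                        speed_values: Sequence[int],
                        pitch_values: Sequence[int]) -> int:
    """Sum the three items that the identification code selects from (C), (D), (E)."""
    return (data_lengths[pick_index(identification_code, b"C", len(data_lengths))]
            + speed_values[pick_index(identification_code, b"D", len(speed_values))]
            + pitch_values[pick_index(identification_code, b"E", len(pitch_values))])
```

Since the selected positions depend on the identification code, a main unit holding a different code will generally select different items and so derive a different key from the same database.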

The method by which the sound piece registration unit R acquires the identification code of the main unit M is arbitrary. For example, the sound piece registration unit R may include an input unit such as a keyboard and a display unit such as a liquid crystal display device, and when the user enters the identification code of the main unit M by operating the input unit, the compression unit 12 may acquire the identification code from the input unit and use it to create the encryption key. Alternatively, the sound piece registration unit R and the main unit M may be configured to be connectable to each other, in which case the compression unit 12 may access the decoding unit 7 while the two units are connected and acquire the identification code of the main unit M.

The memory or recording medium constituting the sound piece database D may also store an identification code uniquely assigned to that memory or recording medium, and the sound piece registration unit R may use this identification code to generate the encryption key in the same way as the identification code of the main unit M described above.

The operation performed to generate the encryption key is not limited to adding the values of the three extracted items together; any operation may be used. Likewise, the number of items used in the operation is not limited to three; two items, or four or more items, may be extracted and used.
When the identification code described above is used to generate the encryption key, the sound piece database creation unit 11 and the search unit 6 may apply, to the values indicated by the data extracted for key generation, an operation that follows a predetermined rule determined by the value of the identification code.

The data used to generate the encryption key need not be extracted from the (C), (D), and (E) data. The sound piece database creation unit 11 and the search unit 6 may use any data other than the (C), (D), and (E) data that belongs to the header part HDR, the index part IDX, or the directory part DIR, either in place of or together with the (C), (D), or (E) data.
Furthermore, the sound piece database creation unit 11 may generate data representing characteristics of the sound piece data, or any other data representing characteristics of the sound piece that the sound piece data represents, and store it in the sound piece database D. The search unit 6 may then use this data to create the encryption key, either in place of or together with the (C), (D), or (E) data.

The sound piece database creation unit 11 may include a microphone, an amplifier, a sampling circuit, an A/D (analog-to-digital) converter, a PCM encoder, and the like. In this case, instead of acquiring sound piece data from the recorded sound piece data set storage unit 10, the sound piece database creation unit 11 may create sound piece data by amplifying the audio signal representing the sound picked up by its own microphone, sampling it and performing A/D conversion, and then applying PCM modulation to the sampled audio signal.

The sound piece database creation unit 11 may also read, from a recording medium set in a recording medium drive device (not shown) and via that drive device, the sound piece data and phonetic character strings that serve as material for new encrypted sound piece data to be added to the sound piece database D.
The sound piece registration unit R does not necessarily need to include the recorded sound piece data set storage unit 10.

The pitch component data may instead be data representing the change over time of the pitch length of the sound piece represented by the sound piece data. In this case, the matching sound piece determination unit 52 may identify, based on the pitch component data, the position where the pitch length is shortest (that is, where the frequency is highest) and interpret this position as the accent position.
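As a minimal sketch of this interpretation, assume the pitch component data is available as a list of pitch lengths, one per analysis frame (the list layout and the function name `accent_position` are illustrative assumptions, not part of the original description):

```python
def accent_position(pitch_lengths):
    """Return the index of the frame with the shortest pitch length.

    The shortest pitch length corresponds to the highest pitch frequency,
    which the text interprets as the accent position.
    """
    if not pitch_lengths:
        raise ValueError("pitch_lengths must not be empty")
    return min(range(len(pitch_lengths)), key=pitch_lengths.__getitem__)
```

When several frames share the shortest length, this sketch returns the earliest one; the source does not say how such ties should be resolved.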

The segment waveform data need not be in PCM format; the data format is arbitrary. The waveform database 44 also need not store the segment waveform data or sound piece data in a compressed state. If the waveform database 44 stores the segment waveform data uncompressed, the main unit M does not need to include the decompression unit 43.

An embodiment of the present invention has been described above. The speech database manufacturing apparatus and the sound piece restoring apparatus according to the present invention can be realized with an ordinary computer system rather than a dedicated system.

For example, the sound piece registration unit R that executes the processing described above can be configured by installing, on a personal computer connectable to the external nonvolatile memory or recording medium constituting the sound piece database D, a program for executing the operations of the recorded sound piece data set storage unit 10, the sound piece database creation unit 11, and the compression unit 12, from a recording medium (CD-ROM, flexible disk, etc.) that stores the program.

A personal computer that executes this program and functions as the sound piece registration unit R can perform the processing shown in FIG. 3 as processing corresponding to the operation of the sound piece registration unit R of the speech synthesis system of FIG. 1.
FIG. 3 is a flowchart showing the processing executed by the personal computer that performs the functions of the sound piece registration unit R.

That is, when this personal computer registers sound pieces in the sound piece database D, it first, with the nonvolatile memory or recording medium constituting the sound piece database D connected to it, reads phonetic characters and sound piece data associated with each other from the recorded sound piece data set storage unit 10, or acquires phonetic character strings and sound piece data associated with each other from outside (FIG. 3, step S001). It then identifies the data length of the obtained sound piece data, the utterance speed of the speech represented by the sound piece data, and the change over time of the frequency of the pitch component (step S002).

The personal computer may identify the data length and utterance speed in step S002 by, for example, counting the number of samples in the sound piece data.
The change over time of the frequency of the pitch component may be identified by, for example, applying cepstrum analysis to the sound piece data. Specifically, the waveform represented by the sound piece data may be divided into many small portions on the time axis, the cepstrum of each small portion obtained, and the lowest of the frequencies giving a local maximum of that cepstrum identified as the frequency of the pitch component in that portion. As noted above, good results can be expected if the sound piece data is first converted into pitch waveform data according to the technique disclosed in, for example, Japanese Patent Laid-Open No. 2003-108172, and the change over time is then identified from that pitch waveform data.
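The cepstrum step for one small portion can be sketched as below. This is a textbook-style cepstral pitch estimate, not necessarily the exact rule in the text: it takes the single strongest cepstral peak inside an assumed pitch range of 100 to 400 Hz rather than the lowest frequency among all local maxima. The function names, the search range, and the naive DFT (used only to keep the example free of external libraries) are all assumptions.

```python
import cmath
import math


def log_magnitude_spectrum(frame):
    """Log magnitude of a naive DFT of the frame (O(N^2), for illustration)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(math.log(abs(acc) + 1e-9))  # small offset avoids log(0)
    return spectrum


def cepstral_pitch(frame, sample_rate, f_min=100.0, f_max=400.0):
    """Estimate the pitch frequency of one frame from its real cepstrum."""
    n = len(frame)
    log_mag = log_magnitude_spectrum(frame)
    q_lo = int(sample_rate / f_max)               # shortest period searched
    q_hi = min(int(sample_rate / f_min), n // 2)  # longest period searched
    best_q, best_val = None, float("-inf")
    for q in range(q_lo, q_hi + 1):
        # The log magnitude spectrum of a real signal is even, so its
        # inverse DFT reduces to a cosine sum.
        c = sum(v * math.cos(2 * math.pi * k * q / n)
                for k, v in enumerate(log_mag)) / n
        if c > best_val:
            best_q, best_val = q, c
    return sample_rate / best_q
```

Applying `cepstral_pitch` to each small portion in turn yields a sequence of pitch frequencies, which is one possible representation of the change over time that the pitch component data records.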

Next, the personal computer generates three items of data indicating the data length, the utterance speed, and the change over time of the pitch component frequency identified in step S002, as the (C) data described above, the speed initial value data (that is, the (D) data described above), and the pitch component data (that is, the (E) data described above), respectively (step S003).

When generation of the (C), (D), and (E) data is complete for all sound pieces to be registered in the sound piece database, the personal computer selects, for example, one item from each of the set of generated (C) data, the set of (D) data, and the set of (E) data, namely the item generated from the sound piece data having the reading predetermined for that set, three items in total, and generates data representing the sum of the values of these three items as the encryption key (step S004).
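Step S004 can be illustrated with a short sketch. The dictionary layout, the name `generate_key`, and the reduction of each (C), (D), and (E) item to one integer are assumptions made for the example.

```python
def generate_key(records, key_readings):
    """Sum one (C), one (D), and one (E) value chosen by predetermined readings.

    records:      reading -> {"C": int, "D": int, "E": int}
    key_readings: field name -> reading whose record supplies that field
    """
    return sum(records[key_readings[field]][field] for field in ("C", "D", "E"))


# Hypothetical example data: three registered sound pieces.
records = {
    "a": {"C": 100, "D": 5, "E": 7},
    "b": {"C": 200, "D": 6, "E": 9},
    "c": {"C": 300, "D": 8, "E": 11},
}
key = generate_key(records, {"C": "a", "D": "b", "E": "c"})  # 100 + 6 + 11
```

The key depends only on values that are themselves stored in the database, which is what lets the main unit rebuild it later.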

Next, the personal computer creates encrypted sound piece data by encrypting, with a symmetric key cipher and the encryption key generated in step S004, the sound piece data supplied by this personal computer (step S005), and writes it into the storage area of the sound piece database D as data constituting the data part DAT (step S006).
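The text only requires a symmetric key cipher and does not name one. The sketch below uses a toy XOR keystream derived from SHA-256 purely to show the symmetric property (the same key and the same function both encrypt and decrypt); it is an assumption for illustration and not a secure cipher. The names `keystream` and `xor_cipher` are hypothetical.

```python
import hashlib


def keystream(key: int, length: int) -> bytes:
    """Expand an integer key into `length` pseudo-random bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(f"{key}:{counter}".encode("ascii")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: int) -> bytes:
    """XOR data with the keystream; applying it twice with the same key restores data."""
    return bytes(b ^ s for b, s in zip(data, keystream(key, len(data))))
```

A real implementation would substitute an established cipher; the point here is only that whoever can recompute the key can reverse the transformation.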

Also in step S006, the personal computer writes the phonetic characters read from the recorded sound piece data set storage unit 10, which indicate the reading of the sound piece represented by the written encrypted sound piece data, into the storage area of the sound piece database D as the sound piece reading data, that is, the (A) data described above.
It also identifies the head address of the written encrypted sound piece data within the storage area of the sound piece database D and writes this address into the storage area as the (B) data described above.
It further writes into the storage area of the sound piece database D the (C), (D), and (E) data described above that were generated from the sound piece data corresponding to the written encrypted sound piece data.
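The per-piece bookkeeping in step S006 might be modeled as follows. The class and function names are hypothetical, the data part DAT is simplified to one growing byte array, and nothing is assumed here about which part of the database (HDR, IDX, or DIR) actually holds each field.

```python
from dataclasses import dataclass


@dataclass
class PieceEntry:
    reading: str            # (A) sound piece reading data
    address: int            # (B) head address of the encrypted data in DAT
    data_length: int        # (C) data length of the sound piece data
    speed_initial: int      # (D) speed initial value data
    pitch_component: list   # (E) pitch component data


def append_piece(dat, entries, reading, encrypted,
                 data_length, speed_initial, pitch_component):
    """Append encrypted data to DAT and record its (A) to (E) entry."""
    address = len(dat)      # the head address is wherever the data starts
    dat.extend(encrypted)
    entries.append(PieceEntry(reading, address, data_length,
                              speed_initial, pitch_component))
    return address
```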

Similarly, the main unit M that executes the processing described above can be configured by installing, on a personal computer connectable to the nonvolatile memory or recording medium constituting the sound piece database D, a program for executing the operations of the language processing unit 1, general word dictionary 2, user word dictionary 3, rule synthesis processing unit 4, sound piece editing unit 5, search unit 6, sound piece database D, decoding unit 7, and speech rate conversion unit 8 described above, from a recording medium that stores the program.

A personal computer that executes this program and functions as the main unit M can perform the processing shown in FIGS. 4 to 6 as processing corresponding to the operation of the main unit M of the speech synthesis system of FIG. 1.
FIG. 4 is a flowchart showing the processing performed when the personal computer that performs the functions of the main unit M acquires free text data.
FIG. 5 is a flowchart showing the processing performed when the personal computer that performs the functions of the main unit M acquires distribution character string data.
FIG. 6 is a flowchart showing the processing performed when the personal computer that performs the functions of the main unit M acquires standard message data and utterance speed data.

That is, when the personal computer acquires the free text data described above from outside (FIG. 4, step S101), it identifies, for each ideographic character contained in the free text represented by the data, the phonetic characters representing its reading by searching the general word dictionary 2 and the user word dictionary 3, and replaces the ideographic character with the identified phonetic characters (step S102). The method by which the personal computer acquires the free text data is arbitrary.

Once the personal computer has obtained a phonetic character string representing the result of replacing all ideographic characters in the free text with phonetic characters, it searches the waveform database 44, for each phonetic character in the string, for the waveform of the unit speech that the character represents, retrieves compressed waveform data representing the waveforms of the segments constituting the phoneme represented by each phonetic character (step S103), and restores the retrieved compressed waveform data to the segment waveform data as it was before compression (step S104).

Meanwhile, the personal computer predicts the prosody of the speech represented by the free text by analyzing the free text data with a prosody prediction technique (step S105). It then generates speech waveform data from the segment waveform data restored in step S104 and the prosody prediction result of step S105 (step S106), concatenates the resulting speech waveform data in the order of the phonetic characters in the phonetic character string, and outputs the result as synthesized speech data (step S107). The method by which the personal computer outputs the synthesized speech data is arbitrary.

When the personal computer acquires the distribution character string data described above from outside by any method (FIG. 5, step S201), it performs, for each phonetic character in the phonetic character string represented by the data and in the same way as steps S103 and S104 above, the processing of retrieving compressed waveform data representing the waveforms of the segments constituting the phoneme represented by that phonetic character and the processing of restoring the retrieved compressed waveform data to segment waveform data (step S202).

Meanwhile, the personal computer predicts the prosody of the speech represented by the distribution character string by analyzing the string with a prosody prediction technique (step S203), generates speech waveform data from the segment waveform data restored in step S202 and the prosody prediction result of step S203 (step S204), concatenates the resulting speech waveform data in the order of the phonetic characters in the phonetic character string, and outputs the result as synthesized speech data by processing similar to that of step S107 (step S205).

Meanwhile, when the personal computer, with the nonvolatile memory or recording medium constituting the sound piece database D connected to it, acquires the standard message data, collation level data, and utterance speed data described above from outside by any method (FIG. 6, step S301), it first converts the ideographic character string constituting the standard message data into a phonetic character string by applying morphological analysis by a known technique to the standard message data (step S302).

Next, the personal computer retrieves all encrypted sound piece data associated with phonetic character strings that match the phonetic character string obtained in step S302 (step S303).

In step S303, it also retrieves the sound piece reading data, speed initial value data, and pitch component data associated with the matching encrypted sound piece data. If multiple items of encrypted sound piece data correspond to a single sound piece, all of them are retrieved. If, on the other hand, there is a sound piece for which no encrypted sound piece data could be retrieved, the missing part identification data described above is generated.
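The retrieval behavior of step S303 can be sketched as a lookup that keeps every match and records readings with no match. The tuple layout and the name `find_pieces` are assumptions, and the list of missing readings stands in for the missing part identification data.

```python
def find_pieces(message_readings, entries):
    """Look up every reading in the message among the stored entries.

    entries: list of (reading, encrypted_bytes) tuples.
    Returns (found, missing): found maps a reading to all matching entries,
    missing lists readings with no stored sound piece.
    """
    found = {}
    missing = []
    for reading in message_readings:
        matches = [entry for entry in entries if entry[0] == reading]
        if matches:
            found[reading] = matches
        else:
            missing.append(reading)
    return found, missing
```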

Next, the personal computer restores the retrieved encrypted sound piece data to the sound piece data as it was before compression (step S304). It then converts the restored sound piece data by processing similar to that performed by the speech rate conversion unit 8 described above, so that the time length of the sound piece represented by the sound piece data matches the speed indicated by the utterance speed data (step S305). If no utterance speed data has been supplied, the restored sound piece data need not be converted.

In step S304, the personal computer accesses the sound piece database D, extracts one predetermined item from each of the (C), (D), and (E) data stored there, three items in total, and generates data representing the sum of the values of the three extracted items, thereby generating the same encryption key that was used to generate the encrypted sound piece data. It then decrypts the encrypted sound piece data with this key, restoring it to the sound piece data as it was before compression.

Next, the personal computer predicts the prosody of the standard message by analyzing the standard message represented by the standard message data with a prosody prediction technique (step S306). Then, from the sound piece data whose sound piece time lengths have been converted, it selects, by processing similar to that performed by the matching sound piece determination unit 52 described above and according to the criterion indicated by the externally acquired collation level data, one item of sound piece data per sound piece that represents the waveform closest to the waveform of the sound piece constituting the standard message (step S307).

Specifically, in step S307 the personal computer identifies sound piece data according to, for example, conditions (1) to (3) described above. That is, when the value of the collation level data is "1", all sound piece data whose reading matches that of a sound piece in the standard message is regarded as representing the waveform of that sound piece. When the value is "2", sound piece data is regarded as representing the waveform of a sound piece in the standard message only if the phonetic characters representing the reading match and, in addition, the content of the pitch component data, which represents the change over time of the frequency of the pitch component of the sound piece data, matches the predicted accent of the sound piece contained in the standard message. When the value is "3", sound piece data is regarded as representing the waveform of a sound piece in the standard message only if the phonetic characters representing the reading and the accent match and, in addition, the presence or absence of nasalization or devoicing in the speech represented by the sound piece data matches the prosody prediction result for the standard message.
If multiple items of sound piece data meet the criterion indicated by the collation level data for a single sound piece, they are narrowed down to one according to a condition stricter than the one set. If there is a sound piece for which no sound piece data satisfying the condition corresponding to the collation level value can be selected, that sound piece is treated as one for which no encrypted sound piece data could be retrieved, and, for example, missing part identification data is generated.
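The three collation levels and the narrowing rule can be sketched as below. The `Features` fields, the function names, and the tie handling (keep the current pool if a stricter level would eliminate every candidate, then take the first remaining one) are assumptions; the text only says that a stricter condition is applied.

```python
from dataclasses import dataclass


@dataclass
class Features:
    reading: str      # phonetic characters
    accent: int       # accent position (stored or predicted)
    nasalized: bool
    devoiced: bool


def matches(candidate, target, level):
    """Check a candidate against the predicted target at collation level 1 to 3."""
    if candidate.reading != target.reading:
        return False
    if level >= 2 and candidate.accent != target.accent:
        return False
    if level >= 3 and ((candidate.nasalized, candidate.devoiced)
                       != (target.nasalized, target.devoiced)):
        return False
    return True


def select(candidates, target, level):
    """Return one candidate, or None when the piece must be treated as missing."""
    pool = [c for c in candidates if matches(c, target, level)]
    if not pool:
        return None
    stricter = level + 1
    while len(pool) > 1 and stricter <= 3:
        narrowed = [c for c in pool if matches(c, target, stricter)]
        if narrowed:
            pool = narrowed
        stricter += 1
    return pool[0]
```

A `None` result corresponds to the case in which missing part identification data would be generated and the piece synthesized by rule instead.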

Meanwhile, when the personal computer has generated missing part identification data, it extracts from the standard message data the phonetic character string representing the reading of the sound piece indicated by the missing part identification data and, treating this string phoneme by phoneme in the same way as a phonetic character string represented by distribution character string data, performs processing similar to steps S202 to S204 described above, thereby generating speech waveform data representing the waveform of the speech indicated by each phonetic character in the string (step S308).
In step S308, however, the personal computer may generate the speech waveform data using the prosody prediction result of step S306 instead of performing processing corresponding to step S203.

Next, the personal computer, by processing similar to that performed by the output synthesis unit 54 described above, adjusts the number of items of segment waveform data contained in the speech waveform data generated in step S308 so that the time length of the speech represented by that speech waveform data is consistent with the utterance speed of the sound pieces represented by the sound piece data selected in step S307 (step S309).

That is, in step S309 the personal computer may, for example, identify the ratio by which the time length of the phonemes represented by each of the sections contained in the sound piece data selected in step S307 has increased or decreased relative to the original time length, and increase or decrease the number of items of segment waveform data in each item of speech waveform data so that the time length of the speech represented by the speech waveform data generated in step S308 changes by that ratio. To identify the ratio, it may, for example, identify one section each, representing the same speech, in the sound piece data selected in step S307 (the sound piece data after utterance speed conversion) and in the original sound piece data before conversion in step S305, and take as the ratio of increase or decrease in speech time length the ratio by which the number of segments in the section identified in the converted sound piece data has increased or decreased relative to the number of segments in the section identified in the original sound piece data.
If the time length of the speech represented by the speech waveform data already matches the speed of the sound pieces represented by the sound piece data after utterance speed conversion, the personal computer does not need to adjust the number of items of segment waveform data in the speech waveform data.
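The ratio and the count adjustment of step S309 can be sketched as follows. The text does not say which segments are repeated or dropped, so the even resampling used here is an assumption, as are the function names.

```python
def duration_ratio(segments_after: int, segments_before: int) -> float:
    """Ratio of segment counts in matching sections after and before speed conversion."""
    if segments_before <= 0:
        raise ValueError("segments_before must be positive")
    return segments_after / segments_before


def adjust_segments(segments, ratio):
    """Repeat or drop segments evenly so the count becomes round(len * ratio)."""
    if not segments:
        return []
    count = len(segments)
    target = max(1, round(count * ratio))
    return [segments[min(count - 1, int(i * count / target))]
            for i in range(target)]
```

For example, a section that went from 4 segments to 6 after conversion gives a ratio of 1.5, so rule-synthesized speech made of 4 segments would be stretched to 6 by repeating some of them; a ratio of 1.0 leaves the list unchanged.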

The personal computer then concatenates the speech waveform data that has undergone step S309 and the sound piece data selected in step S307 in the order of the phonetic character strings within the standard message indicated by the standard message data, and outputs the result as data representing synthesized speech (step S310).

A program that causes a personal computer to perform the functions of the main unit M or the sound piece registration unit R may, for example, be uploaded to a bulletin board system (BBS) on a communication line and distributed over the communication line. Alternatively, a carrier wave may be modulated with a signal representing the program and the resulting modulated wave transmitted, and a device receiving the modulated wave may demodulate it to restore the program.
The processing described above can then be executed by starting the program and running it under the control of the OS in the same way as other application programs.

If the OS handles part of the processing, or if the OS forms part of one component of the present invention, the recording medium may store the program with that part removed. In this case too, the recording medium is regarded in the present invention as storing a program for executing each function or step executed by the computer.

FIG. 1 is a block diagram showing the configuration of a speech synthesis system according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing the data structure of the sound piece database.
FIG. 3 is a flowchart showing the processing executed by a personal computer that performs the functions of the sound piece registration unit of FIG. 1.
FIG. 4 is a flowchart showing the processing performed when a personal computer that performs the functions of the main unit of FIG. 1 acquires free text data.
FIG. 5 is a flowchart showing the processing performed when a personal computer that performs the functions of the main unit of FIG. 1 acquires distribution character string data.
FIG. 6 is a flowchart showing the processing performed when a personal computer that performs the functions of the main unit of FIG. 1 acquires standard message data and utterance speed data.

Explanation of symbols

M Main unit
D Sound piece database
R Sound piece registration unit
1 Language processing unit
2 General word dictionary
3 User word dictionary
41 Acoustic processing unit
42 Search unit
43 Decompression unit
44 Waveform database
5 Sound piece editing unit
51 Morphological analysis unit
52 Matching sound piece determination unit
53 Prosody prediction unit
54 Output synthesis unit
6 Search unit
7 Decryption unit
8 Speech rate conversion unit
10 Recorded sound piece data set storage unit
11 Sound piece database creation unit
12 Compression unit
HDR Header part
IDX Index part
DIR Directory part
DAT Data part

Claims

A voice database manufacturing apparatus comprising:
data acquisition means for acquiring sound piece data representing a sound piece;
encryption key generation means for generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and for generating an encryption key using a predetermined plurality of the generated data;
encryption means for generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
sound piece storage means for storing the generated encrypted sound piece data in a storage area of an external storage device.
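The flow in the claim above (acquire a sound piece, derive a key from its characteristics, encrypt, store) can be sketched as follows. This is only an illustration under stated assumptions: the abstract names the data length, the speaking speed, and the pitch variation as key material, but the claims do not fix the combining operation or the cipher. The SHA-256 combination, the repeating-key XOR cipher, the dictionary standing in for the storage device, and every function and field name here are hypothetical.

```python
import hashlib
from itertools import cycle


def make_key(data_length: int, speed: int, pitch: int) -> bytes:
    """Combine three characteristic values into a 32-byte key (assumed operation)."""
    material = f"{data_length}:{speed}:{pitch}".encode("ascii")
    return hashlib.sha256(material).digest()


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (an assumption): XOR each byte with the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def register_sound_piece(database: dict, reading: str, piece: bytes,
                         speed: int, pitch: int) -> None:
    """Encrypt one sound piece and store it with the data needed to rebuild the key."""
    data_length = len(piece)
    key = make_key(data_length, speed, pitch)
    database[reading] = {
        "data_length": data_length,
        "speed": speed,
        "pitch": pitch,
        "encrypted": xor_cipher(piece, key),
    }
```

Because the XOR cipher is symmetric, applying `xor_cipher` again with the same key recovers the original bytes. A real system would substitute a proper cipher; the point of the sketch is only that the key is never stored, just the characteristic data from which it is rebuilt.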

The voice database manufacturing apparatus according to claim 1, wherein the encryption key generation means generates the encryption key using, as the predetermined plurality of data, a plurality of data selected, from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, according to the value of identification data uniquely assigned to the storage device or to an external device that decrypts the encrypted sound piece data.

The voice database manufacturing apparatus according to claim 1 or 2, wherein the encryption key generation means generates the encryption key by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of identification data uniquely assigned to the storage device or to an external device that decrypts the encrypted sound piece data.
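The two dependent claims above let a device identifier decide both which characteristic data feed the key and which operation combines them. The following sketch shows one way such a mapping could look. It is purely an assumption for illustration: the claims do not say how the identifier maps to a selection or to a rule, and the feature names, the modulo-based mapping, and the two candidate rules (sum and XOR) are hypothetical.

```python
from functools import reduce
from itertools import combinations
from operator import xor

# Hypothetical names for the characteristic data held per sound piece.
FEATURES = ("data_length", "speed", "pitch")

# Every way of picking two of the three features, in a fixed order.
SELECTIONS = list(combinations(FEATURES, 2))

# Candidate combining rules (assumed): arithmetic sum and bitwise XOR.
RULES = (sum, lambda values: reduce(xor, values))


def select_features(device_id: int) -> tuple:
    """Pick which features form the key material, based on the device identifier."""
    return SELECTIONS[device_id % len(SELECTIONS)]


def make_key(features: dict, device_id: int) -> int:
    """Combine the selected features using the rule chosen by the device identifier."""
    values = [features[name] for name in select_features(device_id)]
    rule = RULES[(device_id // len(SELECTIONS)) % len(RULES)]
    return rule(values)
```

With this arrangement, two devices holding the same database but different identifiers derive different keys from the same stored data, so a database prepared for one device would not decrypt correctly on another.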

The voice database manufacturing apparatus according to claim 1, 2, or 3, wherein the data representing the characteristics of the sound piece data consists of data representing the data length of the sound piece data.

The voice database manufacturing apparatus according to any one of claims 1 to 4, wherein the data representing the characteristics of the sound piece represented by the sound piece data consists of data representing the utterance speed of the sound piece or data representing the change over time of the frequency of the pitch component of the sound piece.

A voice database storing encrypted sound piece data corresponding to sound piece data representing a sound piece,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using a predetermined plurality of data from among data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data.

The voice database according to claim 6, wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using, as the predetermined plurality of data, a plurality of data selected, from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, according to the value of identification data uniquely assigned to an external device that decrypts the encrypted sound piece data.

The voice database according to claim 6 or 7, wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of identification data uniquely assigned to the voice database itself or to an external device that decrypts the encrypted sound piece data.

The voice database according to claim 6, 7, or 8, wherein the data representing the characteristics of the sound piece data consists of data representing the data length of the sound piece data.

The voice database according to any one of claims 6 to 9, wherein the data representing the characteristics of the sound piece represented by the sound piece data consists of data representing the utterance speed of the sound piece or data representing the change over time of the frequency of the pitch component of the sound piece.

A sound piece restoration device configured to be connectable to an external voice database that stores encrypted sound piece data corresponding to sound piece data representing a sound piece and that further stores data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, the device comprising:
selection means for selecting, from among the encrypted sound piece data stored in the voice database, the encrypted sound piece data to be used for restoring a sound piece; and
decryption means for decrypting the encrypted sound piece data selected by the selection means,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using a predetermined plurality of data from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, and
the decryption means acquires the predetermined plurality of data from the voice database, generates the encryption key using the acquired plurality of data, and decrypts the encrypted sound piece data using the generated encryption key.
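The restoration side described in the claim above mirrors registration: select a stored entry, rebuild the key from the characteristic data stored alongside it, and decrypt. A minimal sketch follows, assuming the same hypothetical choices as on the registration side (a SHA-256 combination of three stored fields and a repeating-key XOR cipher). None of the field names, function names, or the cipher come from the source.

```python
import hashlib
from itertools import cycle

# Hypothetical names of the stored fields that serve as key material.
KEY_FIELDS = ("data_length", "speed", "pitch")


def derive_key(record: dict) -> bytes:
    """Rebuild the key from characteristic data stored with the sound piece."""
    material = ":".join(str(record[name]) for name in KEY_FIELDS).encode("ascii")
    return hashlib.sha256(material).digest()


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (an assumption): XOR each byte with the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def restore_sound_piece(database: dict, reading: str) -> bytes:
    """Select the entry for a reading, regenerate its key, and decrypt it."""
    record = database[reading]        # selection step
    key = derive_key(record)          # key regenerated, never read from storage
    return xor_cipher(record["encrypted"], key)
```

The key itself never appears in the database; anyone who copies the encrypted data without knowing which fields are combined, and how, cannot directly recover the sound pieces.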

The sound piece restoration device according to claim 11, wherein
identification data unique to the voice database is assigned to the voice database,
the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using, as the predetermined plurality of data, a plurality of data selected, from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, according to the value of the identification data uniquely assigned to the voice database, and
the decryption means acquires from the voice database the plurality of data selected according to the value of the identification data uniquely assigned to the voice database, generates the encryption key using the acquired plurality of data, and decrypts the encrypted sound piece data using the generated encryption key.

The sound piece restoration device according to claim 11 or 12, wherein
identification data unique to the voice database is assigned to the voice database,
the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of the identification data uniquely assigned to the voice database, and
the decryption means generates the encryption key by applying, to the predetermined plurality of data acquired from the voice database, an operation that follows the rule determined by the value of the identification data uniquely assigned to the voice database, and decrypts the encrypted sound piece data using the generated encryption key.

The sound piece restoration device according to claim 11, 12, or 13, further comprising synthesis means for generating data representing synthesized speech by joining together the sound piece data decrypted by the decryption means.

The sound piece restoration device according to claim 11, further comprising synthesis means for generating data representing synthesized speech by joining together the sound piece data decrypted by the decryption means, wherein
identification data unique to the synthesis means is assigned to the synthesis means,
the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using, as the predetermined plurality of data, a plurality of data selected, from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, according to the value of the identification data uniquely assigned to the synthesis means, and
the decryption means acquires from the voice database the plurality of data selected according to the value of the identification data uniquely assigned to the synthesis means, generates the encryption key using the acquired plurality of data, and decrypts the encrypted sound piece data using the generated encryption key.

The sound piece restoration device according to claim 11 or 15, further comprising synthesis means for generating data representing synthesized speech by joining together the sound piece data decrypted by the decryption means, wherein
identification data unique to the synthesis means is assigned to the synthesis means,
the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated by applying, to the predetermined plurality of data, an operation that follows a rule determined by the value of the identification data uniquely assigned to the synthesis means, and
the decryption means generates the encryption key by applying, to the predetermined plurality of data acquired from the voice database, an operation that follows the rule determined by the value of the identification data uniquely assigned to the synthesis means, and decrypts the encrypted sound piece data using the generated encryption key.

A voice database manufacturing method comprising:
acquiring sound piece data representing a sound piece;
generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and generating an encryption key using a predetermined plurality of the generated data;
generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
storing the generated encrypted sound piece data in a storage area of an external storage device.

A sound piece restoration method using a voice database that stores encrypted sound piece data corresponding to sound piece data representing a sound piece and that further stores data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, the method comprising:
a selection step of selecting, from among the encrypted sound piece data stored in the voice database, the encrypted sound piece data to be used for restoring a sound piece; and
a decryption step of decrypting the encrypted sound piece data selected in the selection step,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using a predetermined plurality of data from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, and
in the decryption step, the predetermined plurality of data is acquired from the voice database, the encryption key is generated using the acquired plurality of data, and the encrypted sound piece data is decrypted using the generated encryption key.

A program for causing a computer to function as:
data acquisition means for acquiring sound piece data representing a sound piece;
encryption key generation means for generating data representing characteristics of the acquired sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, and for generating an encryption key using a predetermined plurality of the generated data;
encryption means for generating encrypted sound piece data by encrypting the acquired sound piece data using the generated encryption key; and
sound piece storage means for storing the generated encrypted sound piece data in a storage area of an external storage device.

A program for causing a computer, connectable to an external voice database that stores encrypted sound piece data corresponding to sound piece data representing a sound piece and that further stores data representing characteristics of the sound piece data and/or data representing characteristics of the sound piece represented by the sound piece data, to function as:
selection means for selecting, from among the encrypted sound piece data stored in the voice database, the encrypted sound piece data to be used for restoring a sound piece; and
decryption means for decrypting the encrypted sound piece data selected by the selection means,
wherein the encrypted sound piece data corresponds to the sound piece data encrypted using an encryption key generated using a predetermined plurality of data from among the data representing the characteristics of the sound piece data and/or the data representing the characteristics of the sound piece represented by the sound piece data, and
the decryption means acquires the predetermined plurality of data from the voice database, generates the encryption key using the acquired plurality of data, and decrypts the encrypted sound piece data using the generated encryption key.