JP2007333851A

JP2007333851A - Speech synthesis method, speech synthesizer, speech synthesis program, speech synthesis delivery system

Info

Publication number: JP2007333851A
Application number: JP2006163309A
Authority: JP
Inventors: Takashi Miki; 敬三木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2006-06-13
Filing date: 2006-06-13
Publication date: 2007-12-27

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis method that applies suitable listening restrictions or prevents unauthorized use of speech data without replacing the contents of the input text, together with a corresponding speech synthesizer, speech synthesis program, and speech synthesis distribution system.

SOLUTION: The speech synthesis method has an authentication step of checking, based on identification information of the listener of the output speech, whether the listener is an authorized user; a step of reading a list of suppression phrases from a storage means that stores the list; a determination step of determining whether a suppression phrase is included in the input text; and a synthesis step of synthesizing speech data based on the input text when no suppression phrase is included in the input text or when the listener is an authorized user.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a speech synthesis method, a speech synthesizer, a speech synthesis program, and a speech synthesis distribution system.

Conventionally, as a technique for “not outputting specific words or combinations of words at the time of voice output”, the following has been proposed (Patent Document 1): a device “comprising a suppression word list 107 in which words whose voice reproduction is prohibited are registered as registered words; extraction means for extracting the registered words of the suppression word list 107 from an input document file; replacement means for replacing, in the document file, the registered words extracted by the extraction means with a predetermined character string; and voice output means (104, 105) for outputting voice based on the document file after replacement by the replacement means.”
Patent Document 1: JP 2002-221981 A (Abstract)

However, depending on the purpose and usage of the synthesized speech, replacing the contents of the input text before voice output is not always desirable.
To handle such cases, there has been a demand for a speech synthesis method, a speech synthesizer, a speech synthesis program, and a speech synthesis distribution system that can apply appropriate listening restrictions, or prevent unauthorized use of speech data, without replacing the contents of the input text.

A speech synthesis method according to the present invention is a method that, when synthesizing input text into speech and outputting it, outputs the speech with a listening restriction if the text contains a phrase whose speech output should be suppressed (hereinafter referred to as a suppression phrase), the method comprising:
an authentication step of confirming, based on identification information of the person listening to the output speech, whether the listener is an authorized user;
a step of reading a list of suppression phrases from a storage means that stores the list;
a determination step of determining whether the input text contains a suppression phrase; and
a synthesis step of synthesizing speech data based on the input text only when the input text contains no suppression phrase or when the listener is an authorized user.

A speech synthesizer according to the present invention is a device that, when synthesizing input text into speech and outputting it, outputs the speech with a listening restriction if the text contains a suppression phrase, the device comprising:
authentication means for confirming, based on identification information of the person listening to the output speech, whether the listener is an authorized user;
storage means that stores a list of suppression phrases;
determination means for reading the list of suppression phrases from the storage means and determining whether the input text contains a suppression phrase; and
synthesis means for synthesizing speech data based on the input text only when the input text contains no suppression phrase or when the listener is an authorized user.

A speech synthesis distribution system according to the present invention is a system having a speech synthesis server and a distribution server that distributes the speech data output by the speech synthesis server, wherein
the speech synthesis server has computing means and storage means that stores a speech synthesis program causing the computing means to execute the speech synthesis method described above, and
the storage means stores:
a user table holding identification information of listeners;
a table representing the relationship between the identification information of each listener and the suppression level permitted for that listener; and
a list holding pairs of a suppression phrase and the suppression level of that suppression phrase.

According to the present invention, a mechanism for restricting listening is provided for cases where a voice message to be synthesized and output contains inappropriate phrases, such as expressions offensive to public order and morals, terms banned from broadcasting, or discriminatory terms, and for voice message services that should be audible only to users of a specific service. Suitable speech for distribution can thus be synthesized according to the attributes of the listener.

Embodiment 1.
FIG. 1 is a functional block diagram of the speech synthesizer according to Embodiment 1 of the present invention.
The speech synthesizer 100 shown in FIG. 1 includes an input unit 101, an authentication unit 102, a suppression word determination unit 103, a suppression word list storage unit 104, and a speech synthesis unit 105.
The input unit 101 receives the text to be read 110 as input and outputs it to the suppression word determination unit 103.
The authentication unit 102 receives information identifying the user of the speech synthesizer 100 as input, performs authentication processing, and outputs the result to the speech synthesis unit 105.
The suppression word determination unit 103 reads the suppression word list from the suppression word list storage unit 104, determines whether the content of the text to be read 110 contains a suppression phrase, and outputs the determination result and the content of the text to be read 110 to the speech synthesis unit 105.
The suppression word list storage unit 104 stores the suppression word list shown in FIG. 2, described later.
The speech synthesis unit 105 synthesizes and outputs speech 111 based on the outputs of the authentication unit 102 and the suppression word determination unit 103. As the output method, for example, the speech can be written to a storage means as an audio data file.

FIG. 2 shows a configuration example of the suppression word list stored in the suppression word list storage unit 104.
The suppression word list holds the phrases whose speech output should be suppressed. It can be stored in the suppression word list storage unit 104 in a text file format such as CSV (Comma Separated Values), or as a table in a relational database.

FIG. 2 shows the configuration and example data for the case where the list is stored in table form. Each column is described below.
The “index” column holds the first character of the suppression word as an index and is provided for convenience in searching and the like. Because Embodiment 1 assumes suppression words expressed in Japanese, the values in this column are the characters of the Japanese syllabary (あ through ん). When English suppression words are held, an alphabetical index can be used instead; any data that helps with organizing and searching will do.
The “suppression word” column holds the phrases whose speech output should be suppressed. The values in this column are grouped and organized by the “index” column.
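The two-column list described above can be sketched as a small CSV loader. This is only an illustrative sketch: the column names `index` and `word`, the sample rows, and the function names are assumptions made for the example and do not come from FIG. 2.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV content: one row per suppression word.
# The words are harmless placeholders, not real list entries.
SAMPLE_CSV = """index,word
b,badword
b,blocked
f,forbidden
"""


def load_suppression_list(csv_text):
    """Return a dict mapping each index character to its suppression words."""
    by_index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_index[row["index"]].append(row["word"])
    return dict(by_index)


def all_words(by_index):
    """Flatten the indexed list into a set for membership checks."""
    return {word for words in by_index.values() for word in words}
```

Grouping by the index column mirrors the stated purpose of that column (organizing and searching); the flattened set is what a matching step would typically consume.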

FIG. 3 illustrates the overall operation flow of the speech synthesizer 100 of FIG. 1. Each step is described below.
(S301)
The input unit 101 receives the content of the text to be read 110. The received content is output to the suppression word determination unit 103.
(S302)
The authentication unit 102 receives information identifying the user of the speech synthesizer 100 as input and performs authentication processing.
The authentication processing can be realized, for example, by storing valid ID and password pairs in a storage area within the authentication unit 102 and checking the input against them.
The result of the authentication processing is output to the speech synthesis unit 105.
(S303)
The suppression word determination unit 103 reads the suppression word list from the suppression word list storage unit 104.
(S304)
The suppression word determination unit 103 determines whether the content of the text to be read 110 received from the input unit 101 contains a phrase held in the suppression word list. The determination result is output to the speech synthesis unit 105.
If a phrase is determined to be contained, the process proceeds to step S305; otherwise, it proceeds to step S306.
The content of the text to be read 110 is output to the speech synthesis unit 105.
(S305)
The speech synthesis unit 105 determines, based on the output of the authentication unit 102, whether the user of the speech synthesizer 100 is an authorized user.
If the user is determined to be an authorized user, the process proceeds to step S306; if not, the process ends.
(S306)
The speech synthesis unit 105 performs speech synthesis processing based on the content of the text to be read 110 output by the suppression word determination unit 103.
The synthesis processing can be realized, for example, by performing morphological analysis of the text to be read 110 and then combining speech segments from a speech corpus constructed in advance, or by synthesizing in accordance with a predetermined probabilistic model.
(S307)
The speech synthesis unit 105 outputs the speech synthesized in step S306.
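The decision logic of steps S304 to S307 can be sketched as follows. This is a minimal sketch under stated assumptions: `synthesize` is a hypothetical stand-in that returns tagged bytes instead of real audio, matching is plain substring search, and the authentication result of S302 is passed in as a boolean.

```python
def contains_suppression_phrase(text, suppression_words):
    """S304: substring match against the suppression word list (assumed matching rule)."""
    return any(word in text for word in suppression_words)


def synthesize(text):
    """Hypothetical stand-in for the real synthesis of S306."""
    return ("AUDIO:" + text).encode("utf-8")


def synthesize_with_restriction(text, is_authorized, suppression_words):
    """Return synthesized audio, or None when listening is restricted."""
    if contains_suppression_phrase(text, suppression_words) and not is_authorized:
        return None  # S305: unauthorized listener, processing ends
    return synthesize(text)  # S306 and S307
```

Note that the input text is never modified: the whole utterance is either synthesized as written or withheld.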

According to the method shown in the flowchart of FIG. 3, the listener is authenticated in advance and listening is restricted when the text to be read 110 contains a suppression phrase, so appropriate restrictions can be applied according to the attributes of the listener.

FIG. 4 shows a modification of the flowchart shown in FIG. 3.
In FIG. 3, if the content of the text to be read 110 contains even one suppression phrase, the listening restriction is applied to the entire speech.
In the flowchart shown in FIG. 4, the listening restriction is applied only to the portions that contain a suppression phrase. Each step is described below.
(S401) to (S403)
These are the same as steps S301 to S303 in FIG. 3, so their description is omitted.
(S404)
The following steps S405 to S409 are executed repeatedly until the end of the text to be read 110 is reached.
(S405)
The suppression word determination unit 103 divides the content of the text to be read 110 into predetermined block units in advance.
Next, the suppression word determination unit 103 reads the first speech reproduction block of the text to be read 110. On the second and subsequent executions of this step, it moves to the next block to be processed in the text to be read 110.
(S406)
The suppression word determination unit 103 determines whether the content of the text to be read 110 received from the input unit 101 contains a phrase held in the suppression word list. The determination result is output to the speech synthesis unit 105.
If a phrase is determined to be contained, the process proceeds to step S407; otherwise, it proceeds to step S408.
The content of the text to be read 110 is output to the speech synthesis unit 105.
(S407)
The speech synthesis unit 105 determines, based on the output of the authentication unit 102, whether the user of the speech synthesizer 100 is an authorized user.
If the user is determined to be an authorized user, the process proceeds to step S408; if not, the process returns to step S405.
(S408)
The speech synthesis unit 105 performs speech synthesis processing based on the content of the text to be read 110 output by the suppression word determination unit 103.
The synthesis processing is the same as in step S306 of FIG. 3.
(S409)
The speech synthesis unit 105 outputs the speech synthesized in step S408.
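The per-block loop of steps S404 to S409 can be sketched as follows. The block unit is not specified in the text, so splitting at sentence-ending punctuation is purely an assumption for illustration, as are the function names and the stand-in `synthesize`.

```python
import re


def split_into_blocks(text):
    """S405: split into blocks; sentence-sized blocks are an assumed block unit."""
    return [b.strip() for b in re.findall(r"[^.!?]+[.!?]?", text) if b.strip()]


def synthesize(text):
    """Hypothetical stand-in for the real synthesis of S408."""
    return ("AUDIO:" + text).encode("utf-8")


def synthesize_blockwise(text, is_authorized, suppression_words):
    """Synthesize block by block, skipping restricted blocks for unauthorized listeners."""
    audio = []
    for block in split_into_blocks(text):                            # S404 / S405
        has_phrase = any(word in block for word in suppression_words)  # S406
        if has_phrase and not is_authorized:                         # S407
            continue                                                 # back to S405
        audio.append(synthesize(block))                              # S408 / S409
    return audio
```

An unauthorized listener still receives every block that contains no suppression phrase; only the matching blocks are dropped.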

According to the method shown in the flowchart of FIG. 4, even when the content of the text to be read 110 contains a suppression phrase, the portions other than the suppression phrase can be heard regardless of the attributes of the listener, which improves convenience for the user.
In this case, however, although the content of the text to be read 110 itself is not changed, the speech output that the listener hears differs from the content of the text to be read 110. It is therefore advisable to choose the method suited to each case according to the purpose and usage, for example by selecting whichever of FIG. 3 or FIG. 4 is appropriate, or by inserting a silent section.

In Embodiment 1, the authentication unit 102 itself determines, based on the identification information, whether the listener is an authorized person. However, the configuration is not limited to this. For example, the authentication unit may connect to an external authentication server or the like over a network and receive only the authentication result, or a database storing the listeners' ID and password pairs may be built within the speech synthesizer 100 and searched to perform authentication.
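A local ID and password check of the kind described above can be sketched as follows. This is a sketch under assumptions: the text only says that ID and password pairs are stored and checked, so the salted PBKDF2 hashing, the in-memory `users` dict, and all function names here are additions made for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash; storing hashes instead of plaintext is an added assumption."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(users, user_id, password):
    """Store a (salt, hash) record for the listener in the hypothetical user table."""
    salt = os.urandom(16)
    users[user_id] = (salt, hash_password(password, salt))


def authenticate(users, user_id, password):
    """Return True only when the ID exists and the password matches."""
    record = users.get(user_id)
    if record is None:
        return False
    salt, expected = record
    return hmac.compare_digest(hash_password(password, salt), expected)
```

The boolean returned by `authenticate` plays the role of the authentication result that is passed to the speech synthesis unit.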

As described above, the speech synthesizer according to Embodiment 1 is a device that, when synthesizing input text into speech and outputting it, outputs the speech with a listening restriction if the text contains a suppression phrase, and it comprises:
authentication means for confirming, based on identification information of the person listening to the output speech, whether the listener is an authorized user;
storage means that stores a list of suppression phrases;
determination means for reading the list of suppression phrases from the storage means and determining whether the input text contains a suppression phrase; and
synthesis means for synthesizing speech data based on the input text only when the input text contains no suppression phrase or when the listener is an authorized user.
Appropriate restrictions can therefore be applied according to the attributes of the listener.
Furthermore, if the presence or absence of a suppression phrase is determined for each predetermined block of the input text, then even when the content of the text to be read 110 contains a suppression phrase, the portions other than the suppression phrase can be heard regardless of the attributes of the listener, which improves convenience for the user.

Embodiment 2.
Embodiment 1 described the configuration of a speech synthesis method and device that prevent anyone who is not an authorized listener from hearing the synthesized speech. In other words, the speech synthesizer itself applies the listening restriction at the speech synthesis stage.
As another way of restricting listening, the decision on whether to apply a restriction can be delegated to an external application that plays back the speech, with the speech synthesis stage doing no more than attaching the information used for that decision.
The speech synthesizer according to Embodiment 2 of the present invention attaches, at the time of speech synthesis, the information that such an external application uses for its decision.

FIG. 5 is a functional block diagram of the speech synthesizer according to Embodiment 2.
The speech synthesizer 500 shown in FIG. 5 includes an input unit 501, a suppression word list storage unit 504, a speech synthesis unit 505, and a calculation unit 506.
The input unit 501 receives the text to be read 110 as input and outputs it to the calculation unit 506.
The suppression word list storage unit 504 stores the suppression word list shown in FIG. 6, described later.
The calculation unit 506 reads the suppression word list from the suppression word list storage unit 504, determines whether the content of the text to be read 110 contains a suppression phrase, and outputs the determination result and the content of the text to be read 110 to the speech synthesis unit 505. When a suppression phrase is contained, it also calculates the degree to which the speech output should be suppressed (hereinafter referred to as the suppression level) and outputs the calculation result to the speech synthesis unit 505.
The speech synthesis unit 505 synthesizes and outputs speech 111 based on the output of the calculation unit 506. As the output method, for example, the speech can be written to a storage means as an audio data file.

FIG. 6 shows a configuration example of the suppression word list stored in the suppression word list storage unit 504. As in Embodiment 1, the suppression word list can be stored in the suppression word list storage unit 504 in a text file format such as CSV, or as a table in a relational database.
Each column in FIG. 6 is described below.
The contents of the “index” column and the “suppression word” column are the same as those shown in FIG. 2.
The “suppression level” column expresses the suppression level of the suppression phrase as a numerical value. The levels can be set in advance as appropriate, according to the content of the suppression phrase and the expected usage scenario.

In FIG. 6, the suppression phrases and their suppression levels are shown as being held in the same table, but the configuration is not limited to this. The suppression levels may be stored in a separate table, with the relationship between the two expressed by a foreign key constraint or the like.
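The separate-table variant can be sketched with SQLite as follows. The table names, column names, and sample rows are hypothetical; the text only states that the levels may live in a different table linked by a foreign key constraint or similar.

```python
import sqlite3


def build_db():
    """Create an in-memory database with words and levels in separate tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE suppression_word (
            id   INTEGER PRIMARY KEY,
            idx  TEXT NOT NULL,          -- the "index" column
            word TEXT NOT NULL UNIQUE    -- the "suppression word" column
        );
        CREATE TABLE suppression_level (
            word_id INTEGER PRIMARY KEY REFERENCES suppression_word(id),
            level   INTEGER NOT NULL     -- the "suppression level" column
        );
    """)
    # Placeholder rows, not real list entries.
    conn.executemany("INSERT INTO suppression_word VALUES (?, ?, ?)",
                     [(1, "b", "blocked"), (2, "f", "forbidden")])
    conn.executemany("INSERT INTO suppression_level VALUES (?, ?)",
                     [(1, 2), (2, 3)])
    conn.commit()
    return conn


def load_levels(conn):
    """Join the two tables back into a word -> suppression level mapping."""
    rows = conn.execute(
        "SELECT w.word, l.level FROM suppression_word AS w "
        "JOIN suppression_level AS l ON l.word_id = w.id"
    )
    return dict(rows.fetchall())
```

Reading the list "as a set" of phrase and level then amounts to the single join in `load_levels`.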

FIG. 7 shows a conceptual view of the internal structure of the synthesized speech data in Embodiment 2.
In Embodiment 2, the synthesized speech data contains information indicating the suppression level obtained by the calculation unit. This information may be a single value common to the entire speech data, or, for example, each speech frame may carry information indicating the suppression level of the suppression phrases within that frame.

Part (1) of FIG. 7 shows the speech data when it is configured to carry a single value common to the entire speech data.
In this case, a header section is preferably provided, for example at the beginning of the speech data, and the information indicating the suppression level of the entire speech data is held in part of that header section.

Part (2) of FIG. 7 shows the speech data when each speech frame is configured to carry information indicating the suppression level of the suppression phrases within that frame.
In general, the degree to which suppression phrases appear differs from one part of the speech data to another. If each frame carries information indicating its suppression level, frames whose suppression level is at or below a certain value can be made audible regardless of the attributes of the listener, which improves convenience for the user.
In this case, each frame of the speech data is preferably divided into a frame header section and a frame data section, with the frame header section holding the information indicating the suppression level together with audio attributes of the frame such as its byte length and bit rate.
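The frame layout of part (2) of FIG. 7 can be sketched with Python's `struct` module. The exact header layout is not given in the text, so the field order, field widths, and big-endian encoding below are assumptions chosen only for illustration.

```python
import struct

# Hypothetical frame header: payload byte length (uint32), bit rate (uint32),
# suppression level (uint8), big-endian with no padding (9 bytes in total).
FRAME_HEADER = struct.Struct(">IIB")


def pack_frame(payload, bit_rate, level):
    """Prefix one frame's audio payload with its frame header."""
    return FRAME_HEADER.pack(len(payload), bit_rate, level) + payload


def unpack_frames(data):
    """Parse concatenated frames into (level, bit_rate, payload) tuples."""
    frames = []
    offset = 0
    while offset < len(data):
        length, bit_rate, level = FRAME_HEADER.unpack_from(data, offset)
        offset += FRAME_HEADER.size
        frames.append((level, bit_rate, data[offset:offset + length]))
        offset += length
    return frames
```

The variant of part (1) would carry one such level field in a single file-level header instead of in every frame.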

FIG. 8 illustrates the overall operation flow of the speech synthesizer 500 of FIG. 5. Each step is described below.
Note that FIG. 8 shows the flowchart for the case where the suppression level is calculated for each predetermined block, as in part (2) of FIG. 7.
(S801)
The input unit 501 receives the content of the text to be read 110. The received content is output to the calculation unit 506.
(S802)
The calculation unit 506 reads the suppression word list from the suppression word list storage unit 504. At this time, each suppression phrase is read together with its suppression level as a set.
(S803)
The following steps S804 to S807 are executed repeatedly until the end of the text to be read 110 is reached.
(S804)
The calculation unit 506 divides the content of the text to be read 110 into predetermined block units in advance.
Next, the calculation unit 506 reads the first speech reproduction block of the text to be read 110. On the second and subsequent executions of this step, it moves to the next block to be processed in the text to be read 110.
(S805)
The calculation unit 506 calculates the suppression level of the current block.
Any calculation criterion can be used, such as summing the suppression levels of the suppression phrases contained in the current block or taking their average.
The calculated suppression level and the content of the text to be read 110 are output to the speech synthesis unit 505.
(S806)
Based on the content of the text to be read 110 and the calculated suppression level output by the calculation unit 506, the speech synthesis unit 505 synthesizes speech data that includes information representing the calculation result. The speech data can use either of the formats (1) and (2) in FIG. 7.
(S807)
The speech synthesis unit 505 outputs the speech synthesized in step S806.
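The per-block calculation of step S805 can be sketched as follows, covering the two criteria the text names (sum and average). The function names, the substring matching, and the choice to count each phrase once per block are assumptions for illustration.

```python
def block_level(block, levels, how="sum"):
    """S805: combine the levels of the suppression phrases found in one block.

    Each phrase counts once per block even if it occurs several times (assumption).
    """
    found = [level for word, level in levels.items() if word in block]
    if not found:
        return 0  # no suppression phrase in this block
    if how == "sum":
        return sum(found)
    if how == "mean":
        return sum(found) / len(found)
    raise ValueError("unknown criterion: " + how)


def tag_blocks(blocks, levels, how="sum"):
    """Pair every block with its calculated level, ready for synthesis in S806."""
    return [(block_level(block, levels, how), block) for block in blocks]
```

For whole-data tagging as in part (1) of FIG. 7, the same function can simply be applied to the full text instead of to each block.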

When the suppression level of the entire speech data is calculated, as in part (1) of FIG. 7, steps S806 and S807 of FIG. 8 can be moved outside the loop so that speech synthesis is performed after the suppression level of the entire speech data has been calculated.

According to the method shown in the flowchart of FIG. 8, the suppression level can be calculated at the same time as the suppression word list is read and matched against the input text, so speech data with an embedded suppression level, as shown in FIG. 7, can be created with a small computational load.
In other words, the listening suppression processing is delegated to the application or the like that plays back the speech, and at the time of speech synthesis only the information used for the suppression decision is attached. Speech synthesis can therefore be performed by a simple method with a small computational load while still allowing appropriate listening suppression.

As described above, the device according to Embodiment 2 is a device that, when synthesizing input text into speech and outputting it, outputs the speech with a listening restriction if the text contains a suppression phrase, and it comprises:
storage means that stores a list holding pairs of a suppression phrase and the suppression level of that suppression phrase;
calculation means for reading, from the storage means, the list holding pairs of a suppression phrase and its suppression level, and calculating the suppression level of the input text; and
synthesis means for synthesizing, based on the input text and the processing result of the calculation means, speech data that includes information representing the calculation result.
Speech synthesis can therefore be performed by a simple method with a small computational load while still allowing appropriate listening suppression.

In addition, the calculation means calculates a suppression level for each predetermined block of the input text, and the synthesis means synthesizes the audio data so that it includes, for each such block, information representing the suppression level.
Frames whose suppression level is at or below a certain value can therefore be made audible regardless of the listener's attributes, which improves convenience for the user.
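The per-block calculation summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the word list contents, the `Block` record, and the choice of summing levels are assumptions made for the example, and the actual speech synthesis step is omitted.

```python
from dataclasses import dataclass

# Hypothetical suppression word list: phrase -> suppression level.
SUPPRESSION_LIST = {"secret": 3, "budget": 1}


@dataclass
class Block:
    text: str
    level: int  # suppression level attached to this block


def block_level(text: str, words: dict[str, int]) -> int:
    """Sum the levels of every suppression phrase occurrence in the block."""
    return sum(level * text.count(phrase) for phrase, level in words.items())


def annotate(blocks: list[str], words: dict[str, int]) -> list[Block]:
    """Attach a suppression level to each block during the matching pass."""
    return [Block(text=b, level=block_level(b, words)) for b in blocks]


def audible_blocks(annotated: list[Block], threshold: int) -> list[str]:
    """Player-side decision: keep only blocks at or below the threshold."""
    return [b.text for b in annotated if b.level <= threshold]
```

Because the level travels with each block, the playback side can apply its own threshold without consulting the word list again.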

Embodiment 3.
The first and second embodiments described configurations that confirm whether the listener is an authorized user, or that embed information representing the suppression level in the audio data. In those configurations the listener's identification information and the suppression level are used as separate pieces of information, with no association between them.
In general, however, the suppression level that is permissible for a user may differ from user to user depending on attributes such as the user's age.
In the third embodiment of the present invention, the permissible suppression level is determined for each user, and the synthesized speech is output only when the suppression level is within the appropriate range.

FIG. 9 is a functional block diagram of the speech synthesizer according to the third embodiment.
The speech synthesizer 900 shown in FIG. 9 includes an input unit 901, an authentication unit 902, a suppression word list storage unit 904, a speech synthesis unit 905, a calculation unit 906, and a suppression level determination unit 907.
The input unit 901 receives the text to be read 110 as input and outputs it to the calculation unit 906.
The authentication unit 902 receives information identifying the user of the speech synthesizer 900 as input, performs authentication processing, and outputs the result to the suppression level determination unit 907.
The suppression word list storage unit 904 stores the suppression word list shown in FIG. 6 and the tolerance table shown in FIG. 10, described later.
The calculation unit 906 reads the suppression word list from the suppression word list storage unit 904, determines whether the text to be read 110 contains a suppression phrase, and outputs the result and the content of the text to be read 110 to the suppression level determination unit 907. If a suppression phrase is contained, it also calculates the suppression level and outputs the calculation result to the suppression level determination unit 907.
The suppression level determination unit 907 receives the calculation result of the calculation unit 906 and the authentication result of the authentication unit 902, and determines whether the suppression level is within the range permitted for the user of the speech synthesizer 900. The determination result and the content of the text to be read 110 are output to the speech synthesis unit 905.
The speech synthesis unit 905 synthesizes and outputs the audio 111 based on the output of the suppression level determination unit 907. The output method may be, for example, writing the audio to a storage unit as an audio data file.

FIG. 10 shows the structure of the tolerance table stored in the suppression word list storage unit 904, together with example data. Each column is described below.
The "User ID" column corresponds to the information, received by the authentication unit 902, that identifies the user of the speech synthesizer 900.
The "Allowable suppression level" column gives the maximum suppression level permitted for the user identified by that "User ID".
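A lookup against such a tolerance table might look like the following sketch. The user IDs and levels are invented example data, and rejecting users who are absent from the table is an assumption of this example rather than something the description specifies.

```python
# Hypothetical tolerance table: user ID -> maximum allowable suppression level.
TOLERANCE_TABLE = {"user001": 5, "user002": 2}


def is_allowed(user_id: str, level: int, table: dict[str, int] = TOLERANCE_TABLE) -> bool:
    """Return True if `level` does not exceed the user's allowable level."""
    max_level = table.get(user_id)
    if max_level is None:
        # Assumption: a user with no table entry is treated as not permitted.
        return False
    return level <= max_level
```

For example, `is_allowed("user002", 3)` is false because the calculated level exceeds that user's maximum of 2.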

FIG. 11 illustrates the overall operation flow of the speech synthesizer 900 of FIG. 9. Each step is described below.
(S1101)
The input unit 901 receives the content of the text to be read 110. The received content is output to the calculation unit 906.
(S1102)
The authentication unit 902 receives information identifying the user of the speech synthesizer 900 as input and performs authentication processing.
The authentication processing can be implemented, for example, by storing valid ID and password pairs in a storage area within the authentication unit 902 and checking the input against them.
The result of the authentication processing is output to the calculation unit 906 and the suppression level determination unit 907.
(S1103)
The calculation unit 906 reads the suppression word list from the suppression word list storage unit 904. At this time, each suppression phrase is read together with its suppression level as a pair.
(S1104)
The following steps S1105 to S1110 are repeated until the end of the text to be read 110 is reached.
(S1105)
The calculation unit 906 divides the content of the text to be read 110 into predetermined block units in advance.
The calculation unit 906 then reads the first audio playback block of the text to be read 110. On the second and subsequent executions of this step, it moves to the next block to be processed.
(S1106)
The calculation unit 906 determines, based on the output of the authentication unit 902, whether the user of the speech synthesizer 900 is an authorized user.
If the user is determined to be an authorized user, the process proceeds to step S1107. Otherwise, the process returns to step S1105.
(S1107)
The calculation unit 906 calculates the suppression level of the current block.
Any calculation criterion may be used, such as summing or averaging the suppression levels of the suppression phrases contained in the current block.
The calculated suppression level and the content of the text to be read 110 are output to the suppression level determination unit 907.
(S1108)
The suppression level determination unit 907 receives the calculation result of the calculation unit 906 and the authentication result of the authentication unit 902, and determines whether the suppression level is within the range permitted for the user of the speech synthesizer 900. In making this determination, it also refers to the tolerance table stored in the suppression word list storage unit 904.
If the level is determined to be within the permitted range, the process proceeds to step S1109. Otherwise, the process returns to step S1105.
The determination result and the content of the text to be read 110 are output to the speech synthesis unit 905.
(S1109)
The speech synthesis unit 905 synthesizes audio data that includes information representing the calculation result, based on the content of the text to be read 110 and the calculated suppression level output by the suppression level determination unit 907. The audio data may use either of the formats (1) and (2) in FIG. 7.
(S1110)
The speech synthesis unit 905 outputs the audio synthesized in step S1109.
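The loop of FIG. 11 can be sketched roughly as below. All names are hypothetical, the suppression level is computed by simple summation (one of the criteria the description permits), and `synthesize` is a caller-supplied stand-in for the real speech synthesis step.

```python
from typing import Callable


def run_flow(
    blocks: list[str],
    user_id: str,
    authenticated: bool,
    words: dict[str, int],
    tolerance: dict[str, int],
    synthesize: Callable[[str, int], object],
) -> list[object]:
    """Process each block and return the outputs of the blocks that pass."""
    outputs = []
    for block in blocks:                         # S1104 / S1105: next block
        if not authenticated:                    # S1106: authorized user?
            continue
        # S1107: sum the levels of the suppression phrases found in the block.
        level = sum(lv for phrase, lv in words.items() if phrase in block)
        # S1108: compare with the user's allowable level; a user without a
        # table entry gets -1 here, so every block is rejected (assumption).
        if level > tolerance.get(user_id, -1):
            continue
        outputs.append(synthesize(block, level))  # S1109 / S1110
    return outputs
```

Passing the level to `synthesize` mirrors step S1109, where the calculation result is included in the synthesized audio data.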

Because the suppression level is judged for each user and each block, adequate listening suppression can be achieved without embedding information representing the suppression level in the audio data. Depending on the environment in which the audio is played back, however, finer-grained listening suppression may be desired.
For that reason, in the third embodiment as in the second embodiment, information representing the suppression level is embedded in the audio data in step S1109.

As described above, the third embodiment provides a device that, when synthesizing input text into speech and outputting it, applies a listening restriction to the output if the text contains a suppression phrase, the device having:
authentication means for confirming, based on identification information of the person listening to the output audio, whether the listener is an authorized user;
storage means for storing a list that holds pairs of a suppression phrase and the suppression level of that phrase;
calculation means for reading the list from the storage means and calculating the suppression level of the input text;
suppression level determination means for determining whether the suppression level calculated by the calculation means is within the range permitted for the listener; and
synthesis means for synthesizing, based on the input text, audio data that includes information representing the calculation result, only when the listener is an authorized user and the suppression level is within the range permitted for that listener.
Based on the authentication result of the authentication unit 902, only authorized users can therefore listen to the synthesized audio.

In addition, the storage means stores a table representing the relationship between the listener's identification information and the suppression level permitted for that listener, and the suppression level determination means determines, based on the contents of that table, whether the suppression level is within the range permitted for the listener.
Because the upper limit of the suppression level permitted for each user is determined in this way, listening restrictions matched to each user's individual attributes can be applied per audio data item or per predetermined block.

Embodiment 4.
In the configurations described in the first to third embodiments, the listener's identification information and the suppression word list are used so that only those who may be permitted to listen can hear the audio.
However, depending on how the audio data is used and on the rights attached to the speech segment data used for synthesis, it may be necessary to prevent audio data that an authorized listener has obtained, and that has thereby become audible, from being acquired and misused by an unauthorized listener.
The fourth embodiment of the present invention describes a configuration in which, after audio data has been generated with listening restrictions applied, the audio data is encrypted before output, preventing unauthorized use by anyone who is not an authorized listener.

FIG. 12 is a functional block diagram of the speech synthesizer according to the fourth embodiment.
The speech synthesizer 1200 shown in FIG. 12 includes an input unit 1201, an authentication unit 1202, a suppression word list storage unit 1204, a speech synthesis unit 1205, a calculation unit 1206, a suppression level determination unit 1207, and an encryption unit 1208.
The input unit 1201 receives the text to be read 110 as input and outputs it to the calculation unit 1206.
The authentication unit 1202 receives information identifying the user of the speech synthesizer 1200 as input, performs authentication processing, and outputs the result to the suppression level determination unit 1207 and the encryption unit 1208.
The suppression word list storage unit 1204 stores the suppression word list shown in FIG. 6 and the tolerance table shown in FIG. 10.
The calculation unit 1206 reads the suppression word list from the suppression word list storage unit 1204, determines whether the text to be read 110 contains a suppression phrase, and outputs the result and the content of the text to be read 110 to the suppression level determination unit 1207. If a suppression phrase is contained, it also calculates the suppression level and outputs the calculation result to the suppression level determination unit 1207.
The suppression level determination unit 1207 receives the calculation result of the calculation unit 1206 and the authentication result of the authentication unit 1202, and determines whether the suppression level is within the range permitted for the user of the speech synthesizer 1200. The determination result and the content of the text to be read 110 are output to the speech synthesis unit 1205.
The speech synthesis unit 1205 synthesizes and outputs audio data based on the output of the suppression level determination unit 1207. The output method may be, for example, writing the audio data to a storage unit as an audio data file.
The encryption unit 1208 receives the authentication result from the authentication unit 1202 and the synthesized audio data from the speech synthesis unit 1205. It then encrypts the audio data using that information and outputs it as the audio 111. The output method may be, for example, writing the audio data to a storage unit as an audio data file.

FIG. 13 illustrates the overall operation flow of the speech synthesizer 1200 of FIG. 12. Each step is described below.
(S1301)
The input unit 1201 receives the content of the text to be read 110. The received content is output to the calculation unit 1206.
(S1302)
The authentication unit 1202 receives information identifying the user of the speech synthesizer 1200 as input and performs authentication processing.
The authentication processing can be implemented, for example, by storing valid ID and password pairs in a storage area within the authentication unit 1202 and checking the input against them.
The result of the authentication processing is output to the calculation unit 1206, the suppression level determination unit 1207, and the encryption unit 1208.
(S1303)
The calculation unit 1206 reads the suppression word list from the suppression word list storage unit 1204. At this time, each suppression phrase is read together with its suppression level as a pair.
(S1304)
The following steps S1305 to S1311 are repeated until the end of the text to be read 110 is reached.
(S1305)
The calculation unit 1206 divides the content of the text to be read 110 into predetermined block units in advance.
The calculation unit 1206 then reads the first audio playback block of the text to be read 110. On the second and subsequent executions of this step, it moves to the next block to be processed.
(S1306)
The calculation unit 1206 determines, based on the output of the authentication unit 1202, whether the user of the speech synthesizer 1200 is an authorized user.
If the user is determined to be an authorized user, the process proceeds to step S1307. Otherwise, the process returns to step S1305.
(S1307)
The calculation unit 1206 calculates the suppression level of the current block.
Any calculation criterion may be used, such as summing or averaging the suppression levels of the suppression phrases contained in the current block.
The calculated suppression level and the content of the text to be read 110 are output to the suppression level determination unit 1207.
(S1308)
The suppression level determination unit 1207 receives the calculation result of the calculation unit 1206 and the authentication result of the authentication unit 1202, and determines whether the suppression level is within the range permitted for the user of the speech synthesizer 1200. In making this determination, it also refers to the tolerance table stored in the suppression word list storage unit 1204.
If the level is determined to be within the permitted range, the process proceeds to step S1309. Otherwise, the process returns to step S1305.
The determination result and the content of the text to be read 110 are output to the speech synthesis unit 1205.
(S1309)
The speech synthesis unit 1205 synthesizes audio data that includes information representing the calculation result, based on the content of the text to be read 110 and the calculated suppression level output by the suppression level determination unit 1207. The audio data may use either of the formats (1) and (2) in FIG. 7.
The synthesized audio data is output to the encryption unit 1208.
(S1310)
The encryption unit 1208 receives the authentication result from the authentication unit 1202 and encrypts the audio data.
As the encryption key, the identification information received by the authentication unit 1202 may be used as it is, or an encryption key may be newly generated from the identification information according to a predetermined generation rule.
(S1311)
The encryption unit 1208 outputs the audio encrypted in step S1310.

According to the method shown in the flowchart of FIG. 13, an encryption key is generated from identification information known only to the authorized listener, and the audio data is encrypted with that key, so unauthorized use by anyone who is not an authorized listener can be prevented.
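As an illustration of generating a key from identification information, the sketch below uses PBKDF2 from Python's standard library as a stand-in for the unspecified "predetermined generation rule". The salt, iteration count, and the XOR keystream cipher are all assumptions of this example. The cipher is a toy used only to show the round trip and must not be taken as a secure scheme; a real system would use a vetted cipher.

```python
import hashlib
import hmac


def derive_key(identification: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the listener's identification information."""
    # PBKDF2-HMAC-SHA256 stands in for the "predetermined generation rule".
    return hashlib.pbkdf2_hmac("sha256", identification.encode("utf-8"), salt, 100_000)


def keystream(key: bytes, length: int) -> bytes:
    """Produce `length` pseudo-random bytes from HMAC-SHA256 over a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only: XOR with the keystream.

    Applying it twice with the same key restores the original data.
    """
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
```

Only a party who can reproduce the same identification information, and therefore the same key, recovers the original audio bytes.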

Encrypting the audio data in this way can be expected to prevent unauthorized use. To decrypt the encrypted audio data and play it back, however, the user needs an audio player or similar software that supports the decryption algorithm corresponding to the encryption algorithm used. This forces the user to use that particular player, which may be an excessive burden.
As an alternative, predetermined audio watermark information can be embedded in the audio data instead of encrypting it, so that the watermark information can be recovered with the corresponding recovery method. This configuration does not unduly burden the user, and because any unauthorized use is revealed by the watermark information, a certain deterrent effect can also be expected.

FIG. 14 illustrates the internal structure of the audio data shown in FIG. 7 with audio watermark information additionally embedded.
In both (1) and (2) of FIG. 14, the watermark information is embedded in the audio data portion itself, not in the header portion.
The watermark information may be embedded by the speech synthesis unit 1205 when it generates the audio data.

(1) of FIG. 14 shows audio data configured to carry a single piece of audio watermark information common to the whole of the audio data. In this case, the audio watermark information is embedded in an arbitrary part of the audio data portion.

(2) of FIG. 14 shows audio data configured to carry watermark information for each audio frame.
In this case, the watermark information is held in the frame data portion. The audio watermark information may be the same for every frame, or information that identifies the frame, such as a hash value, may be added so that every frame carries different watermark information.
When each frame carries different watermark information, tampering can be detected even if someone familiar with the frame structure of the audio data illegally copies or alters the data, for example by splitting it at frame boundaries.
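The idea of frame-specific watermark values can be illustrated as follows. This sketch only computes and checks a distinct tag per frame, derived with HMAC from the frame index and frame bytes. It does not show how a tag would be embedded into the audio signal itself, and the key, tag length, and function names are assumptions of the example.

```python
import hashlib
import hmac


def frame_tag(key: bytes, index: int, samples: bytes) -> bytes:
    """Compute a tag bound to both the frame's position and its content."""
    msg = index.to_bytes(4, "big") + samples
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


def tag_frames(key: bytes, frames: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Pair every frame with its own tag."""
    return [(f, frame_tag(key, i, f)) for i, f in enumerate(frames)]


def tampered_frames(key: bytes, tagged: list[tuple[bytes, bytes]]) -> list[int]:
    """Return the positions whose tag no longer matches the frame found there."""
    return [
        i
        for i, (f, t) in enumerate(tagged)
        if not hmac.compare_digest(t, frame_tag(key, i, f))
    ]
```

Because the frame index is part of the tagged message, frames that are reordered or spliced in from elsewhere fail the check even when their content is unchanged.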

As described above, the fourth embodiment has encryption means that generates an encryption key based on the identification information and encrypts the audio data synthesized by the synthesis means.
An encryption key is thus generated from identification information known only to the authorized listener and used to encrypt the audio data, which prevents unauthorized use by anyone who is not an authorized listener.

In addition, the synthesis means synthesizes the audio data so that it includes, for each predetermined block of the input text, watermark data for identifying the source of the synthesized audio data.
Any unauthorized use is therefore revealed by the watermark information, so a certain deterrent effect can be expected.

Embodiment 5.
The first to fourth embodiments described speech synthesis methods provided with predetermined listening restrictions and means for preventing unauthorized use, and the configurations of speech synthesizers that implement them.
The fifth embodiment of the present invention describes the configuration of a system that distributes audio data generated with these speech synthesis methods or speech synthesizers to client terminals as audio content.

FIG. 15 shows, as a screen image, an example of a Web page that includes synthesized audio content.
The client terminal 1501 is a computer that the user uses to browse Web pages and listen to audio content. The user starts the Web browser software on the client terminal 1501 and views a Web page such as the one shown in the example screen layout 1502.
(1) The user clicks a link button.
The user clicks a link to audio content displayed in the example screen layout 1502 of the Web page.
(2) The player starts.
When the user clicks the link to the audio content, audio player software that supports the format of that content starts. The user listens to the linked audio content through the audio player software.
If the audio data contains information representing suppression phrases and their suppression levels, the player may also be configured to display that information on its screen.

FIG. 16 illustrates the configuration of a speech synthesis distribution system for delivering, to client terminals, a Web page that includes link information to audio content as shown in FIG. 15, together with the linked audio content.
In FIG. 16, the client terminal 1601 and the server center 1602 are connected via a network 1603. The user accesses the server center 1602 using the client terminal 1601.
The communication procedure used when the user views a Web page or listens to audio content is briefly as follows.
(1) Using the client terminal 1601, the user sends a login request to the server center 1602.
(2) The server center 1602 returns the authentication result to the client terminal 1601.
(3) Using the client terminal 1601, the user requests the server center 1602 to deliver a Web page or audio content.
(4) The server center 1602 delivers the requested Web page or audio content.
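The four-step exchange above can be mimicked in-process with a small class. This is only a schematic of the ordering (login before delivery). The credential store, content paths, and method names are hypothetical, and no real network protocol is implied.

```python
from typing import Optional


class ServerCenter:
    """Schematic stand-in for the server center 1602."""

    def __init__(self, users: dict[str, str], contents: dict[str, bytes]):
        self._users = users        # user ID -> password (hypothetical store)
        self._contents = contents  # content path -> payload
        self._sessions: set[str] = set()

    def login(self, user_id: str, password: str) -> bool:
        """Steps (1) and (2): receive a login request, return the result."""
        ok = user_id in self._users and self._users[user_id] == password
        if ok:
            self._sessions.add(user_id)
        return ok

    def deliver(self, user_id: str, path: str) -> Optional[bytes]:
        """Steps (3) and (4): deliver content only to logged-in users."""
        if user_id not in self._sessions:
            return None
        return self._contents.get(path)
```

A client that skips the login step receives nothing, which reflects the requirement that authentication precede delivery.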

The detailed processing sequence is described later with reference to FIG. 18.

Next, the internal configuration of the server center 1602 is described with reference to FIG. 16.
A distribution server 1604 and a speech synthesis server 1605 are installed in the server center 1602.
The distribution server 1604 accepts synthesized speech delivery requests from the client terminal 1601 and forwards them to the speech synthesis server 1605. It also delivers Web pages, and the audio data generated by the speech synthesis server 1605, to the client terminal 1601.
The speech synthesis server 1605 dynamically performs speech synthesis based on the request forwarded from the distribution server 1604 and returns the result to the distribution server 1604. In the fifth embodiment, performing speech synthesis "dynamically" means that audio data is not prepared in advance. Instead, speech synthesis starts when a request is received, and the generated audio data is then output.

次に、音声合成サーバ１６０５の構成について説明する。
音声合成サーバ１６０５は、図示しない演算手段と、記憶手段１６０６とを有する。
記憶手段１６０６は、音声合成プログラム１６１０、ユーザテーブル１６２０、抑制語リスト１６２１、許容度テーブル１６２２を格納している。
音声合成プログラム１６１０は、実施の形態１〜４で説明したフローチャート（図３、図４、図８、図１１、図１３）のいずれかを演算手段に実行させるプログラムとして構成することができる。本実施の形態５においては、図１３のフローチャートの処理を演算手段に実行させるプログラムであるものとする。
ユーザテーブル１６２０は、ユーザの識別情報を保持するものである。詳細は後述の図１７で説明する。
抑制語リスト１６２１は、図６に示すものと同様の構成を持つものである。
許容度テーブル１６２２は、図１０に示すものと同様の構成を持つものである。
詳細は、後述の後述の図１８で改めて説明する。 Next, the configuration of the speech synthesis server 1605 will be described.
The speech synthesis server 1605 includes a calculation unit (not shown) and a storage unit 1606.
The storage unit 1606 stores a speech synthesis program 1610, a user table 1620, a suppression word list 1621, and a tolerance table 1622.
The speech synthesis program 1610 can be configured as a program that causes the computing means to execute any of the flowcharts (FIGS. 3, 4, 8, 11, and 13) described in the first to fourth embodiments. In the fifth embodiment, it is assumed to be a program that causes the computing means to execute the processing of the flowchart of FIG. 13.
The user table 1620 holds user identification information. Details will be described later with reference to FIG. 17.
The suppression word list 1621 has the same configuration as that shown in FIG. 6.
The tolerance table 1622 has the same configuration as that shown in FIG. 10.
Details will be described later with reference to FIG. 18.

以上の構成により、音声合成サーバは、音声データの供給元、すなわちエンコーダ的な役割を果たす。また、配信サーバは、エンコーダの役割を果たす音声合成サーバが生成した音声データをクライアント端末へ配信する、ストリーミングサーバ的な役割を果たす。
もっとも、音声データを配信する方法はストリーミング方式に限られるものではなく、ダウンロード方式により配信を行うように構成してもよい。 With the above configuration, the speech synthesis server plays the role of an audio data supplier, that is, an encoder. In addition, the distribution server serves as a streaming server that distributes the voice data generated by the voice synthesis server serving as an encoder to the client terminal.
However, the method for distributing the audio data is not limited to the streaming method, and it may be configured to distribute by the download method.

図１７は、ユーザテーブル１６２０の構成とデータ例について説明するものである。以下、各列について説明する。
「ユーザＩＤ」列は、クライアント端末１６０１を使用してＷｅｂページと音声コンテンツを視聴する者を識別するためのＩＤである。
「パスワード」列は、当該ユーザＩＤで識別される者に割り当てられたログインパスワードである。
ユーザは、「ユーザＩＤ」列の値と「パスワード」列の値の組を用いて、配信サーバ１６０４にログイン要求を送信する（図１６のステップ（１）に相当）。
「セッションＩＤ」列は、ログイン処理が完了した後に、当該ユーザに割り当てられる一意のランダムな文字数字列を格納するためのフィールドである。詳細は後述の図１８で説明する。 FIG. 17 explains the configuration and data example of the user table 1620. Hereinafter, each column will be described.
The “user ID” column is an ID for identifying a person who views a Web page and audio content using the client terminal 1601.
The “password” column is a login password assigned to the person identified by the user ID.
The user transmits a login request to the distribution server 1604 using a set of values in the “user ID” column and “password” column (corresponding to step (1) in FIG. 16).
The “session ID” column is a field for storing a unique random alphanumeric string assigned to the user after the login process is completed. Details will be described later with reference to FIG. 18.
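The three columns described above can be sketched as a small in-memory table. This is only an illustration under stated assumptions: the class and method names (`UserRow`, `UserTable`, `open_session`, and so on) are hypothetical and do not appear in the specification, and the password is held in plaintext only to mirror the table layout.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserRow:
    user_id: str                      # "user ID" column
    password: str                     # "password" column (a real system would store a salted hash)
    session_id: Optional[str] = None  # "session ID" column, empty until login completes


class UserTable:
    """Hypothetical in-memory stand-in for the user table 1620."""

    def __init__(self) -> None:
        self._rows: Dict[str, UserRow] = {}

    def add(self, user_id: str, password: str) -> None:
        self._rows[user_id] = UserRow(user_id, password)

    def authenticate(self, user_id: str, password: str) -> bool:
        # Look up the ID / password pair; a match means the user is legitimate.
        row = self._rows.get(user_id)
        if row is None:
            return False
        return secrets.compare_digest(row.password.encode(), password.encode())

    def open_session(self, user_id: str) -> str:
        # Store a random alphanumeric string in the "session ID" column.
        session_id = secrets.token_hex(16)
        self._rows[user_id].session_id = session_id
        return session_id

    def find_by_session(self, session_id: str) -> Optional[UserRow]:
        # A row with a matching session ID means the sender is already authenticated.
        for row in self._rows.values():
            if row.session_id is not None and row.session_id == session_id:
                return row
        return None
```

In this sketch, `open_session` would be called once the ID and password pair has been verified, and `find_by_session` would serve later content requests that carry only the session ID.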

図１８は、図１６においてユーザがＷｅｂページや音声コンテンツを視聴する際の詳細な処理シーケンスを説明するものである。以下、各ステップについて説明する。
（Ｓ１８０１）
ユーザはクライアント端末１６０１を使用して、サーバセンタ１６０２（配信サーバ１６０４）にログイン要求を送信する。このとき、あらかじめユーザに割り当てられたユーザＩＤとパスワードを合わせて送信する。
（Ｓ１８０２）
配信サーバ１６０４は、クライアント端末１６０１から送信されたＩＤ・パスワードの組を受け取り、その組をキーとして、音声合成サーバ１６０５に照会クエリを発行する。
（Ｓ１８０３）
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、ユーザの認証処理を行う。
具体的には、ユーザテーブル１６２０を、クライアント端末１６０１から送信されたＩＤ・パスワードの組で検索し、該当するデータがあるか否かで、当該ユーザが正規な者であるか否かを判断することができる。
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、認証処理の結果を配信サーバ１６０４に返信する。
以下、正規ユーザとして認証されたものとして説明を継続する。
（Ｓ１８０４）
配信サーバ１６０４は、ランダムな文字数字列からなるセッションＩＤを生成する。生成したセッションＩＤの値は、Ｃｏｏｋｉｅ等に格納してクライアント端末１６０１へ送信されるとともに、ユーザテーブルの該当行の「セッションＩＤ」列に格納される。
（Ｓ１８０５）
ユーザはクライアント端末１６０１を使用して、例えば図１５のような画面で音声コンテンツへのリンクをクリック等することにより、配信サーバ１６０４に対して、当該音声コンテンツの配信を要求する。
クライアント端末１６０１が配信サーバ１６０４へリクエストを発行する際には、ステップＳ１８０４で受け取ったセッションＩＤを、リクエストとともに配信サーバ１６０４へ送信する。
（Ｓ１８０６）
配信サーバ１６０４は、クライアント端末１６０１が送信したセッションＩＤをキーにして、音声合成サーバ１６０５へ照会クエリを発行する。
（Ｓ１８０７）
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、クライアント端末１６０１が送信したセッションＩＤをキーにして、ユーザテーブル１６２０を検索する。該当する「セッションＩＤ」列の値を持つデータが存在すれば、そのセッションＩＤを送信したユーザは認証済みであるものと判断することができる。
次に、音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、ユーザテーブル１６２０の該当データより「ユーザＩＤ」の値を読み取り、その値をキーにして、許容度テーブル１６２２を検索する。該当するデータより、当該ユーザに許容されている抑制度レベルの値を読み取る。
（Ｓ１８０８）
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、音声合成処理を行う。
合成処理は、実施の形態４における図１３の、ステップＳ１３０４〜Ｓ１３０９に示す内容を実行する。入力テキストは、あらかじめ記憶手段１６０６に格納しておき、これを用いるように構成することができる。
（Ｓ１８０９）
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、ステップＳ１８０６で配信サーバ１６０４が送信したセッションＩＤをキーにして、ステップＳ１８０８で生成した音声データを暗号化する。
（Ｓ１８１０）
音声合成サーバ１６０５の演算手段は、音声合成プログラム１６１０の指示に基づき、生成した音声データを配信サーバ１６０４に返送する。
（Ｓ１８１１）
配信サーバ１６０４は、ステップＳ１８１０で音声合成サーバ１６０５より受け取った音声データを、クライアント端末１６０１へ配信する。
配信の際には、ネットワーク１６０３の回線速度に応じて配信ビットレートを動的に増減させるなどの処理を加えてもよい。 FIG. 18 illustrates a detailed processing sequence when the user views a Web page or audio content in FIG. 16. Hereinafter, each step will be described.
(S1801)
The user uses the client terminal 1601 to transmit a login request to the server center 1602 (distribution server 1604). At this time, the user ID and password assigned to the user in advance are transmitted together.
(S1802)
The distribution server 1604 receives the ID / password pair transmitted from the client terminal 1601, and issues a query to the speech synthesis server 1605 using the pair as a key.
(S1803)
The computing means of the speech synthesis server 1605 performs user authentication processing based on instructions from the speech synthesis program 1610.
Specifically, the user table 1620 is searched with the ID / password pair transmitted from the client terminal 1601, and whether or not the user is legitimate can be determined from whether or not corresponding data exists.
The computing means of the speech synthesis server 1605 returns the result of the authentication process to the distribution server 1604 based on the instruction of the speech synthesis program 1610.
Hereinafter, the description will be continued assuming that the user is authenticated as a regular user.
(S1804)
The distribution server 1604 generates a session ID consisting of a random alphanumeric string. The generated session ID is stored in a cookie or the like and transmitted to the client terminal 1601, and is also stored in the “session ID” column of the corresponding row of the user table.
(S1805)
The user uses the client terminal 1601 to request the distribution server 1604 to distribute the audio content, for example, by clicking a link to the audio content on a screen such as that shown in FIG. 15.
When the client terminal 1601 issues a request to the distribution server 1604, the session ID received in step S1804 is transmitted to the distribution server 1604 together with the request.
(S1806)
The distribution server 1604 issues an inquiry query to the speech synthesis server 1605 using the session ID transmitted by the client terminal 1601 as a key.
(S1807)
Based on an instruction from the speech synthesis program 1610, the computing means of the speech synthesis server 1605 searches the user table 1620 using the session ID transmitted by the client terminal 1601 as a key. If there is data having a value in the corresponding “session ID” column, it can be determined that the user who transmitted the session ID has been authenticated.
Next, based on an instruction from the speech synthesis program 1610, the computing means of the speech synthesis server 1605 reads the value of “user ID” from the corresponding data in the user table 1620 and searches the tolerance table 1622 using that value as a key. The suppression level permitted for the user is read from the corresponding data.
(S1808)
The computing means of the speech synthesis server 1605 performs speech synthesis processing based on instructions from the speech synthesis program 1610.
The synthesis process executes the contents shown in steps S1304 to S1309 of FIG. 13 in the fourth embodiment. The input text can be stored in advance in the storage unit 1606 and read from there.
(S1809)
Based on an instruction from the speech synthesis program 1610, the arithmetic unit of the speech synthesis server 1605 encrypts the speech data generated in step S1808 using the session ID transmitted by the distribution server 1604 in step S1806 as a key.
(S1810)
The calculation means of the voice synthesis server 1605 returns the generated voice data to the distribution server 1604 based on the instruction of the voice synthesis program 1610.
(S1811)
The distribution server 1604 distributes the voice data received from the voice synthesis server 1605 in step S1810 to the client terminal 1601.
At the time of distribution, processing such as dynamically increasing or decreasing the distribution bit rate according to the line speed of the network 1603 may be added.
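Steps S1807 to S1809 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the table contents, the rule that a text's suppression level is the highest level among the phrases it contains, the `synthesize` stub, and the toy XOR cipher keyed by the session ID are all assumptions made for the example. The cipher in particular is a stand-in with no security value; a real system would use an established cipher such as AES.

```python
import hashlib
from typing import Optional

# Hypothetical data standing in for the suppression word list 1621
# and the tolerance table 1622.
SUPPRESSION_LIST = {"secret": 3, "rumor": 1}   # phrase -> suppression level
TOLERANCE_TABLE = {"user01": 3, "user02": 1}   # user ID -> highest permitted level


def suppression_level(text: str) -> int:
    """Assumed rule: the highest level among the suppression phrases found in the text."""
    return max((lvl for word, lvl in SUPPRESSION_LIST.items() if word in text), default=0)


def synthesize(text: str) -> bytes:
    """Stub for the synthesis engine; returns placeholder bytes, not audio."""
    return text.encode("utf-8")


def xor_with_session_key(data: bytes, session_id: str) -> bytes:
    """Toy cipher keyed by the session ID (S1809). Applying it twice restores the data."""
    key = hashlib.sha256(session_id.encode("utf-8")).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def handle_request(user_id: Optional[str], session_id: str, text: str) -> Optional[bytes]:
    """S1807 to S1809: check authentication and tolerance, then synthesize and encrypt."""
    if user_id is None:                      # session ID was not found in the user table
        return None
    allowed = TOLERANCE_TABLE.get(user_id, 0)
    if suppression_level(text) > allowed:    # level exceeds what this listener may hear
        return None
    return xor_with_session_key(synthesize(text), session_id)
```

Here `user_id` is assumed to be the result of the session ID lookup in S1807, with `None` meaning that no matching row was found.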

図１８に示す処理シーケンスによれば、配信サーバ１６０４は、クライアント端末１６０１よりリクエストを受け付けた際に、音声合成サーバ１６０５より音声データを引き出してクライアント端末１６０１に配信する、いわゆる「Ｐｕｌｌ型配信」を実現することができる。
この配信方式によれば、リクエストの内容に合わせて動的に音声データを生成することが容易であるため、たとえば音声合成に用いる入力テキストをユーザに入力させ、その内容を音声合成して配信することも可能である。
これを実現するためには、入力テキストの内容を図１８のステップＳ１８０５でクライアント端末１６０１が送信し、その内容をステップＳ１８０６で音声合成サーバ１６０５に送信するように構成すればよい。この場合は、記憶手段１６０６に、入力テキストの内容をあらかじめ格納しておく必要はない。 According to the processing sequence shown in FIG. 18, when the distribution server 1604 receives a request from the client terminal 1601, it pulls speech data from the speech synthesis server 1605 and delivers it to the client terminal 1601, realizing so-called “Pull-type distribution”.
With this distribution method, it is easy to generate speech data dynamically according to the content of the request, so it is also possible, for example, to have the user enter the input text used for speech synthesis and to synthesize and distribute that content.
In order to realize this, the content of the input text may be transmitted by the client terminal 1601 in step S1805 in FIG. 18, and the content may be transmitted to the speech synthesis server 1605 in step S1806. In this case, it is not necessary to store the contents of the input text in the storage unit 1606 in advance.
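The pull-type flow with optional user-supplied text might look like the sketch below. All names (`DeliveryRequest`, `SynthesisServer`, `DistributionServer`) are hypothetical, the network hop between the two servers is reduced to a direct method call, and the synthesis itself is a placeholder that merely tags the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeliveryRequest:
    session_id: str
    content_id: Optional[str] = None   # identifies text stored in advance
    input_text: Optional[str] = None   # text entered by the user, if any


class SynthesisServer:
    """Stand-in for the speech synthesis server 1605."""

    def __init__(self, stored_texts: Dict[str, str]) -> None:
        self.stored_texts = stored_texts

    def synthesize(self, request: DeliveryRequest) -> bytes:
        # Prefer text carried in the request; otherwise fall back to stored text.
        if request.input_text is not None:
            text = request.input_text
        else:
            text = self.stored_texts[request.content_id]
        return ("AUDIO:" + text).encode("utf-8")   # placeholder, not real audio


class DistributionServer:
    """Stand-in for the distribution server 1604."""

    def __init__(self, synthesis_server: SynthesisServer) -> None:
        self.synthesis_server = synthesis_server

    def deliver(self, request: DeliveryRequest) -> bytes:
        # Pull: synthesis is triggered only when a client request arrives.
        return self.synthesis_server.synthesize(request)
```

When the request carries `input_text`, nothing needs to be stored on the synthesis server in advance, which corresponds to the variation described above.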

図１９は、図１５に示すＷｅｂページの別の構成例を示すものである。
音声データには、抑制度レベルを表す情報を埋め込んでおくことができるため、音声プレイヤーに抑制度レベルを表示する機能があれば、音声プレイヤーの起動後は、ユーザが当該音声データの抑制度レベルを知ることができる。
しかし、音声プレイヤーを起動して音声データを再生する前に、その抑制度レベルを知ることができれば、ユーザが音声データを取得した後になって初めて、当該音声データを視聴することができないことが判明する、という２度手間を回避することができる。
そこで、図１９（１）に示す画面構成例のように、音声コンテンツへのリンク先に抑制度レベルを表示しておけば、かかる２度手間の回避が容易となる。 FIG. 19 shows another configuration example of the Web page shown in FIG. 15.
Since information indicating the suppression level can be embedded in the speech data, if the audio player has a function for displaying the suppression level, the user can learn the suppression level of the speech data once the audio player has been started.
However, if the user could learn the suppression level before starting the audio player and playing the speech data, the user could avoid the wasted effort of discovering, only after acquiring the speech data, that it cannot be listened to.
Therefore, if the suppression level is displayed at the link to the audio content, as in the screen configuration example shown in FIG. 19(1), this wasted effort is easily avoided.

配信サーバ１６０４は、図１９（１）のような抑制度レベルを表示したＷｅｂページをクライアント端末１６０１に配信する。配信するＷｅｂページのデータは、あらかじめ作成して配信サーバ１６０４に格納しておいてもよいし、音声データのヘッダ部（図７参照）を読み込んで、動的に生成するように構成してもよい。 The distribution server 1604 distributes a Web page displaying the suppression level, as shown in FIG. 19(1), to the client terminal 1601. The Web page data to be distributed may be created in advance and stored in the distribution server 1604, or it may be generated dynamically by reading the header portion of the speech data (see FIG. 7).
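Dynamic generation of such a page could be sketched as below. The function name `render_links` and the dictionary keys `url`, `title`, and `level` are hypothetical; the sketch assumes the suppression level has already been read from each speech data header and does not model the header format of FIG. 7.

```python
from html import escape
from typing import Dict, List, Union


def render_links(contents: List[Dict[str, Union[str, int]]]) -> str:
    """Build an HTML list of audio links, each annotated with its suppression level."""
    items = []
    for content in contents:
        items.append(
            '<li><a href="{}">{}</a> (suppression level: {})</li>'.format(
                escape(str(content["url"]), quote=True),   # escape for use in an attribute
                escape(str(content["title"])),             # escape for use in element text
                int(content["level"]),
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Showing the level beside each link lets the user decide before requesting the audio, which is the point of the screen layout in FIG. 19(1).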

以上のように、本実施の形態５によれば、
実施の形態１〜４で説明したフローチャート（図３、図４、図８、図１１、図１３）のいずれかを演算手段に実行させるプログラムを構成したので、
これらのフローチャートで表される音声合成方法をソフトウェアで実現でき、種々のアプリケーションに当該ソフトウェアを適用することができる。 As described above, according to the fifth embodiment,
because a program has been configured that causes the computing means to execute any of the flowcharts (FIGS. 3, 4, 8, 11, and 13) described in the first to fourth embodiments,
the speech synthesis method represented by these flowcharts can be realized in software, and that software can be applied to various applications.

また、本実施の形態５によれば、
音声合成サーバと、音声合成サーバが出力した音声データを配信する配信サーバとを有する音声合成配信システムであって、
前記音声合成サーバは、
演算手段と、上記に記載の音声合成プログラムを格納した記憶手段とを有し、
前記記憶手段は、
聴取者の識別情報を保持するユーザテーブルと、
聴取者の識別情報と当該聴取者に許容される抑制度レベルとの関係を表すテーブルと、
抑制語句と当該抑制語句の抑制度レベルとの組を保持するリストとを格納したので、
リクエストを受け付けた段階で音声合成を開始し、生成した音声データを出力する、動的な音声合成配信サービスを提供することができる。 Further, according to the fifth embodiment,
in a speech synthesis distribution system having a speech synthesis server and a distribution server that distributes the speech data output by the speech synthesis server,
the speech synthesis server has
computing means and storage means storing the speech synthesis program described above, and
the storage means stores
a user table holding the identification information of listeners,
a table representing the relationship between a listener's identification information and the suppression level permitted for that listener, and
a list holding pairs of a suppression phrase and the suppression level of that phrase.
It is therefore possible to provide a dynamic speech synthesis distribution service that starts speech synthesis when a request is received and outputs the generated speech data.

また、前記配信サーバは、
クライアントからの合成音声配信リクエストを受け付けて前記音声合成サーバに該リクエストを転送し、
前記演算手段は、
前記配信サーバより転送されたリクエストを受け付けると、前記音声合成プログラムの指示に基づき、前記記憶手段に格納したテーブル及びリストを読み込み、音声合成を行って前記配信サーバに返送し、
前記配信サーバは、
返送された音声データをクライアントに配信するので、
クライアント端末１６０１よりリクエストを受け付けた際に、音声合成サーバ１６０５より音声データを引き出してクライアント端末１６０１に配信する、いわゆる「Ｐｕｌｌ型配信」を実現することができる。
この配信方式によれば、リクエストの内容に合わせて動的に音声データを生成することが容易であるため、たとえば音声合成に用いる入力テキストをユーザに入力させ、その内容を音声合成して配信することが可能となる。 In addition, the distribution server
accepts a synthesized speech distribution request from a client and transfers the request to the speech synthesis server;
the computing means,
on receiving the request transferred from the distribution server, reads the tables and the list stored in the storage means based on the instructions of the speech synthesis program, synthesizes speech, and returns the speech data to the distribution server; and
the distribution server
delivers the returned speech data to the client.
Consequently, when a request is received from the client terminal 1601, speech data is pulled from the speech synthesis server 1605 and delivered to the client terminal 1601, realizing so-called “Pull-type distribution”.
With this distribution method, it is easy to generate speech data dynamically according to the content of the request, so it becomes possible, for example, to have the user enter the input text used for speech synthesis and to synthesize and distribute that content.

また、前記配信サーバは、
前記音声合成サーバが出力する音声データへのリンク情報を含めたＷｅｂページをクライアントへ配信し、
当該Ｗｅｂページ配信の際には、リンク先音声データの抑制度レベルを表す情報を合わせて配信するので、
ユーザが音声データを取得した後になって初めて、当該音声データを視聴することができないことが判明する、という２度手間を回避することができる。 In addition, the distribution server
delivers to the client a Web page including link information to the speech data output by the speech synthesis server, and,
when delivering the Web page, also delivers information representing the suppression level of the linked speech data.
This avoids the wasted effort of the user discovering, only after acquiring the speech data, that the speech data cannot be listened to.

FIG. 1 is a functional block diagram of a speech synthesizer according to the first embodiment.
FIG. 2 shows a configuration example of the suppression word list stored in the suppression word list storage means 104.
FIG. 3 illustrates the overall operation flow of the speech synthesizer 100 of FIG. 1.
FIG. 4 shows a modification of the flowchart shown in FIG. 3.
FIG. 5 is a functional block diagram of a speech synthesizer according to the second embodiment.
FIG. 6 shows a configuration example of the suppression word list stored in the suppression word list storage means 504.
FIG. 7 shows an image of the internal structure of the synthesized speech data in the second embodiment.
FIG. 8 illustrates the overall operation flow of the speech synthesizer 500 of FIG. 5.
FIG. 9 is a functional block diagram of a speech synthesizer according to the third embodiment.
FIG. 10 shows the configuration and a data example of the tolerance table stored in the suppression word list storage means 904.
FIG. 11 illustrates the overall operation flow of the speech synthesizer 900 of FIG. 9.
FIG. 12 is a functional block diagram of a speech synthesizer according to the fourth embodiment.
FIG. 13 illustrates the overall operation flow of the speech synthesizer 1200 of FIG. 12.
FIG. 14 illustrates the internal structure image of the speech data shown in FIG. 7 with audio watermark information additionally embedded.
FIG. 15 shows, as a screen image, an example of a Web page including synthesized audio content.
FIG. 16 illustrates the configuration of a speech synthesis distribution system for distribution to client terminals.
FIG. 17 illustrates the configuration and a data example of the user table 1620.
FIG. 18 illustrates the detailed processing sequence when the user views a Web page or audio content in FIG. 16.
FIG. 19 shows another configuration example of the Web page shown in FIG. 15.

Explanation of symbols

１００音声合成装置、１０１入力手段、１０２認証手段、１０３抑制語判断手段、１０４抑制語リスト記憶手段、１０５音声合成手段、１１０読み上げ対象テキスト、１１１音声、５００音声合成装置、５０１入力手段、５０４抑制語リスト記憶手段、５０５音声合成手段、５０６算定手段、９００音声合成装置、９０１入力手段、９０２認証手段、９０４抑制語リスト記憶手段、９０５音声合成手段、９０６算定手段、９０７抑制度レベル判断手段、１２００音声合成装置、１２０１入力手段、１２０２認証手段、１２０４抑制語リスト記憶手段、１２０５音声合成手段、１２０６算定手段、１２０７抑制度レベル判断手段、１２０８暗号化手段、１５０１クライアント端末、１５０２Ｗｅｂページの画面構成例、１５０３音声プレイヤーの画面構成例、１６０１クライアント端末、１６０２サーバセンタ、１６０３ネットワーク、１６０４配信サーバ、１６０５音声合成サーバ、１６０６記憶手段、１６１０音声合成プログラム、１６２０ユーザテーブル、１６２１抑制語リスト、１６２２許容度テーブル、１９０１クライアント端末、１９０２Ｗｅｂページの画面構成例２、１９０３音声プレイヤーの画面構成例。
100 speech synthesizer, 101 input means, 102 authentication means, 103 suppression word determination means, 104 suppression word list storage means, 105 speech synthesis means, 110 text to be read out, 111 speech, 500 speech synthesizer, 501 input means, 504 suppression word list storage means, 505 speech synthesis means, 506 calculation means, 900 speech synthesizer, 901 input means, 902 authentication means, 904 suppression word list storage means, 905 speech synthesis means, 906 calculation means, 907 suppression level determination means, 1200 speech synthesizer, 1201 input means, 1202 authentication means, 1204 suppression word list storage means, 1205 speech synthesis means, 1206 calculation means, 1207 suppression level determination means, 1208 encryption means, 1501 client terminal, 1502 Web page screen configuration example, 1503 audio player screen configuration example, 1601 client terminal, 1602 server center, 1603 network, 1604 distribution server, 1605 speech synthesis server, 1606 storage means, 1610 speech synthesis program, 1620 user table, 1621 suppression word list, 1622 tolerance table, 1901 client terminal, 1902 Web page screen configuration example 2, 1903 audio player screen configuration example.

Claims

A method for synthesizing input text and outputting it as speech, with listening restricted if the text contains a phrase whose speech output should be suppressed (hereinafter referred to as a suppression phrase),
An authentication step for confirming whether or not the listener is an authorized user based on the identification information of the person who listens to the output sound;
Reading the list from storage means storing a list holding a list of suppression words;
A determination step of determining whether or not a suppression word is included in the input text;
A synthesis step of synthesizing speech data based on the input text only if the input text does not contain a suppression word or if the listener is a regular user;
A speech synthesis method characterized by comprising:

When the input text is synthesized and output as speech, if the text contains suppression words, it is a method of outputting with the restriction of listening,
A step of reading the list from the storage means storing the list holding the set of the suppression word and the suppression level of the suppression word;
A calculation step for calculating a suppression level in the input text based on the list;
A synthesis step of synthesizing speech data including information representing a calculation result based on the input text and the result of the calculation step;
A speech synthesis method characterized by comprising:

When the input text is synthesized and output as speech, if the text contains suppression words, it is a method of outputting with the restriction of listening,
An authentication step for confirming whether or not the listener is an authorized user based on the identification information of the person who listens to the output sound;
A step of reading the list from the storage means storing the list holding the set of the suppression word and the suppression level of the suppression word;
A calculation step for calculating a suppression level in the input text;
Determining whether or not the suppression level calculated in the calculating step is within an allowable range for a listener;
A synthesis step of synthesizing speech data, including information representing the calculation result, based on the input text only when the listener is a regular user and the suppression level is within the allowable range for the listener;
A speech synthesis method characterized by comprising:

4. The speech synthesis method according to claim 3, wherein
the storage means contains a table representing the relationship between the listener's identification information and the level of suppression allowed for that listener, and
whether or not the suppression level is within the range allowed for the listener is determined based on the contents of the table.

5. The speech synthesis method according to claim 2, wherein
in the calculation step, a suppression level is calculated for each predetermined block of the input text, and
in the synthesis step, speech data including information indicating the suppression level is synthesized for each predetermined block.

6. The speech synthesis method according to claim 1, further comprising an encryption step of generating an encryption key based on the identification information and encrypting the speech data synthesized in the synthesis step.

7. The speech synthesis method according to claim 1, wherein, in the synthesis step, the speech data is synthesized so as to include, for each predetermined block of the input text, watermark data for identifying the synthesis source of the speech data.

When the input text is synthesized and output as speech, if the text contains a suppression phrase, it is a device that outputs with restriction of listening,
Authentication means for confirming whether or not the listener is an authorized user based on the identification information of the person who listens to the output sound;
Storage means for storing a list for holding a list of suppression words;
A determination unit that reads a list holding a list of suppression words from the storage unit, and determines whether or not the suppression word is included in the input text; and
A synthesis means for synthesizing speech data based on the input text only when the input text does not contain a suppression word or when the listener is a regular user;
A speech synthesizer characterized by comprising:

When the input text is synthesized and output as speech, if the text contains a suppression phrase, it is a device that outputs with restriction of listening,
Storage means for storing a list that holds a combination of a suppression word and a suppression level of the suppression word;
From the storage means, a calculation means for reading a list holding a set of suppression words and suppression levels of the suppression words, and calculating a suppression level in the input text;
A synthesizing unit that synthesizes voice data including information representing the calculation result based on the input text and the processing result of the calculating unit;
A speech synthesizer characterized by comprising:

When the input text is synthesized and output as speech, if the text contains a suppression phrase, it is a device that outputs with restriction of listening,
Authentication means for confirming whether or not the listener is an authorized user based on the identification information of the person who listens to the output sound;
Storage means for storing a list that holds a combination of a suppression word and a suppression level of the suppression word;
From the storage means, a calculation means for reading a list holding a set of suppression words and suppression levels of the suppression words, and calculating a suppression level in the input text;
A degree-of-suppression level judgment means for judging whether or not the degree-of-suppression level calculated by the calculating means is within an allowable range for the listener;
Synthesis means for synthesizing speech data, including information representing the calculation result, based on the input text only when the listener is a regular user and the suppression level is within the allowable range for the listener;
A speech synthesizer characterized by comprising:

11. The speech synthesizer according to claim 10, wherein
the storage means contains a table representing the relationship between the listener's identification information and the level of suppression allowed for that listener, and
the suppression level determination means determines whether or not the suppression level is within the allowable range for the listener based on the contents of the table.

12. The speech synthesizer according to claim 9, wherein
the calculation means calculates a suppression level for each predetermined block of the input text, and
the synthesis means synthesizes speech data including information indicating the suppression level for each predetermined block.

13. The speech synthesizer according to claim 8, further comprising an encryption unit that generates an encryption key based on the identification information and encrypts the speech data synthesized by the synthesis unit.

14. The speech synthesizer according to any one of claims 8 to 13, wherein the synthesis means synthesizes speech data including, for each predetermined block of the input text, watermark data for identifying the synthesis source of the speech data.

A speech synthesis program for causing a computing means to execute the speech synthesis method according to claim 1.

A speech synthesis distribution system having a speech synthesis server and a distribution server for distributing speech data output by the speech synthesis server,
The speech synthesis server
Calculation means, and storage means for storing the speech synthesis program according to claim 15;
The storage means
A user table holding the identification information of the listener;
A table representing the relationship between the listener's identification information and the level of suppression allowed for that listener;
A speech synthesis and distribution system characterized by storing a list that holds a combination of a suppression word and a suppression level of the suppression word.

The distribution server
Accepts a synthesized speech delivery request from a client and forwards the request to the speech synthesis server;
The computing means is
Upon receiving the request transferred from the distribution server, based on the instructions of the voice synthesis program, read the table and list stored in the storage means, perform the voice synthesis and return to the distribution server,
The distribution server
The speech synthesis delivery system according to claim 16, wherein the returned speech data is delivered to the client.

18. The speech synthesis distribution system according to claim 17, wherein
the distribution server delivers to the client a Web page including link information to the speech data output by the speech synthesis server, and,
when delivering the Web page, also delivers information indicating the suppression level of the linked speech data.