JP4871119B2 - Speech synthesis method, speech synthesizer, program, recording medium

Info

Publication number: JP4871119B2
Application number: JP2006351973A
Authority: JP
Inventors: 昇宮崎; 秀之水野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-12-27
Filing date: 2006-12-27
Publication date: 2012-02-08
Anticipated expiration: 2026-12-27
Also published as: JP2008164759A

Description

The present invention relates to a speech synthesis method, a speech synthesizer, a program, and a recording medium that enable a user to listen to desired text information as speech.

In recent years, as storage devices have become smaller and lighter and memory has become cheaper and larger in capacity, it has become possible to store a user's preferred music and audio on a small portable device and listen to it at any place and time. In particular, with the recent spread of broadband Internet services using ADSL, optical fiber, and the like, network services are becoming widespread in which sites distribute music data and audio data over a network, on the premise that the data is downloaded to a personal computer, for a fee or free of charge, and listened to on a portable audio player or similar device. Services that download such data directly to a mobile phone equipped with audio and music playback functions, for listening on the phone itself, are also becoming widespread.
The easier distribution of audio and music data owes much to broadband Internet services, as noted above. Another factor is that improvements in and standardization of audio and music coding technologies, together with the spread of the underlying platform software, have made processing, storing, and distributing such data cheap and easy.

As described in the background section, network distribution of audio and music data processed for distribution with audio and music coding technologies has made it possible for ordinary people to listen to music and audio on their own portable devices at a time and place of their choosing. However, only audio and music data that has been processed in advance and prepared for distribution can be used.
Specifically, when a user wants to listen to the day's news or weather forecast as audio, for example, audio data can be obtained by accessing the site of a newspaper company or the like that distributes such news. Unlike TV or radio news, however, such sites usually distribute audio data only about once a day, so the user cannot keep up with the day's news.
In addition, such sites prepare at most a few types of voice in advance, and it is usually difficult to change the voice type according to the user's preference or mood.

Furthermore, blogs, SNS messages, and similar content that have recently attracted attention as new network content are in most cases distributed as text only, and such information cannot be listened to as audio.
As one way of solving these problems, services have been proposed that use speech synthesis technology to convert text into speech for listening on such devices. By converting text information, which is abundant on the Internet and frequently updated, into speech, the user can listen to the latest desired information. Read-aloud software for personal computers also exists for this purpose, so such text information can be converted into speech on a personal computer and transferred to a portable device for listening.

However, current text-to-speech synthesis technology merely converts the given text into speech as it is. When a user wants to listen to a variety of information, all of it is converted into speech, which results in very long audio that the user can hardly listen to in full.
To address this problem, the document summarization method realized in Reference 1 could readily be used as a means of shortening the listening time. Appropriately summarizing the text that describes the information desired by the user removes redundant portions and allows the user to listen in a short time.
In addition, with the maximum-amount-defining information editing device realized in Reference 2, each user specifies in advance the types of content of interest and the maximum amount of information to listen to, so that content matching the user's interests is selected preferentially and can be listened to in a short time.
However, these approaches still have the following problems.

The first problem is that such approaches cannot accommodate the varied demands of individual users. For example, one user cannot follow speech unless it is slow and, being short of time, wants about 5 minutes of audio in total; another wants as much information as possible, accepts somewhat faster speech, and wants about 20 minutes; another wants information of interest presented in relative detail and information of little interest heavily summarized; and another wants the speaker and speaking style to change with the content, such as the voice and tone of a famous announcer for news and the style of an entertainment reporter for entertainment news.
The second problem is that, even if a variety of settings for summarizing and classifying text, such as preferred genres and keywords, are provided to accommodate preferences that differ from user to user, the recent flood of diverse content makes it difficult for users to set manually the keywords, genre divisions, and degrees of interest that directly relate to the content they truly prefer.
Because of these problems, services and devices for listening on a portable device to audio whose informational content matters have been slow to spread, compared with listening on a portable device to music, which is content created in advance and listened to for its own sake.

A speech synthesizer according to the present invention selects, from an article group storing a plurality of articles composed of text data, articles that match set conditions, and generates synthesized speech from the text data of the selected articles. The speech synthesizer comprises: content creation parameter setting means for setting the desired speech rate, speaker type, total time length T, and number of required playback articles of the synthesized speech; keyword setting means for setting, in keyword set storage means, keywords in which the user is interested and the degree of interest in each keyword; interest level determination means for determining, for each article i in the article group, an interest level α(i) of the article i based on the degrees of interest of those keywords stored in the keyword set storage means that appear in the article i; summary level determination means that uses a predetermined function f, whose value ranges from 0 to 1 and increases with the interest level α(i) of the article i, to obtain a summary level A(i) = f(α(i)) for each article i, extracts articles i in descending order of interest level α(i) within the range in which the sum formed from the time length L(i) of the speech synthesized from the unsummarized text of the article i and the summary level A(i) of the article i (the sum of L(i) * A(i) in summary level determination method 2 below) is less than the total time length T, and assigns the summary level A(i) to each extracted article i; text summarization means for summarizing the text of each article i to which a summary level A(i) has been assigned, based on that summary level A(i); and text-to-speech synthesis means for synthesizing speech from the summarized text based on the speech rate and speaker type set by the content creation parameter setting means.

The speech synthesizer according to the present invention further includes, in the speech synthesizer described above, text classification means for classifying a plurality of articles into a plurality of predetermined genres, and the content creation parameter setting means sets the desired speech rate and speaker type of the synthesized speech for each genre.
The speech synthesizer according to the present invention further includes, in the speech synthesizer described above, operation target keyword determination means for determining keywords related to the article being operated on when the user operates the device while listening to the audio, and interest level adjustment means for adjusting the degree of interest associated with the determined keywords according to the operation parameters of that operation.

Providing means for determining the summary level of each piece of text content desired by the user, according to the user's requirements for the synthesized speech such as time length and speech rate, makes it possible to listen to information through speech synthesis tailored to the individual, based on individual differences in hearing ability, preferences, the time available for listening, and so on. This enables the previously difficult provision of optimal audio information for each user.
Providing means for determining, for each type of text information, the speech rate desired by the user, the summary level of the content, and the speaker type and speaking style of the synthesized speech makes it possible to match each individual's listening style and requirements by type of content: for example, listening slowly and attentively to specific information only while skimming the rest quickly, or running through all the information at once.

At the same time, a speaker and speaking style can be selected to suit the content and the individual's preferences, for example a calm male announcer speaking relatively slowly for news, or a bright, lively female voice for weather information. This alleviates the boredom and listening fatigue that tend to arise when listening to speech for a relatively long time, and accommodates individual preferences.
Providing means by which information about operations the user performs during playback, such as fast-forward, skip, and rewind, is fed back to the side that generates the synthesized speech data, and by which the degree of interest in the genres and keywords targeted by those operations is adjusted per user, makes it possible to provide information based on the user's personal interests and on likes and dislikes the user is not normally conscious of. Through such casual operations as listening again to something that caught the user's attention, fast-forwarding through something of little interest, or in some cases skipping it, the device can capture topics more detailed than the degrees of interest a user consciously sets per broad genre such as soccer, baseball, or movies: for example, topics about a specific team or player, or a specific director or actor.

A speech synthesizer that implements the speech synthesis method according to the present invention can be built entirely in hardware. The simplest and best embodiment, however, is to install the speech synthesis program according to the present invention on a computer, have the computer function as the speech synthesizer according to the present invention, and have it execute the speech synthesis method according to the present invention.
When a computer is made to function as the speech synthesizer according to the present invention, the following three embodiments are conceivable.
In the first embodiment, the speech synthesis program according to the present invention is installed on the computer, and the following are constructed on the computer so that it functions as a speech synthesizer: an article group storing a plurality of articles composed of text data; article selection means for selecting and retrieving, from the article group, articles that match set conditions; content creation parameter setting means for setting the desired speech rate, speaker type, and time length of the synthesized speech; keyword setting means for setting keywords in which the user is interested and the degrees of interest; interest level determination means for determining the interest level of each article based on the set keywords and degrees of interest; summary level determination means for determining the summary level of the text of each article based on the set speech rate, the listening time, and the determined interest level; text summarization means for summarizing the text of each article based on its summary level; and text-to-speech synthesis means for synthesizing speech from the summarized text based on the speech rate and speaker type set by the content creation parameter setting means.
In the second embodiment, the computer functions as the speech synthesizer described above with the addition of text classification means for classifying a plurality of articles into a plurality of predetermined genres and, in the content creation parameter setting means, means for setting the desired speech rate and speaker type of the synthesized speech for each genre.
In the third embodiment, the computer functions as the speech synthesizer described above further comprising operation target keyword determination means for determining keywords related to the article being operated on when the user operates the device while listening to the audio, and interest level adjustment means for adjusting the degree of interest associated with the determined keywords according to the operation parameters of that operation.

FIG. 1 shows a conceptual configuration diagram of the speech synthesizer according to Embodiment 1 of the present invention.
The speech synthesizer 100 proposed in Embodiment 1 includes an article group 105 storing a plurality of articles composed of text data. To generate synthesized speech from articles selected from the article group 105, it includes: content creation parameter setting means 101 for setting the desired speech rate, speaker type, and time length of the synthesized speech; keyword setting means 114 for setting keywords in which the user is interested and the degrees of interest; interest level determination means 104 for determining the interest level of each article based on the set keywords and degrees of interest; summary level determination means 106 for determining the summary level of the text of each selected article based on the set speech rate, the listening time, and the determined interest level; text summarization means 112 for summarizing the text of each article based on its summary level; and text-to-speech synthesis means 107 for synthesizing speech from the summarized text based on the speech rate and speaker type set by the content creation parameter setting means 101.

The content creation parameter setting means 101 is the means by which the user sets content creation parameters in the content creation parameter storage means 102. It can be realized, for example, with the GUI screen shown in FIG. 2. The content creation parameters include at least the total time length of the audio, the speech rate, and the speaker type and, depending on how the summary level determination means 106 is implemented, may also include the number of required playback articles. The total time length and the number of required playback articles can be entered in numeric form fields; the speech rate can be set with a slider that specifies how fast the speech should be; and the speaker type can be set with radio buttons that present several options.

The keyword setting means 114 is the means by which the user sets, in the keyword set storage means 111, keywords related to articles of interest. The keyword setting means 114 sets at least a keyword and the user's degree of interest in that keyword in the keyword set storage means 111. It can be realized, for example, with the GUI screen shown in FIG. 3. Keywords can be set with a free-text form (text data is entered in input field 111A), and the degree of interest can be set with radio buttons 111B offering several levels. The user is not limited to one keyword and can set multiple keywords according to his or her interests.
FIG. 4 shows the operation flow of the speech synthesizer 100.
In interest level determination step SP1, the interest level determination means 104 assigns an interest level to each article stored in the article group 105, and the results are stored in the interest-level-assigned article storage means 103.

Next, in summary level determination step SP2, the summary level determination means 106 determines a summary level for each article to which an interest level has been assigned, assigns that summary level to the article, and stores the result in the summary-level-assigned article storage means 109.
In text summarization step SP3, the text summarization means 112 summarizes the text of each article to which a summary level has been assigned and stores the result in the summarized article storage means 108.
In speech synthesis step SP4, the text-to-speech synthesis means 107 synthesizes speech for the text of each summarized article stored in the summarized article storage means 108 and presents it to the user's terminal 300 via the synthesized speech storage means 113 and the communication network 200.

The configuration and operation of each means are described in more detail below.
The interest level determination means 104 assigns an interest level α(i) to each article i in the article group 105. For example, for article i it checks which of the keywords stored in the keyword set storage means 111 appear in article i, and takes the largest of the degrees of interest corresponding to those keywords as the interest level α(i) of article i.
The summary level determination means 106 determines, from the set of interest-level-assigned articles stored in the interest-level-assigned article storage means 103, the articles to be summarized and, at the same time, the summary level of each such article. Here, the summary level is the time length of the speech synthesized from the summarized text divided by the time length of the speech synthesized from the text before summarization. The summary level is thus a number between 0 and 1: the closer it is to 1, the less the text is summarized and the closer it stays to the original; the closer it is to 0, the more it is summarized and the shorter the text becomes.
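The interest level determination and the summary level definition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, keyword matching is reduced to plain substring search, and the default interest level of 1 for an article containing no stored keyword is an assumption rather than something the description states.

```python
from typing import Dict

# Assumption: an article that contains none of the stored keywords receives
# the lowest interest level (1). The description does not state this default.
DEFAULT_INTEREST = 1


def interest_level(article_text: str, keyword_interest: Dict[str, int]) -> int:
    """Return the largest degree of interest among keywords found in the text.

    Keyword matching is a plain substring test here (an assumption); a real
    system would likely need tokenization or morphological analysis.
    """
    matched = [level for keyword, level in keyword_interest.items()
               if keyword in article_text]
    return max(matched, default=DEFAULT_INTEREST)


def summary_level(summarized_duration: float, original_duration: float) -> float:
    """Summary level = synthesized length after summarization / length before."""
    return summarized_duration / original_duration
```

With a keyword set such as `{"soccer": 5, "weather": 3}`, an article mentioning both keywords gets interest level 5, and an article whose speech shrinks from 60 to 30 seconds has a summary level of 0.5.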

Let T be the total audio time length specified in the content creation parameters, MinArt the number of required playback articles, L(i) the time length of the speech synthesized from the unsummarized text of article i, and α(i) its interest level. The interest level α(i) is assumed to take a value in the range 1 to 5.
The summary level A(i) of article i can then be obtained, for example, by summary level determination method 1 or summary level determination method 2 below.

Summary level determination method 1
First, define the interest weight w(α(i)) as a value in the range 0 to 1 that increases with the interest level. For example, w(α(i)) = (α(i) − 1) / 4.
Step 1: Extract articles i in descending order of interest level α(i) until the sum of L(i) * w(α(i)) exceeds T. If the number of extracted articles is less than the number of required playback articles MinArt, extract MinArt articles in descending order of interest level α(i). Let S be the sum of L(i) * w(α(i)) over the extracted articles.
Step 2: Determine the summary level A(i) by the following equation.
A(i) = w(α(i)) * (T / S)
With the summary level A(i) obtained by method 1, the higher the interest level, the higher the summary level. Moreover, the post-summary speech length of each article is L(i) * A(i), the speech length L(i) of the article multiplied by A(i), and the sum of L(i) * A(i) over the extracted articles equals T.
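The statement that the summarized speech lengths add up to T follows directly from the definition of S. The short derivation below spells out that step; E is a symbol introduced here (not in the original) for the set of extracted articles, and S is assumed to be greater than 0.

```latex
\sum_{i \in E} L(i)\,A(i)
  = \sum_{i \in E} L(i)\,w(\alpha(i))\,\frac{T}{S}
  = \frac{T}{S} \sum_{i \in E} L(i)\,w(\alpha(i))
  = \frac{T}{S} \cdot S
  = T
```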

Summary level determination method 2
Prepare, for example, a summary level function f(α) that uniquely determines the summary level for each interest level α(i). f(α) takes values in the range 0 to 1; for example, the following function can be used.
f(α) = ((α − 1) / 8) + 0.5
If this function is used to set the summary level A(i) of article i to f(α(i)), the summary level is 0.5 when the interest level α is 1 and 1.0 when the interest level is 5. Articles of strong interest are therefore barely summarized, while articles of weak interest are summarized into short text. The articles to be summarized are determined by extracting articles from the interest-level-assigned set, in descending order of interest level, within the range in which the sum of L(i) * A(i) is less than T. Method 2 does not use the number of required playback articles among the content creation parameters.
Both methods 1 and 2 require the speech length L of the synthesized text. This can be obtained, for example, by actually synthesizing speech for the text with the text-to-speech synthesis means. The summary level determination method is of course not limited to these two.
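Summary level determination method 2 can be sketched as below. This is an illustrative reading under stated assumptions: the function and variable names are hypothetical, extraction stops at the first article that would bring the sum of L(i) * A(i) up to T (the description does not say whether later, shorter articles may still be tried), and ties in interest level keep input order.

```python
from typing import Dict, Hashable


def summary_level_function(alpha: int) -> float:
    """Example function from the text: f(alpha) = ((alpha - 1) / 8) + 0.5."""
    return (alpha - 1) / 8 + 0.5


def select_articles_method2(
    lengths: Dict[Hashable, float],   # L(i): pre-summary speech length
    interests: Dict[Hashable, int],   # alpha(i): interest level, 1 to 5
    total_time: float,                # T: total audio time length
) -> Dict[Hashable, float]:
    """Return {article: A(i)} for the articles extracted by method 2."""
    # Descending interest level; sorted() is stable, so ties keep input order.
    order = sorted(lengths, key=lambda i: interests[i], reverse=True)
    levels: Dict[Hashable, float] = {}
    running = 0.0
    for i in order:
        level = summary_level_function(interests[i])
        cost = lengths[i] * level
        # Keep the sum of L(i) * A(i) strictly below T.
        # Assumption: stop at the first article that does not fit.
        if running + cost >= total_time:
            break
        levels[i] = level
        running += cost
    return levels
```

For example, with hypothetical lengths `{"a": 6.0, "b": 7.0, "c": 10.0}` (minutes), interest levels `{"a": 5, "b": 4, "c": 2}`, and T = 15, articles "a" and "b" are extracted with summary levels 1.0 and 0.875, and "c" is left out because it would push the total past T.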

The text summarization means 112 summarizes the article text to which a summary level has been assigned, according to that summary level. It can be realized, for example, with the "document summarization method" of Reference 1.
The text-to-speech synthesis means 107 converts the summarized article text into speech matching the speaker type and speech rate specified in the content creation parameters. It can be realized, for example, by preparing the reference codebook and difference codebook of the "speech synthesis method" of Reference 3 for multiple speakers and using them selectively.

The operation of the above embodiment is now described using concrete examples of articles and keywords. Assume that the keyword set stored in the keyword set storage means 111 is as shown in FIG. 5. Assume also that the article group 105 contains six articles, as shown in FIG. 6: articles 1 to 4 each contain one keyword, article 5 contains two keywords, and article 6 contains no keyword. Further assume that the content creation parameters specified by the user and stored in the content creation parameter storage means 102 are as shown in FIG. 7.
The interest level determination means 104 then determines an interest level for each article. The interest level of an article is the interest level of the keyword with the highest interest level among the keywords contained in the article, so the interest-assigned articles are as shown in FIG. 8.
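The rule that an article takes the highest interest level among the keywords it contains can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `article_interest`, the substring matching, the sample keywords and texts, and the default level of 1 for an article with no keyword are all assumptions. The default is chosen so that a keyword-free article such as article 6 receives an interest level weight of 0.

```python
def article_interest(text, keyword_levels, default=1):
    """Return the highest interest level among the keywords found in text.

    keyword_levels maps keyword -> interest level (1 to 5).
    default is an assumed level for articles that contain no keyword.
    """
    hits = [level for kw, level in keyword_levels.items() if kw in text]
    return max(hits, default=default)


# Hypothetical keyword set and article texts (not the contents of FIG. 5 or FIG. 6).
keyword_levels = {"baseball": 5, "election": 4, "typhoon": 3}
articles = {
    "a": "The election campaign began today.",
    "b": "A typhoon delayed the baseball final.",  # two keywords: the larger level wins
    "c": "Stock prices were flat.",                 # no keyword: default level
}
levels = {name: article_interest(text, keyword_levels) for name, text in articles.items()}
print(levels)  # {'a': 4, 'b': 5, 'c': 1}
```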

The summary level determination means 106 determines a summary level for each article. In the operation example below, the pre-summarization speech length L(i) obtained when the text of each article is speech-synthesized is assumed to be as shown in FIG. 9.
When summary level determination method 1 is used, the operation is as follows.
First, the interest level weight w(α) is determined. Article 3 has an interest level of 5, so dividing (5 - 1) by 4 gives an interest level weight of 1.0. Likewise, the weight is 0.75 for articles 1 and 5, 0.5 for articles 2 and 4, and 0 for article 6.
Next, articles are extracted in descending order of interest level within the range in which the sum of L(i) * w(α(i)) does not exceed 10 minutes. Among articles with the same interest level, such as articles 1 and 5 or articles 2 and 4, the choice may be made at random.

When only article 3 is selected, the sum of L(i) * w(α(i)) is 6 * 1.0 = 6.0. When article 1 is selected next, the sum is 6 * 1.0 + 7 * 0.75 = 11.25. The sum now exceeds the total speech time length, but because the number of required playback articles specified in the content creation parameters is 4, two more articles must be extracted. Articles 5 and 2 are therefore selected in descending order of interest level. The sum of L(i) * w(α(i)) then becomes 6 * 1.0 + 7 * 0.75 + 10 * 0.75 + 6 * 0.5 = 21.75. Applying the formula of procedure 2 determines the summary level of each article. Specifically, the summary level of article 3 is 1.0 * (10.0 ÷ 21.75) = 0.46, the summary levels of articles 1 and 5 are likewise 0.34, and the summary level of article 2 is 0.23. FIG. 10 shows the summary-level-assigned articles obtained in this way.
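The arithmetic of this example can be checked with a short Python sketch of summary level determination method 1. It is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the interest levels of articles 1, 2, 4, 5, and 6 are inferred from their stated weights, the lengths of articles 4 and 6 are placeholders because the text does not give them, and ties are broken by input order instead of at random.

```python
def interest_weight(alpha):
    """Interest level weight w(alpha) = (alpha - 1) / 4, in [0, 1] for alpha in 1..5."""
    return (alpha - 1) / 4


def summary_levels_method1(articles, total_minutes, min_articles):
    """articles: list of (article_id, interest_level, length_minutes)."""
    # Stable sort: articles with equal interest keep their input order
    # (the text allows a random choice among ties).
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)
    selected, s = [], 0.0
    for art in ranked:
        # Stop once the weighted sum exceeds T and enough articles are chosen.
        if s > total_minutes and len(selected) >= min_articles:
            break
        selected.append(art)
        s += art[2] * interest_weight(art[1])
    if s == 0:
        return {}
    # Procedure 2: A(i) = w(alpha(i)) * T / S
    return {aid: interest_weight(alpha) * total_minutes / s for aid, alpha, _ in selected}


articles = [
    (1, 4, 7.0),
    (2, 3, 6.0),
    (3, 5, 6.0),
    (4, 3, 5.0),   # length is a placeholder; not stated in the text
    (5, 4, 10.0),
    (6, 1, 8.0),   # length is a placeholder; interest 1 inferred from weight 0
]
levels = summary_levels_method1(articles, total_minutes=10.0, min_articles=4)
print({aid: round(a, 2) for aid, a in levels.items()})
# {3: 0.46, 1: 0.34, 5: 0.34, 2: 0.23}
```

Because every A(i) is w(α(i)) scaled by T / S, the summarized lengths L(i) * A(i) of the selected articles add up to exactly T (10 minutes here).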

When summary level determination method 2 is used, the operation is as follows.
First, the summary level A(i) of each article i is calculated using the summary level function. Article 3 has an interest level of 5, so its summary level is ((5 - 1) / 8) + 0.5 = 1.0. Likewise, articles 1 and 5 give ((4 - 1) / 8) + 0.5 = 0.875, articles 2 and 4 give ((3 - 1) / 8) + 0.5 = 0.75, and article 6 gives 0.5. Articles are then extracted in descending order of interest level within the range in which the sum of L(i) * A(i) does not exceed T. Extracting article 3 gives 6 * 1.0 = 6, but additionally extracting article 1 gives 6 * 1.0 + 7 * 0.875 = 12.125, which exceeds the total time length T. Therefore, when summary level determination method 2 is used, only article 3 is extracted, and its summary level is 1.0.
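Summary level determination method 2 can be sketched in Python in the same way. This is a hedged illustration: the function names are hypothetical, the interest levels other than that of article 3 are inferred from the weights given for method 1, the lengths of articles 4 and 6 are placeholders, and the loop stops at the first article that does not fit rather than trying lower-ranked articles, which is one reading of "within the range in which the sum does not exceed T".

```python
def summary_level(alpha):
    """Summary level function A = (alpha - 1) / 8 + 0.5, in [0.5, 1] for alpha in 1..5."""
    return (alpha - 1) / 8 + 0.5


def select_method2(articles, total_minutes):
    """articles: list of (article_id, interest_level, length_minutes)."""
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)  # stable for ties
    selected, total = {}, 0.0
    for aid, alpha, length in ranked:
        cost = length * summary_level(alpha)  # speech length after summarization
        if total + cost > total_minutes:
            break  # assumed: stop at the first article that would exceed T
        selected[aid] = summary_level(alpha)
        total += cost
    return selected


articles = [
    (1, 4, 7.0),
    (2, 3, 6.0),
    (3, 5, 6.0),
    (4, 3, 5.0),   # length is a placeholder; not stated in the text
    (5, 4, 10.0),
    (6, 1, 8.0),   # length is a placeholder
]
print(select_method2(articles, total_minutes=10.0))  # {3: 1.0}
```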

FIGS. 11 to 15 show an example of the speech synthesizer according to Embodiment 2 of the present invention.
Embodiment 2 is characterized by the addition of the text classification means 121 and the genre-specific article group 122A shown in FIG. 12. That is, the articles contained in the article group 105 are classified by genre by the text classification means 121, using as a clue, for example, the appearance rate of keywords defined for each genre, and are sorted by genre and stored in the genre-specific article group 122A. In the keyword setting means 114, the user sets each keyword and its interest level together with the genre to which the keyword belongs. The content creation parameter setting means 101 likewise sets some of the content creation parameters for each genre.
Accordingly, the following means of Embodiment 1 are changed: the interest-assigned article storage means 103 becomes the genre-specific interest-assigned article storage means 103A, the content creation parameter storage means 102 becomes the genre-specific content creation parameter storage means 102A, the keyword set storage means 111 becomes the genre-specific keyword set storage means 111A, the summary-level-assigned article storage means 109 becomes the genre-specific summary-level-assigned article storage means 109A, and the summary article storage means 108 becomes the genre-specific summary article storage means 108A.
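Classification by the appearance rate of genre keywords could look roughly like the following Python sketch. It is only an assumption-laden illustration: the function name `classify_genre`, the whitespace tokenization (Japanese text would need a morphological analyzer instead), the sample genre keyword lists, and the rule of returning no genre when no keyword appears are not taken from the patent.

```python
def classify_genre(text, genre_keywords):
    """Return the genre whose keywords have the highest appearance rate in text.

    genre_keywords maps genre -> set of keywords (hypothetical data).
    Returns None when the text is empty or no genre keyword appears.
    """
    words = text.lower().split()
    if not words or not genre_keywords:
        return None
    # Appearance rate = keyword occurrences / total number of words.
    rates = {
        genre: sum(words.count(kw) for kw in kws) / len(words)
        for genre, kws in genre_keywords.items()
    }
    best = max(rates, key=rates.get)
    return best if rates[best] > 0 else None


genre_keywords = {
    "sports": {"pitcher", "ball", "match"},
    "politics": {"election", "minister"},
}
print(classify_genre("The pitcher threw the ball", genre_keywords))   # sports
print(classify_genre("The minister called an election", genre_keywords))  # politics
```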

FIG. 13 shows the processing flow of the speech synthesizer according to Embodiment 2.
First, in the text classification step SP11, the articles contained in the article group 105 are classified into predetermined genres.
Next, in the interest level determination step SP12, an interest level is assigned to each article classified by genre, using the interest levels of the keywords specified for the corresponding genre.
Next, in the summary level determination step SP13, a summary level is assigned to each article classified by genre, and the article is summarized according to the summary level assigned to it.
Finally, in the speech synthesis step SP15, the articles classified by genre and summarized are converted into speech according to the content creation parameters and presented to the user's terminal 300.

Examples of each means are described below.
The content creation parameter setting means 101 is the means by which the user sets the content creation parameters for each genre as well as the parameters concerning the composition of the content as a whole. It is realized, for example, using the GUI screen shown in FIG. 14. The content creation parameters include at least the total time length of the speech and, for each genre, the speech rate and the speaker type; depending on how the summary level determination method is realized, they may also include the number of required playback articles.
They also include the genre playback order, which is the order in which the speech is finally presented to the user. The total time length may be entered in a numeric form field. The GUI for specifying the speaker, speech rate, and number of required playback articles for each genre can be realized by combining a GUI similar to the one illustrated in FIG. 2 with a GUI, such as a selection list, for specifying the target genre. The genre playback order can be set with a GUI that displays the genres side by side and lets the user select any genre with a mouse pointer or the like and then operate buttons that move it earlier or later in the playback order.
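The effect of the "earlier" and "later" buttons on the genre playback order can be sketched as a small list operation. The function name `move_genre`, the `step` convention (-1 for earlier, +1 for later), and the genre names are assumptions made for illustration; the patent describes only the GUI behavior.

```python
def move_genre(order, genre, step):
    """Return a new playback order with genre moved by step positions.

    step = -1 moves the genre one position earlier, +1 one position later.
    Moves past either end of the list are clamped.
    """
    i = order.index(genre)
    j = min(len(order) - 1, max(0, i + step))
    new_order = list(order)           # leave the caller's list untouched
    new_order.insert(j, new_order.pop(i))
    return new_order


order = ["news", "sports", "weather"]        # hypothetical genres
print(move_genre(order, "weather", -1))      # ['news', 'weather', 'sports']
print(move_genre(order, "news", -1))         # ['news', 'sports', 'weather'] (already first)
```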

The keyword setting means 114 sets the keywords of interest to the user separately for each genre. It can be realized, for example, with the GUI screen shown in FIG. 15, by combining a GUI similar to that of FIG. 3 with a GUI, such as a selection list, for specifying the target genre.
The text classification means 121 automatically classifies the articles contained in the article group 105 into a predetermined number of genres. For example, if a large number of articles that are published on the Web or elsewhere under an existing classification scheme are collected and used as the article group, this can be realized by using the category under which each article was originally classified as its genre.
The interest level determination means 104 can be realized by operating the interest level determination means 104 of the speech synthesizer 100 according to Embodiment 1 independently for each genre.
The summary level determination means 106 can be realized by operating summary level determination method 2 of the summary level determination means 106 of the speech synthesizer 100 according to Embodiment 1 independently for each genre, or by obtaining the summary level A(i) of article i with the following summary level determination method 3.

Summary level determination method 3
First, the interest level weight w(α(i)) is defined to take a value in the range from 0 to 1 that increases with the interest level. For example, w(α(i)) is obtained by dividing (α(i) - 1) by 4.
Procedure 1: Articles i are extracted in descending order of interest level α(i), regardless of genre, until the sum of L(i) * w(α(i)) exceeds T. When this extraction is finished, if in any genre the number of extracted articles is less than MinArt, which is set for each genre by the content creation parameter setting means, additional articles belonging to that genre are extracted in descending order of interest level α(i) until MinArt articles are reached.
Let S be the sum of L(i) * w(α(i)) over the extracted articles.
Procedure 2: The summary level A(i) is determined by the following formula.
A(i) = w(α(i)) × T / S
The summary level A(i) obtained by this method gives a higher summarization rate the higher the interest level is. Moreover, when the summarized texts are speech-synthesized, the sum of their speech lengths L(i) * A(i), each being the speech length L(i) of an article multiplied by A(i), equals T, and the selection always contains the MinArt articles set for each genre.
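Summary level determination method 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the tuple layout, and all sample articles, genres, and MinArt values are hypothetical, ties are broken by input order, and the article that pushes the sum over T is itself included.

```python
from collections import Counter


def interest_weight(alpha):
    """w(alpha) = (alpha - 1) / 4, in [0, 1] for alpha in 1..5."""
    return (alpha - 1) / 4


def summary_levels_method3(articles, total_minutes, min_art):
    """articles: list of (article_id, genre, interest_level, length_minutes).

    min_art maps genre -> MinArt (minimum number of articles for that genre).
    """
    ranked = sorted(articles, key=lambda a: a[2], reverse=True)  # stable for ties
    selected, total = [], 0.0
    # Procedure 1, first half: extract regardless of genre until the sum exceeds T.
    for art in ranked:
        if total > total_minutes:
            break
        selected.append(art)
        total += art[3] * interest_weight(art[2])
    # Procedure 1, second half: top up each genre to its MinArt.
    counts = Counter(a[1] for a in selected)
    chosen = {a[0] for a in selected}
    for art in ranked:
        if art[0] not in chosen and counts[art[1]] < min_art.get(art[1], 0):
            selected.append(art)
            chosen.add(art[0])
            counts[art[1]] += 1
    s = sum(a[3] * interest_weight(a[2]) for a in selected)
    if s == 0:
        return {}
    # Procedure 2: A(i) = w(alpha(i)) * T / S
    return {a[0]: interest_weight(a[2]) * total_minutes / s for a in selected}


# Hypothetical articles: (id, genre, interest level, length in minutes).
articles = [
    ("n1", "news", 5, 6.0), ("n2", "news", 4, 7.0), ("n3", "news", 2, 5.0),
    ("s1", "sports", 3, 6.0), ("s2", "sports", 2, 4.0),
]
levels = summary_levels_method3(articles, 10.0, {"news": 1, "sports": 2})
print(sorted(levels))  # ['n1', 'n2', 's1', 's2']
```

In this sample, the genre-independent pass stops after n1 and n2, and the two sports articles are added only to satisfy MinArt for that genre; rescaling by T / S then brings the total summarized length back to T.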

The text summarizing means 112 can be realized in the same way as the text summarizing means 112 of the speech synthesizer 100 according to Embodiment 1.
The text-to-speech synthesis means 107 converts the summarized articles into speech genre by genre, using the speaker type and speech rate specified for each genre by the content creation parameters. It then presents the speech to the user's terminal 300 sequentially, in the genre playback order specified in the content creation parameters.
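How the per-genre voice settings and the genre playback order might be combined before synthesis is sketched below. Everything here is hypothetical: the function name `build_playlist`, the dictionary layout, and the speaker and rate values are illustrative assumptions, and the actual synthesis call is deliberately left out.

```python
def build_playlist(summaries, genre_params, genre_order):
    """Order summarized articles by genre and attach each genre's voice settings.

    summaries: list of dicts with "article_id" and "genre".
    genre_params: genre -> {"speaker": ..., "rate": ...} (assumed layout).
    genre_order: list of genres in playback order.
    """
    rank = {genre: i for i, genre in enumerate(genre_order)}
    # sorted() is stable, so articles keep their relative order within a genre.
    ordered = sorted(summaries, key=lambda s: rank[s["genre"]])
    return [
        (s["article_id"], genre_params[s["genre"]]["speaker"], genre_params[s["genre"]]["rate"])
        for s in ordered
    ]


summaries = [
    {"article_id": "s1", "genre": "sports"},
    {"article_id": "n1", "genre": "news"},
    {"article_id": "n2", "genre": "news"},
]
genre_params = {
    "news": {"speaker": "female", "rate": 1.0},
    "sports": {"speaker": "male", "rate": 1.2},
}
playlist = build_playlist(summaries, genre_params, ["news", "sports"])
print(playlist)
# [('n1', 'female', 1.0), ('n2', 'female', 1.0), ('s1', 'male', 1.2)]
```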

FIGS. 16 and 17 are functional configuration diagrams of the speech synthesizer according to Embodiment 3 of the present invention.
The speech synthesizer 100 according to Embodiment 3 has two characteristic features. First, when the text classification means 121 classifies the articles stored in the article group 105, it assigns a unique article ID to each article so that each article can be identified. Second, each time an article is selected, operation information on the user's operations, such as fast-forward, skip, and rewind, is accumulated using the article ID as a key, and the interest level of the keyword used to select the article is automatically adjusted on the basis of this operation information.
Accordingly, the structural changes are as follows. The operation target keyword determination means 131 and the interest level adjustment means 132 are newly provided. In addition, in Example 3 the genre-specific article group 122A shown in FIG. 12 becomes the genre-specific article group with article ID 122B, the genre-specific interest-assigned article storage means 103A becomes the genre-specific interest-assigned article storage means with article ID 103B, the genre-specific summary-level-assigned article storage means 109A becomes the genre-specific summary-level-assigned article storage means with article ID 109B, the genre-specific summary article storage means 108A becomes the genre-specific summary article storage means with article ID 108B, and the genre-specific synthesized speech storage means 113A becomes the genre-specific synthesized speech storage means with article ID 113B.
With these changes, the operation parameters of a device such as the terminal 300, which the user operates while listening to the speech, are input to the operation target keyword determination means 131 provided in the speech synthesizer 100. Using the keyword determined by the operation target keyword determination means 131 and the content of the user's operation, the interest level adjustment means 132 automatically adjusts the interest levels of the keyword set held in the genre-specific keyword set storage means 111A.

FIG. 18 shows the processing flow of the speech synthesizer 100 according to Embodiment 3.
The text classification step SP21 through the speech synthesis step SP25 are similar to steps SP11 to SP15 of the speech synthesizer 100 according to Example 2. The characteristic feature of Embodiment 3 is that a unique article ID is assigned to each article classified in the text classification step SP21. The article ID is carried over to the corresponding article in the interest level determination step SP22, the summary level determination step SP23, and the text summarizing step SP24. In the text-to-speech synthesis step SP25, the text-to-speech synthesis means 107 converts the text corresponding to each article into speech and then presents the speech to the user together with the article ID.
In the speech listening / device operation step SP26, the user listens to the speech presented by the speech synthesizer 100 and performs device operations such as fast-forward, rewind, and playback skip as desired.

In the operation target keyword determination step SP27, the operation target keyword determination means 131 identifies which article the speech targeted by the user's device operation belongs to, and further determines which keyword caused that article to be extracted.
In the interest level adjustment step SP28, the interest level adjustment means 132 increases or decreases the interest level of the keyword determined in the operation target keyword determination step SP27 according to the user's device operation. For example, the interest level is decreased for a fast-forward or skip operation and increased for a rewind operation. In this way, even for keywords in which the user's interest has strengthened or weakened since the keyword was set, the degree of interest at the time of listening can be reflected in the keyword set, and an appropriate summary level can be provided in subsequent speech synthesis.

Examples of the text-to-speech synthesis means 107, the operation target keyword determination means 131, and the interest level adjustment means 132 are described below.
As in Example 2, the text-to-speech synthesis means 107 converts the text of each summarized article into speech, but when presenting the speech to the user it embeds data on the article ID together with the speech data. This can be realized by using an audio format to which tags can be attached, such as the mp3 format or the WAV format.
The terminal on which the user listens to the speech is assumed to support operations such as fast-forward, rewind, and playback skip. The terminal is also assumed to be able to send the operation information and the article ID embedded in the speech to the speech synthesizer 100 through the communication network 200 or the like.

The operation target keyword determination means 131 can extract the article corresponding to the article ID sent from the terminal 300 by referring to the genre-specific article group with article ID 122B. It can then determine the operation target keyword by extracting, from the genre-specific keyword set, the keywords contained in the article that was operated on.
The interest level adjustment means 132 can adjust the interest level of the keyword determined by the operation target keyword determination means 131 by, for example, subtracting 0.1 when the operation information is fast-forward, subtracting 0.2 when it is playback skip, and adding 0.1 when it is rewind. If an operation would make the interest level smaller than 1 or larger than 5, the adjustment can be limited so that the interest level stays within that range.
Of course, the method of adjusting the interest level is not limited to this; any adjustment method that associates the operation information with the interest level may be used.
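The example adjustment rule (0.1 down for fast-forward, 0.2 down for playback skip, 0.1 up for rewind, limited to the range 1 to 5) can be sketched in Python as follows. The function names, the operation labels, the substring matching used to find the operated keywords, and the sample article and keyword set are assumptions for illustration only.

```python
# Example step sizes from the text; the operation labels are hypothetical.
ADJUSTMENT = {"fast_forward": -0.1, "skip": -0.2, "rewind": 0.1}


def adjust_interest(level, operation, low=1.0, high=5.0):
    """Apply the step for one operation and keep the result within [low, high]."""
    return min(high, max(low, level + ADJUSTMENT.get(operation, 0.0)))


def apply_operation(keyword_levels, article_text, operation):
    """Adjust every keyword of the set that occurs in the operated article."""
    updated = dict(keyword_levels)
    for kw, level in keyword_levels.items():
        if kw in article_text:  # operation target keyword
            updated[kw] = adjust_interest(level, operation)
    return updated


# Hypothetical article store keyed by article ID, and a hypothetical keyword set.
articles_by_id = {"a-001": "The election campaign began today."}
keyword_levels = {"election": 4.0, "baseball": 5.0}

updated = apply_operation(keyword_levels, articles_by_id["a-001"], "skip")
print(updated)  # "election" drops by 0.2; "baseball" is unchanged
```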

Each of the speech synthesizers 100 described in Examples 1, 2, and 3 above is realized by installing the speech synthesis program according to the present invention on a computer and causing the CPU of the computer to execute it.
The speech synthesis program according to the present invention is written in a programming language that a computer can interpret and is recorded on a computer-readable recording medium such as a magnetic disk, CD-ROM, or semiconductor memory. It is installed on a computer from such a recording medium or through a communication line and causes the computer to function as a speech synthesizer.
[Reference 1] Japanese Patent Laid-Open No. 11-338867
[Reference 2] Japanese Patent Laid-Open No. 5-225255
[Reference 3] Japanese Patent No. 3444395

The invention can be used in the operation of information service sites.

FIG. 1 is a functional configuration diagram for explaining Example 1 of the present invention.
FIG. 2 is a diagram for explaining details of the content creation parameter setting means shown in FIG. 1.
FIG. 3 is a diagram for explaining details of the keyword setting means shown in FIG. 1.
FIG. 4 is a flowchart for explaining the operation flow of the speech synthesizer shown in FIG. 1.
FIG. 5 is a diagram for explaining an example of the contents stored in the keyword set storage means shown in FIG. 1.
FIG. 6 is a diagram for explaining the contents of the article group shown in FIG. 1.
FIG. 7 is a diagram for explaining an example of the contents stored in the content creation parameter storage means shown in FIG. 1.
FIG. 8 is a diagram for explaining an example of the contents stored in the interest-assigned article storage means shown in FIG. 1.
FIG. 9 is a diagram for explaining the state within the article group shown in FIG. 1.
FIG. 10 is a diagram for explaining the contents stored in the summary-level-assigned article storage means shown in FIG. 1.
FIG. 11 is a functional configuration diagram for explaining a part of Example 2 of the present invention.
FIG. 12 is a functional configuration diagram for explaining the continuation of FIG. 11.
FIG. 13 is a flowchart for explaining the operation of Example 2 shown in FIGS. 11 and 12.
FIG. 14 is a diagram for explaining details of the content creation parameter setting means shown in FIGS. 11 and 12.
FIG. 15 is a diagram for explaining details of the keyword setting means shown in FIGS. 11 and 12.
FIG. 16 is a functional configuration diagram for explaining a part of Example 3 of the present invention.
FIG. 17 is a functional configuration diagram for explaining the continuation of FIG. 16.
FIG. 18 is a flowchart for explaining the operation of Example 3 shown in FIGS. 16 and 17.

Explanation of symbols

100 Speech synthesizer
101 Content creation parameter setting means
102 Content creation parameter storage means
102A Genre-specific content creation parameter storage means
103 Interest-assigned article storage means
103A Genre-specific interest-assigned article storage means
103B Genre-specific interest-assigned article storage means with article ID
104 Interest level determination means
105 Article group
106 Summary level determination means
107 Text-to-speech synthesis means
108 Summary article storage means
108A Genre-specific summary article storage means
108B Genre-specific summary article storage means with article ID
109 Summary-level-assigned article storage means
109A Genre-specific summary-level-assigned article storage means
109B Genre-specific summary-level-assigned article storage means with article ID
111 Keyword set storage means
111A Genre-specific keyword set storage means
112 Text summarizing means
113 Synthesized speech storage means
113A Genre-specific synthesized speech storage means
113B Genre-specific synthesized speech storage means with article ID
114 Keyword setting means
121 Text classification means
122A Genre-specific article group
122B Genre-specific article group with article ID
131 Operation target keyword determination means
132 Interest level adjustment means
200 Communication network
300 Terminal

Claims

In a speech synthesis method for selecting, from an article group storing a plurality of articles consisting of text data, an article that matches set conditions and generating synthesized speech from the text data constituting the selected article,
A content creation parameter setting step for setting the speech rate, speaker type, total time length T, and number of required playback articles of the desired synthesized speech;
A keyword setting step for setting keywords and the degree of interest in each keyword in the keyword set storage means;
An interest level determination step for determining, for each article i included in the article group, the degree of interest α(i) of the article i based on the degrees of interest of those keywords stored in the keyword set storage means that appear in the article i;
A summary level determination step in which an interest level weight w(α(i)) for each article i is defined so as to take a value in the range from 0 to 1 that increases with the degree of interest α(i); articles i are extracted in descending order of the degree of interest α(i) so that the sum of the products of the time length L(i), obtained when the text of the article i before summarization is speech-synthesized, and the interest level weight w(α(i)) of the article i exceeds the total time length T and the number of extracted articles satisfies the number of required playback articles; the summary level A(i) of each extracted article i is determined as the interest level weight w(α(i)) of the article i * (total time length T / sum S), where S is the sum of the products of the time length L(i) and the interest level weight w(α(i)) over the extracted articles i; and the summary level A(i) is assigned to each extracted article i;
A text summarizing step for summarizing, for each article i to which the summary level A(i) has been assigned, the text corresponding to the article i based on the summary level A(i);
A text-to-speech synthesis step of synthesizing speech from the summarized text data based on the speech rate and speaker type set by the content creation parameter setting means;
A speech synthesis method comprising:

In a speech synthesis method for selecting, from an article group storing a plurality of articles consisting of text data, an article that matches set conditions and generating synthesized speech from the text data constituting the selected article,
A content creation parameter setting step for setting the speech rate, speaker type, and total time length T of the desired synthesized speech;
A keyword setting step for setting keywords and the degree of interest in each keyword in the keyword set storage means;
An interest level determination step for determining, for each article i included in the article group, the degree of interest α(i) of the article i based on the degrees of interest of those keywords stored in the keyword set storage means that appear in the article i;
A summary level determination step in which the summary level A(i) of each article i is obtained as f(α(i)) using a predetermined function f that takes a value in the range from 0 to 1 and increases with the degree of interest α(i) of the article i; articles i are extracted in descending order of the degree of interest α(i) within the range in which the sum of the products of the time length L(i), obtained when the text of the article i before summarization is speech-synthesized, and the summary level A(i) is less than the total time length T; and the summary level A(i) is assigned to each extracted article i;
A text summarizing step for summarizing, for each article i to which the summary level A(i) has been assigned, the text corresponding to the article i based on the summary level A(i);
A text-to-speech synthesis step of synthesizing speech from the summarized text data based on the speech rate and speaker type set by the content creation parameter setting means;
  A speech synthesis method comprising:

The speech synthesis method according to claim 1 or 2, further comprising a text classification step of classifying a plurality of articles into a plurality of predetermined genres,
A speech synthesis method in which the content creation parameter setting step includes a step of setting the speech rate and speaker type of the desired synthesized speech for each genre.

The speech synthesis method according to any one of claims 1 to 3,
An operation target keyword determination step for determining a keyword related to the article that is the target of a device operation performed while listening to the speech;
An interest level adjustment step for adjusting the degree of interest in the determined keyword according to the operation parameters of the device operation;
A speech synthesis method comprising:
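
The interest adjustment described above can be sketched as follows. This is a hedged illustration only: the operation names (`skip`, `repeat`), the adjustment amounts, and the substring test used to find the related keywords are assumptions that do not come from the claim.

```python
# Hypothetical operation parameters: how much each operation shifts interest.
OPERATION_DELTA = {"skip": -0.25, "repeat": 0.25}


def operation_target_keywords(article_text, keyword_interest):
    # Assumption: a keyword relates to the article if it occurs in its text.
    return [k for k in keyword_interest if k in article_text]


def adjust_interest(keyword_interest, article_text, operation):
    # Return a new keyword table with the related keywords adjusted,
    # keeping every degree of interest within [0, 1].
    delta = OPERATION_DELTA[operation]
    updated = dict(keyword_interest)
    for keyword in operation_target_keywords(article_text, keyword_interest):
        updated[keyword] = min(1.0, max(0.0, updated[keyword] + delta))
    return updated
```

For example, skipping an article that mentions "soccer" lowers the stored interest for that keyword, so later selections rank soccer articles lower; repeating the article raises it.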

In a speech synthesizer that selects articles matching set conditions from an article group storing a plurality of articles composed of text data, and generates synthesized speech from the text data constituting the selected articles,
Content creation parameter setting means for setting the desired synthesized speech speed, speaker type, total time length T, and required number of playback articles;
Keyword setting means for setting keywords and the degree of interest in each keyword in the keyword set storage means;
Interest level determination means for determining, for each article i included in the article group, the degree of interest α(i) of the article i based on the degrees of interest of those keywords stored in the keyword set storage means that appear in the article i;
Summary level determination means for determining an interest level weight w(α(i)) for each article i so that its value increases, within the range from 0 to 1, with the degree of interest α(i), extracting articles i in descending order of the degree of interest α(i) so that the sum of the products of the time length L(i), obtained when the text of the article i before summarization is synthesized as speech, and the interest level weight w(α(i)) exceeds the total time length T and the number of extracted articles satisfies the required number of playback articles, determining, from the total time length T and the sum S of the products of the time length L(i) and the interest level weight w(α(i)) of the extracted articles, the summary level A(i) of each extracted article i as A(i) = w(α(i)) × (T / S), and assigning the summary level A(i) to each extracted article i;
Text summarizing means for summarizing, for each article i to which a summary level A(i) has been assigned, the text corresponding to the article i based on that summary level A(i);
Text-to-speech synthesis means for synthesizing speech from the summarized text based on the speech speed and speaker type set by the content creation parameter setting means;
the speech synthesizer comprising the above means.
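
The summary level determination with a required number of playback articles can be sketched as follows. This is a sketch under stated assumptions rather than the claimed implementation: the weight function `w` is assumed to be a clamped identity, articles are given as hypothetical `(title, alpha, length_sec)` tuples, A(i) is read as the fraction of the original duration that is kept, and the final cap at 1.0 covers the case, outside the claim, where the article group is too small for the sum S to exceed T.

```python
def w(alpha):
    # Assumed interest weight: non-decreasing in alpha, clamped to [0, 1].
    return min(1.0, max(0.0, alpha))


def assign_summary_levels(articles, total_time, min_articles):
    # articles: list of (title, alpha, length_sec) tuples.
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)
    chosen, s = [], 0.0
    for title, alpha, length_sec in ranked:
        # Stop once the sum S exceeds T and enough articles are extracted.
        if s > total_time and len(chosen) >= min_articles:
            break
        chosen.append((title, w(alpha)))
        s += length_sec * w(alpha)  # accumulate L(i) * w(alpha(i))
    scale = total_time / s if s > 0 else 0.0  # T / S
    # A(i) = w(alpha(i)) * (T / S), capped at 1.0 (cap is an assumption).
    return {title: min(1.0, weight * scale) for title, weight in chosen}


levels = assign_summary_levels(
    [("a", 1.0, 100.0), ("b", 0.5, 200.0), ("c", 0.25, 400.0)],
    total_time=150.0,
    min_articles=2,
)
print(levels)
```

With these sample values S = 100 × 1.0 + 200 × 0.5 = 200, so T / S = 0.75 and the summary levels are 0.75 for "a" and 0.375 for "b". The summarized durations, 100 × 0.75 + 200 × 0.375, add up to exactly T = 150, which is the effect of scaling the weights by T / S.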

  In a speech synthesizer that selects articles matching set conditions from an article group storing a plurality of articles composed of text data, and generates synthesized speech from the text data constituting the selected articles,
  Content creation parameter setting means for setting the desired synthesized speech speed, speaker type, and total time length T;
  Keyword setting means for setting the keyword and the degree of interest in the keyword in the keyword set storage means;
  Interest level determination means for determining, for each article i included in the article group, the degree of interest α(i) of the article i based on the degrees of interest of those keywords stored in the keyword set storage means that appear in the article i;
  Summary level determination means for obtaining the summary level A(i) of each article i as f(α(i)), using a predetermined function f that increases, within the range from 0 to 1, with the degree of interest α(i) of the article i, extracting articles i in descending order of the degree of interest α(i) within the range in which the sum of the products of the time length L(i), obtained when the text of the article i before summarization is synthesized as speech, and the summary level A(i) is less than the total time length T, and assigning the summary level A(i) to each extracted article i;
  Text summarizing means for summarizing, for each article i to which a summary level A(i) has been assigned, the text corresponding to the article i based on that summary level A(i);
  Text-to-speech synthesis means for synthesizing speech from the summarized text data based on the speech speed and speaker type set by the content creation parameter setting means;
  the speech synthesizer comprising the above means.

The speech synthesizer according to claim 5 or 6, further comprising text classification means for classifying the plurality of articles into a plurality of predetermined genres,
wherein the content creation parameter setting means sets a desired synthesized speech speed and speaker type for each genre.

The speech synthesizer according to any one of claims 5 to 7, further comprising:
Operation target keyword determination means for determining, when the device is operated while the speech is being listened to, a keyword related to the article that is the target of the operation; and
Interest level adjustment means for adjusting the degree of interest of the determined keyword according to an operation parameter of the operation on the device.

A speech synthesis program, written in a computer-readable programming language, for causing a computer to function as the speech synthesizer according to claim 5 or 8.

A computer-readable recording medium on which the speech synthesis program according to claim 9 is recorded.