JP2005077663A - Voice synthesizer, voice synthesis method, and voice-synthesizing program

Info

Publication number: JP2005077663A
Application number: JP2003307121A
Authority: JP
Inventors: Miyoshi Ando; 美佳安藤
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2003-08-29
Filing date: 2003-08-29
Publication date: 2005-03-24
Anticipated expiration: 2023-08-29
Also published as: JP4225167B2

Abstract

PROBLEM TO BE SOLVED: To enable selection of BGM suited to a text to be read out, on the basis of combined conditions.

SOLUTION: A voice synthesizer extracts environment information attached to a target text (S16) or, if none is attached, detects it (S14). It then extracts category information (S22), sub-category information (S26), format information (S28), reading information (S32), ON/OFF information (S36), and partial information (S40), and stores them in a storage area of RAM. It generates a reading voice for the target text by voice synthesis processing (S42), calculates the reading time (S44), and selects an appropriate BGM on the basis of the conditions stored in RAM (S46). After this processing has been repeated up to the final text, if a series of consecutive texts shares the same category, commonization processing selects a single BGM for that series (S52). The generated reading voice is synchronized with the selected BGM and output (S54).

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a speech synthesizer, a speech synthesis method, and a speech synthesis program that convert input text into speech and output it, and more particularly to a speech synthesizer, a speech synthesis method, and a speech synthesis program that output synthesized speech while playing background music (BGM).

When text consisting of a character string is read aloud by synthesized speech, attempts have been made not merely to read out the character string but also to add sound effects or BGM suited to the text. For example, Patent Document 1 discloses a method of extracting the environment of a scene from the results of natural language analysis and outputting sound effects corresponding to that environment. Patent Document 2 discloses a method of selecting an emotion to be given to the text being read and playing background music that matches the emotion. Patent Document 3 discloses a method of changing BGM playback conditions according to the content of the information being read out.
Patent Document 1: Japanese Unexamined Patent Publication No. H7-72888
Patent Document 2: JP 2001-166786 A
Patent Document 3: JP 2001-34281 A

However, the above methods cannot select BGM on a composite basis, that is, by combining information such as the content of the text being read and the environment of the place where it is read.

The present invention has been made to solve the above problem, and its object is to make it possible to select BGM suited to the text being read out on the basis of multiple combined conditions.

To achieve the above object, the first invention of the present application is a speech synthesizer that reads out text in synthesized speech while playing BGM, comprising: storage means for storing BGM data classified at least by category and playback time; information extraction means for extracting additional information attached to the text; BGM selection means for selecting BGM data from the storage means based on the additional information extracted by the information extraction means; and BGM playback means for playing the BGM data selected by the BGM selection means in synchronization with the reading of the text.

In the present invention, the additional information may include any of: category information classifying the content of the text; reading information specifying attributes of the reading voice, including its volume, pitch, and tone; format information indicating the number of characters or the length of the text; or environment information indicating the loudness of the ambient sound where the reading takes place.

In the present invention, the BGM selection means may select BGM data based on at least one of the category information and the format information.

In the present invention, the BGM selection means may select one piece of BGM data for each of a plurality of texts.

The present invention may further comprise commonization means for extracting a common feature from the additional information of a plurality of consecutive texts, and the BGM selection means may select a single piece of BGM data for the plurality of texts commonized by the commonization means.

In the present invention, for a plurality of texts sharing the same category information, the commonization means may extract that shared category information as the common feature.

In the present invention, the additional information may include stop information indicating whether or not to play the BGM data, or partial information indicating the proportion of the text over which the BGM is played, and BGM stop control means may be provided for starting and stopping playback of the BGM based on the stop information or the partial information.

Next, the second invention of the present application is a speech synthesis method for reading out text in synthesized speech while playing BGM, comprising: an information extraction step of extracting additional information attached to the text; a BGM selection step of selecting BGM data for the text, based on the additional information extracted in the information extraction step, from a plurality of stored pieces of BGM data classified at least by category and playback time; and a BGM playback step of playing the BGM data selected in the BGM selection step in synchronization with the reading of the text.

In the BGM selection step, BGM data may be selected based on at least one of the category information and the format information.

In the BGM selection step, one piece of BGM data may be selected for each of a plurality of texts.

The method may further comprise a commonization step of extracting a common feature from the additional information of a plurality of consecutive texts, and in the BGM selection step a single piece of BGM data may be selected for the plurality of texts commonized in the commonization step.

In the commonization step, for a plurality of texts sharing the same category information, that shared category information may be extracted as the common feature.

The additional information may include stop information indicating whether or not to play the BGM data, or partial information indicating the proportion of the text over which the BGM is played, and the method may comprise a BGM stop control step of starting and stopping playback of the BGM based on the stop information or the partial information.

Next, the third invention of the present application causes a computer to execute the method of the second invention.

According to the first invention of the present application, BGM is selected based on additional information attached to the text in advance, so the listener can hear the text read out while BGM suited to the text is played.

Because BGM can be selected according to multiple pieces of additional information, the listener can hear the text read out while BGM that fits the text more closely is played.

Because BGM is selected based on category information or format information, the listener can hear the text read out while BGM matching the genre of the text, or BGM matching the length of the text, is played.

Because one BGM is selected for each individual text, the listener can hear each text read out while the BGM best suited to it is played.

When a plurality of consecutive texts have a common feature, such as a shared category, a single BGM is selected for those texts, so the listener can hear the texts read out with appropriate BGM that does not switch frequently.

Likewise, when a plurality of consecutive texts share the same category, a single BGM is selected for those texts, so the listener can hear the texts read out with appropriate BGM that does not switch frequently.

Because the start and stop of BGM playback can be specified as additional information, the listener can hear the text read out with BGM played only over the portions of the text where BGM is wanted.

According to the second invention of the present application, BGM is selected based on additional information attached to the text in advance, so the listener can hear the text read out while BGM suited to the text is played.

The third invention of the present application provides the same effects as the second invention.

Next, the best mode for carrying out the present invention will be described with reference to the drawings.
FIG. 1 is a circuit block diagram of a portable information terminal 1, which is an example of a speech synthesizer according to the present invention. In the portable information terminal 1, a CPU 10 that controls the entire terminal is connected to a memory control unit 20, which controls memories such as a nonvolatile memory 21 storing various programs and databases and a RAM 22 storing various data, and to a peripheral control unit 30, which controls peripheral devices. Connected to the peripheral control unit 30 are a microphone 31 that picks up ambient sound, a display 32, an input unit 33, an audio unit 34, and a speaker 35 that outputs the sound sent from the audio unit 34. The audio unit 34 converts the speech synthesized by the CPU 10 into an analog audio signal, amplifies it by a predetermined amount, and sends the amplified audio signal to the speaker 35.

Next, the nonvolatile memory 21 will be described with reference to FIGS. 2 to 5. FIG. 2 is a block diagram schematically showing the configuration of the nonvolatile memory 21, which serves as the storage means. FIG. 3 is a block diagram schematically showing the configuration of the BGM database 211. FIGS. 4 and 5 are block diagrams schematically showing the configuration of the reading information database 212. As shown in FIG. 2, the nonvolatile memory 21 stores the BGM database 211 used for selecting BGM, the reading information database 212 storing information on how text is read out, a speech synthesis dictionary 215 used when synthesizing speech from text, and a syllable length database 216 storing the duration of each syllable of synthesized speech.

As shown in FIG. 3, the BGM database 211 consists of a category column 211a storing the category, which indicates the type of target text for which the BGM is played; a subcategory column 211b storing the subcategory of the target text; a song title column 211c storing the title of the BGM; a duration column 211d storing the duration of the BGM; and a selection number column 211e used to select the BGM. Through combinations of these conditions, the BGM database 211 holds, for each type of target text, several suitable BGM tracks of different durations. In the speech synthesis processing described later, the BGM played in synchronization with the target text is determined using as conditions the category stored in the category column 211a, the subcategory stored in the subcategory column 211b, and the BGM duration stored in the duration column 211d. For text with no subcategory specified, the BGM is determined from the category and the duration; for text with no category specified, it is determined from the BGM in the "all" category.
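The lookup described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `BgmEntry` type, the sample rows, and the rule "prefer the shortest track that is long enough, otherwise the longest available" are assumptions introduced for the example, since the contents of FIG. 3 are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BgmEntry:
    category: str               # category column 211a ("all" = usable for any text)
    subcategory: Optional[str]  # subcategory column 211b
    title: str                  # song title column 211c
    duration_s: float           # duration column 211d
    selection_no: int           # selection number column 211e


# Hypothetical rows for illustration only.
BGM_DB = [
    BgmEntry("news", "economy", "News Theme A", 30.0, 1),
    BgmEntry("news", "economy", "News Theme B", 60.0, 2),
    BgmEntry("news", None, "News Generic", 45.0, 3),
    BgmEntry("all", None, "Neutral Loop", 40.0, 4),
]


def select_bgm(db: List[BgmEntry], category: Optional[str],
               subcategory: Optional[str], needed_s: float) -> Optional[BgmEntry]:
    """Pick a track by category, subcategory, and required duration."""
    if category is None:
        # No category on the text: choose from the "all" category.
        pool = [e for e in db if e.category == "all"]
    else:
        pool = [e for e in db if e.category == category]
        if subcategory is not None:
            pool = [e for e in pool if e.subcategory == subcategory]
    if not pool:
        return None
    # Assumed rule: shortest track that covers the needed time, else the longest.
    long_enough = [e for e in pool if e.duration_s >= needed_s]
    if long_enough:
        return min(long_enough, key=lambda e: e.duration_s)
    return max(pool, key=lambda e: e.duration_s)
```

For example, `select_bgm(BGM_DB, "news", "economy", 55)` returns the 60-second economy track, while a text with no category falls back to the "all" entry.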

Next, the reading information database 212 will be described with reference to FIGS. 4 and 5. Reading information is attached to the target text as reading information tags. The reading information database 212 is the database used to determine the content of that information, with the reading information tag as the search key. FIG. 4 shows an example of a reading information database 212 of the type that stores the gender, age, wording, pitch, tone, and voice quality of the reading voice, each divided into several types or levels. In this example, the reading information tags of the target text are specified individually, in forms such as "<Gender: 2>" and "<Wording: 2>", and the reading information is determined by combining them.

FIG. 5 shows an example of a reading information database 212 of the type made up of combinations of the various reading attributes. In this example, the reading information database 212 consists of a selection number column 212a, a gender column 212b, an age column 212c, a wording column 212d, a pitch column 212e, a tone column 212f, a voice quality column 212g, and a default value column 212h. A combination of the reading conditions stored in the gender column 212b through the voice quality column 212g can be specified by the identifier in the selection number column 212a and attached to the target text as a reading information tag. If no reading information tag is attached, the entry whose default value column 212h contains 1 (G in the example of FIG. 5) can be used.
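The FIG. 5 style of lookup, including the fallback to the default row, might look like the sketch below. The profile values are invented for illustration, since FIG. 5 itself is not reproduced here; only the identifiers "A" and "G" and the default flag come from the text.

```python
from typing import Dict, Optional

# Hypothetical profiles keyed by the identifier in selection number column 212a.
# "default" mirrors default value column 212h (1 = used when no tag is present).
READING_DB: Dict[str, Dict[str, int]] = {
    "A": {"gender": 1, "age": 3, "wording": 2, "pitch": 4,
          "tone": 2, "voice_quality": 1, "default": 0},
    "G": {"gender": 2, "age": 2, "wording": 1, "pitch": 3,
          "tone": 3, "voice_quality": 2, "default": 1},
}


def resolve_reading_profile(tag_value: Optional[str],
                            db: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Return the profile named by the tag, or the default row if no tag is given."""
    if tag_value is not None and tag_value in db:
        return db[tag_value]
    for profile in db.values():
        if profile["default"] == 1:
            return profile
    raise LookupError("no default reading profile is defined")
```

Keeping the default as a flag on a row, rather than as a separate setting, follows the column layout described for FIG. 5.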

Next, the storage areas of the RAM 22 will be described with reference to FIG. 6. FIG. 6 is a schematic diagram showing the storage areas of the RAM 22. The RAM 22 stores the results of extracting the various tags attached to the target text. It consists of an environment information storage area 221, a category information storage area 222, a subcategory information storage area 223, a format information storage area 224, a reading information storage area 225, a stop information storage area 226, a partial information storage area 227, a reading voice data storage area 228, a reading time storage area 229, and a selected BGM storage area 230. Each time a target text is processed, data is stored in each storage area in order.

Next, the operation of the portable information terminal 1 configured as described above will be described with reference to the flowchart of FIG. 7. FIG. 7 is a flowchart showing the overall flow of speech synthesis processing in the portable information terminal 1. The text to be processed may be stored in the nonvolatile memory 21 in advance, or may be delivered from a network via infrared communication or a modem (not shown). The text data handled in this embodiment has tag information attached in advance as the conditions for playing BGM.

When processing starts, the text to be read out is first analyzed to determine whether it has an environment information tag (S12). The environment information tag indicates how quiet the environment is where the speech and BGM will be output, and is attached to the text in a form such as "<Environment: On a train>" or "<Environment: Late night>". If there is an environment information tag (S12: YES), the tag information is extracted (S16). If there is no environment information tag (S12: NO), the ambient sound level is detected using the microphone 31 (S14). The sound level is detected as a value such as 80 dB or 40 dB. Next, the tag information or the detected sound level is stored in the environment information storage area 221 of the RAM 22 (S18). The environment information stored here is reflected in the playback volume of the BGM selected later and in the volume of the reading voice.
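Steps S12 to S18 amount to "use the tag if present, otherwise measure". A minimal sketch follows, assuming the tag is written in ASCII as `<Environment: ...>` and that microphone access is supplied by the caller as a function; `measure_db` is a hypothetical stand-in for the microphone 31, not an API from the source.

```python
import re
from typing import Callable, Union

# Assumed ASCII rendering of the environment information tag.
ENV_TAG = re.compile(r"<\s*Environment\s*:\s*([^<>]+?)\s*>", re.IGNORECASE)


def get_environment(text: str, measure_db: Callable[[], float]) -> Union[str, float]:
    """Return the tagged environment (S16) or a measured level in dB (S14)."""
    match = ENV_TAG.search(text)
    if match:
        return match.group(1)   # S12: YES, take the tag value
    return measure_db()         # S12: NO, fall back to the microphone


# The result would then be stored in environment information storage area 221 (S18).
environment = get_environment("<Environment: Late night>Good evening.", lambda: 80.0)
```

Passing the measurement in as a callable keeps the sketch runnable without audio hardware.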

Next, it is determined whether there is a category information tag, which indicates the broad type of the target text (S20). The category information tag is attached to the text in a form such as "<Category: Sports>", "<Category: News>", "<Category: Emergency>", or "<Category: Fairy tale>". Category information need not be attached to every text. If there is a category information tag (S20: YES), the category information is extracted and stored in the category information storage area 222 of the RAM 22 (S22). If there is no category information tag (S20: NO), processing proceeds directly to S24. The category information stored here is one of the conditions for selecting BGM, together with the subcategory information described below and the calculated reading time. If no category information is stored, the BGM is selected from among the BGM usable for all categories.

Next, it is determined whether there is a subcategory information tag, which indicates the finer type of the target text within its category (S24). The subcategory information tag is attached to the text in a form such as "<Subcategory: Fighting>", "<Subcategory: Economy>", "<Subcategory: Disaster>", or "<Subcategory: Foreign>". Subcategory information need not be attached to every text. If there is a subcategory information tag (S24: YES), the subcategory information is extracted and stored in the subcategory information storage area 223 of the RAM 22 (S26). If there is no subcategory information tag (S24: NO), processing proceeds directly to S28.
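Category and subcategory extraction (S20 to S26) can be illustrated with a generic `<Key: Value>` tag parser. The ASCII tag syntax and the function name are assumptions for this sketch; the source shows the tags only in the quoted forms above.

```python
import re
from typing import Dict

# Matches tags of the assumed form <Key: Value>; tags without a colon are ignored.
TAG = re.compile(r"<\s*([^<>:/]+?)\s*:\s*([^<>]+?)\s*>")


def extract_tags(text: str) -> Dict[str, str]:
    """Collect every <Key: Value> tag in the text, with lower-cased keys."""
    return {key.strip().lower(): value for key, value in TAG.findall(text)}


tags = extract_tags("<Category: News><Subcategory: Economy>Stocks rose today.")
category = tags.get("category")        # would go to storage area 222 (S22)
subcategory = tags.get("subcategory")  # would go to storage area 223 (S26)
```

Using `dict.get` mirrors the flowchart: a missing tag simply yields `None`, and processing moves on to the next step.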

Next, format information is extracted and stored in the format information storage area 224 of the RAM 22 (S28). Format information consists of format information tags giving the number of characters, line breaks, blanks, and so on in the target text, and is attached to the text. For example, format information tags are attached in forms such as "<Characters: 250>" and "<Blanks: 5>". If no such tag is attached, the device may be configured to count the characters of the target text itself. The format information stored here is used when calculating the reading time of the target text.
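The fallback in S28, counting characters when no tag is supplied, could be sketched as below. The tag spelling `<Characters: N>` and the decision to count every character left after stripping tags (spaces included) are assumptions made for the example.

```python
import re

COUNT_TAG = re.compile(r"<\s*Characters\s*:\s*(\d+)\s*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^<>]*>")


def character_count(text: str) -> int:
    """Use the character-count tag if present; otherwise count the body text."""
    match = COUNT_TAG.search(text)
    if match:
        return int(match.group(1))
    body = ANY_TAG.sub("", text)  # drop all tags before counting
    return len(body)
```

The count would then be stored in the format information storage area 224 for use in the reading-time calculation.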

Next, it is determined whether there is a reading information tag, which gives information about the voice used to read the text (S30). Information about the reading voice includes volume, gender, age, pitch, tone, and voice quality. The reading information tag is attached to the text either in forms such as "<Gender: 1>", "<Age: 3>", and "<Pitch: 4>", as in the reading information database 212 of FIG. 4, or in a combined form such as "<Reading: A>", as in the reading information database 212 of FIG. 5. If no tag is attached, the default values are used. If there is a reading information tag (S30: YES), the reading information is extracted and stored in the reading information storage area 225 of the RAM 22 (S32). If there is no reading information tag (S30: NO), processing proceeds directly to S34. The reading information stored here is output to the audio unit 34 together with the reading voice data described later, and is reflected in the output from the speaker 35.

Next, it is determined whether there is an ON/OFF information tag, which indicates whether BGM is to be played with the target text (S34). The ON/OFF information tag is attached to the text in a form such as "<BGM: ON>" or "<BGM: OFF>". If no tag is attached, the text is handled in the same way as the preceding text. If there is an ON/OFF information tag (S34: YES), the ON/OFF information is stored in the stop information storage area 226 of the RAM 22 (S36). If there is no ON/OFF information tag (S34: NO), processing proceeds directly to S38. If the stored ON/OFF information is ON, the selected BGM is played; if it is OFF, no BGM is played.
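The rule that an untagged text inherits the setting of the preceding text can be sketched as a simple carry-forward. The initial state for the very first text is not specified in the source, so the `initial` parameter here is an assumption.

```python
from typing import List, Optional


def resolve_bgm_flags(tag_values: List[Optional[str]], initial: bool = True) -> List[bool]:
    """Turn per-text ON/OFF tag values (None = no tag) into playback flags."""
    flags = []
    current = initial  # assumed state before the first text
    for value in tag_values:
        if value is not None:
            current = value.strip().upper() == "ON"
        flags.append(current)  # untagged texts keep the previous setting
    return flags


flags = resolve_bgm_flags([None, "OFF", None, "ON"])
```

Here the third text has no tag and therefore stays OFF, matching the behavior described above.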

Next, it is determined whether there is a partial information tag, which specifies the proportion or the portion of the target text over which BGM is played when it is played for only part of the text (S38). For example, when BGM is wanted only for the climax of the target text, the text of that portion is enclosed between "<BGM>" and "</BGM>"; when BGM is to be played over the portion from 50% to 75% of the way through the target text, a tag such as "<BGM: 50-75%>" is attached at the head of the text. If there is a partial information tag (S38: YES), the partial information is stored in the partial information storage area 227 of the RAM 22 (S40). If there is no partial information tag (S38: NO), processing proceeds directly to S42. The partial information stored here is multiplied by the reading time calculated in the reading time calculation described later to obtain the BGM duration, and BGM matching this duration is selected from the BGM database 211. If the BGM database 211 contains no BGM with a matching duration, BGM with a duration longer than the reading time may be selected and faded out when the reading ends, or the speed of the reading or of the BGM may be adjusted so that both end at the same time.
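For the percentage form of the tag, multiplying the partial information by the reading time gives the window in which BGM plays. The sketch below assumes the ASCII form `<BGM: 50-75%>` and treats a missing tag as "play over the whole text"; the enclosing `<BGM>...</BGM>` form is not handled here.

```python
import re
from typing import Tuple

RANGE_TAG = re.compile(r"<\s*BGM\s*:\s*(\d+)\s*-\s*(\d+)\s*%\s*>")


def bgm_window(text: str, reading_time_s: float) -> Tuple[float, float]:
    """Return (start, end) of BGM playback in seconds within the reading."""
    match = RANGE_TAG.search(text)
    if not match:
        return (0.0, reading_time_s)  # assumed: no tag means the whole text
    start_pct, end_pct = int(match.group(1)), int(match.group(2))
    if not 0 <= start_pct <= end_pct <= 100:
        raise ValueError("percentage range must satisfy 0 <= start <= end <= 100")
    return (reading_time_s * start_pct / 100, reading_time_s * end_pct / 100)


start, end = bgm_window("<BGM: 50-75%>Long story text", 40.0)
bgm_duration_s = end - start  # the duration used to pick a track from the database
```

With a 40-second reading, the 50-75% tag yields a 10-second window starting 20 seconds in.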

Next, speech synthesis processing is performed on the target text (S42). In this processing, an accented phonetic string is generated by a well-known method using the speech synthesis dictionary 215, pauses are added, and the result is stored in the reading voice data storage area 228 of the RAM 22. Then, for the generated phonetic string with accents and pauses, the duration of each syllable it represents is calculated, and the durations are summed to obtain the reading time (S44). Default syllable durations are stored in advance in the nonvolatile memory 21 as the syllable length database 216. The actual duration of each syllable is calculated by multiplying this default value by a speech rate coefficient. The speech rate coefficient reflects the pitch information stored in the reading information storage area 225. The calculated syllable durations are then summed to obtain the total time required to read the target text, which is stored in the reading time storage area 229.
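The reading-time calculation in S44 is a sum of default syllable durations scaled by the speech rate coefficient. The table values and the `"<pause>"` entry below are hypothetical; the source does not give the contents of the syllable length database 216.

```python
from typing import Dict, List

# Hypothetical default durations in milliseconds (stand-in for database 216).
SYLLABLE_MS: Dict[str, int] = {
    "ko": 120, "n": 100, "ni": 110, "chi": 130, "wa": 140, "<pause>": 300,
}


def reading_time_ms(units: List[str], rate_coefficient: float = 1.0) -> float:
    """Sum each unit's default duration multiplied by the speech rate coefficient."""
    return sum(SYLLABLE_MS[unit] * rate_coefficient for unit in units)


greeting = ["ko", "n", "ni", "chi", "wa"]
total_ms = reading_time_ms(greeting)  # would be stored in storage area 229
```

A coefficient below 1.0 shortens every syllable and so shortens the total, which in turn changes which BGM duration is needed.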

Next, an appropriate BGM is selected from the BGM database 211 (S46), using as conditions the values obtained by the above processing: the category stored in the category information storage area 222, the subcategory stored in the subcategory information storage area 223, the BGM playback proportion stored in the partial information storage area 227, and the reading time stored in the reading time storage area 229. The selected BGM is stored in the selected BGM storage area 230.

Next, it is determined whether the target text currently being processed is the last text to be read out (S48). If it is not the last text (S48: NO), processing returns to S18, and the same environment information as for the previous text is stored in the environment information storage area 221. Because the environment information is an indicator of how quiet the surroundings are where the reading and BGM playback will take place, the same data is stored for all texts. The processing of S20 to S46 is then performed on the next text, and this is repeated until the last text has been processed.

If it is the last text (S48: YES), it is determined whether, among the texts stored in the RAM 22, consecutive texts share the same category (S50). If the same category continues across consecutive texts (S50: YES), commonization processing is performed (S52). Specifically, the BGM playback times obtained from the reading times and partial information of the consecutive texts are summed, the selection is changed to a BGM of the same category that fits the total time, and the selected BGM storage area 230 is overwritten with it. If no consecutive texts share a category (S50: NO), processing proceeds directly to S54.
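The commonization step (S50 to S52) groups consecutive texts with the same category and sums their BGM playback times. The sketch below uses `itertools.groupby`, which merges only adjacent equal keys; the tuple layout is an assumption for illustration, and how uncategorized texts should be treated is not stated in the source.

```python
from itertools import groupby
from typing import List, Tuple


def commonize(texts: List[Tuple[str, float]]) -> List[Tuple[str, int, float]]:
    """Merge runs of consecutive texts sharing a category.

    Input items are (category, bgm_time_s); each output item is
    (category, number_of_texts_in_run, total_bgm_time_s).
    """
    runs = []
    for category, group in groupby(texts, key=lambda item: item[0]):
        times = [bgm_time for _, bgm_time in group]
        runs.append((category, len(times), sum(times)))
    return runs


runs = commonize([("news", 20.0), ("news", 30.0), ("sports", 15.0), ("news", 10.0)])
```

Each run's total time would then be used to reselect a single BGM of that category; note that the final "news" text is not merged with the first two because it is not adjacent to them.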

Then, the reading voice data stored in the reading voice data storage area 228, the reading information stored in the reading information storage area 225, and the selected BGM information stored in the selected BGM storage area 230 are output to the audio unit 34. The audio unit 34 converts the reading voice data to analog form and outputs it from the speaker 35 in synchronization with the selected BGM (S54).

A specific example will now be described with reference to FIG. 6. Assume that there are three texts to be read out. First, if no environment information tag is attached to the target text at the start of processing (S12: NO), the ambient sound is picked up with the microphone 31 and its volume is measured (S14). The measured volume, for example 80 dB, is stored in the area of the environment information storage area 221 corresponding to the first text (S18). Next, a search for a category information tag hits the tag "<Category: News>" (S20: YES), so this information is stored in the area of the category information storage area 222 corresponding to the first text (S22). A search for a subcategory information tag then hits the tag "<Subcategory: Economy>" (S24: YES), so this information is stored in the area of the subcategory information storage area 223 corresponding to the first text (S26). A search for a format information tag finds nothing, so the characters in the text are counted and the resulting count of 250 characters is stored in the area of the format information storage area 224 corresponding to the first text (S28).

Next, a search for a reading information tag hits the tag "<Reading: G>" (S30: YES), so this information is stored in the area of the reading information storage area 225 corresponding to the first text (S32). A search for an ON/OFF information tag hits the tag "<BGM: ON>" (S34: YES), so this information is stored in the area of the stop information storage area 226 corresponding to the first text (S36). A search for a partial information tag then hits the tag "<BGM: 50-75%>" (S38: YES), so this information is stored in the area of the partial information storage area 227 corresponding to the first text (S40).
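The tag searches in S20 to S40 can be sketched with a small parser. This is only an illustration under stated assumptions: the patent does not define the tag grammar beyond examples such as "<BGM: 50-75%>", so the regular expressions, the function name `extract_tags`, and the dictionary keys below are all hypothetical.

```python
import re

# Assumed tag shape: <Name: Value>, written with ASCII punctuation.
TAG_PATTERN = re.compile(r"<\s*([^:<>]+?)\s*:\s*([^<>]+?)\s*>")
# Assumed shape of a partial-information value such as "50-75%".
RANGE_PATTERN = re.compile(r"(\d+)\s*-\s*(\d+)\s*%")


def extract_tags(text):
    """Return a dict of the tag values found in `text`.

    A BGM tag holding a percentage range is stored under "bgm_range" as
    (start, end); a BGM tag holding ON or OFF is stored under "bgm_on" as
    a bool. Every other tag is stored under its lower-cased name.
    """
    info = {}
    for name, value in TAG_PATTERN.findall(text):
        key = name.lower()
        match = RANGE_PATTERN.fullmatch(value)
        if key == "bgm" and match:
            info["bgm_range"] = (int(match.group(1)), int(match.group(2)))
        elif key == "bgm" and value.upper() in ("ON", "OFF"):
            info["bgm_on"] = value.upper() == "ON"
        else:
            info[key] = value
    return info


sample = "<Category: News><Subcategory: Economy><Reading: G><BGM: ON><BGM: 50-75%>Body text"
print(extract_tags(sample))
```

A key that is missing from the returned dictionary corresponds to a search that did not hit, at which point the flow falls back to measuring, counting, or reusing the previous text's value.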

Next, speech synthesis processing is performed on the target text, and the generated character string annotated with readings, accents, and pauses is stored in the area of the reading voice data storage area 228 corresponding to the first text (S42). The time required to read out the data stored in the reading voice data storage area 228 is calculated using the syllable length database 216, and the calculated reading time of 90 seconds is stored in the area of the reading time storage area 229 corresponding to the first text (S44). Then, using the information stored by the above processing as conditions, namely category: news, subcategory: economy, and a playback duration of 22.5 seconds calculated from the playback ratio of 25% (50-75% of the text) and the reading time of 90 seconds, the BGM database 211 shown in FIG. 3 is searched and BGM K is selected (S46). BGM K lasts 30 seconds, which is longer than the calculated playback duration, so it is advisable to set BGM K to fade out at the end of the playback duration.
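The playback duration above follows from the reading time and the width of the partial-information range: 90 s × (75 − 50) / 100 = 22.5 s. A minimal sketch of that arithmetic is shown below; the function names `playback_duration` and `needs_fade_out` are assumptions for illustration, not names from the patent.

```python
def playback_duration(reading_seconds, start_pct, end_pct):
    """Seconds of BGM needed when BGM covers start_pct..end_pct of the reading."""
    if not 0 <= start_pct <= end_pct <= 100:
        raise ValueError("percentages must satisfy 0 <= start <= end <= 100")
    # Multiply before dividing so whole-number inputs stay exact.
    return reading_seconds * (end_pct - start_pct) / 100


def needs_fade_out(track_seconds, required_seconds):
    """True when the track outlasts the required time and must be faded out."""
    return track_seconds > required_seconds


required = playback_duration(90, 50, 75)
print(required)                      # 22.5
print(needs_fade_out(30, required))  # True
```

With the first text's values, the 30-second BGM K outlasts the 22.5 seconds required, which is why a fade-out is recommended.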

Next, because this text is not the last text (S48: NO), the process returns to S18, and the same information as stored for the first text, 80 dB, is stored in the area of the environment information storage area 221 corresponding to the second text (S18). A search for a category information tag hits the tag "<Category: News>" (S20: YES), so this information is stored in the area of the category information storage area 222 corresponding to the second text (S22). A search for a subcategory information tag hits the tag "<Subcategory: Battle>" (S24: YES), so this information is stored in the area of the subcategory information storage area 223 corresponding to the second text (S26). A search for a format information tag hits "<Number of characters: 150>", and the character count obtained from it is stored in the area of the format information storage area 224 corresponding to the second text (S28).

Next, a search for a reading information tag hits the tag "<Reading: T>" (S30: YES), so this information is stored in the area of the reading information storage area 225 corresponding to the second text (S32). A search for an ON/OFF information tag finds nothing (S34: NO), so the value ON, the same as for the first text, is stored in the area of the stop information storage area 226 corresponding to the second text. A search for a partial information tag then hits the tag "<BGM: 20-50%>" (S38: YES), so this information is stored in the area of the partial information storage area 227 corresponding to the second text (S40).

Next, speech synthesis processing is performed on the target text, and the generated character string annotated with readings, accents, and pauses is stored in the area of the reading voice data storage area 228 corresponding to the second text (S42). The time required to read out the data stored in the reading voice data storage area 228 is calculated using the syllable length database 216, and the calculated reading time of 110 seconds is stored in the area of the reading time storage area 229 corresponding to the second text (S44). Then, using the information stored by the above processing as conditions, namely category: news, subcategory: battle, and a playback duration of 33 seconds calculated from the playback ratio of 30% (20-50% of the text) and the reading time of 110 seconds, the BGM database 211 shown in FIG. 3 is searched and BGM N is selected (S46).

Next, because this text is not the last text (S48: NO), the process returns to S18, and the same information as stored for the second text, 80 dB, is stored in the area of the environment information storage area 221 corresponding to the third text (S18). A search for a category information tag hits the tag "<Category: Sports>" (S20: YES), so this information is stored in the area of the category information storage area 222 corresponding to the third text (S22). A search for a subcategory information tag hits the tag "<Subcategory: Track and field>" (S24: YES), so this information is stored in the area of the subcategory information storage area 223 corresponding to the third text (S26). A search for a format information tag hits "<Number of characters: 150>", and the character count obtained from it is stored in the area of the format information storage area 224 corresponding to the third text (S28).

Next, a search for a reading information tag hits the tag "<Reading: G>" (S30: YES), so this information is stored in the area of the reading information storage area 225 corresponding to the third text (S32). A search for an ON/OFF information tag finds nothing (S34: NO), so the value ON, the same as for the second text, is stored in the area of the stop information storage area 226 corresponding to the third text. A search for a partial information tag then hits the tag "<BGM: 40-90%>" (S38: YES), so this information is stored in the area of the partial information storage area 227 corresponding to the third text (S40).

Next, speech synthesis processing is performed on the target text, and the generated character string annotated with readings, accents, and pauses is stored in the area of the reading voice data storage area 228 corresponding to the third text (S42). The time required to read out the data stored in the reading voice data storage area 228 is calculated using the syllable length database 216, and the calculated reading time of 60 seconds is stored in the area of the reading time storage area 229 corresponding to the third text (S44). Then, using the information stored by the above processing as conditions, namely category: sports, subcategory: track and field, and a playback duration of 30 seconds calculated from the playback ratio of 50% (40-90% of the text) and the reading time of 60 seconds, the BGM database 211 shown in FIG. 3 is searched and BGM J is selected (S46).

Next, because this text is the last text (S48: YES), the category information storage area 222 of the RAM 22 is searched to check whether the same category occurs consecutively (S50). Text 1 and text 2 both have the category "News" (S50: YES), so the sharing process is performed (S52). Specifically, the BGM playback times of text 1 and text 2 total 22.5 seconds + 33 seconds = 55.5 seconds, so BGM L, which suits this duration, is selected and stored in the selected BGM storage area 230 so that BGM L plays continuously across text 1 and text 2. BGM L lasts 1 minute 40 seconds, which is longer than the calculated playback duration, so it is advisable to set BGM L to fade out at the end of the playback duration. Then, for each text in turn, the reading voice data stored in the reading voice data storage area 228, the reading information stored in the reading information storage area 225, and the selected BGM information stored in the selected BGM storage area 230 are output to the audio unit 34, which converts the reading voice data to analog form and outputs it from the speaker 35 in synchronization with the selected BGM (S54).
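The choice of a BGM that "suits" a duration can be sketched as below. The text does not state the matching rule, so this example assumes one: pick the shortest track that is at least as long as the required time. The catalogue `news_tracks` is a hypothetical stand-in for the news entries of the BGM database 211; only the durations of K (30 seconds) and L (1 minute 40 seconds) come from the description.

```python
def pick_track(tracks, required_seconds):
    """Return the name of the shortest track covering `required_seconds`.

    `tracks` maps track name to duration in seconds. Returns None when no
    track is long enough. The "shortest sufficient track" rule is an
    assumption made for this sketch, not a rule stated in the patent.
    """
    candidates = [
        (seconds, name)
        for name, seconds in tracks.items()
        if seconds >= required_seconds
    ]
    if not candidates:
        return None
    return min(candidates)[1]


# Durations taken from the description: K lasts 30 s, L lasts 100 s.
news_tracks = {"K": 30, "L": 100}

print(pick_track(news_tracks, 22.5))       # text 1 alone
print(pick_track(news_tracks, 22.5 + 33))  # texts 1 and 2 combined
```

Under this assumed rule the single 22.5-second requirement maps to K and the combined 55.5-second requirement maps to L, which agrees with the selections in the example. In both cases the track outlasts the requirement, hence the recommended fade-out.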

As described above, the portable information terminal 1 of the present embodiment extracts the environment information tag, category information tag, subcategory information tag, format information tag, reading information tag, ON/OFF information tag, and partial information tag attached to a text, and uses this information as a combined condition to select a BGM suited to the text from the BGM database. The user can therefore listen comfortably to the text being read out while hearing BGM appropriate to its content.

In the above embodiment, the CPU 10 functions as the information extraction means of the present invention when it executes, in the flowchart of FIG. 7, the environment information extraction process in S16, the category information extraction process in S22, the subcategory information extraction process in S26, the format information extraction process in S28, the reading information extraction process in S32, the ON/OFF information extraction process in S36, and the partial information extraction process in S40. The CPU 10 functions as the BGM selection means of the present invention when it executes the BGM selection process in S46 of the flowchart of FIG. 7. The CPU 10 functions as the BGM reproduction means of the present invention when it performs the output process in S54 of the flowchart of FIG. 7, synchronizing the BGM with the reading voice. The CPU 10 functions as the sharing means of the present invention when it executes the sharing process in S52 of the flowchart of FIG. 7.

The speech synthesizer of the present invention can be applied to any device that has a computer capable of executing speech synthesis processing and a BGM sound source.

FIG. 1 is a circuit block diagram of the portable information terminal 1, an example of the speech synthesizer according to the present invention.
FIG. 2 is a block diagram schematically showing the configuration of the nonvolatile memory 21.
FIG. 3 is a block diagram schematically showing the configuration of the BGM database 211.
FIG. 4 is a block diagram schematically showing the configuration of the reading information database 212.
FIG. 5 is a block diagram schematically showing the configuration of the reading information database 212.
FIG. 6 is a schematic diagram showing the storage areas of the RAM 22.
FIG. 7 is a flowchart showing the overall flow of speech synthesis processing in the portable information terminal 1.

Explanation of symbols

1 Portable information terminal
10 CPU
21 Nonvolatile memory
34 Audio unit
35 Speaker
211 BGM database

Claims

1. A speech synthesizer that reads out text with synthesized speech while playing back BGM, comprising:
storage means for storing BGM data classified at least by category and playback time;
information extraction means for extracting additional information added to the text;
BGM selection means for selecting BGM data from the storage means based on the additional information extracted by the information extraction means; and
BGM reproduction means for reproducing the BGM data selected by the BGM selection means in synchronization with the reading of the text.

2. The speech synthesizer according to claim 1, wherein the additional information includes any one of: category information for classifying the content of the text; reading information for designating attributes of the reading voice that reads out the text, including its volume, pitch, and tone; format information indicating the number of characters or the length of the text; and environment information indicating the loudness of the ambient sound where the text is read out.

3. The speech synthesizer according to claim 2, wherein the BGM selection means selects BGM data based on at least one of the category information and the format information.

4. The speech synthesizer according to claim 1, wherein the BGM selection means selects one item of BGM data for each of a plurality of texts.

5. The speech synthesizer according to claim 1, further comprising sharing means for extracting a common feature from the additional information of a plurality of consecutive texts,
wherein the BGM selection means selects one item of BGM data for the plurality of texts grouped by the sharing means.

6. The speech synthesizer according to claim 5, wherein, for a plurality of texts having the same category information, the sharing means extracts that category information as the common feature.

7. The speech synthesizer according to claim 1, wherein the additional information includes stop information indicating whether or not to reproduce the BGM data, or partial information indicating the proportion of the text during which the BGM is reproduced,
the speech synthesizer further comprising BGM stop control means for starting and stopping playback of the BGM based on the stop information or the partial information.

8. A speech synthesis method for reading out text with synthesized speech while playing back BGM, comprising:
an information extraction step of extracting additional information added to the text;
a BGM selection step of selecting BGM data for the text, based on the additional information extracted in the information extraction step, from a plurality of items of BGM data classified and stored at least by category and playback time; and
a BGM reproduction step of reproducing the BGM data selected in the BGM selection step in synchronization with the reading of the text.

9. The speech synthesis method according to claim 8, wherein the additional information includes any one of: category information for classifying the content of the text; reading information for designating attributes of the reading voice that reads out the text, including its volume, pitch, and tone; format information indicating the number of characters or the length of the text; and environment information indicating the loudness of the ambient sound where the text is read out.

10. The speech synthesis method according to claim 9, wherein, in the BGM selection step, BGM data is selected based on at least one of the category information and the format information.

11. The speech synthesis method according to claim 8, wherein, in the BGM selection step, one item of BGM data is selected for each of a plurality of texts.

12. The speech synthesis method according to claim 8 or 9, further comprising a sharing step of extracting a common feature from the additional information of a plurality of consecutive texts,
wherein, in the BGM selection step, one item of BGM data is selected for the plurality of texts grouped in the sharing step.

13. The speech synthesis method according to claim 12, wherein, in the sharing step, for a plurality of texts having the same category information, that category information is extracted as the common feature.

14. The speech synthesis method according to any one of claims 8 to 13, wherein the additional information includes stop information indicating whether or not to reproduce the BGM data, or partial information indicating the proportion of the text during which the BGM is reproduced,
the method further comprising a BGM stop control step of starting and stopping playback of the BGM based on the stop information or the partial information.

15. A speech synthesis program for causing a computer to execute the speech synthesis method according to claim 8.