JPH09265380A

JPH09265380A - Method and device for synthesizing voice

Info

Publication number: JPH09265380A
Application number: JP8072361A
Authority: JP
Inventors: Makoto Hirota (廣田誠); Michio Aizawa (相澤道雄); Keiichi Sakai (酒井桂一); Tsuyoshi Yagisawa (八木沢津義); Minoru Fujita (藤田稔)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1996-03-27
Filing date: 1996-03-27
Publication date: 1997-10-07

Abstract

PROBLEM TO BE SOLVED: To classify the text strings of document data to be speech-synthesized by attribute, and to read the document aloud with a different synthesized output voice for each attribute. SOLUTION: A document structure analysis unit 101 classifies the content of the document data into quoted portions and body text based on predetermined symbols (for instance > and ≫) contained in the document data. A voice quality data holding unit 103 holds at least two kinds of voice quality data that determine the voice quality of the synthesized speech. A speech synthesis unit 102 determines, from the analysis result of the document structure analysis unit 101, whether the character string data to be synthesized belongs to the body text or to a quoted portion, selects the voice quality data held in the voice quality data holding unit 103 that corresponds to that attribute, and synthesizes speech for the character string data based on the selected voice quality data.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to a speech synthesis method and apparatus, and in particular to a speech synthesis method and apparatus suited to converting text such as e-mail and net news into speech and reading it aloud.

[0002]

[Prior art] With the spread of computers and word processors, documents are increasingly being digitized. Document reading techniques have also been developed in which such electronic documents are processed by a computer, converted into speech, and output.

[0003] In recent years, computer networks such as the Internet and the PC communication services offered by various companies have developed rapidly. As a result, electronic documents such as e-mail and net news are becoming increasingly important. E-mail and net news are often sent as a "reply" to a mail or net news article sent by someone else, and in such cases part of the other person's mail or news article is often quoted.

[0004]

[Problems to be solved by the invention] When such e-mail or net news is read aloud automatically by speech synthesis, reading the body text and the quoted portions without distinguishing them makes the content difficult to grasp from the speech alone. Solving this problem requires changing the output voice between the body text and the quoted portions, but no such technique has existed.

[0005] The present invention has been made in view of the above prior art, and its object is to provide a speech synthesis method and apparatus capable of classifying each text string of document data to be speech-synthesized by attribute and changing the output voice for each attribute.

[0006] A speech synthesis method and apparatus achieving the above object make it possible, for example when reading an e-mail aloud, to read the document with different output voices for the body text and the quoted portions.

[0007]

[Means for solving the problems] To achieve the above object, a speech synthesis apparatus of the present invention has the following configuration. That is, it comprises: classifying means for classifying the content of document data into at least two types of attributes based on a predetermined symbol included in the document data; holding means for holding at least two types of voice quality data that determine the voice quality of synthesized speech in speech synthesis; selecting means for selecting voice quality data held by the holding means in accordance with the attribute classified by the classifying means; and synthesizing means for performing speech synthesis based on the voice quality data selected by the selecting means.

[0008] Preferably, the classifying means classifies the attributes of the document data line by line.

[0009] Preferably, the classifying means classifies the attribute of each line of the document data based on whether a character string matching a predetermined regular expression that includes a predetermined symbol exists in the line.

[0010] Preferably, the classifying means classifies the attribute of each line of the document data based on whether a character string ending with a predetermined symbol is no longer than a predetermined number of characters.

[0011] Preferably, the selecting means selects the voice quality data based on the attribute classified by the classifying means and a character string in the vicinity of the predetermined symbol. For example, when a character string indicating the source of a quotation is placed near the predetermined symbol (for example, immediately before it), speech is output with a different voice quality for each quotation source, which makes the document content still easier to grasp.

[0012] Preferably, the classifying means classifies the content of the document data into quoted text and body text based on a predetermined symbol included in the document data.

[0013] Preferably, the apparatus further comprises determining means for determining, based on information included in a predetermined line of the document data, whether speech synthesis of the content classified into a predetermined attribute of the document data is performed with a male or a female voice quality.

[0014]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the accompanying drawings.

[0015] <First Embodiment> FIG. 1 is a block diagram showing the functional configuration of a document reading apparatus according to the first embodiment. In the figure, 101 is a document structure analysis unit, which distinguishes body text from quoted portions in document data. Here, the document structure analysis unit 101 uses cues such as the quote mark ">", which is commonly used in e-mail and net news, to distinguish the two. 102 is a speech synthesis unit, which synthesizes speech with a voice quality based on voice quality data. 103 is a voice quality data holding unit, which holds the multiple types of voice quality data used by the speech synthesis unit 102. The speech synthesis unit 102 switches the voice quality data it uses between the body text and the quoted text distinguished by the document structure analysis unit 101, and outputs synthesized speech with a different voice quality for each.

[0016] As a result, when the content of e-mail or net news is read aloud by speech synthesis, the voice quality can be changed between the body text and the quoted portions. The listener can tell the two apart from the output speech alone and can therefore grasp the content more accurately.

[0017] FIG. 2 is a block diagram showing the configuration of the document reading apparatus according to the first embodiment. In the figure, 21 is a control memory, which stores a control program implementing a control procedure such as that shown in the flowchart of FIG. 3. 22 is a central processing unit, which performs judgments, computations, and the like in accordance with the control procedure held in the control memory 21. 23 is a memory, and 24 is a disk device, which holds voice quality data, input documents, and the like. 25 is a bus.

[0018] Next, the operation of the apparatus is described with reference to the flowchart shown in FIG. 3. FIG. 3 is a flowchart explaining the procedure for reading document data aloud in the first embodiment. FIG. 4 shows an example of document data to be synthesized.

[0019] First, each variable is initialized (step S301). This process uses a buffer S that holds one line read from the input document, a buffer S' that holds the line before it, and a buffer B that holds part of the document content; in step S301 each buffer is initialized to the empty string (denoted Φ). The process also uses a flag I(S) indicating whether line S is part of the body text or part of a quoted portion, and a flag I(S') indicating the same for the preceding line S'. In step S301, I(S') is initialized to 0.

[0020] Next, the line currently held in buffer S is copied to S', and a new line is read from the input document into buffer S (step S302). It is then checked whether the line held in buffer S indicates the end of the input document file (step S303).

[0021] If it is not the end of the file, it is determined whether buffer S is part of the body text or part of a quoted portion (step S304). In this example, buffer S is judged to be part of a quoted portion if it satisfies the following condition, and part of the body text otherwise. The condition is that the line matches the regular expression "^.*>>" or "^.*>", and that the character string preceding the quote mark ">>" or ">" is at most N characters long. For example, the line beginning with "yag>>" shown in FIG. 4 matches this condition.

[0022] Here, "^.*>>", for example, denotes an arbitrary character string that starts at the beginning of the line and ends with the quote mark >>. In addition to the above, a further AND condition is preferably applied: the line must not begin with "<<" (a line beginning with "<<" is treated as not being a quotation).
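The line-level test of paragraphs [0021] and [0022] could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `is_quoted_line`, the value chosen for N (`MAX_PREFIX_CHARS`), and the acceptance of full-width marks are not given in the source. The sketch also measures the prefix up to the first quote mark (a non-greedy match), which is one possible reading of "the character string preceding the quote mark".

```python
import re

# Assumed value for N; the source only says "N characters".
MAX_PREFIX_CHARS = 10

# Shortest possible prefix, followed by one or two quote marks
# (ASCII ">" or full-width "＞"). The non-greedy prefix is an assumption.
QUOTE_PATTERN = re.compile(r"^(?P<prefix>.*?)(?P<mark>[>＞]{1,2})")


def is_quoted_line(line: str, max_prefix_chars: int = MAX_PREFIX_CHARS) -> bool:
    """Return True if the line looks like part of a quoted portion."""
    # Additional AND condition from [0022]: a line starting with "<<"
    # is treated as not being a quotation.
    if line.startswith("<<") or line.startswith("＜＜"):
        return False
    match = QUOTE_PATTERN.match(line)
    if match is None:
        return False
    # The string before the quote mark must be at most N characters long.
    return len(match.group("prefix")) <= max_prefix_chars
```

The length limit is what keeps an ordinary sentence that happens to contain ">" somewhere in the middle from being treated as a quotation.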

[0023] If S is judged to be part of a quoted portion, I(S) is set to 1 (step S305); if it is judged to be part of the body text, I(S) is set to 0 (step S306). The value of I(S) is then checked (step S307), and if I(S) ≠ I(S'), the contents of buffer B are sent to the speech synthesis unit 102. The speech synthesis unit 102 switches between the body-text voice quality data and the quoted-portion voice quality data according to the value of I(S), converts the contents of buffer B into speech, and outputs it (step S308).

[0024] Here, if I(S) = 1 in step S308, then I(S') = 0, so the data held in buffer B up to that point is known to be part of the body text. Therefore, when I(S) = 1, the body-text voice quality data is selected and speech synthesis is performed. Similarly, if I(S) = 0, then I(S') = 1, so the data held in buffer B up to that point is known to be part of a quoted portion. Therefore, when I(S) = 0, the quoted-text voice quality data is selected and speech synthesis is performed.

[0025] Once speech synthesis has been performed, the character string in buffer B is reset to the empty string (Φ), and the process proceeds to step S309. In step S309, the contents of line S are appended to buffer B, and the process returns to step S302.
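The loop of steps S301 to S309 could be sketched in Python as below. It is a minimal illustration and not the patented implementation: `read_aloud`, `is_quoted`, and `synthesize` are hypothetical names, and the synthesizer is passed in as a callback that receives the buffered text and the flag (0 for body text, 1 for quotation) of the lines in the buffer. The text does not describe what happens to buffer B at the end of the file, so the final flush is an assumption, as is skipping synthesis when the buffer is empty.

```python
from typing import Callable, Iterable, List

BODY, QUOTE = 0, 1  # values of the flags I(S) and I(S')


def read_aloud(
    lines: Iterable[str],
    is_quoted: Callable[[str], bool],
    synthesize: Callable[[str, int], None],
) -> None:
    """Group consecutive lines with the same attribute and synthesize each group."""
    buffer_b: List[str] = []  # buffer B, initially empty (S301)
    prev_flag = BODY          # I(S'), initialised to 0 (S301)

    for line in lines:                              # S302, S303
        flag = QUOTE if is_quoted(line) else BODY   # S304 to S306
        if flag != prev_flag and buffer_b:          # S307
            # The buffer holds lines of the previous attribute, so the
            # voice for I(S') is used (S308), then the buffer is cleared.
            synthesize("\n".join(buffer_b), prev_flag)
            buffer_b = []
        buffer_b.append(line)                       # S309
        prev_flag = flag

    # Assumption: whatever remains in the buffer is synthesized at end of file.
    if buffer_b:
        synthesize("\n".join(buffer_b), prev_flag)
```

Passing a callback that records its arguments instead of producing audio makes it easy to check which blocks would be read in which voice.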

[0026] As described above, according to the first embodiment, document data can be read aloud with different voice qualities for quoted text and body text. In particular, because quoted text and body text are distinguished based on the quote marks commonly used in e-mail and net news, the apparatus is well suited to reading this kind of document data aloud.

[0027] <Second Embodiment> The first embodiment was an example of switching between two types of voice quality data, one for the body text and one for quoted portions. In practice, however, content sent by several different senders may be quoted in the same document. In such cases, a character string identifying the sender of the quoted passage is often written immediately before the quote mark (such a character string is referred to here as a quotation source mark). In FIG. 4, "yag" is the character string identifying the sender (the quotation source mark). To handle automatic reading of a document that quotes content from several senders, multiple sets of voice quality data for quoted portions may be prepared in advance. The document structure analysis unit 101 may then use the quotation source marks as cues to distinguish the quoted portions by sender and assign suitable voice quality data to each when synthesizing speech.

[0028] For example, the document structure analysis unit 101 identifies the quotation source from the character string preceding the quote mark ">" and, each time a new quotation source appears, adds to a registration table that associates voice quality data with quotation sources. Referring to this registration table makes it possible to use the same voice quality data for the same quotation source. When a quotation source that is not in the registration table appears, voice quality data not yet in the registration table is selected from the voice quality data holding unit 103, assigned to that quotation source, and added to the registration table.
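The registration table of paragraph [0028] might be sketched as a small Python class. The names `VoiceRegistry`, `voice_for`, and `quotation_source` are hypothetical, voice quality data is represented here simply by string labels, and the behaviour when more quotation sources appear than there are voices (reusing voices in rotation) is an assumption, since the source does not cover that case.

```python
import re
from typing import Dict, List

# Prefix before the first one or two quote marks (ASCII or full-width).
_SOURCE_PATTERN = re.compile(r"^(.*?)[>＞]{1,2}")


def quotation_source(line: str) -> str:
    """Return the string before the quote mark, or "" if there is none."""
    match = _SOURCE_PATTERN.match(line)
    return match.group(1).strip() if match else ""


class VoiceRegistry:
    """Registration table associating quotation sources with voice labels."""

    def __init__(self, quote_voices: List[str]) -> None:
        if not quote_voices:
            raise ValueError("at least one voice is required")
        self._available = list(quote_voices)
        self._table: Dict[str, str] = {}

    def voice_for(self, source: str) -> str:
        if source not in self._table:
            # Prefer a voice that is not yet registered for any source.
            unused = [v for v in self._available if v not in self._table.values()]
            if unused:
                voice = unused[0]
            else:
                # Assumption: reuse voices in rotation once all are taken.
                voice = self._available[len(self._table) % len(self._available)]
            self._table[source] = voice
        return self._table[source]
```

A caller would pass `quotation_source(line)` to `voice_for` for each quoted line, so that repeated quotations from "yag" are always read in the same voice.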

[0029] Male and female voices may also be distinguished. The gender of the sender of a mail or news article, or of the sender of a quoted portion, may be estimated from the header information of the e-mail or net news article or from the quotation source mark (in the case of e-mail, the e-mail address in the "From" line of the mail header identifies the sender), and the apparatus may switch between a male voice and a female voice according to the estimate. Alternatively, character strings identifying the senders of mails and news articles may be registered in association with a gender, and this registration may be consulted to choose between a male and a female voice.
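The second alternative in paragraph [0029], looking up a pre-registered sender, could look roughly like this in Python using the standard `email` package. The table `SENDER_VOICES`, its example addresses, the function name `voice_for_sender`, and the fallback for unregistered senders are all assumptions made for illustration; the source does not say how unknown senders are handled.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Dict

# Hypothetical registration of sender addresses and voice genders.
SENDER_VOICES: Dict[str, str] = {
    "alice@example.com": "female",
    "bob@example.com": "male",
}


def voice_for_sender(raw_message: str, default: str = "male") -> str:
    """Pick a voice gender from the From header of a raw mail message."""
    message = message_from_string(raw_message)
    # parseaddr splits "Name <addr>" into (name, addr); missing header gives "".
    _, address = parseaddr(message.get("From", ""))
    # Assumption: fall back to an arbitrary default for unregistered senders.
    return SENDER_VOICES.get(address.lower(), default)
```

The same lookup could be keyed on a quotation source mark instead of an address when choosing the voice for a quoted portion.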

[0030] The present invention may be applied to a system made up of multiple devices (for example, a host computer, interface device, reader, and printer) or to an apparatus consisting of a single device (for example, a copier or facsimile machine).

[0031] Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium on which is recorded the program code of software that implements the functions of the embodiments described above, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

[0032] In this case, the program code read from the storage medium itself implements the functions of the embodiments described above, and the storage medium storing that program code constitutes the present invention.

[0033] Storage media that can be used to supply the program code include, for example, floppy disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tape, non-volatile memory cards, and ROMs.

[0034] Needless to say, the invention covers not only the case where the functions of the embodiments described above are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the embodiments are implemented by that processing.

[0035] Needless to say, the invention further covers the case where the program code read from the storage medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments described above are implemented by that processing.

[0036] When the present invention is applied to such a storage medium, the storage medium stores program code corresponding to the flowchart described earlier. In brief, the modules shown in the example memory map of FIG. 5 are stored on the storage medium.

[0037] That is, it suffices to store on the storage medium the program code of at least the "classification processing module", the "selection processing module", and the "synthesis processing module".

[0038] Here, the classification processing module is a program module that performs classification processing, classifying the content of the document data into at least two types of attributes based on a predetermined symbol included in the document data. The selection processing module implements selection processing: in accordance with the attribute, assigned by the classification processing, of the character string data to be synthesized, it selects one set of voice quality data from a holding unit that holds at least two types of voice quality data for determining the voice quality of synthesized speech. The synthesis processing module performs synthesis processing, synthesizing speech based on the voice quality data selected by the selection processing.

[0039]

[Effects of the invention] As described above, according to the present invention, each text string of document data to be speech-synthesized can be classified by attribute and the output voice changed for each attribute. Consequently, when reading an e-mail aloud, for example, the document can be read with different output voices for the body text and the quoted portions, so that the two can be clearly told apart from the output speech alone and the content can be grasped more accurately.

[0040]

[Brief description of drawings]

FIG. 1 is a block diagram showing the functional configuration of the document reading apparatus according to the first embodiment.

FIG. 2 is a block diagram showing the configuration of the document reading apparatus according to the first embodiment.

FIG. 3 is a flowchart explaining the procedure for reading document data aloud in the first embodiment.

FIG. 4 is a diagram showing an example of document data to be synthesized.

FIG. 5 is a diagram showing an example memory map of a storage medium that stores a control program according to the present invention.

[Explanation of symbols]

21 control memory
22 central processing unit
23 memory
24 disk device
25 bus
101 document structure analysis unit
102 speech synthesis unit
103 voice quality data holding unit

Continuation of front page: (72) Inventor Tsuyoshi Yagisawa, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo; (72) Inventor Minoru Fujita, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo

Claims

1. A speech synthesis apparatus comprising: classifying means for classifying the content of document data into at least two types of attributes based on a predetermined symbol included in the document data; holding means for holding at least two types of voice quality data for determining the voice quality of synthesized speech in speech synthesis; selecting means for selecting the voice quality data held by the holding means in accordance with the attribute, as classified by the classifying means, of the character string data to be synthesized; and synthesizing means for performing speech synthesis on the character string data based on the voice quality data selected by the selecting means.

2. The speech synthesis apparatus according to claim 1, wherein the classifying means classifies the attributes of the document data line by line.

3. The speech synthesis apparatus according to claim 2, wherein the classifying means classifies the attribute of each line of the document data based on whether a character string matching a predetermined regular expression that includes a predetermined symbol exists in the line.

4. The speech synthesis apparatus according to claim 2, wherein the classifying means classifies the attribute of each line of the document data based on whether a character string ending with a predetermined symbol is no longer than a predetermined number of characters.

5. The speech synthesis apparatus according to claim 1, wherein the selecting means selects the voice quality data based on the attribute classified by the classifying means and a character string in the vicinity of the predetermined symbol.

6. The speech synthesis apparatus according to claim 1, wherein the classifying means classifies the content of the document data into quoted text and body text based on a predetermined symbol included in the document data.

7. The speech synthesis apparatus according to claim 1, further comprising determining means for determining, based on information included in a specific line of the document data, the voice quality to be used when synthesizing speech for content classified into a predetermined attribute of the document data.

8. A speech synthesis method comprising: a classifying step of classifying the content of document data into at least two types of attributes based on a predetermined symbol included in the document data; a selecting step of selecting, in accordance with the attribute, as classified in the classifying step, of the character string data to be synthesized, one set of voice quality data from holding means that holds at least two types of voice quality data for determining the voice quality of synthesized speech in speech synthesis; and a synthesizing step of performing speech synthesis on the character string data based on the voice quality data selected in the selecting step.

9. The speech synthesis method according to claim 8, wherein the classifying step classifies the attributes of the document data line by line.

10. The speech synthesis method according to claim 9, wherein the classifying step classifies the attribute of each line of the document data based on whether a character string matching a predetermined regular expression that includes a predetermined symbol exists in the line.

11. The speech synthesis method according to claim 9, wherein the classifying step classifies the attribute of each line of the document data based on whether a character string ending with a predetermined symbol is no longer than a predetermined number of characters.

12. The speech synthesis method according to claim 8, wherein the selecting step selects the voice quality data based on the attribute classified in the classifying step and a character string in the vicinity of the predetermined symbol.

13. The speech synthesis method according to claim 8, wherein the classifying step classifies the content of the document data into quoted text and body text based on a predetermined symbol included in the document data.

14. The speech synthesis method according to claim 8, further comprising a determining step of determining, based on information included in a specific line of the document data, the voice quality to be used when synthesizing speech for content classified into a predetermined attribute of the document data.

15. A computer-readable memory storing a control program for performing speech synthesis output based on document data, the control program comprising: code for a classifying step of classifying the content of the document data into at least two types of attributes based on a predetermined symbol included in the document data; code for a selecting step of selecting, in accordance with the attribute, as classified in the classifying step, of the character string data to be synthesized, one set of voice quality data from holding means that holds at least two types of voice quality data for determining the voice quality of synthesized speech in speech synthesis; and code for a synthesizing step of performing speech synthesis on the character string data based on the voice quality data selected in the selecting step.