JP2009186498A

JP2009186498A - Speech synthesis device and speech synthesis program

Info

Publication number: JP2009186498A
Application number: JP2008023004A
Authority: JP
Inventors: Akiko Yamato (大和 亜紀子)
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2008-02-01
Filing date: 2008-02-01
Publication date: 2009-08-20

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis device and a speech synthesis program that change voice quality when reading aloud text that has a tag structure. SOLUTION: When a read-aloud request is made (S5: YES) for a document displayed in a browser (S1), a parse tree is obtained from an XML parser (S10). The tag structure is checked for each node of the parse tree; if a condition for that structure is set in a parameter change table, the corresponding change value is obtained, and the reading character string and the changed parameter are set in a buffer (S35). The tag type is then checked; if a condition for that tag type is set in the parameter change table, the corresponding change value is obtained, and the reading character string and the changed parameter are set in the buffer (S40). The reading character string thus set is synthesized into speech using the specified parameters (S45) and output (S50). COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a speech synthesizer and a speech synthesis program, and more specifically to a speech synthesizer and a speech synthesis program that switch the voice quality of synthesized speech according to tags attached to text.

In general, when synthesized speech is generated from text, the text first undergoes morphological analysis, and the analysis result is used to generate phoneme data and parameters (phoneme duration, pitch pattern, power). The generated phoneme data is then retrieved from a phoneme database, and synthesized speech is generated according to the phoneme data and the parameters. The voice quality of the synthesized speech is determined mainly by the parameters. Therefore, when multiple voice qualities are to be used, speech is generated while switching the parameters.

The user may switch these parameters according to preference, but a method has also been proposed that outputs different voice qualities according to text attributes (font, type, content, importance, source, and the like) and display form (see Patent Document 1).
Patent Document 1: JP-A-10-260815

However, the kinds of attributes attached to text are not very numerous. Markup languages, on the other hand, describe documents using a variety of tags for display purposes. There has therefore been a demand to read text aloud with the voice quality changed according to the tags, so that reading aloud achieves an effect close to that of the display.

The present invention has been made to solve the above problem, and its object is to provide a speech synthesizer and a speech synthesis program that switch voice quality when reading aloud text having a tag structure.

To achieve the above object, the speech synthesizer according to claim 1 of the present invention is a speech synthesizer that generates synthesized speech from tagged text, comprising: reading character string assigning means for linguistically analyzing the text and assigning a reading character string; phoneme generating means for generating phoneme data and phoneme parameters corresponding to the reading character string assigned by the reading character string assigning means; acquisition means for acquiring a result of parsing the tagged text according to its tags; changing means for changing, based on the parsing result acquired by the acquisition means, the parameters generated by the phoneme generating means for text to which a predetermined tag is attached; and speech synthesis means for generating synthesized speech from the parameters changed by the changing means and the phoneme data generated by the phoneme generating means.

In the speech synthesizer according to claim 2 of the present invention, in addition to the configuration of the invention according to claim 1, the changing means changes the parameters for text to which a tag with a predetermined name is attached.

The speech synthesizer according to claim 3 of the present invention, in addition to the configuration of the invention according to claim 2, comprises designation means for designating the name of the tag whose parameters are changed by the changing means.

In the speech synthesizer according to claim 4 of the present invention, in addition to the configuration of the invention according to any of claims 1 to 3, the changing means changes the parameters for text to which a tag having a predetermined attribute is attached.

In the speech synthesizer according to claim 5 of the present invention, in addition to the configuration of the invention according to any of claims 1 to 4, the changing means changes the parameters for text having a predetermined tag structure.

In the speech synthesizer according to claim 6 of the present invention, in addition to the configuration of the invention according to any of claims 1 to 5, the changing means changes the parameters for text to which a tag designating a display format is attached.

The speech synthesis program according to claim 7 of the present invention causes a computer to function as the various processing means of the invention according to any of claims 1 to 6.

The speech synthesizer according to claim 1 of the present invention acquires the result of parsing according to the tags attached to the text and, depending on the attached tag, changes the parameters of the synthesized speech generated from the text. The voice quality of the synthesized speech can therefore be switched according to the tag, and the display characteristics produced by the tag structure can be conveyed through differences in voice.

In addition to the effect of the invention according to claim 1, the speech synthesizer according to claim 2 of the present invention can change the parameters according to the tag name, so a voice quality can be associated with a tag name. For example, only a specific column of a table can be made to sound distinctive.

In addition to the effect of the invention according to claim 2, the speech synthesizer according to claim 3 of the present invention allows the name of the tag whose parameters are to be changed to be designated, so an author can create tagged text in which designated portions are read aloud in the voice quality the author desires.

In addition to the effect of the invention according to any of claims 1 to 3, the speech synthesizer according to claim 4 of the present invention can change the parameters according to tag attributes, so the voice quality can be changed for, for example, links, images, fonts, and the like.

In addition to the effect of the invention according to any of claims 1 to 4, the speech synthesizer according to claim 5 of the present invention can change the parameters according to the tag structure, so display characteristics expressed by the tag hierarchy, tag repetition, and the like can be mapped to the voice quality used for reading aloud.

In addition to the effect of the invention according to any of claims 1 to 5, the speech synthesizer according to claim 6 of the present invention can change the parameters according to a tag designating a display format. For example, the voice quality can be changed when a style sheet is specified, so the characteristics of the display can be conveyed more clearly through differences in voice.

The speech synthesis program according to claim 7 of the present invention, when executed by a computer, achieves the operation and effects of the invention according to any of claims 1 to 6.

Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 is a hardware block diagram of the speech synthesizer 100. The speech synthesizer 100 is a so-called personal computer. As shown in FIG. 1, the speech synthesizer 100 is provided with a CPU 10 that controls the speech synthesizer 100. Connected to the CPU 10 are a RAM 11 that temporarily stores various data and a ROM 12 that stores the BIOS and the like. A hard disk device 13, an output control unit 14, an input control unit 15, and an audio output control unit 16 are connected to the CPU 10 via a bus. An output device 24 is connected to the output control unit 14, and an input device 25 is connected to the input control unit 15. The output device 24 is, for example, a display, and the input device 25 is, for example, a mouse or a keyboard. A speaker 26 is connected to the audio output control unit 16.

The hard disk device (HDD) 13 is provided with at least a change parameter storage area 131, an acoustic model storage area 133, a program storage area 134, and an other-information storage area 135. The change parameter storage area 131 stores a parameter change table, which holds the information used to change the voice parameters when a tag satisfying a predetermined condition exists in the text to be read aloud. The acoustic model storage area 133 stores a plurality of acoustic models for outputting speech from the speaker 26. The program storage area 134 stores the various programs executed by the CPU 10. The other-information storage area 135 stores other information used by the speech synthesizer 100.

Next, the parameter change table provided in the change parameter storage area of the HDD 13 will be described with reference to FIG. 2. FIG. 2 is a schematic diagram showing the structure of the parameter change table. The parameter change table stores information for changing the voice parameters according to the tags and structure specified in the target text. As shown in FIG. 2, the parameter change table has the data items "condition", "parameter", and "supplementary character string". "Parameter" comprises "acoustic model", "power", "pitch", "speed", and "sound quality". In this embodiment, the conditions for changing the voice parameters are "user designation", "DTD designation", "style sheet designation", "attribute designation", "supplementary designation", "condition designation", "relative position between tags", "relationship between tags", and "tag depth". Of these, "user designation", "DTD designation", "style sheet designation", "attribute designation", "supplementary designation", and "condition designation" are conditions based on the tag type. "Relative position between tags", "relationship between tags", and "tag depth" are conditions based on the tag structure. In this embodiment, the parsing result of the XML document to be read aloud is examined first for tag structure and then for tag type; when a condition is met, the parameters are changed to the corresponding values and set in the RAM 11 together with the reading character string.
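The table layout described above can be pictured as a small lookup structure. The following is only a minimal sketch under stated assumptions, not the implementation of this embodiment: the class name `ParamChange`, the key strings, the `lookup` helper, and the use of `None` for "leave unchanged" are all hypothetical. Only rows whose values are stated in the description are filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamChange:
    """One row of the parameter change table; None means 'leave unchanged'."""
    acoustic_model: Optional[str] = None
    power: Optional[float] = None
    pitch: Optional[float] = None
    speed: Optional[float] = None
    sound_quality: Optional[float] = None
    supplement: str = ""  # supplementary character string added to the reading


# Conditions grouped as in the description (the key names are hypothetical).
TAG_TYPE_CONDITIONS = (
    "user", "dtd", "stylesheet", "attribute", "supplement", "condition",
)
TAG_STRUCTURE_CONDITIONS = ("relative_position", "relationship", "depth")

# Rows taken from values stated in the description of FIG. 2.
PARAMETER_CHANGE_TABLE = {
    ("user", "specified tag"): ParamChange(acoustic_model="model B"),
    ("supplement", "a"): ParamChange(power=0.6, supplement="のリンクです"),
    ("supplement", "img"): ParamChange(power=0.6, supplement="のイメージです"),
}


def lookup(condition: str, key: str) -> Optional[ParamChange]:
    """Return the change registered for a condition, or None if no row matches."""
    return PARAMETER_CHANGE_TABLE.get((condition, key))
```

Keeping every parameter optional lets a row state only the values it changes, which mirrors how each condition in the description touches one or two parameters at most.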

Each condition is described below. First, "user designation" is a read-aloud condition character string specified freely by the user. The designation is accepted as a character string, a tag name is obtained from the result of analyzing that input sentence, and the acoustic model is changed for any tag that matches.

"DTD designation" is the condition of whether a tag is one specified in the DTD declaration. A DTD (Document Type Definition) defines which tags and attributes are used in a document written in a markup language such as XML. For example, a DTD can define that an element must appear exactly once directly under a given element, that its appearance there is optional (? or *), that it appears one or more times (+), that elements may appear in any order (|), and whether an element has child elements. In this embodiment, as shown in FIG. 2, three kinds of element are treated as DTD-designation tags: optionally appearing elements, elements appearing one or more times, and elements whose order of appearance is arbitrary. When one of these elements appears as a tag, the parameters are changed.

"Style sheet designation" is the condition of whether a tag is one specified in the style sheet. A style sheet is a template that gathers together information about the appearance of a document. Applying a style sheet makes it possible to unify the appearance of multiple documents, or to separate layout information from an XML document so that the document describes only its structure. Typical style sheet languages include CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language). Note that the style sheet shown in FIG. 7 is an XSL example. In this embodiment, as shown in FIG. 2, the <th> tag, the <td> tag, and the <span> tag are treated as style-sheet-designation tags, and the parameters are changed when the target node is one of these tags.

"Attribute designation" is the condition of whether a tag has a specific attribute. Tag attributes include, for example, link tag attributes, image tag attributes, and font designation attributes. In this embodiment, as shown in FIG. 2, link tags and image tags are treated as attribute-designation tags; when the tag of the target node has one of these attributes, the parameter change table is consulted and the power parameter is set to 0.6.

"Supplementary designation" is the condition of whether a tag is one designated as being better read aloud with a supplementary phrase indicating the tag type. For a link tag, for example, adding a phrase such as "is a link to ..." is easier to understand than reading the text as is. In this embodiment, as shown in FIG. 2, the <a> tag and the <img> tag are treated as supplementary-designation tags. For the <a> tag, the power parameter is set to 0.6 and the character string "のリンクです" ("is a link") is added. For the <img> tag, the power parameter is set to 0.6 and the character string "のイメージです" ("is an image") is added.

"Condition designation" is the condition of whether a tag is one of the other tags designated in advance. For example, it can be set so that the parameters are changed if the numerical value enclosed by a <rank> tag is smaller than a designated value such as 3. In this embodiment, when the character string enclosed by a tag contains a numerical value, the parameters are changed according to whether that value is positive or negative. As shown in FIG. 2, the pitch is multiplied by 1.2 for a tag enclosing a positive value and by 0.8 for a tag enclosing a negative value.
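The sign-based rule above is simple enough to sketch directly. This is an illustrative sketch only: the function name `pitch_factor_for_text`, the regular expression used to find the number, and the treatment of zero or non-numeric text as "no change" are assumptions that the description does not specify.

```python
import re

# Matches the first signed integer or decimal in a string (assumed number format).
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def pitch_factor_for_text(text: str) -> float:
    """Return the pitch multiplier for the text enclosed by a tag.

    Positive values give 1.2 and negative values give 0.8, as stated for FIG. 2.
    Zero or non-numeric text returns 1.0 (an assumption, not from the source).
    """
    match = _NUMBER.search(text)
    if match is None:
        return 1.0
    value = float(match.group())
    if value > 0:
        return 1.2
    if value < 0:
        return 0.8
    return 1.0
```

For instance, `pitch_factor_for_text("+3")` selects the raised pitch and `pitch_factor_for_text("-2.5")` the lowered one, so a listener can tell gains from losses without seeing the sign.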

"Relative position between tags" is a condition based on the depth of a tag. An XML document can be written hierarchically by means of tags. On a display, this hierarchy by tag depth can be expressed through indentation, tabular layout, and the like, but when the text is read aloud the hierarchy is hard to follow, so the voice quality is varied according to the tag depth. As shown in FIG. 2, the power, pitch, speed, and sound quality parameters are each multiplied by 1.1 when the tag depth is 1, by 1.2 when the tag depth is 2, and by 1.3 when the tag depth is 3.
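The depth rule amounts to a lookup followed by a uniform scaling of four parameters. The sketch below assumes a plain dictionary of parameter values and leaves depths other than 1 to 3 unchanged; both choices, along with the names `DEPTH_FACTOR` and `apply_depth`, are hypothetical, since the description only gives the three factors.

```python
# Multipliers stated in the description for tag depths 1, 2, and 3.
DEPTH_FACTOR = {1: 1.1, 2: 1.2, 3: 1.3}

# The four parameters that the depth rule scales.
SCALED_PARAMETERS = ("power", "pitch", "speed", "sound_quality")


def apply_depth(params: dict, depth: int) -> dict:
    """Return a copy of params with the depth multiplier applied.

    Depths without a listed factor are left unchanged (an assumption).
    Entries other than the four scaled parameters are copied as they are.
    """
    factor = DEPTH_FACTOR.get(depth, 1.0)
    return {
        name: (value * factor if name in SCALED_PARAMETERS else value)
        for name, value in params.items()
    }


baseline = {
    "acoustic_model": "model A",
    "power": 1.0,
    "pitch": 1.0,
    "speed": 1.0,
    "sound_quality": 1.0,
}
depth_two = apply_depth(baseline, 2)
```

Returning a new dictionary rather than mutating the baseline keeps the default voice intact for the next node.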

"Relationship between tags" is a condition that defines relationships among tags, such as the same tag appearing next to itself or tags with different names appearing at the same level, as happens with table-structure tags. For example, in the rows of a table structure, adjacent tags are the same tag. In the columns of a table structure, tags with different names appear at the same level. For tables, the pitch parameter is therefore changed so that a change of column or row can be recognized when the table is read aloud.

"Tag depth" is a condition based on the relative position of tags. For tags that specify a relative position, the topmost level takes the default value, and the power becomes 0.9 two levels below the top. For the deepest tag, the power is 0.6.

Next, the operation of the speech synthesizer 100 will be described with reference to FIGS. 3 to 9. FIG. 3 is a flowchart of the main processing of the speech synthesis process. FIG. 4 is a flowchart of the tag structure check process executed within the speech synthesis process. FIG. 5 is a flowchart of the tag type check process executed within the speech synthesis process. FIG. 6 is a schematic diagram showing an example of a browser display screen. FIG. 7 is an explanatory diagram showing an example of an XML document. FIG. 8 is an explanatory diagram showing an example of a style sheet. FIG. 9 is a schematic diagram showing an example of a parse tree.

In this embodiment, a document written in XML is displayed in a browser. When the user gives an instruction to read it aloud, the parsing result is obtained from the XML parser, the voice parameters are changed as appropriate according to that result, and speech is synthesized and read aloud.

When the main processing starts, as shown in FIG. 3, the designated XML document is first displayed in the browser (S1). For example, when the XML document shown in FIG. 7 is displayed, the style sheet shown in FIG. 8 is referenced and a screen such as that of FIG. 6 results.

Next, it is determined whether a read-aloud request has been input by the user (S5). Input from the user may be accepted by providing a button or menu on the screen. If no read-aloud request has been input (S5: NO), the process waits with the screen displayed as it is. If a read-aloud request has been input (S5: YES), a parse tree is obtained from the XML parser (S10).

An XML parser (also called an XML processor) is software that extracts only the text data from an XML document or file and converts it into a form that programs and application software can use easily. XML documents are written using tags defined independently for each document, and an XML parser can convert them into a specific format for output. The XML parser performs parsing during the conversion and outputs an error if the document does not conform to the XML specification. Some XML parsers can also output the parsing result as a tree structure (parse tree). This embodiment obtains and uses the parse tree output from the XML parser in this way. For example, assume that a parse tree such as that of FIG. 9 has been obtained.
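To make the notion of a parse tree concrete, the following sketch uses Python's standard `xml.etree.ElementTree` module as a stand-in for the XML parser. The sample document is hypothetical: it borrows the tag names mentioned for FIG. 9 (`pans`, `pan`, `name`, `pieces`, `memo`, `keyword`, `ref`), but its text content is invented for illustration, and the helpers `walk` and `parse_or_none` are not part of the described device.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the tag names come from the description of FIG. 9.
SAMPLE_XML = """<pans>
  <pan>
    <name>Pan A</name>
    <pieces>3</pieces>
    <memo><keyword>steel</keyword><ref>catalog</ref></memo>
  </pan>
  <pan>
    <name>Pan B</name>
    <pieces>2</pieces>
  </pan>
</pans>"""


def walk(node, depth=0):
    """Yield (depth, tag name, stripped text) for every node in document order."""
    yield depth, node.tag, (node.text or "").strip()
    for child in node:
        yield from walk(child, depth + 1)


def parse_or_none(xml_text):
    """Parse XML text, returning None when it is not well formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


nodes = list(walk(ET.fromstring(SAMPLE_XML)))
for depth, tag, text in nodes:
    print("  " * depth + tag, text)
```

Printing the nodes with indentation shows the hierarchy that a sighted reader gets from layout and that the later steps translate into changes of voice quality.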

Next, it is determined whether the user has given any designation concerning the reading (S15). A user designation can be accepted as a character string, for example "『個数』の列を強調して" ("emphasize the 'number of pieces' column") for the display screen of FIG. 6. When such a designation has been made (S15: YES), the input sentence is analyzed (S20), and the corresponding tag name is obtained from the analysis result (S25). In the above example, the "pieces" tag, which corresponds to the "number of pieces" (個数) column, is obtained. If there is no user designation (S15: NO), the process proceeds directly to S30.
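Steps S20 and S25 can be sketched as extracting the quoted column heading from the instruction and mapping it to a tag name. The description does not say how the input sentence is analyzed or where the heading-to-tag correspondence comes from, so the regular expression, the function name `resolve_user_tag`, and the `HEADING_TO_TAG` mapping below are all assumptions made for illustration.

```python
import re
from typing import Mapping, Optional

# Hypothetical mapping from displayed column headings to XML tag names.
# How the device derives this correspondence is not stated in the description.
HEADING_TO_TAG = {"個数": "pieces"}


def resolve_user_tag(instruction: str, heading_to_tag: Mapping[str, str]) -> Optional[str]:
    """Return the tag name for the heading quoted with 『』 in the instruction.

    Returns None when no quoted heading is found or the heading is unknown.
    """
    match = re.search(r"『(.+?)』", instruction)  # S20: analyze the input sentence
    if match is None:
        return None
    return heading_to_tag.get(match.group(1))    # S25: obtain the tag name


user_tag = resolve_user_tag("『個数』の列を強調して", HEADING_TO_TAG)
```

A real analysis step would presumably need to handle instructions without quotation marks; this sketch covers only the quoted form used in the example.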

Next, the parameter change table stored in the change parameter storage area 131 is retrieved and placed in a work area in the RAM 11 (S30). The tag structure of the target XML document is then checked according to the parse tree obtained in S10 (S35).

The tag structure check process will now be described in detail with reference to FIG. 4. As shown in FIG. 4, the node counter i of the parse tree is first initialized to 0 (S101). Next, it is determined whether the value of i is smaller than the total number of nodes (S105). If the value of i is smaller than the total number of nodes (S105: YES), the parameter change value corresponding to the tag depth of the target node is obtained from the parameter change table (S110). As shown in FIG. 2, the power, pitch, speed, and sound quality parameters are each multiplied by 1.1 when the tag depth is 1, by 1.2 when the tag depth is 2, and by 1.3 when the tag depth is 3.

Next, it is determined whether the tag of the target node specifies a relative position (that is, whether it is arranged hierarchically) (S115). If the tag is hierarchical (S115: YES), the corresponding parameter change value is obtained from the parameter change table according to its level (S120). As shown in FIG. 2, for tags that specify a relative position, the topmost level takes the default value, and the power becomes 0.9 two levels below the top. For the deepest tag, the power is 0.6.

After the parameter change value has been obtained, or if the tag does not specify a relative position (S115: NO), it is determined whether the same tag is adjacent (S125). In the parse tree of FIG. 9, for example, adjacent "pan" tags are the same tag. This structure arises for the rows of a table structure, so the parameters are changed so that the row is also conveyed when the text is read aloud. If the same tag is adjacent (S125: YES), the corresponding parameter change value is obtained from the parameter change table (S130). That is, as shown in FIG. 2, this is treated as a change of row and the pitch is multiplied by 1.2.

After the parameter change value has been obtained, or if the same tag is not adjacent (S125: NO), it is determined whether there are sibling tags (S135). When there are sibling tags, that is, tags with different names at the same level such as the "name", "pieces", "size", "cc", "rank", and "memo" tags in the parse tree of FIG. 9 (S135: YES), they form the columns of a table structure, so the parameters are changed so that this is also conveyed when the text is read aloud (S140). That is, the corresponding parameter change value is obtained from the parameter change table and, as shown in FIG. 2, the pitch is multiplied by 0.9.

After the parameter change value has been obtained, or if there are no sibling tags (S135: NO), it is determined whether the tag of the target node has child tags (S145). For example, the "memo" tag in the parse tree of FIG. 9 has the "keyword" tag and the "ref" tag as child tags. If there are child tags (S145: YES), the corresponding parameter change value is obtained from the parameter change table (S150). That is, as shown in FIG. 2, the sound quality is multiplied by 0.3.

After the parameter change value has been obtained, or if there are no child tags (S145: NO), the parameter change values obtained in S120, S130, S140, and S150 are associated with the character string to be read aloud and set in the read-aloud buffer in the RAM 11 (S155).

Then 1 is added to i (S160), the process returns to S105, and the above processing is repeated until the value of i reaches the total number of nodes. When all nodes have been processed (S105: NO), the process returns to the speech synthesis process (FIG. 3).

After the tag structure check process (S35) ends, the type of each tag contained in the target XML document is checked according to the parse tree obtained in S10 (S40).

The tag type check process will now be described in detail with reference to FIG. 5. As shown in FIG. 5, the node counter i of the parse tree is first initialized to 0 (S201). Next, it is determined whether the value of i is smaller than the total number of nodes (S205). If the value of i is smaller than the total number of nodes (S205: YES), it is determined whether the tag of the target node is a tag of the type designated by the user (S210). If it is a user-designated tag (S210: YES), the parameter change table is consulted and the parameter change value corresponding to the user designation is obtained (S215). As shown in FIG. 2, for a user-designated tag the acoustic model is changed from the initial value "model A" to "model B". For example, if the user has specified "『個数』の列を強調して" ("emphasize the 'number of pieces' column"), the "pieces" tag corresponding to that column has been obtained as the user-designated tag in S25 of FIG. 3. Therefore, when the tag of the target node is the "pieces" tag, the acoustic model is changed to "model B".
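The user-designation branch (S210, S215) reduces to a membership test per node. The sketch below assumes that the user-designated tags are held as a set of names and that the nodes are given as a flat list of tag names; the function name `assign_acoustic_models` is hypothetical, while "model A" and "model B" are the values named in the description.

```python
from typing import Iterable, List, Set, Tuple


def assign_acoustic_models(
    tags: Iterable[str],
    user_tags: Set[str],
    default: str = "model A",
    changed: str = "model B",
) -> List[Tuple[str, str]]:
    """Pair each tag with its acoustic model.

    Tags designated by the user get the changed model (S210: YES, S215);
    all other tags keep the initial model (S210: NO).
    """
    return [(tag, changed if tag in user_tags else default) for tag in tags]


models = assign_acoustic_models(["name", "pieces", "size"], {"pieces"})
```

With the "pieces" tag designated, only the text in that column is read with the second acoustic model, which is what makes the column stand out to the listener.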

After the parameter change value is acquired, or if the tag of the target node is not a user-designated tag (S210: NO), it is next determined whether the tag is a DTD-designated tag (S220). In the present embodiment, as described above, three kinds of elements are treated as DTD-designated tags: elements whose appearance is optional, elements that appear one or more times, and elements whose order of appearance is arbitrary. When one of these elements appears as a tag (S220: YES), the parameter is changed (S225) (see FIG. 2).

For example, the DTD declaration at the beginning of the XML document in FIG. 7 contains <!ELEMENT pans (pan+)>, which states that the "pan" element appears one or more times within the "pans" element. Accordingly, when the target node is a "pan" tag, the pitch becomes 1.2 times according to the parameter change table. The declaration also contains <!ELEMENT pan (name,pieces,size,cc,rank,memo?)>, so the "name", "pieces", "size", "cc", "rank", and "memo" elements appear optionally within the "pans" element. Accordingly, when the target node is one of these tags, the sound quality becomes 0.5 according to the parameter change table.
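As a rough sketch of how the occurrence indicators could be read from the two declarations quoted above, the following extracts the child elements and their "?", "+", or "*" suffix with a regular expression. The function name `occurrence_indicators` is hypothetical, and the sketch handles only flat, comma-separated content models such as the ones shown; nested groups and choice lists are not covered. How each indicator maps to a parameter change is defined by the table of FIG. 2 and is not reproduced here.

```python
import re

# Matches declarations of the form <!ELEMENT name (content-model)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def occurrence_indicators(dtd_text):
    """Return {parent: {child: indicator}}; "" means no indicator was written."""
    result = {}
    for parent, model in ELEMENT_DECL.findall(dtd_text):
        children = {}
        for item in model.split(","):
            item = item.strip()
            if not item:
                continue
            if item[-1] in "?+*":
                children[item[:-1]] = item[-1]
            else:
                children[item] = ""
        result[parent] = children
    return result


dtd = """
<!ELEMENT pans (pan+)>
<!ELEMENT pan (name,pieces,size,cc,rank,memo?)>
"""
for parent, children in occurrence_indicators(dtd).items():
    print(parent, children)
```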

After the parameter change value is acquired, or if the tag of the target node is not a DTD-designated tag (S220: NO), it is next determined whether the tag is a style sheet designation tag (S230). In the present embodiment, as described above and as shown in FIG. 2, three tags are treated as style sheet designation tags: the <th> tag, the <td> tag, and the <span> tag. When the target node is one of these tags (S230: YES), the parameter is changed (S235).

For example, in the XML document of FIG. 7, the DTD declaration is followed by <?xml-stylesheet type="text/xsl" href="pans.xsl" ?>, which designates the style sheet "pans.xsl" (see FIG. 8). As shown in FIG. 8, in the style sheet the "name", "pieces", "size", "cc", "rank", and "memo" tags are each applied to a <td> tag. Accordingly, when the target node is one of these tags, the pitch becomes 1.2 times according to the parameter change table.

After the parameter change value is acquired, or if the tag of the target node is not a style sheet designation tag (S230: NO), it is next determined whether the tag is an attribute designation tag (S240). In the present embodiment, as described above and as shown in FIG. 2, link tags and image tags are treated as attribute designation tags. When the tag of the target node has these attributes (S240: YES), the parameter change table is referenced and the power parameter is set to 0.6 (S245).

After the parameter change value is acquired, or if the tag of the target node is not an attribute designation tag (S240: NO), it is next determined whether the tag is a supplementary designation tag (S250). A supplementary designation tag is a tag for which it is preferable to supplement the reading with the type of the tag. In the present embodiment, the <a> tag and the <img> tag are treated as supplementary designation tags. When the tag of the target node is one of these tags (S250: YES), the change value is acquired with reference to the parameter change table (FIG. 2). As shown in FIG. 2, for the <a> tag, the power parameter is set to 0.6 and the character string 「のリンクです」 ("is a link") is added as a supplement. For the <img> tag, the power parameter is set to 0.6 and the character string 「のイメージです」 ("is an image") is added as a supplement.
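The supplementary designation step can be illustrated with a small lookup table. This is a sketch under assumptions: the names `SUPPLEMENT` and `supplement_reading` are hypothetical, and appending the supplementary string after the node's text is an assumption based on the wording of the two strings; the description only states that the string is supplemented. The power value 0.6 and the two strings come from the description.

```python
# Supplementary strings per tag, as given for FIG. 2 in the description.
SUPPLEMENT = {
    "a": "のリンクです",      # "is a link"
    "img": "のイメージです",  # "is an image"
}


def supplement_reading(tag_name, text, power=1.0):
    """S250: return (reading string, power) for one node.

    Assumption: the supplementary string is appended after the text.
    """
    suffix = SUPPLEMENT.get(tag_name)
    if suffix is None:
        return text, power      # S250: NO, nothing changes
    return text + suffix, 0.6   # S250: YES, power 0.6 plus the supplement


print(supplement_reading("a", "トップページ"))
print(supplement_reading("td", "3"))
```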

After the parameter change value is acquired, or if the tag of the target node is not a supplementary designation tag (S250: NO), it is next determined whether the tag is a condition designation tag (S260). In the present embodiment, the condition is as follows: when the character string enclosed by the tag contains a numerical value, the parameter is changed according to whether that value is positive or negative. For a tag that matches this condition (S260: YES), the change value is acquired with reference to the parameter change table (S265). As shown in FIG. 2, if the tag encloses a positive value, the pitch is multiplied by 1.2, and if the tag encloses a negative value, the pitch is multiplied by 0.8.
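The sign-based condition can be sketched as below. The function name `pitch_for_number` and the regular expression are hypothetical; the description does not say how the numerical value is detected, how zero is treated, or how text containing several numbers is handled, so this sketch uses the first number found and leaves the pitch unchanged for zero. Only the 1.2 and 0.8 multipliers come from the description.

```python
import re

# Assumed number format: optional sign, digits, optional decimal part (ASCII only).
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def pitch_for_number(text, default=1.0):
    """S260/S265: return the pitch multiplier for the text enclosed by a tag."""
    match = NUMBER.search(text)
    if match is None:
        return default   # S260: NO, no numerical value in the text
    value = float(match.group())
    if value > 0:
        return 1.2       # positive value
    if value < 0:
        return 0.8       # negative value
    return default       # zero is not covered by the description


print(pitch_for_number("12"))
print(pitch_for_number("-3.5"))
print(pitch_for_number("frying pan"))
```

A hyphen directly before a digit, as in "size-3", would be read as a minus sign by this pattern, so a real implementation would need a stricter rule for what counts as a numerical value.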

After the parameter change value is acquired, or if the tag of the target node is not a condition designation tag (S260: NO), it is next determined whether text is contained between the tags (S270). If no text is contained (S270: NO), 「なし」 ("none") is prepared as a substitute character string (S275) and added to the reading character string. If text is contained (S270: YES), the process proceeds directly to S280.

The parameter change values relating to the tag type have now been acquired, so they are set in the reading buffer in association with each character string (S280).

Then, i is incremented by 1 (S285) and the process loops back. The above processing is repeated until the value of i reaches the total number of nodes, and when all nodes have been processed (S205: NO), the process returns to the speech synthesis process (FIG. 3).
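The overall control flow of the tag type check process (S201 to S285) can be sketched as a counter-driven loop over the parse tree nodes. This is an illustrative sketch only: `tag_type_check`, `user_tag_check`, and the dictionary-based parameters are hypothetical, Python's `xml.etree.ElementTree` stands in for the XML parser of the embodiment, and only the user-designated tag check is plugged in as an example of the checks performed in S210 to S265. Restricting the 「なし」 substitution to elements without child elements is also an assumption, made so that container elements do not produce a reading.

```python
import xml.etree.ElementTree as ET


def user_tag_check(user_tags):
    """Build a check for S210/S215 (hypothetical helper)."""
    def check(node, params):
        if node.tag in user_tags:
            params["acoustic_model"] = "model B"
    return check


def tag_type_check(xml_text, checks):
    """Return the reading buffer as a list of (reading string, parameters)."""
    nodes = list(ET.fromstring(xml_text).iter())  # stand-in for the parse tree of S10
    buffer = []
    i = 0                                         # S201
    while i < len(nodes):                         # S205
        node = nodes[i]
        params = {"acoustic_model": "model A"}    # initial value
        for check in checks:                      # S210 to S265, in document order
            check(node, params)
        text = (node.text or "").strip()
        if not text and len(node) == 0:           # S270: NO (leaf element, assumption)
            text = "なし"                          # S275
        if text:
            buffer.append((text, params))         # S280
        i += 1                                    # S285
    return buffer


doc = "<pans><pan><name>fry</name><pieces>3</pieces><memo></memo></pan></pans>"
for entry in tag_type_check(doc, [user_tag_check({"pieces"})]):
    print(entry)
```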

After the tag type check process (S40) ends, speech synthesis is performed by a well-known method (S45), using the acoustic model stored in the acoustic model storage area 133 in accordance with the reading character strings and speech parameters stored in the reading buffer. The synthesized speech is then output from the speaker 26 (S50). This completes the reading aloud of the XML document displayed in the browser.

As described above, according to the speech synthesizer 100 of the present embodiment, when a tagged XML document is displayed and reading aloud is instructed, the speech parameters are changed according to the structure and type of the tags, and the document is read aloud by speech synthesis using the changed parameters. Therefore, even when a document is read aloud by speech, which tends to be less expressive than a display, the reading can be given contrast and devised so that the display state can be understood. This benefits visually impaired users and users who cannot watch the display screen at the moment.

In the above embodiment, the CPU 10 that acquires the parse tree in S10 corresponds to the obtaining means of the present invention. The CPU 10 that executes the tag structure check process and the tag type check process in S35 and S40 corresponds to the changing means of the present invention. The CPU 10 that executes the speech synthesis process in S45 corresponds to the speech synthesis means of the present invention.

FIG. 1 is a hardware block diagram of the speech synthesizer 100.
FIG. 2 is a schematic diagram showing the structure of the parameter change table.
FIG. 3 is a flowchart of the main process of the speech synthesis process.
FIG. 4 is a flowchart of the tag structure check process executed in the speech synthesis process.
FIG. 5 is a flowchart of the tag type check process executed in the speech synthesis process.
FIG. 6 is a schematic diagram showing an example of the display screen of a browser.
FIG. 7 is an explanatory diagram showing an example of an XML document.
FIG. 8 is an explanatory diagram showing an example of a style sheet.
FIG. 9 is a schematic diagram showing an example of a parse tree.

Explanation of symbols

10 CPU
11 RAM
13 Hard disk device
14 Output control unit
100 Speech synthesizer
131 Change parameter storage area
133 Acoustic model storage area
134 Program storage area

Claims

A speech synthesizer that generates synthesized speech from tagged text, comprising:
reading character string assigning means for linguistically analyzing the text and assigning a reading character string;
phoneme generation means for generating phoneme data and phoneme parameters corresponding to the reading character string assigned by the reading character string assigning means;
obtaining means for obtaining a result of parsing the tagged text according to the tags;
changing means for changing, based on the parsing result obtained by the obtaining means, the parameters generated by the phoneme generation means for text with a predetermined tag; and
speech synthesis means for generating synthesized speech from the parameters changed by the changing means and the phoneme data generated by the phoneme generation means.

The speech synthesizer according to claim 1, wherein the changing means changes the parameter for text with a tag having a predetermined name.

The speech synthesizer according to claim 2, further comprising designation means for designating the name of a tag whose parameter is changed by the changing means.

The speech synthesizer according to any one of claims 1 to 3, wherein the changing means changes the parameter for text with a tag having a predetermined attribute.

The speech synthesizer according to claim 1, wherein the changing means changes the parameter for text having a predetermined tag structure.

The speech synthesizer according to any one of claims 1 to 5, wherein the changing means changes the parameter for text with a tag that designates a display format.

A speech synthesis program for causing a computer to function as each of the processing means of the speech synthesizer according to claim 1.