JP2014197117A

JP2014197117A - Speech synthesizer and language dictionary registration method

Info

Publication number: JP2014197117A
Application number: JP2013072559A
Authority: JP
Inventors: Takuya Noda (野田 拓也)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-03-29
Filing date: 2013-03-29
Publication date: 2014-10-16
Anticipated expiration: 2033-03-29
Also published as: JP6232724B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesizer capable of generating an appropriate synthesized speech signal even for text that includes a word having a plurality of different readings or prosodies.

SOLUTION: A speech synthesizer 1 has: an input unit 2 that acquires text data; a storage unit 3 that stores a language dictionary in which at least the kanji-kana notation of a word and an intermediate notation including the reading of the word and prosodic symbols representing its prosody are registered, and which is used to generate an intermediate notation from the text data; a registration range setting unit 25 that, when a word included in a correction range of a corrected intermediate notation (the intermediate notation after correction) has a predetermined part of speech, expands the registration range to be registered in the language dictionary so that it also includes words of the same part of speech that are contiguous before and after the word in the correction range; and a registration unit 26 that registers in the language dictionary, as one word, the part of the corrected intermediate notation included in the registration range, with at least its kanji-kana notation and intermediate notation.

Description

The present invention relates to, for example, a speech synthesizer that synthesizes a speech signal from text data, and to a method of registering entries in a language dictionary used by the speech synthesizer.

In recent years, speech synthesis technology for automatically synthesizing speech has been developed. Because speech synthesis can produce the desired speech in a short time, some applications that previously used speech pre-recorded by professional narrators have adopted it. Speech synthesis is particularly useful in applications where the information provided is updated at short intervals, such as announcements in commercial facilities, highway radio, highway telephone services, and weather forecast broadcasts.

To generate the desired speech signal, text data written in a mixture of kanji and kana is input to the speech synthesizer, for example via a keyboard. The speech synthesizer applies language processing such as morphological analysis or dependency analysis to the text data, using a language dictionary in which word readings and other information are registered. Through this language processing, the speech synthesizer generates morpheme information representing the reading of each morpheme, and an intermediate notation in which prosodic symbols representing prosody, such as accent position, accent strength, or degree of intonation, are attached to that morpheme information. The speech synthesizer then generates a synthesized speech signal based on the intermediate notation.

A kanji can have multiple readings, and the reading of a kanji differs depending on the word that contains it. In addition, because the words in everyday use change constantly, it is practically impossible to register every word in the language dictionary in advance, and input text data may contain words that are not registered in the language dictionary. As a result, the intermediate notation obtained by language processing is sometimes inaccurate. In such a case, the user must manually correct the intermediate notation to obtain a correct synthesized speech signal. Because such correction is burdensome for the user, it should occur as rarely as possible. One proposed technique therefore reports to the user, all at once when the end of the input text data is reached, the unknown words extracted up to that point, and registers in a word dictionary each unknown word together with the information the user enters about it (see, for example, Patent Document 1). In another proposed technique, when the first-candidate language analysis result contains a part that matches a stored replacement condition, the matching part is replaced with the replacement information corresponding to that condition to generate a new language analysis result (see, for example, Patent Document 2). In this technique, if a result identical to the new language analysis result exists among the language analysis results other than the first candidate, synthesized speech is generated based on the new language analysis result.

With the above techniques, when text data contains a word registered as an unknown word or a word matching a replacement condition, the intermediate notation is generated according to the registered word or the replacement condition. However, even a single word may be read differently, or uttered with different prosody, depending on the surrounding text. In such cases, the above techniques may not generate an appropriate intermediate notation. A technique has therefore been proposed in which, when the intermediate language of a document to be read aloud is edited, the correction instruction includes both a designation of the phrase to be corrected and a designation of the condition under which the correction is applied (see, for example, Patent Document 3). In this technique, a related word or phrase, for example, is designated as the condition for applying the correction.

Patent Document 1: Japanese Laid-Open Patent Publication No. 7-244491
Patent Document 2: Japanese Laid-Open Patent Publication No. 10-312377
Patent Document 3: Japanese Laid-Open Patent Publication No. 2006-30326

However, even with the technique disclosed in Patent Document 3, an appropriate intermediate language may not be generated unless the condition for applying the correction is specified appropriately. It is also difficult to determine such conditions with every case considered in advance.

Accordingly, in one aspect, an object of this specification is to provide a speech synthesizer capable of generating an appropriate synthesized speech signal even for text that includes a word having a plurality of different readings or prosodies.

According to one embodiment, a speech synthesizer is provided that generates a synthesized speech signal based on an intermediate notation that is generated from text data and includes the reading of the text data and prosodic symbols representing its prosody. The speech synthesizer has: an input unit that acquires text data; a storage unit that stores a language dictionary in which at least the kanji-kana notation of a word and an intermediate notation including the reading and prosodic symbols of that word are registered, and which is used to generate an intermediate notation from text data; a registration range setting unit that, when a word included in a correction range of a corrected intermediate notation (an intermediate notation that has been corrected) has a predetermined part of speech, expands the registration range to be registered in the language dictionary so that it extends to the words that are contiguous before and after the word in the correction range and have the same part of speech as that word; and a registration unit that registers in the language dictionary, as one word, the part of the corrected intermediate notation included in the registration range, with at least its kanji-kana notation and intermediate notation.

The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.

The speech synthesizer disclosed in this specification can generate an appropriate synthesized speech signal even for text that includes a word having a plurality of different readings or prosodies.

A diagram showing an example of intermediate notation generated incorrectly for text data.
A schematic configuration diagram of a speech synthesizer according to one embodiment.
A functional block diagram of the processing unit of the speech synthesizer according to one embodiment.
A diagram showing an example of the intermediate notation output by the language processing unit and the intermediate notation corrected by the user.
A diagram showing the morpheme reading notation before and after the user's correction, corresponding to the intermediate notation of FIG. 4.
A diagram showing another example of the morpheme reading notation before and after the user's correction.
An operation flowchart of the morpheme information setting process.
A diagram showing an example of the relationship between the intermediate notations before and after the user's correction and the morpheme reading notation after the user's correction.
A diagram showing an example of the relationship between the intermediate notations before and after the user's correction and the correction range that is set.
A diagram showing an example of the relationship between the intermediate notations before and after the user's correction and the registration range.
An operation flowchart of the registration range setting process.
An operation flowchart of the dictionary registration process.
A diagram showing an example of words registered in the user dictionary and words registered in the intermediate notation dictionary.
An operation flowchart of the dictionary selection process.

Speech synthesizers according to various embodiments are described below with reference to the drawings.
First, with reference to FIG. 1, examples are described of intermediate notation generated incorrectly for text data input as the target of speech synthesis.

Suppose that the kanji-kana sentence 100, 「この羊羹最中は美しく・・・」 ("This yokan monaka is beautiful..."), and the kanji-kana sentence 110, 「創意工夫しながら・・・」 ("While exercising ingenuity..."), are input as text data. The intermediate notation 101, 「ヨーカンサ’イチューワ・・・」, is generated for sentence 100, and the intermediate notation 111, 「ソ−イク’フー」, is generated for sentence 110. The symbol 「’」 in the intermediate notation is a prosodic symbol representing a strong accent. Because prosodic symbols are not defined by any standard, symbols other than those used in this specification may be used to represent a particular prosody.

In this example, the reading of the part of intermediate notation 101 corresponding to 「最中」 is incorrectly given as 「サ’イチュー」 (saichuu, "in the middle of"). As shown in the user-corrected intermediate notation 102, the user has therefore corrected 「サ’イチュー」 to 「モ’ナカ」 (monaka, a wafer confection). If the word 「最中」 is registered in the language dictionary and the speech synthesizer automatically extracts the correction range using a technique such as matching based on dynamic programming, the extracted range is 「最中」, the word registered in the language dictionary. If the user's correction is then reflected in the language dictionary, the intermediate notation of 「最中」 may always become 「モ’ナカ」 whenever text data containing that word is input thereafter. In a sentence such as 「遊びの最中に・・・」 ("in the middle of playing..."), however, the intermediate notation of 「最中」 should be 「サ’イチュー」, not 「モ’ナカ」. Thus, for a word with multiple readings, reflecting the user's correction of the reading can instead lead to incorrect intermediate notation. To prevent such errors, the user would need to register 「羊羹最中」 itself in the language dictionary as a compound noun. However, it is difficult for a user without expertise in speech synthesis to make this judgment appropriately. Even a user who has such expertise may make a mistake in setting the word to be registered in the language dictionary.
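The over-generalization problem described above can be illustrated with a toy lookup. This is only a sketch under stated assumptions: it uses a greedy longest-match lookup rather than the analyzer of this specification, and the function name, the dictionary contents, and the compound's notation 「ヨーカンモ’ナカ」 are hypothetical.

```python
def longest_match_readings(text, dictionary):
    """Greedy longest-match lookup; characters with no entry pass through unchanged."""
    out, i = [], 0
    max_len = max(len(surface) for surface in dictionary)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            surface = text[i:i + n]
            if surface in dictionary:
                out.append(dictionary[surface])
                i += n
                break
        else:
            # No entry starts at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return out


# Hypothetical base dictionary (surface form -> intermediate notation).
base = {"最中": "サ’イチュー", "羊羹": "ヨーカン", "遊び": "アソビ"}

# Option A: overwrite the single word, which breaks unrelated sentences.
overwritten = {**base, "最中": "モ’ナカ"}

# Option B: register the compound noun and leave the single word untouched.
compound = {**base, "羊羹最中": "ヨーカンモ’ナカ"}

wrong = longest_match_readings("遊びの最中", overwritten)
right = longest_match_readings("遊びの最中", compound)
sweet = longest_match_readings("羊羹最中", compound)
```

With option A, 「遊びの最中」 picks up the confection reading; with option B, the correction applies only where the whole compound appears.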

In intermediate notation 111, the reading is correct, but the positions of the accent and the break are wrong. Suppose that, as shown in intermediate notation 112, the user corrects the accent position and adds the prosodic symbol 「＝」, which represents a break, changing the notation to 「ソ’−イ＝クフー」. If only prosodic symbols have been corrected, the reading of each word is the same before and after the correction, so the speech synthesizer may be unable to identify the range to be corrected. In this case, the user could designate the compound noun 「創意工夫」 as the range to be corrected. To register the compound noun in the language dictionary, the user would then also designate its part of speech. Here, 「創意」 is a common noun and 「工夫」 is a sa-hen noun (a verbal noun that combines with the verb 「する」); because 「工夫」 is the final element of the compound, the user should designate 「創意工夫」 as a sa-hen noun as well. A user without such expertise, however, may not know the correct part of speech and may register the compound noun 「創意工夫」 as a common noun. Then, even if other text data contains the compound together with the verb 「する」 ("to do"), as in 「創意工夫して」, the entry 「創意工夫」 is not registered as a sa-hen noun and so is not selected when connecting to the sa-hen verb 「して」, which connects readily to sa-hen nouns. As a result, the common noun 「創意」 and the sa-hen noun 「工夫」 are selected as before, and the user's corrections to the accent position and so on are no longer reflected in the intermediate notation.
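The rule stated above, that a compound noun takes the part of speech of its final element, can be written down in a few lines. This is a minimal sketch; the function name and the part-of-speech labels are hypothetical and not taken from this specification.

```python
def compound_entry(morphemes):
    """Build (surface, part of speech) for a compound noun.

    `morphemes` is a list of (surface, part_of_speech) pairs in text order.
    The compound inherits the part of speech of its last element.
    """
    surface = "".join(s for s, _ in morphemes)
    pos = morphemes[-1][1]
    return surface, pos


# Hypothetical labels: "common noun" for 創意, "sa-hen noun" for 工夫.
entry = compound_entry([("創意", "common noun"), ("工夫", "sa-hen noun")])
```

Here `entry` is the pair `("創意工夫", "sa-hen noun")`, which is the classification that lets the compound connect to 「して」.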

To solve these problems, the speech synthesizer needs to rationalize the range that the user corrected and then appropriately determine the part of speech and the registration range of the word involved in the correction. The speech synthesizer according to this embodiment therefore compares the intermediate notation before and after the user's correction, identifies the matching and mismatching parts morpheme by morpheme, and, based on this match/mismatch information, sets the part-of-speech information of each morpheme in the corrected intermediate notation. The speech synthesizer then matches the intermediate notation before and after the correction in units of parts of speech, and sets the whole of each part-of-speech unit that contains a mismatching part as the correction range. If the part of speech of the correction range is a noun, the speech synthesizer takes the correction range together with the entire run of contiguous nouns before and after it as the registration range of the word that reflects the user's correction. If the part of speech of the correction range is the stem of a conjugating independent word, the speech synthesizer takes the span from that stem to the following conjugational ending of the same part of speech as the registration range.
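The two steps above (find the mismatching morphemes, then expand the range over neighbors of the same part of speech) can be sketched as follows. This is an illustration under assumptions, not the procedure of the embodiment: Python's `difflib.SequenceMatcher` stands in for the matching process, and the `Morpheme` class, the function names, and the part-of-speech tags (`"noun"`, `"particle"`, `"adj-stem"`, `"adj-ending"`) are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Morpheme:
    surface: str   # kanji-kana notation
    notation: str  # intermediate notation (reading plus prosodic symbols)
    pos: str       # hypothetical tag, e.g. "noun", "particle", "adj-stem"


def correction_range(before, after):
    """Inclusive index range in `after` covering every changed morpheme, or None."""
    sm = SequenceMatcher(a=[m.notation for m in before],
                         b=[m.notation for m in after], autojunk=False)
    spans = [(j1, j2) for tag, _, _, j1, j2 in sm.get_opcodes()
             if tag != "equal" and j2 > j1]  # pure deletions are ignored here
    return (spans[0][0], spans[-1][1] - 1) if spans else None


def registration_range(morphemes, start, end):
    """Expand the inclusive range [start, end] as described in the text."""
    pos = morphemes[start].pos
    if pos == "noun":
        # Absorb the whole run of contiguous nouns on both sides.
        while start > 0 and morphemes[start - 1].pos == "noun":
            start -= 1
        while end + 1 < len(morphemes) and morphemes[end + 1].pos == "noun":
            end += 1
    elif pos.endswith("-stem"):
        # Absorb the trailing conjugational ending(s) of the same part of speech.
        ending = pos[:-len("-stem")] + "-ending"
        while end + 1 < len(morphemes) and morphemes[end + 1].pos == ending:
            end += 1
    return start, end


def sentence(monaka_notation):
    return [Morpheme("羊羹", "ヨーカン", "noun"),
            Morpheme("最中", monaka_notation, "noun"),
            Morpheme("は", "ワ", "particle")]


before, after = sentence("サ’イチュー"), sentence("モ’ナカ")
fix = correction_range(before, after)
reg = registration_range(after, *fix)
word = "".join(m.surface for m in after[reg[0]:reg[1] + 1])
```

In this example only 「最中」 changed, but the registration range grows to the noun run 「羊羹最中」, so the single word 「最中」 keeps its original entry.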

FIG. 2 is a schematic configuration diagram of a speech synthesizer according to one embodiment. In this embodiment, the speech synthesizer 1 has an input unit 2, a storage unit 3, a processing unit 4, and an output unit 5.

The input unit 2 acquires text data, which is the source text of the synthesized speech and is written in a mixture of kanji and kana. For this purpose, the input unit 2 has, for example, a keyboard. The input unit 2 also has a pointing device such as a mouse and a display that shows the characters or numerical values to be entered with the pointing device. Alternatively, the input unit 2 may have a touch panel display.
The input unit 2 may also acquire text data from another device connected to the speech synthesizer 1 via a communication network. In this case, the input unit 2 has an interface circuit for connecting the speech synthesizer 1 to the communication network.
The input unit 2 passes the input text data to the processing unit 4.

The storage unit 3 has, for example, at least one of a semiconductor memory circuit, a magnetic storage device, and an optical storage device. The storage unit 3 stores the various computer programs used by the processing unit 4 and the various data used for speech synthesis processing.
As data used for speech synthesis processing, the storage unit 3 stores, for example, prosodic models and a speech waveform dictionary. As data used for language processing, the storage unit 3 also stores a language dictionary that holds, for various words expected to appear in text data, the kanji-kana notation, intermediate notation, part of speech, conjugation form, and so on of each word. The storage unit 3 further stores a user dictionary that holds, for words registered by the user, the kanji-kana notation, intermediate notation, part of speech, conjugation form, and so on of each word. The user dictionary is also an example of a language dictionary.

The output unit 5 outputs the synthesized speech signal received from the processing unit 4 to the speaker 6. For this purpose, the output unit 5 has, for example, an audio interface circuit for connecting the speaker 6 to the speech synthesizer 1.
The output unit 5 may also output the synthesized speech signal to another device connected to the speech synthesizer 1 via a communication network. In this case, the output unit 5 has an interface circuit for connecting the speech synthesizer 1 to that communication network. If the input unit 2 also acquires text data via a communication network, the input unit 2 and the output unit 5 may be integrated.

The processing unit 4 has one or more processors, a memory circuit, and peripheral circuits. The processing unit 4 creates a synthesized speech signal based on the input text data.
FIG. 3 is a functional block diagram of the processing unit 4. The processing unit 4 has a language processing unit 10, a speech synthesis unit 11, and a dictionary registration unit 12.
These units of the processing unit 4 are, for example, functional modules implemented by a computer program that runs on a processor of the processing unit 4. Alternatively, these units may be implemented in the speech synthesizer 1 as a single integrated circuit that realizes their functions.

The language processing unit 10 generates a morpheme reading notation from the input text data, which is a sentence written in a mixture of kanji and kana, and identifies the morpheme information of each morpheme in the text data. The language processing unit 10 also determines, from the input text data, the intermediate notation and the information on each part of speech contained in the text data. Here, the morpheme reading notation represents the reading morpheme by morpheme and is written, for example, in katakana. The intermediate notation is the morpheme reading notation with prosodic symbols representing prosody added. The prosodic symbols include, for example, symbols expressing accent position, accent strength, pitch height, degree of intonation, speaking rate, volume, and breaks. The intermediate notation with its prosodic symbols removed therefore matches the morpheme reading notation. In addition, in the intermediate notation with its prosodic symbols removed, the information on each part of speech corresponds one-to-one with the morpheme information. In other words, the morpheme reading notation and the morpheme information can be extracted from the intermediate notation and the part-of-speech information.
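The relationship between the two notations can be shown with a short sketch that strips prosodic symbols from an intermediate notation. The symbol set below is an assumption: it contains only the two symbols that appear in this specification's examples (「’」 for a strong accent and 「＝」 for a break), and the function name is hypothetical.

```python
# Assumed symbol set: only the two prosodic symbols used in the examples.
PROSODIC_SYMBOLS = frozenset({"’", "＝"})


def to_reading(intermediate):
    """Remove prosodic symbols, leaving the morpheme reading notation."""
    return "".join(ch for ch in intermediate if ch not in PROSODIC_SYMBOLS)


before = "ソ−イク’フー"    # intermediate notation 111
after = "ソ’−イ＝クフー"   # intermediate notation 112 (user-corrected)
same_reading = to_reading(before) == to_reading(after)
```

Both notations reduce to the same reading, which is why a correction that touches only prosodic symbols cannot be located by comparing readings alone.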

To generate the morpheme reading notation, the intermediate notation, and so on from the input text data, the language processing unit 10 reads the language dictionary and the user dictionary stored in the storage unit 3. Using these dictionaries, the language processing unit 10 performs, for example, morphological analysis and dependency analysis on the text data to determine the order and reading of each word appearing in the text data, the accent positions, and the break positions. If the text data contains a word registered in both the language dictionary and the user dictionary, the language processing unit 10 may give priority to the entry in the user dictionary.
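A dictionary entry with the fields named in the text, and a lookup that prefers the user dictionary, might look like the following. This is a sketch only: the `Entry` class, its field names, the `lookup` function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    surface: str                       # kanji-kana notation
    intermediate: str                  # reading plus prosodic symbols
    pos: str                           # part of speech
    conjugation: Optional[str] = None  # conjugation form, if any


def lookup(surface: str,
           user_dict: Dict[str, Entry],
           language_dict: Dict[str, Entry]) -> Optional[Entry]:
    """Return the user-dictionary entry if present, else the language-dictionary one."""
    entry = user_dict.get(surface)
    return entry if entry is not None else language_dict.get(surface)


language_dict = {"工夫": Entry("工夫", "クフー", "sa-hen noun")}
user_dict = {"工夫": Entry("工夫", "ク’フー", "sa-hen noun")}

chosen = lookup("工夫", user_dict, language_dict)
fallback = lookup("工夫", {}, language_dict)
missing = lookup("創意", user_dict, language_dict)
```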

For morphological analysis, the language processing unit 10 can use, for example, a method based on dynamic programming. For dependency analysis, it can use a parsing technique such as a lookahead LR parser or the LL method. The language processing unit 10 then creates the morpheme reading notation and the intermediate notation according to the order, reading, accent position, and break position of each word.
The language processing unit 10 temporarily stores the generated morpheme reading notation, intermediate notation, and so on in the storage unit 3.
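As a rough illustration of dynamic-programming morphological analysis, the sketch below finds the minimum-cost segmentation of a string into dictionary words. It is deliberately simplified and rests on assumptions: it uses per-word costs only (practical analyzers typically also score connections between adjacent parts of speech), and the function name, cost values, and unknown-character penalty are hypothetical.

```python
def segment(text, costs, unknown_cost=10.0):
    """Minimum-cost segmentation of `text` into words, by dynamic programming.

    `costs` maps a dictionary word to its cost; any single character that is
    not in the dictionary may be used as a one-character word at `unknown_cost`.
    """
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i]: lowest cost to cover text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in costs:
                cost = costs[word]
            elif end - start == 1:
                cost = unknown_cost
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words, pos = [], n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


basic = {"創意": 2.0, "工夫": 2.0, "し": 1.0, "て": 1.0}
with_compound = {**basic, "創意工夫": 3.0}

split_words = segment("創意工夫して", basic)
joined_words = segment("創意工夫して", with_compound)
```

Once the compound 「創意工夫」 is registered with a lower total cost than its two parts, the analysis selects it as a single word.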

The speech synthesis unit 11 creates a synthesized speech signal based on the intermediate notation of the input text data.

Based on the intermediate notation, the speech synthesis unit 11 generates the target prosody to be used when generating the synthesized speech signal. For this purpose, the speech synthesis unit 11 reads a plurality of prosodic models from the storage unit 3. Each prosodic model represents, in time order, the positions where the voice is raised, the positions where it is lowered, and so on. From the prosodic models, the speech synthesis unit 11 selects the one that best matches the accent positions and other features indicated in the intermediate notation. The speech synthesis unit 11 then creates the target prosody by setting, for the intermediate notation, the positions where the voice rises or falls, the intonation, the pitch, and so on, according to the selected prosodic model and the synthesis parameters. The target prosody includes a phoneme length and a pitch frequency for each phoneme, the phoneme being the unit for which a speech waveform is determined. A phoneme can be, for example, a single vowel or a single consonant.

音声合成部１１は、生成した目標韻律に従って、例えば、HMM(Hidden Markov Model)合成方式、音素接続方式またはコーパスベース方式によって合成音声信号を作成する。
例えば、音声合成部１１は、音素ごとに、目標韻律の音素長及びピッチ周波数に最も近い音声波形を、例えばパターンマッチングにより音声波形辞書に登録されている複数の音声波形の中から選択する。そのために、音声合成部１１は、記憶部３から音声波形辞書を読み込む。音声波形辞書は、複数の音声波形及び各音声波形の識別番号を記録する。また音声波形は、例えば、一人以上のナレータが様々なテキストを読み上げた様々な音声を録音した音声信号から、音素単位で取り出された波形信号である。
さらに、音声合成部１１は、音素ごとに選択された音声波形を目標韻律に沿って接続できるようにするため、それら選択された音声波形と目標韻律に示された対応する音素の波形パターンとのずれ量を、波形変換情報として算出してもよい。
音声合成部１１は、音素ごとに選択された音声波形の識別番号を含む波形生成情報を作成する。波形生成情報は、波形変換情報をさらに含んでもよい。 The speech synthesizer 11 creates a synthesized speech signal according to the generated target prosody, for example, by an HMM (Hidden Markov Model) synthesis method, a phoneme connection method, or a corpus-based method.
For example, for each phoneme, the speech synthesizer 11 selects a speech waveform closest to the phoneme length and pitch frequency of the target prosody from a plurality of speech waveforms registered in the speech waveform dictionary by pattern matching, for example. For this purpose, the speech synthesis unit 11 reads a speech waveform dictionary from the storage unit 3. The speech waveform dictionary records a plurality of speech waveforms and an identification number of each speech waveform. The voice waveform is, for example, a waveform signal extracted in units of phonemes from voice signals obtained by recording various voices in which one or more narrators read various texts.
Furthermore, the speech synthesizer 11 connects the selected speech waveform and the waveform pattern of the corresponding phoneme indicated in the target prosody so that the speech waveform selected for each phoneme can be connected along the target prosody. The deviation amount may be calculated as waveform conversion information.
The speech synthesizer 11 creates waveform generation information including the identification number of the speech waveform selected for each phoneme. The waveform generation information may further include waveform conversion information.
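The per-phoneme selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the record types `Waveform` and `Target`, the function names, and the relative-error cost are all hypothetical, and the text only says that the closest waveform is chosen "by pattern matching, for example".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Waveform:
    wave_id: int        # identification number stored in the waveform dictionary
    phoneme: str
    length_ms: float
    pitch_hz: float

@dataclass(frozen=True)
class Target:
    phoneme: str        # one entry of the target prosody
    length_ms: float
    pitch_hz: float

def select_waveform(target: Target, dictionary: list[Waveform]) -> Waveform:
    """Pick the dictionary waveform closest to the target length and pitch."""
    candidates = [w for w in dictionary if w.phoneme == target.phoneme]
    if not candidates:
        raise ValueError(f"no waveform registered for phoneme {target.phoneme!r}")

    def cost(w: Waveform) -> float:
        # Assumed cost: sum of relative errors in length and pitch.
        return (abs(w.length_ms - target.length_ms) / target.length_ms
                + abs(w.pitch_hz - target.pitch_hz) / target.pitch_hz)

    return min(candidates, key=cost)

def build_waveform_generation_info(targets: list[Target],
                                   dictionary: list[Waveform]) -> list[int]:
    """Return the identification numbers of the selected waveforms, in order."""
    return [select_waveform(t, dictionary).wave_id for t in targets]
```

A real implementation would presumably also record the waveform conversion information (the deviation from the target) next to each identification number; that part is omitted here.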

音声合成部１１は、波形生成情報に含まれる各音素の音声波形の識別番号に対応する音声波形信号を記憶部３から読み込む。そして音声合成部１１は、各音声波形信号を連続的に接続することにより、合成音声信号を作成する。なお、波形生成情報に波形変換情報が含まれている場合、音声合成部１１は、各音声波形信号を、対応する音素について求められた波形変換情報に従って補正して音声波形信号を連続的に接続することにより、合成音声信号を作成する。
音声合成部１１は、合成音声信号を出力部５へ出力する。 The speech synthesis unit 11 reads from the storage unit 3 the speech waveform signal corresponding to the speech waveform identification number of each phoneme included in the waveform generation information. The speech synthesis unit 11 then creates a synthesized speech signal by connecting the speech waveform signals in sequence. When the waveform generation information includes waveform conversion information, the speech synthesis unit 11 creates the synthesized speech signal by correcting each speech waveform signal according to the waveform conversion information obtained for the corresponding phoneme and then connecting the corrected speech waveform signals in sequence.
The speech synthesis unit 11 outputs the synthesized speech signal to the output unit 5.

辞書登録部１２は、言語処理部１０が生成した中間表記をユーザが修正したときに、その修正内容をユーザ辞書に登録する。そのために、辞書登録部１２は、編集部２１と、形態素情報設定部２２と、品詞情報設定部２３と、修正範囲設定部２４と、登録範囲設定部２５と、登録部２６とを有する。 When the user corrects the intermediate notation generated by the language processing unit 10, the dictionary registration unit 12 registers the correction content in the user dictionary. For this purpose, the dictionary registration unit 12 includes an editing unit 21, a morpheme information setting unit 22, a part of speech information setting unit 23, a correction range setting unit 24, a registration range setting unit 25, and a registration unit 26.

処理部４は、入力部２から中間表記の編集を行うことを示す操作信号を受け取ると、編集部２１を起動する。 When the processing unit 4 receives an operation signal indicating that intermediate notation editing is to be performed from the input unit 2, the processing unit 4 activates the editing unit 21.

編集部２１は、例えば、編集対象となる中間表記を、対応するテキストデータとともに入力部２が有するディスプレイに表示させる。
そして編集部２１は、入力部２のキーボード等から編集対象の中間表記の一部、例えば、一部の単語の読みまたはアクセントの位置などを修正する操作信号を受け取ると、その操作信号に従って、中間表記を修正する。そして編集部２１は、修正された中間表記を記憶部３に一時的に記憶する。 For example, the editing unit 21 displays the intermediate notation to be edited on the display of the input unit 2 together with the corresponding text data.
When the editing unit 21 receives, from the keyboard or the like of the input unit 2, an operation signal for correcting a part of the intermediate notation being edited, for example the reading of some words or the position of an accent, the editing unit 21 corrects the intermediate notation according to that operation signal. The editing unit 21 then temporarily stores the corrected intermediate notation in the storage unit 3.

なお、音声合成に対する専門知識が無いユーザでも、中間表記を修正できるように、編集部２１は、入力部２が有するディスプレイに、韻律記号を日本語で表示させたり、各形態素の形態素情報を表示させてもよい。またユーザが、例えば、入力部２が有するマウスなどを介して修正する韻律記号を選択したり、韻律記号を追加する位置を指定すると、編集部２１は、例えば、プルダウンメニューなどで、選択可能な韻律記号を表す日本語表記をディスプレイに表示させる。そして編集部２１は、マウスなどを介して選択された日本語表記に対応する韻律記号で、中間表記の指定された位置の韻律記号を置換したり、選択された日本語表記に対応する韻律記号をその指定された位置に自動的に追加する。
また編集部２１は、ユーザが自分で修正した内容を把握できるようにするために、修正後の中間表記をディスプレイに表記させてもよい。さらに、編集部２１は、修正後の中間表記を音声合成部１１に入力することにより、修正後の中間表記に対して実際に生成される合成音声をスピーカ６から出力させることで、ユーザに修正内容を確認させてもよい。 So that even a user without expertise in speech synthesis can correct the intermediate notation, the editing unit 21 may display the prosodic symbols in Japanese, or display the morpheme information of each morpheme, on the display of the input unit 2. When the user selects a prosodic symbol to correct, or designates a position at which to add a prosodic symbol, for example via a mouse of the input unit 2, the editing unit 21 displays Japanese labels for the selectable prosodic symbols on the display, for example in a pull-down menu. The editing unit 21 then replaces the prosodic symbol at the designated position in the intermediate notation with the prosodic symbol corresponding to the Japanese label selected via the mouse or the like, or automatically adds the prosodic symbol corresponding to the selected Japanese label at the designated position.
The editing unit 21 may also display the corrected intermediate notation on the display so that the user can see what he or she has corrected. Furthermore, the editing unit 21 may let the user check the correction by inputting the corrected intermediate notation to the speech synthesis unit 11 and having the synthesized speech actually generated for the corrected intermediate notation output from the speaker 6.

図４は、言語処理部１０により出力された中間表記と、ユーザにより修正された中間表記の一例を示す図である。文字列４０１及び４０２は、それぞれ、入力されたテキストデータから言語処理部１０が生成した形態素読み表記及び中間表記を表す。この例では、形態素読み表記４０１中に、６個の形態素「ヨーカン」（普通名詞）、「サイチュー」（普通名詞）、「ワ」（助詞）、「ウツクシ」（形容詞（語幹））、「ク」（形容詞（活用語尾））及び「，」（記号（読点））が含まれている。そして形態素読み表記４０１に対応する中間表記４０２では、上記の６個の形態素に対応する６個の品詞の他、助詞「ワ」と形容詞「ウツクシ」の間に挿入された、区切りを表す句の記号「＝」が含まれている。 FIG. 4 is a diagram illustrating an example of the intermediate notation output by the language processing unit 10 and the intermediate notation corrected by the user. Character strings 401 and 402 represent, respectively, the morpheme reading notation and the intermediate notation generated by the language processing unit 10 from the input text data. In this example, the morpheme reading notation 401 contains six morphemes: “Yokan” (common noun), “Saichu” (common noun), “wa” (particle), “Utsukushi” (adjective (stem)), “ku” (adjective (conjugation ending)), and “,” (symbol (comma)). The intermediate notation 402 corresponding to the morpheme reading notation 401 contains, in addition to the six parts of speech corresponding to these six morphemes, a phrase-boundary symbol “=” inserted between the particle “wa” and the adjective “Utsukushi”.

また、文字列４０３は、ユーザが修正した後の中間表記を表す。中間表記４０３では、修正前の中間表記４０２の２番目の品詞「サ’イチュー」が、「モ＊ナカ」に変更されている。さらに、５番目の品詞「ウツクシ％’」が、「ウツク’シ％」に変更されている。なお、中間表記４０２、４０３に含まれる記号「’」、「＊」、「％」などは、韻律記号である。ここでは、韻律記号「’」は、「アクセント強」を表し、韻律記号「＊」は、「アクセント弱」を表す。なお、中間表記４０３では、ユーザは個々の品詞を指定していないので、ユーザが中間表記を修正した時点では、中間表記４０３に含まれる個々の品詞及び各品詞に相当する中間表記の範囲は不明である。 A character string 403 represents the intermediate notation after correction by the user. In the intermediate notation 403, the second part of speech “Sa'ichu” of the pre-correction intermediate notation 402 has been changed to “Mo*naka”. In addition, the fifth part of speech “Utsukushi%'” has been changed to “Utsuku'shi%”. The symbols “'”, “*”, “%”, and the like contained in the intermediate notations 402 and 403 are prosodic symbols. Here, the prosodic symbol “'” represents a strong accent, and the prosodic symbol “*” represents a weak accent. Because the user does not specify individual parts of speech in the intermediate notation 403, the individual parts of speech contained in the intermediate notation 403, and the range of the intermediate notation corresponding to each part of speech, are unknown at the time the user corrects the intermediate notation.

形態素情報設定部２２は、ユーザが修正した後の中間表記に含まれる各形態素の情報を設定する。そのために、形態素情報設定部２２は、ユーザが修正する前の中間表記と、ユーザが修正した後の中間表記を記憶部３から読み込む。そして形態素情報設定部２２は、ユーザが修正した後の中間表記から韻律記号を除去することで、ユーザが修正した後の中間表記に対応する修正後形態素読み表記を生成する。 The morpheme information setting unit 22 sets information of each morpheme included in the intermediate notation after being corrected by the user. Therefore, the morpheme information setting unit 22 reads from the storage unit 3 the intermediate notation before correction by the user and the intermediate notation after correction by the user. Then, the morpheme information setting unit 22 generates a corrected morpheme reading notation corresponding to the intermediate notation corrected by the user by removing the prosodic symbols from the intermediate notation corrected by the user.

図５は、図４の中間表記に対応する、ユーザ修正前の形態素読み表記とユーザ修正後の形態素読み表記を表す図である。図５に示されたユーザ修正前の形態素読み表記５０１は、図４に示された形態素読み表記４０１と同一である。また、ユーザ修正後の形態素読み表記５０２は、図４に示されたユーザ修正後の中間表記４０３から、韻律記号を除去したものである。形態素読み表記５０１と形態素読み表記５０２を比較すると、修正前の形態素「サイチュー」と、修正後の形態素読み表記中の「モナカ」が一致しないことが分かる。 FIG. 5 is a diagram illustrating the morpheme reading notation before the user correction and the morpheme reading notation after the user correction, corresponding to the intermediate notation of FIG. 4. The morpheme reading notation 501 before the user correction shown in FIG. 5 is the same as the morpheme reading notation 401 shown in FIG. 4. The morpheme reading notation 502 after the user correction is obtained by removing the prosodic symbols from the user-corrected intermediate notation 403 shown in FIG. 4. Comparing the morpheme reading notation 501 with the morpheme reading notation 502 shows that the pre-correction morpheme “Saichu” does not match “Monaka” in the post-correction morpheme reading notation.
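The derivation of the post-correction morpheme reading notation from the corrected intermediate notation can be sketched as below. The symbol set is an assumption assembled from the symbols quoted in this description (「’」, 「＊」, 「％」 and the phrase-boundary symbol 「＝」); an actual system may define further prosodic symbols. The sample string is likewise put together from the tokens quoted for FIG. 4 and is illustrative only.

```python
# Assumed symbol set: the description names 「’」, 「＊」, 「％」 as prosodic
# symbols and 「＝」 as a phrase-boundary symbol; more may exist in practice.
PROSODIC_SYMBOLS = frozenset("’＊％＝")

def strip_prosodic_symbols(intermediate: str) -> str:
    """Remove prosodic symbols, leaving only the reading characters."""
    return "".join(ch for ch in intermediate if ch not in PROSODIC_SYMBOLS)

# Illustrative corrected intermediate notation built from the tokens in FIG. 4.
corrected = "ヨーカンモ＊ナカワ＝ウツク’シ％ク，"
reading = strip_prosodic_symbols(corrected)
print(reading)
```

The resulting string carries no morpheme boundaries, which is why the subsequent matching against the pre-correction morphemes is needed.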

この不一致部分を見つけるために、形態素情報設定部２２は、ユーザ修正後の形態素読み表記とユーザ修正前の形態素読み表記との間で、例えば、動的計画法を用いたマッチング処理（ＤＰマッチング）を実行する。そして形態素情報設定部２２は、ユーザ修正後の形態素読み表記とユーザ修正前の形態素読み表記との間で、形態素単位で一致する部分と一致しない部分を特定する。そして形態素情報設定部２２は、ユーザ修正後の形態素読み表記中で、ユーザ修正前の形態素と一致する部分を、そのユーザ修正前の形態素と同一の種別の形態素とする。また、形態素情報設定部２２は、ユーザ修正後の形態素読み表記中で、ユーザ修正前の形態素と一致しない部分が、ユーザ修正前の一形態素に相当する場合も、その一致しない部分を、対応するユーザ修正前の形態素と同じ種別の形態素とする。すなわち、形態素情報設定部２２は、ユーザ修正前の形態素に１対１に対応する部分がユーザ修正後の形態素読み表記中に含まれる限り、その部分を、対応するユーザ修正前の形態素と同じ種別の形態素に設定する。 To find this mismatched portion, the morpheme information setting unit 22 performs, for example, a matching process using dynamic programming (DP matching) between the morpheme reading notation after the user correction and the morpheme reading notation before the user correction. The morpheme information setting unit 22 thereby identifies, morpheme by morpheme, the portions that match and the portions that do not match between the two notations. The morpheme information setting unit 22 then treats each portion of the post-correction morpheme reading notation that matches a pre-correction morpheme as a morpheme of the same type as that pre-correction morpheme. Likewise, when a portion of the post-correction morpheme reading notation that does not match any pre-correction morpheme corresponds to a single pre-correction morpheme, the morpheme information setting unit 22 treats that mismatched portion as a morpheme of the same type as the corresponding pre-correction morpheme. In other words, as long as the post-correction morpheme reading notation contains a portion that corresponds one-to-one to a pre-correction morpheme, the morpheme information setting unit 22 sets that portion to a morpheme of the same type as the corresponding pre-correction morpheme.

図５では、ユーザ修正後の形態素読み表記５０２中の「ヨーカン」、「モナカ」、「ワ」、「ウツクシ」、「ク」、「，」が、それぞれ、ユーザ修正前の形態素読み表記５０１中の各形態素「ヨーカン」、「サイチュー」、「ワ」、「ウツクシ」、「ク」、「，」に対応する。したがって、ユーザ修正後の形態素読み表記５０２中の「ヨーカン」、「モナカ」、「ワ」、「ウツクシ」、「ク」、「，」が、それぞれ、ユーザ修正前の対応する形態素と同じ種別の形態素に設定される。例えば、「ヨーカン」は普通名詞となり、「ウツクシ」は形容詞の語幹となる。また、「モナカ」と一致する形態素は、ユーザ修正前の形態素読み表記５０１には含まれないが、「モナカ」の前後の表記により、「モナカ」が形態素「サイチュー」に対して１対１に対応することが分かる。そこで、形態素情報設定部２２は、「モナカ」の品詞を形態素「サイチュー」と同じ普通名詞とする。 In FIG. 5, “Yokan”, “Monaka”, “wa”, “Utsukushi”, “ku”, and “,” in the morpheme reading notation 502 after the user correction correspond, respectively, to the morphemes “Yokan”, “Saichu”, “wa”, “Utsukushi”, “ku”, and “,” in the morpheme reading notation 501 before the user correction. Accordingly, “Yokan”, “Monaka”, “wa”, “Utsukushi”, “ku”, and “,” in the morpheme reading notation 502 are each set to a morpheme of the same type as the corresponding pre-correction morpheme. For example, “Yokan” becomes a common noun and “Utsukushi” becomes an adjective stem. No morpheme matching “Monaka” is contained in the morpheme reading notation 501 before the user correction, but the notation before and after “Monaka” shows that “Monaka” corresponds one-to-one to the morpheme “Saichu”. The morpheme information setting unit 22 therefore sets the part of speech of “Monaka” to common noun, the same as that of the morpheme “Saichu”.

しかし、ユーザ修正後の形態素読み表記のうち、ユーザ修正前の形態素読み表記と一致しない部分に、ユーザ修正前の形態素中の複数の形態素が対応することがある。この場合、形態素情報設定部２２は、その一致しない部分に対応する複数の形態素を、連続する同一種別の形態素ごとにグループ化する。ただし、本実施形態では、形態素情報設定部２２は、その形態素が名詞である場合には、普通名詞、サ変名詞といった名詞の分類は無視して同一種別の形態素として扱う。例えば、不一致部分に、普通名詞「ソーイ」とサ変名詞「クフー」が連続して含まれている場合、形態素情報設定部２２は、その二つの名詞「ソーイ」、「クフー」をまとめた「ソーイクフー」を一つのグループとする。 However, a portion of the post-correction morpheme reading notation that does not match the pre-correction morpheme reading notation may correspond to a plurality of pre-correction morphemes. In this case, the morpheme information setting unit 22 groups the plurality of morphemes corresponding to the mismatched portion into runs of consecutive morphemes of the same type. In the present embodiment, however, when the morphemes are nouns, the morpheme information setting unit 22 ignores noun subclasses such as common noun and sa-hen noun (a noun that forms a verb with “suru”) and treats them as morphemes of the same type. For example, when the mismatched portion contains the common noun “Soi” followed by the sa-hen noun “Kufu”, the morpheme information setting unit 22 combines the two nouns into “Soikufu” and treats it as one group.
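The grouping of consecutive morphemes of the same type, with noun subclasses collapsed, might be sketched as follows. The type labels and the `(reading, type)` tuple layout are hypothetical choices made for this illustration.

```python
from itertools import groupby

# Assumed labels for the noun subclasses that are treated as one type.
NOUN_TYPES = {"common noun", "sa-hen noun", "proper noun"}

def coarse_type(morpheme_type: str) -> str:
    """Collapse all noun subclasses into the single type 'noun'."""
    return "noun" if morpheme_type in NOUN_TYPES else morpheme_type

def group_morphemes(morphemes):
    """Split a list of (reading, type) pairs into runs of the same coarse type."""
    return [list(run)
            for _, run in groupby(morphemes, key=lambda m: coarse_type(m[1]))]

mismatched = [("ソーイ", "common noun"), ("クフー", "sa-hen noun")]
groups = group_morphemes(mismatched)
# Both nouns land in one group, i.e. the compound 「ソーイクフー」.
print(["".join(reading for reading, _ in g) for g in groups])
```

`itertools.groupby` only merges adjacent items, which matches the requirement that the grouped morphemes be consecutive.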

一致しない部分に対応する、グループに含まれる形態素が一つである場合は、上記の一つの形態素のみが対応する場合と同様に、修正の前後で１対１に対応している。そこで形態素情報設定部２２は、ユーザ修正後の形態素読み表記中のその不一致部分の形態素を、対応する形態素のグループと同じ種別の形態素とする。 When there is one morpheme included in the group corresponding to the non-matching part, it corresponds one-to-one before and after the correction, as in the case where only one morpheme corresponds. Therefore, the morpheme information setting unit 22 sets the morpheme of the mismatched part in the morpheme reading notation after the user correction as the morpheme of the same type as the corresponding morpheme group.

一方、一致しない部分に対応するグループに複数の形態素が含まれる場合、その形態素のグループが名詞でなければ、形態素情報設定部２２は、そのグループの形態素の種別を、そのままユーザ修正後の不一致部分の形態素の種別とする。これは、以下の理由による。
通常、形態素のグループが名詞でない場合、その形態素のグループの形態素の種別は、非活用自立語、活用自立語、付属語に大別されるが、いずれも中間表記の読み修正がほぼ発生しない。極稀に、動詞「通った(とおった、かよった)」、「行った(いった、おこなった)」などの同表記異読語が存在するが、形態素単位では、語幹部分のみが読み修正の対象となり、活用語尾には読み修正が発生しない。すなわち、活用語尾は形態素単位で必ず一致する。このことから、事実上、グループに複数の形態素が含まれるケースは、その形態素が名詞である場合以外にない。したがって、複数形態素で構成されるグループとしては、名詞のみが考慮されればよい。 On the other hand, when the group corresponding to the mismatched portion contains a plurality of morphemes and the group is not a noun group, the morpheme information setting unit 22 uses the morpheme types of the group, as they are, as the morpheme types of the post-correction mismatched portion. This is for the following reason.
Normally, when a morpheme group is not a noun group, the types of its morphemes fall broadly into non-conjugating independent words, conjugating independent words, and attached words, and for all of these the reading in the intermediate notation almost never needs correction. Very rarely there are words written the same way but read differently, such as the verbs 「通った」 (tootta / kayotta) and 「行った」 (itta / okonatta); at the morpheme level, however, only the stem is subject to reading correction, and no reading correction occurs in the conjugation ending. That is, conjugation endings always match at the morpheme level. Consequently, in practice a group contains a plurality of morphemes only when those morphemes are nouns. Therefore, only nouns need be considered as groups composed of a plurality of morphemes.

一つの実施例として、形態素のグループが複数の形態素で構成される場合、不一致部分に対する形態素の種別の設定は、以下の三つのケースに分類される。
（ケース１）形態素のグループに含まれる複数の形態素が全て普通名詞である場合
例えば、グループに含まれる形態素が「羊羹」「最中」である場合がケース１に相当する。この場合、形態素情報設定部２２は、その複数の形態素を一つの形態素に統合し、すなわち、複合名詞化する。そして形態素情報設定部２２は、不一致部分の形態素の種別を「普通名詞」に設定する。
（ケース２）形態素のグループに含まれる複数の形態素のうちの最後の形態素がサ変名詞（例えば）である場合
例えば、グループに含まれる形態素が「創意」「工夫」である場合がケース２に相当する。この場合、形態素情報設定部２２は、その複数の形態素を一つの形態素に統合し、すなわち、複合名詞化する。そして形態素情報設定部２２は、不一致部分の形態素の種別を「サ変名詞」に設定する。
（ケース３）形態素のグループに含まれる複数の形態素の種別が上記のケース以外の場合
この場合、形態素情報設定部２２は、その複数の形態素を一つの形態素に統合し、すなわち、複合名詞化する。そして形態素情報設定部２２は、不一致部分の形態素の種別を「固有名詞」に設定する。一般に、読み誤りは固有名詞(人名、地名など)に多く、確率上、固有名詞で定義することが好ましい。 As one example, when a morpheme group is composed of a plurality of morphemes, the setting of the morpheme type for the mismatched portion falls into the following three cases.
(Case 1) All of the plurality of morphemes contained in the morpheme group are common nouns
For example, the case where the morphemes contained in the group are 「羊羹」 (yokan) and 「最中」 (monaka) corresponds to Case 1. In this case, the morpheme information setting unit 22 integrates the plurality of morphemes into one morpheme, that is, forms a compound noun. The morpheme information setting unit 22 then sets the morpheme type of the mismatched portion to “common noun”.
(Case 2) The last of the plurality of morphemes contained in the morpheme group is a sa-hen noun (for example)
For example, the case where the morphemes contained in the group are 「創意」 (soi) and 「工夫」 (kufu) corresponds to Case 2. In this case, the morpheme information setting unit 22 integrates the plurality of morphemes into one morpheme, that is, forms a compound noun. The morpheme information setting unit 22 then sets the morpheme type of the mismatched portion to “sa-hen noun”.
(Case 3) The types of the plurality of morphemes contained in the morpheme group fall under neither of the above cases
In this case, the morpheme information setting unit 22 integrates the plurality of morphemes into one morpheme, that is, forms a compound noun. The morpheme information setting unit 22 then sets the morpheme type of the mismatched portion to “proper noun”. In general, reading errors occur most often in proper nouns (personal names, place names, and the like), so defining the portion as a proper noun is the more probable choice.
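The three cases above reduce to a short decision function. The sketch below assumes the group has already been identified as a noun group with two or more morphemes; the string labels and the function name are hypothetical.

```python
def merged_noun_type(group_types: list[str]) -> str:
    """Type assigned to a mismatched portion that replaces several nouns.

    group_types lists the noun subclass of each pre-correction morpheme in
    the group, in order. The group is assumed to contain nouns only.
    """
    if len(group_types) < 2:
        raise ValueError("the three cases apply only to groups of two or more morphemes")
    if all(t == "common noun" for t in group_types):
        return "common noun"      # Case 1
    if group_types[-1] == "sa-hen noun":
        return "sa-hen noun"      # Case 2
    return "proper noun"          # Case 3

print(merged_noun_type(["common noun", "common noun"]))   # 羊羹 + 最中
print(merged_noun_type(["common noun", "sa-hen noun"]))   # 創意 + 工夫
```

The order of the checks matters: a group whose last member is a sa-hen noun is tested only after the all-common-noun case has been ruled out, and everything left over falls through to proper noun.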

図６は、ユーザが修正する前の形態素読み表記とユーザが修正した後の形態素読み表記の他の一例を示す図である。この例では、入力されたテキストデータ「ＩａａＳは、」に対する言語処理の結果、ユーザ修正前の形態素読み表記６０１は、６個の形態素の集合となっている。具体的には、形態素読み表記６０１には、形態素「アイ」（普通名詞）、「エー」（普通名詞）、「エー」（普通名詞）、「エス」（普通名詞）、「ワ」（助詞）、「，」（読点）が含まれる。一方、ユーザ修正後の形態素読み表記６０２は、「イアースワ，」となっている。そのため、形態素情報設定部２２は、ユーザ修正後の形態素読み表記６０２中の不一致部分「イアース」に、ユーザ修正前の形態素読み表記６０１中の４個の形態素「アイ」、「エー」、「エー」、「エス」が対応していることが分かる。この４個の形態素は、すべて名詞であるため、一つの形態素のグループに含まれる。さらに、この４個の形態素の名詞の種別は全て普通名詞であるため、図６の例は、上記のケース１に該当する。そのため、形態素情報設定部２２は、形態素読み表記６０３に示されるように、この４個の形態素を一つの形態素に統合し、ユーザ修正後の形態素読み表記中の対応する不一致部分の形態素の種別を普通名詞とする。 FIG. 6 is a diagram illustrating another example of the morpheme reading notation before correction by the user and the morpheme reading notation after correction by the user. In this example, as a result of language processing on the input text data 「ＩａａＳは、」, the morpheme reading notation 601 before the user correction is a set of six morphemes. Specifically, the morpheme reading notation 601 contains the morphemes “Ai” (common noun), “Ē” (common noun), “Ē” (common noun), “Esu” (common noun), “wa” (particle), and “,” (comma). The morpheme reading notation 602 after the user correction, on the other hand, is “Iāsuwa,”. The morpheme information setting unit 22 therefore finds that the mismatched portion “Iāsu” in the morpheme reading notation 602 corresponds to the four morphemes “Ai”, “Ē”, “Ē”, and “Esu” in the morpheme reading notation 601. Since these four morphemes are all nouns, they belong to one morpheme group. Furthermore, since the noun types of these four morphemes are all common noun, the example of FIG. 6 falls under Case 1 above. Therefore, as shown in the morpheme reading notation 603, the morpheme information setting unit 22 integrates the four morphemes into one morpheme and sets the morpheme type of the corresponding mismatched portion in the post-correction morpheme reading notation to common noun.

図７は、形態素情報設定部２２により実行される形態素情報設定処理の動作フローチャートである。
形態素情報設定部２２は、修正後の中間表記から修正後の形態素読み表記を導出する（ステップＳ１０１）。そして形態素情報設定部２２は、修正前後の形態素読み表記間のマッチングにより一致部分及び不一致部分を特定する（ステップＳ１０２）。 FIG. 7 is an operation flowchart of the morpheme information setting process executed by the morpheme information setting unit 22.
The morpheme information setting unit 22 derives a corrected morpheme reading notation from the corrected intermediate notation (step S101). Then, the morpheme information setting unit 22 specifies a matching part and a mismatching part by matching between morpheme reading notations before and after correction (step S102).

形態素情報設定部２２は、修正後の形態素読み表記中の一致部分に修正前の形態素読み表記中の対応する形態素の範囲及び種別を設定する（ステップＳ１０３）。
一方、形態素情報設定部２２は、修正後の形態素読み表記中の不一致部分に対応する形態素は一つか否か判定する（ステップＳ１０４）。
不一致部分に対応する形態素が一つである場合（ステップＳ１０４−Ｙｅｓ）、形態素情報設定部２２は、不一致部分に修正前の形態素読み表記中の対応する形態素の範囲及び種別を設定する（ステップＳ１０５）。一方、不一致部分に対応する形態素が複数である場合（ステップＳ１０４−ｎｏ）、形態素情報設定部２２は、修正前の形態素読み表記中の不一致部分を連続する同一種別の形態素ごとにグループ化する（ステップＳ１０６）。そして形態素情報設定部２２は、グループに含まれる形態素は一つか否か判定する（ステップＳ１０７）。グループに含まれる形態素が一つである場合（ステップＳ１０７−Ｙｅｓ）、形態素情報設定部２２は、不一致部分に修正前の形態素読み表記中の対応する形態素の範囲及び種別を設定する（ステップＳ１０５）。一方、グループに含まれる形態素が一つでない場合（ステップＳ１０７−ｎｏ）、形態素情報設定部２２は、グループに含まれる形態素が全て普通名詞か否か判定する（ステップＳ１０８）。 The morpheme information setting unit 22 sets the range and type of the corresponding morpheme in the morpheme reading notation before correction to the matching part in the corrected morpheme reading notation (step S103).
On the other hand, the morpheme information setting unit 22 determines whether there is one morpheme corresponding to the mismatched part in the corrected morpheme reading notation (step S104).
When there is one morpheme corresponding to the mismatched portion (step S104-Yes), the morpheme information setting unit 22 sets, for the mismatched portion, the range and type of the corresponding morpheme in the pre-correction morpheme reading notation (step S105). On the other hand, when a plurality of morphemes correspond to the mismatched portion (step S104-No), the morpheme information setting unit 22 groups the morphemes of the mismatched portion in the pre-correction morpheme reading notation into runs of consecutive morphemes of the same type (step S106). The morpheme information setting unit 22 then determines whether the group contains exactly one morpheme (step S107). When the group contains one morpheme (step S107-Yes), the morpheme information setting unit 22 sets, for the mismatched portion, the range and type of the corresponding morpheme in the pre-correction morpheme reading notation (step S105). On the other hand, when the group contains more than one morpheme (step S107-No), the morpheme information setting unit 22 determines whether all the morphemes contained in the group are common nouns (step S108).

グループに含まれる形態素が全て普通名詞である場合（ステップＳ１０８−Ｙｅｓ）、形態素情報設定部２２は、修正後の形態素読み表記中の不一致部分全体を一つの普通名詞に設定する（ステップＳ１０９）。一方、グループに含まれる形態素の何れかが普通名詞でない場合（ステップＳ１０８−ｎｏ）、形態素情報設定部２２は、グループに含まれる最後の形態素がサ変名詞か否か判定する（ステップＳ１１０）。グループに含まれる最後の形態素がサ変名詞である場合（ステップＳ１１０−Ｙｅｓ）、形態素情報設定部２２は、修正後の形態素読み表記中の不一致部分全体を一つのサ変名詞に設定する（ステップＳ１１１）。一方、グループに含まれる最後の形態素がサ変名詞でない場合（ステップＳ１１０−ｎｏ）、形態素情報設定部２２は、修正後の形態素読み表記中の不一致部分全体を一つの固有名詞に設定する（ステップＳ１１２）。
ステップＳ１０５、Ｓ１０９、Ｓ１１１またはＳ１１２の後、形態素情報設定部２２は、形態素情報設定処理を終了する。形態素情報設定部２２は、この手順により、修正後の形態素読み表記中の各形態素の品詞を適切に設定できる。 When all the morphemes contained in the group are common nouns (step S108-Yes), the morpheme information setting unit 22 sets the entire mismatched portion in the corrected morpheme reading notation as one common noun (step S109). On the other hand, when any of the morphemes contained in the group is not a common noun (step S108-No), the morpheme information setting unit 22 determines whether the last morpheme contained in the group is a sa-hen noun (step S110). When the last morpheme contained in the group is a sa-hen noun (step S110-Yes), the morpheme information setting unit 22 sets the entire mismatched portion in the corrected morpheme reading notation as one sa-hen noun (step S111). On the other hand, when the last morpheme contained in the group is not a sa-hen noun (step S110-No), the morpheme information setting unit 22 sets the entire mismatched portion in the corrected morpheme reading notation as one proper noun (step S112).
After steps S105, S109, S111, or S112, the morpheme information setting unit 22 ends the morpheme information setting process. By this procedure, the morpheme information setting unit 22 can appropriately set the part of speech of each morpheme in the corrected morpheme reading notation.

形態素情報設定部２２は、ユーザ修正後の形態素読み表記について設定した形態素情報を、品詞情報設定部２３へ通知する。 The morpheme information setting unit 22 notifies the part-of-speech information setting unit 23 of the morpheme information set for the morpheme reading notation after the user correction.

品詞情報設定部２３は、ユーザ修正後の形態素読み表記中の各形態素の情報に基づいて、ユーザ修正後の中間表記中の各品詞の情報を設定する。形態素読み表記と中間表記の関係上、ユーザ修正後の形態素読み表記とユーザ修正後の中間表記とは、韻律記号を除いて１対１に対応している。そこで、品詞情報設定部２３は、ユーザ修正後の中間表記に含まれる各品詞の範囲を、ユーザ修正後の形態素読み表記の読みが一致する部分から特定し、各品詞の種別を、ユーザ修正後の形態素読み表記中の対応する形態素の種別に設定する。なお、韻律記号のみを含む品詞、例えば、句切りを表す韻律記号「＝」のみを含む品詞に関しては、品詞情報設定部２３は、ユーザ修正後の中間表記における品詞を、ユーザ修正前の中間表記における対応する品詞に設定すればよい。 The part-of-speech information setting unit 23 sets information on each part-of-speech in the intermediate notation after the user correction based on the information on each morpheme in the morpheme reading notation after the user correction. Due to the relationship between the morpheme reading notation and the intermediate notation, the morpheme reading notation after the user correction and the intermediate notation after the user correction have a one-to-one correspondence except for the prosodic symbols. Therefore, the part-of-speech information setting unit 23 specifies the range of each part-of-speech included in the intermediate notation after the user correction from the portion where the readings of the morpheme reading notation after the user correction match, and sets the type of each part-of-speech after the user correction To the corresponding morpheme type in the morpheme reading notation. For the part of speech including only the prosodic symbol, for example, the part of speech including only the prosodic symbol “=” indicating punctuation, the part of speech information setting unit 23 uses the intermediate notation before the user correction as the part of speech in the intermediate notation after the user correction. May be set to the corresponding part of speech.

図８は、ユーザ修正の前後のそれぞれの中間表記と、ユーザ修正後の形態素読み表記との関係の一例を示す図である。ユーザ修正後の中間表記８０２とユーザ修正後の形態素読み表記８０３とでは、韻律記号以外は一致している。そのため、形態素読み表記８０３に含まれる各形態素「ヨーカン」、「モナカ」、「ワ」、「ウツクシ」、「ク」及び「，」が、それぞれ、中間表記８０２の品詞「ヨーカン」、「モ＊ナカ」、「ワ」、「ウツク’シ％」、「ク」及び「，」に対応していることが分かる。そのため、各品詞「ヨーカン」、「モ＊ナカ」、「ワ」、「ウツク’シ％」、「ク」及び「，」の種別が、それぞれ形態素読み表記８０３中の対応する形態素の種別である、普通名詞、普通名詞、助詞（副）、形容詞（語幹）、形容詞（活用語尾）及び読点に設定される。また、形態素読み表記８０３中に対応する部分が無い、韻律記号「＝」は、ユーザ修正前の中間表記８０１中の対応する韻律記号「＝」の品詞である記号（句）に設定される。
品詞情報設定部２３は、ユーザ修正後の中間表記中の各品詞の情報を修正範囲設定部２４へ通知する。 FIG. 8 is a diagram illustrating an example of the relationship between the intermediate notations before and after the user correction and the morpheme reading notation after the user correction. The intermediate notation 802 after the user correction and the morpheme reading notation 803 after the user correction are identical except for the prosodic symbols. It can therefore be seen that the morphemes “Yokan”, “Monaka”, “wa”, “Utsukushi”, “ku”, and “,” contained in the morpheme reading notation 803 correspond, respectively, to the parts of speech “Yokan”, “Mo*naka”, “wa”, “Utsuku'shi%”, “ku”, and “,” of the intermediate notation 802. Accordingly, the types of the parts of speech “Yokan”, “Mo*naka”, “wa”, “Utsuku'shi%”, “ku”, and “,” are set to the types of the corresponding morphemes in the morpheme reading notation 803, namely common noun, common noun, particle (adverbial), adjective (stem), adjective (conjugation ending), and comma. The prosodic symbol “=”, which has no corresponding portion in the morpheme reading notation 803, is set to symbol (phrase), which is the part of speech of the corresponding prosodic symbol “=” in the intermediate notation 801 before the user correction.
The part of speech information setting unit 23 notifies the correction range setting unit 24 of information on each part of speech in the intermediate notation after the user correction.

修正範囲設定部２４は、ユーザ修正後の中間表記から、品詞単位でユーザが修正した範囲を設定する。そのために、修正範囲設定部２４は、ユーザ修正後の中間表記とユーザ修正前の中間表記との間で、品詞単位でマッチング処理、例えば、ＤＰマッチングを行って、一致しない品詞を特定する。そして修正範囲設定部２４は、ユーザ修正後の中間表記に含まれる品詞のうち、ユーザ修正前の中間表記の品詞と一致しない品詞を修正範囲とする。このように、品詞単位で修正範囲を設定することにより、修正範囲設定部２４は、修正範囲を、ユーザが修正を意図した範囲に適切に設定できる。 The correction range setting unit 24 sets, from the intermediate notation after the user correction, the range corrected by the user in units of parts of speech. For this purpose, the correction range setting unit 24 performs a matching process, for example DP matching, in units of parts of speech between the intermediate notation after the user correction and the intermediate notation before the user correction, and identifies the parts of speech that do not match. The correction range setting unit 24 then takes, as the correction range, those parts of speech in the post-correction intermediate notation that do not match the parts of speech of the pre-correction intermediate notation. By setting the correction range in units of parts of speech in this way, the correction range setting unit 24 can set the correction range appropriately to the range the user intended to correct.
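The part-of-speech-level matching can be illustrated with Python's standard `difflib.SequenceMatcher`. Note that this is a stand-in: `SequenceMatcher` uses a longest-matching-block heuristic rather than the DP matching named in the text, and the token lists below are assembled from the tokens quoted for FIG. 4 and FIG. 9, so they are illustrative only. The function name is hypothetical.

```python
from difflib import SequenceMatcher

def correction_range(before: list[str], after: list[str]) -> list[int]:
    """Indices of tokens in `after` that do not match any token in `before`."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' are the opcodes that contribute tokens to `after`.
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed

before = ["ヨーカン", "サ’イチュー", "ワ", "＝", "ウツクシ％’", "ク", "，"]
after = ["ヨーカン", "モ＊ナカ", "ワ", "＝", "ウツク’シ％", "ク", "，"]
print([after[i] for i in correction_range(before, after)])
```

Because whole tokens are compared, a token in which only the accent symbol moved (「ウツクシ％’」 to 「ウツク’シ％」) is reported in its entirety, which mirrors the behaviour described for the correction range.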

図９は、ユーザ修正の前後のそれぞれの中間表記と、設定される修正範囲との関係の一例を示す図である。図９において、ユーザ修正前の中間表記９０１に対して、ユーザ修正後の中間表記９０２では、品詞「サ’イチュー」が品詞「モ＊ナカ」に修正されている。そのため、矢印９０３に示されるように、品詞「モ＊ナカ」が修正範囲として設定される。
さらに、修正前の品詞「ウツクシ％’」について、修正後では「ウツク’シ％」となり、アクセント強を表す韻律記号の位置が修正されている。この場合も、韻律記号だけでなく、矢印９０４に示されるように、品詞「ウツク’シ％」全体が修正範囲として設定される。
修正範囲設定部２４は、設定した修正範囲を登録範囲設定部２５へ通知する。 FIG. 9 is a diagram illustrating an example of the relationship between the intermediate notations before and after the user correction and the correction range that is set. In FIG. 9, the part of speech “Sa'ichu” in the intermediate notation 901 before the user correction has been corrected to the part of speech “Mo*naka” in the intermediate notation 902 after the user correction. Therefore, as indicated by the arrow 903, the part of speech “Mo*naka” is set as a correction range.
Furthermore, the pre-correction part of speech “Utsukushi%'” becomes “Utsuku'shi%” after the correction; that is, the position of the prosodic symbol representing a strong accent has been corrected. In this case as well, not just the prosodic symbol but the entire part of speech “Utsuku'shi%” is set as a correction range, as indicated by the arrow 904.
The correction range setting unit 24 notifies the registration range setting unit 25 of the set correction range.

登録範囲設定部２５は、ユーザ修正後の中間表記において設定された修正範囲から、ユーザ辞書に登録する範囲を設定する。連続する名詞のうちの一部をユーザが修正した場合に、その修正された名詞についてのみ、修正内容をユーザ登録すると、図１について上述した「最中」のように、修正された名詞が他の文脈で使用される場合に誤った中間表記に変換されるおそれがある。このような場合、ユーザ辞書には、名詞「最中」を、その名詞に前置された名詞「羊羹」まで含めて、「羊羹最中」として登録しておくことで、「最中」が「サイチュー」の読みで使用される場合に、誤って「モナカ」とされることを防止できる。そこで、本実施形態では、登録範囲設定部２５は、修正範囲に含まれる品詞が名詞である場合、その名詞と連続する名詞も登録範囲に含まれるように登録範囲に設定する。 The registration range setting unit 25 sets the range to be registered in the user dictionary on the basis of the correction range set in the intermediate notation after the user correction. When the user corrects only some of a run of consecutive nouns and only the corrected noun is registered, the corrected noun may be converted into an incorrect intermediate notation when it is used in another context, as with 「最中」 described above with reference to FIG. 1. In such a case, registering the noun 「最中」 in the user dictionary together with the noun 「羊羹」 that precedes it, as 「羊羹最中」, prevents 「最中」 from being erroneously read as “Monaka” when it is used with the reading “Saichu”. Therefore, in the present embodiment, when the part of speech contained in the correction range is a noun, the registration range setting unit 25 sets the registration range so that nouns contiguous with that noun are also included.

また、活用自立語のアクセント位置は、語尾によって変化することがある。例えば、図８に示されるように、修正範囲が活用自立語の語幹である場合、韻律については、実際に修正された中間表記における語尾との組み合わせで使用されるときのみ、修正された韻律が適用される可能性が高い。そこで本実施形態では、登録範囲設定部２５は、修正範囲に含まれる品詞が活用自立語の語幹である場合、活用自立語とその活用自立語の語尾を登録範囲に設定する。このように、登録範囲設定部２５は、修正範囲に含まれる品詞が所定の品詞である場合、その修正範囲の品詞と連続する同じ品詞を含むように登録範囲に設定する。これにより、登録範囲設定部２５は、ユーザによる修正をユーザ辞書に適切に反映させることができる。 The accent position of a conjugating independent word may also change depending on its ending. For example, as shown in FIG. 8, when the correction range is the stem of a conjugating independent word, the corrected prosody is likely to apply only when the stem is used in combination with the ending that appears in the actually corrected intermediate notation. Therefore, in the present embodiment, when the part of speech contained in the correction range is the stem of a conjugating independent word, the registration range setting unit 25 sets the registration range to the conjugating independent word together with its ending. In this way, when the part of speech contained in the correction range is a predetermined part of speech, the registration range setting unit 25 sets the registration range so as to include the contiguous parts of speech of the same kind. This allows the registration range setting unit 25 to reflect the user's correction appropriately in the user dictionary.

FIG. 10 is a diagram illustrating an example of the relationship between the intermediate notation before and after user correction and the registration range. Comparing the intermediate notation 1000 before user correction with the intermediate notation 1001 after user correction, "サ’イチュー" has been corrected to "モ＊ナカ", and "ウツクシ％’" has been corrected to "ウツク’シ％". The noun 1002 ("モ＊ナカ") and the adjective stem 1003 ("ウツク’シ％") are therefore set as correction ranges. In this case, for the noun 1002, the registration range 1004 is set so as to include the preceding noun "ヨーカン". For the adjective stem 1003, the registration range 1005 is set so as to include the inflectional ending "ク".

FIG. 11 is an operation flowchart of the registration range setting process. When the user-corrected intermediate notation contains a plurality of correction ranges, the registration range setting unit 25 executes the following process for each correction range.

First, as initial processing, the registration range setting unit 25 sets the correction range itself as the registration range. The registration range setting unit 25 then determines whether the part of speech included in the correction range is a noun (step S201). If it is a noun (step S201-Yes), the registration range setting unit 25 determines whether the part-of-speech unit immediately preceding the registration range is a noun (step S202). If the preceding unit is a noun (step S202-Yes), the registration range setting unit 25 extends the start of the registration range to the start of that preceding unit (step S203). The registration range setting unit 25 then repeats the processing from step S202.

If the preceding unit is not a noun (step S202-No), the registration range setting unit 25 determines whether the part-of-speech unit immediately following the registration range is a noun (step S204). If the following unit is a noun (step S204-Yes), the registration range setting unit 25 extends the end of the registration range to the end of that following unit (step S205). The registration range setting unit 25 then repeats the processing from step S204.

If the following unit is not a noun (step S204-No), the registration range setting unit 25 sets the entire run of consecutive nouns now contained in the registration range as the registration range (step S206).

If the part of speech included in the correction range is not a noun in step S201 (step S201-No), the registration range setting unit 25 determines whether the part of speech included in the correction range is the stem of an inflecting independent word (step S207). If it is (step S207-Yes), the registration range setting unit 25 sets the stem of that inflecting independent word and the inflectional ending that follows it as the registration range (step S208). If it is not (step S207-No), the registration range setting unit 25 sets only the part of speech included in the correction range as the registration range (step S209).
After step S206, S208, or S209, the registration range setting unit 25 ends the registration range setting process.
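The flow of FIG. 11 can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the specification: the `Token` class, the part-of-speech labels ("noun", "stem", "ending", and so on), and the function names are all hypothetical, and the readings given for "この" and "は" are placeholders that the source does not supply.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str   # kanji kana notation of one part-of-speech unit
    notation: str  # intermediate notation (reading plus prosodic symbols)
    pos: str       # hypothetical label: "noun", "stem", "ending", ...


def set_registration_range(tokens, start, end):
    """Return inclusive (start, end) indices of the registration range
    for the correction range tokens[start..end]."""
    pos = tokens[start].pos
    if pos == "noun":                                          # S201
        while start > 0 and tokens[start - 1].pos == "noun":   # S202
            start -= 1                                         # S203
        while end + 1 < len(tokens) and tokens[end + 1].pos == "noun":  # S204
            end += 1                                           # S205
        return start, end                                      # S206
    if pos == "stem":                                          # S207
        if end + 1 < len(tokens) and tokens[end + 1].pos == "ending":
            end += 1                                           # S208
        return start, end
    return start, end                                          # S209


def as_entry(tokens, start, end):
    """Join the registration range into one (surface, notation) pair."""
    span = tokens[start:end + 1]
    return ("".join(t.surface for t in span),
            "".join(t.notation for t in span))


# The FIG. 10 sentence after user correction.
tokens = [
    Token("この", "コノ", "adnominal"),   # reading is a placeholder
    Token("羊羹", "ヨーカン", "noun"),
    Token("最中", "モ＊ナカ", "noun"),     # corrected noun 1002
    Token("は", "ワ", "particle"),        # reading is a placeholder
    Token("美し", "ウツク’シ％", "stem"),  # corrected stem 1003
    Token("く", "ク", "ending"),
]
```

With these tokens, the correction range on "最中" extends backward to "羊羹", and the correction range on the stem "美し" extends forward to the ending "く", which yields the two entries described for FIG. 10.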

The registration range setting unit 25 notifies the registration unit 26 of the registration range that has been set.

The registration unit 26 treats the portion of the user-corrected intermediate notation that falls within the registration range as one word, and registers the intermediate notation and part of speech of that word in the user dictionary together with its kanji kana notation. In the example shown in FIG. 10, the intermediate notation "ヨーカンモ＊ナカ" and the part of speech "common noun" are registered in the user dictionary in association with the word "羊羹最中". Likewise, the intermediate notation "ウツク’シ％ク" and the part of speech "adjective" are registered in association with the word "美しく".

FIG. 12 is an operation flowchart of the dictionary registration process according to the present embodiment.
The morpheme information setting unit 22 generates a user-corrected morpheme reading notation from the user-corrected intermediate notation (step S301). The morpheme information setting unit 22 then sets the range and type of each morpheme included in the user-corrected morpheme reading notation (step S302).

The part-of-speech information setting unit 23 identifies the range and type of each part of speech in the user-corrected intermediate notation from the range and type of each user-corrected morpheme (step S303). The correction range setting unit 24 then sets correction ranges in units of parts of speech by matching the intermediate notations before and after user correction on a part-of-speech basis (step S304).

When the part of speech included in the correction range is a predetermined part of speech, the registration range setting unit 25 sets the registration range so as to include the contiguous preceding and following units of the same part of speech (step S305). The registration unit 26 then registers the correction in the user dictionary in units of registration ranges (step S306).
The processing unit 4 then ends the dictionary registration process.
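Steps S301 and S304 can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the specification: the set of prosodic symbols is guessed from the characters that appear in the FIG. 10 example, the notations are assumed to be already segmented into part-of-speech units that align one-to-one before and after correction, and the function names are hypothetical. The readings shown for "この" and "は" are placeholders.

```python
# Assumed set of prosodic symbols, taken from the FIG. 10 example only.
PROSODIC_SYMBOLS = set("’＊％")


def strip_prosody(notation: str) -> str:
    """S301 (sketch): drop prosodic symbols to get the reading only."""
    return "".join(ch for ch in notation if ch not in PROSODIC_SYMBOLS)


def changed_units(before, after):
    """S304 (sketch): indices of part-of-speech units whose intermediate
    notation differs before and after user correction.

    Assumes both lists are segmented into the same number of units.
    """
    if len(before) != len(after):
        raise ValueError("unit lists must align one-to-one in this sketch")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


# FIG. 10, segmented per part-of-speech unit.
before = ["コノ", "ヨーカン", "サ’イチュー", "ワ", "ウツクシ％’", "ク"]
after = ["コノ", "ヨーカン", "モ＊ナカ", "ワ", "ウツク’シ％", "ク"]
```

Here `changed_units(before, after)` selects the noun "モ＊ナカ" and the stem "ウツク’シ％" as correction ranges. Note that the stem's reading is unchanged once the prosodic symbols are stripped, so only its prosody was corrected.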

As described above, when the user corrects the intermediate notation, this speech synthesizer automatically identifies, from the corrected intermediate notation, the range to be registered in the user dictionary and the part of speech included in that range. The speech synthesizer can therefore reflect the correction appropriately in the user dictionary even if the user has no special knowledge of the intermediate notation, and even if the user does not specify in detail the range or part of speech to be registered for each corrected point. This prevents the user from registering inappropriate corrections in the user dictionary and from forgetting to register part of a correction. As a result, the speech synthesizer can generate an appropriate synthesized speech signal even for text containing a word that has a plurality of different readings or prosodies.

Next, a speech synthesizer according to a second embodiment will be described. As dictionaries for registering the user's corrections, the speech synthesizer according to the second embodiment has a user dictionary that stores the kanji kana notation and intermediate notation of words, and an intermediate notation dictionary that additionally stores the relationship with one or more words positioned before or after the registered word. The speech synthesizer performs language processing using these two dictionaries together with the language dictionary. When the user corrects the intermediate notation, the speech synthesizer automatically determines whether to register the correction in the user dictionary or in the intermediate notation dictionary.
The user dictionary and the intermediate notation dictionary are each an example of a language dictionary.

The speech synthesizer according to the second embodiment differs from that of the first embodiment in that the storage unit 3 stores an intermediate notation dictionary, and in the processing performed by the language processing unit 10 of the processing unit 4 and by the registration unit 26 of the dictionary registration unit 12.
The following therefore describes the intermediate notation dictionary, the language processing unit 10, and the registration unit 26. For the other components of the speech synthesizer according to the second embodiment, refer to the description of the corresponding components of the first embodiment.

In the present embodiment, the user dictionary registers corrected readings, corrected accent positions, and break positions of single nouns and compound nouns. The intermediate notation dictionary registers corrections to the intermediate notation that are not targets of the user dictionary. For example, prosodic information corrected by the user is registered in the intermediate notation dictionary. This prosodic information includes, for example, accent strength, pitch height, degree of intonation, speech rate, volume, and break positions. The registration targets of the user dictionary and the intermediate notation dictionary are not limited to these examples. Because there is no general rule governing user dictionaries, the registration targets of the two dictionaries may be set according to the specifications of the speech synthesizer. For example, the user dictionary may accept not only single nouns and compound nouns but also inflecting independent words or non-inflecting independent words.

FIG. 13 is a diagram illustrating an example of a word registered in the user dictionary and a word registered in the intermediate notation dictionary. In the user dictionary 1300, the word "羊羹最中" in kanji kana notation is registered in association with its part-of-speech type, common noun, and with the intermediate notation "ヨーカンモ’ナカ", which includes the prosodic symbol for a strong accent. In the intermediate notation dictionary 1301, the word "羊羹最中" in kanji kana notation is registered in association with the intermediate notation "ヨーカンモ＊ナカ", which includes the prosodic symbol for a weak accent. The intermediate notation dictionary 1301 also registers, as the conditions under which this intermediate notation applies, the kanji kana notation "この" of the preceding word and the kanji kana notation "は美しく" of the following words. In other words, the intermediate notation dictionary expresses that a strong accent on "モ" is preferable when "羊羹最中" stands alone, whereas the accent on "モ" should be weakened in the sentence "この羊羹最中は美しく". The intermediate notation dictionary 1301 may also register the intermediate notations of the preceding or following words.
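The two kinds of entries in FIG. 13 can be modeled as simple records. The following Python sketch is only one possible layout: the class names, field names, and container types are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserEntry:
    """Entry of the user dictionary 1300 (hypothetical layout)."""
    surface: str   # kanji kana notation
    pos: str       # part-of-speech type
    notation: str  # intermediate notation


@dataclass(frozen=True)
class ContextEntry:
    """Entry of the intermediate notation dictionary 1301 (hypothetical
    layout): the notation applies only in the given context."""
    surface: str
    notation: str
    preceding: Optional[str]  # kanji kana notation of preceding word(s)
    following: Optional[str]  # kanji kana notation of following word(s)


# The FIG. 13 example.
user_dictionary = {
    "羊羹最中": UserEntry("羊羹最中", "common noun", "ヨーカンモ’ナカ"),
}
intermediate_notation_dictionary = [
    ContextEntry("羊羹最中", "ヨーカンモ＊ナカ", "この", "は美しく"),
]
```

Keeping the context-dependent entries in a list rather than a mapping leaves room for several entries that share one surface form but differ in their surrounding words.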

In this embodiment as well, the language processing unit 10 first creates the intermediate notation of the input text data using the language dictionary and the user dictionary. If the text data contains a notation that is registered in both the language dictionary and the user dictionary, the language processing unit 10 gives priority to the user dictionary. For example, suppose that, as shown in FIG. 1, the text data contains the sentence "この羊羹最中は美しく" and that "羊羹最中" is registered in the user dictionary. In this case, the language processing unit 10 outputs "...ヨーカンモ’ナカ..." as the intermediate notation rather than "...ヨーカンサ’イチュー...".

The language processing unit 10 then refers to the intermediate notation dictionary and determines whether the text data contains a portion that matches the kanji kana notation of a registered word together with the kanji kana notations of its registered preceding and following words. If there is a matching portion, the language processing unit 10 replaces the corresponding part of the intermediate notation with the intermediate notation registered in the intermediate notation dictionary. This allows the language processing unit 10 to automatically adjust the accent, pitch, intonation, speech rate, volume, or breaks of a word in the synthesized speech signal according to the context. The speech synthesizer is therefore more likely to reproduce the synthesized speech that the user wants.
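The context-dependent lookup can be sketched as below. This is an illustrative simplification, not the matching procedure of the specification: it works on raw substrings instead of a morphological analysis, the dictionary layout and the function name `select_notation` are hypothetical, and the second test sentence "羊羹最中を買う" is made up for the example.

```python
def select_notation(text, surface, user_dictionary, context_entries):
    """Return the intermediate notation to use for `surface` in `text`.

    A context entry wins when its preceding word, the surface form, and
    its following word appear consecutively in the text; otherwise the
    user dictionary entry is used.
    """
    for entry in context_entries:
        if entry["surface"] != surface:
            continue
        pattern = entry["preceding"] + surface + entry["following"]
        if pattern in text:
            return entry["notation"]
    return user_dictionary[surface]


# The FIG. 13 entries in a plain-dict layout (hypothetical).
user_dictionary = {"羊羹最中": "ヨーカンモ’ナカ"}
context_entries = [
    {
        "surface": "羊羹最中",
        "notation": "ヨーカンモ＊ナカ",
        "preceding": "この",
        "following": "は美しく",
    },
]
```

For the sentence "この羊羹最中は美しく" the context entry matches and the weak-accent notation is chosen; for a sentence without that context, the strong-accent notation from the user dictionary is kept.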

Next, the process for selecting the dictionary in which a correction is reflected when the user corrects the intermediate notation will be described. In the present embodiment, this selection is performed by the registration unit 26 of the dictionary registration unit 12.

FIG. 14 is an operation flowchart of the dictionary selection process executed by the registration unit 26. The dictionary selection process is executed in step S306 of the dictionary registration process shown in FIG. 12.
The registration unit 26 determines whether the correction within the registration range is a correction of the reading, a break position, or an accent position (step S401). If the correction is none of these (step S401-No), the registration unit 26 selects the intermediate notation dictionary (step S402). The registration unit 26 extracts, from the user-corrected intermediate notation, the kanji kana notation of the part-of-speech unit preceding the registration range and the kanji kana notation of the part-of-speech unit following the registration range (step S403). If the following unit is a particle, the unit that follows the particle may also be extracted. The registration unit 26 then registers the corrected intermediate notation and kanji kana notation of the registration range, together with the preceding and following kanji kana notations, in the intermediate notation dictionary (step S404). In general, a correction other than one to the reading or accent position is a correction to one of "accent strength", "pitch height", "degree of intonation", "speech rate", "volume", and "break" in the synthesized speech signal. Such corrections are made with the audible continuity of the synthesized speech with its surroundings in mind. It is therefore preferable to register the words before and after the registration range as well.

If, in step S401, the correction is a correction of the reading, a break position, or an accent position (step S401-Yes), the registration unit 26 determines whether the part of speech of the registration range is a single noun or a compound noun (step S405). If it is neither (step S405-No), the registration unit 26 executes steps S402 to S404 and registers the correction in the intermediate notation dictionary.

If the part of speech of the registration range is a single noun or a compound noun (step S405-Yes), the registration unit 26 selects the user dictionary (step S406). The registration unit 26 then registers the corrected intermediate notation, the kanji kana notation, and the part-of-speech type of the registration range in the user dictionary (step S407).
After step S404 or S407, the registration unit 26 ends the dictionary selection process.
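The branch structure of FIG. 14 can be sketched in Python as follows. The string labels for correction kinds and parts of speech, the constant names, and the function name are hypothetical. The sketch takes a single correction kind per registration range; how a range containing several kinds of correction at once should be handled is not covered here.

```python
# Correction kinds that may go to the user dictionary (S401).
USER_DICTIONARY_KINDS = {"reading", "break_position", "accent_position"}
# Parts of speech that may go to the user dictionary (S405).
NOUN_TYPES = {"single noun", "compound noun"}


def select_dictionary(correction_kind: str, pos: str) -> str:
    """Return "user" or "intermediate" for one registration range."""
    if correction_kind not in USER_DICTIONARY_KINDS:  # S401-No
        return "intermediate"                         # S402 to S404
    if pos not in NOUN_TYPES:                         # S405-No
        return "intermediate"                         # S402 to S404
    return "user"                                     # S406, S407
```

For example, a corrected reading of a compound noun goes to the user dictionary, while a change in accent strength, or an accent-position change on an adjective, goes to the intermediate notation dictionary together with the surrounding words.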

According to this embodiment, the speech synthesizer can automatically determine whether the user intended to correct the rendering of the synthesized speech in a specific context, and can reflect the result in the dictionaries. The speech synthesizer can therefore create a synthesized speech signal with prosody appropriate to the context while reducing the user's correction workload.

According to a modification, for a word registered in the intermediate notation dictionary, only one of the following may be registered together with the kanji kana notation and intermediate notation of the word: one or more words preceding the word, or one or more words following the word. The intermediate notation dictionary may also hold multiple entries for a single word, one for each different combination of one or more words preceding or following it.

According to another modification, a single language dictionary may be the only dictionary used for language processing. In this case, all corrections that the user makes to the intermediate notation are reflected in the language dictionary. The language processing unit 10 then performs language processing giving priority to the entries in the language dictionary that were added as a result of user correction.

According to yet another modification, the editing unit 21 of the dictionary registration unit 12 may allow the user to specify the correction range in the intermediate notation and the part of speech of that correction range. In this case, the dictionary registration unit 12 automatically sets the registration range based on the correction range set by the user. Allowing the user to specify the correction in detail in this way also improves convenience for users who are well versed in speech synthesis.

Furthermore, a computer program that causes a computer to implement the functions of the processing unit of the speech synthesizer according to each of the above embodiments may be provided in a form recorded on a computer-readable medium, such as a magnetic recording medium, an optical recording medium, or a semiconductor memory.

All examples and specific terms recited herein are intended for instructional purposes, to help the reader understand the present invention and the concepts contributed by the inventor to furthering the art. They are to be construed without limitation to the configuration of any example in this specification, or to such specifically recited examples and conditions, that relates to showing the superiority or inferiority of the present invention. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made to them without departing from the spirit and scope of the invention.

The following appendices are further disclosed regarding the embodiments described above and their modifications.
(Appendix 1)
A speech synthesizer that generates a synthesized speech signal based on an intermediate notation that is generated from text data and includes the reading of the text data and prosodic symbols representing its prosody, the speech synthesizer comprising:
an input unit that acquires the text data;
a storage unit that stores a language dictionary in which at least a kanji kana notation of a word and an intermediate notation including the reading of the word and the prosodic symbols are registered, the language dictionary being used to generate the intermediate notation from the text data;
a registration range setting unit that, when a word included in a correction range in a corrected intermediate notation obtained by correcting the intermediate notation is of a predetermined part of speech, extends a registration range to be registered in the language dictionary so as to include words that are contiguous before or after the word included in the correction range and have the same part of speech as that word; and
a registration unit that registers at least the kanji kana notation and the intermediate notation of a word in the language dictionary, treating the portion of the corrected intermediate notation included in the registration range as one word.
(Appendix 2)
The speech synthesizer according to Appendix 1, wherein the predetermined part of speech is a noun, and when the word included in the correction range is a noun, the registration range setting unit extends the registration range so as to include the nouns contiguous before and after the correction range.
(Appendix 3)
The speech synthesizer according to Appendix 1, wherein the predetermined part of speech is the stem of an inflecting independent word, and when the word included in the correction range is the stem of an inflecting independent word, the registration range setting unit extends the registration range so as to include the ending of the inflecting independent word that follows the correction range.
(Appendix 4)
The speech synthesizer according to any one of appendices 1 to 3, wherein the registration unit registers the part of speech of a word included in the registration range together with the kanji kana notation and the intermediate notation of the word in the language dictionary.
(Appendix 5)
The speech synthesizer according to any one of Appendices 1 to 4, further comprising:
a morpheme information setting unit that generates a corrected morpheme reading notation representing the corrected reading of the text data by removing the prosodic symbols from the corrected intermediate notation, and obtains the range and type of each part of speech included in the corrected morpheme reading notation by matching the corrected morpheme reading notation against a morpheme reading notation obtained by removing the prosodic symbols from the intermediate notation;
a part-of-speech information setting unit that, for each morpheme included in the corrected morpheme reading notation, sets the portion of the corrected intermediate notation that matches the reading of that morpheme to the same part of speech as that morpheme; and
a correction range setting unit that extracts the portions that differ between the corrected intermediate notation and the intermediate notation, and sets the entire part-of-speech unit containing each differing portion as the correction range.
(Appendix 6)
The speech synthesizer according to Appendix 5, wherein the morpheme information setting unit extracts matching portions and non-matching portions between the corrected morpheme reading notation and the morpheme reading notation, sets each matching portion of the corrected morpheme reading notation to the same part of speech as the corresponding part of speech in the morpheme reading notation, and, when a single part of speech in the morpheme reading notation corresponds to a non-matching portion of the corrected morpheme reading notation, sets that non-matching portion to the part of speech of the corresponding portion of the morpheme reading notation.
(Appendix 7)
The speech synthesizer according to Appendix 6, wherein, when a plurality of consecutive common nouns in the morpheme reading notation correspond to a non-matching portion of the corrected morpheme reading notation, the morpheme information setting unit sets the part of speech of that non-matching portion to common noun.
(Appendix 8)
The speech synthesizer according to Appendix 6, wherein, when a plurality of consecutive nouns in the morpheme reading notation correspond to a non-matching portion of the corrected morpheme reading notation and the last of those nouns is a sa-hen noun (a noun that forms a verb with "する"), the morpheme information setting unit sets the part of speech of that non-matching portion to sa-hen noun.
(Appendix 9)
The speech synthesizer according to Appendix 6, wherein, when a plurality of consecutive nouns in the morpheme reading notation correspond to a non-matching portion of the corrected morpheme reading notation, the last of those nouns is not a sa-hen noun, and any of those nouns is not a common noun, the morpheme information setting unit sets the part of speech of that non-matching portion to proper noun.
(Appendix 10)
The speech synthesizer according to any one of appendices 1 to 9, wherein the storage unit stores, as the language dictionary, a first language dictionary in which at least the kanji kana notation of a word is registered together with the intermediate notation of the word, and a second language dictionary in which the intermediate notation of a word is registered together with at least the kanji kana notation of the word and the kanji kana notation of one or more words located before or after the word, and
when prosody other than the accent position has been changed within the registration range of the corrected intermediate notation, the registration unit registers, in the second language dictionary, the kanji kana notation of one or more words located before or after the registration range together with the kanji kana notation and the intermediate notation of the word included in the registration range.
(Appendix 11)
The speech synthesizer according to appendix 10, wherein, when the part of speech of the word included in the registration range of the corrected intermediate notation is a noun and prosody other than the accent position has not been changed, the registration unit registers the kanji kana notation and the intermediate notation of the word included in the registration range in the first language dictionary.
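The choice between the two dictionaries in appendices 10 and 11 can be sketched as below. The data layout (a plain mapping for the first dictionary, a context-keyed mapping for the second), the part-of-speech labels, and the sample notations are assumptions for illustration only. The appendices do not state what happens to a non-noun whose prosody other than the accent position is unchanged, so the sketch reports that case as "unspecified" instead of registering anything.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical labels for parts of speech that count as nouns.
NOUN_POS = {"common_noun", "sa_variable_noun", "proper_noun"}


@dataclass
class Dictionaries:
    # First dictionary: kanji kana notation -> intermediate notation.
    first: Dict[str, str] = field(default_factory=dict)
    # Second dictionary: (word before, kanji kana notation, word after)
    # -> intermediate notation. An empty string means "no context word".
    second: Dict[Tuple[str, str, str], str] = field(default_factory=dict)


def register(
    dicts: Dictionaries,
    surface: str,
    intermediate: str,
    pos: str,
    prosody_other_than_accent_changed: bool,
    before: str = "",
    after: str = "",
) -> str:
    """Register one word and return the name of the dictionary used."""
    if prosody_other_than_accent_changed:
        # Appendix 10: keep the neighbouring words as context.
        dicts.second[(before, surface, after)] = intermediate
        return "second"
    if pos in NOUN_POS:
        # Appendix 11: a noun whose accent position alone may have changed.
        dicts.first[surface] = intermediate
        return "first"
    return "unspecified"
```

Keeping context only in the second dictionary means that context-dependent prosody changes do not leak into every occurrence of the word.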
(Appendix 12)
The speech synthesizer according to appendix 10 or 11, further comprising a language processing unit that generates the intermediate notation by using, for each word in the text data that matches a kanji kana notation registered in the first language dictionary, the intermediate notation registered for that word, and that rewrites the portion of the intermediate notation corresponding to a part of the text data that matches the kanji kana notation of a word registered in the second language dictionary and also matches the kanji kana notation of the one or more words located before or after that word, replacing it with the intermediate notation of the word registered in the second language dictionary.
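The two-stage lookup of appendix 12 might look like the following sketch. It assumes the text has already been split into words, which the appendix does not describe, and the ContextEntry type, the function name, and the sample notations are hypothetical. A context field left as None is treated as "any word", reflecting the wording "before or after".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ContextEntry:
    """One second-dictionary entry (hypothetical layout)."""
    word: str
    intermediate: str
    before: Optional[str] = None  # required preceding word, if any
    after: Optional[str] = None   # required following word, if any


def to_intermediate(
    words: List[str],
    first: Dict[str, str],
    second: List[ContextEntry],
) -> List[str]:
    """Generate per-word intermediate notation from pre-segmented text."""
    # Stage 1: look up each word in the first dictionary; words that are
    # not registered are passed through unchanged.
    result = [first.get(w, w) for w in words]
    # Stage 2: rewrite words whose context matches a second-dictionary entry.
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        for entry in second:
            if entry.word != w:
                continue
            if entry.before is not None and entry.before != prev:
                continue
            if entry.after is not None and entry.after != nxt:
                continue
            result[i] = entry.intermediate
            break
    return result
```

With a first dictionary that maps 今日 to one reading and a second-dictionary entry that applies only when は follows, the same kanji kana notation yields different intermediate notations depending on context.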
(Appendix 13)
A language dictionary registration method for registering entries in a language dictionary used to create, from text data, an intermediate notation that includes a reading of the text data and prosodic symbols representing prosody, in order to generate a synthesized speech signal from the text data, the method comprising:
acquiring the text data;
when a word included in a correction range of a corrected intermediate notation, obtained by correcting the intermediate notation, has a predetermined part of speech, extending, by a processor, a registration range to be registered in the language dictionary so as to include words that continuously precede or follow the word included in the correction range and have the same part of speech as that word; and
registering, by the processor, in the language dictionary stored in a storage unit, the kanji kana notation of the portion of the corrected intermediate notation included in the registration range and an intermediate notation including the reading and the prosodic symbols of that portion, treating that portion as one word.
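The extension of the registration range and the formation of a single dictionary entry can be illustrated with a minimal sketch. The Word type, the part-of-speech labels, the half-open index convention, and the sample notations are all assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    surface: str       # kanji kana notation
    intermediate: str  # portion of the corrected intermediate notation
    pos: str           # part-of-speech label (hypothetical)


def extend_registration_range(
    words: List[Word],
    start: int,
    end: int,
    predetermined: Sequence[str] = ("noun",),
) -> Tuple[int, int]:
    """Extend the correction range [start, end) to the registration range.

    Words with the same predetermined part of speech that continuously
    precede or follow the correction range are included.
    """
    lo, hi = start, end
    first_pos, last_pos = words[start].pos, words[end - 1].pos
    if first_pos in predetermined:
        while lo > 0 and words[lo - 1].pos == first_pos:
            lo -= 1
    if last_pos in predetermined:
        while hi < len(words) and words[hi].pos == last_pos:
            hi += 1
    return lo, hi


def build_entry(words: List[Word], lo: int, hi: int) -> Tuple[str, str]:
    """Join the registration range into one word for registration."""
    surface = "".join(w.surface for w in words[lo:hi])
    intermediate = "".join(w.intermediate for w in words[lo:hi])
    return surface, intermediate
```

If only the middle noun of a three-noun compound was corrected, the whole compound is registered as one word, which avoids an entry that would also affect unrelated uses of the middle noun.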

1 Speech synthesizer
2 Input unit
3 Storage unit
4 Processing unit
5 Output unit
6 Speaker
10 Language processing unit
11 Speech synthesis unit
12 Dictionary registration unit
21 Editing unit
22 Morpheme information setting unit
23 Part-of-speech information setting unit
24 Correction range setting unit
25 Registration range setting unit
26 Registration unit

Claims

A speech synthesizer that generates a synthesized speech signal based on an intermediate notation generated from text data, the intermediate notation including a reading of the text data and prosodic symbols representing prosody, the speech synthesizer comprising:
an input unit that acquires the text data;
a storage unit that stores a language dictionary in which at least a kanji kana notation of a word and an intermediate notation including the reading of the word and the prosodic symbols are registered, the language dictionary being used to generate the intermediate notation from the text data;
a registration range setting unit that, when a word included in a correction range of a corrected intermediate notation, obtained by correcting the intermediate notation, has a predetermined part of speech, extends a registration range to be registered in the language dictionary so as to include words that continuously precede or follow the word included in the correction range and have the same part of speech as that word; and
a registration unit that registers, in the language dictionary, at least the kanji kana notation and the intermediate notation of the portion of the corrected intermediate notation included in the registration range, treating that portion as one word.

The speech synthesizer according to claim 1, wherein the predetermined part of speech is a noun, and when the word included in the correction range is a noun, the registration range setting unit extends the registration range to include nouns that continuously precede or follow the correction range.

The speech synthesizer according to claim 1, wherein the predetermined part of speech is the stem of a conjugating independent word, and when the word included in the correction range is the stem of a conjugating independent word, the registration range setting unit extends the registration range to include the inflectional ending of the conjugating independent word that follows the correction range.

The speech synthesizer according to claim 1, further comprising:
a morpheme information setting unit that generates a corrected morpheme reading notation representing the corrected reading of the text data by removing the prosodic symbols from the corrected intermediate notation, and that obtains the range and type of each part of speech included in the corrected morpheme reading notation by matching the corrected morpheme reading notation against a morpheme reading notation obtained by removing the prosodic symbols from the intermediate notation;
a part-of-speech information setting unit that, for each morpheme included in the corrected morpheme reading notation, sets the portion of the corrected intermediate notation that matches the reading of the morpheme to the same part of speech as the morpheme; and
a correction range setting unit that extracts the portions that differ between the corrected intermediate notation and the intermediate notation, and sets as the correction range the whole of each part of speech that contains a differing portion.

The speech synthesizer according to claim 4, wherein the morpheme information setting unit extracts the matching and non-matching portions between the corrected morpheme reading notation and the morpheme reading notation, sets each matching portion of the corrected morpheme reading notation to the same part of speech as the corresponding portion of the morpheme reading notation, and, when a non-matching portion of the corrected morpheme reading notation corresponds to a single part of speech in the morpheme reading notation, sets the non-matching portion to that part of speech.

The speech synthesizer according to claim 5, wherein, when a non-matching portion of the corrected morpheme reading notation corresponds to a plurality of consecutive nouns in the morpheme reading notation and the last of those nouns is a sa-variable noun, the morpheme information setting unit sets the part of speech of the non-matching portion to sa-variable noun.

The speech synthesizer according to claim 1, wherein the storage unit stores, as the language dictionary, a first language dictionary in which at least the kanji kana notation of a word is registered together with the intermediate notation of the word, and a second language dictionary in which the intermediate notation of a word is registered together with at least the kanji kana notation of the word and the kanji kana notation of one or more words located before or after the word, and
when prosody other than the accent position has been changed within the registration range of the corrected intermediate notation, the registration unit registers, in the second language dictionary, the kanji kana notation of one or more words located before or after the registration range together with the kanji kana notation and the intermediate notation of the word included in the registration range.

The speech synthesizer according to claim 7, wherein, when the part of speech of the word included in the registration range of the corrected intermediate notation is a noun and prosody other than the accent position has not been changed, the registration unit registers the kanji kana notation and the intermediate notation of the word included in the registration range in the first language dictionary.

The speech synthesizer according to claim 7, further comprising a language processing unit that generates the intermediate notation by using, for each word in the text data that matches a kanji kana notation registered in the first language dictionary, the intermediate notation registered for that word, and that rewrites the portion of the intermediate notation corresponding to a part of the text data that matches the kanji kana notation of a word registered in the second language dictionary and also matches the kanji kana notation of the one or more words located before or after that word, replacing it with the intermediate notation of the word registered in the second language dictionary.

A language dictionary registration method for registering entries in a language dictionary used to create, from text data, an intermediate notation that includes a reading of the text data and prosodic symbols representing prosody, in order to generate a synthesized speech signal from the text data, the method comprising:
acquiring the text data;
when a word included in a correction range of a corrected intermediate notation, obtained by correcting the intermediate notation, has a predetermined part of speech, extending, by a processor, a registration range to be registered in the language dictionary so as to include words that continuously precede or follow the word included in the correction range and have the same part of speech as that word; and
registering, by the processor, in the language dictionary stored in a storage unit, the kanji kana notation of the portion of the corrected intermediate notation included in the registration range and an intermediate notation including the reading and the prosodic symbols of that portion, treating that portion as one word.