JP6411015B2

JP6411015B2 - Speech synthesizer, speech synthesis method, and program

Info

Publication number: JP6411015B2
Application number: JP2013189845A
Authority: JP
Inventors: 斎藤 淳哉; 野田 拓也
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2018-10-24
Anticipated expiration: 2033-09-12
Also published as: JP2015055793A

Description

The present invention relates to a speech synthesizer, a speech synthesis method, and a program.

A known speech synthesis technology converts Japanese text entered by a user (for example, a sentence mixing kanji and kana) into synthetic speech and reads it aloud. In such technology, language processing converts the Japanese notation into an intermediate notation, in which prosodic symbols governing how the synthetic speech is rendered are written, and the synthetic speech is then generated according to that intermediate notation.

The intermediate notation is generated automatically from the Japanese notation by language processing, but its accuracy is often insufficient. In one known approach, therefore, pronunciation information is created interactively, through an interplay between automatic conversion by the speech synthesizer and correction by the user. In another, speech is synthesized by inserting variable parts into a plurality of fixed sentence patterns. A further known method analyzes an input sentence, generates a first intermediate-language character string according to generation rules, and weights it. This method then adjusts the intermediate-language string to generate a second string, weights that as well, and outputs whichever of the first and second strings has the higher reliability or weight, with the aim of producing natural-sounding speech. Yet another known technique uses the input synthetic speech signal, the character sequence, the number of morae, and the accent type to search a plurality of naturally uttered speech samples for the pitch pattern closest to the speech to be synthesized, and synthesizes speech on the basis of that pitch pattern (see, for example, Patent Documents 1 to 4).

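Patent Document 1: Japanese Laid-open Patent Publication No. 9-171392
Patent Document 2: Japanese Laid-open Patent Publication No. 9-134190
Patent Document 3: Japanese Laid-open Patent Publication No. 2000-56786
Patent Document 4: Japanese Laid-open Patent Publication No. 9-34492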

However, in situations that demand high quality, such as narration for IT-based teaching materials or explanatory audio for museums and exhibitions, the user must listen to the synthesized speech many times and correct the intermediate notation, which places a heavy burden on the user. Moreover, the conventional techniques described above suffer from several problems: the accent may sound unnatural, inputs may too rarely fit the correction patterns, and speech whose rendering depends on meaning spanning multiple sentences may not be synthesized correctly.

In addition, after the intermediate notation has been corrected, the Japanese notation itself may change, for example because the manuscript is revised. The simplest response is to regenerate the intermediate notation automatically from the changed Japanese notation by language processing; however, this discards the corrections the user previously made to the intermediate notation, so the same corrections must be made again, once more burdening the user.

According to one aspect, an object of the present invention is to reduce, in speech synthesis, the burden of the corrections a user must make to improve the quality of the synthesized speech.

A speech synthesizer according to one aspect includes an input unit, a language processing unit, a storage unit, a correction unit, and a speech synthesis unit. The input unit accepts a natural-language text to be converted into speech. The language processing unit analyzes the text on the basis of dictionary information in which at least the reading, part of speech, accent phrase, and accent-phrase prosody corresponding to each morpheme of the natural language are registered. The language processing unit also generates a morpheme notation, containing the morphemes of the text and the part-of-speech information of each morpheme, and an intermediate notation, containing accent phrase information indicating how morphemes are grouped into accent phrases and prosody information indicating the prosody of each accent phrase. The storage unit stores correction information, which indicates the content of a correction to prosody information, in association with part-of-speech information and accent phrase information. When the storage unit holds correction information associated with the part-of-speech information and accent phrase information generated by the language processing unit, the correction unit corrects the prosody information generated by the language processing unit on the basis of that correction information. The speech synthesis unit synthesizes speech corresponding to the text on the basis of the intermediate notation, including the prosody information as corrected by the correction unit.
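The storage and correction units described above can be sketched as a lookup table. This is a minimal Python sketch under stated assumptions, not the patented implementation: the names `AnalyzedPhrase` and `CorrectionStore`, the English part-of-speech labels, the katakana readings, and the choice of the per-phrase part-of-speech sequence as the lookup key (standing in for "part-of-speech information and accent phrase information") are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzedPhrase:
    """One accent phrase of a hypothetical intermediate notation."""
    reading: str               # katakana reading (illustrative)
    pos_tags: tuple[str, ...]  # POS of each morpheme grouped into this phrase
    strength: str              # "strong" or "weak"


class CorrectionStore:
    """Storage unit sketch: corrected strengths keyed by POS + phrase grouping."""

    def __init__(self) -> None:
        self._corrections: dict[tuple[str, ...], str] = {}

    def record(self, phrase: AnalyzedPhrase, corrected_strength: str) -> None:
        # The key is the POS sequence of exactly the morphemes in this accent
        # phrase, so it encodes both POS and accent phrase information.
        self._corrections[phrase.pos_tags] = corrected_strength

    def apply(self, phrases: list[AnalyzedPhrase]) -> list[AnalyzedPhrase]:
        """Correction unit sketch: override strength where a stored key matches."""
        return [
            AnalyzedPhrase(p.reading, p.pos_tags,
                           self._corrections.get(p.pos_tags, p.strength))
            for p in phrases
        ]


# Usage: the user earlier weakened the phrase for 雷注意報が.
NOUN, PROPER, PARTICLE = "common noun", "proper noun", "case particle"
store = CorrectionStore()
store.record(AnalyzedPhrase("カミナリチューイホーガ", (NOUN, NOUN, PARTICLE), "strong"), "weak")

# Freshly generated phrases for a changed text (strengths are illustrative).
generated = [
    AnalyzedPhrase("キョート", (PROPER,), "strong"),
    AnalyzedPhrase("シューヘンニ", (NOUN, PARTICLE), "strong"),
    AnalyzedPhrase("ノームケーホーガ", (NOUN, NOUN, PARTICLE), "strong"),
]
corrected = store.apply(generated)
print([p.strength for p in corrected])  # ['strong', 'strong', 'weak']
```

One consequence of this key choice: a span that is split into accent phrases differently (one phrase versus two) never matches a stored correction, even when its flat part-of-speech sequence is identical.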

A speech processing method performed by the speech synthesizer of the above aspect, and a program that causes a computer to execute that method, provide the same effects as the speech synthesizer itself and therefore also solve the problem described above.

The speech synthesizer, speech synthesis method, and program of the embodiments reduce the burden of the corrections a user makes to improve the quality of the synthesized speech.

FIG. 1 is a block diagram showing an example of the functions of the speech synthesizer according to the first embodiment.
FIG. 2 is a diagram conceptually showing accent strength correction according to the first embodiment.
FIG. 3 is a diagram showing an example of language processing according to the first embodiment.
FIG. 4 is a diagram showing an example of accent strength correction according to the first embodiment.
FIG. 5 is a diagram showing an example of a change to the Japanese notation according to the first embodiment.
FIG. 6 is a diagram showing an example of language processing of the changed Japanese notation according to the first embodiment.
FIG. 7 is a diagram showing an example of a changed-morpheme search according to the first embodiment.
FIG. 8 is a diagram showing an example of associating the morpheme notation with the intermediate notation according to the first embodiment.
FIG. 9 is a diagram showing an example of the accent strength correction determination according to the first embodiment.
FIG. 10 is a flowchart showing processing by the speech synthesizer according to the first embodiment.
FIG. 11 is a functional block diagram showing an example of the configuration of a speech synthesizer according to a second embodiment.
FIG. 12 is a diagram showing an example of a template according to the second embodiment.
FIG. 13 is a diagram conceptually showing an example of referring to a template DB according to the second embodiment.
FIG. 14 is a flowchart showing an example of processing by the speech synthesizer according to the second embodiment.
FIG. 15 is a functional block diagram showing an example of the configuration of a speech synthesizer according to a third embodiment.
FIG. 16 is a diagram showing a modified example of prosody information.
FIG. 17 is a diagram showing a modified example of prosody information.
FIG. 18 is a diagram showing an example of the hardware configuration of a standard computer.

First Embodiment
The speech synthesizer 1 according to the first embodiment is described below with reference to FIGS. 1 to 7. For convenience, the description focuses on accent strength as an example of a prosodic symbol, but other prosodic symbols can be handled in the same way. Accent strength is a prosodic symbol that is defined per accent phrase and controls how strongly or weakly that accent phrase is pronounced. Other prosodic symbols include, for example, symbols indicating high or low pitch, large or small intonation, fast or slow speaking rate, and loud or soft volume. The prosodic-symbol notation shown below is only an example and is not limiting.

This embodiment assumes that the accent strength has been corrected in the synthetic speech for an input Japanese notation (for example, a sentence mixing kanji and kana). When part of that input Japanese notation is later changed, the speech synthesizer automatically determines whether to apply to the changed Japanese notation the same accent strength correction that was made to the notation before the change. Applying such a correction to the changed notation is referred to as carrying over the accent strength, or the prosody information. Hereinafter, the Japanese notation on which the accent strength is first corrected is called the first Japanese notation, and the Japanese notation obtained by changing part of the first Japanese notation is called the second Japanese notation.

FIG. 1 is a block diagram showing an example of the functions of the speech synthesizer 1 according to the first embodiment. As shown in FIG. 1, the speech synthesizer 1 includes an input/output unit 5, a language processing unit 7, a speech synthesis unit 9, an intermediate notation correction unit 11, and a storage unit 21. The intermediate notation correction unit 11 includes a change search unit 13, a morpheme association unit 15, a correction determination unit 17, and a prosody correction unit 19. Each of these functions may be implemented, for example, by a processor reading and executing a program stored in the storage unit 21; alternatively, at least some of them may be implemented by, for example, a semiconductor integrated circuit. The speech synthesizer 1 may also store in the storage unit 21, for example, a dictionary used for language processing and the speech used for speech synthesis.

The control unit 3 controls the operation of the speech synthesizer 1. The input/output unit 5 handles the input of information to, and the output of information from, the speech synthesizer 1, and includes, for example, input devices such as a touch panel and a keyboard and output devices such as a display and a speaker. The language processing unit 7 performs language processing of the Japanese notation on the basis of a dictionary containing, for example, Japanese part-of-speech information and accent information, and outputs a morpheme notation, an intermediate notation, and so on. The morpheme notation is, for example, information containing the Japanese notation divided into morphemes together with the corresponding part-of-speech information. The intermediate notation is, for example, a notation based on the reading (yomi) of the Japanese notation, accent phrase information indicating which words form each accent phrase in the Japanese notation, and the accent strength of each accent phrase. The speech synthesis unit 9 synthesizes speech on the basis of, for example, the intermediate notation and speech recorded in advance, for example per word or per phrase. The storage unit 21 is a device that stores information, and may store a program that controls the operation of the speech synthesizer 1, information used in each process, and the like.

The intermediate notation correction unit 11 determines whether the intermediate notation output by the language processing unit 7 needs correction and, if so, corrects it automatically. Specifically, the change search unit 13 compares the morpheme notation of the first Japanese notation with that of the second Japanese notation and finds the morphemes in the changed portion. The morpheme association unit 15 associates the input first Japanese notation with the corrected intermediate notation, that is, the intermediate notation that the language processing unit 7 output for the first Japanese notation, as it stands after correction.

The correction determination unit 17 determines whether the accent strength needs to be corrected in the intermediate notation generated by the language processing unit 7 from the second Japanese notation. Specifically, the correction determination unit 17 determines that the accent strength of the second Japanese notation is to be corrected when, for example, the part-of-speech information and the accent phrase information of the first and second Japanese notations correspond one-to-one. In that case, the accent strength correction made in the first Japanese notation is carried over to the second Japanese notation: the accent phrase of the second Japanese notation that contains a morpheme changed from the first Japanese notation is given the accent strength as corrected in the first Japanese notation. In this way, the correction determination unit 17 determines whether to set the accent strength of the Japanese notation to be synthesized to the corrected accent strength of a Japanese notation processed earlier. The prosody correction unit 19 corrects the intermediate notation on the basis of the determination result of the correction determination unit 17.

In the following description, the operations of the speech synthesizer 1, including those carried out when, for example, an arithmetic processing unit described later reads and executes a predetermined program, are for convenience described as being performed by the functions described above.

FIG. 2 conceptually shows, by means of display examples, an example of intermediate notation correction according to this embodiment. As shown in display example 40a of FIG. 2, in process (a) the Japanese notation 41, corresponding to the first Japanese notation, is input. The Japanese notation 41 is the Japanese sentence 「東京近辺に雷注意報が・・・」 ("A thunderstorm advisory ... near Tokyo"). As shown in display example 40b, in process (b) the Japanese notation 41 is converted into prosody information 43 by, for example, the language processing unit 7. Here, prosody information 43 is 「トーキョーキ’ンペンニ」「カミナリチューイ’ホーガ＆」.

As shown in display example 40c, in process (c) the user changes the accent strength in prosody information 43 from the strong accent 「’」 to the weak accent 「＊」, yielding prosody information 45. Here, prosody information 45 is 「トーキョーキ＊ンペンニ」「カミナリチューイ＊ホーガ＆」.
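In the intermediate notation, the user's edit in process (c) amounts to swapping one symbol. A minimal Python sketch, assuming each accent phrase is a string containing exactly one accent mark and that the strong mark is the character ’ (U+2019) as printed in this document (an ASCII apostrophe may be used in practice); the function name `set_accent_strength` is hypothetical.

```python
STRONG = "\u2019"  # ’  strong-accent mark as printed in this document
WEAK = "\uff0a"    # ＊ weak-accent mark


def set_accent_strength(phrase: str, strong: bool) -> str:
    """Switch the accent mark of one accent phrase to strong or weak.

    The mark keeps its position (the accent position); only the symbol,
    which encodes the strength, changes.
    """
    if phrase.count(STRONG) + phrase.count(WEAK) != 1:
        raise ValueError("expected exactly one accent mark in the phrase")
    old, new = (WEAK, STRONG) if strong else (STRONG, WEAK)
    return phrase.replace(old, new)


before = "カミナリチューイ" + STRONG + "ホーガ\uff06"  # ＆ marks the nasal ガ
after = set_accent_strength(before, strong=False)
print(after == "カミナリチューイ" + WEAK + "ホーガ\uff06")  # True
```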

As shown in display example 40d, in process (d) part of the Japanese notation 41 (the first Japanese notation) is changed, giving the Japanese notation 47. The Japanese notation 47 is the second Japanese notation, 「京都周辺に濃霧警報が・・・」 ("A dense fog warning ... around Kyoto"). As shown in display example 40e, in process (e) the language processing unit 7 analyzes the Japanese notation 47 and outputs its morpheme notation and intermediate notation. The change search unit 13 compares the morpheme notations and finds the morphemes that differ between the Japanese notations 41 and 47. The correction determination unit 17 compares the part-of-speech information and the accent phrase information corresponding to the differing morphemes and determines whether to correct the accent strength. As a result, the accent strength of 「濃霧警報が」 is set to the accent strength corrected for 「雷注意報が」, and prosody information 49 is output.

Consider the change from 「東京近辺に」 ("near Tokyo") to 「京都周辺に」 ("around Kyoto") described above. Morphological analysis yields 「東京」 (proper noun), 「近辺」 (common noun), and 「に」 (case particle) for the former, and 「京都」 (proper noun), 「周辺」 (common noun), and 「に」 (case particle) for the latter. In an approach that corrects prosody on the basis of the part-of-speech sequence alone, a precedent whose part-of-speech sequence matches 「東京近辺に」 also matches 「京都周辺に」, so the two would receive the same accent strength. This is correct in terms of meaning but not in terms of how the speech sounds. Dividing the two into accent phrases gives 「東京近辺に」 (one accent phrase) and 「京都」「周辺に」 (two accent phrases): the number of accent phrases changes from one to two, and this change makes them sound very different. Because accent strength is tied to accent phrases, it is not obvious how the accent strength of 「東京近辺に」 should be reflected in 「京都」 and 「周辺に」. For this reason, this embodiment requires a one-to-one correspondence of both the part-of-speech information and the accent phrase information as the condition for correcting the intermediate notation.
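The contrast between the two criteria can be checked mechanically. This Python sketch hard-codes the segmentations stated above; the English part-of-speech labels and function names are hypothetical, and "one-to-one correspondence" is interpreted here as the same number of accent phrases with an identical ordered part-of-speech list in each, which is one reading of the condition.

```python
from __future__ import annotations

PROPER, NOUN, PARTICLE = "proper noun", "common noun", "case particle"

# Each inner list is one accent phrase of (surface, part of speech) pairs.
TOKYO_KINPEN_NI = [[("東京", PROPER), ("近辺", NOUN), ("に", PARTICLE)]]
KYOTO_SHUHEN_NI = [[("京都", PROPER)], [("周辺", NOUN), ("に", PARTICLE)]]
TOKYO_NI = [[("東京", PROPER), ("に", PARTICLE)]]
KYOTO_NI = [[("京都", PROPER), ("に", PARTICLE)]]


def flat_pos(phrases: list[list[tuple[str, str]]]) -> list[str]:
    """POS sequence with accent phrase boundaries ignored."""
    return [pos for phrase in phrases for _, pos in phrase]


def pos_by_phrase(phrases: list[list[tuple[str, str]]]) -> list[list[str]]:
    """POS sequence grouped by accent phrase."""
    return [[pos for _, pos in phrase] for phrase in phrases]


def matches_pos_only(old, new) -> bool:
    """The rejected criterion: compare parts of speech alone."""
    return flat_pos(old) == flat_pos(new)


def matches_one_to_one(old, new) -> bool:
    """Same number of accent phrases, each with the same POS tags in order."""
    return pos_by_phrase(old) == pos_by_phrase(new)


print(matches_pos_only(TOKYO_KINPEN_NI, KYOTO_SHUHEN_NI))    # True
print(matches_one_to_one(TOKYO_KINPEN_NI, KYOTO_SHUHEN_NI))  # False
print(matches_one_to_one(TOKYO_NI, KYOTO_NI))                # True
```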

By contrast, when 「東京に」 ("to Tokyo") in a sentence is changed to 「京都に」 ("to Kyoto"), the part-of-speech information of both is (proper noun)(case particle) and each forms a single accent phrase, so the correction condition is satisfied.

An example of intermediate notation correction is described below with reference to FIGS. 3 to 9. FIG. 3 shows the language processing unit 7 generating the morpheme notation 52 and the intermediate notation 44 from the Japanese notation 41. The part-of-speech information 51 indicates the parts of speech contained in the Japanese notation 41 and is output as the result of morphological analysis of the Japanese notation 41 by the language processing unit 7. The morpheme segmentation information 42 indicates the correspondence between the part-of-speech information 51 and the Japanese notation 41. The morpheme notation 52 associates the part-of-speech information 51 with the morpheme segmentation information 42.

The prosody information 43 is the notation describing how the synthetic speech is rendered when speech is synthesized from the Japanese notation 41. It is output on the basis of the result of language processing of the Japanese notation 41 by the language processing unit 7 and contains prosodic information such as accent strength. The accent phrase information 54 indicates the accent phrase boundaries in the Japanese notation 41. The intermediate notation 44 associates the prosody information 43 with the accent phrase information 54.

For example, for the Japanese notation 41 「東京近辺に、雷注意報が発令された。」 ("A thunderstorm advisory was issued near Tokyo."), the morpheme segmentation information 42 consists of the morpheme units 「東京」「近辺」「に」「雷」「注意報」「が」「発令」「さ」「れ」「た」. The part-of-speech information 51 records, for example, that 「東京」 is a proper noun, 「近辺」 a common noun, and 「に」 a case particle; proper noun, common noun, and so on are types of part of speech. In the part-of-speech information 51, for example, there are six parts of speech. The prosody information 43 consists of items such as 「トーキョーキ’ンペンニ」 and 「カミナリチューイ’ホーガ＆」, in which 「’」 denotes a strong accent. In the example of FIG. 3, according to the part-of-speech information 51 and the accent phrase information 54, each accent phrase contains three parts of speech.
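The relationship between the morpheme notation and the intermediate notation in this example can be written down as two linked lists. A minimal Python sketch, assuming each accent phrase records the half-open range of morpheme indices it covers (one possible encoding of the accent phrase information 54, not one the document prescribes); the class names and English POS labels are hypothetical, and the parts of speech of 「雷」 and 「注意報」, which the text does not state, are assumed to be common nouns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str  # written form, e.g. "東京"
    pos: str      # part of speech, e.g. "proper noun"


@dataclass(frozen=True)
class AccentPhrase:
    prosody: str  # reading with prosodic symbols
    start: int    # index of the first morpheme in this accent phrase
    end: int      # one past the index of its last morpheme


# Morpheme notation 52, first six morphemes (POS of 雷 and 注意報 assumed).
morphemes = [
    Morpheme("東京", "proper noun"),
    Morpheme("近辺", "common noun"),
    Morpheme("に", "case particle"),
    Morpheme("雷", "common noun"),
    Morpheme("注意報", "common noun"),
    Morpheme("が", "case particle"),
]
# Intermediate notation 44, first two accent phrases (’ strong, ＆ nasal).
phrases = [
    AccentPhrase("トーキョーキ\u2019ンペンニ", 0, 3),
    AccentPhrase("カミナリチューイ\u2019ホーガ\uff06", 3, 6),
]


def pos_per_phrase(morphemes: list[Morpheme],
                   phrases: list[AccentPhrase]) -> list[list[str]]:
    """The POS tags grouped into each accent phrase."""
    return [[m.pos for m in morphemes[p.start:p.end]] for p in phrases]


print(len(morphemes))                                        # 6
print([len(g) for g in pos_per_phrase(morphemes, phrases)])  # [3, 3]
```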

FIG. 4 shows an example of accent strength correction applied to the intermediate notation 44. In the intermediate notation 55, the accent strength in the prosody information 43 has been corrected to give the prosody information 45: the strong accent 「’」 in 「カミナリチューイ’ホーガ＆」 of the prosody information 43 has been replaced with the weak accent 「＊」. The prosody information corresponding to the Japanese notation 41 thus becomes, for example, 「トーキョーキ’ンペンニ、カミナリチューイ＊ホーガハツレーサレタ。」.

「’」 and 「＊」 are prosodic symbols that specify both the accent position and the accent strength: 「’」 means that the accent phrase has strong accent strength, and 「＊」 that it has weak accent strength. 「、」 and 「。」 specify breath group boundaries, while 「、」, 「。」, and the full-width space specify accent phrase boundaries. A character string between accent phrase boundaries is called an accent phrase. 「＆」 is a prosodic symbol indicating a nasalized voiced consonant (bidakuon).
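The symbol rules just listed are regular enough to parse mechanically. A minimal Python sketch, assuming the symbol characters as printed in this document (’ U+2019, ＊ U+FF0A, ＆ U+FF06, full-width space U+3000) and counting positions in characters rather than morae; `ParsedPhrase` and `parse_prosody` are hypothetical names, and a production system would also need to handle symbols this document does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STRONG, WEAK, NASAL = "\u2019", "\uff0a", "\uff06"  # ’ ＊ ＆
BREATH_BOUNDARIES = "、。"                          # end a breath group
PHRASE_BOUNDARIES = BREATH_BOUNDARIES + "\u3000"    # plus the full-width space


@dataclass
class ParsedPhrase:
    reading: str                  # katakana with prosodic symbols removed
    mark_pos: int | None = None   # number of characters before the accent mark
    strength: str | None = None   # "strong", "weak", or None if unmarked
    nasal: list[int] = field(default_factory=list)  # indices of nasalized chars
    breath_group: int = 0         # index of the enclosing breath group


def parse_prosody(text: str) -> list[ParsedPhrase]:
    """Split a prosodic-symbol string into accent phrases."""
    phrases: list[ParsedPhrase] = []
    current = ParsedPhrase("")
    group = 0
    for ch in text:
        if ch in PHRASE_BOUNDARIES:
            if current.reading:
                phrases.append(current)
            if ch in BREATH_BOUNDARIES:
                group += 1
            current = ParsedPhrase("", breath_group=group)
        elif ch in (STRONG, WEAK):
            current.mark_pos = len(current.reading)
            current.strength = "strong" if ch == STRONG else "weak"
        elif ch == NASAL:
            current.nasal.append(len(current.reading) - 1)  # marks previous char
        else:
            current.reading += ch
    if current.reading:
        phrases.append(current)
    return phrases


for p in parse_prosody("トーキョーキ\u2019ンペンニ、カミナリチューイ\uff0aホーガ\uff06"):
    print(p.reading, p.mark_pos, p.strength, p.nasal, p.breath_group)
# トーキョーキンペンニ 6 strong [] 0
# カミナリチューイホーガ 8 weak [10] 1
```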

FIG. 5 shows an example of a change to the Japanese notation 41. As shown in FIG. 5, the Japanese notation 41 「東京近辺に、雷注意報が・・・」 is changed to the Japanese notation 47 「（京都周辺）に（濃霧警報）が・・・」, where the parentheses mark the changed parts.

FIG. 6 shows the language processing unit 7 generating the morpheme notation 58 and the intermediate notation 61 from the Japanese notation 47. The part-of-speech information 57 indicates the parts of speech contained in the Japanese notation 47 and is output as the result of morphological analysis of the Japanese notation 47 by the language processing unit 7. The morpheme segmentation information 48 indicates the correspondence between the part-of-speech information 57 and the Japanese notation 47. The morpheme notation 58 associates the part-of-speech information 57 with the morpheme segmentation information 48.

The prosody information 59 is the notation describing how the synthetic speech based on the Japanese notation 47 is rendered; it is output by the language processing unit 7 and contains the accent strength. The accent phrase information 60 indicates the accent phrase boundaries in the Japanese notation 47. The intermediate notation 61 associates the prosody information 59 with the accent phrase information 60.

FIG. 7 shows an example of searching for changed morphemes. As shown in FIG. 7, the morpheme notations 52 and 58 are compared to find the morphemes that differ between them. The changed morpheme notation 53 indicates the morphemes of the morpheme notation 52 that differ from the morpheme notation 58, and the changed morpheme notation 62 indicates the morphemes of the morpheme notation 58 that differ from the morpheme notation 52.
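The document does not say how the two morpheme notations are compared. One plausible approach, sketched here in Python, is a standard sequence diff with the standard-library `difflib.SequenceMatcher`, treating each (surface, part of speech) pair as one element. The diff method, the variable names, and the parts of speech of the nouns not named in the text are assumptions.

```python
import difflib

PROPER, NOUN, PARTICLE = "proper noun", "common noun", "case particle"

# First six morphemes of notations 52 and 58 as (surface, POS) pairs.
# POS of 雷, 注意報, 濃霧, and 警報 are assumed to be common nouns.
first = [("東京", PROPER), ("近辺", NOUN), ("に", PARTICLE),
         ("雷", NOUN), ("注意報", NOUN), ("が", PARTICLE)]
second = [("京都", PROPER), ("周辺", NOUN), ("に", PARTICLE),
          ("濃霧", NOUN), ("警報", NOUN), ("が", PARTICLE)]


def changed_morphemes(old, new):
    """Return (old_span, new_span) index ranges for every non-matching block."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(range(i1, i2), range(j1, j2))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


for old_span, new_span in changed_morphemes(first, second):
    print([first[i][0] for i in old_span], "->", [second[j][0] for j in new_span])
# ['東京', '近辺'] -> ['京都', '周辺']
# ['雷', '注意報'] -> ['濃霧', '警報']
```

Returning index ranges rather than copies keeps the link back to each morpheme's position, which the later association step needs.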

FIG. 8 shows an example of associating the changed morpheme notation 53 with the intermediate notation 55. The correspondence information 56 associates the changed morpheme notation 53, which indicates the changes in the Japanese notation 41, with the intermediate notation 55 obtained after the prosody information was corrected.
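The association in FIG. 8 comes down to finding, for each changed morpheme span, the accent phrases of the corrected intermediate notation that cover it. A minimal Python sketch, assuming both changed spans and accent phrases are stored as half-open ranges of morpheme indices (an encoding this document does not prescribe); `overlaps` and `associate` are hypothetical helper names.

```python
from __future__ import annotations


def overlaps(a: range, b: range) -> bool:
    """True if two half-open morpheme index ranges share at least one index."""
    return max(a.start, b.start) < min(a.stop, b.stop)


def associate(changed: list[range], phrase_ranges: list[range],
              prosody: list[str]) -> dict[tuple[int, int], list[str]]:
    """Map each changed morpheme span to the accent phrases that contain it."""
    return {
        (span.start, span.stop): [p for r, p in zip(phrase_ranges, prosody)
                                  if overlaps(span, r)]
        for span in changed
    }


# Intermediate notation 55 after correction (first two accent phrases),
# each accent phrase given as the morpheme range it covers.
phrase_ranges = [range(0, 3), range(3, 6)]
prosody_55 = ["トーキョーキ\u2019ンペンニ", "カミナリチューイ\uff0aホーガ\uff06"]
changed_53 = [range(0, 2), range(3, 5)]  # 東京 近辺 / 雷 注意報

print(associate(changed_53, phrase_ranges, prosody_55))
```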

FIG. 9 shows an example of the correction determination, in which the changed Japanese notation 47 is corrected in accordance with the corrected prosody information 45. As shown in FIG. 9, the change information 63 associates the intermediate notation 61 with the changed morpheme notation 62. The correction determination information 64 and the correction reference information 65 indicate the accent phrases that correspond one-to-one between the accent phrase information 54 and 60 and between the part-of-speech information 51 and 57. A one-to-one correspondence means that the number and types of parts of speech in an accent phrase are the same. The accent strength in the correction reference information 65 is applied to each accent phrase that corresponds one-to-one, so that the correction is carried over and the prosody information 66 is generated. The intermediate notation 67 associates the prosody information 66 with the accent phrase information 60.
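The carry-over step in FIG. 9 can be sketched as follows. This Python sketch is illustrative only: it assumes the accent phrases covering each changed region have already been collected on both sides (for example by a morpheme diff), interprets "number and types of parts of speech" as an identical ordered POS list per accent phrase, and uses hypothetical readings (キョ’ート, シューヘンニ, ノームケ’ーホーガ＆) for the new text because the document does not give prosody information 59.

```python
from __future__ import annotations

STRONG, WEAK, NASAL = "\u2019", "\uff0a", "\uff06"  # ’ ＊ ＆


def accent_mark(prosody: str) -> str | None:
    """Return the strength symbol in an accent phrase, if any."""
    return next((ch for ch in prosody if ch in (STRONG, WEAK)), None)


def carry_over(old: list[tuple[list[str], str]],
               new: list[tuple[list[str], str]]) -> list[str]:
    """Carry corrected accent strengths over to the changed text.

    old / new: accent phrases covering one changed region, as
    (POS tags of the phrase, prosody string). Strengths are copied only when
    the phrases correspond one-to-one: same phrase count and, phrase by
    phrase, the same POS tags. Otherwise the new prosody is left untouched.
    """
    new_prosody = [prosody for _, prosody in new]
    if [pos for pos, _ in old] != [pos for pos, _ in new]:
        return new_prosody
    result = []
    for (_, old_p), new_p in zip(old, new_prosody):
        src, dst = accent_mark(old_p), accent_mark(new_p)
        if src is not None and dst is not None:
            new_p = new_p.replace(dst, src, 1)  # keep position, copy strength
        result.append(new_p)
    return result


PROPER, NOUN, PARTICLE = "proper noun", "common noun", "case particle"
region1 = carry_over(
    [([PROPER, NOUN, PARTICLE], "トーキョーキ" + STRONG + "ンペンニ")],
    [([PROPER], "キョ" + STRONG + "ート"), ([NOUN, PARTICLE], "シューヘンニ")],
)
region2 = carry_over(
    [([NOUN, NOUN, PARTICLE], "カミナリチューイ" + WEAK + "ホーガ" + NASAL)],
    [([NOUN, NOUN, PARTICLE], "ノームケ" + STRONG + "ーホーガ" + NASAL)],
)
print(region1 == ["キョ" + STRONG + "ート", "シューヘンニ"])  # True
print(region2 == ["ノームケ" + WEAK + "ーホーガ" + NASAL])     # True
```

Region 1 reproduces the case where the strength of 「東京近辺に」 is not carried over to 「京都」「周辺に」; region 2 copies the weak mark from 「雷注意報が」 onto 「濃霧警報が」.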

In this example, the prosody information 66 is output for the Japanese notation 47, and the speech synthesis unit 9 synthesizes speech based on the prosody information 66. In this example, the accent intensity of "around Tokyo" (東京周辺に) is not applied to "around Kyoto" (京都近辺に). In contrast, the accent intensity corrected in "lightning warning" (雷警報が) is applied to "dense fog warning" (濃霧警報が).

The operation of the speech synthesizer 1 according to the present embodiment is described further below with reference to a flowchart. FIG. 10 is a flowchart showing the operation of the speech synthesizer 1. As shown in FIG. 10, the input/output unit 5 accepts an input of Japanese notation (S71). For example, the input/output unit 5 accepts the Japanese notation 41 as the first Japanese notation. The input/output unit 5 may display the accepted Japanese notation 41, for example as in display example 40a. The Japanese notation 41 is then preferably presented to the user through a graphical user interface (GUI) or similar means, so that the user can correct it.

The language processing unit 7 performs morphological analysis, dependency analysis, and similar processing on the accepted input, for example the Japanese notation 41. It then generates the morpheme notation 52 and the intermediate notation 55 (S72). For example, as shown in FIG. 3, the morpheme notation 52 includes the part-of-speech information 51 and the morpheme boundary information 42. The intermediate notation 44, for example, includes the prosody information 43 and the accent phrase information 54.

At this point, the speech synthesis unit 9 synthesizes speech based on the generated intermediate notation 55 and outputs it through the input/output unit 5. For example, a play button may be displayed in display example 40b of FIG. 2, and the speech may be output when the play button is selected. The speech may be output through a speaker, for example. When the user enters a correction to the intermediate notation 44, the input/output unit 5 accepts it. For example, as shown in process (c) of FIG. 2, it accepts a change in accent intensity (S73). As a result, as shown in FIG. 4, the prosody information 43 in the intermediate notation 44 is corrected to the prosody information 45.

As shown in process (d) of FIG. 2, the input/output unit 5 accepts, for example, a correction to part of the Japanese notation 41. The corrected Japanese notation becomes the Japanese notation 47 (S74). As shown in FIG. 6, the language processing unit 7 performs language processing on the Japanese notation 47 to generate the morpheme notation 58 and the intermediate notation 61 (S75).

The change search unit 13 establishes the correspondence between the morpheme notation 52 and the morpheme notation 58 (S76). Based on this correspondence, the change search unit 13 generates the changed morpheme notation 53 and the changed morpheme notation 62, as shown in FIG. 7. That is, it compares the morpheme boundary information 42 with the morpheme boundary information 48, extracts the morphemes that differ, and generates the changed morpheme notations 53 and 62. As shown in FIG. 8, the morpheme association unit 15 generates the correspondence information 56 from the correspondence between the changed morpheme notation 53 and the intermediate notation 55 (S77). The correspondence information 56 is stored in the storage unit 21. The storage unit 21 can be, for example, a semiconductor memory built into the speech synthesizer 1.
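Step S77 can be pictured as recording, for each accent phrase that contains a changed morpheme, the parts of speech of that phrase and its corrected accent intensity. The sketch below is minimal and assumes that the corrected intermediate notation 55 is reduced to per-phrase morpheme counts plus one accent-intensity mark per phrase (`'` strong, `*` weak, as in FIG. 13). The record layout, the function name `build_correspondence`, and the example data are hypothetical, because the patent does not specify how the correspondence information 56 is stored.

```python
def build_correspondence(changed, pos_tags, phrase_sizes, marks):
    """Map each accent phrase containing a changed morpheme to its POS sequence and mark.

    changed:      indices of changed morphemes (changed morpheme notation 53)
    pos_tags:     one part-of-speech tag per morpheme (part-of-speech information 51)
    phrase_sizes: number of morphemes in each accent phrase, in order
    marks:        corrected accent intensity of each accent phrase (intermediate notation 55)
    """
    if sum(phrase_sizes) != len(pos_tags):
        raise ValueError("accent phrases must cover every morpheme exactly once")
    info = {}
    start = 0
    for k, size in enumerate(phrase_sizes):
        end = start + size
        hits = [m for m in changed if start <= m < end]
        if hits:  # this accent phrase contains at least one changed morpheme
            info[k] = {"morphemes": hits, "pos": tuple(pos_tags[start:end]), "mark": marks[k]}
        start = end
    return info


# Hypothetical example: 東京周辺に | 雷警報が, where the user weakened the second phrase.
info = build_correspondence(
    changed=[0, 3],
    pos_tags=["proper_noun", "common_noun", "case_particle", "common_noun", "case_particle"],
    phrase_sizes=[3, 2],
    marks=["'", "*"],
)
```

Keying the record by accent phrase rather than by morpheme means that each phrase is stored once, even when several of its morphemes changed.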

The correction determination unit 17 determines whether there is an unprocessed accent phrase that contains a changed morpheme extracted in the changed morpheme notation 62 (S78). If there is no unprocessed accent phrase (S78: NO), the prosody correction unit 19 outputs the intermediate notation as it stands at that point (S81). The speech synthesis unit 9 then synthesizes speech based on the output intermediate notation (S82).

If there is an unprocessed accent phrase containing a changed morpheme extracted in the changed morpheme notation 62 (S78: YES), the correction determination unit 17 determines whether that accent phrase satisfies the prosody information correction condition (S79). Here, the condition is, for example, that both the part-of-speech information 57 and the accent phrase information 60 for the accent phrase containing the morpheme correspond one-to-one with the part-of-speech information 51 and the accent phrase information 54, respectively.

If the correction determination unit 17 determines that the prosody information correction condition is satisfied (S79: YES), the prosody correction unit 19 changes the accent intensity of that accent phrase in the intermediate notation 61 to the accent intensity of the corresponding accent phrase in the intermediate notation 55 (S80). It then returns the process to S78. If the correction determination unit 17 determines that the condition is not satisfied (S79: NO), it returns the process to S78.
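The loop S78 to S81 can be sketched as follows. The sketch assumes that each analysed sentence is flattened into per-morpheme part-of-speech tags, per-phrase morpheme counts, and per-phrase accent marks. It also assumes that the changed morphemes come already paired as (new index, old index), for example from a morpheme diff. The `Notation` class, the `carry_over` function, and the data are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass
class Notation:
    """Hypothetical flattened view of one analysed sentence."""

    pos: list[str]           # one part-of-speech tag per morpheme
    phrase_sizes: list[int]  # number of morphemes in each accent phrase
    marks: list[str]         # accent intensity mark of each accent phrase

    def phrase_of(self, morpheme: int) -> int:
        """Index of the accent phrase containing the given morpheme."""
        start = 0
        for k, size in enumerate(self.phrase_sizes):
            if start <= morpheme < start + size:
                return k
            start += size
        raise IndexError(f"morpheme {morpheme} is outside every accent phrase")

    def phrase_pos(self, k: int) -> tuple[str, ...]:
        """Parts of speech of the k-th accent phrase."""
        start = sum(self.phrase_sizes[:k])
        return tuple(self.pos[start:start + self.phrase_sizes[k]])


def carry_over(old: Notation, new: Notation, pairs: list[tuple[int, int]]) -> list[str]:
    """Copy corrected accent marks from `old` into `new` (S78 to S81)."""
    marks = list(new.marks)
    processed: set[int] = set()
    for new_idx, old_idx in pairs:
        k_new = new.phrase_of(new_idx)
        if k_new in processed:  # S78: handle each accent phrase only once
            continue
        processed.add(k_new)
        k_old = old.phrase_of(old_idx)
        # S79: one-to-one correspondence means the same number and types of parts of speech
        if new.phrase_pos(k_new) == old.phrase_pos(k_old):
            marks[k_new] = old.marks[k_old]  # S80: take over the corrected mark
    return marks  # S81: marks of the intermediate notation to output


# Hypothetical data for the FIG. 9 example.
pos = ["proper_noun", "common_noun", "case_particle", "common_noun", "case_particle"]
old = Notation(pos, phrase_sizes=[3, 2], marks=["'", "*"])          # 東京周辺に | 雷警報が
new = Notation(pos, phrase_sizes=[1, 2, 2], marks=["'", "'", "'"])  # 京都 | 周辺に | 濃霧警報が
result = carry_over(old, new, pairs=[(0, 0), (3, 3)])
```

In this example, the phrase for 京都 does not correspond to 東京周辺に because the part-of-speech sequences differ in length, so it keeps its generated mark. The phrase 濃霧警報が does correspond to 雷警報が, so it takes the weak mark `*`. This mirrors the outcome described for FIG. 9.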

As described above, in the speech synthesizer 1 according to the first embodiment, the language processing unit 7 performs language processing on input such as the Japanese notation 41 entered through the input/output unit 5. This processing generates, for example, the morpheme notation 52 and the intermediate notation 44. The morpheme notation 52 includes the part-of-speech information 51 and the morpheme boundary information 42. The intermediate notation 44 includes the prosody information 43 and the accent phrase information 54.

The speech synthesis unit 9 synthesizes and outputs speech based on the generated intermediate notation 44. When the user listens to the output speech and enters a correction to the prosody information 43, the input/output unit 5 accepts the correction.

When the input/output unit 5 accepts, for example, the Japanese notation 47, obtained by changing part of the Japanese notation 41, the language processing unit 7 performs language processing on it. This processing generates, for example, the morpheme notation 58 and the intermediate notation 61. The morpheme notation 58 includes the part-of-speech information 57 and the morpheme boundary information 48. The intermediate notation 61 includes the prosody information 59 and the accent phrase information 60.

The change search unit 13 obtains the changed morpheme notation 53 and the changed morpheme notation 62 from the correspondence between the morpheme notation 52 and the morpheme notation 58. That is, it compares the morpheme boundary information 42 with the morpheme boundary information 48 and extracts the morphemes that differ. The morpheme association unit 15 then associates two notations and generates the correspondence information 56. The first is the morpheme notation 52 of the input first Japanese notation. The second is the corrected intermediate notation 55, which reflects the corrections made to the intermediate notation that the language processing unit 7 output for the first Japanese notation. The correspondence information 56 is stored, for example, in the storage unit 21.

Based on the extracted changed morpheme notation 62, the correction determination unit 17 checks each accent phrase that contains a changed morpheme. It determines whether the part-of-speech information and accent phrase information of that phrase are shared with the correspondence information 56, that is, whether they correspond one-to-one. The change search unit 13 makes this determination morpheme by morpheme. The determination covers either the morphemes around those that changed between the morpheme boundary information 42 and the morpheme boundary information 48, or all morphemes. Whether the correspondence is one-to-one can be determined with a technique widely used in matching processing, for example dynamic programming (DP) matching. If they are shared, the accent intensity of the corresponding accent phrase in the prosody information 59 is changed to the accent intensity in the prosody information 45.
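The text names DP matching for this check but gives no cost scheme. For illustration only, the sketch below uses the textbook minimum-edit-distance (Levenshtein) alignment with unit costs over (surface, part-of-speech) pairs. It then keeps the aligned pairs whose surfaces differ but whose parts of speech agree. The unit costs and the function names `dp_align` and `changed_pairs` are assumptions.

```python
def dp_align(a, b):
    """Align two morpheme sequences by minimum edit distance (unit costs).

    Returns (i, j) index pairs in order; i is None for a morpheme inserted in b,
    and j is None for a morpheme of a that was deleted.
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion
                cost[i][j - 1] + 1,        # insertion
            )
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def changed_pairs(a, b):
    """Aligned (surface, POS) morphemes whose surface differs but whose POS is the same."""
    return [(i, j) for i, j in dp_align(a, b)
            if i is not None and j is not None and a[i] != b[j] and a[i][1] == b[j][1]]


# Hypothetical morpheme notations before and after the edit.
before = [("東京", "proper_noun"), ("周辺", "common_noun"), ("に", "case_particle"),
          ("雷警報", "common_noun"), ("が", "case_particle")]
after = [("京都", "proper_noun"), ("周辺", "common_noun"), ("に", "case_particle"),
         ("濃霧警報", "common_noun"), ("が", "case_particle")]
```

An insertion or deletion inside an accent phrase shows up as a `None` entry. That is one simple way to see that the phrase no longer has the same number of parts of speech.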

As described in detail above, the speech synthesizer 1 stores information about a correction in the storage unit 21 whenever the user corrects the prosody information of synthesized speech generated from the language processing result of a first Japanese notation. This information includes the content of the prosody correction and at least the part-of-speech information and accent phrase information for the corrected portion. The speech synthesizer 1 then takes the language processing result of a second Japanese notation, different from the first. It determines whether that result contains an accent phrase corresponding to the corrected portion of the first notation's prosody information. If it does, the prosody information of that accent phrase in the second Japanese notation is corrected in the same way as in the first, so the prosody correction is reflected automatically.

A corresponding accent phrase is one whose part-of-speech information and accent phrase information correspond one-to-one between the first and second Japanese notations. The prosody information of the accent phrase that corresponds to the corrected one is thus set automatically to the corrected prosody information, and the correction is carried over. For example, when part of the first Japanese notation is changed to produce the second Japanese notation, this correction is applied to every accent phrase that contains a changed morpheme. When the user repeatedly corrects or changes the Japanese notation or the intermediate notation, the above processing is repeated.

As described above, once prosody information has been corrected, the speech synthesizer 1 according to the first embodiment can carry that correction over to Japanese notation whose part-of-speech information and accent phrase information correspond to the corrected prosody information. In other words, when the number and types of the parts of speech in an accent phrase are the same, the speech synthesizer 1 can decide to correct the prosody information in the same way as recorded in the correction information in the storage unit. When the user corrects or changes the Japanese notation, prosody that the user has already corrected is reflected automatically in the corrected or changed portion. This greatly reduces the effort of correcting prosody by hand again and again to obtain natural synthesized speech, and makes high-quality synthesized speech easy to generate.

According to the present embodiment, an existing prosody correction can be reflected in the changed portion of the Japanese notation. To do this, the speech synthesizer determines whether the part-of-speech information and the accent phrase information correspond one-to-one, and it carries the correction over when they do. Specifically, the morpheme association unit 15 associates the part-of-speech information of the changed Japanese notation with the intermediate notation. The correction determination unit 17 can then examine each accent phrase found by the change search unit 13 that contains a changed morpheme. It determines whether the parts of speech of all morphemes in that phrase correspond one-to-one with the part-of-speech information of the accent phrase before the change.

Using part-of-speech information in this way shows whether the morphemes are similar in meaning before and after the change. Using accent phrases shows whether the synthesized speech sounds similar before and after the change. The speech synthesizer 1 therefore applies the prosody correction of an already-corrected accent phrase only when both the meaning and the sound are similar. As a result, it can generate an intermediate notation that takes context and prosody into account. It also avoids forcing an accent intensity onto a phrase where carrying it over would make the accent unnatural, so the synthesized speech is corrected appropriately. The result also sounds more natural than with an approach that applies the same accent correction when only the parts of speech, or only the accent phrases, match.

Synthesized speech of sufficient accuracy can be generated even where high quality is required, as in speech for teaching materials or explanations. The Japanese notation may change after the intermediate notation has been corrected, for example because the manuscript is revised, or the corrections may be repeated. Even then, the intermediate notation information that the user corrected earlier can be reused. This avoids repeatedly burdening the user with corrections and reduces cost.

The approach also avoids a problem seen when corrections are carried over based on the sequence of mora counts and accent types. There, the constraint is too strict, and in many cases the accent intensity cannot be carried over at all.

With the above configuration, the prosodic symbols corrected by the user are carried over appropriately, which helps reduce the correction cost borne by the user.

(Second Embodiment)
A speech synthesizer 100 according to the second embodiment is described below with reference to FIGS. 11 to 14. Components and operations of the speech synthesizer 100 that are the same as in the speech synthesizer 1 of the first embodiment are given the same reference numerals, and their description is not repeated. In this embodiment, an intermediate notation may be specified for use instead of the one generated by the language processing unit 7. In that case, the specified intermediate notation and the corresponding morpheme notation are stored as a template. The speech synthesizer 100 then refers to the template when it generates the intermediate notation for Japanese notation to be synthesized.

FIG. 11 is a functional block diagram showing an example configuration of the speech synthesizer 100 according to the second embodiment. As shown in FIG. 11, the speech synthesizer 100 includes the input/output unit 5, the language processing unit 7, the speech synthesis unit 9, an intermediate notation correction unit 103, and the storage unit 21. The storage unit 21 may store a template database (DB) 117. The intermediate notation correction unit 103 includes a DB registration unit 105, a template search unit 113, a template association unit 115, and the prosody correction unit 19. Each of these functions may be implemented, for example, by a processor that reads and executes a program stored in the storage unit 21. Some or all of the functions may instead be implemented in, for example, a semiconductor integrated circuit.

The intermediate notation correction unit 103 refers to the template DB 117 to determine whether the intermediate notation output by the language processing unit 7 needs correction. If it does, the unit corrects it automatically. The DB registration unit 105 registers in the template DB 117 information about an intermediate notation other than the one generated by the language processing unit 7. The intermediate notation to be registered is entered, for example, through the input/output unit 5. The DB registration unit 105 registers the part-of-speech information, accent phrase information, and prosody information in association with one another. The template DB 117 is, for example, a database that stores these three kinds of information in association with one another.

The template search unit 113 searches whether the intermediate notation of a corrected Japanese notation, or of a newly entered Japanese notation, matches information registered in the template DB 117. When a matching template exists, the template association unit 115 associates the target Japanese notation with the template. The prosody correction unit 19 then changes the corresponding accent intensities, as associated by the template association unit 115, to the accent intensities stored in the template DB 117.

In the following description, each operation of the speech synthesizer 1 is, for convenience, described as performed by the functions above. This includes the case where the operation is executed by, for example, an arithmetic processing unit (described later) that reads a predetermined program.

FIG. 12 shows an example of a template 125. As shown in FIG. 12, the template 125 includes accent phrase information 121, part-of-speech information 122, a Japanese notation example 123, and accent intensity information 124. The Japanese notation example 123 shows an example of Japanese notation that is judged to match the template 125. In the Japanese notation example 123, "#" means that any morpheme with matching part-of-speech information is accepted. In this example, the Japanese notation example 123 specifies the case particles in the first half of the notation and "発令されました。" ("has been issued.") in the second half, although the specification is not limited to this. The accent intensity information 124 is an example of prosody information. An entry registered as "not specified" in the accent intensity information 124 indicates that no correction is needed.

FIG. 13 conceptually shows an example of referring to the template DB 117 when the Japanese notation 47 is input. As shown in FIG. 13, at least one template, for example templates 135, 145, and 155, is registered in the template DB 117. The template 135 includes accent phrase information 131, part-of-speech information 132, a Japanese notation example 133, and accent intensity information 134. The other templates 145 and 155 are structured the same way. The accent intensity information 134 is an example of prosody information.

FIG. 13 also shows the accent phrase information 60, the part-of-speech information 57, and the morpheme boundary information 48 generated from the Japanese notation 47. In this example, the accent phrase information 60 corresponds one-to-one with the accent phrase information 131, and the part-of-speech information 57 corresponds one-to-one with the part-of-speech information 132. The Japanese notation 47 therefore matches the template 135, and the accent intensity information 134 is applied to the intermediate notation.

Specifically, the accent phrase information 131 is "(accent phrase)(accent phrase)(accent phrase)", and the part-of-speech information 132 is "(proper noun)(common noun)(case particle)(common noun)(case particle)". For these, the accent intensity information 134 is registered as "(strong accent ('))(weak accent (*))(weak accent (*))". For the Japanese notation 47, the prosody information 59 is generated by language processing, as shown in FIG. 6. Based on the accent intensity information 134, "シューヘンニ'" in the prosody information 59 is corrected to "シューヘンニ*", and "ノームケ'ーホーガ&" is corrected to "ノームケ*ーホーガ&".
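In this example, the correction only swaps the accent-intensity mark inside each accent phrase's prosody string. The sketch below is minimal and assumes one prosody string per accent phrase and one template mark per phrase. `None` stands for the "not specified" entry described for FIG. 12. The function name and the prosody string used for 京都 are hypothetical; the other two strings are taken from the text.

```python
from typing import Optional

STRONG = "'"  # strong accent mark, as in "(strong accent ('))"
WEAK = "*"    # weak accent mark, as in "(weak accent (*))"


def apply_template_marks(phrases: list[str], template_marks: list[Optional[str]]) -> list[str]:
    """Rewrite the accent mark in each accent phrase as the template specifies.

    A template entry of None ("not specified") leaves that phrase unchanged.
    """
    if len(phrases) != len(template_marks):
        raise ValueError("template and input must have the same number of accent phrases")
    corrected = []
    for phrase, mark in zip(phrases, template_marks):
        if mark is not None:
            # Replace whichever accent mark the phrase carries with the template's mark.
            phrase = phrase.replace(STRONG, mark).replace(WEAK, mark)
        corrected.append(phrase)
    return corrected


# "キョ'ート" is a made-up prosody string for 京都.
corrected = apply_template_marks(["キョ'ート", "シューヘンニ'", "ノームケ'ーホーガ&"], [STRONG, WEAK, WEAK])
```

Other symbols in the string, such as the `&` in the third phrase, pass through untouched because only the two accent marks are replaced.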

FIG. 14 is a flowchart showing the processing of the speech synthesizer 100 according to the second embodiment. Before the processing in FIG. 14 begins, the DB registration unit 105, for example, registers a template in the template DB 117. The template is based on the morphological analysis result and the intermediate notation of a Japanese notation whose prosody information has been corrected. For example, a template is registered from the morpheme notation 52 shown in FIG. 3 and the corrected intermediate notation 55 shown in FIG. 4. Alternatively, the user may enter the registration contents manually through the input/output unit 5.
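Registration from the morpheme notation 52 and the corrected intermediate notation 55 can be pictured as regrouping the per-morpheme part-of-speech tags by accent phrase and storing them with the corrected marks. The sketch below is minimal and assumes that an in-memory list stands in for the template DB 117. The record keys, the optional `text_example` field, and the function name `register_template` are hypothetical.

```python
def register_template(db, pos_tags, phrase_sizes, marks, text_example=None):
    """Append a template built from a morpheme notation and a corrected intermediate notation.

    pos_tags:     one part-of-speech tag per morpheme (e.g. part-of-speech information 51)
    phrase_sizes: number of morphemes in each accent phrase
    marks:        corrected accent intensity of each accent phrase (None = not specified)
    """
    if sum(phrase_sizes) != len(pos_tags):
        raise ValueError("accent phrases must cover every morpheme exactly once")
    if len(phrase_sizes) != len(marks):
        raise ValueError("need one accent intensity per accent phrase")
    phrases, start = [], 0
    for size in phrase_sizes:
        phrases.append(tuple(pos_tags[start:start + size]))  # POS sequence of one accent phrase
        start += size
    db.append({"phrases": phrases, "marks": list(marks), "text": text_example})
    return db[-1]


template_db = []
record = register_template(
    template_db,
    ["proper_noun", "common_noun", "case_particle", "common_noun", "case_particle"],
    phrase_sizes=[3, 2],
    marks=["'", "*"],
)
```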

As shown in FIG. 14, the input/output unit 5 accepts an input such as the Japanese notation 47 (S161). The language processing unit 7 performs language processing on the Japanese notation 47 and outputs the morpheme notation 58 and the intermediate notation 61 (S162). The template search unit 113 checks whether the part-of-speech information 57 in the output morpheme notation 58 and the accent phrase information 60 in the intermediate notation 61 match a template in the template DB 117 (S163). That is, it searches for a template with which the part-of-speech information 57 and the accent phrase information 60 each correspond one-to-one. If a template specifies part of a Japanese sentence, the search also checks whether that Japanese notation matches.

If no template matches (S163: NO), the prosody correction unit 19 outputs the intermediate notation as it stands (S166). If a template matches (S163: YES), the template association unit 115 associates the template in the template DB 117 with, for example, the prosody information 59 (S164). The prosody correction unit 19 corrects the prosody information 59 based on the matched template, for example the template 135 (S165), and outputs the resulting template-based intermediate notation (S166). The speech synthesis unit 9 synthesizes speech based on the intermediate notation (S167).
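Steps S163 to S166 can be sketched at the level of accent phrases. The sketch looks for a run of consecutive input accent phrases whose part-of-speech sequences equal those of a template. It then overwrites only the phrases for which the template specifies an intensity. This is a simplification under stated assumptions: it compares parts of speech only, ignores any Japanese text stored with a template, and applies the first matching template. The data layout and names are hypothetical.

```python
from typing import Optional

Phrase = tuple[str, ...]  # parts of speech of one accent phrase


def apply_first_matching_template(
    phrases: list[Phrase],
    marks: list[str],
    templates: list[tuple[list[Phrase], list[Optional[str]]]],
) -> list[str]:
    """Return corrected accent marks (S164 to S166), or the marks unchanged if nothing matches."""
    for tmpl_phrases, tmpl_marks in templates:
        tmpl_phrases = list(tmpl_phrases)
        k = len(tmpl_phrases)
        for start in range(len(phrases) - k + 1):
            # S163: accent phrases and parts of speech must correspond one-to-one.
            if phrases[start:start + k] == tmpl_phrases:
                corrected = list(marks)
                for offset, mark in enumerate(tmpl_marks):
                    if mark is not None:  # None = "not specified": keep the generated mark
                        corrected[start + offset] = mark
                return corrected
    return list(marks)  # S163: NO, so output the intermediate notation as is


# Hypothetical template resembling template 135, and input phrases for 京都 | 周辺に | 濃霧警報が | 発令されました。
templates = [([("proper_noun",), ("common_noun", "case_particle"), ("common_noun", "case_particle")],
              ["'", "*", "*"])]
phrases = [("proper_noun",), ("common_noun", "case_particle"), ("common_noun", "case_particle"),
           ("sahen_noun", "verb", "auxiliary_verb")]
result = apply_first_matching_template(phrases, ["'", "'", "'", "'"], templates)
```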

The operation of the template search unit 113 is described further below with reference to the example template DB 117 in FIG. 13. In this example, part of the Japanese notation is registered along with the part-of-speech information and accent intensity, but this is optional.

For example, "京都周辺に濃霧警報が" ("dense fog warning around Kyoto") in the Japanese notation 47 matches the template 135 but not the template 145. The reason is that "京都周辺に" consists of two accent phrases, "京都" and "周辺に", which does not agree with the accent phrase information 141 of the template 145. Such a search can be implemented, for example, with morpheme-level DP matching.

In DP matching, the template search unit 113 first defines a per-morpheme substitution cost, given below, between two morpheme notations. One is the morpheme notation generated by language processing of the user's Japanese notation, for example the morpheme notation 58. The other is the morpheme notation of a template stored in the template database. The template search unit 113 then performs morpheme-level DP matching between the morpheme notation 58 generated by the language processing unit 7 and the template's morpheme notation, with an insertion cost and a deletion cost of 1 where they differ. The insertion cost of 1 applies, for example, when a morpheme is present in the morpheme notation 58 but not in the template in the template DB 117. The deletion cost of 1 applies, for example, when a morpheme is present in the template but not in the morpheme notation 58.
1) If the two morphemes match in both part of speech and character string, the substitution cost is 0.
2) If the two morphemes match in part of speech and the template morpheme is "#", the substitution cost is 0.
3) If neither of the above conditions is met, the substitution cost is 1.
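The three substitution rules and the unit insertion and deletion costs can be combined into a DP search for zero-cost spans. The sketch below is minimal and assumes that morphemes are (surface, part of speech) pairs and that a match may start and end anywhere in the input, which is why row 0 of the DP table is all zeros. That framing, the names, and the example data are assumptions, not the patent's code.

```python
WILDCARD = "#"


def sub_cost(inp, tmpl):
    """Substitution cost between an input morpheme and a template morpheme (rules 1 to 3)."""
    (s_in, p_in), (s_t, p_t) = inp, tmpl
    if p_in == p_t and (s_in == s_t or s_t == WILDCARD):
        return 0  # rule 1 (same POS and string) or rule 2 (same POS, template "#")
    return 1      # rule 3


def zero_cost_spans(inp, tmpl):
    """Return (start, end) spans of `inp` that align with `tmpl` at total cost 0."""
    n, m = len(tmpl), len(inp)
    # d[i][j]: cheapest alignment of tmpl[:i] with input ending at position j;
    # row 0 is all zeros so that a match may start anywhere in the input.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(inp[j - 1], tmpl[i - 1]),  # substitution
                d[i - 1][j] + 1,  # deletion: template morpheme absent from the input
                d[i][j - 1] + 1,  # insertion: extra morpheme in the input
            )
    # A zero-cost path uses only zero-cost substitutions, so it covers exactly n morphemes.
    return [(j - n, j) for j in range(n, m + 1) if d[n][j] == 0]


# Hypothetical input (京都周辺に濃霧警報が発令...) and a template with wildcards.
inp = [("京都", "proper_noun"), ("周辺", "common_noun"), ("に", "case_particle"),
       ("濃霧警報", "common_noun"), ("が", "case_particle"), ("発令", "sahen_noun")]
tmpl = [("#", "proper_noun"), ("周辺", "common_noun"), ("に", "case_particle"),
        ("#", "common_noun"), ("が", "case_particle")]
```

Here `zero_cost_spans(inp, tmpl)` finds the single span covering the first five morphemes. Changing the particle に to で would make that span cost 1, so no span would qualify.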

Next, the template search unit 113 checks whether the morpheme notation 58 contains a portion where the total of the substitution, insertion, and deletion costs is 0. If it does, the unit checks whether the inclusion relation between accent phrases and morphemes in that portion agrees with the inclusion relation between accent phrases and morphemes in the template. If it agrees, the template is judged to match. Repeating this process for each template finds the matching templates in the template database.
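The second check compares how the morphemes in the matched portion are grouped into accent phrases. The sketch below is minimal and rests on three assumptions. The input's accent phrase information is given as one phrase id per morpheme. The template's grouping is given as a list of phrase sizes. The span comes from a zero-cost DP match. A span that starts or ends inside an accent phrase is rejected. The names and data are hypothetical.

```python
def phrase_sizes_in_span(phrase_ids: list[int], start: int, end: int):
    """Sizes of the accent phrases covered by morphemes [start, end).

    Returns None if the span begins or ends in the middle of an accent phrase.
    """
    if 0 < start < len(phrase_ids) and phrase_ids[start - 1] == phrase_ids[start]:
        return None
    if 0 < end < len(phrase_ids) and phrase_ids[end - 1] == phrase_ids[end]:
        return None
    sizes: list[int] = []
    for k in range(start, end):
        if k == start or phrase_ids[k] != phrase_ids[k - 1]:
            sizes.append(1)  # first morpheme of a new accent phrase
        else:
            sizes[-1] += 1
    return sizes


def inclusion_matches(phrase_ids: list[int], span: tuple[int, int], template_sizes: list[int]) -> bool:
    """True if the span groups morphemes into accent phrases exactly as the template does."""
    return phrase_sizes_in_span(phrase_ids, *span) == list(template_sizes)


# 京都 | 周辺 に | 濃霧警報 が | 発令 ...  (hypothetical phrase ids, one per morpheme)
ids = [0, 1, 1, 2, 2, 3]
```

With these ids, the span (0, 5) groups as [1, 2, 2]. A template that treats 京都周辺に as a single accent phrase, for example a grouping of [3, 2], would therefore be rejected. That grouping is only a guess at what a template such as 145 might contain.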

As described above, the speech synthesizer 100 according to the present embodiment creates templates that contain an intermediate notation different from, for example, the intermediate notation generated by the language processing unit 7. When Japanese notation to be synthesized is input, the morpheme notation and the intermediate notation resulting from language processing are searched for in the template DB 117 to determine whether they match a template.

Whether a template matches is determined using at least part-of-speech information and accent phrase information. The search may also specify exactly which morphemes are required. The template DB 117 stores one or more templates.

The information registered in the template DB 117 may also include part of the Japanese notation, as in the Japanese notation examples 133, 143, and 153. This allows the accent intensity to be set appropriately depending not only on local meaning, such as the sequence of words and morphemes, but also on global meaning that depends on multiple sentences. Making the template DB 117 freely rewritable allows synthesized speech to be generated that better matches the user's requirements.

With the above configuration, once templates have been registered in the template DB 117 in advance, prosody information can be corrected on the basis of the registered templates when the user inputs Japanese notation and an intermediate notation is generated automatically by language processing. In addition, because the templates use part-of-speech information and accent phrase information, prosody information can be corrected appropriately with context and prosody taken into account, which helps reduce the effort the user spends re-correcting the intermediate notation.

(Third Embodiment)
The speech synthesizer 200 according to the third embodiment will be described below with reference to FIG. 15. In the speech synthesizer 200 according to the third embodiment, components and operations that are the same as those of the speech synthesizers 1 and 100 according to the first and second embodiments are denoted by the same reference numerals, and duplicate description is omitted. In the present embodiment, when the intermediate notation is corrected in the speech synthesizer 200, the corrected prosody information and the corresponding part-of-speech information and accent phrase information are automatically registered in the template DB 117, thereby updating the template DB 117.

FIG. 15 is a functional block diagram showing an example of the configuration of the speech synthesizer 200 according to the third embodiment. As shown in FIG. 15, the speech synthesizer 200 includes an input/output unit 5, a language processing unit 7, a speech synthesis unit 9, an intermediate notation correction unit 203, and a storage unit 21. The template database (DB) 117 is preferably stored in the storage unit 21. The intermediate notation correction unit 203 includes a DB registration unit 105, a template search unit 113, a template association unit 115, a prosody correction unit 19, and a template update unit 205. Each of the above functions of the speech synthesizer 200 may be implemented, for example, by a processor reading and executing a program stored in the storage unit 21. Some or all of the functions may instead be implemented by, for example, a semiconductor integrated circuit.

When the intermediate notation is corrected in the speech synthesizer 200, the template update unit 205 automatically registers or updates, in the template DB 117, the prosody information of the corrected intermediate notation together with at least the corresponding part-of-speech information and accent phrase information, all associated with one another. When performing speech synthesis, the intermediate notation correction unit 203 refers to the latest template DB 117 to determine whether a template matches. The other configurations and operations are the same as those of the speech synthesizer 100 according to the second embodiment.
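The automatic registration performed by the template update unit 205 can be pictured as a keyed store in which the key combines part-of-speech information with accent phrase information. The sketch below is a hypothetical in-memory stand-in for the template DB 117: the key layout (POS sequence plus accent-phrase sizes), the per-phrase prosody list, and the "latest correction overwrites" policy are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

# Key: (part-of-speech sequence, sizes of the accent phrases covering it)
Key = Tuple[Tuple[str, ...], Tuple[int, ...]]


@dataclass
class TemplateDB:
    """Hypothetical in-memory stand-in for template DB 117."""
    entries: Dict[Key, List[str]] = field(default_factory=dict)

    @staticmethod
    def make_key(pos_tags: Sequence[str], phrase_sizes: Sequence[int]) -> Key:
        return (tuple(pos_tags), tuple(phrase_sizes))

    def register_correction(self, pos_tags: Sequence[str], phrase_sizes: Sequence[int],
                            prosody: Sequence[str]) -> None:
        """Called whenever the user corrects the intermediate notation.

        Assumption: the latest correction for the same key replaces the previous one.
        """
        self.entries[self.make_key(pos_tags, phrase_sizes)] = list(prosody)

    def lookup(self, pos_tags: Sequence[str], phrase_sizes: Sequence[int]) -> Optional[List[str]]:
        """Return the stored prosody correction, or None when no template matches."""
        return self.entries.get(self.make_key(pos_tags, phrase_sizes))
```

Keying on the POS sequence and the accent-phrase grouping, rather than on surface strings, is what would let a correction made for one sentence be reused for another sentence with the same structure.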

As described above, in addition to the effects of the speech synthesizer 100, the speech synthesizer 200 according to the third embodiment can automatically register and update entries in the template DB 117. Prosody corrections can therefore be reflected in finer detail, and high-quality speech can be synthesized while further reducing the user's correction effort.

(Modification)
Modifications applicable to the first to third embodiments will be described below with reference to FIGS. 16 and 17. The speech synthesizer to which this modification is applied may be any of the speech synthesizers 1, 100, and 200. Duplicate description of the configurations and operations of the speech synthesizers 1, 100, and 200 described in the first to third embodiments is omitted.

FIG. 16 shows an example in which speech rate information is used as prosody information. As shown in FIG. 16, the prosody information 211 is "トーキョーキ’ンペンニ、カミナリチューイ’ホーガ▽ ハツレーサレタ。" (roughly, "A thunder advisory has been issued for the Tokyo area."). Here, "▽" is a prosodic symbol indicating that the speech rate is to be slowed. In this case, the prosodic symbol "▽" is applied in the intermediate notation to the accent phrase corresponding one-to-one to the underlined "カミナリチューイ’ホーガ" in FIG. 16, whose part-of-speech information is "(common noun)(common noun)(case particle)" and whose accent phrase information is "(accent phrase)". A prosodic symbol "△" that increases the speech rate may also be used.

FIG. 17 shows an example in which volume information is used as prosody information. As shown in FIG. 17, the prosody information 215 is "トーキョーキ’ンペンニ、カミナリチューイ’ホーガ↑ ハツレーサレタ。". Here, "↑" is a prosodic symbol indicating that the volume is to be increased. In this case, the prosodic symbol "↑" is applied in the intermediate notation to the accent phrase corresponding one-to-one to the underlined "カミナリチューイ’ホーガ" in FIG. 17, whose part-of-speech information is "(common noun)(common noun)(case particle)" and whose accent phrase information is "(accent phrase)". A prosodic symbol "↓" that decreases the volume may also be used.
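Applying a speech-rate or volume symbol works the same way as applying an accent-intensity correction: the symbol is attached to each accent phrase whose part-of-speech pattern matches. The following minimal sketch assumes an intermediate notation modeled as a list of accent phrases, each with its katakana reading, the POS tags of its morphemes, and the delimiter that follows it; this representation and the POS tags for the last phrase are assumptions for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class AccentPhrase(NamedTuple):
    kana: str                  # katakana reading, including accent marks such as "’"
    pos_tags: Tuple[str, ...]  # parts of speech of the morphemes in this accent phrase
    delimiter: str             # text following the phrase: "、", " ", "。", ...


def apply_prosodic_symbol(phrases: Sequence[AccentPhrase],
                          pattern: Tuple[str, ...], symbol: str) -> str:
    """Append `symbol` to every accent phrase whose POS sequence equals `pattern`."""
    out: List[str] = []
    for p in phrases:
        out.append(p.kana)
        if p.pos_tags == pattern:
            out.append(symbol)
        out.append(p.delimiter)
    return "".join(out)


EXAMPLE = [
    AccentPhrase("トーキョーキ’ンペンニ", ("固有名詞", "普通名詞", "格助詞"), "、"),
    AccentPhrase("カミナリチューイ’ホーガ", ("普通名詞", "普通名詞", "格助詞"), " "),
    AccentPhrase("ハツレーサレタ", ("普通名詞", "動詞", "助動詞", "助動詞"), "。"),
]
PATTERN = ("普通名詞", "普通名詞", "格助詞")  # (common noun)(common noun)(case particle)

print(apply_prosodic_symbol(EXAMPLE, PATTERN, "▽"))  # speech-rate example of FIG. 16
print(apply_prosodic_symbol(EXAMPLE, PATTERN, "↑"))  # volume example of FIG. 17
```

With these inputs the two calls reproduce the prosody information 211 and 215 quoted above.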

As described above, this modification allows automatic correction to be applied to various prosodic symbols other than accent intensity. This further reduces the effort needed to generate higher-quality synthesized speech.

In the first to third embodiments and the modification described above, the intermediate notation correction units 11, 103, and 203 are examples of a correction unit, and the input/output unit 5 is an example of an input unit, a prosody correction receiving unit, and a language change receiving unit. The input/output unit 5 and the DB registration unit 105 are examples of a registration receiving unit. The morpheme association unit 15 and the template update unit 205 are examples of an association unit.

The present invention is not limited to the embodiments described above, and various configurations or embodiments can be adopted without departing from the gist of the present invention. For example, although the first to third embodiments and the modification have been described for Japanese notation, the invention is also applicable to other natural languages on which the language processing described above can be performed.

As shown in FIG. 8, the morpheme association unit 15 associates the modified morpheme notation 53 with the intermediate notation 55. However, because the intermediate notation 55 has been corrected by the user, its reading or other content may have changed, so a simple correspondence cannot always be established. In such cases, for example, correspondence may be established only around the changed morpheme. As one way to determine the range of correspondence, the range may consist of the morphemes that are contiguous with the changed morpheme before and after it and have the same part of speech as the changed morpheme. Alternatively, correspondence for all morphemes can be established by performing DP matching over all morphemes of the relevant Japanese notation.
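The first option above, restricting correspondence to the run of same-part-of-speech morphemes around the changed morpheme, can be sketched as a simple two-way scan. The function name and the half-open (start, end) return convention are hypothetical choices for this illustration.

```python
from typing import Sequence, Tuple


def correspondence_range(pos_tags: Sequence[str], changed: int) -> Tuple[int, int]:
    """Half-open range [start, end) of morphemes contiguous with the changed morpheme
    (before and after it) that share its part of speech."""
    target = pos_tags[changed]
    start = changed
    while start > 0 and pos_tags[start - 1] == target:
        start -= 1
    end = changed + 1
    while end < len(pos_tags) and pos_tags[end] == target:
        end += 1
    return start, end


pos = ["固有名詞", "普通名詞", "普通名詞", "格助詞"]
print(correspondence_range(pos, 1))  # (1, 3): the two adjacent common nouns
```

Only the morphemes inside this range would then be aligned with the user-corrected intermediate notation; the rest keep their original correspondence.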

In the first to third embodiments and the modification, the morpheme notation includes part-of-speech information, but the part-of-speech information is not limited to the examples above. For example, coarse categories such as nouns and verbs, finer categories such as proper nouns and independent verbs, or still finer, meaning-related categories such as place names and product names within proper nouns may be used. Combinations of categories to be treated as the same type may also be defined. The finer the categories, the stricter the conditions for correcting prosody information become, so whether to apply a correction can be chosen according to how strongly the prosody information should be carried over, which improves the accuracy of prosodic symbol correction.
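One way to realize selectable granularity is to compare hierarchical part-of-speech tags only up to a chosen depth, with optional groups of tags that are treated as the same type. This sketch assumes IPADIC-style hierarchical tag tuples; the function name and parameters are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Pos = Tuple[str, ...]  # e.g. ("名詞", "固有名詞", "地域"): coarse to fine


def pos_equal(a: Pos, b: Pos, level: int,
              same_groups: Iterable[FrozenSet[Pos]] = ()) -> bool:
    """Compare two POS tags using only their first `level` components.

    `same_groups` lists sets of (truncated) tags to be treated as the same type.
    A larger `level` makes matching, and therefore prosody correction, stricter.
    """
    ka, kb = a[:level], b[:level]
    if ka == kb:
        return True
    return any(ka in g and kb in g for g in same_groups)


place = ("名詞", "固有名詞", "地域")
org = ("名詞", "固有名詞", "組織")
print(pos_equal(place, org, level=2))  # True: both are proper nouns
print(pos_equal(place, org, level=3))  # False: place name vs organization name
```

Raising `level` corresponds to the stricter correction conditions described above, and `same_groups` corresponds to the defined combinations of categories treated as the same type.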

When the change search unit 13 searches for changed morphemes, it may search only some or all of the morphemes of the Japanese notation before and after the user's change. This allows changed morphemes to be found efficiently and helps improve computation speed.
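A simple way to locate changed morphemes is to diff the morpheme sequences of the Japanese notation before and after the user's edit. The sketch below uses Python's standard `difflib.SequenceMatcher`; it is an illustrative substitute, not the search method stated in the patent, and it reports only positions in the edited sequence (pure deletions have no position there).

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def changed_morphemes(old: Sequence[str], new: Sequence[str]) -> List[int]:
    """Indices in `new` of morphemes that were replaced or inserted by the edit."""
    sm = SequenceMatcher(a=old, b=new, autojunk=False)
    changed: List[int] = []
    for tag, _i1, _i2, j1, j2 in sm.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed


old = ["東京", "近辺", "に", "雷", "注意報", "が", "発令", "さ", "れ", "た"]
new = ["大阪", "近辺", "に", "雷", "注意報", "が", "発令", "さ", "れ", "た"]
print(changed_morphemes(old, new))  # [0]
```

Only the accent phrases containing the reported morphemes would then need to be checked against stored correction information.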

Accent intensity may be specified not only as strong or weak but also as "unspecified", in which case the result of language processing is used as is. The templates registered in the template DB 117 may be defined not only per sentence but also per accent phrase or per breath group delimited by punctuation marks or the like. This is expected to further improve the quality of synthesized speech.
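Registering templates per breath group first requires splitting a sentence into segments delimited by punctuation. This minimal sketch assumes punctuation marks appear as their own morpheme tokens; the punctuation set and function name are assumptions for illustration.

```python
from typing import List, Sequence

PUNCTUATION = {"、", "。", "，", "．", "？", "！"}  # assumed delimiter set


def breath_groups(morphemes: Sequence[str]) -> List[List[str]]:
    """Split a morpheme sequence into breath groups ending at punctuation marks."""
    groups: List[List[str]] = []
    current: List[str] = []
    for m in morphemes:
        current.append(m)
        if m in PUNCTUATION:
            groups.append(current)
            current = []
    if current:  # trailing segment without closing punctuation
        groups.append(current)
    return groups


sentence = ["東京", "近辺", "に", "、", "雷", "注意報", "が", "発令", "さ", "れ", "た", "。"]
print(breath_groups(sentence))
```

Each resulting group could then be matched against templates in the same way as a whole sentence.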

The template update unit 205 may, for example, update the templates each time the intermediate notation is corrected. The DB registration unit 105 and the template update unit 205 may also accept manual registration input from the user, for example via a GUI.

Next, an example of a computer that can be used in common to perform the operations of the speech synthesis methods according to the first to third embodiments and the modification will be described. FIG. 18 is a block diagram showing an example of the hardware configuration of a standard computer. As shown in FIG. 18, the computer 300 includes a central processing unit (CPU) 302, a memory 304, an input device 306, an output device 308, an external storage device 312, a medium drive device 314, a network connection device 318, and the like, connected via a bus 310.

The CPU 302 is an arithmetic processing unit that controls the overall operation of the computer 300. The memory 304 is a storage unit that stores in advance a program for controlling the operation of the computer 300 and serves as a work area as needed when the program is executed. The memory 304 is, for example, a random access memory (RAM) or a read-only memory (ROM). The input device 306 is a device, such as a keyboard or a mouse, that acquires the various information the user inputs by operating it and sends the acquired input information to the CPU 302. The output device 308 outputs the results of processing by the computer 300 and includes a display device or the like. For example, the display device displays text and images according to display data sent from the CPU 302.

The external storage device 312 is a storage device, such as a hard disk, that stores the various control programs executed by the CPU 302, acquired data, and the like. The medium drive device 314 writes to and reads from a portable recording medium 316. The CPU 302 can also perform various control processes by reading out and executing a predetermined control program recorded on the portable recording medium 316 via the medium drive device 314. The portable recording medium 316 is, for example, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), or a Universal Serial Bus (USB) memory. The network connection device 318 is an interface device that manages the exchange of various data with external devices over a wired or wireless connection. The bus 310 is a communication path that connects the above devices to one another for exchanging data.

A program that causes a computer to execute the speech synthesis method according to the first to third embodiments and the modification is stored, for example, in the external storage device 312. The CPU 302 reads the program from the external storage device 312 and causes the computer 300 to perform the speech synthesis operation. To do so, a control program that causes the CPU 302 to perform the speech synthesis processing is first created and stored in the external storage device 312. Then, a predetermined instruction is given to the CPU 302 from the input device 306 so that the CPU 302 reads the control program from the external storage device 312 and executes it. The program may also be stored on the portable recording medium 316. The template DB 117 may, for example, be provided in a storage device of another computer connected via the network connection device 318. At least part of the speech synthesis processing may also be performed by another computer connected via the network connection device 318.

The following supplementary notes are further disclosed with respect to the above embodiments.
(Supplementary Note 1)
A speech synthesizer comprising:
an input unit that receives input of a language notation in a natural language to be subjected to speech synthesis;
a language processing unit that analyzes the language notation on the basis of dictionary information in which information including at least a reading, a part of speech, an accent phrase, and a prosody of the accent phrase corresponding to each morpheme of the natural language is registered, and generates a morpheme notation including the morphemes contained in the language notation and part-of-speech information corresponding to the morphemes, and an intermediate notation including accent phrase information indicating the grouping of accent phrases and prosody information indicating the prosody of the accent phrases;
a storage unit that stores correction information indicating the content of a correction to prosody information in association with part-of-speech information and accent phrase information;
a correction unit that, when the correction information is stored in the storage unit in association with the part-of-speech information and the accent phrase information generated by the language processing unit, corrects the prosody information generated by the language processing unit on the basis of the correction information; and
a speech synthesis unit that synthesizes speech corresponding to the language notation on the basis of an intermediate notation including prosody information reflecting the correction by the correction unit.
(Supplementary Note 2)
The speech synthesizer according to Supplementary Note 1, wherein the correction unit performs the correction when the number and types of parts of speech contained in the accent phrase indicated by the accent phrase information generated by the language processing unit match those contained in the accent phrase indicated by the accent phrase information stored in the storage unit.
(Supplementary Note 3)
The speech synthesizer according to Supplementary Note 1 or 2, further comprising:
a prosody correction receiving unit that receives a correction of the prosody information generated by the language processing unit; and
an association unit that associates the corrected prosody information with the part-of-speech information and the accent phrase information generated by the language processing unit,
wherein the storage unit stores correction information indicating the content of the correction received by the prosody correction receiving unit in association with the associated part-of-speech information and accent phrase information.
(Supplementary Note 4)
The speech synthesizer according to any one of Supplementary Notes 1 to 3, further comprising:
a language change receiving unit that receives a change to the language notation to be subjected to speech synthesis; and
a change search unit that searches for morphemes related to the change,
wherein the language processing unit analyzes the changed language notation, and
the correction unit performs the correction when the correction information is stored in association with the part-of-speech information corresponding to an accent phrase that includes a retrieved morpheme related to the change.
(Supplementary Note 5)
The speech synthesizer according to any one of Supplementary Notes 1 to 4, further comprising a registration receiving unit that receives registration of the correction information,
wherein the storage unit registers the correction information received by the registration receiving unit.
(Supplementary Note 6)
A speech synthesis method comprising, by a speech synthesizer:
receiving input of a language notation in a natural language to be subjected to speech synthesis;
analyzing the language notation on the basis of dictionary information in which information including at least a reading, a part of speech, an accent phrase, and a prosody of the accent phrase corresponding to each morpheme of the natural language is registered, and generating a morpheme notation including the morphemes contained in the language notation and part-of-speech information corresponding to the morphemes, and an intermediate notation including accent phrase information indicating the grouping of accent phrases and prosody information indicating the prosody of the accent phrases;
when correction information indicating the content of a correction to the prosody information is stored in a storage unit in association with the generated part-of-speech information and accent phrase information, correcting the generated prosody information on the basis of the correction information; and
synthesizing speech corresponding to the language notation on the basis of an intermediate notation including prosody information reflecting the correction.
(Supplementary Note 7)
The speech synthesis method according to Supplementary Note 6, wherein, in the correcting, the correction is performed when the number and types of parts of speech contained in the accent phrase indicated by the generated accent phrase information match those contained in the accent phrase indicated by the accent phrase information stored in the storage unit.
(Supplementary Note 8)
The speech synthesis method according to Supplementary Note 6 or 7, further comprising:
receiving a correction of the generated prosody information; and
associating the corrected prosody information with the generated part-of-speech information and accent phrase information, and storing correction information indicating the content of the received correction in association with the associated part-of-speech information and accent phrase information.
(Supplementary Note 9)
The speech synthesis method according to any one of Supplementary Notes 6 to 8, further comprising:
receiving a change to the language notation to be subjected to speech synthesis;
analyzing the changed language notation and searching for morphemes related to the change; and
performing the correction when the correction information is stored in association with the part-of-speech information corresponding to an accent phrase that includes a retrieved morpheme related to the change.
(Supplementary Note 10)
The speech synthesis method according to any one of Supplementary Notes 6 to 9, further comprising receiving registration of the correction information and storing the received correction information in the storage unit.
(Supplementary Note 11)
A program that causes a computer to execute a process comprising:
receiving input of a language notation in a natural language to be subjected to speech synthesis;
analyzing the language notation on the basis of dictionary information in which information including at least a reading, a part of speech, an accent phrase, and a prosody of the accent phrase corresponding to each morpheme of the natural language is registered, and generating a morpheme notation including the morphemes contained in the language notation and part-of-speech information corresponding to the morphemes, and an intermediate notation including accent phrase information indicating the grouping of accent phrases and prosody information indicating the prosody of the accent phrases;
when correction information indicating the content of a correction to the prosody information is stored in a storage unit in association with the generated part-of-speech information and accent phrase information, correcting the generated prosody information on the basis of the correction information; and
synthesizing speech corresponding to the language notation on the basis of an intermediate notation including prosody information reflecting the correction.
(Supplementary Note 12)
The program according to Supplementary Note 11, wherein, in the correcting, the correction is performed when the number and types of parts of speech contained in the accent phrase indicated by the generated accent phrase information match those contained in the accent phrase indicated by the accent phrase information stored in the storage unit.
(Supplementary Note 13)
The program according to Supplementary Note 11 or 12, wherein the process further comprises:
receiving a correction of the generated prosody information; and
associating the corrected prosody information with the generated part-of-speech information and accent phrase information, and storing correction information indicating the content of the received correction in association with the associated part-of-speech information and accent phrase information.
(Supplementary Note 14)
The program according to any one of Supplementary Notes 11 to 13, wherein the process further comprises:
receiving a change to the language notation to be subjected to speech synthesis;
analyzing the changed language notation and searching for morphemes related to the change; and
performing the correction when the correction information is stored in association with the part-of-speech information corresponding to an accent phrase that includes a retrieved morpheme related to the change.
(Supplementary Note 15)
The program according to any one of Supplementary Notes 11 to 14, wherein the process further comprises receiving registration of the correction information and storing the received correction information in the storage unit.

DESCRIPTION OF SYMBOLS
1 speech synthesizer
3 control unit
5 input/output unit
7 language processing unit
9 speech synthesis unit
11 intermediate notation correction unit
13 change search unit
15 morpheme association unit
17 correction determination unit
19 prosody correction unit
21 storage unit
41, 47 Japanese notation
42, 48 morpheme delimiter information
43, 45, 49, 59, 66 prosody information
51, 57 part-of-speech information
52, 58 morpheme notation
53, 62 modified morpheme notation
54, 60 accent phrase information
55, 61, 67 intermediate notation
56 correspondence information
63 change information
64 correction determination information
65 correction reference information

Claims

A speech synthesizer comprising:
an input unit that receives input of a natural-language linguistic notation to be subjected to speech synthesis;
a language processing unit that analyzes the linguistic notation based on dictionary information in which at least a reading, a part of speech, an accent phrase, and the prosody of the accent phrase are registered for each morpheme of the natural language, and generates an intermediate notation including morpheme notation that contains the morphemes included in the linguistic notation and the part-of-speech information corresponding to those morphemes, accent phrase information indicating how the morphemes are grouped into accent phrases, and prosody information indicating the prosody of the accent phrases;
a storage unit that stores correction information indicating the content of a correction to prosody information in association with part-of-speech information and accent phrase information;
a correction unit that, when the number and types of parts of speech included in each accent phrase indicated by the accent phrase information generated by the language processing unit match those in the accent phrase information stored in the storage unit, corrects the prosody information generated by the language processing unit based on the correction information; and
a speech synthesis unit that synthesizes speech corresponding to the linguistic notation based on an intermediate notation including the prosody information reflecting the correction by the correction unit.
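
As an illustration only, the matching condition applied by the correction unit can be sketched in Python. The data model (Morpheme, AccentPhrase, an integer accent position standing in for prosody information, and a dictionary standing in for the storage unit) is hypothetical and not defined in the patent; comparing the part-of-speech tags of each accent phrase as a sorted multiset is likewise an assumption about how "number and types" are matched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """Hypothetical morpheme record: surface form plus part-of-speech tag."""
    surface: str
    pos: str


@dataclass(frozen=True)
class AccentPhrase:
    """Hypothetical accent phrase: its morphemes and one prosody value.

    `accent` stands in for prosody information (accent nucleus position,
    0 = flat); the patent does not fix this representation."""
    morphemes: tuple[Morpheme, ...]
    accent: int


def pos_signature(phrases: list[AccentPhrase]) -> tuple[tuple[str, ...], ...]:
    """Number and types of parts of speech in each accent phrase.

    Sorting the tags makes the comparison order-insensitive inside a phrase
    (an assumption); the outer tuple keeps the number and order of phrases."""
    return tuple(tuple(sorted(m.pos for m in p.morphemes)) for p in phrases)


def apply_corrections(
    phrases: list[AccentPhrase],
    store: dict[tuple[tuple[str, ...], ...], list[int]],
) -> list[AccentPhrase]:
    """Overwrite generated prosody with stored corrections on a signature match.

    `store` plays the role of the storage unit: it maps a POS signature to one
    corrected accent value per accent phrase (hypothetical format)."""
    correction = store.get(pos_signature(phrases))
    if correction is None:
        return phrases  # no match: keep the language processing output as is
    return [AccentPhrase(p.morphemes, acc) for p, acc in zip(phrases, correction)]
```

Because the key holds only part-of-speech structure, an entry keyed by (("noun", "particle"), ("verb",)) would apply to any two-phrase input of that shape, whatever the actual words are.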

The speech synthesizer according to claim 1, further comprising:
a prosody correction accepting unit that accepts correction of the prosody information generated by the language processing unit; and
an association unit that associates the corrected prosody information with the part-of-speech information and the accent phrase information generated by the language processing unit,
wherein the storage unit stores correction information indicating the content of the correction accepted by the prosody correction accepting unit in association with the part-of-speech information and the accent phrase information associated by the association unit.
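
A possible sketch of how an accepted correction might be stored against part-of-speech and accent phrase information follows. The CorrectionStore class, its record and lookup methods, and the choice to store only the phrases the user actually changed are all hypothetical illustrations, not the patent's stated format.

```python
from __future__ import annotations

from collections.abc import Sequence

Signature = tuple[tuple[str, ...], ...]


def signature(phrase_pos: Sequence[Sequence[str]]) -> Signature:
    """POS tags of each accent phrase, sorted so only number and type count."""
    return tuple(tuple(sorted(tags)) for tags in phrase_pos)


class CorrectionStore:
    """Hypothetical storage unit for accepted prosody corrections."""

    def __init__(self) -> None:
        self._table: dict[Signature, tuple[int | None, ...]] = {}

    def record(self, phrase_pos, generated, corrected) -> None:
        """Associate a user's correction with the POS/accent-phrase info.

        Phrases the user left unchanged are stored as None, so the entry
        holds only the content of the correction."""
        if not (len(phrase_pos) == len(generated) == len(corrected)):
            raise ValueError("expected one prosody value per accent phrase")
        content = tuple(
            new if new != old else None for old, new in zip(generated, corrected)
        )
        self._table[signature(phrase_pos)] = content

    def lookup(self, phrase_pos):
        """Return the stored correction content, or None if nothing matches."""
        return self._table.get(signature(phrase_pos))
```

Storing None for untouched phrases keeps the entry limited to what the user changed, so a later match leaves those phrases as the language processing unit generated them.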

The speech synthesizer according to claim 1 or claim 2, further comprising:
a language change accepting unit that accepts a change in the linguistic notation to be subjected to speech synthesis; and
a change search unit that searches for a morpheme related to the change,
wherein the language processing unit analyzes the changed linguistic notation, and
the correction unit performs the correction when the correction information is stored in association with part-of-speech information corresponding to an accent phrase that includes the morpheme related to the retrieved change.
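
The claim requires only that morphemes related to the change be searched for; it does not say how. One plausible sketch, assuming the old and new notations have already been split into morpheme surface strings, uses Python's standard difflib.SequenceMatcher to find replaced or inserted morphemes and then maps them to the accent phrases that contain them. The function names and the phrase-length input are hypothetical, and difflib is a swapped-in technique rather than the patent's method.

```python
from bisect import bisect_right
from difflib import SequenceMatcher
from itertools import accumulate


def changed_morphemes(old: list[str], new: list[str]) -> list[int]:
    """Indices into `new` of morphemes that were replaced or inserted."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    changed: list[int] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed


def affected_phrases(phrase_lengths: list[int], changed: list[int]) -> list[int]:
    """Indices of accent phrases that contain at least one changed morpheme.

    `phrase_lengths[k]` is the number of morphemes in accent phrase k."""
    ends = list(accumulate(phrase_lengths))  # exclusive end index of each phrase
    return sorted({bisect_right(ends, idx) for idx in changed})
```

Only the accent phrases returned by affected_phrases would then need their stored correction information checked, rather than the whole changed notation.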

The speech synthesizer according to any one of claims 1 to 3, further comprising a registration accepting unit that accepts registration of the correction information,
wherein the storage unit stores the correction information accepted by the registration accepting unit.

A speech synthesis method comprising, by a speech synthesizer:
accepting input of a natural-language linguistic notation to be subjected to speech synthesis;
analyzing the linguistic notation based on dictionary information in which at least a reading, a part of speech, an accent phrase, and the prosody of the accent phrase are registered for each morpheme of the natural language, and generating an intermediate notation including morpheme notation that contains the morphemes included in the linguistic notation and the part-of-speech information corresponding to those morphemes, accent phrase information indicating how the morphemes are grouped into accent phrases, and prosody information indicating the prosody of the accent phrases;
when the number and types of parts of speech included in each accent phrase indicated by the generated accent phrase information match those in accent phrase information stored in a storage unit, correcting the generated prosody information based on correction information stored in association with that accent phrase information; and
synthesizing speech corresponding to the linguistic notation based on an intermediate notation including the prosody information reflecting the correction.

A program causing a computer to execute a process comprising:
accepting input of a natural-language linguistic notation to be subjected to speech synthesis;
analyzing the linguistic notation based on dictionary information in which at least a reading, a part of speech, an accent phrase, and the prosody of the accent phrase are registered for each morpheme of the natural language, and generating an intermediate notation including morpheme notation that contains the morphemes included in the linguistic notation and the part-of-speech information corresponding to those morphemes, accent phrase information indicating how the morphemes are grouped into accent phrases, and prosody information indicating the prosody of the accent phrases;
when the number and types of parts of speech included in each accent phrase indicated by the generated accent phrase information match those in accent phrase information stored in a storage unit, correcting the generated prosody information based on correction information stored in association with that accent phrase information; and
synthesizing speech corresponding to the linguistic notation based on an intermediate notation including the prosody information reflecting the correction.