JP2015172625A

JP2015172625A - Voice synthesizer, synthesized voice editing method, and synthesized voice editing computer program

Info

Publication number: JP2015172625A
Application number: JP2014047871A
Authority: JP
Inventors: Takuya Noda (野田 拓也)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-03-11
Filing date: 2014-03-11
Publication date: 2015-10-01
Anticipated expiration: 2034-03-11
Also published as: JP6340839B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice synthesizer capable of preventing accent errors caused by editing the boundaries between accentual phrases in synthesized voice.
SOLUTION: The voice synthesizer comprises: a language processing unit 10 that performs language processing on the original text of the synthesized voice and the reading of that text while referring to a language dictionary, identifies the part of speech of each word in the original text, divides the original text into accentual phrases, and generates an intermediate notation representing the accent of each accentual phrase and the positions at which accent combination occurs; an accentual-phrase-boundary-candidate extraction unit 21 that extracts, as an accentual phrase boundary candidate, the boundary between each independent word and the word immediately before it; and a correction-target-candidate determination unit 22 that, based on the accent of the accentual phrase preceding each candidate and on whether the candidate lies at a position of accent combination, identifies as correction target candidates to be displayed those candidates for which the accentual phrases before and after the candidate do not acquire incorrect accents even if the candidate's status as an accentual phrase boundary is changed.

Description

The present invention relates to, for example, a speech synthesizer that synthesizes a speech signal from text data, and to a synthesized speech editing method and a synthesized speech editing computer program used in such a speech synthesizer.

In recent years, speech synthesis technology for automatically synthesizing speech has been developed. Because speech synthesis can produce desired speech in a short time, some applications that previously relied on speech pre-recorded by professional narrators have adopted it. The technology is particularly useful in applications where the information provided is updated at short intervals, such as announcements in commercial facilities, highway radio, highway telephone services, and weather forecast broadcasts.

To generate the desired speech signal, text data written in a mixture of kanji and kana is input to the speech synthesizer, for example via a keyboard. The speech synthesizer performs language processing such as morphological analysis or dependency analysis on the text data, using a word dictionary in which the kanji-kana notation of each word and a phonetic character string representing its pronunciation are registered. Through this language processing, the speech synthesizer generates the phonetic character string of the text data and an intermediate notation in which prosodic symbols representing prosody, such as accent position, accent strength, and degree of intonation, are attached to that phonetic character string. The speech synthesizer then generates a synthesized speech signal based on the intermediate notation.

For synthesized speech used, for example, as narration for exhibitions or e-Learning, the user may adjust the prosody of the synthesized speech to obtain high-quality speech closer to natural utterance. A technique has therefore been proposed that provides a user interface for editing break positions such as accent phrase boundaries and, when an utterance segment representing, for example, an accent phrase of a word string is edited, regenerates the phonetic symbol string based on the edited utterance segment (see, for example, Patent Document 1).

Patent Document 1: JP-A-5-11797

In the technique disclosed in Patent Document 1, all accent phrase boundaries are presented, so the user must judge whether each boundary is correct and whether it needs editing. However, changing accent phrase boundaries without making the accent of the synthesized speech unnatural requires the user to have knowledge of accents. If a user without sufficient knowledge of accents removes or adds accent phrase boundaries, the resulting accents may be incorrect, because the appropriate accent position depends on whether an accent phrase boundary is present.

Accordingly, in one aspect, an object of the present specification is to provide a speech synthesizer capable of preventing accent errors caused by editing accent phrase boundaries of synthesized speech.

According to one embodiment, a speech synthesizer is provided. The speech synthesizer includes: an input unit that acquires data serving as the original text of synthesized speech and data representing the reading of that original text; a storage unit that stores a word dictionary in which the part of speech and accent position of each word are registered; a language processing unit that performs language processing on the original text and its reading with reference to the word dictionary, thereby identifying the part of speech of each word in the original text, dividing the original text into accent phrases, and generating an intermediate notation representing the accent of each accent phrase and the positions at which accent combination has occurred; an accent phrase boundary candidate extraction unit that identifies independent words by referring to the part of speech of each word in the original text and takes the boundary between each independent word and the word immediately before it as an accent phrase boundary candidate; a correction target candidate determination unit that, according to the accent of the accent phrase preceding each candidate and whether the candidate lies at a position where accent combination has occurred, identifies as a correction target candidate each candidate for which the accent phrases before and after it do not acquire incorrect accents even if its status as an accent phrase boundary is changed; and a display control unit that causes a display unit to display the correction target candidates.

The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.

The speech synthesizer disclosed in this specification can prevent accent errors caused by editing accent phrase boundaries of synthesized speech.

The drawings show, in order:
A schematic configuration diagram of the speech synthesizer.
A functional block diagram of the processing unit of the speech synthesizer.
An example of accent phrase boundary candidates.
An example of the relationship between accent phrase boundary candidates and correction target candidates.
An example of a display screen of the display unit on which correction target candidates are displayed.
An operation flowchart of the synthesized speech editing process executed by the synthesized speech editing unit.
A functional block diagram of the synthesized speech editing unit according to the second embodiment.
An example of the difference in accent position depending on whether accent combination occurs.
A table showing the relationship between combinations of the presence or absence of accent combination, the accent type of the preceding accent phrase, and the presence or absence of a change in accent position on the one hand, and the edit attribute, boundary attribute, and change attribute on the other.
(a) An example of a display screen of the display unit on which correction target candidates are displayed according to the second embodiment; (b) the reading and accents of the original text shown in (a).
An operation flowchart of the synthesized speech editing process executed by the synthesized speech editing unit according to the second embodiment.

The speech synthesizer is described below with reference to the drawings.
When the user edits the accent phrase boundaries of synthesized speech, this speech synthesizer displays, in an editable manner, those accent phrase boundary candidates for which the accents remain correct even if the candidate's status as an accent phrase boundary is changed. Conversely, it hides those candidates for which changing that status would make the accents inappropriate, thereby preventing accent errors caused by editing accent phrase boundaries.

FIG. 1 is a schematic configuration diagram of a speech synthesizer according to one embodiment. In this embodiment, the speech synthesizer 1 includes an operation unit 2, a display unit 3, a communication interface unit 4, an output unit 5, a storage unit 6, and a processing unit 7.

The operation unit 2 includes, for example, a keyboard and a pointing device such as a mouse. The operation unit 2 is an example of an input unit that acquires text data that is the original text of the synthesized speech, written in a mixture of kanji and kana, together with text data representing its reading. The operation unit 2 passes the text data entered by the user to the processing unit 7.

The display unit 3 includes a display device such as a liquid crystal display. The display unit 3 displays the text data of the input original text of the synthesized speech, the editable accent phrase boundary candidates set in that original text, and the like. The operation unit 2 and the display unit 3 may be formed integrally as a touch panel display.

The communication interface unit 4 includes an interface circuit for connecting the speech synthesizer 1 to a communication network, and acquires various information via the communication network. The communication interface unit 4 is another example of the input unit, and may acquire the text data that is the original text of the synthesized speech, written in a mixture of kanji and kana, and the text data representing its reading from another device connected to the speech synthesizer 1 via the communication network.
The communication interface unit 4 may also output the synthesized speech signal received from the processing unit 7 to another device connected to the speech synthesizer 1 via the communication network.

The output unit 5 outputs the synthesized speech signal received from the processing unit 7 to a speaker 8. For this purpose, the output unit 5 includes, for example, an audio interface circuit for connecting the speaker 8 to the speech synthesizer 1.

The storage unit 6 includes, for example, at least one of a semiconductor memory circuit, a magnetic storage device, and an optical storage device. The storage unit 6 stores the computer programs used by the processing unit 7 and the data used for speech synthesis processing or synthesized speech editing processing.
As data used for speech synthesis processing, the storage unit 6 stores, for example, prosodic models and a speech waveform dictionary. The storage unit 6 also stores a word dictionary. For each of various words, the word dictionary registers the word's notation, its phonetic character string, the accent specific to the word, and points representing how easily the word undergoes accent combination. These points may differ depending on whether the word precedes or follows another word. The word dictionary may additionally register part-of-speech information and conjugation forms for each registered word.

The processing unit 7 includes one or more processors, a memory circuit, and peripheral circuits. The processing unit 7 creates a synthesized speech signal based on the input text data.
FIG. 2 is a functional block diagram of the processing unit 7. The processing unit 7 includes a language processing unit 10, a speech synthesis unit 11, and a synthesized speech editing unit 12.
These units of the processing unit 7 are, for example, functional modules implemented by a computer program running on a processor of the processing unit 7. Alternatively, they may be implemented in the speech synthesizer 1 as a single integrated circuit that realizes the functions of the units.

The language processing unit 10 generates a phonetic character string corresponding to the input text data of the original text, which is written in a mixture of kanji and kana, and then generates an intermediate notation based on that phonetic character string. Here, the intermediate notation is the phonetic character string with prosodic symbols representing prosody added. The prosodic symbols include, for example, symbols expressing accent position, accent strength, pitch height, degree of intonation, speaking rate, volume, and breaks. Removing the prosodic symbols from the intermediate notation therefore yields the phonetic character string.

To generate the intermediate notation from the input text data of the original text and the text data representing its reading, the language processing unit 10 reads the word dictionary stored in the storage unit 6. Using the word dictionary, the language processing unit 10 performs, for example, morphological analysis and dependency analysis on the text data to determine the order and reading of each word appearing in the original text, the accent positions, and the positions of breaks such as accent phrase boundaries and breath group boundaries. In addition, the language processing unit 10 refers to the word dictionary to calculate the total of the ease-of-combination points for joining consecutive accent phrases, and joins those accent phrases when the total is equal to or greater than a predetermined threshold.
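The joining rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the language processing unit 10: the `Word` fields `join_before` and `join_after`, the function `merge_accent_phrases`, the point values, and the choice to score a join using only the two words adjacent to the boundary are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    surface: str
    join_before: int  # hypothetical: points when this word precedes another word
    join_after: int   # hypothetical: points when this word follows another word


def merge_accent_phrases(phrases: List[List[Word]], threshold: int) -> List[List[Word]]:
    """Join consecutive accent phrases whose summed points reach the threshold."""
    if not phrases:
        return []
    merged = [list(phrases[0])]
    for phrase in phrases[1:]:
        left_word = merged[-1][-1]   # last word of the phrase built so far
        right_word = phrase[0]       # first word of the next phrase
        total = left_word.join_before + right_word.join_after
        if total >= threshold:
            merged[-1].extend(phrase)    # accent combination: one phrase
        else:
            merged.append(list(phrase))  # keep the accent phrase boundary
    return merged


# Illustrative point values only; they are not taken from the source.
onsei = Word("音声", join_before=3, join_after=0)
gosei = Word("合成", join_before=0, join_after=3)
shinpo = Word("進歩", join_before=0, join_after=1)
result = merge_accent_phrases([[onsei], [gosei], [shinpo]], threshold=5)
```

With these values, 『音声』 and 『合成』 are joined (3 + 3 is at least 5) while 『進歩』 stays a separate accent phrase (0 + 1 is below 5).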

For morphological analysis, the language processing unit 10 can use, for example, a method based on dynamic programming. For dependency analysis, it can use a parsing technique such as a lookahead LR parser or the LL method. The language processing unit 10 then creates the intermediate notation according to the order of the words, their readings, the accent positions, and the positions of breaks including accent phrase boundaries.
The language processing unit 10 stores the generated intermediate notation in the storage unit 6.

The speech synthesis unit 11 creates a synthesized speech signal based on the intermediate notation of the input text data.

Based on the intermediate notation, the speech synthesis unit 11 generates the target prosody used when generating the synthesized speech signal. To do so, the speech synthesis unit 11 reads a plurality of prosodic models from the storage unit 6. Each prosodic model represents, in time order, the positions at which the voice is raised, the positions at which it is lowered, and so on. From these models, the speech synthesis unit 11 selects the one that best matches the accent positions and other features indicated in the intermediate notation. The speech synthesis unit 11 then creates the target prosody by setting, for the intermediate notation, the positions where the voice rises or falls, the intonation, the pitch, and so on, according to the selected prosodic model and the synthesis parameters. The target prosody includes the phoneme length and pitch frequency of each phoneme, the phoneme being the unit for determining the speech waveform. A phoneme can be, for example, a single vowel or a single consonant.

Following the generated target prosody, the speech synthesis unit 11 creates the synthesized speech signal by, for example, an HMM (Hidden Markov Model) synthesis method, a phoneme concatenation method, or a corpus-based method.
For example, for each phoneme, the speech synthesis unit 11 selects, from the speech waveforms registered in the speech waveform dictionary, the one closest to the phoneme length and pitch frequency of the target prosody, for example by pattern matching. To do so, the speech synthesis unit 11 reads the speech waveform dictionary from the storage unit 6. The speech waveform dictionary records a plurality of speech waveforms and the identification number of each. Each speech waveform is, for example, a waveform signal extracted phoneme by phoneme from recordings of one or more narrators reading various texts aloud.
Furthermore, so that the speech waveforms selected for the phonemes can be connected in line with the target prosody, the speech synthesis unit 11 may calculate, as waveform conversion information, the amount of deviation between each selected speech waveform and the waveform pattern of the corresponding phoneme indicated in the target prosody.
The speech synthesis unit 11 creates waveform generation information including the identification number of the speech waveform selected for each phoneme. The waveform generation information may further include the waveform conversion information.
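The per-phoneme waveform selection can be illustrated with a short Python sketch. The source only says that the closest waveform is chosen, for example by pattern matching, so everything concrete here is an assumption: the `WaveformEntry` fields, the relative-difference distance, and the representation of the waveform conversion information as duration and pitch offsets are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WaveformEntry:
    waveform_id: int     # identification number in the speech waveform dictionary
    phoneme: str
    duration_ms: float   # phoneme length of the recorded waveform
    pitch_hz: float      # pitch frequency of the recorded waveform


def select_waveform(entries: List[WaveformEntry], phoneme: str,
                    target_duration_ms: float, target_pitch_hz: float) -> WaveformEntry:
    """Pick the entry for `phoneme` closest to the target length and pitch."""
    candidates = [e for e in entries if e.phoneme == phoneme]
    if not candidates:
        raise LookupError(f"no waveform registered for phoneme {phoneme!r}")

    def distance(e: WaveformEntry) -> float:
        # Assumed metric: sum of relative differences in length and pitch.
        return (abs(e.duration_ms - target_duration_ms) / target_duration_ms
                + abs(e.pitch_hz - target_pitch_hz) / target_pitch_hz)

    return min(candidates, key=distance)


def conversion_info(entry: WaveformEntry, target_duration_ms: float,
                    target_pitch_hz: float) -> Dict[str, float]:
    """Deviation of the selected waveform from the target (target minus selected)."""
    return {
        "duration_ms": target_duration_ms - entry.duration_ms,
        "pitch_hz": target_pitch_hz - entry.pitch_hz,
    }


dictionary = [
    WaveformEntry(1, "a", 80.0, 120.0),
    WaveformEntry(2, "a", 100.0, 200.0),
    WaveformEntry(3, "k", 40.0, 150.0),
]
chosen = select_waveform(dictionary, "a", target_duration_ms=90.0, target_pitch_hz=190.0)
generation_info = {"waveform_id": chosen.waveform_id,
                   "conversion": conversion_info(chosen, 90.0, 190.0)}
```

Here `generation_info` plays the role of the waveform generation information for one phoneme: the identification number of the selected waveform plus the optional conversion information.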

The speech synthesis unit 11 reads from the storage unit 6 the speech waveform signal corresponding to the identification number of each phoneme's speech waveform included in the waveform generation information. It then creates the synthesized speech signal by connecting the speech waveform signals in sequence. When the waveform generation information includes waveform conversion information, the speech synthesis unit 11 corrects each speech waveform signal according to the waveform conversion information obtained for the corresponding phoneme before connecting the signals in sequence.
The speech synthesis unit 11 outputs the synthesized speech signal to the output unit 5.

Among the boundaries between words that can become accent phrase boundaries (hereinafter called accent phrase boundary candidates for convenience), the synthesized speech editing unit 12 causes the display unit 3 to display those for which the accents do not become inappropriate even if the user modifies them. When the user changes, via the operation unit 2, whether an accent phrase boundary candidate is an accent phrase boundary, the synthesized speech editing unit 12 modifies the intermediate notation accordingly. For this purpose, the synthesized speech editing unit 12 includes an accent phrase boundary candidate extraction unit 21, a correction target candidate determination unit 22, a display control unit 23, and a correction unit 24.

On receiving from the operation unit 2 an operation signal indicating that the intermediate notation is to be edited, the processing unit 7 activates the synthesized speech editing unit 12. Once activated, the synthesized speech editing unit 12 reads from the storage unit 6 the text data of the original text of the synthesized speech, the part-of-speech information of each word corresponding to that text data, and the intermediate notation.

Based on the part-of-speech information of each word, the accent phrase boundary candidate extraction unit 21 identifies the independent words, such as nouns, pronouns, and adnominals, contained in the original text of the synthesized speech. It then extracts the boundary between each independent word and the word immediately before it as an accent phrase boundary candidate.

FIG. 3 is a diagram showing an example of accent phrase boundary candidates. In FIG. 3, the original text 300 of the synthesized speech has been segmented into words by the language processing unit 10, and the part of speech of each word has been identified. As indicated by the double lines 301, the boundaries between each of the words 『時代』 (noun, "era"), 『音声』 (noun, "speech"), 『合成』 (noun, "synthesis"), and 『進歩』 (noun, "progress") and the word immediately before it are the accent phrase boundary candidates.
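The extraction step can be sketched in Python as follows. This is an illustration rather than the actual extraction unit 21: the function name, the English part-of-speech tags, and the set `INDEPENDENT_POS` are assumptions, and the source lists nouns, pronouns, and adnominals only as examples of independent words. The word sequence is assembled from the accent phrases quoted in the description; the tags for 『この』 and the particles are supplied here for the example.

```python
from typing import List, Tuple

# Assumed, non-exhaustive set of part-of-speech tags treated as independent words.
INDEPENDENT_POS = {"noun", "pronoun", "adnominal"}


def extract_boundary_candidates(words: List[Tuple[str, str]]) -> List[int]:
    """Return each index i where the boundary between words[i-1] and words[i]
    is an accent phrase boundary candidate, i.e. words[i] is an independent
    word and has a preceding word."""
    return [i for i, (_, pos) in enumerate(words)
            if i > 0 and pos in INDEPENDENT_POS]


WORDS = [
    ("この", "adnominal"),
    ("時代", "noun"),
    ("の", "particle"),
    ("音声", "noun"),
    ("合成", "noun"),
    ("の", "particle"),
    ("進歩", "noun"),
    ("は", "particle"),
]
CANDIDATES = extract_boundary_candidates(WORDS)  # [1, 3, 4, 6]
```

The four indices correspond to the boundaries in front of 『時代』, 『音声』, 『合成』, and 『進歩』. The sentence-initial 『この』 is an independent word but has no preceding word, so it produces no candidate.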

Among the accent phrase boundary candidates, the correction target candidate determination unit 22 identifies as correction target candidates those for which the accents do not become inappropriate even if the user changes whether the candidate is an accent phrase boundary. In this embodiment, the correction target candidate determination unit 22 treats as a correction target candidate any accent phrase boundary candidate for which the accent positions of the accent phrases before and after it do not change even if the user changes that status.

A candidate's status as an accent phrase boundary can be changed without changing the accent positions of the accent phrases before and after it in the following two cases.
(1) The candidate lies in the middle of a single accent phrase because accent combination has occurred (that is, the candidate is not an accent phrase boundary in the intermediate notation).
(2) The candidate is a boundary between two accent phrases in the intermediate notation, and the immediately preceding accent phrase has a flat (heiban) accent.

The correction target candidate determination unit 22 therefore refers to the intermediate notation and takes as correction target candidates the accent phrase boundary candidates that satisfy either of these two conditions.

FIG. 4 is a diagram showing an example of the relationship between accent phrase boundary candidates and correction target candidates. In FIG. 4, the original text 400 of the synthesized speech has been segmented into words by the language processing unit 10, and the part of speech of each word has been identified. In the original text 400, 『この時代の』 ("of this era"), 『音声合成の』 ("of speech synthesis"), and 『進歩は』 ("progress", with the topic particle) each form one accent phrase 401. The accent of each accent phrase 401 is shown by a polyline 403 representing the pitch height of each sound in the notation 402 that represents the reading of the original text 400. The accent phrase boundary candidates 404-1 to 404-4 extracted by the accent phrase boundary candidate extraction unit 21 are shown as double lines.

Of these candidates, 404-1 and 404-3 each lie in the middle of a single accent phrase and therefore satisfy condition (1) above.
Candidates 404-1 and 404-3 are therefore correction target candidates.

The accent phrase 『この時代の』 immediately before candidate 404-2 has a flat accent, so candidate 404-2 satisfies condition (2) above.
Candidate 404-2 is therefore a correction target candidate.

Candidate 404-4, on the other hand, does not lie in the middle of an accent phrase, and the accent of the immediately preceding accent phrase 『音声合成の』 is not flat.
Candidate 404-4 is therefore not a correction target candidate.
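The two conditions and the FIG. 4 walkthrough can be restated as a small Python sketch. The class `BoundaryCandidate`, its field names, and the function `is_correction_target` are hypothetical; in the described device this information would be read from the intermediate notation, whereas here it is entered by hand from the description of FIG. 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryCandidate:
    label: str
    is_phrase_boundary: bool        # True if a boundary in the intermediate notation
    preceding_phrase_is_flat: bool  # only meaningful when is_phrase_boundary is True


def is_correction_target(candidate: BoundaryCandidate) -> bool:
    """Apply conditions (1) and (2) for the first embodiment."""
    if not candidate.is_phrase_boundary:
        # Condition (1): inside one accent phrase because of accent combination.
        return True
    # Condition (2): an existing boundary whose preceding accent phrase is flat.
    return candidate.preceding_phrase_is_flat


FIG4_CANDIDATES = [
    BoundaryCandidate("404-1", is_phrase_boundary=False, preceding_phrase_is_flat=False),
    BoundaryCandidate("404-2", is_phrase_boundary=True, preceding_phrase_is_flat=True),
    BoundaryCandidate("404-3", is_phrase_boundary=False, preceding_phrase_is_flat=False),
    BoundaryCandidate("404-4", is_phrase_boundary=True, preceding_phrase_is_flat=False),
]
TARGETS = [c.label for c in FIG4_CANDIDATES if is_correction_target(c)]
# TARGETS == ["404-1", "404-2", "404-3"]
```

For candidates inside an accent phrase the `preceding_phrase_is_flat` field is ignored, which mirrors the fact that condition (1) does not depend on the accent type.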

The correction target candidate determination unit 22 notifies the display control unit 23 of the position of each accent phrase boundary candidate and of information indicating whether each candidate is a correction target candidate.

The display control unit 23 causes the display unit 3 to display the accent phrase boundary candidates that are correction target candidates in such a way that the user can see that their status as accent phrase boundaries can be changed.

FIG. 5 is a diagram showing an example of a display screen of the display unit 3 on which correction target candidates are displayed.
The display screen 500 shows the original text 510 and correction target candidates 501 to 503. Of these, correction target candidates 501 and 503 lie in the middle of a single accent phrase because of accent combination, so they are not accent phrase boundaries before modification. In this example they are therefore shown as dotted lines. Correction target candidate 502, on the other hand, is an accent phrase boundary in the intermediate notation generated by the language processing unit 10, and is therefore shown as a solid line in this example.

The display control unit 23 may also cause the display unit 3 to display accent phrase boundary candidates that are not correction target candidates, in a way that shows the user cannot change whether they are treated as accent phrase boundaries. For example, the display control unit 23 may display such a candidate at the corresponding position in the original text as a line whose color or luminance differs from that of the lines representing correction target candidates.

The user can change whether a correction target candidate is an accent phrase boundary by, for example, placing the cursor on the displayed correction target candidate and clicking via the operation unit 2. The operation unit 2 then outputs a signal corresponding to the operation to the processing unit 7.

In response to an operation via the operation unit 2 that makes a correction target candidate an accent phrase boundary, the correction unit 24 adds a symbol indicating an accent phrase boundary at the position of that correction target candidate in the intermediate notation. Conversely, in response to an operation via the operation unit 2 that makes a correction target candidate no longer an accent phrase boundary, the correction unit 24 deletes from the intermediate notation the symbol indicating an accent phrase boundary at the position of that correction target candidate.

In the present embodiment, the correction unit 24 does not correct the accent of the accent phrase that follows a correction target candidate even when that candidate is changed. This is because the embodiment assumes that leaving the accent position of the following accent phrase as it is does not result in an unnatural utterance.

FIG. 6 is an operation flowchart of the synthesized speech editing process executed by the synthesized speech editing unit 12. The process is started, for example, when an operation for executing the synthesized speech editing process is performed via the operation unit 2 on the original text of synthesized speech for which the intermediate notation has already been generated.

The accent phrase boundary candidate extraction unit 21 refers to the part of speech of each word included in the original text and sets the boundary between each independent word and the word immediately preceding it as an accent phrase boundary candidate (step S101).
The correction target candidate determination unit 22 refers to the intermediate notation and sets, as correction target candidates, those accent phrase boundary candidates that satisfy one of the above conditions (1) and (2) (step S102).

The display control unit 23 causes the display unit 3 to display the correction target candidates in a way that shows they can be edited (step S103).
In accordance with the operation via the operation unit 2, the correction unit 24 corrects the accent phrase boundary notation in the intermediate notation at the position of each correction target candidate whose boundary status has been changed (step S104). The correction unit 24 then stores the corrected intermediate notation in the storage unit 6, and the synthesized speech editing unit 12 ends the synthesized speech editing process.
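
Steps S101 to S104 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described here: the `Word` class, the function names, the use of integer positions for boundaries, and the sample segmentation and flat-accent flags are all hypothetical. Conditions (1) and (2) are read as "the candidate lies in the middle of an accent phrase" and "the immediately preceding accent phrase has a flat accent", which is how this section applies them to candidate 404-4.

```python
from dataclasses import dataclass


@dataclass
class Word:
    surface: str
    independent: bool  # True for an independent word (assumed to come from the dictionary)


def extract_candidates(words):
    """S101: position i denotes the boundary between words[i-1] and words[i]."""
    return [i for i in range(1, len(words)) if words[i].independent]


def is_correction_target(pos, boundaries, flat_before):
    """S102: editable if mid-phrase (1) or the preceding phrase is flat (2)."""
    mid_phrase = pos not in boundaries
    return mid_phrase or flat_before.get(pos, False)


def toggle_boundary(pos, boundaries):
    """S104: add the boundary mark if absent, remove it if present."""
    return set(boundaries) ^ {pos}


# Hypothetical segmentation of the sample sentence; all flags are illustrative.
words = [Word("この", True), Word("時代", True), Word("の", False),
         Word("音声", True), Word("合成", True), Word("の", False),
         Word("進歩", True), Word("は", False)]
boundaries = {3, 6}                # boundaries present in the intermediate notation
flat_before = {3: True, 6: False}  # is the phrase ending at each boundary flat?

candidates = extract_candidates(words)
targets = [p for p in candidates if is_correction_target(p, boundaries, flat_before)]
```

With these assumed inputs, four candidates are extracted and three of them become correction target candidates, which mirrors the three markers shown on the display screen 500. Step S103 (display) is omitted because it depends on the user interface.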

As described above, this speech synthesizer presents to the user those accent phrase boundary candidates whose correction does not produce an inappropriate utterance even if the accents of the preceding and following accent phrases are left unchanged. The speech synthesizer can therefore prevent accent errors caused by editing the accent phrase boundaries of synthesized speech.

Next, a speech synthesizer according to the second embodiment will be described. The speech synthesizer according to the second embodiment displays correction target candidates for which the accent position can also change when their boundary status is changed differently from the other correction target candidates.

FIG. 7 is a functional block diagram of the synthesized speech editing unit included in the speech synthesizer according to the second embodiment. The synthesized speech editing unit 12 according to the second embodiment includes an accent phrase boundary candidate extraction unit 21, a correction target candidate determination unit 22, a compound word determination unit 25, an accent position change determination unit 26, a display control unit 23, and a correction unit 24.
The synthesized speech editing unit 12 according to the second embodiment differs from that of the first embodiment in that it includes the compound word determination unit 25 and the accent position change determination unit 26, and in the processing of the display control unit 23. The following therefore describes the display control unit 23, the compound word determination unit 25, the accent position change determination unit 26, and their related parts. For the other components of the speech synthesizer according to the second embodiment, refer to the description of the corresponding components of the first embodiment.

In a compound word, particularly one formed by joining a plurality of consecutive nouns, the accent positions of the words in the accent phrases before and after the joining position between nouns may differ from the accent positions of the original nouns. Moreover, since nouns are independent words, each position in a compound word where nouns are joined, that is, each boundary between nouns, is an accent phrase boundary candidate. Consequently, when the boundary status of an accent phrase boundary candidate at a noun joining position in a compound word is changed, changing the accent positions as well may yield a more natural utterance.

The compound word determination unit 25 therefore determines, for each accent phrase boundary candidate, whether it is located at a joining position between nouns in a compound word consisting of consecutive nouns. To do so, the compound word determination unit 25 refers to the part-of-speech information of each word in the original text and checks the parts of speech of the words before and after each accent phrase boundary candidate. It then determines that a candidate whose preceding and following words are both nouns is located at a joining position in a compound word.
The compound word determination unit 25 stores, in the storage unit 6, information indicating whether each accent phrase boundary candidate is located at a joining position in a compound word.
According to a modification, because editing of accent phrase boundary candidates that are not correction target candidates is not permitted in principle, the compound word determination unit 25 may determine whether a candidate is located at a joining position in a compound word only for the correction target candidates among the accent phrase boundary candidates.
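
The noun-on-both-sides test can be written as a short Python sketch. The part-of-speech tag strings, the function name, and the sample tagging below are hypothetical and chosen only for illustration; a real system would use whatever tag set its language dictionary provides.

```python
def at_compound_joint(pos_tags, i):
    """Return True if the words on both sides of boundary i are nouns.

    Boundary i lies between pos_tags[i - 1] and pos_tags[i].
    """
    return pos_tags[i - 1] == "noun" and pos_tags[i] == "noun"


# Hypothetical tags for: この / 時代 / の / 音声 / 合成 / の / 進歩 / は
pos_tags = ["adnominal", "noun", "particle", "noun",
            "noun", "particle", "noun", "particle"]
candidates = [1, 3, 4, 6]  # assumed accent phrase boundary candidates

joints = [i for i in candidates if at_compound_joint(pos_tags, i)]
```

Under this assumed tagging, only the boundary inside 『音声合成』 is classified as a joining position in a compound word.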

The accent position change determination unit 26 refers to the accent combination rule for compound words and determines, for each accent phrase boundary candidate, whether the accent position changes due to accent combination. It then assigns different attributes to candidates for which the accent position changes depending on whether the candidate is an accent phrase boundary and to candidates for which the accent position does not change. This attribute is called the accent position change attribute.

The accent combination rule for compound words is that the accent phrase before the joining position takes a flat (heiban) accent, and that the accent position of the accent phrase after the joining position is left unchanged if its accent is not flat. In the following, for convenience of explanation, the accent phrase before the joining position is called the prefix accent phrase and the accent phrase after the joining position is called the postfix accent phrase.
Under this rule, the accent position changes depending on whether accent combination is applied if the unique accent of the noun in the prefix accent phrase is not flat, or if the unique accent of the noun in the postfix accent phrase is flat. The unique accent of a noun is the accent it has when uttered alone.

Accordingly, for an accent phrase boundary candidate that is not located in the middle of a compound word, the accent position change determination unit 26 sets the accent position change attribute to “none” if the accent of the prefix accent phrase is flat. If the accent of the prefix accent phrase is not flat, changing the accent phrase boundary is not permitted in principle, so the accent position change determination unit 26 sets the accent position change attribute to “undefined”.

For an accent phrase boundary candidate located in the middle of a compound word, the accent position change determination unit 26 refers to the word dictionary to identify the unique accent of each noun in the compound word. If the unique accent of the noun in the prefix accent phrase is flat and the unique accent of the noun in the postfix accent phrase is not flat, the accent position change determination unit 26 sets the accent position change attribute to “none”. If the unique accent of the noun in the prefix accent phrase is not flat, or the unique accent of the noun in the postfix accent phrase is flat, it sets the accent position change attribute to “present”.
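
The attribute assignment just described reduces to a small decision function. The Python sketch below is only an illustration: the function name, the boolean parameters, and the attribute strings as Python values are assumptions. For a candidate inside a compound word the two flags refer to the unique accents of the nouns; for any other candidate only `prefix_flat` is used and refers to the accent of the prefix accent phrase.

```python
def accent_position_change_attribute(in_compound, prefix_flat, postfix_flat=None):
    """Return "none", "present", or "undefined" for one boundary candidate."""
    if not in_compound:
        # Outside a compound word only the prefix accent phrase matters.
        return "none" if prefix_flat else "undefined"
    # Inside a compound word: no change only when the prefix noun is flat
    # and the postfix noun is not flat.
    if prefix_flat and not postfix_flat:
        return "none"
    return "present"
```

For example, `accent_position_change_attribute(True, False, False)` yields `"present"`, because a non-flat prefix noun would be flattened by accent combination.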

For a compound word whose accent positions have changed due to accent combination, a natural utterance may result when the combination is undone either by changing the accent positions to match the unique accent of each noun or by keeping the accent positions as they are.

FIG. 8 is a diagram illustrating an example of the difference in accent position depending on whether accent combination is applied. In FIG. 8, each circle and each triangle represents one sound. In the compound word 800 to which accent combination has been applied, as indicated by the polygonal line 801 representing pitch, the accent of the prefix accent phrase 802 is flat (heiban), and the accent of the postfix accent phrase 803 is head-high (atamadaka) or mid-high (nakadaka).
Suppose that the accent phrase boundary candidate 804 between the prefix accent phrase 802 and the postfix accent phrase 803 is changed to an accent phrase boundary. In this case, as indicated by the polygonal line 811, the accent positions of the prefix accent phrase 802 and the postfix accent phrase 803 may be kept as they were under accent combination. Alternatively, as indicated by the polygonal line 812, the accent positions may be corrected to match the unique accents of the noun in the prefix accent phrase 802 and the noun in the postfix accent phrase 803.
Thus, whether to correct the accent positions can be decided by the user's selection.

For an accent phrase boundary candidate that is at a joining position between nouns in a compound word, to which accent combination has been applied, and whose accent position change attribute has been determined to be “present”, the accent position change determination unit 26 may therefore set the accent position change attribute to “select”. For a candidate whose accent position change attribute is “select”, whether the accent positions of the preceding and following accent phrases are changed or kept is decided by the user's selection when the boundary status of the candidate is changed.

The accent position change determination unit 26 stores the accent position change attribute of each accent phrase boundary candidate in the storage unit 6.

According to a modification, the accent position change determination unit 26 may determine whether the accent position changes depending on the boundary status, and set the accent position change attribute, only for the correction target candidates. This is because, in principle, the user never changes the boundary status of an accent phrase boundary candidate that is not a correction target candidate.

The display control unit 23 varies the display of each correction target candidate according to its accent position change attribute.

FIG. 9 is a table showing how combinations of the presence or absence of accent combination, the accent type of the prefix accent phrase, and the presence or absence of an accent position change relate to the edit attribute, the boundary attribute, and the accent position change attribute. The edit attribute indicates whether the candidate is a correction target candidate. The boundary attribute indicates whether the candidate is set as an accent phrase boundary in the intermediate notation obtained as a result of language processing.

In the table 900, each row shows one category, and each category defines a combination of the presence or absence of accent combination, the accent type of the prefix accent phrase, and the presence or absence of an accent position change. The symbol “-” indicates that the item is not referred to. Categories 1 to 5 correspond to cases where the accent phrase boundary candidate is located in the middle of a compound word, and categories 6 and 7 correspond to cases where it is not.

As shown in category 1, when the accent phrase boundary candidate is at an accent combination position, it is not set as an accent phrase boundary, and it is editable (that is, it is a correction target candidate). In category 1 the accent position does not change due to accent combination, so the accent position change attribute is “none”. In this case, it follows from the accent combination rule that the unique accent of the prefix accent phrase is always flat (heiban).

As shown in category 2, when the accent phrase boundary candidate is not at an accent combination position and the accent position does not change due to accent combination, it likewise follows from the accent combination rule that the unique accent of the prefix accent phrase is always flat. The candidate is therefore editable, and the accent position change attribute is “none”. Because the candidate is not at an accent combination position, it is an accent phrase boundary.

As shown in category 3, when the accent phrase boundary candidate is at an accent combination position and the accent position changes due to accent combination, the candidate is editable and the accent position change attribute is “select”. In this case the candidate is located in the middle of an accent phrase, so it is not set as an accent phrase boundary.

Furthermore, as shown in category 4, the accent phrase boundary candidate is also editable when it is not at an accent combination position and the unique accent of the prefix accent phrase is flat. Because the accent position changes due to accent combination, the accent position change attribute is “select”. In this case the candidate is not at an accent combination position, so it is an accent phrase boundary.

As shown in category 5, when the accent phrase boundary candidate is not at an accent combination position and the unique accent of the prefix accent phrase is not flat, the candidate is in principle not editable (that is, it is not a correction target candidate). However, a candidate in this category is located at a joining position between nouns in a compound word, so it is exceptionally made editable. In this category, at least the accent of the prefix accent phrase changes depending on whether accent combination is applied, so the accent position change attribute is “present”. Setting the candidate as an accent phrase boundary requires changing at least the accent position of the prefix accent phrase. In this case, therefore, the accent position change determination unit 26 preferably does not set the accent position change attribute to “select”.

Furthermore, as shown in category 6, when the accent phrase boundary candidate is not located in the middle of a compound word, the accent position does not change depending on whether accent combination is applied, so the accent position change attribute is “none”. Because the unique accent of the prefix accent phrase is flat, the candidate is editable. In this case the candidate is not at an accent combination position, so it is an accent phrase boundary.

Finally, as shown in category 7, when the accent phrase boundary candidate is not located in the middle of a compound word, is not at an accent combination position, and the unique accent of the prefix accent phrase is not flat, the candidate is not editable. The accent position change attribute is therefore “undefined”. In this case, too, the candidate is not at an accent combination position, so it is an accent phrase boundary.
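
The seven categories described above can be collected into a lookup, sketched here in Python. This is a reading of the prose description of table 900, not a reproduction of FIG. 9: the names `Attributes`, `TABLE_900`, and `category` are hypothetical, the boundary status of category 5 is inferred from the statement that it is not at an accent combination position, and any combination the text does not describe returns `None`.

```python
from typing import NamedTuple, Optional


class Attributes(NamedTuple):
    editable: bool        # edit attribute: is it a correction target candidate?
    is_boundary: bool     # boundary attribute in the intermediate notation
    position_change: str  # accent position change attribute


TABLE_900 = {
    1: Attributes(True, False, "none"),
    2: Attributes(True, True, "none"),
    3: Attributes(True, False, "select"),
    4: Attributes(True, True, "select"),
    5: Attributes(True, True, "present"),   # boundary status inferred, see lead-in
    6: Attributes(True, True, "none"),
    7: Attributes(False, True, "undefined"),
}


def category(in_compound, combined, prefix_flat, position_changes) -> Optional[int]:
    """Map one candidate's properties to a category number of table 900."""
    if in_compound:
        if combined:                      # at an accent combination position
            return 3 if position_changes else 1
        if not prefix_flat:
            return 5
        return 4 if position_changes else 2
    if combined:
        return None                       # case not described in this section
    return 6 if prefix_flat else 7
```

For instance, `TABLE_900[category(True, True, False, True)]` gives an editable, non-boundary candidate with the attribute `"select"`, which is the situation of the joining position inside a compound word whose accent was changed by accent combination.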

FIG. 10(a) is a diagram illustrating an example of a display screen of the display unit 3 that displays correction target candidates according to the second embodiment.
On the display screen 1000, an original sentence 1010 and correction target candidates 1001 to 1003 are displayed. Of these, the correction target candidates 1001 and 1003 are located in the middle of a single accent phrase because of accent combination, so they are not accent phrase boundaries before correction. In this example, therefore, the correction target candidates 1001 and 1003 are indicated by dotted lines. The correction target candidate 1002, on the other hand, is an accent phrase boundary before correction, and is therefore indicated by a solid line.

Furthermore, the correction target candidate 1003 is at the joining position of the compound word 『音声合成』 (“speech synthesis”), and the accent position has changed due to accent combination. For the correction target candidate 1003, therefore, when it is changed to an accent phrase boundary and accent combination is undone, the user can select whether to keep the accent positions used under accent combination or to change them to the unique accent of each noun. To show that a change of accent position is selectable, the correction target candidate 1003 is displayed differently from the correction target candidate 1001. In this example, the correction target candidate 1001 is shown as a single line and the correction target candidate 1003 as a double line.
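
One way to picture the display rule in this example is as a mapping from a candidate's attributes to a line style. The Python sketch below assumes that the dotted or solid stroke and the single or double line are independent properties, which is how the example reads; the function name and the style strings are hypothetical.

```python
def line_style(is_boundary, position_change):
    """Choose how a correction target candidate is drawn in the original text."""
    stroke = "solid" if is_boundary else "dotted"       # boundary before correction?
    count = "double" if position_change == "select" else "single"
    return stroke, count


style_1001 = line_style(False, "none")    # mid-phrase, no accent choice
style_1002 = line_style(True, "none")     # existing boundary
style_1003 = line_style(False, "select")  # compound joint, accent choice offered
```

The patent leaves the concrete styling open, so colors or luminance could be substituted for the stroke and line count used here.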

In this embodiment as well, the user can change whether a correction target candidate is an accent phrase boundary by, for example, placing the cursor on the displayed candidate and clicking via the operation unit 2. For a correction target candidate for which a change of accent position is selectable, each click on the operation unit 2 switches, for example, among keeping the accent position, changing the accent position, and leaving the accent phrase boundary unchanged. The operation unit 2 then outputs a signal corresponding to the operation to the processing unit 7.
When the boundary status of a correction target candidate is changed and the accent positions in the accent phrases before and after it are also to be changed, the correction unit 24 corrects the accent positions by inputting those accent phrases to the language processing unit 10.
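
The click behavior can be modeled as a small state cycle. In the Python sketch below, the state names, the tuple constants, and the function name are hypothetical, and the two-state cycle for candidates without a selectable accent change is an assumption based on the simple toggle described for the first embodiment.

```python
# States visited on successive clicks, starting from "unchanged".
CYCLE_SELECT = ("unchanged", "toggle_keep_accent", "toggle_change_accent")
CYCLE_PLAIN = ("unchanged", "toggle")


def next_state(state, position_change):
    """Return the state reached by one more click on a correction target candidate."""
    cycle = CYCLE_SELECT if position_change == "select" else CYCLE_PLAIN
    return cycle[(cycle.index(state) + 1) % len(cycle)]
```

Three clicks on a candidate whose attribute is “select” return it to its original state, so the user can step through both accent options before committing to one.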

FIG. 10(b) shows the reading and accent of the original text shown in FIG. 10(a). The polygonal line 1101, displayed over the katakana reading of the original text, 『コノジダイノオンセーゴーセーノシンポワ』 (“kono jidai no onsē gōsē no shinpo wa”), represents the accent when no accent phrase boundary has been corrected. The polygonal line 1102 represents the accent when the boundary status of the correction target candidates 1001 and 1002 has been changed. The polygonal line 1103 represents the accent when the correction target candidate 1003 is changed to an accent phrase boundary while the accent positions are kept. The polygonal line 1104, in contrast, represents the accent when the correction target candidate 1003 is changed to an accent phrase boundary while the accent positions of the nouns before and after it are changed to the positions given by their unique accents.

FIG. 11 is an operation flowchart of the synthesized speech editing process executed by the synthesized speech editing unit 12 according to the second embodiment.

The accent phrase boundary candidate extraction unit 21 refers to the part of speech of each word included in the original text and sets the boundary between each independent word and the word immediately preceding it as an accent phrase boundary candidate (step S201).
The correction target candidate determination unit 22 refers to the intermediate notation and sets, as correction target candidates, those accent phrase boundary candidates that satisfy one of the above conditions (1) and (2) (step S202).

The compound word determination unit 25 determines, for each accent phrase boundary candidate, whether it is located at a joining position between nouns in a compound word consisting of consecutive nouns, and identifies the candidates located at such joining positions (step S203). The accent position change determination unit 26 then determines, in accordance with the accent combination rule for compound words, whether the accent position changes when the boundary status of a candidate at a joining position is changed. According to the result, it sets the accent position change attribute of each accent phrase boundary candidate (step S204).

The display control unit 23 causes the display unit 3 to display the correction target candidates in a way that shows they can be edited. In doing so, the display control unit 23 displays each correction target candidate so that candidates whose accent position changes when their status as an accent phrase boundary is changed look different from candidates whose accent position does not change (step S205).

The correction unit 24 corrects, in accordance with an operation via the operation unit 2, the notation of the accent phrase boundary at the position in the intermediate notation that corresponds to the changed correction target candidate (step S206). The correction unit 24 then stores the corrected intermediate notation in the storage unit 6, and the synthesized speech editing unit 12 ends the synthesized speech editing process.
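Steps S205 and S206 can be illustrated with a minimal Python sketch. The specification does not fix a concrete data layout here, so the sketch assumes that the intermediate notation is reduced to a list of words plus a set of boundary indices (index `i` meaning "boundary before word `i`"). The marker characters and the function names `render` and `toggle_boundary` are hypothetical.

```python
from typing import Dict, List, Set


def render(words: List[str], boundaries: Set[int], targets: Dict[int, bool]) -> str:
    """Render the text with editable candidates marked.

    targets maps a boundary index to True if toggling it changes the accent
    position, or False if it does not. Square brackets mark the former and
    parentheses the latter; "/" means "currently a boundary", "+" means
    "currently joined".
    """
    out = []
    for i, w in enumerate(words):
        if i in targets:
            state = "/" if i in boundaries else "+"
            out.append(f"[{state}]" if targets[i] else f"({state})")
        elif i in boundaries:
            out.append("/")  # fixed boundary that is not offered for editing
        out.append(w)
    return "".join(out)


def toggle_boundary(boundaries: Set[int], targets: Dict[int, bool], index: int) -> Set[int]:
    """Return a new boundary set with the chosen correction target toggled."""
    if index not in targets:
        raise ValueError("only correction target candidates can be edited")
    return boundaries ^ {index}
```

Returning a new set rather than mutating the input keeps the original notation available until the corrected version is stored, which mirrors the separate "correct" and "store" actions in step S206.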

As described above, the speech synthesizer according to the second embodiment displays on the display unit a correction target candidate that is at a combined position in a compound word and whose accent position changes depending on whether or not it is made an accent phrase boundary, so that it is distinguished from correction target candidates whose accent positions do not change. In addition, when such a correction target candidate is changed in response to a user operation, the speech synthesizer lets the user select whether to maintain or change the accent position. Therefore, by changing accent phrase boundaries, this speech synthesizer can obtain synthesized speech that sounds more natural.

Furthermore, a computer program that causes a computer to implement each function of the processing unit of the speech synthesizer according to each of the above embodiments may be provided in a form recorded on a computer-readable medium, for example, a magnetic recording medium, an optical recording medium, or a semiconductor memory.

All examples and specific terms recited herein are intended for instructional purposes, to help the reader understand the present invention and the concepts contributed by the inventor to furthering the art. They are to be construed as not being limited to the configuration of any example in this specification, or to such specifically recited examples and conditions, relating to showing the superiority or inferiority of the present invention. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made thereto without departing from the spirit and scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Speech synthesizer
2 Operation unit
3 Display unit
4 Communication interface unit
5 Output unit
6 Storage unit
7 Processing unit
8 Speaker
10 Language processing unit
11 Speech synthesis unit
12 Synthesized speech editing unit
21 Accent phrase boundary candidate extraction unit
22 Correction target candidate determination unit
23 Display control unit
24 Correction unit
25 Compound word determination unit
26 Accent position change determination unit

Claims

A speech synthesizer comprising:
an input unit that obtains the original text of synthesized speech and data representing the reading of the original text;
a storage unit that stores a word dictionary in which the part of speech and accent position of each word are registered;
a language processing unit that performs language processing on the original text and its reading with reference to the word dictionary, thereby identifying the part of speech of each word included in the original text, dividing the original text into accent phrases, and generating an intermediate notation representing the accent position of each accent phrase and the positions where accent combination occurs;
an accent phrase boundary candidate extraction unit that identifies the words that are independent words with reference to the part of speech of each word included in the original text, and sets each boundary between an independent word and the word immediately before it as an accent phrase boundary candidate;
a correction target candidate determination unit that identifies, as a correction target candidate, a candidate among the accent phrase boundary candidates for which, judging from the accent of the accent phrase preceding the candidate and from whether or not the candidate is at a position where accent combination occurs, the accent phrases before and after the candidate do not take an incorrect accent even if whether or not the candidate is an accent phrase boundary is changed; and
a display control unit that displays the correction target candidates on a display unit.

The speech synthesizer according to claim 1, wherein the correction target candidate determination unit sets, as the correction target candidate, a candidate among the accent phrase boundary candidates for which the accent positions of the accent phrases before and after the candidate do not change even if whether or not the candidate is an accent phrase boundary is changed.

The speech synthesizer according to claim 2, wherein the correction target candidate determination unit sets, as the correction target candidate, a candidate among the accent phrase boundary candidates that is at a position where accent combination occurs.

The speech synthesizer according to claim 2 or 3, wherein the correction target candidate determination unit sets, as the correction target candidate, a candidate among the accent phrase boundary candidates that is at a position where no accent combination occurs and for which the accent of the accent phrase immediately before the candidate is of the flat type.

5. The speech synthesizer according to claim 1, further comprising a correction unit that, in accordance with an operation via an operation unit, corrects in the intermediate notation whether or not the selected correction target candidate is an accent phrase boundary.

A compound word determination unit that determines, for each of the correction target candidates, whether or not the candidate is at a combined position of nouns in a compound word in which a plurality of nouns occur consecutively,
An accent position change determination unit that, based on a compound word accent combining rule, assigns a first attribute indicating that the accent position changes to a correction target candidate at the combined position for which changing whether or not it is an accent phrase boundary changes the accent position of at least one of the accent phrases before and after that candidate, and assigns a second attribute indicating that the accent position does not change to a correction target candidate for which the accent positions of the accent phrases before and after it do not change even if whether or not it is an accent phrase boundary is changed,
The display control unit makes the display on the display unit of the correction target candidate having the first attribute different from the display on the display unit of the correction target candidate having the second attribute. 5. The speech synthesizer according to any one of 4 above.

The speech synthesizer according to claim 5, wherein the combining rule is that the accent of the noun immediately before the combined position becomes the flat type, and the accent of the noun immediately after the combined position is left unchanged if it is other than the flat type when the noun is not part of a compound word, and
wherein the accent position change determination unit assigns the first attribute to the correction target candidate at the combined position when the accent of the noun included in the accent phrase immediately before that candidate is other than the flat type, or when the accent of the noun included in the accent phrase immediately after that candidate is the flat type.

The speech synthesizer according to claim 6, further comprising a correction unit that, when correcting in the intermediate notation whether or not the selected correction target candidate having the first attribute is an accent phrase boundary in response to an operation via an operation unit, determines whether or not to correct the accent positions of the accent phrases before and after the correction target candidate having the first attribute.

A synthesized speech editing method comprising:
performing, with reference to a word dictionary in which the part of speech and accent position of each word are registered, language processing on the original text of synthesized speech and the reading of the original text, thereby identifying the part of speech of each word included in the original text, dividing the original text into accent phrases, and generating an intermediate notation representing the accent position of each accent phrase and the positions where accent combination occurs;
identifying the words that are independent words with reference to the part of speech of each word included in the original text, and setting each boundary between an independent word and the word immediately before it as an accent phrase boundary candidate;
identifying, as a correction target candidate, a candidate among the accent phrase boundary candidates for which, judging from the accent of the accent phrase preceding the candidate and from whether or not the candidate is at a position where accent combination occurs, the accent phrases before and after the candidate do not take an incorrect accent even if whether or not the candidate is an accent phrase boundary is changed; and
displaying the correction target candidates on a display unit.

A computer program for synthesized speech editing that causes a computer to execute:
performing, with reference to a word dictionary in which the part of speech and accent position of each word are registered, language processing on the original text of synthesized speech and the reading of the original text, thereby identifying the part of speech of each word included in the original text, dividing the original text into accent phrases, and generating an intermediate notation representing the accent position of each accent phrase and the positions where accent combination occurs;
identifying the words that are independent words with reference to the part of speech of each word included in the original text, and setting each boundary between an independent word and the word immediately before it as an accent phrase boundary candidate;
identifying, as a correction target candidate, a candidate among the accent phrase boundary candidates for which, judging from the accent of the accent phrase preceding the candidate and from whether or not the candidate is at a position where accent combination occurs, the accent phrases before and after the candidate do not take an incorrect accent even if whether or not the candidate is an accent phrase boundary is changed; and
displaying the correction target candidates on a display unit.