JP2005345699A

JP2005345699A - Device, method, and program for speech editing

Info

Publication number: JP2005345699A
Application number: JP2004164450A
Authority: JP
Inventors: Katsuyuki Murata; 克之村田; Shigenobu Seto; 重宣瀬戸
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2004-06-02
Filing date: 2004-06-02
Publication date: 2005-12-15

Abstract

PROBLEM TO BE SOLVED: To provide a speech editing device that allows an operator to edit a synthesized speech into the desired form more easily, without increasing the operator's workload.

SOLUTION: A speech editing device comprises: correction means 110 and 112 for correcting a synthesized sound generated for a text; correction information storage means 131 for storing correction information indicating a corrected sound obtained by the correction means 110 and 112; history structure generating means 136 for generating history structure information indicating the relation between the correction information stored in the correction information storage means 131 and the synthesized sound before the correction; display means 140 for displaying the correspondence between the correction information and the synthesized sound that are associated with each other in the history structure information generated by the history structure generating means 136; choice-accepting means 120 for accepting a choice of the correction information displayed by the display means 140; and synthesized sound generating means 114 for generating a synthesized sound based on the correction information selected via the choice-accepting means 120.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech editing device, a speech editing method, and a speech editing program for editing synthesized sound.

In general, text-to-speech synthesis first applies language processing to the input text to obtain phonetic information indicating the pronunciation of the text. The phonetic information includes the parts of speech, dependency relations, readings, accent types, break positions, break types, and the like that are output by the language processing. Prosody control is then performed based on the phonetic information to generate prosody information, which includes a pitch pattern, phoneme durations, and the like. A synthesized sound is generated based on this prosody information and is finally output via a DA converter.

In such text-to-speech synthesis, problems may arise: obvious errors can occur in the language processing, such as misreading a word, and the prosody can sound unnatural.

A speech synthesis editing device is a device for correcting errors in the result of text-to-speech synthesis as described above and for making corrections such as improving unnatural prosody. The operator can perform these operations interactively. Such a device is therefore an effective tool for solving the above-described problems of text-to-speech synthesis.

One known speech synthesis editing device outputs the phonetic information as a phonetic symbol string and allows errors that occurred in the language processing to be corrected interactively (see, for example, Patent Document 1).

Speech synthesis editing devices that allow the prosody information to be changed interactively are also known. Such devices permit finer corrections than correcting the phonetic information alone; specifically, they allow subtle changes in intonation, adjustment of the reading speed, and so on. This makes it possible to create synthesized sounds with more natural or more varied prosody.

As described above, with a conventional speech synthesis editing device the operator can tune the output to the desired synthesized sound by interactively correcting the phonetic information or the prosody information. This process necessarily involves a correct-and-listen cycle, in which the operator corrects the phonetic information or the prosody information and then listens to the result to check it.

Patent Document 1: JP 2002-351486 A

However, while repeating this correct-and-listen cycle, the operator may want to reuse a synthesized sound that was corrected earlier. Examples include the case where the operator later realizes that a previously obtained corrected sound was close to the desired synthesized sound, and the case where the operator wants to redo the corrections from a certain point.

In such cases, with a conventional speech synthesis editing device the operator had to either recall the correction operations applied to the synthesized sound and reproduce the desired correction state, or redo the corrections from the beginning. Because the editing environment available to the operator was limited in this way, for example by not allowing correction of the synthesized sound to resume from a desired state, there has been a demand for an editing environment in which the sound can more easily be corrected into the synthesized sound the operator wants.

The present invention has been made in view of the above, and its object is to provide a speech editing device with which the operator can more easily edit the synthesized sound into the desired form without increasing the operator's workload.

To solve the above-described problems and achieve the object, the present invention is characterized by comprising: correction means for correcting a synthesized sound generated for a text; correction information storage means for storing correction information indicating the corrected sound obtained by the correction means; history structure generation means for generating history structure information indicating the relationship between the correction information stored in the correction information storage means and the synthesized sound before correction; display means for displaying the correspondence between the correction information and the synthesized sound that are associated with each other in the history structure information generated by the history structure generation means; selection receiving means for receiving a selection of the correction information displayed on the display means; and synthesized sound generation means for generating a synthesized sound based on the correction information selected via the selection receiving means.

The speech editing device according to the present invention stores correction information indicating the corrected sound and can display the correspondence among multiple pieces of correction information in an identifiable manner. Consequently, even when the operator wants to resume correcting from an earlier correction stage, the operator can easily specify the desired correction information based on the displayed correspondence and correct again from the correction state corresponding to that correction information. As a result, the operator can edit the synthesized sound into the desired form without an increased workload.

Embodiments of the speech editing device, speech editing method, and speech editing program according to the present invention are described in detail below with reference to the drawings. The present invention is not limited to these embodiments.

(Embodiment)
FIG. 1 is a block diagram showing a speech synthesis editing device 10 as an embodiment of the speech editing device according to the present invention. The speech synthesis editing device 10 includes a text acquisition unit 100, a language processing unit 102, a phonetic information storage unit 104, a prosody control unit 106, a prosody information storage unit 108, a phonetic information correction unit 110, a prosody information correction unit 112, a synthesized sound generation unit 114, a synthesized sound output unit 116, an instruction receiving unit 120, a correction management unit 130, and a correction display unit 140.

In the present embodiment, when a synthesized sound is generated from text, two kinds of information are subject to correction: the phonetic information obtained by language processing of the text, and the prosody information generated from that phonetic information by prosody control. Their parameters are corrected based on instructions that the user enters while viewing the display screen, and the correction history is retained. When the user later issues another correction instruction, this correction history is used to correct the phonetic information and the prosody information, and a synthesized sound is generated.

The text input unit 101 is an external interface through which the text to be converted into synthesized sound is entered. The text is entered from a keyboard or by reading a text file. The text acquisition unit 100 acquires the text to be converted into synthesized sound via the text input unit 101.

The language processing unit 102 performs language processing on the text acquired by the text acquisition unit 100 and generates phonetic information. Here, language processing is processing that divides the text into accent phrases and outputs the phonetic information corresponding to each accent phrase. The phonetic information is information generated in the course of language processing and includes, among other things, a phonetic symbol string generally called an intermediate language. Specifically, it represents morphological information such as parts of speech obtained from morphological analysis, syntactic information such as dependency relations, and the reading, accent type, break positions, break types, and the like.

The phonetic information storage unit 104 stores the phonetic information generated by the language processing unit 102 in association with each accent phrase. FIG. 2 schematically shows the data structure of the phonetic information stored in the phonetic information storage unit 104, taking as an example the phonetic information 1040 for the text 「今日は、良い天気です。」 ("The weather is nice today.").

As shown in FIG. 2, the phonetic information storage unit 104 assigns an accent phrase ID to each accent phrase in the text, in order from the first accent phrase, and stores it. The phonetic information for each accent phrase is associated with the corresponding accent phrase ID. In this way, each accent phrase and its phonetic information are stored in one-to-one correspondence.
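The one-to-one mapping from accent phrase IDs to phonetic information can be sketched as follows. This is only an illustration: the `PhoneticInfo` field names, the accent type and break values, and the choice to start IDs at 0 are assumptions, since the text does not give the contents of FIG. 2.

```python
from dataclasses import dataclass


@dataclass
class PhoneticInfo:
    """Phonetic information for one accent phrase (field names are hypothetical)."""
    reading: str       # reading of the accent phrase
    accent_type: int   # accent type; the values used below are placeholders
    break_type: str    # type of break following the phrase; placeholder labels


def build_phonetic_store(phrases):
    """Assign accent phrase IDs in order of appearance (starting at 0 by assumption)."""
    return {phrase_id: info for phrase_id, info in enumerate(phrases)}


# The three accent phrases of the example sentence used in the description.
store = build_phonetic_store([
    PhoneticInfo("きょーわ", 1, "pause"),
    PhoneticInfo("よい", 1, "none"),
    PhoneticInfo("てんきです", 1, "sentence_end"),
])
```

Keying the store by accent phrase ID means a correction can address one phrase without touching the others, which matches the per-phrase processing described for the correction units.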

Returning to FIG. 1, the prosody control unit 106 performs prosody control based on the phonetic information stored in the phonetic information storage unit 104 and generates prosody information. Here, prosody information is information indicating phoneme durations and a pitch pattern. The prosody control unit 106 generates prosody information for each accent phrase produced by the language processing unit 102. For example, for the text 「今日は、良い天気です。」, it generates phoneme durations and a pitch pattern from the phonetic information for all three accent phrases, 「きょーわ」 (kyo-wa), 「よい」 (yoi), and 「てんきです」 (tenki desu).

The prosody information storage unit 108 stores the prosody information generated by the prosody control unit 106. The prosody information is represented by a one-dimensional array Pitch[i], where i is a frame number corresponding to time and Pitch[x] is the pitch at frame number x. In the prosody information correction unit 112 described later, the pitch pattern is displayed with time (frames) on the horizontal axis and pitch (octaves) on the vertical axis. FIG. 3 schematically shows the data structure of the prosody information stored in the prosody information storage unit 108, taking as an example the prosody information 1080 for the text 「今日は良い天気です。」.

As shown in FIG. 3, prosody information expressed as Pitch[i] is stored for each accent phrase in the text. That is, the prosody information storage unit 108 stores the values of Pitch[i] as the pitch, and also stores a time length corresponding to the number of Pitch[i] entries.

As another example, a pointer may be set for each accent phrase indicating the frame at which that accent phrase starts. In this case, the prosody information is stored in a form that allows the prosody information of a given accent phrase to be extracted from the prosody information as a whole.
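The pointer-based alternative can be illustrated with a single Pitch array for the whole sentence plus a list of start frames, one per accent phrase. The pitch values and start frames below are made up for illustration, and `phrase_pitch` is a hypothetical helper, not a name from the patent.

```python
def phrase_pitch(pitch, phrase_starts, phrase_index):
    """Return the Pitch[i] values belonging to one accent phrase.

    pitch         -- pitch per frame for the whole sentence
    phrase_starts -- start frame of each accent phrase, in ascending order
    """
    start = phrase_starts[phrase_index]
    if phrase_index + 1 < len(phrase_starts):
        end = phrase_starts[phrase_index + 1]
    else:
        end = len(pitch)  # the last phrase runs to the end of the array
    return pitch[start:end]


# Made-up pitch values (in octaves), one per frame.
pitch = [4.0, 4.2, 4.5, 4.3, 4.1, 4.0, 3.9, 3.8]
phrase_starts = [0, 3, 5]  # three accent phrases

second_phrase = phrase_pitch(pitch, phrase_starts, 1)
duration_in_frames = len(second_phrase)  # time length follows from the frame count
```

With this layout the time length of a phrase does not need to be stored separately, because it equals the number of frames between consecutive start pointers.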

Returning to FIG. 1, the instruction input unit 121 accepts instructions entered by the user with a mouse, keyboard, or the like. The instruction receiving unit 120 receives instructions from the operator via the instruction input unit 121, which is a user interface. The instruction receiving unit 120 then sends instruction information indicating the operator's instruction to the phonetic information correction unit 110, the prosody information correction unit 112, and the synthesized sound generation unit 114. Specific correction instructions are given while viewing the correction screens displayed on the correction display unit 140, which are described later with reference to FIGS. 8 and 9.

Based on the instruction information received from the instruction receiving unit 120, the phonetic information correction unit 110 corrects the phonetic information stored in the phonetic information storage unit 104 interactively, while the operator views the correction screen displayed on the correction display unit 140. It then stores the corrected phonetic information in the correction state information file storage unit 131 described later. Likewise, based on the instruction information received from the instruction receiving unit 120, the prosody information correction unit 112 corrects the prosody information stored in the prosody information storage unit 108 interactively, while the operator views the correction screen displayed on the correction display unit 140, and stores the corrected prosody information in the correction state information file storage unit 131.

The phonetic information correction unit 110 and the prosody information correction unit 112 operate on the accent phrases produced by the language processing unit 102 as their unit of processing. The phonetic information correction unit 110 and the prosody information correction unit 112 according to the present embodiment constitute the correction means of the present invention.

The synthesized sound generation unit 114 generates a synthesized sound based on the phonetic information stored in the phonetic information storage unit 104 and the prosody information stored in the prosody information storage unit 108. The synthesized sound output unit 116 DA-converts the synthesized sound generated by the synthesized sound generation unit 114 and outputs it externally.

The correction management unit 130 manages the corrected sounds, that is, the synthesized sounds synthesized from the phonetic information and prosody information as corrected by the phonetic information correction unit 110 and the prosody information correction unit 112. The correction display unit 140 displays the corrected sounds and other items managed by the correction management unit 130.

FIG. 4 is a block diagram showing the detailed functional configuration of the correction management unit 130. The correction management unit 130 includes a correction state information file storage unit 131, a correction history file generation unit 132, a correction history file storage unit 134, a history structure data generation unit 136, and a history structure data storage unit 138.

The correction state information file storage unit 131 stores correction state information. Here, correction state information is a pair consisting of the phonetic information corrected by the phonetic information correction unit 110 and the prosody information corrected by the prosody information correction unit 112. In other words, a corrected sound can be generated from the correction state information. The correction state information file storage unit 131 according to the embodiment constitutes the correction information storage means of the present invention, and the correction state information according to the embodiment corresponds to the correction information of the present invention.

The correction state information includes a phonetic symbol string, which is the phonetic information, and one-dimensional array data, which is the prosody information. It is generated as one file each time a correction is made, in units of one sentence of the text, that is, the character string up to a full stop 「。」. Correction state information files are numbered from #1 in the order of correction, as in "EditData0_#1.dat", "EditData0_#2.dat", and "EditData0_#3.dat". More precisely, a sequence number is assigned each time a correction is made and a correction state information file is generated. The "0" in "EditData0_#1.dat" indicates the first sentence of the text.
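The file naming scheme described above can be sketched with a pair of helper functions. The function names are hypothetical; only the "EditData<sentence>_#<number>.dat" pattern comes from the description.

```python
import re


def state_file_name(sentence_index, correction_number):
    """Build a correction state file name such as 'EditData0_#1.dat'."""
    return f"EditData{sentence_index}_#{correction_number}.dat"


_NAME_PATTERN = re.compile(r"^EditData(\d+)_#(\d+)\.dat$")


def parse_state_file_name(name):
    """Return (sentence_index, correction_number) for a correction state file name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a correction state file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


first_correction = state_file_name(0, 1)          # first sentence, first correction
sentence, number = parse_state_file_name("EditData0_#3.dat")
```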

Based on an instruction from the instruction receiving unit 120, the correction state information file storage unit 131 also sends the phonetic information and the prosody information corresponding to a piece of correction state information to the phonetic information storage unit 104 and the prosody information storage unit 108, respectively.

FIG. 5 schematically shows the data structure of the corrected prosody information stored in the correction state information file storage unit 131. The corrected prosody information is stored as difference values from the initial values of the prosody information. In the example shown in FIG. 5, for the accent phrase 「きょーわ」, a difference of "+5" from the initial pitch value and a time-length difference of "0" are stored. Storing only the difference values in this way reduces the amount of data.

As another example, the corrected prosody information itself may be stored. Also, the correction state information according to the present embodiment contains all of the corrected phonetic information and prosody information for one sentence, but as another example, the correction state information may consist only of the corrected accent phrase, the corrected item, and the corrected value.
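The difference-based storage can be sketched as follows. The absolute pitch and time-length values are invented for illustration (the text gives only the differences "+5" and "0"), and treating pitch as a single value per accent phrase is a simplifying assumption.

```python
def to_differences(initial, corrected):
    """Return per-phrase differences (corrected minus initial) for each item."""
    return {
        phrase: {item: corrected[phrase][item] - value for item, value in items.items()}
        for phrase, items in initial.items()
    }


def apply_differences(initial, differences):
    """Rebuild the corrected prosody values from the initial values and the differences."""
    return {
        phrase: {item: value + differences[phrase][item] for item, value in items.items()}
        for phrase, items in initial.items()
    }


# Invented initial values; only the resulting differences match the FIG. 5 example.
initial = {
    "きょーわ": {"pitch": 100, "duration": 30},
    "よい": {"pitch": 90, "duration": 20},
}
corrected = {
    "きょーわ": {"pitch": 105, "duration": 30},
    "よい": {"pitch": 90, "duration": 20},
}

differences = to_differences(initial, corrected)
restored = apply_differences(initial, differences)
```

Because the initial data is kept alongside the differences, any stored correction state can be reconstructed by adding the two together.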

The correction history file generation unit 132 generates a correction history file indicating the history of corrections made by the phonetic information correction unit 110 and the prosody information correction unit 112, and stores it in the correction history file storage unit 134. A correction history file is generated for each sentence of the text.

FIG. 6 schematically shows the data structure of the correction history file stored in the correction history file storage unit 134. The correction history file has tables 1341, 1342, and so on, one for each sentence of the text. The table 1341 is the correction history file for a sentence identified by "EditData(i)". The table 1341 has a history registration count field, a history structure data field, an initial data field, and a correction state information field.

The history registration count field stores the history registration count. Here, the history registration count is the number of times the synthesized sound generated for the corresponding sentence has been corrected and correction state information has been generated. The history registration count is updated each time a correction is made.

The history structure data field stores the data name of the history structure data. History structure data is data indicating the correspondence between the synthesized sound generated for the corresponding sentence and the synthesized sounds obtained by correcting it.

The initial data field stores the synthesized sound generated for the corresponding sentence, that is, the initial data. The correction state information field stores the correction state information for the initial data. Correction state information is added to the correction state information field each time a correction is made.

The initial data field and the correction data field store the date and time of the correction, the correction order, and various parameter values that apply to the synthesized sound as a whole at the time of the correction. The parameters for the synthesized sound as a whole include the speaker, the overall pitch, the volume, and the speaking rate.
The initial data field and the correction data field also store, as the correction history of the phonetic information and the prosody information, the corrected accent phrase and the correction item corresponding to that accent phrase. Here, correction items are, for example, the accent type, breaks, the pitch pattern shape, and the duration.
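The per-sentence history table can be sketched as a pair of record types. All class and field names below are hypothetical; the description lists which kinds of values are stored, not how they are laid out. Deriving the registration count from the list length, rather than storing it as a separate field as the description does, is a simplification made here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class HistoryEntry:
    """One initial-data or correction-state entry (field names are hypothetical)."""
    name: str                    # e.g. "EditData0_#1"
    timestamp: datetime          # date and time of the correction
    order: int                   # correction order
    speaker: str                 # whole-utterance parameters follow
    overall_pitch: float
    volume: float
    speaking_rate: float
    corrected_phrase: Optional[str] = None   # accent phrase that was corrected
    correction_item: Optional[str] = None    # e.g. "accent type", "duration"


@dataclass
class SentenceHistory:
    """Correction history table for one sentence."""
    history_structure_name: str
    initial: HistoryEntry
    states: List[HistoryEntry] = field(default_factory=list)

    @property
    def registration_count(self) -> int:
        # Derived here instead of stored; the description keeps it as its own field.
        return len(self.states)

    def register(self, entry: HistoryEntry) -> None:
        """Append correction state information after a correction is made."""
        self.states.append(entry)


initial = HistoryEntry("EditData0_#0", datetime(2004, 6, 2, 10, 0), 0,
                       "speaker_a", 1.0, 1.0, 1.0)
history = SentenceHistory("HistoryStructure0", initial)
history.register(HistoryEntry("EditData0_#1", datetime(2004, 6, 2, 10, 5), 1,
                              "speaker_a", 1.0, 1.0, 1.0,
                              corrected_phrase="きょーわ",
                              correction_item="accent type"))
```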

In this way, for a synthesized sound serving as one piece of initial data, the correction history file storage unit 134 stores, in association with it, the correction state information obtained by correcting that synthesized sound. It also stores, in association with it, the correction state information obtained by further correcting a corrected sound that was itself obtained by correcting the synthesized sound.

The history structure data generation unit 136 generates the history structure data managed by the correction history file storage unit 134 and stores it in the history structure data storage unit 138. Specifically, when a synthesized sound is corrected, the history structure data generation unit 136 associates the correction state information with the initial value of that synthesized sound.

FIG. 7 schematically shows the data structure of the history structure data stored in the history structure data storage unit 138. The history structure data storage unit 138 stores history structure data that expresses, as a tree structure, the correspondence between one piece of initial data and the correction state information for that initial data. The topmost parent node of the tree is the initial data, and each other node is correction state information obtained by correcting the data of its parent node.

In the history structure data shown in FIG. 7, the correction state information "EditData0_#1" and "EditData0_#3" were both obtained by correcting the initial data "EditData0_#0". The correction state information "EditData0_#2" and "EditData0_#4" were both obtained by correcting the correction state information "EditData0_#1". The correction state information "EditData0_#5" was obtained by correcting the correction state information "EditData0_#4". The history structure data thus indicates how the pieces of correction state information correspond to one another.

The "#i" attached to each data name is a number indicating the order of correction. That is, the history structure data shown in FIG. 7 indicates that the correction state information "EditData0_#1" was generated first, with the initial data "EditData0_#0" as its correction source. Next, the correction state information "EditData0_#2" was generated with the correction state information "EditData0_#1" as its correction source. After that, the correction state information "EditData0_#3" was generated with the initial data "EditData0_#0" as its correction source.

In this way, the history structure data storage unit 138 stores history structure data that includes information identifying, for each piece of correction state information, the synthesized sound it was corrected from and the order in which it was corrected.
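The tree described above can be sketched with a parent map plus an insertion-ordered list, which together record both the correction source and the correction order. The `HistoryTree` class and its method names are hypothetical; the node names and parent relationships are the ones given in the description of FIG. 7.

```python
class HistoryTree:
    """Correction history as a tree rooted at the initial data."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}   # node name -> name of its correction source
        self.order = [root]          # node names in the order they were created

    def add(self, name, source):
        """Register correction state `name` obtained by correcting `source`."""
        if source not in self.parent:
            raise KeyError(f"unknown correction source: {source}")
        self.parent[name] = source
        self.order.append(name)

    def children(self, name):
        """Return the states obtained by directly correcting `name`, in correction order."""
        return [node for node in self.order if self.parent[node] == name]

    def path_from_root(self, name):
        """Return the chain of states from the initial data down to `name`."""
        path = []
        while name is not None:
            path.append(name)
            name = self.parent[name]
        return path[::-1]


tree = HistoryTree("EditData0_#0")
tree.add("EditData0_#1", "EditData0_#0")
tree.add("EditData0_#2", "EditData0_#1")
tree.add("EditData0_#3", "EditData0_#0")
tree.add("EditData0_#4", "EditData0_#1")
tree.add("EditData0_#5", "EditData0_#4")
```

Selecting any node as the new correction source only requires adding another child under it, so resuming from an earlier state does not disturb the states already recorded.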

In the present embodiment, the correction state information and the correction history are stored as files. As another example, this information may be loaded into memory, with the respective addresses stored and used to specify it.

FIG. 8 shows the phonetic information correction screen 200 that the correction display unit 140 displays when phonetic information is corrected. FIG. 9 shows the prosody information correction screen 210 that the correction display unit 140 displays when prosody information is corrected.

The operator can correct the phonetic information and the prosody information while viewing the phonetic information correction screen 200 and the prosody information correction screen 210. More specifically, when the operator uses a user interface such as a mouse to enter an instruction concerning the content displayed on the phonetic information correction screen 200 or the prosody information correction screen 210, the instruction receiving unit 120 receives the instruction and sends instruction information to the phonetic information correction unit 110 and the prosody information correction unit 112. The phonetic information correction unit 110 and the prosody information correction unit 112 then make the correction based on the instruction information.

The upper part of the phonetic information correction screen 200 is a common area 220 that appears on both the phonetic information correction screen 200 and the prosody information correction screen 210. The common area 220 contains the phonetic/prosody correction mode switching buttons 221 and 222, a text input/display portion 223, a language processing result display portion 224, a read-aloud button 225, and a history registration button 226.

When the operator selects one of the mode switching buttons 221 and 222, the device switches between phonetic correction mode and prosody correction mode, and the correction display unit 140 switches between the phonetic information correction screen 200 and the prosody information correction screen 210 accordingly. That is, when the phonetic correction button 221 is selected, the lower part of the screen becomes the phonetic correction area 230 shown in FIG. 8; when the prosody correction button 222 is selected, it becomes the prosody correction area 240 shown in FIG. 9.

As another example, the mode switching buttons 221 and 222 may be omitted, and the phonetic correction area 230 of FIG. 8 and the prosody correction area 240 of FIG. 9 may be displayed on the same screen.

The text input/display portion 223 is a field that accepts text input from the operator. In FIG. 8, the text 「今日は良い天気です。」 ("The weather is nice today."), written in mixed kanji and kana, has been entered in the text input/display portion 223.

The language processing result display portion 224 displays the result of language processing by the language processing unit 102. Rather than showing the phonetic symbol string produced by language processing as-is, it converts the string and displays the reading, accent type, and boundary type for each accent phrase, in a form that a non-specialist operator can understand.

In the language processing result display portion 224 shown in FIG. 8, the text is displayed as three accent phrases: 「きょーわ」 (kyōwa), 「よい」 (yoi), and 「てんきです」 (tenki desu). In the first accent phrase, 「きょーわ」, the mora 「きょ」 is underlined to show that the accent falls there. The suffix 「（小）」 ("small") at the end of the phrase shows that the boundary type is a small pause.
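The display convention just described (one entry per accent phrase, the accented mora marked, a suffix for the boundary type) can be sketched as follows. This is only an illustration, not the device's rendering code: the function names, the square brackets standing in for the underline, the reading of accent type n as "accent on the n-th mora", and every suffix other than 「（小）」 are assumptions.

```python
SMALL_KANA = set("ゃゅょぁぃぅぇぉ")  # small kana attach to the preceding kana


def split_morae(reading):
    """Split a hiragana reading into morae, e.g. きょーわ -> きょ / ー / わ."""
    morae = []
    for ch in reading:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


# Only the small-pause suffix appears in the text; the others are placeholders.
BOUNDARY_SUFFIX = {"small_pause": "（小）", "large_pause": "（大）",
                   "strong": "", "weak": ""}


def render_phrase(reading, accent_type, boundary):
    """Mark the accented mora with brackets (standing in for the underline)."""
    morae = split_morae(reading)
    if not 0 <= accent_type <= len(morae):
        raise ValueError("accent type out of range")
    if accent_type > 0:  # assumption: type 0 means no accented mora
        morae[accent_type - 1] = "[" + morae[accent_type - 1] + "]"
    return "".join(morae) + BOUNDARY_SUFFIX[boundary]


line = render_phrase("きょーわ", 1, "small_pause")
```

Here `line` holds the first accent phrase of FIG. 8 with 「きょ」 bracketed and the small-pause suffix appended.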

The operator can also select an accent phrase displayed in the language processing result display portion 224. For example, the accent phrase 「きょーわ」 can be selected by pointing, with a mouse or similar user interface, at the area where it is displayed. In FIG. 8, the first accent phrase, 「きょーわ」, is selected, and the mouse cursor (arrow) is shown over it.

The lower part of the screen displays the reading, accent type, accent strength, and boundary type of the accent phrase selected in the language processing result display portion 224, based on the language processing result. It also provides input areas that accept the operator's instructions to correct these items. When the reading or another item of an accent phrase is corrected at the operator's instruction, the phonetic information stored in the phonetic information storage unit 104 is updated with the corrected result, and the language processing result display portion 224 is refreshed as well.

The phonetic correction area 230 in the lower part of the screen has a reading correction portion 231, a reading change button 232, an accent correction portion 234, and a boundary correction portion 235.

The reading of the selected accent phrase is corrected by changing the reading shown in the reading correction portion 231: after a reading has been entered there, pressing the reading change button 232 changes the reading of the selected accent phrase to the entered one.

At this point, the instruction receiving unit 120 obtains the corrected reading of the accent phrase and sends it to the phonetic information correction unit 110, which corrects the phonetic information using that reading. The correction state information file storage unit 131 then stores the corrected phonetic information as correction state information.

Likewise, changing the content of the accent correction portion 234 at the operator's instruction corrects the accent type and the accent strength. The accent correction portion 234 in FIG. 8 shows that 「きょーわ」 has accent type 1, that is, the accent falls on 「きょ」, and that the accent is strong.

For 「きょーわ」, three accent types are available: types 0 to 2. The accent type can be corrected by entering the desired number in field 2341, which currently shows 1. The number in field 2341 can also be changed with the up and down arrow buttons 2342 and 2343 to the right of the field. The accent strength can be changed with the selection buttons 2344 and 2345 on the right.

Changing the content of the boundary correction portion 235 at the operator's instruction corrects the boundary type and the boundary position. The boundary type can be chosen from strong connection 2351, weak connection 2352, small pause 2353, and large pause 2354. A small or large pause marks the end of a breath group. Accent phrases are joined by pressing the join button 2355. When the first accent phrase, 「きょーわ」, is selected as in FIG. 8 and the join button 2355 is pressed, it is joined with the second accent phrase, 「よい」, to form the single accent phrase 「きょーわよい」.
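The join operation can be illustrated with a small sketch. The `AccentPhrase` record, its field names, and the rule for which accent type and boundary the joined phrase inherits are assumptions made for illustration; the text specifies only that the two readings are combined into one accent phrase. The accent types and boundaries given for 「よい」 and 「てんきです」 are made-up values.

```python
from dataclasses import dataclass


@dataclass
class AccentPhrase:
    reading: str      # kana reading, e.g. "きょーわ"
    accent_type: int  # accent type number
    boundary: str     # "strong", "weak", "small_pause" or "large_pause"


def join_with_next(phrases, index):
    """Return a new list in which phrases[index] is merged with its successor."""
    if not 0 <= index < len(phrases) - 1:
        raise IndexError("no following accent phrase to join with")
    first, second = phrases[index], phrases[index + 1]
    merged = AccentPhrase(
        reading=first.reading + second.reading,
        accent_type=first.accent_type,  # assumption: keep the first phrase's accent
        boundary=second.boundary,       # assumption: keep the second phrase's boundary
    )
    return phrases[:index] + [merged] + phrases[index + 2:]


phrases = [
    AccentPhrase("きょーわ", 1, "small_pause"),
    AccentPhrase("よい", 1, "weak"),
    AccentPhrase("てんきです", 1, "large_pause"),
]
joined = join_with_next(phrases, 0)
```

After the call, `joined` contains two accent phrases, the first with the reading 「きょーわよい」, and the original list is left unchanged.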

The history registration button 226 in the common area 220 is the button for registering the corrected content as history. When the history registration button 226 is pressed, correction state information is stored in the correction state information file storage unit 131. History registration, that is, storing the corrected content in the correction state information file storage unit 131, need not be triggered by the operator through the history registration button 226 as in this embodiment. It may instead be performed automatically each time a change is made through the correction interface for the reading, the accent type, the boundary, or the pitch pattern shape or duration described later, or when new text is entered.

When both the phonetic information and the prosody information (described later) have been corrected, correction state information that pairs the two sets of corrections is stored in the correction state information file storage unit 131.

The read-aloud button 225 is the button for having the text read aloud with the corrected synthesized sound. When the read-aloud button 225 is pressed, a synthesized sound based on the corrected phonetic or prosody information is output via the synthesized sound generation unit 114 and the synthesized sound output unit 116.

In prosody correction mode, the prosody information correction screen 210 shown in FIG. 9 is displayed. The upper part of the prosody information correction screen 210 is the common area 220. The lower part is the pitch pattern correction portion 240 (the prosody correction area 240), which displays the pitch pattern of the accent phrase selected by the operator in the language processing result display portion 224, based on the result of prosody control. The horizontal axis of the prosody correction area 240 represents time (in frames), and the vertical axis represents pitch (in octaves).

When an instruction from the operator is received for the prosody correction area 240, the instruction receiving unit 120 obtains the instruction, and the prosody information correction unit 112 corrects the phoneme durations and the pitch pattern based on it. The pitch pattern correction portion 240 displays the pitch pattern of the accent phrase selected in the language processing result display portion 224 at its center and accepts instructions to correct its phoneme durations and pitch pattern shape.

In FIG. 9, the first accent phrase, 「きょーわ」, is selected; its pitch pattern is displayed in the center, and the pitch pattern of the second accent phrase, 「よい」, is displayed to its right.

On the prosody information correction screen 210 shown in FIG. 9, the prosody information stored in the prosody information storage unit 108, that is, data represented as one-dimensional arrays, can be corrected indirectly. Phoneme duration can be corrected for each mora within an accent phrase, for the accent phrase as a whole, or for the length of a boundary. The mora boundaries within an accent phrase are shown as vertical lines (including dotted lines). When the operator drags a vertical line left or right with the mouse, the phoneme duration of the corresponding mora is corrected. Dragging the vertical line at the start (left edge) of the accent phrase left or right stretches or shrinks the duration of the whole accent phrase.

When the duration of the selected accent phrase is corrected, the timing of the other accent phrases is shifted by the amount of that correction. This keeps the prosody information consistent as a whole.
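The shift that keeps the timing consistent can be sketched as below. The representation (one start frame and one length per accent phrase) and the frame values are assumptions chosen for illustration; the text says only that the other accent phrases are shifted by the amount of the correction, and the sketch takes that to mean the phrases that follow the corrected one.

```python
def resize_phrase(starts, lengths, index, new_length):
    """Set the length of one accent phrase and shift every later phrase by the difference."""
    delta = new_length - lengths[index]
    new_lengths = lengths[:index] + [new_length] + lengths[index + 1:]
    new_starts = [s + delta if i > index else s for i, s in enumerate(starts)]
    return new_starts, new_lengths


starts = [0, 60, 100]   # start frame of each accent phrase (made-up values)
lengths = [50, 40, 80]  # length of each accent phrase in frames (made-up values)

# Stretch the first accent phrase from 50 to 65 frames.
new_starts, new_lengths = resize_phrase(starts, lengths, 0, 65)
```

Because only the start frames of later phrases move, the boundary gap between the first and second phrase (10 frames in this example) is preserved.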

Dragging the vertical line at the start of the second accent phrase, 「よい」, in FIG. 9 (that is, the accent phrase next to the selected one) left or right with the mouse corrects the duration of the boundary interval.

The shape of the pitch pattern is corrected by drawing a free-form curve over the pitch pattern with the mouse. FIGS. 10-1 and 10-2 show examples of pitch pattern shape correction. As in FIG. 10-1, the entire pitch pattern can be corrected with a free-form curve (dotted line); as in FIG. 10-2, only part of the pitch pattern can be corrected. In addition, moving the slider bar at the left edge of the pitch pattern correction portion 240 up or down with the mouse moves the whole pitch pattern of the selected accent phrase up or down, correcting its overall pitch level.
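Treating the pitch pattern as a one-dimensional array with one value per frame, the two kinds of correction can be sketched as follows. The function names, the frame indexing, and the sample values are assumptions; how the device samples the drawn curve is not described in the text.

```python
def shift_pitch(pitch, start, end, offset):
    """Raise or lower frames start..end-1 by a constant offset (the slider operation)."""
    return [p + offset if start <= i < end else p for i, p in enumerate(pitch)]


def overwrite_pitch(pitch, start, curve):
    """Replace part (or all) of the pattern with a drawn curve sampled per frame."""
    if start < 0 or start + len(curve) > len(pitch):
        raise ValueError("curve does not fit inside the pitch pattern")
    return pitch[:start] + list(curve) + pitch[start + len(curve):]


pitch = [5.0, 5.5, 6.0, 5.5, 5.0, 4.5]  # octaves per frame (made-up values)

# Slider: raise the accent phrase occupying frames 0-2 by half an octave.
raised = shift_pitch(pitch, 0, 3, 0.5)

# Free-form curve over frames 2-3 only (partial correction, as in FIG. 10-2).
redrawn = overwrite_pitch(pitch, 2, [5.75, 5.25])
```

Passing a curve as long as the whole pattern with `start=0` corresponds to the full correction of FIG. 10-1.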

In this way, the prosody information resulting from prosody control is displayed and instructions from the operator are accepted, so the operator can correct the synthesized sound interactively.

FIG. 11 shows the correction history file screen 250 displayed by the correction display unit 140. The correction history file screen 250 has a text display portion 251, a history structure display portion 252, an overall parameter display portion 253, and a correction state information display portion 254.

The text display portion 251 displays a sentence number indicating the position of the target sentence within the text, the text itself, and the history registration count, that is, the number of history entries registered.

The history structure display portion 252 displays the history structure data stored in the history structure data storage unit 138. As shown in FIG. 11, the correspondence between the synthesized sound and the corrected sounds is shown as a tree. In FIG. 11, the date on which each piece of correction state information was generated is displayed as its identifier. The root node is the initial data.

Because the history structure display portion 252 displays the derivation relationships between the pieces of correction state information and the initial data as a tree, the operator can easily grasp how each piece of correction state information relates to the initial data.
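A tree view of this kind can be produced from nothing more than a record of which state each correction state was derived from. The sketch below assumes such a parent mapping; the mapping itself, the root label "initial", and the identifier "03/07/21_#1" are hypothetical, and only the derivation of "03/07/23_#5" from "03/07/23_#4" is taken from the description of FIG. 11.

```python
def render_tree(parent_of, root):
    """Return indented lines showing which state each correction state derives from."""
    children = {}
    for node, parent in parent_of.items():
        children.setdefault(parent, []).append(node)

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in children.get(node, []):  # children keep registration order
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical history: each state maps to the state it was corrected from.
parent_of = {
    "03/07/21_#1": "initial",
    "03/07/21_#2": "03/07/21_#1",
    "03/07/23_#4": "initial",
    "03/07/23_#5": "03/07/23_#4",
}
tree_lines = render_tree(parent_of, "initial")
```

Each level of indentation in `tree_lines` corresponds to one derivation step away from the initial data.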

When the operator selects a node in the history structure display portion 252, the corresponding overall parameters are displayed in the overall parameter display portion 253, and the corresponding correction state information is displayed in the correction state information display portion 254. In the history structure display portion 252 of FIG. 11, "03/07/23_#5" is selected.

In the present embodiment, the history structure display portion 252 displays the date on which each piece of correction state information was generated as its identifier. As another example, a file name assigned by the operator to identify the correction state information may be displayed instead.

The overall parameter display portion 253 displays the overall parameters indicated by the correction state information selected in the history structure display portion 252.

The correction state information display portion 254 displays the corrected phonetic and prosody information indicated by the correction state information selected in the history structure display portion 252. Specifically, it displays the accent phrases that were corrected and the items corrected in each. FIG. 11 shows the correction state information for the case where the pitch pattern shape of the accent phrase 「よい」 and the duration of the accent phrase 「てんきです」 were corrected. In other words, "03/07/23_#5" is the correction state obtained from the correction state "03/07/23_#4" by correcting the pitch pattern shape of 「よい」 and the duration of 「てんきです」.
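One way to obtain such a list of corrected items is to compare a correction state with the state it was derived from. The sketch below assumes each state is a mapping from accent phrase to its attributes; the attribute names and all values are invented for illustration and do not come from FIG. 11.

```python
def changed_items(before, after):
    """List (accent phrase, item) pairs whose value differs between two states."""
    changes = []
    for phrase, attrs in after.items():
        for name, value in attrs.items():
            if before.get(phrase, {}).get(name) != value:
                changes.append((phrase, name))
    return changes


# Stand-ins for the states "03/07/23_#4" and "03/07/23_#5" (made-up values).
state_4 = {
    "よい": {"pitch_shape": [5.0, 5.5], "duration": 40},
    "てんきです": {"pitch_shape": [5.5, 5.0], "duration": 80},
}
state_5 = {
    "よい": {"pitch_shape": [5.0, 6.0], "duration": 40},
    "てんきです": {"pitch_shape": [5.5, 5.0], "duration": 95},
}
items = changed_items(state_4, state_5)
```

Comparing against the initial data instead of the parent state would list every item corrected since the start, rather than only the latest step.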

The correction state information display portion 254 of this embodiment displays the items corrected since the immediately preceding node. As another example, it may display the items corrected since the initial data.

When the operator selects a correction state information node in the history structure display portion 252, the correction state information corresponding to that node is identified. The phonetic information contained in it is then written to the phonetic information storage unit 104, and the prosody information contained in it is written to the prosody information storage unit 108, replacing the previous contents. The change is also reflected in the phonetic information correction screen 200 and the prosody information correction screen 210, which display the updated phonetic and prosody information.

Thus, by selecting one of the pieces of correction state information displayed in the history structure display portion 252, the operator can restore any desired correction state and have it displayed on the phonetic information correction screen 200 and the prosody information correction screen 210. The operator can then continue correcting from the desired stage.

For example, when the operator selects "03/07/21_#2", the phonetic information contained in the correction state information for "03/07/21_#2" replaces the contents of the phonetic information storage unit 104, and the prosody information contained in it replaces the contents of the prosody information storage unit 108. Further corrections can then be made starting from the correction state "03/07/21_#2".
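Restoring a state and then branching from it can be sketched as follows. The class, its attribute names, and the identifiers `state_1` to `state_3` are hypothetical; the sketch keeps snapshots in memory rather than in files, which the description mentions only as an alternative to the file-based embodiment.

```python
import copy


class HistoryEditor:
    """Working copies of phonetic/prosody data plus registered snapshots."""

    def __init__(self):
        self.phonetic = {}    # stands in for phonetic information storage unit 104
        self.prosody = {}     # stands in for prosody information storage unit 108
        self.snapshots = {}   # state id -> (phonetic, prosody) copies
        self.parent_of = {}   # state id -> state it was derived from (None = initial data)
        self.current = None

    def register(self, state_id):
        """Store the current working data as a new correction state."""
        self.snapshots[state_id] = (copy.deepcopy(self.phonetic),
                                    copy.deepcopy(self.prosody))
        self.parent_of[state_id] = self.current
        self.current = state_id

    def select(self, state_id):
        """Overwrite the working data with a previously registered state."""
        phonetic, prosody = self.snapshots[state_id]
        self.phonetic = copy.deepcopy(phonetic)
        self.prosody = copy.deepcopy(prosody)
        self.current = state_id


editor = HistoryEditor()
editor.phonetic["きょーわ"] = {"accent_type": 1}
editor.register("state_1")
editor.phonetic["きょーわ"]["accent_type"] = 0
editor.register("state_2")
editor.select("state_1")                        # go back to the earlier state
editor.phonetic["きょーわ"]["accent_type"] = 2  # and correct it differently
editor.register("state_3")
```

Because `state_3` is registered after `state_1` was selected, it is recorded as derived from `state_1`, which is what gives the history its tree shape. The deep copies keep later edits from altering stored snapshots.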

The label of each correction state information node can also be changed in the history structure display portion 252. For example, the label "03/07/23_#5" can be changed to 「修正Fix版」 ("corrected, final version"). The changed label is stored in the history management information, in the entry for that correction state information, as the label assigned to the correction state. Because a label can be changed to whatever the operator prefers, the correction states become even easier for the operator to recognize.

Each piece of correction state information can also be deleted in the history structure display portion 252. When the operator requests deletion, the correction state information file storage unit 131 deletes the corresponding correction state information file, and the correction history file generation unit 132 updates the contents of the history management information file.

FIGS. 12 to 14 are flowcharts showing the processing of the speech synthesis editing apparatus 10. First, when the text acquisition unit 100 acquires text to be synthesized (step S1, Yes), processing proceeds through steps S2 to S8.

That is, the language processing unit 102 first performs language processing on the text received from the text acquisition unit 100 (step S2), producing phonetic information divided into accent phrases. Next, the accent phrases are numbered (step S3). In the example shown in FIG. 2, 「きょーわ」 is number 1, 「よい」 is number 2, and 「てんきです」 is number 3. The phonetic information is then stored in the phonetic information storage unit 104 in a form associated with each accent phrase (step S4).

Next, the correction display unit 140 displays the phonetic information, that is, the language processing result, stored in the phonetic information storage unit 104 (step S5). The prosody control unit 106 then performs prosody control based on the phonetic information stored in the phonetic information storage unit 104 and generates prosody information (step S6). The prosody information is stored in the prosody information storage unit 108 in a form associated with each accent phrase (step S7), and the correction display unit 140 displays the stored prosody information (step S8). Processing then returns to A. If the text acquisition unit 100 acquires new text (step S1, Yes), processing again proceeds through steps S2 to S8. If no new text is acquired (step S1, No), processing proceeds to step S11.

In step S11, it is determined whether the phonetic information has been corrected by the phonetic information correction unit 110. If it has been corrected (step S11, Yes), processing proceeds to step S3. If it has not been corrected (step S11, No), processing proceeds to step S12.

In step S12, it is determined whether the prosody information has been corrected by the prosody information correction unit 112. If it has been corrected (step S12, Yes), processing proceeds to step S7. If it has not been corrected (step S12, No), processing proceeds to step S13.

In step S13, it is determined whether history registration of the correction state has been instructed. A history registration instruction is an instruction to register the correction state at the current stage as correction state information. If it has been instructed (step S13, Yes), processing proceeds to B in FIG. 13. If it has not been instructed (step S13, No), processing proceeds to step S14.

When processing proceeds to B in FIG. 13, a correction state information file is first generated and stored in the correction state information file storage unit 131 (step S20). Next, the correction history file generation unit 132 generates a correction history file, which is stored in the correction history file storage unit 134 (step S21). The registered history is then reflected in the display content of the phonetic information correction screen 200 and the prosody information correction screen 210; that is, the display content is updated (step S22). Processing then returns to A in FIG. 12.

In step S14 of FIG. 12, it is determined whether the operator has selected a correction state from those displayed as the correction history by the correction display unit 140. If one has been selected (step S14, Yes), processing proceeds to C in FIG. 14. If none has been selected (step S14, No), processing proceeds to step S15.

When processing proceeds to C in FIG. 14, the correction state information selected by the operator, that is, the phonetic information and the prosody information, is first acquired from the correction state information file storage unit 131 (step S30). Next, the display content of the correction display unit 140 is updated with the acquired phonetic information (step S31), and the acquired phonetic information is stored in the phonetic information storage unit 104 (step S32). The display content of the correction display unit 140 is then updated with the acquired prosody information (step S33), and the acquired prosody information is stored in the prosody information storage unit 108 (step S34). Processing then returns to A in FIG. 12.

In step S15 of FIG. 12, if output of the synthesized sound has been instructed (step S15, Yes), processing proceeds to step S9. If it has not been instructed (step S15, No), processing proceeds to step S16.

In step S9, the synthesized sound generation unit 114 generates a synthesized sound based on the prosody information stored in the prosody information storage unit 108. The synthesized sound output unit 116 then performs DA conversion on the synthesized sound generated by the synthesized sound generation unit 114 and outputs it (step S10). Processing then returns to A.

In step S16, if termination has been instructed (step S16, Yes), the system terminates. If termination has not been instructed (step S16, No), processing returns to A. This completes the processing of the speech synthesis editing apparatus 10.
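The branching in FIGS. 12 to 14 amounts to a fixed-priority check of pending operator actions on each pass from point A. The sketch below summarizes it as a lookup table; the event names are hypothetical, and it assumes that jumping to step S3 or step S7 continues through step S8 as in the initial pass, which is a reading of the flowchart description rather than something stated explicitly.

```python
def steps(first, last):
    """Step labels S<first>..S<last> inclusive."""
    return [f"S{n}" for n in range(first, last + 1)]


# Checked in this order on every pass from point A (decision steps S1, S11 to S16).
CHECKS = [
    ("new_text", steps(2, 8)),             # S1: language processing through display
    ("phonetic_edited", steps(3, 8)),      # S11: renumber phrases, redo prosody control
    ("prosody_edited", steps(7, 8)),       # S12: store and display prosody information
    ("register_history", steps(20, 22)),   # S13: branch B (FIG. 13)
    ("select_state", steps(30, 34)),       # S14: branch C (FIG. 14)
    ("speak", steps(9, 10)),               # S15: generate and output synthesized sound
    ("quit", []),                          # S16: terminate
]


def dispatch(pending):
    """Return the first pending event in flowchart order and the steps it triggers."""
    for event, branch in CHECKS:
        if event in pending:
            return event, branch
    return None, []


event, branch = dispatch({"speak", "prosody_edited"})
```

With both a prosody correction and a read-aloud request pending, the prosody branch is taken first because step S12 is checked before step S15.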

FIG. 15 is a diagram showing the hardware configuration of the speech synthesis editing apparatus 10 according to the embodiment. As its hardware configuration, the speech synthesis editing apparatus 10 includes a ROM 52 that stores programs such as the speech editing program that executes the speech editing processing of the apparatus; a CPU 51 that controls each unit of the apparatus according to the programs in the ROM 52 and executes the buffering time change processing and other processing; a RAM 53 in which a work area is formed and which stores the various data needed to control the apparatus; a communication I/F 57 that connects to a network and performs communication; and a bus 62 that connects these units.

The speech editing program of the speech synthesis editing apparatus 10 described above may be provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a floppy (registered trademark) disk (FD), or a DVD.

In this case, the speech editing program is read from the recording medium and executed by the speech synthesis editing apparatus 10, whereby it is loaded onto the main storage device and the units described in the software configuration above are generated on the main storage device.

The speech editing program of the present embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network.

以上、本発明を実施の形態を用いて説明したが、上記実施の形態に多様な変更または改良を加えることができる。 As described above, the present invention has been described using the embodiment, but various changes or improvements can be added to the above embodiment.

As described above, the speech editing apparatus, speech editing method, and speech editing program according to the present invention are useful for editing synthesized sound, and are particularly suitable for interactive speech editing.

A block diagram showing the speech synthesis editing apparatus 10.
A diagram schematically showing the data structure of the phonetic information stored in the phonetic information storage unit 104.
A diagram schematically showing the data structure of the prosody information stored in the prosody information storage unit 108.
A block diagram showing the detailed functional configuration of the correction management unit 130.
A diagram schematically showing the data structure of the corrected prosody information stored in the correction state information file storage unit 131.
A diagram schematically showing the data structure of the correction history file stored in the correction history file storage unit 134.
A diagram schematically showing the data structure of the history structure data stored in the history structure data storage unit 138.
A diagram showing the phonetic information correction screen 200 that the correction display unit 140 displays when phonetic information is corrected.
A diagram showing the prosody information correction screen 210 that the correction display unit 140 displays when prosody information is corrected.
A diagram showing an example of correcting the shape of a pitch pattern.
A diagram showing an example of correcting the shape of a pitch pattern.
A diagram showing the correction history file screen 250 that the correction display unit 140 displays.
A flowchart showing the processing of the speech synthesis editing apparatus 10.
A flowchart showing the processing of the speech synthesis editing apparatus 10.
A flowchart showing the processing of the speech synthesis editing apparatus 10.
A diagram showing the hardware configuration of the speech synthesis editing apparatus 10.

Explanation of symbols

10 Speech synthesis editing apparatus
51 CPU
52 ROM
53 RAM
57 Communication I/F
62 Bus
100 Text acquisition unit
101 Text input unit
102 Language processing unit
104 Phonetic information storage unit
106 Prosody control unit
108 Prosody information storage unit
110 Phonetic information correction unit
112 Prosody information correction unit
114 Synthesized sound generation unit
116 Synthesized sound output unit
120 Instruction reception unit
121 Instruction input unit
130 Correction management unit
131 Correction state information file storage unit
132 Correction history file generation unit
134 Correction history file storage unit
136 History structure data generation unit
138 History structure data storage unit
140 Correction display unit

Claims

1. A speech editing apparatus comprising:
correction means for correcting a synthesized sound generated for a text;
correction information storage means for storing correction information indicating a corrected sound obtained by the correction means;
history structure generation means for generating history structure information indicating a relationship between the correction information stored in the correction information storage means and the synthesized sound before correction;
display means for displaying a correspondence relationship between the correction information and the synthesized sound that are associated with each other in the history structure information generated by the history structure generation means;
selection accepting means for accepting a selection of the correction information displayed by the display means; and
synthesized sound generating means for generating a synthesized sound based on the correction information whose selection is accepted by the selection accepting means.

2. The speech editing apparatus according to claim 1, wherein, when the correction means corrects the same synthesized sound a plurality of times, the correction information storage means stores a plurality of pieces of correction information obtained by the respective corrections, and
the history structure generation means generates the history structure information indicating a relationship between the plurality of pieces of correction information for the same synthesized sound stored in the correction information storage means and the synthesized sound before correction.

3. The speech editing apparatus according to claim 2, wherein the display means displays the correspondence relationship between the correction information and the synthesized sound in a tree structure.

4. The speech editing apparatus according to claim 1, further comprising correction order information giving means for giving, when the correction means corrects the synthesized sound, correction order information to the correction information indicating the corrected sound obtained by the correction, the correction order information indicating where the correction falls in the sequence of corrections that the correction means has performed on the synthesized sound,
wherein the display means displays the correction order information given by the correction order information giving means in association with the correction information.

5. The speech editing apparatus according to any one of the preceding claims, wherein the correction information storage means stores, as the correction information, a difference value between the corrected sound obtained by the correction by the correction means and the synthesized sound before the correction.

6. The speech editing apparatus according to claim 5, wherein the display means displays, based on the correction information stored in the correction information storage means, the difference value between the corrected sound indicated by the correction information and the synthesized sound before that correction.

7. The speech editing apparatus according to any one of claims 1 to 6, wherein the correction means includes phonetic information correction means for correcting phonetic information of the synthesized sound, and
the correction information storage means stores the correction information including a difference value of the phonetic information before and after correction by the phonetic information correction means.

8. The speech editing apparatus according to any one of claims 1 to 7, wherein the correction means includes prosody information correction means for correcting prosody information of the synthesized sound, and
the correction information storage means stores the correction information including a difference value of the prosody information before and after correction by the prosody information correction means.

9. A speech editing method comprising:
a correction step of correcting a synthesized sound generated for a text;
a correction information storage step of storing, in correction information storage means, correction information indicating a corrected sound obtained by the correction;
a history structure generation step of generating history structure information indicating a relationship between the correction information stored in the correction information storage step and the synthesized sound before correction;
a display step of displaying a correspondence relationship between the correction information and the synthesized sound that are associated with each other in the history structure information generated in the history structure generation step;
a selection reception step of receiving a selection of the correction information displayed in the display step; and
a synthesized sound generation step of generating a synthesized sound based on the correction information selected in the selection reception step.

10. A speech editing program for causing a computer to execute speech editing processing, the program causing the computer to execute:
a correction step of correcting a synthesized sound generated for a text;
a correction information storage step of storing, in correction information storage means, correction information indicating a corrected sound obtained by the correction;
a history structure generation step of generating history structure information indicating a relationship between the correction information stored in the correction information storage step and the synthesized sound before correction;
a display step of displaying a correspondence relationship between the correction information and the synthesized sound that are associated with each other in the history structure information generated in the history structure generation step;
a selection reception step of receiving a selection of the correction information displayed in the display step; and
a synthesized sound generation step of generating a synthesized sound based on the correction information selected in the selection reception step.
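
By way of non-limiting illustration only, the history structure recited in the claims could be sketched as follows in Python. Each correction is stored as a difference value relative to the synthesized sound it was applied to, carries correction order information, and is displayed in a tree structure; selecting a correction regenerates the data from which the synthesized sound would be generated. The names `CorrectionNode`, `CorrectionHistory`, `pitch_diff`, `add_correction`, `resolve`, and `render` are hypothetical and do not appear in the embodiment. Representing the prosody information as a plain list of per-frame pitch values in Hz is likewise an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CorrectionNode:
    """One entry of the history structure (hypothetical layout)."""
    node_id: int
    parent_id: Optional[int]      # None marks the uncorrected synthesized sound
    order: int                    # correction order information (0 = original)
    pitch_diff: Dict[int, float]  # assumed: frame index -> pitch delta in Hz
    children: List[int] = field(default_factory=list)


class CorrectionHistory:
    """Stores corrections as differences and relates them in a tree."""

    def __init__(self, base_pitch: List[float]) -> None:
        self.base_pitch = list(base_pitch)
        self.nodes: Dict[int, CorrectionNode] = {0: CorrectionNode(0, None, 0, {})}

    def add_correction(self, parent_id: int, pitch_diff: Dict[int, float]) -> int:
        """Record a correction applied to the sound identified by parent_id."""
        node_id = len(self.nodes)  # the root is node 0, so this is also the order
        self.nodes[node_id] = CorrectionNode(node_id, parent_id, node_id, dict(pitch_diff))
        self.nodes[parent_id].children.append(node_id)
        return node_id

    def resolve(self, node_id: int) -> List[float]:
        """Rebuild the pitch values for the selected correction."""
        chain: List[CorrectionNode] = []
        current: Optional[int] = node_id
        while current is not None:  # walk from the selection up to the original
            chain.append(self.nodes[current])
            current = self.nodes[current].parent_id
        pitch = list(self.base_pitch)
        for node in reversed(chain):  # apply differences starting at the original
            for frame, delta in node.pitch_diff.items():
                pitch[frame] += delta
        return pitch

    def render(self, node_id: int = 0, depth: int = 0) -> List[str]:
        """Return the tree as indented text lines, one line per node."""
        node = self.nodes[node_id]
        label = "original" if node.parent_id is None else f"correction #{node.order}"
        lines = ["  " * depth + label]
        for child in node.children:
            lines.extend(self.render(child, depth + 1))
        return lines


history = CorrectionHistory([100.0, 110.0, 120.0])
a = history.add_correction(0, {1: 5.0})    # first correction of the original
b = history.add_correction(a, {2: -10.0})  # further correction of the result of a
c = history.add_correction(0, {0: 3.0})    # alternative correction of the original
print("\n".join(history.render()))
print(history.resolve(b))
```

In this sketch, storing only differences keeps each history entry small, and the parent links let any earlier state be selected and regenerated without keeping a full copy of the prosody information for every correction.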