JP2011170191A

JP2011170191A - Speech synthesis device, speech synthesis method and speech synthesis program

Info

Publication number: JP2011170191A
Application number: JP2010035067A
Authority: JP
Inventors: Nobuyuki Katae; 伸之片江
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-02-19
Filing date: 2010-02-19
Publication date: 2011-09-01
Anticipated expiration: 2030-02-19
Also published as: JP5423466B2

Abstract

PROBLEM TO BE SOLVED: To allow text for speech synthesis to be corrected simply and easily at the level of the input text, which an end user can readily understand.

SOLUTION: The device includes: a text editing section for editing text that is the target of speech synthesis; a text editing determination section for determining whether a text edit was performed in order to correct the reading or accent of the synthesized speech; an edit history data creation section for creating edit history data indicating the content of the text edit when the edit is determined to have been performed in order to correct the reading or accent of the synthesized speech; and an edit history data storage section for storing the created edit history data when it is registered.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to speech synthesis, in which synthesized speech is generated from input text.

Speech synthesis devices and speech synthesis programs synthesize speech from input written in the mixed kanji-kana text we use every day. A language processing unit performs morphological analysis on the input text with reference to a word dictionary and identifies the words (or morphemes) that make up the text. The reading and accent of each identified word are then obtained from the word dictionary. A Japanese accent is the arrangement of high and low pitch assigned to each sound of the language, and it is essential for the intelligibility and naturalness of speech. The readings and accents of the morphemes are concatenated, accents are modified as needed, and boundaries such as phrases and pauses are set, which produces a phonetic character string for the input text. A phonetic character string is a string that represents how the text is read; it generally contains katakana representing the reading, symbols indicating accents, and symbols indicating phrase and pause boundaries.

Because arbitrary Japanese text is extremely diverse, current language processing units cannot analyze it with 100% accuracy. When the language processing unit makes a reading or accent error, the usual remedy is to correct its output, the phonetic character string, by hand. As noted above, readings in a phonetic character string are usually written in katakana, which resembles the furigana we use every day. However, correcting a phonetic character string requires understanding and getting used to its specification: long vowels are written with 「ー」 (for example, 「東京」 is written 「トーキョー」), the particles 「は」, 「を」, and 「へ」 are written 「ワ」, 「オ」, and 「エ」 respectively, and an accent symbol is placed where an accent falls. This is not something a general user can do easily.

For an end user, correcting the phonetic character string in this way is a high hurdle. An alternative is to correct the input text itself so that the reading and accent in the phonetic character string come out right. Because the input text is ordinary mixed kanji-kana text, correcting it is easier to understand than correcting the phonetic character string and is expected to be less of a burden on the end user. However, it is sometimes hard to see intuitively how the input text should be changed to make the synthesized speech correct. Moreover, even after a user has learned how to correct input text, repeating the same kind of correction every time a new text is entered is tedious and time-consuming.

For example, methods have been disclosed in which, when the prosody of synthesized speech (pitch pattern, phoneme duration, power, and so on) is edited, the prosody correction information is retained and then used when prosody information is generated later, so that the synthesized speech is generated with the retained corrections (see, for example, Patent Document 1 and Patent Document 2).

Patent Document 1: JP 2004-309724 A
Patent Document 2: JP 2005-345699 A

However, editing the prosody of synthesized speech is difficult and is unsuitable for end users who simply want to make minimal corrections.

Accordingly, an object of the present invention is to allow text for speech synthesis to be corrected easily at the level of the input text, which end users can readily understand.

To achieve the above object, the speech synthesis device disclosed below includes: a text editing unit that edits text to be synthesized; a language processing unit that performs morphological analysis and phonetic character string generation on the text before and after editing; a text editing determination unit that determines, based on at least the phonetic character strings among the morphological analysis results and the generated phonetic character strings for the text before and after editing, whether the text edit was performed in order to correct the reading or accent of the synthesized speech; an edit history data creation unit that, when the text editing determination unit determines that the text edit was performed in order to correct the reading or accent of the synthesized speech, creates edit history data indicating the content of that edit; and an edit history data storage unit that stores the edit history data created by the edit history data creation unit when that data is registered.

The device further includes: an output unit that outputs the phonetic character string or the synthesized speech of the edited text produced by the text editing unit; an error indication receiving unit that, when the reading or accent of the output phonetic character string or synthesized speech is wrong, receives from the user an indication of the reading or accent error in the edited text; and an edit history data search unit that searches the edit history data storage unit for, and retrieves, edit history data indicating the text edit that should be performed to correct the reading or accent at the indicated location. The text editing unit re-edits the edited text using the edit history data retrieved by the edit history data search unit.

With the above configuration, text for speech synthesis can be corrected easily at the level of the input text, which end users can readily understand.

A block diagram showing an example of the overall configuration of the speech synthesis device according to Embodiment 1 of the present invention.
A block diagram showing an example of the configuration of the speech synthesis device according to Embodiment 1 of the present invention.
A block diagram showing an example of the configuration of the speech synthesis device according to Embodiment 1 of the present invention.
A diagram showing an example of edit history data registration determination.
A diagram showing an example of edit history data registration determination.
A diagram showing an example of edit history data registration determination.
A diagram showing an example of edit history data registration determination.
A diagram showing an example of edit history data registration.
A diagram showing an example of searching for and applying edit history data.
A diagram showing an example of searching for and applying edit history data.
A diagram showing an example of searching for and applying edit history data.
A diagram showing an example of searching for and applying edit history data.
A flowchart showing an example of the operation of the speech synthesis device according to Embodiment 1 of the present invention.
A flowchart showing an example of the operation of the speech synthesis device according to Embodiment 1 of the present invention.
A flowchart showing an example of the operation of the speech synthesis device according to Embodiment 1 of the present invention.
A block diagram showing an example of the overall configuration of the speech synthesis device according to Embodiment 2 of the present invention.

[Embodiment 1]
Embodiment 1 shows an example in which speech synthesis software that users install on their own computers from a packaged product serves as the speech synthesis device. The speech synthesis device 100 according to Embodiment 1 of the present invention is described below.

FIG. 1 is an example of a block diagram showing the overall configuration of the speech synthesis device 100 according to Embodiment 1 of the present invention. In FIG. 1, the speech synthesis device 100 includes a control unit 110, an editing unit 111, a language processing unit 120, a synthesized speech generation unit 130, a synthesis data management unit 140, an edit history registration/update unit 150, an edit history search unit 160, and an edit history storage unit 170. The speech synthesis device 100 is connected to a display device 190, an input device 200, and an audio output device 210 via an interface unit 180. The input device 200 is, for example, an input device such as a mouse or keyboard.

The control unit 110 is a module that controls the operation of the speech synthesis device 100. The interface unit 180 is a module that controls the input device 200, through which text and operations on it are entered, the display device 190, which outputs the system's response to that input, and the audio output device 210, which outputs synthesized speech.

The editing unit 111 is a module that includes a text editing unit 201, an error indication receiving unit 202, a text correction unit 203, and an output unit 204. The text editing unit 201 receives from the user the text to be synthesized. The error indication receiving unit 202 receives from the user the location of an error in the synthesized speech. The text correction unit 203 corrects the text to be synthesized.

The language processing unit 120 is a module that performs morphological analysis on text and generates a phonetic character string. It includes a morphological analysis unit 121 and a phonetic character string generation unit 122.

The synthesized speech generation unit 130 is a module that generates synthesized speech from a phonetic character string. Text being edited, the text's edit history, language analysis results, and phonetic character strings are temporarily stored and managed in the synthesis data management unit 140.

The edit history registration/update unit 150 is a module that uses the editing state of the input text, the morphological analysis results, and the phonetic character strings to identify text edits made to correct reading or accent errors in the synthesized speech, and registers edit history data in the edit history storage unit 170. The edit history storage unit 170 includes an edit history data storage unit 171. The edit history registration/update unit 150 includes an edit history data determination unit 151, an edit history data creation unit 152, and an edit history data registration unit 153.

The edit history search unit 160 is a module that, when the user points out an inappropriate location in the synthesized speech for a new input text, searches the edit history data storage unit 171 for edit history data that applies to that location. The edit history search unit 160 includes an edit history data search unit 161.

FIG. 2 is a first block diagram showing the configuration of the speech synthesis device according to Embodiment 1 of the present invention. It shows the configuration used to determine, from the text before and after editing, whether an edit was made to correct a reading or accent error in the synthesized speech, and to store that edit as edit history.

The text editing unit 201 receives from the user, via the input device 200 and the interface unit 180, the text to be synthesized. The entered text can, for example, be displayed on the display device 190 so that the user can edit it. The text editing unit 201 can also receive text editing instructions from the user.

The morphological analysis unit 121 performs morphological analysis on both the pre-edit text and the post-edit text from the text editing unit 201 and obtains an analysis result for each. The phonetic character string generation unit 122 obtains the phonetic character string of the pre-edit text and that of the post-edit text.

Morphological analysis divides the text into morphemes (for example, words), and a reading, part of speech, and accent information are generated for each resulting word. Examples of morphological analysis methods include the Viterbi algorithm and the longest-match method, but the method used in the present invention is not limited to any particular one. A phonetic character string is intermediate notation data for speech synthesis that represents how the text is read. The phonetic character string format used in this embodiment is only an example, and phonetic character strings may be represented in other ways.
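
As a rough illustration of the longest-match method mentioned above, the following Python sketch segments text greedily against a toy dictionary. The dictionary contents, the part-of-speech labels, and the function name are assumptions made for this illustration and are not taken from the device's actual word dictionary. A character that matches no entry is labeled as an unknown word.

```python
# Toy word dictionary (hypothetical): surface form -> part of speech.
TOY_DICTIONARY = {
    "これ": "pronoun",
    "は": "particle",
    "お好み": "common noun",
    "お好み焼き": "common noun",
    "です": "auxiliary verb",
    "。": "symbol",
}


def longest_match_segment(text, dictionary):
    """Split text greedily, always taking the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: treat one character as unknown.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens
```

With this toy dictionary, 「これはお好み焼です。」 yields the tokens 「お好み」 (common noun) and 「焼」 (unknown), whereas 「これはお好み焼きです。」 yields the single token 「お好み焼き」. A production analyzer would more likely use a cost-based search such as the Viterbi algorithm.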

The edit history data determination unit 151 determines that the phonetic character string of the text has changed between before and after editing. It then compares the pre-edit and post-edit text, or examines their morphological analysis results, and can thereby determine whether the edit was made to correct a reading or accent error in the synthesized speech.

When the edit history data determination unit 151 determines that a text edit was made to correct a reading or accent error, the edit history data creation unit 152 creates edit history data for registering that edit as edit history. For example, it generates edit history data indicating the edit made to the part of the text where the phonetic character string changed.

The edit history data registration unit 153 registers the edit history data created by the edit history data creation unit 152 in the edit history data storage unit 171.

FIG. 3 is a second block diagram showing the configuration of the speech synthesis device according to Embodiment 1 of the present invention. It shows the configuration in which the user points out, in the text editing unit 201, a location in the input text whose reading or accent is inappropriate, and edit history that applies to that location is retrieved from the edit history database and applied.

When the user enters text through the text editing unit 201, the morphological analysis unit 121 performs morphological analysis on the input text and obtains an analysis result. Each word in the analysis result may be displayed with an underline, which lets the user easily tell the words apart.

In addition, the phonetic character string generation unit 122 generates the phonetic character string of the input text. The phonetic character string is output by the output unit 204, and the error indication receiving unit 202 receives indications of inappropriate locations in the input text. The output unit 204 may output synthesized speech generated from the phonetic character string, which lets the user judge accurately whether the synthesized speech is appropriate.

Next, the edit history data search unit 161 searches the edit history data storage unit 171 based on the input text, the analysis result of the input text, and the indicated error location.

Next, if the search by the edit history data search unit 161 finds applicable edit history data, the text correction unit 203 corrects the text, and the corrected text is returned to the text editing unit 201.
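
The search-and-correct flow described above can be sketched roughly as follows. This is a minimal illustration, not the device's implementation: the record fields, the sample entries, the part-of-speech labels, and the function names are all hypothetical, and the token list stands in for the morphological analysis result.

```python
# Hypothetical edit history entries keyed by the surface form that was fixed.
HISTORY = [
    {"index": "焼", "condition": "unknown",
     "kind": "okurigana insertion", "string": "き"},
    {"index": "ただいま", "condition": "interjection",
     "kind": "substitution", "string": "只今"},
]


def search_history(history, surface, part_of_speech):
    """Return entries whose index and application condition both match."""
    return [entry for entry in history
            if entry["index"] == surface and entry["condition"] == part_of_speech]


def apply_entry(tokens, position, entry):
    """Rebuild the text with the token at `position` corrected by `entry`."""
    surface, _ = tokens[position]
    if entry["kind"] == "okurigana insertion":
        fixed = surface + entry["string"]
    elif entry["kind"] == "substitution":
        fixed = entry["string"]
    else:
        fixed = surface  # Unsupported kinds are left unchanged in this sketch.
    parts = [s for s, _ in tokens]
    parts[position] = fixed
    return "".join(parts)


def correct(tokens, position, history):
    """Correct the token the user pointed at, or return None if no entry applies."""
    surface, part_of_speech = tokens[position]
    matches = search_history(history, surface, part_of_speech)
    if not matches:
        return None
    return apply_entry(tokens, position, matches[0])
```

Here `tokens` is a list of (surface, part of speech) pairs and `position` is the token the user flagged. Taking the first match is a simplification made for this sketch.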

Specific examples of how text edits made to correct reading or accent errors in the synthesized speech are identified are described below. Such edits include (1) inserting or deleting okurigana, (2) substituting between hiragana and kanji, (3) adding furigana, and (4) inserting a comma or space. FIGS. 4 to 7 show examples in which the edit history data determination unit 151 determines whether a text edit is one of these predetermined edits for correcting reading or accent errors, and the edit history data creation unit 152 creates the data to register as edit history.
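
One way to picture this determination is the Python sketch below, which classifies an edit into the four kinds listed above by comparing the text before and after editing, and treats it as a pronunciation fix only if the phonetic character string also changed. It is a simplified sketch under stated assumptions: it looks only at a single changed span, it ignores the morphological analysis results and the user's acceptance of the result, and the parenthesized furigana notation `焼(やき)` is a hypothetical input format, since the notation for furigana is not specified here.

```python
def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"


def is_hiragana(ch):
    return "\u3041" <= ch <= "\u309f"


def changed_span(before, after):
    """Return (position, removed, added) for the single span that differs."""
    limit = min(len(before), len(after))
    prefix = 0
    while prefix < limit and before[prefix] == after[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and before[-1 - suffix] == after[-1 - suffix]:
        suffix += 1
    return (prefix,
            before[prefix:len(before) - suffix],
            after[prefix:len(after) - suffix])


def classify_edit(before, after):
    """Name the kind of edit, or 'other' if it is none of the four kinds."""
    pos, removed, added = changed_span(before, after)
    follows_kanji = pos > 0 and is_kanji(before[pos - 1])
    if not removed and added in ("、", " ", "\u3000"):
        return "comma or space insertion"
    # Assumed furigana notation: hiragana in parentheses right after a kanji.
    if (not removed and follows_kanji and len(added) > 2
            and added[0] == "(" and added[-1] == ")"
            and all(map(is_hiragana, added[1:-1]))):
        return "furigana"
    if (follows_kanji and bool(removed) != bool(added)
            and all(map(is_hiragana, removed + added))):
        return "okurigana insertion or deletion"
    if removed and added:
        to_kanji = all(map(is_hiragana, removed)) and all(map(is_kanji, added))
        to_kana = all(map(is_kanji, removed)) and all(map(is_hiragana, added))
        if to_kanji or to_kana:
            return "hiragana-kanji substitution"
    return "other"


def pronunciation_fix_kind(before, after, phonetic_before, phonetic_after):
    """Return the edit kind only when the phonetic character string changed."""
    if phonetic_before == phonetic_after:
        return None
    kind = classify_edit(before, after)
    return None if kind == "other" else kind
```

The prefix and suffix comparison keeps the sketch short. An edit that touches several separate places in the text would need a proper diff.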

FIG. 4 shows an example of (1) inserting or deleting okurigana. When the input text is 「これはお好み焼です。」 ("This is okonomiyaki."), the user hears the synthesized speech and notices that 「お好み焼」 is misread as 「おこのみしょー」. Changing the input text to 「これはお好み焼きです。」 (inserting the okurigana 「き」) gives the correct reading, so the user adopts this change.

Here the phonetic character string for 「これはお好み焼です。」 is the incorrect 「コレハオコノミ’ショーデス．」, while that for 「これはお好み焼きです。」 is the correct 「コレハオコノミヤキデ’ス．」 (the phonetic character string is not necessarily shown to the user). The cause is that the word dictionary used for morphological analysis contains 「お好み焼き」 but not 「お好み焼」, so for 「お好み焼」 the character 「焼」 is judged to be an unknown word and a morphological analysis error occurs.

In this example, the part that the pre-edit analysis splits into the common noun 「お好み」 and the unknown word 「焼」 becomes the common noun 「お好み焼き」 in the post-edit analysis, and the phonetic character string changes from 「コレハオコノミ’ショーデス．」 before editing to 「コレハオコノミヤキデ’ス．」 after editing. Comparing the text before and after editing shows that the hiragana 「き」 has been inserted immediately after the kanji 「焼」, so the edit can be identified as (1) insertion or deletion of okurigana. Because the edit changed the phonetic character string and the user adopted the resulting synthesized speech, the change to the text can be judged to have been made to correct a reading or accent error in the synthesized speech.

In other words, whether the edit corrects a reading or accent error is decided by checking three conditions: an unknown word has become a common noun, the phonetic character string has changed, and hiragana has been inserted after a kanji. In the morphological analysis result of the pre-edit text, 「焼」 is analyzed as an unknown word. The edit history data is therefore:
Index: 「焼」
Application condition: 「焼」 is analyzed as an unknown word
Type of correction: insertion of okurigana
String to insert: 「き」
As in this example, a location where the phonetic character string changed and which was an unknown word before editing can be used as the index. The index serves as the key when edit history data is searched. When this edit history is registered, the application count is set to 1 to indicate that this correction method has been used once.
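
To make the shape of such a record concrete, here is a minimal Python sketch of an edit history entry and a store keyed by its index. The class names, field names, and the rule of incrementing the count when an identical entry is registered again are assumptions made for this illustration; the text above states only that the application count is set to 1 at registration.

```python
from dataclasses import dataclass


@dataclass
class EditHistoryEntry:
    index: str        # search key, e.g. the surface form that was corrected
    condition: str    # application condition, e.g. "unknown word"
    kind: str         # type of correction
    string: str       # string to insert, substitute, or attach
    count: int = 1    # application count, set to 1 on first registration


class EditHistoryStore:
    """In-memory stand-in for the edit history data storage unit."""

    def __init__(self):
        self._entries = {}

    def register(self, index, condition, kind, string):
        key = (index, condition, kind, string)
        if key in self._entries:
            # Assumption: registering the same correction again bumps the count.
            self._entries[key].count += 1
        else:
            self._entries[key] = EditHistoryEntry(index, condition, kind, string)
        return self._entries[key]

    def search(self, index):
        """Return all entries registered under the given index."""
        return [e for e in self._entries.values() if e.index == index]
```

For the example above, `store.register("焼", "unknown word", "okurigana insertion", "き")` creates an entry whose `count` is 1, and `store.search("焼")` returns it.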

FIG. 5 shows an example of (2) substitution between hiragana and kanji. When the input text is 「ただいま、時間外です。」 ("We are currently outside business hours."), the user hears the synthesized speech and notices that the accent of 「ただいま」 is wrong. Changing the input text to 「只今、時間外です。」 gives the correct accent, so the user adopts this change.

Here the phonetic character string for 「ただいま、」 is 「タダイマ，…」, whose accent is wrong, because 「ただいま」 was analyzed as an interjection (that is, the greeting 「ただいま」, "I'm home"). The phonetic character string for 「只今、時間外です。」 is 「タダ’イマ，…」, which is correct.

This is because the word dictionary used for morphological analysis registers 「ただいま」 as an interjection and 「只今」 as a common noun, with accents of type 0 (タダイマ) and type 2 (タダ’イマ) respectively. When the user corrects the text by changing 「ただいま」 to 「只今」, it is easy to confirm from the word dictionary that the two words have the same reading but different accents.
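
The relation between an accent type and the accent symbol in the phonetic character string can be sketched as follows, assuming the usual convention that type 0 carries no accent symbol and type n places the symbol after the n-th mora. The function names and the mora-splitting rule (small kana attach to the preceding character, while 「ー」, 「ッ」, and 「ン」 count as morae of their own) are assumptions for this illustration; the device's actual notation rules are not spelled out here.

```python
ACCENT_MARK = "\u2019"  # the apostrophe-like symbol used in the examples
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_morae(katakana):
    """Split a katakana reading into morae (small kana join the previous one)."""
    morae = []
    for ch in katakana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def mark_accent(katakana, accent_type):
    """Insert the accent symbol after the mora given by accent_type (0 = none)."""
    morae = split_morae(katakana)
    if not 0 <= accent_type <= len(morae):
        raise ValueError("accent type out of range for this reading")
    if accent_type == 0:
        return katakana
    return "".join(morae[:accent_type]) + ACCENT_MARK + "".join(morae[accent_type:])
```

Under these assumptions, `mark_accent("タダイマ", 0)` leaves the reading unmarked and `mark_accent("タダイマ", 2)` places the symbol after 「ダ」, matching the two dictionary entries described above.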

The interjection 「ただいま」 in the pre-edit analysis becomes the common noun 「只今」 in the post-edit analysis, and the phonetic character string changes from 「タダイマ，…」 before editing to 「タダ’イマ，…」 after editing. Comparing the text before and after editing shows that the hiragana 「ただいま」 has become the kanji 「只今」, so the edit can be identified as (2) substitution between hiragana and kanji. Because the edit changed the phonetic character string and the user adopted the resulting synthesized speech, the change to the text can be judged to have been made to correct a reading or accent error in the synthesized speech.

In other words, whether the edit corrects a reading or accent error is decided by checking two conditions: the phonetic character string has changed, and hiragana has been replaced with kanji. The edit history data is therefore:
Index: 「ただいま」
Application condition: 「ただいま」 is analyzed as an interjection
Type of correction: substitution between hiragana and kanji
Replacement string: 「只今」
When this edit history is registered, the application count is set to 1 to indicate that this correction method has been used once.

FIG. 6 shows an example of (3) adding furigana. The input text is 「これはお好み焼です。」, as in FIG. 4, and 「お好み焼」 is misread as 「おこのみしょー」. Adding the furigana 「やき」 to 「焼」 in the input text gives the correct reading, so the user adopts this change.

The phonetic character string for 「これはお好み焼です。」 is 「コレハオコノミ’ショーデス．」, but when the furigana 「やき」 is added to 「焼」 in the input text, the phonetic character string generation unit can refer to it and generate the correct reading 「コレハオコノミヤキデ’ス．」.

The phonetic character string changes from 「コレハオコノミ’ショーデス．」 before editing to 「コレハオコノミヤキデ’ス．」 after editing. Comparing the text before and after editing shows that the furigana 「やき」 has been added to 「焼」 in 「お好み焼」, so the edit can be identified as (3) addition of furigana. Because the edit changed the phonetic character string and the user adopted the resulting synthesized speech, the change to the text can be judged to have been made to correct a reading or accent error in the synthesized speech.

In other words, whether the edit corrects a reading or accent error is decided by checking two conditions: the phonetic character string has changed, and furigana has been added at the location where it changed. The edit history data is then:
Index: 「焼」
Application condition: 「焼」 is analyzed as an unknown word
Type of correction: addition of furigana
Furigana to add: 「やき」
When this edit history is registered, the application count is set to 1 to indicate that this correction method has been used once.

FIG. 7 shows an example of (4) inserting a comma or space. When the input text is 「現在企業における…」 ("currently, in companies…"), 「現在企業」 is read as one continuous unit, which sounds unnatural. Inserting a comma (、) after 「現在」 yields a natural reading, so the user adopts this edit.

The phonetic character string for 「現在企業における…」 is 「ゲンザイキ’ギョーニオケル…」. Here 「現在」 and 「企業」 have undergone accent combination into a single accent phrase, which is inappropriate. An accent phrase is the unit of words that forms one accent grouping in Japanese, and accent combination is the joining of several words, each with its own accent, into one accent phrase.
This result arises because 「現在企業」 was judged to be a compound word like 「現在時刻」 ("current time") or 「現在地点」 ("current location"). If the input text is changed to 「現在、企業における…」, the phonetic character string becomes 「ゲ’ンザイ，キ’ギョーニオケル…」, which splits into two accent phrases.

The phonetic character string before editing is 「ゲンザイキ’ギョーニオケル…」, and after editing it has changed to 「ゲ’ンザイ，キ’ギョーニオケル…」. Comparing the text before and after editing shows that a comma 「、」 has been inserted immediately after 「現在」, so the edit can be classified as (4) inserting a comma or space. Moreover, because the edit changed the phonetic character string and the user adopted the resulting synthesized speech, the edit can be judged to be a text change made to correct a reading or accent error in the synthesized speech.

That is, two conditions are tested: a comma has been inserted, and the phonetic character string of the morpheme preceding the comma has changed. Whether the edit corrects a reading or accent error is judged by whether both conditions are satisfied. The editing history data in this case is:
Index: 「現在」
Application condition: 「現在」 is analyzed as a common noun
Type of correction: inserting a comma or space
String to insert: 「、」
When this editing history is registered, the application count is set to "1" to indicate that this correction method has been used once.

The editing history data determination unit 151 can determine whether a text edit was made to correct the reading or accent of the synthesized speech by checking whether the edit is one of the following: insertion or deletion of okurigana, conversion from kana to kanji, conversion from kanji to kana, addition of furigana, or insertion or deletion of a punctuation mark or space.
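The edit-type check described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the category labels, and the use of Python's `difflib` to locate the edit are assumptions. Furigana is left out because this section does not specify how furigana is written in the input text, and the sketch does not perform the second test (whether the phonetic character string changed).

```python
from difflib import SequenceMatcher

PUNCT_OR_SPACE = {"、", "。", " ", "\u3000"}  # comma, period, half- and full-width space


def is_hiragana(s: str) -> bool:
    """True if s is non-empty and every character is in the hiragana block."""
    return bool(s) and all("\u3041" <= c <= "\u309f" for c in s)


def has_kanji(s: str) -> bool:
    """True if s contains at least one CJK unified ideograph."""
    return any("\u4e00" <= c <= "\u9fff" for c in s)


def classify_edit(before: str, after: str) -> str:
    """Classify a single contiguous edit between two texts (hypothetical labels)."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if len(ops) != 1:
        return "other"  # no edit, or several separate edits
    tag, i1, i2, j1, j2 = ops[0]
    removed, added = before[i1:i2], after[j1:j2]
    if tag in ("insert", "delete"):
        changed = added or removed
        if changed in PUNCT_OR_SPACE:
            return "punctuation_or_space"
        # Hiragana inserted or deleted directly after a kanji is treated as okurigana.
        if is_hiragana(changed) and i1 > 0 and has_kanji(before[i1 - 1]):
            return "okurigana"
        return "other"
    # tag == "replace": hiragana swapped for kanji, or kanji for hiragana
    if (is_hiragana(removed) and has_kanji(added)) or (has_kanji(removed) and is_hiragana(added)):
        return "kana_kanji_replacement"
    return "other"
```

For instance, `classify_edit("現在企業における", "現在、企業における")` falls into the punctuation-or-space category. A fuller version would work on morphemes from the language processing unit instead of raw characters.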

FIG. 8 shows an example of registered editing history data. Following the registration-determination examples of FIGS. 4 to 7, an editing history database like the one in this figure is built. For the index 「焼」, the part of speech "unknown word", the correction type "okurigana insertion", the inserted string 「き」, and the application count "1" are registered. A second entry for the index 「焼」 registers the part of speech "unknown word", the correction type "adding furigana", the furigana 「やき」, and the application count "1". For the index 「ただいま」, the part of speech "interjection", the correction type "replacement between hiragana and kanji", the replacement string 「只今」, and the application count "1" are registered. For the index 「現在」, the part of speech "noun", the correction type "inserting a comma or space", the inserted string 「、」, and the application count "1" are registered.

FIGS. 9A to 9D show an example of searching the editing history database for a text to be synthesized and correcting that text. The following description refers to FIGS. 9A to 9D.

When the user listens to speech synthesized from the input text 「ただいま、タコ焼ができました。」 ("The takoyaki is ready now."), the user first notices that 「タコ焼」 is misread as 「たこしょー」 (takoshō).

When the user points out 「焼」 as the inappropriate part, the editing history database is searched with the character 「焼」 as the key, and the editing history data that attaches the furigana 「やき」 to 「焼」 is retrieved. The user may adopt this entry as is, but if the user searches for another correction method, the editing history data that inserts the okurigana 「き」 after 「焼」 is retrieved. If the user adopts this entry, the text becomes 「ただいま、タコ焼きができました。」, and the user listens to the synthesized speech to confirm that this part is now read correctly.

When the user moves on to correcting another part, or judges the whole sentence to be correct, the previously adopted edit of inserting 「き」 after 「焼」 is confirmed, and the application count of that editing history data is incremented.

Each item of editing history data stored in the editing history data storage unit 171 carries a count of the number of times it has been applied to re-editing. The editing history data search unit 161 preferentially retrieves editing history data with a high application count, and the editing history data storage unit 171 updates the application count when editing history data retrieved by the editing history data search unit 161 is applied to re-editing.

Storing the application count in this way means that when several correction methods exist for one index, as with okurigana insertion and furigana addition for 「焼」, the methods applied most often can be retrieved first, which makes the search more efficient.
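The count-based prioritization can be illustrated with a short sketch. The record layout follows the fields named for FIG. 8, but the class name, field names, and `search` function are hypothetical, and the count of 3 on the furigana entry is an illustrative value chosen to make the ordering visible (FIG. 8 registers every entry with a count of 1).

```python
from dataclasses import dataclass


@dataclass
class EditHistoryEntry:
    index: str      # string the entry is looked up by, e.g. "焼"
    pos: str        # part of speech required by the application condition
    kind: str       # type of correction
    string: str     # inserted string, furigana, or replacement string
    count: int = 1  # application count, set to 1 on registration


def search(db, key):
    """Return the entries whose index equals `key`, most-applied first."""
    hits = [entry for entry in db if entry.index == key]
    return sorted(hits, key=lambda entry: entry.count, reverse=True)


db = [
    EditHistoryEntry("焼", "unknown word", "okurigana insertion", "き"),
    EditHistoryEntry("焼", "unknown word", "furigana addition", "やき", count=3),  # illustrative count
    EditHistoryEntry("ただいま", "interjection", "hiragana-kanji replacement", "只今"),
    EditHistoryEntry("現在", "noun", "comma or space insertion", "、"),
]

# Both entries for "焼" are returned; the one applied more often comes first.
candidates = search(db, "焼")
```

Python's sort is stable, so entries with equal counts stay in registration order. The source does not say how ties are ordered, so that behavior is only a property of this sketch.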

Further, when the user points out the inappropriate accent in 「ただいま」, the editing history database is searched with the string 「ただいま」 as the key, and the editing history data that replaces 「ただいま」 with 「只今」 is retrieved. Adopting it produces the text 「只今、タコ焼きができました。」, which has the correct accent. When the user judges the result correct, the edit that converts 「ただいま」 to 「只今」 is confirmed, and the application count of that editing history data is incremented.
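The two re-edits in this walkthrough can be reproduced with a small sketch. The function name and the `insert_after` / `replace` labels are hypothetical. For brevity the sketch applies a correction at the first occurrence of the index string and skips the part-of-speech application condition, which a real system would check using the morphological analysis result.

```python
def apply_entry(text: str, index: str, kind: str, string: str) -> str:
    """Apply one correction at the first occurrence of `index` in `text`."""
    pos = text.find(index)
    if pos < 0:
        return text  # index not present, nothing to correct
    end = pos + len(index)
    if kind == "insert_after":  # okurigana, comma, or space insertion
        return text[:end] + string + text[end:]
    if kind == "replace":  # replacement between hiragana and kanji
        return text[:pos] + string + text[end:]
    raise ValueError(f"unsupported correction type: {kind}")


original = "ただいま、タコ焼ができました。"
# Insert the okurigana "き" after "焼".
step1 = apply_entry(original, "焼", "insert_after", "き")
# Replace "ただいま" with "只今".
step2 = apply_entry(step1, "ただいま", "replace", "只今")
```

After the two calls, `step2` holds the final text from the example, 「只今、タコ焼きができました。」.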

In this embodiment, the editing histories registered in and retrieved from the editing history database are the user's own. The benefit is therefore less about consulting the database when the user does not know how to edit the text, and more about making editing efficient by applying an editing method the user has performed in the past to new text that needs the same kind of edit.

The device includes a voice output unit that outputs synthesized speech for the edited text produced by the text editing unit; an error indication receiving unit that, when the reading or accent of the output synthesized speech is wrong, accepts from the user an indication of the reading or accent error on the edited text; and an editing history data search unit that, for the indicated location, searches the editing history data storage unit for and acquires editing history data describing the text edit that should be performed to correct the reading or accent of the synthesized speech. With these units, the text editing unit can re-edit the edited text using the editing history data acquired by the editing history data search unit.

As a result, even an inexperienced user can easily correct synthesized speech by editing the text, and a user familiar with the speech synthesizer can work efficiently, editing the text merely by pointing out the part to be corrected.

In the example below, a single embodiment performs two processes: identifying predetermined text edits made to correct reading or accent errors in synthesized speech and accumulating them as editing history, and searching the editing history to correct a location where the synthesized speech has a reading or accent error. FIGS. 10A to 10C are flowcharts showing the processing flow in Embodiment 1 of the present invention. The description below follows the flowcharts of FIGS. 10A to 10C.

First, the synthesis text editing area is initialized (step S1001), and the user begins editing text. The system monitors the user's text editing at appropriate intervals. The editing history data determination unit 151 determines whether the user has edited the text (step S1002).

Next, the editing history data determination unit 151 determines whether the text edit made by the user is one of the predetermined text edits described in (1) to (4), that is, a text edit that may have been made to correct a reading or accent error in the synthesized speech (step S1003).

If step S1003 finds that the edit may have been made to correct a reading or accent error in the synthesized speech, language processing is performed on the edited text to generate a phonetic character string. If the phonetic character string has changed, the editing history data creation unit 152 creates editing history data for that text edit, and the editing history data registration unit 153 stores the editing history data temporarily (steps S1004 → S1005 → S1006).

Meanwhile, the editing history data search unit 161 determines whether the user has pointed out, on the text, an inappropriate part of the synthesized speech (step S1007).

If step S1007 finds that the user has pointed out an inappropriate part of the synthesized speech on the text, language processing is performed on the input text (step S1008), and the editing history data search unit 161 searches the editing history data storage unit 171 for editing history data that corresponds to the indicated location. The user may indicate the inappropriate part by selecting an underlined word. The editing history data search unit 161 then determines whether the editing history data storage unit 171 contains editing history data that should be applied to the indicated location (step S1009).

If step S1009 finds no editing history data to apply, the editing history data search unit 161 displays a message to that effect and returns to text editing (step S1010). If step S1009 finds editing history data to apply, the editing history data search unit 161 asks the user whether to apply it, and repeats the search until the editing history data to apply is decided (steps S1011 to S1013). Once the choice is made, the editing history data search unit 161 temporarily stores that editing history data (step S1014).

When the user requests speech synthesis in order to check the synthesized speech, language processing, synthesized speech generation, and synthesized speech output are performed, and the user is prompted to confirm the result (steps S1015 to S1018).

When the user moves on to editing a part other than the one just corrected, or judges that the synthesized speech as a whole has no problem, the editing history data registration unit 153 can conclude that the correction of that part was right. It therefore treats the text corrections and history data applications made up to that point (those stored temporarily) as appropriate, and registers or updates them in the editing history database (steps S1019 → S1020).

Specifically, the editing history data registration unit 153 registers a new entry in the editing history database if the correction came from a text edit (step S1021), and adds "1" to the application count if the correction applied existing history data (step S1022). This processing repeats until the user judges the entire synthesized speech correct (step S1023).
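The register-or-update step (S1021 and S1022) can be sketched as follows. The `Entry` class, the `commit` function, and the `("new", …)` / `("applied", …)` tagging of temporarily stored items are assumptions made for illustration. The source does not describe what happens when a new text edit duplicates an existing entry, so the sketch does not handle that case.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    index: str
    kind: str
    string: str
    count: int = 0


def commit(db: list, pending: list) -> None:
    """Flush temporarily stored corrections once the user accepts the result.

    Each pending item is ("new", entry) for a fresh text edit (S1021)
    or ("applied", entry) for an existing entry that was reused (S1022).
    """
    for origin, entry in pending:
        if origin == "new":
            entry.count = 1  # first use of this correction method
            db.append(entry)
        elif origin == "applied":
            entry.count += 1  # existing entry reused once more
    pending.clear()


existing = Entry("焼", "okurigana insertion", "き", count=1)
db = [existing]
pending = [
    ("applied", existing),
    ("new", Entry("現在", "comma or space insertion", "、")),
]
commit(db, pending)
```

Keeping corrections in a pending list until the user moves on mirrors the temporary storage in steps S1006 and S1014, so a correction the user abandons never reaches the database.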

As described above, according to the present invention, the predetermined editing history of input text that the user has edited in the past is accumulated, and when speech is synthesized from new input text, the editing history is searched and applied. This makes it possible to correct the text appropriately and easily, at the level of the input text, which is easy for the end user to understand.
[Embodiment 2]
In recent years, SaaS (Software as a Service) has been spreading as a form of software delivery. Conventionally, software was mainly sold to users under license as a packaged product and run on each user's own computer. In a SaaS offering, by contrast, the software runs on the provider's server, and the user uses it as a service over a network and pays the provider a service fee.

Speech synthesis can also be offered in this form, as a SaaS speech synthesis service in which the user uses speech synthesis software on a server over a network. End users can then readily use the speech synthesis software to convert text to speech and use the editing functions provided to adjust the synthesized speech to their liking.

Embodiment 2 is an example of a SaaS speech synthesis service in which a speech synthesis server operates according to speech synthesis software. Users access the speech synthesis service over a network from their own user terminals 300, 310, ..., 3N0. Each user terminal has a browser for exchanging data with the speech synthesis server and displaying it. In the speech synthesis server 500 (speech synthesizer) shown in FIG. 11, functional blocks (modules) that are the same as in FIG. 1 carry the same reference numerals.

In this embodiment, data is exchanged with each user terminal in a format the browser supports (for example, HTML or XML). The speech synthesis server 500 further includes a transmission data creation unit 400, a reception data analysis unit 410, and a data transmission/reception unit 420.

The transmission data creation unit 400 converts data processed inside the speech synthesis server 500 into transmission data for the user terminal. The reception data analysis unit 410 converts data received from the user terminal into the form processed inside the speech synthesis server 500. The data transmission/reception unit 420 sends this data to and receives it from the user terminal.

The data transmission/reception unit 420 receives from the user terminal the input text data and error indication data entered in the editing unit (browser) of the user terminal. The reception data analysis unit 410 converts the input text data and error indication data received by the data transmission/reception unit 420 into the form processed inside the speech synthesis server 500.

The editing history accumulation unit 170 accumulates, based on the converted input text data, editing history data indicating the edits performed in the editing unit 111.

When text data is input from the user terminal, the editing unit 111 generates the phonetic character string or synthesized speech data for the text data and outputs it to the user terminal. At this point, the transmission data creation unit 400 converts the phonetic character string or synthesized speech into transmission data for the user terminal. The user terminal accepts an error indication and sends the error indication data to the speech synthesis server. Based on the error indication data, the editing unit 111 refers to the editing history data accumulated in the editing history accumulation unit 170 and re-edits the input text data entered in the editing unit of the user terminal. The transmission data creation unit 400 converts the text data re-edited by the speech synthesis server 500 into transmission data for the user terminal. The data transmission/reception unit 420 sends the transmission data created by the transmission data creation unit 400 to the user terminal.

In Embodiment 2, other users' editing history data is accumulated and searchable, so even an inexperienced user can easily correct synthesized speech by editing text, drawing on other users' editing histories. A user familiar with the speech synthesizer also benefits: editing requires only pointing out the part to be corrected, so the editing work can be done efficiently.

As described above, according to the present invention, when the software resides on a server, as in a SaaS speech synthesis service, and provides a framework in which other users' editing histories are accumulated and searchable, even an inexperienced user can easily correct synthesized speech by editing text. The synthesized speech can also be adjusted effectively through the input text.

The configurations described in the above embodiments are merely specific examples and do not limit the technical scope of the present invention. Any configuration may be adopted as long as it achieves the effects of the present invention.

The configuration of the speech synthesis server is not limited to the above example. For example, some of the functions of the speech synthesis server 500 may be provided by the user terminal or by another server.

Embodiments of the present invention include the case where a software program implementing the embodiments described above (in the embodiments, a program corresponding to the flowcharts shown in the figures) is supplied to a device, and the computer of that device reads and executes the supplied program. Accordingly, the program itself, installed on a computer to implement the functional processing described in the embodiments, is also an embodiment of the present invention. That is, a program for implementing the functional processing of the present invention is included in one aspect of the embodiments, as is a medium on which such a program is recorded.

100 Speech synthesizer
110 Control unit
111 Editing unit
120 Language processing unit
121 Morphological analysis unit
122 Phonetic character string generation unit
130 Synthesized speech generation unit
140 Synthesis data management unit
150 Editing history registration/update unit
151 Editing history data determination unit
152 Editing history data creation unit
153 Editing history data registration unit
160 Editing history search unit
170 Editing history accumulation unit
171 Editing history data storage unit
180 Interface unit
190 Display device
200 Input device
201 Text editing unit
202 Error indication receiving unit
203 Text correction unit
210 Audio output device
400 Transmission data creation unit
410 Reception data analysis unit
420 Data transmission/reception unit
500 Speech synthesis server

Claims

A speech synthesizer comprising:
a text editing unit that edits text to be speech-synthesized;
a language processing unit that performs morphological analysis and phonetic character string generation on the text before and after editing;
a text editing determination unit that determines, based on at least the phonetic character strings among the morphological analysis results and the generated phonetic character strings for the text before and after editing, whether the text editing was performed to correct the reading or accent of the synthesized speech;
an editing history data creation unit that creates editing history data indicating the content of the text editing when the text editing determination unit determines that the text editing was performed to correct the reading or accent of the synthesized speech; and
an editing history data storage unit that stores the editing history data created by the editing history data creation unit when that data is registered.

The speech synthesizer according to claim 1, further comprising:
an output unit that outputs the phonetic character string of the edited text edited by the text editing unit, or synthesized speech;
an error indication receiving unit that, when the reading or accent of the phonetic character string or synthesized speech output by the output unit is wrong, receives from the user an indication of the reading or accent error on the edited text; and
an editing history data search unit that, for the location of the reading or accent error indicated on the edited text and received by the error indication receiving unit, searches the editing history data storage unit for and acquires editing history data indicating the content of text editing to be performed to correct the reading or accent of the synthesized speech,
wherein the text editing unit re-edits the edited text using the editing history data searched for and acquired by the editing history data search unit.

The speech synthesizer according to claim 1 or 2, wherein
each item of editing history data stored in the editing history data storage unit has a function of counting the number of times it has been applied to re-editing,
the editing history data search unit searches with priority given to editing history data with a large application count, and
the editing history data storage unit updates the application count when editing history data retrieved by the editing history data search unit is applied to re-editing.

The speech synthesizer according to any one of claims 1 to 3, wherein the text editing determination unit determines whether the text editing was performed to correct the reading or accent of the synthesized speech by determining whether the text editing is at least one of: insertion or deletion of okurigana, replacement between hiragana and kanji, addition of furigana, and insertion or deletion of a comma or space.

A speech synthesis method comprising:
a text editing step of editing text to be speech-synthesized;
a language processing step of performing morphological analysis and phonetic character string generation on the text before and after editing;
a text editing determination step of determining, based on at least the phonetic character strings among the morphological analysis results and the generated phonetic character strings for the text before and after editing, whether the text editing was performed to correct the reading or accent of the synthesized speech;
an editing history data creation step of creating editing history data indicating the content of the text editing when the text editing determination step determines that the text editing was performed to correct the reading or accent of the synthesized speech; and
an editing history data storage step of storing the editing history data created in the editing history data creation step when that data is registered.

A speech synthesis program causing a computer to execute:
a text editing step of editing text to be speech-synthesized;
a language processing step of performing morphological analysis and phonetic character string generation on the text before and after editing;
a text editing determination step of determining, based on at least the phonetic character strings among the morphological analysis results and the generated phonetic character strings for the text before and after editing, whether the text editing was performed to correct the reading or accent of the synthesized speech;
an editing history data creation step of creating editing history data indicating the content of the text editing when the text editing determination step determines that the text editing was performed to correct the reading or accent of the synthesized speech; and
an editing history data storage step of storing the editing history data created in the editing history data creation step when that data is registered.