JP2009271209A

JP2009271209A - Voice message creation system, program, semiconductor integrated circuit device and method for manufacturing the same

Info

Publication number: JP2009271209A
Application number: JP2008119921A
Authority: JP
Inventors: Fumihito Baisho; 文仁倍賞
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2008-05-01
Filing date: 2008-05-01
Publication date: 2009-11-19

Abstract

PROBLEM TO BE SOLVED: To efficiently edit and create voice data files for generating voice guide messages.

SOLUTION: The system comprises: a dictionary data storage section 182; a voice data generation section 120 that generates, based on dictionary data, voice data with additional information, in which additional information is attached to the voice data corresponding to received text data; an editing processing section 130 that displays an editing screen based on the voice data with additional information, receives edit input information, and performs editing processing; a list information generation processing section 132 that creates list information based on the result of the editing processing; a voice data creation section 134 that creates voice data without additional information from the voice data with additional information; and a memory write information generation section 136 that determines, based on the list information, the storage target phrases to be stored in a voice data memory and creates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a voice message creation system, a program, a semiconductor integrated circuit device, and a method for manufacturing a semiconductor integrated circuit device.

There is known an electronic device equipped with a voice playback system that includes a host processor and a voice IC, in which the host processor and the voice IC work together to output messages by voice.
JP 2002-023781

When an electronic device or the like is given a voice function that outputs preset voice guide messages as a user interface, one type of voice playback system stores the voice data files corresponding to the planned voice guide messages in the built-in ROM of the voice playback device and, based on commands from a host, reproduces and outputs the voice data read from the built-in ROM.

In creating such voice guide messages, it has conventionally been necessary to handle each voice data file together with a voice log file holding the additional data for that voice data file, so file management, such as passing files between the voice data creation tool and the voice editing tool, was very troublesome. Moreover, if either file was lost or corrupted, the voice data had to be recreated, adding steps to the voice guide message creation procedure.

The present invention has been made in view of the above technical problems, and its object is to provide a voice message creation system, a program, a semiconductor integrated circuit device, and a method for manufacturing a semiconductor integrated circuit device for efficiently editing and generating the voice data files used to utter the voice guide messages that an electronic device or the like is to output.

(1) The present invention is a system for creating voice messages each configured as a sentence including a plurality of phrases, the system comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information containing the generated voice data and additional information related to that voice data;
an edit processing unit that displays a voice message editing screen based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information on the phrases constituting each sentence;
a voice data creation unit without additional information that creates, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A phrase is, for example, a clause or a part of a sentence. A sentence is, for example, a complete sentence, and may be one used as a voice guide message for an electronic device or the like.

The text data may be character data (codes representing hiragana, katakana, kanji, or digits). For example, it may be text data composed of ASCII or JIS codes.

The voice data generation unit with additional information generates voice data corresponding to the text data of a phrase by a text-to-speech (TTS) method, and may be implemented using, for example, an existing TTS tool.

According to the present invention, additional information (voice log information) is embedded in the voice data file when the voice data is created, which reduces the number of files passed between tools (for example, a TTS tool and a voice editing tool) and makes file management easier.

If memory write information were generated from the voice data with additional information, the amount of data stored in the memory, and hence the required memory size, would grow. Generating the memory write information from voice data without additional information, obtained by deleting the additional information from the voice data with additional information, prevents this increase in memory size.
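
The relationship between voice data with additional information and voice data without additional information can be illustrated with a minimal Python sketch. The class name, the field names, and the in-memory layout are assumptions made for illustration only; the specification does not define a concrete file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceDataWithInfo:
    """Voice data with additional information (hypothetical in-memory layout)."""
    audio: bytes       # synthesized audio payload for one phrase
    text: str          # text the audio was generated from
    size_bytes: int    # data size of the audio payload
    duration_ms: int   # playback time of the audio


def make_voice_data_with_info(text: str, audio: bytes, duration_ms: int) -> VoiceDataWithInfo:
    """Bundle the audio and its additional information into a single object."""
    return VoiceDataWithInfo(audio=audio, text=text,
                             size_bytes=len(audio), duration_ms=duration_ms)


def strip_additional_info(data: VoiceDataWithInfo) -> bytes:
    """Return voice data without additional information (audio payload only)."""
    return data.audio
```

Because the additional information travels inside the same object as the audio, only one item has to be passed between tools, while the stripped payload is what would be written to memory.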

(2) The present invention is a system for creating voice messages each configured as a sentence including a plurality of phrases, the system comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data; and
a voice data generation unit with additional information that generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information containing the generated voice data and additional information related to that voice data.

(3) The present invention is a system for creating voice messages each configured as a sentence including a plurality of phrases, the system comprising:
an edit processing unit that displays a voice message editing screen based on voice data with additional information, which contains voice data and additional information related to that voice data, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information on the phrases constituting each sentence;
a voice data creation unit without additional information that creates, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

(4) The present invention is a system for creating voice messages each configured as a sentence including a plurality of phrases, the system comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information containing the generated voice data and additional information related to that voice data;
an edit processing unit that displays a voice message editing screen based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information; and
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information on the phrases constituting each sentence,
wherein the voice data generation unit with additional information receives text data corresponding to a phrase and generates voice data with additional information corresponding to the phrase, and
the edit processing unit performs processing to display the additional information on the editing screen based on the additional information of the voice data with additional information.

(5) This voice message creation system may include a voice data creation unit without additional information that creates, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted.

(6) This voice message creation system may include a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases,
and the memory write information generation unit may determine the storage target phrases such that, for a phrase used in a plurality of sentences or used a plurality of times in one sentence, the voice data of the same phrase is not written more than once.

Further, according to the present invention, whether to write to the memory (ROM) is decided phrase by phrase based on the list information, so the memory write information (ROM image) can be generated such that the same voice data is not written more than once even for phrases used in a plurality of sentences. Consequently, only one copy of a phrase's voice data is stored even when the phrase is shared by several sentences or used several times in one sentence, which prevents the memory size from growing.
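
A minimal Python sketch of this phrase-level deduplication follows. It assumes, purely for illustration, that the list information is available as a mapping from a sentence name to the ordered phrase identifiers of that sentence; the function name and the identifiers are hypothetical.

```python
from typing import Dict, List


def select_storage_phrases(sentences: Dict[str, List[str]]) -> List[str]:
    """Return each phrase identifier exactly once, in order of first use."""
    seen = set()
    storage_phrases = []
    for phrase_ids in sentences.values():
        for phrase_id in phrase_ids:
            if phrase_id not in seen:
                # First time this phrase appears: mark it for writing to ROM.
                seen.add(phrase_id)
                storage_phrases.append(phrase_id)
    return storage_phrases
```

A phrase shared by two sentences, or repeated inside one sentence, therefore contributes a single entry to the ROM image.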

(7) In this voice message creation system, the edit processing unit may generate the list information based on the additional information of the voice data with additional information.

For example, the additional information (text information corresponding to the voice data, data size information of the voice data, playback time information of the voice data, and the like) may be extracted from the voice data with additional information to generate the list information (for example, the phrase information).

(8) This voice message creation system may include a voice playback command generation processing unit that generates, based on the list information, a voice playback command instructing that the voice data needed to reproduce a sentence be read from the voice data memory and reproduced in the order corresponding to the sentence.

The voice playback command specifies the phrases constituting a sentence and their playback order, and may include phrase identification information for those phrases and sequence information on their playback order.
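
One possible shape for such a command is sketched below in Python. The representation as a list of (sequence number, ROM phrase index) pairs and the `rom_index` mapping are assumptions for illustration; the specification does not fix a command encoding.

```python
from typing import Dict, List, Tuple


def build_playback_command(sentence_phrases: List[str],
                           rom_index: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (sequence number, ROM phrase index) pairs in playback order."""
    # rom_index maps a phrase identifier to its position in the voice data memory.
    return [(seq, rom_index[phrase_id])
            for seq, phrase_id in enumerate(sentence_phrases)]
```

A phrase that occurs twice in a sentence appears twice in the command but refers to the same stored voice data.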

(9) In this voice message creation system, the additional information may include data size information of the voice data.

(10) In this voice message creation system, the additional information may include text information of the voice data.

(11) In this voice message creation system, the memory write information generation unit may calculate the total size of the memory write information and output size information based on the calculation result.

The voice data generation unit with additional information may generate file size information for the voice data when it creates the voice data corresponding to a phrase and include it in the voice data with additional information as additional information, and the memory write information generation unit may calculate the total size of the memory write information from the file size information of the voice data of the storage target phrases.

The size of the memory to be used may also be compared with the total size and the comparison result output. If the size of the memory to be used is determined to be smaller than the total size, warning information may be output.
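
The size calculation and the comparison can be sketched in Python as follows. The function names, the byte units, and the wording of the warning are assumptions for illustration; only the rule "warn when the memory size is smaller than the total size" comes from the text above.

```python
from typing import Dict, List, Optional


def total_write_size(storage_phrases: List[str], size_bytes: Dict[str, int]) -> int:
    """Sum the file sizes recorded as additional information for the stored phrases."""
    return sum(size_bytes[phrase_id] for phrase_id in storage_phrases)


def size_warning(total: int, memory_size: int) -> Optional[str]:
    """Return warning text if the memory is smaller than the total, else None."""
    if memory_size < total:
        return (f"warning: memory write information ({total} bytes) "
                f"does not fit in memory ({memory_size} bytes)")
    return None
```

Only the storage target phrases are summed, so phrases that are not written to memory do not count toward the total.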

(12) In this voice message creation system, the edit processing unit may accept the text data of a sentence and perform sentence division processing that divides the accepted sentence text data into a plurality of phrases based on the text data of the phrases,
and the list information generation processing unit may generate sentence information including the phrase identification information and sequence information of the phrases constituting the created sentence.

For example, the text data of the sentence may include delimiter data indicating phrase boundaries, and the edit processing unit may perform the sentence division processing based on the delimiter data. The delimiter data may be, for example, space data, or text data representing a predetermined character or symbol.

For example, suppose the sentence is "電源を切って下さい" ("Please turn off the power") and phrase data exists for phrases whose wording partly overlaps, such as "電源を", "電源を切って", "切って下さい", and "下さい". By explicitly marking with spaces the positions at which the sentence should be split, the sentence can be expanded into the intended phrases.
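
Delimiter-based sentence division can be sketched in Python as below. The function name, the choice of a space between "電源を" and "切って下さい" in the usage, and the error raised for a phrase without phrase data are all assumptions for illustration; the specification states only that delimiter data marks the split positions.

```python
from typing import List, Set, Tuple


def split_sentence(sentence_text: str, known_phrases: Set[str]) -> List[Tuple[int, str]]:
    """Split a sentence on whitespace delimiters and return (sequence, phrase) pairs."""
    phrases = sentence_text.split()  # whitespace marks the phrase boundaries
    unknown = [p for p in phrases if p not in known_phrases]
    if unknown:
        # No phrase data exists for these pieces, so the split cannot be used.
        raise ValueError(f"no phrase data for: {unknown}")
    return list(enumerate(phrases))
```

With known phrases "電源を", "電源を切って", "切って下さい", and "下さい", the input "電源を 切って下さい" resolves unambiguously to the two phrases "電源を" and "切って下さい", even though other overlapping phrases exist.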

(13) In this voice message creation system, the edit processing unit may accept phrase selection input and perform phrase combination processing that creates a sentence from the selected phrases,
and the list information generation processing unit may generate, based on the result of the phrase combination processing, sentence information including the phrase identification information and sequence information of the phrases constituting the sentence.

(14) This voice message creation system may include a voice playback output processing unit that determines, based on the sentence information, the phrases constituting a sentence and their playback order, and reproduces and outputs the voice data of the phrases in that order.

(15) The present invention is a program for causing a computer to operate as a system for creating voice messages each configured as a sentence including a plurality of phrases, the program causing the computer to function as:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information containing the generated voice data and additional information related to that voice data;
an edit processing unit that displays a voice message editing screen based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information on the phrases constituting each sentence;
a voice data creation unit without additional information that creates, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

(16) The present invention is a program for causing a computer to operate as a system for creating voice messages each configured as a sentence including a plurality of phrases, the program causing the computer to function as:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data; and
a voice data generation unit with additional information that generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information containing the generated voice data and additional information related to that voice data.

(17) The present invention is a program for causing a computer to operate as a system for creating voice messages each configured as a sentence including a plurality of phrases, the program causing the computer to function as:
an edit processing unit that displays a voice message editing screen based on voice data with additional information, which contains voice data and additional information related to that voice data, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information on the phrases constituting each sentence;
a voice data creation unit without additional information that creates, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

(18) The present invention is a method for manufacturing a semiconductor integrated circuit device for voice synthesis that includes a non-volatile storage unit, the method comprising:
a step of preparing a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation step with additional information of generating, based on the dictionary data, voice data corresponding to received text data, and generating voice data with additional information containing the generated voice data and additional information related to that voice data;
an edit processing step of displaying a voice message editing screen based on the voice data with additional information, accepting edit input information, and performing edit processing based on the edit input information;
a list information generation processing step of generating, based on the edit processing result, list information including phrase information on the phrases constituting each sentence;
a voice data creation step without additional information of creating, from the voice data with additional information, voice data without additional information, from which the additional information has been deleted; and
a memory write information generation step of determining, based on the list information, the storage target phrases to be stored in the non-volatile storage unit, and generating memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

(19) The present invention is a semiconductor integrated circuit device comprising:
a non-volatile storage unit storing memory write information generated by any of the voice message creation systems described above; and
a voice synthesis unit that receives the voice playback command, reads voice data from the non-volatile storage unit based on the received voice playback command, and reproduces and outputs the voice data.

Preferred embodiments of the present invention will now be described in detail with reference to the drawings. The embodiments described below do not unduly limit the content of the invention set out in the claims, and not all of the configurations described below are necessarily essential constituent features of the invention.

1. Configuration of the Present Embodiment
FIG. 1 is an example of a functional block diagram of the voice message creation system of the present embodiment.
Note that the voice message creation system 100 of the present embodiment need not include all of the components (units) shown in FIG. 1; some of them may be omitted.

The operation unit 160 is used to input user operations and the like as data, and its function can be realized by hardware such as operation buttons, an operation lever, a touch panel, or a microphone.

The storage unit 170 serves as a work area for the processing unit 110, the communication unit 196, and the like, and its function can be realized by hardware such as a RAM.

The storage unit 170 may function as a voice data storage unit 172 that stores the voice data corresponding to phrases (voice data with additional information and voice data without additional information).

The storage unit 170 may also function as a sentence information storage unit 174 that stores sentence information including the file information (file names) of the voice data of the phrases constituting a sentence and sequence information on the playback order of the phrases.

The storage unit 170 may also function as a phrase editing information storage unit 176 that holds, as phrase information, the editing information of the phrases constituting sentences (number of uses, whether the phrase is written to the ROM) and the additional information.
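
As an illustration of the phrase editing information, the following Python sketch derives a use count and a ROM write flag per phrase. The dictionary layout, the key names, and the rule that a phrase is written to the ROM only when at least one sentence uses it are assumptions, not details given in the specification.

```python
from collections import Counter
from typing import Dict, List


def build_phrase_edit_info(all_phrases: List[str],
                           sentences: Dict[str, List[str]]) -> Dict[str, Dict[str, object]]:
    """Return, per phrase, how often it is used and whether it is written to ROM."""
    # Count every occurrence of each phrase across all sentences.
    counts = Counter(p for phrase_ids in sentences.values() for p in phrase_ids)
    return {
        phrase_id: {
            "use_count": counts[phrase_id],         # 0 for phrases no sentence uses
            "write_to_rom": counts[phrase_id] > 0,  # assumed rule: skip unused phrases
        }
        for phrase_id in all_phrases
    }
```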

The information storage medium 180 (a computer-readable medium) stores programs, data, and the like, and its function can be realized by hardware such as an optical disc (CD, DVD, etc.), a magneto-optical disk (MO), a magnetic disk, a hard disk, magnetic tape, or a memory (ROM).

The information storage medium 180 stores the program that causes a computer to function as each unit of the present embodiment, together with auxiliary data, and also functions as the dictionary data storage unit 182.

The processing unit 110 performs the various processes of the present embodiment based on the program (data) stored in the information storage medium 180, data read from the information storage medium 180, and the like. That is, the information storage medium 180 stores a program for causing a computer to function as each unit of the present embodiment (a program for causing a computer to execute the processing of each unit).

The display unit 190 outputs the images generated by the present embodiment, and its function can be realized by hardware such as a CRT display, an LCD (liquid crystal display), an OELD (organic EL display), a PDP (plasma display panel), or a touch panel display. The display unit shows the editing screen and dialog screens of this tool.

The sound output unit 192 outputs the synthesized voice and other sound generated by the present embodiment, and its function can be realized by hardware such as a speaker or headphones.

The communication unit 196 performs various kinds of control for communicating with external devices (for example, a host device or another terminal), and its function can be realized by hardware such as various processors or a communication ASIC, or by a program.

The program (data) for causing a computer to function as each unit of the present embodiment may be distributed from an information storage medium of a host device (server device) to the information storage medium 180 (or the storage unit 170) via a network and the communication unit 196. Use of the information storage medium of such a host device (server device or the like) can also fall within the scope of the present invention.

The nonvolatile storage unit 150 is composed of a storage medium that functions as a nonvolatile memory, and may be, for example, a ROM used as the built-in ROM of a speech synthesis IC incorporated in an electronic device. Memory write information 152 may be written to the nonvolatile storage unit 150. A voice playback command 154 may also be written to the nonvolatile storage unit 150.

The processing unit 110 (processor) performs various processes based on operation data from the operation unit 160, programs, and the like, using the storage unit 170 as a work area. The functions of the processing unit 110 can be realized by hardware such as various processors (CPU, DSP, etc.) and ASICs (gate arrays, etc.), or by programs.

The processing unit 110 may include a voice-data-with-additional-information generation unit 120, an editing processing unit 130, a list information generation processing unit 132, a voice-data-without-additional-information creation unit 134, a memory write information generation unit 136, a voice playback command generation unit 138, and a voice playback output processing unit 140.

The voice-data-with-additional-information generation unit 120 generates, based on the dictionary data, voice data corresponding to received text data, and generates voice data with additional information that contains the generated voice data together with additional information about that voice data. The editing processing unit 130 displays a voice message editing screen based on the voice data with additional information, accepts editing input information, and performs editing processing based on that input. The list information generation processing unit 132 generates, based on the result of the editing processing, list information including phrase information about the phrases that make up each sentence. The generation unit 120 receives text data corresponding to a phrase and generates voice data with additional information corresponding to that phrase, and the editing processing unit 130 displays the additional information on the editing screen based on the additional information of the voice data with additional information.

The voice-data-without-additional-information creation unit 134 creates, based on the voice data with additional information, voice data without additional information, from which the additional information has been removed.

The memory write information generation unit 136 determines, based on the list information, the storage target phrases to be stored in the voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases. For a phrase used in multiple sentences, or used multiple times within one sentence, the memory write information generation unit 136 may determine the storage target phrases so that the voice data of the same phrase is not written more than once.
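The duplicate-free selection of storage target phrases described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the phrase IDs, and the layout of the image (plain concatenation with an offset table) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def select_storage_phrases(sentences: Dict[str, List[str]]) -> List[str]:
    """Return each phrase ID once, in first-use order, across all sentences."""
    seen = set()
    targets = []
    for phrase_ids in sentences.values():
        for pid in phrase_ids:
            if pid not in seen:  # skip phrases reused within or across sentences
                seen.add(pid)
                targets.append(pid)
    return targets


def build_memory_write_info(
    targets: List[str], voice_data: Dict[str, bytes]
) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate the header-less voice data of the target phrases.

    Returns the image bytes and a hypothetical table of (offset, length).
    """
    image = bytearray()
    table = {}
    for pid in targets:
        data = voice_data[pid]
        table[pid] = (len(image), len(data))
        image += data
    return bytes(image), table


# Hypothetical list information: phrase IDs per sentence, in playback order.
sentences = {"S1": ["P_A", "P_C", "P_A"], "S2": ["P_C", "P_B"]}
voice_data = {"P_A": b"\x01\x02", "P_B": b"\x03", "P_C": b"\x04\x05\x06"}

targets = select_storage_phrases(sentences)
image, table = build_memory_write_info(targets, voice_data)
total_size = len(image)  # size information for the memory write information
```

Here "P_A" and "P_C" are each used twice but stored once, so the image holds three phrases and `total_size` is the sum of their three data lengths.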

The editing processing unit 130 may generate the list information based on the additional information of the voice data with additional information.

The voice playback command generation processing unit 138 generates, based on the list information, a voice playback command that instructs the voice data needed to play back a sentence to be read from the voice data memory and played in the order corresponding to the sentence.

The memory write information generation unit 136 may calculate the total size of the memory write information and output size information based on the result.

The editing processing unit 130 may accept the text data of a sentence and perform sentence division processing that divides the accepted sentence text into multiple phrases based on the text data of the phrases, and the list information generation processing unit 132 may generate sentence information including phrase specifying information and sequence information for the phrases that make up the created sentence.

The editing processing unit 130 may accept phrase selection input and perform phrase combining processing that creates a sentence from the selected phrases, and the list information generation processing unit 132 may generate, based on the result of the phrase combining processing, sentence information including phrase specifying information and sequence information for the phrases that make up the sentence.

The voice playback output processing unit 140 determines, based on the sentence information, the phrases that make up a sentence and their playback order, and plays back and outputs the voice data of the phrases in that order.

FIG. 2(A) is a diagram for explaining the process of generating the memory write information (ROM image), and FIG. 2(B) is a diagram for explaining how the memory write information (ROM image) is used. FIGS. 3(A) and 3(B) show the file structures of voice data with additional information and voice data without additional information, respectively.

Reference numeral 100 denotes the voice message creation tool (program or system) of the present embodiment. Reference numeral 10 denotes an IC that is incorporated in an electronic device or the like and has a voice function for outputting preset messages as a user interface. The speech synthesis IC 10 plays back and outputs the voice corresponding to a sentence based on the ROM image 152 stored in the built-in ROM 20 and the voice playback command 154.

Using the voice message creation tool 100, a ROM image (a set of phrase voice data) to be stored in the built-in ROM of the speech synthesis IC 10 and voice playback commands can be generated.

The voice message creation tool 100 can operate as a voice message creation system when a program that runs the TTS tool 102 and the voice editing tool 104 is installed on, for example, a personal computer (PC).

The TTS tool 102 may include a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data, and a voice-data-with-additional-information generation unit that generates, based on the dictionary data, voice data corresponding to received text data and generates voice data with additional information containing the generated voice data together with additional information about that voice data.

The voice editing tool 104 may include: an editing processing unit that, based on voice data with additional information containing voice data and additional information about that voice data, displays a voice message editing screen, accepts editing input information, and performs editing processing based on that input; a list information generation processing unit that generates, based on the result of the editing processing, list information including phrase information about the phrases that make up each sentence; a voice-data-without-additional-information creation unit that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been removed; and a memory write information generation unit that determines, based on the list information, the storage target phrases to be stored in the voice data memory and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

Although the case where the voice message creation tool 100 consists of the TTS tool 102 and the voice editing tool 104 as separate programs has been described here as an example, the voice message creation tool 100 may instead be configured as a single program that provides the functions of both the TTS tool 102 and the voice editing tool 104.

The TTS tool 102 is a tool for causing a computer to function as the voice-data-with-additional-information generation unit. It generates voice data with additional information 60 corresponding to a phrase, based on the text information 40 of the phrase to be created and the TTS speech synthesis dictionary 50 (the dictionary data stored in the dictionary data storage unit 182).

The voice data with additional information 60 includes voice data 62 corresponding to a phrase and an additional information header 70. The voice data 62 corresponding to the phrase is voice data generated phrase by phrase by the TTS tool using the TTS method, based on the TTS speech synthesis dictionary 50 (the dictionary data stored in the dictionary data storage unit 182). This voice data is sound data that can be played back by an existing voice playback system, and may be compressed sound data. The generated voice data may be compressed phrase by phrase and held as one voice file per phrase. For example, it may be a voice data file in ADPCM or AAC-LC format.

The header information 70 includes a 2-byte area that stores a header ID 72 identifying the additional information.

The header information 70 includes a 2-byte area that stores the size 74 of the additional information header (not including the voice data 62). In the present embodiment the text 76 has variable length, so the size 74 of the additional information header is what makes it possible to locate where the voice data starts.

The header information 70 includes a variable-length area that stores the text 76 input to the TTS tool 30 (the text information 40 of the phrase). The area for the text 76 stores the text information 40 of the phrase that was input for the generated voice data 62.

The header information 70 includes a 4-byte area that stores the data size 78 of the voice data 62 corresponding to the phrase (not including the header). The data size 78 may be taken from the log information that the TTS tool generates when it creates the voice data. The data size 78 is used in the voice editing tool 104 for purposes such as the size check performed when the ROM is created.

The additional information is not limited to the items described above, and may also include, for example, TTS parameters (pronunciation, speed, pitch, volume), playback time, and data format information.

In the present embodiment, the voice data 62 corresponding to a phrase and the additional information header 70 make up a single file and can be accessed together on a per-file basis. File management is therefore easier than when a voice data file and a voice log file holding the additional data for that voice data file are managed separately.
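The single-file layout described above (2-byte header ID, 2-byte header size, variable-length text, 4-byte data size, then the voice data) can be sketched with Python's `struct` module. The text gives only the field widths; the field order, little-endian byte order, UTF-8 text encoding, the `HEADER_ID` value, and the convention that the header size counts the whole header are assumptions made for this illustration.

```python
import struct

HEADER_ID = 0x4149  # hypothetical identifier value


def pack_voice_file(text: str, voice: bytes) -> bytes:
    """Build one file: additional information header followed by voice data."""
    text_bytes = text.encode("utf-8")
    # Header size covers ID (2) + size (2) + text + data size (4), not the voice data.
    header_size = 2 + 2 + len(text_bytes) + 4
    return (
        struct.pack("<HH", HEADER_ID, header_size)
        + text_bytes
        + struct.pack("<I", len(voice))
        + voice
    )


def parse_voice_file(blob: bytes):
    """Recover the phrase text, data size, and voice data from one file."""
    header_id, header_size = struct.unpack_from("<HH", blob, 0)
    if header_id != HEADER_ID:
        raise ValueError("not a voice file with additional information")
    text_end = header_size - 4  # the 4-byte data size is the last header field
    text = blob[4:text_end].decode("utf-8")
    (data_size,) = struct.unpack_from("<I", blob, text_end)
    # Because the text is variable length, the header size locates the voice data.
    voice = blob[header_size:header_size + data_size]
    return text, data_size, voice


blob = pack_voice_file("AAA", b"\x10\x20\x30")
text, data_size, voice = parse_voice_file(blob)
```

Because everything needed to list, size, and play a phrase travels in one file, an editing tool only has to open that file, which is the file management benefit the paragraph above describes.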

The TTS tool 30 is a TTS (text-to-speech) speech synthesis system that synthesizes speech from text data (here, the text data 40 corresponding to the text information of a phrase) based on the TTS speech synthesis dictionary 50. There are many TTS methods, including the parametric method, which synthesizes sound by modeling the human vocalization process; the concatenative method, which holds speech segment data taken from recordings of a real person's voice and combines the segments as needed, partially modifying the joins; and, as a further development, the corpus-based method, which builds speech up from language-based analysis and forms synthesized speech from real voice data. The present embodiment is applicable to any of them. In the case of the concatenative or corpus-based method, for example, a phoneme dictionary may be provided, and the speech synthesis unit may generate the voice data of synthesized speech corresponding to a reading notation based on the phoneme dictionary.

The TTS speech synthesis dictionary 50 includes, for example, a vocabulary dictionary and a phoneme dictionary. The vocabulary dictionary is a data dictionary that stores the reading notation corresponding to each text notation, and the phoneme dictionary is a dictionary covering many cases that are effective for improving voice quality. The vocabulary dictionary is used for front-end processing in text-to-speech processing, and may store the symbolic linguistic representation corresponding to a text notation (for example, the reading data corresponding to the text notation). The front end may be configured to convert numbers and abbreviations in the text into the form in which they are read aloud (a step called text normalization, preprocessing, or tokenization), to convert each word into phonetic symbols, and to divide the text into prosodic units such as phrases, clauses, and sentences, and then to combine the phonetic symbols and prosodic information into a symbolic linguistic representation and output it. Assigning phonetic symbols to words is called text-to-phoneme (TTP) conversion or grapheme-to-phoneme (GTP) conversion. In the text normalization step, homographs, numbers, abbreviations, and the like in the text may be converted so that they can be spoken.

The phoneme dictionary stores the waveform information of the actual sounds (phonemes) that correspond to the symbolic linguistic representation output by the front end, which it takes as input. The main techniques for generating speech waveforms in the back end are concatenative synthesis and formant synthesis. Concatenative synthesis is basically a method of synthesizing speech by joining fragments of recorded speech.

The TTS tool 30 may perform front-end and back-end processing based on the vocabulary information and sound information stored in the TTS speech synthesis dictionary 50, and generate voice data with additional information 60 that includes the voice data (62 in FIG. 3(A)) corresponding to the input text information 40 of the phrase.

The created voice data with additional information 60 is used by the voice editing tool 104.

Using the voice editing tool 104, the user can edit the voice guidance messages (sentences) that the speech synthesis IC 10 is to speak. The voice editing tool 104 edits sentences (voice guidance messages) based on the voice data with additional information and the editing input information 162. Based on the editing result, it creates the ROM image 152, which is the set of voice data files for the phrases needed to synthesize the voice guidance messages, and the voice playback command 154, which is used to read the voice data files from the ROM image and play back a voice guidance message (sentence).

In the present embodiment, the ROM image 152 is created using voice data without additional information as the voice data corresponding to each phrase. What the TTS tool 102 creates (and what the voice editing tool 104 takes as input) is voice data with additional information, so creating the ROM image from it unchanged would increase the amount of data stored in the ROM. In the present embodiment, therefore, the voice editing tool creates voice data without additional information (see FIG. 3(B)) by removing the additional information from the voice data with additional information, and generates the ROM image from that data, which keeps the data size of the built-in ROM from growing.
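Removing the additional information before building the ROM image can be sketched as below. The sketch assumes, for illustration only, that the header size is a little-endian 16-bit value at byte offset 2 and that it counts the entire header; the function name and sample bytes are hypothetical.

```python
import struct


def strip_additional_info(blob: bytes) -> bytes:
    """Return only the voice data by skipping the additional information header."""
    # Assumed layout: bytes 0-1 header ID, bytes 2-3 header size (little-endian).
    (header_size,) = struct.unpack_from("<H", blob, 2)
    return blob[header_size:]


# Hypothetical file: ID, header size (2 + 2 + 3 + 4 = 11), text "AAA", data size 4.
header = struct.pack("<HH", 0x4149, 11) + b"AAA" + struct.pack("<I", 4)
with_info = header + b"\xde\xad\xbe\xef"

without_info = strip_additional_info(with_info)
saved_bytes = len(with_info) - len(without_info)  # ROM bytes saved for this phrase
```

The saving per phrase equals the header size, so it grows with the length of the phrase text and with the number of phrases stored.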

During editing, the voice editing tool 104 may display sheet screens such as those shown in FIGS. 6 to 25 on the display unit of the PC, and accept the editing input information 162 from the PC keyboard or the like.

The editing screen may display the content of the additional information of the voice data with additional information. For example, the text information of a phrase, which is additional information, may be displayed on the editing screen. The size of the memory storage information (ROM image) to be created may also be calculated from the data size of the voice data, which is additional information, and displayed on the editing screen.

Sentence editing processing may then be performed based on the editing input information 162 and on the voice data and list information stored in the storage unit of the PC, and the ROM image (the memory write information to be written to the voice data memory) 152 and the voice playback command 154 may be generated from the editing result and output. The list information may also be created based on the additional information of the voice data with additional information.

The voice playback command 154 may, for example, consist of the file specifying information (such as file names) of the phrases that make up a sentence, arranged in playback order.

The created ROM image 152 may be stored in the ROM that is the built-in memory of the speech synthesis IC 10 mounted in an electronic device or the like. The speech synthesis IC 10 includes the built-in ROM (nonvolatile storage unit) 20, which stores the ROM image (memory write information) 152 generated by the voice message creation tool 100, and functions as a voice playback unit that receives the voice playback command 154, reads voice data from the built-in ROM (nonvolatile storage unit) 20 based on the received command, and plays back and outputs the voice guidance message corresponding to the sentence. The voice playback command 154 may be received from a host computer (for example, the main control unit of the electronic device).
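The relationship between a voice playback command and the ROM contents can be sketched as follows. This is an illustration under assumptions: the command is modeled as a list of file names in playback order, the ROM as a dictionary from file name to voice data, and "playback" as concatenation of the bytes that would be sent to the decoder. None of these names or representations come from the source.

```python
from typing import Dict, List


def make_playback_command(phrase_files: List[str]) -> List[str]:
    """A command is modeled here as the phrase file names in playback order."""
    return list(phrase_files)


def play_sentence(command: List[str], rom: Dict[str, bytes]) -> bytes:
    """Read each phrase from the ROM in command order and return the stream."""
    return b"".join(rom[name] for name in command)


# Hypothetical ROM contents: each phrase is stored once.
rom = {"aaa.adp": b"\x01\x02", "ccc.adp": b"\x03"}

# A sentence that uses "aaa.adp" twice still needs only one stored copy.
command = make_playback_command(["aaa.adp", "ccc.adp", "aaa.adp"])
stream = play_sentence(command, rom)
```

Keeping the order in the command rather than in the ROM is what lets several sentences share the same stored phrase data.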

Because the built-in ROM (nonvolatile storage unit) 20 of the speech synthesis IC 10 stores the voice data without additional information for each phrase with no duplication, the ROM size can be reduced efficiently.

FIG. 4(A) is a diagram for explaining the voice data corresponding to a phrase and the phrase information.

The voice data 202 is a sound data file that can be played back by an existing voice playback system, and may be a compressed sound file. For example, it may be voice data created by the TTS tool.

The voice data 202 is data that includes the voice data corresponding to a phrase, and may be either voice data with additional information or voice data without additional information. For example, if the voice data without additional information is created when the ROM is written, the voice data with additional information generated by the voice-data-with-additional-information generation unit may be stored in association with the phrase information. If instead the additional information is extracted during editing processing and voice data without additional information is created at that point, the voice data without additional information may be stored in association with the phrase information.

As the phrase information 200, the voice data file name 204 of the file storing the voice data 202 of the phrase (file information for the phrase's voice data), the voice log information 210 corresponding to the voice data 202 of the phrase, and the phrase editing information 220 may be stored in association with a phrase identification ID 206.

The voice log information 210 may include text information 212, which is text data about how the phrase is read. The voice log information 210 may include size information 214 (number of bytes or the like) for the file storing the voice data of the phrase. The voice log information 210 may include playback time information 216 (in ms) for the voice file of the phrase. The voice log information 210 may also include other information not shown, such as TTS parameters and data format information. The voice log information 210 may be generated from additional information (text information, data size information, and the like) extracted from the voice data file with additional information.

The phrase editing information 220 is editing information generated for each phrase based on the result of the sentence editing processing of the present embodiment. The phrase editing information 220 may include usage count information 222, indicating how many times the phrase is used in sentences. The phrase editing information 220 may also include ROM write information 224, which indicates whether the phrase is to be written to the ROM.
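Deriving the per-phrase editing information from the edited sentences might look like the sketch below. The rule that a phrase is written to the ROM exactly when it is used at least once is an assumption for illustration; the source only says that a usage count and a ROM write indication may be held. The key names and phrase IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def phrase_edit_info(
    phrase_ids: List[str], sentences: List[List[str]]
) -> Dict[str, dict]:
    """Compute a usage count and a ROM write flag for every known phrase."""
    counts = Counter(pid for seq in sentences for pid in seq)
    return {
        pid: {"use_count": counts[pid], "rom_write": counts[pid] > 0}
        for pid in phrase_ids
    }


# Hypothetical phrase list and sentences (phrase IDs in playback order).
info = phrase_edit_info(
    ["P_A", "P_B", "P_C"],
    [["P_A", "P_C"], ["P_C", "P_A", "P_C"]],
)
```

With this rule, "P_B" is registered but unused, so it would be left out of the ROM image.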

FIG. 4(B) is a diagram for explaining the sentence information.

The sentence information 240 is information generated based on the result of the sentence editing processing, and may be stored in association with a sentence identification ID 242.

The sentence information 240 may include text information 244 for the phrases that make up the sentence.

The sentence information 240 may include sentence size information 246. The sentence size information 246 may be the total number of bytes of the voice data files of the phrases that make up the sentence. If wait times are held as silent voice data, it may be the total number of bytes including that silent-section data.

The sentence information 240 may include sentence playback time information 248. The sentence playback time information 248 may be the total playback time of the voice files of the phrases that make up the sentence. It may also be the total time including the wait times set before and after phrases and between phrases.

The sentence information 240 may include comment information 250 entered through editing input in connection with the sentence.

The sentence information 240 may include phrase specifying information 254-1 to 254-n for the phrases that make up the sentence. The phrase specifying information 254-1 to 254-n is information that gives access to the file information of the voice data corresponding to each phrase (202 in FIG. 2), and may be, for example, the file name of the voice data file (204 in FIG. 2) or the phrase identification ID (206 in FIG. 3). The phrase specifying information 254-1 to 254-n may be arranged in the playback order of the phrases (so that the index n matches the playback order of the phrases).

The sentence information 240 may include wait time information 252-1 to 252-n, each giving the wait time set before a phrase in the sentence. The wait time information 252-1 to 252-n may be arranged in the playback order of the wait times (so that the index n matches the playback order of the phrases).

Arranging the phrase specifying information 254-1 to 254-n and the wait time information 252-1 to 252-n in playback order allows them to serve as sequence information describing the playback order of the phrases.
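The sentence information described above can be modeled as an ordered list of (wait time, phrase) pairs, from which the sentence size and playback time follow directly. The class names, field names, and sample values below are hypothetical; the sketch counts only phrase bytes toward the size, which corresponds to the case where wait times are not stored as silent voice data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Phrase:
    file_name: str   # phrase specifying information
    size_bytes: int  # size of the phrase voice data file
    play_ms: int     # playback time of the phrase voice file


@dataclass
class Sentence:
    sentence_id: int
    # (wait time in ms before the phrase, phrase ID), listed in playback order;
    # the list order itself acts as the sequence information.
    steps: List[Tuple[int, str]]


def sentence_totals(sentence: Sentence, phrases: Dict[str, Phrase]) -> Tuple[int, int]:
    """Return (total bytes of phrase data, total playback time including waits)."""
    size = sum(phrases[pid].size_bytes for _, pid in sentence.steps)
    time_ms = sum(wait + phrases[pid].play_ms for wait, pid in sentence.steps)
    return size, time_ms


phrases = {
    "A": Phrase("aaa.adp", 1200, 800),
    "C": Phrase("ccc.adp", 900, 600),
}
sentence = Sentence(sentence_id=1, steps=[(0, "A"), (200, "C")])
size, time_ms = sentence_totals(sentence, phrases)
```

In this example the sentence totals 2100 bytes of phrase data and 1600 ms of playback, of which 200 ms is the wait before the second phrase.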

FIGS. 26 and 27(A) to 27(C) are diagrams for explaining the processes performed by the voice message creation tool of the present embodiment.

In the present embodiment, generation processing (P1) may be performed in which a TTS tool or the like generates voice data with additional information for a phrase based on the text information of the phrase. The generated data includes a voice data file for each phrase (which may be a compressed voice file, for example a voice data file in ADPCM or AAC-LC format) and additional information. The additional information may include the data size of the phrase's voice data, the text data corresponding to the phrase, the playback time of the phrase's voice, and the like.

In the present embodiment, phrase list generation processing (P2) may also be performed to generate a phrase list 602 based on phrase data and phrase editing information. A phrase list is a collection of data with a data structure for managing data phrase by phrase; it stores the voice data, voice log information, and phrase editing information of each phrase in association with an identification ID or index that specifies the phrase. A phrase list sheet (an image displaying the phrase list in tabular form) may be generated from the phrase list and output to the display unit.

また本実施の形態ではセンテンスの編集するための編集画面の表示を行う編集画面表示処理（Ｐ５）を行ってもよい。編集画面としてセンテンスリストシートを表示し（センテンスリストシート画面）、センテンスの再生時間やファイルサイズの表示を行い、表示内容をみながらセンテンスの編集が行えるようにしてもよい。 In the present embodiment, an edit screen display process (P5) for displaying an edit screen for editing a sentence may be performed. A sentence list sheet may be displayed as an editing screen (sentence list sheet screen), the sentence playback time and file size may be displayed, and the sentence may be edited while viewing the display contents.

また本実施の形態では、センテンスのテキスト入力を受け付けフレーズに分割するセンテンス分割処理（Ｐ３）を行ってもよい。例えばセンテンスシート画面のセンテンス欄からセンテンスのテキスト入力を受け付け、入力されたセンテンスのテキストをフレーズに分割する処理をおこなうようにしてもよい。 In the present embodiment, sentence division processing (P3) may also be performed, which accepts text input of a sentence and divides it into phrases. For example, text input of a sentence may be accepted from the sentence column of the sentence sheet screen, and the input sentence text may be divided into phrases.

図６（Ａ）〜（Ｃ）は、フレーズリスト作成とフレーズ分割処理の成功例と失敗例を模式的に示した図である。 FIGS. 6A to 6C are diagrams schematically showing successful examples and failed examples of phrase list creation and phrase division processing.

例えば図２７（Ａ）に示すようにフレーズデータ「ＡＡＡ」、「ＢＢＢ」、「ＣＣＣ」を読み込み、これらに基づきフレーズリストを生成する。この場合、図２７（Ｂ）に示すように「ＡＡＡＣＣＣ」なるセンテンスが入力されると、「ＡＡＡ」「ＣＣＣ」というフレーズリストに記憶された２つのフレーズに分割される。分割処理は、センテンスを構成するテキストデータとフレーズに対応するテキストデータを比較照合して、センテンスを構成するテキストデータをフレーズに対応するテキストデータに分ける処理である。センテンスを構成するテキストデータを第１のテキスト部分、第２のテキスト部分、・・、第ｎのテキスト部分に分割した場合、第１のテキスト部分、第２のテキスト部分、・・、第ｎのテキスト部分のすべてがフレーズリストに登録されているフレーズのテキストデータと一致した場合に分割処理が成功したとしてもよい。 For example, as shown in FIG. 27A, phrase data “AAA”, “BBB”, and “CCC” are read, and a phrase list is generated from them. In this case, as shown in FIG. 27B, when the sentence “AAACCC” is input, it is divided into the two phrases “AAA” and “CCC” stored in the phrase list. The division process compares and collates the text data constituting the sentence with the text data corresponding to the phrases, and divides the sentence text into pieces of text data that correspond to phrases. When the text data constituting the sentence is divided into a first text portion, a second text portion, ..., and an nth text portion, the division process may be regarded as successful if all of the first to nth text portions match the text data of phrases registered in the phrase list.

センテンス分割結果はセンテンスリストシートに表示されるようにしてもよい。 The sentence division result may be displayed on the sentence list sheet.

例えば図２７（Ｃ）に示すように「ＡＡＡＣＢＣ」なるセンテンスが入力されると、「ＣＢＣ」というフレーズデータがフレーズリストに登録されていないためセンテンスをフレーズに分割することができない。このような場合にはセンテンス分割結果がセンテンスリストシートに表示されないようにしてもよいし、分割できないことを警告する表示をおこなってもよい。このようにするとセンテンスに誤りがあってフレーズに分割できなかった場合も表示によりすぐにわかるため、すぐに修正が出来る。 For example, when the sentence “AAACBC” is input as shown in FIG. 27C, the sentence cannot be divided into phrases because the phrase data “CBC” is not registered in the phrase list. In such a case, the sentence division result may not be displayed on the sentence list sheet, or a warning may be displayed that the sentence cannot be divided. In this way, even if there is an error in the sentence and it could not be divided into phrases, it can be immediately confirmed by the display, so it can be corrected immediately.
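
The division behaviour described above (success for “AAACCC”, failure for “AAACBC”) can be sketched as follows. This is only an illustrative sketch: the function name `split_sentence` is hypothetical, and the use of backtracking over the phrase list in list order is an assumption, since the text only states that the sentence text is collated with the registered phrase text.

```python
from typing import List, Optional


def split_sentence(sentence: str, phrase_list: List[str]) -> Optional[List[str]]:
    """Split `sentence` into phrases registered in `phrase_list`.

    Returns the list of phrases on success, or None when some text portion
    matches no registered phrase (the failure case of FIG. 27C).
    """
    if not sentence:
        return []
    # Try phrases in list order (assumption: search from the top of the list).
    for phrase in phrase_list:
        if phrase and sentence.startswith(phrase):
            rest = split_sentence(sentence[len(phrase):], phrase_list)
            if rest is not None:
                return [phrase] + rest
    return None


phrases = ["AAA", "BBB", "CCC"]
print(split_sentence("AAACCC", phrases))  # ['AAA', 'CCC']
print(split_sentence("AAACBC", phrases))  # None ("CBC" is not registered)
```

A caller could use the `None` result to suppress the division result on the sentence list sheet or to show the warning mentioned above.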

また本実施の形態では、指定されたフレーズに基づきセンテンスを生成するフレーズ結合処理（Ｐ４）をおこなってもよい。例えばフレーズデータ「ＡＡＡ」と「ＢＢＢ」がこの順序で選択された場合、フレーズデータ「ＡＡＡ」と「ＢＢＢ」をつなぎ合わせてセンテンス「ＡＡＡＢＢＢ」を生成してもよい。 In the present embodiment, phrase combining processing (P4) that generates a sentence from designated phrases may also be performed. For example, when the phrase data “AAA” and “BBB” are selected in this order, the sentence “AAABBB” may be generated by joining the phrase data “AAA” and “BBB”.

また本実施の形態では、生成したセンテンスやフレーズの音声再生を行わせ、再生評価を行う生成評価処理（Ｐ６）を行っても良い。生成評価処理（Ｐ６）は、センテンスを構成するフレーズの特定情報に基づきセンテンスを構成するフレーズに対応した音声データをフレーズデータ記憶部から読み出して、センテンス情報のシーケンス情報に従って読み出した音声データの音声を再生出力する処理を行ってもよい。またセンテンス情報の待ち時間情報に基づき、フレーズの前又はフレーズ間に無音区間を設定して音声データの音声を再生出力をおこなってもよい。 In the present embodiment, generation evaluation processing (P6) may also be performed, in which the generated sentence or phrase is played back as audio and the playback is evaluated. In the generation evaluation processing (P6), the voice data corresponding to the phrases constituting the sentence may be read from the phrase data storage unit based on the phrase specifying information of those phrases, and the read voice data may be played back and output in accordance with the sequence information of the sentence information. In addition, based on the waiting time information of the sentence information, a silent section may be set before a phrase or between phrases when the voice data is played back and output.

また本実施の形態では、フレーズの前又はフレーズ間に遅延時間を設定し、フレーズ間隔の調整を行うフレーズ間隔の調整処理（Ｐ７）を行ってもよい。フレーズ間隔の調整処理（Ｐ７）として、センテンスを構成するフレーズの前及びフレーズ間の少なくとも１つについて設定する無音区間の長さに関する待ち時間情報に関する編集入力を受け付け、待ち時間情報を含むセンテンス情報の生成を行ってもよい。 In the present embodiment, phrase interval adjustment processing (P7) may also be performed, which sets a delay time before a phrase or between phrases to adjust the phrase interval. As the phrase interval adjustment processing (P7), an edit input concerning the waiting time information, which relates to the length of the silent section set for at least one of the positions before and between the phrases constituting the sentence, may be accepted, and sentence information including the waiting time information may be generated.

また本実施の形態では、作成したセンテンスを発話させるために必要な音声データをメモリに格納する際のＲＯＭイメージ（ＲＯＭに格納するデータの内容）を生成するＲＯＭイメージ生成処理（Ｐ９）をおこなってもよい。ＲＯＭイメージ生成処理（Ｐ９）では、フレーズ編集情報に基づき音声データメモリに格納する格納対象フレーズを抽出し、抽出されたフレーズの音声データをフレーズデータ記憶部から読み出して、音声データメモリに書き込むメモリ書き込み情報（ＲＯＭイメージ）を生成してもよい。このようにすると複数のセンテンスで使用されているフレーズについては同じ音声データが重複して書き込まれないようにメモリ書き込み情報（ＲＯＭイメージ）を生成することができる。 In the present embodiment, ROM image generation processing (P9) may also be performed, which generates a ROM image (the contents of the data to be stored in the ROM) used when the voice data needed to utter the created sentences is stored in the memory. In the ROM image generation processing (P9), the storage target phrases to be stored in the voice data memory may be extracted based on the phrase editing information, the voice data of the extracted phrases may be read from the phrase data storage unit, and the memory write information (ROM image) to be written to the voice data memory may be generated. In this way, the memory write information (ROM image) can be generated so that, for a phrase used in a plurality of sentences, the same voice data is not written more than once.
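
One way to picture the duplicate-free ROM image is the following sketch. The names `build_rom_image` and `layout`, the byte-concatenation layout, and the sample data are all assumptions made for illustration; the text does not specify the actual ROM image format.

```python
def build_rom_image(sentences, phrase_audio):
    """Concatenate the voice data of every phrase used by `sentences`,
    writing each phrase only once.

    sentences:    list of sentences, each a list of phrase IDs
    phrase_audio: dict mapping phrase ID -> voice data bytes
    Returns (image_bytes, layout), where layout maps phrase ID -> (offset, length).
    """
    image = bytearray()
    layout = {}
    for phrase_ids in sentences:
        for pid in phrase_ids:
            if pid in layout:
                continue  # already stored: do not write the same data twice
            data = phrase_audio[pid]
            layout[pid] = (len(image), len(data))
            image += data
    return bytes(image), layout


# Hypothetical data: "AAA" is shared by two sentences but stored once.
audio = {"AAA": b"\x01\x02", "BBB": b"\x03", "CCC": b"\x04\x05\x06"}
image, layout = build_rom_image([["AAA", "CCC"], ["AAA", "BBB"]], audio)
print(len(image), layout)
```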

また本実施の形態では、センテンス音声を合成するためにＲＯＭイメージから読み出す音声データとその再生順序を指示するセンテンス音声再生コマンドを生成するセンテンス音声再生コマンド生成処理（Ｐ９）をおこなってもよい。センテンス音声再生コマンド生成処理（Ｐ９）では、センテンス情報のフレーズ特定情報に基づきセンテンスを構成するフレーズに対応した音声データを音声データメモリに格納されたメモリ書き込み情報（ＲＯＭイメージ）から読み出して、センテンス情報のシーケンス情報に従って読み出した音声データの音声を再生出力するための指示を行うセンテンス音声再生コマンドを生成してもよい。 In the present embodiment, sentence voice reproduction command generation processing (P9) may also be performed, which generates a sentence voice reproduction command specifying the voice data to be read from the ROM image and its playback order in order to synthesize the sentence voice. In this processing (P9), a sentence voice reproduction command may be generated that instructs the voice data corresponding to the phrases constituting the sentence to be read, based on the phrase specifying information of the sentence information, from the memory write information (ROM image) stored in the voice data memory, and the read voice data to be played back and output in accordance with the sequence information of the sentence information.
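
As a rough illustration of what such a command might carry, the sketch below turns sentence information (waiting time and phrase ID, in playback order) into a list of read instructions against a ROM layout. The field names `wait_ms`, `offset`, and `length`, and the function name `make_play_command`, are hypothetical; the text does not define the command format.

```python
def make_play_command(sentence_info, layout):
    """Build a playback command for one sentence.

    sentence_info: list of (wait_ms, phrase_id) tuples in playback order
    layout:        dict mapping phrase_id -> (offset, length) in the ROM image
    """
    command = []
    for wait_ms, pid in sentence_info:
        offset, length = layout[pid]
        # Each entry: silence to insert, then the region of the ROM image to play.
        command.append({"wait_ms": wait_ms, "offset": offset, "length": length})
    return command


layout = {"AAA": (0, 2), "CCC": (2, 3)}  # hypothetical ROM layout
cmd = make_play_command([(100, "AAA"), (50, "CCC")], layout)
print(cmd)
```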

２．表計算ソフトウエアを用いた編集の具体例 2. Examples of editing using spreadsheet software

図５〜図２５は、表計算アプリケーションソフトウエアを用いて本ツールを実現する例について説明するための図である。 FIGS. 5 to 25 are diagrams for explaining an example in which this tool is realized by using spreadsheet application software.

本ツールは汎用の表計算ソフトウエアのマクロ機能を利用して実現することもできる。 This tool can also be realized using the macro function of general-purpose spreadsheet software.

本ツールでは、ＴＴＳシステム等で音声作成ツールにて作成された音声データをフレーズとして取り扱い、フレーズを繋いで音声ガイドメッセージとなるセンテンスを編集する。音声データとしては、ＡＤＰＣＭ形式、ＡＡＣ−ＬＣ形式等を用いてもよい。 In this tool, voice data created by a voice creation tool in a TTS system or the like is handled as a phrase, and a sentence that becomes a voice guide message is edited by connecting phrases. As the audio data, an ADPCM format, an AAC-LC format, or the like may be used.

本ツールの各機能について説明する。本ツールの画面は、図５に示すようなダイアログ画面と図６や図７に示すような表形式シート画面から構成されている。 Each function of this tool will be described. The screen of this tool is composed of a dialog screen as shown in FIG. 5 and a tabular sheet screen as shown in FIGS.

ダイアログ画面４００のシート選択部４１０において、センテンスリスト４１２、フレーズリスト４１４、パラメータ４１６のラジオボタンを選択することによりセンテンスリスト４１２、フレーズリスト４１４、パラメータ４１６のシートの表示を切り替えることができる。 By selecting radio buttons of the sentence list 412, the phrase list 414, and the parameter 416 in the sheet selection unit 410 of the dialog screen 400, the display of the sheets of the sentence list 412, the phrase list 414, and the parameter 416 can be switched.

フレーズリスト４２０のアップデートボタン４２２を選択すると、音声データが入っているフォルダから、フレーズリストシートへ、編集に必要な音声データを読み込み、フレーズ一覧を作成する。 When the update button 422 of the phrase list 420 is selected, audio data necessary for editing is read from a folder containing audio data into a phrase list sheet, and a phrase list is created.

ツール４３０のフレーズ分割ボタン４３２を選択すると、センテンスシートのセンテンス欄に入力されたセンテンスを、フレーズリストにあるフレーズ一覧から一致するフレーズを選び出して並べ、センテンスを構成するフレーズの構成を作成する。 When the phrase split button 432 of the tool 430 is selected, phrases matching the sentence entered in the sentence column of the sentence sheet are picked out from the list of phrases in the phrase list and arranged, creating the phrase composition of the sentence.

センテンス作成ボタン４３４を選択すると、フレーズ欄にあるフレーズをフレーズリストにあるフレーズと照合して一致するものをセンテンス欄に並べてセンテンスを構成する。 When the sentence creation button 434 is selected, the phrases in the phrase columns are collated with the phrases in the phrase list, and the matching ones are arranged in the sentence column to form a sentence.

ツール４３０のプレイボタン４３６を選択すると、センテンス欄からフレーズの内容を再生する。 When the play button 436 of the tool 430 is selected, the contents of the phrase are reproduced from the sentence column.

センテンスリストＲＯＭ４４０のライトアウトボタン４４２を選択すると、センテンス一覧からシーケンスファイル（センテンス生成情報が記憶されたファイルやセンテンス音声再生コマンドが記憶されたファイル）やＲＯＭイメージを生成する。なおシート上のすべてのセンテンスに対してシーケンスファイルが保存される。 When the write-out button 442 of the sentence list ROM 440 is selected, sequence files (files storing sentence generation information or files storing sentence voice reproduction commands) and a ROM image are generated from the sentence list. A sequence file is saved for every sentence on the sheet.

ＲＯＭイメージを生成する場合には、センテンスに使用されているフレーズは自動的にＲＯＭイメージに含むが、センテンスに使用されていないフレーズであっても、指定によりＲＯＭイメージに含ませる事が出来る。 When a ROM image is generated, phrases used in the sentence are automatically included in the ROM image, but even phrases that are not used in the sentence can be included in the ROM image by designation.

またライトアウトボタン４４２を選択することにより、センテンス一覧のシーケンスファイルが作成された後、トータルサイズ欄４４４に、データサイズの合計値が表示される。パラメータのＲＯＭイメージサイズで指定したサイズを上回った場合は赤字で表示される。本実施の形態では同じフレーズを複数のセンテンスに使用していても、１つ分のフレーズデータサイズしか、加算されない。この値は、保持され、次回起動時にも表示されるようにしてもよい。 When the write-out button 442 is selected, the total data size is displayed in the total size column 444 after the sequence files for the sentence list have been created. If the total exceeds the size specified by the ROM image size parameter, it is displayed in red. In the present embodiment, even if the same phrase is used in a plurality of sentences, its data size is added only once. This value may be retained and displayed again at the next startup.

またライトアウトボタン４４２を選択することにより、センテンス一覧のシーケンスファイルが作成された後、トータルタイム欄４４６に、センテンスの合計再生時間が表示される。 Likewise, when the write-out button 442 is selected, the total playback time of the sentences is displayed in the total time column 446 after the sequence files for the sentence list have been created.

本実施の形態では、センテンスリストシート、フレーズリストシート、パラメータシートの各シートを含む。 In the present embodiment, a sentence list sheet, a phrase list sheet, and a parameter sheet are included.

図６はセンテンスリストシート（表示画像）の一例である。本実施の形態では例えばパーソナルコンピュータの表示部にセンテンスリストシート５００が表示されるようにしてもよい。そしてセンテンスリストシート画面でセンテンスが編集された結果によってセンテンス情報やフレーズ編集情報が生成（作成や更新）されてもよい。 FIG. 6 is an example of a sentence list sheet (display image). In the present embodiment, for example, the sentence list sheet 500 may be displayed on the display unit of a personal computer. Sentence information and phrase editing information may then be generated (created or updated) according to the result of editing sentences on the sentence list sheet screen.

インデックス欄５１０は、センテンスにアクセスする際のインデックス情報である。 The index column 510 is index information for accessing a sentence.

ＩＤ欄５２０は、センテンスの識別情報となるＩＤである。シーケンスファイルを作成するときに、このＩＤでファイル名を生成するようにしてもよい。 The ID column 520 is an ID that serves as sentence identification information. When creating a sequence file, a file name may be generated with this ID.

センテンス欄５３０は、再生されるセンテンスの内容である。フレーズを並べてセンテンスを作成することも可能であり、キーボード等を用いて直接センテンスを入力することも可能である。サイズ欄５４０は、センテンスを構成するフレーズと待ち時間の合計データサイズを表示する。タイム欄５５０は、センテンスの再生時間を表示する。コメント欄５６０はセンテンスに対するコメント欄として活用できる。待ち時間欄５７０はフレーズの前に設定する待ち時間を設定する欄であり、各フレーズに対応して設けられる。 The sentence column 530 is a content of a sentence to be reproduced. It is also possible to create a sentence by arranging phrases, and it is also possible to directly input a sentence using a keyboard or the like. The size column 540 displays the total data size of phrases and waiting times that constitute a sentence. The time column 550 displays the sentence playback time. The comment field 560 can be used as a comment field for a sentence. The waiting time column 570 is a column for setting a waiting time set before a phrase, and is provided corresponding to each phrase.

フレーズ欄５８０はセンテンスを構成するフレーズを表示する欄で例えば最大６４個まで表示できるようにしてもよい。なお各フレーズに対応して待ち時間欄５７０が設けられるので、フレーズ欄がｎ個表示されている場合には、各フレーズ毎に待ち時間欄が設けられるので待ち時間欄ｎ個表示されている。 The phrase column 580 is a column for displaying phrases constituting a sentence, and for example, a maximum of 64 phrases may be displayed. Since a waiting time column 570 is provided corresponding to each phrase, when n phrase columns are displayed, a waiting time column is provided for each phrase, so that n waiting time columns are displayed.

図７はフレーズリストシート（表示画像）の一例である。本実施の形態では例えばパーソナルコンピュータの表示部にフレーズリストシート６００が表示されるようにしてもよい。本実施の形態では、読み込んだ音声作成ツールの音声データを、フレーズとして管理する。フレーズリストシート６００は、フレーズデータやフレーズ編集情報に基づき生成され、編集結果に応じて更新さされるようにしてもよい。 FIG. 7 is an example of a phrase list sheet (display image). In the present embodiment, for example, phrase list sheet 600 may be displayed on the display unit of a personal computer. In this embodiment, the read voice data of the voice creation tool is managed as a phrase. The phrase list sheet 600 may be generated based on phrase data or phrase editing information, and may be updated according to the editing result.

インデックス欄６１０は、フレーズにアクセスする際のインデックス情報である。ＩＤ欄６２０は、フレーズの識別情報となるＩＤであり、読み込まれた音声データに対するフレーズの管理ＩＤとして使用される。ファイルネーム欄６３０には、フレーズとして読み込まれた音声データのファイル名が表示される。フレーズ欄には、フレーズとして読み込まれた音声データの内容（例えば読み方や表示）を表すテキストデータ表示される。サイズ欄６５０には、フレーズとして読み込まれた音声データのサイズが表示される。タイム欄６６０には、フレーズとして読み込まれた音声データの再生時間（ｍｓ単位）が表示される。使用回数欄６７０には、フレーズとして読み込まれた音声データが作成されたセンテンス全体で、何回使われているか表示される。ＲＯＭ書き込み欄６８０に、値が書かれている場合はライトアウトボタン（図５の４４２）を押した際にそのフレーズがＲＯＭイメージの中に含まれる。なおセンテンスに使用されているフレーズについては、ライトアウト実行の際に自動的に１の値が書かれ、使用されていないフレーズについては、自動的に空欄となるようにしてもよい。またこの欄にあらかじめ１以外の任意の値を書き込んでおいた場合には、使用されている／使用されていないに関わらずＲＯＭイメージに含めるようにしてもよい。 The index column 610 holds index information used to access a phrase. The ID column 620 holds an ID serving as phrase identification information, used as the phrase management ID for the read voice data. The file name column 630 displays the file name of the voice data read as a phrase. The phrase column displays text data representing the content (for example, the reading or the display text) of the voice data read as a phrase. The size column 650 displays the size of the voice data read as a phrase. The time column 660 displays the playback time (in ms) of the voice data read as a phrase. The number-of-uses column 670 displays how many times the voice data read as a phrase is used across all created sentences. When a value is written in the ROM write column 680, the phrase is included in the ROM image when the write-out button (442 in FIG. 5) is pressed. For a phrase used in a sentence, the value 1 may be written automatically when write-out is executed, and for an unused phrase the column may be cleared automatically. If an arbitrary value other than 1 has been written in this column in advance, the phrase may be included in the ROM image regardless of whether it is used.
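
The rule for the ROM write column can be summarized in a short sketch. The function names and the representation of the column as a string (empty string for a blank cell) are assumptions for illustration only.

```python
def update_rom_flag(rom_flag: str, use_count: int) -> str:
    """Return the ROM write column value after write-out is executed.

    - A value other than "1" written in advance is kept as is, so the phrase
      stays in the ROM image whether or not it is used.
    - Otherwise the column becomes "1" for a used phrase and blank for an
      unused one.
    """
    if rom_flag not in ("", "1"):
        return rom_flag
    return "1" if use_count > 0 else ""


def included_in_rom(rom_flag: str) -> bool:
    """A phrase goes into the ROM image when its ROM write column is not blank."""
    return rom_flag != ""


print(update_rom_flag("", 2), update_rom_flag("1", 0), update_rom_flag("O", 0))
```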

図８はパラメータシート（表示画像）の一例である。本実施の形態では例えばパーソナルコンピュータの表示部にパラメータシート７００が表示されるようにしてもよい。ここでは、本ツールに関するパラメータを設定することができる。 FIG. 8 shows an example of a parameter sheet (display image). In the present embodiment, for example, the parameter sheet 700 may be displayed on a display unit of a personal computer. Here you can set parameters for this tool.

待ち時間デフォルト値７１０は、センテンス作成時に、フレーズの前又はフレーズ間に設定されるデフォルトの待ち時間（単位はｍｓ）である。センテンスシートにおいて待ち時間（図６の５７０）が設定されている場合はその値を優先するようにしてもよい。 The waiting time default value 710 is a default waiting time (unit: ms) set before a phrase or between phrases when a sentence is created. When a waiting time (570 in FIG. 6) is set in the sentence sheet, the value may be prioritized.

シーケンスファイルフォーマット７２０は、シーケンスファイル生成において、出力するデータをバイナリ形式にするか、テキスト形式にするかを設定するパラメータである。テキスト形式にすると、各行に待ち時間値とフレーズの音声ファイル名称をカンマで区切って並べたテキストファイルを出力するようにしてもよい。 The sequence file format 720 is a parameter for setting whether to output data to be in binary format or text format in sequence file generation. In the text format, a text file in which a waiting time value and a phrase audio file name are separated by commas may be output on each line.
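
For the text format, a minimal sketch of the output could look like the following. The function name `sequence_text`, the sample file names, and the newline convention are assumptions; only the "waiting time value, voice file name" layout per line comes from the description above.

```python
def sequence_text(sentence_info):
    """Render one sentence as text: one line per phrase, with the waiting
    time value and the phrase voice file name separated by a comma."""
    return "".join(f"{wait_ms},{file_name}\n" for wait_ms, file_name in sentence_info)


# Hypothetical file names for the phrases of one sentence.
text = sequence_text([(100, "ofurono.wav"), (100, "ondowa.wav"), (100, "41do.wav")])
print(text, end="")
```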

編集ツールでの作業手順について説明する。 The work procedure using the editing tool will be described.

まずフレーズに対応した音声データの準備を行う。フレーズ編集を行うための音声データを、フォルダの中にまとめる。 First, audio data corresponding to the phrase is prepared. Collect audio data for phrase editing in a folder.

次にフレーズに対応した音声データの読み込みを行う。本ツールのアップデート画面において、アップデートボタン（図５の４２２）を選択すると、フォルダ選択ダイアログが表示されるので、フレーズに音声データが入っているフォルダを選択し、ＯＫボタンを選択すると、ファイル選択ダイアログが閉じ、表示がフレーズリストシートに自動的に切り替わるようにしてもよい。フレーズに対応した音声データが読み込まれ、図９に示すようにフレーズリストシートにフレーズ一覧が作成される。 Next, the audio data corresponding to the phrase is read. When the update button (422 in FIG. 5) is selected on the update screen of this tool, a folder selection dialog is displayed. Select a folder containing audio data in the phrase and select the OK button to select the file selection dialog. May be closed and the display may be automatically switched to the phrase list sheet. Audio data corresponding to the phrase is read, and a phrase list is created on the phrase list sheet as shown in FIG.

次にセンテンスシート画面において音声ガイドメッセージとして発話させたいセンテンスを作成する。ダイアログ画面からセンテンスリストシートを選択してセンテンスを作成することができる。センテンスの作成方法には、センテンスを直接入力する方法と、読み込んだフレーズをつなげていく方法がある。センテンスを直接入力する場合、作成したいセンテンスをセンテンスリストのセンテンス欄に入力する。例えば図１０に示すようにセンテンスリスト５００のセンテンス欄５３０に「お風呂の温度は４１度です」という文書を入力する。 Next, on the sentence sheet screen, the sentence to be uttered as a voice guide message is created. A sentence can be created by selecting the sentence list sheet from the dialog screen. There are two ways to create a sentence: entering the sentence directly, and joining phrases that have been read in. To enter a sentence directly, type the sentence to be created into the sentence column of the sentence list. For example, as shown in FIG. 10, the text “お風呂の温度は４１度です” (“The bath temperature is 41 degrees”) is entered in the sentence column 530 of the sentence list 500.

入力したセンテンスを構成するフレーズは、フレーズリストにデータとして読み込まれている必要がある。この例では、図１１に示す「お風呂の」８０４、「温度は」８０６、「４１度です」８０２のデータがセンテンスを構成するフレーズとして使用される。 Phrases that make up the sentence you entered must be loaded as data in the phrase list. In this example, “bath” 804, “temperature” 806, and “41 degrees” 802 data shown in FIG. 11 are used as phrases constituting the sentence.

入力後、ダイアログ画面のフレーズ分割ボタンを選択すると、センテンスリストシート５００のセンテンス欄に入力されたセンテンス８１０が、複数のフレーズに展開される。結果として、フレーズリストシートに読み込まれているフレーズから、適切なフレーズデータが選択され、図１２に示すように選択されたフレーズデータがフレーズリストのフレーズ欄８１２、８１４、８１６に表示される。 When the phrase split button on the dialog screen is selected after the input, the sentence 810 input in the sentence column of the sentence list sheet 500 is expanded into a plurality of phrases. As a result, appropriate phrase data is selected from the phrases read in the phrase list sheet, and the selected phrase data is displayed in the phrase columns 812, 814, and 816 of the phrase list as shown in FIG.

なお各フレーズ欄８１２、８１４、８１６に対応した待ち時間欄８１１，８１３，８１５にはデフォルト値として１００（ms）が設定されている。この設定はセンテンスリスト画面への入力により変更することができる。 Note that 100 (ms) is set as a default value in the waiting time fields 811, 813, and 815 corresponding to the phrase fields 812, 814, and 816. This setting can be changed by inputting to the sentence list screen.

読み込んだフレーズをつなげる場合には、センテンスリストのフレーズ欄に、フレーズをテキスト入力していく。入力し終わったら、ダイアログのセンテンス作成ボタンを選択する。例として、「お風呂の温度は４１度です」の「４１」の部分を、「４１」から「４９」まで変更したフレーズを作成する（図１３参照）。この例では、下記のセンテンス「」部分の音声を変更することで、センテンスのバリエーションを作成する。 To join phrases that have been read in, type the phrases as text into the phrase columns of the sentence list. When input is finished, select the sentence creation button in the dialog. As an example, sentences are created in which the “41” part of “The bath temperature is 41 degrees” is varied from “41” to “49” (see FIG. 13). In this example, sentence variations are created by changing the audio of the part enclosed in 「」 in the following sentence.

おふろのおんどはよんじゅう「いちど」です Ofuro no ondo wa yonjuu “ichido” desu (The bath temperature is forty-“one degrees”)
この例では、図１４のフレーズリストのフレーズ群８２０に示すデータがセンテンスを構成するフレーズとして使用される。 In this example, the data shown in the phrase group 820 of the phrase list in FIG. 14 is used as a phrase constituting the sentence.

まず、基本となるセンテンスをセンテンスシートに入力する。例えば図１５の８３０に示すように「お風呂の温度は４０１度です」と入力する。文書的には「よんひゃくいちど」と読めるが、音声的には「よんじゅういちど」と再生される。このようにすると、少ないデータ構成でのバリエーションに富んだ作成が可能となる。 First, the base sentence is entered in the sentence sheet. For example, as shown at 830 in FIG. 15, “お風呂の温度は４０１度です” is entered. As written text this can be read as “yonhyaku ichido” (401 degrees), but as audio it is played back as “yonjuu ichido” (41 degrees). In this way, a rich set of variations can be created from a small amount of data.

ダイアログのフレーズ分割ボタンを選択すると、８３２に示すように入力されているセンテンスがフレーズに展開される。そして展開されたフレーズ８３２を選択し、コピーして、複数のフレーズ群８３４を生成する。そして８３６に示すようにフレーズ「１度」の部分を「２度」〜「９度」に変更する。 When the phrase split button in the dialog is selected, the entered sentence is expanded into phrases as shown at 832. The expanded phrases 832 are then selected and copied to generate a plurality of phrase groups 834. Then, as shown at 836, the phrase “1 degree” is changed to “2 degrees” through “9 degrees”.

そして図１６に示すように、作成するセンテンスのフレーズ欄８４０を選択し、ダイアログのセンテンス作成ボタンを選択すると、結果として、８４２に示すようにセンテンス欄にフレーズを結合させた結果が入る。他のフレーズの欄を選択し、ダイアログのセンテンス作成ボタンを選択していくことで、図１７の８４４に示すように他のセンテンスも作成することができる。 Then, as shown in FIG. 16, when the sentence phrase field 840 to be created is selected and the sentence creation button in the dialog is selected, the result of combining phrases into the sentence field is entered as shown at 842. By selecting another phrase field and selecting a sentence creation button in the dialog, another sentence can be created as shown at 844 in FIG.

センテンスが完成しなかった場合は、図１８の８５０に示すようにセンテンス欄の文字がグレー表示されるので、作成に必要なフレーズを音声作成ツールにて作成し、フレーズを追加する等の対処を行う。 If the sentence is not completed, the text in the sentence column will be grayed out as shown at 850 in FIG. 18, so create a phrase necessary for creation with the voice creation tool and add a phrase. Do.

センテンスを作成すると、フレーズは用意できているのに、うまく選択されず、センテンスが未完成になる場合がある。その場合の回避方法として以下のような構成を採用してもよい。 When you create a sentence, the phrase is ready, but it is not selected properly, and the sentence may be incomplete. In such a case, the following configuration may be adopted as an avoidance method.

例えばセンテンス中に区切りを明示するようにしてもよい。図１９の８６０に示すように「４０１度」と入力すると、「よんひゃくいちど」と読ませるのか、「よんじゅういちど」と読ませるのか、はっきりしなくなる場合がある。そのような場合、センテンス中に半角スペースを挿入することで、区切りとして指定することができる。図１９の８６２に示すように明示的に「40１度」と入力することで、「よんじゅういちど」と読ませることができる。 For example, breaks may be marked explicitly in the sentence. When “４０１度” is entered as shown at 860 in FIG. 19, it may be unclear whether it should be read as “yonhyaku ichido” (401 degrees) or “yonjuu ichido” (41 degrees). In such a case, a half-width space can be inserted in the sentence to designate a break. By explicitly marking the break after “40”, as shown at 862 in FIG. 19, the text is read as “yonjuu ichido”.

また同じ読み方の違うフレーズを使用するようにしてもよい。センテンスで使用されるフレーズは、フレーズリストの先頭から見つけるため、図２０の８７０、８７２のように同じ読み方のフレーズが複数ある場合、うまく自動選択されない場合がある。このような場合音声データファイルの一部に対し、テキストエディタでの編集をおこなうようにしてもよい。例えば選択させたいフレーズの音声ファイルの付属情報が記憶されたファイルを選択し、テキストエディタで開き、編集するようにしてもよい。図２１（Ａ）（Ｂ）に示すように音声ファイルの付属情報が記憶されたファイルには、カンマで区切られた３つの文字列（テキスト表示情報８９２、データサイズ８９４、再生時間８９６）が入っている。このテキスト表示情報８９２の文字列に記号(この例では「＊」）を追加する。 Different phrases that share the same reading may also be used. Because the phrase used in a sentence is searched for from the top of the phrase list, when there are several phrases with the same reading, as at 870 and 872 in FIG. 20, the intended one may not be selected automatically. In such a case, part of the voice data file may be edited with a text editor. For example, the file storing the attached information of the voice file of the phrase to be selected may be opened in a text editor and edited. As shown in FIGS. 21A and 21B, the file storing the attached information of a voice file contains three character strings separated by commas (text display information 892, data size 894, and playback time 896). A symbol (“＊” in this example) is added to the character string of the text display information 892.

そして再度、フレーズリストシートの一覧を更新すると、図２２の８９８に示すようにフレーズリストのフレーズ欄が変更した内容に更新されている。 When the list of the phrase list sheet is updated again, the phrase column of the phrase list is updated to the changed content as indicated by 898 in FIG.

そしてセンテンスリストにて、センテンスに対し、追加した記号を記入して入力して、センテンス分割ボタンを選択すると、図２３に示すようにテキスト表示情報が変更された音声が、フレーズとして選択される。 Then, in the sentence list, when an added symbol is entered and input to the sentence and the sentence division button is selected, the voice whose text display information is changed is selected as a phrase as shown in FIG.
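
The manual edit of the attached-information file could also be expressed programmatically, as in the sketch below. It assumes the three comma-separated fields appear on a single line in the order text display information, data size, playback time, and that the symbol is appended at the end of the text field; both the position of the symbol and the sample values are assumptions. The text uses a full-width “＊”; the sketch defaults to an ASCII "*" for simplicity.

```python
def add_marker(info_line: str, marker: str = "*") -> str:
    """Append `marker` to the text display field of an attached-info line.

    info_line is assumed to look like "<text>,<data size>,<playback time>".
    """
    text, size, time_ms = info_line.strip().split(",")
    return ",".join([text + marker, size, time_ms])


# Hypothetical attached-info line for a phrase that should become selectable
# by typing "ichido*" in the sentence column.
print(add_marker("ichido,1234,250"))
```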

また同じ部分が重複するフレーズが複数ある場合に、センテンスで使用されるフレーズを、フレーズリストの先頭から見つけると、所望のものが自動選択されない場合がある。このように同じ部分が重複するフレーズがある場合、センテンス上において、フレーズの区切りを明確にすることにより、適切なフレーズを選択させるようにしてもよい。例えば図２４（Ａ）に示すように「電源を」を含むフレーズが複数ある場合（９０２、９０４参照）、９０６に示すように「お風呂の電源を切ってください」と入力すると９０４の「電源を切って」がフレーズとして選択される場合がある。このような場合図２４（Ｂ）の９０８に示すように、センテンスリストのセンテンス欄で、区切りの場所に区切り記号として半角スペース９１０を挿入するようにしてもよい。この例では「電源を」の後ろに半角スペース９１０を挿入している。このようにしてフレーズ分割ボタンを選択すると、区切りに対応したフレーズを選択させることができる。 When several phrases share a common part and the phrase used in a sentence is searched for from the top of the phrase list, the desired phrase may not be selected automatically. When such overlapping phrases exist, the appropriate phrase may be selected by making the phrase boundaries explicit in the sentence. For example, when there are several phrases containing “電源を” (“the power”) as shown in FIG. 24A (see 902 and 904), entering “お風呂の電源を切ってください” (“Please turn off the bath power”) as shown at 906 may cause “電源を切って” of 904 to be selected as a phrase. In such a case, as shown at 908 in FIG. 24B, a half-width space 910 may be inserted as a delimiter at the break position in the sentence column of the sentence list. In this example, the half-width space 910 is inserted after “電源を”. When the phrase split button is then selected, the phrases corresponding to the breaks are selected.
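
The effect of the half-width space can be illustrated with a small sketch in which the sentence is first cut at the delimiters and each chunk is then matched against the phrase list, so that no phrase can span a delimiter. The function names, the matching strategy, and the phrase list contents are assumptions for illustration; the actual phrases of FIG. 24 are not reproduced here.

```python
from typing import List, Optional


def split_chunk(text: str, phrase_list: List[str]) -> Optional[List[str]]:
    """Match `text` against phrases in list order; None if it cannot be covered."""
    if not text:
        return []
    for phrase in phrase_list:
        if phrase and text.startswith(phrase):
            rest = split_chunk(text[len(phrase):], phrase_list)
            if rest is not None:
                return [phrase] + rest
    return None


def split_with_delimiters(sentence: str, phrase_list: List[str]) -> Optional[List[str]]:
    """Treat each half-width space as a phrase boundary, then split each chunk."""
    result: List[str] = []
    for chunk in sentence.split(" "):
        parts = split_chunk(chunk, phrase_list)
        if parts is None:
            return None
        result.extend(parts)
    return result


# Hypothetical phrase list in which "電源を切って" precedes "電源を".
phrases = ["お風呂の", "電源を切って", "電源を", "切ってください", "ください"]
print(split_with_delimiters("お風呂の電源を切ってください", phrases))
print(split_with_delimiters("お風呂の電源を 切ってください", phrases))
```

Without the delimiter the earlier entry "電源を切って" is picked; with the space after "電源を" the split follows the marked boundary.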

次にセンテンスシート画面において、センテンスの調整を行う。 Next, the sentence is adjusted on the sentence sheet screen.

センテンス作成後、各フレーズの間隔を調整することができる。作成したセンテンスの各遅延時間欄にｍｓ単位の時間を入力することで、フレーズの前に設定される無音区間の長さを設定することができる。センテンスの調整結果は、ダイアログの再生ボタンを選択することによって、センテンスを発話させ、音で確認することができる。 After the sentence is created, the interval between each phrase can be adjusted. By inputting the time in ms unit in each delay time column of the created sentence, the length of the silent section set before the phrase can be set. The sentence adjustment result can be confirmed by sound by selecting the playback button in the dialog to utter the sentence.

次に作成したセンテンスのシーケンスファイル（例えばセンテンス音声再生コマンド等）とＲＯＭイメージを生成する。ダイアログ画面のライトアウトボタンを選択すると、シーケンスファイル（例えばセンテンス音声再生コマンド等）やＲＯＭイメージを生成することができる。生成されたファイルサイズの合計が、サイズ合計の欄（図５の４４４）に表示される。 Next, the sequence files (for example, sentence voice reproduction commands) and the ROM image for the created sentences are generated. When the write-out button on the dialog screen is selected, the sequence files (for example, sentence voice reproduction commands) and the ROM image are generated. The total size of the generated files is displayed in the total size column (444 in FIG. 5).

ＲＯＭイメージを生成する場合には、フレーズリストのＲＯＭ書き込み欄６８０が空欄でないフレーズがＲＯＭイメージの中に納められる。 When generating a ROM image, phrases whose ROM write column 680 in the phrase list is not blank are stored in the ROM image.
この例ではＲＯＭ書き込み欄６８０の１０列目と１１列目に「Ｏ」が入っている事で（９１２、９１４参照）、使用回数欄６７０の回数値のあり／なしに関わらずＲＯＭイメージに収めることができる。 In this example, “O” is entered in the 10th and 11th columns of the ROM write column 680 (see 912 and 914), so these phrases can be stored in the ROM image regardless of whether the number-of-uses column 670 contains a count value.

本ツールではフレーズリスト情報に基づきのＲＯＭに書き込むフレーズの音声ファイルを合計して合計値を求め、合計値に基づきＲＯＭサイズの参考値を決定する。合計値自体をＲＯＭサイズの参考値として決定しても良いし、合計値とＲＯＭサイズ参考値の対応関係を定めておいて、対応関係に基づきサイズ参考値を決定してもよい。 In this tool, based on the phrase list information, the sizes of the voice files of the phrases to be written to the ROM are summed to obtain a total value, and a reference value for the ROM size is determined from that total. The total value itself may be used as the ROM size reference value, or a correspondence between total values and ROM size reference values may be defined in advance and the reference value determined from that correspondence.
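
A minimal sketch of the second option (a predefined correspondence) follows. The candidate ROM sizes in `ROM_SIZES`, the "smallest candidate that fits" rule, and the row representation are all assumptions chosen for illustration; the text does not state what correspondence is used.

```python
# Hypothetical candidate ROM sizes in bytes, smallest first.
ROM_SIZES = [64 * 1024, 128 * 1024, 256 * 1024, 512 * 1024]


def rom_size_reference(phrase_rows, rom_sizes=ROM_SIZES):
    """Sum the sizes of phrases flagged for ROM writing and map the total
    to a reference ROM size.

    phrase_rows: list of (size_in_bytes, rom_flag) tuples from the phrase list;
                 a blank rom_flag ("") means the phrase is not written.
    Returns (total, reference), where reference is None if no candidate fits.
    """
    total = sum(size for size, rom_flag in phrase_rows if rom_flag != "")
    for candidate in rom_sizes:
        if total <= candidate:
            return total, candidate
    return total, None


rows = [(40000, "1"), (30000, ""), (50000, "O")]
print(rom_size_reference(rows))
```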

また本ツールではセンテンスリスト情報及びセンテンスを構成するフレーズ情報とセンテンスを構成するフレーズに対応した待ち時間情報に基づき、センテンスの音声再生時間（複数のセンテンスがある場合には複数のセンテンスの音声再生時間の合計）を演算し、ダイアログの合計時間欄４４６に音声再生時間の合計を表示する。 This tool also uses the sentence list information, the phrase information that constitutes the sentence, and the waiting time information corresponding to the phrase that constitutes the sentence, and the voice playback time of the sentence (if there are multiple sentences, the voice playback time of multiple sentences And the total audio playback time is displayed in the total time column 446 of the dialog.
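
The playback-time calculation can be sketched as the sum, over every phrase of every sentence, of the waiting time before the phrase plus the phrase playback time. The function names and the sample values are hypothetical, and all times are assumed to be in ms.

```python
def sentence_time_ms(sentence_info, phrase_time_ms):
    """Playback time of one sentence: each phrase contributes its waiting
    time (silent section before it) plus its own playback time."""
    return sum(wait_ms + phrase_time_ms[pid] for wait_ms, pid in sentence_info)


def total_time_ms(sentences, phrase_time_ms):
    """Total playback time over all sentences on the sheet."""
    return sum(sentence_time_ms(info, phrase_time_ms) for info in sentences)


times = {"A": 400, "B": 650}           # hypothetical phrase playback times
s1 = [(100, "A"), (100, "B")]          # (waiting time, phrase ID) pairs
s2 = [(0, "A")]
print(sentence_time_ms(s1, times), total_time_ms([s1, s2], times))
```

Unlike the total data size, a phrase shared by several sentences is counted each time it is played.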

図２８は、本実施の形態の音声メッセージの作成処理の流れを示すフローチャートである。 FIG. 28 is a flowchart showing a flow of voice message creation processing according to the present embodiment.

まずフレーズに対応したテキスト情報に基づき付加情報付き音声データの準備を行う（ステップＳ１０）。 First, voice data with additional information is prepared based on text information corresponding to a phrase (step S10).

次に付加情報付き音声データに基づきリスト情報を生成する（ステップＳ２０）。 Next, list information is generated based on the audio data with additional information (step S20).

次にリスト情報に基づき編集画面を表示し、編集入力に基づき編集処理を行う（ステップＳ３０）。 Next, an editing screen is displayed based on the list information, and editing processing is performed based on the editing input (step S30).

次に編集結果に基づきリスト情報の生成や更新を行う（ステップＳ４０）。 Next, list information is generated or updated based on the editing result (step S40).

次にリスト情報に基づき、ＲＯＭ格納対象フレーズを抽出し、抽出されたフレーズの付加情報なし音声データを生成し、付加情報なし音声データに基づいてＲＯＭイメージを生成する（ステップＳ５０）。 Next, a ROM storage target phrase is extracted based on the list information, voice data without additional information of the extracted phrase is generated, and a ROM image is generated based on the voice data without additional information (step S50).

次にリスト情報に基づき、ＲＯＭイメージに格納される音声データに対応したセンテンスの音声再生コマンドを生成する（ステップＳ６０）。 Next, based on the list information, a sentence voice reproduction command corresponding to the voice data stored in the ROM image is generated (step S60).

本実施の形態によれば、音声データ作成時に、音声データファイルに付加情報（音声ログ情報）を埋め込むことで、ツール間（例えばＴＴＳツールと音声編集ツール）で受け渡すファイル数が少なくなり、ファイル管理が容易になる。 According to this embodiment, by embedding additional information (audio log information) in an audio data file when generating audio data, the number of files transferred between tools (for example, a TTS tool and an audio editing tool) is reduced. Management becomes easy.

また付加情報（音声ログ情報）が埋め込まれた音声データを用いてメモリ格納情報（ＲＯＭイメージ）を作成してしまうと、メモリ（ＲＯＭ）サイズが大きくなってしまうが、ＲＯＭ作成時に音声データからログ情報を削除してメモリ格納情報（ＲＯＭイメージ）を作成することで、メモリ（ＲＯＭ）サイズの増大を防ぐことが出来る。 Also, if memory storage information (ROM image) is created using voice data with embedded additional information (voice log information), the memory (ROM) size will increase. By deleting the information and creating memory storage information (ROM image), an increase in the memory (ROM) size can be prevented.

According to this embodiment, the phrases used to compose the sentences that serve as voice guidance messages are selected automatically, so human errors such as forgetting to include a phrase do not occur. Phrases that are not used in any sentence but that the user deliberately wants to store can also be included, so the degree of freedom is high.
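One way to picture this selection is the following Python sketch, which collects every phrase referenced by a sentence and then appends any phrases the user explicitly asked to keep. The function name and the dictionary-of-lists representation of sentences are assumptions made for illustration.

```python
from itertools import chain


def select_storage_phrases(sentences, extra_phrases=()):
    """Return phrase ids to store: all phrases used by sentences, then user extras.

    Duplicates are removed while the first-seen order is preserved.
    """
    used = chain.from_iterable(sentences.values())
    return list(dict.fromkeys(chain(used, extra_phrases)))


sentences = {"s1": ["p1", "p2"], "s2": ["p1", "p3", "p1"]}
selected = select_storage_phrases(sentences, extra_phrases=["p9", "p2"])
```

Here `p9` is stored even though no sentence uses it, while `p1` and `p2` appear only once despite being referenced several times.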

Because the file length and playback duration are managed, the size of the data once it is written to the ROM is known. It is also convenient that a waiting time (delay) can be inserted between phrases and that the result can be played back and checked on the spot.

In addition, the necessary voice files are automatically collected and output as a ROM image, which improves work efficiency and reduces human error.

In addition, sentences that are variations of one another can be created efficiently, for example by changing only part of an existing sentence.

In addition, because the memory size needed to reproduce the required sentences is known, it becomes easy to balance cost, for example by narrowing down or adding to the voice files to be stored.
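The kind of trade-off described above can be sketched in Python as follows. The byte sizes, the capacity figure, and the function names are illustrative assumptions, not values from the embodiment.

```python
def rom_usage(sizes, stored, capacity):
    """Return (total bytes used, bytes remaining) for the stored phrases."""
    total = sum(sizes[pid] for pid in stored)
    return total, capacity - total


def playable_sentences(sentences, stored):
    """Return the names of sentences whose phrases are all in the stored set."""
    stored_set = set(stored)
    return [name for name, ids in sentences.items() if set(ids) <= stored_set]


sizes = {"p1": 3000, "p2": 2000, "p3": 1500}  # assumed size of each phrase in bytes
sentences = {"s1": ["p1", "p2"], "s2": ["p1", "p3"]}

used, remaining = rom_usage(sizes, ["p1", "p2"], capacity=6000)
ok = playable_sentences(sentences, ["p1", "p2"])
```

With a 6000-byte budget, storing `p1` and `p2` leaves 1000 bytes, which is not enough to add `p3`, so only sentence `s1` can be reproduced; a designer could then decide whether to enlarge the memory or drop `s2`.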

By adjusting the waiting time (delay) between phrases, the nuance of the utterance can be controlled easily.
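A short Python sketch of how per-gap delays might be interleaved with phrase playback is given below. The `("PLAY", id)` and `("DELAY", ms)` tuples and the millisecond durations are hypothetical; the actual command set of the voice IC is not described here.

```python
def sentence_commands(phrase_ids, delays_ms):
    """Interleave PLAY commands with DELAY commands between consecutive phrases.

    delays_ms[i] is the wait after phrase i; zero or missing entries add no delay.
    """
    commands = []
    last = len(phrase_ids) - 1
    for i, pid in enumerate(phrase_ids):
        commands.append(("PLAY", pid))
        if i < last and i < len(delays_ms) and delays_ms[i] > 0:
            commands.append(("DELAY", delays_ms[i]))
    return commands


def total_duration_ms(commands, durations_ms):
    """Sum phrase playback times and inserted delays for a command list."""
    return sum(durations_ms[arg] if op == "PLAY" else arg for op, arg in commands)


cmds = sentence_commands(["p1", "p2", "p3"], [200, 0])
length = total_duration_ms(cmds, {"p1": 500, "p2": 400, "p3": 300})
```

Changing a single entry in `delays_ms` alters the pacing of the sentence without touching the stored voice data.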

The present invention is not limited to this embodiment, and various modifications are possible within the scope of the gist of the invention.

The present invention includes configurations that are substantially the same as the configurations described in the embodiment (for example, configurations that have the same functions, methods, and results, or configurations that have the same objects and effects). The invention also includes configurations in which a non-essential part of the configuration described in the embodiment is replaced. The invention also includes configurations that achieve the same operational effects as the configuration described in the embodiment, or that can achieve the same object. The invention further includes configurations in which a known technique is added to the configuration described in the embodiment.

FIG. 1 is an example of a functional block diagram of the voice message creation system of the present embodiment.

FIG. 2A is a diagram for explaining the process of generating memory write information (ROM image), and FIG. 2B is a diagram for explaining how the memory write information (ROM image) is used.

FIGS. 3A and 3B are diagrams showing the file structures of voice data with additional information and voice data without additional information, respectively.

FIG. 4A is a diagram for explaining voice data and phrase information corresponding to a phrase, and FIG. 4B is a diagram for explaining sentence information.

FIGS. 5 to 25 each show an example of realizing this tool using spreadsheet application software.

FIG. 26 is a diagram for explaining each process performed by the voice message creation tool of the present embodiment.

FIGS. 27A to 27C are diagrams for explaining each process performed by the voice message creation tool of the present embodiment.

FIG. 28 is a flowchart showing the flow of the voice message creation process of the present embodiment.

Explanation of symbols

10 speech synthesis IC, 20 built-in ROM, 50 TTS speech synthesis dictionary, 100 voice message creation tool (program, system), 102 TTS tool, 104 voice editing tool, 110 processing unit, 120 voice data generation unit with additional information, 130 edit processing unit, 132 list information generation processing unit, 134 voice data creation unit without additional information, 136 memory write information (ROM image) generation unit, 138 voice reproduction command generation unit, 140 voice reproduction output processing unit, 150 nonvolatile storage unit, 152 memory write information, 154 voice reproduction command, 160 operation unit, 170 storage unit, 172 voice data storage unit, 174 sentence information storage unit, 176 phrase information storage unit, 180 information storage medium, 182 dictionary data storage unit, 190 display unit, 192 sound output unit, 196 communication unit

Claims

A voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates voice data corresponding to received text data based on the dictionary data, and generates voice data with additional information including the generated voice data and additional information related to that voice data;
an edit processing unit that displays an edit screen for a voice message based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence;
a voice data creation unit without additional information that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data; and
a voice data generation unit with additional information that generates voice data corresponding to received text data based on the dictionary data, and generates voice data with additional information including the generated voice data and additional information related to that voice data.

A voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, comprising:
an edit processing unit that displays an edit screen for a voice message based on voice data with additional information, which includes voice data and additional information related to that voice data, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence;
a voice data creation unit without additional information that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, comprising:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates voice data corresponding to received text data based on the dictionary data, and generates voice data with additional information including the generated voice data and additional information related to that voice data;
an edit processing unit that displays an edit screen for a voice message based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information; and
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence,
wherein the voice data generation unit with additional information receives text data corresponding to a phrase and generates voice data with additional information corresponding to the phrase, and
the edit processing unit performs processing for displaying the additional information on the edit screen based on the additional information of the voice data with additional information.

The voice message creation system according to claim 4, further comprising:
a voice data creation unit without additional information that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted.

The voice message creation system according to claim 5, further comprising:
a memory write information generation unit that determines, based on the list information, storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases,
wherein the memory write information generation unit determines the storage target phrases so that, for a phrase used in a plurality of sentences or used a plurality of times in one sentence, the voice data of the same phrase is not written more than once.

The voice message creation system according to any one of claims 1 and 3 to 6,
wherein the edit processing unit generates list information based on the additional information of the voice data with additional information.

The voice message creation system according to any one of claims 1 to 7, further comprising:
a voice reproduction command generation processing unit that generates, based on the list information, a voice reproduction command instructing that the voice data required to reproduce the voice of a sentence be read from the voice data memory and reproduced in the order corresponding to the sentence.

The voice message creation system according to any one of claims 1 to 8,
wherein the additional information includes data size information of the voice data.

The voice message creation system according to any one of claims 1 to 9,
wherein the additional information includes text information of the voice data.

The voice message creation system according to any one of claims 1 to 10,
wherein the memory write information generation unit calculates the total size of the memory write information and outputs size information based on the calculation result.

The voice message creation system according to any one of claims 1 and 3 to 11,
wherein the edit processing unit receives sentence text data and performs sentence split processing that divides the received sentence text data into a plurality of phrases based on phrase text data, and
the list information generation processing unit generates sentence information including phrase specifying information and sequence information for the phrases constituting the created sentence.

The voice message creation system according to any one of claims 1 and 3 to 12,
wherein the edit processing unit accepts a phrase selection input and performs phrase combination processing that creates a sentence based on the selected phrases, and
the list information generation processing unit generates, based on the result of the phrase combination processing, sentence information including phrase specifying information and sequence information for the phrases constituting the sentence.

The voice message creation system according to any one of claims 1 to 13, further comprising:
a voice reproduction output processing unit that determines, based on the sentence information, the phrases constituting a sentence and their reproduction order, and reproduces and outputs the voice data of the phrases according to that order.

A program that causes a computer to operate as a voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, the program causing the computer to function as:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation unit with additional information that generates voice data corresponding to received text data based on the dictionary data, and generates voice data with additional information including the generated voice data and additional information related to that voice data;
an edit processing unit that displays an edit screen for a voice message based on the voice data with additional information, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence;
a voice data creation unit without additional information that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A program that causes a computer to operate as a voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, the program causing the computer to function as:
a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data; and
a voice data generation unit with additional information that generates voice data corresponding to received text data based on the dictionary data, and generates voice data with additional information including the generated voice data and additional information related to that voice data.

A program that causes a computer to operate as a voice message creation system for creating a voice message configured as a sentence including a plurality of phrases, the program causing the computer to function as:
an edit processing unit that displays an edit screen for a voice message based on voice data with additional information, which includes voice data and additional information related to that voice data, accepts edit input information, and performs edit processing based on the edit input information;
a list information generation processing unit that generates, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence;
a voice data creation unit without additional information that creates, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted; and
a memory write information generation unit that determines, based on the list information, storage target phrases to be stored in a voice data memory, and generates memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A method of manufacturing a semiconductor integrated circuit device for speech synthesis that includes a nonvolatile storage unit, the method comprising:
a procedure for preparing a dictionary data storage unit that stores dictionary data for generating synthesized voice data corresponding to text data;
a voice data generation procedure with additional information for generating voice data corresponding to received text data based on the dictionary data, and generating voice data with additional information including the generated voice data and additional information related to that voice data;
an edit processing procedure for displaying an edit screen for a voice message based on the voice data with additional information, accepting edit input information, and performing edit processing based on the edit input information;
a list information generation processing procedure for generating, based on the edit processing result, list information including phrase information relating to the phrases constituting each sentence;
a voice data creation procedure without additional information for creating, based on the voice data with additional information, voice data without additional information from which the additional information has been deleted; and
a memory write information generation procedure for determining, based on the list information, storage target phrases to be stored in the nonvolatile storage unit, and generating memory write information based on the voice data without additional information corresponding to the determined storage target phrases.

A semiconductor integrated circuit device comprising:
a nonvolatile storage unit in which memory write information generated by the voice message creation system according to claim 1 is stored; and
a voice synthesis unit that receives a voice reproduction command, reads voice data from the nonvolatile storage unit based on the received voice reproduction command, and reproduces and outputs the voice data.