JP2008146019A

JP2008146019A - System for creating dictionary for speech synthesis, semiconductor integrated circuit device, and method for manufacturing semiconductor integrated circuit device

Info

Publication number: JP2008146019A
Application number: JP2007222469A
Authority: JP
Inventors: Masamichi Izumida; 正道泉田; Takao Katayama; 貴夫片山
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2006-11-16
Filing date: 2007-08-29
Publication date: 2008-06-26

Abstract

PROBLEM TO BE SOLVED: To create a subset speech dictionary with which speech can be synthesized with good pronunciation quality for a predetermined utterance target document, using a necessary and sufficient amount of data.

SOLUTION: The system for creating a dictionary for speech synthesis includes: a first speech synthesis dictionary memory means 182 that stores dictionary data composing a first dictionary for speech synthesis; a second speech synthesis dictionary creating means 120 that analyzes an utterance target document, checks the frequency of occurrence of each word composing the utterance target document, determines words to be stored in a second dictionary for speech synthesis on the basis of the frequency of occurrence, and creates the second dictionary for speech synthesis using the dictionary data stored in the first dictionary for speech synthesis corresponding to the determined words; and a speech synthesis means 130 that creates synthesized speech corresponding to the utterance target document using the second dictionary for speech synthesis.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a speech synthesis dictionary creation system, a semiconductor integrated circuit device, and a method for manufacturing a semiconductor integrated circuit device.

TTS speech synthesis LSIs, which synthesize speech from text data (a collection of character data), use one of many methods. These include the parametric method, which synthesizes sound by modeling the human vocalization process; the concatenative method, which holds phoneme segment data taken from recordings of a real person's voice, combines the segments as needed, and partially modifies the joints between them; and, as a further development, the corpus-based method, which builds speech up from a language-based analysis and forms synthesized speech from real voice data.

Whichever method is used, before text is converted into sound it is essential to have a conversion dictionary (database) that maps the written text representation, encoded for example in SHIFT-JIS, to its "reading", that is, to how it should be pronounced.

In addition, the concatenative method and the corpus-based method also require a dictionary (database) that maps each "reading" to the "phonemes" to be retrieved.

JP 2003-208191 A

In a single-chip TTS-LSI with limited on-chip resources (ROM capacity, etc.), if the speech synthesis dictionary file that can be mounted is restricted to a relatively small vocabulary, the vocabulary that can be handled is limited, so sufficient pronunciation quality may not be obtained.

A small-capacity system can hold neither a "notation → reading" data dictionary covering a sufficient vocabulary nor a "phoneme" dictionary covering the many cases that are effective for improving voice quality. Consequently, if the text to be read aloud contains vocabulary missing from the dictionary, the sound quality deteriorates at that part or the text cannot be read aloud at all.

The present invention has been made in view of the above technical problems, and its object is to create a subset speech dictionary that enables speech synthesis with good pronunciation quality for a predetermined utterance target sentence using a necessary and sufficient amount of data.

(1) The present invention is a speech synthesis dictionary creation system that creates, from a first speech synthesis dictionary that is a set of dictionary data necessary for generating synthesized speech corresponding to an utterance target sentence, a second speech synthesis dictionary having a smaller amount of data than the first speech synthesis dictionary, the system including:
first speech synthesis dictionary storage means that stores the dictionary data constituting the first speech synthesis dictionary;
second speech synthesis dictionary creation means that analyzes the utterance target sentence, examines the appearance frequency of each word or phrase constituting the utterance target sentence, determines the words to be stored in the second speech synthesis dictionary based on the appearance frequency, and generates the second speech synthesis dictionary using the dictionary data stored in the first speech synthesis dictionary for the determined stored words; and
speech synthesis means that generates synthesized speech corresponding to the utterance target sentence using the second speech synthesis dictionary.

The first speech synthesis dictionary is a full-set dictionary (large-capacity dictionary) holding dictionary data on a scale that allows synthesized speech to be generated for an arbitrary utterance target sentence. The second speech synthesis dictionary is a subset dictionary (small-capacity dictionary) holding data on a scale that allows synthesized speech to be generated for a specific utterance target sentence.

The first speech synthesis dictionary is composed of, for example, a vocabulary dictionary (a "notation → reading" data dictionary) and a phoneme dictionary (a dictionary covering the many cases that are effective for improving voice quality). The first speech synthesis dictionary storage means stores these dictionary data and functions as a dictionary database. The types of dictionary are determined by the speech synthesis method; for example, both a vocabulary dictionary and a phoneme dictionary may be included, or only a vocabulary dictionary.

The vocabulary dictionary is a dictionary for the front-end processing of text-to-speech processing. It stores the symbolic linguistic representation corresponding to each text notation (for example, the reading data corresponding to the notation).

Front-end processing includes converting numbers and abbreviations in the text into the form in which they are read aloud (called text normalization, preprocessing, or tokenization), converting each word into phonetic symbols, and dividing the text into prosodic units such as phrases, clauses, and sentences. Assigning phonetic symbols to words is called text-to-phoneme (TTP) conversion or grapheme-to-phoneme (GTP) conversion. The phonetic symbols and the prosodic information are then combined into a symbolic linguistic representation, which is output.

In the text normalization step, homographs, numbers, abbreviations, and the like contained in the text are converted so that they can be uttered. Many TTS (text-to-speech) systems do not analyze the meaning of the input text; instead they distinguish homographs using various heuristics, such as examining the neighboring words or using statistical appearance frequencies.

The phoneme dictionary stores the waveform information of the actual sounds (phonemes) that correspond to the symbolic linguistic representation output by the front end. The main techniques for generating speech waveforms at the back end include concatenative synthesis and formant synthesis. Concatenative synthesis is basically a method of synthesizing speech by joining fragments of recorded speech.

The speech synthesis means performs front-end processing and back-end processing based on the vocabulary information and sound information stored in the first speech synthesis dictionary, and generates synthesized speech corresponding to the received utterance target sentence.

The second speech synthesis dictionary creation means may, for example, give priority to words with a high appearance frequency when determining the stored words. For example, a specific proportion (for example, 80%) of the storage capacity determined in advance to be assignable to the second speech synthesis dictionary may be assigned to vocabulary in descending order of appearance frequency. In that case, the assignment may be stopped once the appearance frequency falls below a certain count (for example, two), even if that proportion has not been reached. Appearance frequency generally follows a "long tail" distribution, so this approach can be expected to cover most of the target sentence.
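The frequency-based selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the text is assumed to be already segmented into words, and `entry_size` (the size of each word's dictionary data) and the function name are hypothetical.

```python
from collections import Counter


def select_stored_words(tokens, entry_size, capacity, ratio=0.8, min_count=2):
    """Choose stored words in descending order of appearance frequency.

    tokens     : words of the utterance target text (assumed pre-segmented)
    entry_size : hypothetical mapping of word -> size of its dictionary data
    capacity   : storage assignable to the second (subset) dictionary
    ratio      : share of the capacity filled by frequency (e.g. 80%)
    min_count  : stop once a word appears fewer times than this
    """
    budget = capacity * ratio
    used = 0
    selected = []
    for word, count in Counter(tokens).most_common():
        if count < min_count:
            break  # long tail reached: stop even if the budget is not used up
        size = entry_size.get(word)
        if size is None:
            continue  # no data for this word in the full dictionary
        if used + size > budget:
            break  # the frequency-based share of the capacity is full
        selected.append(word)
        used += size
    return selected, used


tokens = ["train"] * 4 + ["door"] * 3 + ["close"] * 2 + ["open"]
sizes = {"train": 10, "door": 10, "close": 10, "open": 10}
words, used = select_stored_words(tokens, sizes, capacity=50)
print(words, used)
```

In this example "open" appears only once, so it is left out even though the 80% budget (40 units) has room for it.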

Because the speech synthesis means generates synthesized speech corresponding to the utterance target sentence using the second speech synthesis dictionary, the user can check the result of speech synthesis for the utterance target sentence.

According to the present invention, a specific utterance target sentence is analyzed, the dictionary data necessary and sufficient for speech synthesis of that sentence is extracted from the first speech synthesis dictionary, and a second speech synthesis dictionary with a smaller amount of data than the first speech synthesis dictionary can be generated.

Therefore, even when the speech dictionary file that can be mounted on a single-chip TTS-LSI with limited on-chip resources (ROM capacity, etc.) is restricted to a relatively small vocabulary, a subset dictionary (second speech synthesis dictionary) can be generated that allows accurate speech synthesis for the specific utterance target sentence.

In the present invention, the data amount of the vocabulary dictionary can be reduced by selectively extracting the vocabulary to be stored in the second speech synthesis dictionary. Reducing the data amount of the vocabulary dictionary in turn reduces the data amount of the corresponding phoneme dictionary, so the data amount of both the vocabulary dictionary and the phoneme dictionary of the second speech synthesis dictionary can be reduced.
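A minimal sketch of why trimming the vocabulary also trims the phoneme data, assuming (hypothetically) that the vocabulary dictionary maps each word to a list of sound-unit labels and that the phoneme dictionary maps each label to waveform data. The patent does not specify these data layouts; the names below are illustrative only.

```python
def build_subset(full_lexicon, full_units, stored_words):
    """Extract the subset vocabulary dictionary and the sound units it needs.

    full_lexicon : hypothetical mapping of word -> list of sound-unit labels
    full_units   : hypothetical mapping of unit label -> waveform data
    stored_words : words chosen for the second (subset) dictionary
    """
    # Keep only the vocabulary entries for the stored words.
    lexicon = {w: full_lexicon[w] for w in stored_words if w in full_lexicon}
    # Only units referenced by the remaining readings are still needed.
    needed = {unit for reading in lexicon.values() for unit in reading}
    units = {u: full_units[u] for u in needed if u in full_units}
    return lexicon, units


full_lexicon = {"door": ["d", "o", "r"], "train": ["t", "r", "ei", "n"]}
full_units = {u: u.encode() for u in ["d", "o", "r", "t", "ei", "n"]}
lexicon, units = build_subset(full_lexicon, full_units, ["door"])
print(sorted(units))
```

Here the units used only by "train" drop out of the subset without being selected against explicitly.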

(2) The speech synthesis dictionary creation system of the present invention includes utterance target sentence changing means that changes the utterance target sentence by replacing an unstored word, that is, a word in the utterance target sentence that is not to be stored in the second speech synthesis dictionary, with a stored word of the second speech synthesis dictionary.

The replacement here may, for example, replace an unstored word with a synonym (one stored in the second speech synthesis dictionary), or replace the unstored word with its kana notation (assuming that the dictionary for kana notation is stored in the second speech synthesis dictionary).

According to the present invention, the accuracy of speech synthesis for the utterance target sentence can be improved without increasing the number of words stored in the second speech synthesis dictionary.

Because the speech synthesis means generates synthesized speech corresponding to the changed utterance target sentence using the second speech synthesis dictionary, the user can check the result of speech synthesis for the utterance target sentence after the change.

(3) In the speech synthesis dictionary creation system of the present invention, the utterance target sentence changing means records a change history of the replacement of words constituting the utterance target sentence.

The change history includes each changed word and the corresponding original word of the utterance target sentence. Accordingly, when a given word or phrase has been changed several times, the history includes at least the original word (the word or phrase contained in the utterance target sentence as first given) and the word it was finally changed to.

The change history may be generated separately from the utterance target sentence, or in a form in which change history comments are inserted into the utterance target sentence.
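One way to keep such a history separately from the text is sketched below. It assumes each replacement is identified by the position of the word in the segmented text; the class and method names are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeHistory:
    """Tracks, per word position, the original word and the latest replacement."""
    original: dict = field(default_factory=dict)  # position -> original word
    current: dict = field(default_factory=dict)   # position -> latest word

    def record(self, position, old, new):
        # Only the first recorded "old" word at a position is the original.
        self.original.setdefault(position, old)
        self.current[position] = new

    def entries(self):
        # (position, original word, finally changed word), ordered by position
        return [(p, self.original[p], self.current[p]) for p in sorted(self.current)]


history = ChangeHistory()
history.record(3, "purchase", "acquire")
history.record(3, "acquire", "buy")  # second change at the same position
print(history.entries())
```

Intermediate steps are dropped on purpose here, since the text only requires the original word and the final word to be retained.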

(4) In the speech synthesis dictionary creation system of the present invention, the utterance target sentence changing means includes synonym replacement processing means that analyzes whether an unstored word has a synonym among the stored words of the second speech synthesis dictionary and, if it does, performs synonym replacement processing that replaces the unstored word in the utterance target sentence with that synonym.

For example, when a first word and a second word contained in the utterance target sentence are interchangeable synonyms, and the first word is a stored word of the second speech synthesis dictionary while the second word is not, the present invention can perform an utterance target sentence changing process that replaces the second word in the utterance target sentence with the first word.

For example, synonyms of unstored words may be searched for using a synonym dictionary in which synonyms are defined. For each unstored word in the utterance target sentence, the synonym dictionary is searched for synonyms, and the second speech synthesis dictionary is then searched to check whether a synonym found is one of its stored words; if it is, the unstored word in the utterance target sentence is replaced with that stored word.
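The synonym lookup described above can be sketched as follows. The synonym dictionary is modeled as a plain mapping from a word to a list of candidate synonyms, which is an assumption made for illustration; the function name is likewise hypothetical.

```python
def replace_with_synonyms(tokens, stored_words, synonym_dict):
    """Replace unstored words with a synonym that is a stored word, if any.

    tokens       : words of the utterance target text (assumed pre-segmented)
    stored_words : words held in the second (subset) dictionary
    synonym_dict : hypothetical mapping of word -> list of synonyms
    Returns the changed tokens and a list of (position, old, new) changes.
    """
    stored = set(stored_words)
    result, changes = [], []
    for position, word in enumerate(tokens):
        if word in stored:
            result.append(word)
            continue
        # First synonym that is also a stored word, or None if there is none.
        candidates = synonym_dict.get(word, [])
        replacement = next((s for s in candidates if s in stored), None)
        if replacement is None:
            result.append(word)  # stays unstored; handled by other means
        else:
            result.append(replacement)
            changes.append((position, word, replacement))
    return result, changes


tokens = ["please", "shut", "the", "door"]
stored = ["please", "close", "the", "door"]
new_tokens, changes = replace_with_synonyms(tokens, stored, {"shut": ["seal", "close"]})
print(new_tokens, changes)
```

The returned change list has the shape needed to feed a change history of the kind described for configuration (3).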

According to the present invention, the accuracy of speech synthesis for the utterance target sentence can be improved without changing the meaning of the sentence and without increasing the number of words stored in the second speech synthesis dictionary.

Because the speech synthesis means generates synthesized speech corresponding to the utterance target sentence after synonym replacement using the second speech synthesis dictionary, the user can check the result of speech synthesis for the sentence after the replacement.

(5) In the speech synthesis dictionary creation system of the present invention, the utterance target sentence changing means includes kana replacement processing means that performs kana replacement processing, in which an unstored word is replaced with the kana notation representing its reading.

Here, the second speech synthesis dictionary is assumed to contain dictionary data for performing speech synthesis from kana notation.

According to the present invention, special words and phrases with a low appearance frequency are replaced with kana notation (although the intonation and accent may become somewhat unnatural), so that a second speech synthesis dictionary can be created that at least allows speech synthesis for the specific utterance target sentence.
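A minimal sketch of the kana replacement, assuming the readings are available as a mapping from notation to kana (for instance taken from the full vocabulary dictionary). The mapping, the sample words, and the function name are illustrative assumptions.

```python
def replace_with_kana(tokens, stored_words, readings):
    """Replace unstored words with the kana notation of their reading.

    tokens       : words of the utterance target text (assumed pre-segmented)
    stored_words : words held in the second (subset) dictionary
    readings     : hypothetical mapping of notation -> kana reading
    """
    stored = set(stored_words)
    result = []
    for word in tokens:
        if word in stored or word not in readings:
            result.append(word)  # already covered, or no reading is known
        else:
            result.append(readings[word])  # spell the word out in kana
    return result


tokens = ["扉", "脆弱性", "確認"]
print(replace_with_kana(tokens, ["扉", "確認"], {"脆弱性": "ぜいじゃくせい"}))
```

This relies on the subset dictionary being able to synthesize arbitrary kana, as the text assumes.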

(6) The speech synthesis dictionary creation system of the present invention includes editing processing means that accepts an evaluation input for the utterance target sentence synthesized using the second speech synthesis dictionary and, according to the content of the evaluation input, finalizes or changes the second speech synthesis dictionary or the utterance target sentence.

The evaluation input may be given, for example, as either OK or NG.

In this way, the user can finalize or change the second speech synthesis dictionary or the utterance target sentence while actually listening to and checking the synthesized speech generated for the utterance target sentence with the second speech synthesis dictionary under construction. Because the second speech synthesis dictionary can be edited while the results are checked in real time, a speech synthesis dictionary creation system that is easy for the user to use can be provided.
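The listen-and-evaluate cycle can be pictured as a simple loop. The sketch below is hypothetical: `synthesize`, `evaluate`, and `revise` stand in for the synthesis software, the user's OK/NG input, and whatever change is applied to the text or dictionary, none of which the patent specifies at this level.

```python
def edit_until_ok(tokens, stored_words, synthesize, evaluate, revise, max_rounds=10):
    """Repeat synthesis and revision until the evaluation input is "OK".

    synthesize(tokens, stored_words) -> audio (any object)
    evaluate(audio)                  -> "OK" or "NG" (the user's evaluation input)
    revise(tokens, stored_words)     -> (new_tokens, new_stored_words)
    Returns the final text, the final stored words, and whether it was accepted.
    """
    for _ in range(max_rounds):
        audio = synthesize(tokens, stored_words)
        if evaluate(audio) == "OK":
            return tokens, stored_words, True  # finalize dictionary and text
        tokens, stored_words = revise(tokens, stored_words)
    return tokens, stored_words, False


# Stand-ins for demonstration: "audio" is just the list of uncovered words.
def fake_synthesize(tokens, stored):
    return [w for w in tokens if w not in stored]

def fake_evaluate(audio):
    return "OK" if not audio else "NG"

def fake_revise(tokens, stored):
    missing = [w for w in tokens if w not in stored]
    return tokens, stored | {missing[0]}

print(edit_until_ok(["a", "b", "c"], {"a"}, fake_synthesize, fake_evaluate, fake_revise))
```

The `max_rounds` cap is an added safeguard for the sketch, not something described in the source.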

(7) In the speech synthesis dictionary creation system of the present invention, the editing processing means accepts a user's designation input regarding the stored words of the second speech synthesis dictionary, and the second speech synthesis dictionary creation means determines the stored words based on the user's designation input.

For example, after the stored words have been determined according to the appearance frequency of each word or phrase constituting the utterance target sentence, the words to be placed in the remaining capacity may be determined according to a designation input accepted from the user.

In this way, the contents of the words stored in the second speech synthesis dictionary can be adjusted to reflect the user's intention directly. The second speech synthesis dictionary can therefore be edited to respond closely to the individual needs of individual users.
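Filling the remaining capacity with user-designated words might look like the sketch below. As before, `entry_size` and the function name are hypothetical, and the words already chosen by frequency are passed in together with the capacity they occupy.

```python
def add_user_words(selected, used, user_words, entry_size, capacity):
    """Add user-designated words while they still fit in the remaining capacity.

    selected   : words already chosen by appearance frequency
    used       : capacity those words already occupy
    user_words : words designated by the user, in order of preference
    entry_size : hypothetical mapping of word -> size of its dictionary data
    capacity   : total storage assignable to the second dictionary
    """
    chosen = list(selected)
    for word in user_words:
        size = entry_size.get(word)
        if size is None or word in chosen:
            continue  # unknown to the full dictionary, or already stored
        if used + size <= capacity:
            chosen.append(word)
            used += size
    return chosen, used


sizes = {"emergency": 15, "platform": 10, "door": 10}
print(add_user_words(["train", "door"], 20, ["emergency", "door", "platform"], sizes, 40))
```

A word that does not fit is skipped rather than ending the loop, so a smaller designated word further down the list can still be stored; that is a design choice of this sketch.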

(8) The present invention is a semiconductor integrated circuit device including:
a nonvolatile storage unit that stores dictionary data constituting the second speech synthesis dictionary generated by any of the speech synthesis dictionary creation systems described above; and
a synthesized speech data generation processing unit that generates synthesized speech data corresponding to a predetermined utterance target sentence using the dictionary data stored in the nonvolatile storage unit.

(9) The present invention is a method for manufacturing a semiconductor integrated circuit device for speech synthesis that includes a nonvolatile storage unit, the method including the steps of:
analyzing an utterance target sentence for which speech synthesis is planned in the semiconductor integrated circuit device, examining the appearance frequency of each word or phrase constituting the utterance target sentence, determining the words to be stored in a second speech synthesis dictionary based on the appearance frequency, and generating the second speech synthesis dictionary using the dictionary data stored in a first speech synthesis dictionary for the determined stored words;
generating synthesized speech corresponding to the utterance target sentence using the second speech synthesis dictionary; and
writing the dictionary data constituting the generated second speech synthesis dictionary into the nonvolatile storage unit of the semiconductor integrated circuit device.

DESCRIPTION OF EMBODIMENTS

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The embodiments described below do not unduly limit the contents of the present invention described in the claims, and not all of the configurations described below are essential constituent requirements of the present invention.

FIG. 1 is a diagram for explaining the speech synthesis dictionary creation system of the present embodiment and a semiconductor integrated circuit device incorporating a speech synthesis dictionary created by that system.

Reference numeral 100 denotes the speech synthesis dictionary creation system of the present embodiment. It creates, from a large-capacity dictionary (first speech synthesis dictionary) 182 that is a set of dictionary data necessary for generating synthesized speech corresponding to an utterance target sentence 101, a small-capacity dictionary (second speech synthesis dictionary) 184 having a smaller amount of data than the large-capacity dictionary 182. The system can be realized by installing a TTS-compatible large-capacity dictionary 182 for speech synthesis, subset dictionary creation software 122 for speech synthesis, and speech synthesis software 132 on a personal computer.

The large-capacity dictionary 182 for speech synthesis functions as the first speech synthesis dictionary storage means, in which the dictionary data constituting the first speech synthesis dictionary is stored.

The subset dictionary creation software 122 for speech synthesis functions as the second speech synthesis dictionary creation means: it analyzes the utterance target sentence, examines the appearance frequency of each word or phrase constituting the sentence, determines the words to be stored in the small-capacity dictionary (second speech synthesis dictionary) 184 based on the appearance frequency, and generates the small-capacity dictionary 184 using the dictionary data stored in the large-capacity dictionary (first speech synthesis dictionary) 182 for the determined stored words.

The subset dictionary creation software 122 for speech synthesis may also function as the utterance target sentence changing means, which changes the utterance target sentence by replacing unstored words, that is, words in the sentence that are not to be stored in the small-capacity dictionary (second speech synthesis dictionary) 184, with stored words of the small-capacity dictionary 184.

The subset dictionary creation software 122 for speech synthesis may further function as the editing processing means, which accepts an evaluation input for the utterance target sentence synthesized using the small-capacity dictionary (second speech synthesis dictionary) 184 and, according to the content of the evaluation input, finalizes or changes the second speech synthesis dictionary or the utterance target sentence.

The speech synthesis software 132 functions as the speech synthesis means, which generates synthesized speech corresponding to the utterance target sentence using the small-capacity dictionary (second speech synthesis dictionary) 184. In practice, it can also generate synthesized speech corresponding to the utterance target sentence using the large-capacity dictionary (first speech synthesis dictionary) 182.

The speech synthesis dictionary creation system 100 of the present embodiment determines the stored words based on the utterance target sentence, extracts the dictionary data corresponding to the stored words from the large-capacity dictionary (first speech synthesis dictionary) 182, and stores it in the small-capacity dictionary (second speech synthesis dictionary) 184.

The dictionary data of the small-capacity dictionary is then written into the ROM (nonvolatile storage unit) of the TTS-LSI (an example of a semiconductor integrated circuit device) 10 to create the on-chip small-capacity dictionary.

The TTS-LSI (an example of a semiconductor integrated circuit device) 10 is a semiconductor integrated circuit device that incorporates a small-capacity dictionary 30 and a speech synthesis system 20 and generates synthesized speech data corresponding to a predetermined utterance target sentence. The small-capacity dictionary 30 functions as a nonvolatile storage unit in which the dictionary data constituting the speech synthesis dictionary is stored. The speech synthesis system 20 functions as a synthesized speech data generation processing unit that generates synthesized speech data corresponding to the predetermined utterance target sentence using the dictionary data stored in the nonvolatile storage unit.

The present embodiment concerns a TTS-LSI (an example of an integrated circuit device) 10 whose mountable speech dictionary file is restricted to a relatively small vocabulary, for example an application-specific device whose vocabulary to be read aloud serves a specific purpose, or one whose text to be read aloud is known in advance.

The small-capacity dictionary (subset dictionary) 30 for the TTS-LSI 10 holds the dictionary data constituting the small-capacity dictionary (second speech synthesis dictionary), which is created by extracting, from the large-capacity dictionary (full-set dictionary) 182 on the personal computer 100, the dictionary data corresponding to the vocabulary needed for the predetermined utterance target sentence to be synthesized by the TTS-LSI 10.

In this way, a dictionary suited to the specific application of the TTS-LSI 10 can be created, so sufficient performance can be secured with a small-capacity dictionary. Moreover, when the utterance target sentence is known in advance, a dictionary restricted to the vocabulary of that sentence is created, so no resources are wasted and the dictionary mounted on the TTS-LSI 10 can be optimized.

図２は、本実施の形態の音声合成用辞書作成システムの機能ブロック図の一例である。なお、本実施形態の音声合成用辞書作成システム１００は、図２の構成要素（各部）を全て含む必要はなく、その一部を省略した構成としてもよい。 FIG. 2 is an example of a functional block diagram of the speech synthesis dictionary creation system of the present embodiment. Note that the speech synthesis dictionary creation system 100 of the present embodiment does not need to include all the components (each unit) in FIG. 2, and may have a configuration in which some of them are omitted.

操作部１６０は、ユーザーの操作等をデータとして入力するためのものであり、その機能は、操作ボタン、操作レバー、タッチパネル或いはマイクなどのハードウェアにより実現できる。 The operation unit 160 is for inputting a user operation or the like as data, and the function can be realized by hardware such as an operation button, an operation lever, a touch panel, or a microphone.

記憶部１７０は、処理部１１０や通信部１９６などのワーク領域となるもので、その機能はＲＡＭなどのハードウェアにより実現できる。 The storage unit 170 serves as a work area for the processing unit 110, the communication unit 196, and the like, and its function can be realized by hardware such as a RAM.

The information storage medium 180 (a computer-readable medium) stores programs, data, and the like, and its function can be realized by hardware such as an optical disk (CD, DVD, etc.), a magneto-optical disk (MO), a magnetic disk, a hard disk, a magnetic tape, or a memory (ROM).

The information storage medium 180 stores a program for causing a computer to function as each unit of the present embodiment, together with auxiliary data (additional data). It also stores large-capacity dictionary data for speech synthesis and thereby functions as the first speech synthesis dictionary storage unit 182. The information storage medium 180 may also store the dictionary data of the second speech synthesis dictionary extracted from the first speech synthesis dictionary.

The processing unit 110 performs the various processes of the present embodiment based on the program (data) stored in the information storage medium 180, data read from the information storage medium 180, and the like. That is, the information storage medium 180 stores a program for causing a computer to function as each unit of the present embodiment (a program for causing a computer to execute the processing of each unit).

The display unit 190 outputs images generated by the present embodiment, and its function can be realized by hardware such as a CRT display, an LCD (liquid crystal display), an OELD (organic EL display), a PDP (plasma display panel), or a touch panel display.

音出力部１９２は、本実施形態により生成された合成音声等を出力するものであり、その機能は、スピーカ、或いはヘッドフォンなどのハードウェアにより実現できる。 The sound output unit 192 outputs the synthesized speech generated by the present embodiment, and the function can be realized by hardware such as a speaker or headphones.

The communication unit 196 performs various kinds of control for communicating with the outside (for example, a host device or another terminal), and its function can be realized by hardware such as various processors or a communication ASIC, or by a program.

Note that the program (data) for causing a computer to function as each unit of the present embodiment may be distributed from an information storage medium of a host device (server device) to the information storage medium 180 (or the storage unit 170) via a network and the communication unit 196. Use of the information storage medium of such a host device (server device or the like) can also be included within the scope of the present invention.

処理部１１０（プロセッサ）は、操作部１６０からの操作データやプログラムなどに基づいて、記憶部１７０をワーク領域として各種処理を行う。処理部１１０の機能は各種プロセッサ（ＣＰＵ、ＤＳＰ等）、ＡＳＩＣ（ゲートアレイ等）などのハードウェアや、プログラムにより実現できる。 The processing unit 110 (processor) performs various processes using the storage unit 170 as a work area based on operation data, a program, and the like from the operation unit 160. The functions of the processing unit 110 can be realized by hardware such as various processors (CPU, DSP, etc.), ASIC (gate array, etc.), and programs.

処理部１１０は、第２の音声合成用辞書作成部１２０、合成音声データ生成処理部１３０、発話対象文章変更処理部１４０、辞書編集処理部１５０を含む。 The processing unit 110 includes a second speech synthesis dictionary creation unit 120, a synthesized speech data generation processing unit 130, an utterance target sentence change processing unit 140, and a dictionary editing processing unit 150.

The second speech synthesis dictionary creation unit 120 analyzes the utterance target sentence, examines the appearance frequency of each word or phrase constituting it, determines the words to be stored in the second speech synthesis dictionary based on the appearance frequency, and generates the second speech synthesis dictionary using the dictionary data stored in the first speech synthesis dictionary for the determined stored words.

合成音声データ生成処理部１３０は、第２の音声合成用辞書を用いて発話対象文章に対応した合成音声データを生成する。 The synthesized speech data generation processing unit 130 generates synthesized speech data corresponding to the utterance target sentence using the second speech synthesis dictionary.

The utterance target sentence change processing unit 140 changes the utterance target sentence by replacing unstored words, that is, words in the utterance target sentence that are not to be stored in the second speech synthesis dictionary, with words stored in the second speech synthesis dictionary.

発話対象文章変更処理部１４０は、変更履歴記録処理部１４２、同義語置き換え処理部１４４、仮名置き換え処理部１４６を含む。 The utterance target sentence change processing unit 140 includes a change history recording processing unit 142, a synonym replacement processing unit 144, and a kana replacement processing unit 146.

変更履歴記録処理部１４２は、発話対象文章を構成する語の置き換えに関する変更履歴を記録する処理を行う。 The change history recording processing unit 142 performs a process of recording a change history related to replacement of words constituting the utterance target sentence.

The synonym replacement processing unit 144 analyzes whether an unstored word has a synonym among the stored words of the second speech synthesis dictionary and, if it does, performs synonym replacement processing that replaces the unstored word in the utterance target sentence with that synonym.

仮名置き換え処理部１４６は、未格納語について、当該語のよみを表す仮名表記に置き換えるかな置き換え処理をおこなう。 The kana replacement processing unit 146 performs kana replacement processing for replacing an unstored word with a kana notation representing the reading of the word.

The dictionary editing processing unit 150 accepts an evaluation input for the utterance target sentence synthesized using the second speech synthesis dictionary and, according to the content of the evaluation input, finalizes or changes the second speech synthesis dictionary or the utterance target sentence.

The dictionary editing processing unit 150 may also accept a user's designation of words to be stored in the second speech synthesis dictionary, and the second speech synthesis dictionary creation unit 120 may determine the stored words based on that designation.

次に、本発明の動作を、具体例を用いて説明する。 Next, the operation of the present invention will be described using a specific example.

図３は本実施の形態の処理の流れを説明するためのフローチャートである。 FIG. 3 is a flowchart for explaining the flow of processing of the present embodiment.

まず発話対象文章のプロファイリングを行う（ステップＳ１０）。例えば発話対象文章を語彙に分解し、各語彙の出現頻度を集計する。 First, the utterance target sentence is profiled (step S10). For example, utterance target sentences are broken down into vocabularies, and the appearance frequency of each vocabulary is tabulated.

Next, frequent-word dictionary extraction (primary extraction) is performed (step S20). For example, based on the profiling data, a specific percentage (for example, 80%) of the storage capacity predetermined as assignable to the dictionary is allocated to vocabulary in descending order of appearance frequency. If a word does not appear at least a certain number of times (for example, twice), allocation stops even if that percentage has not been reached. Because appearance frequencies generally follow a "long tail" distribution, most of the target sentence can be expected to be covered by the subset dictionary at this stage.
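Steps S10 and S20 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `primary_extraction`, the `entry_size` table (bytes needed per word in the full dictionary), and the choice to stop at the first word that does not fit are all assumptions. The text is assumed to be already split into words, and every word is assumed to have an entry in the full dictionary.

```python
from collections import Counter

def primary_extraction(tokens, entry_size, capacity, ratio=0.8, min_count=2):
    """Select frequent words for the subset dictionary (steps S10 and S20)."""
    # Step S10: profile the text by counting how often each word occurs.
    profile = Counter(tokens)
    budget = capacity * ratio  # share of the capacity used in the primary pass
    used = 0
    selected = []
    # Step S20: walk the words from most frequent to least frequent.
    for word, count in profile.most_common():
        if count < min_count:
            break  # the long tail starts here, so allocation stops
        size = entry_size[word]
        if used + size > budget:
            break  # assumption: stop once the primary share would be exceeded
        selected.append(word)
        used += size
    return selected, used, profile
```

With a capacity of 100 and the default 80% ratio, words are added until either 80 units are used or the next word occurs fewer than twice. The remaining 20% stays free for the later replacement and manual editing steps.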

Next, a trial utterance of the utterance target sentence is made using the subset dictionary obtained by the primary extraction, and the user is asked to confirm the result (step S30).

Then a confirmation input (for example, OK or NG) is accepted from the user. If OK, the process ends (the contents of the subset dictionary are finalized as they stand after the primary extraction); if NG, the subsequent processing is performed (step S40).

Next, low-frequency vocabulary is replaced (step S40). For vocabulary that was left out during the primary extraction, a synonym dictionary is consulted to check whether the vocabulary can be replaced. Cases where a word can be replaced with vocabulary that is already assigned, and cases where replacement lets several words be merged into one, are identified, and the utterance target sentence is changed accordingly (step S50).
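The replacement step can be illustrated with a short sketch. The names `replace_with_synonyms`, `stored`, and `synonyms` are hypothetical, and taking the first stored synonym is an assumption; the source does not say how a synonym is chosen when several are available.

```python
def replace_with_synonyms(tokens, stored, synonyms):
    """Replace unstored words with stored synonyms where one exists.

    Returns the changed word list and a history of
    (position, original word, replacement word) tuples.
    """
    changed = []
    history = []
    for i, word in enumerate(tokens):
        if word in stored:
            changed.append(word)
            continue
        # Look for the first synonym that already has a dictionary entry.
        candidate = next((s for s in synonyms.get(word, []) if s in stored), None)
        if candidate is None:
            changed.append(word)  # left for manual editing or kana replacement
        else:
            changed.append(candidate)
            history.append((i, word, candidate))
    return changed, history
```

Returning the history alongside the changed text keeps enough information to show the user what was altered and to record the change later.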

Next, a trial utterance of the changed utterance target sentence is made using the subset dictionary obtained by the primary extraction, and the user is asked to confirm the result (step S60). This confirmation may, for example, take the form of displaying the changed portions on the screen as text, but even then it is preferable to listen to the changed speech, since that gives a more reliable check.

Whether to adopt a replacement may be decided by first presenting the result to the user and adding to the dictionary only after the user's judgment, or anything that can be replaced may simply be replaced with priority. Words that are already assigned need no dictionary addition, so in that case the vocabulary in the target sentence is what gets replaced. Also, when words are sorted by frequency and added to the subset dictionary from the most frequent downward within the remaining share of the assignable capacity, each added word may be checked for other vocabulary it can stand in for, and the utterance target sentence may be changed to use the newly added word.

Then a confirmation input (for example, OK or NG) is accepted from the user. If OK, the process ends (the contents of the subset dictionary are finalized as they stand after the primary extraction); if NG, the subsequent processing is performed (step S70).

次に、発話対象文章の変更を変更履歴として記録する処理を行う（ステップＳ８０）。 Next, a process of recording the change of the utterance target sentence as a change history is performed (step S80).

図４は、置き換え時の変更履歴記録処理の一例を説明するための図である。 FIG. 4 is a diagram for explaining an example of a change history recording process at the time of replacement.

たとえば図４に示すように発話対象文章２００自体にコメント２２０、２３０、２４０を挿入する形式で発話対象文章の変更履歴を残すようにしてもよい。コメントは例えばコメントであることを示すためにカギ括弧（図４の２２２と２２６、２３２と２３８、２３２と２３６）に囲む等で、発話対象文章と区別できるようにしてもよい。 For example, as shown in FIG. 4, the change history of the utterance target sentence may be left in a format in which comments 220, 230, and 240 are inserted into the utterance target sentence 200 itself. For example, the comment may be distinguished from the utterance target sentence by surrounding it with brackets (222 and 226, 232 and 238, 232 and 236 in FIG. 4) to indicate that it is a comment.

Here, 210 is the replacement word (it is part of the utterance target sentence). Comments 220 and 240 are placed before and after the replacement word and indicate that the portion between them is a replacement word. Comment 230 indicates that the original word corresponding to the replacement word (the word contained in the original utterance target sentence) is "パフォーマンス" (performance).
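One way to picture the in-text change history is the sketch below. The tag syntax (`[sub]`, `[orig:...]`, `[/sub]`) is purely hypothetical; the source only states that comments are set off by brackets and does not give the exact layout of FIG. 4. The helper names are likewise assumptions.

```python
import re

# Hypothetical comment syntax; the source only says comments are bracketed.
_PATTERN = re.compile(r"\[sub\](.*?)\[orig:(.*?)\]\[/sub\]")

def annotate(replacement, original):
    """Wrap a replacement word in comments that record the original word."""
    return "[sub]" + replacement + "[orig:" + original + "][/sub]"

def text_to_speak(annotated):
    """Drop the comments and keep the replacement words for synthesis."""
    return _PATTERN.sub(lambda m: m.group(1), annotated)

def original_text(annotated):
    """Undo the replacements using the recorded original words."""
    return _PATTERN.sub(lambda m: m.group(2), annotated)
```

Because the history lives in the text itself, the same file can be fed to the synthesizer (after stripping comments) or restored to the original wording without a separate log.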

Next, the user is asked whether to perform manual editing, and if so, manual dictionary editing is performed (steps S90 and S100). Vocabulary in the utterance target sentence that has not yet been extracted may be sorted by frequency and added to the subset dictionary from the most frequent downward, within the remaining share of the assignable capacity.

Next, for words that cannot be handled by the above processing, registration as a word is abandoned, and the word is converted to "single-sound pronunciation" by inserting ruby (a kana reading) into the target sentence (step S110).

FIG. 5 is a diagram for explaining an example of the change history recording process when ruby is added (kana replacement processing).

For example, if the word "量子論" (quantum theory) cannot be registered, it is converted into the ruby "りょうしろん" (kana, either katakana or hiragana), as shown at 310 in FIG. 5. In doing so, text tags may be attached as in FIG. 5 to indicate that the portion is ruby and that the original word, which is not itself pronounced, was "量子論".

That is, as shown in FIG. 5, comments 320, 330, and 340 are inserted into the utterance target sentence 300 itself. Here, 310 is the kana produced by the conversion (it is part of the utterance target sentence). Comments 320 and 340 are placed before and after the kana-converted word and indicate that the portion between them is a kana-converted word. Comment 330 indicates that the original word corresponding to the kana-converted word (the word contained in the original utterance target sentence) is "量子論".
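The kana fallback can be sketched in a few lines. The function name `kana_fallback` and the `readings` table (word to kana reading, assumed to come from the full dictionary) are illustrative assumptions, not part of the source.

```python
def kana_fallback(tokens, stored, readings):
    """Replace words missing from the subset dictionary with their kana reading.

    Words that are stored, or that have no known reading, are left unchanged.
    Returns the new word list and a history of (original, kana) pairs.
    """
    out = []
    history = []
    for word in tokens:
        if word in stored or word not in readings:
            out.append(word)
        else:
            kana = readings[word]
            out.append(kana)
            history.append((word, kana))
    return out, history
```

The history pairs are what a comment such as 330 in FIG. 5 would record, so the original spelling can be recovered later.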

The subset dictionary (second speech synthesis dictionary) contains speech synthesis data for kana notation, so a word written in kana characters can be pronounced. However, because it is recognized only as a string of kana characters, the intonation and accent specific to the word are difficult to produce, and the pronunciation comes out close to a flat, monotone reading.

そこでサブセット辞書を用いて変更後の発話対象文章の発話試行を行い、ユーザーに確認する（ステップＳ１２０）。 Therefore, an utterance trial of the utterance target sentence after the change is performed using the subset dictionary and confirmed with the user (step S120).

Then a confirmation input (for example, OK or NG) is accepted from the user. If OK, the process ends (the contents of the subset dictionary are finalized as they stand after the primary extraction); if NG, the process returns to step S100 and the subsequent processing is repeated (step S130).

上記実施の形態ではサブセット辞書の語彙辞書の抽出を例にとり説明した。この手法によれは、語彙を絞り込むことにより、音素も抽出された語彙に対応するもののみに絞りこむことができるので、結果としてサブセット音素辞書も小さくすることができる。 In the above embodiment, the extraction of the vocabulary dictionary of the subset dictionary has been described as an example. According to this method, by narrowing down the vocabulary, the phonemes can be narrowed down to only those corresponding to the extracted vocabulary, and as a result, the subset phoneme dictionary can also be reduced.
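The effect on the phoneme dictionary can be shown with a small sketch. It assumes, for illustration only, that the vocabulary dictionary maps each word to a list of phoneme symbols and that the full phoneme dictionary maps each symbol to its waveform data; the names are hypothetical.

```python
def subset_phoneme_dictionary(subset_lexicon, full_phonemes):
    """Keep only the phoneme entries that the subset vocabulary actually uses."""
    needed = set()
    for symbols in subset_lexicon.values():
        needed.update(symbols)
    # Every needed symbol is assumed to exist in the full phoneme dictionary.
    return {symbol: full_phonemes[symbol] for symbol in sorted(needed)}
```

Phonemes that no stored word refers to are dropped, which is why narrowing the vocabulary also shrinks the phoneme dictionary.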

しかしサブセット音素辞書サイズに問題ある場合には、一次抽出において割合を変えて再試行するなどの作業を行うようにしてもよい。 However, if there is a problem with the subset phoneme dictionary size, the primary extraction may be retried by changing the ratio.

図６は、サブセット辞書が搭載されるシングルチップＴＴＳ−ＬＳＩ（半導体集積回路装置）の構成について説明するための図である。 FIG. 6 is a diagram for explaining the configuration of a single chip TTS-LSI (semiconductor integrated circuit device) on which a subset dictionary is mounted.

シングルチップＴＴＳ−ＬＳＩ１０は、サブセット辞書３０を含む。サブセット辞書３０は、本実施の形態の音声合成用辞書作成システムによって生成された第２の音声合成用辞書を構成する辞書データが記憶された不揮発性記憶部として機能する。サブセット辞書３０は、語彙辞書３２と音素辞書３４を含み、ＲＯＭやフラッシュEEPROM等で実現できる。 The single chip TTS-LSI 10 includes a subset dictionary 30. The subset dictionary 30 functions as a nonvolatile storage unit in which dictionary data constituting the second speech synthesis dictionary generated by the speech synthesis dictionary creation system of the present embodiment is stored. The subset dictionary 30 includes a vocabulary dictionary 32 and a phoneme dictionary 34 and can be realized by a ROM, a flash EEPROM, or the like.

語彙辞書３２はテキスト読み上げ処理におけるフロントエンド処理を行うための辞書であり、テキスト表記に対応した記号化言語表現（symbolic linguistic representation）（例えばテキスト表記に対応した読みのデータ）が格納された辞書である。 The vocabulary dictionary 32 is a dictionary for performing front-end processing in text-to-speech processing, and is a dictionary in which symbolic linguistic representation (for example, reading data corresponding to text notation) corresponding to text notation is stored. is there.

In front-end processing, numbers and abbreviations in the text are converted into the form in which they are read aloud (this is called text normalization, preprocessing, or tokenization). Each word is then converted into phonetic symbols, and the text is divided into prosodic units such as phrases, clauses, and sentences (the process of assigning phonetic symbols to words is called text-to-phoneme (TTP) conversion or grapheme-to-phoneme (GTP) conversion). The phonetic symbols and the prosodic information are combined into a symbolic linguistic representation, which is the output of the front end.

音素辞書３４は、フロントエンドの出力である記号化言語表現を入力として対応する実際の音（音素）の波形情報を格納する辞書である。 The phoneme dictionary 34 is a dictionary that stores waveform information of an actual sound (phoneme) corresponding to a symbolic language expression that is an output of the front end.

The subset dictionary 30 stores the data of the second speech synthesis dictionary created by the speech synthesis dictionary creation system. For example, it may consist of the vocabulary dictionary generated by the procedure described with reference to FIG. 3 and a phoneme dictionary made up of the phoneme dictionary data that this vocabulary dictionary requires.

シングルチップＴＴＳ−ＬＳＩ１０は、ホストＩ／Ｆ５０を含む。ホストＩ／Ｆ５０はホストコンピュータとコマンドやデータのやりとりを行うためのインターフェースブロックである。ホストＩ／Ｆ５０はＴＴＳコマンド／データバッファ５２を含み、ここにホストから指示された発話対象文章（テキストデータ）が格納される。発話対象文章は合成音声データ生成処理部２０への入力となる。 The single chip TTS-LSI 10 includes a host I / F 50. The host I / F 50 is an interface block for exchanging commands and data with the host computer. The host I / F 50 includes a TTS command / data buffer 52 in which an utterance target sentence (text data) instructed by the host is stored. The utterance target sentence is input to the synthesized voice data generation processing unit 20.

シングルチップＴＴＳ−ＬＳＩ１０は、合成音声データ生成処理部２０を含む。合成音声データ生成処理部２０は、不揮発性記憶部３０に記憶された辞書データ（サブセット辞書）を用いて所定の発話対象文章に対応した合成音声データを生成する合成音声生成部として機能する。合成音声データ生成処理部２０は、表記→音表記変換ブロック２２、音素選択部２４、発音ブロック２６、フィルタ処理部２８を含む。各部の機能は、専用の回路を設ける事によって実現してもよいし、ＣＰＵが各部の機能を実現するためのプログラムを実行することによって実現してもよい。合成音声データ生成処理部２０の機能は、図２の音声合成用辞書作成システムの合成音声データ生成処理部１３０の機能と同等である。 The single chip TTS-LSI 10 includes a synthesized voice data generation processing unit 20. The synthesized speech data generation processing unit 20 functions as a synthesized speech generation unit that generates synthesized speech data corresponding to a predetermined utterance target sentence using dictionary data (subset dictionary) stored in the nonvolatile storage unit 30. The synthesized speech data generation processing unit 20 includes a notation → sound notation conversion block 22, a phoneme selection unit 24, a pronunciation block 26, and a filter processing unit 28. The function of each part may be realized by providing a dedicated circuit, or may be realized by the CPU executing a program for realizing the function of each part. The function of the synthesized speech data generation processing unit 20 is equivalent to the function of the synthesized speech data generation processing unit 130 of the speech synthesis dictionary creation system in FIG.

The notation-to-sound-notation conversion block 22 searches the vocabulary dictionary 32, converts the received utterance target sentence into a symbolic linguistic representation 23, and passes it to the phoneme selection unit 24.

The phoneme selection unit 24 receives the symbolic linguistic representation 23 of the utterance target sentence, searches the phoneme dictionary 34, and passes the set of phonemes corresponding to the symbolic linguistic representation 23 to the pronunciation block 26.

発音ブロック２６は、音素の集合に基づき合成音声波形２７を生成する。 The pronunciation block 26 generates a synthesized speech waveform 27 based on the set of phonemes.
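The data flow through blocks 22, 24, and 26 can be sketched as three small functions. This is a deliberately simplified illustration: the function names are hypothetical, waveforms are plain lists of samples, and units are joined by simple concatenation, whereas a real concatenative synthesizer would also smooth the joins and apply prosody.

```python
def to_symbols(tokens, lexicon):
    """Block 22: look up each word and emit its phoneme symbols."""
    symbols = []
    for word in tokens:
        symbols.extend(lexicon[word])
    return symbols

def select_units(symbols, phonemes):
    """Block 24: fetch the stored waveform for each phoneme symbol."""
    return [phonemes[s] for s in symbols]

def render(units):
    """Block 26: join the waveforms into one output sample list."""
    wave = []
    for unit in units:
        wave.extend(unit)
    return wave
```

A word missing from `lexicon` raises `KeyError` here, which mirrors why the dictionary creation flow replaces unstored words before the text reaches the chip.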

フィルタ処理部２８は、フィルタを用いて合成音声波形の音質の変更または他のキャラクタの音声への変更を行う。 The filter processing unit 28 uses the filter to change the sound quality of the synthesized speech waveform or change to the voice of another character.

シングルチップＴＴＳ−ＬＳＩ１０は、スピーカーＩ／Ｆ４０を含む。フィルタ処理部２８でフィルタリングされた合成音声波形はスピーカーＩ／Ｆ４０のアンプ４２を介して外部のスピーカに出力される。 The single chip TTS-LSI 10 includes a speaker I / F 40. The synthesized speech waveform filtered by the filter processing unit 28 is output to an external speaker via the amplifier 42 of the speaker I / F 40.

Although the single-chip TTS-LSI 10 of the present embodiment carries only a small-capacity subset dictionary, it can generate accurate synthesized speech data for the predetermined utterance target sentences of the device in which it is incorporated.

図７は、本実施の形態の半導体集積回路装置の製造方法について説明するためのフローチャートである。本実施の形態の半導体集積回路装置は合成音声データ生成処理部と音声合成処理に用いる辞書データが記憶された不揮発性記憶部を含む半導体集積回路装置で以下の行程を経て製造される。 FIG. 7 is a flowchart for explaining the method for manufacturing the semiconductor integrated circuit device of the present embodiment. The semiconductor integrated circuit device of the present embodiment is a semiconductor integrated circuit device including a synthesized speech data generation processing unit and a non-volatile storage unit storing dictionary data used for speech synthesis processing, and is manufactured through the following steps.

まず半導体集積回路装置で発話を予定している発話対象文章を解析し、発話対象文章を構成する各語句の出現頻度を調べ、出現頻度に基づき、第２の音声合成用辞書への格納語を決定し、決定された格納語に対応して第１の音声合成用辞書に格納されている辞書データを用いて第２の音声合成用辞書を生成する（ステップＳ１０）。 First, an utterance target sentence scheduled to be uttered by a semiconductor integrated circuit device is analyzed, the appearance frequency of each word constituting the utterance target sentence is examined, and a word stored in the second speech synthesis dictionary is determined based on the appearance frequency. A second speech synthesis dictionary is generated using dictionary data stored in the first speech synthesis dictionary corresponding to the determined stored word (step S10).

Next, synthesized speech corresponding to the utterance target sentence is generated using the second speech synthesis dictionary (step S20). An evaluation input from the user may be accepted for the synthesized speech generated here; if OK, the contents of the second speech synthesis dictionary may be finalized, and if NG, editing of the second speech synthesis dictionary may continue.

次に生成された第２の音声合成用辞書を構成する辞書データを前記半導体集積回路装置の不揮発性記憶部に書き込む（ステップＳ３０）。例えばマスクＲＯＭとして半導体集積回路装置製造時に不揮発性記憶部に第２の音声合成用辞書を構成する辞書データを書き込むようにしてもよい。 Next, the generated dictionary data constituting the second dictionary for speech synthesis is written into the nonvolatile storage unit of the semiconductor integrated circuit device (step S30). For example, as a mask ROM, dictionary data constituting the second dictionary for speech synthesis may be written in the nonvolatile storage unit when the semiconductor integrated circuit device is manufactured.

The present invention is not limited to this embodiment, and various modifications are possible within the scope of the gist of the invention.

また日本語以外の言語に対するＴＴＳシステムに対しても適用可能である。 It can also be applied to TTS systems for languages other than Japanese.

FIG. 1 is a diagram for explaining the speech synthesis dictionary creation system and the semiconductor integrated circuit device of the present embodiment.
FIG. 2 is an example of a functional block diagram of the speech synthesis dictionary creation system of the present embodiment.
FIG. 3 is a flowchart for explaining the flow of processing of the present embodiment.
FIG. 4 is a diagram for explaining an example of the change history recording process at the time of replacement.
FIG. 5 is a diagram for explaining an example of the change history recording process when ruby is added (kana replacement processing).
FIG. 6 is a diagram for explaining the configuration of a single-chip TTS-LSI (semiconductor integrated circuit device) on which a subset dictionary is mounted.
FIG. 7 is a flowchart for explaining the method for manufacturing the semiconductor integrated circuit device of the present embodiment.

Explanation of symbols

10: semiconductor integrated circuit device (TTS-LSI); 20: synthesized speech data generation processing unit (speech synthesis system); 22: notation-to-sound-notation conversion block; 24: phoneme selection unit; 26: pronunciation block; 28: filter processing unit; 30: small-capacity dictionary (subset dictionary); 32: vocabulary dictionary; 34: phoneme dictionary; 40: speaker I/F; 50: host I/F; 100: speech synthesis dictionary creation system; 110: processing unit; 120: second speech synthesis dictionary creation unit; 122: subset dictionary creation software; 130: synthesized speech data generation processing unit; 132: speech synthesis software; 140: utterance target sentence changing unit; 142: change history recording processing unit; 144: synonym replacement processing unit; 146: kana replacement processing unit; 150: dictionary editing processing unit; 182: first speech synthesis dictionary storage unit (large-capacity dictionary); 184: large-capacity dictionary

Claims

A speech synthesis dictionary creation system for creating, from a first speech synthesis dictionary, a second speech synthesis dictionary that is a set of dictionary data necessary for generating synthesized speech corresponding to an utterance target sentence and that has a smaller amount of data than the first speech synthesis dictionary, the system comprising:
first speech synthesis dictionary storage means for storing dictionary data constituting the first speech synthesis dictionary;
second speech synthesis dictionary creation means for analyzing the utterance target sentence, examining the appearance frequency of each word constituting the utterance target sentence, determining the words to be stored in the second speech synthesis dictionary based on the appearance frequency, and creating the second speech synthesis dictionary using the dictionary data stored in the first speech synthesis dictionary for the determined stored words; and
speech synthesis means for generating synthesized speech corresponding to the utterance target sentence using the second speech synthesis dictionary.

The speech synthesis dictionary creation system according to claim 1, further comprising:
utterance target sentence changing means for changing the utterance target sentence by replacing an unstored word, which is a word constituting the utterance target sentence that is not to be stored in the second speech synthesis dictionary, with a word stored in the second speech synthesis dictionary.

In claim 2,
The utterance target sentence changing means is:
A speech synthesis dictionary creation system characterized by recording a change history related to the replacement of words constituting the utterance target sentence.

In any one of claims 2 to 3,
The utterance target sentence changing means is:
A speech synthesis dictionary creation system comprising synonym replacement processing means for analyzing whether the unstored word has a synonym among the words stored in the second speech synthesis dictionary and, if it does, performing synonym replacement processing that replaces the unstored word in the utterance target sentence with the synonym.

In any of claims 2 to 4,
The utterance target sentence changing means includes kana replacement processing means that performs a kana replacement process of replacing an unstored word with a kana notation representing the reading of that word.
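
The kana replacement process can be sketched in the same spirit, for illustration only. The reading table `readings` and the function name `replace_with_kana` are assumptions. The sketch also assumes that a reading is available for the unstored word and that the synthesizer can pronounce kana notation directly.

```python
def replace_with_kana(words, second_dictionary, readings):
    """Replace each unstored word with the kana notation of its reading.

    words             -- words of the utterance target sentence
    second_dictionary -- subset dictionary: word -> dictionary data
    readings          -- hypothetical table: word -> kana reading
    """
    result = []
    for word in words:
        if word not in second_dictionary and word in readings:
            result.append(readings[word])  # substitute the kana reading
        else:
            result.append(word)  # stored word, or no reading known
    return result


# Hypothetical data: dictionary values are placeholders.
subset = {"電車": "entry-1", "が": "entry-2"}
kana_readings = {"到着": "とうちゃく"}
changed = replace_with_kana(["電車", "が", "到着"], subset, kana_readings)
```

In this example only the unstored word 到着 is rewritten, as とうちゃく.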

In any one of claims 1 to 5,
The speech synthesis dictionary creation system further comprises editing processing means that accepts an evaluation input for the utterance target sentence synthesized as speech using the second speech synthesis dictionary, and that determines or changes the second speech synthesis dictionary or the utterance target sentence according to the content of the evaluation input.

In any one of claims 1 to 6,
The editing processing means accepts a user's designation input for the words to be stored in the second speech synthesis dictionary, and
The second speech synthesis dictionary creation means determines the words to be stored based on the user's designation input.

A semiconductor integrated circuit device comprising:
a non-volatile storage unit that stores dictionary data constituting the second speech synthesis dictionary generated by the speech synthesis dictionary creation system according to claim 1; and
a synthesized speech data generation processing unit that generates synthesized speech data corresponding to a predetermined utterance target sentence using the dictionary data stored in the non-volatile storage unit.

A method for manufacturing a semiconductor integrated circuit device for speech synthesis that includes a non-volatile storage unit, the method comprising:
analyzing an utterance target sentence that is scheduled to be speech-synthesized by the semiconductor integrated circuit device, examining the appearance frequency of each word constituting the utterance target sentence, determining the words to be stored in a second speech synthesis dictionary based on the appearance frequency, and generating the second speech synthesis dictionary using the dictionary data stored in a first speech synthesis dictionary that corresponds to the determined words;
generating synthesized speech corresponding to the utterance target sentence using the second speech synthesis dictionary; and
writing the dictionary data constituting the generated second speech synthesis dictionary into the non-volatile storage unit of the semiconductor integrated circuit device.