JP2001117577A

JP2001117577A - Voice synthesizing device

Info

Publication number: JP2001117577A
Application number: JP29726899A
Authority: JP
Inventors: Yuji Wada; 祐司和田
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1999-10-19
Filing date: 1999-10-19
Publication date: 2001-04-27

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesizer that can generate high-quality synthesized speech by retrieving optimum waveform data. SOLUTION: Phoneme symbol string search means (phoneme symbol string candidate search unit 3) searches a speech database 1 for a phoneme symbol string that matches at least part of a phoneme symbol string input from a phoneme symbol string decomposition unit 2 and whose matching portion is the longest, and assigns it to the input phoneme symbol string. Speech synthesis means (speech synthesis unit 5) concatenates the waveform data corresponding to the assigned phoneme symbol strings and outputs synthesized speech. Because the longest phoneme symbol strings are retrieved, speech can be synthesized from fewer pieces of waveform data, so high-quality synthesized speech can be generated.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION: The present invention relates to a speech synthesizer that retrieves waveform data stored in a speech database and outputs synthesized speech.

[0002]

DESCRIPTION OF THE RELATED ART: In recent years, recording-based synthesis has attracted attention. This technique generates synthetic speech from waveform data obtained by recording a human voice.

[0003] A conventional speech synthesizer that uses this recording-based synthesis technique works as follows. First, a human voice is recorded. The recorded speech waveform is then divided into predetermined speech units (phonemes), which produces a large number of waveform data items. Each waveform data item is given a phoneme symbol as a search label and is stored in advance, together with the waveform data, in a storage device or the like. A store built in this way is called a speech database.

[0004] To synthesize speech, text input as a command is first decomposed by phoneme decomposition processing that uses a word dictionary or the like. This converts the text into a sequence of phoneme symbols.

[0005] Next, each converted phoneme symbol is assigned a phoneme symbol stored in the speech database as a search label.

[0006] The waveform data corresponding to the assigned phoneme symbols are then read from the speech database and concatenated. The result is D/A converted, amplified as appropriate, and output as synthesized speech.

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION: The applicant of the present invention aims to build a speech synthesizer that differs from the conventional one described above. It would prepare waveform data of various lengths, each corresponding to a run of several consecutive phonemes, and store a large number of these in a speech database. It would then generate synthesized speech by concatenating waveform data retrieved from that database as appropriate.

[0008] However, no technique has yet been established for such a speech synthesizer to retrieve the optimum waveform data from the many items in the speech database. At present, it is therefore difficult to generate high-quality synthesized speech, and in some cases the characteristics of the recorded voice are even lost. A speech synthesizer is therefore desired that can generate high-quality synthesized speech by retrieving the optimum waveform data from the speech database.

[0009] The present invention has been made in view of the above problem. Its object is to provide a speech synthesizer that can generate high-quality synthesized speech by retrieving optimum waveform data.

[0010]

MEANS FOR SOLVING THE PROBLEMS: To solve the above problem, the speech synthesizer according to claim 1 of the present invention comprises three elements:
- a speech database that stores a plurality of waveform data items in which speech is recorded, together with a plurality of phoneme symbol strings assigned as search labels for the respective waveform data items;
- phoneme symbol string search means that searches the speech database for, and outputs, a phoneme symbol string that matches at least part of an input phoneme symbol string and whose matching portion is the longest; and
- speech synthesis means that outputs the waveform data corresponding to the phoneme symbol string retrieved and output by the phoneme symbol string search means.

[0011] The speech synthesizer according to claim 1 is provided in advance with a speech database. This database stores a plurality of waveform data items in which speech is recorded, together with a plurality of phoneme symbol strings assigned as search labels for the respective waveform data items.

[0012] A phoneme symbol string may be input to the phoneme symbol string search means, for example from a phoneme decomposition unit that performs phoneme decomposition. The search means then searches the speech database for, and outputs, a phoneme symbol string that matches at least part of the input string and whose matching portion is the longest.

[0013] The speech synthesis means reads from the speech database the waveform data corresponding to the phoneme symbol strings that the phoneme symbol string search means retrieved and output. It then concatenates and outputs the waveform data, for example in the order of the phoneme symbol string that was input to the search means.

[0014] In this way, the invention according to claim 1 searches for the longest phoneme symbol strings. This reduces both the number of concatenated waveform data items and the number of joins between them.
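The arithmetic behind [0014] can be made concrete by counting joins for two segmentations of the phrase /hazimemasite/, which is used later in the embodiment. The first segmentation uses one waveform per phoneme symbol, as in the conventional device of [0003] to [0006]. The second uses the set /hazime/, /masi/, /te/ given in [0044]. This is a minimal sketch that assumes, for illustration only, that each character is one phoneme symbol. The function name is hypothetical.

```python
def join_count(units):
    """Number of joins when the units are concatenated end to end."""
    return max(len(units) - 1, 0)


phrase = "hazimemasite"
# Conventional approach: one waveform per phoneme symbol (one character each here).
per_phoneme = list(phrase)
# Longest-match approach: the segmentation given as candidate 1 in the embodiment.
longest_match = ["hazime", "masi", "te"]

print(join_count(per_phoneme))    # 11
print(join_count(longest_match))  # 2
```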

[0015] The speech synthesizer according to claim 2 of the present invention is the speech synthesizer according to claim 1, in which the phoneme symbol string search means retrieves different phoneme symbol strings from the speech database depending on whether or not a portion of the input phoneme symbol string is designated as a priority search target.

[0016] In the speech synthesizer according to claim 2, the phoneme symbol string search means retrieves different phoneme symbol strings from the speech database. A single phoneme symbol string can then finally be selected from among them, and as a result the optimum waveform data can be selected.

[0017] The speech synthesizer according to claim 3 of the present invention is the speech synthesizer according to claim 1 or 2, in which the phoneme symbol string search means retrieves from the speech database a plurality of sets of phoneme symbol strings that make up the input phoneme symbol string. When the retrieved sets each contain the same number of phoneme symbol strings, the search means selects the set that contains more phoneme symbol strings ending with a vowel symbol.

[0018] In the speech synthesizer according to claim 3, the set containing more phoneme symbol strings that end with a vowel symbol is selected. This yields particularly natural synthesized speech in languages in which many sounds are formed by combining a consonant with a vowel.

[0019]

EMBODIMENTS OF THE INVENTION: Embodiments of the speech synthesizer according to the present invention are described below with reference to the drawings.

[0020] FIG. 1 is a block diagram showing the configuration of a first embodiment of the speech synthesizer according to the present invention. The speech synthesizer shown in FIG. 1 comprises a speech database 1, a phoneme symbol string decomposition unit 2, a phoneme symbol string candidate search unit 3, a candidate determination unit 4, and a speech synthesis unit 5.

[0021] The speech database 1 stores a plurality of waveform data items in which speech is recorded. Each waveform data item corresponds either to a speech unit generally called a phoneme or to a phoneme sequence (a sequence of several consecutive phonemes). Each waveform data item is given a search label and stored with it in the speech database 1. The label is either a single symbol corresponding to a phoneme or two or more symbols corresponding to a phoneme sequence. Hereinafter, both cases are called a "phoneme symbol string" without distinction.

[0022] The waveform data and the phoneme symbol strings are each stored as separate files, and each phoneme symbol string is associated with its waveform data by file name. For simplicity of explanation, they are treated here as being stored in the same database.
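The label-to-file association of [0022] can be sketched as an in-memory index. This is a minimal sketch under stated assumptions: the labels and file names below are hypothetical, and the function name is illustrative. Each label maps to a list rather than a single file, so that several recordings of the same phoneme symbol string can coexist, as the variant of [0032] allows.

```python
from collections import defaultdict


def build_label_index(pairs):
    """Map each phoneme symbol string (search label) to its waveform file names.

    `pairs` is an iterable of (label, wave_file) tuples. A list is kept per
    label so that several recordings of the same string can coexist.
    """
    index = defaultdict(list)
    for label, wave_file in pairs:
        index[label].append(wave_file)
    return dict(index)


# Hypothetical label/file pairs; the file names are illustrative only.
pairs = [
    ("hazime", "w0001.wav"),
    ("masi", "w0002.wav"),
    ("te", "w0003.wav"),
    ("te", "w0104.wav"),  # a second recording of the same string
]
index = build_label_index(pairs)
print(index["te"])  # ['w0003.wav', 'w0104.wav']
```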

[0023] The phoneme symbol string decomposition unit 2 is a block that splits a phoneme symbol string corresponding to a sentence into phrase units. Normally, phoneme decomposition processing is first applied to text data corresponding to a sentence, which generates a phoneme symbol string. That string is then input to the phoneme symbol string decomposition unit 2 as a speech synthesis command. The unit treats the symbols in the input string that correspond to a period or a comma as split points, and splits the phoneme symbol string into phrase units.

[0024] The phoneme symbol string candidate search unit 3 constitutes the phoneme symbol string search means. When a phoneme symbol string is input from the phoneme symbol string decomposition unit 2, this block searches the speech database 1 for phoneme symbol strings that correspond to waveform data.

[0025] To obtain natural synthesized speech, each waveform data item used in synthesis must contain as much speech as possible. In other words, synthesized speech composed of fewer waveform data items sounds more natural.

[0026] The phoneme symbol string candidate search unit 3 is therefore configured to search the speech database 1 for a phoneme symbol string that matches at least part of the input phoneme symbol string and whose matching portion is the longest.

[0027] Natural synthesized speech can be generated by searching for the longest phoneme symbol strings in this way and concatenating the corresponding waveform data.

[0028] The phoneme symbol string candidate search unit 3 comprises a first candidate search unit 31 and a second candidate search unit 32. The two blocks retrieve different phoneme symbol strings because they differ in whether a priority search target is set within the phoneme symbol string input from the phoneme symbol string decomposition unit 2.

[0029] Specifically, the first candidate search unit 31 gives search priority to the portion that includes the first symbol of the input phoneme symbol string. The second candidate search unit 32 sets no such priority target and may match any portion.

[0030] The candidate determination unit 4 is a block that also forms part of the phoneme symbol string search means. The first candidate search unit 31 and the second candidate search unit 32 each retrieve one set of phoneme symbol strings. Hereinafter, a set retrieved by either unit is called a "candidate". The candidate determination unit 4 selects one of the two candidates and assigns it to the input phoneme symbol string. It then outputs the assigned candidate (a set of phoneme symbol strings) to the speech synthesis unit 5.

[0031] The speech synthesis unit 5 is a block constituting the speech synthesis means. When the candidate assigned by the candidate determination unit 4 is input, the speech synthesis unit 5 reads from the speech database 1 the waveform data corresponding to the phoneme symbol strings in that candidate. It then concatenates the waveform data and outputs synthesized speech.

[0032] Two further components may also be added: a phoneme decomposition unit that decomposes text data and converts it into phoneme symbols, and a prosody processing unit that assigns a fundamental frequency and a playback duration to the converted phoneme symbols. In addition, the speech database 1 may store several waveform data items for the same phoneme symbol string that differ in fundamental frequency and playback duration. Each item is stored together with its fundamental-frequency and duration information. The speech synthesis unit 5 may then be configured to pick, from those items, the ones that match the fundamental frequency and playback duration assigned by the prosody processing unit, and to synthesize speech from them. This configuration allows more suitable waveform data to be selected and concatenated, so even more natural synthesized speech can be generated.
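[0032] does not say how the "matching" waveform data are chosen from several recordings of the same phoneme symbol string. One plausible rule, assumed here and not taken from the patent, is to minimize a weighted sum of the relative fundamental-frequency and duration errors against the targets set by the prosody processing unit. The `Variant` type, its field names, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    wave_file: str      # hypothetical file name of one recording
    f0_hz: float        # fundamental frequency stored with the recording
    duration_ms: float  # playback duration stored with the recording


def pick_variant(variants, target_f0_hz, target_duration_ms,
                 f0_weight=1.0, duration_weight=1.0):
    """Return the variant closest to the prosodic targets.

    The cost is an assumed weighted sum of relative errors; the patent
    only says the unit should match the assigned F0 and duration.
    """
    def cost(v):
        f0_err = abs(v.f0_hz - target_f0_hz) / target_f0_hz
        dur_err = abs(v.duration_ms - target_duration_ms) / target_duration_ms
        return f0_weight * f0_err + duration_weight * dur_err

    return min(variants, key=cost)


variants = [Variant("te_low.wav", 120.0, 300.0),
            Variant("te_high.wav", 180.0, 280.0)]
print(pick_variant(variants, target_f0_hz=170.0, target_duration_ms=290.0).wave_file)
# te_high.wav
```

Relative rather than absolute errors are used so that hertz and milliseconds can be summed without either unit dominating. Adjusting the weights would shift the balance between pitch and timing.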

[0033] Although not shown in FIG. 1, the speech synthesizer of this embodiment includes an arithmetic unit that executes processing and a main storage unit that can store processing instructions and hold data temporarily. An external storage device or the like is also provided as appropriate. Processing is carried out by the arithmetic unit, which sequentially reads and executes the instructions stored in the main storage unit, the external storage device, or the like. Data is passed between blocks through a predetermined storage area in the main storage unit or through a work area set as appropriate by instructions. Specifically, work areas are set as appropriate for the string variables k1, k2, k3 and k4 described later.

[0034] Next, the flow of processing in this embodiment is described. Suppose the phoneme symbol string decomposition unit 2 receives the phoneme symbol string /hazimemasiteqyorosikuonegaisimasuqq/. This string corresponds to the Japanese text 「はじめまして、よろしくおねがいします。」 ("Nice to meet you. Pleased to make your acquaintance."). In this string, /q/ is the phoneme symbol for the comma 「、」 and /qq/ is the phoneme symbol for the period 「。」. The phoneme symbol string decomposition unit 2 treats these symbols as split points and splits the input into phoneme symbol string units corresponding to phrases. This prevents the phoneme symbol strings on either side of a comma or period from being treated as one continuous phoneme symbol string.
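The phrase splitting performed by the phoneme symbol string decomposition unit 2 can be sketched with a regular expression. This is a minimal sketch under one assumption: 'q' appears in the input only as the comma or period marker. That holds for this example but may not hold for other phoneme inventories. The function name is hypothetical.

```python
import re


def split_into_phrases(symbols):
    """Split a sentence-level phoneme symbol string at /q/ (comma) and /qq/ (period).

    Assumes 'q' never occurs as an ordinary phoneme symbol in the input.
    Empty pieces (for example after a trailing /qq/) are dropped.
    """
    return [phrase for phrase in re.split(r"q+", symbols) if phrase]


print(split_into_phrases("hazimemasiteqyorosikuonegaisimasuqq"))
# ['hazimemasite', 'yorosikuonegaisimasu']
```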

[0035] FIG. 2 is a flowchart showing the flow of processing in the first candidate search unit 31. The case described here is one in which the phoneme symbol string decomposition unit 2 inputs /hazimemasite/, the phoneme symbol string corresponding to the first phrase of the Japanese text above.

[0036] In step S1, as initialization, the first candidate search unit 31 sets the phoneme symbol string /hazimemasite/ in the string variable k1. In the following step S2, the first candidate search unit 31 searches the speech database 1 for, and reads out, the longest phoneme symbol string that includes the first symbol /h/ of k1 (that is, the longest matching prefix of k1).

[0037] For example, step S2 first searches for a phoneme symbol string corresponding to the entire phrase. If none is found, it searches the speech database 1 repeatedly, shortening the string length each time.

[0038] In this way, the longest phoneme symbol string can be retrieved from the speech database 1.

[0039] The speech database 1 is searched repeatedly with progressively shorter lengths. If even the first phoneme symbol /h/ on its own cannot be found, the first candidate search unit 31 assigns the special symbol /c/ to that first phoneme symbol. The symbol /c/ has the special meaning of a search error. In this example, no search error occurs, and the phoneme symbol string /hazime/ is retrieved.

[0040] After a phoneme symbol string has been retrieved in step S2, the process proceeds to step S3. There, the first candidate search unit 31 cuts the phoneme symbol string read in step S2 out of the string variable k1; that is, it deletes that phoneme symbol string from the storage area.

[0041] In the following step S4, the first candidate search unit 31 counts the phoneme symbols remaining in k1. In step S5, it determines whether that count is 0 (zero).

[0042] If the determination in step S5 is NO (the number of phoneme symbols is not zero), the search over the input phoneme symbol string /hazimemasite/ is not yet complete. The process therefore returns to step S2, and from that step on the search continues on the current contents of k1, that is, the remaining phoneme symbol string /masite/. The same applies when the search-error symbol /c/ described above has been assigned to the first symbol: the search continues on the phoneme symbol string from the second symbol onward, excluding the first symbol.

[0043] If instead the determination in step S5 is YES (the number of phoneme symbols is zero), the search over the entire phoneme symbol string /hazimemasite/ is complete.

[0044] In this way, the first candidate search unit 31 reads from the speech database 1 a set of phoneme symbol strings such as /hazime/, /masi/ and /te/. Hereinafter, the set of phoneme symbol strings read by the first candidate search unit 31 is called candidate 1.

[0045] In step S6, the first candidate search unit 31 outputs candidate 1 to the candidate determination unit 4 and ends the series of processes.
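Steps S1 to S6 can be sketched as a greedy longest-prefix loop. This is an illustration under several assumptions: the speech database is reduced to a set of search labels, each character is one phoneme symbol, and the database contents are hypothetical, chosen so that the result reproduces candidate 1 from [0044]. The string "c" stands for the search-error symbol /c/ of [0039].

```python
ERROR_SYMBOL = "c"  # the search-error symbol /c/ described in [0039]


def longest_prefix_match(k1, database):
    """Step S2: try the whole string, then progressively shorter prefixes."""
    for length in range(len(k1), 0, -1):
        if k1[:length] in database:
            return k1[:length]
    return None


def first_candidate_search(phrase, database):
    """Steps S1-S6: return the set of strings covering `phrase` (candidate 1)."""
    k1 = phrase                      # S1: initialize the string variable k1
    candidate = []
    while len(k1) > 0:               # S4/S5: stop when no symbols remain
        match = longest_prefix_match(k1, database)
        if match is None:
            # Not even the first symbol was found: record /c/ and continue
            # from the second symbol onward.
            candidate.append(ERROR_SYMBOL)
            k1 = k1[1:]
        else:
            candidate.append(match)
            k1 = k1[len(match):]     # S3: cut the matched string out of k1
    return candidate                 # S6: hand the candidate to unit 4


# Hypothetical database of search labels.
database = {"hazime", "masi", "te", "ha", "zime"}
print(first_candidate_search("hazimemasite", database))
# ['hazime', 'masi', 'te']
```

A linear scan over prefix lengths keeps the sketch close to the flowchart. A real database of many labels would more likely use a trie or a sorted index for this lookup.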

[0046] FIG. 3 is a flowchart showing the flow of processing in the second candidate search unit 32. When the phoneme symbol string decomposition unit 2 inputs the phoneme symbol string /hazimemasite/ for the phrase, the second candidate search unit 32 sets this string in the string variable k2 in step S11.

[0047] In step S12, the second candidate search unit 32 searches the speech database 1 for, and reads out, the longest phoneme symbol string that matches any portion of the string variable k2.

[0048] As in step S2 of FIG. 2, a phoneme symbol string corresponding to the entire phrase is searched for first. If none is found, the speech database 1 is searched repeatedly while the string length is progressively shortened. Here, it is assumed that the phoneme symbol string /zime/ is retrieved. If none of the phoneme symbol strings contained in /hazimemasite/ can be found in the speech database 1, the search-error symbol /c/ described above is assigned to each phoneme symbol.

[0049] After a phoneme symbol string has been retrieved in step S12, the process proceeds to step S13. There, the second candidate search unit 32 sets the phoneme symbol string /ha/, which precedes the string read in step S12, in the string variable k3.

[0050] In the following step S14, the second candidate search unit 32 applies the same processing as in FIG. 2 to the string variable k3, and the process proceeds to step S15.

[0051] In step S15, the second candidate search unit 32 sets the phoneme symbol string /masite/, which follows the string read in step S12, in the string variable k4, and the process proceeds to step S16.

[0052] In step S16, the second candidate search unit 32 applies the same processing as in FIG. 2 to the string variable k4.

[0053] In this way, the second candidate search unit 32 retrieves a set of phoneme symbol strings such as /ha/, /zime/, /masi/ and /te/ and reads them from the speech database 1. Hereinafter, the set of phoneme symbol strings read by the second candidate search unit 32 is called candidate 2.

[0054] In step S17, the second candidate search unit 32 outputs candidate 2 to the candidate determination unit 4 and ends the series of processes.

[0055] Alternatively, the phoneme symbol string in k3 or k4 may first be moved to the string variable k1 and then searched by the first candidate search unit 31. In addition, steps S13 and S14 are skipped when the phoneme symbol string retrieved in step S12 includes the first symbol /h/ of /hazimemasite/. Likewise, steps S15 and S16 are skipped when that string includes the last symbol /e/.
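Steps S11 to S17, including the shortcuts described in [0055], can be sketched in the same way. As before, the illustration rests on assumptions: the database is a set of hypothetical labels, each character is one phoneme symbol, and ties between equally long matches are broken by taking the leftmost one, which the patent does not specify. The left and right remainders (k3 and k4) are passed to a copy of the FIG. 2 search, as [0050] and [0052] describe.

```python
ERROR_SYMBOL = "c"  # search-error symbol /c/


def first_candidate_search(phrase, database):
    """Greedy longest-prefix search (the processing of FIG. 2)."""
    k1, candidate = phrase, []
    while k1:
        for length in range(len(k1), 0, -1):
            if k1[:length] in database:
                candidate.append(k1[:length])
                k1 = k1[length:]
                break
        else:  # no prefix found, not even the first symbol
            candidate.append(ERROR_SYMBOL)
            k1 = k1[1:]
    return candidate


def longest_substring_match(k2, database):
    """Step S12: longest substring of k2 in the database, leftmost on ties."""
    for length in range(len(k2), 0, -1):
        for start in range(len(k2) - length + 1):
            if k2[start:start + length] in database:
                return start, k2[start:start + length]
    return None


def second_candidate_search(phrase, database):
    """Steps S11-S17: return candidate 2 for `phrase`."""
    k2 = phrase                                   # S11
    found = longest_substring_match(k2, database)
    if found is None:
        return [ERROR_SYMBOL] * len(k2)           # every symbol gets /c/
    start, match = found
    k3 = k2[:start]                               # S13: part before the match
    k4 = k2[start + len(match):]                  # S15: part after the match
    # S14 and S16 are skipped when k3 or k4 is empty, as in [0055].
    left = first_candidate_search(k3, database) if k3 else []
    right = first_candidate_search(k4, database) if k4 else []
    return left + [match] + right                 # S17


database = {"haz", "ha", "zimemasi", "te", "i", "m", "e", "a", "s"}
print(first_candidate_search("hazimemasite", database))
# ['haz', 'i', 'm', 'e', 'm', 'a', 's', 'i', 'te']
print(second_candidate_search("hazimemasite", database))
# ['ha', 'zimemasi', 'te']
```

With this hypothetical database, the two strategies disagree. The greedy prefix search breaks the phrase into nine units, while anchoring on the longest interior match yields three. This is the kind of difference that [0063] says widens the range of choice.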

[0056] FIG. 4 is a flowchart showing the flow of processing by the candidate determination unit 4. Candidate 1 and candidate 2, read by the first candidate search unit 31 and the second candidate search unit 32 respectively, are input to the candidate determination unit 4. In step S21, the candidate determination unit 4 first compares the number of phoneme symbol strings in each candidate.

[0057] If step S21 finds that the numbers of phoneme symbol strings differ, the process proceeds to step S22. There, the candidate determination unit 4 finally assigns the candidate with fewer phoneme symbol strings to the phoneme symbol string /hazimemasite/. In other words, phoneme symbol strings serving as search labels are assigned to the corresponding portions of the input phoneme symbol string. Here, candidate 1 is assigned because it has fewer phoneme symbol strings than candidate 2.

[0058] The candidate determination unit 4 then outputs the assigned candidate to the speech synthesis unit 5 and ends the processing.

[0059] If instead step S21 finds that the numbers of phoneme symbol strings are equal, the process proceeds to step S23. There, the candidate determination unit 4 extracts the last phoneme symbol of each phoneme symbol string in each candidate. In step S24, the candidate determination unit 4 counts how many of the extracted phoneme symbols correspond to vowels (hereinafter, vowel symbols), and the process proceeds to step S25.

[0060] In step S25, the candidate determination unit 4 finally assigns the candidate with more vowel symbols to the phoneme symbol string /hazimemasite/. It then outputs the assigned candidate to the speech synthesis unit 5 and ends the processing.
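The decision procedure of FIG. 4 (steps S21 to S25) can be sketched as follows. Two assumptions are made here and are not stated in the patent. First, the vowel symbols are taken to be a, i, u, e and o, as in the romanization of the example. Second, candidate 1 is kept if the vowel counts also tie. The function names are hypothetical.

```python
VOWELS = frozenset("aiueo")  # assumed vowel symbols of the romanization used here


def vowel_final_count(candidate):
    """Steps S23/S24: count strings whose last phoneme symbol is a vowel."""
    return sum(1 for unit in candidate if unit and unit[-1] in VOWELS)


def decide_candidate(candidate1, candidate2):
    """Steps S21-S25 of FIG. 4: pick the candidate assigned to the phrase."""
    if len(candidate1) != len(candidate2):            # S21 -> S22
        return min(candidate1, candidate2, key=len)   # fewer strings wins
    # Equal counts: S23-S25, prefer more vowel-final strings.
    if vowel_final_count(candidate2) > vowel_final_count(candidate1):
        return candidate2
    return candidate1  # assumption: keep candidate 1 if the vowel counts also tie


print(decide_candidate(["hazime", "masi", "te"], ["ha", "zime", "masi", "te"]))
# ['hazime', 'masi', 'te']
print(decide_candidate(["hazim", "emasite"], ["hazime", "masite"]))
# ['hazime', 'masite']
```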

[0061] Once the candidate determination unit 4 has finished, the speech synthesis unit 5 reads from the speech database 1 the waveform data corresponding to the phoneme symbol strings in the input candidate. It concatenates the waveform data, performs D/A conversion and signal amplification, and generates and outputs synthesized speech.
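The digital part of [0061] is reading the waveform data and concatenating it, and it can be sketched with plain Python lists of float samples. D/A conversion and analog amplification happen in hardware and are not modeled. The digital gain with clipping to [-1.0, 1.0] is an illustrative stand-in for amplification, and the `waveforms` mapping from search label to samples is hypothetical.

```python
def concatenate_units(candidate, waveforms, gain=1.0):
    """Concatenate the samples for each string in `candidate`, in order.

    `waveforms` maps a phoneme symbol string to a list of float samples.
    A KeyError is raised for strings (such as the error symbol "c") that have
    no waveform; the patent does not say how such units are rendered.
    """
    samples = []
    for unit in candidate:
        samples.extend(waveforms[unit])
    # Optional digital gain, clipped to the nominal full-scale range.
    return [max(-1.0, min(1.0, gain * s)) for s in samples]


waveforms = {"hazime": [0.25, -0.5], "masi": [0.75], "te": [-0.25]}
print(concatenate_units(["hazime", "masi", "te"], waveforms, gain=2.0))
# [0.5, -1.0, 1.0, -0.5]
```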

[0062] Thus, in the speech synthesizer according to this embodiment, the phoneme symbol string candidate search unit 3 and the candidate determination unit 4 together constitute the phoneme symbol string search means. When a phoneme symbol string is input, this means searches the speech database for, and outputs, phoneme symbol strings that match at least part of the input and whose matching portions are the longest. The speech synthesis unit 5, which constitutes the speech synthesis means, then outputs the waveform data corresponding to those phoneme symbol strings. As a result, synthesized speech can be composed from a small number of waveform data items. This means there are few noise-generating joins, and natural, high-quality synthesized speech can be generated.

[0063] In addition, the phoneme symbol string search means retrieves different phoneme symbol strings from the speech database depending on whether or not a portion of the input phoneme symbol string is designated to be searched preferentially. One phoneme symbol string can then finally be selected from among these different results, which widens the choice of waveform data and allows the most suitable waveform data to be used for generating the synthesized speech.
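To illustrate how setting or not setting a priority portion can yield different candidates, the sketch below contrasts two assumed strategies. One gives priority to the portion containing the leading symbol (greedy longest match from the front); the other sets no priority, fixes the longest label found anywhere in the input, and then covers the remaining left and right parts without backtracking. Both strategies, the label list, and all function names are illustrative assumptions, not the procedures of the first and second candidate search units as disclosed.

```python
from typing import List, Optional, Sequence

# Hypothetical search labels from a toy speech database.
EXAMPLE_LABELS = ["ha", "hazi", "zimema", "me", "masi", "site", "ma", "te", "si", "zi"]


def cover_leading_first(text: str, labels: Sequence[str]) -> Optional[List[str]]:
    """Priority on the leading symbol: repeatedly take the longest label at the front."""
    pieces: List[str] = []
    pos = 0
    while pos < len(text):
        matches = [lab for lab in labels if lab and text.startswith(lab, pos)]
        if not matches:
            return None
        best = max(matches, key=len)
        pieces.append(best)
        pos += len(best)
    return pieces


def cover_longest_anywhere(text: str, labels: Sequence[str]) -> Optional[List[str]]:
    """No priority: fix the longest label found anywhere, then cover both sides."""
    if not text:
        return []
    best_label: Optional[str] = None
    best_pos = -1
    for lab in labels:
        pos = text.find(lab) if lab else -1
        if pos >= 0 and (best_label is None or len(lab) > len(best_label)):
            best_label, best_pos = lab, pos
    if best_label is None:
        return None
    # No backtracking: if either remainder cannot be covered, give up.
    left = cover_longest_anywhere(text[:best_pos], labels)
    right = cover_longest_anywhere(text[best_pos + len(best_label):], labels)
    if left is None or right is None:
        return None
    return left + [best_label] + right


print(cover_leading_first("hazimemasite", EXAMPLE_LABELS))     # ['hazi', 'me', 'masi', 'te']
print(cover_longest_anywhere("hazimemasite", EXAMPLE_LABELS))  # ['ha', 'zimema', 'site']
```

With `EXAMPLE_LABELS`, the leading-first strategy yields four pieces while the no-priority strategy yields three, so a later selection step has two genuinely different results to choose from.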

[0064] The portion searched preferentially is not limited to the portion containing the leading symbol, as in the embodiment above. Different phoneme symbol strings can also be retrieved by setting the preferentially searched portion at other positions within the input phoneme symbol string as appropriate, which likewise widens the choice of waveform data.
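Moving the priority portion to an arbitrary position, as [0064] allows, can be sketched by trying each start index `k` in turn: the longest label beginning at `k` is fixed first, and the parts to its left and right are then covered by greedy longest match. The helper names and `EXAMPLE_LABELS` are hypothetical; the sketch only shows that shifting the priority portion can produce additional distinct candidates.

```python
from typing import List, Optional, Sequence

# Hypothetical search labels from a toy speech database.
EXAMPLE_LABELS = ["ha", "hazi", "zimema", "me", "masi", "site", "ma", "te", "si", "zi"]


def longest_at(text: str, pos: int, labels: Sequence[str]) -> Optional[str]:
    """Longest non-empty label that matches `text` starting at index `pos`."""
    matches = [lab for lab in labels if lab and text.startswith(lab, pos)]
    return max(matches, key=len) if matches else None


def cover_greedy(text: str, labels: Sequence[str]) -> Optional[List[str]]:
    """Cover `text` left to right with the longest label at each step."""
    pieces: List[str] = []
    pos = 0
    while pos < len(text):
        lab = longest_at(text, pos, labels)
        if lab is None:
            return None
        pieces.append(lab)
        pos += len(lab)
    return pieces


def cover_with_priority_at(text: str, k: int, labels: Sequence[str]) -> Optional[List[str]]:
    """Give priority to the portion of `text` that starts at index k."""
    lab = longest_at(text, k, labels)
    if lab is None:
        return None
    left = cover_greedy(text[:k], labels)
    right = cover_greedy(text[k + len(lab):], labels)
    if left is None or right is None:
        return None
    return left + [lab] + right


def candidates_all_priorities(text: str, labels: Sequence[str]) -> List[List[str]]:
    """Collect the distinct candidates obtained by moving the priority portion."""
    found: List[List[str]] = []
    for k in range(len(text)):
        cand = cover_with_priority_at(text, k, labels)
        if cand is not None and cand not in found:
            found.append(cand)
    return found


for cand in candidates_all_priorities("hazimemasite", EXAMPLE_LABELS):
    print(cand)
```

For /hazimemasite/ with `EXAMPLE_LABELS`, trying every start index produces three distinct candidates, among them `['hazi', 'me', 'ma', 'site']`, which appears only when the priority portion starts at the `s` of `site`.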

[0065] Furthermore, the phoneme symbol string search means retrieves from the speech database a plurality of sets of phoneme symbol strings that make up the input phoneme symbol string and, when the sets contain the same number of phoneme symbol strings, selects and assigns the set containing more phoneme symbol strings that end in a vowel symbol. Particularly natural synthesized speech can therefore be generated for languages such as Japanese, in which many sounds consist of a consonant combined with a vowel.
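The selection rule of [0065] (and of step S25) can be expressed as a two-level comparison: fewer phoneme symbol strings first, since each extra piece adds a join, and, on a tie, more strings ending in a vowel symbol. The sketch assumes romanized symbols with `a`, `i`, `u`, `e`, `o` as the vowel symbols; `choose_candidate`, `vowel_final_count`, and the example candidates are hypothetical.

```python
from typing import List, Sequence

# Assumption: romanized phoneme symbols, with these characters as vowel symbols.
VOWELS = frozenset("aiueo")


def vowel_final_count(pieces: Sequence[str]) -> int:
    """Number of phoneme symbol strings in a candidate that end in a vowel symbol."""
    return sum(1 for p in pieces if p and p[-1] in VOWELS)


def choose_candidate(candidates: Sequence[List[str]]) -> List[str]:
    """Prefer fewer pieces (fewer joins); on a tie, prefer more vowel-final pieces."""
    return min(candidates, key=lambda c: (len(c), -vowel_final_count(c)))


a = ["hazi", "memasi", "te"]   # 3 pieces, all vowel-final
b = ["hazim", "emas", "ite"]   # 3 pieces, only "ite" is vowel-final
print(choose_candidate([b, a]))  # ['hazi', 'memasi', 'te']
```

If two candidates tie on both criteria, `min` keeps the first one it encounters; the text here does not say how such a tie is resolved.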

[0066] Each block of the speech synthesizer according to this embodiment may also be implemented in software and recorded on a computer-readable recording medium, so that it can be distributed and carried, and the same effects as above can be obtained at any location.

[0067] Each of the above blocks may be implemented entirely in hardware, such as an integrated circuit, or partly in software; either way, the same effects as above are obtained. Furthermore, the blocks need not reside on a single computer and may be distributed across the terminals and server machines that make up a network.

[0068]

EFFECTS OF THE INVENTION

As described above, the speech synthesizer according to claim 1 of the present invention searches for the phoneme symbol string that matches at least part of the input phoneme symbol string and whose matched portion is the longest. Speech can therefore be synthesized from a small number of pieces of waveform data, making it possible to generate high-quality synthesized speech.

[0069] The speech synthesizer according to claim 2 of the present invention retrieves different phoneme symbol strings depending on whether or not a portion of the input phoneme symbol string is designated to be searched preferentially. This widens the choice of waveform data and therefore makes it possible to generate synthesized speech of even higher quality.

[0070] The speech synthesizer according to claim 3 of the present invention retrieves a plurality of sets of phoneme symbol strings and, when the sets contain the same number of phoneme symbol strings, selects the set containing more phoneme symbol strings that end in a vowel symbol. Particularly natural synthesized speech can therefore be generated for languages in which many sounds consist of a consonant combined with a vowel.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of the speech synthesizer according to the present invention.

FIG. 2 is a flowchart showing the flow of processing in the first candidate search unit of the embodiment shown in FIG. 1.

FIG. 3 is a flowchart showing the flow of processing in the second candidate search unit of the embodiment shown in FIG. 1.

FIG. 4 is a flowchart showing the flow of processing in the candidate determination unit of the embodiment shown in FIG. 1.

[Explanation of symbols]

1 Speech database
2 Phoneme symbol string decomposition unit
3 Phoneme symbol string candidate search unit
31 First candidate search unit
32 Second candidate search unit
4 Candidate determination unit
5 Speech synthesis unit

Claims

1. A speech synthesizer comprising: a speech database storing a plurality of pieces of waveform data obtained by recording speech and a plurality of phoneme symbol strings assigned to the respective pieces of waveform data as search labels; phoneme symbol string search means for searching the speech database for, and outputting, a phoneme symbol string that matches at least part of an input phoneme symbol string and whose matching portion is the longest; and speech synthesis means for outputting the waveform data corresponding to the phoneme symbol string searched for and output by the phoneme symbol string search means.

2. The speech synthesizer according to claim 1, wherein the phoneme symbol string search means searches the speech database for different phoneme symbol strings depending on whether or not a portion of the input phoneme symbol string is set to be searched preferentially.

3. The speech synthesizer according to claim 1, wherein the phoneme symbol string search means searches the speech database for a plurality of sets of phoneme symbol strings that make up the input phoneme symbol string and, when the searched sets contain the same number of phoneme symbol strings, selects the set containing more phoneme symbol strings that end in a vowel symbol.