JP2642617B2

JP2642617B2 - Speech synthesizer

Info

Publication number: JP2642617B2
Application number: JP60007744A
Authority: JP
Inventors: 秀紀大橋
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1985-01-19
Filing date: 1985-01-19
Publication date: 1997-08-20
Anticipated expiration: 2012-08-20
Also published as: JPS61166600A

Description

[Detailed Description of the Invention]
(A) Industrial Field of Application
The present invention relates to a speech synthesizer that synthesizes speech.

(B) Prior Art
One conventional speech synthesizer, proposed in Japanese Patent Application No. 59-2291, stores in ROM (read-only memory) syllable parameters, such as LSP coefficients or PARCOR coefficients, for about 100 Japanese syllables. Each syllable combines a consonant part, a transient part leading from the consonant into the vowel, and a steady vowel part. Word- and sentence-level speech is resynthesized by concatenating the syllable parameters read from this ROM. This approach is generally known as synthesis by rule (rule-based synthesis).
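Synthesis by rule, as described above, amounts to looking up a stored parameter sequence for each syllable and concatenating the sequences in order. The Python sketch below illustrates only that concatenation step. The dictionary standing in for the ROM, the romanized syllable keys, the two-coefficient frames, and the name `synthesize_by_rule` are hypothetical, and the coefficient values are placeholders rather than real LSP or PARCOR data.

```python
from typing import Dict, List, Sequence

Frame = List[float]  # one analysis frame of (placeholder) LSP/PARCOR-style coefficients

def synthesize_by_rule(syllables: Sequence[str], rom: Dict[str, List[Frame]]) -> List[Frame]:
    """Concatenate the stored frame sequence of each syllable, in input order."""
    frames: List[Frame] = []
    for syllable in syllables:
        if syllable not in rom:
            raise KeyError(f"no parameters stored for syllable {syllable!r}")
        frames.extend(rom[syllable])
    return frames

# Hypothetical 'ROM' contents: each syllable maps to a short list of frames.
ROM: Dict[str, List[Frame]] = {
    "ka": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "sa": [[0.2, 0.1], [0.4, 0.3]],
}

sequence = synthesize_by_rule(["ka", "sa"], ROM)
print(len(sequence))  # 5 frames: 3 from "ka" followed by 2 from "sa"
```

Because every utterance is assembled from the same stored frames, the voice quality of the output is fixed by the speaker whose recordings produced the ROM data.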

Such a speech synthesizer can resynthesize any Japanese utterance once syllable parameters for about 100 syllables are stored in memory. However, because the output speech is built from the syllable data stored in ROM, it can only reproduce the voice characteristics contained in that data. Consequently, it cannot output synthesized speech with the characteristics of different speakers.

(C) Problems to Be Solved by the Invention
The present invention was made in view of the above. In a rule-based speech synthesizer, it adds parameters carrying a speaker's individuality to the speech units, and thereby provides a speech synthesizer that can easily give speaker-specific characteristics to the synthesized output speech.

(D) Means for Solving the Problems
The speech synthesizer of the present invention comprises: a standard syllable parameter storage unit that stores syllable parameters, i.e. features of speech in syllable units; a feature parameter extraction unit that receives speech from outside and extracts feature parameters, namely the spectral parameters of the steady vowel part; a syllable parameter generation unit that generates new syllable parameters from the syllable parameters stored in the standard syllable parameter storage unit and the feature parameters extracted by the feature parameter extraction unit; and a user syllable parameter storage unit that stores the syllable parameters generated by the syllable parameter generation unit. Arbitrary words or sentences are output as speech based on the syllable parameters stored in the user syllable parameter storage unit.

(E) Operation
The speech synthesizer of the present invention stores in ROM (read-only memory) syllable parameters, each combining a consonant part, a transient part leading from the consonant part into the vowel part, and a steady vowel part. In addition, vowel parameters that represent the speaker's steady personal characteristics are input through a speech input unit and a speech analysis unit and stored in a steady vowel storage unit consisting of a steady vowel buffer memory. A syllable parameter generation unit joins these steady vowel parameters to the consonant part and the consonant-to-vowel transient part cut out from the stored syllable parameters, producing new syllable-unit parameters to which personal characteristics have been added, and a user syllable parameter storage unit consisting of a user syllable parameter memory stores them. Speech is synthesized from this syllable parameter sequence and pitch parameters matched to it.

(F) Embodiment
FIG. 1 shows an embodiment of the speech synthesizer of the present invention. In the figure, (10) is a microphone for speech input, and (11) is a parameter extraction circuit that analyzes a steady vowel input through the microphone (10) and extracts speech feature parameters such as LSP or PARCOR parameters. (12) is a steady vowel storage unit consisting of a steady vowel buffer memory (RAM) that temporarily holds the feature parameters of the steady vowels extracted by the parameter extraction circuit (11).
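The patent does not say how the parameter extraction circuit (11) computes its LSP or PARCOR parameters. One standard way to obtain PARCOR (reflection) coefficients from a frame of a steady vowel is the autocorrelation method with the Levinson-Durbin recursion, sketched below in Python as an assumption rather than as the circuit's actual method. Windowing, pre-emphasis, and conversion to LSP parameters are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence

def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]

def parcor(frame: Sequence[float], order: int) -> List[float]:
    """PARCOR (reflection) coefficients k_1..k_order via the Levinson-Durbin recursion.

    Convention (assumed): the predictor is x[n] ~ sum_j a[j] * x[n - j], so a_i = k_i at stage i.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:  # silent frame: no spectral information
        return [0.0] * order
    a = [0.0] * (order + 1)  # a[1..order] hold the predictor coefficients
    error = r[0]             # prediction error energy at stage 0
    ks: List[float] = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k  # error energy shrinks at every stage while |k| < 1
    return ks

# A decaying exponential behaves like a first-order all-pole signal.
vowel_frame = [0.9 ** n for n in range(200)]
print(parcor(vowel_frame, 2))  # first coefficient close to 0.9, second close to 0
```

For a real vowel frame the order would typically be around 8 to 12; the short example only checks that a known first-order signal yields the expected coefficients.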

(4) is a standard syllable parameter storage unit consisting of a standard syllable parameter memory in which various syllable parameters, each joining a consonant part, a transient part leading from the consonant part into the vowel part, and a steady vowel part, are stored at assigned addresses. (6) is a syllable parameter generation unit that extracts only the consonant part and the consonant-to-vowel transient part from each syllable parameter in the standard syllable parameter memory (4) and joins this extracted data to the corresponding steady vowel in the steady vowel buffer memory (12). The result is a new syllable parameter of the form (consonant part) + (transient part) + (input steady vowel) that carries the personal characteristics of the person who input the steady vowels. (5) is a user syllable parameter storage unit consisting of a user syllable parameter memory (RAM) that stores the personalized syllable parameters created by the syllable parameter generation unit.

On the other hand, (1) is a keyboard on which character keys are arranged, and (2) is a decoder that receives key operation signals from the keyboard (1) and converts them into syllable-unit character signals corresponding to the keys. (3) is a syllable address table that links each character signal from the decoder (2) to the corresponding syllable addresses in the standard syllable parameter memory (4) and the user syllable parameter memory (5). (16) is a syllable length table that associates each syllable-unit character signal from the decoder (2) with the duration for which that syllable is produced.
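The path from the keyboard to the tables can be pictured as two lookups keyed by syllable. In the Python sketch below, the key codes, romanized syllable names, memory addresses, and frame counts are all hypothetical placeholders; the sketch only shows how the output of the decoder (2) indexes the syllable address table (3) and the syllable length table (16).

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for decoder (2): key code -> syllable-unit character.
KEY_TO_SYLLABLE: Dict[int, str] = {0x01: "ka", 0x02: "sa", 0x03: "a"}

# Hypothetical syllable address table (3):
# syllable -> (address in standard memory (4), address in user memory (5)).
SYLLABLE_ADDRESS_TABLE: Dict[str, Tuple[int, int]] = {
    "ka": (0x0000, 0x4000),
    "sa": (0x0040, 0x4040),
    "a": (0x0080, 0x4080),
}

# Hypothetical syllable length table (16): syllable -> duration in frames.
SYLLABLE_LENGTH_TABLE: Dict[str, int] = {"ka": 12, "sa": 14, "a": 10}

def decode(key_codes: List[int]) -> List[str]:
    """Convert key operation signals into syllable-unit characters."""
    return [KEY_TO_SYLLABLE[code] for code in key_codes]

def look_up(syllables: List[str]) -> List[Tuple[str, int, int, int]]:
    """Return (syllable, standard address, user address, duration) for each syllable."""
    plan = []
    for s in syllables:
        std_addr, user_addr = SYLLABLE_ADDRESS_TABLE[s]
        plan.append((s, std_addr, user_addr, SYLLABLE_LENGTH_TABLE[s]))
    return plan

print(look_up(decode([0x01, 0x02])))
# [('ka', 0, 16384, 12), ('sa', 64, 16448, 14)]
```

Keeping both addresses per syllable in one table mirrors the text's point that a single table serves both the standard and the user memory.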

(7) is a synthesized speech selection unit that selects whether speech is synthesized from the syllable parameters in the standard syllable parameter memory (4) or from those in the user syllable parameter memory (5). (8) is a syllable data length control unit that expands or compresses the syllable data length so that it matches the syllable duration specified in the syllable length table (16). (9) is a speech data buffer memory consisting of a parameter area (9-a) and a pitch area (9-b). The parameter area (9-a) stores the syllable parameters from the standard syllable parameter memory (4) or the user syllable parameter memory (5) after their length has been adjusted by the syllable data length control unit (8); new syllable parameters are then stored in sequence as subsequent keys are entered on the keyboard (1).
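The syllable data length control unit (8) is described only as expanding or compressing each syllable to the duration given in the syllable length table (16). The Python sketch below assumes linear interpolation of coefficient frames along the time axis, which is one common choice and not necessarily the patent's method. The boolean flag standing in for the synthesized speech selection unit (7) and all function names are hypothetical.

```python
from typing import Dict, List

Frame = List[float]

def resize_syllable(frames: List[Frame], target_len: int) -> List[Frame]:
    """Stretch or compress a syllable's frame sequence to target_len frames
    by linear interpolation along the time axis (assumed method)."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    n = len(frames)
    if n == 0:
        raise ValueError("empty syllable")
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out: List[Frame] = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)  # output index mapped onto the input time axis
        i = int(pos)
        frac = pos - i
        if i >= n - 1:
            out.append(list(frames[-1]))
        else:
            a, b = frames[i], frames[i + 1]
            out.append([(1.0 - frac) * x + frac * y for x, y in zip(a, b)])
    return out

def select_and_resize(standard: Dict[str, List[Frame]], user: Dict[str, List[Frame]],
                      syllable: str, target_len: int, use_user: bool) -> List[Frame]:
    """Pick a memory (the role of selector (7)), then fit the syllable to target_len frames."""
    memory = user if use_user else standard
    return resize_syllable(memory[syllable], target_len)

print(resize_syllable([[0.0], [1.0]], 3))  # [[0.0], [0.5], [1.0]]
```

Interpolating PARCOR coefficients this way keeps every value between its two neighbours, so coefficients that lie inside (-1, 1) stay inside it and the synthesis filter remains stable.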

(13) is an accent designation unit for specifying the accent type of the synthesized speech. (14) is a pitch pattern designation circuit that generates a pitch pattern designation signal combining the accent type specified by the accent designation unit (13) with the mora count, i.e. the number of syllables of the synthesized speech obtained from the keyboard (1) input. (15) is a pitch table holding the standard pitch parameters that determine the intonation and accent of the synthesized speech according to the pitch pattern designation signal from the pitch pattern designation circuit (14); a pitch parameter pattern is stored for each combination of mora count and accent type. That is, the pitch frequency at the accent position is set relatively high. (17) is a pitch pattern matching circuit that linearly compresses or expands, syllable by syllable, the standard pitch pattern of the synthesized speech obtained from the pitch table (15), based on the duration of each syllable from the syllable length table (16). The matched pitch pattern produced by this circuit is stored in the pitch area (9-b) of the speech data buffer memory (9), where it is associated with the syllable parameter sequence in the parameter area (9-a).
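The pitch table (15) and the pitch pattern matching circuit (17) can be sketched as a table lookup followed by per-syllable linear resampling. In this Python sketch the table contents, the F0 values in hertz, the one-contour-per-mora layout, and the function names are hypothetical; only the idea of a standard pattern keyed by mora count and accent type, linearly compressed or expanded to each syllable's duration, comes from the text.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pitch table (15): (mora count, accent type) -> one short F0 contour in Hz per mora.
PITCH_TABLE: Dict[Tuple[int, int], List[List[float]]] = {
    (2, 1): [[130.0, 140.0], [120.0, 100.0]],  # assumed: high on mora 1, then falling
    (2, 0): [[100.0, 115.0], [120.0, 120.0]],  # assumed: low start, then level
}

def resample(values: Sequence[float], target_len: int) -> List[float]:
    """Linearly compress or expand a contour to target_len points (endpoints kept when target_len >= 2)."""
    n = len(values)
    if n == 1 or target_len == 1:
        return [values[0]] * target_len
    out: List[float] = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append((1.0 - frac) * values[i] + frac * values[i + 1])
    return out

def match_pitch(accent_type: int, syllable_frames: Sequence[int]) -> List[float]:
    """Fit the standard pattern to each syllable's duration (in frames) from the length table."""
    pattern = PITCH_TABLE[(len(syllable_frames), accent_type)]  # mora count = number of syllables
    contour: List[float] = []
    for mora_contour, n_frames in zip(pattern, syllable_frames):
        contour.extend(resample(mora_contour, n_frames))
    return contour

print(match_pitch(1, [3, 2]))  # [130.0, 135.0, 140.0, 120.0, 100.0]
```

The resulting contour has one F0 value per frame, so it lines up frame by frame with the length-adjusted syllable parameters.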

(18) is a speech synthesis unit that receives the syllable parameter sequence stored in the speech data buffer memory (9) and the corresponding matched pitch pattern, and synthesizes and outputs a speech signal corresponding to the keyboard (1) input. (19) is an amplifier that amplifies the synthesized speech output from the speech synthesis unit (18), and the final synthesized speech is produced by the speaker (20).

Next, the processing procedure of the syllable parameter generation unit (6) is described with reference to the flowchart of FIG. 2.

First, parameters are read from the standard syllable parameter memory (4), and each is checked to determine whether it belongs to the consonant part or the consonant-to-vowel transient part, or to the vowel part. If it belongs to the consonant part or the transient part leading into the vowel, it is written into the user syllable parameter memory (5); this continues up to the last data of the part leading into the steady vowel. If the parameter from the standard syllable parameter memory (4) belongs to the vowel, the corresponding steady vowel parameter is fetched from the steady vowel buffer memory (12) in its place and, as shown in FIG. 3, written into the user syllable parameter memory (5) immediately after the corresponding (consonant part) + (part leading into the steady vowel) data. This operation is performed for all syllable parameters in the standard syllable parameter memory (4). As a result, the user syllable parameter memory (5) holds syllable parameters carrying new personal characteristics, one for each syllable in the standard syllable parameter memory (4).
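The flowchart procedure of FIGS. 2 and 3 can be expressed as a copy-and-substitute pass over the standard memory. The Python sketch below assumes that each standard frame carries a hypothetical segment label ("C", "T", or "V"), that each syllable records which vowel it ends in, and that the steady vowel buffer holds one representative frame per vowel, repeated for the length of the original vowel part. The patent does not specify these representation details, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[float]
# Hypothetical segment labels: "C" consonant, "T" consonant-to-vowel transient, "V" vowel.
LabeledFrame = Tuple[str, Frame]

def generate_user_memory(
    standard: Dict[str, Tuple[str, List[LabeledFrame]]],
    steady_vowels: Dict[str, Frame],
) -> Dict[str, List[Frame]]:
    """Copy C/T frames and substitute the user's steady vowel for every V frame.

    standard:      syllable -> (vowel name, labeled frames), standing in for memory (4)
    steady_vowels: vowel name -> user's steady-vowel frame, standing in for buffer (12)
    returns:       syllable -> frames, standing in for user memory (5)
    """
    user: Dict[str, List[Frame]] = {}
    for syllable, (vowel, labeled_frames) in standard.items():
        out: List[Frame] = []
        for label, frame in labeled_frames:
            if label in ("C", "T"):
                out.append(list(frame))                 # keep the standard consonant/transient data
            elif label == "V":
                out.append(list(steady_vowels[vowel]))  # replace with the user's steady vowel
            else:
                raise ValueError(f"unknown segment label {label!r}")
        user[syllable] = out
    return user

standard = {"ka": ("a", [("C", [0.1, 0.1]), ("T", [0.2, 0.2]), ("V", [0.3, 0.3]), ("V", [0.3, 0.3])])}
print(generate_user_memory(standard, {"a": [0.9, 0.8]})["ka"])
# [[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [0.9, 0.8]]
```

Under this sketch, a syllable with no vowel frames, such as the syllabic nasal, passes through unchanged, and the vowel part keeps its original duration.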

(G) Effects of the Invention
As is clear from the above description, the speech synthesizer of the present invention includes a feature parameter extraction unit that receives speech from outside and extracts feature parameters representing the speaker's individuality from the spectral parameters of the steady vowel part, and a syllable parameter generation unit that uses those feature parameters to generate syllable parameters to which personal characteristics have been added. Rule-based synthesis can therefore use syllable parameters containing the user's personal characteristics as its basic units, and synthesized speech close to the user's own voice can be output.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech synthesizer of the present invention. FIG. 2 is a flowchart showing the processing procedure of the syllable parameter generation unit of the device, and FIG. 3 is the corresponding memory diagram.
(1) ... keyboard, (2) ... decoder, (3) ... syllable address table, (4) ... standard syllable parameter memory, (5) ... user syllable parameter memory, (6) ... syllable parameter generation unit, (7) ... synthesized speech selection unit, (8) ... syllable data length control unit, (9) ... speech data buffer memory, (9-a) ... parameter area, (9-b) ... pitch area, (10) ... microphone for speech input, (11) ... parameter extraction circuit, (12) ... steady vowel buffer memory, (13) ... accent designation unit, (14) ... pitch pattern designation circuit, (15) ... pitch table, (16) ... syllable length table, (17) ... pitch pattern matching circuit, (18) ... speech synthesis unit, (19) ... amplifier, (20) ... speaker.

Claims

(57) [Claims]

A speech synthesizer comprising: a standard syllable parameter storage unit (4) that stores syllable parameters, which are features of speech in syllable units; a feature parameter extraction unit (11) that receives speech from outside and extracts feature parameters, which are spectral parameters of the steady vowel part; a syllable parameter generation unit (6) that generates new syllable parameters using the syllable parameters stored in the standard syllable parameter storage unit (4) and the feature parameters extracted by the feature parameter extraction unit (11); and a user syllable parameter storage unit (5) that stores the syllable parameters generated by the syllable parameter generation unit (6); wherein the speech synthesizer outputs an arbitrary word or sentence as speech based on the syllable parameters stored in the user syllable parameter storage unit (5).