JPS6022195A

JPS6022195A - Synthesization of voice

Info

Publication number: JPS6022195A
Application number: JP58129399A
Authority: JP
Inventors: 隆矢頭; 三木　敬; 森戸　誠
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1983-07-18
Filing date: 1983-07-18
Publication date: 1985-02-04

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a speech synthesis method with the following features. A plurality of playback channels is provided. One playback channel at a time is designated sequentially and cyclically, so that speech segments are played back one by one. Speech is then synthesized by superimposing the outputs of all the channels. In particular, the invention relates to a speech synthesis method in which a syllable memory and a segment memory are provided, and the plurality of channels is set up by controlling the addresses of these memories.

(Prior Art) Most speech output methods in practical use today work as follows. Words or sentences are spoken in advance by a human and stored either as natural waveforms or in compressed form to reduce the amount of speech information (for example, ADPCM codes or PARCOR parameters). These are then edited and synthesized according to the required output.

Recently, demand has grown for general-purpose devices that can output a large vocabulary or unrestricted, arbitrary speech such as personal names, company names, and place names. Storing that much linguistic information in a speech output device by recording all of it from human speakers is impractical, and the speech storage files would also require an enormous storage capacity. For this reason, arbitrary-vocabulary synthesis methods that generate speech waveforms from character sequences alone are being studied.

This section describes a waveform-domain arbitrary-vocabulary synthesis method, whose synthesis circuit is simple.

Observing speech waveforms shows that very similar waveforms repeat within voiced sections such as vowels.

This repetition period is called the pitch period. The waveform within one period is called a pitch-unit speech segment.
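The Python function below is a minimal sketch of what a pitch-unit speech segment is: it cuts a voiced waveform into one-period pieces. It assumes that pitch marks (the sample index at which each pitch period begins) are already known. How those marks are found is not described here, and the names `split_into_pitch_units` and `pitch_marks` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_pitch_units(
    samples: Sequence[float], pitch_marks: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Cut a voiced waveform into pitch-unit segments.

    pitch_marks is an assumed, increasing list of sample indices at which
    successive pitch periods begin; the last mark closes the final period.
    Returns the segments and the pitch period (in samples) of each one.
    """
    segments = [list(samples[a:b]) for a, b in zip(pitch_marks, pitch_marks[1:])]
    periods = [b - a for a, b in zip(pitch_marks, pitch_marks[1:])]
    return segments, periods


# Toy example: 10 samples with pitch periods of 3, 4 and 3 samples.
segs, periods = split_into_pitch_units(list(range(10)), [0, 3, 7, 10])
print(segs)     # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
print(periods)  # [3, 4, 3]
```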

Changes in the content of these segments carry the phonemic information. The temporal pattern of changes in the period gives the accent.

As noted above, waveforms of almost the same shape repeat in voiced sections such as vowels, and similar waveforms appear in speech sounds of the same kind. It should therefore be possible to synthesize arbitrary continuous speech in two steps. First, store the characteristic waveforms needed to build up speech in a storage device. Then edit and concatenate these basic phoneme waveforms.

On the other hand, any Japanese sentence can basically be expressed with somewhat more than one hundred kinds of monosyllables. Storing monosyllabic speech for that many syllables therefore makes it possible to synthesize all of Japanese.

To output arbitrary words as speech in this way, the basic unit of speech must be smaller than a word. In this type of speech synthesizer, the size of the units into which words are classified and stored is an important issue, because it determines both the scale of the synthesizer and its sound quality.

With respect to phonemic quality, a smaller basic unit such as the phoneme means fewer units are needed to synthesize an arbitrary spoken message. However, the acoustic characteristics of such small units change markedly through interaction with neighboring units (coarticulation), so simply concatenating them does not produce high-quality speech.

If the syllable is used as the basic unit instead, fewer concatenation operations are needed. Because the stored speech also includes the perceptually important transitions, phonemic quality is well preserved, and the concatenation rules are expected to be relatively simple. On the other hand, many phonemes are duplicated across the monosyllables, which increases the required storage capacity.

Besides phonemic quality, prosodic information such as accent, intonation, and duration is an extremely important factor in obtaining natural-sounding synthetic speech.

A waveform-domain synthesizer uses the speech units held in its storage device as raw material. To control prosody properly, it builds up continuous speech while varying pitch, amplitude, and duration according to the instructions in a dictionary of control information.

In terms of prosodic controllability, using the phoneme as the basic unit is more convenient. When the monosyllable is the control unit, the only practical way to control prosody is to select a suitable waveform from several monosyllabic waveforms with different pitches and durations.

In summary, for the basic unit of speech in an arbitrary-vocabulary speech output device, speech segments are preferable in terms of storage capacity and prosody. The monosyllable level is superior in terms of phonemic quality.

For this reason, a method has been proposed that uses the monosyllable as the control unit for phonemic quality and the speech segment as the control unit for prosody. In this method, the speech segments in the storage device are arranged sequentially within blocks, one block per monosyllable. Because the segments are read out consecutively in time order for each monosyllable, phonemic quality is preserved, and the speech segment can still serve as the unit of prosody control.

However, although this method uses the speech segment as the basic unit of storage, it still stores every monosyllabic waveform in the storage device, so the required capacity grows. Fig. 1 shows examples of monosyllabic waveforms.

Fig. 1(a) is an example waveform of the monosyllable /ma/, Fig. 1(b) of /mi/, and Fig. 1(c) of /ri/.

Each of these waveforms is recorded in the storage device as a monosyllabic waveform. Viewed in units of speech segments, the /m/ portions of /ma/ and /mi/ can be regarded as almost the same segment, as can the /i/ portions of /mi/ and /ri/. Nevertheless, the segments must be recorded sequentially syllable by syllable. As a result, even the same segment has to be recorded separately for each syllable in which it occurs.
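A toy count illustrates the storage penalty. The sketch below makes two hypothetical assumptions: /ma/, /mi/ and /ri/ each consist of one consonant segment and one vowel segment, labeled "m", "a", "i" and "r"; and every segment occupies 128 samples (the fixed length used later in the embodiment). Storing segments per syllable keeps two copies each of "m" and "i". Storing each distinct segment once does not.

```python
from typing import Dict, List

SEGMENT_LEN = 128  # samples per stored segment (fixed length assumed here)

# Hypothetical segment labels for the three syllables of Fig. 1.
SYLLABLES: Dict[str, List[str]] = {
    "ma": ["m", "a"],
    "mi": ["m", "i"],
    "ri": ["r", "i"],
}


def per_syllable_storage(syllables: Dict[str, List[str]]) -> int:
    """Samples needed when every syllable block keeps its own copy of each segment."""
    return SEGMENT_LEN * sum(len(segs) for segs in syllables.values())


def shared_storage(syllables: Dict[str, List[str]]) -> int:
    """Samples needed when each distinct segment is stored only once."""
    distinct = {seg for segs in syllables.values() for seg in segs}
    return SEGMENT_LEN * len(distinct)


print(per_syllable_storage(SYLLABLES))  # 768
print(shared_storage(SYLLABLES))        # 512
```

The count ignores the small address table a shared scheme needs to locate the segments. Real syllables also span many pitch periods, so the labels are only illustrative.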

(Object of the Invention) The object of the present invention is to reduce the required memory capacity by providing two memories:

- a segment memory that stores segment units, each representing either a one-pitch speech segment or a sample time series of a damped wave obtained by converting such a segment, with every segment unit stored as data representing a sample time series of a fixed number of samples; and
- a syllable memory that stores the waveform of each monosyllable as the start address of each of its segment units together with the repetition count of that segment unit.

In particular, the invention achieves this object within a speech synthesis method of a specific type. That method sets up a plurality of channels and adds and superimposes all of them, which reduces discontinuities at the joins between speech segments.
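The two memories can be sketched as follows, under three assumptions. The segment memory is a flat array in which segment unit k occupies addresses k*128 to k*128+127. Each syllable-memory entry is a hypothetical `(start_address, repeat_count)` pair. Following the embodiment described later, the repetition count times 128 gives the number of sample clocks during which that segment unit remains the current one.

```python
from typing import Dict, List, Tuple

SEGMENT_LEN = 128  # every segment unit represents 128 samples

# Hypothetical syllable memory: each syllable is a list of
# (segment start address in the segment memory, repetition count).
SYLLABLE_MEMORY: Dict[str, List[Tuple[int, int]]] = {
    "ma": [(0 * SEGMENT_LEN, 2), (1 * SEGMENT_LEN, 3)],
    "mi": [(0 * SEGMENT_LEN, 2), (2 * SEGMENT_LEN, 3)],  # reuses the segment at address 0
}


def segment_timeline(entries: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start address, duration in samples) for each segment unit of a syllable."""
    return [(start, repeats * SEGMENT_LEN) for start, repeats in entries]


def syllable_length(entries: List[Tuple[int, int]]) -> int:
    """Total length of the syllable in samples (sum of all segment durations)."""
    return sum(duration for _, duration in segment_timeline(entries))


print(segment_timeline(SYLLABLE_MEMORY["ma"]))  # [(0, 256), (128, 384)]
print(syllable_length(SYLLABLE_MEMORY["mi"]))   # 640
```

Both syllables point at address 0 for their shared first segment, so it is stored only once.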

(Summary of the Invention) First, waveform reproduction using a plurality of playback channels, on which the present invention is based, is described with reference to Fig. 2.

In Fig. 2, EL11 to EL13, EL21, EL22 denote a sequence of speech segments, and PT1 to PT5 denote a sequence of pitch periods. Both are created from the input information about the speech to be synthesized, and they are shown as created in pairs: EL11, EL12, EL13, EL21, EL22 with PT1, PT2, PT3, PT4, PT5.

Four playback channels, the first to the fourth, are provided for waveform reproduction. The first speech segment EL11 is played back on the first playback channel, the next segment EL12 on the second, and so on, changing channel each time. After one full cycle, playback returns to the first channel.

Because one playback channel is selected sequentially and cyclically in this way, a segment started on, for example, the first playback channel is played back over its entire length even when a short pitch period is set. The synthesized waveform is obtained by adding the outputs of all the playback channels.

Fig. 2 draws all segments with the same waveform for simplicity, but EL11 to EL13 differ from EL21 and EL22. EL11 to EL13 are all the same speech segment EL1, and EL21 and EL22 are both the same speech segment EL2. Thus in Fig. 2, speech segment EL1 is repeated three times and speech segment EL2 twice.
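The cyclic channel scheme of Fig. 2 can be sketched in a few lines of Python. This is an illustrative model, not the patent's circuit, and it rests on these assumptions:

- segments are plain lists of already decoded samples;
- segment k starts at the beginning of pitch period k, on channel k mod n_channels;
- a channel keeps playing its segment until the segment ends or the channel is reloaded;
- the output is the per-sample sum of all channels, cut off at the end of the last pitch period.

```python
from typing import List, Optional, Sequence


def synthesize(
    segments: Sequence[Sequence[float]],
    pitch_periods: Sequence[int],
    n_channels: int = 4,
) -> List[float]:
    """Overlap-add pitch-unit segments on cyclically selected playback channels."""
    if len(segments) != len(pitch_periods):
        raise ValueError("expected one pitch period per segment")
    # Each channel holds [segment, next read position], or None while unused.
    channels: List[Optional[list]] = [None] * n_channels
    out: List[float] = []
    for k, (seg, period) in enumerate(zip(segments, pitch_periods)):
        channels[k % n_channels] = [seg, 0]  # pitch-period update: (re)load one channel
        for _ in range(period):  # one iteration per sample clock
            acc = 0.0
            for ch in channels:
                if ch is not None and ch[1] < len(ch[0]):
                    acc += ch[0][ch[1]]
                    ch[1] += 1
            out.append(acc)
    return out


seg = [1.0, 1.0, 1.0, 1.0]
print(synthesize([seg] * 3, [2, 2, 2], n_channels=2))  # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
print(synthesize([seg] * 3, [2, 2, 2], n_channels=1))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With a single channel, each 4-sample segment is cut off at the next pitch mark after only 2 samples. With two channels, each segment plays to its end and overlaps the next one, which is the behavior the multi-channel arrangement is meant to provide.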

In this kind of waveform reproduction, the present invention reduces the storage capacity by providing a syllable memory and a segment memory that stores speech segments of a fixed length. This is achieved with simple address control. An address variable for the segment memory is set for each channel, and based on these variables, one sample of data of each segment unit is read out to each channel in synchronization with the sample clock. An embodiment is described below.

(Embodiment) Next, an embodiment is described with reference to Figs. 3 to 7.

Fig. 3 is a block diagram of the speech synthesizer, and Fig. 4 is a flowchart of the functions and control procedure executed by the microprocessor 1. Input information about the speech to be synthesized is entered from the typewriter 2 into the microprocessor 1. It takes the form of a sequence of character codes indicating the monosyllables of each phrase, together with appropriate pauses and the like.

The prosody memory unit 3 stores prosodic control information for input phrases, such as accent type, intonation type, and duration, as well as monosyllable information. The microprocessor 1 looks these up and, according to the input information, creates a sequence of monosyllable information and a sequence of pitch periods PTj.

The monosyllable information consists of the monosyllable length and information specifying the start address of each monosyllable in the monosyllable memory 4. The monosyllable memory 4 stores, for each monosyllable, a sequence of segment-unit start addresses and the repetition count of each segment unit. A segment-unit start address specifies where that segment unit begins in the segment memory 5.

The segment memory 5 stores the segment units needed to reproduce any speech segment in all Japanese syllables. The segment units are created with a suitable compression technique such as DPCM. Every segment unit is stored as the amount of data needed to reproduce a sample time series of a predetermined fixed length of 128 samples.

The waveform regenerator 6 is configured to match the compression technique adopted. It reproduces the sample time series corresponding to each segment unit's data. It has four playback channels, RG1 to RG4, so that all 128 samples can be reproduced even when the pitch period of the synthesized output is short.

Waveform reproduction takes place on a real-time axis established by sample-clock interrupts, as follows:

- the pitch period sequence PTj and the segment-unit information sequence are aligned on this axis;
- at each pitch-period update, the segment unit ELi that corresponds in time is selected from the segment-unit selection information;
- at the same update, one playback channel is selected sequentially and cyclically, so that the segment unit to be reproduced is matched to a playback channel;
- every playback channel then reproduces one sample per clock, and these samples are superimposed.
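A small calculation shows why four channels suffice. If a new 128-sample segment starts every PT samples, about ceil(128 / PT) segments sound at once. Four channels therefore cover any constant pitch period of at least 32 samples. The sample rate is not stated here, so the corresponding fundamental frequency is left open. The hypothetical helper `channels_needed` below counts the largest overlap for an arbitrary pitch-period sequence.

```python
import math
from itertools import accumulate
from typing import Sequence


def channels_needed(segment_len: int, pitch_periods: Sequence[int]) -> int:
    """Largest number of segments sounding at once if one segment starts per pitch period."""
    periods = list(pitch_periods)
    onsets = list(accumulate([0] + periods[:-1]))  # start time of each segment
    needed = 0
    for i, t in enumerate(onsets):
        # Segments started at or before t that have not yet finished at time t.
        active = sum(1 for s in onsets[: i + 1] if t - s < segment_len)
        needed = max(needed, active)
    return needed


print(channels_needed(128, [32] * 10))          # 4
print(channels_needed(128, [31] * 10))          # 5
print(math.ceil(128 / 32), math.ceil(128 / 31))  # 4 5
```

The overlap only grows at segment onsets, which is why checking onset times is enough.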

Figs. 5 to 7 show the waveform synthesis flows executed by the microprocessor 1.

Fig. 5 shows the waveform reproduction flow. Address variables AD1 to AD4 are provided for the four channels; they are initialized and then updated as described later. In step SP11, the first address variable AD1 specifies an address in the segment memory 5, and one datum of one segment unit is read out. That datum is sent to the first playback channel RG1 of the waveform regenerator 6, which reproduces one sample.

In steps SP12 to SP14, the second to fourth address variables AD2 to AD4 likewise specify addresses in the segment memory 5, and the corresponding playback channels RG2 to RG4 each reproduce one sample. In step SP15, the outputs of all the playback channels RG1 to RG4 (four samples) are added and superimposed to produce one sample of the synthesized output.

The synthesized output is produced one sample at a time, each time the sample-clock interrupt is activated.
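One sample-clock step of Fig. 5 (steps SP11 to SP15) reduces to four reads and a sum. The sketch rests on two simplifying assumptions. First, the segment memory already holds decoded samples; the embodiment instead stores DPCM-style data and decodes it in the waveform regenerator 6. Second, an idle channel is marked with None. `reproduce_one_sample` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def reproduce_one_sample(
    segment_memory: Sequence[float], address_vars: List[Optional[int]]
) -> float:
    """SP11-SP14: each channel reads one sample at its address; SP15: sum them."""
    channel_outputs = [
        segment_memory[ad] if ad is not None else 0.0  # idle channel contributes silence
        for ad in address_vars
    ]
    return sum(channel_outputs)


memory = [1, 2, 3, 4, 5, 6, 7, 8]
print(reproduce_one_sample(memory, [0, 3, None, 7]))  # 13.0
```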

Fig. 6 is a flowchart for setting the start address of each segment unit on the time axis of the sample clock.

In Fig. 6, the flow is started by a sample-clock interrupt. Step SP21 determines whether any of the monosyllable currently being processed remains. If the monosyllable length variable PHt = 0, the next monosyllable is set up:

- SP22 increments the monosyllable number and takes the next monosyllable information from the previously created sequence;
- SP23 loads its monosyllable length into the monosyllable length variable PHt, and loads its monosyllable start address into the address variable ADp of the monosyllable memory 4;
- SP24 uses that start address to address the monosyllable memory 4 and reads the start address of the first segment unit ELj and the repetition count rj of that segment unit;
- SP25 loads this segment-unit start address into the address variable ADe of the segment memory 5.

Step SP25 thus sets the start address of the first segment unit of a monosyllable in step with the sample clock. In step SP26, the remaining length of the segment unit ELj currently being processed is checked. If the segment length variable ELt = 0, step SP27 increments the address variable ADp to point to the next entry. Via steps SP24 and SP25, the start address of the next segment unit ELj+1 of the monosyllable is then set on the time axis of the sample clock.

Steps SP28 and SP29 count the segment length on the time axis of the sample clock. Because every segment unit has the same number of samples, 128, step SP28 loads the segment length variable ELt with the repetition count rj multiplied by 128. Step SP29 decrements ELt in synchronization with the sample clock. Likewise, step SP30 decrements the monosyllable length variable PHt in synchronization with the sample clock.
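The Fig. 6 counters can be modeled as a Python generator that yields the current segment start address ADe once per sample clock. The generator and its argument layout are hypothetical, and it makes two assumptions. Each syllable is given directly as a list of (segment start address, repetition count) pairs with every count at least 1. The syllable length PHt is computed as the sum of r * 128 over those pairs, standing in for the length carried by the monosyllable information.

```python
from typing import Iterator, List, Tuple

SEGMENT_LEN = 128  # samples per segment unit in the embodiment


def segment_address_stream(
    syllables: List[List[Tuple[int, int]]]
) -> Iterator[int]:
    """Yield the current segment start address ADe once per sample clock."""
    for entries in syllables:                              # SP21/SP22: next monosyllable
        ph_t = sum(r * SEGMENT_LEN for _, r in entries)    # SP23: load PHt
        ad_p = 0                                           # SP23: ADp -> first entry
        ad_e, r = entries[ad_p]                            # SP24/SP25: read entry, load ADe
        el_t = r * SEGMENT_LEN                             # SP28: ELt = r * 128
        while ph_t > 0:
            if el_t == 0:                                  # SP26: current segment used up?
                ad_p += 1                                  # SP27: next syllable-memory entry
                ad_e, r = entries[ad_p]                    # SP24/SP25
                el_t = r * SEGMENT_LEN                     # SP28
            yield ad_e
            el_t -= 1                                      # SP29
            ph_t -= 1                                      # SP30


stream = list(segment_address_stream([[(0, 1), (256, 2)]]))
print(len(stream), stream[0], stream[127], stream[128], stream[-1])  # 384 0 0 256 256
```

The pitch-period logic of Fig. 7 would sample this stream whenever a pitch period is updated.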

Fig. 7 shows the flow that sets the pitch period in real time. At the same time, it associates the four segment units read simultaneously from the segment memory 5 with the designated playback channels RG1 to RG4.

Referring to Fig. 7, each time the sample clock triggers an interrupt, step SP40 checks the remaining length of the pitch period PTj currently set. If the pitch period variable PTt = 0, steps SP41 to SP43 are executed first. Step SP44 then increments the pitch number j and takes the next pitch period PTj from the previously created pitch period sequence, and step SP45 loads it into the pitch period variable PTt. PTt is then decremented on each sample clock, so that each pitch period PTj is laid out on the time axis of the sample clock.

Steps SP41 to SP43 run in synchronization with each update of the pitch period PTj and assign each segment unit to a channel:

- SP41 increments the four-channel designation value CH, so that one channel is selected sequentially and cyclically;
- SP42 selects, from the address variables AD1 to AD4 of the segment memory 5, the address variable ADch corresponding to CH;
- SP43 loads ADe, the start address of the segment unit ELi currently set, into the selected channel's address variable ADch.

In step SP47, the address variables AD1 to AD4 of all channels are incremented. As a result, each address variable steps through the data of its own segment unit in synchronization with the sample clock, with different channels at different segment units and at different positions within them.
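The Fig. 7 bookkeeping can be sketched as a small Python class. `ChannelAssigner`, its fields, and the `tick` method are hypothetical names. The class assumes that the caller supplies the current segment start address ADe on every clock, and that `None` marks a channel that has never been loaded. `tick` returns the addresses the channels read on this clock and then advances them (SP47). It raises IndexError if the pitch-period list runs out.

```python
from typing import List, Optional, Sequence


class ChannelAssigner:
    """Pitch-period counter plus cyclic channel selection, one call per sample clock."""

    def __init__(self, pitch_periods: Sequence[int], n_channels: int = 4) -> None:
        self.pitch_periods = list(pitch_periods)
        self.j = -1          # pitch number, incremented before first use
        self.pt_t = 0        # PTt: samples left in the current pitch period
        self.ch = -1         # CH: channel designation value
        self.ad: List[Optional[int]] = [None] * n_channels  # AD1..AD4

    def tick(self, ad_e: int) -> List[Optional[int]]:
        if self.pt_t == 0:                                   # SP40: pitch period finished
            self.ch = (self.ch + 1) % len(self.ad)           # SP41: next channel, cyclically
            self.ad[self.ch] = ad_e                          # SP42/SP43: ADch <- ADe
            self.j += 1                                      # SP44: next pitch number
            self.pt_t = self.pitch_periods[self.j]           # SP45: load PTt
        self.pt_t -= 1                                       # count down PTt
        current = list(self.ad)                              # addresses read on this clock
        self.ad = [a + 1 if a is not None else None for a in self.ad]  # SP47
        return current


a = ChannelAssigner([2, 2, 2], n_channels=2)
print([a.tick(ad_e) for ad_e in (100, 100, 300, 300, 500, 500)])
# [[100, None], [101, None], [102, 300], [103, 301], [500, 302], [501, 303]]
```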

The above embodiment omits the processing of amplitude information, which determines speech loudness. Loudness can be controlled by monitoring amplitude information in the same way as the monosyllable information and supplying it to each playback channel in synchronization with the update of the speech segments.
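The embodiment does not detail this amplitude control. One plausible reading, sketched below purely as an assumption, is that each channel latches an amplitude value whenever it receives a new segment. The channel then scales its samples by that value until it is reloaded. `GainLatch` and its methods are hypothetical names.

```python
from typing import List, Sequence


class GainLatch:
    """Per-channel gains that change only when a channel receives a new segment."""

    def __init__(self, n_channels: int = 4) -> None:
        self.gains: List[float] = [0.0] * n_channels

    def on_segment_update(self, channel: int, amplitude: float) -> None:
        """Latch the amplitude for the segment just assigned to this channel."""
        self.gains[channel] = amplitude

    def mix(self, channel_samples: Sequence[float]) -> float:
        """Superimpose the channel outputs, each scaled by its latched gain."""
        return sum(g * s for g, s in zip(self.gains, channel_samples))


latch = GainLatch()
latch.on_segment_update(0, 0.5)
latch.on_segment_update(1, 2.0)
print(latch.mix([2.0, 1.0, 5.0, 5.0]))  # 3.0
```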

As explained above, the speech synthesizer of this embodiment has an address memory that stores the addresses of speech segment data. It uses the Japanese monosyllable as the unit of phonemic control, with one block per monosyllable. Because the speech segment data are read out consecutively for each monosyllable, the phonemic quality of the output monosyllabic speech is well preserved. In addition, prosody is controlled in units of the speech segments in the segment memory, so appropriate prosody can be given to the synthesized speech.

Furthermore, because monosyllables are managed through the addresses of speech segment data, the segment memory need not hold segment data consecutively for each monosyllable. The segment memory can therefore consist of the minimum set of speech segment data with no phonemic duplication, which reduces the required storage capacity. A scheme that superimposes the outputs of multiple playback channels achieves these advantages with simple address control.

(Effects of the Invention) As is clear from the above description, the present invention provides a plurality of playback channels, which makes the joins between speech segments smooth. It also provides a monosyllable memory and a segment memory with address control for the multiple channels. Together, these yield synthesized speech that is natural both phonemically and prosodically, with a small storage capacity and a simple procedure.

[Brief explanation of drawings]

Fig. 1 shows examples of speech waveforms. Fig. 2 is an explanatory diagram of the waveform reproduction concept adopted in the present invention. Fig. 3 is a block diagram of an embodiment of the present invention, Fig. 4 shows its overall flow, and Figs. 5 to 7 show its detailed flows.

1: microprocessor; 2: typewriter; 3: prosody memory; 4: syllable memory; 5: segment memory; 6: waveform regenerator.

Patent applicant: Oki Electric Industry Co., Ltd.

Procedural amendment (voluntary), dated November 1 (year illegible), addressed to the Commissioner of the Patent Office:
1. Indication of the case: Patent Application No. 129399 of 1983 (Showa 58)
2. Title of the invention: Speech synthesis method
3. Person making the amendment: patent applicant; address (〒105) 1-7-12 Toranomon, Minato-ku, Tokyo
4. Agent: address (〒105) 1-7-12 Toranomon, Minato-ku, Tokyo
5. Subject of the amendment: the "Detailed Description of the Invention" section of the specification
6. Content of the amendment:
(1) On page 3, line 18 of the specification, "compressed waveform" is amended to "compression code".
(2) On page 8, lines 7 to 8 of the same, "even if they are the same segment" is amended to "even if they are similar segments".
(3) On page 12, line 5 of the same, the phrase "any speech segment in Japanese syllables" is amended; the replacement wording is illegible.

Claims

[Scope of Claims] A speech synthesis method characterized in that:
a) a segment memory is provided that stores a large number of segment units, each representing a sample time series associated with a one-pitch speech segment of natural voiced speech, every segment unit being stored as data representing a sample time series of a predetermined number of samples;
b) a syllable memory is provided that stores a large number of monosyllables, each as a sequence of the start addresses at which its segment units are stored together with the repetition count of each segment unit;
c) a plurality of predetermined playback channels is provided, which reproduce sample time series on the basis of selected segment units;
d) address variables are provided, one for each playback channel, each specifying an address in the segment memory and updated in synchronization with a sample clock;
e) a sequence of monosyllable information created from input information about the speech to be synthesized is taken out one item at a time to specify the start address of one monosyllable; the start address of one segment unit and the repetition count of that segment unit are read from the syllable memory; and the address of the syllable memory is advanced by one each time the sample clock has been counted a number of times equal to that repetition count (or the repetition count read immediately before) multiplied by the predetermined number of samples per segment unit;
f) in synchronization with each update of the pitch period, the start address of one segment unit read from the syllable memory is selected, one address variable is selected sequentially and cyclically from among the address variables, and the selected segment-unit start address is set in the selected address variable;
g) each address variable is updated in synchronization with the sample clock, and at each sample clock each address variable specifies an address in the segment memory so that the data of each segment unit are supplied to the corresponding playback channel; and
h) in synchronization with the sample clock, each playback channel reproduces one sample at a time, and the outputs of the playback channels are superimposed, whereby speech is synthesized.