JPS58117598A

JPS58117598A - Voice synthesizer

Info

Publication number: JPS58117598A
Application number: JP57000262A
Authority: JP
Inventors: 原田　昭次
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-01-06
Filing date: 1982-01-06
Publication date: 1983-07-13

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a speech synthesizer, and in particular to a speech synthesizer that uses the analysis-synthesis method.

In the analysis-synthesis method, information sufficient to represent the content of speech, for example speech synthesis parameters, is extracted in advance by speech analysis and stored as speech data, and the speech is later resynthesized from that information. Methods of this kind include those that aim to reproduce the input waveform, such as APCM (adaptive PCM) and SBC (sub-band coding), and those that preserve the spectral properties of the input without reproducing the waveform itself, such as LPC (linear predictive coding) and PARCOR (partial autocorrelation coefficient) methods. Conventional speech synthesizers based on these methods synthesize speech from speech data obtained at a single, uniform analysis density. Consequently, when the total duration of the spoken sentences is long relative to the capacity of the speech-data memory and the data cannot all be stored, the analysis density of the speech data has been lowered uniformly so that more speech data fits.

For example, suppose the spoken sentences to be output total 40 seconds, the memory capacity of the speech synthesizer is 128 kbit, and speech data with an analysis density of 4.8 kbit/s is used. This memory can then hold only 128 kbit / 4.8 kbit/s, or about 26 seconds, of speech data. In such cases the analysis density of the stored data has therefore been lowered to 2.4 kbit/s to cope with the long sentences; about 52 seconds of speech data can then be stored.
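The storage arithmetic in this example can be checked with a few lines of Python. This is a sketch that assumes 1 kbit = 1000 bits, which the source does not state. Direct division gives about 26.7 s at 4.8 kbit/s and about 53.3 s at 2.4 kbit/s; the source's figures of about 26 s and about 52 s are slightly lower, and it does not say whether that reflects rounding or storage overhead.

```python
# Sketch of the storage arithmetic in the example above.
# Assumption: 1 kbit = 1000 bits (the source does not say 1000 or 1024).


def storable_seconds(capacity_bits: float, density_bps: float) -> float:
    """Seconds of speech data that fit in memory at one uniform analysis density."""
    return capacity_bits / density_bps


CAPACITY_BITS = 128_000   # 128 kbit memory
TOTAL_SPEECH_S = 40       # total duration of the spoken sentences

for density in (4_800, 2_400):  # analysis density in bit/s
    seconds = storable_seconds(CAPACITY_BITS, density)
    fits = seconds >= TOTAL_SPEECH_S
    print(f"{density / 1000:.1f} kbit/s -> {seconds:.1f} s storable, 40 s fits: {fits}")
```

Only the 2.4 kbit/s setting leaves room for all 40 s, which is the trade-off the following paragraphs criticize.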

However, lowering the analysis density of the speech data uniformly in this way degrades the sound quality and has the drawback of making the spoken sentences unclear.

That is, the analysis-synthesis method applies some processing to the natural speech waveform to obtain data of higher information density. In the PARCOR method, for example, the speech waveform shown in Fig. 1 is analyzed at every predetermined analysis interval T1, and the resulting speech feature parameters are stored in memory as speech data.
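The frame-wise extraction of PARCOR parameters can be sketched as follows. This is a textbook formulation, not the patent's circuit: it assumes non-overlapping rectangular frames whose length is the sampling rate times the analysis interval, no windowing or pre-emphasis, and a prediction order chosen by the caller. PARCOR coefficients are computed here as the reflection coefficients of the Levinson-Durbin recursion.

```python
# Minimal sketch of frame-wise PARCOR analysis (textbook method, not the patent's hardware).
# Assumptions: non-overlapping rectangular frames of length fs * T, no windowing,
# no pre-emphasis, caller-chosen prediction order.
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def parcor_from_autocorr(r: Sequence[float]) -> List[float]:
    """Levinson-Durbin recursion returning PARCOR (reflection) coefficients k1..kp."""
    order = len(r) - 1
    if r[0] == 0.0:                      # silent frame: nothing to predict
        return [0.0] * order
    a = [0.0] * (order + 1)              # predictor: x[n] ~ sum_j a[j] * x[n - j]
    err = r[0]                           # prediction-error energy
    ks: List[float] = []
    for i in range(1, order + 1):
        if err <= 0.0:                   # perfectly predicted frame: stop early
            ks.extend([0.0] * (order - len(ks)))
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return ks


def analyze(signal: Sequence[float], fs: int, interval_s: float,
            order: int) -> List[List[float]]:
    """Extract one PARCOR vector per analysis interval (e.g. T1 = 10 ms)."""
    frame_len = round(fs * interval_s)
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, frame_len)]
    return [parcor_from_autocorr(autocorrelation(f, order)) for f in frames]
```

For example, 800 samples at 8 kHz yield 10 parameter frames with T1 = 10 ms and 5 frames with an interval of 20 ms, so a longer interval means fewer parameter sets to store.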

If the analysis interval T1 (10 ms) corresponds to an analysis density of 4.8 kbit/s, the speech data in memory is synthesized as 4.8 kbit/s speech data and output as the speech signal shown in Fig. 2. If, however, the analysis density of the speech data stored in memory is lowered from 4.8 kbit/s to 2.4 kbit/s, the analysis interval becomes T2 = 2T1 (20 ms), and the output is the speech signal shown in Fig. 3.
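The relation between analysis density and analysis interval in this paragraph can be made explicit. Assuming, as the paragraph implies but does not state, that each frame carries a fixed-size parameter set, the bit budget per frame is the density times the interval, and both settings come out at 48 bits per frame; halving the density therefore halves the frame rate, which is consistent with the coarser waveform in Fig. 3.

```python
# Sketch: bits per analysis frame = analysis density x analysis interval.
# Assumption (not stated in the source): every frame carries a fixed-size
# parameter set, so halving the density doubles the analysis interval.


def bits_per_frame(density_bps: int, interval_ms: int) -> float:
    """Bit budget available for one frame of synthesis parameters."""
    return density_bps * interval_ms / 1000


def frames_per_second(interval_ms: int) -> float:
    """Parameter update rate implied by the analysis interval."""
    return 1000 / interval_ms


print(bits_per_frame(4_800, 10), frames_per_second(10))  # T1 = 10 ms
print(bits_per_frame(2_400, 20), frames_per_second(20))  # T2 = 2 * T1 = 20 ms
```

This prints 48.0 bits at 100 frames per second for T1 and 48.0 bits at 50 frames per second for T2.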

When the analysis density of the speech data stored in memory is lowered uniformly in this way, a faithful speech waveform can no longer be obtained, and the spoken sentences become unclear.

The present invention has been made in view of the above problem, and its object is to provide a speech synthesizer that prevents spoken sentences from becoming uniformly unclear.

To achieve this object, the present invention is characterized in that the memory, which stores each word of a spoken sentence as a word block of speech data obtained by speech analysis, is organized as word-block groups divided into at least two groups according to analysis density. For a given spoken sentence, phrases that require high intelligibility are synthesized from the speech data of the high-density word-block group, and phrases for which lower intelligibility is acceptable are synthesized from the speech data of the low-density word-block group.
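A memory organized into word-block groups by analysis density can be modeled with a small data structure. This is a hypothetical sketch: the class names, the choice of keying groups by density in bit/s, and the zero-filled placeholder payloads are illustrative and not taken from the source.

```python
# Hypothetical sketch of a memory divided into word-block groups by analysis density.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WordBlock:
    word: str          # word or phrase the block speaks
    density_bps: int   # analysis density used when the block was encoded
    frames: bytes      # encoded synthesis parameters (opaque here)

    @property
    def duration_s(self) -> float:
        """Playback time implied by the payload size and the analysis density."""
        return len(self.frames) * 8 / self.density_bps


class WordBlockMemory:
    """Word blocks stored in groups keyed by analysis density."""

    def __init__(self) -> None:
        self.groups: Dict[int, Dict[str, WordBlock]] = {}

    def store(self, block: WordBlock) -> None:
        self.groups.setdefault(block.density_bps, {})[block.word] = block

    def lookup(self, word: str) -> WordBlock:
        """Find a word's block in whichever density group holds it."""
        for group in self.groups.values():
            if word in group:
                return group[word]
        raise KeyError(word)

    def used_bits(self) -> int:
        return sum(len(b.frames) * 8 for g in self.groups.values() for b in g.values())


HIGH, LOW = 4_800, 2_400  # bit/s, the two densities used in the embodiment
memory = WordBlockMemory()
memory.store(WordBlock("at Track 1", HIGH, bytes(600)))   # placeholder data: 1.0 s
memory.store(WordBlock("please wait", LOW, bytes(300)))   # placeholder data: 1.0 s
```

With these placeholders, one second of high-density speech costs 4800 bits and one second of low-density speech 2400 bits, so the two blocks together occupy 7200 bits.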

Preferred embodiments of the present invention are described below with reference to the drawings.

Fig. 4 is a block diagram of a speech synthesizer according to a preferred embodiment of the present invention, to which the PARCOR method is applied. In the figure, the speech synthesizer has a control unit 2, a parallel interface circuit 4, a buffer relay 6, a speech synthesis unit 8, a memory 10 and an amplifier 12, and the amplifier 12 is connected to a speaker 14.

The memory 10 stores each word of a spoken sentence as a word block of speech data obtained by speech analysis. The stored speech data belongs to word-block groups divided into at least two groups according to analysis density.

That is, the words in a spoken sentence are given different analysis densities depending on whether or not they require high intelligibility.

Consider, for example, the spoken sentences of a train-approach announcement: [A train is arriving at Track 1. It is dangerous, so please step back to the white line at your feet and wait.] Suppose the former sentence requires high intelligibility and lower intelligibility is acceptable for the latter. Then the word blocks of the former sentence, [at Track 1], [a train] and [is arriving], are stored as speech data with an analysis density of 4.8 kbit/s, and the word blocks of the latter sentence, [It is dangerous, so], [to the white line at your feet], [step back] and [please wait], are stored as speech data with an analysis density of 2.4 kbit/s.

The control unit 2 is a microcomputer with a CPU, ROM, RAM and so on. When a speech output command 100 is supplied via the buffer relay 6 and the parallel interface circuit 4, the control unit edits the spoken sentence corresponding to the command as a combination of the word blocks contained in the memory 10 and issues a command to select word blocks according to that editing. This command is supplied to the speech synthesis unit 8 via the parallel interface circuit 4.
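The editing step of the control unit, turning one speech output command into an ordered list of word-block selections, can be sketched as a table lookup. Everything here is hypothetical: the command name, block IDs and tables are invented for illustration, and the phrase glosses follow the Japanese word order of the example announcement.

```python
# Hypothetical sketch of the control unit's editing step. Command names,
# block IDs and tables are invented; glosses follow the Japanese word order.
from typing import Dict, List, Tuple

# Word-block directory: block ID -> (phrase, analysis density in bit/s).
WORD_BLOCKS: Dict[int, Tuple[str, int]] = {
    1: ("at Track 1", 4_800),
    2: ("a train", 4_800),
    3: ("is arriving", 4_800),
    4: ("It is dangerous, so", 2_400),
    5: ("to the white line at your feet", 2_400),
    6: ("step back", 2_400),
    7: ("please wait", 2_400),
}

# Message table: speech output command -> word-block sequence making up the sentences.
MESSAGES: Dict[str, List[int]] = {
    "train_approach": [1, 2, 3, 4, 5, 6, 7],
}


def edit(command: str) -> List[Tuple[int, int]]:
    """Return the (block ID, density) selection commands for one speech output command."""
    try:
        sequence = MESSAGES[command]
    except KeyError:
        raise ValueError(f"unknown speech output command: {command}") from None
    return [(block_id, WORD_BLOCKS[block_id][1]) for block_id in sequence]
```

Carrying the density alongside each block ID lets the synthesis side know which analysis density to apply without a second lookup; the source does not say how the real device conveys this.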

The speech synthesis unit 8 consists of, for example, a sound-source section made up of a pulse-train generator and a noise generator, a digital filter, a D/A converter, a parameter interpolation circuit, a parameter decoding circuit, an interface circuit and so on. It selects word blocks from the memory 10 according to the selection command from the control unit 2 and synthesizes the speech data of each word block into a speech signal according to its analysis density. The speech signal synthesized by the speech synthesis unit 8 is amplified by the amplifier 12 and output as speech from the speaker 14.
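The sound source and digital filter described here can be illustrated with a textbook PARCOR synthesis path: a pulse train (voiced) or white noise (unvoiced) drives an all-pole lattice filter whose stages use the reflection coefficients. This is a sketch, not the patent's circuit; the sign convention assumes the predictor form x[n] ~ sum a_j x[n - j], and any pitch period, gain or coefficient values a caller supplies are hypothetical.

```python
# Minimal sketch of a PARCOR-style synthesis path (textbook lattice, not the
# patent's circuit): a pulse-train or noise source drives an all-pole lattice filter.
import random
from typing import List, Sequence


def pulse_train(n: int, period: int, gain: float = 1.0) -> List[float]:
    """Voiced excitation: one impulse every `period` samples."""
    return [gain if i % period == 0 else 0.0 for i in range(n)]


def noise(n: int, gain: float = 1.0, seed: int = 0) -> List[float]:
    """Unvoiced excitation: uniform white noise in [-gain, gain]."""
    rng = random.Random(seed)
    return [rng.uniform(-gain, gain) for _ in range(n)]


def lattice_synthesize(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """All-pole lattice 1/A(z) with PARCOR coefficients k1..kp.

    Inverts the analysis lattice
        f_i[n] = f_{i-1}[n] - k_i * b_{i-1}[n-1]
        b_i[n] = b_{i-1}[n-1] - k_i * f_{i-1}[n]
    with f_0 = b_0 = x, i.e. A(z) = 1 - sum_j a_j z^-j.
    """
    p = len(k)
    b_prev = [0.0] * (p + 1)          # b_prev[i] holds b_i[n-1]
    out: List[float] = []
    for e in excitation:
        f = e                         # f_p[n] is the excitation sample
        b_now = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f + k[i - 1] * b_prev[i - 1]          # f_{i-1}[n]
            b_now[i] = b_prev[i - 1] - k[i - 1] * f  # b_i[n]
        b_now[0] = f                  # b_0[n] = x[n]
        out.append(f)
        b_prev = b_now
    return out
```

For a single stage with k1 = 0.5, an impulse produces 1.0, 0.5, 0.25, which is the impulse response of 1 / (1 - 0.5 z^-1). A lattice is a common choice for PARCOR synthesis because the filter stays stable as long as every |k_i| < 1, which makes quantized coefficients easy to check.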

Next, the operation of broadcasting the spoken sentences of the train-approach announcement described above is explained with reference to the flowchart of Fig. 5.

First, in step 200, the speech output command 100 is supplied to the control unit 2 via the buffer relay 6 and the parallel interface circuit 4, and the process moves to step 202. There the spoken sentences [A train is arriving at Track 1.] and [It is dangerous, so please step back to the white line at your feet and wait.] are edited, and because the former broadcast sentence requires intelligibility, the analysis density of its speech data is set to 4.8 kbit/s. In step 203, based on the selection command from the control unit 2, the speech data of the words [at Track 1], [a train] and [is arriving] is synthesized at an analysis density of 4.8 kbit/s and broadcast as speech from the speaker 14. The process then moves to step 204, where an initial setting process sets the analysis density of the latter broadcast sentence to 2.4 kbit/s, and then to step 205. In step 205, based on the selection command from the control unit 2, the word blocks [It is dangerous, so], [to the white line at your feet], [step back] and [please wait] are synthesized at an analysis density of 2.4 kbit/s and broadcast from the speaker 14.
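Synthesizing each sentence "at its analysis density" implies that one parameter frame must span a different number of output samples depending on the density set in steps 202 and 204. The sketch below illustrates one way to do this with the parameter interpolation the synthesis unit is said to contain. The 8 kHz output rate, the density-to-interval table and the linear interpolation are all assumptions, and the last frame of each block is simply held rather than blended into the next block.

```python
# Hypothetical sketch: expanding word blocks of different analysis densities
# into per-sample parameters. Sampling rate, intervals and linear
# interpolation are assumptions, not taken from the source.
from typing import List, Sequence, Tuple

FS = 8_000                              # output sampling rate in Hz (assumed)
INTERVAL_MS = {4_800: 10, 2_400: 20}    # analysis density (bit/s) -> interval (ms)

Block = Tuple[int, Sequence[Sequence[float]]]  # (density, parameter frames)


def samples_per_frame(density_bps: int) -> int:
    """Output samples covered by one parameter frame at this density."""
    return FS * INTERVAL_MS[density_bps] // 1000


def interpolate_frames(frames: Sequence[Sequence[float]],
                       n_samples: int) -> List[List[float]]:
    """Linearly interpolate frame-rate parameters up to one vector per sample."""
    out: List[List[float]] = []
    for idx, cur in enumerate(frames):
        nxt = frames[idx + 1] if idx + 1 < len(frames) else cur  # hold last frame
        for s in range(n_samples):
            t = s / n_samples
            out.append([(1 - t) * c + t * n for c, n in zip(cur, nxt)])
    return out


def expand_announcement(blocks: Sequence[Block]) -> List[List[float]]:
    """Expand word blocks in playback order, each at its own frame interval."""
    params: List[List[float]] = []
    for density, frames in blocks:
        params.extend(interpolate_frames(frames, samples_per_frame(density)))
    return params
```

Under these assumptions a 4.8 kbit/s frame covers 80 samples and a 2.4 kbit/s frame covers 160, so the low-density sentence is rendered from half as many parameter updates per second.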

Although this embodiment uses a different analysis density for each sentence, the same approach applies when the analysis density differs between individual word blocks within a single sentence.

Thus, according to this embodiment, mixing speech data of different analysis densities in the memory saves memory capacity, so more speech data can be stored than when all of it has the same analysis density. This reduces cost through a smaller memory and makes effective use of the memory mounting space.
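The saving can be quantified with the figures from the storage example earlier in this description (128 kbit of memory, 40 s of speech, 4.8 and 2.4 kbit/s). The sketch below solves for the longest stretch of speech that can stay at the high density while everything still fits. The source gives no per-sentence durations, so this split is illustrative only.

```python
# Worked example of mixing two analysis densities in a fixed memory.
# Figures come from the storage example; the resulting split is illustrative.


def max_high_density_seconds(capacity_bits: float, total_s: float,
                             high_bps: float, low_bps: float) -> float:
    """Longest duration h that can use the high density while all speech fits.

    Solves h * high_bps + (total_s - h) * low_bps <= capacity_bits for h.
    """
    if total_s * low_bps > capacity_bits:
        raise ValueError("speech does not fit even at the low density")
    h = (capacity_bits - total_s * low_bps) / (high_bps - low_bps)
    return min(h, total_s)


print(round(max_high_density_seconds(128_000, 40, 4_800, 2_400), 1))
```

This prints 13.3: up to about 13 s of the 40 s (for example, the intelligibility-critical sentence) could keep 4.8 kbit/s quality, instead of degrading all 40 s to 2.4 kbit/s.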

As explained above, according to the present invention, even when a given storage capacity cannot hold all of the spoken sentences, the speech data of the words making up the sentences is divided into at least two groups according to analysis density and stored in memory. This reduces the required memory capacity and prevents the spoken sentences from becoming uniformly unclear.

[Brief Description of the Drawings]

Fig. 1 is a natural speech waveform; Fig. 2 is the waveform of speech synthesized from the waveform of Fig. 1 at an analysis density of 4.8 kbit/s; Fig. 3 is the waveform of speech synthesized from the waveform of Fig. 1 at an analysis density of 2.4 kbit/s; Fig. 4 is a block diagram of an embodiment of the present invention; and Fig. 5 is a flowchart explaining the operation of the device shown in Fig. 4.
2 ... control unit, 4 ... parallel interface circuit, 6 ... buffer relay, 8 ... speech synthesis unit, 10 ... memory, 12 ... amplifier, 14 ... speaker.

Claims

[Claims]

1. A speech synthesizer comprising: a memory that stores each word making up a spoken sentence as a word block of speech data obtained by speech analysis; a control unit that, in response to a speech output command, edits a spoken sentence as a combination of the word blocks contained in the memory and commands selection of word blocks according to the editing; and a speech synthesis unit that selects word blocks in accordance with the selection command and synthesizes the speech data of each word block into a speech signal according to its analysis density; the speech synthesizer being characterized in that the memory has word-block groups divided into at least two groups according to analysis density.