JP2001312291A

JP2001312291A - Method for generating numeral voice waveform and method and device for synthesizing numerical voice

Info

Publication number: JP2001312291A
Application number: JP2000133181A
Authority: JP
Inventors: Hisashi Kawai; 恒河井; Norio Higuchi; 宜男樋口; Toru Shimizu; 徹清水
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2000-05-02
Filing date: 2000-05-02
Publication date: 2001-11-09
Anticipated expiration: 2020-05-02
Also published as: JP3632901B2

Abstract

PROBLEM TO BE SOLVED: To provide a waveform generating method, a synthesizing method, and a synthesizing device for numeric speech that has natural intonation when a multi-digit number is spoken continuously. SOLUTION: While the lower two digits (ZZ') of a 4-digit number (XX'ZZ') are fixed to an arbitrary 2-digit or 1-digit number, the higher two digits (XX') are varied from 00 to 99 and the 4-digit number is spoken continuously; the continuous speech is recorded to obtain numeric speech waveforms for the higher two digits (XX'). Likewise, while the higher two digits (ZZ') of a 4-digit number (ZZ'YY') are fixed to an arbitrary 2-digit or 1-digit number, the lower two digits (YY') are varied from 00 to 99 and the 4-digit number is spoken continuously; the continuous speech is recorded to obtain numeric speech waveforms for the lower two digits (YY'). When 4-digit numeric speech is synthesized, fragments of the waveforms for the higher two digits (XX') and the lower two digits (YY') are cut out of the recorded waveforms and combined.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for creating numeric speech waveforms and to a method and an apparatus for synthesizing numeric speech, and more particularly to such methods and apparatus that can produce numeric speech with natural intonation when a plurality of digits are uttered in succession.

[0002]

[Prior Art] Services that synthesize numeric speech to give a customer a telephone number, such as telephone directory assistance services, have long been in practical use. In such services each digit is pronounced independently, one at a time, so the utterance sounds unnatural when heard as a whole.

[0003] It is also known that when a multi-digit number, for example a four-digit number, is uttered continuously, the fundamental frequency pattern that sounds natural is as shown in FIG. 7: the first two digits, for example "zero" and "ichi", form one ridge, and the next two digits, for example "ni" and "yon", form another ridge; the height of the first ridge is about "high" and that of the second ridge is about "medium". For a three-digit number, it is known that the first two digits show the same ridge shape and height as the first two digits in FIG. 7, and the third digit has the same ridge shape as the last two digits in FIG. 7, with a height of "medium" or "low".

[0004]

[Problems to Be Solved by the Invention] In the prior art, such as the telephone directory assistance service described above, the numeric speech provided sounds unnatural, so the customer may be distracted by the unnaturalness and fail to catch the correct digits. In other words, errors in conveying information are likely to occur.

[0005] Alternatively, natural-sounding utterance patterns for multi-digit numbers, for example four-digit numbers, could all be prepared in advance, and the requested pattern played back each time an utterance is requested. This requires utterance patterns from 0000 to 9999, that is, 10^4 items of data, so the amount of data stored in memory becomes very large. Moreover, the amount grows as a power of 10 as the number of digits to be uttered increases.

[0006] An object of the present invention is to solve the problems of the prior art described above and to provide a method and an apparatus for synthesizing numeric speech that is heard with natural intonation when a multi-digit number is uttered continuously. Another object is to provide a method for creating numeric speech waveforms, and a method and an apparatus for synthesizing numeric speech, that reproduce multi-digit numeric speech with natural intonation even when the amount of data stored in memory is small.

[0007]

[Means for Solving the Problems] To achieve the above objects, a first feature of the present invention is as follows. Waveforms of the digits 0 to 9 uttered in isolation, together with waveforms of three- or four-digit numbers uttered continuously while the last one or two digits are fixed to an arbitrary one- or two-digit number and the first two digits are varied from 00 ("zero zero") to 99 ("kyū kyū"), are recorded as numeric speech waveforms with a high fundamental frequency. In addition, waveforms of three-digit numbers uttered continuously while the first two digits are fixed to an arbitrary two-digit number and the last digit is varied from 0 to 9, together with waveforms of four-digit numbers uttered continuously while the first two digits are likewise fixed and the last two digits are varied from 00 ("zero zero") to 99 ("kyū kyū"), are recorded as numeric speech waveforms with a low fundamental frequency. This feature greatly reduces the amount of numeric speech waveform data that must be stored in memory as the basis for synthesizing numeric speech.

[0008] A second feature of the present invention is that a numeric speech waveform with a high fundamental frequency and a numeric speech waveform with a low fundamental frequency are concatenated to synthesize three- or four-digit numeric speech. This feature makes it possible to reproduce numeric speech with natural intonation.

[0009] A third feature of the present invention is that a digit string is divided from the beginning into sections of two digits each, a numeric speech waveform with a high fundamental frequency is designated for the digits in odd-numbered sections, a numeric speech waveform with a low fundamental frequency is designated for the digits in even-numbered sections, and these waveforms are concatenated to synthesize speech for the digit string. This feature makes it possible to reproduce numeric speech of many digits with natural intonation.

[0010] A fourth feature of the present invention is an apparatus comprising: a speech waveform storage unit that stores numeric speech waveforms with a high fundamental frequency and numeric speech waveforms with a low fundamental frequency; a digit string dividing unit that divides a digit string from the beginning into sections of two digits each; a fundamental frequency height designating unit that designates a fundamental frequency height for each section; a speech waveform extraction unit that extracts, from the speech waveform storage unit, the numeric speech waveform of the height designated by the fundamental frequency height designating unit; and a speech signal output unit that concatenates and outputs the numeric speech waveforms extracted by the speech waveform extraction unit. This feature makes it possible to provide a numeric speech synthesizing apparatus that reproduces numeric speech of many digits with natural intonation.

[0011]

[Embodiments of the Invention] The present invention is described in detail below with reference to the drawings. First, the method for creating numeric speech waveforms according to the present invention is described.

[0012] When a four-digit number is uttered continuously, the first two digits form a first ridge with a high fundamental frequency, and the last two digits form a second ridge with a medium or low fundamental frequency. Focusing on this, the present invention creates the waveform of the first two digits and the waveform of the last two digits separately, and combines these waveforms freely to synthesize four-digit numeric speech.

[0013] Accordingly, as shown in FIG. 1, four-digit numbers are uttered continuously with the last two digits fixed to an arbitrary two-digit or one-digit number while the first two digits XX' are varied from 00 to 99, and the utterances are recorded and stored as utterance waveforms for the first two digits. For example, with the last two digits fixed to "ni" "yon" (24), the first two digits XX' are varied from 00 ("zero zero") to 99 ("kyū kyū") and each four-digit number is uttered continuously. That is, "zero zero ni yon", "zero ichi ni yon", "zero ni ni yon", ..., "kyū kyū ni yon" are uttered and recorded. As a result, 100 utterance waveforms are stored as waveforms containing the utterance of the first two digits XX'.

[0014] Next, conversely, as shown in FIG. 2, the first two digits are fixed to an arbitrary two-digit number, for example "ni" "yon", the last two digits YY' are varied from "zero zero" to "kyū kyū", and each four-digit number is uttered continuously and recorded, to be stored as a waveform containing the utterance of the last two digits YY'. As a result, 100 utterance waveforms are stored as waveforms containing the utterance of the last two digits YY'.

[0015] To cover the case of uttering a one-digit number, ten utterance waveforms of the single digits "zero" to "kyū" are added to the 100 utterance waveforms for the first two digits XX' and stored. To cover the case of uttering a three-digit number, three-digit numbers are uttered continuously and recorded with the first two digits fixed to an arbitrary two-digit number and the last digit varied from "zero" to "kyū". These ten utterance waveforms are added to the 100 utterance waveforms for the last two digits YY' and stored.

[0016] Thus, in the present invention, 110 waveforms are stored as first-half utterance waveforms and 110 waveforms as second-half utterance waveforms.
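The recording script described in [0013] to [0016] can be enumerated mechanically. The following Python sketch is illustrative only: the function names are hypothetical, the fixed pair 24 is simply the "ni" "yon" example from the text, and prompts are written as numerals rather than as spoken readings.

```python
from typing import List

FIXED = "24"  # assumed fixed pair, taken from the "ni" "yon" example in the text


def first_half_script(fixed_tail: str = FIXED) -> List[str]:
    """Prompts whose leading digits supply the first-half (high F0) waveforms."""
    isolated = [str(d) for d in range(10)]                       # 0..9 spoken alone
    four_digit = [f"{xx:02d}{fixed_tail}" for xx in range(100)]  # XX' varied, tail fixed
    return isolated + four_digit


def second_half_script(fixed_head: str = FIXED) -> List[str]:
    """Prompts whose trailing digits supply the second-half (low F0) waveforms."""
    three_digit = [f"{fixed_head}{y}" for y in range(10)]        # last digit varied
    four_digit = [f"{fixed_head}{yy:02d}" for yy in range(100)]  # YY' varied, head fixed
    return three_digit + four_digit


if __name__ == "__main__":
    first, second = first_half_script(), second_half_script()
    print(len(first), len(second))  # 110 110
```

Each list has 110 entries, matching the counts given in [0016].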

[0017] Next, an embodiment of the method for synthesizing numeric speech according to the present invention is described.

[0018] (1) When the number to be uttered has one digit, the corresponding one-digit utterance waveform is taken from the first-half utterance waveforms obtained by the above recording and played back.

[0019] (2) When the number to be uttered has two digits, the fragment of the corresponding two-digit utterance is cut out of the first-half utterance waveforms obtained by the above recording and played back.

[0020] (3) When the number to be uttered has three digits, the fragment of the corresponding two-digit utterance is cut out of the first-half utterance waveforms obtained by the above recording, the fragment of the corresponding one-digit utterance is cut out of the second-half utterance waveforms, and the two fragments are concatenated and played back. In this case, the first two digits and the last digit are cut out with no silence before or after their speech intervals, and then concatenated.

[0021] (4) When the number to be uttered has four digits, the fragment of the corresponding two-digit utterance is cut out of the first-half utterance waveforms obtained by the above recording, the fragment of the corresponding two-digit utterance is cut out of the second-half utterance waveforms, and the two fragments are concatenated and played back. In this case, the first two digits and the last two digits are cut out with no silence before or after their speech intervals, and then concatenated. For example, as shown in FIG. 3, the speech waveform fragments of the first two digits XX' and the last two digits YY' are concatenated to synthesize the four-digit speech (XX'YY'). If the four-digit speech still sounds discontinuous, it is preferable to insert a silent period of about 100 ms at the junction of the two two-digit waveforms (point A in the figure). The same applies to (3) above and (5) below.

[0022] (5) When the number to be uttered has five or more digits, it is divided from the beginning into groups of four digits, and methods (1) to (4) are applied to each group. For example, for a five-digit number, the first four digits are played back by method (4) and the fifth digit by method (1). For a ten-digit number, digits 1 to 4 and digits 5 to 8 are each played back by method (4), and digits 9 and 10 by method (2). In this case, each group of four digits is processed independently, and the groups are joined with about 300 ms of silence between them.
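Rules (1) to (5) amount to a small planning procedure. The Python sketch below shows one way to express it, under stated assumptions: the function names and the tuple-based plan format are hypothetical, "high" and "low" stand for the first-half and second-half waveform sets, and the 100 ms and 300 ms defaults are the approximate values mentioned in the text.

```python
from typing import List, Tuple, Union

Step = Tuple[str, Union[str, int]]  # ("high" | "low", digits) or ("silence", ms)


def plan_group(group: str, inner_gap_ms: int = 0) -> List[Step]:
    """Plan one group of 1 to 4 digits according to rules (1) to (4)."""
    if not (group.isdigit() and 1 <= len(group) <= 4):
        raise ValueError("group must consist of 1 to 4 digits")
    steps: List[Step] = [("high", group[:2])]       # rules (1), (2): first-half set
    if len(group) > 2:                              # rules (3), (4): add second half
        if inner_gap_ms:
            steps.append(("silence", inner_gap_ms))  # optional pause at point A
        steps.append(("low", group[2:]))
    return steps


def plan_number(digits: str, inner_gap_ms: int = 0,
                group_gap_ms: int = 300) -> List[Step]:
    """Rule (5): split into four-digit groups joined by about 300 ms of silence."""
    if not digits.isdigit():
        raise ValueError("digits must be a non-empty string of 0-9")
    groups = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    steps: List[Step] = []
    for index, group in enumerate(groups):
        if index:
            steps.append(("silence", group_gap_ms))
        steps.extend(plan_group(group, inner_gap_ms))
    return steps
```

For a five-digit input such as "12345", the plan is a high fragment for 12, a low fragment for 34, a 300 ms pause, and the isolated digit 5 from the first-half set, as in the five-digit example of rule (5).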

[0023] Next, an embodiment of the apparatus for synthesizing numeric speech according to the present invention is described. This apparatus can be implemented using, for example, a computer. FIG. 4 is a functional block diagram showing the configuration of one embodiment of the present invention.

[0024] In the figure, 1 is a digit string dividing unit that divides the input digit string into two-digit sections from the beginning; 2 is a fundamental frequency height designating unit that designates the fundamental frequency height (that is, the pitch) for the divided digits; 3 is a speech waveform storage unit that stores speech waveforms of two-digit and one-digit numbers corresponding to the first ridge of the fundamental frequency pattern and speech waveforms of two-digit and one-digit numbers corresponding to the second ridge; 4 is a speech waveform extraction unit that extracts, from the speech waveform storage unit 3, a speech waveform whose fundamental frequency suits the position designated by the fundamental frequency height designating unit 2; and 5 is a speech signal output unit that outputs the speech waveform to a loudspeaker or a telephone line, inserting silence as necessary.

[0025] The fundamental frequency height designating unit 2 designates a high fundamental frequency for the odd-numbered sections produced by the digit string dividing unit 1 and a low fundamental frequency for the even-numbered sections. The speech waveform storage unit 3 stores waveforms containing the first-half two-digit speech waveforms 3a, which have a high fundamental frequency, and waveforms containing the second-half two-digit speech waveforms 3b, which have a low fundamental frequency. When the fundamental frequency height designating unit 2 designates a high fundamental frequency, a speech waveform is cut out of the first-half two-digit speech waveforms 3a in response to the speech waveform extraction signal a from the speech waveform extraction unit 4; when a low fundamental frequency is designated, a speech waveform is cut out of the second-half two-digit speech waveforms 3b.
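The behaviour of the digit string dividing unit 1 and the fundamental frequency height designating unit 2 can be sketched in a few lines of Python. The function names and the "high"/"low" labels are illustrative assumptions rather than anything named in the specification.

```python
from typing import List, Tuple


def split_into_sections(digits: str) -> List[str]:
    """Unit 1: divide the digit string into two-digit sections from the beginning."""
    if not digits.isdigit():
        raise ValueError("digits must be a non-empty string of 0-9")
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


def designate_heights(sections: List[str]) -> List[Tuple[str, str]]:
    """Unit 2: odd-numbered sections get a high F0, even-numbered ones a low F0."""
    return [("high" if number % 2 == 1 else "low", section)
            for number, section in enumerate(sections, start=1)]
```

With an odd number of digits the final section holds a single digit, which is why the storage unit keeps one-digit waveforms for both heights.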

[0026] Next, the operation of this embodiment is described with reference to FIGS. 4 and 5. FIG. 5 is a flowchart for explaining the operation of this embodiment. When a digit string {a_n} (n is a positive integer), for example a telephone number, is input to the digit string dividing unit 1 (step S1), the digit string dividing unit 1 divides the digit string {a_n} from the beginning into sections of two digits each, the sections being numbered by m (step S2). Next, m is set to 1 (step S3), and the fundamental frequency height designating unit 2 designates the fundamental frequency height (first-half speech waveform or second-half speech waveform) for the two-digit (or one-digit) number of the m-th section (step S4). The speech waveform extraction unit 4 then extracts the speech waveform of the two-digit (or one-digit) number of the m-th section from the designated area of the speech waveform storage unit 3 (step S5). The speech signal output unit 5 outputs the extracted speech waveform to a loudspeaker or a telephone line (step S6). In step S7, it is determined whether the whole digit string {a_n} has been output to the loudspeaker or telephone line; if not, the process proceeds to step S8, where m is incremented by 1, and then returns to step S4 to continue in the same way. These operations are repeated, and when the determination in step S7 becomes affirmative, the operation of this embodiment ends. The speech signal output unit 5 inserts silence between the first-half two digits and the second-half two digits as necessary, so as to generate multi-digit numeric speech with natural intonation.
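The flow of steps S1 to S8 can be traced with a short Python sketch. This is a minimal illustration under stated assumptions: waveforms are plain lists of samples, the store is a hypothetical nested dictionary keyed by height and digits, the 8000 Hz default sample rate is an assumption, and the optional pause is inserted only before second-half fragments.

```python
from typing import Dict, List

Waveform = List[float]


def synthesize(digits: str,
               store: Dict[str, Dict[str, Waveform]],
               sample_rate: int = 8000,
               gap_ms: int = 0) -> Waveform:
    """Follow steps S1-S8: split, designate height, extract, output, loop."""
    if not digits.isdigit():
        raise ValueError("digits must be a non-empty string of 0-9")
    # S1-S2: receive the digit string and split it into two-digit sections.
    sections = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    silence = [0.0] * (sample_rate * gap_ms // 1000)
    output: Waveform = []
    m = 1                                            # S3
    while True:
        height = "high" if m % 2 == 1 else "low"     # S4: odd -> high, even -> low
        fragment = store[height][sections[m - 1]]    # S5: extract from designated set
        if height == "low":
            output.extend(silence)                   # optional pause before second half
        output.extend(fragment)                      # S6: append to the output signal
        if m == len(sections):                       # S7: whole string output?
            break
        m += 1                                       # S8
    return output
```

In a real system the store would hold the fragments cut from the recordings, and the output would be sent to a loudspeaker or telephone line instead of being returned as a list.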

[0027] As described above, according to this embodiment, multi-digit numeric speech is reproduced by concatenating speech waveform fragments of first-half two-digit numbers with speech waveform fragments of second-half two-digit numbers, so natural numeric speech can be generated that has natural intonation and is unlikely to strike the listener as odd. In addition, the speech waveform storage unit 3 needs to store only 220 numeric speech waveforms in total: 110 for first-half one- and two-digit numbers and 110 for second-half one- and two-digit numbers. The amount of data stored in the speech waveform storage unit 3 is therefore greatly reduced compared with the conventional approach.
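As a rough check of the saving, the number of recorded utterances can be compared with the exhaustive four-digit inventory mentioned in [0005]. The symbols below are notation introduced here for illustration, and the comparison counts recordings rather than bytes of storage.

```latex
N_{\text{stored}} = (10 + 100) + (10 + 100) = 220, \qquad
N_{\text{exhaustive}} = 10^{4}, \qquad
\frac{N_{\text{stored}}}{N_{\text{exhaustive}}} = \frac{220}{10^{4}} = 2.2\%
```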

[0028] Next, a second embodiment of the present invention is described with reference to FIGS. 6(a) to 6(c). FIGS. 6(a) to 6(c) are block diagrams showing only the main part of this embodiment; the rest of the configuration is the same as or equivalent to FIG. 4. In FIG. 6(a), the speech waveform storage unit 3 stores only the first-half one- and two-digit speech waveforms 3a (110 waveforms). The converter 6 takes the first-half two-digit speech waveform 3a as input, converts it to the height of the second-half two-digit speech waveform, and outputs it. When the speech waveform extraction unit 4 receives an instruction for a first-half two-digit speech waveform, the switch 7 selects terminal 8a; when it receives an instruction for a second-half two-digit speech waveform, the switch 7 selects terminal 8b. As a result, only the first-half two-digit speech waveforms 3a (110 waveforms) need to be stored in the speech waveform storage unit 3, and the amount of stored data can be reduced.

[0029] FIG. 6(b) shows a modification of FIG. 6(a), in which the speech waveform storage unit 3 stores only the second-half one- and two-digit speech waveforms 3b (110 waveforms). The converter 9 takes the second-half two-digit speech waveform 3b as input, converts it to the height of the first-half two-digit speech waveform, and outputs it. When the speech waveform extraction unit 4 receives an instruction for a first-half two-digit speech waveform, the switch 10 selects terminal 11b; when it receives an instruction for a second-half two-digit speech waveform, the switch 10 selects terminal 8a. As a result, only the second-half two-digit speech waveforms 3b (110 waveforms) need to be stored in the speech waveform storage unit 3, and the amount of stored data can be reduced.

[0030] FIG. 6(c) shows yet another modification, in which the speech waveform storage unit 3 stores only one- and two-digit speech waveforms 3c (110 waveforms) whose height lies midway between the first-half and second-half one- and two-digit speech waveforms. The first converter 12 takes the intermediate two-digit speech waveform 3c as input, converts it to the height of the first-half two-digit speech waveform, and outputs it. The second converter 13 takes the intermediate two-digit speech waveform 3c as input, converts it to the height of the second-half two-digit speech waveform, and outputs it. When the speech waveform extraction unit 4 receives an instruction for a first-half two-digit speech waveform, the switch 14 selects terminal 15a; when it receives an instruction for a second-half two-digit speech waveform, the switch 14 selects terminal 15b. As a result, only the intermediate-height two-digit speech waveforms 3c (110 waveforms) need to be stored in the speech waveform storage unit 3, and the amount of stored data can be reduced.
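The three variants of FIG. 6 differ only in which set of waveforms is stored and which requests pass through a converter. The Python sketch below models that routing only. The names are hypothetical, and the converters are left as caller-supplied functions because the specification does not say how the fundamental frequency is converted; a real converter would need a proper pitch modification technique.

```python
from typing import Callable, Dict, List

Waveform = List[float]
Converter = Callable[[Waveform], Waveform]


def passthrough(waveform: Waveform) -> Waveform:
    """Return the stored fragment unchanged (the direct switch position)."""
    return list(waveform)


def make_extractor(stored: Dict[str, Waveform],
                   to_high: Converter = passthrough,
                   to_low: Converter = passthrough
                   ) -> Callable[[str, str], Waveform]:
    """Build an extractor over a single stored set, routing by requested height."""
    def extract(height: str, key: str) -> Waveform:
        fragment = stored[key]
        if height == "high":
            return to_high(fragment)   # direct in variant (a), converted in (b), (c)
        if height == "low":
            return to_low(fragment)    # converted in variants (a), (c), direct in (b)
        raise ValueError("height must be 'high' or 'low'")
    return extract
```

Variant (a) corresponds to passing only `to_low`, variant (b) to passing only `to_high`, and variant (c) to passing both converters over the intermediate-height set.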

[0031] In the embodiments above, continuous utterance waveforms of three- or four-digit numbers such as those shown in FIG. 1 or FIG. 2 are stored in the speech waveform storage unit 3, but the present invention is not limited to this. Only the required first one or two digits, or last one or two digits, may be cut out in advance from the continuously uttered three- or four-digit waveforms, and only these portions stored. Doing so reduces the storage capacity required of the speech waveform storage unit 3 and the amount of computation at synthesis time.

[0032]

[Effects of the Invention] As is clear from the above description, according to the present invention, numeric speech waveforms with a high fundamental frequency and numeric speech waveforms with a low fundamental frequency are stored separately, so the amount of data stored in memory can be greatly reduced compared with the prior art.

[0033] Further, according to the present invention, numeric speech is synthesized by concatenating a numeric speech waveform fragment with a high fundamental frequency and a numeric speech waveform fragment with a low fundamental frequency, so multi-digit numeric speech with natural intonation can be generated, and natural speech that does not strike the listener as odd can be synthesized. The present invention is well suited to automatic voice response systems and the like.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of a method for creating the speech waveform of the first two digits of four-digit numeric speech.

FIG. 2 is an explanatory diagram of a method for creating the speech waveform of the last two digits of four-digit numeric speech.

FIG. 3 is an explanatory diagram of a method for synthesizing four-digit numeric speech according to an embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of a numeric speech synthesizing apparatus according to an embodiment of the present invention.

FIG. 5 is a flowchart showing the operation of the embodiment.

FIG. 6 is a block diagram showing the configuration of the main part of modifications of the present invention.

FIG. 7 is an explanatory diagram of the prior art.

[Explanation of symbols]

1: digit string dividing unit; 2: fundamental frequency height designating unit; 3: speech waveform storage unit; 4: speech waveform extraction unit; 5: speech signal output unit; 6, 9, 12, 13: converters.

Continued from the front page: (72) Inventor: Toru Shimizu, c/o 株式会社ケイディディ研究所 (KDD R&D Laboratories), 2-1-15 Ohara, Kamifukuoka-shi, Saitama. F-term (reference): 5D045 AA09

Claims

[Claims]

1. A method for creating numeric speech waveforms, characterized in that waveforms of the digits 0 to 9 uttered in isolation, and waveforms of three- or four-digit numbers uttered continuously while the last one or two digits are fixed to an arbitrary one- or two-digit number and the first two digits are varied from 00 ("zero zero") to 99 ("kyū kyū"), are recorded as numeric speech waveforms with a high fundamental frequency.

2. A method for creating numeric speech waveforms, characterized in that waveforms of three-digit numbers uttered continuously while the first two digits of a three- or four-digit number are fixed to an arbitrary two-digit number and the last digit is varied from 0 to 9, and waveforms of four-digit numbers uttered continuously while the last two digits are varied from 00 ("zero zero") to 99 ("kyū kyū"), are recorded as numeric speech waveforms with a low fundamental frequency.

3. A method for synthesizing numeric speech, characterized in that three- or four-digit numeric speech is synthesized by combining a two-digit numeric speech waveform fragment with a high fundamental frequency and a one- or two-digit numeric speech waveform fragment with a low fundamental frequency.

4. A method for synthesizing numeric speech, characterized in that a digit string is divided from the beginning into sections of two digits each, a numeric speech waveform with a high fundamental frequency is designated for the digits in odd-numbered sections, a numeric speech waveform with a low fundamental frequency is designated for the digits in even-numbered sections, and the numeric speech waveform fragments extracted according to these designations are combined to synthesize the numeric speech.

5. The method for synthesizing numeric speech according to claim 3 or 4, wherein the numeric speech waveform fragment with the high fundamental frequency and the numeric speech waveform fragment with the low fundamental frequency are, respectively, the first-two-digit portion and the last-one-digit or last-two-digit portion of the waveforms created by the method according to claim 1 or 2.

6. An apparatus for synthesizing numeric speech, comprising: a speech waveform storage unit that stores numeric speech waveforms with a high fundamental frequency and numeric speech waveforms with a low fundamental frequency; a digit string dividing unit that divides a digit string from the beginning into sections of two digits each; a fundamental frequency height designating unit that designates a fundamental frequency height for each of the sections; a speech waveform extraction unit that extracts, from the speech waveform storage unit, a numeric speech waveform fragment of the height designated by the fundamental frequency height designating unit; and a speech signal output unit that combines and outputs the numeric speech waveform fragments extracted by the speech waveform extraction unit.

7. The apparatus for synthesizing numeric speech according to claim 6, wherein the numeric speech waveforms with the high fundamental frequency and the numeric speech waveforms with the low fundamental frequency stored in the speech waveform storage unit are numeric speech waveforms created by the methods of claims 1 and 2, respectively.

8. The apparatus for synthesizing numeric speech according to claim 6, wherein the speech waveform storage unit stores only one of the numeric speech waveforms with the high fundamental frequency and the numeric speech waveforms with the low fundamental frequency, and the other numeric speech waveforms are created from the stored waveforms by frequency conversion.