JPH01106000A

JPH01106000A - Voice encoder

Info

Publication number: JPH01106000A
Application number: JP62263298A
Authority: JP
Inventors: Kanji Kunisawa; 国澤　寛治; Noboru Uechi; 上地　登
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1987-10-19
Filing date: 1987-10-19
Publication date: 1989-04-24
Anticipated expiration: 2012-06-25
Also published as: JP2624972B2

Abstract

PURPOSE: To make a synthesizer on an audio responding device side small-sized by analyzing inputted text data and writing voice data consisting of rhythm information and pronunciation information in a storage element. CONSTITUTION: A document is inputted through a text data input means consisting of a keyboard and other document input means. Its text data is converted by a document analyzing means 8 into the voice data consisting of the rhythm information (c) and pronunciation information (d) by a text synthesizing method according to a word dictionary and a rule, and the converted voice data are further encoded into, for example, a bar code by a voice data writing means 9 and written on printed matter, e.g. a teaching material, etc., to obtain the storage element 1. An IC card is also usable as the storage element 1. Thus, the synthesizer on the audio responding device can be made small-sized.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field] The present invention relates to a voice encoding device that creates voice data to be used in a voice response device and writes it into a storage element.

[Background Art] As the content of information and communication services has expanded, various types of terminal devices for conveying computer-processed results to humans have come into practical use. As the interface between humans and machines, visual means such as printers and displays have conventionally been the main ones used. Conveying information by voice, however, is often easier to use and is also economical, so voice has recently come to be widely used in various services.

A device that responds by voice in this way is generally called a voice response device.

This voice response device stores the data needed for the voice output used in its responses and, based on this data, creates and outputs voice in accordance with the service request.

As the storage element 1 that stores this data, either a ROM built into the voice response device 2, as shown in FIG. 4(a), or an external memory such as an IC card attached to the voice response device 2, as shown in FIG. 4(b), is used.

In the voice response device 2 of FIG. 4(a), in which the storage element 1 is built in, the necessary data within the data b of the storage element 1 is selected by an external selection signal a and output to the synthesizer 3.

In this case, the only information input from outside is the selection signal a, so the amount of input information is small, but because the storage element 1 is built in, there is a problem that the device becomes large.

On the other hand, in the case of FIG. 4(b), in which the storage element 1 is external, the data b selected by the selection signal a is input to the voice response device 2, so the amount of information input to the device is large. Because the storage element 1 is not built in, however, the device does not become as large as the former.

For example, when the voice response device 2 is used in learning equipment or the like, each individual uses a voice response device 2, so it is desirable that the voice response device 2 be small and inexpensive. In addition, when the content of the output voice (the stored data) is frequently changed or added to, holding the data in external storage is considered more suitable.

Incidentally, when data is recorded and stored, the voice data is encoded into a form from which speech can be synthesized. Speech synthesis methods include the recording-editing method, the parameter-editing method, and the rule synthesis method. Comparing the three (the comparison table of the original publication is not reproduced in this text), the rule synthesis method has inferior voice quality compared with the other two methods but a very high information compression rate, which makes it suitable for building a small and inexpensive voice response device.

The rule synthesis method creates speech from pronunciation information (a character string) and prosodic information such as accent and intonation, based on phonetic and linguistic rules. In particular, the method that creates speech from pronunciation information (a character string) alone is called the text synthesis method, and it can be said to be the ultimate form of speech synthesis, one that reaches into the intelligent activity of human speech. If the information contained in speech is divided into linguistic information and naturalness such as individuality, the text synthesis method at its present stage can reproduce only the linguistic information. Even so, the range of applications in which conveying linguistic information is sufficient is considered wide, and the method is also useful in language teaching materials for learning a standard way of speaking. Representative text synthesis methods are described, for Japanese, in "Sato, Sagisaka, Kogure, Sagayama: 'Speech synthesis from Japanese text', Tsuken Geppo (Monthly Report), 32, 11, pp. 2243-2252 (1983)", and for English, in "Allen, J., Carlson, R., Granstrom, B., Hunnicutt, S., Klatt, D., and Pisoni, D.: 'MITalk-79: Conversion of unrestricted English text to speech', MIT (1979)".

A synthesizer based on the text synthesis method consists broadly of three parts, as shown in FIG. 5: a text analysis unit 4, which takes text data as input and uses a word dictionary and various rules to generate prosodic information c (accent, intonation, phoneme duration, and the like) and pronunciation information d; a control information generation unit 5, which generates control information e from the information produced by the text analysis unit 4; and a speech synthesis unit 6, which synthesizes a speech signal based on the control information e. The text analysis unit 4, which generates the accent, intonation, and other prosodic information, is particularly large, so if this text synthesis system is used as the synthesizer, an increase in device size is unavoidable.
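The three-stage structure of FIG. 5 can be pictured with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the publication: the toy word dictionary, the one-accent-flag-per-unit representation of the prosodic information c, the pitch values, and the function names `analyze_text`, `generate_control`, and `synthesize`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VoiceData:
    prosody: List[int]        # c: one accent flag per unit (hypothetical form)
    pronunciation: List[str]  # d: sequence of synthesis units


# Hypothetical toy dictionary: word -> (synthesis units, accent flags).
WORD_DICT: Dict[str, Tuple[List[str], List[int]]] = {
    "hashi": (["ha", "shi"], [1, 0]),
    "ame": (["a", "me"], [1, 0]),
}


def analyze_text(text: str) -> VoiceData:
    """Unit 4: text analysis. A dictionary lookup stands in for the
    word dictionary and rules, which are the large part of the system."""
    prosody: List[int] = []
    units: List[str] = []
    for word in text.split():
        word_units, accents = WORD_DICT[word]
        units.extend(word_units)
        prosody.extend(accents)
    return VoiceData(prosody, units)


def generate_control(voice: VoiceData) -> List[Tuple[str, int]]:
    """Unit 5: control information e as (unit, pitch in Hz) pairs.
    The two pitch values are arbitrary placeholders."""
    return [(u, 140 if a else 110)
            for u, a in zip(voice.pronunciation, voice.prosody)]


def synthesize(control: List[Tuple[str, int]]) -> str:
    """Unit 6: stand-in for waveform synthesis; returns a text trace."""
    return " ".join(f"{unit}@{hz}Hz" for unit, hz in control)


print(synthesize(generate_control(analyze_text("ame"))))
```

In this sketch only `analyze_text` needs the dictionary, which mirrors the observation that the text analysis unit 4 dominates the size of the synthesizer.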

Therefore, when a voice response device 2 is configured to store text data in the storage element 1 and synthesize speech from that text data, the synthesizer becomes large in either of the cases of FIG. 4(a) and FIG. 4(b).

For this reason, devices that perform rule synthesis without the text analysis unit 4 have also been proposed. In that case, however, the prosodic information c to be written into the storage element 1 is created by hand, which is extremely inefficient.

[Object of the Invention] In view of the above, the object of the present invention is to provide a voice encoding device with which voice data consisting of prosodic information and pronunciation information can easily be stored in a storage element, and which as a result allows the synthesizer on the voice response device side to be made smaller.

[Disclosure of the Invention] As shown in FIG. 1, the present invention is characterized by comprising a text data input means 7 for inputting text data, a text analysis means 8 for analyzing the input text data and outputting voice data consisting of prosodic information c and pronunciation information d, and a means 9 for writing the voice data into a storage element 1.

The operation of an embodiment of the voice encoding device of the present invention is described below.

FIG. 2 shows a flowchart of the operation of the embodiment. First, the device of the present invention is started, and text is input through the text data input means 7, which consists of a keyboard or other text input means. The text analysis means 8 converts this text data, using text synthesis techniques based on a word dictionary and rules, into voice data consisting of prosodic information c (accent, intonation, pauses, voice level, phoneme duration, and the like) and pronunciation information d (which indicates the sequence of synthesis units used in speech synthesis). The voice data writing means 9 then encodes the resulting voice data, for example as a bar code, and writes it onto printed matter such as teaching materials. In other words, the printed matter on which this bar code is printed serves as the storage element 1.
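The publication does not specify how the voice data is laid out before it is turned into a bar code. As a rough illustration of the writing step only, the following Python sketch packs each synthesis unit and an accent flag into one byte. The unit table, the byte layout (a length byte followed by one byte per unit, with the accent in the top bit), and the name `encode_voice_data` are all hypothetical.

```python
from typing import List

# Hypothetical table of synthesis units; a unit's index is its code.
UNITS: List[str] = ["a", "i", "u", "e", "o", "ka", "ki", "ku",
                    "ke", "ko", "ha", "shi", "me"]
UNIT_INDEX = {unit: i for i, unit in enumerate(UNITS)}


def encode_voice_data(units: List[str], accents: List[int]) -> bytes:
    """Pack pronunciation information d (units) and prosodic information c
    (here reduced to one accent flag per unit) into a compact payload.

    Assumed layout: [count][unit byte]..., where each unit byte is
    (accent << 7) | unit index.
    """
    if len(units) != len(accents):
        raise ValueError("units and accents must have the same length")
    if len(units) > 255:
        raise ValueError("too many units for a one-byte count")
    payload = bytearray([len(units)])
    for unit, accent in zip(units, accents):
        payload.append((0x80 if accent else 0x00) | UNIT_INDEX[unit])
    return bytes(payload)


# The hex string is what a bar code printer would be asked to print.
print(encode_voice_data(["a", "me"], [1, 0]).hex())
```

One byte per unit keeps the payload small enough for printed codes, which is the property the embodiment relies on; a real format would also need to carry pauses, level, and duration.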

Thus, in this embodiment, simply inputting text data allows it to be automatically converted into the prosodic information c and pronunciation information d used for rule synthesis, encoded, and written into the storage element 1. Because the amount of information in the voice data is small, it can be stored in a bar code as described above. Moreover, because it is a bar code, a person can access it at random, so no special selection means is needed on the voice output side, and because it is printed matter it can also be reproduced easily.

FIG. 3 shows a voice response device 2, the voice output means that reads voice data from the storage element 1 created by the voice encoding device of the present invention and performs speech synthesis. This voice response device 2 includes a synthesizer 3 made up of a control information generation unit 5 and a speech synthesis unit 6. The control information generation unit 5 has a bar code reader for reading code data from the bar code, decodes the code data read from the bar code into prosodic information c and pronunciation information d, and creates from c and d the control information e needed for speech synthesis. The speech synthesis unit 6 synthesizes a speech signal based on the control information e from the control information generation unit 5. Together they constitute a so-called rule synthesis system. In this case, unlike the text synthesis method, there is no need to build in a text analysis unit, so the voice response device 2 can be made small and inexpensive.
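The decoding side can be sketched in the same spirit. The Python below assumes a made-up payload format (a length byte, then one byte per synthesis unit with an accent flag in the top bit); neither this format nor the names `decode_voice_data` and `generate_control` nor the pitch values come from the publication. The point is only that the device needs a table lookup and a few rules, not a word dictionary.

```python
from typing import List, Tuple

# Hypothetical table of synthesis units shared with the encoding side.
UNITS: List[str] = ["a", "i", "u", "e", "o", "ka", "ki", "ku",
                    "ke", "ko", "ha", "shi", "me"]


def decode_voice_data(payload: bytes) -> Tuple[List[int], List[str]]:
    """Split code data read from the bar code into prosodic information c
    (accent flags) and pronunciation information d (synthesis units)."""
    if not payload:
        raise ValueError("empty payload")
    count = payload[0]
    body = payload[1:1 + count]
    if len(body) != count:
        raise ValueError("truncated payload")
    accents = [byte >> 7 for byte in body]          # top bit: accent flag
    units = [UNITS[byte & 0x7F] for byte in body]   # low 7 bits: unit index
    return accents, units


def generate_control(accents: List[int],
                     units: List[str]) -> List[Tuple[str, int]]:
    """Control information e as (unit, pitch in Hz) pairs; the pitch
    values are placeholders for whatever the synthesis unit 6 expects."""
    return [(unit, 140 if accent else 110)
            for unit, accent in zip(units, accents)]


c, d = decode_voice_data(bytes.fromhex("02800c"))
print(generate_control(c, d))
```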

A further advantage is that the amount of information to be input is smaller than with the recording-editing method or the parameter-editing method.

Since the voice response device 2 becomes small, using a built-in storage element such as a ROM as the storage element 1 does not make the voice response device 2 large. An IC card may also be used as the storage element 1. Of course, if the type of storage element 1 changes, the configuration of the voice data writing means 9 also changes, as does the configuration of the code input means of the control information generation unit 5 in the voice response device 2.

[Effects of the Invention] As described above, the present invention comprises a text data input means for inputting text data, a text analysis means for analyzing the input text data and outputting voice data consisting of prosodic information and pronunciation information, and a means for writing the voice data into a storage element. Consequently, simply entering text allows the pronunciation information and prosodic information used for rule synthesis to be stored in the storage element, which is far more efficient than creating this information by hand as in the past. Moreover, unlike text synthesis, no text analysis means is needed in each synthesizer, so the voice output side can be made smaller and cheaper to manufacture, satisfying the requirements for use in, for example, learning materials.

[Brief Description of the Drawings]

FIG. 1 is a circuit configuration diagram of the present invention, FIG. 2 is a flowchart for explaining the operation of an embodiment of the present invention, FIG. 3 is a circuit configuration diagram of a voice response device used with the same, and FIGS. 4 and 5 are circuit configuration diagrams for explaining conventional examples. 1: storage element; 7: text data input means; 8: text analysis means; 9: voice data writing means. Agent: Patent Attorney Ishida Choshichi. December 28, 1987.

Claims

[Claims]

(1) A text data input means for inputting text data, a text analysis means for analyzing the input text data and outputting audio data consisting of prosody information and pronunciation information, and a means for writing the audio data into a storage element. A speech encoding device comprising:

(2) The speech encoding device according to claim 1, wherein a bar code is used as the storage element.

(3) The speech encoding device according to claim 1, characterized in that the storage element is an IC card.