JPH0675594A

JPH0675594A - Text voice conversion system

Info

Publication number: JPH0675594A
Application number: JP4227007A
Authority: JP
Inventors: Tadahiro Hoshino; 恭祐星野; Youko Mitsutsune; 陽子光恒
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1992-08-26
Filing date: 1992-08-26
Publication date: 1994-03-18

Abstract

PURPOSE: To provide a text voice conversion system that makes good use of each section of the text voice conversion device and supports many kinds of voice output methods. CONSTITUTION: The system has a plurality of voice output sections 56a to 56d. An output selection section 55, which selects a voice output section, is provided between a voice synthesis section 54 and the sections 56a to 56d. A label specifying the voice output section that is to output the analog voice signal corresponding to the text data is attached to the text data given to a text analysis section 52. The section 52, a synthesis parameter generating section 52, and the section 54 each perform their assigned processing on the part other than the label, re-attach the label, and pass the processed data to the next processing section. The output selection section 55 passes the input voice waveform data to the voice output section corresponding to the label attached to the data.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a text-to-speech conversion system that converts text data into an analog voice signal and outputs it.

[0002]

[Prior Art] A text-to-speech conversion system takes arbitrary text data (character codes) as input, adds accent, intonation, and the like, and outputs an analog voice signal corresponding to the input (for example, as audible sound).

[0003] FIG. 2 shows the hardware configuration of such a text-to-speech conversion system for Japanese sentences (mixed kanji and kana text), and FIG. 3 shows the functional configuration of its text-to-speech converter.

[0004] In FIG. 2, the text-to-speech conversion system comprises a text data generator (host device) 10, for example a personal computer, and a text-to-speech converter 20.

[0005] The text data generator 10 supplies, for example, text data recorded on a recording medium from a data output unit 11 to the text-to-speech converter 20.

[0006] In the text-to-speech converter 20, a data input unit 21 takes in the supplied text data and holds it temporarily. A control unit 22, consisting of, for example, a CPU, ROM, and RAM, uses the information stored in a dictionary storage unit 23 to convert the input text data, in software, into parameters for voice synthesis (hereinafter called synthesis parameters). A voice synthesis unit 24 converts the synthesis parameters thus obtained into voice waveform data and supplies the data to a voice output unit 25, which consists of a digital-to-analog converter, a speaker, and the like, and outputs an analog voice signal.

[0007] Functionally, as shown in FIG. 3, the text-to-speech converter 20 consists of a text analysis unit 31, a word dictionary 32, a synthesis parameter generation unit 33, a speech unit storage unit 34, a voice synthesis unit 35 (24), and a voice output unit 36 (25). That is, the control unit 22 described above functions as the text analysis unit 31 and the synthesis parameter generation unit 33, and the dictionary storage unit 23 described above functions as the word dictionary 32 and the speech unit storage unit 34.

[0008] In FIG. 3, the text analysis unit 31 uses the word dictionary 32 to generate a phoneme and prosody symbol string (intermediate language) from the input mixed kanji and kana sentence (text data). More specifically, it performs morphological analysis on the sentence while consulting the word dictionary 32 and divides the sentence into morphemes such as words. The word dictionary 32 stores, for each morpheme (word), its written form, reading, part of speech, accent position, information on connections with preceding and following morphemes, and so on. Based on this information, the text analysis unit 31 forms phrases by connecting independent words to one another or independent words to attached words, and forms the reading of the sentence by connecting phrases. In doing so it determines the pause positions, accent positions, accent strength, and the like within the sentence, and generates a phoneme and prosody symbol string (intermediate language) that describes the reading, accent, intonation, pauses, and other information of the input sentence as a character string.

[0009] Based on the phoneme and prosody symbol string, the synthesis parameter generation unit 33 retrieves speech unit (sound type) data from the speech unit storage unit 34, sets the duration (length) of each phoneme, determines the pitch pattern (voice pitch) from the accent information in the symbol string, determines the amplitude pattern (loudness) from the length of each phrase and the like, and generates synthesis parameters consisting of this information. A speech unit is the basic unit of speech that is concatenated to form a synthesized waveform; various units are prepared in advance according to the type of sound and are stored in the speech unit storage unit 34.

[0010] The voice synthesis unit 35 generates voice waveform data based on the synthesis parameters. The voice output unit 36 holds this voice waveform data temporarily in a built-in buffer memory, converts it to an analog voice signal by digital-to-analog conversion driven by a clock of a predetermined rate, and either emits it as sound from a built-in speaker or transmits it to another device through a transmission means.

[0011] In this way, an analog voice signal corresponding to the input mixed kanji and kana sentence (text data) can be obtained and output.

[0012]

[Problems to be Solved by the Invention] The processing speed of the voice output unit 25 (36) is set by the fact that the analog voice signal is meant to be heard by people. The control unit 22, which functions as the text analysis unit 31 and the synthesis parameter generation unit 33, and the voice synthesis unit 24 (35) are under no such constraint and are therefore fast. As a result, the control unit 22 and the voice synthesis unit 24 spend much of their time waiting, so their utilization is low and they are not used effectively. One conceivable remedy is to give the control unit 22 and the voice synthesis unit 24 buffer memories that store processed data one item after another, raising their utilization. However, because the voice output unit 25 is slow, writes to such a buffer memory would clearly outpace reads and the buffer memory would overflow, so this is hardly an effective countermeasure.
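The overflow argument can be made concrete with a small simulation. This is only a sketch: the capacity and the write and read rates are illustrative assumptions, not figures from this document. It counts how many time steps pass before a buffer overflows when the synthesis side writes faster than a single voice output unit reads.

```python
def steps_until_overflow(capacity, write_rate, read_rate):
    """Return the number of time steps before a buffer of `capacity` units overflows.

    Each step the synthesis side writes `write_rate` units, then the voice
    output unit reads at most `read_rate` units. Returns None when the
    reader keeps up and the buffer never overflows.
    """
    if write_rate > capacity:
        return 1          # a single write already exceeds the buffer
    if write_rate <= read_rate:
        return None       # everything written is drained in the same step
    occupancy = 0
    steps = 0
    while True:
        steps += 1
        occupancy += write_rate              # fast producer (synthesis side)
        if occupancy > capacity:
            return steps
        occupancy -= min(occupancy, read_rate)  # slow consumer (voice output)
```

With any write rate above the read rate the function returns a finite step count, which is the point made in the text: a buffer only delays the overflow, it does not prevent it.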

[0013] Conventional text-to-speech conversion systems also have a one-to-one relationship between the input path and the output path, and therefore cannot support a variety of voice output methods. For example, they cannot speak several documents at the same time, nor can they broadcast the voice signal for one document to several destinations. Achieving this would require several text-to-speech converters 20, which increases the scale of the system.

[0014] The present invention has been made in view of the above points and aims to provide a text-to-speech conversion system that uses each part of the text-to-speech converter without waste and that can support a variety of voice output methods.

[0015]

[Means for Solving the Problems] To solve these problems, the present invention configures as follows a text-to-speech conversion system comprising: a text analysis unit that generates a phoneme and prosody symbol string from input text data using a word dictionary; a synthesis parameter generation unit that converts the phoneme and prosody symbol string into voice synthesis parameters using a speech unit storage unit; a voice synthesis unit that converts the voice synthesis parameters into voice waveform data; and a voice output unit that converts the voice waveform data into an analog voice signal and outputs it.

[0016] That is, a plurality of voice output units are provided, and an output destination selection unit that selects a voice output unit is provided between the voice synthesis unit and the voice output units. In addition, the text data given to the text analysis unit carries a label specifying the voice output unit that is to output the analog voice signal corresponding to that text data. The text analysis unit, the synthesis parameter generation unit, and the voice synthesis unit each perform their assigned processing on the part other than the label, attach the label to the processed data, and pass the data to the next processing unit. The output destination selection unit passes the input voice waveform data to the voice output unit corresponding to the label attached to that data.
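The label pass-through described here can be sketched as a small pipeline. This is an illustrative sketch, not the patented implementation: the `Labeled` container, the `stage` wrapper, and the placeholder processing functions are all hypothetical names, and the string transformations merely stand in for real analysis and synthesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A payload travelling through the pipeline together with its routing labels."""
    labels: tuple    # identifiers of the target voice output units
    payload: object  # text, symbol string, parameters, or waveform data


def stage(process):
    """Wrap a processing function so it transforms only the payload and
    re-attaches the unchanged labels to its result."""
    def run(item):
        return Labeled(item.labels, process(item.payload))
    return run


# Placeholder stages (assumptions): real analysis and synthesis are far more involved.
analyze_text = stage(lambda text: f"symbols({text})")
generate_parameters = stage(lambda symbols: f"params({symbols})")
synthesize = stage(lambda params: f"waveform({params})")


def select_output(item, outputs):
    """Hand the payload, without its labels, to every output unit named by a label."""
    for label in item.labels:
        outputs[label].append(item.payload)


def run_pipeline(item, outputs):
    """Run one labeled item through all three stages and route the result."""
    select_output(synthesize(generate_parameters(analyze_text(item))), outputs)
```

Because every stage copies the labels through untouched, only the final selection step needs to know how labels map to output units.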

[0017] It is preferable that the output destination selection unit hold the designated voice output mode and pass the input voice waveform data to the voice output unit corresponding to the label attached to that data at a timing that depends on the designated voice output mode.

[0018]

[Operation] The present invention presupposes a text-to-speech conversion system in which the text analysis unit generates a phoneme and prosody symbol string from input text data using a word dictionary, the synthesis parameter generation unit converts the phoneme and prosody symbol string into voice synthesis parameters using the speech unit storage unit, the voice synthesis unit converts the voice synthesis parameters into voice waveform data, and the voice output unit converts the voice waveform data into an analog voice signal and outputs it.

[0019] The voice output unit performs the conversion to an analog voice signal using a clock whose rate takes human auditory characteristics into account, so it is slower than the text analysis unit, the synthesis parameter generation unit, and the voice synthesis unit. Put the other way round, those three units have processing capacity to spare.

[0020] Accordingly, in the present invention only the voice output unit is provided in plurality, so that the text analysis unit, the synthesis parameter generation unit, and the voice synthesis unit are used efficiently. Naturally, because there are several voice output units, an output destination selection unit that selects among them must be provided between the voice synthesis unit and the voice output units. Furthermore, to specify unambiguously which text data goes to which voice output unit, the text data given to the text analysis unit carries a label specifying the voice output unit that is to output the analog voice signal for that text data. The text analysis unit, the synthesis parameter generation unit, and the voice synthesis unit each perform their assigned processing on the part other than the label, attach the label to the processed data, and pass the data to the next processing unit. The output destination selection unit passes the input voice waveform data to the voice output unit corresponding to the label attached to that data.

[0021] It is also preferable to designate a voice output mode to the output destination selection unit newly provided in the present invention, and to have that unit pass the input voice waveform data to the voice output unit corresponding to the attached label at a timing that depends on the designated mode, so that the several voice output units can be made to output simultaneously or alternately.

[0022]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 shows the text-to-speech conversion system according to this embodiment.

[0023] In FIG. 1, the text-to-speech conversion system of this embodiment likewise comprises a text data generator (host device) 40, for example a personal computer, and a text-to-speech converter 50. FIG. 1 shows the functional configuration of the text data generator 40 and the hardware configuration of the text-to-speech converter 50. In the functional configuration of the text-to-speech converter 50, the control unit 52 and the dictionary storage unit 53 of FIG. 1 are divided, as in the conventional system, into a text analysis unit and a synthesis parameter generation unit, and into a word dictionary and a speech unit storage unit, respectively.

[0024] The text data generator 40 stores one or more document files (text data), recorded for example on a recording medium; here, four document files 41A to 41D are assumed to be stored.

[0025] The text data generator 40 also has a voice output mode setting unit 42. The voice output mode setting unit 42 is a functional entity which, in hardware terms, consists of a keyboard, CPU, memory, and the like. It takes in the voice output mode designated by the user together with the pairings of document files and labels (data identifying a voice output unit). As described later, this embodiment provides four voice output units 56a to 56d.

[0026] The following voice output modes are provided: a single-file single-output mode, in which the content of one document file is output from any one voice output unit; a single-file broadcast output mode, in which the content of one document file is output from several voice output units; a multiple-file simultaneous output mode, in which the contents of several document files are output at the same time from separate voice output units; a multiple-file alternating output mode, in which the contents of several document files are output alternately from separate voice output units; and a multiple-file sequential output mode, in which the contents of several document files are output one after another from any one voice output unit.
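The five modes can be summarized as an enumeration. The identifier names and the helper function below are hypothetical, chosen only to mirror the descriptions in the text.

```python
from enum import Enum


class OutputMode(Enum):
    """The five voice output modes; member names are illustrative."""
    SINGLE_FILE_SINGLE_OUTPUT = "one file, one output unit"
    SINGLE_FILE_BROADCAST = "one file, several output units"
    MULTI_FILE_SIMULTANEOUS = "several files, separate output units, at the same time"
    MULTI_FILE_ALTERNATING = "several files, separate output units, in turn"
    MULTI_FILE_SEQUENTIAL = "several files, one output unit, one after another"


def uses_multiple_outputs(mode):
    """Return True for modes that drive more than one voice output unit."""
    return mode in {
        OutputMode.SINGLE_FILE_BROADCAST,
        OutputMode.MULTI_FILE_SIMULTANEOUS,
        OutputMode.MULTI_FILE_ALTERNATING,
    }
```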

[0027] The text data generator 40 further includes a file reading and label adding unit 43. When output of text data is instructed, the unit 43 reads text data from the document files that correspond to the voice output mode set in the voice output mode setting unit 42 and prepends to that text data a label identifying the voice output unit associated with the document file. The unit of text data to which a label is attached may differ by voice output mode. For example, in the multiple-file simultaneous output mode, the unit 43 repeatedly reads a predetermined amount (for example, 512 bytes) from each of the designated files in turn and attaches a label to each such amount. In the multiple-file alternating output mode, the unit 43 reads, for example, one sentence at a time from the designated files alternately and attaches a label to each sentence.
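The chunk-wise labeling for the multiple-file simultaneous output mode might look like the sketch below. The label encoding (an escape byte followed by a two-character output id) and all function names are assumptions made for illustration; the document does not specify the label format. Only the 512-byte figure comes from the text.

```python
from itertools import zip_longest

CHUNK_SIZE = 512  # bytes per labeled unit; the text gives 512 bytes as an example


def make_label(output_id):
    """Hypothetical label encoding: an escape byte plus a two-character output id."""
    return b"\x1b" + output_id.encode("ascii")


def split_chunks(data, size=CHUNK_SIZE):
    """Cut `data` into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def label_round_robin(files, size=CHUNK_SIZE):
    """Yield labeled chunks, taking one chunk from each file in turn.

    `files` is a list of (output_id, data) pairs. Shorter files simply drop
    out of the rotation once they are exhausted.
    """
    per_file = [
        [make_label(out_id) + chunk for chunk in split_chunks(data, size)]
        for out_id, data in files
    ]
    for round_units in zip_longest(*per_file):
        for unit in round_units:
            if unit is not None:
                yield unit
```

A fixed byte count can split a multi-byte character across two chunks, so a practical version would need to cut on character boundaries. For the alternating mode the same rotation would apply with sentences in place of fixed-size chunks.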

[0028] The label is preferably chosen to be a character code sequence that is very unlikely to occur in actual mixed kanji and kana text. A piece of text data normally carries one label, but carries several in, for example, the single-file broadcast output mode.

[0029] The text data generator 40 also includes a data output unit 44. When the voice output mode and the pairings of document files and labels have been set in the voice output mode setting unit 42, the data output unit 44 sends the voice output mode and the number of document files to be output to the text-to-speech converter 50. When output of text data is instructed, it sends the text data, with the label prepended by the file reading and label adding unit 43, to the text-to-speech converter 50.

[0030] The text-to-speech converter 50 of this embodiment consists of a data input unit 51, a control unit 52 that functions as the text analysis unit and the synthesis parameter generation unit, a dictionary storage unit 53 that functions as the word dictionary and the speech unit storage unit, a voice synthesis unit 54, an output destination selection unit 55, and a plurality of voice output units 56a to 56d.

[0031] When the data input unit 51 receives the voice output mode and the number of document files to be output, it passes them to the output destination selection unit 55. When it receives labeled text data, it holds the data and leaves it to be processed by the control unit 52.

[0032] The control unit 52 consists of, for example, a CPU, ROM, and RAM. Using the information stored in the dictionary storage unit 53, it converts the input text data in software into synthesis parameters for voice synthesis, prepends to the resulting synthesis parameters the label that was attached to the input text data, and passes them to the voice synthesis unit 54. The conversion from text data to synthesis parameters consists of text analysis processing and synthesis parameter generation processing and is largely the same as in the conventional system, but differs somewhat because the input text data carries a label.

[0033] FIG. 4 is a flowchart outlining the text analysis processing of this embodiment. First, one character is taken from the input text data; after confirming that the series of text data has not ended, the unit determines whether the character is part of a label (steps 100 to 102). If it is part of a label, processing returns to step 100 and the next character is taken. If it is not label data, that is, if it is a character that makes up the document, text analysis processing is performed and processing returns to take the next character (step 103). Text analysis thus proceeds over the characters other than the label. When the whole series of text data has been processed, the label that was attached to the text data is prepended unchanged to the phoneme and prosody symbol string obtained by the analysis, and the overall text analysis processing ends (step 104).
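The flow of FIG. 4 can be sketched as a loop that sets label characters aside and analyzes everything else. The label format (a marker character followed by two id characters) and the per-character `analyze_char` callback are assumptions for illustration; real text analysis works on morphemes, not single characters.

```python
LABEL_START = "\x1b"  # hypothetical marker that opens a label
LABEL_LENGTH = 3      # marker plus two id characters (an assumption)


def analyze_labeled_text(text, analyze_char):
    """Skip label characters, analyze the rest, then put the original
    label(s) back in front of the result, following the flow of FIG. 4."""
    label_chars = []
    result = []
    remaining_label = 0
    for ch in text:                      # step 100: take the next character
        # Step 101 (end of data) is the loop running out of characters.
        if ch == LABEL_START:            # step 102: is this label data?
            remaining_label = LABEL_LENGTH
        if remaining_label > 0:
            label_chars.append(ch)       # label data: set aside, not analyzed
            remaining_label -= 1
            continue
        result.append(analyze_char(ch))  # step 103: text analysis
    # Step 104: prepend the unchanged label(s) to the analysis result.
    return "".join(label_chars) + "".join(result)
```

Passing `str.upper` as `analyze_char` is enough to see that the label survives untouched while the body is transformed.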

[0034] The synthesis parameter generation processing is not illustrated, but with respect to the label it is largely the same: the label that was attached to the input text data is also prepended to the synthesis parameters obtained by converting the phoneme and prosody symbol string.

[0035] Like the conventional voice synthesis unit, the voice synthesis unit 54 of this embodiment performs synthesis processing on the synthesis parameters to form voice waveform data. It differs from the conventional unit in that it prepends the label to the voice waveform data it forms.

[0036] Unlike the voice output units 56a to 56d, the control unit 52 and the voice synthesis unit 54 are not subject to a processing speed limit that reflects human auditory characteristics, so they process incoming data one item after another at the highest speed their hardware allows.

[0037] The output destination selection unit 55 contains the table shown in FIG. 5, which associates label values with the voice output units 56a to 56d. When it receives labeled voice waveform data, it passes the voice waveform data portion, with the label removed, to whichever of the voice output units 56a, 56b, 56c, and 56d is associated with the label. In doing so it outputs the voice waveform data portion to the voice output units 56a to 56d determined by the label, using an output method based on the voice output mode and the number of document files to be output.
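The table lookup can be sketched as a dictionary from label to output unit. The table contents are an assumption: the entries for labels 00 to 02 are inferred from the worked example later in the description (labels 01, 00, 02 reaching units 56b, 56a, 56c), and the entry for 03 is a guess. The function name `route` is hypothetical.

```python
# Assumed contents of the FIG. 5 table: label -> voice output unit.
# Entries 00 to 02 are inferred from the worked example; 03 is a guess.
LABEL_TO_OUTPUT = {"00": "56a", "01": "56b", "02": "56c", "03": "56d"}


def route(labels, waveform, table=LABEL_TO_OUTPUT):
    """Return (output_unit, waveform) pairs, one per label, with labels stripped."""
    targets = []
    for label in labels:
        if label not in table:
            raise KeyError(f"no voice output unit registered for label {label!r}")
        targets.append((table[label], waveform))
    return targets
```

A single label gives the single-output case, and several labels on one block give the broadcast case.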

[0038] For example, in the single-file single-output mode, in which the content of one document file is output from any one voice output unit, the output destination selection unit 55 simply passes the voice waveform data portion, with the label removed, to the voice output unit associated with the label. In the single-file broadcast output mode, in which the content of one document file is output from several voice output units, it passes the voice waveform data portion, with the labels removed, to all the voice output units associated with the attached labels. In the multiple-file simultaneous output mode, in which the contents of several document files are output at the same time from separate voice output units, it waits each time until the labels of the sequentially arriving labeled voice data have completed one cycle and then passes the voice waveform data portions, with the labels removed, to the respective voice output units simultaneously. In the multiple-file alternating output mode, in which the contents of several document files are output alternately from separate voice output units, it passes each voice waveform data portion, with the label removed, in turn to the voice output unit associated with its label. In the multiple-file sequential output mode, in which the contents of several document files are output one after another from a single voice output unit, it simply passes the voice waveform data portion, with the label removed, to the voice output unit associated with the label.
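The simultaneous mode is the one case where the selection unit must hold data back. A minimal sketch, assuming each round contains exactly one block per file and using hypothetical names throughout: blocks are collected until one has arrived for every file, then the whole group is released together.

```python
class SimultaneousDispatcher:
    """Hold labeled waveform blocks until one block per file has arrived,
    then release the whole group to the output units together."""

    def __init__(self, file_count, outputs):
        self.file_count = file_count  # number of files announced by the host
        self.outputs = outputs        # label -> list standing in for an output buffer
        self.pending = {}             # label -> waveform block of the current round

    def receive(self, label, waveform):
        """Accept one labeled block; return True when a round was released."""
        if label in self.pending:
            # Assumption: one block per label per round. Anything else is an error here.
            raise ValueError(f"label {label!r} arrived twice in one round")
        self.pending[label] = waveform
        if len(self.pending) < self.file_count:
            return False
        for pending_label, block in self.pending.items():
            self.outputs[pending_label].append(block)
        self.pending = {}
        return True
```

How the final round is handled when the files differ in length is not covered by this sketch and is not spelled out in the text.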

[0039] Each of the voice output units 56a, 56b, 56c, and 56d contains a buffer memory and a digital-to-analog converter. It receives the voice waveform data from the output destination selection unit 55 into the buffer memory, converts the stored data into an analog signal with the digital-to-analog converter under a clock of a predetermined rate, and either emits it as sound from a built-in speaker or the like (not shown) or transmits it to another device through a built-in communication means.

[0040] With the text-to-speech conversion system configured as above, a user who wants the content of one or more text files to be output from the desired voice output units sets the voice output mode, sets the correspondence between the text files to be output and the labels (voice output units), and starts the output.

[0041] The operation is described below on the assumption that the multiple-file simultaneous output mode is set as the voice output mode and that, as shown in FIG. 6, label 01 is set for document file 41A, label 00 for document file 41B, and label 02 for document file 41C.

【００４２】[0042] Once the mode and other settings have been made, the text data generating device 40 notifies the text-to-speech conversion device 50 that the voice output mode is the multiple-file simultaneous output mode and that the number of text files to be output is three. The labeled text data of document files 41A, 41B and 41C are then output in sequence to the text-to-speech conversion device 50.
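The order of this transfer (mode, then file count, then the labeled text data) can be sketched as a list of messages. The tuple layout, the tag strings and the function name are hypothetical; the patent does not describe a wire format.

```python
def build_transmission(mode, labeled_files):
    """Return the ordered messages that device 40 would send to device 50.

    labeled_files is a list of (label, text) pairs, one per document file.
    """
    # The mode and the file count go first so that the receiving side
    # knows how many distinct labels to expect.
    messages = [("MODE", mode), ("FILE_COUNT", len(labeled_files))]
    for label, text in labeled_files:
        messages.append(("TEXT", label, text))
    return messages


# Example settings from the embodiment: 41A -> 01, 41B -> 00, 41C -> 02.
example = build_transmission(
    "multiple_file_simultaneous",
    [("01", "contents of 41A"), ("00", "contents of 41B"), ("02", "contents of 41C")],
)
```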

【００４３】[0043] In the text-to-speech conversion device 50, the multiple-file simultaneous output mode and the text file count of three are given to the output destination selection unit 55. For the labeled text data that is then input, the control unit 52, using the dictionary storage unit 53, performs text analysis and synthesis parameter generation on the portion other than the label, and passes the resulting synthesis parameters to the voice synthesis unit 54 with the label attached. The voice synthesis unit 54 converts the synthesis parameter portion, excluding the label, into voice waveform data and gives it to the output destination selection unit 55 together with the label. On the basis of the previously supplied mode and text file count, the output destination selection unit 55 simultaneously gives only the voice waveform data portions of the input labeled voice waveform data to the voice output units 56b, 56a and 56c designated by the labels. The voice output units 56b, 56a and 56c convert the data into analog signals, so that the analog voice signals corresponding to the text data of document files 41A, 41B and 41C are output at the same time.
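The key point of this flow is that each stage processes only the payload and hands the label on unchanged, so that routing can be decided at the very end. The sketch below shows that pattern with placeholder stages: the function names are hypothetical and the "analysis" and "synthesis" bodies are trivial stand-ins, not real text-to-speech processing. The label-to-unit table is inferred from this paragraph (labels 01, 00, 02 reach units 56b, 56a, 56c).

```python
# Inferred from the embodiment: label 00 -> 56a, 01 -> 56b, 02 -> 56c.
LABEL_TO_UNIT = {"00": "56a", "01": "56b", "02": "56c"}


def analyze_and_parameterize(label, text):
    """Stand-in for control unit 52: process the text, keep the label."""
    params = [ord(ch) for ch in text]  # placeholder "synthesis parameters"
    return label, params


def synthesize(label, params):
    """Stand-in for voice synthesis unit 54: process params, keep the label."""
    waveform = bytes(p % 256 for p in params)  # placeholder "waveform"
    return label, waveform


def run_pipeline(labeled_texts):
    """Route each file's waveform to the unit named by its label."""
    routed = {}
    for label, text in labeled_texts:
        label, params = analyze_and_parameterize(label, text)
        label, waveform = synthesize(label, params)
        # Only here is the label consumed; the unit receives the bare waveform.
        routed[LABEL_TO_UNIT[label]] = waveform
    return routed
```

Because the shared stages never interpret the label, a single control unit and a single voice synthesis unit can serve any number of output units.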

【００４４】[0044] Thus, according to the above embodiment, depending on the voice output mode, the control unit 52 and the voice synthesis unit 54 operate so as to supply voice waveform data to a plurality of voice output units, so that the utilization of the control unit 52 and the voice synthesis unit 54 is, on average, higher than in the conventional system.

【００４５】[0045] In addition, operating a plurality of voice output units conventionally required providing a plurality of text-to-speech conversion devices or entire systems. According to this embodiment, the control unit 52, the dictionary storage unit 53 and the voice synthesis unit 54 can be shared, so the configuration for operating a plurality of voice output units is simpler than before.

【００４６】[0046] Furthermore, the above embodiment has a plurality of voice output units and an output destination selection unit that selects and controls the supply of voice waveform data to those voice output units according to the voice output mode, so that a variety of output methods can be carried out. That is, a wide range of uses can be provided.

【００４７】[0047] In the above embodiment the voice output mode is set in the text data generating device 40, but it may instead be set directly in the output destination selection unit 55.

【００４８】[0048] Although the above embodiment supports a variety of voice output methods, the system may instead support only the method of outputting the contents of a single document file from any one of the voice output units.

【００４９】[0049] Furthermore, the present invention is not limited to text-to-speech conversion systems for Japanese sentences; it can of course also be applied to systems that convert text data in other languages.

【００５０】[0050]

[Effects of the Invention]

As described above, according to the present invention, a plurality of voice output units are provided, and an output destination selection unit that selects a voice output unit is provided between the voice synthesis unit and the plurality of voice output units. The text data given to the text analysis unit carries a label specifying the voice output unit that is to output the analog voice signal corresponding to that text data. The text analysis unit, the synthesis parameter generation unit and the voice synthesis unit each perform their assigned processing on the portion other than the label, attach the label, and give the processed data to the next processing unit. The output destination selection unit gives the input voice waveform data to the voice output unit corresponding to the label attached to that data. As a result, a text-to-speech conversion system can be realized that uses each part of the text-to-speech conversion device without waste and that supports a variety of voice output methods.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the embodiment.

FIG. 2 is a block diagram showing a conventional hardware configuration.

FIG. 3 is a block diagram showing a conventional functional configuration.

FIG. 4 is a flowchart outlining the text analysis process of the embodiment.

FIG. 5 is an explanatory diagram showing the correspondence between labels and voice output units in the embodiment.

FIG. 6 is an explanatory diagram showing the labeled text data of the embodiment.

[Explanation of symbols]

40 ... text data generating device
42 ... voice output mode setting unit
43 ... file reading and label adding unit
44 ... data output unit
50 ... text-to-speech conversion device
51 ... data input unit
52 ... control unit (text analysis unit, synthesis parameter generation unit)
53 ... dictionary storage unit (word dictionary, speech unit storage unit)
54 ... voice synthesis unit
55 ... output destination selection unit
56a to 56d ... voice output units

Claims

[Claims]

1. A text-to-speech conversion system comprising: a text analysis unit that generates a phoneme and prosodic symbol string from input text data using a word dictionary; a synthesis parameter generation unit that converts the phoneme and prosodic symbol string into speech synthesis parameters using a speech unit storage unit; a voice synthesis unit that converts the speech synthesis parameters into voice waveform data; and a voice output unit that converts the voice waveform data into an analog voice signal and outputs the analog voice signal, characterized in that: a plurality of the voice output units are provided; an output destination selection unit that selects a voice output unit is provided between the voice synthesis unit and the plurality of voice output units; a label specifying the voice output unit that is to output the analog voice signal corresponding to the text data is attached to the text data given to the text analysis unit; the text analysis unit, the synthesis parameter generation unit and the voice synthesis unit each perform the processing assigned to them on the portion other than the label, attach the label, and give the processed data to the next processing unit; and the output destination selection unit gives the input voice waveform data to the voice output unit corresponding to the label attached to that data.

2. The text-to-speech conversion system according to claim 1, characterized in that the output destination selection unit holds the instructed voice output mode and gives the input voice waveform data to the voice output unit corresponding to the label attached to that data, at a timing that depends on the instructed voice output mode.