JPH09160583A

JPH09160583A - Method and device for processing voice information

Info

Publication number: JPH09160583A
Application number: JP7321644A
Authority: JP
Inventors: Keiichi Sakai; 桂一酒井; Tsuyoshi Yagisawa; 津義八木沢; Minoru Fujita; 稔藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-12-11
Filing date: 1995-12-11
Publication date: 1997-06-20

Abstract

PROBLEM TO BE SOLVED: To realize a variety of utterances by displaying the prosody data of an input character string, correcting the displayed data, and uttering the character string in accordance with the corrected prosody data. SOLUTION: Data are input from an input unit to a sentence input unit 101, a correction information label input unit 110, and a usage information label input unit 114. The prosody data held in a prosody data holding unit 106 are displayed on the display screen of a display unit by a prosody data display unit 107. To correct the prosody, the operator watches the display screen and inputs correction values by dragging the ○ marks with a mouse. When correction of the prosody data is complete, label information for the corrected data is input from the correction information label input unit 110, and the input correction information label is held in a correction information holding unit 112 together with the prosody data. A voice generation unit 117 performs voice generation processing such as prosody, parameter connection, and waveform connection with reference to the values held in a prosody data correction value holding unit 109.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a voice information processing method and apparatus, and more particularly to a voice information processing method and apparatus capable of editing prosody-related data.

[0002] More particularly, the present invention relates to a voice information processing method and apparatus that enable a sentence with the same written form to be uttered with a plurality of intonation patterns.

[0003]

2. Description of the Related Art In general, a speech synthesizer is divided into three parts: a speech language processing unit that analyzes an input sentence and generates a representation of information such as accents and pauses in the form of text and frames (hereinafter referred to as phonetic data); a prosody control unit that generates a pitch pattern with reference to the phonetic data received from the speech language processing unit; and a voice generation unit that, with reference to the pitch pattern received from the prosody control unit, performs voice generation processing such as prosody, parameter connection, and waveform connection, and outputs synthesized speech via a DA converter.

[0004] FIG. 4 shows an example of phonetic data for the input sentence "Kinō wa yoi tenki deshita." ("The weather was fine yesterday."), and FIG. 5 shows an example of the corresponding pitch pattern.

[0005] However, the conventional apparatus described above has the drawback that, for an input sentence with a given written form, it can generate only one synthesized utterance, namely the one with the highest likelihood.

[0006] That is, for a sentence with the same written form, the desired synthesized speech cannot be generated when, for example, a declarative (falling) intonation is to be forced into an interrogative (rising) tone, or a standard-Japanese intonation is to be made to sound like the Kansai dialect.

[0007]

SUMMARY OF THE INVENTION An object of the present invention is to eliminate the above drawbacks and to realize a variety of utterances by editing the prosody data generated by the prosody control unit and generating speech from the edited result. A further object is to realize reuse of the edited values by attaching a label to the edited result and saving and loading it.

[0008]

MEANS FOR SOLVING THE PROBLEMS To solve the above problems, the present invention displays the prosody data of an input character string, corrects the displayed prosody data, and utters the character string in accordance with the corrected prosody data.

[0009] To solve the above problems, the present invention performs speech language processing on the input character string to generate the prosody data to be displayed.

[0010] To solve the above problems, the present invention, when displaying the prosody data, displays the phonetic notation corresponding to the prosody data in association with the prosody data.

[0011] To solve the above problems, the present invention corrects the prosody data by moving a mark representing the prosody data on the display screen.

[0012] To solve the above problems, the present invention uses, as the character string, a character string specified within displayed text.

[0013] To solve the above problems, the present invention attaches a label to the corrected prosody data and registers it.

[0014] To solve the above problems, the present invention accepts input of a label, reads the prosody data corresponding to the input label, and utters in accordance with the read prosody data.

[0015] To solve the above problems, the present invention holds the corrected prosody data and, in response to an utterance instruction, performs utterance in accordance with the held prosody data.

[0016] To solve the above problems, the present invention controls display means for displaying the prosody data.

[0017] To solve the above problems, the present invention performs control so that the character string is uttered through a speaker.

[0018] To solve the above problems, the present invention processes correction data for the prosody data input from input means.

[0019]

BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0020] FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to an embodiment of the present invention. In the figure, reference numeral 1 denotes a CPU, which performs various kinds of control in the speech synthesizer. Reference numeral 2 denotes a ROM (read-only memory), which stores the control programs executed by the CPU 1 and parameters used in various kinds of processing. The ROM 2 also stores the control program that causes the CPU 1 to execute the processing described in the flowchart below. Reference numeral 3 denotes a RAM (random access memory), which provides an information storage area used as a work area when the CPU 1 executes various kinds of control. Reference numeral 4 denotes a voice output unit, which generates and outputs sound based on a synthesized speech signal.

[0021] Reference numeral 5 denotes an external memory, which stores, among other things, correction information for prosody data used in speech synthesis processing. Reference numeral 6 denotes an input unit such as a keyboard and a mouse, used to input sentences to be synthesized and correction information for prosody data. Reference numeral 7 denotes a display unit, composed of a CRT, a liquid crystal display, or the like, which displays prosody data and other information. Reference numeral 8 denotes a bus, which connects the above components and enables data to be exchanged among them.

[0022] FIG. 2 is a block diagram showing the functional configuration of the speech synthesizer of this embodiment.

[0023] First, the correspondence between each functional component shown in FIG. 2 and the component of FIG. 1 that implements it is described.

[0024] The sentence input unit 101, the correction information label input unit 110, and the usage information label input unit 114 receive their respective data through the input unit 6.

[0025] The input sentence holding unit 102, phonetic data holding unit 104, prosody data holding unit 106, prosody data correction value holding unit 109, correction information label holding unit 111, usage information label holding unit 115, and voice parameter holding unit 118 store their respective parameters in the RAM 3.

[0026] The correction information holding unit 112 stores the correction information parameters in the external memory 5.

[0027] The speech language processing unit 103, prosody control unit 105, prosody data correction unit 108, correction information holding execution unit 113, usage information reading execution unit 116, utterance execution unit 120, and voice generation unit 117 are executed by the CPU 1.

[0028] The prosody data display unit 107 displays prosody data on the display unit 7, and the DA converter 119 operates in the voice output unit 4.

[0029] In FIG. 2, reference numeral 101 denotes a sentence input unit through which a sentence to be synthesized is input from the input unit 6. Reference numeral 102 denotes an input sentence holding unit (buffer) that holds the sentence input from the sentence input unit 101 in the RAM 3.

[0030] Reference numeral 103 denotes a speech language processing unit that performs speech language processing on the input sentence and generates phonetic data such as phonetic notation, accent type, pause length, utterance speed, and volume. Reference numeral 104 denotes a phonetic data holding unit (buffer) that holds, in the RAM 3, the phonetic data generated by the speech language processing in the speech language processing unit 103. FIG. 4 shows an example of this phonetic data. The sentence "Kinō wa yoi tenki deshita." ("The weather was fine yesterday.") input through the sentence input unit 101 undergoes speech language processing in the speech language processing unit 103, which generates phonetic data for each of the three frames "kinō wa", "yoi", and "tenki deshita.". The phonetic data for each frame consist of various parameters: accent type, number of morae, connection information to the next frame, constituent parts of speech, pause length, utterance speed, pitch, volume, and so on.
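As a rough illustration of the per-frame phonetic data, the sketch below models one frame as a Python dataclass holding the parameters named in this paragraph. The class name, field names, units, and every value except the mora counts (which follow from the phonetic notation of each frame) are assumptions made for illustration; the actual contents of FIG. 4 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PhoneticFrame:
    """One frame of phonetic data (hypothetical layout, not taken from FIG. 4)."""
    notation: str            # phonetic notation of the frame (romanized here)
    accent_type: int         # accent type (placeholder values below)
    mora_count: int          # number of morae in the frame
    connection_to_next: str  # connection information to the next frame (assumed encoding)
    parts_of_speech: tuple   # constituent parts of speech
    pause_length_ms: int     # pause length (unit assumed to be milliseconds)
    speed: int               # utterance speed (assumed relative scale)
    pitch: int               # pitch (assumed relative scale)
    volume: int              # volume (assumed relative scale)


# The three frames of the example sentence. Mora counts: ki-no-o-ha = 4,
# yo-i = 2, te-n-ki-de-shi-ta = 6. All other values are placeholders.
frames = [
    PhoneticFrame("kinoha", 2, 4, "continue", ("noun", "particle"), 0, 5, 5, 5),
    PhoneticFrame("yoi", 1, 2, "continue", ("adjective",), 0, 5, 5, 5),
    PhoneticFrame("tenkideshita", 1, 6, "sentence_end",
                  ("noun", "auxiliary verb"), 500, 5, 5, 5),
]

total_morae = sum(frame.mora_count for frame in frames)
```

A list of such frames is what a prosody control stage would consume when it generates a pitch pattern.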

[0031] Reference numeral 105 denotes a prosody control unit that generates prosody data such as a pitch pattern with reference to the phonetic data held in the phonetic data holding unit 104, and 106 denotes a prosody data holding unit (buffer) that holds the prosody data obtained by the prosody control unit 105 in the RAM 3.

[0032] Reference numeral 107 denotes a prosody data display unit that displays the prosody data held in the prosody data holding unit 106 on the screen of the display unit 7. Reference numeral 108 denotes a prosody data correction unit through which corrections to the prosody data displayed by the prosody data display unit 107, such as the pitch pattern, are input, and 109 denotes a prosody data correction value holding unit (buffer) that holds the values input from the prosody data correction unit 108 in the RAM 3. Reference numeral 110 denotes a correction information label input unit through which a label for the values held in the prosody data correction value holding unit 109 is input, and 111 denotes a correction information label holding unit (buffer) that holds the label input from the correction information label input unit 110 in the RAM 3. Reference numeral 113 denotes a correction information holding execution unit that stores the values held in the prosody data correction value holding unit 109, in association with the label held in the correction information label holding unit 111, in the correction information holding unit 112 of the external memory 5. The correction information holding execution unit 113 is activated by an execution instruction input from the input unit 6.

[0033] The prosody data held in the prosody data holding unit 106 are displayed on the display screen of the display unit 7 by the prosody data display unit 107, as shown in FIG. 5. The display aligns the phonetic notation "kinōha yoi tenkideshita" with the prosody so that the relationship between the two can be read off, and marks the points at which the prosody can be corrected with "○" so that the correction operation is easy to perform.

[0034] To correct the prosody, the operator watches this display screen and inputs correction values by dragging the ○ marks with the mouse. For example, to change the standard utterance of "Kinō wa yoi tenki deshita." shown in FIG. 5 into a Kansai-style utterance, it suffices to give each of the frames "kinōha", "yoi", and "tenkideshita" a rising final intonation. The operator therefore drags the ○ corresponding to "ha" of "kinōha" upward with the mouse, and similarly raises the ○ corresponding to "i" of "yoi" and the ○ corresponding to "tenkideshita". To bring the result still closer to the Kansai style, the prosody of other parts is also corrected, completing the intonation shown in FIG. 6.
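The drag operation amounts to changing individual points of the pitch pattern. The following minimal sketch assumes the pitch pattern is a list with one value per mora and that the ○ for "tenkideshita" is its last point; the function name, the pitch values, and the size of the shift are all hypothetical and are not taken from FIG. 5 or FIG. 6.

```python
def raise_frame_endings(pitch, frame_end_indices, delta):
    """Return a copy of the pitch pattern with each frame-final point raised by delta."""
    edited = list(pitch)  # leave the generated pattern untouched
    for index in frame_end_indices:
        edited[index] += delta
    return edited


# Hypothetical pitch values, one per mora:
# ki-no-o-ha | yo-i | te-n-ki-de-shi-ta
standard = [120, 150, 150, 130, 140, 120, 110, 130, 130, 120, 110, 100]
frame_end_indices = [3, 5, 11]  # "ha", "i", and the final "ta"

kansai_like = raise_frame_endings(standard, frame_end_indices, 40)
```

Returning a new list rather than editing in place mirrors the separation between the generated prosody data and the correction values, which the apparatus keeps in different buffers.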

[0035] To arrive at the desired intonation, the operator can instruct utterance from the input unit 6 even in the middle of the correction work. The utterance is then performed with the prosody currently shown on the display screen, and the operator proceeds with the corrections while listening to it. The correction values input by dragging the ○ marks with the mouse are held in the prosody data correction value holding unit 109 as they are made. Consequently, when an utterance instruction is input, voice data are generated with reference to the prosody data held in the prosody data correction value holding unit 109, and the utterance has the intonation corresponding to the prosody data on the display screen.

[0036] When correction of the prosody data is complete, a label such as "Kansai style" is input for the corrected data from the correction information label input unit 110, and the input correction information label is held in the correction information holding unit 112 together with the prosody data. The correction information label serves as an identifier when prosody data are read from the correction information holding unit 112. In addition, when prosody data are displayed on the display unit 7 as in FIG. 6, the label is displayed together with them, so that the operator can see that the data are corrected data and under what label they are held.

[0037] Reference numeral 114 denotes a usage information label input unit through which the label of the correction information to be used is input, and 115 denotes a usage information label holding unit (buffer) that holds the label input from the usage information label input unit 114 in the RAM 3. Reference numeral 116 denotes a usage information reading execution unit that reads the correction information corresponding to the label held in the usage information label holding unit 115 from the correction information holding unit 112 and stores it in the prosody data correction value holding unit 109. The usage information reading execution unit 116 is activated by an execution instruction input from the input unit 6.

[0038] Reference numeral 117 denotes a voice generation unit that performs voice generation processing such as prosody, parameter connection, and waveform connection with reference to the values held in the prosody data correction value holding unit 109, and 118 denotes a voice parameter holding unit (buffer) that holds the parameters of the synthesized speech generated by the voice generation unit 117 in the RAM 3. Reference numeral 119 denotes a DA converter that converts the voice parameters held in the voice parameter holding unit 118 into synthesized speech. Reference numeral 120 denotes an utterance execution unit that controls the start of voice generation in the voice generation unit 117.

[0039] Next, the operation of the apparatus is described. FIG. 3 is a flowchart showing the operating procedure of the speech synthesizer of this embodiment.

[0040] In FIG. 3, steps S201 to S207 constitute the input-waiting state. This state repeats until one of the following inputs is made: sentence input, prosody data correction, correction information label input, correction information holding execution, usage information label input, usage information reading execution, or utterance execution. The processing corresponding to the input is then started.
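The input-waiting state can be pictured as a dispatch table from the kind of input to the step that handles it. The sketch below is only an illustration: the event names are hypothetical, while the branch targets are the step numbers given in the description of FIG. 3 (S201 to S208, S202 to S211, S203 to S214, S204 to S215, S205 to S218, S206 to S219, S207 to S222).

```python
# Maps each kind of input checked in the waiting loop (S201 to S207) to the
# step it branches to. Event names are hypothetical; step labels are from FIG. 3.
DISPATCH = {
    "sentence_input": "S208",
    "prosody_correction": "S211",
    "correction_label_input": "S214",
    "store_correction": "S215",
    "usage_label_input": "S218",
    "load_correction": "S219",
    "utter": "S222",
}


def next_step(event):
    """Return the step that handles the event, or S201 to keep waiting."""
    return DISPATCH.get(event, "S201")
```

An unrecognized or absent input simply leaves the apparatus in the waiting state, which matches the loop over steps S201 to S207.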

[0041] When a sentence to be synthesized is input to the sentence input unit 101, the input sentence is held in the input sentence holding unit 102, and processing proceeds from step S201 to step S208. In step S208, the speech language processing unit 103 performs speech language processing on the input sentence held in the input sentence holding unit 102 and stores the resulting phonetic data in the phonetic data holding unit 104. In step S209, the prosody control unit 105 generates prosody data such as a pitch pattern with reference to the phonetic data held in the phonetic data holding unit 104, stores the prosody data in the prosody data holding unit 106, and processing moves to step S210. In step S210, the values of the prosody data held in the prosody data holding unit 106, such as the pitch pattern, are displayed by the prosody data display unit 107. Processing then returns to step S201 and the input-waiting state.

[0042] When a correction value for the prosody data is input from the prosody data correction unit 108, processing proceeds from step S202 to step S211. If values are held in the prosody data holding unit 106 in step S211, processing proceeds to step S212, where the correction value input from the prosody data correction unit 108 is stored in the prosody data correction value holding unit 109. If no prosody data are held in step S211, processing proceeds to step S213, where an error message such as "No sentence has been input. Please input a sentence." is output (displayed or spoken), and processing returns to step S201.

[0043] When a label is input from the correction information label input unit 110, processing moves from step S203 to step S214. In step S214, the label input from the correction information label input unit 110 is stored in the correction information label holding unit 111, and processing returns to step S201.

[0044] When an instruction to store the correction information is input from the correction information holding execution unit 113, processing moves from step S204 to step S215. In step S215, it is checked whether a correction value and a label are held in the prosody data correction value holding unit 109 and the correction information label holding unit 111, respectively. If both are held, processing moves to step S216, where the correction value held in the prosody data correction value holding unit 109 is stored in the correction information holding unit 112 together with the label held in the correction information label holding unit 111, and processing returns to step S201. If at least one of the correction value and the label is not held in step S215, processing proceeds to step S217, where an error message such as "No correction value or label has been input. Please input one." is output, and processing moves to step S201.

[0045] When the label of the information to be used is input from the usage information label input unit 114, processing moves from step S205 to step S218. In step S218, the label input from the usage information label input unit 114 is stored in the usage information label holding unit 115, and processing moves to step S201.

[0046] When an instruction to read the pattern to be used is input to the usage information reading execution unit 116, processing moves from step S206 to step S219. In step S219, it is determined whether a label to be used is held in the usage information label holding unit 115; if so, processing moves to step S220. In step S220, the correction information corresponding to the label held in the usage information label holding unit 115 is read from the correction information holding unit 112 and stored in the prosody data correction value holding unit 109. Processing then returns to step S201. If no label is held in step S219, processing proceeds to step S221, where an error message such as "No label has been input. Please input one." is output, and processing moves to step S201.
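The store and load paths (steps S215 to S217 and S219 to S221) can be sketched as a small class in which a dictionary stands in for the correction information holding unit 112. The class, attribute, and method names are hypothetical, and the error strings are shortened paraphrases of the messages quoted in the text. The description does not say what happens when the label to be used has never been stored, so that case is left to raise `KeyError` here.

```python
class CorrectionStore:
    """Hypothetical stand-in for holding units 109, 111, 112, and 115."""

    def __init__(self):
        self.saved = {}          # label -> correction values (unit 112)
        self.correction = None   # current correction values (unit 109)
        self.save_label = None   # label for storing (unit 111)
        self.use_label = None    # label for loading (unit 115)

    def store(self):
        """Steps S215 to S217: store the correction values under the label."""
        if self.correction is None or self.save_label is None:
            return "No correction value or label has been input."
        self.saved[self.save_label] = list(self.correction)
        return None  # no error

    def load(self):
        """Steps S219 to S221: load the correction values for the label."""
        if self.use_label is None:
            return "No label has been input."
        # An unknown label raises KeyError (behaviour not specified in the text).
        self.correction = list(self.saved[self.use_label])
        return None  # no error
```

For example, after setting `correction` and `save_label = "Kansai style"`, calling `store()` and later `load()` with `use_label = "Kansai style"` restores the same values.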

[0047] When an utterance instruction is input from the utterance execution unit 120, processing moves from step S207 to step S222. In step S222, it is determined whether prosody data are held in the prosody data holding unit 106; if so, processing proceeds to step S224.

[0048] In step S224, the voice generation unit 117 performs voice generation processing such as prosody, parameter connection, and waveform connection with reference to the prosody data held in the prosody data holding unit 106 and the correction values held in the prosody data correction value holding unit 109, and stores the resulting voice parameters in the voice parameter holding unit 118. Finally, in step S225, the DA converter 119 converts the voice parameters held in the voice parameter holding unit 118 into synthesized speech and outputs it, and processing returns to step S201.

[0049] If, on the other hand, no prosody data are held in the prosody data holding unit 106 in step S222, processing proceeds to step S223, where an error message such as "No sentence has been input. Please input a sentence." is output, and processing returns to step S201. The error messages in steps S213, S217, S221, and S223 are either displayed as text on the display unit 7 or output as voice messages from the voice output unit 4.

[0050] With the control procedure described above, desired corrections can be applied to the prosody data obtained by subjecting the input sentence to speech synthesis processing and prosody control processing. Prosody data such as the pitch pattern can therefore be changed directly. Examples of corrected prosody data are given below.

[0051] FIG. 6 shows an example of a pitch pattern obtained by correcting the standard pitch pattern of FIG. 5 for the input sentence "Kinō wa yoi tenki deshita." ("The weather was fine yesterday.") to sound like the Kansai dialect. FIG. 7 shows an example of the standard (interrogative) pitch pattern for the input sentence "Sō desu ka" ("Is that so"), and FIGS. 8 to 10 show examples of pitch patterns corrected to a convinced tone, a doubtful tone, and a back-channel (acknowledging) tone, respectively, each with its label attached.

[0052] As described above, by editing the prosody data output by the prosody control unit and generating speech from the edited values, a variety of utterances can be realized even for input sentences with the same written form. In addition, by attaching a label to the edited values as a pattern and saving and loading them, the edited values can be reused.

[0053] The above description deals with the case in which the prosody data are always corrected, but the apparatus may also be arranged so that synthesized speech can be uttered without correction. In that case, any of the following methods may be used: the values held in the prosody data holding unit 106 are designated and used directly to utter the synthesized speech; the values are stored in the prosody data correction value holding unit 109 at the same time as they are stored in the prosody data holding unit 106, and those values are used to utter the synthesized speech; or it is first determined whether values exist in the prosody data correction value holding unit 109, and if they do not, the voice generation unit 117 uses the values held in the prosody data holding unit 106 to utter the synthesized speech.
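The third alternative above, falling back to the generated prosody data when no correction values exist, can be sketched in a few lines. The function and parameter names are hypothetical, and `None` is assumed to represent an empty prosody data correction value holding unit 109.

```python
def prosody_for_utterance(generated, corrected):
    """Return the corrected prosody data if present, otherwise the generated data."""
    if corrected is not None:
        return corrected  # values from holding unit 109
    return generated      # values from holding unit 106
```

The second alternative would instead copy the generated values into the correction buffer at generation time, so that the voice generation unit always reads from a single place.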

[0054] In the above description, the prosody data correction value holding unit 109 holds the complete prosody data for the corrected sentence, but it may instead hold only the differences from the values held in the prosody data holding unit 106.
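A minimal sketch of the difference-only variant follows, assuming that the generated and corrected prosody data are equal-length lists of numeric pitch values. The function names and the sparse index-to-offset representation are assumptions; the text does not specify how the differences are encoded.

```python
def make_diff(generated, corrected):
    """Keep only the points whose value changed, as {index: offset}."""
    return {
        index: new - old
        for index, (old, new) in enumerate(zip(generated, corrected))
        if new != old
    }


def apply_diff(generated, diff):
    """Rebuild the corrected prosody data from the generated data and the diff."""
    return [value + diff.get(index, 0) for index, value in enumerate(generated)]


# Hypothetical values: only the last point was dragged upward.
generated = [120, 150, 130]
corrected = [120, 150, 170]
diff = make_diff(generated, corrected)
restored = apply_diff(generated, diff)
```

Storing offsets rather than absolute values keeps the buffer small when only a few points have been moved.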

[0055] In the above description, only one set of correction values for the prosody data is held in the buffer at a time, but a plurality of sets may be held in the buffer, and the correction values used for speech synthesis may be switched in response to a switching instruction from the input unit 6.

【００５６】また、上述の説明では、入力文保持部１０
２に文が保持されると直ちに音声言語処理が行われるよ
うに説明しているが、音声言語処理の実行の指示を入力
部６により行ない、その指示に応じて音声言語処理を開
始するようにしても良い。In the above description, the input sentence holding unit 10
Although it is described that the speech language processing is performed immediately after the sentence is held in 2, the instruction to execute the speech language processing is given by the input unit 6, and the speech language processing is started according to the instruction. May be.

[0057] In the above description, prosody control processing is performed as soon as a sentence is held in the phonetic data holding unit 104, but an instruction to execute prosody control processing may instead be given through the input unit 6, with the processing started in response to that instruction.

[0058] In the above description, all input is performed manually, but the output of another device, such as a natural language generation device or a natural language dialogue device, may be used directly as input.

[0059] In the above description, the sentence whose prosody is to be corrected is input from the sentence input unit 101, but sentence input is not limited to this. A desired sentence or character string in the text displayed on an ordinary text input screen may be specified with the keyboard or mouse, and the specified sentence or character string may then be captured by the sentence input unit 101.

[0060] In addition, not only the standard intonation but also multiple types of intonation can be created for sentences with identical text, labeled, and held. Therefore, at the speech language processing stage between text input and utterance, a sentence ending in "?" can, for example, be judged to be a question, and prosody data bearing a label that identifies question intonation can be read from the correction information holding unit 112. The sentence is then uttered automatically with question intonation, without the intonation having to be corrected each time the sentence is uttered.
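
The automatic selection described here can be sketched as follows, assuming that labeled prosody data is kept in a dictionary keyed by the pair (sentence text, label). The label names "question" and "standard", the store layout, and the function names are assumptions made for this sketch and do not come from the description.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the correction information holding unit 112:
# (sentence text, label) -> prosody values registered under that label.
ProsodyStore = Dict[Tuple[str, str], List[float]]


def choose_label(sentence: str) -> str:
    """Judge the sentence to be a question when it ends in '?' (half- or full-width)."""
    if sentence.rstrip().endswith(("?", "？")):
        return "question"
    return "standard"


def read_labeled_prosody(store: ProsodyStore, sentence: str) -> Optional[List[float]]:
    """Read the prosody data registered for this sentence under the chosen label.

    Returns None when nothing has been registered for that combination.
    """
    return store.get((sentence, choose_label(sentence)))
```

A caller that receives None would presumably fall back to the prosody generated by ordinary prosody control processing.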

[0061]

[Effects of the Invention] As described above, according to the present invention, the prosody data of an input character string is displayed, the displayed prosody data is modified, and the character string is uttered in accordance with the modified prosody data, so that each operator can set a desired prosody and have it uttered.

[0062] As described above, according to the present invention, the input character string is subjected to speech language processing to generate the prosody data to be displayed, so that any character string the operator desires can be handled.

[0063] As described above, according to the present invention, when the prosody data is displayed, the phonetic notation corresponding to the prosody data is displayed in association with it, so that the operator can see which prosody data should be modified and the modification operation becomes easier to understand.

[0064] As described above, according to the present invention, the prosody data is modified by moving a mark representing the prosody data on the display screen, so that even a beginner can easily input correction values for the prosody data.

[0065] As described above, according to the present invention, the character string is one specified within displayed text, so that modification can be performed on the text currently being processed without separately starting another routine for entering correction data.

[0066] As described above, according to the present invention, a label is attached to the modified prosody data and the data is registered, so that the modified prosody data can be reused.

[0067] As described above, according to the present invention, a label is input, the prosody data corresponding to the input label is read, and utterance is performed in accordance with the read prosody data, so that desired prosody data can be retrieved quickly.

[0068] As described above, according to the present invention, the modified prosody data is held and utterance in accordance with the held prosody data is executed in response to an utterance instruction, so that the utterance can be checked even during the modification process.

[0069] As described above, according to the present invention, display means for displaying the prosody data is controlled, so that the prosody data can be checked on the display means.

[0070] As described above, according to the present invention, control is performed so that the character string is uttered through a speaker, so that the utterance of the character string can be checked.

[0071] As described above, according to the present invention, correction data for the prosody data input from input means is processed, so that the operator can enter correction data personally.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of the speech synthesizer.

FIG. 2 is a block diagram showing the functional configuration of the speech synthesizer.

FIG. 3 is a flowchart showing the operation procedure of the speech synthesis processing.

FIG. 4 is a diagram showing an example of phonetic data for the input sentence "昨日は良い天気でした。" ("The weather was fine yesterday.").

FIG. 5 is a diagram showing an example of prosody data for the input sentence "昨日は良い天気でした。" ("The weather was fine yesterday.").

FIG. 6 is a diagram showing an example in which the prosody data for the input sentence of FIG. 5 is modified to sound like the Kansai dialect.

FIG. 7 is a diagram showing an example of prosody data for the input sentence "そうですか。" ("Is that so.").

FIG. 8 is a diagram showing an example in which the prosody data for the input sentence of FIG. 7 is modified to a tone of acceptance.

FIG. 9 is a diagram showing an example in which the prosody data for the input sentence of FIG. 7 is modified to a tone of doubt.

FIG. 10 is a diagram showing an example in which the prosody data for the input sentence of FIG. 7 is modified to a back-channel (acknowledging) tone.

Claims

[Claims]

1. A voice information processing method comprising: displaying prosody data of an input character string, modifying the displayed prosody data, and uttering a character string in accordance with the modified prosody data.

2. The voice information processing method according to claim 1, wherein the input character string is subjected to speech language processing to generate the prosody data to be displayed.

3. The voice information processing method according to claim 1, wherein when the prosody data is displayed, a phonetic notation corresponding to the prosody data is displayed in association with the prosody data.

4. The voice information processing method according to claim 1, wherein the modification of the prosody data is performed by moving a mark representing the prosody data on a display screen.

5. The voice information processing method according to claim 1, wherein the character string is a character string specified in a displayed sentence.

6. The voice information processing method according to claim 1, wherein a label is added to the modified prosody data and registered.

7. The voice information processing method according to claim 1, wherein a label is input, prosody data corresponding to the input label is read, and a voice is uttered according to the read prosody data.

8. The voice information processing method according to claim 1, wherein the modified prosody data is held, and utterance is executed in accordance with the held prosody data in response to an utterance instruction.

9. The voice information processing method according to claim 1, further comprising controlling display means for displaying the prosody data.

10. The voice information processing method according to claim 1, wherein control is performed so that the character string is uttered through a speaker.

11. The voice information processing method according to claim 1, wherein correction data for the prosody data input from input means is processed.

12. A voice information processing apparatus comprising: input means for inputting a character string; display means for displaying prosody data of the input character string; prosody modification means for modifying the displayed prosody data; and utterance control means for controlling utterance of the character string in accordance with the modified prosody data.

13. The voice information processing apparatus according to claim 12, further comprising voice language processing means for performing voice language processing on the input character string to generate the prosody data to be displayed.

14. The voice information processing apparatus according to claim 12, wherein the display unit displays the phonetic notation corresponding to the prosody data in association with the prosody data.

15. The voice information processing apparatus according to claim 12, wherein the prosody modification unit performs the prosody modification according to movement of a mark representing the prosody data designated on the display screen.

16. The voice information processing apparatus according to claim 12, wherein the input unit inputs a character string specified in a displayed sentence.

17. The voice information processing apparatus according to claim 12, further comprising registration means for attaching a label input by the input means to the modified prosody data and registering the data.

18. The voice information processing apparatus according to claim 12, further comprising reading means for reading the prosody data corresponding to a label input by the input means, wherein the utterance control means performs utterance in accordance with the read prosody data.

19. The voice information processing apparatus according to claim 12, further comprising holding means for holding the modified prosody data and utterance instruction means for instructing utterance, wherein the utterance control means executes utterance in accordance with the prosody data held by the holding means in response to an instruction from the utterance instruction means.

20. The voice information processing apparatus according to claim 12, further comprising a speaker that outputs a voice generated by control of the utterance control means.