JPH05313686A

JPH05313686A - Display controller

Info

Publication number: JPH05313686A
Application number: JP4109357A
Authority: JP
Inventors: Keiko Sakuragi; 恵子桜木; Masanobu Sakaguchi; 正信坂口; Shigeko Asano; 薫子浅野; Fumitaka Kawate; 史隆川手
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-04-02
Filing date: 1992-04-02
Publication date: 1993-11-26

Abstract

PURPOSE: To output arbitrary speech in synchronization with an image. CONSTITUTION: A voice synthesis part 12 calculates parameters for synthesizing speech corresponding to text supplied from an application program 14 through a control part 11. When the mouse is operated to click a speak button, the control part 11 outputs a start instruction to the voice synthesis part 12 and an animation display part 13. The voice synthesis part 12 then outputs the parameters for synthesizing the speech corresponding to the text to a voice synthesizer, which synthesizes the speech according to those parameters. At the same time, the animation display part 13 outputs an animation showing the mouth movements for uttering the text in sequence.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a display controller suitable for application to, for example, a CAI (computer-assisted instruction) system for learning English pronunciation.

[0002]

2. Description of the Related Art
In a conventional CAI system, an audio signal and an image signal recorded in advance, for example on a hard disk, are read out, and an image corresponding to the audio (or audio corresponding to the image) can be output in synchronization with a predetermined synchronization signal.

[0003] Thus, if English spoken by a native speaker is recorded in advance on the hard disk as the audio signal, and an animation of the mouth movements made when that English is spoken is recorded in advance on the hard disk as the image signal, the user can learn both the pronunciation of the English and the accompanying mouth movements at the same time.

[0004]

Problems to Be Solved by the Invention
As described above, a conventional CAI system requires the audio and image signals to be recorded on the hard disk in advance. It can therefore output only the audio and images corresponding to a finite number of recorded signals (recorded, for example, word by word), which limits the range of material the user can study.

[0005] One remedy is to keep adding audio and image signals for further English words to the hard disk. With this method, however, every time a word whose audio and image signals have not been recorded is encountered, that is, while the user is studying, the lesson must be interrupted so that those signals can be recorded, which dampens the learner's interest.

[0006] The present invention was made in view of this situation, and makes it possible to output arbitrary speech in synchronization with an image.

[0007]

Means for Solving the Problems
The display controller according to claim 1 comprises: a voice synthesizer 1 and a voice synthesis unit 12, serving as voice synthesis means for synthesizing speech from text and outputting it; a CRT 3 and an animation display unit 13, serving as display means for displaying a moving image, such as an animation, corresponding to the text; and a control unit 11, serving as synchronizing means for synchronizing the speech synthesized by the voice synthesizer 1 and the voice synthesis unit 12 with the animation displayed by the CRT 3 and the animation display unit 13.

[0008] This display controller can cause the CRT 3 and the animation display unit 13 to display an animation whose mouth moves in accordance with the text.

[0009]

Operation
In the display controller according to claim 1, speech synthesized from text and an animation that moves in accordance with the text are output in synchronization. Arbitrary speech can therefore be output in synchronization with an animation.

[0010] When the CRT 3 and the animation display unit 13 can display an animation whose mouth moves in accordance with the text, applying the device to, for example, a CAI system attracts the learner's interest and improves the learning effect.

[0011]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 is a block diagram showing the configuration of one embodiment of a CAI system to which the display controller of the present invention is applied. A terminal 2 is connected to a voice synthesizer 1 via an RS-232C cable 7 and consists of a CRT 3, a workstation (EWS) 4 running, for example, UNIX, a keyboard 5, and a mouse 6.

[0012] The EWS 4 comprises a CPU that controls the entire apparatus, a ROM that stores system programs and the like, and a RAM that stores data needed for the operation of the apparatus (none of which are shown). Installed on it are, for example, the UNIX operating system and, as the control program for the windows displayed (opened) on the CRT 3, the X Window System.

[0013] The CRT 3 is controlled by the X Window System installed on the EWS 4 and, in response to signals output from the EWS 4, displays for example an animation or text (a character string) entered from the keyboard 5. The keyboard 5 and the mouse 6 are connected to the EWS 4 and are operated to create text or to issue commands to the EWS 4.

[0014] The voice synthesizer 1 synthesizes speech from the speech feature parameters supplied by the EWS 4 over the RS-232C cable 7 and outputs it from a built-in speaker (not shown).

[0015] The CAI system shown in FIG. 1 is for learning English. It is an environment-providing, communication-type (two-way-initiative) system in which learners can think actively through free trial and error, and it has:
○ a simulation function
○ a search function
○ a user-friendly interface
Making full use of multimedia, it can provide learning that is not confined to a fixed scope, as well as a variety of learning methods.

[0016] To create the atmosphere of an enjoyable school, this CAI system is made up of five subsystems called "STUDY", "POSTOFFICE", "LIBRARY", "PLAYROOM", and "LABORATORY".

[0017] The STUDY subsystem is a basic learning room in which the learner studies the grammatical fundamentals of English from a textbook, works to understand the content, and does practice exercises to consolidate the knowledge.

[0018] The POSTOFFICE subsystem simulates writing and sending letters in English, to help the learner acquire practical English expressions and letter formats.

[0019] The LIBRARY subsystem is a support tool that holds the words used in the system (in every subsystem) as a database, which the other subsystems can freely search and consult. Because it also provides pictures and sounds, it lets the learner grasp the image of a word through sight and hearing, without going through Japanese.

[0020] The PLAYROOM subsystem is intended to keep the learner from losing interest in studying English: while playing intellectual games, the learner picks up English without realizing it.

[0021] The LABORATORY subsystem is for experimenting with conversation in English. By combining images and sound, it creates a situation (a virtual situation) in which the learner seems to be actually talking with a person, and lets the learner experience English conversation within it.

[0022] As shown in FIG. 2, the LABORATORY subsystem consists of an application 14 together with a control unit 11, a voice synthesis unit 12, and an animation display unit 13, which form the user interface that creates the situation (virtual situation) of actually talking with a person.

[0023] The control unit 11 controls the voice synthesis unit 12 and the animation display unit 13, synchronizing the speech synthesized by the voice synthesis unit 12 (voice synthesizer 1) with the mouth movements of the animation displayed on the CRT 3 by the animation display unit 13. The control unit 11 also counts the number of characters, the number of commas (,), and the number of periods (.) in the text supplied by the application 14, calculates from them the display time of the animation on the CRT 3 (the time during which the animation's mouth moves), and outputs that time to the animation display unit 13. The voice synthesis unit 12 calculates, from the text (character string) supplied by the application 14 via the control unit 11, the parameters for synthesizing speech, and outputs them to the voice synthesizer 1. Under the control of the control unit 11, the animation display unit 13 reads the cell frame data for moving the animation (the animation's mouth) from a cell frame data storage unit 13a (FIG. 6) and outputs it to the CRT 3. The cell frame data storage unit 13a stores the cell frame data (FIG. 8) used to move the animation's mouth.

[0024] The application 14 is the application of the LABORATORY subsystem. As shown in FIG. 3, it consists of a dialogue processing unit 21, a language processing unit 22, a knowledge database search unit 23, and a knowledge database 24. The dialogue processing unit 21 stores (holds) scripts for various situations and the dialogue history, and manages the flow of the dialogue based on them.

[0025] The language processing unit 22 consists of an analysis unit and a generation unit (neither shown). The analysis unit parses and semantically analyzes the English sentence entered from the keyboard 5, converts it into an internal representation, and supplies that to the knowledge database search unit 23. The generation unit generates an English sentence from the answer returned by the knowledge database search unit 23, following instructions (control) from the dialogue processing unit 21, and outputs it to the control unit 11 (FIG. 2).

[0026] The knowledge database search unit 23 searches the database stored in the knowledge database 24 for a reply to the internal representation supplied by the analysis unit of the language processing unit 22, and outputs the reply to the generation unit of the language processing unit 22. The knowledge database 24 stores reply patterns to questions and the like in the form of internal representations.
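The analysis, search, and generation flow of the application 14 can be illustrated with a toy pipeline. This is a deliberately simplified sketch under stated assumptions: the stop-word "analysis", the tuple-of-content-words internal representation, the exact-match `KNOWLEDGE_BASE`, and the single declarative template are all hypothetical stand-ins, since the description does not specify the parser, the internal representation, or the database format.

```python
import re
from typing import Optional, Tuple

# Hypothetical knowledge base: internal representation -> (subject phrase, answer).
KNOWLEDGE_BASE = {
    ("fastest", "train", "japan"): ("The fastest train in Japan", "Shinkansen"),
}

STOP_WORDS = {"what", "is", "the", "in", "a", "an"}


def analyze(sentence: str) -> Tuple[str, ...]:
    """Toy analysis unit: reduce a question to a tuple of content words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return tuple(w for w in words if w not in STOP_WORDS)


def search(representation: Tuple[str, ...]) -> Optional[Tuple[str, str]]:
    """Toy knowledge database search unit: exact-match lookup."""
    return KNOWLEDGE_BASE.get(representation)


def generate(reply: Optional[Tuple[str, str]]) -> str:
    """Toy generation unit: fill a fixed declarative template."""
    if reply is None:
        return "I'm sorry, I don't know."
    subject, answer = reply
    return f"{subject} is {answer}."


def respond(question: str) -> str:
    return generate(search(analyze(question)))


print(respond("What is the fastest train in Japan?"))
# The fastest train in Japan is Shinkansen.
```

A real system would replace each stage with proper syntactic and semantic analysis; the point here is only the hand-off order between the three units.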

[0027] Next, the operation will be described. When the keyboard 5 or the mouse 6 of the CAI system shown in FIG. 1 is operated to start the application 14 shown in FIG. 3, the CRT 3 displays, as shown in FIG. 4, an animation 32 representing the teacher and an animation 31 representing the user of the CAI system, that is, the learner. When the keyboard 5 is then used to enter text 33 as a question ("What is the fastest train in Japan?"), the text 33 is displayed in a window on the CRT 3 controlled by the X Window System and is also supplied to the language processing unit 22 of the application 14 (FIG. 3).

[0028] The analysis unit of the language processing unit 22 parses and semantically analyzes the text 33 entered from the keyboard 5, converts it into an internal representation, and supplies it to the knowledge database search unit 23. The knowledge database search unit 23 searches the database stored in the knowledge database 24 for a reply to that internal representation and outputs the reply to the generation unit of the language processing unit 22. The generation unit generates text 34 ("The fastest train in Japan is Shinkansen.", FIG. 4) from the answer returned by the knowledge database search unit 23, following instructions (control) from the dialogue processing unit 21, and outputs it to the control unit 11 (FIG. 2).

[0029] The control unit 11 counts the number of characters, the number of commas (,), and the number of periods (.) in the text 34 supplied by the application 14, calculates the display time of the teacher animation 32 on the CRT 3 (the time during which the mouth of animation 32 moves), and supplies it to the animation display unit 13.

[0030] The animation display unit 13 reads from the cell frame data storage unit 13a the cell frame data of animation 32 corresponding to the mouth movements for uttering text 34, the reply to text 33 supplied by the application 14 via the control unit 11.

[0031] Meanwhile, the voice synthesis unit 12 calculates the parameters for synthesizing the speech corresponding to text 34, the reply to text 33 supplied by the application 14 via the control unit 11.

[0032] When the mouse 6 is then used to click the speak button 35, the control unit 11 outputs a start command to the voice synthesis unit 12 and the animation display unit 13.

[0033] The voice synthesis unit 12 outputs the parameters for synthesizing the speech corresponding to text 34 to the voice synthesizer 1, which synthesizes the speech "The fastest train in Japan is Shinkansen." from those parameters and outputs it from its built-in speaker (FIG. 5).

[0034] At the same time, the animation display unit 13 outputs to the CRT 3, one after another, the cell frame data of animation 32 read from the cell frame data storage unit 13a for the mouth movements of uttering text 34, spread over the display time of animation 32 (the time during which its mouth moves) supplied by the control unit 11 (FIG. 6). The CRT 3 thus displays animation 32 moving its mouth as if it were speaking text 34.

[0035] At this time, text 34 (FIG. 4) is displayed in a window on the CRT 3 controlled by the X Window System.

[0036] In this way, speech synthesized from text and an animation that moves in accordance with the text are output in synchronization.

[0037] Next, the operation will be described further with reference to the flowchart of FIG. 7. First, in step S1, it is determined whether text has been entered from the keyboard 5. If not, the process returns to step S1. If text has been entered, the process proceeds to step S2, where the cell frame data of the animation for moving the mouth in accordance with the text is read from the cell frame data storage unit 13a, and the time for outputting that cell frame data to the CRT 3, that is, the time during which the animation's mouth moves, is calculated. The process then proceeds to step S3.

[0038] In step S3, the parameters for synthesizing speech from the text entered in step S1 are calculated, and the process proceeds to step S4. In step S4, it is determined whether a synchronization signal (start command) has been detected; this signal synchronizes the output timing of the speech synthesized from the parameters calculated in step S3 with the output timing of the cell frame data read from the cell frame data storage unit 13a in step S2. If not, the process returns to step S4. If the synchronization signal has been detected, the process proceeds to step S5: the speech synthesized from the parameters calculated in step S3 is output from the speaker built into the voice synthesizer 1, and at the same time the CRT 3 displays the animation moving its mouth as if uttering the text entered in step S1, for the mouth-movement time calculated in step S2.
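Steps S2 to S5 of FIG. 7 can be sketched with two threads that finish their preparation, block on a shared event standing in for the synchronization signal (start command), and are released together. This is an illustrative sketch only: splitting the text into words as "synthesis parameters" and taking its letters as "cell frames" are placeholders, `run_session` and `wait_for_speak` are hypothetical names, and in this sketch the S2 and S3 preparations run concurrently rather than strictly in flowchart order.

```python
import threading
from typing import Callable, List, Tuple


def run_session(text: str, wait_for_speak: Callable[[], None]) -> List[Tuple[str, str]]:
    """One pass through steps S2-S5 of FIG. 7 for already-entered text (S1)."""
    start = threading.Event()   # stands in for the synchronization signal (start command)
    lock = threading.Lock()
    outputs: List[Tuple[str, str]] = []

    def voice_worker() -> None:
        params = text.split()                   # S3 placeholder for synthesis parameters
        start.wait()                            # S4: block until the start command
        with lock:
            outputs.append(("voice", " ".join(params)))          # S5: speech output

    def animation_worker() -> None:
        frames = [c for c in text.lower() if c.isalpha()]  # S2 placeholder for cell frames
        start.wait()                            # S4: block until the start command
        with lock:
            outputs.append(("animation", "".join(frames)))       # S5: frame output

    workers = [threading.Thread(target=voice_worker),
               threading.Thread(target=animation_worker)]
    for w in workers:
        w.start()
    wait_for_speak()   # e.g. block until the speak button 35 is clicked
    start.set()        # release both outputs at the same moment
    for w in workers:
        w.join()
    return sorted(outputs)


print(run_session("Yes, I do.", wait_for_speak=lambda: None))
```

Using a single event for both workers mirrors the idea that one start command triggers both the voice synthesis unit 12 and the animation display unit 13.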

[0039] Next, the method of calculating the animation display time (the time during which the animation's mouth moves) will be described. First, let T be the number of times the CRT 3 draws a single cell frame so that the cell frames stored in the cell frame data storage unit 13a (FIG. 8) appear as a natural animation. T is set in the EWS 4 in advance based on the display speed of the CRT 3 and the size of the image data.

[0040] Let N be the number of cell frames displayed on the CRT 3, so that one pass through the cell frames takes (T × N) draws. Letting M be the number of characters in the text, the number of passes TIMES that can be displayed while the synthesized speech is being output is given by

TIMES = M / a

where a takes a variable value depending on whether the text contains commas (,) or periods (.) and, if it does, how many. Like T, the possible values of a are set in the EWS 4 in advance.
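The formula can be evaluated directly. The sketch below computes TIMES = M / a and the resulting total number of draws, TIMES × T × N; the function names and the sample values (M = 40, a = 8, N = 4, T = 3) are hypothetical. Returning a float follows the formula as written; an actual implementation might round TIMES to a whole number of passes.

```python
def display_passes(m_chars: int, a: float) -> float:
    """TIMES = M / a: passes through the N cell frames during the speech."""
    if a <= 0:
        raise ValueError("a must be positive")
    return m_chars / a


def total_draws(times: float, n_frames: int, t_draws: int) -> float:
    """Each pass draws each of the N cell frames T times: T x N draws per pass."""
    return times * t_draws * n_frames


times = display_passes(40, 8.0)
print(times, total_draws(times, n_frames=4, t_draws=3))  # 5.0 60.0
```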

[0041] FIG. 9 shows the relationship between the number of characters in a text and its utterance time. Although there is some variation depending on the words used in the text and how they connect with the words around them, the measurements fall within an approximately rectangular region, as shown. (According to these statistics, the utterance time per character is about 0.05 seconds.)

[0042] The reason a takes a variable value is that when a text contains a period or a comma, as in "No. Japan is not a large country." or "Yes, I do.", the speaker takes a breath and pauses immediately after it, so the overall utterance time becomes slightly longer.
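Selecting a from preset values according to punctuation can be sketched as a table lookup. The table values below are invented for illustration; the description states only that the possible values are preset in the EWS 4. It is also an assumption that the sentence-final period is ignored, since it is not followed by further speech. A smaller a yields more passes under TIMES = M / a, i.e. a longer animation.

```python
# Hypothetical preset values of `a` (characters per pass), indexed by the
# number of internal pauses. Smaller `a` gives more passes for the same M,
# i.e. a longer animation, matching the longer utterance described above.
A_BY_PAUSES = {0: 4.0, 1: 3.6, 2: 3.3}
A_MANY_PAUSES = 3.0  # used for three or more pauses


def internal_pauses(text: str) -> int:
    """Count commas and periods that are followed by more speech.

    Assumption: the sentence-final period adds no pause, so it is ignored.
    """
    body = text.rstrip().rstrip(".?!")
    return body.count(",") + body.count(".")


def select_a(text: str) -> float:
    return A_BY_PAUSES.get(internal_pauses(text), A_MANY_PAUSES)


for s in ["No Japan is not a large country.", "No. Japan is not a large country."]:
    print(len(s), select_a(s), round(len(s) / select_a(s), 2))
```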

[0043] FIG. 10 shows the results of measuring the utterance time of texts spoken by the voice synthesizer 1. FIG. 10(a) is the waveform of the text "No. Japan is not a large country.", FIG. 10(b) that of "No Japan is not a large country.", and FIG. 10(c) that of "Now Japan is not a large country." Comparing the three shows that the text containing a period, "No. Japan is not a large country.", has the longest utterance time.

[0044] With the method described above, synchronization between speech synthesized from text and the animation that moves in accordance with that text can be achieved within audiovisually acceptable limits.

[0045] Furthermore, if the voice synthesizer 1 returns an end signal to the EWS 4 when it finishes outputting speech, and the EWS 4 ends the display of the moving image when it detects this end signal, the speech synthesized from the text and the animation that moves in accordance with the text can be synchronized even more closely.
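Ending the animation on the synthesizer's end signal can be sketched with an event that the animation loop polls between frames. In this hypothetical sketch, a `threading.Timer` stands in for the end signal arriving from the voice synthesizer 1, and the final closed-mouth rest frame is an assumption; `Event.wait` with a timeout lets the loop sleep for one frame period but wake as soon as the signal arrives.

```python
import threading
from typing import Callable, Sequence


def animate_until_done(frames: Sequence[str], done: threading.Event,
                       frame_period_s: float, show: Callable[[str], None],
                       rest_frame: str = "closed") -> int:
    """Cycle through `frames` until `done` is set, then show a rest frame.

    Returns the number of animated frames shown (excluding the rest frame).
    """
    shown = 0
    while not done.is_set():
        show(frames[shown % len(frames)])
        shown += 1
        done.wait(frame_period_s)  # one frame period, but wake early on the end signal
    show(rest_frame)               # hypothetical closed-mouth frame once speech ends
    return shown


done = threading.Event()
threading.Timer(0.05, done.set).start()   # stand-in for the synthesizer's end signal
log = []
n = animate_until_done(["half", "open"], done, 0.01, log.append)
print(n, log[-1])
```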

[0046] Alternatively, a word dictionary (FIG. 11) defining the utterance time of each word can be recorded in advance, for example, on a hard disk (not shown) built into the EWS 4. Then, when a text such as "My uncle lives in a small town." is given, the utterance times of its constituent words "my", "uncle", "live", "in", "a", "small", and "town" are obtained from the word dictionary as 0.148, 0.298, 0.275, 0.103, 0.056, 0.319, and 0.231 seconds, respectively, and the animation is displayed only for their total time (1.928 seconds).
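The dictionary lookup can be sketched as a sum over the words of the text. Note that the seven per-word values listed above add up to 1.430 s, whereas the stated total is 1.928 s; the description does not explain the difference, and the sketch simply sums whatever the dictionary holds. The crude "lives" to "live" normalization and the 0.05 s-per-character fallback for unknown words are hypothetical additions.

```python
import re

# Per-word utterance times (seconds) as listed in the example above.
WORD_TIMES = {
    "my": 0.148, "uncle": 0.298, "live": 0.275, "in": 0.103,
    "a": 0.056, "small": 0.319, "town": 0.231,
}
FALLBACK_S_PER_CHAR = 0.05  # assumption: estimate for words not in the dictionary


def lookup_form(word: str) -> str:
    """Hypothetical normalization: map e.g. 'lives' to the dictionary form 'live'."""
    if word not in WORD_TIMES and word.endswith("s") and word[:-1] in WORD_TIMES:
        return word[:-1]
    return word


def utterance_time_s(text: str) -> float:
    """Sum the dictionary utterance times of the words in `text`."""
    total = 0.0
    for raw in re.findall(r"[a-z]+", text.lower()):
        word = lookup_form(raw)
        total += WORD_TIMES.get(word, len(word) * FALLBACK_S_PER_CHAR)
    return total


print(round(utterance_time_s("My uncle lives in a small town."), 3))  # 1.43
```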

[0047] Furthermore, if a correspondence table (FIG. 12) between the cell frames representing the mouth movements shown in FIG. 8 and English phonetic symbols is recorded in advance on the hard disk built into the EWS 4, the animation can be generated by converting the text into phonetic symbols (FIG. 13) and obtaining the cell frames to be displayed from the correspondence table.
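The text-to-phonetic-symbol-to-cell-frame conversion can be sketched with two lookup tables. Both tables here are hypothetical stand-ins: the lexicon uses ARPAbet-style symbols rather than the phonetic symbols of FIG. 13, and the frame names do not reproduce FIG. 8 or FIG. 12. Words missing from the lexicon raise `KeyError`; a real system would need a letter-to-sound fallback.

```python
import re
from typing import Dict, List

# Hypothetical pronunciation lexicon using ARPAbet-style symbols.
LEXICON: Dict[str, List[str]] = {
    "yes": ["Y", "EH", "S"],
    "i": ["AY"],
    "do": ["D", "UW"],
}

# Hypothetical phoneme -> cell-frame table standing in for FIG. 12.
FRAME_FOR_PHONEME: Dict[str, str] = {
    "Y": "spread", "EH": "mid_open", "S": "teeth",
    "AY": "wide_open", "D": "teeth", "UW": "rounded",
}


def to_phonemes(text: str) -> List[str]:
    """Convert text to phonetic symbols by dictionary lookup (cf. FIG. 13)."""
    phonemes: List[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        phonemes.extend(LEXICON[word])   # KeyError for out-of-lexicon words
    return phonemes


def frames_for_text(text: str) -> List[str]:
    """Map each phonetic symbol to the cell frame that depicts it."""
    return [FRAME_FOR_PHONEME[p] for p in to_phonemes(text)]


print(frames_for_text("Yes, I do."))
```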

[0048] As described above, speech synthesized from text is output as though the animation moving in accordance with the text were speaking it, which attracts the learner's interest and improves the learning effect.

[0049]

Effects of the Invention
According to the display controller of claim 1, speech synthesized from text and a moving image displayed in accordance with the text are output in synchronization. Arbitrary speech can therefore be output in synchronization with a moving image.

[0050] According to the display controller of claim 2, the display means displays an animation whose mouth moves in accordance with the text. Applying it to a learning device such as a CAI system therefore attracts the learner's interest and improves the learning effect.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of a CAI system to which the display controller of the present invention is applied.

FIG. 2 is a diagram showing the user interface of the LABORATORY subsystem of the CAI system of FIG. 1.

FIG. 3 is a more detailed block diagram of the application 14 of FIG. 2.

FIG. 4 is a diagram showing a screen displayed on the CRT 3 of the CAI system of FIG. 1.

FIG. 5 is a diagram for explaining the input and output of the voice synthesis unit 12 of FIG. 2.

FIG. 6 is a diagram for explaining the input and output of the animation display unit 13 of FIG. 2.

FIG. 7 is a flowchart for explaining the operation of the interface of FIG. 2.

FIG. 8 is a diagram showing the cell frame data stored in the cell frame data storage unit 13a of the animation display unit 13.

FIG. 9 is a diagram showing the results of measuring the relationship between the number of characters in a text and the utterance time of the text.

FIG. 10 is a diagram showing the utterance times of texts with and without commas or periods.

FIG. 11 is a diagram showing a word dictionary that associates words with their utterance times.

FIG. 12 is a diagram showing a correspondence table between the cell frame data of FIG. 8 and phonetic symbols.

FIG. 13 is a diagram showing text converted into phonetic symbols.

[Explanation of Reference Numerals]
1 voice synthesizer
2 terminal
3 CRT
4 workstation (EWS)
5 keyboard
6 mouse
11 control unit
12 voice synthesis unit
13 animation display unit
13a cell frame data storage unit
14 application
21 dialogue processing unit
22 language processing unit
23 knowledge database search unit
24 knowledge database
31, 32 animation
33, 34 text
35 speak button

Continued from the front page: (72) Inventor: Fumitaka Kawate (川手史隆), c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo

Claims

[Claims]

1. A display controller comprising: voice synthesis means for synthesizing speech from text and outputting it; display means for displaying a moving image corresponding to the text; and synchronizing means for synchronizing the speech synthesized by the voice synthesis means with the moving image displayed by the display means.

2. The display controller according to claim 1, wherein the display means displays an animation whose mouth moves in accordance with the text.