JPH0934483A

JPH0934483A - Method and device for voice output

Info

Publication number: JPH0934483A
Application number: JP7180129A
Authority: JP
Inventors: Toshiyuki Noguchi; 利之野口; Yasunori Ohora; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-07-17
Filing date: 1995-07-17
Publication date: 1997-02-07
Anticipated expiration: 2015-07-17
Also published as: JP3585190B2

Abstract

PROBLEM TO BE SOLVED: To make the volume of voice output easy to adjust by providing a voice output means that outputs a voice at a volume corresponding to the area of a part of the image elements in a displayed composite image. SOLUTION: An image database 4 stores a plurality of image elements, and a synthesizing means combines the image elements read out of the database 4. A display means 3 displays the resulting composite image, and an input means 1 receives an instruction to change the area of a part of the image elements in the displayed image. A changing means changes the area based on the instruction, and a speaker 8, serving as the voice output means, outputs a voice at a volume corresponding to the area of that part of the image elements. Because the area of a part of the image elements is changed according to the instruction and the voice is output at a volume corresponding to the changed area, the volume of the output voice corresponds to the area of the image; volume control is therefore easy for the user to understand and easy to perform.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to an audio output device and method that control the volume of voice output using a graphical user interface.

[0002]

[Prior art] Conventionally, when a computer outputs voice, the volume is adjusted with a mechanical volume switch, a volume adjustment command entered from a keyboard, a slide bar displayed on the screen, or the like.

[0003]

[Problems to be solved by the invention] However, providing a mechanical volume switch raises the cost of the device. With a volume adjustment command, the operator must memorize the command. A slide bar is also not intuitive for the operator, and its operation is particularly difficult for children to understand.

[0004] Accordingly, an object of the present invention is to solve these problems and to provide an audio output device and method with which the volume of voice output can be adjusted easily.

[0005]

[Means for solving the problems] To achieve this object, the invention according to claim 1 comprises: an image database that stores a plurality of image elements; means for combining the image elements read from the image database; means for displaying the composite image obtained by the combining; input means for inputting an instruction to change the area of a part of the image elements in the displayed image; changing means for changing the area based on the instruction; and voice output means for outputting a voice at a volume associated with the area of that part of the image elements.

[0006] In the invention according to claim 2, the image database stores, for the part of the image elements, a first form indicating that the volume is zero and a second form indicating that the volume is maximum, and the changing means displays, based on the instruction input from the input means, the first form, the second form, or an intermediate form obtained by interpolating between the first and second forms.

[0007] In the invention according to claim 3, the image database stores a plurality of forms of the part of the image elements, and the changing means selects, from the plurality of forms, a form different from the one currently displayed and displays it.

[0008] In the invention according to claim 4, the composite image is a face image and the part of the image elements is a mouth image.

[0009] The invention according to claim 5 further comprises text holding means for storing text data and conversion means for converting the text read from the text holding means into voice, and the voice output means outputs the voice converted by the conversion means.

[0010] The invention according to claim 6 comprises: a combining step of combining a plurality of image elements read from an image database in which the plurality of image elements are stored in advance; a display step of displaying the composite image obtained by the combining; an input step of inputting an instruction to change the area of a part of the image elements in the displayed image; a changing step of changing the area based on the instruction; and an output step of outputting a voice at a volume associated with the area of that part of the image elements.

[0011] In the invention according to claim 7, the image database stores, for the part of the image elements, a first form indicating that the volume is zero and a second form indicating that the volume is maximum, and the changing step displays, based on the instruction input in the input step, one of the first form, the second form, and an intermediate form obtained by interpolating between the first and second forms.

[0012] In the invention according to claim 8, the image database stores a plurality of forms of the part of the image elements in advance, and the changing step selects, from the plurality of forms, a form different from the one currently displayed and displays it.

[0013] In the invention according to claim 9, the composite image is a face image and the part of the image elements is a mouth image.

[0014] In the invention according to claim 10, text data is stored in advance in text holding means, the method further comprises a conversion step of converting the text read from the text holding means into voice, and the output step outputs the voice converted in the conversion step.

[0015] According to the invention of claim 1 or claim 6, a plurality of image elements read from the image database are combined and displayed, an instruction to change the area of a part of the image elements in the displayed image is input, the area of that part of the image elements is changed based on the instruction, and a voice is output at a volume associated with the changed area.

[0016] According to the invention of claim 2 or claim 7, the image database stores, for a part of the image elements, a first form indicating that the volume is zero and a second form indicating that the volume is maximum. Based on the input instruction, one of the first form, the second form, and an intermediate form obtained by interpolating between the first and second forms is displayed, and a voice is output at a volume associated with the area of the displayed part of the image.
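The association between area and volume can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes that a form is an outline polygon given as a list of (x, y) points, and that volume varies linearly between the area of the first form (volume zero) and the area of the second form (maximum volume). The names `polygon_area`, `volume_for_area`, and `max_volume` are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon given as [(x, y), ...], by the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def volume_for_area(area, area_zero, area_max, max_volume=1.0):
    """Map an area to a volume; a linear mapping is an assumption of this sketch.

    area_zero: area of the first form (volume zero)
    area_max:  area of the second form (maximum volume)
    """
    if area_max <= area_zero:
        raise ValueError("area_max must be larger than area_zero")
    ratio = (area - area_zero) / (area_max - area_zero)
    ratio = max(0.0, min(1.0, ratio))  # clamp to the stored extremes
    return ratio * max_volume
```

Any monotonic mapping would satisfy "a volume associated with the area"; the linear one is chosen here only because it is the simplest to read.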

[0017] According to the invention of claim 3 or claim 8, the image database stores a plurality of forms of a part of the image elements in advance. Based on the input instruction, a form different from the one currently displayed is selected and displayed, and a voice is output at a volume associated with the area of the displayed part of the image.
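Selecting among pre-stored forms might look like the sketch below. It assumes that the stored forms are ordered from smallest to largest area and that the instruction has already been reduced to an "openness" value between 0 and 1; both the ordering and the `select_form` name are assumptions made for illustration.

```python
def select_form(forms, openness):
    """Pick the stored form closest to the requested openness (0.0 to 1.0).

    forms must be ordered from the smallest to the largest area.
    """
    if not forms:
        raise ValueError("at least one stored form is required")
    openness = max(0.0, min(1.0, openness))  # ignore out-of-range requests
    index = round(openness * (len(forms) - 1))
    return forms[index]
```

For example, with four stored forms, an openness of 0.0 returns the first form and an openness of 1.0 returns the last.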

[0018] According to the invention of claim 4 or claim 9, a face image is displayed as the composite image, the area of the mouth image within it is changed, and a voice is output at a volume associated with the changed area of the mouth image.

[0019] According to the invention of claim 5 or claim 10, the text read from the text holding means is converted into voice and output at a volume associated with the changed area.

[0020]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0021] FIG. 1 is a block diagram showing the configuration of an audio output device according to an embodiment of the present invention. In the figure, reference numeral 1 denotes input means through which the user inputs commands and data; it consists of, for example, a keyboard and a pointing device such as a mouse. Reference numeral 2 denotes display control means that controls the display of images in response to input from the input means 1, and 3 denotes display means that displays the images output from the display control means 2. Reference numeral 4 denotes an image database that stores the elements of images, such as a face, to be displayed on the display means.

[0022] FIG. 2 shows an example of the image elements stored in the image database 4. The image database 4 has image element files for "eyes", "nose", "mouth", "outline", and "hair", which are the constituent elements of an image of a human face. As shown in FIG. 2, each image element file stores the form of a default image of the constituent element and the forms of modified images obtained by deforming the default image. The default image and the modified images are scalable and can be enlarged or reduced based on, for example, input from the input means 1.

[0023] The display control means 2 retrieves the required images from the image element files, resizes them as necessary, combines them, and displays the result on the display means 3. The display control means 2 can also form an image intermediate between a plurality of forms by interpolating between the plurality of forms of one image element stored in one image file.
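The interpolation between forms can be sketched as follows. The specification does not state how a form is represented or which interpolation is used; this sketch assumes that each form is an outline with the same number of (x, y) control points and that corresponding points are blended linearly. The name `interpolate_forms` and the parameter `t` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_forms(form_a: List[Point], form_b: List[Point], t: float) -> List[Point]:
    """Blend two outlines point by point; t=0 yields form_a, t=1 yields form_b."""
    if len(form_a) != len(form_b):
        raise ValueError("both forms must have the same number of points")
    t = max(0.0, min(1.0, t))  # stay between the two stored forms
    return [
        ((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
        for (ax, ay), (bx, by) in zip(form_a, form_b)
    ]
```

With this representation only the two extreme forms need to be stored; every intermediate mouth shape is computed on demand.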

[0024] The text holding means 5 stores the input text to be subjected to voice synthesis. When a voice synthesis start command is input from the input means 1, the central control means 10 reads the text from the text holding means 5 and transfers it to the voice synthesizing means 6. The voice synthesizing means 6 then synthesizes voice based on the transferred text.

[0025] The central control means 10 controls the volume control means 7 in accordance with the volume instruction input from the input means 1. The volume control means 7 amplifies the voice input from the voice synthesizing means 6 under the control of the central control means 10 and outputs it to the speaker 8.
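In software, the role of the volume control means 7 could be approximated as below. This is only a sketch: it assumes the synthesized voice is available as signed 16-bit PCM samples, which the specification does not state, and `apply_gain` is a hypothetical name.

```python
def apply_gain(samples, gain):
    """Scale signed 16-bit PCM samples by gain, clipping to the valid range."""
    amplified = []
    for sample in samples:
        value = int(round(sample * gain))
        amplified.append(max(-32768, min(32767, value)))  # avoid wrap-around
    return amplified
```

A gain of 0.0 silences the voice, and samples that would exceed the 16-bit range are clipped rather than allowed to overflow.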

[0026] FIG. 3 is a flowchart showing the operation of the central control means 10 of FIG. 1. FIG. 4 shows examples of the face image displayed on the display means 3 in accordance with the operation shown in the flowchart of FIG. 3. The operation of the audio output device is described with reference to both figures. In step (abbreviated as S) 101 (FIG. 3), the default images of the "eyes", "nose", "mouth", "outline", and "hair" image elements are read, via the display control means 2, from the image files stored in the image database 4; the default images are combined, and image 41 (FIG. 4) is displayed on the display means 3.

[0027] In S102, it is determined whether the mouth portion, as an example of "a part of the image elements" in the displayed face image, is designated by the pointing device. The mouth portion can be designated, for example, by moving the mouse pointer onto the mouth and pressing the mouse button, as shown in image 42 (FIG. 4).

[0028] The central control means 10 can determine that the mouth has been designated on condition that the pointer is on the mouth portion and the mouse button is pressed. If the mouth portion is not designated, the process returns to S102. If the mouth portion is designated, the process proceeds to S103, where an image of a moderately open mouth is read from the image database 4 and displayed on the display means in place of the default mouth image. Image 43 (FIG. 4) shows an example of the image displayed in S103.
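The designation condition (pointer on the mouth portion and mouse button pressed) can be sketched as a simple hit test. The rectangular bounding box is an assumption of this sketch; the specification does not say how the mouth region is delimited, and `mouth_designated` is a hypothetical name.

```python
def mouth_designated(pointer, mouth_box, button_down):
    """Return True when the pointer is inside the mouth box and the button is down.

    pointer:   (x, y) position of the pointer
    mouth_box: (left, top, right, bottom) bounds of the mouth image
    """
    x, y = pointer
    left, top, right, bottom = mouth_box
    inside = left <= x <= right and top <= y <= bottom
    return bool(button_down) and inside
```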

[0029] In S104, the volume control means 7 is set to amplify the voice to a medium level. The text is also read from the text holding means and transferred to the voice synthesizing means 6. The voice synthesizing means 6 then synthesizes the voice, the volume control means 7 amplifies it to a medium level, and the voice is output from the speaker 8. In S105, the voice synthesizing means 6 determines whether the voice output has finished. If the voice output has not finished, the process proceeds to S106; if it has finished, the process proceeds to S115.

[0030] In S106, it is detected whether the mouth portion is designated by the pointing device. If the mouth portion is not designated, the process returns to S105. If the mouth portion is designated in S106, it is determined whether an instruction to enlarge the mouth image has been input (S107). The mouth image can be enlarged by moving the pointer away from the center of the mouth while holding down the mouse button. If the operation in S107 is not one that enlarges the mouth image, the process proceeds to S110.

[0031] If the operation in S107 is one that enlarges the mouth image, the amplification factor of the voice is increased by the volume control means 7 in accordance with the amount of mouse movement (S108). In S109, an image of a mouth opened in accordance with the amount of mouse movement is formed by interpolating between the image of the moderately open mouth and an image of a wide-open mouth, and is displayed on the display means 3. Alternatively, images of mouths opened to various sizes may be prepared in advance, and an appropriate image may be read out in accordance with the amount of mouse movement and displayed on the display means 3. Image 44 (FIG. 4) shows an example of the image displayed in S109.
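The dependence of the amplification factor on the amount of mouse movement can be sketched as follows. The sketch assumes that the movement is measured as the change in the pointer's distance from the mouth center, and that the gain changes by a fixed amount per pixel; the sensitivity `per_pixel` and the limits `min_gain` and `max_gain` are hypothetical values, not taken from the specification. `math.dist` requires Python 3.8 or later.

```python
import math


def drag_delta(center, start, current):
    """Change in distance from the mouth center during a drag.

    Positive when the pointer moved away from the center (enlarge),
    negative when it moved toward the center (reduce).
    """
    return math.dist(center, current) - math.dist(center, start)


def adjust_gain(gain, delta, per_pixel=0.01, min_gain=0.0, max_gain=2.0):
    """Raise or lower the gain in proportion to the drag distance, with clamping."""
    return max(min_gain, min(max_gain, gain + delta * per_pixel))
```

The same `delta` can drive both the gain and the mouth image, so that the displayed area and the volume change together.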

[0032] In S110, it is determined whether an instruction to reduce the mouth image has been input. The mouth image can be reduced by moving the pointer toward the center of the mouth while holding down the mouse button. If an instruction to reduce the mouth has been input, the amplification factor of the volume control means 7 is decreased in accordance with the amount of mouse movement (S111). An image of a small mouth is then read from the image database 4 via the display control means 2 and displayed on the display means 3 (S112), and the process returns to S105.

[0033] If no instruction to reduce the mouth has been input in S110, for example because the pointer is away from the mouth image as in image 45 (FIG. 4), it is determined whether an instruction to stop the voice output has been input (S113). The voice can be stopped, for example, by clicking the mouth image in the face image displayed on the display means 3. In this case, the central control means 10 determines in S113 that an instruction to stop the voice output has been input on condition that the pointer is on the mouth image and the mouse button has been pressed.

[0034] If no stop has been instructed in S113, the process returns to S105. If an instruction to stop the output has been input in S113, voice synthesis by the voice synthesizing means 6 is stopped (S114), and the closed-mouth default image is read from the "mouth" image file of the image database 4 and displayed in place of the mouth portion of the face image already displayed (S115). Image 46 (FIG. 4) shows the image displayed in S115. After S115, the process returns to S102.
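The flow from S102 to S115 can be summarized as a small event-driven controller. This is a simplified sketch rather than a transcription of FIG. 3: the polling loop is replaced by named events (`press_mouth`, `drag`, `click_mouth`, `speech_done`), and the medium gain of 0.5, the maximum gain of 1.0, and the per-pixel sensitivity are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    speaking: bool = False
    gain: float = 0.0
    mouth: str = "closed"      # label of the mouth form currently displayed
    per_pixel: float = 0.01    # assumed gain change per pixel of drag
    max_gain: float = 1.0      # assumed upper limit

    def handle(self, event: str, delta: float = 0.0) -> None:
        if not self.speaking:
            if event == "press_mouth":         # S102 -> S103, S104
                self.speaking = True
                self.gain = 0.5                # assumed "medium" level
                self.mouth = "medium"
            return                             # other events are ignored while idle
        if event == "speech_done":             # S105 -> S115
            self._close_mouth()
        elif event == "drag" and delta > 0:    # S107 -> S108, S109
            self.gain = min(self.max_gain, self.gain + delta * self.per_pixel)
            self.mouth = "larger"
        elif event == "drag" and delta < 0:    # S110 -> S111, S112
            self.gain = max(0.0, self.gain + delta * self.per_pixel)
            self.mouth = "smaller"
        elif event == "click_mouth":           # S113 -> S114, S115
            self._close_mouth()

    def _close_mouth(self) -> None:
        self.speaking = False
        self.mouth = "closed"                  # back to the default image, then S102
```

For example, `press_mouth` followed by a drag of 20 pixels away from the center raises the gain from 0.5 to about 0.7, and `click_mouth` returns the controller to the idle state with the closed mouth displayed.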

[0035] (Others) In Embodiment 1, the mouth image is designated with a mouse, which is a pointing device. However, the "designating means" for the image need only be capable of designating the mouth image displayed on the display means, and may be another input device such as a pen or a digitizer.

[0036] In FIG. 4, an arrow is displayed as an example of the pointer. However, the pointer need only be able to indicate that the mouth displayed on the display means is designated; for example, a microphone-shaped pointer such as that shown in FIG. 5 may be used.

[0037] In Embodiment 1, the display switches to an image of an open mouth when voice output begins, but the way the mouth image changes is not limited to this. For example, the mouth portion may be moved as a moving picture, like an animation. In that case, the movement of the mouth portion may be made larger in accordance with the loudness of the voice.

[0038] In Embodiment 1, voice synthesis is started by clicking the mouth image with the pointer. However, voice output may instead be started by moving an icon representing the text to be output as voice onto the mouth portion.

[0039] In Embodiment 1, the voice is output to a speaker, but the voice "output means" is not limited to a speaker. For example, the voice may be output to an audio recording deck and recorded on recording tape.

[0040] In Embodiment 1, the "mouth" image is designated, enlarged, or reduced using a pointing device, but it may be enlarged or reduced with a keyboard instead of the pointing device.

[0041] In Embodiment 1, the voice is stopped by clicking the mouth image, but it may instead be stopped by, for example, clicking a "Stop" button displayed on the screen.

[0042] The present invention can be applied either to a system composed of a plurality of devices or to a single device. The present invention can also be implemented by loading a program into a system or device. It goes without saying that the effect of Article 101 of the Patent Act on the patent right arising from this application extends to a recording medium on which such a program is recorded.

[0043]

[Effects of the invention] As is clear from the above description, according to the invention of claim 1 or claim 6, a plurality of image elements read from the image database are combined and displayed, an instruction to change the area of a part of the image elements in the displayed image is input, the area of that part of the image elements is changed based on the instruction, and a voice is output at a volume associated with the changed area. Because the volume of the output voice is thus associated with the area of an image, volume control is easy for the user to understand and easy to perform.

[0044] According to the invention of claim 2 or claim 7, based on the input instruction, one of the first form of the image element, the second form, and an intermediate form obtained by interpolating between the first and second forms is displayed, and a voice is output at a volume associated with the area of the displayed part of the image. Rather than storing all the image forms to be displayed in advance, intermediate forms between the first form and the second form are produced by interpolation, so the storage capacity required for the data can be reduced compared with storing all forms in advance.

[0045] According to the invention of claim 3 or claim 8, the image database stores a plurality of forms of a part of the image elements in advance. Based on the input instruction, a form different from the one currently displayed is selected and displayed, and a voice is output at a volume associated with the area of the displayed part of the image. Because the plurality of forms to be displayed are stored in advance, the area of the displayed image can be changed easily.

[0046] According to the invention of claim 4 or claim 9, a face image is displayed as the composite image, the area of the mouth image within it is changed, and a voice is output at a volume associated with the changed area of the mouth image. Because the volume is controlled by instructions given to a mouth-shaped graphical interface, volume control is intuitively easy for the user to understand and easy to perform. In particular, by stopping voice output when the area is at its minimum, the start and stop of voice output can also be controlled easily.

[0047] According to the invention of claim 5 or claim 10, the text read from the text holding means is converted into voice, and the voice is output at a volume associated with the changed area. The volume at which text is read aloud can therefore be controlled easily through an interface that is easy to understand.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an audio output device according to an embodiment of the present invention.

[FIG. 2] An explanatory diagram showing examples of the images in each data file stored in the image database.

[FIG. 3] A flowchart showing the processing procedure of the audio output device according to an embodiment of the present invention.

[FIG. 4] An explanatory diagram of the images displayed by the operation shown in the flowchart of FIG. 3.

[FIG. 5] An explanatory diagram of an image displayed according to an embodiment of the present invention.

[Explanation of symbols]

1 Input means
2 Display control means
3 Display means
4 Image database
5 Text holding means
6 Voice synthesizing means
7 Volume control means
8 Speaker
10 Central control means

Claims

[Claims]

1. An audio output device comprising: an image database that stores a plurality of image elements; means for combining the image elements read from the image database; means for displaying the composite image obtained by the combining; input means for inputting an instruction to change the area of a part of the image elements in the displayed image; changing means for changing the area based on the instruction; and voice output means for outputting a voice at a volume associated with the area of the part of the image elements.

2. The audio output device according to claim 1, wherein the image database stores, for the part of the image elements, a first form indicating that the volume is zero and a second form indicating that the volume is maximum, and the changing means displays, based on the instruction input from the input means, the first form, the second form, or an intermediate form obtained by interpolating between the first and second forms.

3. The audio output device according to claim 1 or 2, wherein the image database stores a plurality of forms of the part of the image elements, and the changing means selects, from the plurality of forms, a form different from the one currently displayed and displays it.

4. The audio output device according to claim 1, wherein the composite image is a face image, and the part of the image elements is a mouth image.

5. The audio output device according to claim 1, further comprising: text holding means for storing text data; and conversion means for converting the text read from the text holding means into voice, wherein the voice output means outputs the voice converted by the conversion means.

6. An audio output method comprising: a combining step of combining a plurality of image elements read from an image database in which the plurality of image elements are stored in advance; a display step of displaying the composite image obtained by the combining; an input step of inputting an instruction to change the area of a part of the image elements in the displayed image; a changing step of changing the area based on the instruction; and an output step of outputting a voice at a volume associated with the area of the part of the image elements.

7. The audio output method according to claim 6, wherein the image database stores, for the part of the image elements, a first form indicating that the volume is zero and a second form indicating that the volume is maximum, and the changing step displays, based on the instruction input in the input step, one of the first form, the second form, and an intermediate form obtained by interpolating between the first and second forms.

8. The audio output method according to claim 6 or 7, wherein the image database stores a plurality of forms of the part of the image elements in advance, and the changing step selects, from the plurality of forms, a form different from the one currently displayed and displays it.

9. The audio output method according to claim 6, wherein the composite image is a face image, and the part of the image elements is a mouth image.

10. The audio output method according to claim 6, wherein text data is stored in advance in text holding means, the method further comprises a conversion step of converting the text read from the text holding means into voice, and the output step outputs the voice converted in the conversion step.