JP2001319241A

JP2001319241A - Image data generator for voice-driven dynamic picture

Info

Publication number: JP2001319241A
Application number: JP2000135680A
Authority: JP
Inventors: Yoshiaki Akiyama; 嘉朗秋山
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-05-09
Filing date: 2000-05-09
Publication date: 2001-11-16

Abstract

PROBLEM TO BE SOLVED: To provide an image data generator for voice-driven moving images with which a telephone device or an independent image generator can be configured, which requires only a few face images or just one, and which does not require a large quantity of data to be sent at transmission time. SOLUTION: The device is provided with sampling and storage means for sampling and storing in advance representative examples of the mouth state changes corresponding to specific sounds uttered by a speaker, reading means for reading one image of the speaker's face, change adding means for modifying the mouth in the image read by the reading means on the basis of the mouth state changes stored in the sampling and storage means, and storage means for storing the state of the mouth as modified by the change adding means.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an image data generating apparatus for voice-driven moving images, and more particularly to such an apparatus suitable for use in mobile telephones, personal computer communication, and the like.

[0002]

[Prior Art] In the technology generally used for so-called video conferencing, such as telephone devices with voice-driven images or video-conference telephone devices, the images and voices of both parties are transmitted and received in both directions. With this technology, both sides must be equipped with cameras in order to send images of the attendees of a video conference or the like. For this reason, it is difficult to use a telephone device with a display that transmits the callers' face images to both parties without having a camera.

[0003] To solve this problem, the present inventor has proposed a technique in which basic face image data is transmitted in advance from the transmitting device to the receiving device, the phonemes of the speaker transmitted afterwards are recognized by a speech recognition function, and the resulting data allows the movement of the face image to be viewed together with the voice on the receiving device (Registered Utility Model No. 3062080).

[0004]

[Problem to Be Solved by the Invention] In the device of this registered utility model, image patterns of the face (in practice, the entire face) corresponding to at least several phonemes must be prepared in advance and stored in a memory of the telephone device, a telecommunications carrier's server, or the like. In addition, in one of its methods, an image must be sent with every transmission. As a result, the telephone device or the server must have a large-capacity memory, and a large amount of data must be sent at transmission time.

[0005] Accordingly, an object of the present invention is to provide an image data generating apparatus for voice-driven moving images with which a telephone device can be configured so that preparing only a few face images, or just one, is sufficient and a large amount of data need not be sent at transmission time.

[0006]

[Means for Solving the Problem] To achieve the above object, the present invention adopts an image data generating apparatus for voice-driven moving images, comprising: reading means for reading an image of the face of a speaker when the speaker utters a plurality of specific sounds; change extraction means for extracting, from the image read by the reading means, the change in the state of the mouth corresponding to each sound; and state storage means for storing the state of the mouth extracted by the change extraction means.

[0007] The present invention also adopts an image data generating apparatus for voice-driven moving images, comprising: means for reading images of the face of a speaker while the speaker is speaking; image selection means for selecting, from the images read by the reading means, only the images corresponding to specific sounds; change extraction means for extracting, from the images selected by the image selection means, the change in the state of the mouth corresponding to each sound; and state storage means for storing the state of the mouth extracted by the change extraction means.

[0008] The present invention further adopts an image data generating apparatus for voice-driven moving images, comprising: sampling storage means for sampling and storing in advance representative examples of the mouth state changes corresponding to specific sounds uttered by speakers; reading means for reading one image of the face of a speaker; change adding means for modifying the mouth in the image read by the reading means on the basis of the mouth state changes stored in the sampling storage means; and storage means for storing the state of the mouth as modified by the change adding means.

[0009] In the present invention, the change in the state of the mouth is preferably represented by a change in the contour of the lips, and the specific sounds are preferably vowels.

[0010] The present invention further adopts a telephone device that uses the above image data generating apparatus for voice-driven moving images, the telephone device comprising: sensing means for sensing a voice; voice identification means for identifying a specific sound in the voice sensed by the sensing means; and display means for displaying an image of the speaker's face based on the change in the state of the mouth corresponding to the specific sound identified by the voice identification means.

[0011]

[Embodiments of the Invention] Next, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a flowchart showing the operation flow of the image data generating apparatus for voice-driven moving images according to Embodiments 1 and 2 of the present invention. FIG. 2 is a flowchart showing the operation flow of the image data generating apparatus for voice-driven moving images according to Embodiment 3 of the present invention. FIG. 3 is a diagram for explaining the sampling step used in Embodiment 3. FIG. 4 is a flowchart showing the operation flow when the image data generating apparatus for voice-driven moving images of the present invention is applied to a telephone device.

[0012] First, the basic principle of the present invention is as follows. Facial expressions change in various ways when a person laughs, gets angry, cries, talks, and so on. When talking on a telephone device while looking at the other party's face, it is largely sufficient to see the basic changes in the other party's facial expression while speaking. The present invention therefore focuses on the part of the face that changes most during speech, at least the mouth (and, further, changes in the state of the lips), and generates face image data that includes the changes in the mouth.

[0013] The mouth changes in almost the same way whenever the vowel in the speech is the same (in Japanese: a, i, u, e, o; あ, い, う, え, お). Basically, therefore, the vowels in the speech are identified and a facial expression matching each vowel is output. Sounds other than vowels (for example, laughter) may also be identified.

[0014] The position of the mouth on the coordinate axes of the image is determined using, as the reference, the center of the mouth relative to the facial contour obtained by pattern matching or projection processing, and the lip contour is determined for each uttered phoneme, at least for each vowel. A detailed description follows with reference to the drawings.
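The positioning rule in paragraph [0014] can be illustrated with a minimal sketch, assuming the mouth center has already been located (the pattern matching or projection step is not shown). The function names `mouth_relative` and `to_image_coords` are hypothetical and do not come from the patent.

```python
def mouth_relative(contour, mouth_center):
    """Express lip-contour points as offsets from the mouth center.

    `contour` is a list of (x, y) image coordinates and `mouth_center`
    is the reference point assumed to be found from the facial contour.
    """
    cx, cy = mouth_center
    return [(x - cx, y - cy) for x, y in contour]


def to_image_coords(offsets, mouth_center):
    """Place stored offsets back onto an image with a given mouth center."""
    cx, cy = mouth_center
    return [(dx + cx, dy + cy) for dx, dy in offsets]


# Lip contour for one vowel, measured on the image it was captured from.
captured = [(10, 20), (14, 20), (12, 22)]
offsets = mouth_relative(captured, mouth_center=(12, 20))

# The same contour redrawn on another image whose mouth center differs.
redrawn = to_image_coords(offsets, mouth_center=(50, 60))
```

Storing offsets rather than absolute coordinates is one possible design choice: it lets the same contour be reused on any face image once that image's mouth center is known.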

[0015] (Embodiment 1) First, refer to FIG. 1. The user of the telephone device is asked to utter specific sounds (for example, vowels in particular) (step S1), and the facial expression at that time is photographed with a camera (in particular, a digital camera) and stored in memory as image data (that is, the image is read in) (step S2).

[0016] Next, the contour of the lips is extracted from the face image data (step S3), and the contour is stored as data (step S4).
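Steps S3 and S4 of Embodiment 1 might be sketched as follows. This is only an illustration under stated assumptions: the photograph is assumed to have already been reduced to a binary mask in which 1 marks a lip-contour pixel, and the names `extract_lip_contour` and `LipContourStore` are hypothetical.

```python
from dataclasses import dataclass, field

Point = tuple[int, int]  # (x, y) pixel coordinate


def extract_lip_contour(lip_mask: list[list[int]]) -> list[Point]:
    """Step S3 (assumed form): collect the coordinates of lip-contour pixels.

    `lip_mask` is a hypothetical binary image in which 1 marks a pixel on
    the lip contour; producing such a mask from a photograph is not shown.
    """
    return [(x, y)
            for y, row in enumerate(lip_mask)
            for x, value in enumerate(row)
            if value == 1]


@dataclass
class LipContourStore:
    """Step S4: keeps one lip contour per specific sound."""
    contours: dict[str, list[Point]] = field(default_factory=dict)

    def store(self, sound: str, contour: list[Point]) -> None:
        self.contours[sound] = list(contour)

    def load(self, sound: str) -> list[Point]:
        return self.contours[sound]


# Steps S1-S2 are represented by a mask captured while the user says "a".
mask_a = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
store = LipContourStore()
store.store("a", extract_lip_contour(mask_a))
```

Only the contour coordinates are kept per sound, which reflects the stated aim of avoiding one full face image per phoneme.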

[0017] (Embodiment 2) Referring again to FIG. 1, the user of the telephone device is asked to utter some suitable sentence (step S11), the facial expressions during the utterance are read in as images (step S12), and the images corresponding to specific sounds, for example vowels, are selected from those images (step S13). Steps S3 and S4 described above are then performed.
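The selection in step S13 can be illustrated with a short sketch. It assumes each captured frame already carries a sound label from some external recognizer, and it keeps the first frame seen for each vowel; both the labelling and the "first frame" rule are assumptions, and `select_vowel_frames` is a hypothetical name.

```python
VOWELS = ("a", "i", "u", "e", "o")


def select_vowel_frames(labelled_frames):
    """Step S13 (sketch): keep the first frame seen for each vowel.

    `labelled_frames` is an iterable of (sound_label, frame) pairs; the
    labels are assumed to come from some external speech recognizer.
    """
    selected = {}
    for label, frame in labelled_frames:
        if label in VOWELS and label not in selected:
            selected[label] = frame
    return selected


# Frames captured while the user reads a sentence (steps S11-S12);
# the strings stand in for image data.
frames = [("k", "f0"), ("a", "f1"), ("s", "f2"), ("a", "f3"), ("i", "f4")]
vowel_frames = select_vowel_frames(frames)
```

The selected frames would then go through the contour extraction and storage of steps S3 and S4.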

[0018] (Embodiment 3) First, in order to obtain the lip contours corresponding to specific sounds, several suitable people, not limited to the user of the voice-driven telephone device, are selected and asked to speak, and the facial expressions corresponding to the specific sounds are sampled (step S21). FIG. 3 shows representative samples of the lip contours corresponding to the vowels. Such samples are obtained and read in as lip contour data (step S22).

[0019] Next, an image of the face of the user of the telephone device is read in, face images corresponding to the specific sounds are created using the lip data obtained in step S22 (step S23), and these are stored (step S24).
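The flow of Embodiment 3 (steps S21 to S24) might look like the sketch below. The patent says only that representative samples are obtained; averaging corresponding points across speakers is an assumption made here for illustration, as are the function names and the file name `user_face.png`. Each sample for a sound is assumed to have the same number of points in the same order.

```python
def representative_contour(samples):
    """Steps S21-S22 (assumed method): average corresponding points.

    `samples` is a list of contours, one per sampled speaker, each a list
    of (x, y) points with matching length and ordering.
    """
    count = len(samples)
    return [
        (sum(p[0] for p in points) / count, sum(p[1] for p in points) / count)
        for points in zip(*samples)
    ]


def build_face_images(base_image, samples_by_sound):
    """Steps S23-S24 (sketch): pair the single base face image with one
    representative lip contour per sound and return the stored table."""
    return {
        sound: {"base": base_image, "lips": representative_contour(samples)}
        for sound, samples in samples_by_sound.items()
    }


# Two sampled speakers for the vowel "a"; coordinates are illustrative.
samples_by_sound = {
    "a": [[(0, 0), (2, 0)], [(0, 2), (2, 2)]],
}
table = build_face_images("user_face.png", samples_by_sound)
```

Only one base image is referenced by every entry, so the stored data grows with the number of contours rather than the number of face images.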

[0020] Next, the operation flow of a telephone device that uses the image data generated as described above will be described. Referring to FIG. 4, during a call the telephone device waits until a voice is sensed (step S31) and, when a voice is sensed, identifies it (step S32). Once the voice is recognized, the lip contour data corresponding to that voice (in particular, a vowel) is read out (step S33) and, based on the lip contour data, displayed on the display screen of the telephone device as part of the face image.
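The call-time loop of FIG. 4 (steps S31 to S33) can be sketched as a generator. It assumes the sensing and identification stages deliver either `None` (nothing sensed) or a sound label, and it skips sounds that have no stored contour; that skipping rule and the name `frames_to_display` are assumptions, not details given in the patent.

```python
def frames_to_display(sensed_sounds, lip_contours):
    """Yield (sound, lip_contour) pairs in the order they should be shown.

    `sensed_sounds` is an iterable of identified sound labels, with None
    meaning no voice was sensed; `lip_contours` maps a label to its stored
    lip contour.
    """
    for sound in sensed_sounds:
        if sound is None:                  # S31: keep waiting for a voice
            continue
        contour = lip_contours.get(sound)  # S32-S33: identify, then read out
        if contour is not None:
            yield sound, contour


stored = {"a": [(0, 0)], "o": [(1, 1)]}
shown = list(frames_to_display([None, "a", "k", "o"], stored))
```

Each yielded contour would be drawn onto the base face image by the display stage, which is not shown here.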

[0021] (Other embodiments) In the description above, the face image is created inside the telephone device according to the changing state of the mouth, specifically of the lip contour. Alternatively, the face images may be created inside the image data generating apparatus for voice-driven moving images according to the changing state of the mouth, specifically of the lip contour, and stored in memory, and the telephone device may then read out and display the face image corresponding to the specific sound.

[0022] In the embodiments above, the basic original face image is described as one photographed by a camera, but an illustration, portrait sketch, animation drawing, profile view, or the like representing the user's face may be adopted as one of the original images.

[0023] The apparatus of the present invention may be provided as part of a telephone device, or may be used as an image generating apparatus independent of a telephone device.

[0024]

[Effect of the Invention] As described above, the present invention provides an image data generating apparatus for voice-driven moving images with which a mobile telephone can be configured so that preparing only a few face images, or just one, is sufficient and a large amount of data need not be sent at transmission time.

[Brief description of the drawings]

[FIG. 1] FIG. 1 is a flowchart showing the operation flow of the image data generating apparatus for voice-driven moving images according to Embodiments 1 and 2 of the present invention.

[FIG. 2] FIG. 2 is a flowchart showing the operation flow of the image data generating apparatus for voice-driven moving images according to Embodiment 3 of the present invention.

[FIG. 3] FIG. 3 is a diagram for explaining the sampling step used in Embodiment 3.

[FIG. 4] FIG. 4 is a flowchart showing the operation flow when the image data generating apparatus for voice-driven moving images of the present invention is applied to a mobile telephone.

Claims

1. An image data generating apparatus for voice-driven moving images, comprising: reading means for reading an image of the face of a speaker when the speaker utters a plurality of specific sounds; change extraction means for extracting, from the image read by the reading means, the change in the state of the mouth corresponding to each sound; and state storage means for storing the state of the mouth extracted by the change extraction means.

2. An image data generating apparatus for voice-driven moving images, comprising: means for reading images of the face of a speaker while the speaker is speaking; image selection means for selecting, from the images read by the reading means, only the images corresponding to specific sounds; change extraction means for extracting, from the images selected by the image selection means, the change in the state of the mouth corresponding to each sound; and state storage means for storing the state of the mouth extracted by the change extraction means.

3. An image data generating apparatus for voice-driven moving images, comprising: sampling storage means for sampling and storing in advance representative examples of the mouth state changes corresponding to specific sounds uttered by speakers; reading means for reading one image of the face of a speaker; change adding means for modifying the mouth in the image read by the reading means on the basis of the mouth state changes stored in the sampling storage means; and storage means for storing the state of the mouth as modified by the change adding means.

4. The image data generating apparatus for voice-driven moving images according to claim 1, wherein the means for storing the state of the mouth determines, on the coordinate axes, the center position of the mouth relative to the contour of the face, and determines and stores the state of the mouth from the contour of the mouth.

5. The image data generating apparatus for voice-driven moving images according to claim 1, wherein the change in the state of the mouth is represented by a change in the contour of the lips.

6. The image data generating apparatus for voice-driven moving images according to claim 1, wherein the specific sound is a vowel.

7. A telephone device using the image data generating apparatus for voice-driven moving images according to claim 1, comprising: sensing means for sensing a voice; voice identification means for identifying a specific sound in the voice sensed by the sensing means; and display means for displaying an image of the speaker's face based on the change in the state of the mouth corresponding to the specific sound identified by the voice identification means.

8. An image data generating apparatus for voice-driven moving images, comprising: reading means for reading an image of the face of a speaker when the speaker utters a plurality of specific sounds; and state storage means for storing the face image read by the reading means.

9. An image data generating apparatus for voice-driven moving images, comprising: means for reading images of the face of a speaker while the speaker is speaking; image selection means for selecting, from the images read by the reading means, only the face images corresponding to specific sounds; and state storage means for storing the face images selected by the image selection means.

10. The image data generating apparatus according to claim 8, wherein the specific sound is a vowel.

11. A telephone device using the image data generating apparatus for voice-driven moving images according to claim 8, comprising: sensing means for sensing a voice; voice identification means for identifying a specific sound in the voice sensed by the sensing means; and display means for displaying an image of the speaker's face corresponding to the specific sound identified by the voice identification means.