JP3298076B2

JP3298076B2 - Image creation device

Info

Publication number: JP3298076B2
Application number: JP30642292A
Authority: JP
Inventors: 知生光永; 恒三浦
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-10-20
Filing date: 1992-10-20
Publication date: 2002-07-02
Anticipated expiration: 2017-07-02
Also published as: JPH06162166A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Table of Contents] The present invention will be described in the following order.
Field of Industrial Application
Prior Art (FIG. 48)
Problems to Be Solved by the Invention (FIG. 48)
Means for Solving the Problems (FIGS. 39 and 46)
Operation (FIGS. 39 and 46)
Embodiments
(1) First Embodiment
(1-1) Overall Configuration (FIGS. 1 and 2)
(1-2) Mouth Shape Parameters (FIG. 3)
(1-3) Weighting Parameters (FIGS. 4 to 38)
(1-4) Generation of Weighting Parameters (FIGS. 39 to 42)
(1-5) Effects of the Embodiment
(2) Second Embodiment
(2-1) Overall Configuration (FIGS. 43 to 45)
(2-2) Synthesis Balance Parameter (FIG. 46)
(2-3) Effects of the Second Embodiment
(3) Other Embodiments (FIG. 47)
Effects of the Invention

[0002]

[Field of Industrial Application] The present invention relates to an image creation apparatus and can be applied, for example, to an animation creation apparatus that uses computer graphics techniques.

[0003]

[Prior Art] An animation creation apparatus of this type has been proposed in which a plurality of interpolation vectors are multiply interpolated using a weighting parameter time series, and an animation is created from the interpolated vectors generated in this way (Japanese Patent Application No. 2-256415).

[0004] That is, in the multiple interpolation method, for interpolation parameter vectors p(t) each having m parameter elements, a basic vector p0 whose elements are the basic values of the respective parameter elements is set, and the basic vector p0 is subtracted from each of n interpolation parameter vectors pi (i = 1, ..., n) to form an m × n interpolation parameter matrix P(t).

[0005] Further, in the multiple interpolation method, a weighting parameter vector w(t) having n parameter elements is set, and a new vector a(t) having m elements, given by the following equation, is generated from the interpolation parameter matrix P(t) and the basic vector p0:

[Equation 1]
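The image for Equation (1) is not reproduced in this text. A minimal sketch of the multiple interpolation it describes, assuming the natural reading a(t) = p0 + P(t) w(t) with column i of P(t) equal to pi - p0, might look like this in Python. The function name `multiple_interpolation` and the sample values are illustrative and do not come from the patent.

```python
def multiple_interpolation(p0, interp_vectors, w):
    """Return a = p0 + sum_i w[i] * (p_i - p0).

    p0             -- basic vector with m elements
    interp_vectors -- n interpolation parameter vectors p_i, each with m elements
    w              -- weighting parameter vector with n elements
    """
    if len(interp_vectors) != len(w):
        raise ValueError("need exactly one weight per interpolation vector")
    a = list(p0)
    for p_i, w_i in zip(interp_vectors, w):
        for k in range(len(p0)):
            # (p_i[k] - p0[k]) is one entry of the assumed m x n matrix P
            a[k] += w_i * (p_i[k] - p0[k])
    return a


# Illustrative values only: m = 2 parameter elements, n = 2 interpolation vectors.
p0 = [0.0, 0.0]
p1 = [1.0, 0.0]
p2 = [0.0, 2.0]
a = multiple_interpolation(p0, [p1, p2], [0.5, 0.25])
print(a)
```

With all weights at zero the result is the basic vector itself, and with a single weight at 1 it is the corresponding interpolation vector, which matches the role of the basic object and interpolation objects described in the following paragraphs.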

[0006] As a result, as shown in FIG. 48, the animation creation apparatus generates the motion of a single object to be moved (hereinafter called the object) by multiple interpolation of a plurality of object shapes with a weighting parameter time series.

[0007] That is, the animation creation apparatus prepares a plurality of objects N1, N2, and N3 having the same definition as the object to be animated, and gives each of them a different pose (these are hereinafter called interpolation objects).

[0008] Further, the animation creation apparatus provides a basic object K, which is the basic pose of the interpolation objects and corresponds to the basic vector p0, and the interpolation objects N1, N2, and N3 are defined with respect to it. The basic vector p0 defining the basic object K is then subtracted from the n interpolation parameter vectors p(t), and the results are blended using the weighting parameter vectors w1, w2, and w3.

[0009] Thus, by executing the computation of Equation (1) to generate the time-series vector w(t) representing the pose of the object at each time, the animation creation apparatus can generate that pose by the multiple interpolation method, using the pose information of the interpolation objects as the interpolation parameter vectors, and can create an animation A based on this vector w(t).

[0010] Consequently, compared with the keyframe method, which is the most common technique in computer graphics animation, an animation creation apparatus using the multiple interpolation method can create an animation with a smaller amount of data and can create expressive animation with a high degree of freedom.

[0011]

[Problems to Be Solved by the Invention] In animation featuring people, animation in which the mouth moves in time with the spoken words (hereinafter called lip-sync animation) is indispensable.

[0012] In most conventional lip-sync animation, however, the mouth simply opens and closes repeatedly while the character is speaking. To bring such lip-sync animation closer to actual mouth movement, a conventional animation creation apparatus requires a detailed object model to be set up and the parameters of that object model to be driven.

[0013] That is, the motion of the resulting animation depends on what parameters the object model has, and even if realistic motion is achieved, it is difficult to apply the animation data to a different object model.

[0014] For this reason, with a conventional animation creation apparatus it is difficult to create lip-sync animation simply and expressively, and an animation, once created, cannot be applied to various objects. In contrast, in an animation creation apparatus using the multiple interpolation method, motion is basically described as a time series of weighting parameters, so if the weighting parameters can be generated automatically, animation can be created efficiently.

[0015] The present invention has been made in view of the above points, and proposes an image creation apparatus that exploits the advantages of multiple-interpolation animation to easily generate expressive lip-sync animation that does not depend on the object model.

[0016]

[Means for Solving the Problems] To solve this problem, in the present invention, in image creation apparatuses 1 and 10 and an image creation method that create animation data by the multiple interpolation method, weighting parameter time series generation means 4, 16, 18, and 32 generate, for preset mouth shapes and based on pronunciation information 12, interpolation parameter time series W0, W2, and W4 for vowels and special sounds and interpolation parameter time series W1, W3, and W5 for consonants, and then generate weighting parameter time series W6 and W7 with reference to the pronunciation timing given by the pronunciation information 12. Animation creation means 6 and 42 then perform multiple interpolation of the mouth shapes using the weighting parameter time series W6 and W7 to create an animation U3 in which the mouth shape changes according to the pronunciation information 12.

[0017] Further, in the present invention, the preset mouth shapes are the mouth shape for pronouncing "a", the mouth shape for pronouncing "i", and the mouth shape for pronouncing "u".

[0018] Further, in the present invention, the weighting parameter time series generation means 4 sets, for every vowel and consonant of the speech to be produced, the element data of the interpolation parameters that form the interpolation parameter time series W0 to W5.

[0019] Further, in the present invention, the weighting parameter time series generation means 4 forms the weighting parameter time series W6 and W7 so that the strength with which a consonant appears in the mouth shape changes according to the type of consonant and/or the frequency of silence given by the pronunciation information 12.

[0020] Further, in the present invention, the weighting parameter time series generation means 4, 16, 18, and 32 combine the interpolation parameter time series W0, W2, and W4 for vowels and special sounds with the interpolation parameter time series W1, W3, and W5 for consonants to generate the weighting parameter time series W6 and W7.

[0021] Further, in the present invention, the weighting parameter time series generation means 4 forms weighting parameter time series W6 and W7 whose values change over time based on the pronunciation information 12, vowel pronunciation information, and consonant pronunciation information, so that the ratio between vowel motion and consonant motion is kept at an appropriate value.

[0022]

[Operation] For the preset mouth shapes, interpolation parameter time series W0, W2, and W4 for vowels and special sounds and interpolation parameter time series W1, W3, and W5 for consonants are generated based on the pronunciation information 12. Weighting parameter time series W6 and W7 are then generated with reference to the pronunciation timing given by the pronunciation information 12, and multiple interpolation of the mouth shapes is performed using the generated weighting parameter time series W6 and W7 to create an animation U3 in which the mouth shape changes according to the pronunciation information 12. In this way, the advantages of multiple-interpolation animation are exploited, and expressive lip-sync animation that does not depend on the object model can be generated easily.

[0023]

[Embodiments] An embodiment of the present invention will be described in detail below with reference to the drawings.

[0024] (1) First Embodiment
(1-1) Overall Configuration
In FIG. 1, reference numeral 1 denotes, as a whole, an animation creation apparatus that generates lip-sync animation. In this embodiment it generates lip-sync animation synchronized with speech produced by rule-based synthesis.

[0025] That is, in this embodiment the animation creation apparatus 1 synthesizes speech in a speech rule synthesis unit 2 using a complex cepstrum rule-based speech synthesis system, which synthesizes speech from a pronunciation text describing phonetic symbols and accent position information.

[0026] As shown in FIG. 2, the pronunciation information consists of phoneme information and pronunciation timing information. The phoneme information can be written, for example, in Roman letters and other symbols, and the pronunciation timing information represents the time at which each phoneme is pronounced as numerical data.

[0027] In the animation creation apparatus 1, a weighting parameter generation unit 4 generates weighting parameters from the phoneme information with reference to this pronunciation timing information. A MIMe animation generation unit 6 then creates an animation from the mouth shape model based on these weighting parameters.

[0028] The animation creation apparatus 1 thus generates a moving pattern from four static patterns (the interpolation objects and the basic object) and three weighting parameter time series.

[0029] (1-2) Mouth Shape Parameters
In this embodiment, three mouth shape objects are preset in the animation creation apparatus 1 as the interpolation objects that express the shape of the mouth.

[0030] As shown in FIG. 3, the first mouth shape object is the mouth shape that produces the sound "a", with the lower jaw dropped and the mouth open (FIG. 3(A)); the second mouth shape object is the mouth shape that produces the sound "i", with the teeth bared and the lips pulled sideways (FIG. 3(B)); and the third mouth shape object is the mouth shape that produces the sound "u", with the lips pursed (FIG. 3(C)).

[0031] Accordingly, the mouth shape that produces "e" can be expressed as an intermediate shape between those for "a" and "i", and the mouth shape that produces "o" can be expressed as an intermediate shape between those for "a" and "u". The animation creation apparatus 1 can therefore freely select and create mouth shape objects for the vowels.

[0032] Further, in this embodiment the animation creation apparatus 1 is provided with a basic pattern in which the mouth is naturally closed as the basic mouth shape object (FIG. 3(D)), so that almost every mouth shape can be expressed using these four patterns. When the mouth shapes of three vowels are chosen as the three interpolation objects in this way, and these three interpolation objects are taken as the coordinate axes for setting parameters, the various mouth shapes can be grasped intuitively, which improves overall ease of use.

[0033] Thus, in this embodiment, the mouth shapes of Japanese pronunciation are expressed with four patterns, namely the shapes corresponding to "a", "i", and "u" and the basic shape, and lip-sync animation synchronized with speech produced by rule-based synthesis is generated from them.

[0034] (1-3) Weighting Parameters
Japanese sound elements are divided into vowels, consonants, and special sounds, and in this embodiment a weighting parameter element is chosen for each. Here a weighting parameter is a three-dimensional vector that determines the interpolation ratio of the three mouth shape patterns, each represented by an interpolation object. In this embodiment, a weighting parameter time series is generated from these weighting parameter elements, and a sequence of animation is created using that time series.

[0035] Because the mouth shapes for Japanese vowels can be expressed with the three mouth shape patterns, each represented by an interpolation object, the mouth shapes for consonants are likewise made expressible as a time series of mouth shape patterns. To this end, the animation creation apparatus 1 of this embodiment defines the weighting parameter element of each vowel as a fixed vector pattern, and expresses each consonant by varying a vector pattern of the same kind.

[0036] That is, as the weighting parameter elements in FIG. 4 show, for vowels the mouth shapes "a", "i", and "u" are taken as three-dimensional coordinate axes, and each vowel's mouth shape can be expressed as a single point.
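As a rough illustration of vowels as points in the ("a", "i", "u") weight space, the following Python sketch blends three toy mouth shapes around a closed-mouth base. It is only a sketch under stated assumptions: each mouth shape is reduced to a hypothetical (width, opening) pair, and the midpoint weights used for "e" and "o" are assumed values, since the source says only that these shapes are intermediate and the actual values in FIG. 4 are not reproduced here.

```python
# Closed-mouth basic object and the three interpolation objects, each reduced
# to a hypothetical (width, opening) pair for illustration.
BASE = (1.0, 0.0)
SHAPES = {"a": (1.0, 1.0), "i": (1.5, 0.25), "u": (0.5, 0.25)}
AXES = ("a", "i", "u")

# Each vowel is one point in the (a, i, u) weight space. The unit vectors follow
# from the choice of axes; the midpoints for "e" and "o" are assumed values.
VOWEL_WEIGHTS = {
    "a": (1.0, 0.0, 0.0),
    "i": (0.0, 1.0, 0.0),
    "u": (0.0, 0.0, 1.0),
    "e": (0.5, 0.5, 0.0),
    "o": (0.5, 0.0, 0.5),
}


def mouth_shape(vowel):
    """Blend the interpolation objects around BASE using the vowel's weights."""
    weights = VOWEL_WEIGHTS[vowel]
    shape = list(BASE)
    for axis, w in zip(AXES, weights):
        for k in range(len(BASE)):
            shape[k] += w * (SHAPES[axis][k] - BASE[k])
    return tuple(shape)


print(mouth_shape("e"))
print(mouth_shape("o"))
```

Keeping the three axes tied to recognizable vowel shapes is what lets a weight vector be read directly as "how much a, how much i, how much u".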

[0037] In contrast, as shown in FIG. 5, a consonant can be shown as a trajectory in the three-dimensional coordinate space whose axes are the mouth shapes "a", "i", and "u". The mouth shape of a vowel can thus be defined by a fixed vector pattern, and a consonant can be expressed with a vector pattern of the same kind.

[0038] For the special sounds, namely long vowels, geminate consonants (sokuon), and silence, the animation creation apparatus 1 does not set fixed weighting parameter elements. Instead, when generating the weighting parameter time series, it derives them from the preceding and following weighting parameter time series according to the context of the special sound.

[0039] To create realistic lip-sync animation with the animation creation apparatus 1, the element data must be set so that the mouth shape of each sound element is expressed well. In this embodiment, therefore, weighting parameter element data are assigned to each consonant as shown in FIGS. 5 to 38, so that realistic lip-sync animation can be created. In these figures the horizontal axis represents time (ms), with the consonant's pronunciation timing taken as 0, and the vertical axis represents the weighting strength. The symbol "@" denotes a devoiced consonant, the symbol "q" denotes the nasalized "g", and the symbol "x" denotes the syllabic nasal "n" (ん).

[0040] Accordingly, FIG. 6 shows the weighting parameter element data for the sound represented by the phonetic symbol "k", and FIG. 7 shows the data for its devoiced form. Likewise, FIG. 8 shows the weighting parameter element data for the sound represented by the phonetic symbol "s", FIG. 9 the data for its devoiced form, FIG. 10 the data for the sound represented by the phonetic symbol "sh", and FIG. 11 the data for its devoiced form.

[0041] Further, FIG. 12 shows the weighting parameter element data for the sound represented by the phonetic symbol "t", FIG. 13 the data for the sound represented by "ch", and FIG. 14 the data for its devoiced form. FIG. 15 shows the weighting parameter element data for the sound represented by the phonetic symbol "ts", FIG. 16 the data for its devoiced form, FIG. 17 the data for the sound represented by "n", and FIG. 18 the data for the sound represented by "h".

[0042] Further, FIGS. 19 to 30 show the weighting parameter element data for the sounds represented by the phonetic symbols "f", "m", "y", "r", "w", "g", "q", "z", "j", "d", "b", and "p", respectively, and FIGS. 31 to 38 show the weighting parameter element data for the sounds represented by the phonetic symbols "ky", "ny", "hy", "my", "ry", "gy", "by", and "x", respectively.

[0043] (1-4) Generation of Weighting Parameters
As shown in FIG. 39, the animation creation apparatus 1 uses the mouth shape patterns and weighting parameter elements prepared in advance to generate the weighting parameter time series automatically from the pronunciation information, and thereby generates lip-sync animation automatically.

[0044] That is, the animation creation apparatus 1 generates two time series: weighting parameter time series W1, W3, and W5 describing the mouth motion of the consonants alone (hereinafter called the consonant weighting parameter time series), and weighting parameter time series W0, W2, and W4 describing the mouth motion of vowels and special sounds (hereinafter called the vowel weighting parameter time series). It then applies the multiple interpolation method between each weighting parameter time series and the "a", "i", and "u" interpolation objects to generate two mouth motions U1 and U2.

[0045] Next, by applying the multiple interpolation method, the animation creation apparatus 1 blends these two mouth motions U1 and U2 in a balanced way using mixing weighting parameters W6 and W7, and thereby generates the final lip-sync animation U3. Here the animation creation apparatus 1 places the weighting parameter element for each consonant discretely on the time axis according to the pronunciation timing, and then interpolates between the end of each element and the start of the next to generate the consonant weighting parameter time series.
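The placement and gap-filling step for the consonant series can be sketched as follows. This is a minimal Python illustration, assuming each consonant element is stored as a few (offset in ms, weight vector) samples relative to the pronunciation timing and that gaps are filled linearly; the source does not specify the interpolation formula. The element values for "k" are hypothetical stand-ins for the curves of FIG. 6, and the sketch does not handle elements that overlap in time.

```python
from bisect import bisect_right

# Hypothetical element data: offsets in ms relative to the pronunciation
# timing, each with an (a, i, u) weight vector. Not the values of FIG. 6.
ELEMENTS = {
    "k": [(-40, (0.0, 0.0, 0.0)), (0, (0.5, 0.0, 0.0)), (40, (0.0, 0.0, 0.0))],
}


def place_elements(events, elements):
    """Place each consonant element on the absolute time axis.

    events -- list of (time_ms, symbol) pairs from the pronunciation information
    Returns a time-sorted list of (time_ms, weight_vector) key samples.
    """
    keys = []
    for t0, symbol in events:
        for offset, weights in elements[symbol]:
            keys.append((t0 + offset, weights))
    keys.sort(key=lambda key: key[0])
    return keys


def sample(keys, t):
    """Linearly interpolate the key samples at time t (clamped at both ends)."""
    times = [key[0] for key in keys]
    if t <= times[0]:
        return keys[0][1]
    if t >= times[-1]:
        return keys[-1][1]
    j = bisect_right(times, t)  # first key strictly later than t
    (t_a, w_a), (t_b, w_b) = keys[j - 1], keys[j]
    r = (t - t_a) / (t_b - t_a)
    return tuple(x + r * (y - x) for x, y in zip(w_a, w_b))


keys = place_elements([(100, "k"), (300, "k")], ELEMENTS)
print(sample(keys, 80))   # halfway up the rise of the first "k"
print(sample(keys, 200))  # inside the gap between the two elements
```

Storing elements relative to the pronunciation timing means the same element data can be reused at any point in an utterance simply by shifting it in time.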

[0046] If the weighting parameter elements of adjacent consonants overlap on the time axis, the animation creation apparatus 1 applies interpolation when generating the consonant weighting parameter time series, so that the series changes continuously.

[0047] For the vowel series, by contrast, the animation creation apparatus 1 determines a weighting parameter at the pronunciation timing of each sound, thereby likewise placing weighting parameter elements discretely on the time axis, and then sets weighting parameter elements that fill the intervals in between by interpolation, to generate the vowel weighting parameter time series.

[0048] When the sound element at a given pronunciation timing is a vowel, the animation creation apparatus 1 sets the weighting parameter value at that time to the weighting parameter element of that vowel. When the sound element at a given pronunciation timing is a long vowel, it sets the weighting parameter value at that time to the weighting parameter value of the immediately preceding pronunciation timing.

[0049] Further, when the sound element at a given pronunciation timing is a geminate consonant, the animation creation apparatus 1 sets the weighting parameter value at that time to the zero vector (that is, all three parameters are 0). When the sound element at a given pronunciation timing is silence, it sets the weighting parameter value at that time to the weighting parameter value of the immediately preceding pronunciation timing multiplied by a fixed attenuation ratio.
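The four rules above for the key values of the vowel series can be sketched in Python as follows. The event representation, the `kind` labels, the attenuation ratio of 0.5, and the zero initial value are all assumptions made for illustration; the source states only the rules themselves.

```python
# (a, i, u) weight vectors for the vowels; the "e" and "o" values are assumed.
VOWEL_WEIGHTS = {
    "a": (1.0, 0.0, 0.0),
    "i": (0.0, 1.0, 0.0),
    "u": (0.0, 0.0, 1.0),
    "e": (0.5, 0.5, 0.0),
    "o": (0.5, 0.0, 0.5),
}


def vowel_keys(events, attenuation=0.5):
    """Return (time_ms, weight_vector) key values for the vowel series.

    events      -- list of (time_ms, kind, symbol); kind is one of
                   "vowel", "long", "geminate", "silence" (hypothetical labels)
    attenuation -- assumed value of the fixed attenuation ratio for silence
    """
    keys = []
    previous = (0.0, 0.0, 0.0)  # assumed starting value before the first sound
    for time_ms, kind, symbol in events:
        if kind == "vowel":
            weights = VOWEL_WEIGHTS[symbol]
        elif kind == "long":
            weights = previous  # hold the preceding value
        elif kind == "geminate":
            weights = (0.0, 0.0, 0.0)  # zero vector
        elif kind == "silence":
            weights = tuple(attenuation * x for x in previous)
        else:
            raise ValueError(f"unknown sound kind: {kind}")
        keys.append((time_ms, weights))
        previous = weights
    return keys


events = [
    (0, "vowel", "a"),
    (150, "long", ""),
    (300, "geminate", ""),
    (450, "vowel", "u"),
    (600, "silence", ""),
]
for key in vowel_keys(events):
    print(key)
```

The intervals between these key values would then be filled by interpolation, as paragraph [0047] describes.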

[0050] This makes it possible to generate, in real time, lip-sync animation synchronized with speech produced by rule-based synthesis.

[0051] If mouth motion is expressed with only the vowel weighting parameter time series generated in this way, however, the result looks unnatural compared with actual mouth motion. In this embodiment, therefore, the two mouth motions U1 and U2 are blended in a balanced way using the mixing weighting parameters W6 and W7, which achieves realistic and expressive motion.

[0052] For this purpose, the setting of the weighting parameters W6 and W7 that determine the mixing ratio (hereinafter called the mixing weighting parameter time series) is important. The mixing weighting parameter time series is a two-dimensional vector that determines the mixing ratio of vowel and consonant. In this embodiment it is generated from the pronunciation timing information so that, around each pronunciation timing, the vowel and the consonant pull against each other, which expresses the vowel and consonant motions smoothly.

[0053] At this time, the animation creation apparatus 1 switches the mixing weighting parameter time series it generates so that the strength of this pull changes with the type of consonant sound element. This expresses the difference in mouth shape between consonants that appear as a distinct mouth shape, such as those in "ma", "pa", and "ba", and consonants that do not appear as a distinct mouth shape, such as those in "ka" and "ha".
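One way to picture the mixing step is the following Python sketch. It is an illustration under several assumptions, not the method of the patent: the consonant weight is modelled as a triangular window centred on the pronunciation timing, its peak depends on a hypothetical per-consonant strength table, and the vowel weight is taken as its complement. The source does not give the actual curve shapes, the strength values, or which of W6 and W7 applies to vowels.

```python
# Assumed relative strengths. The source says only that consonants such as
# m, p, b show clearly in the mouth shape and that k, h do not.
CONSONANT_STRENGTH = {"m": 1.0, "p": 1.0, "b": 1.0, "k": 0.25, "h": 0.25}


def mixing_weights(t, t_onset, consonant, half_width_ms=80.0):
    """Return (w_vowel, w_consonant) at time t for one consonant event."""
    peak = CONSONANT_STRENGTH.get(consonant, 0.5)
    distance = abs(t - t_onset)
    w_consonant = peak * max(0.0, 1.0 - distance / half_width_ms)
    return (1.0 - w_consonant, w_consonant)


def blend(base, u_consonant, u_vowel, weights):
    """Blend the consonant motion U1 and vowel motion U2 around the base shape."""
    w_vowel, w_consonant = weights
    return [
        b + w_vowel * (v - b) + w_consonant * (c - b)
        for b, c, v in zip(base, u_consonant, u_vowel)
    ]


# A strong consonant fully takes over at its pronunciation timing ...
print(mixing_weights(100, 100, "m"))
# ... while a weak one contributes only slightly 40 ms later.
weak = mixing_weights(140, 100, "k")
print(weak)
print(blend([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], weak))
```

Changing only the strength table or the window width alters how prominent consonants look without touching the vowel or consonant series themselves.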

[0054] The animation creation apparatus 1 was thus able to generate realistic and expressive lip-sync animation U3. Moreover, because the result is a blend of two motions, interactions arising from the context of vowels and consonants can also be expressed.

[0055] Specifically, when "ku" is pronounced as an isolated sound, the mouth motion appears more distinctly than for "u", whereas for "ku" within a run of speech the mouth motion for pronouncing the vowel does not appear distinctly. Nuances of this kind could also be expressed.

[0056] Further, as shown in FIGS. 40 and 41, it was confirmed that the mouth shape also changes with speaking speed, as it does when a person actually speaks. Here the difference between "aiueo" and "kakikukeko" is due to the consonant [k], and it can be seen that [k] appears differently when spoken slowly and when spoken quickly.

[0057] Further, as shown in FIG. 42, by blending the two mouth motions U1 and U2 with the mixing weighting parameters in this way, the balance can be adjusted to create a distinctive manner of speaking. FIG. 42 shows the mouth shapes obtained when the mixing ratio between the vowel mouth motion U2 and the consonant mouth motion U1 is varied, with the vowel's share of the mix increasing toward the right.

[0058] The weighting parameter time series can thus be generated automatically so as to reflect actual mouth motion, and by applying the multiple interpolation method to the preset mouth shape patterns, animation can be created automatically in real time. Lip-sync animation can be made simply by inputting Japanese pronunciation information (that is, sound elements and their timing), and because the differences in mouth shape between consonants can be expressed, realistic animation can be created.

[0059] Used together with a speech synthesis system, the apparatus can also automatically create an animation that reads Japanese text aloud. Furthermore, the same motion data can be shared among different mouths (object models), so the motion data can be organized into a library.

[0060] That is, the weighting parameters of multiple interpolation animation have the characteristic of not depending on the object model. In an ordinary animation creating apparatus, where for example the movement of an object made up of several representative points is expressed as a time series of the movements of those points, the original motion data can no longer be applied once the number of representative points of the target object is changed. In multiple interpolation animation, by contrast, the original motion data can still be applied after the number of representative points is changed, which makes it possible to build a library of animation data.
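The independence from the object model can be illustrated with a small sketch. It assumes that multiple interpolation forms each frame as a weighted sum of key shapes, and it uses two hypothetical 2D mouth models with different numbers of representative points; the function name and all coordinates are illustrative only.

```python
def multiple_interpolation(key_shapes, weights):
    """Return sum_i weights[i] * key_shapes[i], computed point by point."""
    if len(key_shapes) != len(weights):
        raise ValueError("one weight per key shape is required")
    n_points = len(key_shapes[0])
    shape = []
    for p in range(n_points):
        x = sum(w * ks[p][0] for w, ks in zip(weights, key_shapes))
        y = sum(w * ks[p][1] for w, ks in zip(weights, key_shapes))
        shape.append((x, y))
    return shape


# Two hypothetical mouth models; each lists its key shapes as [closed, open].
coarse_mouth = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, -1.0), (1.0, 1.0)],
]
fine_mouth = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(0.0, -2.0), (1.0, 2.0), (2.0, 2.0), (3.0, -2.0)],
]

# The motion data is only the weight time series; it never mentions point counts.
weight_series = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

for model in (coarse_mouth, fine_mouth):
    frames = [multiple_interpolation(model, w) for w in weight_series]
    print(len(frames[0]), "points per frame:", frames[1])
```

The same `weight_series` drives both models, which is why the motion data can be kept in a library and reused when the number of representative points changes.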

[0061] Thus, according to this embodiment, the weighting parameter time series serving as motion data can be used in common for various objects. In addition, by changing the settings of the interpolation object model, objects other than the mouth can be moved in synchronization with speech using the same motion data.

[0062] (1-5) Effects of the Embodiment
According to the above configuration, when a mouth shape animation is created by the multiple interpolation method, the weighting parameter time series is generated based on the pronunciation timing. This makes it easy to generate an expressive lip-sync animation in which the mouth moves in time with the spoken words and which does not depend on the object model.

[0063] (2) Second Embodiment
(2-1) Overall Configuration
In FIG. 43, reference numeral 10 denotes as a whole an animation creating apparatus according to the second embodiment, which, as in the first embodiment, uses pronunciation information to generate the weighting parameter time series automatically.

[0064] In the animation creating apparatus 10, the pronunciation information 12 is supplied to a pronunciation information separation unit 14, which performs text processing to separate it into pronunciation information for vowels and for consonants. These are output as vowel pronunciation information and consonant pronunciation information to motion information generation units 16 and 18, respectively.
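The separation step can be sketched as follows. This is an illustration under assumptions, not the patent's text processing: the pronunciation information is taken to be a list of (time, romanized sound) pairs, every sound ending in a, i, u, e or o is split into a consonant part and a vowel part, and the handling of special sounds is not modelled. The name `separate_pronunciation` is hypothetical.

```python
VOWELS = "aiueo"


def separate_pronunciation(events):
    """Split (time, sound) events into vowel events and consonant events."""
    vowel_events, consonant_events = [], []
    for time_sec, sound in events:
        if sound and sound[-1] in VOWELS:
            consonant, vowel = sound[:-1], sound[-1]
        else:
            consonant, vowel = sound, ""
        if consonant:
            consonant_events.append((time_sec, consonant))
        if vowel:
            vowel_events.append((time_sec, vowel))
    return vowel_events, consonant_events


# "ka-ki-ku-ke-ko" with hypothetical timings in seconds.
events = [(0.0, "ka"), (0.2, "ki"), (0.4, "ku"), (0.6, "ke"), (0.8, "ko")]
vowel_info, consonant_info = separate_pronunciation(events)
print(vowel_info)       # would go to motion information generation unit 16
print(consonant_info)   # would go to motion information generation unit 18
```

Both output streams keep the original timings, so the two motion information generation units can work independently and still stay aligned.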

[0065] The motion information generation units 16 and 18 generate parameter time series representing changes in mouth shape based on the vowel pronunciation information and the consonant pronunciation information, respectively. Specifically, as shown in FIG. 44, the motion information generation units 16 and 18 input the pronunciation information, consisting of the vowel pronunciation information and the consonant pronunciation information, to a pronunciation information input unit 20 and store it in a pronunciation information memory 22.

[0066] The motion information generation units 16 and 18 store preset mouth shape parameters in a mouth shape data memory 24. An arithmetic unit 26 sequentially reads out the pronunciation information stored in the pronunciation information memory 22 and, based on this information, sequentially selects the corresponding mouth shape parameters from those stored in the mouth shape data memory 24.

[0067] The arithmetic unit 26 then stores the selected mouth shape parameters in a motion information time series memory 28, choosing the storage address according to the pronunciation timing obtained from the pronunciation timing information. A motion information output unit 30 reads out the contents of the motion information time series memory 28 in address order and outputs them, so that a motion information time series consisting of the time series of mouth shape parameters is output to a motion information synthesis unit 32.
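The timing-addressed storage and address-order readout can be sketched as below. The mapping from time to address (rounding time multiplied by a frame rate), the shape table values, and the function names are assumptions made for illustration; the patent only states that the address is chosen according to the pronunciation timing.

```python
def build_motion_time_series(events, shape_table, frame_rate, n_frames):
    """Write the mouth shape parameter of each sound at its timing address."""
    memory = [None] * n_frames          # stands in for time series memory 28
    for time_sec, sound in events:
        address = round(time_sec * frame_rate)   # assumed timing-to-address rule
        if 0 <= address < n_frames:
            memory[address] = shape_table[sound]
    return memory


def read_in_address_order(memory):
    """Read the memory in address order, skipping unwritten addresses."""
    return [(address, shape) for address, shape in enumerate(memory)
            if shape is not None]


# Hypothetical mouth shape parameters (stand-in for mouth shape data memory 24).
SHAPES = {"a": (1.0, 0.0, 0.0), "i": (0.0, 1.0, 0.0), "u": (0.0, 0.0, 1.0)}
EVENTS = [(0.0, "a"), (0.2, "i"), (0.5, "u")]   # (time in seconds, sound)

memory = build_motion_time_series(EVENTS, SHAPES, frame_rate=10, n_frames=8)
for address, shape in read_in_address_order(memory):
    print(address, shape)
```

Because the timing is encoded in the address, the output unit needs no timing logic of its own; reading sequentially reproduces the time series.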

[0068] In this way the motion information generation units 16 and 18 apply the same technique as in the first embodiment and output motion information time series, each consisting of a time series of mouth shape parameters, as vowel motion information and consonant motion information.

[0069] The motion information synthesis unit 32 generates a single piece of motion information from the vowel motion information, the consonant motion information, and a synthesis balance parameter. Specifically, as shown in FIG. 45, the motion information synthesis unit 32 receives the vowel motion information and the consonant motion information via a motion information input unit 34 and the synthesis balance parameter via a synthesis balance parameter input unit 36, and processes them sequentially in an arithmetic unit 38.

[0070] The arithmetic unit 38 performs multiple interpolation processing at predetermined time intervals to generate motion information in the form of time series data, and outputs it via a motion information output unit 40 to an image generation device 42. The image generation device 42 creates a lip-sync animation based on this motion information and displays it on a monitor device 44.

[0071] The synthesis balance parameter is a weighting parameter for interpolating between the two mouth shape parameters, one for vowels and one for consonants. In this embodiment, using this synthesis balance parameter yields an expressive lip-sync animation.

[0072] (2-2) Synthesis Balance Parameter
As shown in FIG. 46, the synthesis balance parameter is generated by a synthesis balance parameter generation unit 46 based on the pronunciation information 12, the vowel pronunciation information, and the consonant pronunciation information. It stands in the same relation to the vowel motion information and the consonant motion information as the mixing weight parameters W6 and W7 (FIG. 39) of the first embodiment.

[0073] In this embodiment, the synthesis balance parameter is used as synthesis balance parameter time series W7 and W8, which keeps the ratio between vowel movement and consonant movement in an appropriate relationship and yields an expressive lip-sync animation.

[0074] Specifically, the synthesis balance parameter generation unit 46 generates the synthesis balance parameter so that its value is at a maximum at each pronunciation timing and decreases gradually with distance from that timing. This prevents the vowel movement and the consonant movement from interfering with each other in the animation creating apparatus 10, so that a realistic lip-sync animation can be created.
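One way to obtain a parameter that peaks at each pronunciation timing and falls off gradually is a triangular profile. The linear fall-off, the `half_width` parameter, and the function name below are assumptions chosen for simplicity; the patent specifies only the peak at the timing and the gradual decrease.

```python
def balance_parameter(n_frames, timing_frames, half_width):
    """Return one value per frame: 1.0 at each timing, falling linearly to 0.0."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    series = []
    for frame in range(n_frames):
        value = 0.0
        for timing in timing_frames:
            # Distance to this timing decides the contribution; keep the largest.
            value = max(value, 1.0 - abs(frame - timing) / half_width)
        series.append(value)
    return series


# Hypothetical example: pronunciation timings at frames 2 and 6.
series = balance_parameter(n_frames=9, timing_frames=[2, 6], half_width=4)
print(series)
```

Taking the maximum over the timings, rather than the sum, keeps the value within the range 0 to 1 even where the profiles of neighbouring sounds overlap.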

[0075] The synthesis balance parameter generation unit 46 also detects the pronunciation timings one after another to obtain the time interval between phonemes, and from this derives the speed of pronunciation.

[0076] Based on this speed information, when the pronunciation speed is high, the synthesis balance parameter generation unit 46 generates the synthesis balance parameter so that the movement of either the vowels or the consonants, or of both, becomes smaller. The animation creating apparatus 10 can thus create a realistic lip-sync animation that remains smooth even at high pronunciation speeds.
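The speed detection and the resulting reduction in movement might be sketched as follows. The speed measure (sounds per second from the mean interval), the reference speed, and the inverse-proportional gain are all assumptions introduced for illustration; the patent does not give a formula.

```python
def pronunciation_speed(timings):
    """Estimate speed in sounds per second from successive pronunciation timings."""
    if len(timings) < 2:
        raise ValueError("at least two timings are needed")
    intervals = [later - earlier for earlier, later in zip(timings, timings[1:])]
    return len(intervals) / sum(intervals)   # reciprocal of the mean interval


def speed_gain(speed, reference_speed):
    """Gain of 1.0 up to the reference speed, shrinking in inverse proportion above it."""
    if speed <= reference_speed:
        return 1.0
    return reference_speed / speed


# Hypothetical timings in seconds for slow and fast speech.
slow = [0.0, 0.5, 1.0, 1.5]
fast = [0.0, 0.125, 0.25, 0.375]

for name, timings in (("slow", slow), ("fast", fast)):
    speed = pronunciation_speed(timings)
    print(name, speed, speed_gain(speed, reference_speed=4.0))
```

The gain could be applied to the vowel balance series, the consonant balance series, or both, matching the three alternatives named in the paragraph.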

[0077] Furthermore, the synthesis balance parameter generation unit 46 holds phoneme-specific information in a predetermined memory circuit, storing as a weight value the degree of importance with which each consonant phoneme appears in the mouth shape. The animation creating apparatus 10 corrects the synthesis balance parameter based on this value and can thereby create a still more realistic lip-sync animation.
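The correction by consonant importance can be pictured as a table lookup followed by scaling. The table values, the default weight, and the function name are hypothetical; the patent states only that an importance value is stored per consonant phoneme and used to correct the parameter.

```python
# Hypothetical importance weights in [0, 1]; the patent gives no numbers.
# Bilabial consonants (m, b, p) close the lips, so they are weighted highest here.
CONSONANT_IMPORTANCE = {"m": 1.0, "b": 1.0, "p": 1.0, "w": 0.8, "k": 0.3, "h": 0.2}


def correct_balance(balance_series, consonant, table=CONSONANT_IMPORTANCE,
                    default=0.5):
    """Scale a consonant balance series by the stored importance of the consonant."""
    weight = table.get(consonant, default)
    return [value * weight for value in balance_series]


base = [0.5, 1.0, 0.5]   # hypothetical uncorrected balance values
print(correct_balance(base, "m"))   # lips must close: kept at full strength
print(correct_balance(base, "k"))   # little lip involvement: attenuated
```

A multiplicative correction leaves the timing of the peak unchanged and alters only how strongly the consonant shows in the mouth shape.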

[0078] (2-3) Effects of the Second Embodiment
According to the configuration of FIG. 43, the synthesis balance parameter is generated based on the pronunciation information, the vowel pronunciation information, and the consonant pronunciation information, and the ratio between vowel movement and consonant movement is thereby kept in an appropriate relationship, so that an expressive lip-sync animation can be created.

[0079] (3) Other Embodiments
In the second embodiment described above, the motion information generation units output a motion information time series as a time series of mouth shape parameters. The present invention is not limited to this: as shown in FIG. 47, a weighting parameter element memory 48 for storing weighting parameter elements and a weighting parameter time series data memory 50 for storing the weighting parameter time series generated from those elements may be added to the motion information generation unit, so that the weighting parameter time series is output directly from the motion information output unit 30.

[0080] In the embodiments described above, the pronunciation information is input from a speech synthesis system. The present invention is not limited to this; for example, pronunciation information may be detected from a natural voice and used instead.

[0081] In the embodiments described above, the weighting parameters are switched between slow and fast speech. The present invention is not limited to this, and the weighting parameters may be switched using other information in addition to the pronunciation timing information. The amplitude of the vowel weighting parameter time series is related to how widely the mouth opens, whereas the amplitude of the consonant weighting parameter time series is related to how exaggerated the mouth movement is. Therefore, for example, by detecting information such as the volume or tone of the voice and modulating the mixing weight parameters based on the detection result, an animation closer to the impression of the voice can be generated.

【００８２】[0082]

[Effects of the Invention] As described above, according to the present invention, an interpolation parameter time series of vowels and special sounds and an interpolation parameter time series of consonants are generated for preset mouth shapes based on pronunciation information; a weighting parameter time series is then generated with reference to the pronunciation timing given by the pronunciation information; and multiple interpolation of the mouth shapes is performed using the generated weighting parameter time series to create an animation whose mouth shape changes according to the pronunciation information. This realizes an image creating apparatus and an image creating method that exploit the advantages of multiple interpolation animation and can easily generate an expressive lip-sync animation that does not depend on the object model.

[Brief description of the drawings]

FIG. 1 is a schematic diagram showing the overall configuration of an animation creating apparatus according to an embodiment of the present invention.

FIG. 2 is a schematic diagram showing pronunciation information.

FIG. 3 is a schematic diagram showing mouth shape objects.

FIG. 4 is a characteristic curve diagram showing weighting parameter elements of vowels.

FIG. 5 is a characteristic curve diagram showing weighting parameter elements of consonants.

FIG. 6 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "k".

FIG. 7 is a characteristic curve diagram showing the weighting parameter element data when that sound is silenced.

FIG. 8 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "s".

FIG. 9 is a characteristic curve diagram showing the weighting parameter element data when that sound is silenced.

FIG. 10 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "sh".

FIG. 11 is a characteristic curve diagram showing the weighting parameter element data when that sound is silenced.

FIG. 12 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "t".

FIG. 13 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "ch".

FIG. 14 is a characteristic curve diagram showing the weighting parameter element data when that sound is silenced.

FIG. 15 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "ts".

FIG. 16 is a characteristic curve diagram showing the weighting parameter element data when that sound is silenced.

FIG. 17 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "n".

FIG. 18 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "h".

FIG. 19 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "f".

FIG. 20 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "m".

FIG. 21 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "y".

FIG. 22 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "r".

FIG. 23 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "w".

FIG. 24 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "g".

FIG. 25 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "q".

FIG. 26 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "z".

FIG. 27 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "j".

FIG. 28 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "d".

FIG. 29 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "b".

FIG. 30 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "p".

FIG. 31 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "ky".

FIG. 32 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "ny".

FIG. 33 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "hy".

FIG. 34 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "my".

FIG. 35 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "ry".

FIG. 36 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "gy".

FIG. 37 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "by".

FIG. 38 is a characteristic curve diagram showing weighting parameter element data for the sound represented by the phonetic symbol "x".

FIG. 39 is a schematic diagram used to explain the automatic generation of a lip-sync animation.

FIG. 40 is a characteristic curve diagram showing mouth movement when speaking slowly.

FIG. 41 is a characteristic curve diagram showing mouth movement when speaking quickly.

FIG. 42 is a schematic diagram showing actual mouth movements.

FIG. 43 is a block diagram showing an animation creating apparatus according to the second embodiment.

FIG. 44 is a block diagram showing a motion information generation unit.

FIG. 45 is a block diagram showing a motion information synthesis unit.

FIG. 46 is a schematic diagram used to explain the synthesis balance parameter.

FIG. 47 is a block diagram showing another embodiment of the motion information generation unit.

FIG. 48 is a schematic diagram used to explain animation creation by the multiple interpolation method.

[Explanation of symbols]

1, 10 ... animation creating apparatus; 2 ... speech rule synthesis unit; 4 ... weighting parameter generation unit; 6 ... MIMe animation generation unit; 12 ... pronunciation information; 14 ... pronunciation information separation unit; 16, 18 ... motion information generation units; 32 ... motion information synthesis unit.

Continuation of the front page

(56) References: JP-A-4-133182 (JP, A); JP-A-2-16681 (JP, A); JP-A-63-225875 (JP, A); "Synthesis of facial moving images with mouth shape changes corresponding to text information", IEICE Transactions, Vol. J75-D-II, No. 2, pp. 203-215

(58) Fields searched (Int. Cl.⁷, DB name): G06T 13/00; G06T 11/60; H04N 5/262; CSDB (Japan Patent Office)

Claims

(57) [Claims]

1. An image creating apparatus for creating animation data by a multiple interpolation method, comprising: weighting parameter time series generating means for generating, for preset mouth shapes and based on pronunciation information, an interpolation parameter time series of vowels and special sounds and an interpolation parameter time series of consonants, and then generating a weighting parameter time series with reference to the pronunciation timing given by the pronunciation information; and animation creating means for creating an animation whose mouth shape changes according to the pronunciation information by performing multiple interpolation of the mouth shapes using the generated weighting parameter time series.

2. The image creating apparatus according to claim 1, wherein the preset mouth shapes consist of a mouth shape pronouncing "a", a mouth shape pronouncing "i", and a mouth shape pronouncing "u".

3. The image creating apparatus according to claim 1, wherein the weighting parameter time series generating means generates the weighting parameter time series by synthesizing the interpolation parameter time series of the vowels and special sounds and the interpolation parameter time series of the consonants.

4. The image creating apparatus according to claim 3, wherein the weighting parameter time series generating means sets, for all vowels and consonants to be pronounced, element data of the interpolation parameters from which the interpolation parameter time series are generated.

5. The image creating apparatus according to claim 3, wherein the weighting parameter time series generating means generates the weighting parameter time series such that the strength with which a consonant appears in the mouth shape changes according to the type of consonant and/or the frequency of silencing given by the pronunciation information.

6. The image creating apparatus according to claim 3, wherein the weighting parameter time series generating means generates, based on the pronunciation information, vowel pronunciation information, and consonant pronunciation information, a weighting parameter time series whose value changes over time so as to hold the ratio between vowel movement and consonant movement at an appropriate value.

7. An image creating method for creating animation data by a multiple interpolation method, comprising: a first step of generating, for preset mouth shapes and based on pronunciation information, an interpolation parameter time series of vowels and special sounds and an interpolation parameter time series of consonants; a second step of generating a weighting parameter time series with reference to the pronunciation timing given by the pronunciation information; and a third step of creating an animation whose mouth shape changes according to the pronunciation information by performing multiple interpolation of the mouth shapes using the generated weighting parameter time series.

8. The image creating method according to claim 7, wherein the preset mouth shapes consist of a mouth shape pronouncing "a", a mouth shape pronouncing "i", and a mouth shape pronouncing "u".

9. The image creating method according to claim 7, wherein in the second step the weighting parameter time series is generated by synthesizing the interpolation parameter time series of the vowels and special sounds and the interpolation parameter time series of the consonants.

10. The image creating method according to claim 9, wherein in the first step, for all vowels and consonants to be pronounced, element data of the interpolation parameters from which the interpolation parameter time series are generated is set.

11. The image creating method according to claim 9, wherein in the second step the weighting parameter time series is generated such that the strength with which a consonant appears in the mouth shape changes according to the type of consonant and/or the frequency of silencing given by the pronunciation information.

12. The image creating method according to claim 9, wherein in the second step a weighting parameter time series whose value changes over time is generated, based on the pronunciation information, vowel pronunciation information, and consonant pronunciation information, so as to hold the ratio between vowel movement and consonant movement at an appropriate value.