JPH0236687A

JPH0236687A - Face moving picture synthesis system

Info

Publication number: JPH0236687A
Application number: JP63187702A
Authority: JP
Inventors: Eiji Morimatsu; 映史森松; Toshitaka Tsuda; 俊隆津田; Kiichi Matsuda; 松田　喜一
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-07-27
Filing date: 1988-07-27
Publication date: 1990-02-06
Anticipated expiration: 2012-10-27
Also published as: JP2667455B2

Abstract

PURPOSE: To synthesize a more natural moving picture of a face by deforming the mouth to match audio information sent from the sender side and, when the picture is reproduced, also displaying a mouth interior picture corresponding to the audio information. CONSTITUTION: During communication after the initializing operation, a two-dimensional mouth patch model is deformed in response to a mouth code corresponding to the input audio information, and a changeover means 5 is switched in response to the mouth code. Mouth interior picture data input via the changeover means 5 is fitted to a two-dimensional mouth interior patch model, which represents the shape of the mouth interior as a set of patches, and the result is output as a mouth interior picture corresponding to the mouth code. A picture synthesis means 6 then synthesizes the picture information from a still face picture data storage means 1, a two-dimensional mouth patch model deformation means 2 and a two-dimensional mouth interior patch model deformation means 4. Thus, a more natural moving picture of a face is synthesized and displayed in response to the audio information sent during communication.

Description

[Detailed Description of the Invention]

[Table of Contents] Overview; Industrial Field of Application; Prior Art (Fig. 12); Problems to Be Solved by the Invention; Means for Solving the Problems (Fig. 1); Operation (Fig. 1); Embodiment (Figs. 2 to 11); Effects of the Invention

[Overview] The invention relates to a face moving image synthesis system in which the receiving side synthesizes and displays a moving image of a face according to audio information transmitted during communication, using a small amount of initialization data transmitted at initialization. Its object is to synthesize a more natural moving image of the face: when the receiving side deforms the mouth portion to match the audio information sent from the transmitting side and reproduces the image, it can also display a mouth interior image corresponding to that audio information. To this end, the receiving side is provided with: means for storing one frame of still face image data sent at initialization; means for deforming a two-dimensional mouth-shape patch model according to a mouth shape code corresponding to the input audio information; means for storing mouth interior image data, cut out from the oral cavity portion of mouth images, for every mouth shape code; means for fitting the mouth interior image data selected according to the mouth shape code to a two-dimensional mouth-interior patch model and outputting the result as a mouth interior image corresponding to the mouth shape code; and image synthesizing means.

[Industrial Field of Application] The present invention relates to a face moving image synthesis system in which, using a small amount of initialization data transmitted at initialization, the receiving side synthesizes and displays a moving image of a face according to audio information transmitted during communication.

For videophones, video conferencing and the like, the ultimate goal is a transmission scheme that uses public telephone lines, so the image information obtained must be compressed as far as possible.

[Prior Art] The image transmitted in a videophone or similar system is normally a moving image of a person. Conventionally, as shown in Fig. 12, this moving image information is transmitted independently of the audio information.

Specifically, on the transmitting side a TV camera 61 generates the input image as an analog image signal; an image encoding device 62 converts this signal to a digital signal, encodes and compresses it, and sends it to the receiving side. On the receiving side, an image decoding device 63 decodes the received image back into the original signal and displays it on a display 64 as the output image.

The input speech is picked up as audio information by a microphone 65 on the transmitting side, compressed by a speech encoding device 66 using speech-specific coding, decoded on the receiving side by a speech decoding device 67, and output as speech from a speaker 68.

However, because a moving image carries a large amount of information, this conventional moving image transmission scheme cannot use low-bit-rate communication lines. It is therefore costly and falls far short of what videophones over public telephone lines require.

One conceivable approach is for the transmitting side to send, for example, still image information of the face in advance, and for the receiving side to reproduce the image by deforming the mouth and other portions to match the audio information sent from the transmitting side.

[Problems to Be Solved by the Invention] With such an approach, however, the inside of the mouth (teeth, tongue and so on) is not displayed when a face image with an open mouth is synthesized, so the resulting moving image of the face still looks unnatural.

The present invention aims to solve this problem. Its object is to provide a face moving image synthesis system in which the receiving side synthesizes and displays a moving image of a face according to audio information transmitted during communication, using a small amount of initialization data transmitted at initialization, and in which, when the receiving side deforms the mouth portion to match the audio information sent from the transmitting side and reproduces the image, a mouth interior image corresponding to the audio information can also be displayed, so that a more natural moving image of the face can be synthesized.

[Means for Solving the Problems] Fig. 1 is a block diagram showing the principle of the present invention.

In Fig. 1, reference numeral 1 denotes still face image data storage means, which stores one frame of still face image data sent at initialization.

Reference numeral 2 denotes two-dimensional mouth-shape patch model deformation means. It deforms a two-dimensional mouth-shape patch model, which represents the shape of the mouth and its surroundings as a set of patches, according to the mouth shape code corresponding to the input audio information.

Reference numeral 3 denotes mouth interior image data storage means. It stores mouth interior image data, obtained by cutting out the oral cavity portion of mouth images, for every mouth shape code, and for this purpose has a plurality (M, a natural number) of mouth interior image memories 3-1 to 3-M.

Reference numeral 4 denotes two-dimensional mouth-interior patch model deformation means. It receives, via switching means 5 switched according to the mouth shape code, the mouth interior image data selected from the mouth interior image data storage means 3, fits that data to a two-dimensional mouth-interior patch model that represents the shape of the mouth interior as a set of patches, and outputs the result as a mouth interior image corresponding to the mouth shape code.

Reference numeral 5 denotes switching means. According to the mouth shape code, it switches so as to connect the mouth interior image memory 3-i (i = 1, 2, ..., M) corresponding to that code to the two-dimensional mouth-interior patch model deformation means 4.

Reference numeral 6 denotes image synthesizing means, which combines the image information from the still face image data storage means 1, the two-dimensional mouth-shape patch model deformation means 2 and the two-dimensional mouth-interior patch model deformation means 4.

[Operation] With the above configuration, still face image data is first sent from the transmitting side to the still face image data storage means 1 at initialization. The still face image data in the still face image data storage means 1 is then mapped to the data of the two-dimensional mouth-shape patch model in the deformation means 2, so that the model matches the still face image. In addition, for every mouth shape code, the mouth interior image data in the mouth interior image data storage means 3 is mapped to the two-dimensional mouth-interior patch model of the deformation means 4, so that each mouth interior image matches that model.

Communication takes place after these initialization steps. During communication, the two-dimensional mouth-shape patch model is deformed according to the mouth shape code corresponding to the input audio information, and the switching means 5 switches according to the same code. The mouth interior image data input via the switching means 5 is fitted to the two-dimensional mouth-interior patch model, which represents the shape of the mouth interior as a set of patches, and is output as a mouth interior image corresponding to the mouth shape code. In other words, for each mouth shape code the mouth interior image is reproduced from the mapping data of the corresponding mouth interior image.

The image synthesizing means 6 then combines the image information from the still face image data storage means 1, the two-dimensional mouth-shape patch model deformation means 2 and the two-dimensional mouth-interior patch model deformation means 4.

As a result, a moving image of the face is synthesized and displayed on the receiving side according to the audio information transmitted during communication.
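The per-frame flow on the receiving side described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the class name `FaceReceiver`, its field names and the use of nested lists as images are all hypothetical, and the three callables merely stand in for means 2, 4 and 6.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[int]]  # tiny grayscale image as nested lists (illustrative only)


@dataclass
class FaceReceiver:
    still_face: Image                                 # plays the role of storage means 1
    interior_memories: Dict[int, Image]               # memories 3-1 ... 3-M, keyed by mouth shape code
    deform_mouth: Callable[[int], Image]              # stand-in for deformation means 2
    fit_interior: Callable[[Image, int], Image]       # stand-in for deformation means 4
    compose: Callable[[Image, Image, Image], Image]   # stand-in for synthesizing means 6

    def frame_for(self, mouth_code: int) -> Image:
        """Build one output frame for the given mouth shape code."""
        exterior = self.deform_mouth(mouth_code)
        # Switching means 5: the mouth shape code selects one stored interior image.
        selected = self.interior_memories[mouth_code]
        interior = self.fit_interior(selected, mouth_code)
        return self.compose(self.still_face, exterior, interior)
```

Only the mouth shape code changes from frame to frame in this sketch; the still face and the interior memories are filled once at initialization, which is what keeps the transmitted data small.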

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

Fig. 2 is a block diagram showing one embodiment of the present invention. This embodiment has a transmitting section 10 and a receiving section 20. The transmitting section 10 includes an image processing section 11, which processes the input face image, and a speech encoding section 12, which encodes the input speech.

The receiving section 20 has a background image memory (still face image data storage means) 19, a speech decoding section 21, a speech recognition section 22, a mouth shape model deformation section 36, a control point coordinate memory (table) 23, an eyelid shape model deformation section 24, a synthesis section 25, an interpolation point calculation section 27, a random pulse generation section 28 and a coordinate table control section 29.

The background image memory 19 stores one frame of still image data of the face image (for example, face image data with the mouth closed) sent from the transmitting side at initialization.

The speech decoding section 21 decodes the speech code encoded by the transmitting section 10, and the speech recognition section 22 performs speech recognition on the speech signal output from the speech decoding section 21.

The mouth shape model deformation section 36 deforms the mouth shape according to the mouth shape code corresponding to the audio information input through the speech recognition section 22. As shown in Fig. 3, it comprises a codebook 361, a mouth exterior model deformation section 362, a mouth interior image data storage section 363, a mouth interior model deformation section 364, a changeover switch section 365 and a composite image memory 366.

The codebook 361 successively selects one set of mouth shape parameter values for each phoneme code (a code for a vowel, consonant or other basic unit of speech) output one after another from the speech recognition section 22. As shown in Fig. 4, the codebook 361 holds, as personal information prepared in advance, a table that quantifies the shape of a specific speaker's mouth when uttering each phoneme (i, n, ..., m) as parameters I (for example, the width of the mouth), II (for example, the thickness of the lips), ..., n (for example, the height of the mouth).
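A codebook of this kind can be pictured as a lookup table keyed by phoneme code. The sketch below is illustrative only: the parameter names and every numeric value are invented, since the actual contents of Fig. 4 are speaker-specific and are not reproduced in the text.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical per-speaker table: phoneme code -> one set of mouth shape parameters.
# The parameter names follow the examples given in the text; the values are made up.
CODEBOOK: Dict[str, Dict[str, float]] = {
    "i": {"mouth_width": 46.0, "lip_thickness": 6.0, "mouth_height": 8.0},
    "n": {"mouth_width": 38.0, "lip_thickness": 7.0, "mouth_height": 4.0},
    "m": {"mouth_width": 36.0, "lip_thickness": 8.0, "mouth_height": 0.0},
}


def select_parameters(phoneme_codes: Iterable[str]) -> Iterator[Dict[str, float]]:
    """Yield one parameter set per incoming phoneme code, in arrival order."""
    for code in phoneme_codes:
        yield CODEBOOK[code]
```

Because the table is prepared per speaker before the call, only the phoneme stream recovered from the audio is needed at run time.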

The mouth exterior model deformation section 362 deforms the mouth exterior model image according to the set of mouth shape parameter values successively selected by the codebook 361. As shown in Fig. 7, the two-dimensional mouth-shape patch model represents the shape of the mouth and its surroundings as a set of patches (26 in this example, R1 to R26), and the model is deformed by using the vertices of the patches R1 to R26 as control points. Mouth images for the phonemes i, n and m mentioned above, for example, are shown schematically in Fig. 6 (a), (b) and (c).
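How a parameter set moves the control points is not spelled out in this passage. As one possible reading, the sketch below scales a normalized outline by a width and a height parameter. The template, the function name and the scaling rule are all assumptions made for illustration; a real model would move the vertices of all 26 patches.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical normalized outline: left corner, top, right corner, bottom.
MOUTH_TEMPLATE: List[Point] = [(-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0)]


def deform_outline(template: List[Point], center: Point,
                   mouth_width: float, mouth_height: float) -> List[Point]:
    """Place each normalized control point using the selected width and height."""
    cx, cy = center
    return [(cx + u * mouth_width / 2, cy + v * mouth_height / 2)
            for u, v in template]
```

For example, `deform_outline(MOUTH_TEMPLATE, (50, 60), 40, 20)` puts the mouth corners 40 pixels apart and the top and bottom control points 20 pixels apart, centered on (50, 60).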

The codebook 361 and the mouth exterior model deformation section 362 thus constitute the two-dimensional mouth-shape patch model deformation means, which deforms the two-dimensional mouth-shape patch model according to the mouth shape code corresponding to the input audio information.

At initialization, the mouth exterior model deformation section 362 receives, via the background image memory 19, one screen (one frame) of mouth image data of the specific speaker as personal information, maps it onto the patch model that forms the framework of the geometric shape of the mouth, and stores the result as the mouth shape model image. Even when one screen of the mouth portion image is sent first from the transmitting section 10 in this way, the codebook 361 must still be prepared in advance.

The mouth interior image data storage section 363 stores mouth interior image data, obtained by cutting out the oral cavity portion (teeth, tongue and so on) of mouth images, for every mouth shape code (M in total), and for this purpose has a plurality (M) of mouth interior image memories 363-1 to 363-M. Examples of the mouth interior images for mouth shape codes 1, 2, ..., M are shown in the lower part of Fig. 9.

The mouth interior model deformation section 364 receives, via the changeover switch section 365 switched according to the mouth shape code i (i = 1, 2, ..., M), the mouth interior image data selected from the mouth interior image data storage section 363. It fits that data to the two-dimensional mouth-interior patch model, which represents the shape of the mouth interior as a set of patches (8 in this example, S1 to S8) as shown in Fig. 8, and outputs the result as a mouth interior image corresponding to the mouth shape code. This process is shown schematically in Fig. 9.
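Fitting stored image data to a deformed patch amounts to a per-patch coordinate mapping. The sketch below assumes triangular patches (the text only says "patches" for the mouth interior) and uses barycentric coordinates to find, for a point in the deformed patch, the corresponding point in the stored image. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def barycentric(p: Point, tri: Triangle) -> Tuple[float, float, float]:
    """Return the barycentric weights of p for a non-degenerate triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w_a = ((by - cy) * (p[0] - cx) + (cx - bx) * (p[1] - cy)) / det
    w_b = ((cy - ay) * (p[0] - cx) + (ax - cx) * (p[1] - cy)) / det
    return w_a, w_b, 1.0 - w_a - w_b


def map_to_source(p: Point, deformed: Triangle, source: Triangle) -> Point:
    """Map a point in the deformed patch to the matching point in the stored patch."""
    w_a, w_b, w_c = barycentric(p, deformed)
    x = w_a * source[0][0] + w_b * source[1][0] + w_c * source[2][0]
    y = w_a * source[0][1] + w_b * source[1][1] + w_c * source[2][1]
    return (x, y)
```

Sampling the stored image at `map_to_source(p, ...)` for every pixel `p` of the deformed patch stretches the stored teeth and tongue texture to the current mouth opening.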

According to the mouth shape code i, the changeover switch section 365 switches so as to connect the mouth interior image memory 363-i corresponding to that code to the mouth interior model deformation section 364.

The composite image memory 366 combines the image information from the mouth exterior model deformation section 362 and the mouth interior model deformation section 364.

In this embodiment too, at initialization the still face image data is mapped to the data of the two-dimensional mouth-shape patch model (see Fig. 7) so that the model matches the still face image. In addition, for every mouth shape code, the mouth interior image data is mapped to the two-dimensional mouth-interior patch model (see Fig. 8) so that each mouth interior image matches that model.

The interpolation point calculation section 27 shown in Fig. 2 receives, at initialization, the coordinate data of all vertices P1 to P8 of the eyelid shape model (see Fig. 10) corresponding to the still image data. It computes by linear interpolation the coordinates of the control points P2, P3 and P4 at each frame from the start to the end of a blink, and sends the data to the control point coordinate memory 23.

As shown in Fig. 10, the eyelid shape model consists of eight vertices P1 to P8 (each with two-dimensional x, y coordinates) and six triangular patches T1 to T6 formed by connecting these vertices. To synthesize the blinking motion, P2, P3 and P4 are treated as control points (points whose x, y coordinates change), and the other five points are fixed.

At initialization, the interpolation point calculation section 27 is given, in addition to the coordinates of the eight vertices P1 to P8, the coordinates of three points P2', P3' and P4', which indicate the lowest positions of P2, P3 and P4. Using the number of frames per blink N, which is given in advance, it linearly interpolates each of the paths P2→P2'→P2, P3→P3'→P3 and P4→P4'→P4.
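The interpolation over one blink can be sketched as below. The frame indexing is an assumption: frames run from 0 to N, the control point reaches its lowest position at frame N/2, and N is taken to be even. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def blink_track(p_open: Point, p_lowest: Point, n_frames: int) -> List[Point]:
    """Linearly interpolate one control point along open -> lowest -> open."""
    half = n_frames / 2
    track: List[Point] = []
    for k in range(n_frames + 1):
        # t rises from 0 to 1 during the first half, then falls back to 0.
        t = k / half if k <= half else (n_frames - k) / half
        x = p_open[0] + t * (p_lowest[0] - p_open[0])
        y = p_open[1] + t * (p_lowest[1] - p_open[1])
        track.append((x, y))
    return track
```

Running this once for each of the three control points and storing the results row by row yields a per-frame coordinate table for the whole blink.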

The control point coordinate memory 23 stores the blinking motion of the eyelid on the basis of the eyelid parameters of the eyelid shape model image. Specifically, the coordinates of the three control points P2, P3 and P4 at each frame from the start to the end of a blink, as computed by the interpolation point calculation section 27, are held in table form in a storage area of the control point coordinate memory 23. An example of the structure of this control point coordinate table is shown in Fig. 5.

The random pulse generation section 28 generates a blink signal (a random pulse signal).

From the moment it receives a blink start signal from the random pulse generation section 28, the coordinate table control section 29 sequentially reads out all the vertex data in the coordinate table of the control point coordinate memory 23 and transfers it, frame by frame, to the eyelid shape model deformation section 24.
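The readout behavior can be sketched as a small state machine. The class and method names are hypothetical, the table is assumed to be non-empty, and ignoring a start signal that arrives while a blink is already in progress is an assumption, since the text does not say what happens in that case.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Tuple[float, float]]  # control point coordinates for one frame


class CoordinateTableController:
    def __init__(self, table: List[Row]) -> None:
        self.table = table
        self.cursor: Optional[int] = None  # None means no blink is in progress

    def on_blink_start(self) -> None:
        """Begin reading the table from its first row."""
        if self.cursor is None:  # assumption: pulses during a blink are ignored
            self.cursor = 0

    def next_frame(self) -> Optional[Row]:
        """Return the row for this frame, or None when the eyelid is at rest."""
        if self.cursor is None:
            return None
        row = self.table[self.cursor]
        self.cursor += 1
        if self.cursor == len(self.table):
            self.cursor = None  # the blink has finished
        return row
```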

The eyelid shape model deformation section 24 stores an eyelid shape model image defined by eyelid parameters that indicate the geometric shape of the eyelid portion of the face. It takes the eyelid parameters from the control point coordinate memory 23 and deforms the eyelid shape model image accordingly. Specifically, under the control of the coordinate table control section 29, it takes in the eyelid parameters sent sequentially from the control point coordinate memory 23 and deforms the eyelid shape model image on the basis of them. The deformation of the eyelid shape model image is shown schematically in Fig. 11 (a) to (c).

The synthesis section 25 combines the mouth image generated by the mouth shape model deformation section 36 (a composite of the mouth exterior image, which includes the area around the mouth, and the mouth interior image, which includes the teeth, gums and so on) and the eyelid image generated by the eyelid shape model deformation section 24 with the parts of the still face image stored in the background image memory 19 other than the mouth and eyelid portions.

Next, the operation of this embodiment is described.

The input speech is encoded by the speech encoding section 12 and transmitted to the receiving section 20, where the speech decoding section 21 decodes the speech code and outputs it as speech.

This speech output is also sent to the speech recognition section 22, where its phoneme codes are extracted one after another and sent to the codebook 361. On the basis of each input phoneme code, the codebook 361 selects from the table shown in Fig. 4 the corresponding set of mouth shape parameter values I, II, ..., n.

Using the selected set of parameter values, the mouth exterior model deformation section 362 generates a mouth periphery image by deforming the mouth exterior model image stored in advance. The resulting correspondence between the generated mouth images and the phonemes extracted by the speech recognition section 22 is, for example, as shown in Fig. 6 (a), (b) and (c).

The changeover switch section 365 switches according to the mouth shape code i, and the mouth interior image data input via the changeover switch section 365 is fitted to the two-dimensional mouth-interior patch model and output as a mouth interior image corresponding to the mouth shape code i. In other words, for each mouth shape code the mouth interior image is reproduced from the mapping data of the corresponding mouth interior image. This process is shown schematically in Fig. 9.

The composite image memory 366 then combines the image information from the mouth exterior model deformation section 362 and the mouth interior model deformation section 364 to produce the mouth image.

At initialization, the still face image data is mapped to the data of the two-dimensional mouth-shape patch model shown in Fig. 7 so that the model matches the still face image. In addition, for every mouth shape code, the mouth interior image data is mapped to the two-dimensional mouth-interior patch model (see Fig. 9) so that each mouth interior image matches that model.

Meanwhile, the random pulse generation section 28 emits blink start signals at random time intervals.
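A blink schedule with random intervals could be generated along the following lines. The text says only that the intervals are random, so the exponential distribution, the `mean_gap` parameter and the function name are all assumptions chosen for illustration.

```python
import random
from typing import List, Optional


def blink_start_frames(total_frames: int, frames_per_blink: int,
                       mean_gap: float, seed: Optional[int] = None) -> List[int]:
    """Return the frame indices at which blink start pulses are emitted."""
    rng = random.Random(seed)

    def draw_gap() -> int:
        # Assumed model: exponentially distributed pause, at least one frame long.
        return max(1, round(rng.expovariate(1.0 / mean_gap)))

    starts: List[int] = []
    frame = draw_gap()
    while frame < total_frames:
        starts.append(frame)
        # The next pulse comes only after this blink has finished plus a pause.
        frame += frames_per_blink + draw_gap()
    return starts
```

Spacing consecutive pulses by at least the blink length means a new blink never starts before the previous one has finished.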

When the random pulse generation section 28 outputs this pulse train signal, the coordinate table control section 29, from the moment it receives a blink start signal, reads out all the vertex data in the coordinate table of the control point coordinate memory 23 and transfers it to the eyelid shape model deformation section 24 frame by frame. The transfer ends when the number of frames per blink has elapsed since the blink start signal was generated. The eyelid shape model deformation section 24 then generates an eyelid image by deforming the eyelid shape model image stored in advance according to this vertex data.

The mouth image (a composite of the mouth exterior image and the mouth interior image) and the eyelid image generated by deformation in this way are then combined in the synthesis section 25 with the parts of the still face image stored in the background image memory 19 other than the mouth and eyelids, and output as a moving image of the whole face.
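The final composition step can be sketched as pasting the generated regions over a copy of the stored still image. The representation is an assumption: images are nested lists, `None` marks a transparent pixel, and each layer carries its own top-left position. The layering order is likewise an illustrative choice that the text does not specify.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Optional[int]            # None marks a transparent pixel
Image = List[List[Pixel]]
Layer = Tuple[int, int, Image]   # (top, left, image)


def composite(background: Image, layers: Sequence[Layer]) -> Image:
    """Paste each layer over a copy of the background, later layers on top."""
    frame = [row[:] for row in background]
    for top, left, image in layers:
        for dy, row in enumerate(image):
            for dx, pixel in enumerate(row):
                if pixel is not None:
                    frame[top + dy][left + dx] = pixel
    return frame
```

The same routine could also serve for the mouth image itself, by pasting the mouth interior first and the mouth exterior, with transparent pixels where the mouth is open, on top of it.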

Because the moving image information can thus be compressed further, the amount of information is greatly reduced, which makes an inexpensive image transmission scheme over low-bit-rate lines feasible. In addition, the eyelids of the reproduced image blink at suitable moments, and when the mouth portion is deformed to match the input audio information, a mouth interior image corresponding to that information can also be displayed, so a more natural moving image of the face can be synthesized.

The methods used to deform the mouth shape model image in the mouth shape model deformation section 36 and the eyelid shape model image in the eyelid shape model deformation section 24 are described in IEICE Technical Report IE87-2, vol. 87, no. 19, 1987.

[Effects of the Invention] As described in detail above, the face moving image synthesis system of the present invention can compress the moving image information further and so greatly reduce the amount of information, which makes an inexpensive image transmission scheme over low-bit-rate lines feasible. Moreover, when the receiving side deforms the mouth portion to match the audio information sent from the transmitting side and reproduces the image, it can also display a mouth interior image corresponding to the audio information, so a more natural moving image of the face can be synthesized.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of the principle of the present invention. Fig. 2 is a block diagram showing an embodiment of the present invention. Fig. 3 is a block diagram of the mouth shape model transformation section. Fig. 4 is a configuration diagram of the codebook. Fig. 5 is a configuration diagram of the control point coordinate table. Figs. 6(a), (b), and (c) are diagrams showing mouth images corresponding to phoneme codes. Fig. 7 is a diagram showing the mouth shape two-dimensional patch model. Fig. 8 is a diagram showing the internal mouth two-dimensional patch model. Fig. 9 is a schematic diagram showing the method of generating an internal mouth image. Fig. 10 is a diagram showing the configuration of the shape model of the pupil region. Figs. 11(a), (b), and (c) are diagrams explaining the concept of deformation of the shadow model image. Fig. 12 is a system diagram showing a conventional general image transmission scheme.

In the figures, 1 is a still face image data storage means, 2 is a mouth shape two-dimensional patch model transformation means, 3 is an internal mouth image data storage means, 3-1 to 3-M are internal mouth image memories, 4 is an internal mouth two-dimensional patch model transformation means, 5 is a switching means, and 6 is an image synthesis means. 10 is a transmitting section, 11 is an image processing section, 12 is an audio encoding section, 19 is a background image memory, 20 is a receiving section, 21 is an audio decoding section, 22 is a speech recognition section, 23 is a control point coordinate memory (table), 24 is an eyelid model transformation section, 25 is a synthesis section, 27 is an interpolation point calculation section, 28 is a random pulse generation section, 29 is a coordinate table control section, and 36 is a mouth shape model transformation section. 361 is a codebook, 362 is a mouth external model transformation section, 363 is an internal mouth image data storage section, 363-1 to 363-M are internal mouth image memories, 364 is an internal mouth model transformation section, 365 is a changeover switch section, and 366 is a composite image memory.

Claims

[Claims]

(1) A face moving image synthesis system that uses a small amount of initialization data transmitted at initialization to synthesize and display a moving image of a face on a receiving side according to audio information transmitted during communication, characterized in that the receiving side comprises:
still face image data storage means (1, 19) for storing one frame of still face image data sent at initialization;
mouth shape two-dimensional patch model transformation means (2, 361, 362) for transforming a mouth shape two-dimensional patch model, in which the shape of the mouth region including the mouth is represented as a set of patches, according to a mouth shape code corresponding to the input audio information;
internal mouth image data storage means (3, 363) for storing as many items of internal mouth image data, each obtained by cutting out the interior of the oral cavity from a mouth image, as there are mouth shape codes;
internal mouth two-dimensional patch model transformation means (4, 364) for assigning the internal mouth image data from the internal mouth image data storage means (3, 363) to an internal mouth two-dimensional patch model, in which the shape of the inside of the mouth is represented as a set of patches, and outputting an internal mouth image according to the mouth shape code; and
image synthesis means (6, 366) for combining image information from the still face image data storage means (1, 19), the mouth shape two-dimensional patch model transformation means (2, 362), and the internal mouth two-dimensional patch model transformation means (4, 364).

(2) The face moving image synthesis system according to claim 1, characterized in that, at initialization, the still face image data and the data of the mouth shape two-dimensional patch model are mapped to each other so that the mouth shape two-dimensional patch model matches the still face image, and the internal mouth two-dimensional patch model and the internal mouth image data are mapped to each other for all mouth shape codes so that the internal mouth two-dimensional patch model matches the internal mouth image for every mouth shape code.
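
The receiver-side data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the patch-model deformation of means (2) and (4) is replaced by a plain rectangular paste, images are simple lists of pixel rows, and every name (`Receiver`, `mouth_origin`, `select_interior`, `synthesize`) is hypothetical. It shows only that one still face frame is stored at initialization, that one internal mouth image is stored per mouth shape code, and that each output frame is composed by selecting the interior image for the current code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


@dataclass
class Receiver:
    # Corresponds to storage means (1): one still face frame sent at initialization.
    still_face: Image
    # Corresponds to storage means (3): one internal mouth image per mouth shape code.
    mouth_interiors: Dict[int, Image]
    # Hypothetical (row, col) of the top-left corner where the interior is pasted.
    mouth_origin: Tuple[int, int]

    def select_interior(self, mouth_code: int) -> Image:
        """Plays the role of switching means (5): pick the image for this code."""
        return self.mouth_interiors[mouth_code]

    def synthesize(self, mouth_code: int) -> Image:
        """Plays the role of synthesis means (6), with warping omitted."""
        # Copy the stored frame so the initialization data is never modified.
        frame = [row[:] for row in self.still_face]
        interior = self.select_interior(mouth_code)
        r0, c0 = self.mouth_origin
        # Simplification: paste the interior as a rectangle instead of
        # texture-mapping it onto a deformed two-dimensional patch model.
        for r, row in enumerate(interior):
            for c, pixel in enumerate(row):
                frame[r0 + r][c0 + c] = pixel
        return frame

    def synthesize_sequence(self, mouth_codes: List[int]) -> List[Image]:
        """One output frame per mouth shape code derived from the audio."""
        return [self.synthesize(code) for code in mouth_codes]
```

In this sketch only the sequence of mouth shape codes changes from frame to frame, which mirrors the reason the description gives for the reduced amount of transmitted information.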