JPH05316491A

JPH05316491A - Portrait image encoding system

Info

Publication number: JPH05316491A
Application number: JP11445292A
Authority: JP
Inventors: Kazuya Horii (堀井和哉)
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1992-05-07
Filing date: 1992-05-07
Publication date: 1993-11-26

Abstract

PURPOSE: To obtain a portrait image encoding system whose processing is simple and whose data compression efficiency is high. CONSTITUTION: During encoding, an original portrait image 101 undergoes expression/motion analysis 102 and parameter conversion 103, which yield expression/motion parameters 107 for three-dimensional models 104a-c. The three-dimensional models consist of plural models that differ in resolution and in represented region, and they are switched adaptively according to the motion amount 106 of the whole head or of the face. During decoding, one of the three-dimensional models 104d-f is deformed (model deformation 109) according to the expression/motion parameters 107, and a decoded image 112 is obtained through luminance/color addition 110 based on an initial image 111.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] This invention relates to an image coding method applicable to face image data compression intended for image communication, image storage, and the like, or, if only the decoding process is considered, to face image synthesis intended for computer graphics and the like.

[0002]

[Prior Art] In recent years, image coding methods for image data compression have been actively developed and put into practical use, aimed at narrow-band transmission of images for videophones, video conferencing, and the like, or at storage media. Many coding methods exist. Transform coding, for example, divides an image into small blocks and removes redundancy by transforming the pixels in each block onto mutually uncorrelated axes; vector quantization maps the vector formed by the pixels in a block to a representative vector and uses the index of that representative vector as the coded data. Broadly speaking, however, conventional coding methods typified by transform coding and vector quantization belong to the class of waveform coding, which aims to faithfully reproduce the spatial variation itself of luminance, color, or color difference in an image, and their coding efficiency is limited. For example, when moving images are sent over a 64 kbps transmission line (bps: bits per second, the number of bits that can be sent in one second), waveform coding delivers motion and image quality that are unsatisfactory compared with ordinary television broadcasting. For this reason, an image coding method called analysis-synthesis coding, which departs fundamentally from the conventional approach to data compression and aims at both high image quality and high efficiency, is being studied. Analysis-synthesis coding has begun to attract attention as a way to exceed the limits of waveform coding, and research on analysis-synthesis coding of human face images in particular is being actively pursued with a view to videophones, video conferencing, and the like. In analysis-synthesis coding of face images, the encoding side and the decoding side hold the same three-dimensional model of the face. The encoding side analyzes the face image data, given as a set of pixel values, extracts deformation information for the three-dimensional model, and uses it as the coded data. The decoding side deforms its own copy of the three-dimensional model on the basis of the coded data, that is, the deformation information for the three-dimensional model, adds luminance and color data, and reproduces the face image.
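The 64 kbps figure can be made concrete with a little arithmetic. The sketch below is an illustration only: the QCIF frame size (176 x 144 luminance samples, 8 bits per sample, 4:2:0 chroma) and the rate of 30 frames per second are assumptions chosen for the example, not values given in this document. It estimates the bit budget per frame and the compression ratio a waveform coder would then have to achieve.

```python
# Illustrative bit-budget arithmetic for a 64 kbps channel.
# Frame format and frame rate are assumptions, not taken from the patent.

CHANNEL_BPS = 64_000      # 64 kbps transmission line (from the text)
FRAME_RATE = 30           # assumed: 30 frames/s, as in ordinary television
WIDTH, HEIGHT = 176, 144  # assumed: QCIF luminance resolution
BITS_PER_SAMPLE = 8
CHROMA_FACTOR = 1.5       # 4:2:0 sampling: two chroma planes at 1/4 size each


def bits_per_frame_budget(channel_bps: int, fps: int) -> float:
    """Bits available for each frame if the channel is shared evenly."""
    return channel_bps / fps


def raw_frame_bits(width: int, height: int) -> int:
    """Size of one uncompressed 4:2:0 frame in bits."""
    return int(width * height * CHROMA_FACTOR * BITS_PER_SAMPLE)


budget = bits_per_frame_budget(CHANNEL_BPS, FRAME_RATE)
raw = raw_frame_bits(WIDTH, HEIGHT)
print(f"budget per frame: {budget:.0f} bits")
print(f"raw frame: {raw} bits")
print(f"required compression ratio: {raw / budget:.0f}:1")
```

Under these assumptions each frame may use only about 2,100 bits, roughly a 140:1 reduction from the raw frame, which illustrates why waveform coding struggles at this rate.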

[0003] The concept of this kind of analysis-synthesis coding for face images has previously been presented, for example, in Aizawa, Harashima, and Saito, "Analysis-synthesis image coding using a structural model," IEICE Transactions B-I, Vol. J72-B-I, No. 3 (1989). FIG. 4 reproduces a figure from that reference, modified to make comparison with the present invention easier. In the figure, the original face image 101 is a human face image input from an image input device such as a camera or scanner. Expression/motion analysis 102 takes the original face image 101 as input and analyzes the facial expression and the motion of the whole head. Parameter conversion 103 extracts, from the output of expression/motion analysis 102, expression/motion parameters 107 relative to the three-dimensional model 104a and outputs them as coded data. On the decoding side, the three-dimensional model 104d is first deformed in model deformation 109 on the basis of the received coded data, that is, the expression/motion parameters 107. Then, in luminance/color addition 110, luminance and color are applied pixel by pixel to the deformed model image on the basis of an initial image 111 received in advance, producing the decoded image 112 for display.

[0004] Next, the operation will be described.

[0005] In FIG. 4, the original human face image 101 input from an image input device or the like is a so-called bitmap image: an uncompressed digital image holding a pixel value for every pixel. In the next stage, expression/motion analysis 102, the original face image 101 undergoes, for example, noise removal by filtering, normalization of luminance, color, face size, and the like, extraction of the facial contour, and extraction of each element of the face, such as the eyes, eyebrows, nose, and mouth. The expression and motion in the current frame are then analyzed using information such as the relative positions of the extracted elements, their positions relative to the contour, the size of each element, and the image data and expression/motion analysis results of the previous frame. Typical expressions include neutral (expressionless), joy, anger, surprise, sadness, fear, and disgust; here, however, "expression" is taken in a broad sense and also covers gaze direction, blinking, and the like. "Motion" refers to the motion of the whole head. Because recognizing emotional expressions such as joy, anger, sorrow, and pleasure is currently difficult, the present practice is to define several basic motion patterns for each facial element and to extract those patterns. Basic patterns for the eyebrows, for example, include (1) raising the inner eyebrow, (2) raising the outer eyebrow, and (3) lowering the eyebrow.

The expression analysis data obtained in this way is then converted, in parameter conversion 103, into deformation parameters for the three-dimensional model 104a. The three-dimensional model 104a is a so-called wireframe model, which represents a shape by wires (line segments). The coordinates of the start and end points of each wire have been transformed in advance by scaling, translation, rotation, and the like so as to fit the input original face image 101. That is, the original face image 101 and the three-dimensional model 104a have been brought into register, and the same transformation information has naturally also been sent to the decoding side, so the three-dimensional model 104d has been transformed exactly like the three-dimensional model 104a. In this state, when expression/motion analysis data arrives from expression/motion analysis 102, parameter conversion 103 converts it into deformation parameters for the three-dimensional model 104a, and the expression/motion parameters 107 describing that deformation become the final coded data. On the decoding side, the expression and motion of the three-dimensional model 104d are changed by deforming it according to the expression/motion parameters 107; concretely, changing expression and motion amounts to transforming each coordinate point of the three-dimensional model 104d. In luminance/color addition 110, the pre-deformation pixel information, such as luminance and color, that corresponds to each patch (a plane or curved surface bounded by line segments) of the deformed three-dimensional model 104d is picked up from the initial image 111 and pasted onto the three-dimensional model 104d to obtain the final decoded image 112.

A specific example of luminance/color addition follows. The three-dimensional model 104d consists of, for example, many triangular patches as shown in FIG. 5, and the coordinates of each triangle's vertices change with expression and motion. In FIG. 5, suppose that triangle ABC has been deformed into triangle A'B'C' by a change in expression or motion. The set of pixels inside triangle ABC is then approximately linearly mapped onto the set of pixels inside triangle A'B'C'. FIG. 6 shows the procedure for this linear mapping. First, the oblique coordinates (s, t) of a point X' inside triangle A'B'C' are found by solving simultaneous equations in the coordinates of A', B', C', and X', with the two oblique axes taken along sides A'B' and A'C'. Next, the pixel value of the point X with the same oblique coordinates (s, t) in triangle ABC is extracted and displayed as the pixel value of X'. Only after this has been done for every point X' in triangle A'B'C', and for every triangular patch of the three-dimensional model 104d, can one frame be reproduced. As for processing speed, displaying 30 images per second as in ordinary television requires all of this to be completed within 1/30 second. Of the processing described above, luminance/color addition is by far the heaviest, and performing it in real time clearly requires very large-scale hardware.
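The linear mapping of FIG. 6 can be sketched as follows. With the oblique axes along A'B' and A'C', a destination point satisfies X' = A' + s(B' - A') + t(C' - A'), and the corresponding source point is X = A + s(B - A) + t(C - A); the pair (s, t) comes from this 2 x 2 linear system, solved here by Cramer's rule. This is a minimal Python sketch, assuming a single-channel image stored as nested lists and nearest-neighbour sampling (the text does not say how X is rounded to a pixel); the names oblique_coords and map_triangle are hypothetical. It also shows why the step is costly: the inner loop runs once per destination pixel of every patch, in every frame.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]
Image = List[List[int]]  # image[y][x]; single channel for simplicity

EPS = 1e-9  # tolerance so pixels lying exactly on an edge are kept


def oblique_coords(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float]:
    """Solve p = a + s*(b - a) + t*(c - a) for (s, t) using Cramer's rule."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - a[0], c[1] - a[1]
    wx, wy = p[0] - a[0], p[1] - a[1]
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("degenerate triangle")
    s = (wx * vy - wy * vx) / det
    t = (ux * wy - uy * wx) / det
    return s, t


def map_triangle(src: Image, dst: Image, tri_src: Triangle, tri_dst: Triangle) -> None:
    """Fill every pixel X' inside tri_dst (A'B'C') of dst with the pixel X
    at the same oblique coordinates (s, t) inside tri_src (ABC) of src."""
    a, b, c = tri_src
    a2, b2, c2 = tri_dst
    xs = [p[0] for p in tri_dst]
    ys = [p[1] for p in tri_dst]
    height, width = len(dst), len(dst[0])
    # Scan the bounding box of A'B'C', clipped to the destination image
    for y in range(max(0, int(min(ys))), min(height - 1, int(max(ys))) + 1):
        for x in range(max(0, int(min(xs))), min(width - 1, int(max(xs))) + 1):
            s, t = oblique_coords((x, y), a2, b2, c2)
            if s < -EPS or t < -EPS or s + t > 1 + EPS:
                continue  # X' lies outside triangle A'B'C'
            # Point X with the same oblique coordinates in triangle ABC
            sx = a[0] + s * (b[0] - a[0]) + t * (c[0] - a[0])
            sy = a[1] + s * (b[1] - a[1]) + t * (c[1] - a[1])
            # Nearest-neighbour sampling, clamped to the source image bounds
            ix = min(max(round(sx), 0), len(src[0]) - 1)
            iy = min(max(round(sy), 0), len(src) - 1)
            dst[y][x] = src[iy][ix]
```

For example, mapping ABC = ((0, 0), (3, 0), (0, 3)) onto A'B'C' = ((0, 0), (0, 3), (3, 0)) swaps the roles of the two oblique axes, so the patch is reflected about its diagonal.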

[0006] In the scheme above, the information sent from the encoding side to the decoding side consists essentially of the expression/motion parameters 107 for deforming the three-dimensional model 104d, so the transmitted data can be compressed drastically. Applied to text, this is the difference between sending the original image of a character as it is and recognizing the character and sending only its character code.

[0007]

[Problems to Be Solved by the Invention] The conventional face image coding method operates as described above. The processing load of adding luminance and color data on the decoding side is large, so reproducing images in real time inevitably requires very large hardware. Moreover, a three-dimensional model of fixed resolution is always used, without regard to the fact that human visual acuity is lower for moving objects, so the redundancy inherent in visual characteristics goes unexploited. In view of these points, an object of the present invention is to provide a coding method that reduces redundancy with respect to visual characteristics, raises compression efficiency, and lightens the processing load for luminance and color data.

[0008]

[Means for Solving the Problems] In the present invention, the encoding side and the decoding side each hold a plurality of three-dimensional models. These models differ from one another in resolution or in the region of the face they represent.

[0009]

[Operation] The three-dimensional model used on both the encoding and decoding sides is switched according to the amount of facial motion analyzed on the encoding side. This changes both the number of expression/motion parameters transmitted and received and the amount of luminance/color addition performed on the decoding side.

[0010]

[Embodiments] (Embodiment 1) An embodiment of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the processing of the face image coding method according to the first embodiment of the present invention. In the figure, the original face image 101 is a human face image input from an image input device such as a camera or scanner, and expression/motion analysis 102 takes the original face image 101 as input and analyzes expression and motion. Switch 105a selects one of the plurality of three-dimensional models 104a-c according to the motion amount 106 output by expression/motion analysis 102. Parameter conversion 103 extracts, from the expression/motion data output by expression/motion analysis 102, expression/motion parameters 107 relative to the three-dimensional model selected by switch 105a, and outputs them as coded data. On the decoding side, the three-dimensional model is first deformed in model deformation 109 on the basis of the received coded data, that is, the expression/motion parameters 107. This model is the one that switch 105b has selected from the plurality of three-dimensional models 104d-f according to the model switching information 108 sent from the transmitting side. Then, in luminance/color addition 110, luminance and color are applied pixel by pixel to the deformed three-dimensional model using the initial image 111, producing the decoded image 112 for display.
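The data flow of FIG. 1 can be sketched as a model bank held identically on both sides plus a small per-frame message. In this minimal Python sketch, which is an assumption for illustration rather than the patent's actual parameter format, the expression/motion parameters 107 are modelled as one 3-D offset per model vertex and the model switching information 108 as an integer index. FaceModel, EncodedFrame, encode, and decode are hypothetical names. Because both banks are identical, only the index and the offsets need to cross the channel.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class FaceModel:
    """Hypothetical stand-in for one wireframe model (104a-c / 104d-f)."""
    vertices: Tuple[Vec3, ...]


@dataclass(frozen=True)
class EncodedFrame:
    model_id: int             # stands in for model switching information 108
    params: Tuple[Vec3, ...]  # stands in for parameters 107: one offset per vertex


def encode(bank: Dict[int, FaceModel], model_id: int,
           target: Tuple[Vec3, ...]) -> EncodedFrame:
    """Encoder side: express the analysed face as per-vertex offsets
    relative to the model chosen by switch 105a."""
    model = bank[model_id]
    if len(target) != len(model.vertices):
        raise ValueError("target must give one position per model vertex")
    offsets = tuple(
        (t[0] - v[0], t[1] - v[1], t[2] - v[2])
        for v, t in zip(model.vertices, target)
    )
    return EncodedFrame(model_id, offsets)


def decode(bank: Dict[int, FaceModel], frame: EncodedFrame) -> List[Vec3]:
    """Decoder side: switch 105b picks the same model by index, then
    model deformation 109 moves each vertex by its transmitted offset."""
    model = bank[frame.model_id]
    return [
        (v[0] + d[0], v[1] + d[1], v[2] + d[2])
        for v, d in zip(model.vertices, frame.params)
    ]
```

A coarser model in the bank has fewer vertices, so encode produces a shorter params tuple for it; this is one simple way to see how switching models changes the number of transmitted parameters.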

[0012] Next, the operation will be described.

[0013] In FIG. 1, the basic processing sequence is almost the same as in the conventional example, so the individual operations are not described again. The following description focuses on the point in which the present invention differs substantially from the conventional example: a plurality of three-dimensional models are held and switched adaptively while the image is coded.

[0014] In general, human visual acuity is lower for moving objects than for stationary ones. In other words, a moving object can be displayed at a lower resolution without any perceptible change in the image. Therefore, adapting the resolution of the three-dimensional model to the amount of facial motion both compresses the transmitted data and speeds up luminance/color addition.

[0015] In the embodiment shown in FIG. 1, three three-dimensional models of different resolutions are placed on each side: 104a-c on the transmitting side and 104d-f on the receiving side. These models are, for example, as shown in FIGS. 2(a), 2(b), and 2(c): each covers the entire face, but they differ in the number of patches making up the model, that is, in resolution. The motion of the whole head detected from the original face image 101 by expression/motion analysis 102 in FIG. 1 is compared with the motion of the whole head detected from the original face image 101 of the previous frame, and the relative motion amount 106 is output. The motion of the whole head can be obtained, for example, by extracting characteristic points such as the eyes, nose, and mouth from the original face image 101 and calculating their relative movement between frames, or by computing head motion directly from pixel values without intermediate steps such as feature point detection, as described in Choi, Harashima, and Takebe, "High-precision estimation of head motion and facial motion information in intelligent image coding," IEICE Technical Report PRU90-68 (October 1990). Based on the motion amount 106 obtained in this way, switch 105a selects one of the three three-dimensional models 104a-c for use in parameter conversion 103. Specifically, a low-resolution model as in FIG. 2(a) is selected when the motion amount 106 is large, a high-resolution model as in FIG. 2(c) when it is small, and an intermediate model as in FIG. 2(b) when it lies in between. The motion amount 106 is also sent through switch 105a to the decoding side as model switching information 108. On the decoding side, switch 105b selects one of the three-dimensional models 104d-f according to the model switching information 108 and passes it to model deformation 109. The subsequent processing is exactly the same as in the conventional example.
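The selection rule above can be sketched using the first of the two motion-estimation approaches mentioned, tracking feature points between frames. Averaging the displacement of matched feature points, the two thresholds, and the pixel units are all assumptions made for this illustration; the document specifies no formula or threshold values. The returned labels refer to the models of FIGS. 2(a)-(c), and the names motion_amount and select_model are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical thresholds in pixels per frame; the patent gives no values.
LOW_MOTION = 2.0
HIGH_MOTION = 8.0

# Labels follow FIG. 2: (a) low, (b) intermediate, (c) high resolution.
MODEL_LOW_RES, MODEL_MID_RES, MODEL_HIGH_RES = "2(a)", "2(b)", "2(c)"


def motion_amount(prev: Dict[str, Point], curr: Dict[str, Point]) -> float:
    """Mean displacement of the feature points (eyes, nose, mouth, ...)
    found in both the previous and the current frame."""
    common = prev.keys() & curr.keys()
    if not common:
        raise ValueError("no feature points in common")
    return sum(math.dist(prev[k], curr[k]) for k in common) / len(common)


def select_model(amount: float) -> str:
    """Large motion -> coarse model; small motion -> fine model."""
    if amount >= HIGH_MOTION:
        return MODEL_LOW_RES
    if amount >= LOW_MOTION:
        return MODEL_MID_RES
    return MODEL_HIGH_RES
```

In a real system the thresholds would be tuned to the frame rate and image size; the two-threshold, three-level structure simply mirrors the three models of FIG. 2.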

[0016] (Embodiment 2) FIG. 3 shows the plurality of three-dimensional models used in the second embodiment of the present invention.

[0017] This embodiment differs from the first in that the plurality of three-dimensional models 104a-f consist of a model covering the whole face, a model of only the eyes and their surroundings, and a model of only the mouth and its surroundings. In this embodiment, the criterion for selecting a three-dimensional model is not the motion amount of the whole head but the motion amount of each facial element, such as the eyes, nose, and mouth. This motion amount 106 is extracted in expression/motion analysis 102. For example, when the transmitting side detects only a blink as the change in expression, the three-dimensional model of FIG. 3(b) is selected, new luminance/color data is written over the eyes and their surroundings only, and an image in which only a blink has occurred is synthesized. The face region outside the model of FIG. 3(b) continues to show the previous frame's image unchanged. Likewise, when the transmitting side detects motion of the mouth alone, the three-dimensional model of FIG. 3(c) is selected and, as above, only the mouth and its surroundings are overwritten and displayed.
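The region-limited update of Embodiment 2 can be sketched as follows. In this hypothetical Python example, each regional model is approximated by an axis-aligned box with made-up coordinates, the per-element motion threshold is arbitrary, and the decoder is handed a fully synthesized frame from which only the selected region is copied; in the scheme described above only that region would actually be synthesized. Any combination other than "eyes only" or "mouth only" falls back to the whole-face model, which is a simplifying assumption. REGIONS, choose_region, and compose are hypothetical names.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open

# Hypothetical regions for the FIG. 3 models; coordinates are illustrative only.
REGIONS: Dict[str, Box] = {
    "eyes": (2, 1, 8, 3),   # FIG. 3(b): eyes and surrounding area
    "mouth": (3, 5, 7, 7),  # FIG. 3(c): mouth and surrounding area
}


def choose_region(element_motion: Dict[str, float], threshold: float = 1.0) -> str:
    """Return 'eyes', 'mouth', or 'face' (whole-face model, FIG. 3(a))."""
    moving = {name for name, amount in element_motion.items() if amount >= threshold}
    if moving == {"eyes"}:
        return "eyes"
    if moving == {"mouth"}:
        return "mouth"
    return "face"  # any other combination: fall back to the whole-face model


def compose(prev: Image, synthesized: Image, region: str) -> Image:
    """Overwrite only the selected region; keep the previous frame elsewhere."""
    if region == "face":
        return [row[:] for row in synthesized]
    x0, y0, x1, y1 = REGIONS[region]
    out = [row[:] for row in prev]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = synthesized[y][x]
    return out
```

Only the pixels inside the selected box are touched, which is the source of the reduced decoder load claimed for this embodiment.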

[0018]

[Effects of the Invention] As described above, according to the present invention, a plurality of three-dimensional models of different resolutions are switched adaptively according to the amount of motion, so only the information that is actually needed has to be transmitted, and data compression efficiency improves. Furthermore, if a plurality of three-dimensional models representing different regions are used, processing such as luminance/color addition need be performed only on a limited region, which greatly reduces the processing load on the receiving side.

[0019] Although the above description used three-dimensional models as an example, the same effects can be obtained with two-dimensional models.

[0020] In the embodiments of the present invention described above, detecting the motion amount is a process that is needed anyway in order to transmit the expression/motion parameters, so it adds no processing burden. Furthermore, although the description took as an example the case in which only analysis-synthesis coding is used, the same effects can be obtained with a hybrid method that combines conventional waveform coding with analysis-synthesis coding.

[Brief description of drawings]

FIG. 1 is a diagram showing the processing blocks of the face image coding method according to the first embodiment of the present invention.

FIG. 2 is a diagram showing the three-dimensional models of mutually different resolutions used in the face image coding method according to the first embodiment of the present invention.

FIG. 3 is a diagram showing the three-dimensional models representing mutually different regions used in the face image coding method according to the second embodiment of the present invention.

FIG. 4 is a diagram showing the processing blocks of a conventional face image coding method.

FIG. 5 is a diagram illustrating the concept of luminance and color data addition in the conventional example and the present embodiments.

FIG. 6 is a diagram showing the processing flow of luminance and color data addition in the conventional example and the present embodiments.

[Explanation of symbols]

101 Original face image
102 Expression/motion analysis
103 Parameter conversion
104 Three-dimensional model
105 Switch
106 Motion amount
107 Expression/motion parameters
108 Model switching information
109 Model deformation
110 Luminance/color addition
111 Initial image
112 Decoded image

Claims


1. A face image coding method in which an encoding side and a decoding side hold the same two-dimensional or three-dimensional model of a face, the encoding side analyzes facial expression and motion in an input face image and outputs and transmits deformation information for the two-dimensional or three-dimensional model, and the decoding side deforms the two-dimensional or three-dimensional model on the basis of the received deformation information to synthesize an image, the method being characterized in that: the encoding side and the decoding side hold the same plurality of two-dimensional or three-dimensional models; the encoding side detects the amount of motion of the face in the input face image, adaptively selects the two-dimensional or three-dimensional model to be used, and outputs and transmits model switching information together with the deformation information obtained by expression and motion analysis; and the decoding side switches the two-dimensional or three-dimensional model according to the received model switching information and deforms the selected model according to the deformation information to synthesize an image.

2. The face image coding method according to claim 1, wherein the two-dimensional or three-dimensional models consist of a plurality of whole-face models whose constituent patches differ in fineness, and are switched on the basis of the amount of motion of the whole head.

3. The face image coding method according to claim 1, wherein the two-dimensional or three-dimensional models consist of a plurality of models including at least one model of the whole face and one model of a partial region of the face, and are switched on the basis of the amount of motion of the face.