JPH01162492A

JPH01162492A - Image transmission system

Info

Publication number: JPH01162492A
Application number: JP62321849A
Authority: JP
Inventors: Eiji Morimatsu; 映史森松; Kiichi Matsuda; 松田　喜一; Toshitaka Tsuda; 俊隆津田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-12-18
Filing date: 1987-12-18
Publication date: 1989-06-26
Anticipated expiration: 2012-08-25
Also published as: JP2644789B2

Abstract

PURPOSE: To reduce the quantity of transmitted information and lower transmission cost by selecting mouth shape parameters based on the phonemes in the transmitted voice information and deforming a mouth shape model image accordingly. CONSTITUTION: The phoneme code in the voice information from the transmitter side is used to select the corresponding mouth shape parameters from a code book 23, and these are sent to a memory 24. The mouth shape model image stored in advance in the memory 24 is deformed for each parameter set, and a synthesizing part 25 combines it with the face moving image, excluding the mouth part, sent from the transmitter side to reproduce the moving image of the face. Consequently, since only the face image information excluding the speaker's mouth part has to be transmitted, the information quantity can be reduced.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a method for transmitting a moving image of a person's face together with the person's voice, and aims to realize a transmission method that can compress the facial moving-image information further. In one configuration, the receiving side is provided with a memory that stores a mouth shape model image defined by a set of mouth shape parameters describing the geometric shape of the mouth, and a codebook that stores the parameter values of that mouth shape model image for every phoneme code. The mouth shape parameters corresponding to the phoneme code in the audio information from the transmitting side are selected from the codebook, the mouth shape model image in the memory is deformed according to those parameters, and a synthesis unit combines the result with the facial moving image, excluding the mouth, from the transmitting side to reproduce the image. A second configuration additionally has a normalized codebook for a standard mouth shape image and an initialization device that converts each mouth shape parameter of the normalized codebook according to the first mouth image from the transmitting side and thereby initializes the codebook.

[Industrial application field]

The present invention relates to an image transmission method, and in particular to a method for transmitting a moving image of a person's face together with the person's voice.

For videophones, video conferencing, and the like, the eventual goal is to adopt a transmission method that uses public telephone lines, so there is a demand to compress the captured image information as much as possible.

[Conventional technology]

The images transmitted in videophones and similar systems are usually moving images of a person's face. Conventionally, as shown in FIG. 12, this moving-image information is transmitted independently of the audio information.

That is, on the transmitting side a TV camera 61 generates the input image as an analog image signal, and an image encoding device 62 converts this signal into a digital signal, encodes and compresses it, and sends it to the receiving side. On the receiving side, an image decoding device 63 decodes the received image back into the original signal and displays it on a display 64 as the output image.

Likewise, the input voice is picked up as audio information by a microphone 65 on the transmitting side and compressed by an audio encoding device 66 using speech-specific coding. On the receiving side it is decoded by an audio decoding device 67 and output from a speaker 68.

[Problem that the invention seeks to solve]

Because moving images carry a large amount of information, this conventional and widely used moving-image transmission method cannot use low-bit-rate communication lines. This makes it costly and leaves it far from applicable to videophones and similar systems that use public telephone lines.

Accordingly, an object of the present invention is to realize a transmission method that can compress the information of a facial moving image further.

[Means for solving problems]

FIG. 1 is a conceptual diagram of the image transmission method according to the first invention for achieving the above object. In this invention, the receiving side is provided with a memory 24 that stores a mouth shape model image defined by a set of mouth shape parameters describing the geometric shape of the mouth, and a codebook 23 that stores the parameter values of the mouth shape model image for every phoneme code. The mouth shape parameters corresponding to the phoneme code in the audio information from the transmitting side are selected from the codebook 23, the mouth shape model image in the memory 24 is deformed according to those parameters, and a synthesis unit 25 combines the result with the facial moving image, excluding the mouth, from the transmitting side to reproduce the facial moving image.

In this first invention, however, the codebook 23 is limited to a specific speaker.

Therefore, the image transmission method according to the second invention, shown conceptually in FIG. 2, adds to the first invention a normalized codebook 33 for a standard mouth shape image and an initialization device 30 that converts each mouth shape parameter of the normalized codebook 33 according to the first mouth image from the transmitting side and thereby initializes the codebook 23.

[Operation]

In the image transmission method according to the first invention shown in FIG. 1, the phoneme code contained in the audio information from the transmitting side is compared with the phoneme codes stored in the codebook 23, and the set of parameters for each matching phoneme code is retrieved in turn and sent to the memory 24. The phoneme code may either be transmitted as a phoneme code or be extracted by decoding the encoded, transmitted speech. For each set of parameters, the memory 24 deforms the mouth shape model image stored in advance, and the synthesis unit 25 combines it with the facial moving image, excluding the mouth, sent from the transmitting side to reproduce the moving image of the face.
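The lookup described above can be pictured as a table indexed by phoneme code. The following is a minimal sketch, not the patent's implementation: the phoneme labels, the parameter names, and all numeric values are assumptions chosen for illustration.

```python
# Hypothetical codebook 23: phoneme code -> one set of mouth shape parameters.
# Labels, parameter names, and values are illustrative assumptions.
CODEBOOK = {
    "sil": {"width": 40.0, "lip_thickness": 8.0, "height": 2.0},
    "a": {"width": 44.0, "lip_thickness": 7.0, "height": 20.0},
    "i": {"width": 50.0, "lip_thickness": 6.0, "height": 6.0},
}


def select_mouth_parameters(phoneme_codes, codebook=CODEBOOK):
    """Return, in order, the parameter set matching each incoming phoneme code."""
    return [codebook[code] for code in phoneme_codes]


selected = select_mouth_parameters(["sil", "a", "i"])
```

Each entry of `selected` would then drive one deformation of the stored mouth shape model image.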

This eliminates the need to transmit a moving image of the mouth.

In the image transmission method according to the second invention shown in FIG. 2, the initialization device 30 added to the first invention takes the normalized codebook 33, in which a standard mouth shape image is normalized by a basic phoneme code, converts each of its mouth shape parameters according to the mouth image first transmitted from the transmitting side, and thereby initializes the codebook 23.

The codebook 23 can therefore be converted to suit an unspecified number of speakers.

〔Example〕

Embodiments of the image transmission method according to the present invention are described below.

FIG. 3 shows an embodiment of the image transmission method according to the first invention. This embodiment consists of a transmitting unit 10 and a receiving unit 20. The transmitting unit 10 includes an image processing unit 11 that processes the facial moving-image input and an audio encoding unit 12 that encodes the voice input. The receiving unit 20 includes: an audio decoding unit 21 that decodes the audio code encoded by the transmitting unit 10; a speech recognition unit 22 that performs speech recognition on the audio signal output from the decoding unit 21; a codebook 23 that successively selects a set of mouth shape parameters from the phoneme codes (each a vowel or consonant, the basic constituent units of speech) output one after another by the speech recognition unit 22; a mouth shape model deformation unit (memory) 24 that deforms the mouth shape model image according to each set of mouth shape parameters selected by the codebook 23; and a synthesis unit 25 that combines the mouth image generated by the mouth shape model deformation unit 24 with the facial moving image, excluding the mouth, from the image processing unit 11 of the transmitting unit 10.

As shown in FIG. 4, the codebook 23 stores in advance, as personal information, a table that quantifies the shape of the mouth when a specific speaker pronounces each phoneme 1, 2, ..., m as parameters 1 (for example, the width of the mouth), 2 (for example, the thickness of the lips), ..., n (for example, the height of the mouth).

In addition, the mouth shape model deformation unit 24 stores (maps) in advance, also as personal information, one screen (one frame) of mouth image data of that specific speaker as the mouth shape model image. This can be done by first sending one screen of the mouth image from the transmitting unit 10, but the codebook 23 must be created beforehand.

Next, the operation of the above embodiment is explained.

The voice input is encoded by the audio encoding unit 12 and transmitted to the receiving unit 20, where the audio decoding unit 21 decodes the audio code and outputs it as speech. This speech output is sent to the speech recognition unit 22, which extracts its phoneme codes one after another and sends them to the codebook 23. Based on each input phoneme code, the codebook 23 selects the corresponding set of mouth shape parameters 1, 2, ..., n from the codebook shown in FIG. 4. The mouth shape model deformation unit 24 then uses the selected set of parameters to deform the mouth shape model image stored in advance and generates a mouth image. The resulting correspondence between the generated mouth images and the phonemes extracted by the speech recognition unit 22 is shown in FIG. 5.

The mouth image generated by this deformation is combined in the synthesis unit 25 with the facial moving-image information, excluding the mouth, sent from the image processing unit 11 of the transmitting unit 10, yielding a moving image of the whole face.
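As a rough illustration of the synthesis step, the sketch below pastes a generated mouth image into a face frame that arrived without its mouth region. Frames are plain nested lists of pixel values, and the paste position `(top, left)` is an assumption; the source does not say how the mouth region is located within the frame.

```python
def composite_mouth(face_frame, mouth_image, top, left):
    """Return a copy of face_frame with mouth_image pasted at (top, left)."""
    frame = [row[:] for row in face_frame]  # leave the received frame untouched
    for r, mouth_row in enumerate(mouth_image):
        for c, pixel in enumerate(mouth_row):
            frame[top + r][left + c] = pixel
    return frame


face = [[0] * 6 for _ in range(4)]  # 4x6 face frame, mouth region blank
mouth = [[9, 9, 9], [9, 9, 9]]      # 2x3 generated mouth image
full = composite_mouth(face, mouth, top=2, left=1)
```

Copying the frame before pasting keeps the received face data available for the next mouth image, which is one possible design choice rather than something the source prescribes.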

The deformation of the mouth shape model image in the mouth shape model deformation unit 24 is described in IEICE Technical Report IE87-2, Vol. 87, No. 19.

FIG. 6 shows another embodiment of the image transmission method according to the first invention. It differs from the embodiment of FIG. 3 in that a speech recognition unit 13 is provided in the transmitting unit 10, and the transmitting side separates the speech into phoneme codes and other information (intonation, pitch, and so on) and sends both to the receiving unit 20. The receiving unit 20 uses the phoneme codes directly in the codebook 23, and a speech synthesis unit 26 combines the phoneme codes with the intonation and other information to generate the audio output. The rest of the configuration and operation is the same as in FIG. 3.
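One way to picture the FIG. 6 arrangement is as a per-phoneme message that carries the phoneme code plus prosody, so that the receiver can index the codebook without running speech recognition. The field names and example values below are hypothetical; the source only names intonation and pitch as examples of the other information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnit:
    """Hypothetical message sent per phoneme in the FIG. 6 embodiment."""
    phoneme_code: str  # used directly as the codebook key on the receiving side
    pitch_hz: float    # prosody, needed only by the speech synthesis unit
    intonation: float  # prosody, needed only by the speech synthesis unit


def route(unit):
    """Split one message into the codebook key and the synthesis inputs."""
    codebook_key = unit.phoneme_code
    synthesis_input = (unit.phoneme_code, unit.pitch_hz, unit.intonation)
    return codebook_key, synthesis_input


key, synth = route(SpeechUnit("a", pitch_hz=120.0, intonation=0.3))
```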

In the embodiments above, the codebook 23 stored in advance is specific to a predetermined speaker. To transmit mouth images of an unspecified number of people, therefore, either all the mouth shape codes stored in the codebook must be rewritten to suit the new speaker every time the speaker changes, or the codebook must be given an enormous memory area to hold the codebook information of every registered speaker.

The second invention, already shown in FIG. 2, therefore allows the codebook to be adapted to unspecified speakers.

That is, as shown in FIG. 7, a standard codebook is created by measuring each parameter value of the mouth shape model for the mouth shapes of a standard person pronouncing every phoneme. Each parameter value in this codebook is then normalized (divided) by the corresponding parameter value of a predetermined basic phoneme code (for example, the silence code), giving a codebook normalized parameter by parameter (see FIG. 8).
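The normalization step can be sketched as a column-wise division by the base phoneme's row. This is a minimal sketch under stated assumptions: the phoneme labels, the three parameters, and every numeric value are invented for illustration, and the silence code is taken as the base as in the example given in the text.

```python
# Hypothetical standard codebook: rows are phoneme codes, columns are the
# mouth shape parameters (width, lip thickness, height). Values are made up.
STANDARD = {
    "sil": [40.0, 8.0, 2.0],
    "a": [44.0, 6.0, 20.0],
    "i": [50.0, 4.0, 6.0],
}


def normalize_codebook(standard, base="sil"):
    """Divide every entry by the base phoneme's value for the same parameter."""
    base_row = standard[base]
    return {
        phoneme: [value / b for value, b in zip(row, base_row)]
        for phoneme, row in standard.items()
    }


NORMALIZED = normalize_codebook(STANDARD)
```

The base row normalizes to all ones, and a base parameter of zero would make the division undefined, so the base phoneme presumably needs nonzero values for every parameter.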

Then, as shown in FIG. 9, a set of parameters is measured from the individual's mouth image corresponding to the basic phoneme code, and each is multiplied, parameter by parameter, into the entries for every phoneme code in the normalized codebook obtained as in FIG. 8. This yields a codebook for that individual. For example, if the set of parameters measured from the individual's mouth image is b_11, ..., b_1n, then in FIG. 8 the normalized code a_21/a_11 for phoneme code 2 and parameter 1 is multiplied by b_11 and converted to the code (a_21/a_11) × b_11. In the same way, for parameter 2 the parameter b_12 is multiplied into the entries for every phoneme code.
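The per-speaker conversion amounts to scaling each column of the normalized codebook by the speaker's measured base-phoneme parameter. The sketch below assumes a small normalized table and measured values that are purely illustrative; only the multiply-per-parameter rule comes from the text.

```python
# Hypothetical normalized codebook (each entry is a ratio to the silence row).
NORMALIZED = {
    "sil": [1.0, 1.0, 1.0],
    "a": [1.1, 0.75, 10.0],
    "i": [1.25, 0.5, 3.0],
}


def personalize(normalized, measured_base):
    """Multiply each column by the speaker's measured base-phoneme parameter."""
    return {
        phoneme: [ratio * b for ratio, b in zip(row, measured_base)]
        for phoneme, row in normalized.items()
    }


# b_11..b_1n: parameters measured from this speaker's silent mouth image.
personal = personalize(NORMALIZED, measured_base=[36.0, 10.0, 4.0])
```

Because the base row is all ones, the personal codebook reproduces the measured values exactly for the base phoneme, and only one mouth image per speaker is needed to fill in every other row.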

FIG. 10 shows an embodiment provided with an initialization device 30 for creating such a personal codebook. By initializing the codebook 23 for the individual with this initialization device 30, facial moving images of an unspecified number of speakers can be reproduced.

The specific configuration of the initialization device 30 is shown in FIG. 11. When the image processing unit 11 of the transmitting unit 10 first sends the mouth image for the basic phoneme code (here, the silence code) within the face image, a feature point extraction unit 31 in the initialization device 30 extracts the feature points of that mouth image. A parameter calculation unit 32 then calculates a set of parameters from the distances between feature points and similar measurements. A multiplier 34 multiplies this set of parameters, parameter by parameter, into the normalized codebook prepared in advance in a normalized codebook memory 33 as shown in FIG. 8, creating the contents of a personal codebook memory 35, which are stored in the codebook 23.
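The parameter calculation from feature-point distances might look like the sketch below. The choice of four feature points (the two mouth corners and the upper and lower lip midpoints) and the coordinates are assumptions; the source says only that parameters are computed from distances between feature points and the like.

```python
import math


def mouth_parameters(points):
    """Compute (width, height) from four hypothetical mouth feature points."""
    width = math.dist(points["left_corner"], points["right_corner"])
    height = math.dist(points["upper_lip"], points["lower_lip"])
    return [width, height]


# Feature points as (x, y) pixel coordinates taken from the first mouth image.
base_params = mouth_parameters({
    "left_corner": (10.0, 20.0),
    "right_corner": (46.0, 20.0),
    "upper_lip": (28.0, 18.0),
    "lower_lip": (28.0, 22.0),
})
```

The resulting list plays the role of the measured base-phoneme parameters that the multiplier applies to the normalized codebook.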

From then on, this codebook is referred to when transmitting that individual's mouth images.

The initialization device 30 can be applied in the same way to the embodiment shown in FIG. 6.

〔Effect of the invention〕

As described above, in the image transmission method of the present invention a mouth shape model image is prepared in advance, mouth shape parameters are selected based on the phonemes in the transmitted audio information, and the prepared mouth shape model image is deformed according to these mouth shape parameters. The only image information that needs to be transmitted is therefore the face image excluding the speaker's mouth, so the amount of information can be greatly reduced and a low-bit-rate line can be used, realizing an inexpensive transmission method.

Furthermore, because an initialization device is provided so that the prepared codebook can be updated for each speaker, an unspecified number of speakers can be accommodated easily.

[Brief explanation of the drawing]

FIG. 1 is a block diagram conceptually showing the image transmission method according to the first invention.
FIG. 2 is a block diagram conceptually showing the image transmission method according to the second invention.
FIG. 3 is a block diagram showing an embodiment of the image transmission method according to the first invention.
FIG. 4 is a configuration diagram of the codebook used in the image transmission method according to the present invention.
FIG. 5 is a diagram showing the mouth image for each phoneme code.
FIG. 6 is a block diagram showing another embodiment of the image transmission method according to the first invention.
FIG. 7 is a diagram showing the procedure for creating the normalized codebook in the image transmission method according to the second invention.
FIG. 8 is a configuration diagram of the normalized codebook.
FIG. 9 is a diagram showing the procedure for creating the personal codebook in the image transmission method according to the second invention.
FIG. 10 is a block diagram showing an embodiment of the image transmission method according to the second invention.
FIG. 11 is a block diagram showing the configuration of the initialization device used in the second invention.
FIG. 12 is a system diagram showing a conventional, general image transmission method.
In FIGS. 1 and 2: 23, codebook; 24, memory; 25, synthesis unit; 30, initialization device; 33, normalized codebook. In the figures, identical reference numerals denote identical or corresponding parts.

Claims

[Claims]

(1) An image transmission method characterized in that a memory (24) storing a mouth shape model image defined by a set of mouth shape parameters indicating the geometric shape of the mouth part, and a codebook (23) storing the parameter values of the mouth shape model image for all phoneme codes, are provided on the receiving side; the corresponding mouth shape parameters are selected from the codebook (23) according to the phoneme code in the audio information from the transmitting side; the mouth shape model image in the memory (24) is deformed based on those mouth shape parameters; and a synthesis unit (25) combines it with the face moving image, excluding the mouth part, from the transmitting side to reproduce the image.

(2) An image transmission method as described in an earlier claim, wherein the transmitting side encodes the voice input, and the receiving side decodes and recognizes the voice to provide phoneme codes to the codebook (23).

(3) An image transmission method as described in an earlier claim, wherein the transmitting side recognizes the voice input and transmits it as a phoneme code, and the receiving side provides the phoneme code to the codebook (23).

(4) An image transmission method in which a memory (24) storing a mouth shape model image defined by a set of mouth shape parameters indicating the geometric shape of the mouth part, and a codebook (23) storing the parameter values of the mouth shape model image for all phoneme codes, are provided on the receiving side; the corresponding mouth shape parameters are selected from the codebook (23) according to the phoneme code in the audio information from the transmitting side; the mouth shape model image in the memory (24) is deformed based on those mouth shape parameters; and a synthesis unit (25) combines it with the face moving image, excluding the mouth part, from the transmitting side to reproduce the image; the method being characterized by having a normalized codebook (33) for a standard mouth shape image and an initialization device (30) that initializes the codebook (23) by converting each mouth shape parameter of the normalized codebook (33) according to the first mouth image from the transmitting side.

(5) An image transmission method as described in an earlier claim, wherein the transmitting side encodes the voice input, and the receiving side decodes and recognizes the voice to provide phoneme codes to the codebook (23).

(6) An image transmission method as described in an earlier claim, wherein the transmitting side recognizes the voice input and transmits it as a phoneme code, and the receiving side provides the phoneme code to the codebook (23).