JPH08307841A

JPH08307841A - Pseudo moving image video telephone system

Info

Publication number: JPH08307841A
Application number: JP7111524A
Authority: JP
Inventors: Hiroaki Matsushita; 博明松下; Shigeyuki Sudo; 茂幸須藤; Tomohiro Ezaki; 智宏江崎; Atsushi Yoshioka; 厚吉岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-05-10
Filing date: 1995-05-10
Publication date: 1996-11-22

Abstract

PURPOSE: To display, on the receiving side, a pseudo face moving image of the speaker generated from the voice signal, using a telephone line and without image transmission from the transmitting side. CONSTITUTION: A signal sent from the transmitting-side speaker over telephone line 2 is input to communication means 4, which outputs a voice signal. Voice analysis means 5 converts the voice signal into voice parameters by linear predictive coding. Meanwhile, model generation means 7 stores a plurality of mouth shape models and head models composed of wire frames and outputs the mouth shape model data and head model data for the selected model. Parameter conversion means 8 converts the voice parameters into mouth shape parameters based on the mouth shape model data, image synthesizing means 9 generates a pseudo face moving image of the transmitting-side speaker from the mouth shape parameters and the head model data, and display means 10 displays the image.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a communication device that uses a telephone line or the like, and more particularly to a pseudo moving picture TV telephone device that displays a pseudo moving picture on the receiving side based on a voice signal from the transmitting side.

[0002]

[Prior Art] TV telephones have been put into practical use in which a speaker is imaged by a TV camera or the like, the image data is compressed and transmitted together with the voice signal, and the image is decompressed on the receiving side and displayed on a TV screen or the like. Details are given in Osamu Murakami, "Image Media and Communication Revolution" (Sangyo Tosho, p. 61, 1984).

[0003]

[Problems to Be Solved by the Invention] Conventional TV telephones have the following problems. First, because the image to be transmitted is an image of the speaker captured by a TV camera or the like, the amount of information remains enormous even after compression (although this depends on the compression method), and sending a single frame takes on the order of several seconds. Second, when the transmitting side has no TV telephone device, that is, when the call comes from an ordinary telephone, a public telephone, or a mobile phone, the device on the receiving side cannot perform its function as a TV telephone.

[0004] An object of the present invention is to provide a TV telephone device that, using a telephone line, can display a pseudo face moving image of the speaker on the receiving side from the voice signal alone, without image transmission from the transmitting side.

[0005]

[Means for Solving the Problems] To solve the above problems, the pseudo moving picture TV telephone device of the present invention comprises: voice analysis means that analyzes the vocal tract characteristics and radiation characteristics of a voice signal, performs linear predictive coding, and outputs feature-extracted voice parameters; model generation means that stores a plurality of mouth shape models and head models composed of wire frames and outputs mouth shape model data and head model data for the selected model; parameter conversion means that receives the voice parameters and the mouth shape model data and converts the voice parameters into mouth shape parameters that change from moment to moment; image synthesizing means that generates a pseudo face moving image of the transmitting-side speaker based on the mouth shape parameters and the head model data; and display means that displays the pseudo face moving image of the transmitting-side speaker obtained by the image synthesizing means.

[0006]

[Operation] In the present invention, a voice signal sent from the transmitting-side speaker, for example over a telephone line, is converted into voice parameters in the voice analysis means by feature extraction based on linear predictive coding. Meanwhile, the model generation means stores a plurality of mouth shape models and head models composed of wire frames and outputs the mouth shape model data and head model data for the selected model. The parameter conversion means converts the voice parameters into mouth shape parameters based on the mouth shape model data. The image synthesizing means then generates a pseudo face moving image of the transmitting-side speaker from the mouth shape parameters and the head model data, and the display means displays it.

[0007] Thus, according to the present invention, the pseudo face moving image is generated and displayed on the receiving side from the voice signal of the transmitting-side speaker, so the only signal actually transmitted is the voice signal. Compared with a conventional TV telephone device, the amount of information and the transmission time are far smaller, and the device can handle calls over ordinary analog telephone lines as well as calls from public telephones and mobile phones. Because the pseudo face moving image is generated and displayed on the receiving side from the transmitting-side speaker's voice signal, the call feels like a call between two TV telephones.

[0008] Although the pseudo face moving image displayed on the receiving side differs from the actual appearance of the transmitting-side speaker during the call, at least a pseudo face of the transmitting-side speaker is displayed, and the mouth moves in step with the voice. The receiving-side speaker can therefore talk as if on a TV telephone call with the transmitting side.

[0009]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the present invention. The drawing shows only the block diagram of the receiving section of the pseudo moving picture TV telephone device. The block diagram of the transmitting section is omitted because the transmitting section may be a known ordinary telephone.

[0010] In FIG. 1, reference numeral 1 denotes a telephone without an image transmission device, such as a household telephone, a public telephone, or a car or mobile phone, and 2 denotes a telephone line. 3 denotes the pseudo moving picture TV telephone device of the present invention. 4 denotes communication means that outputs the voice signal sent from the transmitting-side speaker over the telephone line; here it is the receiving circuit of a telephone. 5 denotes voice analysis means that converts the voice signal into voice parameters. 6 denotes selection means that outputs a selection signal by which the receiving-side speaker selects the pseudo face of the transmitting-side speaker. 7 denotes model generation means that outputs mouth shape model data and head model data. 8 denotes parameter conversion means that receives the voice parameters and the mouth shape model data and converts the voice parameters into mouth shape parameters. 9 denotes image synthesizing means that generates the pseudo face moving image of the transmitting-side speaker based on the mouth shape parameters and the head model data. 10 denotes display means that displays the pseudo face moving image of the transmitting-side speaker.

[0011] Next, the operation of each section is described.

[0012] Consider the receiving operation of the pseudo moving picture TV telephone device 3 when telephone 1 is the transmitting side. The signal sent from telephone 1 is input to the pseudo moving picture TV telephone device 3 over telephone line 2, and communication means 4 outputs the voice signal a of the transmitting-side speaker. Voice signal a is then input to voice analysis means 5, which analyzes its vocal tract characteristics and radiation characteristics and performs linear predictive coding. This converts voice signal a into feature-extracted voice parameters b, which are output to parameter conversion means 8.
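The voice analysis step in [0012] can be illustrated with a minimal Python sketch of linear predictive analysis for one frame, using the autocorrelation method and the Levinson-Durbin recursion. The patent does not specify the analysis order, frame length, or window; the order of 10, the Hamming window, and all function names below are assumptions made for illustration only.

```python
import math


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, error), where x[n] is predicted as
    sum(a[k - 1] * x[n - k] for k in 1..order) and error is the
    residual prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:             # silent or degenerate frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error


def lpc_frame(samples, order=10):
    """Apply a Hamming window to one frame and return its LPC parameters."""
    n = len(samples)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

In this sketch the coefficients returned for each frame would play the role of voice parameters b; a receiver would repeat the analysis on successive frames of voice signal a.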

[0013] Meanwhile, on hearing the voice of the transmitting-side speaker, the receiving-side speaker uses selection means 6 to select the pseudo face of the transmitting-side speaker to be displayed. Selection means 6 accordingly outputs selection signal c, which is input to model generation means 7. Model generation means 7 stores a plurality of mouth shape models and head models composed of wire frames. On receiving selection signal c, it outputs, for the model selected by that signal, mouth shape model data d to parameter conversion means 8 and head model data f to image synthesizing means 9.

[0014] The selection of the transmitting-side speaker's pseudo face by the receiving-side speaker is now described in detail in relation to the operation of selection means 6 and model generation means 7. For example, the receiving-side speaker photographs in advance the faces of people who call often, such as relatives, acquaintances, and friends, and stores them in model generation means 7 as head models. For callers other than these, standard model faces are stored in model generation means 7 as head models from the outset; the receiving-side speaker may also create or edit standard model faces and store them in model generation means 7 as head models. When a call arrives, the receiving-side speaker first identifies the caller from the voice. If the caller is a relative, acquaintance, friend, or the like, the receiving-side speaker selects that person's face with selection means 6, and selection signal c is output to model generation means 7. If the caller is anyone else, the receiving-side speaker selects a standard model face with selection means 6, and selection signal c is likewise output to model generation means 7. Furthermore, whoever the caller is, the receiving-side speaker may use selection means 6 to select a standard model face that the receiving-side speaker created and stored in advance.
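The selection logic of [0014] amounts to a lookup with a fallback to a standard model. The Python sketch below shows one way this could be organized; the class names, the dictionary-based storage, and the use of a caller label as the selection signal are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FaceModel:
    """One stored entry: wire-frame mouth shape and head model data."""
    name: str
    mouth_model: dict   # placeholder for mouth shape model data d
    head_model: dict    # placeholder for head model data f


@dataclass
class ModelStore:
    """Stand-in for model generation means 7 (hypothetical structure)."""
    standard: FaceModel
    known: dict = field(default_factory=dict)

    def register(self, caller, model):
        # Store a face prepared in advance for a frequent caller.
        self.known[caller] = model

    def select(self, caller=None):
        # The argument stands in for selection signal c. Unknown or
        # unspecified callers fall back to the standard model face.
        return self.known.get(caller, self.standard)
```

For example, after `store.register("friend", friend_model)`, calling `store.select("friend")` returns the stored face, while `store.select("stranger")` or `store.select()` returns the standard model.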

[0015] Voice parameters b, which contain information on the mouth shape of the transmitting-side speaker during pronunciation, are input to parameter conversion means 8. Mouth shape model data d is input to parameter conversion means 8 at the same time. There, voice parameters b are converted, based on mouth shape model data d, into mouth shape parameters e that change from moment to moment. Mouth shape parameters e and head model data f are then input to image synthesizing means 9. Based on mouth shape parameters e and head model data f, image synthesizing means 9 deforms a three-dimensional model (wire-frame model) composed of triangular polygons and applies texture mapping to each polygon, thereby synthesizing the pseudo face moving image of the transmitting-side speaker. FIG. 2 shows an illustration of the three-dimensional model (wire-frame model). The many triangles that make up the three-dimensional model (wire-frame model) 90 shown in FIG. 2, that is, the triangular polygons, are deformed, and texture mapping is applied to each polygon based on mouth shape parameters e and head model data f to obtain the pseudo face moving image of the transmitting-side speaker. Finally, display means 10 displays the pseudo face moving image of the transmitting-side speaker obtained by image synthesizing means 9.
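The patent does not disclose how voice parameters b are mapped to mouth shape parameters e, nor how the wire-frame model is deformed. The following Python sketch is therefore only a stand-in under stated assumptions: it uses a single scalar mouth opening derived from frame energy (not from LPC coefficients), and it deforms the mesh by linear interpolation between a hypothetical closed-mouth and open-mouth vertex set. Texture mapping is not shown.

```python
import math


def mouth_opening(frame_energy, silence=1e-4, loud=1e-1):
    """Map frame energy to a mouth opening in [0, 1].

    Hypothetical mapping: linear on a log scale between two assumed
    energy thresholds. The patent does not specify this conversion.
    """
    if frame_energy <= silence:
        return 0.0
    if frame_energy >= loud:
        return 1.0
    span = math.log10(loud) - math.log10(silence)
    return (math.log10(frame_energy) - math.log10(silence)) / span


def deform(closed_vertices, open_vertices, opening):
    """Blend each wire-frame vertex between two key poses.

    closed_vertices and open_vertices are equal-length lists of
    (x, y, z) tuples for the same mesh; opening is in [0, 1].
    """
    return [tuple(c + opening * (o - c) for c, o in zip(vc, vo))
            for vc, vo in zip(closed_vertices, open_vertices)]
```

Only the vertex positions change from frame to frame; the triangle connectivity of the wire-frame model stays fixed, which is what allows a texture to be mapped onto each polygon after deformation.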

[0016] Thus, the embodiment of FIG. 1 is characterized in that the pseudo face moving image is generated and displayed on the receiving side from the voice signal of the transmitting-side speaker, so the amount of information and the transmission time are far smaller than with a conventional TV telephone device. Even for calls from an ordinary telephone, a public telephone, or a mobile phone, the call feels like a call between conventional TV telephones. A further advantage is that, in generating and displaying the pseudo face moving image on the receiving side from the transmitting-side speaker's voice signal, the voice signal is converted into parameters and those parameters are tied directly to image synthesis, so no complicated speech recognition means is needed.

[0017] Next, a second embodiment of the present invention is described in detail with reference to the block diagrams of FIGS. 3 and 4. In FIG. 3, 100 denotes a wireless terminal device for mobile communication, such as a car phone or mobile phone, used by the receiving-side speaker, and 300 denotes the pseudo moving picture TV telephone device of the present invention. 6 denotes selection means that outputs a selection signal by which the receiving-side speaker selects the pseudo face of the transmitting-side speaker. 7 denotes model generation means that outputs mouth shape model data and head model data. 8 denotes parameter conversion means that receives the voice parameters and the mouth shape model data and converts the voice parameters into mouth shape parameters. 9 denotes image synthesizing means that generates the pseudo face moving image of the transmitting-side speaker based on the mouth shape parameters and the head model data. 10 denotes display means that displays the pseudo face moving image of the transmitting-side speaker. FIG. 4 shows a detailed block diagram of wireless terminal device 100. In FIG. 4, 101 denotes a transmitting/receiving antenna, 102 a high-frequency section, 103 a modulation/demodulation section, 104 a channel codec, 105 voice encoding means, 106 voice synthesizing means, 107 a speaker, and 108 a microphone. The terminal is, for example, a PDC (Personal Digital Cellular) terminal as specified in the "Digital Automobile Telephone System Standard (RCR STD-27B)" published by the Research & Development Center for Radio Systems.

[0018] Next, the operation of each section is described in detail, covering only the points that differ from the first embodiment of FIG. 1.

[0019] When a call arrives at wireless terminal device 100, the signal received by transmitting/receiving antenna 101 is converted to a lower-frequency signal by high-frequency section 102 and demodulated by modulation/demodulation section 103. Channel codec 104 then performs error correction, and the result is input to voice processing section 105. In voice processing section 105, the internal voice encoding means 106 analyzes the vocal tract characteristics and radiation characteristics of the voice and performs linear predictive coding, so feature-extracted voice parameters b are already available. For PDC, the linear predictive coding scheme adopted is VSELP (Vector-Sum Excited Linear Predictive Coding), and equivalent processing is performed. Voice parameters b contain information on the mouth shape of the transmitting-side speaker during pronunciation and are input to parameter conversion means 8. The functions and operations from selection means 6 through model generation means 7, parameter conversion means 8, image synthesizing means 9, and display means 10 are the same as in the first embodiment of FIG. 1 and are not described again. As explained above, the difference from the first embodiment of FIG. 1 is that communication means 4 and voice analysis means 5 are contained in wireless terminal device 100.

[0020] FIG. 5 shows a modification that is essentially the same as the embodiment of FIGS. 3 and 4 but with slight changes. FIG. 5 differs from FIGS. 3 and 4 in that the pseudo moving picture TV telephone device 300 of FIG. 3 is built into wireless terminal device 100. Selection means 6 of FIG. 3 is not present in wireless terminal device 100 of FIG. 5, but its role can be filled by operating keypad 111 or the like through control section 110 of a known ordinary wireless terminal device. Likewise, display means 10 of FIG. 3 is not present in wireless terminal device 100 of FIG. 5, but its role can be filled by LCD 112 or the like, which displays telephone numbers and similar information in a known ordinary wireless terminal device. The specific operation in FIG. 5 is substantially the same as in FIGS. 3 and 4 and is not described again.

[0021] The second embodiment shown in FIGS. 3, 4, and 5 thus provides the same effects as the first embodiment. In addition, given the limits on allowable transmission rate that come with efficient use of frequencies, there is a considerable benefit in giving a wireless terminal device the means to generate and display the pseudo face moving image on the receiving side from the voice signal, without transmitting an image of the transmitting-side speaker.

[0022]

[Effects of the Invention] According to the present invention, the pseudo face moving image is generated and displayed on the receiving side from the voice signal of the transmitting-side speaker, so the only signal actually transmitted is the voice signal. Compared with a conventional TV telephone device, the amount of information and the transmission time are far smaller, and the invention can of course be implemented over an ordinary analog telephone line. Furthermore, even for calls from an ordinary telephone, a public telephone, or a mobile phone, the pseudo face moving image is generated and displayed on the receiving side from the transmitting-side speaker's voice signal, so the call feels like a call between conventional TV telephones.

[Brief description of drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.

FIG. 2 is a perspective view of the three-dimensional model used in the first embodiment of the present invention.

FIG. 3 is a block diagram showing a second embodiment of the present invention.

FIG. 4 is a block diagram showing the wireless terminal device used in the embodiment of FIG. 3.

FIG. 5 is a block diagram showing a modification of the second embodiment of the present invention.

[Explanation of symbols]

1: telephone, 2: telephone line, 3: pseudo moving picture TV telephone device, 4: communication means, 5: voice analysis means, 6: selection means, 7: model generation means, 8: parameter conversion means, 9: image synthesizing means, 10: display means.

Continuation of the front page

(72) Inventor: Atsushi Yoshioka, Image Media Research Laboratory, Hitachi, Ltd., 292 Yoshida-cho, Totsuka-ku, Yokohama-shi, Kanagawa

Claims

[Claims]

1. A pseudo moving picture video telephone device comprising: voice analysis means for analyzing the vocal tract characteristics and radiation characteristics of a voice signal and outputting feature-extracted voice parameters; model generation means for storing a plurality of mouth shape models and head models composed of wire frames and outputting mouth shape model data and head model data for the selected model; parameter conversion means for receiving the voice parameters and the mouth shape model data and converting the voice parameters into mouth shape parameters that change from moment to moment; image synthesizing means for generating a pseudo face moving image of the transmitting-side speaker based on the mouth shape parameters and the head model data; and display means for displaying the pseudo face moving image of the transmitting-side speaker obtained by the image synthesizing means.

2. A pseudo moving picture video telephone device in which a wireless terminal for mobile communication, having speech coding means for analyzing the vocal tract characteristics and radiation characteristics of speech and performing linear predictive coding, speech synthesizing means for synthesizing an analog speech signal from the coded speech data, and communication means for transmitting and receiving voice information as the coded speech data, is provided with: output means for outputting the feature-extracted voice parameters; model generation means for storing a plurality of mouth shape models and head models composed of wire frames and outputting mouth shape model data and head model data for the selected model; parameter conversion means for receiving the voice parameters output by the output means and the mouth shape model data output by the model generation means and converting the voice parameters into mouth shape parameters that change from moment to moment; image synthesizing means for generating a pseudo face moving image of the transmitting-side speaker based on the mouth shape parameters and the head model data; and display means for displaying the pseudo face moving image of the transmitting-side speaker obtained by the image synthesizing means.