JP2004040525A - Device and method for transmitting video signal

Info

Publication number: JP2004040525A
Application number: JP2002195470A
Authority: JP
Inventors: Takashi Yamaguchi; 山口　孝; Shiro Omori; 大森　士郎; Atsushi Sodeoka; 袖岡　淳
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-07-04
Filing date: 2002-07-04
Publication date: 2004-02-05

Abstract

PROBLEM TO BE SOLVED: To avoid displaying the talker's own face in a video telephone or the like.
SOLUTION: An identifier, attached to the talker's face for detecting the position and direction of the face, and a video camera 14 for imaging the talker are provided. A first circuit 11 discriminates the position, size, and direction of the talker's face from the image data of the identifier in the video signal output from the video camera 14, and a second circuit 11 forms image data of a character's face whose size and direction equal those discriminated by the first circuit 11. A third circuit 11 substitutes the character face image data formed by the second circuit 11 for the image data of the talker's face, and the video signal output from the third circuit 11 is transmitted.
COPYRIGHT: (C)2004,JPO

Description

[0001]
[Technical field of the invention]
The present invention relates to a video signal transmitting device and a video signal transmitting method.
[0002]
[Prior art]
As communication technology and its environment have evolved, communication has shifted from audio only to forms that incorporate video. The so-called videophone is one example: with a videophone, users can talk while looking at the other party's face shown on a display.
[0003]
Similarly, with an IP telephone or Internet telephone, a videophone can be realized over a broadband line such as ADSL, CATV, or FTTH, again allowing both parties to talk while seeing each other's faces.
[0004]
[Problems to be solved by the invention]
However, even in video-based communication, users do not always want to show their own face as it is. For example:
(1) I want to use an Internet tutor, but I am embarrassed to show my face.
(2) I want to use the convenience of the Internet to communicate widely with people I do not know, but I do not want them to see my face right away.
(3) I want to create a virtual persona and casually enjoy online communities.
and so on.
[0005]
The present invention seeks to provide a communication device that can deal with these cases.
[0006]
[Means for Solving the Problems]
In the present invention, for example, a video signal transmitting device comprises:
an identifier attached to the sender's face for detecting the position and orientation of the sender's face;
a video camera for imaging the sender;
a first circuit for determining the position, size, and orientation of the sender's face from the image data of the identifier in the video signal output from the video camera;
a second circuit for forming image data of a character's face whose size and orientation equal the size and orientation of the sender's face determined by the first circuit; and
a third circuit for replacing, according to the position of the sender's face determined by the first circuit, the image data of the sender's face in the video signal output from the video camera with the image data of the character's face formed by the second circuit,
and transmits the video signal output from the third circuit.
Accordingly, the other party's display shows an image in which the sender's face has been replaced with the character's face.
[0007]
[Embodiment of the invention]
FIG. 1 shows an example in which the present invention is applied to a videophone 10 for IP telephony. The telephone 10 has a control circuit 11, a transmitter (mouthpiece) 12, a receiver (earpiece) 13, and various operation keys (operation switches) 16. Although not shown, the control circuit 11 includes the various hardware and software needed to implement a videophone, such as a microcomputer, encoder and decoder circuits, and a network controller. The control circuit 11 controls the overall operation of the telephone 10 and performs encoding (data compression) and decoding (data decompression) of the audio and video signals. The control circuit 11 is connected to a network 20 through the network controller.
[0008]
During a call without video, when the user speaks, the analog audio signal from the transmitter 12 is supplied to the control circuit 11, A/D converted, encoded, and packetized, and the packets are sent to the network 20. When receiving, packets arriving over the network 20 are supplied to the control circuit 11, decoded, and D/A converted to recover the original analog audio signal, which is supplied to the receiver 13.
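The text says only that the encoded audio is "packetized" before it goes to the network 20; it does not name a packet format. As a hedged illustration, the sketch below assumes an RTP-style 12-byte header (the RFC 3550 fixed-header layout: version, payload type, sequence number, timestamp, SSRC) and a codec with one byte per sample, such as G.711, so that the timestamp advances by the payload length. `packetize` and `depacketize` are hypothetical names.

```python
import struct

RTP_VERSION = 2
HEADER_LEN = 12  # RTP fixed header with no CSRC entries

def packetize(payload: bytes, seq: int, timestamp: int, ssrc: int,
              payload_type: int, max_payload: int = 160) -> list:
    """Split an encoded audio payload into RTP-style packets (assumed format)."""
    packets = []
    for offset in range(0, len(payload), max_payload):
        chunk = payload[offset:offset + max_payload]
        # Byte 0: version (2 bits), padding=0, extension=0, CSRC count=0.
        # Byte 1: marker=0, 7-bit payload type.
        header = struct.pack("!BBHII",
                             RTP_VERSION << 6,
                             payload_type & 0x7F,
                             seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF,
                             ssrc & 0xFFFFFFFF)
        packets.append(header + chunk)
        seq += 1
        timestamp += len(chunk)  # assumes 1 byte per sample (e.g. G.711)
    return packets

def depacketize(packets: list) -> bytes:
    """Reassemble the payload in sequence-number order.

    Sequence-number wraparound at 65536 is ignored in this sketch.
    """
    ordered = sorted(packets, key=lambda p: struct.unpack("!H", p[2:4])[0])
    return b"".join(p[HEADER_LEN:] for p in ordered)
```

With a 160-byte limit, a 512-byte payload becomes four packets (160 + 160 + 160 + 32 bytes of payload), and the receiver can restore the original bytes even if the packets arrive out of order.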
[0009]
Further, a video camera 14 and a display 15 are connected to the control circuit 11 for transmitting and receiving images. During a call with video, the sender is imaged by the video camera 14, and the video signal is supplied to the control circuit 11, where it is A/D converted and encoded; the encoded video signal and the encoded audio signal are then packetized, and the packets are sent to the network 20. Packets sent by the other party over the network 20 are supplied to the control circuit 11, decoded, and D/A converted to recover the original audio and video signals, which are supplied to the receiver 13 and the display 15, respectively.
[0010]
In this case, as shown in FIG. 2A for example, the screen 15S of the display 15 shows a picture-in-picture display: the other party's face is shown large on the main screen, and the user's own face as captured by the video camera 14, that is, the face being transmitted to the other party, is shown small on the sub-screen. When a predetermined key is operated, the main-screen and sub-screen images are swapped, as shown in FIG. 2B, so that the user's own face is displayed large and the other party's face small.
[0011]
To satisfy needs such as (1) to (3) above, the present invention replaces the sender's face captured by the video camera 14 with an animated character's face before sending the image to the other party. In the following description, this videophone mode, in which the sender's face captured by the video camera 14 is replaced with an animated character's face and transmitted to the other party, is referred to as the “character mode”.
[0012]
To realize the character mode, the videophone 10 is further configured as follows. The microcomputer constituting the control circuit 11 is provided with, for example, a routine 100 shown in FIG. 3 as part of the program it executes. Routine 100, described in detail below, replaces the sender's face captured by the video camera 14 with an animated character's face and is executed, for example, 15 times per second. FIG. 3 shows only the parts of routine 100 that relate to the present invention.
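The routine is described as running about 15 times per second. One minimal way to drive such a routine at a fixed rate is sketched below; `run_at_rate` is a hypothetical helper, not part of the patent, and its clock and sleep functions are injectable so the timing can be checked without real waiting. Deadlines are computed from the start time so that small sleep errors do not accumulate, and if one pass overruns its slot, the missed slots are skipped rather than replayed in a burst.

```python
import time
from typing import Callable

def run_at_rate(step: Callable[[int], None], rate_hz: float, n_frames: int,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Call step(i) n_frames times at a fixed rate; return skipped slot count."""
    period = 1.0 / rate_hz
    start = clock()
    slot = 0
    skipped = 0
    for i in range(n_frames):
        step(i)
        slot += 1
        deadline = start + slot * period  # absolute deadline, no drift
        now = clock()
        if now > deadline:
            # The step overran: jump ahead to the next slot still in the future.
            missed = int((now - deadline) // period) + 1
            slot += missed
            skipped += missed
            deadline = start + slot * period
        sleep(max(0.0, deadline - now))
    return skipped
```

In the videophone, `step` would be one pass of routine 100 and `rate_hz` would be 15.0.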
[0013]
In addition, glasses 17 such as those shown in FIG. 4 are provided. The glasses 17 are used to detect the three-dimensional position and orientation of the sender's face. For this purpose, LEDs (17A to 17C), for example, are provided as identifiers on the lens frame of the glasses 17 at the upper front center, the lower front right, and the lower front left, and an LED (17D) is provided as an identifier partway along, for example, the left temple (arm) of the glasses 17.
[0014]
The LEDs (17A to 17D) are preferably arranged so that, when the sender wearing the glasses 17 faces straight ahead, the LEDs (17A to 17C) lie in a single vertical plane perpendicular to the forward axis, and the LEDs (17A, 17D) lie in a single horizontal plane perpendicular to that vertical plane. The LEDs (17A to 17D) emit light when supplied with an operating voltage, for example from the control circuit 11.
[0015]
Therefore, when the glasses 17 are imaged by the video camera 14, the positions of the LEDs (17A to 17D) can be detected from their light emission by predetermined image processing. From this detection result, the following quantities in the image captured by the video camera 14 can be obtained: the height H of the LED (17A) relative to the LEDs (17B, 17C), the spacing W between the LED (17B) and the LED (17C), and the distance D in the depth (front-back) direction from the LED (17A) to the LED (17D). The LEDs (17A to 17C) also define a plane containing them.
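The text defines H, W, and D only in words. The sketch below shows one plausible way to measure them from the image coordinates of the four LED centroids; the precise definitions used here (H as the perpendicular distance of 17A from the 17B-17C line, D as the offset of 17D from 17A projected onto that line) are assumptions for illustration, and `measure_hwd` is a hypothetical name.

```python
import math

Point = tuple  # (x, y) in image pixels, y increasing downward

def measure_hwd(a: Point, b: Point, c: Point, d: Point) -> tuple:
    """Apparent (H, W, D) from image positions of LEDs 17A, 17B, 17C, 17D.

    Assumed definitions:
      W: distance between 17B and 17C.
      H: perpendicular distance of 17A from the line through 17B and 17C.
      D: signed offset of 17D from 17A along the 17B->17C direction,
         i.e. the apparent sideways shift caused by 17D sitting behind 17A.
    """
    bx, by = b
    cx, cy = c
    ux, uy = cx - bx, cy - by
    w = math.hypot(ux, uy)
    if w == 0:
        raise ValueError("17B and 17C coincide in the image")
    ux, uy = ux / w, uy / w  # unit vector along the 17B->17C baseline
    # 2-D cross product of (A - B) with the unit baseline = distance to the line.
    h = abs((a[0] - bx) * uy - (a[1] - by) * ux)
    # Signed projection of the vector A->D onto the baseline direction.
    dd = (d[0] - a[0]) * ux + (d[1] - a[1]) * uy
    return h, w, dd
```

Projecting onto the 17B-17C baseline rather than the image x axis keeps W and D meaningful when the head is tilted sideways.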
[0016]
Since the actual values of H, W, and D on the glasses 17 are known, the distance of the glasses 17 from the video camera 14 and their orientation about the front-back, left-right, and up-down axes can be obtained by comparing these actual values with the values of H, W, and D measured in the captured image. In addition, the vertical and horizontal position of the glasses 17 relative to the video camera 14 can be obtained from the position of the LED (17A) in the captured image.
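The patent does not say how the known and measured values are compared. One simple possibility, sketched here under stated assumptions, is a weak-perspective camera model (image size = s × real size, with s = f / Z for a focal length f in pixels and distance Z) plus the simplification that the LED (17D) lies directly behind the LED (17A). Turning the head by a yaw angle θ then gives W_img = s·W·cos θ and D_img = s·D·sin θ, so s and θ follow from both ratios together; nodding by a pitch angle φ gives H_img = s·H·cos φ. `estimate_pose` and `focal_px` are hypothetical.

```python
import math

def estimate_pose(h_img, w_img, d_img, h_real, w_real, d_real, focal_px):
    """Weak-perspective estimate of (distance, yaw, pitch) in radians.

    Assumptions (not from the source): weak perspective, LED 17D directly
    behind LED 17A, and H foreshortened only by pitch. Distance comes out
    in the same units as the real dimensions.
    """
    c = w_img / w_real            # s * cos(yaw)
    si = d_img / d_real           # s * sin(yaw)
    s = math.hypot(c, si)         # image scale, pixels per real unit
    if s == 0:
        raise ValueError("degenerate measurement")
    yaw = math.atan2(si, c)
    ratio = min(1.0, h_img / (s * h_real))  # clamp small measurement noise
    pitch = math.acos(ratio)      # magnitude only; sign is ambiguous from H
    distance = focal_px / s       # Z = f / s
    return distance, yaw, pitch
```

Roll could be read directly from the angle of the 17B-17C baseline in the image, and the lateral position from the image position of the LED (17A) scaled by Z / f.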
[0017]
Consequently, when the sender wears the glasses 17, the three-dimensional position and orientation of the sender's face relative to the video camera 14 can be obtained by detecting the positions of the LEDs (17A to 17D). When using the character mode, the sender wears the glasses 17 during the call.
[0018]
In this configuration, when the telephone 10 is put into videophone mode, for example by a predetermined key operation during a call, the microcomputer of the control circuit 11 starts routine 100 at step 101. In step 102, one frame of the video signal output from the video camera 14 is captured into the control circuit 11 as image data, as shown for example in FIG. 5A. In the following step 103, it is determined whether the videophone mode is set to the normal videophone mode or to the character mode.
[0019]
If the character mode is set, processing proceeds from step 103 to step 111, where it is checked whether all of the LEDs (17A to 17D) appear in the frame captured in step 102. If they do, processing proceeds from step 111 to step 112, where the positions of the LEDs (17A to 17D) in the captured image and the distances H, W, and D are determined. From these, the distance from the video camera 14 to the sender, the position and size of the sender's face in the captured image, and the orientation of the sender's face are detected.
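Step 111 requires locating the lit LEDs in the frame. The patent only says "predetermined image processing", so the following is an assumed approach: threshold a grayscale frame, group bright pixels into 4-connected spots with a breadth-first flood fill, and accept the frame only when exactly four spots are found (requiring exactly four, not at least four, also rejects frames with stray reflections). Deciding which spot is which LED is left out. `find_bright_spots`, `all_leds_visible`, and the threshold values are hypothetical.

```python
from collections import deque

def find_bright_spots(frame, threshold=200, min_pixels=2):
    """Centroids (x, y) of 4-connected groups of pixels >= threshold.

    frame is a non-empty 2-D list of grayscale values (a list of rows).
    Groups smaller than min_pixels are treated as noise and dropped.
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((x, y))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                n = len(pixels)
                spots.append((sum(p[0] for p in pixels) / n,
                              sum(p[1] for p in pixels) / n))
    return spots

def all_leds_visible(frame, expected=4, **kwargs):
    """Analogue of step 111: True only if exactly `expected` spots are found."""
    return len(find_bright_spots(frame, **kwargs)) == expected
```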
[0020]
Next, in step 113, image data of an animated character's face is generated, as shown in FIG. 5B, with its size and orientation made equal to those of the sender's face according to the data obtained in step 112. This character face image data can be formed by preparing, in advance, image data of the character's face looking straight ahead and processing it arithmetically according to the data obtained in step 112. The character's face may also be a three-dimensional image.
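Step 113 adapts a prepared front-facing character image to the sender's face size and orientation. A simple 2-D approximation, assumed here for illustration and not taken from the patent, is an affine transform that foreshortens the template by cos(yaw) horizontally and cos(pitch) vertically, scales it to the face size, rotates it by the in-plane roll angle, and moves it to the face position. A three-dimensional character, which the text also allows, would instead be rendered from a 3-D model. `character_affine` and `apply_affine` are hypothetical.

```python
import math

def character_affine(scale, yaw, pitch, roll, center_x, center_y):
    """2x3 affine matrix mapping template coordinates to frame coordinates.

    Template coordinates have their origin at the centre of the
    front-facing character face. The linear part is R(roll) @ diag(sx, sy).
    """
    sx = scale * math.cos(yaw)    # horizontal foreshortening from yaw
    sy = scale * math.cos(pitch)  # vertical foreshortening from pitch
    cr, sr = math.cos(roll), math.sin(roll)
    return [[cr * sx, -sr * sy, center_x],
            [sr * sx, cr * sy, center_y]]

def apply_affine(m, x, y):
    """Map one template point (x, y) into the frame."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

To build the character image, each destination pixel would typically be filled by mapping it back through the inverse transform and sampling the template, which avoids holes in the output.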
[0021]
Then, in step 114, according to the sender's face position obtained in step 112, the portion of the image data captured in step 102 that represents the sender's face is replaced with the character face image data generated in step 113, as shown in FIG. 5C. The image data processed in step 114 is then encoded (data compression), the encoded image data is sent to the network 20 in step 122, and routine 100 ends at step 123.
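Step 114 overwrites the sender's face region with the character face. A minimal compositing sketch follows, assuming frames are 2-D lists of pixel values and that one reserved value marks transparent pixels outside the character's face outline. `replace_face` is a hypothetical name, and the placement (top, left) is assumed to be derived from the face position found in step 112.

```python
def replace_face(frame, face, top, left, transparent=None):
    """Return a copy of frame with the character face pasted at (top, left).

    Pixels of `face` equal to `transparent` are skipped so that a
    non-rectangular face outline leaves the background visible. Any part
    of the face falling outside the frame is clipped.
    """
    out = [row[:] for row in frame]  # leave the captured frame untouched
    for r, face_row in enumerate(face):
        y = top + r
        if not 0 <= y < len(out):
            continue
        for c, px in enumerate(face_row):
            x = left + c
            if 0 <= x < len(out[y]) and px != transparent:
                out[y][x] = px
    return out
```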
[0022]
Thus, when the character mode is set, the sender's face in the image captured by the video camera 14 is replaced with the character's face, and this modified image of the sender is sent to the other party's videophone.
[0023]
Because routine 100 is executed, for example, 15 times per second (15 frames per second), and the position and orientation of the character's face sent to the other party follow those of the sender's face, the other party's videophone displays a moving image of the sender wearing, as it were, a mask of the character's face. By operating a predetermined key to swap the large and small images of the picture-in-picture display, as shown in FIGS. 2A and 2B, the sender can check the appearance of the character's face, that is, the state of the mask.
[0024]
On the other hand, if in step 111 even one of the LEDs (17A to 17D) is missing from the frame captured in step 102, processing proceeds from step 111 to step 119, where dummy image data indicating that the sender's face is outside the imaging range of the video camera 14 is generated. Processing then proceeds to step 121, where the dummy data is encoded and sent to the network 20. Therefore, even in the character mode, the sender's image is never sent to the other party's videophone when the position of the sender's face cannot be determined.
[0025]
If it is determined in step 103 that the normal videophone mode is set, processing proceeds from step 103 to step 121, where the image data captured in step 102 is encoded and sent to the network 20. Therefore, in the normal videophone mode, the sender's image captured by the video camera 14 is sent to the other party's videophone unchanged.
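Putting steps 102 to 123 together, one pass of routine 100 can be outlined as follows. This is a hedged sketch of the control flow only: the callbacks (`capture`, `detect_pose`, `render_face`, `composite`, `dummy_frame`, `encode_and_send`) and the `Pose` fields are hypothetical stand-ins for the processing described in the text. The property it illustrates is that in the character mode the raw camera frame is never handed to `encode_and_send`; when any LED is missing, a dummy frame is sent instead.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional

class Mode(Enum):
    NORMAL = auto()
    CHARACTER = auto()

@dataclass
class Pose:
    """Hypothetical container for the step 112 results."""
    top: int
    left: int
    scale: float
    yaw: float
    pitch: float

def routine_100(capture: Callable[[], list],
                mode: Mode,
                detect_pose: Callable[[list], Optional[Pose]],
                render_face: Callable[[Pose], list],
                composite: Callable[[list, list, Pose], list],
                dummy_frame: Callable[[], list],
                encode_and_send: Callable[[list], None]) -> str:
    """One pass of routine 100 (FIG. 3); returns the branch taken."""
    frame = capture()                              # step 102
    if mode is Mode.NORMAL:                        # step 103
        encode_and_send(frame)                     # step 121: frame as captured
        return "normal"
    pose = detect_pose(frame)                      # steps 111-112
    if pose is None:                               # at least one LED missing
        encode_and_send(dummy_frame())             # steps 119, 121
        return "dummy"
    face = render_face(pose)                       # step 113
    encode_and_send(composite(frame, face, pose))  # steps 114, 122
    return "character"
```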
[0026]
Thus, with the videophone 10 in the character mode, the sender's face captured by the video camera 14 is replaced with the character's face before transmission, and the other party's videophone displays a moving image of the sender wearing, as it were, a mask of the character's face. This removes psychological barriers such as embarrassment or reluctance to show one's own face.
[0027]
In addition, because the communication takes place, so to speak, behind a mask, it adds a new element of entertainment even to an ordinary call with a friend. The entertainment value can be increased further by storing image data for several different character faces in the control circuit 11 and letting the user select any of them.
[0028]
In the embodiment described above, the transmitter 12 and the receiver 13 may also be mounted on the glasses 17. The control circuit 11 may be a personal computer, and the display 15 may be a television receiver or the like. When the LEDs (17A to 17D) are mounted on the glasses 17, their power supply may be a button cell or similar battery mounted on the glasses 17.
[0029]
Image data for several kinds of character faces may also be stored on a server, such as one operated by the manufacturer, and downloaded to the telephone 10 for use. Furthermore, although the above description applies the present invention to an IP telephone, the invention can also be applied to other telephones with a videophone function, such as mobile phones.
[0030]
[List of abbreviations used in this specification]
A/D: Analog to Digital
ADSL: Asymmetric Digital Subscriber Line
CATV: CAble Television
D/A: Digital to Analog
FTTH: Fiber To The Home
IP: Internet Protocol
LED: Light Emitting Diode
[0031]
[Effect of the invention]
According to the present invention, the sender's face captured by the video camera is replaced with a character's face before being sent to the other party, which removes psychological barriers such as embarrassment or reluctance to show one's own face. In addition, the communication takes place, as it were, behind a mask, adding a new element of entertainment.
[Brief description of the drawings]
FIG. 1 is a system diagram illustrating one embodiment of the present invention.
FIG. 2 is a diagram of a display screen for explaining the present invention.
FIG. 3 is a flowchart illustrating one embodiment of the present invention.
FIG. 4 is a perspective view showing one embodiment of a part of the present invention.
FIG. 5 is a diagram of a display screen for explaining the present invention.
[Explanation of symbols]
11 control circuit, 12 transmitter, 13 receiver, 14 video camera, 15 display, 16 operation switch, 17 glasses, 17A to 17D LED, 20 network

Claims

A video signal transmitting device comprising:
an identifier attached to the face of a sender for detecting the position and orientation of the sender's face;
a video camera for imaging the sender;
a first circuit for determining the position, size, and orientation of the sender's face from the image data of the identifier in the video signal output from the video camera;
a second circuit for forming image data of a character's face having a size and orientation equal to the size and orientation of the sender's face determined by the first circuit; and
a third circuit for replacing, according to the position of the sender's face determined by the first circuit, the image data of the sender's face in the video signal output from the video camera with the image data of the character's face formed by the second circuit,
the device being configured to transmit the video signal output from the third circuit.

The video signal transmitting device according to claim 1, further comprising an encoder circuit for encoding the video signal output from the third circuit,
wherein the video signal output from the third circuit is encoded by the encoder circuit and then transmitted.

The video signal transmitting device according to claim 1 or 2, wherein the identifier is a plurality of LEDs provided on glasses worn by the sender.

A video signal transmitting method comprising:
imaging, with a video camera, a sender wearing an identifier for detecting the position and orientation of the face;
determining the position and orientation of the sender's face from the image data of the identifier in the video signal output from the video camera;
forming image data of a character's face according to the determination result;
replacing the image data of the sender's face in the video signal output from the video camera with the image data of the character's face; and
transmitting the video signal after the replacement.

The video signal transmitting method according to claim 4, wherein the video signal after the replacement is encoded and then transmitted.

The video signal transmitting method according to claim 4 or 5, wherein the identifier is a plurality of LEDs provided on glasses worn by the sender.