JP2007300452A - Television broadcast receiver with image and audio communication function

Info

Publication number: JP2007300452A
Application number: JP2006127298A
Authority: JP
Inventors: Kosuke Yagi; 孝介八木; Tomokazu Fukuda; 智教福田; Takayuki Kushida; 隆行櫛田; Akihiro Nagase; 章裕長瀬; Kyoichiro Oda; 恭一郎小田; Kaoru Kawada; 薫河田; Manabu Hashimoto; 橋本　　学; Toru Yoshihara; 徹吉原; Tomonori Ohashi; 知典大橋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-05-01
Filing date: 2006-05-01
Publication date: 2007-11-15
Anticipated expiration: 2026-05-01
Also published as: JP4845581B2

Abstract

PROBLEM TO BE SOLVED: To allow the display size of each screen within a composite screen to be switched in coordination with the program content and with the importance of each participant in a video conference conversation.

SOLUTION: A television broadcast receiver with a sound communication function includes: a video decoding part 22 for decoding a television broadcast; an image and sound adjusting part 13 for converting an input signal into a form suitable for communication processing; a communication decoding part 34 for decoding the input signal from a communication line; a communication control part 32 for controlling input/output signals to and from the communication line; an image composing part 14 for outputting a composite image signal formed from the image signals of the video decoding part 22 and the communication decoding part 34; a display part 15 for displaying the composite image signal; and a conversation state determining part 41 for determining the state of conversation between the user and the communication partner, based on the respective sound signals from the image and sound adjusting part 13 and the communication decoding part 34, and outputting a composite control signal. The image composing part 14 composes the composite image signal based on the composite control signal input from the conversation state determining part 41, and the display part 15 displays it.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a television broadcast receiver having image and audio communication functions.

A device with image and audio communication functions that lets a user hold a simulated face-to-face conversation with a communication partner shown on a screen, over a communication line such as a telephone line, is generally known as a videophone device. Similarly, a device with image and audio communication functions that lets a user listen, as if face to face, to the remarks of conference participants shown on a screen over a communication line is generally known as a video conference system. Television broadcast receivers, which receive a television broadcast from a television station, reproduce the program, and display it on a screen, are also known.

Furthermore, television broadcast receivers with image and audio communication functions are known that combine a television broadcast receiver with a videophone device. They provide an image and audio communication function (videophone function) that lets the user hold a simulated face-to-face conversation with a communication partner shown on the screen while watching a television broadcast program shown on the same screen. As one such conventional receiver, a videophone device is known that is installed in an expansion slot of a home television broadcast receiver and is also connected to a telephone line, and that can switch between an ordinary telephone function and the image and audio communication function by operation of a remote control device, even during a call (see, for example, Patent Document 1). In that videophone device, the screen is shared in order to reduce the installation space: the communication partner is displayed in a fixed sub-screen formed by cutting out part of the lower right of the television broadcast main screen at the aspect ratio of the screen. When a videophone call is received, the partner is initially displayed in a relatively large sub-screen, the display size of the sub-screen can be reduced by operating the remote control device, and the partner's voice is output from the speaker of the television broadcast receiver.

As a conventional video conference system, a system with a multipoint control device is known (see, for example, Patent Document 2). The multipoint control device automatically detects the terminal device of each speaker, and the number of such sites, from the audio signals input from the terminal devices of the conference participants connected to the communication line. When there is one speaker, it sends the speaker's image to every terminal device other than the speaker's. As the number of speakers among the participants increases, it simply increases the number of equal screen divisions from two to four and so on up to n, and sends a composite image of the speakers to each terminal device. When there are multiple speakers, the images of the speakers are combined into one equally divided screen and the combined image is transmitted to the terminal device of each participant. In the image sent to a speaker, that speaker's own image is replaced with the image of another participant.

Patent Document 1: JP-A-10-56675 (paragraphs 0014 to 0017, FIG. 1)
Patent Document 2: Japanese Patent No. 3305037 (paragraphs 0026 to 0029, FIG. 1)

Patent Document 1 states that, in a videophone device added to a conventional television broadcast receiver, the voice of the videophone partner is output from the speaker of the receiver, but it does not state how the audio of the television broadcast program is handled at that time. It therefore appears that Patent Document 1 does not envisage conversing with a videophone partner while simultaneously watching a television broadcast. If watching the broadcast and conversing with the partner are performed alternatively rather than simultaneously, then presumably, during a videophone call, the broadcast audio is either muted and not reproduced, or reproduced at a fixed low volume that does not interfere with the conversation and mixed with the partner's voice for output. In addition, the sub-screen showing the partner remains fixed at the lower right or a similar position of the television broadcast main screen, although its size can be adjusted slightly as described above.

Regarding the volume of a videophone device added to a television broadcast receiver, consider, for example, a user conversing while watching the same station's broadcast as the videophone partner. Such a user may want the audio of the broadcast being watched to be reproduced at close to normal volume, and conventional muting or volume reduction alone may not satisfy that wish. Conventional devices give no consideration at all to automatically changing the volume in accordance with the conversation status or the television viewing status in such a case, or to selecting the volume manually.

As for the screen display, if the image of the videophone partner is fixed at the lower right or a similar position of the main screen as in Patent Document 1, then in a soccer broadcast, for example, goal scenes and kick scenes at the lower right of the screen can no longer be seen, which is inconvenient. Alternatively, the screen of the television broadcast receiver could be divided equally into left and right halves, as in Patent Document 2, to display the television program and the videophone partner side by side. In that case, however, the program picture is halved in both height and width, so its area is reduced to 1/4. In a sports broadcast, for example, individual players become small relative to the field, and their fine movements become hard to see. Conventional devices give no consideration at all to automatically changing the size (height and width within the full screen) and layout (placement) of each screen in accordance with the conversation status or the television viewing status in such a case, or to selecting them manually.

Patent Document 2 presumably allows the number of equal screen divisions to be increased automatically, from two to four and so on up to n, as the number of speakers detected from the participants' audio signals increases. However, it says nothing at all about changing the size (height and width) of individual screens in accordance with the conversation status or the television viewing status, or about changing screen sizes automatically.

Based on the conventional techniques described above, several methods are conceivable: displaying the television program being watched and the multiple speakers of a video conference at fixed positions and sizes by dividing the display screen equally, or displaying the television program being watched and a videophone partner at fixed positions and sizes. Examples include composing the screen so that the television program is displayed in the center of the display screen with the conference speakers or the communication partner arranged around it, and composing the screen so that the television program is displayed in the upper part of the display screen with the conference speakers or the communication partner arranged in the lower part. Since a videophone device can be regarded as the one-to-one case of a video conference system, the following description covers only the case where the user's device is a video conference system, and the videophone case is omitted except where specifically necessary.

However, the programs watched on television broadcasts are highly varied, including sports programs, variety shows, and dramas. The degree of attention (importance) that a viewer (that is, a user of the videophone or conference device) gives to the broadcast depends on the program content and on the individual's interests and tastes at that moment. It differs from person to person, changes with timing, and is very difficult to predict, define, or quantify. Unlike in a video conference system, where the user's line of sight is simple, the point on the screen to which the user attends while watching a television program changes with the state of the program, so the optimal ratio of display sizes changes from moment to moment and is not easy to detect. The importance for each user also changes from moment to moment with the state of the television program and the state of the conversation with the communication partner (or the remarks of other speakers in the conference), and cannot be judged simply from the point of attention.

For example, while a video conference system is in use, a television program may move between an attack scene and a player substitution in a sports program, between the opening and the ending of a drama, or between the main program and a commercial. Depending on that content, what matters to the television viewer (the video conference system user) shifts between watching the program and following the remarks of other conference participants (the conversation with the communication partner).

In conventional devices, when the television program being watched and the video conference speakers are composed and displayed at a fixed ratio, for example, the program remains displayed at a large fixed size even though the user has little interest in it, or a speaker who has made an unimportant remark remains displayed at a relatively large fixed size. In other words, the size of each display screen is not linked to the importance that each conference participant attaches to the program content. Faced with a program screen whose fixed size does not follow the user's viewing interest, and with partner screens whose fixed sizes do not follow the content or presence of remarks, the user perceives the mismatch between the displayed content and his or her own interests, wishes, and sense of importance as unnatural and unsatisfying. In a conference it is also hard to distinguish a speaker who has made an important remark from the other speakers. As a result, the device is difficult to use.

The same applies to audio. The remarks made by speakers in a video conference are highly varied, and the importance of each remark to a video conference user differs widely. When, as in conventional devices, the audio of the television program being watched and the voices of the video conference speakers are mixed at a fixed ratio, the program audio remains at a high fixed volume even though the user has little interest in the program, or the voice of a speaker who has made an unimportant remark remains at a relatively high fixed volume. Because the volume associated with each screen is not linked to the importance that each conference participant attaches to the program content, the device can feel unnatural and difficult to use.

The present invention has been made to solve the problems described above. It concerns a television broadcast receiver with image and audio communication functions that allows a user, through a composite screen, both to watch and listen to a television broadcast program and to listen face to face to video conference (videophone) speakers at the same time. An object of the invention is to switch at least the display size of each screen in the composite screen in coordination with the program content and the importance of each conversation participant in the video conference (videophone), and thereby to provide a receiver that feels natural to the user and is easy to use.

A television broadcast receiver with image and audio communication functions according to the present invention comprises at least:
a television decoder unit that decodes the digital signal of a received television broadcast and detects a television video signal, a television audio signal, and a television control signal accompanying them;
a video/audio adjustment unit that converts an input video signal from an imaging means and an input audio signal from an audio input means into a format suitable for image and audio communication processing;
a communication decoder unit that decodes an input signal from a communication line and detects a communication video signal and a communication audio signal;
a communication control unit that controls the input and output of video and audio signals to and from the communication line and of the communication control signals accompanying them;
a video synthesis unit that receives the television video signal from the television decoder unit and the communication video signal from the communication decoder unit and, during image and audio communication, outputs a synthesized video signal in which the image of the program received by television broadcast and the image input from the communication line are combined; and
a display unit that displays the image of the program received by television broadcast and, during image and audio communication, displays the synthesized video signal,
the receiver further comprising a conversation status determination unit that receives at least the input audio signal from the video/audio adjustment unit and the communication audio signal from the communication decoder unit, determines the conversation status between the user and the communication partner, and outputs a synthesis control signal,
wherein the video synthesis unit further receives the synthesis control signal from the conversation status determination unit and synthesizes the synthesized video signal based on the synthesis control signal, and
the display unit displays the synthesized video signal based on the synthesis control signal.

The television broadcast receiver with image and audio communication functions according to the present invention switches at least the display size of each screen in the composite screen in coordination with the program content and the importance of each conversation participant in the video conference (videophone), and can thereby provide a receiver that feels natural to the user and is easy to use.

Embodiment 1.
FIG. 1 is a block diagram showing the schematic configuration of an example of a television broadcast receiver with image and audio communication functions according to Embodiment 1 of the present invention.
In FIG. 1, the television broadcast receiver 1 with image and audio communication functions receives a television broadcast from a television station, reproduces the program, and displays it on a screen. It also includes, for example, a video conference system with image and audio communication functions that lets the user listen, as if face to face, to the remarks of conference participants shown on the screen over a communication line, or a videophone device with image and audio communication functions that lets the user converse face to face with a communication partner shown on the screen over a communication line such as a telephone line. In this embodiment too, a videophone device can be regarded as the one-to-one case of a video conference system, so the following description covers only the case where the user's device is a video conference system, and the videophone case is omitted except where specifically necessary.

The remote control device 2 has the buttons (or keys) used to operate the television broadcast receiver and the video conference system, so that the television broadcast receiver with image and audio communication functions can be operated remotely by a wireless signal using radio waves or infrared light. It also has a button for turning the output of the synthesis control signal from the conversation status determination unit 41 (described later) on or off, or for changing what the synthesis control signal controls. When the user operates the remote control device 2, the operation is conveyed by the wireless signal to the conversation status determination unit 41, which then outputs the synthesis control signal, so that the screen layout and the audio mixing method can ultimately be switched as the user wishes.

The remote control device 2 also has a button for automatically tuning to the program that another conversation participant is watching. When the user presses that button, the television broadcast receiver 1 with image and audio communication functions queries the other participant's receiver about the program being watched there and, based on the reply, automatically switches to the same program as that participant.
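The query-and-reply exchange behind this button can be sketched as follows. This is a minimal illustration under stated assumptions: the `Receiver` class, its method names, and the reply's dictionary shape are hypothetical, since the text says only that the receiver queries the other participant's receiver and switches to the program named in the reply.

```python
class Receiver:
    """Toy model of one receiver; only the tuned channel is represented."""

    def __init__(self, channel: str) -> None:
        self.channel = channel

    def handle_channel_query(self) -> dict:
        # Reply returned over the communication line (hypothetical message shape).
        return {"channel": self.channel}

    def sync_channel_with(self, peer: "Receiver") -> str:
        # Ask the peer what it is watching, then tune to the same channel.
        reply = peer.handle_channel_query()
        self.channel = reply["channel"]
        return self.channel


mine = Receiver("ch-1")
partner = Receiver("ch-8")
mine.sync_channel_with(partner)  # mine.channel is now "ch-8"
```

In a real receiver the query would travel as part of the communication control signal; here a direct method call stands in for that exchange.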

The system control unit 11 is connected (connections not shown) to each unit in the television broadcast receiver 1 with image and audio communication functions and controls the operation of the receiver as a whole.

The video/audio input unit 12 is connected to an image input means that captures the user, such as a video camera, and to an audio input means that picks up the user's speech, such as a microphone. The user's video and audio are input to it and output to the video/audio adjustment unit 13.

The video/audio adjustment unit 13 receives the input video signal from the imaging means and the input audio signal from the audio input means via the video/audio input unit 12, converts them into a format suitable for the subsequent image and audio communication processing, and sends them to the communication encoder unit 33 and the conversation status determination unit 41.

The video synthesis unit 14 receives the television video signal from the television decoder unit 22 and the communication video signal from the communication decoder unit 34. During image and audio communication it also receives the synthesis control signal from the conversation status determination unit 41, and outputs a synthesized video signal in which the image of the program received by television broadcast and the image input from the communication line are combined based on that synthesis control signal. Specifically, the video synthesis unit 14 combines the images according to the screen layout given by the synthesis control signal, which includes the content of a plurality of synthesis layout templates that differ in the size ratios of the images to be combined, and outputs the resulting synthesized video signal to the display unit 15 for display.
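Template-driven composition of this kind can be sketched as follows. This is a minimal illustration, not the patent's implementation: the template names, the fractional rectangles, and the `compose_layout` function are all assumptions introduced here. The text states only that the layout templates differ in the size ratios of the combined images.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Rect:
    """A window expressed as fractions of the full screen (0.0 to 1.0)."""
    x: float
    y: float
    w: float
    h: float


# Hypothetical layout templates; each maps a source name to its window.
LAYOUT_TEMPLATES: Dict[str, Dict[str, Rect]] = {
    "tv_focus": {"tv": Rect(0.0, 0.0, 1.0, 0.8), "remote": Rect(0.0, 0.8, 0.2, 0.2)},
    "balanced": {"tv": Rect(0.0, 0.25, 0.5, 0.5), "remote": Rect(0.5, 0.25, 0.5, 0.5)},
    "conversation_focus": {"tv": Rect(0.0, 0.8, 0.2, 0.2), "remote": Rect(0.0, 0.0, 1.0, 0.8)},
}


def compose_layout(template_id: str, screen_w: int, screen_h: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Convert the selected template into pixel rectangles (x, y, width, height)."""
    template = LAYOUT_TEMPLATES[template_id]
    return {
        name: (
            round(r.x * screen_w),
            round(r.y * screen_h),
            round(r.w * screen_w),
            round(r.h * screen_h),
        )
        for name, r in template.items()
    }


windows = compose_layout("tv_focus", 1920, 1080)
```

Expressing the windows as screen fractions keeps the templates independent of the display resolution; an actual compositor would then scale and blit each decoded frame into its rectangle.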

The display unit 15 displays the image of the program received by television broadcast and, during image and audio communication, displays the synthesized video signal based on the synthesis control signal.

The audio mixing unit 16 receives the television audio signal from the television decoder unit 22, the communication audio signal from the communication decoder unit 34, the input audio signal from the video/audio adjustment unit 13, and the synthesis control signal from the conversation status determination unit 41. Based on the synthesis control signal, it also varies the volume level of each conversation participant and outputs the resulting mixed audio signal. Specifically, the audio mixing unit 16 mixes the audio according to the proportions given by the synthesis control signal, which includes the content of a plurality of audio mixing templates that differ in the volume levels at which the audio corresponding to each image is mixed, and outputs the mixed audio signal to the audio output unit 17, which outputs it as sound.
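Mixing by template can be sketched in the same spirit. This is a minimal illustration under stated assumptions: the template names, the gain values, and the `mix_audio` function are hypothetical, and samples are assumed to be floats in the range -1.0 to 1.0. The text specifies only that the mixing templates differ in the volume level applied to each source.

```python
from typing import Dict, List

# Hypothetical mixing templates: a gain per audio source.
MIX_TEMPLATES: Dict[str, Dict[str, float]] = {
    "tv_focus": {"tv": 1.0, "remote": 0.4},
    "conversation_focus": {"tv": 0.3, "remote": 1.0},
}


def mix_audio(sources: Dict[str, List[float]], template_id: str) -> List[float]:
    """Sum the sources sample by sample, weighted by the selected template."""
    gains = MIX_TEMPLATES[template_id]
    length = min(len(samples) for samples in sources.values())
    mixed = []
    for i in range(length):
        # Sources without a gain entry in the template are treated as silent.
        total = sum(gains.get(name, 0.0) * samples[i] for name, samples in sources.items())
        # Clip to the valid sample range to avoid overflow after summing.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


out = mix_audio({"tv": [0.5, 0.5], "remote": [0.5, -0.5]}, "conversation_focus")
```

Hard clipping is the simplest safeguard against overflow; a production mixer would more likely normalize or apply a limiter.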

The remote operation receiving unit 18 receives the wireless signal sent from the remote control device 2 to operate the television broadcast receiver with image and audio communication functions remotely, and outputs it to the system control unit 11. In this way the receiver accepts the user's operations.

The tuner unit 21 is connected to an antenna (not shown) and to the television decoder unit 22, and receives the broadcast waves picked up by the antenna. It then outputs to the television decoder unit 22 the received signal of the channel (program) tuned in response to a user instruction or the like.

The television decoder unit 22 decodes the digital signal of the tuned channel (program) of the received television broadcast and detects the television video signal, the television audio signal, the accompanying supplementary information (including various modes and the program classification), and the television control signal. The decoded video signal is input to the video synthesis unit 14, and the decoded audio signal is input to the audio mixing unit 16.

The communication interface unit 31 is connected to a communication line (not shown) and to the communication control unit 32. A signal input from the communication control unit 32 is transmitted to the communication partner on the call via the communication interface unit 31 and the communication line, and a signal input from the communication line is passed to the communication control unit 32.

The communication control unit 32 is connected to the communication interface unit 31, the communication encoder unit 33, the communication decoder unit 34, the video/audio adjustment unit 13, and the conversation status determination unit 41, and controls the input and output of video and audio signals to and from the communication line and of the communication control signals accompanying them.

In the communication control signal that it outputs, the communication control unit 32 includes, for example, the user's speech amount information and utterance volume level information, the channel information and supplementary information of the television broadcast program the user is viewing, the user's attention point information, and the content instructed by a radio signal from the remote operation device 2. Conversely, from the communication control signal that it receives, the communication control unit 32 detects, for example, the communication partner's speech amount information and utterance volume level information, the channel information and supplementary information of the television broadcast program being viewed by the user at the partner's receiver, that user's attention point information, and the content instructed by a radio signal from the communication partner's remote operation device.

The communication encoder unit 33 encodes the video signal (camera input) and the audio signal (microphone input) received from the video/audio adjustment unit 13 and sends them to the communication control unit 32.

The communication decoder unit 34 decodes the input signal from the communication line and detects a communication video signal and a communication audio signal.
The video decoded by the communication decoder unit 34 is input to the video synthesis unit 14, and the decoded audio is input to the audio mixing unit 16.

The conversation status determination unit 41 is connected (connections not shown) to the video/audio adjustment unit 13, the television decoder unit 22, the communication control unit 32, and the communication decoder unit 34, and collects video information, audio information, and other supplementary information in order to determine the conversation status. For example, it receives the input audio signal from the video/audio adjustment unit 13; the channel information, television audio signal, and television control signal from the television decoder unit 22; the communication audio signal from the communication decoder unit 34; and the communication control signal from the communication control unit 32. Using these signals, it determines the conversation status between the user and the communication partner and outputs a synthesis control signal. The television control signal includes the genre information and the audio mode information of the television broadcast program the user is viewing. The communication control signal includes the genre information and/or the audio mode information contained in the supplementary information of the television broadcast program the communication partner is viewing.

For example, the conversation status determination unit 41 determines the conversation status based on the user's speech amount and/or utterance volume level, detected from the input audio signal from the video/audio adjustment unit 13, and the communication partner's speech amount and/or utterance volume level, detected from the communication audio signal from the communication decoder unit 34. It then generates the synthesis control signal so that the image of the person who speaks more and/or speaks at a higher volume level is displayed larger.

The conversation status determination unit 41 receives the communication control signal from the communication control unit 32 and determines the number of conversation participants from it. When there are three or more conversation participants, it generates the synthesis control signal so that, for example, the images of the participants are sized in descending order of speech amount, or so that only the image of the participant with the largest speech amount is enlarged.
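As an illustration only, the two sizing rules described above could be sketched in Python as follows. The function name `size_weights`, the use of seconds as the measure of speech amount, and the concrete weight values are assumptions made for this sketch; the embodiment does not specify them.

```python
def size_weights(speech_seconds, emphasize_top_only=False):
    """Map each participant's speech amount to a relative display weight.

    speech_seconds: dict of participant name -> accumulated speech time
    (a hypothetical measure of "speech amount").
    """
    if not speech_seconds:
        return {}
    if emphasize_top_only:
        # Rule 2: only the participant who speaks the most is enlarged.
        top = max(speech_seconds, key=speech_seconds.get)
        return {name: (2.0 if name == top else 1.0) for name in speech_seconds}
    # Rule 1: larger weight for each step up in the speech-amount ranking.
    ranked = sorted(speech_seconds, key=speech_seconds.get, reverse=True)
    n = len(ranked)
    return {name: float(n - i) for i, name in enumerate(ranked)}
```

A layout stage could then scale each participant's window in proportion to the returned weight.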

The conversation status determination unit 41 includes an attention point detection unit 54, which determines the user's attention point, that is, where on the display unit 15 the user is looking, from the input video signal from the video/audio adjustment unit 13 and/or from instruction input information from the remote operation device. By using the attention point information of all conversation participants, the conversation status determination unit 41 can determine the conversation status and output a synthesis control signal. For example, the attention point information can be used to determine the conversation status, to control synthesis of the screen of the television broadcast program being viewed, to control synthesis of the screen of the participant who receives the most attention among all conversation participants, and to control synthesis of the screens of participants who are paying attention to each other.

The synthesis control unit 42 receives the determination information from the conversation status determination unit 41 and controls the video synthesis unit 14 and the audio mixing unit 16. Specifically, the synthesis control unit 42 can control the layout and dimensions of each image to be synthesized, and outputs a synthesis control signal that enables synthesis according to a synthesis layout template held in advance. It can likewise control the volume level of each audio stream to be mixed, and outputs a synthesis control signal that enables mixing according to an audio mixing template held in advance.

In accordance with the content of the template received as the synthesis control signal from the conversation status determination unit 41, the synthesis control unit 42 outputs an instruction (synthesis control signal) to synthesize the screen to the video synthesis unit 14, and outputs an instruction (synthesis control signal) to mix the audio to the audio mixing unit 16.

FIG. 2 is a block diagram showing an example of the schematic internal configuration of the conversation status determination unit 41 in FIG. 1.
The net search unit 51 is connected to the communication control unit 32 and to the program information detection unit 56 described later. Through the communication control unit 32 it searches sites published on a network such as the Internet and collects program information on the program being viewed or scheduled to be viewed, so that it can look up, for example, the genre of an analog broadcast program being viewed. In digital broadcasting, program genre information is often added to the control signal of the broadcast program, but in analog broadcasting no control signal is added, so the net search is useful for retrieving program information for analog programs. It is also useful when the program information in the control signal of a digital broadcast is insufficient, for example when a finer genre classification is wanted than the one in the program's supplementary information.

The person-of-interest detection unit 52 is connected to the communication control unit 32, the attention point detection unit 54 described later, and the synthesis mode selection unit 73 described later. It receives the detection result of the user's attention point from the attention point detection unit 54, and receives from the communication control unit 32 information on which communication partner each communication partner is paying attention to. From these it detects (estimates), for example, the communication partner who is receiving the most attention among all video conference conversation participants (conversation participants). Alternatively, it may rank the conversation participants in graded steps according to the degree of attention each receives. The detection result of the attention point of the user of this television broadcast receiver 1 with image and audio communication functions is sent to the communication partners' television broadcast receivers 1 with image and audio communication functions via the communication control unit 32 and the communication line.
In addition, for example to improve detection accuracy or to reduce the computational load on the communication partners' receivers, the detection result of the most-watched communication partner may also be sent to the receivers of all communication partners.
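The detection of the most-watched participant described above can be pictured as a simple tally. The following Python sketch assumes that each receiver reports a single attention target per participant; the function name `most_watched` and the dictionary format are hypothetical and are not taken from the embodiment.

```python
from collections import Counter


def most_watched(attention):
    """Return (most-watched participant, full ranking by attention received).

    attention: dict of viewer -> participant being looked at, or None when
    the viewer is looking elsewhere (for example at the program screen).
    """
    counts = Counter(
        target
        for viewer, target in attention.items()
        if target is not None and target != viewer  # ignore self-views
    )
    if not counts:
        return None, []
    ranking = [name for name, _ in counts.most_common()]
    return ranking[0], ranking
```

The ranking list corresponds to the graded ranking variant; only its first entry is needed when just the single most-watched participant is of interest.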

The line-of-sight match detection unit 53 is likewise connected to the communication control unit 32, the attention point detection unit 54, and the synthesis mode selection unit 73. It receives the detection result of the user's attention point from the attention point detection unit 54, and obtains from the communication control unit 32 information on which communication partner each communication partner is paying attention to. The line-of-sight match detection unit 53 detects (estimates), for example, whether the attention point of the user of this television broadcast receiver 1 with image and audio communication functions and the attention point of the user (communication partner) of each partner's receiver are directed at each other, that is, whether the user's line of sight and a partner's line of sight meet in a pseudo manner through the television screen. It thereby detects the communication partner whose line of sight matches the user's.
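A minimal sketch of this mutual-attention check, under the assumption that every participant's current attention target is available as a dictionary, might look as follows in Python. The name `mutual_gaze_partner` and the data format are illustrative only.

```python
def mutual_gaze_partner(user, attention):
    """Return the partner whose gaze meets the user's, or None.

    attention: dict of participant -> participant currently looked at.
    A match exists when the user looks at a partner and that partner
    looks back at the user.
    """
    target = attention.get(user)
    if target is None or target == user:
        return None
    return target if attention.get(target) == user else None
```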

The attention point detection unit 54 is connected to the video/audio adjustment unit 13, the video synthesis unit 14, the person-of-interest detection unit 52, the line-of-sight match detection unit 53, and the synthesis mode selection unit 73. From the video/audio adjustment unit 13 it receives the video of the user of this television broadcast receiver 1 with image and audio communication functions, as input to the video/audio input unit 12. From the video synthesis unit 14 it receives information on which image is displayed where on the screen. The attention point detection unit 54 detects from the user's video, for example, the position of the eyes relative to the contour of the face, the position of the iris relative to the whole eye, and the contour of the iris, and compares these results with the screen layout information from the video synthesis unit 14 to detect the attention point, that is, where on the screen the user is looking. As another method, when a light source can be installed near the display unit 15, for example, the attention point can be detected using light reflected from the retina, as disclosed in JP-A-6-319701 and elsewhere.
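The final step described above, comparing an estimated gaze position with the layout information from the video synthesis unit 14, amounts to a point-in-rectangle lookup. The Python sketch below assumes the gaze position has already been estimated in screen pixel coordinates; the `Region` type, the region names, and the rule that later regions are drawn on top are all assumptions of this example.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    name: str  # e.g. "program" or a participant label (hypothetical)
    x: int     # left edge in pixels
    y: int     # top edge in pixels
    w: int     # width in pixels
    h: int     # height in pixels


def attention_target(gaze_x: int, gaze_y: int,
                     regions: Sequence[Region]) -> Optional[str]:
    """Return the name of the displayed image that contains the gaze point."""
    # Later regions are assumed to be drawn on top, so search them first.
    for r in reversed(regions):
        if r.x <= gaze_x < r.x + r.w and r.y <= gaze_y < r.y + r.h:
            return r.name
    return None
```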

The audio mode detection unit 55 is connected to the television decoder unit 22 and the synthesis mode selection unit 73, and receives from the television decoder unit 22 the audio-related information contained in the supplementary information of the broadcast signal. From that audio-related information it detects, for example, whether the audio mode of the program the user is currently viewing is bilingual mode or stereo mode, and outputs the detection result to the synthesis mode selection unit 73.

The program information detection unit 56 is connected to the television decoder unit 22, the net search unit 51, and the synthesis mode selection unit 73, and receives from the television decoder unit 22 the program-related information contained in the supplementary information of the broadcast signal. From that information it detects the genre of the program the user is currently viewing and outputs the detection result to the synthesis mode selection unit 73. When no program-related information is supplied by the television decoder unit 22, as with analog broadcasting, or when more detailed program-related information (genre classification) is wanted than the supplementary information provides, the program information detection unit 56 can use the net search unit 51 to search program information published on the network and find the detailed genre of the program being viewed.

The excitement detection unit 61 is connected to the television decoder unit 22, the program information detection unit 56, and the synthesis mode selection unit 73. It receives the genre of the program the user is currently viewing from the program information detection unit 56, detects from the audio signal supplied by the television decoder unit 22 the excitement level and highlight scenes characteristic of that genre, and outputs the detection result to the synthesis mode selection unit 73. Genre-specific excitement and highlight scenes can be detected from audio for the following reason. In a soccer broadcast, taken as an example of a specific genre, the announcer's voice often becomes louder and higher in tone at a goal scene. Likewise, at a home run in baseball, the announcer's voice often becomes louder and higher in tone. The excitement of a soccer or baseball game can therefore be said to correlate with the announcer's loudness and tone, and this correlation can be used to detect the excitement level of the program.
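The correlation described above suggests a simple threshold test on loudness and tone. The following Python sketch assumes that per-frame loudness and pitch estimates of the announcer's voice are already available; the function name `excitement_flags`, the use of the mean as a baseline, and the ratio 1.5 are hypothetical choices for illustration, not values from the embodiment.

```python
from statistics import mean


def excitement_flags(volume, pitch, ratio=1.5):
    """Flag frames where the voice is both much louder and higher than usual.

    volume, pitch: equal-length sequences of per-frame loudness and pitch
    estimates (assumed inputs). A frame is flagged when its loudness exceeds
    ratio times the average loudness and its pitch exceeds the average pitch.
    """
    if not volume or not pitch:
        return []
    base_volume = mean(volume)
    base_pitch = mean(pitch)
    return [v > ratio * base_volume and p > base_pitch
            for v, p in zip(volume, pitch)]
```

A genre-aware variant could select `ratio` per genre, in line with the genre information supplied by the program information detection unit 56.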

The total call volume detection unit 62 is connected to the video/audio adjustment unit 13, the communication decoder unit 34, and the synthesis mode selection unit 73. It receives from the video/audio adjustment unit 13 the voice of the user of this television broadcast receiver 1 with image and audio communication functions, as input to the video/audio input unit 12, and receives the communication partner's voice from the communication decoder unit 34. It detects the loudness and duration of the user's and the partner's voices taken together, derives the total call volume of the communication function from statistical processing of that overall loudness and duration, and outputs the detection result to the synthesis mode selection unit 73. For example, the total call volume detection unit 62 detects the total call volume by taking statistics of the conversation loudness over the past one minute as the duration. It may also be configured to detect the excitement of the conversation from the overall audio signal by using the communication partner's loudness and tone, in the same way as the excitement detection unit 61.
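The one-minute statistic mentioned above can be kept with a sliding window. The Python sketch below is one possible arrangement under stated assumptions: loudness samples arrive with timestamps in seconds, and the "statistic" is taken to be a plain sum. The class name `CallVolumeWindow` and both of those choices are illustrative, not part of the embodiment.

```python
from collections import deque


class CallVolumeWindow:
    """Keeps conversation loudness samples from the most recent window."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp in seconds, loudness level)

    def add(self, t, level):
        """Record a loudness sample and drop samples older than the window."""
        self.samples.append((t, level))
        while self.samples and self.samples[0][0] <= t - self.window_s:
            self.samples.popleft()

    def total(self):
        """Sum of loudness over the window, used here as total call volume."""
        return sum(level for _, level in self.samples)
```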

The most frequent speaker detection unit 63 is likewise connected to the video/audio adjustment unit 13, the communication decoder unit 34, and the synthesis mode selection unit 73. It receives from the video/audio adjustment unit 13 the voice of the user of this television broadcast receiver 1 with image and audio communication functions, as input to the video/audio input unit 12, and receives the communication partner's voice from the communication decoder unit 34, but it detects the loudness and duration of each voice individually. From statistical processing of the individual loudness and duration values, it either detects the participant who is speaking the most or orders the participants by speech amount and expresses their shares as ratios, and outputs the detection result and the ratios to the synthesis mode selection unit 73.

The detection content determination table 71 is connected to the synthesis mode selection unit 73. To support selection of the synthesis mode for the screen layout, it stores in table form the determination criteria for detected quantities such as the total call volume, and the weighting coefficients applied to those quantities, all determined in advance by experiment and study.

The layout/audio mixing template 72 is likewise connected to the synthesis mode selection unit 73. It stores screen layout templates, in which the images for each anticipated situation are arranged at various positions and in various sizes, and audio mixing templates, which mix the voices of the conversation participants at various levels.

The synthesis mode selection unit 73 is connected to the detection content determination table 71 and the layout/audio mixing template 72, and also to the person-of-interest detection unit 52, the line-of-sight match detection unit 53, the attention point detection unit 54, the audio mode detection unit 55, the program information detection unit 56, the excitement detection unit 61, the total call volume detection unit 62, and the most frequent speaker detection unit 63. It receives the detection result of each of these detection units and corrects each result by applying the determination criteria and weighting coefficients of the detection content determination table 71. Based on the corrected results it selects a synthesis mode, selects the screen layout template and audio mixing template that fit that mode, and outputs the selection to the synthesis control unit 42 as an instruction (synthesis control signal).
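One way to picture the weighting step is as a threshold-and-weight vote among the detection results. The Python sketch below is a simplified illustration: the table format (a threshold, a weight, and a mode per detection), the mode names, and the function name `select_mode` are all assumptions of this example, since the embodiment does not give the table contents.

```python
def select_mode(detections, table):
    """Pick the synthesis mode with the highest weighted score.

    detections: dict of detection name -> normalized value (assumed 0..1).
    table: dict of detection name -> {"threshold", "weight", "mode"},
    standing in for the detection content determination table.
    """
    scores = {}
    for name, value in detections.items():
        entry = table[name]
        if value >= entry["threshold"]:  # apply the determination criterion
            weighted = entry["weight"] * value  # apply the weighting coefficient
            scores[entry["mode"]] = scores.get(entry["mode"], 0.0) + weighted
    return max(scores, key=scores.get) if scores else "default"
```

The returned mode name would then index into the stored layout and audio mixing templates.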

As described above, the synthesis mode selection unit 73 receives audio signals and control signals from the television decoder unit 22 in addition to the signals exchanged over the communication line. In the following description of the internal blocks of the synthesis mode selection unit 73, however, the operation of each block is explained in simplified form, covering only the basic operation based on signals exchanged over the communication line.

The broadcast/call ratio calculation unit 81 inside the synthesis mode selection unit 73 applies the determination criteria and coefficients of the detection content determination table 71 to the detection result received from the total call volume detection unit 62 and calculates the ratio between conversation emphasis and broadcast emphasis. It thereby judges whether the emphasis is on viewing the broadcast or on the video conference conversation, and outputs the judgment to the template selection unit 84. As the judgment result, it selects, for example, a candidate template from the layout/audio mixing template 72 and outputs that selection to the template selection unit 84.

The specific speaker emphasis detection unit 82 applies the determination criteria and coefficients of the detection content determination table 71 to the detection result received from the most frequent speaker detection unit 63 and calculates which communication partner should be emphasized. It thereby judges whether to emphasize a specific communication partner and, if so, which one, and outputs the judgment to the template selection unit 84. As the judgment result, it selects, for example, a candidate template from the layout/audio mixing template 72 and outputs that selection to the template selection unit 84.

The user selection mode storage unit 83 receives from the system control unit 11 the video synthesis mode or audio mixing mode selected with the remote operation device 2 and, in accordance with the selected mode, selects and stores, for example, a candidate template from the layout/audio mixing template 72.

The template selection unit 84 selects the final template from the layout/audio mixing template 72 based on the results from the broadcast/call ratio calculation unit 81 and the specific speaker emphasis detection unit 82 and on the content stored in the user selection mode storage unit 83, and sends the content of that template to the synthesis control unit 42 as the synthesis control signal.

The following describes the overall operation of the television broadcast receiver 1 with image and audio communication functions when television program viewing and a conversation over the video conference system take place at the same time.

First, the operation related to television program viewing is described. In the television broadcast receiver 1 with image and audio communication functions, the broadcast wave received by the antenna is tuned by the tuner unit 21 and decoded by the television decoder unit 22, which detects and outputs the video signal and audio signal of the television program together with a control signal containing supplementary information such as the various modes and the program classification. The detected video signal of the program is displayed as video on the display unit 15 via the video synthesis unit 14. The detected audio signal of the program is output from the audio output unit 17 via the audio mixing unit 16. In this way, the television broadcast receiver 1 with image and audio communication functions provides the video and audio of the television program.

Next, the operation related to a conversation over the video conference system is described. Conversation data of the video conference system sent over the communication line is received by the communication interface unit 31; after the communication control unit 32 detects the data needed for control, the communication decoder unit 34 decodes the conversation data and outputs it as a video signal and an audio signal. When there are multiple communication partners, the output (video and audio signals) is separated for each partner, either by the communication control unit 32 using the control signal or by the communication decoder unit 34 using the decoding result. The detected video signal is displayed on the display unit 15 via the video synthesis unit 14, and the detected audio signal is output from the audio output unit 17 via the audio mixing unit 16. In this way, the television broadcast receiver 1 with image and audio communication functions provides the video and audio of the communication partners in the video conference system.

In a video conference conversation, the user's own video is also displayed so that the user can check it; that operation is described next. A video signal input from, for example, a video camera and an audio signal input from, for example, a microphone pass through the video/audio input unit 12, are processed by the video/audio adjustment unit 13 into a format suitable for use in the video conference system, and are input to the communication encoder unit 33 for encoding. The video and audio signals encoded by the communication encoder unit 33 are sent to the communication partner via the communication control unit 32 and the communication interface unit 31. At the same time, the video signal output from the video/audio adjustment unit 13 is displayed on the display unit 15 via the video synthesis unit 14, and the audio signal is output from the audio output unit 17 via the audio mixing unit 16. In this way, the television broadcast receiver 1 with image and audio communication functions provides the video and audio of the user in the video conference system.

A device that handles program viewing and a video conference conversation at the same time, as in the present embodiment, must display multiple videos simultaneously (the television program, one or more communication partners, and the user) and output multiple audio streams simultaneously. The videos are combined into a single image by the video synthesis unit 14 and displayed, and the audio streams are mixed by the audio mixing unit 16 and output.

When the video of the television program, the video of the communication partners in the video conference system, and the user's own video are combined and displayed on one screen, many layout patterns are conceivable. For example, when program viewing is the priority, the program screen is displayed large and the communication partners and the user are displayed small; when conversation is the priority, the communication partners and the user are displayed large and the program screen is displayed small. In the present embodiment, a template is created for each of these many patterns and stored in the conversation status determination unit 41.

FIG. 3 shows a screen layout for a conversation among four people, including the user, as an example of the viewing-priority case on the television broadcast receiver with image and audio communication functions of FIG. 1.
When participants converse over the video conference system while watching the same program, the best screen layout depends on the program content. When the program is a movie or drama, for example, viewers generally concentrate on the program and place little emphasis on the conversation. FIG. 3 shows a layout for that case: on the composite display screen 101, the emphasized program screen 601 is displayed relatively large, and the four conversation participants 201, 301, 401, and 501 are displayed relatively small. In short, when the program is a movie or drama, a composite screen like the FIG. 3 layout, with a relatively large program screen 601 and relatively small conversation participants 201, 301, 401, and 501, is considered suitable.

FIG. 4 shows a screen layout for a conversation among four people, including the user, as an example of the conversation-priority case on the television broadcast receiver with image and audio communication functions of FIG. 1.
When the program is a variety show, for example, viewers generally place less emphasis on the program, so the conversation over the video conference system receives equal or greater emphasis. In the extreme case, the program merely supplies topics and receives almost no emphasis, while the conversation receives nearly all of it. FIG. 4 shows a layout for that case: on the composite display screen 102, the de-emphasized program screen 602 is displayed relatively small, and the four conversation participants 202, 302, 402, and 502 are displayed relatively large. In short, when the program is a variety show, a composite screen like the FIG. 4 layout, with a relatively small program screen 602 and relatively large conversation participants 202, 302, 402, and 502, is considered suitable.

The layouts in FIGS. 3 and 4 are only examples; many other layouts are possible for both the viewing-priority and the conversation-priority cases. Extreme examples include a layout that does not display the communication partners and a layout that does not display the television broadcast program.

As described above, the layout of the composite screen should follow the program content. The television broadcast receiver with image and audio communication functions of this embodiment therefore switches automatically between the viewing-priority layout and the conversation-priority layout according to the program content. To do so, the television decoder unit 22 outputs a control signal containing the supplementary information that is broadcast together with the program. The conversation status determination unit 41 detects the program's genre in that supplementary information and switches the control signal (template) it outputs, so that the synthesis control unit 42 is instructed to use the screen layout and audio mixing method suited to that genre.
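The genre-driven switching described above can be pictured as a lookup from a genre label to a layout template. The following is a minimal sketch under stated assumptions: the genre strings, the `LayoutTemplate` fields, and the scale values are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutTemplate:
    """Hypothetical template: relative sizes of the program and participant screens."""
    name: str
    program_scale: float      # assumed fraction of screen width for the program screen
    participant_scale: float  # assumed fraction of screen width per participant screen


# Assumed templates in the spirit of FIG. 3 (viewing priority) and FIG. 4 (conversation priority).
VIEWING_PRIORITY = LayoutTemplate("viewing_priority", 0.75, 0.125)
CONVERSATION_PRIORITY = LayoutTemplate("conversation_priority", 0.30, 0.35)

# Assumed genre labels; real broadcast metadata uses its own genre codes.
GENRE_TO_TEMPLATE = {
    "movie": VIEWING_PRIORITY,
    "drama": VIEWING_PRIORITY,
    "variety": CONVERSATION_PRIORITY,
}


def select_template(genre: str) -> LayoutTemplate:
    """Return the template for a genre, falling back to viewing priority."""
    return GENRE_TO_TEMPLATE.get(genre, VIEWING_PRIORITY)
```

The fallback to the viewing-priority template for unknown genres is a design choice of this sketch, not something the text prescribes.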

Program information such as supplementary information is obtained not only from the received over-the-air broadcast signal; it can also be acquired, via the communication control unit 32 and the communication interface unit 31, from a television station or a television information site connected to a communication line such as the Internet. The conversation status determination unit 41 detects the program genre from that program information and switches the control signal (template) it outputs, and the synthesis control unit 42 automatically switches to the screen layout and audio mixing method suited to the genre.

The conversation status determination unit 41 of this embodiment also identifies the audio mode, such as stereo or bilingual broadcast, which changes with the program content and circumstances, and the synthesis control unit 42 automatically switches to an audio mixing method suited to that audio mode.

In programs such as live sports broadcasts, the excitement of the game correlates with the volume and tone of the announcer's voice. Using this correlation, the conversation status determination unit 41 analyzes the audio of the television broadcast program to judge how exciting the program currently is, and the synthesis control unit 42 automatically switches to a screen layout and audio mixing method suited to that level of excitement.
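One simple way to approximate such an excitement judgment is to track the loudness of the program audio. The sketch below covers only volume (not tone), assumes samples normalized to the range -1.0 to 1.0, and uses a hypothetical threshold; the specification does not state how the judgment is actually made.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_program_excited(blocks: Sequence[Sequence[float]], threshold: float = 0.5) -> bool:
    """Assumed rule: the program counts as exciting when the mean block RMS exceeds the threshold."""
    if not blocks:
        return False
    mean_rms = sum(rms_level(block) for block in blocks) / len(blocks)
    return mean_rms > threshold
```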

The conversation status determination unit 41 can also examine the audio of the video conference system. When the amount of speech (amount of audio information) exchanged over the video conference system is large, the unit judges that the emphasis has shifted from television viewing to conversation and gauges how lively the conversation is; the synthesis control unit 42 then automatically switches to a screen layout and audio mixing method suited to that liveliness.
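The speech-amount judgment can be illustrated by comparing total speaking time with the length of an observation window. This is a sketch only: the per-participant speaking times are assumed to come from some voice activity detector, and the function names and the 0.5 threshold are hypothetical.

```python
from typing import Mapping


def conversation_share(speech_seconds: Mapping[str, float], window_seconds: float) -> float:
    """Total speaking time relative to the observation window, capped at 1.0."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return min(1.0, sum(speech_seconds.values()) / window_seconds)


def focus_from_speech(speech_seconds: Mapping[str, float],
                      window_seconds: float,
                      threshold: float = 0.5) -> str:
    """Assumed rule: focus is on the conversation when the share reaches the threshold."""
    share = conversation_share(speech_seconds, window_seconds)
    return "conversation" if share >= threshold else "program"
```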

Using the user's image output from the video/audio adjustment unit 13 and the screen layout of the composite video, the conversation status determination unit 41 can detect the position on the screen at which the user is currently gazing (the line of sight), and the synthesis control unit 42 can automatically switch to a screen layout and audio mixing method suited to that gaze. For example, when the line of sight is on the television program screen, the layout switches to one with a large program screen; when the user is looking at the video conference screens, the layout switches to one with large video conference screens.
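Once a gaze position in screen coordinates is available, deciding which screen the user is watching reduces to a rectangle hit test against the current layout. The sketch below assumes the gaze estimate already exists; the `Pane` type, the names, and the pixel coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Pane:
    """Hypothetical rectangle occupied by one screen within the composite screen."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pane_at(layout: Dict[str, Pane], gaze: Tuple[int, int]) -> Optional[str]:
    """Return the name of the first pane containing the gaze point, or None."""
    for name, pane in layout.items():
        if pane.contains(*gaze):
            return name
    return None
```

A result of `"program"` would select the layout with a large program screen, and a participant name would select a layout that enlarges the video conference screens.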

When the user's line of sight is detected, information on the screen each participant is currently watching can be exchanged with the communication partners (the video conference participants) over the communication line, which lets the conversation status determination unit 41 detect the degree of attention across all participants. The synthesis control unit 42 can then automatically switch to a screen layout and audio mixing method suited to what the participants as a whole are watching.

Although the screen layout and audio mixing method can be switched automatically according to audio and line of sight, the result is not necessarily the best one for the user. In this embodiment, therefore, the user can also operate the remote control device 2 to convey a desired screen layout and audio mixing method to the conversation status determination unit 41, which switches the screen layout and audio mixing method accordingly.

Furthermore, information on the user's chosen screen layout and audio mixing method can be sent to the communication partners over the communication line, so that the composite screens of the user and the partners share the same layout.

So far, switching of the composite screen layout relative to the program screen has been described with every conversation participant's screen at the same size. In a video conference with several members, however, it is sometimes better to control the size of each participant's screen individually.
For example, a participant who is at the center of the conversation and speaks a lot can be displayed large, and a participant who speaks little can be displayed small.

FIG. 5 shows a composite screen layout in which the screens of the four conversation participants are all the same size and are combined with the program screen. FIG. 6 shows a composite screen layout in which only one particular participant among the four is displayed large and is combined with the program screen.

On the composite display screen 103 of FIG. 5, the program screen 603 is displayed relatively large at the center, and the four conversation participants 203, 303, 403, and 503 are displayed relatively small at the four corners. On the composite display screen 104 of FIG. 6, the program screen 604 is likewise displayed relatively large at the center. Of the four conversation participants 204, 304, 404, and 504, participants 204, 404, and 504 are again displayed relatively small at corners of the composite display screen 104, whereas participant 304 is enlarged at the lower-left corner of the composite display screen 104.

To switch from the composite screen of FIG. 5 to that of FIG. 6, for example, while the composite screen of FIG. 5 is displayed, the most-frequent-speaker detection unit 63 of the conversation status determination unit 41 identifies the participant who speaks most from the participants' audio and judges that this participant deserves the most attention. The unit then selects the template in which that participant's image is enlarged and outputs it to the synthesis control unit 42 as the synthesis control signal. The synthesis control unit 42 automatically switches the synthesis control signal so that the image produced by the video synthesis unit 14 shows the most frequent speaker large.
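The selection of the most frequent speaker and of a matching template can be sketched as an arg-max over per-participant speaking time. The names, the minimum-speech threshold, and the template naming scheme below are assumptions made for illustration.

```python
from typing import Mapping, Optional


def most_frequent_speaker(speech_seconds: Mapping[str, float],
                          min_seconds: float = 1.0) -> Optional[str]:
    """Return the participant with the most speech, or None if nobody spoke enough."""
    if not speech_seconds:
        return None
    speaker = max(speech_seconds, key=speech_seconds.get)
    return speaker if speech_seconds[speaker] >= min_seconds else None


def template_for_speaker(speaker: Optional[str]) -> str:
    """Hypothetical template names: equal sizes (FIG. 5 style) or one enlarged participant (FIG. 6 style)."""
    return "equal_size" if speaker is None else f"enlarge_{speaker}"
```

With speaking times of 3.0, 12.5, 1.0, and 0.0 seconds for participants 204, 304, 404, and 504, this sketch would select participant 304, which matches the enlarged participant in FIG. 6.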

Instead of detecting the most frequent speaker from the audio, the switch from the composite screen of FIG. 5 to that of FIG. 6 can be driven by gaze. The attention point detection unit 54 of the conversation status determination unit 41 detects the user's line of sight, identifies the participant screen the user is watching, judges that this participant has a high degree of attention, selects the template in which that participant's image is enlarged, and outputs it to the synthesis control unit 42 as the synthesis control signal. The synthesis control unit 42 then automatically switches the synthesis control signal so that the image produced by the video synthesis unit 14 shows that participant large.

When the user's line of sight is detected, information on the screen each participant is currently watching can be exchanged with the communication partners (the video conference participants) over the communication line, which lets the conversation status determination unit 41 detect the degree of attention across all participants. The synthesis control unit 42 can then automatically switch the synthesis control signal so that the communication partner whom most participants are watching is displayed large.
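Finding the partner whom most participants are watching amounts to counting the exchanged gaze targets. The sketch below assumes each receiver reports a single target name per viewer; the `"program"` label and the rule of returning `None` on a tie are hypothetical choices.

```python
from collections import Counter
from typing import Mapping, Optional


def most_watched_participant(gaze_targets: Mapping[str, str],
                             program_pane: str = "program") -> Optional[str]:
    """gaze_targets maps each viewer to the screen it is watching.

    Returns the participant watched by the most viewers, or None when
    nobody is watching a participant or the top count is tied.
    """
    counts = Counter(t for t in gaze_targets.values() if t != program_pane)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: keep the current layout
    return ranked[0][0]
```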

When the user's line of sight is detected and information on the screen each participant is currently watching is exchanged with the communication partners (the video conference participants) over the communication line, the conversation status determination unit 41 can also detect (estimate) whether the user and a partner are, in effect, looking at each other through their television screens. The layout can then be switched automatically so that only participants whose gazes meet see each other's screens enlarged.

For example, suppose the four conversation participants 204, 304, 404, and 504 of FIG. 6 are conversing, participant 204 is watching the screen of participant 304, participant 304 is watching the screen of participant 204, participant 404 is watching participant 204, and participant 504 is watching the television program screen. On the screen that participant 204 is watching, the video of participant 304 is displayed large; on the screen that participant 304 is watching, the video of participant 204 is displayed large. FIG. 6 then corresponds to the screen that participant 204 is watching.
On the screens that participants 404 and 504 are watching, the videos of all four conversation participants are displayed at the same size.
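The mutual-gaze rule in this example can be checked mechanically: a partner is enlarged on a viewer's screen only if each is watching the other. The sketch below encodes the example's gaze targets in a dictionary; the function names and the `"program"` label are assumptions.

```python
from typing import FrozenSet, Mapping, Optional, Set


def mutual_gaze_pairs(gaze_targets: Mapping[str, str]) -> Set[FrozenSet[str]]:
    """Return every pair of participants who are watching each other."""
    pairs = set()
    for viewer, target in gaze_targets.items():
        if viewer != target and gaze_targets.get(target) == viewer:
            pairs.add(frozenset((viewer, target)))
    return pairs


def enlarged_pane_for(viewer: str, gaze_targets: Mapping[str, str]) -> Optional[str]:
    """Return the partner to enlarge on this viewer's screen, or None for equal sizes."""
    target = gaze_targets.get(viewer)
    if target is not None and target != viewer and gaze_targets.get(target) == viewer:
        return target
    return None


# Gaze targets from the example: 204 and 304 watch each other,
# 404 watches 204, and 504 watches the program.
example = {"204": "304", "304": "204", "404": "204", "504": "program"}
```

Applied to `example`, the sketch enlarges 304 for viewer 204 and 204 for viewer 304, and leaves viewers 404 and 504 with equal-size screens, as in the text.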

When there are four or more conversation participants, partner identification by gaze detection, and enlargement of that partner, may take place for more than one pair at once; for example, participants 204 and 304 of FIG. 6 may be looking at each other and talking while participants 404 and 504 do the same. In this case the screen can be switched to a layout in which each participant's mutual-gaze partner is displayed large, but because the two conversations are mixed at equal volume, each participant may find it hard to hear his or her own partner. In this embodiment, therefore, when the size of a partner's screen is adjusted, the mixing ratio of that partner's audio can be raised as well. Even when several pairs of participants hold separate conversations within one video conference screen, each participant hears his or her own partner at a higher mixing ratio and can therefore naturally pick out (easily hear) that partner's voice.
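Raising one partner's mixing ratio can be sketched as a set of normalized weights in which the partner receives a larger share. The boost factor and function names are hypothetical, and real audio mixing would operate on sample buffers rather than on single values.

```python
from typing import Dict, Iterable, Mapping, Optional


def mixing_weights(participants: Iterable[str],
                   partner: Optional[str],
                   boost: float = 3.0) -> Dict[str, float]:
    """Normalized weights; the partner gets `boost` times the weight of each other source."""
    raw = {p: (boost if p == partner else 1.0) for p in participants}
    total = sum(raw.values())
    if total == 0:
        return {}
    return {p: w / total for p, w in raw.items()}


def mix(samples: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of one audio sample per source."""
    return sum(weights[p] * s for p, s in samples.items())
```

For the viewer 204 in the two-pair example, `mixing_weights(["304", "404", "504"], partner="304", boost=2.0)` gives participant 304 half of the mix and the other two a quarter each.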

As described above, the television broadcast receiver with image and audio communication functions of this embodiment can switch the display size of each screen in the composite screen in step with the importance of the conference participants relative to the program content, and can further switch the display size of each participant's screen in step with that participant's individual importance. For audio, it can likewise mix the output so that the relevant conversation partner has a higher mixing ratio. The result is a television broadcast receiver with image and audio communication functions that feels natural to the user and is easy to use.

Embodiment 2.
In Embodiment 1, the detected attention points and lines of sight of the conference participants were used to switch the screen layout. The same detection results can also be used, for example, to mark each participant's attention point on the screen of the shared television program within the composite screen, which gives the participants a common topic. Embodiment 2 describes this marking of each participant's attention point on the shared program screen.

The configuration of this embodiment is basically the same as that of Embodiment 1 shown in the block diagrams of FIGS. 1 and 2, but some blocks differ as described below.
The layout/audio mixing template 72 in the conversation status determination unit 41 stores in advance a specific symbol that indicates a participant's attention point, and the image of that symbol is composited at each participant's attention point as detected by the attention point detection unit 54. For example, the layout/audio mixing template 72 may store a first specific symbol indicating each participant's attention point and a second specific symbol indicating the most-watched point, that is, the point on which the most attention points converge. The first symbol is then composited at each detected attention point, and the second symbol at the most-watched point.

FIG. 7 shows the four-person screen layout of FIG. 3, including the user, with specific symbols added to indicate the participants' attention points.
As in FIG. 3, FIG. 7 represents the case in which program viewing has priority and the video conference conversation receives little emphasis: on the composite display screen 105, the program screen 605 is displayed relatively large, and the four conversation participants 205, 305, 405, and 505 are displayed relatively small at the same size.

The detected lines of sight (attention points) of the participants 205, 305, 405, and 505 are exchanged over the communication line, and a specific symbol such as a "○" mark indicating each participant's gaze point is displayed on the program screen 605 of the composite screen that each participant is watching.
The specific symbol has a shape and color that do not blend into the surrounding image and color tones, such as a circle "○" or a cross "+", so that its position stands out. It may be emphasized further by varying its luminance or color. With this symbol display, the participants 205, 305, 405, and 505 can see where on the screen the others are looking, which gives them a common topic.

All participants can be shown with the same specific symbol so that individuals cannot be identified. Depending on the purpose, however, it may be better to know whose line of sight is being shown, and in that case a different specific symbol is displayed for each participant. The symbols can, for example, share a shape but differ in color, or differ slightly in shape, so that viewers can easily tell whose line of sight each on-screen mark represents.

For identification by color, for example, the frame around each face display of the participants 205, 305, 405, and 505 on the screen of FIG. 7 is drawn in a different color: red for participant 205, blue for participant 305, green for participant 405, yellow for participant 505, and so on. The specific symbol (a circle) indicating each participant's line of sight is then drawn in the same color as that participant's frame. By matching the color of a symbol against the frame colors of the face displays, viewers can easily tell whose line of sight the symbol represents.
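The color pairing can be sketched as a table keyed by participant, from which a list of markers to draw on the program screen is built. The frame colors follow the example in the text; the marker dictionary format, the function name, and the coordinates in the usage are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

# Frame colors from the example: each participant's gaze marker reuses the frame color.
FRAME_COLORS = {"205": "red", "305": "blue", "405": "green", "505": "yellow"}


def gaze_markers(gaze_points: Mapping[str, Tuple[int, int]],
                 frame_colors: Mapping[str, str],
                 shape: str = "circle") -> List[Dict[str, object]]:
    """Build one marker per participant, colored like that participant's frame."""
    return [
        {"participant": p, "position": pos, "shape": shape, "color": frame_colors[p]}
        for p, pos in sorted(gaze_points.items())
    ]
```

For instance, `gaze_markers({"305": (400, 300), "205": (100, 120)}, FRAME_COLORS)` yields a red circle for participant 205 and a blue circle for participant 305.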

In this way, this embodiment gives the conversation participants a common topic by marking each participant's attention point.

Embodiment 3.
In Embodiment 2, each participant's attention point was marked on the screen of the shared television program within the composite screen, but how the same program comes to be displayed on every participant's composite screen was not described. More generally, conversations on the television broadcast receiver with image and audio communication functions are often held on the premise that the participants are watching the same television program. Embodiment 3 describes a method for displaying the same television program on the composite screen that each participant watches.

The configuration of this embodiment is basically the same as that of Embodiment 1 shown in the block diagrams of FIGS. 1 and 2, but some blocks differ as described below.
When, for example, the user changes channels, the program information detection unit 56 in the conversation status determination unit 41 sends information on the new channel to the television broadcast receivers with image and audio communication functions of the other participants. In addition, the remote control device 2 has a button that matches the user's program to the one another participant is watching: it queries the other participant's receiver for the program being viewed, receives the result, and automatically tunes to that program's channel.

To display the same television program screen on the composite screen viewed by each conversation participant, it is necessary, for example when the user switches channels, to transmit the information on the newly selected channel to the television broadcast receivers with an image and audio communication function of the other conversation participants. Therefore, when a channel change operation is performed with the remote operation device 2, the program information detection unit 56 obtains the information on the channel currently being viewed from the control information of the television decoder unit 22 and, using for example the program information detection unit 56 and the net search unit 51 in the conversation status determination unit 41, transmits that channel information to the other conversation participants over the communication line via the communication control unit 32 and the communication interface unit 31. A television broadcast receiver with an image and audio communication function that receives the channel information, in turn, automatically switches to that channel using the program information detection unit 56, the net search unit 51, and the like in its own conversation status determination unit 41.

Depending on the purpose, the television broadcast receiver with an image and audio communication function of each conversation participant can also ignore received channel switching information, or switch only its own channel without transmitting the channel switching information.
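The channel synchronization described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Receiver` class, the `follow_remote` flag, and the `notify` argument do not appear in the specification, and direct method calls stand in for the exchange over the communication line.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical stand-in for one receiver taking part in a conversation."""
    name: str
    channel: int
    follow_remote: bool = True  # False: ignore channel info sent by others
    peers: list = field(default_factory=list)

    def change_channel(self, channel, notify=True):
        """Local channel change made with the remote operation device."""
        self.channel = channel
        if notify:  # notify=False: switch only this receiver's channel
            for peer in self.peers:
                peer.on_channel_info(channel)

    def on_channel_info(self, channel):
        """Channel information received from another participant."""
        if self.follow_remote:
            self.channel = channel  # no re-broadcast, so no notification loop


a = Receiver("A", channel=1)
b = Receiver("B", channel=4)
c = Receiver("C", channel=6, follow_remote=False)
a.peers = [b, c]
a.change_channel(8)  # B follows, C ignores the information
```

In this sketch the receiving side never forwards the channel information again, which is one simple way to keep two receivers from notifying each other indefinitely.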

When a new conversation participant joins the current conversation participants, the remote operation device can, for example, be provided with a button for tuning to the program that the other conversation participants are watching. Pressing the button queries the television broadcast receivers with an image and audio communication function of the other communication partners for the program being viewed and automatically switches to that channel.
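A minimal sketch of the button behavior follows. The function name `match_program` is hypothetical, and the rule for the case where the queried receivers report different channels is an assumption: the specification does not address it, so this sketch simply picks the most frequently reported channel.

```python
from collections import Counter


def match_program(my_channel, peer_channels):
    """Return the channel to tune to after the 'match' button is pressed.

    peer_channels holds the channels reported by the other receivers in
    response to the query. With no replies, the current channel is kept.
    Choosing the most common reply is an assumption of this sketch.
    """
    if not peer_channels:
        return my_channel
    channel, _count = Counter(peer_channels).most_common(1)[0]
    return channel


new_channel = match_program(1, [4, 4, 6])
```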

As described above, in the present embodiment, the same television program screen can be displayed on the composite screen viewed by each conversation participant.

Embodiment 4.
In Embodiment 1 described above, the main communication partner is made easy to identify mainly by switching the size of each screen, the screen layout, and the audio mixing ratio. However, when, for example, the communication decoder unit 34 has limited capability and it is difficult to change the image display size of each conversation participant, the main communication partner can instead be made easy to identify by, for example, switching the frame rate of each conversation participant's image. Embodiment 4, described below, explains the case where the frame rate of each conversation participant's image is switched in this way.

FIG. 8 is a block diagram showing a schematic configuration of an example of a television broadcast receiver with an image and audio communication function according to Embodiment 4 of the present invention.
The configuration of the present embodiment shown in FIG. 8 is basically the same as that of Embodiment 1 shown in FIG. 1, except for the differences described below. FIG. 2 applies unchanged.
The layout/audio mixing template 72 stores screen layout templates, prepared in advance for conceivable situations, in which the images are placed at various positions with their frame rates variously increased or decreased.
The synthesis control unit 42 includes a frame rate control unit 43 that uses the synthesis control signal to increase the frame rate of an image to be displayed with emphasis.
The frame rate control unit 43 reduces the frame rate of an image that does not need to be displayed with emphasis.

In the present embodiment, the main communication partner can be made easy to identify among the images of the conversation participants by, for example, displaying the image of the communication partner of interest at 30 frames per second and the images of the other communication partners at 1 frame per second.
This method of changing the frame rate can also be used in combination with the method of changing the image size described above.
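The frame rate switching can be sketched as follows, using the 30 and 1 frames per second figures from the example above. The function names and the frame-dropping rule are assumptions for illustration, not the method of the frame rate control unit 43; the sketch also assumes that the target rate divides the source rate evenly.

```python
def frame_rates(participants, focused, high_fps=30, low_fps=1):
    """Assign a display frame rate to each participant's image.

    The participant of interest gets high_fps; everyone else gets low_fps.
    """
    return {p: (high_fps if p == focused else low_fps) for p in participants}


def keep_frame(frame_index, source_fps, target_fps):
    """Decide whether a decoded frame is displayed when reducing the rate.

    Assumes target_fps divides source_fps evenly (for example 30 -> 1).
    """
    step = source_fps // target_fps
    return frame_index % step == 0


rates = frame_rates(["B", "C", "D"], focused="C")
shown_low = sum(keep_frame(i, 30, 1) for i in range(60))    # 2 seconds of video
shown_high = sum(keep_frame(i, 30, 30) for i in range(60))
```

Dropping frames at display time, as in `keep_frame`, leaves the image size untouched, which matches the motivation given above for decoders that cannot easily rescale each participant's image.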

As described above, in the present embodiment, the main communication partner can be made easy to identify by switching the frame rate of each conversation participant's image.

FIG. 1 is a block diagram showing a schematic configuration of an example of a television broadcast receiver with an image and audio communication function according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing an example of the schematic internal configuration of the conversation status determination unit 41 of FIG. 1.
FIG. 3 is a diagram showing a screen layout for a conversation among four people including the user, as an example of the case where program viewing is prioritized on the television broadcast receiver with an image and audio communication function of FIG. 1.
FIG. 4 is a diagram showing a screen layout for a conversation among four people including the user, as an example of the case where conversation is prioritized on the television broadcast receiver with an image and audio communication function of FIG. 1.
FIG. 5 is a diagram showing the screen layout of a composite screen in which the screens of four conversation participants of a video conference system, each of the same size, are combined with the screen of a television broadcast program.
FIG. 6 is a diagram showing the screen layout of a composite screen in which only a specific conversation participant among the four conversation participants of a video conference system is displayed large and combined with the screen of a television broadcast program.
FIG. 7 is a diagram in which specific symbols indicating the attention points of the conversation participants are added to the screen layout of FIG. 3 for a conversation among four people including the user.
FIG. 8 is a block diagram showing a schematic configuration of an example of a television broadcast receiver with an image and audio communication function according to Embodiment 4 of the present invention.

Explanation of symbols

1 television broadcast receiver with an image and audio communication function, 2 remote operation device, 11 system control unit, 12 video/audio input unit, 13 video/audio adjustment unit, 14 video synthesis unit, 15 display unit, 16 audio mixing unit, 17 audio output unit, 18 remote operation receiving unit, 21 tuner unit, 22 television decoder unit, 31 communication interface unit, 32 communication control unit, 33 communication encoder unit, 34 communication decoder unit, 41 conversation status determination unit, 42 synthesis control unit, 43 frame rate control unit, 51 net search unit, 52 subject detection unit, 53 line-of-sight match detection unit, 54 attention point detection unit, 55 audio mode detection unit, 56 program information detection unit, 61 climax detection unit, 62 total call amount detection unit, 63 most frequent speaker detection unit, 71 detection content determination unit, 72 layout/audio mixing template, 73 synthesis mode selection unit, 81 broadcast/call ratio calculation unit, 82 specific speaker emphasis detection unit, 83 user selection mode storage unit, 84 template selection unit.

Claims

A television decoder for decoding a received television broadcast digital signal to detect a television video signal and a television audio signal and a television control signal attached to them;
A video / audio adjustment unit that converts an input video signal from the imaging unit and an input audio signal from the audio input unit into a format suitable for image and audio communication processing;
A communication decoder that decodes an input signal from a communication line to detect a communication video signal and a communication audio signal;
A communication control unit for controlling input / output of video and audio input / output signals and communication control signals attached to the communication line;
A video synthesis unit that receives the TV video signal from the TV decoder unit and the communication video signal from the communication decoder unit and, during image and audio communication, outputs a synthesized video signal in which the image of the program received by television broadcast and the image input from the communication line are combined; and
A display unit that displays the image of the program received by television broadcast and, during image and audio communication, displays the synthesized video signal,
the television broadcast receiver with an image and audio communication function including at least the above units and further comprising:
A conversation status determination unit that receives at least the input audio signal from the video/audio adjustment unit and the communication audio signal from the communication decoder unit, determines the conversation status between the user and a communication partner, and outputs a synthesis control signal,
wherein the video synthesis unit further receives the synthesis control signal from the conversation status determination unit and generates the synthesized video signal based on the synthesis control signal, and the display unit displays the synthesized video signal generated based on the synthesis control signal.

The television broadcast receiver with an image and audio communication function according to claim 1, wherein the conversation status determination unit determines the conversation status based on the user's utterances detected from the input audio signal from the video/audio adjustment unit and the communication partner's utterances detected from the communication audio signal from the communication decoder unit, and generates the synthesis control signal so that the image of a person with a large amount of speech is displayed large.

The television broadcast receiver with an image and audio communication function as described, wherein the video synthesis unit outputs a synthesized video signal synthesized in accordance with the synthesis control signal, which includes the content of a plurality of synthesis layout templates that differ in the size ratio of the images to be combined.

The television broadcast receiver with an image and audio communication function according to any one of claims 1 to 3, wherein the communication control unit includes at least the user's speech amount information in the communication control signal that it outputs, and detects the speech amount information of the communication partner from the communication control signal that it receives.

The television broadcast receiver with an image and audio communication function according to claim 4, wherein the conversation status determination unit receives the communication control signal from the communication control unit, determines the number of conversation participants from the communication control signal, and, when there are three or more, generates the synthesis control signal so that the images of the conversation participants are made larger in order of their amount of speech.

The television broadcast receiver with an image and audio communication function according to claim 4, wherein the conversation status determination unit receives the communication control signal from the communication control unit, determines the number of conversation participants from the communication control signal, and, when there are three or more, generates the synthesis control signal so that the image of the person with the largest amount of speech is made large.

The television broadcast receiver with an image and audio communication function according to any one of claims 4 to 6, further comprising an audio mixing unit that receives the television audio signal from the television decoder unit, the communication audio signal from the communication decoder unit, the input audio signal from the video/audio adjustment unit, and the synthesis control signal from the conversation status determination unit, and outputs a mixed audio signal mixed based on the synthesis control signal,
wherein the volume level of each conversation participant is also changed by the audio mixing unit in accordance with the synthesis control signal.

The television broadcast receiver with an image and audio communication function according to claim 7, wherein the audio mixing unit outputs a mixed audio signal mixed in accordance with the synthesis control signal, which includes the content of a plurality of audio mixing templates that differ in the volume levels at which the audio corresponding to each image is mixed.

In the communication control unit, the communication control signal to be output includes the utterance volume level information of the user, and the utterance volume level information of the communication partner is detected from the input communication control signal,
The television broadcast receiver with an image and audio communication function according to any one of claims 2 to 8, wherein the conversation status determination unit determines the conversation status based on, in addition to the amount of speech, the user's utterance volume level in the input audio signal from the video/audio adjustment unit and the communication partner's utterance volume level in the communication audio signal from the communication decoder unit.

The conversation state determination unit further receives a TV audio signal from the TV decoder unit together with channel information,
The television broadcast receiver with an image and audio communication function according to any one of claims 1 to 9, wherein the television audio signal is used to determine a conversation state and a synthesis control signal is output.

The conversation status determination unit further receives a TV control signal from the TV decoder unit together with channel information,
The television broadcast receiver with an image and audio communication function according to claim 10, wherein at least genre information of the television broadcast program being viewed by the user is used as the television control signal.

The conversation status determination unit further receives a TV control signal from the TV decoder unit together with channel information,
The television broadcast receiver with an image and audio communication function according to claim 10, wherein at least audio mode information of a television broadcast program being viewed by a user is used as the television control signal.

In the communication control unit, the communication control signal that is output includes the channel information and incidental information of the television broadcast program that the user is viewing, and the channel information and incidental information of the television broadcast program of the communication partner are detected from the communication control signal that is input,
The television broadcast receiver with an image and audio communication function as described, wherein the conversation status determination unit further receives the communication control signal from the communication control unit, determines the conversation status, and outputs a synthesis control signal.

The television broadcast receiver with an image and audio communication function according to claim 13, wherein the conversation status determination unit uses, as the communication control signal from the communication control unit, at least genre information in the incidental information of the television broadcast program viewed by the communication partner.

The television broadcast receiver with an image and audio communication function according to claim 13, wherein the conversation status determination unit uses, as the communication control signal from the communication control unit, at least audio mode information of the television broadcast program viewed by the communication partner.

The television broadcast receiver with an image and audio communication function according to any one of claims 1 to 15, wherein the conversation status determination unit includes an attention point determination unit that receives the input video signal from the video/audio adjustment unit and determines the attention point, that is, which part of the display unit the user is paying attention to, and the conversation status determination unit determines the conversation status using the attention point information and outputs a synthesis control signal.

In the communication control unit, the communication control signal to be output includes the attention point information of the user, and the attention point information of the communication partner is detected from the input communication control signal,
The television broadcast receiver with an image and audio communication function according to claim 16, wherein the conversation status determination unit determines the conversation status using the attention point information of all conversation participants and outputs a synthesis control signal.

The television broadcast receiver with an image and audio communication function according to claim 16, wherein the conversation status determination unit uses the attention point information at least for synthesis control of the screen of the television broadcast program being viewed.

The television broadcast receiver with an image and audio communication function according to claim 16 or 17, wherein the conversation status determination unit uses the attention point information at least for synthesis control of the screen of the person who is watched most among all conversation participants.

The television broadcast receiver with an image and audio communication function, wherein the conversation status determination unit uses the attention point information at least for synthesis control of the screens of conversation participants who are paying attention to each other among all conversation participants.

The conversation status determination unit stores in advance a specific symbol indicating the attention point of each conversation participant,
The television broadcast receiver with an image and audio communication function according to any one of claims 16 to 20, wherein an image of the specific symbol is synthesized with the attention point determined by the attention point determination unit.

The conversation status determination unit stores in advance a first specific symbol indicating each of the attention points of the conversation participants and a second specific symbol indicating the most-watched attention point among those attention points,
The television broadcast receiver with an image and audio communication function according to claim 21, wherein the video of the first specific symbol is synthesized at the attention point of each conversation participant determined by the attention point determination unit, and the video of the second specific symbol is synthesized at the most-watched attention point.

The television broadcast receiver with an image and audio communication function according to any one of claims 1 to 22, further comprising a frame rate control unit that uses the synthesis control signal to increase the frame rate of an image to be displayed with emphasis.

The television broadcast receiver with an image and audio communication function according to claim 21, wherein the frame rate control unit reduces the frame rate for an image that does not need to be displayed with emphasis.

A remote operation device capable of remotely operating a television broadcast receiver with an image and audio communication function by a radio signal using radio waves or infrared rays, and a remote operation receiving unit 18 for receiving the radio signal;
The remote control device according to claim 16, wherein the remote control device is capable of turning on / off the output of the composite control signal from the conversation state determination unit or changing the control content of the composite control signal. The television broadcast receiver with an image and audio communication function according to any one of the preceding claims.

In the communication control unit, the communication control signal to be output includes the content instructed by the wireless signal from the remote operation device 2, and the content instructed by the wireless signal from the remote operation device of the communication partner is detected from the input communication control signal,
The television broadcast receiver with an image and audio communication function according to claim 25, wherein the conversation status determination unit uses the remote operation device instruction input information of all conversation participants to determine the attention point.

The television broadcast receiver with an image and audio communication function according to claim 25 or 26, wherein the remote operation device includes a button for matching the program viewed by the user with the program being viewed by the other conversation participants by querying the television broadcast receivers with an image and audio communication function of the other conversation participants for the program being viewed, receiving the result, and automatically tuning to the channel of that program.