JP2008005349A

JP2008005349A - Video encoder, video transmission apparatus, video encoding method, and video transmission method

Info

Publication number: JP2008005349A
Application number: JP2006174502A
Authority: JP
Inventors: Toshio Suzuki; 敏雄鈴木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-06-23
Filing date: 2006-06-23
Publication date: 2008-01-10

Abstract

PROBLEM TO BE SOLVED: To provide a video encoder and a video encoding method capable of efficiently encoding video data while maintaining the quality of the video of a person, who is the object of attention in video of television conferences, lecture speeches, and the like, and to provide a video transmission apparatus and a video transmission method capable of efficiently transmitting video data in accordance with conditions such as the communication rate of a transmission line. SOLUTION: A remote conference device 1 encodes conference video data [10], in which the conference scene is captured, and transmits it to a partner apparatus when the scene changes; when the scene does not change, it encodes only person video data [11] to [14] and transmits them to the partner apparatus. As a result, the video of a person who is the object of attention in the conference video is transmitted at all times, even when the communication rate on the transmission line drops or the characteristics of the video change. The partner apparatus synthesizes the person video data [11] to [14] with the conference video data [10], so the video of the conference scene can be updated with a smaller amount of transmitted data. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a video encoding device and a video encoding method for encoding video data that includes a person, such as video of a video conference or a lecture speech, and to a video transmission device and a video transmission method for transmitting that video data between different locations via a network or the like.

Conventionally, remote conference systems have been used for meetings between distant locations, such as a head office and a sales office. As an example of such a system, Patent Document 1 discloses a video conference system in which one microphone is installed for each conference participant to pick up each speaker's voice, and a zoom-up camera is used to photograph the speaker.
Patent Document 1: JP-A-2-202275

In a video conference system such as that of Patent Document 1, audio data and video data are exchanged via a dedicated line or the like. Recent video conference systems exchange digital video data compressed with the video compression methods specified in MPEG-1, MPEG-2, ITU-T H.263, and the like, so that a video conference can be held over a network that uses a low-bit-rate communication line such as an ISDN line or the analog telephone network.

When video is encoded using MPEG-1, MPEG-2, ITU-T H.263, or the like, bits are allocated by a fixed, uniform algorithm based on the motion information and the amount of spatial information in the video. This approach is used, for example, for content such as television sports broadcasts and movie DVDs.

Conventional video conference systems, on the other hand, are designed so that a video conference can be held even over a low-bit-rate communication line. Nevertheless, when traffic on the line increases, for example because more users connect simultaneously, and the communication rate drops, quality degrades across the entire video and the necessary information cannot be sent. As a result, a moving person may, for example, appear as a mosaic, or parts of the image of a moving person may be lost.

Accordingly, an object of the present invention is to provide a video encoding device and a video encoding method that can encode video data efficiently while maintaining the quality of the video of a person, who is the object of attention in video such as a video conference or a lecture speech, and to provide a video transmission device and a video transmission method that can transmit video data efficiently in accordance with the communication rate of the transmission path and the proportion of the whole video's information amount that the person accounts for.

To solve the above problems, the present invention has the following configurations.

(1) The configuration comprises:
imaging means for photographing a person and the person's surroundings and outputting the video;
person detection means for detecting the person in the video; and
encoding means for performing an encoding process that periodically extracts the video of the detected person from the video, encodes this person video data at a low compression rate, and encodes the video data other than the person at a high compression rate.

In this configuration, the video encoding device extracts the video of the person from the video of the person and the surroundings, encodes this video data at a low compression rate, and encodes the video data other than the person at a high compression rate. In video such as a video conference or a lecture speech, there is almost no motion and attention is focused on the region of the person, so the video data can be encoded efficiently while maintaining the quality of the video of the object of attention.
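
As an illustration of the region-dependent compression in configuration (1), the following minimal sketch assigns a fine quantization step to blocks inside the person region and a coarse step to every other block. The function names, the block grid, and the step values 4 and 24 are assumptions made for this example; the source does not specify how the two compression rates are realized.

```python
def build_step_map(rows, cols, person_blocks, fine_step=4, coarse_step=24):
    """Return a rows x cols grid of quantization steps.

    Blocks listed in person_blocks get the fine step (low compression);
    all other blocks get the coarse step (high compression).
    The step values are illustrative assumptions, not taken from the source.
    """
    person = set(person_blocks)
    return [[fine_step if (r, c) in person else coarse_step
             for c in range(cols)]
            for r in range(rows)]


def quantize(coefficient, step):
    """Uniform quantization: snap a coefficient to the nearest multiple of step."""
    return round(coefficient / step) * step


# A 2 x 3 block grid in which only block (0, 1) contains the person.
step_map = build_step_map(2, 3, [(0, 1)])
person_value = quantize(37, step_map[0][1])      # fine step
background_value = quantize(37, step_map[1][2])  # coarse step
```

With these assumed steps, a coefficient of 37 is reconstructed as 36 in the person block and as 48 in a background block, so the person region retains more detail for the bits it is given.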

(2) The configuration comprises:
imaging means for photographing a person and the person's surroundings and outputting the video;
person detection means for detecting the person in the video; and
encoding means for encoding the entire area of the video captured by the imaging means and thereafter periodically encoding person video data, which is a rectangular area of the video that contains the person.

In this configuration, the video transmission device encodes the entire area of the video of the person and the surroundings, and thereafter encodes only the video data of the person. In video such as a video conference or a lecture speech, there is almost no motion and attention is focused on the region of the person. Periodically encoding the video of the person who is the object of attention therefore means that, at decoding time, only the person region needs to be updated within the video of the entire area, so the video data can be encoded even more efficiently.
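
The decoder-side update implied by configuration (2) can be sketched as follows: the receiver keeps the last full frame and overwrites only the rectangle that contains the person. The frame representation (a list of pixel rows) and the name `paste_region` are hypothetical choices for this sketch; the source does not describe the decoder's data structures.

```python
def paste_region(frame, patch, top, left):
    """Return a copy of frame with patch written at row `top`, column `left`.

    frame and patch are lists of rows of pixel values. The original frame
    is left untouched so the caller can keep it as a reference.
    """
    updated = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            updated[top + dy][left + dx] = value
    return updated


# Last full frame (3 x 4, all background) and a newly decoded 2 x 2 person rectangle.
full_frame = [[0, 0, 0, 0] for _ in range(3)]
person_patch = [[1, 2], [3, 4]]
current = paste_region(full_frame, person_patch, top=1, left=1)
```

Only the four pixels covered by the rectangle change, which is why transmitting the rectangle alone is enough to refresh the displayed scene.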

(3) The configuration comprises scene change detection means for detecting a change in composition across the entire area of the video captured by the imaging means, and the encoding means encodes the entire area of the video at the timing at which the scene change detection means detects a change in composition across the entire area of the video.

In this configuration, the entire area of the video is encoded at the timing at which the scene change detection means detects a change in composition across the entire area of the video. When a scene change occurs in video such as a video conference or a lecture speech, the entire video area is therefore set (updated) again, so the video can be encoded without inconsistencies arising in it at decoding time.
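
Combining configurations (2) and (3), the per-frame decision might look like the sketch below: a full frame is encoded first and again whenever a scene change is flagged, and only the person rectangles are encoded otherwise. The function name and the string labels are hypothetical, and the scene change flags are taken as given because the detection method is not specified at this point in the source.

```python
def plan_encoding(scene_change_flags):
    """Decide what to encode for each frame.

    scene_change_flags[i] is True when a scene change was detected at frame i.
    The first frame is always encoded in full so the decoder has a reference.
    """
    plan = []
    for index, changed in enumerate(scene_change_flags):
        if index == 0 or changed:
            plan.append("full_frame")
        else:
            plan.append("person_rectangles")
    return plan


plan = plan_encoding([False, False, True, False])
```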

(4) The configuration comprises:
the video encoding device according to any one of (1) to (3); and
communication means for transmitting the video data encoded by the video encoding device to a partner device.

In this configuration, the video transmission device transmits the encoded video data to the partner device. Because video such as a video conference or a lecture speech is encoded efficiently, the video data can be transmitted reliably to the partner device regardless of the communication rate of the transmission path.

(5) The configuration comprises:
imaging means for photographing a person and the person's surroundings and outputting the video;
person detection means for detecting the person in the video;
calculation means for performing a predetermined calculation based on the state of the communication rate of the transmission path and the proportion of the whole video's information amount that the person accounts for;
encoding means that, when the result of the calculation means is equal to or greater than a threshold, periodically extracts the video of the detected person from the video captured by the imaging means, encodes this video data at a low compression rate, and encodes the video data other than the person at a high compression rate, and that, when the result of the calculation means is less than the threshold, encodes the entire area of the video captured by the imaging means and thereafter periodically encodes person video data, which is a rectangular area of the video that contains the person; and
communication means for transmitting the video data encoded by the encoding means to a partner device.

In this configuration, when the result of the predetermined calculation, performed based on the state of the communication rate and the proportion of the whole video's information amount that the person accounts for, is equal to or greater than a fixed value, the video transmission device extracts the video of the person from the video, encodes this person video data at a low compression rate, encodes the video data other than the person at a high compression rate, and transmits both sets of video data to the partner device. When the result of the calculation is less than the fixed value, the video transmission device encodes the entire area of the video data and transmits it to the partner device, and thereafter encodes video data in which the person is extracted as a rectangle and transmits that video data to the partner device. By reallocating the information amount in the video and deleting unnecessary information in accordance with the state of the communication rate on the transmission path and the proportion of the whole video's information amount that the person accounts for, the video of the person who is the object of attention in video such as a video conference or a lecture speech can be transmitted reliably to the partner device.
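
The threshold-based switching in configuration (5) can be sketched as below. The source only says that a "predetermined calculation" is performed on the communication rate state and the person's share of the information amount, so the calculation is passed in as a parameter; `placeholder_calc` is a purely hypothetical stand-in and is not the calculation used by the invention. The mode names are also assumptions made for this example.

```python
from enum import Enum


class Mode(Enum):
    # Result >= threshold: person at low compression, background at high compression.
    REGION_WEIGHTED = "region_weighted"
    # Result < threshold: full frame once, then person rectangles only.
    PERSON_RECTANGLES = "person_rectangles"


def select_mode(rate_state, person_share, threshold, calc):
    """Pick the encoding mode from the result of the supplied calculation."""
    result = calc(rate_state, person_share)
    return Mode.REGION_WEIGHTED if result >= threshold else Mode.PERSON_RECTANGLES


def placeholder_calc(rate_state, person_share):
    """Hypothetical calculation for demonstration only (not from the source)."""
    return rate_state * (1.0 - person_share)


good_line = select_mode(1.0, 0.2, threshold=0.5, calc=placeholder_calc)
poor_line = select_mode(0.3, 0.2, threshold=0.5, calc=placeholder_calc)
```

Keeping the calculation pluggable makes the one part the source does fix, the comparison against the threshold, explicit while leaving the unspecified formula open.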

(6) The configuration comprises:
a step of periodically detecting a person in video in which the person and the person's surroundings are captured; and
a step of extracting the video data of the detected person and encoding it at a low compression rate, and encoding the video data other than the person at a high compression rate.

This configuration provides the same operation and effect as (1).

(7) The configuration comprises:
a step of periodically detecting a person in video in which the person and the person's surroundings are captured; and
a step of encoding the entire area of the video of the person and the surroundings and thereafter periodically encoding person video data, which is a rectangular area of the video that contains the person.

This configuration provides the same operation and effect as (2).

(8) The configuration comprises:
a step of detecting a change in composition of the entire area of the video of the person and the surroundings; and
a step of encoding the entire area of the video data at the timing at which the change in composition of the entire area of the video is detected.

This configuration provides the same operation and effect as (3).

(9) The configuration comprises a step of transmitting the encoded video data to a partner device after the video encoding method according to any one of (6) to (8) has been performed.

This configuration provides the same operation and effect as (4).

(10) The configuration comprises:
a step of performing a predetermined calculation based on the state of the communication rate of the transmission path and the proportion of the whole video's information amount that the person accounts for;
when the calculation result is equal to or greater than a threshold,
a step of periodically detecting a person in video in which the person and the person's surroundings are captured,
a step of extracting the video data of the detected person and encoding it at a low compression rate, and encoding the video data other than the person at a high compression rate, and
a step of transmitting the encoded person video data and the encoded video data other than the person to a partner device; and
when the calculation result is less than the threshold,
a step of periodically detecting a person in video in which the person and the person's surroundings are captured,
a step of encoding the entire area of the video and thereafter periodically encoding person video data, which is a rectangular area of the video that contains the person, and
a step of transmitting the encoded video data to the partner device.

This configuration provides the same operation and effect as (5).

Video such as a video conference or a lecture speech is characterized by almost no motion and by the fact that information about anything other than the object of attention, such as a person, is largely unnecessary. According to the present invention, the information amount in the video is reallocated and unnecessary information is deleted. When the result of the predetermined calculation, performed based on the state of the communication rate of the transmission path and the proportion of the whole video's information amount that the person accounts for, is less than the threshold, only the video of the object of attention is encoded and transmitted; when the result is equal to or greater than the threshold, the video of the object of attention is encoded at a low compression rate and the background video at a high compression rate, and both are transmitted. The video of the object of attention can therefore be sent reliably, in accordance with the line conditions and the characteristics of the video, even when the communication rate drops. In addition, higher image quality can be achieved at the same transfer rate, and a lower bit rate can be achieved at the same quality.

The following description takes as an example a remote conference device that has the functions of the video encoding device and the video transmission device of the present invention. The encoding device and the video transmission device of the present invention may, of course, each be configured as a standalone unit.

FIG. 1 shows a schematic perspective view of a remote conference device according to an embodiment of the present invention, and two remote conference devices connected over a network. As shown in FIG. 1(A), the remote conference device 1 has a rectangular parallelepiped shape that is long in one direction. The eight speaker units SP1 to SP8 that constitute the speaker array 26 and the 14 microphone units MIC1 to MIC14 that constitute the microphone array 30 are each arranged in a straight line at regular intervals on the front side. Although not shown, an input/output connector 64 is provided on the back side of the remote conference device 1.

The number of speaker units in the speaker array 26 and the number of microphone units in the microphone array 30 are not limited to the quantities given above.

A video camera 50 and a monitor 52 can be connected to the remote conference device 1. When they are connected, it is convenient to place the remote conference device 1 on top of the monitor 52, as shown in FIG. 1(A), and to place the video camera 50 on top of the remote conference device 1. The remote conference device 1 can also be used on the edge of a conference desk or on a stand, whether or not the video camera 50 and the monitor 52 are connected.

As shown in FIG. 1(B), a remote conference system can be built by connecting two remote conference devices 1a and 1b to a network 8 such as the Internet or a LAN. The remote conference device 1a exchanges (sends and receives) the audio signal picked up by the microphone array 30 and the video signal captured by the video camera 50 with the other remote conference device 1b connected to the network 8, using a protocol such as SIP. Using the remote conference devices 1a and 1b, the video cameras 50a and 50b, and the monitors 52a and 52b, users can hold a video conference with audio and video between different locations a and b. Although not shown, connecting further remote conference devices 1 to the network 8 makes it possible to hold a video conference among a larger number of locations.

Next, the specific configuration of the remote conference device 1 is described. FIG. 2 is a functional block diagram of the remote conference device according to an embodiment of the present invention. The remote conference device 1 includes a remote controller 10, a signal receiving unit 12, a control unit 14, a memory 16, an audio processing unit 20, a speaker array 26 consisting of eight speaker units SP1 to SP8, a microphone array 30 consisting of 14 microphone units MIC1 to MIC14, a video camera 50, a monitor 52, a video encoding unit 54, a video decoding unit 56, a multiplexing/demultiplexing unit 60, a communication unit 62, and an input/output connector 64.

The remote conference device 1 can be adapted to video encoding schemes such as ITU-T H.261, H.262, H.263, H.264, and Motion JPEG.

The present invention concerns the video processing functions of the video transmission device, and the audio processing functions fall outside its subject. The audio processing system of the remote conference device 1 is therefore described only briefly below.

The remote controller 10 includes an operation unit 101 and is used to make various settings of the remote conference device 1. The remote controller 10 outputs a signal corresponding to the user's operation as infrared light.

The signal receiving unit 12 receives the signal (infrared light) output from the remote controller 10 and outputs it to the control unit 14.

The control unit 14 controls the audio processing unit 20, the video camera 50, the video encoding unit 54, the video decoding unit 56, the communication unit 62, and so on. The control unit 14 also controls each unit based on the signal sent from the signal receiving unit 12, reads programs from the memory 16, and writes data to the memory 16.

The memory 16 stores the programs executed by the control unit 14, data written by the control unit 14, and the like.

The audio processing unit 20 compresses and encodes the audio signal picked up by the microphone array 30 and outputs it to the multiplexing/demultiplexing unit 60; when an encoded audio signal arrives from the multiplexing/demultiplexing unit 60, it decodes the signal and outputs it to the speaker array 26. The audio processing unit 20 also performs echo cancellation on the audio signal picked up by the microphone array 30 and the audio signal output to the speaker array 26. Echo is thereby removed appropriately, and only the voice of the speaker at the local device is transmitted to the network as the output audio signal.

The speaker units SP1 to SP8 are omnidirectional speakers. Each converts the sound emission signal supplied to it individually by the audio processing unit 20 into sound and emits it to the outside.

Each of the microphone units MIC1 to MIC14 that constitute the microphone array 30 picks up sound from outside the remote conference device 1, converts it into an electrical signal, and outputs the picked-up signal to the audio processing unit 20.

The video camera 50 captures video of the conference scene and the conference participants (persons) and outputs the video data to the video encoding unit 54 at a fixed period.

The monitor 52 displays video data that has been sent from another remote conference device and decoded by the video decoding unit 56.

The video encoding unit 54 encodes the video data output from the video camera 50 at a fixed period and outputs it to the multiplexing/demultiplexing unit 60. The video encoding unit 54 also outputs the video block information amount and the person weight coefficient to the control unit 14.

The video decoding unit 56 decodes the encoded video data output from the multiplexing/demultiplexing unit 60 and outputs it to the monitor 52.

The multiplexing/demultiplexing unit 60 multiplexes the encoded audio data output from the audio processing unit 20 and the encoded video signal output from the video encoding unit 54, and outputs the resulting stream data to the communication unit 62. It also separates the stream data sent from the communication unit 62 into encoded audio data and encoded video data, outputs the audio signal to the audio processing unit 20, and outputs the video signal to the video decoding unit 56.

The communication unit 62 converts the stream data output from the multiplexing/demultiplexing unit 60 into the data format (protocol) that corresponds to the communication method of the network 8, and transmits it to another remote conference device 1 via the input/output connector 64 and the network 8. The communication unit 62 also converts stream data that has been sent from another remote conference device 1 and received via the network (LAN) 8 connected to the input/output connector 64 from the data format (protocol) of the network 8, and outputs it to the multiplexing/demultiplexing unit 60. In addition, the communication unit 62 monitors the state of communication (loss) on the network (transmission path) 8 and sends that information to the control unit 14 as needed.

Next, the detailed configuration of the video encoding unit 54 is described. FIG. 3 is a block diagram showing the configuration of the video encoding unit.

The video encoding unit 54 includes a face/person detector 70, a screen switching detector 72, an encoding controller 74, a first changeover switch 76, a discrete cosine transformer (DCT) 78, a quantizer (Q) 80, an encoder 81, an inverse quantizer (IQ) 82, an inverse discrete cosine transformer (IDCT) 84, an adder 86, a video memory 88 with motion compensation, a second changeover switch 90, and a subtractor 92.

For intra coding, the video encoding unit 54 sets the first changeover switch 76 to the intra terminal 76A and the second changeover switch 90 to the intra terminal 90A. For inter coding, it sets the first changeover switch 76 to the inter terminal 76B and the second changeover switch 90 to the inter terminal 90B.

The video data input from the video camera 50 is sent to the face/person detector 70, the intra terminal 76A of the first changeover switch 76, the video memory 88 with motion compensation, and the subtractor 92.

The face/person detector 70 determines whether the incoming video data contains video with the characteristics of a face or a person. When it detects such video, it outputs to the encoding controller 74 a signal that differs depending on whether a face or a person was detected. The face/person detector 70 detects the face using skin color, the contour composition of the picture, and the like. That is, it detects the face using features such as skin tone, facial parts (eyes, glasses, nose, mouth, and so on), and changes in the luminance of the video.

The face/person detector 70 also obtains a global boundary/motion detection signal from the screen switching detector 72 and combines it with the face detection signal to extract the person. Specifically, the face/person detector 70 applies erosion and dilation to the region in which the face is present, performs edge detection in its neighborhood, and detects the person from the global region. Because conference participants are often moving their mouths and hands, parameters such as the presence of slight motion are used to improve the accuracy of person detection.
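
The erosion and dilation applied to the face region can be pictured with the following minimal sketch of binary morphology on a block mask. The 3 x 3 neighborhood, the mask representation, and the function names are assumptions made for illustration; the source does not state the structuring element or the order in which the two operations are applied.

```python
def _morph(mask, keep):
    """Apply a 3 x 3 neighborhood rule to a binary mask (list of rows of 0/1).

    Neighborhoods are clipped at the image border rather than padded.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = 1 if keep(window) else 0
    return out


def dilate(mask):
    """Dilation: a cell is set if any cell in its neighborhood is set."""
    return _morph(mask, any)


def erode(mask):
    """Erosion: a cell stays set only if its whole neighborhood is set."""
    return _morph(mask, all)


# A single detected face cell in the middle of a 5 x 5 mask.
face_mask = [[0] * 5 for _ in range(5)]
face_mask[2][2] = 1
grown = dilate(face_mask)   # grows the region toward the surrounding body
shrunk = erode(grown)       # shrinks it back, removing thin protrusions
```

Dilation is one way to extend a face region toward the neighboring head and shoulders before edges are examined, while erosion removes isolated false detections.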

The above processing need not be performed for every frame, the unit of encoding. Depending on the mode set in the remote conference device 1, the detection result is used as a sensitivity coefficient that raises priority in preferential bit allocation, or as information for determining the limited region. For example, the weight (sensitivity) of the face is set high, and the person (body) surrounding it is given the next-highest weight coefficient.
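
One way to picture the weight coefficients is as a per-block map that drives proportional bit allocation, as in the sketch below. The weights 4.0, 2.0, and 1.0, the proportional rule, and the function names are hypothetical; the source states only that the face receives the highest weight and the body the next highest.

```python
def block_weights(rows, cols, face_blocks, body_blocks,
                  face_w=4.0, body_w=2.0, background_w=1.0):
    """Build a rows x cols grid of weights: face > body > background."""
    face, body = set(face_blocks), set(body_blocks)
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if (r, c) in face:
                row.append(face_w)
            elif (r, c) in body:
                row.append(body_w)
            else:
                row.append(background_w)
        grid.append(row)
    return grid


def allocate_bits(total_bits, grid):
    """Split a bit budget across blocks in proportion to their weights."""
    total_weight = sum(sum(row) for row in grid)
    return [[total_bits * w / total_weight for w in row] for row in grid]


weights = block_weights(2, 2, face_blocks=[(0, 0)], body_blocks=[(1, 0)])
budget = allocate_bits(800, weights)
```

With an assumed budget of 800 bits, the face block receives 400 bits, the body block 200 bits, and each background block 100 bits.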

When intra coding is performed, the video data input from the video camera 50 is sent via the first changeover switch 76 to the discrete cosine transformer 78, where it undergoes a discrete cosine transform, and is then sent to the quantizer 80. The quantizer 80 quantizes the video data from the discrete cosine transformer 78 according to the quantization parameter sent from the encoding controller 74 and outputs the result to the encoder 81. The video data quantized by the quantizer 80 is also inversely quantized by the inverse quantizer 82, subjected to an inverse discrete cosine transform by the inverse discrete cosine transformer 84, and sent to the adder 86.

When intra coding is performed, the adder 86 passes the video data output from the inverse discrete cosine transformer 84 unchanged to the motion-compensated video memory 88. When inter coding is performed, the adder 86 adds the motion-compensated video data sent from the motion-compensated video memory 88 via the second changeover switch 90 to the video data output from the inverse discrete cosine transformer 84, and outputs the sum to the motion-compensated video memory 88.

The motion-compensated video memory 88 generates motion-compensated video from the video data output from the adder 86 and the video data from the video camera 50, and outputs it to the subtractor 92. It also outputs motion vectors to the encoder 81. In addition, when inter coding is performed, it outputs the motion-compensated video to the adder 86 via the second changeover switch 90.

The subtractor 92 subtracts the video data of the video camera 50 from the motion-compensated video data output from the motion-compensated video memory 88, and outputs the resulting inter-frame difference data to the screen switching detector 72. When inter coding is performed, the subtractor 92 also outputs the inter-frame difference data to the discrete cosine transformer 78 via the first changeover switch 76.

This inter-frame difference data undergoes a discrete cosine transform in the discrete cosine transformer 78 and is sent to the quantizer 80. The quantizer 80 quantizes the inter-frame difference data from the discrete cosine transformer 78 according to the quantization parameter sent from the encoding controller 74, and outputs the resulting pixel coefficients to both the encoder 81 and the inverse quantizer 82.
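The intra and inter paths described above share one local decoding loop: quantize, inversely quantize, and (for inter coding) add the prediction back. The following Python sketch illustrates that loop under simplifying assumptions: the discrete cosine transform is omitted, blocks are flat lists of samples, the residual is taken as current minus prediction (the conventional sign), and the function names are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantization (stands in for quantizer 80)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse quantization (stands in for inverse quantizer 82)."""
    return [lv * step for lv in levels]


def encode_block(current, prediction, step, intra):
    """Return (levels sent to the encoder, reconstruction stored for later prediction)."""
    # Intra: code the samples themselves. Inter: code the difference from the prediction.
    source = current if intra else [c - p for c, p in zip(current, prediction)]
    levels = quantize(source, step)
    decoded = dequantize(levels, step)
    # Role of adder 86: pass through for intra, add the prediction for inter.
    recon = decoded if intra else [d + p for d, p in zip(decoded, prediction)]
    return levels, recon
```

Storing the reconstruction rather than the camera input keeps the encoder's reference identical to what the decoder will have, which is why the loop includes the inverse quantization step.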

The screen switching detector 72 detects screen switching based on the inter-frame difference data output from the subtractor 92. When it detects that the screen has switched, which includes not only scene changes but also large changes in luminance or saturation and camera pan, zoom, or tilt operations, it outputs a signal to that effect to the encoding controller 74.
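One simple way to detect screen switching from inter-frame difference data is to threshold the mean absolute difference. The Python sketch below shows this idea only as an assumption: the specification does not disclose the detection criterion, and the threshold value and function name are hypothetical.

```python
def is_screen_switch(frame_diff, threshold=30.0):
    """frame_diff: rows of per-pixel differences between the prediction and the new frame.

    Returns True when the mean absolute difference reaches the (hypothetical) threshold.
    """
    total = sum(abs(v) for row in frame_diff for v in row)
    count = sum(len(row) for row in frame_diff)
    return count > 0 and total / count >= threshold
```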

The encoding controller 74 outputs a switching signal to the first changeover switch 76 and the second changeover switch 90 according to whether intra coding or inter coding is to be performed, and outputs an intra flag or an inter flag to the encoder 81. It also outputs a coded flag or a not-coded flag to the encoder 81. In addition, it outputs the quantization parameter to the quantizer 80 and the encoder 81.

Next, the operation of the remote conference device 1 will be described. FIG. 4 shows a conference scene using the remote conference device and a normal-mode video captured by the video camera.

FIG. 5 shows video captured in the bit priority allocation mode. FIG. 6 shows video captured in the limited area transmission mode. FIG. 7 shows the overall image of the conference scene before zooming and after zooming.

As shown in FIG. 4(A), placing the monitor 52, the remote conference device 1, and the video camera 50 in front of the conference table 41 allows conference video and the speech of conference participants to be exchanged with another remote conference device via the network (transmission path) 8.

The conference scene captured by the video camera 50 is sent to the other remote conference device as video such as that shown in FIG. 4(B).

In the remote conference device 1, one of three video data transmission modes can be set: 1. normal mode, 2. bit priority allocation mode, and 3. limited area transmission mode. However, the device may automatically transition to the bit priority allocation mode or the normal mode even when the limited area transmission mode is set, and to the normal mode even when the bit priority allocation mode is set.

1. The normal mode allocates bits, when encoding video, using a fixed and uniform algorithm based on motion information and the amount of spatial information in the video. This mode is used when the transmission path has ample margin over the required bit rate, or when the entire screen must be updated.

2. The bit priority allocation mode allocates bits using motion information, the amount of temporal difference information, and a person weight coefficient. That is, it is set so that more code bits are allocated to persons, and especially to faces, which are the important regions of the video. In this mode, the person regions and their face regions are extracted. For example, as shown in FIG. 5, the remote conference device 1 uses the face/person detector 70 of the video encoding unit 54 to detect persons A to C and their faces in the conference scene video. The encoding controller 74 of the video encoding unit 54 uses the person weight coefficient to decide whether to terminate encoding of a block (for example, a macroblock). The encoding controller 74 also uses the person weight coefficient in conjunction with the quantization parameter output to the quantizer 80, so that more code bits are allocated to these regions. As a result, in the bit priority allocation mode, the conference scene video shown in FIG. 5 is sent to the other remote conference device with the face regions [1], [3], and [5] of persons A to C as high-quality video at a low compression rate, the body regions [2], [4], and [6] of persons A to C at the next-lowest compression rate and next-highest quality, and the remaining background region [7] as low-quality video at a high compression rate.
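The link between the person weight coefficient and the quantization parameter can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the 1 to 31 parameter range, the `strength` factor, the linear mapping, and the function name are all hypothetical, since the specification says only that higher-weight regions receive more code bits.

```python
def block_qp(base_qp, weight, qp_min=1, qp_max=31, strength=12):
    """Map a person weight in [0, 1] to a per-block quantization parameter.

    A higher weight lowers the parameter (finer quantization, more bits);
    a lower weight raises it (coarser quantization, fewer bits).
    """
    qp = base_qp + round(strength * (0.5 - weight))
    return max(qp_min, min(qp_max, qp))
```

With a base parameter of 16 and hypothetical weights of 1.0, 0.6, and 0.2 for face, body, and background, the face blocks receive the smallest parameter and the background the largest, matching the quality ordering described above.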

The remote conference device 1 normally divides the screen into horizontal strips and transmits them strip by strip from the top of the screen to the bottom. In this mode, however, the important region of the screen is transmitted first, to prevent code bits from running out at the end of transmission. In conference scene video, the person regions usually lie between the center and the bottom of the screen. In this mode, as shown in FIG. 5(B), the remote conference device 1 divides the screen into strips and, on detecting a person in the conference scene video, first transmits from the top of the region containing the person (region [3], which contains the top of person B's head in the figure) down to the bottom of the screen, and then transmits from the top of the screen down to just before the region containing the person.
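The transmission order described above amounts to rotating the strip sequence so that it starts at the first strip containing a person. A minimal Python sketch, assuming strips are indexed from 0 at the top of the screen and using a hypothetical function name:

```python
def strip_send_order(num_strips, first_person_strip):
    """Return the order in which strips are transmitted.

    first_person_strip: index of the topmost strip containing a person,
    or None when no person is detected (plain top-to-bottom order).
    """
    if first_person_strip is None:
        return list(range(num_strips))
    # Person strip down to the bottom first, then the top down to just before it.
    return list(range(first_person_strip, num_strips)) + list(range(first_person_strip))
```

For a six-strip screen with the first person strip at index 2, the strips go out as 2, 3, 4, 5, 0, 1, so any bit shortfall at the end of the frame falls on the upper background strips.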

3. The limited area transmission mode is applied to video conferences and speech mode at low bit rates, such as over ISDN, GSTN (analog telephone network), or heavily congested, low-bandwidth IP networks. When the transmission path has a low bit rate, header overhead becomes relatively large even for areas of the video with no motion. Therefore, in this mode, video of the entire area [10] of the conference scene is transmitted at the start of video data transmission and at scene changes, as shown in FIG. 6(A). At other times, to ensure that the necessary data is transmitted reliably, only the necessary regions of the video, namely the face and person regions [11] to [14], are transmitted, as shown in FIG. 6(B), and for the other regions not even block attribute information is transmitted.

In this mode, to reduce the amount of data transmitted, the face and person regions are set as follows. To simplify implementation, the block shape of each face or person region is set to a rectangle, and this rectangular region is determined using the person weight coefficient. The regions are not divided into overly fine blocks; their number is kept to roughly the number of persons in the conference scene video. In this mode, for each such block, the address of the first block is transmitted to the other remote conference device 1, followed by the image data of the consecutive blocks.

As an alternative encoding scheme, shown in the upper part of FIG. 6(C), it is possible to send, for regions where no data is transmitted, that is, regions other than the face and person regions (the solid black regions in the figure), only information specifying how many blocks to skip, and to send the block address and video data for the face and person regions (the regions of persons A, B, and C in the figure). With this scheme, the regions to be encoded (transmitted) and not encoded (not transmitted) can be determined, and the video data transmitted easily, regardless of the number of persons and even for complex, non-rectangular shapes.
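The alternative scheme, which sends a skip count for untransmitted blocks and an address plus data for person blocks, can be sketched as below. The tuple layout and function name are hypothetical; the specification does not define the bitstream syntax.

```python
def encode_skip_runs(is_person_block, block_data):
    """Serialize blocks in raster order.

    Non-person blocks collapse into ("skip", run_length) entries;
    person blocks become ("block", address, data) entries.
    """
    out = []
    run = 0
    for addr, is_person in enumerate(is_person_block):
        if is_person:
            if run:
                out.append(("skip", run))
                run = 0
            out.append(("block", addr, block_data[addr]))
        else:
            run += 1
    if run:
        out.append(("skip", run))
    return out
```

Because the skip count works on any raster-order sequence of blocks, the person region in this sketch does not need to be rectangular.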

Because background video is not sent in this mode, it is necessary to switch to the normal mode and refresh the video not only at the start of video data transmission and at scene changes, but also when luminance or saturation changes greatly, or when the video camera 50 pans, zooms, or tilts so that the video changes globally, for example when a zoom operation changes the video from that of FIG. 7(A) to that of FIG. 7(B).

The remote conference device 1 can be set to any of the three modes described above.

The remote conference device 1 can also be set to an automatic switching mode that switches among the above three modes automatically. Because the communication unit 62 monitors the communication state (loss) of the network (transmission path) 8, when the automatic switching mode is set, the device switches among the three modes automatically according to the communication rate and communication state of the network 8 and the proportion of the video's total information accounted for by persons.

FIG. 8 is a block diagram showing the specific configuration of the control unit. As shown in FIG. 8, in the control unit 14, the block information amount calculator 32 computes the product of the block information amount obtained from the video encoding unit 54 and the person weight coefficient (α), and the person region integrator 33 accumulates these products to obtain the information amount Rp of the person region in the screen. The whole region integrator 34 accumulates the block information amounts to obtain the information amount Ra of the entire screen. The divider 35 then computes Rp/Ra to obtain the person activity Ap. Here, the person activity Ap represents the share of the screen accounted for by the person region.
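The computation of the person activity Ap = Rp/Ra can be sketched directly from the description of FIG. 8. In this Python illustration the per-block inputs are plain lists and the function name is hypothetical; returning 0.0 for an empty frame is an assumption made to avoid division by zero.

```python
def person_activity(block_info, person_weights):
    """Compute Ap = Rp / Ra.

    block_info: information amount of each block (from the video encoder).
    person_weights: person weight coefficient (alpha) of each block.
    """
    # Rp: weighted sum over blocks (calculator 32 followed by integrator 33).
    rp = sum(info * alpha for info, alpha in zip(block_info, person_weights))
    # Ra: unweighted sum over all blocks (integrator 34).
    ra = sum(block_info)
    # Divider 35; 0.0 for an empty frame is an assumption of this sketch.
    return rp / ra if ra else 0.0
```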

The mode decision evaluator 36 performs a predetermined function calculation using the bit rate and transmission loss rate acquired from the communication unit 62 and the person activity Ap obtained as described above. Here, the bit rate is the configured line communication rate, and the transmission loss rate is the actual proportion of data whose transmission failed.

The mode decision evaluator 36 selects the normal mode if the calculation result is equal to or greater than the first threshold A. It selects the bit priority allocation mode if the calculation result is below the first threshold A and equal to or greater than the second threshold B. It selects the limited area transmission mode if the calculation result is less than the second threshold B.
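The threshold comparison can be sketched as follows. The evaluation function that produces the score is not disclosed in the specification, so this Python sketch takes the score as an input; the enum and function names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    BIT_PRIORITY = "bit_priority_allocation"
    LIMITED_AREA = "limited_area_transmission"


def select_mode(score, threshold_a, threshold_b):
    """Pick the transmission mode from the evaluator's score (requires A >= B)."""
    if threshold_b > threshold_a:
        raise ValueError("threshold A must not be smaller than threshold B")
    if score >= threshold_a:
        return Mode.NORMAL
    if score >= threshold_b:
        return Mode.BIT_PRIORITY
    return Mode.LIMITED_AREA
```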

For example, when the person activity Ap >> 0, the remote conference device 1 selects the limited area transmission mode. When the line communication rate is low, the bit priority allocation mode is selected, and when it is lower still, the limited area transmission mode is selected. When the transmission loss rate is high, either the bit priority allocation mode or the limited area transmission mode is selected. However, when the transmission rate is low and the loss is large, the limited area transmission mode is selected without automatic switching.

Note that the mode decision evaluator 36 may instead select the mode corresponding to the calculation result by referring to the table 37. The function used in the calculation performed by the mode decision evaluator 36 is set in advance through experiments or the like.

The remote conference device 1 can also be set to an automatic switching mode that switches only between the bit priority allocation mode and the limited area transmission mode. In this case, the remote conference device 1 switches to the bit priority allocation mode when the calculation result of the mode decision evaluator 36 is equal to or greater than the second threshold B, and to the limited area transmission mode when it is less than the second threshold B.

In the remote conference system configured as shown in FIG. 1(B), when the remote conference device 1a installed at point a outputs stream data containing video data in one of the modes described above, the remote conference device 1b installed at point b receives the stream data at the communication unit 62 via the input/output connector 64. The multiplexing/demultiplexing unit 60 then separates out the encoded video data, and the video decoding unit 56 decodes the video data and outputs it to the monitor.

Video data sent from the remote conference device 1a in the normal mode is displayed on the monitor 52b as video of uniform quality throughout, as shown in FIG. 4(B).

Video data sent from the remote conference device 1a in the bit priority allocation mode is displayed on the monitor 52b as shown in FIG. 5(A): the face regions [1], [3], and [5] of persons A to C as high-quality video at a low compression rate, the body regions [2], [4], and [6] of persons A to C at the next-lowest compression rate and next-highest quality, and the remaining background region [7] as low-quality video at a high compression rate.

For video data sent from the remote conference device 1a in the limited area transmission mode, as shown in FIG. 6, video of the entire area [10] of the conference scene shown in FIG. 6(A) is sent at the start of video data transmission. After that, unless a scene change or the like occurs, persons A to C and their surrounding regions [11] to [14] are sent as video of uniform quality, as shown in FIG. 6(B). When the remote conference device 1b receives stream data containing video data of the entire area [10] of the conference scene, it displays the video shown in FIG. 6(A) on the monitor 52b. When it receives stream data containing video data of only persons A to C and their surrounding regions, it replaces the regions [11] to [14] shown in FIG. 6(B) within the video shown in FIG. 6(A) and displays the result on the monitor 52b.
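On the receiving side, the region replacement described above can be sketched as pasting rectangular patches onto the last full frame. This Python illustration assumes frames are lists of pixel rows and that each patch arrives with its top-left position and fits inside the frame; the names are hypothetical.

```python
def composite(full_frame, patches):
    """Return a new frame with each (x, y, patch) pasted over the stored full frame.

    full_frame: list of pixel rows from the last whole-area update.
    patches: iterable of (x, y, rows), where (x, y) is the patch's top-left pixel.
    Assumes every patch lies entirely inside the frame.
    """
    frame = [row[:] for row in full_frame]  # keep the stored background intact
    for x, y, patch in patches:
        for dy, patch_row in enumerate(patch):
            frame[y + dy][x:x + len(patch_row)] = patch_row
    return frame
```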

Here, the background region, whose video data is not updated, and the person region, whose video data is updated, may form a discontinuous image. For this reason, the remote conference device 1b includes a low-pass filter (not shown) in the video decoding unit 56 and, when compositing the person video onto the overall conference scene video, applies the low-pass filter to the region near the boundary of the person video to make block boundaries less noticeable.

Next, the operation of the remote conference device 1 will be described with reference to a flowchart. FIG. 9 is a flowchart for explaining the operation of the remote conference device.

At startup, the control unit 14 reads the contents of the memory 16 and checks which video data transmission mode is set (s1). If the normal mode is set (s2), the control unit 14 transmits video in the normal mode (s3). If the bit priority allocation mode is set (s4), it transmits video in the bit priority allocation mode (s5). If the limited area transmission mode is set (s6), it transmits video in the limited area transmission mode (s7).

If, on the other hand, the automatic switching mode is set (s8), the control unit 14 performs a calculation based on the current communication state of the network 8 and the proportion of the video's total information accounted for by persons, and sets the mode to execute according to the result (s9). If the calculation result is equal to or greater than the first threshold A (s10), the control unit 14 transmits video in the normal mode (s11). If the calculation result is below the first threshold A and equal to or greater than the second threshold B (s12), it transmits video in the bit priority allocation mode (s13). If, in step s12, the calculation result is less than the second threshold B, it transmits video in the limited area transmission mode (s14).

The control unit 14 performs the calculation at a fixed interval, for example every 3 seconds, based on the current state of the network 8 and the characteristics of the video to be transmitted (s15, s17, s19). If the calculation result has not changed (s16, s18, s20), it continues transmitting video in the currently set mode.

If, on the other hand, the calculation result has changed (s16, s18, s20), the processing from step s10 onward is performed.
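The periodic re-evaluation in steps s15 to s20 can be sketched as a loop over successive evaluation results, re-running the threshold comparison only when the result changes. In this Python illustration the scores are supplied as a list (one per evaluation period), the mode names are plain strings, and the function name is hypothetical.

```python
def run_auto_switching(scores, threshold_a, threshold_b):
    """Return the mode in force after each periodic evaluation (e.g. every 3 seconds)."""

    def select(score):
        if score >= threshold_a:
            return "normal"
        if score >= threshold_b:
            return "bit_priority_allocation"
        return "limited_area_transmission"

    modes = []
    current_mode = None
    last_score = None
    for score in scores:
        # Re-select only when the calculation result has changed (s16, s18, s20).
        if last_score is None or score != last_score:
            current_mode = select(score)
        last_score = score
        modes.append(current_mode)
    return modes
```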

As described above, with the remote conference device of the present invention, the video of the object of interest in the overall conference scene image can be sent reliably, according to the setting or the network state, even if the communication rate drops or the proportion of the video's total information accounted for by persons changes. In addition, higher image quality can be achieved at the same transfer rate, and a lower bit rate can be achieved at the same quality.

The above description uses the transmission of conference scene video as an example, but the invention is not limited to this. Any video that captures a person and the person's surroundings may be used, such as video of a speech at a lecture.

FIG. 1 is a perspective overview of a remote conference device according to an embodiment of the present invention, and a diagram showing two remote conference devices connected over a network.
FIG. 2 is a functional block diagram of the remote conference device according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the video encoding unit.
FIG. 4 is a diagram showing a conference scene using the remote conference device and a normal-mode video captured by the video camera.
FIG. 5 is a diagram showing video captured in the bit priority allocation mode.
FIG. 6 is a diagram showing video captured in the limited area transmission mode.
FIG. 7 is a diagram showing the overall image of the conference scene before zooming and after zooming.
FIG. 8 is a block diagram showing the specific configuration of the control unit.
FIG. 9 is a flowchart for explaining the operation of the remote conference device.

Explanation of symbols

1, 1a, 1b: Remote conference device
8: Network
10: Remote control
12: Signal reception unit
14: Control unit
16: Memory
20: Speaker signal processing unit
20: Audio processing unit
26: Speaker array
30: Microphone array
32: Block information amount calculator
33: Person region integrator
34: Whole region integrator
35: Divider
36: Mode decision evaluator
37: Table
41: Conference table
50, 50a, 50b: Video camera
52, 52a, 52b: Monitor
54: Video encoding unit
56: Video decoding unit
62: Communication unit
64: Input/output connector
72: Screen switching detector
74: Encoding controller
76: First changeover switch
78: Discrete cosine transformer
80: Quantizer
81: Encoder
82: Inverse quantizer
84: Inverse discrete cosine transformer
86: Adder
88: Video memory
90: Second changeover switch
92: Subtractor
101: Operation unit

Claims

A video encoding device comprising:
imaging means for capturing video of a person and the person's surroundings and outputting the video;
person detecting means for detecting a person from the video; and
encoding means for periodically extracting the video of the detected person from the video, encoding the person video data at a low compression rate, and encoding the video data other than the person at a high compression rate.

A video encoding device comprising:
imaging means for capturing video of a person and the person's surroundings and outputting the video;
person detecting means for detecting a person from the video; and
encoding means for encoding the entire area of the video captured by the imaging means, and thereafter periodically encoding person video data that is a rectangular area of the video containing the person.

The video encoding device according to claim 2, further comprising scene change detecting means for detecting a change in composition over the entire area of the video captured by the imaging means,
wherein the encoding means encodes the entire area of the video at the timing when the scene change detecting means detects a change in composition over the entire area of the video.

A video transmission apparatus comprising:
the video encoding device according to any one of claims 1 to 3; and
communication means for transmitting video data encoded by the video encoding device to a counterpart device.

A video transmission apparatus comprising:
imaging means for capturing video of a person and the person's surroundings and outputting the video;
person detecting means for detecting a person from the video;
calculation means for performing a predetermined calculation based on the state of the communication rate of the transmission path and the proportion of the video's total information accounted for by the person;
encoding means which, when the calculation result of the calculation means is equal to or greater than a threshold, periodically extracts the video of the detected person from the video captured by the imaging means, encodes the person video data at a low compression rate, and encodes the video data other than the person at a high compression rate, and
which, when the calculation result of the calculation means is less than the threshold, encodes the entire area of the video captured by the imaging means and thereafter periodically encodes person video data that is a rectangular area of the video containing the person; and
communication means for transmitting the video data encoded by the encoding means to a counterpart device.

A video encoding method comprising:
a procedure for periodically detecting a person from video of the person and the person's surroundings; and
a procedure for extracting the video data of the detected person and encoding it at a low compression rate, and encoding the video data other than the person at a high compression rate.

A video encoding method comprising:
a procedure for periodically detecting a person from video of the person and the person's surroundings; and
a procedure for encoding the entire area of the video of the person and the person's surroundings, and thereafter periodically encoding person video data that is a rectangular area of the video containing the person.

The video encoding method according to claim 7, further comprising:
a procedure for detecting a change in composition over the entire area of the video of the person and the person's surroundings; and
a procedure for encoding the entire area of the video data at the timing when a change in composition over the entire area of the video is detected.

A video transmission method comprising:
performing the video encoding method according to claim 6; and
thereafter, a procedure for transmitting the encoded video data to a counterpart device.

A video transmission method comprising:
a procedure for performing a predetermined calculation based on the state of the communication rate of the transmission path and the proportion of the video's total information accounted for by the person;
when the calculation result is equal to or greater than a threshold,
a procedure for periodically detecting a person from video of the person and the person's surroundings,
a procedure for extracting the video data of the detected person and encoding it at a low compression rate, and encoding the video data other than the person at a high compression rate, and
a procedure for transmitting the encoded person video data and video data other than the person to a counterpart device; and
when the calculation result is less than the threshold,
a procedure for periodically detecting a person from video of the person and the person's surroundings,
a procedure for encoding the entire area of the video, and thereafter periodically encoding person video data that is a rectangular area of the video containing the person, and
a procedure for transmitting the encoded video data to the counterpart device.