JP4196085B2 - Video signal encoding apparatus and video conference system using the same

Info

Publication number: JP4196085B2
Application number: JP2003183477A
Authority: JP
Inventors: 克行西邑
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2003-06-26
Filing date: 2003-06-26
Publication date: 2008-12-17
Anticipated expiration: 2023-06-26
Also published as: JP2005020466A

Description

【０００１】
【発明の属する技術分野】
本発明は、多地点間の端末同士で通信回線を介して双方向の映像伝送を行うテレビ会議システムおよびそれに用いられる映像信号符号化装置に関するものである。
【０００２】
【従来の技術】
従来より、インターネットやＬＡＮを使用したマルチメディアアプリケーションの需要が高まっている。特に近年はインターネットで使用される回線の高速化、広帯域化が進んでおり、映像や音声を高速インターネット回線を通して双方向通信を行うテレビ会議などのシステムが現実的に利用可能になってきている。
【０００３】
一般的なテレビ会議システムとして、ここでは最大５人まで参加可能なテレビ会議システムの構成例を図４に示す。図４において、端末４１〜４５は通信回線を通じて多地点制御装置４６につながっており、相互に通信が可能なシステムとなっている。多地点制御装置４６は各端末から送信される映像や音声データを適切に切替えながら、会議に参加している相手の映像や音声をそれぞれの端末に振り分けて送信する。各端末は多地点制御装置４６から受信した映像データを処理してテレビ画面に表示したり、音声データをスピーカーから出力したりして相手との対話が可能なように適切なユーザインタフェースを備えている。これらのシステムでは映像信号をデジタルデータに変換した後、それぞれの使用する通信回線の帯域に適合した速度で映像データを送受信できるように、データ量の削減を目的として圧縮符号化が行われるのが通常である。ここで用いられる圧縮符号化の方式としては例えばＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）規格やＨ．２６３規格などの国際標準規格がよく知られている。これらの圧縮符号化方式を用いることにより映像データを通信回線を通して効率良く伝送を行うことが可能となっている。
【０００４】
このような構成のテレビ会議システムにおいては、複数の相手の映像をテレビ画面に表示する場合、画面を小画面に分割して表示することにより全員の映像を表示する手法が一般的である。そのような手法で表示される画面の一例を図５に示す。図５はユーザ１が操作している端末４１側のテレビ画面に表示される映像を表しており、会議の他の参加者ユーザ２〜ユーザ５の映像が小画面に分割されて表示される様子である。このように受信した複数の映像データを小画面に分割して表示する手法としては、従来では各々の相手から受信した映像データを複数のデコーダ（復号化装置）で並列処理でデコードを行い、その結果の映像を受信側の端末で合成して表示する手法が多く用いられていた。
【０００５】
そのような手法を用いたシステムでの端末構成の例を図６に示す。図６において、端末６１に入力された映像信号は入力処理手段６０１でデジタル変換された後に符号化手段６０２に出力される。符号化手段６０２はデジタル化された映像信号を圧縮符号化を行いデータストリーム形式にして送信手段６０３に出力する。送信手段６０３は映像データストリームを通信回線を通して多地点制御装置６３に向けて送信を行う。ここまでが端末の送信処理の様子である。次に端末の受信処理としては、多地点制御装置６３から受信した複数の参加者の映像データを端末６２の受信手段６０４が受信した後、それらを複数の復号化手段６０５にて並列に復号化を行ってそれぞれのデジタル映像信号に変換して出力する。それらのデジタル映像信号は合成表示手段６０６にてひとつの映像として合成され、図５のような表示形式にてテレビ画面に表示される。しかしこのような手法では、端末の受信処理の中で複数の復号化装置を備えたり、合成表示装置を備える必要があり、複雑な回路構成になって規模が大きくなるという問題点があった。そのような問題を解決するため、特許文献１に記載されているように、端末側で合成するのではなく、多地点制御装置側で合成を行うようにする手法がある。本手法の動作を図７を用いて簡単に説明する。端末７１における入力処理手段７０１および符号化手段７０２の動作は図６における入力処理手段６０１および符号化手段６０２と同様である。図７において、送信手段７０３は符号化手段７０２から入力される映像データストリームの内、映像の１／４の領域部分だけ取り出した映像データストリームを通信回線に送信する。多地点制御装置７３では、各端末から送信された１／４領域部分の映像データストリームを合成して４画面分でひとつの映像データストリームを再構成して各端末に送信する。端末７２では多地点制御装置７３からの映像データを受信手段７０４が受信した後、復号化手段７０５にて復号化を行ってデジタル映像信号に変換して出力する。多地点制御装置７３からのデジタル映像信号は端末の受信側では４画面を意識することなくひとつの映像データストリームとして復号化を行い表示するだけで、図５に示すような４画面分の映像を表示することができる。
【０００６】
このような手法によれば、端末の受信側の回路構成を簡素化することができる。
【０００７】
【特許文献１】
特開平４−７２８８７号公報
【０００８】
【発明が解決しようとする課題】
しかしながら、上記のような従来手法ではＭＰＥＧ規格のような圧縮符号化方式を用いた場合、符号化出力のデータストリーム上では画面内及び画面間での相関があるために、符号化データから単純に１／４領域部分のデータだけ取り出しても正しい映像として合成することはできない。また、これらの圧縮符号化方式では可変長符号化を行っており、画像ごとのデータ量は一般に変動しているため、多地点制御装置側で合成を行うとデータ量の変動が原因となって合成遅延が発生するという問題がある。
【０００９】
本発明は上記従来の問題点を解決するもので、ＭＰＥＧ規格のような圧縮符号化方式を用いた場合でも多地点制御装置側で合成が可能となり、端末の回路構成の簡素化を実現できるようになる。また可変長符号化によるデータ量変動を考慮した符号化制御を行うことにより、より映像品質の高いテレビ会議システムを提供することを目的とする。
【００１０】
【課題を解決するための手段】
本発明の請求項１に記載の発明は、通信回線を介して双方向の映像伝送を行うテレビ会議システムに用いられる映像信号符号化装置であって、入力された映像信号をデジタル処理して変換しデジタル映像として出力する入力処理手段と、前記入力処理手段から出力されたデジタル映像を蓄積するフレームメモリと、前記フレームメモリからデジタル映像を取り出し圧縮符号化して出力する符号化手段と、前記符号化手段から出力されたデータを一時的に蓄積しておくバッファと、前記バッファから出力されたデータを通信路を介して送信する送信手段と、前記バッファに蓄積されるデータのデータ量を監視して、前記データ量が所定のしきい値を越えると入力フレーム制御手段と符号化制御手段とにデータ量が閾値を超えたことを通知するための制御情報を出力するデータ量監視手段と、前記データ量監視手段から出力される制御情報に基づいて次のデジタル映像データを現在符号化処理に使用しているひとつ前のデジタル映像データで書き換えるよう前記フレームメモリに蓄積されているデジタル映像の配置を制御する入力フレーム制御手段と、前記データ量監視手段から出力される制御情報に基づいて符号化処理を打ち切って残りの画像領域はｎｏｔ＿ｃｏｄｅｄデータとして出力するよう前記符号化手段を制御する符号化制御手段を有することを特徴とする映像信号符号化装置である。
【００１１】
このように構成することにより、多地点制御装置は、それぞれの端末から送信されるデータをフレーム毎にビデオパケット単位で組み替えるだけで、一つのデータストリームとして受信端末側に送信することができる。受信端末側では受信した一つのデータストリームとして単にデコードするだけで、複数画面が合成された映像を表示することができ、かつ合成による遅延の少ない高品質な映像が得られる。
【００１２】
本発明の請求項２に記載の発明は、前記入力処理手段は入力された映像信号をデジタル変換する際に、あらかじめ符号化すべき映像の大きさに変換してデジタル映像を生成して出力することを特徴とする請求項１記載の映像信号符号化装置である。
【００１３】
このように構成することにより、縮小処理を入力段階で行うことによって符号化処理やデータ処理の負荷が軽減する。
【００１４】
本発明の請求項３に記載の発明は、前記符号化手段はＭＰＥＧ４規格に基づいて符号化することを特徴とする請求項１記載の映像信号符号化装置である。
【００１５】
このように構成することにより、ＭＰＥＧ規格のような圧縮符号化方式を用いた場合でも、多地点制御装置は、それぞれの端末から送信されるデータをフレーム毎にビデオパケット単位で組み替えるだけで、一つのデータストリームとして受信端末側に送信することができる。受信端末側では受信した一つのデータストリームとして単にデコードするだけで、複数画面が合成された映像を表示することができ、かつ合成による遅延の少ない高品質な映像が得られる。
【００１６】
本発明の請求項４に記載の発明は、前記符号化手段はＭＰＥＧ４規格のビデオパケットをひとつのマクロブロックラインごとに構成し、動きベクトルは必ず画面内を指すよう符号化することを特徴とする請求項３記載の映像信号符号化装置である。
【００１７】
このように構成することにより、多地点制御装置は、それぞれの端末から送信されるデータをフレーム毎にビデオパケット単位で組み替えるだけで、一つのデータストリームとして受信端末側に送信することができる。受信端末側では受信した一つのデータストリームとして単にデコードするだけで、複数画面が合成された映像を表示することができ、かつ合成による遅延の少ない高品質な映像が得られる。
【００１８】
本発明の請求項５に記載の発明は、前記符号化手段はＨ．２６３規格に基づいて符号化することを特徴とする請求項１記載の映像信号符号化装置である。
【００１９】
このように構成することにより、Ｈ．２６３規格のような圧縮符号化方式を用いた場合でも、多地点制御装置は、それぞれの端末から送信されるデータをフレーム毎にＧＯＢ単位で組み替えるだけで、一つのデータストリームとして受信端末側に送信することができる。受信端末側では受信した一つのデータストリームとして単にデコードするだけで、複数画面が合成された映像を表示することができ、かつ合成による遅延の少ない高品質な映像が得られる。
【００２０】
本発明の請求項６に記載の発明は、請求項１〜５のいずれかに記載の映像信号符号化装置を備えたテレビ会議システムである。
【００２１】
このように構成することにより、多地点制御装置は、それぞれの端末から送信されるデータをフレーム毎にビデオパケット単位またはＧＯＢ単位で組み替えるだけで、一つのデータストリームとして受信側に送信することができる。受信側では受信した一つのデータストリームとして単にデコードするだけで、複数画面が合成された映像を表示することができ、かつ合成による遅延の少ない高品質な映像が得られる。
【００２２】
【発明の実施の形態】
以下、本発明の実施の形態について、図１から図３を用いて説明する。
【００２３】
（実施の形態１）
図１は本発明の一実施の形態である映像信号符号化装置の構成を示すブロック図である。図１において、映像信号符号化装置は、テレビ会議を行うために自分の映像を映すカメラなどからの映像信号を入力とする。入力された映像信号は入力処理手段１０１においてデジタル映像信号に変換される。このとき、入力処理手段１０１は入力された映像信号をデジタル変換する際に、あらかじめ符号化すべき映像の大きさに変換してデジタル映像を生成して出力するようにする。すなわち、例えば入力映像信号を水平垂直方向ともに１／２にして全体の１／４の大きさになるよう縮小してデジタル映像として出力する。このように縮小処理を入力段階で行うことによって符号化処理やデータ処理の負荷が軽減するという効果も期待できる。１／４に縮小化されたデジタル映像データはフレームメモリ１０２にて一時的に蓄積される。通常はここで一時的に蓄積されたデジタル映像データはそのまま符号化手段１０３に入力され、例えばＭＰＥＧ４規格にて圧縮符号化処理が行われる。
【００２４】
ここで符号化手段１０３の処理について図２を用いて詳細に説明する。ＭＰＥＧ４規格においては１枚の映像フレームを複数の領域に分割したビデオパケットという符号化単位がある。またビデオパケットをさらに１６画素×１６画素単位で分割したマクロブロックという符号化単位がある。符号化手段１０３では、図２（ａ）に示すように水平方向に一列分のマクロブロックラインでひとつのビデオパケットとなるように構成して符号化処理を行う。またフレーム間符号化を行う際に用いられる動きベクトルの探索範囲を必ず画像の内側を指すように符号化処理を行う。このような条件で符号化することにより、図２（ｂ）に示すように多地点制御装置では複数の端末から送信された映像データをビデオパケットごとに組み替えて合成することで、映像の乱れのない正しい合成映像を生成することが可能となる。なお、Ｈ．２６３規格においてはＧＯＢ（ＧｒｏｕｐＯｆＢｌｏｃｋ）という符号化単位がある。ＧＯＢも１枚の映像フレームを複数の領域に分割する一要素であり、複数のマクロブロックから構成される。前記ＭＰＥＧ４規格においてビデオパケット単位で処理を行う場合と同様に、Ｈ．２６３規格ではＧＯＢ単位で処理を行うことにより、同様の効果が得られる。
【００２５】
このようにして符号化処理された結果のデータストリームは、ビデオパケット単位でバッファ１０４に出力され一時的に蓄積される。ここで一時的に蓄積されるビデオパケット単位のデータストリームのデータ量は、データ量監視手段１０７によって監視されており、１画像フレーム分のデータが蓄積されるまでにある一定の閾値（ＶＯＰｍａｘ）以内にデータ量がおさまっているかどうかが常にチェックされる。１画像フレーム分のデータ量が閾値（ＶＯＰｍａｘ）以内におさまっているときはそのまま送信手段１０８によって通信回線を介して多地点制御装置に向けて送信される。
【００２６】
一方、１画像フレーム分のデータが蓄積されるまでにデータ量が閾値（ＶＯＰｍａｘ）を超えるとき、データ量監視手段１０７は、入力フレーム制御手段１０５と符号化制御手段１０６に対して、データ量が閾値を超えたことを通知するための制御情報を出力する。このとき、制御情報を受けた符号化制御手段１０６は符号化手段１０３に対して、現在の画像フレームの符号化処理をその時点のビデオパケットで打ち切って残りの画像領域のビデオパケットはｎｏｔ＿ｃｏｄｅｄデータとして出力するよう制御を行う。ｎｏｔ＿ｃｏｄｅｄデータとは、ＭＰＥＧ規格においてフレーム間符号化の際に、時間軸上でひとつ前の画像フレームとまったく同じ画像であることを示す符号である。ｎｏｔ＿ｃｏｄｅｄデータ自身のデータ量は著しく小さいため、データ量を増加させることなく１画像フレーム分のデータとして完結させることができる。このような打ち切り処理を行った場合、その該当画像フレームを復号化すると、途中のビデオパケット以降の領域はひとつ前の画像データとして復号される。残りの領域の画像データを次の画像フレームで復号化処理時に再構成可能とするため、入力フレーム制御手段１０５は、フレームメモリ１０２に蓄積されている次のデジタル映像データを現在符号化処理に使用しているひとつ前のデジタル映像データで書き換えるよう制御する。このような書き換え処理を行うことにより、符号化手段１０３は次の画像フレームの符号化処理の際に、打ち切り処理を行った一つ前の画像フレームの残りの領域を符号化することになり、復号化処理時にはここで完結した画像フレームを再構成することが可能となる。このような打ち切り処理を行うことにより、符号化結果のデータストリームのデータ量の変動を最小限に抑えることができ、通信路を介して送信される映像データを均一化することができ、多地点制御装置での合成処理による遅延を抑えることが可能となる。この作用については次にさらに詳細な説明を行う。
【００２７】
ここで本実施の形態の映像信号符号化装置が処理を行う打ち切り処理について、図３を用いて説明する。
【００２８】
一般的にＭＰＥＧ規格やＨ．２６３規格などの映像信号に対する圧縮符号化方式では映像の特徴によって、また可変長符号化を行うため符号化結果のデータ量が変動する。
【００２９】
図３（ａ）はその様子を模式的に表したものである。Ｐ１〜Ｐ４は時間順に並べた映像フレームを表している。矩形の高さがそれぞれの映像フレームに対するデータ量を表している。図３（ａ）の場合では、映像フレームＰ２の符号化結果のデータ量が多くなっている。このようなとき、このデータを通信路を介して送信すると、データ量が多い映像フレームＰ２の映像データの送信には他よりも時間がかかり、多地点制御装置への到着も遅延することになる。すると、多地点制御装置ではこの映像フレームＰ２の映像データが到着しないために他の端末から受信した映像データとの合成処理に支障をきたすことになる。もしも映像フレームＰ２の映像データが到着するまで合成処理を待つとその分の遅延が発生し、その後の映像フレームＰ３、Ｐ４の合成処理にも遅延が影響するという問題がある。一方、映像フレームＰ２の映像データの到着を待たず、映像フレームＰ２の映像を捨てて合成処理を行うと次の映像フレームＰ３以降の映像データと整合が取れず、復号化処理で映像の乱れが発生するという問題がある。そこで本発明では、映像フレームＰ２の符号化時にデータ量が閾値（ＶＯＰｍａｘ）を超えるときには符号化をその時点のビデオパケットで打ち切って、それ以上データ量が増加するのを防ぐ。その様子を表したのが図３（ｂ）である。映像フレームＰ２の符号化時には途中で打ち切っているため、映像フレームＰ２の映像データを完結させるために次の映像フレームＰ３の符号化を行う前に映像フレームＰ２の映像で書き換える。そうすることにより、映像フレームＰ２の映像の残りの領域の符号化が行われて、この時点で映像フレームＰ２の映像データが完結する。
【００３０】
このような処理を行うことにより、ＭＰＥＧ規格などの圧縮符号化方式においてもデータ量の変動を抑えて均一化されたデータストリームを生成することができる。その結果、そのデータストリームを通信回線を介して多地点制御装置に送信した際に、均一なデータ受信が可能となり、多地点制御装置での映像合成処理が正しく行われ、ユーザにとっても違和感の少ない高品質な合成映像を提供することが可能となる。
【００３１】
【発明の効果】
以上のように本発明によれば、ＭＰＥＧ規格のような圧縮符号化方式を用いた場合でも多地点制御装置側で合成が可能となり、端末の回路構成の簡素化を実現できるようになる。また可変長符号化によるデータ量変動を考慮した符号化制御を行うことにより、遅延が少なくより映像品質の高い映像信号符号化装置およびテレビ会議システムを提供することができる。
【図面の簡単な説明】
【図１】本発明の一実施の形態である映像信号符号化装置の構成を示すブロック図
【図２】本発明の映像信号符号化装置の符号化処理のデータ構成を示す説明図
【図３】本発明の映像信号符号化装置の符号化処理の詳細を示す説明図
【図４】多地点間のテレビ会議システムの装置構成を示すブロック図
【図５】端末側のテレビ画面で表示される合成映像を示す模式図
【図６】従来のテレビ会議システムの構成を示すブロック図
【図７】従来のテレビ会議システムの構成を示すブロック図
【符号の説明】
１０１入力処理手段
１０２フレームメモリ
１０３符号化手段
１０４バッファ
１０５入力フレーム制御手段
１０６符号化制御手段
１０７データ量監視手段
１０８送信手段
[0001]
[Technical field of the invention]
The present invention relates to a video conference system in which terminals at multiple sites perform bidirectional video transmission with one another via a communication line, and to a video signal encoding device used in such a system.
[0002]
[Prior art]
Conventionally, the demand for multimedia applications using the Internet or LAN has increased. In recent years, in particular, the speed and bandwidth of lines used in the Internet have been increasing, and systems such as video conferencing that perform bidirectional communication of video and audio through high-speed Internet lines have become practically available.
[0003]
FIG. 4 shows a configuration example of a typical video conference system, here one in which up to five people can participate. In FIG. 4, terminals 41 to 45 are connected to a multipoint control device 46 through communication lines so that they can communicate with one another. The multipoint control device 46 switches the video and audio data transmitted from each terminal as appropriate and distributes the video and audio of the other conference participants to each terminal. Each terminal has a suitable user interface for interacting with the other parties: it processes the video data received from the multipoint control device 46 and displays it on a television screen, and it outputs the audio data from a speaker. In these systems, after the video signal is converted to digital data, it is normally compression-encoded to reduce the amount of data, so that the video data can be transmitted and received at a rate suited to the bandwidth of the communication line in use. Well-known compression encoding methods for this purpose include international standards such as the MPEG (Moving Picture Experts Group) standards and the H.263 standard. Using these compression encoding methods makes it possible to transmit video data efficiently over a communication line.
[0004]
In a video conference system with this configuration, when the video of several other parties is shown on a TV screen, it is common to divide the screen into small sub-screens so that everyone's video is displayed. FIG. 5 shows an example of a screen displayed in this way. It represents the video displayed on the TV screen of terminal 41, operated by user 1, with the video of the other conference participants, users 2 to 5, shown in separate sub-screens. To display several received video data streams in sub-screens like this, the conventional approach has often been to decode the video data received from each party in parallel with a plurality of decoders (decoding devices) and to combine and display the resulting video at the receiving terminal.
[0005]
FIG. 6 shows an example of a terminal configuration in a system using such a technique. In FIG. 6, the video signal input to terminal 61 is digitized by input processing means 601 and then output to encoding means 602. Encoding means 602 compression-encodes the digitized video signal and outputs it to transmission means 603 as a data stream. Transmission means 603 transmits the video data stream to multipoint control device 63 through the communication line. This completes the transmission processing of the terminal. In the reception processing of the terminal, receiving means 604 of terminal 62 receives the video data of a plurality of participants from multipoint control device 63, and a plurality of decoding means 605 then decode the data in parallel, convert it into the respective digital video signals, and output them. These digital video signals are combined into one video by composite display means 606 and displayed on the TV screen in the format shown in FIG. 5. With this technique, however, the reception side of the terminal needs a plurality of decoding devices and a composite display device, so the circuit configuration becomes complicated and large. To solve this problem, as described in Patent Document 1, there is a method in which the combining is performed on the multipoint control device side instead of on the terminal side. The operation of this method is briefly described with reference to FIG. 7. The operations of input processing means 701 and encoding means 702 in terminal 71 are the same as those of input processing means 601 and encoding means 602 in FIG. 6. In FIG. 7, transmission means 703 extracts, from the video data stream input from encoding means 702, only the portion corresponding to a 1/4 area of the video and transmits that video data stream to the communication line. Multipoint control device 73 combines the 1/4-area video data streams transmitted from the terminals, reconstructs a single video data stream covering four sub-screens, and transmits it to each terminal. In terminal 72, receiving means 704 receives the video data from multipoint control device 73, and decoding means 705 then decodes it, converts it into a digital video signal, and outputs it. The receiving side of the terminal does not need to be aware of the four sub-screens: by simply decoding and displaying the video data from multipoint control device 73 as a single video data stream, it can display the four-screen video shown in FIG. 5.
[0006]
According to such a method, the circuit configuration on the receiving side of the terminal can be simplified.
[0007]
[Patent Document 1]
Japanese Patent Laid-Open No. 4-72887
[0008]
[Problems to be solved by the invention]
However, when a compression encoding method such as the MPEG standard is used with the conventional method described above, the encoded output data stream contains dependencies both within a picture and between pictures, so simply extracting the data of the 1/4 area portion from the encoded data does not allow it to be combined into a correct video. In addition, these compression encoding methods use variable-length coding, and the amount of data per picture generally fluctuates. When the combining is performed on the multipoint control device side, this fluctuation in data amount therefore causes a delay in combining.
[0009]
The present invention solves the conventional problems described above. It makes combining on the multipoint control device side possible even when a compression encoding method such as the MPEG standard is used, so that the circuit configuration of the terminal can be simplified. A further object is to provide a video conference system with higher video quality by performing encoding control that takes into account the fluctuation in data amount caused by variable-length coding.
[0010]
[Means for Solving the Problems]
The invention according to claim 1 of the present invention is a video signal encoding device used in a video conference system that performs bidirectional video transmission via a communication line, comprising: input processing means for digitally processing and converting an input video signal and outputting it as digital video; a frame memory for storing the digital video output from the input processing means; encoding means for taking the digital video out of the frame memory, compression-encoding it, and outputting it; a buffer for temporarily storing the data output from the encoding means; transmission means for transmitting the data output from the buffer via a communication path; data amount monitoring means for monitoring the amount of data accumulated in the buffer and, when the data amount exceeds a predetermined threshold, outputting control information to input frame control means and encoding control means to notify them that the data amount has exceeded the threshold; the input frame control means, which controls the arrangement of the digital video stored in the frame memory, based on the control information output from the data amount monitoring means, so that the next digital video data is overwritten with the preceding digital video data currently being used for the encoding process; and the encoding control means, which controls the encoding means, based on the control information output from the data amount monitoring means, so as to terminate the encoding process and output the remaining image area as not_coded data.
[0011]
By configuring in this way, the multipoint control device can transmit data transmitted from each terminal to the receiving terminal side as a single data stream only by recombining data in units of video packets for each frame. On the receiving terminal side, it is possible to display a video in which a plurality of screens are synthesized by simply decoding it as one received data stream, and to obtain a high quality video with little delay due to synthesis.
[0012]
The invention according to claim 2 of the present invention is the video signal encoding device according to claim 1, wherein, when digitally converting the input video signal, the input processing means converts it in advance to the size of the video to be encoded and generates and outputs the digital video.
[0013]
With this configuration, the load of encoding processing and data processing is reduced by performing the reduction processing at the input stage.
[0014]
The invention according to claim 3 of the present invention is the video signal encoding apparatus according to claim 1, wherein the encoding means performs encoding based on the MPEG4 standard.
[0015]
With this configuration, even when a compression encoding method such as the MPEG standard is used, the multipoint control device can transmit the data sent from the terminals to the receiving terminal side as a single data stream simply by rearranging it in units of video packets for each frame. On the receiving terminal side, a video in which a plurality of screens are combined can be displayed simply by decoding the received data as one data stream, and a high-quality video with little delay due to combining is obtained.
[0016]
The invention according to claim 4 of the present invention is the video signal encoding device according to claim 3, wherein the encoding means forms one MPEG4 video packet for each macroblock line and encodes so that motion vectors always point inside the picture.
[0017]
By configuring in this way, the multipoint control device can transmit data transmitted from each terminal to the receiving terminal side as a single data stream only by recombining data in units of video packets for each frame. On the receiving terminal side, it is possible to display a video in which a plurality of screens are synthesized by simply decoding it as one received data stream, and to obtain a high quality video with little delay due to synthesis.
[0018]
The invention according to claim 5 of the present invention is the video signal encoding device according to claim 1, wherein the encoding means performs encoding based on the H.263 standard.
[0019]
With this configuration, even when a compression encoding method such as the H.263 standard is used, the multipoint control device can transmit the data sent from the terminals to the receiving terminal side as a single data stream simply by rearranging it in units of GOBs for each frame. On the receiving terminal side, a video in which a plurality of screens are combined can be displayed simply by decoding the received data as one data stream, and a high-quality video with little delay due to combining is obtained.
[0020]
A sixth aspect of the present invention is a video conference system including the video signal encoding device according to any one of the first to fifth aspects.
[0021]
With this configuration, the multipoint control device can transmit the data sent from the terminals to the receiving side as a single data stream simply by rearranging it in units of video packets or GOBs for each frame. On the receiving side, a video in which a plurality of screens are combined can be displayed simply by decoding the received data as one data stream, and a high-quality video with little delay due to combining is obtained.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 3.
[0023]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a video signal encoding device according to an embodiment of the present invention. In FIG. 1, the video signal encoding device takes as its input a video signal from a camera or similar device that captures the user's own image for the video conference. The input video signal is converted into a digital video signal by input processing means 101. When it digitizes the input video signal, input processing means 101 converts it in advance to the size of the video to be encoded and generates and outputs the digital video. For example, the input video signal is halved in both the horizontal and vertical directions, reducing it to 1/4 of the original size, and output as digital video. Performing this reduction at the input stage can also be expected to lighten the load of the encoding and data processing. The digital video data reduced to 1/4 is temporarily stored in frame memory 102. Normally, the digital video data stored here is input as it is to encoding means 103 and compression-encoded according to, for example, the MPEG4 standard.
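The reduction performed at the input stage can be illustrated with a minimal sketch. The source does not say how the input processing means scales the picture, so the 2x2 averaging filter, the function name `downscale_half`, and the list-of-rows frame layout below are all assumptions made for illustration.

```python
def downscale_half(frame):
    """Halve width and height by averaging each 2x2 block of samples.

    `frame` is a list of rows of integer sample values (hypothetical layout).
    The 2x2 box filter is an assumption; the source only states that the
    picture is reduced to 1/2 horizontally and vertically.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


frame = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
small = downscale_half(frame)  # 4x4 input becomes 2x2, i.e. 1/4 of the area
```

The reduced frame has a quarter of the samples, which is why the later encoding and buffering stages handle less data.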
[0024]
The processing of encoding means 103 is now described in detail with reference to FIG. 2. The MPEG4 standard has an encoding unit called a video packet, which divides one video frame into a plurality of areas. It also has an encoding unit called a macroblock, which further divides a video packet into units of 16 x 16 pixels. As shown in FIG. 2(a), encoding means 103 encodes so that one horizontal row of macroblocks (one macroblock line) forms one video packet. It also encodes so that the search range of the motion vectors used in inter-frame coding always points inside the picture. By encoding under these conditions, as shown in FIG. 2(b), the multipoint control device can rearrange and combine the video data transmitted from a plurality of terminals packet by packet and thereby generate a correct composite video free of picture disturbance. The H.263 standard has an encoding unit called a GOB (Group Of Blocks). A GOB is likewise an element that divides one video frame into a plurality of areas, and it consists of a plurality of macroblocks. Just as processing is performed in units of video packets under the MPEG4 standard, performing it in units of GOBs under the H.263 standard gives the same effect.
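The two encoding conditions and the packet-wise recombination can be sketched as follows. This is an illustrative model only: packets are treated as opaque labels, any header rewriting a real multipoint control device might need is left out, and the names `compose_quad` and `clamp_motion_vector`, the integer-pixel vector units, and the 176 x 144 sub-picture size are assumptions not taken from the source.

```python
def compose_quad(top_left, top_right, bottom_left, bottom_right):
    """Interleave per-row packets of four sub-pictures into a 2x2 composite.

    Each argument is a list with one packet per macroblock row. For every
    composite row, the left sub-picture's packet is followed by the right one.
    """
    rows = len(top_left)
    if not all(len(p) == rows for p in (top_right, bottom_left, bottom_right)):
        raise ValueError("all sub-pictures need the same number of rows")
    composite = []
    for left, right in ((top_left, top_right), (bottom_left, bottom_right)):
        for r in range(rows):
            composite.append(left[r])
            composite.append(right[r])
    return composite


def clamp_motion_vector(mb_x, mb_y, mv_x, mv_y, width, height, mb_size=16):
    """Clamp a motion vector so the referenced block stays inside the picture.

    (mb_x, mb_y) is the macroblock position in macroblock units; the vector
    is in whole pixels (an assumption made to keep the sketch simple).
    """
    x0, y0 = mb_x * mb_size, mb_y * mb_size
    mv_x = max(-x0, min(mv_x, width - mb_size - x0))
    mv_y = max(-y0, min(mv_y, height - mb_size - y0))
    return mv_x, mv_y


# Four terminals, two macroblock rows each, packets labelled by source and row.
packets = {name: [f"{name}{r}" for r in range(2)] for name in "ABCD"}
composite = compose_quad(packets["A"], packets["B"], packets["C"], packets["D"])

# A vector pointing left of the picture edge is pulled back inside.
clamped = clamp_motion_vector(0, 0, -5, 3, width=176, height=144)
```

Keeping motion vectors inside the sub-picture matters because, after recombination, the area just outside one participant's sub-picture belongs to another participant's video.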
[0025]
The data stream resulting from this encoding process is output to buffer 104 in units of video packets and temporarily stored there. The amount of data stored in this way is monitored by data amount monitoring means 107, which continuously checks whether the data amount stays within a certain threshold (VOPmax) until the data for one image frame has been accumulated. When the data amount for one image frame is within the threshold (VOPmax), the data is transmitted as it is by transmission means 108 to the multipoint control device via the communication line.
[0026]
On the other hand, when the data amount exceeds the threshold (VOPmax) before the data for one image frame has been accumulated, data amount monitoring means 107 outputs control information to input frame control means 105 and encoding control means 106 to notify them that the data amount has exceeded the threshold. On receiving the control information, encoding control means 106 controls encoding means 103 so that it terminates the encoding of the current image frame at the video packet being processed at that point and outputs the video packets of the remaining image area as not_coded data. In the MPEG standard, not_coded data is a code used in inter-frame coding to indicate that the image is exactly the same as the preceding image frame on the time axis. Because the not_coded data itself is extremely small, the data for one image frame can be completed without increasing the data amount. When an image frame that has been truncated in this way is decoded, the area from the truncation point onward is decoded as the image data of the preceding frame. So that the image data of the remaining area can be reconstructed during decoding of the next image frame, input frame control means 105 performs control so that the next digital video data stored in frame memory 102 is overwritten with the preceding digital video data currently being used for the encoding process. As a result of this overwriting, when encoding means 103 encodes the next image frame, it encodes the remaining area of the truncated preceding image frame, and the decoder can reconstruct the completed image frame at that point. This truncation process keeps the fluctuation in the data amount of the encoded data stream to a minimum, evens out the video data transmitted over the communication path, and suppresses the delay caused by the combining process in the multipoint control device. This effect is described in more detail next.
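The interaction of the threshold check, the truncation, and the frame overwrite can be traced with a small simulation. It is a sketch under stated assumptions, not the patented implementation: coded sizes per macroblock row are supplied as hypothetical numbers, a row that the decoder already holds for the same picture is assumed to cost only `skip_cost`, and the names `encode_sequence`, `memory`, and `held` are invented for illustration.

```python
NOT_CODED = "not_coded"


def encode_sequence(frames, vop_max, skip_cost=1):
    """Simulate truncation at vop_max for a list of (frame_id, row_costs)."""
    memory = list(frames)        # stands in for frame memory 102
    rows = len(memory[0][1])
    held = [None] * rows         # picture id the decoder holds for each row
    records = []
    for i in range(len(memory)):
        frame_id, row_costs = memory[i]
        used = 0                 # data accumulated in the buffer for this frame
        packets = []
        truncated = False
        for r in range(rows):
            if truncated:
                # Remaining rows are sent as tiny not_coded packets.
                packets.append(NOT_CODED)
                used += skip_cost
                continue
            # Assumption: a row already delivered for this picture is cheap.
            used += skip_cost if held[r] == frame_id else row_costs[r]
            held[r] = frame_id
            packets.append(frame_id)
            if used > vop_max and r < rows - 1:
                truncated = True  # monitoring means signals the threshold
        if truncated and i + 1 < len(memory):
            # Input frame control: overwrite the next frame with this one.
            memory[i + 1] = memory[i]
        records.append((frame_id, used, truncated, packets))
    return records


frames = [
    ("P1", [20, 20, 20, 20]),
    ("P2", [50, 50, 30, 30]),   # a costly frame that crosses the threshold
    ("P3", [20, 20, 20, 20]),
    ("P4", [20, 20, 20, 20]),
]
records = encode_sequence(frames, vop_max=90)
sources = [rec[0] for rec in records]   # ['P1', 'P2', 'P2', 'P4']
sizes = [rec[1] for rec in records]     # [80, 102, 62, 80]
```

In this trace the untruncated P2 would have cost 160 units; with truncation it is spread over two frame slots of 102 and 62 units, and P3 is replaced by the remainder of P2.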
[0027]
The truncation process performed by the video signal encoding device of this embodiment is now described with reference to FIG. 3.
[0028]
In general, with compression encoding methods for video signals such as the MPEG standards and the H.263 standard, the data amount of the encoded result fluctuates, both with the characteristics of the video and because variable-length coding is used.
[0029]
FIG. 3(a) shows this schematically. P1 to P4 are video frames arranged in time order, and the height of each rectangle represents the amount of data for that video frame. In FIG. 3(a), the encoded data amount of video frame P2 is large. When this data is transmitted over the communication path, the video data of frame P2 takes longer to transmit than the others, and its arrival at the multipoint control device is delayed. Because the video data of frame P2 has not arrived, the multipoint control device cannot properly combine it with the video data received from the other terminals. If the device waits for the video data of frame P2 before combining, a corresponding delay occurs, and that delay also affects the combining of the subsequent video frames P3 and P4. If, on the other hand, the device does not wait and discards the video of frame P2, the result is inconsistent with the video data from frame P3 onward, and the video is disturbed in the decoding process. Therefore, in the present invention, when the data amount exceeds the threshold (VOPmax) while video frame P2 is being encoded, encoding is terminated at the video packet being processed at that point, which prevents any further increase in the data amount. FIG. 3(b) shows this. Because the encoding of video frame P2 was terminated partway through, the next video frame P3 is overwritten with the video of frame P2 before it is encoded, so that the video data of frame P2 can be completed. The remaining area of the video of frame P2 is then encoded, and the video data of frame P2 is complete at that point.
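The delay described for FIG. 3(a) can be made concrete with a short calculation. The figures below (a 64 kbit/s link, one frame every 100 ms, and the per-frame bit counts) are hypothetical values chosen for illustration, and the function `arrival_lateness` is not part of the source.

```python
def arrival_lateness(frame_bits, link_bits_per_ms, frame_interval_ms):
    """Return how many ms late each frame finishes arriving.

    Frame n becomes available at n * frame_interval_ms and is expected to
    have fully arrived one interval later. Frames share one constant-rate
    link, so a large frame also delays the frames queued behind it.
    """
    lateness = []
    link_free_at = 0.0
    for n, bits in enumerate(frame_bits):
        start = max(link_free_at, n * frame_interval_ms)
        link_free_at = start + bits / link_bits_per_ms
        deadline = (n + 1) * frame_interval_ms
        lateness.append(max(0.0, link_free_at - deadline))
    return lateness


# FIG. 3(a)-like case: P2 is three times the size the link can carry per slot.
uneven = arrival_lateness([6400, 19200, 6400, 6400], 64, 100)

# FIG. 3(b)-like case: every frame is held to what fits in one slot.
even = arrival_lateness([6400, 6400, 6400, 6400], 64, 100)
```

With these numbers the oversized P2 arrives 200 ms late and P3 and P4 inherit the same 200 ms, whereas the evened-out stream arrives on time, which mirrors the argument for limiting each frame to the threshold.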
[0030]
This processing makes it possible to generate an evened-out data stream with reduced fluctuation in data amount, even with a compression encoding method such as the MPEG standard. As a result, when the data stream is transmitted to the multipoint control device via the communication line, the data is received at an even rate, the video combining process in the multipoint control device is performed correctly, and a high-quality composite video that looks natural to the user can be provided.
[0031]
[Effects of the invention]
As described above, according to the present invention, even when a compression encoding method such as the MPEG standard is used, synthesis can be performed on the multipoint control device side, and the circuit configuration of the terminal can be simplified. In addition, by performing coding control in consideration of data amount fluctuations due to variable length coding, it is possible to provide a video signal coding device and a video conference system with less delay and higher video quality.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of a video signal encoding device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the data structure of the encoding process of the video signal encoding device of the present invention.
FIG. 3 is an explanatory diagram showing details of the encoding process of the video signal encoding device of the present invention.
FIG. 4 is a block diagram showing the device configuration of a multipoint video conference system.
FIG. 5 is a schematic diagram showing the composite video displayed on the TV screen on the terminal side.
FIG. 6 is a block diagram showing the configuration of a conventional video conference system.
FIG. 7 is a block diagram showing the configuration of a conventional video conference system.
[Explanation of symbols]
101 Input processing means
102 Frame memory
103 Encoding means
104 Buffer
105 Input frame control means
106 Encoding control means
107 Data amount monitoring means
108 Transmission means

Claims

1. A video signal encoding device used in a video conference system that performs video transmission via a communication line, comprising:
input processing means for digitally processing and converting an input video signal and outputting it as digital video;
a frame memory for storing the digital video output from the input processing means;
encoding means for taking the digital video out of the frame memory, compression-encoding it, and outputting it;
a buffer for temporarily storing the data output from the encoding means;
transmitting means for transmitting the data output from the buffer via a communication line;
data amount monitoring means for monitoring the amount of data accumulated in the buffer and, when the data amount exceeds a predetermined threshold, outputting control information to input frame control means and encoding control means to notify them that the data amount has exceeded the threshold;
the input frame control means, which controls the arrangement of the digital video stored in the frame memory, based on the control information output from the data amount monitoring means, so that the next digital video data is overwritten with the preceding digital video data currently being used for the encoding process; and
the encoding control means, which controls the encoding means, based on the control information output from the data amount monitoring means, so as to terminate the encoding process and output the remaining image area as not_coded data.

2. The video signal encoding device according to claim 1, wherein, when digitally converting the input video signal, the input processing means converts it in advance to the size of the video to be encoded and generates and outputs the digital video.

3. The video signal encoding device according to claim 1, wherein the encoding means performs encoding based on the MPEG4 standard.

4. A video signal encoding apparatus according to claim 3, wherein said encoding means configures an MPEG4 standard video packet for each macroblock line, and encodes a motion vector so as to always point within the screen.

5. The video signal encoding device according to claim 1, wherein the encoding means performs encoding based on the H.263 standard.

6. A video conference system comprising the video signal encoding device according to any one of claims 1 to 5.