JP2004304410A

JP2004304410A - Communication processing apparatus, communication processing method, and computer program

Info

Publication number: JP2004304410A
Application number: JP2003093346A
Authority: JP
Inventors: Satoshi Miyazaki (宮崎 敏)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-03-31
Filing date: 2003-03-31
Publication date: 2004-10-28
Anticipated expiration: 2023-03-31
Also published as: JP4120440B2

Abstract

PROBLEM TO BE SOLVED: To provide a communication processing apparatus, and a method therefor, that can selectively improve the quality of the part of the data that is significant for a conversation.

SOLUTION: The power levels of the audio data contained in the transmitted data and the received data are compared, and whichever of the transmitted data or the received data contains the audio data with the higher power level is selected and identified as significant processing data. More resources are applied to the significant processing data. As a result, the audio and video data of the main conversation partner are transmitted and received as high-quality data and presented to the communicating user. That is, the quality of the part of the data that is significant for the conversation can be selectively improved, realizing data communication in which the perceived quality of the conversation is improved.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、通信処理装置、および通信処理方法、並びにコンピュータ・プログラムに関する。さらに、詳細には、ＴＶ電話アプリケーションに代表されるような、音声と映像を同時に用いた個人間の双方向リアルタイムコミュニケーション（「双方向ビジュアルコミュニケーション」と称する）について、その高品質化を実現することを可能とした通信処理装置、および通信処理方法、並びにコンピュータ・プログラムに関する。
【０００２】
【従来の技術】
最近、ＩＰ（ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）電話やインスタントメッセンジャー等、文字、音声、映像等を用いて、個人間で手軽にコミュニケーションをとる手法が確立されつつあり、その一部はＴＶ電話アプリケーションとしてすでにビジネスとしても展開されている。今後、ネットワークの広帯域化、低価格化にともなって、これら個人間でのコミュニケーション手段も一段と整備され、広範囲に使用されるようになることが予想される。
【０００３】
音声と映像を同時に用いた双方向ビジュアルコミュニケーションの実現方法のひとつとして、ＡＶストリーミングを使った、ＴＶ電話のようなアプリケーションがある。すなわち、パーソナルコンピュータ、あるいは携帯情報端末や携帯電話等の通信処理装置によって構成されるコミュニケーション用端末（以下、コミュニケーション端末と称する）に、マイクとカメラ、スピーカと映像ディスプレイを装備し、コミュニケーション参加者の音声／映像をリアルタイムで相互に送受信することにより、相手の音声／映像を視聴しながらコミュニケーションをとるという形態である。
【０００４】
図１を参照して、従来の通信方法による双方向ビジュアルコミュニケーションの構成例について説明する。
【０００５】
端末Ａ１１０のユーザ（ユーザＡ）と端末Ｂ１２０のユーザ（ユーザＢ）は、ネットワーク１３０を介して接続しており、ビジュアルコミュニケーションをしている状態にある。ここでは、それぞれの端末がネットワークを介して接続状態になるまでの手順は、発明の本質とは離れるので、説明は省略する。
【０００６】
端末Ａ１１０において、ユーザＡの音声および映像は、カメラＡ１１２、マイクＡ１１１からなる映像および音声取得部から端末Ａ１１０に取り込まれる。これらの音声／映像データは、送信部Ａ１１５において符号化、パケット化等、所定の処理をした後、端末Ａ１１０からネットワーク１３０に送信される。端末Ｂ１２０は、ネットワーク１３０を介して端末Ａ１１０からの音声／映像データを受信する。
【０００７】
端末Ｂ１２０は、データ受信後、受信部Ｂ１２６においてユーザＡの音声／映像データを格納したパケットからのデータ取得および復号等、所定の処理を実行した後、ディスプレイＢ１２４、スピーカＢ１２３を介して画像および音声データを出力してユーザＢに提示する。なお、各端末の送信部、受信部における具体的処理については後述する。
【０００８】
双方向データ通信においては、端末Ａ１１０から端末Ｂ１２０に対するデータ送信に並行して、端末Ｂ１２０から端末Ａ１１０に対するデータ送信も行われる。端末Ｂ１２０では、ユーザＢの音声／映像を、カメラＢ１２２、マイクＢ１２１を用いて取得し、取得データは、端末Ａ１１０の場合と同様、送信部Ｂ１２５において所定の処理をした後、ネットワーク１３０に送信される。端末Ａ１１０は、上述したようにユーザＡの音声／映像データをネットワーク１３０に送信すると同時に、端末Ｂ１２０からネットワーク１３０に送信されたユーザＢの音声／映像データを受信し、受信部Ａ１１６において所定の処理をした後、端末Ａ１１０上のディスプレイＡ１１４、スピーカＡ１１３を介して画像および音声データを出力してユーザＡに提示する。
【０００９】
通常、双方向ビジュアルコミュニケーションでは、ネットワークの帯域を効率よく使用するために、音声／映像のデータは、それぞれの端末に設けられたコーデックを用いて数分の一から数十分の一にデータ圧縮（符号化）された後、ネットワークに送信される。例えばＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅｅｘｐｅｒｔｓＧｒｏｕｐ）、ＡＴＲＡＣ（ＡｄａｐｔｉｖｅＴｒａｎｓｆｏｒｍＡｃｏｕｓｔｉｃＣｏｄｉｎｇ）等の符号化処理がなされる。
【００１０】
また、ネットワークにデータを出力する際には、符号化されたデータをそのまま流すのではなく、ネットワーク送受信に適したプロトコルでパケット化したものを使用する。たとえばＴＣＰ（ＴｒａｎｓｍｉｓｓｉｏｎＣｏｎｔｒｏｌＰｒｏｔｏｃｏｌ）や、ＵＤＰ（ＵｓｅｒＤａｔａｇｒａｍＰｒｏｔｏｃｏｌ）に従ったパケットを生成して生成パケットをネットワークに出力する。各端末の送信部、受信部は、これらの処理を行っている。
【００１１】
送信部および受信部の構成例を図２、図３に示す。図２に示すように、送信データ処理部２１０は、マイク２０１から取得した音声データをオーディオ符号化器２１１に入力し、ＡＴＲＡＣ等の所定フォーマットに従ったデータ符号化処理を行って、オーディオパケット生成部２１２において、符号化データをペイロードとして格納し、送信元、送信先アドレス等の所定のヘッダ情報を設定したパケットを生成してネットワーク送出部２１５を介してネットワーク２２０に送出する。
【００１２】
映像データはカメラ２０２によって取得され、ビデオ符号化器２１３に入力し、ＭＰＥＧ等の所定フォーマットに従ったデータ符号化処理を行って、ビデオパケット生成部２１４において、符号化データをペイロードとして格納し、送信元、送信先アドレス等の所定のヘッダ情報を設定したパケットを生成してネットワーク送出部２１５を介してネットワーク２２０に送出する。
【００１３】
これらのパケットを受信する端末における処理について図３を参照して説明する。ネットワーク３０１を介して受信データ処理部３１０のネットワーク受信部３１１が受信したパケットは、オーディオパケット解析部３１２およびビデオパケット解析部３１４に入力されて、各パケットからの符号化データの取り出し、各パケットから取得した符号化データの再配列等の処理を実行した後、それぞれオーディオ復号化器３１３、ビデオ復号化器３１５に入力され、それぞれＡＴＲＡＣ、ＭＰＥＧ等の復号シーケンスに従った復号処理が実行される。復号データはそれぞれスピーカ３２１、ディスプレイ３２２を介して出力される。
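As an illustration of the packet generation and packet analysis steps described in paragraphs 【００１１】 to 【００１３】, the following Python sketch splits an encoded stream into packets and restores it at the receiver, including the rearrangement of out-of-order data. It is a minimal sketch under stated assumptions: the 6-byte header (16-bit sequence number, 32-bit timestamp) and the names `packetize` and `depacketize` are hypothetical and do not come from the patent, which only says that header information such as source and destination addresses is set. Addressing is left to the transport layer here, and sequence-number wraparound is ignored.

```python
import struct
from typing import List

# Hypothetical header: 16-bit sequence number, 32-bit timestamp (network byte order).
HEADER = struct.Struct("!HI")


def packetize(encoded: bytes, payload_size: int, timestamp: int) -> List[bytes]:
    """Split encoded data into packets, each carrying a header and a payload."""
    packets = []
    for seq, start in enumerate(range(0, len(encoded), payload_size)):
        payload = encoded[start:start + payload_size]
        packets.append(HEADER.pack(seq & 0xFFFF, timestamp) + payload)
    return packets


def depacketize(packets: List[bytes]) -> bytes:
    """Extract payloads, reorder them by sequence number, and join them."""
    parsed = []
    for packet in packets:
        seq, _timestamp = HEADER.unpack(packet[:HEADER.size])
        parsed.append((seq, packet[HEADER.size:]))
    parsed.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in parsed)
```

Feeding the packets to `depacketize` in any arrival order yields the original encoded byte string, which would then be handed to the audio or video decoder.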
【００１４】
このように、双方向ビジュアルコミュニケーションの参加者が使用するコミュニケーション端末では、そのツールを使用する者の音声／映像を符号化しネットワークに送信する送信部の処理と、コミュニケーションに参加している他者からネットワークを通じて送られてくる音声／映像データの復号化を行う受信部の処理、さらに復号化されたそれら音声／映像データをスピーカ、ディスプレイを用いて出力する処理を、リアルタイムで同時に行う必要がある。
【００１５】
なお、画像、音声等の複数のメディアデータを通信するデータ通信装置を開示した従来技術としては、例えば特許文献１がある。特許文献１には、複数のメディアデータを圧縮して送信する場合に符号化手段を選択して切り替えることで効率的なデータ通信を行う構成が示されている。また、画像データの通信において、特定の領域画像のみを抽出して送受信する処理方式について特許文献２に開示されている。
【００１６】
【特許文献１】
特開平１１−１７７４３６
【特許文献２】
特開２００２−５１３１５
【００１７】
【発明が解決しようとする課題】
コミュニケーション端末として用いられる通信処理装置としては、例えばパーソナルコンピュータ、携帯情報端末、携帯電話などがあるが、これらのうち、携帯情報端末や携帯電話では、その筐体サイズ、消費電力、搭載プロセッサ、端末価格等、様々な要因により、パーソナルコンピュータと比べて、一般には、処理能力の低いシステムであることが多い。すなわち、端末のＣＰＵパワーや使用可能なメモリ量がＰＣ等に比較して劣る場合が多い。
【００１８】
上述したように、双方向ビジュアルコミュニケーションを行うためには、リアルタイム性を維持するための高速処理能力が必要とされる。その結果、携帯情報端末や携帯電話等の、処理能力の低いシステムを使用した端末上では、リアルタイム性を維持するためデータ品質を犠牲にすることが行われる。すなわち、通信データを低いフレームレートによる映像データ、狭帯域の音声データとするなど、データ品質を犠牲にしてリアルタイム性を維持することが行われる。従って、リアルタイム性と高品質を維持したコミュニケーションは難しいという問題があった。
【００１９】
本発明は、このような状況に鑑みてなされたものであり、処理能力の低いコミュニケーション端末でも、通信データ品質を落とすことなく、双方向ビジュアルコミュニケーションを実現可能とした通信処理装置、および通信処理方法、並びにコンピュータ・プログラムを提供することを目的とする。
【００２０】
【課題を解決するための手段】
本発明の第１の側面は、
符号化データの送受信処理を実行する通信処理装置であり、
送信データの符号化処理を実行する送信データ処理部と、
受信データの復号化処理を実行する受信データ処理部と、
送信データおよび受信データの比較に基づいて、送信データおよび受信データのいずれかを重点処理データとして識別する主従判断部と、
前記主従判断部の識別情報に基づいて、重点処理データを処理する前記送信データ処理部または受信データ処理部いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力する制御部とを有し、
前記送信データ処理部および受信データ処理部は、前記制御部からの制御信号に基づいて送信データの符号化態様、受信データの復号化態様を変更する処理を実行する構成であることを特徴とする通信処理装置にある。
【００２１】
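Further, in one embodiment of the communication processing apparatus of the present invention, the main/sub determination unit compares the power levels of the audio data contained in the transmission data and the reception data, selects whichever of the transmission data or the reception data contains the audio data with the higher power level, and identifies it as the significant processing data.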
【００２２】
さらに、本発明の通信処理装置の一実施態様において、前記送信データ処理部および受信データ処理部は、音声データの符号化部または復号化部を有し、前記制御部からの処理高品質化命令に基づいて、符号化帯域または復号化帯域の拡大処理を実行し、前記制御部からの処理簡易化命令に基づいて、符号化帯域または復号化帯域の削減処理を実行する構成であることを特徴とする。
【００２３】
さらに、本発明の通信処理装置の一実施態様において、前記送信データ処理部および受信データ処理部は、映像データの符号化部または復号化部を有し、前記制御部からの処理高品質化命令に基づいて、符号化フレームレートまたは復号化フレームレートの増加処理を実行し、前記制御部からの処理簡易化命令に基づいて、符号化フレームレートまたは復号化フレームレートの削減処理を実行する構成であることを特徴とする。
【００２４】
さらに、本発明の通信処理装置の一実施態様において、前記送信データ処理部および受信データ処理部は、前記制御部からの処理高品質化命令に基づいて、通信処理装置内のリソース適用率を増加させた処理を実行し、前記制御部からの処理簡易化命令に基づいて、通信処理装置内のリソース適用率を減少させた処理を実行する構成であることを特徴とする。
【００２５】
さらに、本発明の通信処理装置の一実施態様において、前記リソース適用率は、ＣＰＵの処理時間およびメモリの使用率を含むものであることを特徴とする。
【００２６】
さらに、本発明の通信処理装置の一実施態様において、前記通信処理装置は、前記主従判断部の識別情報を通信先端末に対して送信し、識別情報の通知処理を実行する構成であることを特徴とする。
【００２７】
さらに、本発明の通信処理装置の一実施態様において、前記制御部は、通信先端末から受信する主従判断識別情報に基づいて、前記送信データ処理部および受信データ処理部いずれか一方の重点処理データ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力する構成であることを特徴とする。
【００２８】
さらに、本発明の通信処理装置の一実施態様において、前記通信処理装置は、さらに、送信データおよび受信データのいずれかを重点処理データとして任意に設定可能なスイッチ手段を有し、前記制御部は、前記スイッチ手段の設定情報に基づいて、前記重点処理データを処理する前記送信データ処理部または受信データ処理部いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力することを特徴とする。
【００２９】
さらに、本発明の第２の側面は、
送信データおよび受信データの比較に基づいて、送信データおよび受信データのいずれかを重点処理データとして識別する主従判断ステップと、
前記主従判断ステップにおける識別情報に基づいて、重点処理データを処理する前記送信データ処理部または受信データ処理部いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力する制御ステップと、
前記送信データ処理部および受信データ処理部において、前記処理高品質化命令または処理簡易化命令に基づいて送信データの符号化態様、受信データの復号化態様を変更する処理を実行する処理変更ステップと、
を有することを特徴とする通信処理方法にある。
【００３０】
さらに、本発明の通信処理方法の一実施態様において、前記主従判断ステップは、送信データおよび受信データに含まれる音声データのパワーレベル比較を実行し、パワーレベルが大きい音声データを含む送信データまたは受信データのいずれかを選択して重点処理データとして識別する処理を実行することを特徴とする。
【００３１】
さらに、本発明の通信処理方法の一実施態様において、前記送信データ処理部および受信データ処理部は、音声データの符号化部または復号化部を有し、前記処理高品質化命令に基づいて、符号化帯域または復号化帯域の拡大処理を実行し、前記処理簡易化命令に基づいて、符号化帯域または復号化帯域の削減処理を実行することを特徴とする。
【００３２】
さらに、本発明の通信処理方法の一実施態様において、前記送信データ処理部および受信データ処理部は、映像データの符号化部または復号化部を有し、前記処理高品質化命令に基づいて、符号化フレームレートまたは復号化フレームレートの増加処理を実行し、前記処理簡易化命令に基づいて、符号化フレームレートまたは復号化フレームレートの削減処理を実行することを特徴とする。
【００３３】
さらに、本発明の通信処理方法の一実施態様において、前記送信データ処理部および受信データ処理部は、前記処理高品質化命令に基づいて、通信処理装置内のリソース適用率を増加させた処理を実行し、前記処理簡易化命令に基づいて、通信処理装置内のリソース適用率を減少させた処理を実行することを特徴とする。
【００３４】
さらに、本発明の通信処理方法の一実施態様において、前記リソース適用率は、ＣＰＵの処理時間およびメモリの使用率を含むものであることを特徴とする。
【００３５】
さらに、本発明の通信処理方法の一実施態様において、前記通信処理方法は、さらに、前記主従判断ステップにおける識別情報を通信先端末に対して送信し、識別情報の通知処理を実行するステップを有することを特徴とする。
【００３６】
さらに、本発明の通信処理方法の一実施態様において、前記通信処理方法は、さらに、通信先端末から受信する主従判断識別情報に基づいて、前記送信データ処理部および受信データ処理部いずれか一方の重点処理データ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力するステップを有することを特徴とする。
【００３７】
さらに、本発明の通信処理方法の一実施態様において、前記通信処理方法は、さらに、送信データおよび受信データのいずれかを重点処理データとして任意に設定可能なスイッチ手段による重点処理データ設定ステップを有し、前記制御ステップは、前記スイッチ手段の設定情報に基づいて、前記重点処理データを処理する前記送信データ処理部または受信データ処理部いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力することを特徴とする。
【００３８】
さらに、本発明の第３の側面は、
符号化データの通信処理を実行するコンピュータ・プログラムであり、
送信データおよび受信データの比較に基づいて、送信データおよび受信データのいずれかを重点処理データとして識別する主従判断ステップと、
前記主従判断ステップにおける識別情報に基づいて、重点処理データを処理する前記送信データ処理部または受信データ処理部いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力する制御ステップと、
前記送信データ処理部および受信データ処理部において、前記処理高品質化命令または処理簡易化命令に基づいて送信データの符号化態様、受信データの復号化態様を変更する処理を実行する処理変更ステップと、
を有することを特徴とするコンピュータ・プログラムにある。
【００３９】
【作用】
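According to the configuration of the present invention, the resource allocation of the communication processing apparatus is changed on the basis of a processing weighting derived from the main/sub determination of the conversation, and the transmission data or the reception data is processed accordingly. The audio and video data of the main speaker can therefore be transmitted and received as high-quality data and presented to the communicating user. In other words, the data that matters most to the conversation can be selectively raised in quality, realizing data communication in which the perceived quality of the conversation is improved.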
【００４０】
さらに、本発明の構成によれば、送信データおよび受信データに含まれる音声データのパワーレベル比較を実行し、パワーレベルが大きい音声データを含む送信データまたは受信データのいずれかを選択して重点処理データとして識別する処理を実行する構成としたので、実際に話を行っているユーザを主会話者として判断し、実際に話を行っているユーザの音声データおよび映像データを選択的に高品質化することが可能となる。
【００４１】
なお、本発明のコンピュータ・プログラムは、例えば、様々なプログラム・コードを実行可能な汎用コンピュータ・システムに対して、コンピュータ可読な形式で提供する記憶媒体、通信媒体、例えば、ＣＤやＦＤ、ＭＯなどの記憶媒体、あるいは、ネットワークなどの通信媒体によって提供可能なコンピュータ・プログラムである。このようなプログラムをコンピュータ可読な形式で提供することにより、コンピュータ・システム上でプログラムに応じた処理が実現される。
【００４２】
本発明のさらに他の目的、特徴や利点は、後述する本発明の実施例や添付する図面に基づく、より詳細な説明によって明らかになるであろう。なお、本明細書においてシステムとは、複数の装置の論理的集合構成であり、各構成の装置が同一筐体内にあるものには限らない。
【００４３】
【発明の実施の形態】
以下、図面を参照しながら、本発明の通信処理装置、および通信処理方法、並びにコンピュータ・プログラムの詳細について説明する。
【００４４】
本発明の通信処理装置は、音声および画像を伴う双方向ビジュアルコミュニケーションを実行し、データ通信の実行中に会話の主従関係を判断する。その上で、会話の主となる側に優先的に通信処理装置の有する情報処理のリソースを配分することにより、処理能力の低い端末でも円滑な双方向ビジュアルコミュニケーションを実現する。
【００４５】
なお、「情報処理のリソース」とは、双方向ビジュアルコミュニケーションの処理を行うのに必要となる通信処理装置の制御部（例えばＣＰＵ）や使用可能なメモリ量などである。具体的には、例えば双方向ビジュアルコミュニケーションを実行する場合に必要となる送信データの符号化、受信データの復号処理等に必要となるＣＰＵを含むデータ処理手段およびメモリ等であり、本発明の通信処理装置では、会話の主となる側に優先的にこれらの情報処理のリソースを配分して処理を実行する。
【００４６】
すなわち、通信処理装置において会話の主となる側が自装置側のユーザであるか、通信先の通信処理装置側のユーザであるかを判定し、その判定に基づいて、ＣＰＵを含むデータ処理手段の適用率、および使用するメモリ量を変更して、会話の主となる側のデータ処理により多くのリソースを配分して処理を実行する。
【００４７】
以下、図を参照して本発明の通信処理装置の構成および通信処理の手順についての詳細を説明する。
【００４８】
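In an environment where a plurality of communication processing apparatuses communicate over a wired or wireless network, when users converse between the apparatuses, for example, there is a side that speaks and a side that listens, and two-way communication is normally established as these roles alternate.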
【００４９】
本発明では、複数の通信処理装置を用いてネットワークを介したデータ通信を実行している複数ユーザ中、主に話をしているユーザ側、すなわち多くの音声データを送信している側のユーザを主会話者、主会話者の話を聞いている側、すなわち音声データを受信し再生している側のユーザを従会話者と称する。双方向通信を行う場合、通常は主会話者と従会話者は固定されるものではなく適宜入れ替わることで会話が成立する。従って会話の参加者は、会話の内容に応じて、主会話者にも従会話者にもなりえる。
【００５０】
本発明の通信処理装置では、通信データとなるユーザの音声データの入出力状況を監視し、監視情報に基づいて随時、主従関係を判定し、判定結果に基づいてリソース配分を随時更新変更し、最適なリソースの配分による通信データの高品質化を実現させるものである。
【００５１】
通信処理装置を用いて、双方向ビジュアルコミュニケーションを実行する場合、主会話者と従会話者それぞれが使用している例えば携帯電話等の通信処理装置は、ユーザの音声および映像データをマイクおよびカメラにより取得して符号化処理の後、ネットワークを通じて相手方へと送信する。
【００５２】
従来の通信処理装置（端末）では、その端末を使用する人間が主会話者であるか、従会話者であるかを区別することなく、音声および映像データの符号化処理、符号化データ送信処理、および符号化データ受信処理、受信データ復号化処理等、通信データの送受信および再生に必要な各種の処理を、単純な処理プロセスの時分割処理等に基づいて実行している。すなわち、各通信端末のプロセッサ（ＣＰＵ等）が、蓄積された処理対象データに応じて順次処理を行うのが一般的なデータ処理構成となっている。
【００５３】
ところが、実際の双方向ビジュアルコミュニケーションで交わされるデータの内容を考えると、主会話者からのデータは会話の内容そのものなので、主会話者の音声および映像は、ともに通信処理装置を介して実行中の会話にとって重要なデータである場合が多い。一方、従会話者、すなわち主会話者の話を聞いている側からの音声データおよび映像データは、主会話者の話す内容に対する相槌や返事などを示すのみ、あるいは、なんら言葉を発していない場合などが多く、従会話者の音声および映像データは、ともに通信処理装置を介して実行中の会話にとって重要性が高いものでない場合が多い。
【００５４】
従って、従会話者の音声および映像データに対するデータ処理に、主会話者の音声および映像データに対する処理と同程度のシステムリソースを配分することは、データの重要性を鑑みた場合、最適な処理とは言えない。主会話者、従会話者に対応するデータ処理を均等に実行しても問題がないレベルの処理能力の高いプロセッサや充分なメモリ容量を持つ機器とすることはコスト高、および機器の大型化を招くこととなる。
【００５５】
そこで本発明の通信処理装置では、通信を実行している複数ユーザの音声データの入出力状況を監視し、監視情報に基づいて随時、主従関係を判定し、判定結果に基づいてリソース配分を随時更新して変更し、主会話者からのデータ処理により多くのリソースを適用することで主会話者のデータ処理を優先的に実行し、より重要度の高いデータ、すなわち主会話者の音声データおよび画像データをより高品質なデータとして送受信しユーザに提供することを可能とするものである。
【００５６】
図４を参照して、本発明の通信処理装置の構成および処理について説明する。本発明の通信処理装置４００は、ネットワーク４０１を介して受信したデータを処理する受信データ処理部４１０、受信データ処理部の処理後のデータを出力するディスプレイ４１４、スピーカ４１７、ネットワークに出力するユーザの音声および映像データを取得するマイク４３１、カメラ４３４、取得したデータに基づく送信データを生成する送信データ処理部４３０を有する。
【００５７】
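In the reception data processing unit 410, packets received by the network receiving unit 411 via the network 401 are input to the video packet analysis unit 412 and the audio packet analysis unit 415, which extract the encoded data from each packet and perform processing such as rearranging the extracted encoded data. The data is then input to the video decoder 413 and the audio decoder 416, which execute decoding according to the decoding sequences of MPEG and ATRAC or the like, respectively, and the decoded data is output via the display 414 and the speaker 417, respectively.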
【００５８】
送信データ処理部４３０では、マイク４３１から取得した音声データをオーディオ符号化器４３２に入力し、ＡＴＲＡＣ等の所定フォーマットに従ったデータ符号化処理を行って、オーディオパケット生成部４３３において、符号化データをペイロードとして格納し、送信元、送信先アドレス等の所定のヘッダ情報を設定したパケットを生成してネットワーク送信部４３７を介してネットワーク４０１に送出する。
【００５９】
映像データはカメラ４３４によって取得し、ビデオ符号化器４３５に入力し、ＭＰＥＧ等の所定フォーマットに従ったデータ符号化処理を行って、ビデオパケット生成部４３６において、符号化データをペイロードとして格納し、送信元、送信先アドレス等の所定のヘッダ情報を設定したパケットを生成してネットワーク送信部４３７を介してネットワーク４０１に送出する。
【００６０】
本発明の通信処理装置は、上述した受信データ処理部４１０、送信データ処理部４３０の他に、会話主従判断部４２０、符号化器／復号化器制御部４２１、符号化プリプロセス部４２３、および復号化プリプロセス部４２２を備える。
【００６１】
会話主従判断部４２０は、送信データおよび受信データのいずれかを重点処理データとするかを識別する処理を実行する。会話主従判断部４２０は、マイク４３１の取得する音声データである送信音声データと、ネットワーク４０１を介して受信した、コミュニケーション相手の音声データである受信音声データを入力する。ここで、送信音声データは、この通信処理装置４００を使用しているユーザが発し、マイク４３１によって取得された音声データである。一方、受信音声データは、通信相手ユーザの音声データであり、ネットワーク４０１を介してネットワーク受信部４１１が受信し、オーディオパケット解析部４１５においてパケットから取得した符号化データをオーディオ復号化器４１６において復号した通信相手ユーザの音声データである。
【００６２】
会話主従判断部４２０は、受信データ処理部４１０のオーディオ復号化器４１６の出力する通信相手ユーザの音声データである受信音声データと、マイク４３１の取得する音声データである送信音声データとを入力し、これら２つの音声データのパワーレベルを比較する。
【００６３】
音声が全く入力されない場合は、パワーレベルは０であるが、音声が入力されると会話主従判断部４２０において入力音声データに基づいてパワーレベルが計測される。主に話をしているのがこの通信処理装置４００側のユーザである場合は、マイク４３１の取得する音声データである送信音声データのパワーレベルが、通信相手ユーザの音声データである受信音声データのパワーレベルより大となる。すなわち、２つの入力音声データのパワーレベルは、
送信音声データ＞受信音声データ
となる。
【００６４】
一方、主に話をしているのが通信相手ユーザであれば、通信相手ユーザの音声データである受信音声データのパワーレベルが、マイク４３１の取得する音声データである送信音声データのパワーレベルより大となる。すなわち、２つの入力音声データのパワーレベルは、
送信音声データ＜受信音声データ
となる。
【００６５】
会話主従判断部４２０は、
送信音声データ＞受信音声データ
であれば、この通信処理装置４００側のユーザを主会話者とし、通信処理装置４００と通信を実行している通信相手ユーザを従会話者と判定する。
送信音声データ＜受信音声データ
であれば、この通信処理装置４００側のユーザを従会話者とし、通信処理装置４００と通信を実行している通信相手ユーザを主会話者と判定する。
【００６６】
会話主従判断部４２０は、送信音声データおよび受信音声データを継続的、あるいは所定のサンプリングタイミング毎に入力し入力データに基づいて、両データのパワーレベルを比較し、比較結果に基づいて、主会話者および従会話者がいずれのユーザであるかを判断する。
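The power level comparison performed by the conversation main/sub determination unit 420 can be sketched as follows. This is a minimal sketch, assuming that the power level is the mean squared amplitude of one frame of PCM samples; the patent does not state how the power level is measured, and the names `power_level` and `main_speaker` are hypothetical.

```python
from typing import Sequence


def power_level(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame; 0.0 when no audio is input."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def main_speaker(tx_samples: Sequence[float], rx_samples: Sequence[float]) -> str:
    """Return 'local' when the transmitted audio is stronger, else 'remote'."""
    if power_level(tx_samples) > power_level(rx_samples):
        return "local"
    return "remote"
```

Calling `main_speaker` once per sampling interval corresponds to the repeated comparison described above; a practical implementation would probably smooth the result over several frames so that brief interjections do not flip the roles, although the patent does not discuss this.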
【００６７】
会話主従判断部４２０において実行する会話の主従判断処理シーケンスを説明するフローチャートを図５に示す。
【００６８】
ステップＳ１０１において、会話主従判断部４２０は、マイク４３１の取得した送信音声データの入力の有無を判定する。これは、会話主従判断部４２０が送信音声データの入力レベルに基づいて判定する。
【００６９】
送信音声データの入力があると判定すると、ステップＳ１０２に進み、受信データ処理部４１０のオーディオ復号化器４１６の出力する通信相手ユーザの音声データの入力の有無を判定する。これは、会話主従判断部４２０が受信音声データの入力レベルに基づいて判定する。
【００７０】
送信音声データの入力があり、かつ受信音声データの入力がある場合は、ステップＳ１０３において、送信音声データパワーレベルと、受信音声データパワーレベルの比較判定処理を実行する。
【００７１】
送信音声データパワー＞受信音声データパワー
である場合は、ステップＳ１０４に進み、ローカル端末が主会話者であることを示す識別信号を符号化器／復号化器制御部４２１に出力する。なお、図５において、「ローカル端末」とは、このフローチャートによる会話の主従判断処理を行っている会話主従判断部を備えた通信処理端末であり、「リモート端末」とは、「ローカル端末」とネットワークを介して接続し通信を実行している通信処理端末である。
【００７２】
送信音声データパワー＞受信音声データパワー
でない場合は、ステップＳ１１２に進み、リモート端末側ユーザが主会話者であることを示す識別信号を符号化器／復号化器制御部４２１に出力する。
【００７３】
ステップＳ１０２の判定がＮｏ、すなわち、送信音声データの入力があるが受信音声データの入力がない場合は、ステップＳ１０３における入力音声パワーレベルの比較処理を実行することなく、ステップＳ１０４に進み、ローカル端末側ユーザが主会話者であることを示す識別信号を符号化器／復号化器制御部４２１に出力する。
【００７４】
また、ステップＳ１０１において、送信音声データの入力がないと判定すると、ステップＳ１１１に進み、受信データ処理部４１０のオーディオ復号化器４１６の出力する通信相手ユーザの音声データの入力の有無を判定する。これは、会話主従判断部４２０が受信音声データの入力レベルに基づいて判定する。
【００７５】
ステップＳ１１１の判定がＹｅｓ、すなわち、送信音声データの入力がなく、受信音声データの入力のみがある場合は、ステップＳ１１２に進み、リモート端末側ユーザが主会話者であることを示す識別信号を符号化器／復号化器制御部４２１に出力する。
【００７６】
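When the determination in step S111 is No, that is, when neither transmission audio data nor reception audio data is input, no resource control is needed, and the processing ends without outputting an identification signal to the encoder/decoder control unit 421.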
【００７７】
なお、図５に示す処理は、会話主従判断部４２０において、継続的にあるいは予め定められたサンプリングタイミング毎に繰り返し実行される処理である。
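The flow of FIG. 5 (steps S101 to S112) can be written as one small function. The sketch below follows the branches described in the text; the presence test against a `threshold` is an assumption, since the patent only says that presence is judged from the input level, and the names `Speaker` and `judge` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Speaker(Enum):
    LOCAL = "local"    # the user of this terminal is the main speaker
    REMOTE = "remote"  # the user of the peer terminal is the main speaker


def judge(tx_power: float, rx_power: float, threshold: float = 0.0) -> Optional[Speaker]:
    """Return the main speaker, or None when no identification signal is output."""
    tx_present = tx_power > threshold  # S101: is transmission audio being input?
    rx_present = rx_power > threshold  # S102 / S111: is reception audio being input?
    if tx_present and rx_present:
        # S103: compare the two power levels.
        return Speaker.LOCAL if tx_power > rx_power else Speaker.REMOTE
    if tx_present:
        return Speaker.LOCAL   # S104: only the local user is speaking
    if rx_present:
        return Speaker.REMOTE  # S112: only the remote user is speaking
    return None                # neither side is speaking; no resource control
```

The `None` result models the case in which the unit ends without sending a signal to the encoder/decoder control unit 421, so the previous resource allocation simply stays in effect.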
【００７８】
上述したように、会話主従判断部４２０において２つの入力音声パワーレベルに基づいて会話の主従が判定されると、会話主従判断部４２０は、符号化器／復号化器制御部４２１に対していずれの端末側が主であるかを示す識別信号を出力する。
【００７９】
符号化器／復号化器制御部４２１は、会話主従判断部４２０から入力する識別信号に従って、復号化プリプロセス部４２２および符号化プリプロセス部４２３に対して識別信号に応じた制御命令を出力する。符号化器／復号化器制御部４２１は、会話主従判断部４２０から入力する識別情報に基づいて、重点処理データを処理する送信データ処理部４３０または受信データ処理部４１０いずれか一方のデータ処理部に対して処理高品質化命令を出力し、他方のデータ処理部に対して処理簡易化命令を出力する。
【００８０】
図４に示す通信処理装置（ローカル端末）が主会話者側の端末であるとした場合、ネットワークを経由して送られてくるリモート端末の従会話者側の音声および画像は、コミュニケーション上、重要ではないデータであり、従会話者の音声および映像データに関する処理を軽減、すなわち、従会話者の音声および映像データに関する処理に適用するリソースを減少させるように、符号化器／復号化器制御部４２１は、復号化プリプロセス部４２２に制御命令を出力する。
【００８１】
復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、ビデオ復号化器４１３に対して復号処理における復号映像データのフレームレートを低下させる処理変更命令を出力し、ビデオ復号化器４１３は処理変更命令に応じて復号フレームレートを低下させる。この結果、処理負荷が減少され、通信処理装置のリソース（ＣＰＵ、メモリ等）を他の処理に優先的に適用することが可能となる。
【００８２】
復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、ビデオ復号化器４１３に対して処理変更命令を出力し、ビデオ復号化器４１３は、リソース適用率を減少させた処理、すなわち、ＣＰＵの処理時間およびメモリの使用率を減少させて復号フレームレートを低下させた処理を行う。
【００８３】
さらに、復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、オーディオ復号化器４１６に対して、音声データの復号帯域を削減し狭帯域化するなどの処理変更命令を出力し、オーディオ復号化器４１６は、処理変更命令に従って復号処理態様の変更、すなわち音声データの復号帯域を削減し狭帯域化するなどの処理を実行し、音声データ復号処理の負荷を低下させる。この結果、処理負荷が減少され、通信処理装置のリソース（ＣＰＵ、メモリ等）を他の処理に優先的に適用することが可能となる。
【００８４】
復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、オーディオ復号化器４１６に対して処理変更命令を出力し、オーディオ復号化器４１６は、リソース適用率を減少させた処理、すなわち、ＣＰＵの処理時間およびメモリの使用率を減少させて復号帯域を削減した処理を行う。
【００８５】
また、図４に示す通信処理装置（ローカル端末）が主会話者側の端末であるとした場合、通信処理装置（ローカル端末）側の音声および画像は、コミュニケーション上、重要なデータであり、主会話者の音声および映像データに関する処理を重点的に実行、すなわち、主会話者の音声および映像データに関する処理に適用するリソースを増大させるように、符号化器／復号化器制御部４２１は、符号化プリプロセス部４２３に制御命令を出力する。
【００８６】
符号化プリプロセス部４２３は、符号化器／復号化器制御部４２１からの制御命令としてリソース増大命令を入力すると、ビデオ符号化器４３５に対して符号化処理における符号化映像データのフレームレートを可能な範囲で増加させる処理変更命令を出力し、ビデオ符号化器４３５は処理変更命令に応じて符号化フレームレートを可能な範囲で増加させる。この場合の処理負荷は増大するが、先に説明したように、受信データ処理部４１０での処理負荷軽減が実行されており、通信処理装置のリソース（ＣＰＵ、メモリ等）を、送信データ処理部４３０において優先的に適用することが可能であり、符号化フレームレートの増加に対応することが可能となる。この結果、符号化データのデータ品質が向上し、通信先のリモート端末では、より高品質な主会話者の映像データを再生出力することが可能となる。
【００８７】
符号化プリプロセス部４２３は、符号化器／復号化器制御部４２１からの制御命令としてリソース増大命令を入力すると、ビデオ符号化器４３５に対して処理変更命令を出力し、ビデオ符号化器４３５は、リソース適用率を増大させた処理、すなわち、ＣＰＵの処理時間およびメモリの使用率を増加させて符号化フレームレートを可能な範囲で増加させた処理を行う。
【００８８】
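Further, when the encoding pre-process unit 423 receives a resource increase command as the control command from the encoder/decoder control unit 421, it outputs to the audio encoder 432 a processing change command, for example to widen the encoding band of the audio data as far as possible. The audio encoder 432 changes its encoding processing mode according to the processing change command and executes processing that improves the quality of the encoded audio data, that is, processing such as expanding the encoding band of the audio data.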
【００８９】
符号化プリプロセス部４２３は、符号化器／復号化器制御部４２１からの制御命令としてリソース増大命令を入力すると、オーディオ符号化器４３２に対して処理変更命令を出力し、オーディオ符号化器４３２は、リソース適用率を増加させた処理、すなわち、ＣＰＵの処理時間およびメモリの使用率を増加させて符号化帯域を拡大した処理を行う。
【００９０】
この場合の処理負荷は増大するが、先に説明したように、受信データ処理部４１０での処理負荷軽減が実行されており、通信処理装置のリソース（ＣＰＵ、メモリ等）を、送信データ処理部４３０において優先的に適用することが可能であり、符号化帯域を可能な範囲で広帯域化するなどの処理が可能となる。この結果、符号化データのデータ品質が向上し、通信先のリモート端末では、より高品質な主会話者の音声データを再生出力することが可能となる。
【００９１】
一方、図４に示す通信処理装置（ローカル端末）が従会話者側の端末で、リモート端末が主会話者側であるとした場合、ネットワークを経由して送られてくるリモート端末の主会話者側の音声および画像は、コミュニケーション上、重要なデータであり、リモート端末から受信する音声および映像データに関する処理を重点的に処理、すなわち、リモート端末の主会話者側の音声および画像データに関する処理に適用するリソースを増大させるように、符号化器／復号化器制御部４２１は、復号化プリプロセス部４２２に制御命令を出力する。
【００９２】
復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース増大命令を入力すると、ビデオ復号化器４１３に対して復号処理における復号映像データのフレームレートを可能な範囲で増加させる処理変更命令を出力し、ビデオ復号化器４１３は処理変更命令に応じて復号フレームレートを可能な範囲で増加させる。この結果、処理負荷が増加するが、後述するように、送信データ処理部４３０では処理負荷軽減が実行され、通信処理装置のリソース（ＣＰＵ、メモリ等）を、受信データ処理部４１０において優先的に適用することが可能となり、復号フレームレートの増加に対応することが可能となる。この結果、復号データのデータ品質が向上し、ローカル端末では、通信先のリモート端末から受信する映像データを高品質な映像データとしてディスプレイ４１４において出力することが可能となる。
【００９３】
さらに、復号化プリプロセス部４２２は、符号化器／復号化器制御部４２１からの制御命令としてリソース増大命令を入力すると、オーディオ復号化器４１６に対して、音声データの復号帯域を可能な範囲で広帯域化するなどの処理変更命令を出力し、オーディオ復号化器４１６は、処理変更命令に従って復号処理態様の変更を実行し、復号音声データの高品質化を図る。この結果、処理負荷が増加するが、後述するように、送信データ処理部４３０では処理負荷軽減が実行され、通信処理装置のリソース（ＣＰＵ、メモリ等）を、受信データ処理部４１０において優先的に適用することが可能となり、音声データの復号帯域を可能な範囲で広帯域化するなどの処理が可能となる。この結果、復号データのデータ品質が向上し、ローカル端末では、通信先のリモート端末から受信する音声データを高品質な音声データとしてスピーカ４１７において出力することが可能となる。
【００９４】
また、図４に示す通信処理装置（ローカル端末）が従会話者側の端末であるとした場合、通信処理装置（ローカル端末）側の音声および画像は、コミュニケーション上、重要でないデータであり、従会話者の音声および映像データに関する処理を軽減させて実行、すなわち、ローカル端末である通信処理装置４００から送信する従会話者の音声および映像データに関する処理に適用するリソースを減少させるように、符号化器／復号化器制御部４２１は、符号化プリプロセス部４２３に制御命令を出力する。
【００９５】
符号化プリプロセス部４２３は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、ビデオ符号化器４３５に対して符号化処理における符号化映像データのフレームレートを減少させる処理変更命令を出力し、ビデオ符号化器４３５は処理変更命令に応じて符号化フレームレートを減少させる。この結果、処理負荷が減少し、先に説明した受信データ処理部４１０での処理負荷増大に対応可能となる。
【００９６】
さらに、符号化プリプロセス部４２３は、符号化器／復号化器制御部４２１からの制御命令としてリソース減少命令を入力すると、オーディオ符号化器４３２に対して、音声データの符号化帯域を狭帯域化するなどの処理変更命令を出力し、オーディオ符号化器４３２は、処理変更命令に従って符号化処理態様の変更を実行し、音声データの符号化データの品質を低下させる処理を実行する。この結果、処理負荷が減少し、先に説明した受信データ処理部４１０での処理負荷増大に対応可能となる。
【００９７】
なお、オーディオ、ビデオの符号化器および復号化器において負荷を変動させるための処理態様の変更としては、例えば上記したように、オーディオの場合は処理帯域の変更、ビデオデータの場合はフレームレートの変更がある。具体的には、オーディオデータの処理負荷を軽減したい場合は、高音域をカットして低域だけの符号化あるいは復号化を実行することにより、高音域の処理に必要だったリソースの軽減が可能となる。またビデオデータの符号化または復号化において処理負荷を軽減したい場合は、フレームレートを落とすことにより、１秒あたりに必要なビデオデータの符号化および復号化処理のリソースを軽減することが可能である。
【００９８】
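The processing procedure of the encoder/decoder control unit 421 described above is explained with reference to FIG. 6. In step S201, the encoder/decoder control unit 421 receives the identification signal indicating whether the user on the local terminal side or the user on the remote terminal side is the main speaker, as determined by the conversation main/sub determination unit 420 on the basis of the audio power levels.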
【００９９】
ステップＳ２０２において、入力識別信号に基づいてローカル端末側が主会話者、すなわち、この処理フローを実行している端末を利用しているユーザ側が主会話者であることを示す識別信号であると判定された場合は、ステップＳ２０３において、受信データ処理部４１０内の復号化部、すなわち、ビデオ復号化器４１３およびオーディオ復号化器４１６に対して処理簡易化命令、すなわち制御命令としてのリソース減少命令を出力する。さらに、ステップＳ２０４において、送信データ処理部４３０内の符号化部、すなわち、オーディオ符号化器４３２およびビデオ符号化器４３５に対して処理高品質化命令、すなわち制御命令としてのリソース増大命令を出力する。
【０１００】
この処理により、この処理フローを実行している端末を利用しているユーザ側、すなわち主会話者の音声および映像の処理により多くのリソースが適用され、高品質なデータがネットワークを介して出力され、リモート端末において、高品質データの再生が実行される。なお、リモート端末も、図４に示すと同様の会話主従判断部による主会話者の判断を行い、リソース配分を行う機器であれば、この場合、リモート端末側では、主会話者側のローカル端末から受信する音声パワーを大と判定し、ローカル端末から受信する音声、映像の処理にリソースをより多く適用することになる。従って、ローカル端末から送信される高品質データを損なうことなく復号、再生することが可能となり、主会話者の音声および映像の高品質データ再生が実行される。
【０１０１】
すなわち、ネットワークを介した通信を実行する通信処理装置の双方が図４に示す会話主従判断部を有し、主従判断に基づくリソース配分を実行すれば、主会話者側のデータの優先的な処理が、双方の機器において実行され、高品質符号化データの生成、送信、受信、高品質符号化データの復号および再生がすべて実行されることになる。なお、一方のみが、図４に示す会話主従判断部を有し、主従判断に基づくリソース配分を実行する機器である場合においても、その機器においては主会話者側のデータ処理の優先実行が可能であり、自機器における処理の効率化および主会話者側のデータ高品質化が実現されることになる。
【０１０２】
一方、ステップＳ２０２において、入力識別信号に基づいてローカル端末側が主会話者でない、すなわち、この処理フローを実行している端末と通信を実行しているリモート端末を利用しているユーザ側が主会話者であることを示す識別信号であると判定された場合は、ステップＳ２１１に進み、受信データ処理部４１０内の復号化部、すなわち、ビデオ復号化器４１３およびオーディオ復号化器４１６に対して処理高品質化命令、すなわち制御命令としてのリソース増大命令を出力する。さらに、ステップＳ２１２において、送信データ処理部４３０内の符号化部、すなわち、オーディオ符号化器４３２およびビデオ符号化器４３５に対して処理簡易化命令、すなわち制御命令としてのリソース減少命令を出力する。
【０１０３】
この処理により、この処理フローを実行している端末を利用しているユーザ側、すなわち従会話者の音声および映像の処理リソースが減少され、リモート端末からの受信データに対する処理に多くのリソースが提供され、リモート端末から受信する音声および映像の高品質データがディスプレイ４１４、スピーカ４１７を介して出力されることになる。
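The branch structure of FIG. 6 (steps S201 to S212) can be sketched as a small dispatcher. The `Unit` class merely records the commands it receives; it and the string constants are hypothetical stand-ins for the decoding and encoding sections of units 410 and 430, not an interface defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List

ENHANCE = "enhance"    # processing quality-enhancement command (resource increase)
SIMPLIFY = "simplify"  # processing simplification command (resource decrease)


@dataclass
class Unit:
    """Stand-in for a group of codecs that records the commands it is given."""
    name: str
    commands: List[str] = field(default_factory=list)

    def command(self, cmd: str) -> None:
        self.commands.append(cmd)


def control(local_is_main: bool, decoders: Unit, encoders: Unit) -> None:
    """Issue one command to each side according to the identification signal (S202)."""
    if local_is_main:
        decoders.command(SIMPLIFY)  # S203: received data matters less
        encoders.command(ENHANCE)   # S204: transmitted data is the main speaker's
    else:
        decoders.command(ENHANCE)   # S211: received data is the main speaker's
        encoders.command(SIMPLIFY)  # S212: transmitted data matters less
```

Exactly one side is enhanced and the other simplified on every call, which reflects the idea that the resources freed on one side are what make the higher quality on the other side affordable.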
【０１０４】
なお、リモート端末も、図４に示すと同様の会話主従判断部による主会話者の判断を行い、リソース配分を行う機器であれば、この場合、リモート端末側では、自装置すなわちリモート端末側が主会話者であると判定し、自装置から送信する音声、映像の処理にリソースをより多く適用することになる。従って、送信するデータを高品質データとする処理を行うことになり、２つの通信処理装置において、高品質データの生成、送信、受信、高品質符号化データの復号および再生がすべて実行されることになる。
【０１０５】
すなわち、ネットワークを介した通信を実行する通信処理装置の双方が図４に示す会話主従判断部を有し、主従判断に基づくリソース配分を実行すれば、主会話者側のデータの優先的な処理が、双方の機器において実行され、高品質符号化データの生成、送信、受信、高品質符号化データの復号および再生がすべて実行されることになる。なお、一方のみが、図４に示す会話主従判断部を有し、主従判断に基づくリソース配分を実行する機器である場合においても、その機器においては主会話者側のデータ処理の優先実行が可能であり、自機器における処理の効率化および主会話者側のデータ高品質化は実現される。
【０１０６】
図７および図８に本発明の通信処理装置に構成されるオーディオ符号化器、復号化器と、ビデオ符号化器とビデオ復号化器の詳細構成を示す。
【０１０７】
図７は、（ａ）オーディオ符号化器と、（ｂ）オーディオ復号化器の構成を示している。（ａ）に示すオーディオ符号化器４３２は、ＡＴＲＡＣ等のオーディオデータの符号化処理を実行するオーディオ符号化部コア５１１と、オーディオ符号化部コア５１１に対して符号化処理態様、具体的には例えば符号化帯域の設定処理を行うオーディオ符号化帯域制御部５１２とを有する。
【０１０８】
オーディオ符号化帯域制御部５１２は、符号化プリプロセス部４２３から、音声データの符号化帯域を可能な範囲で広帯域化あるいは狭帯域化等を指示する処理変更命令を入力する。オーディオ符号化帯域制御部５１２は、入力命令に基づいて符号化帯域の設定情報をオーディオ符号化部コア５１１に出力し、オーディオ符号化部コア５１１では設定された帯域に従った符号化処理を実行する。
【０１０９】
（ｂ）に示すオーディオ復号化器４１６は、ＡＴＲＡＣ等のオーディオデータの復号処理を実行するオーディオ復号化部コア５４１と、オーディオ復号化部コア５４１に対して復号化処理態様、具体的には例えば復号帯域の設定処理を行うオーディオ復号化帯域制御部５４２とを有する。
【０１１０】
オーディオ復号化帯域制御部５４２は、復号化プリプロセス部４２２から、音声データの復号帯域を可能な範囲で広帯域化あるいは狭帯域化等を指示する処理変更命令を入力する。オーディオ復号化帯域制御部５４２は、入力命令に基づいて復号帯域の設定情報をオーディオ復号化部コア５４１に出力し、オーディオ復号化部コア５４１では設定された帯域に従った復号化処理を実行する。
【０１１１】
このように、オーディオ符号化器４３２、およびオーディオ復号化器４１６では、符号化あるいは復号化帯域の設定を変更して処理負荷を適宜変更し、使用するリソースの変更を行う。
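A band setting of the kind that the audio band control units 512 and 542 pass to the codec cores could be expressed as a count of subbands to process. The sketch below assumes a codec that splits the spectrum up to the Nyquist frequency into equal-width subbands; that layout, the function name `bands_to_process`, and the example figures are assumptions for illustration and are not details of the patent or of ATRAC.

```python
def bands_to_process(cutoff_hz: float, sample_rate_hz: float, total_bands: int) -> int:
    """Number of equal-width subbands at or below cutoff_hz (always at least one)."""
    nyquist_hz = sample_rate_hz / 2.0
    wanted = int(total_bands * cutoff_hz / nyquist_hz)
    # Clamp so that at least the lowest band and at most every band is processed.
    return max(1, min(total_bands, wanted))
```

With 32 subbands at a 16 kHz sampling rate, lowering the cutoff from 8 kHz to 4 kHz halves the number of subbands the core has to encode or decode, which is the load reduction the narrow-band setting is meant to achieve.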
【０１１２】
図８は、（ａ）ビデオ符号化器と、（ｂ）ビデオ復号化器の構成を示している。（ａ）に示すビデオ符号化器４３５は、ＭＰＥＧ等のビデオデータの符号化処理を実行するビデオ符号化部コア６１１と、ビデオ符号化部コア６１１に対して符号化処理態様、具体的には例えば符号化フレームレートの設定処理を行うビデオ符号化フレームレート制御部６１２とを有する。
【０１１３】
ビデオ符号化フレームレート制御部６１２は、符号化プリプロセス部４２３から、ビデオデータの符号化フレームレートを高く、あるいは低くする等の処理変更命令を入力する。ビデオ符号化フレームレート制御部６１２は、入力命令に基づいて符号化フレームレートの設定情報をビデオ符号化部コア６１１に出力し、ビデオ符号化部コア６１１では設定されたフレームレートに従った符号化処理を実行する。
【０１１４】
（ｂ）に示すビデオ復号化器４１３は、ＭＰＥＧ等のビデオデータの復号処理を実行するビデオ復号化部コア６４１と、ビデオ復号化部コア６４１に対して復号化処理態様、具体的には例えば復号フレームレートの設定処理を行うビデオ復号化フレームレート制御部６４２とを有する。
【０１１５】
ビデオ復号化フレームレート制御部６４２は、復号化プリプロセス部４２２から、ビデオデータの復号フレームレートの高低を指示した処理変更命令を入力する。ビデオ復号化フレームレート制御部６４２は、入力命令に基づいて復号フレームレートの設定情報をビデオ復号化部コア６４１に出力し、ビデオ復号化部コア６４１では設定されたフレームレートに従った復号化処理を実行する。
【０１１６】
このように、ビデオ符号化器４３５、およびビデオ復号化器４１３では、符号化あるいは復号化フレームレートの設定を変更して処理負荷を適宜変更し、使用するリソースの変更を行う。
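One simple way to realize a lower frame-rate setting is to skip frames evenly before they reach the codec core. The following sketch is an assumption about how the frame rate control units 612 and 642 might select frames; the patent states only that the frame rate setting is changed, and the name `keep_frame` is hypothetical.

```python
def keep_frame(index: int, source_fps: int, target_fps: int) -> bool:
    """True when frame number `index` of a source_fps stream is kept at target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return True
    # Keep a frame each time the scaled index crosses an integer boundary,
    # which spreads the kept frames evenly over time.
    return (index * target_fps) // source_fps != ((index - 1) * target_fps) // source_fps
```

For a 30 frames-per-second source and a target of 10, frames 0, 3, 6 and so on are kept, so the codec core processes one third of the frames per second.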
【０１１７】
上述したように、本発明の構成においては、会話の主従判定による処理の重み付けに基づいてリソースの配分を変更して各処理を行う構成としたので、主会話者側の端末では、主会話者の音声および映像データが優先的に処理され高品質なデータとして出力されることになる。また、従会話者から送られてくる音声および映像データは、フレームレートを落とされたり、帯域を狭められたりして、主会話者側の端末のディスプレイおよびスピーカを介して出力される。主会話者は、発言することに労力の多くを費やすため、従会話者からの音声および映像データのクォリティが下がっていたとしても、体感上問題となることはないと考えられる。
【０１１８】
一方、主会話者の音声および映像データは、従会話者に対応するデータ処理を簡略化することで余ったリソースを使うことが可能となり、フレームレートの向上、広帯域化などにより高品質化され、従会話者に送られる。従会話者側の端末は、主会話者の音声および映像データを、可能な限りのクォリティで復号化し、従会話者に提示する。従会話者は、会話上重要な主会話者の音声および映像データを高品質なデータとして視聴できるため、主会話者の細かい表情の変化や小さな声なども、問題なく認識することができる。従会話者の音声および映像データは、クォリティを落として符号化され、主会話者に送信される。
【０１１９】
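Here, there is no problem even if the main speaker's terminal, on receiving the subordinate speaker's data that was already encoded at reduced quality, processes it at a further reduced quality; in practical use, however, it is preferable to set a minimum quality line in advance. For example, a minimum frame rate for video data processing and a minimum processing band for audio data processing are set, and even when resources are reduced, processing is executed so as not to fall below these minimum lines.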
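The minimum quality line can be enforced by clamping every reduced setting. In this sketch the floor values `MIN_FPS` and `MIN_BAND_HZ`, the halving factor, and the function name `simplify` are illustrative assumptions; the patent gives no concrete numbers.

```python
from typing import Tuple

MIN_FPS = 5.0         # hypothetical minimum video frame rate
MIN_BAND_HZ = 3400.0  # hypothetical minimum audio processing band


def simplify(fps: float, band_hz: float, factor: float = 0.5) -> Tuple[float, float]:
    """Reduce both settings by `factor`, but never below the minimum lines."""
    return max(fps * factor, MIN_FPS), max(band_hz * factor, MIN_BAND_HZ)
```

Repeated simplification commands therefore converge on the floor instead of degrading the subordinate speaker's data without limit.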
【０１２０】
なお、符号化、復号化、ともに端末独自でクォリティを落とす処理を行うことができるので、必ずしも端末間で会話の主従をお互いに認識する必要はないが、例えばリアルタイム・トランスポート制御プロトコルＲＴＣＰ（Ｒｅａｌ−ＴｉｍｅＴｒａｎｓｐｏｒｔＣｏｎｔｒｏｌＰｒｏｔｏｃｏｌ）に規定されている“Ａｐｐｌｉｃａｔｉｏｎ−ｄｅｆｉｎｅｄＲＴＣＰｐａｃｋｅｔ（ＲＦＣ１８８９）”などを用いてお互いの通信処理装置の状態情報をリアルタイムで交換することにより、それぞれの通信処理装置で判定された主従関係情報を随時交換し、双方で統一された主従関係に基づく処理を行うようにしてもよい。
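The exchange of main/sub information over an application-defined RTCP packet could look like the sketch below. The fixed fields follow the APP packet layout of RFC 1889 (version 2, packet type 204, length in 32-bit words minus one, SSRC, four-character name). The name `MSUB`, the one-word payload that encodes the result, and the function names are hypothetical choices, since the patent does not define the application-dependent data. The sketch builds only the APP packet itself and leaves out the compound-packet framing that RTCP requires on the wire.

```python
import struct

RTCP_APP = 204  # RTCP packet type for application-defined packets


def build_app_packet(ssrc: int, local_is_main: bool, name: bytes = b"MSUB") -> bytes:
    """Build an APP packet whose payload says whether the sender is the main speaker."""
    if len(name) != 4:
        raise ValueError("name must be exactly four ASCII characters")
    first_byte = 2 << 6  # version 2, no padding, subtype 0
    payload = struct.pack("!I", 1 if local_is_main else 0)
    length_words = (12 + len(payload)) // 4 - 1  # packet length in 32-bit words, minus one
    header = struct.pack("!BBHI4s", first_byte, RTCP_APP, length_words, ssrc, name)
    return header + payload


def parse_app_packet(packet: bytes) -> bool:
    """Return True when the sender reported itself as the main speaker."""
    first_byte, packet_type, _length, _ssrc, _name = struct.unpack("!BBHI4s", packet[:12])
    if first_byte >> 6 != 2 or packet_type != RTCP_APP:
        raise ValueError("not an RTCP APP packet")
    (flag,) = struct.unpack("!I", packet[12:16])
    return flag == 1
```

A terminal that receives such a packet can hand the parsed flag to its own control unit in place of, or alongside, its local determination, so that both ends act on the same main/sub relationship.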
【０１２１】
また、２端末間の双方向ビジュアルコミュニケーションの場合では、どちらか片方の端末でのみ会話の主従を判断し、上記したようなネットワークプロトコルを用いて、主従判断端末から、非判断端末に対して判断情報を通知して、主従判断情報を２端末間で共有して統一された主従関係に基づく処理を行うようにしてもよい。
【０１２２】
この場合、図４の構成において、符号化器／復号化器制御部４２１は、通信先からの主従判断情報を入力して、入力した主従判断情報に基づいて、処理高品質化命令または処理簡易化命令を符号化処理部または復号化処理部に出力する。
【０１２３】
なお、上述の実施例においては、端末が自動で会話の主従を決定する場合を説明したが、本発明はこれにとどまらず、例えば、会話の主従をスイッチにより切り替えるモードを備えてもよい。ユーザが会話をしている中で、相手の音声／映像を、より高いクォリティで視聴したい場合に、自分を従会話者と設定するスイッチを設けることで、これが可能となる。
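The switch-based mode can be combined with the automatic determination by letting the switch take precedence. The three switch positions and the function name `effective_local_is_main` in this sketch are assumptions; the patent describes only a switch that lets the user set which side is treated as the main speaker.

```python
from typing import Optional


def effective_local_is_main(auto_result: Optional[bool], switch: str = "auto") -> Optional[bool]:
    """Resolve the main/sub relationship, letting a manual switch override the automatic result."""
    if switch == "local":
        return True    # user forces the local side to be treated as main speaker
    if switch == "remote":
        return False   # user declares themselves the subordinate speaker
    if switch == "auto":
        return auto_result
    raise ValueError("switch must be 'auto', 'local' or 'remote'")
```

Setting the switch to `"remote"` corresponds to the case in the text where the user wants to view the other party at higher quality regardless of who is currently speaking.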
【０１２４】
また、上述の実施例においては、２つの端末間における通信例を示したが、３以上の端末間の通信を実行する場合でも、上述した実施例と同様、１つの通信端末において、送信するデータと、受信するデータ間で、パワーレベルを判定して、いずれかを主会話者として特定することが可能であり、本発明の構成は、２端末間での処理に限らず、３以上の多端末間における双方向ビジュアルコミュニケーションにも応用することができる。
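For three or more terminals, one possible reading is that each terminal compares the power of its own transmitted audio with that of every received stream and treats the strongest one as the main speaker. The sketch below takes that reading as an assumption; the stream identifiers and the function name `pick_main` are hypothetical.

```python
from typing import Dict, Optional


def pick_main(powers: Dict[str, float]) -> Optional[str]:
    """Return the id of the stream with the highest power, or None when all are silent."""
    active = {stream: power for stream, power in powers.items() if power > 0.0}
    if not active:
        return None
    return max(active, key=lambda stream: active[stream])
```

If the returned id is the terminal's own stream, the encoders are enhanced as in the two-terminal case; otherwise the decoders are enhanced. How the decoding resources would be divided among several received streams is not specified in the patent.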
【０１２５】
上述の実施例で述べた一連の処理は、ハードウェア、またはソフトウェア、あるいは両者の複合構成によって実行することが可能である。ソフトウェアによる処理を実行する場合は、処理シーケンスを記録したプログラムを、専用のハードウェアに組み込まれたデータ処理装置内のメモリにインストールして実行させるか、あるいは、各種処理が実行可能な汎用コンピュータにプログラムをインストールして実行させることが可能である。一連の処理をソフトウェアによって行う場合には、そのソフトウェアを構成するプログラムが、例えば汎用のコンピュータやマイクロコンピュータ等にインストールされる。
【０１２６】
図９に、上述の実施例で述べた一連の処理を実行する通信処理装置のハードウェア構成例を示す。上述したように送受信データは、符号化データであり、データ送信の場合にはエンコード（符号化）処理が実行され、受信データについてはデコード（復号）処理が実行される。符号化されたデータはパケットとしてネットワークを介して送受信する。そのため、データ送信側では、パケット生成（パケタイズ処理）を実行し、データ受信側ではパケット展開および解析（デパケタイズ処理）を実行する。
【０１２７】
図９に示す例えばＰＣ等の通信処理装置８５０において、エンコード（符号化）処理、デコード（復号）処理は、ＣＰＵ８５６、またはコーデック８５１において実行される。なお、これらのリソースの配分処理、主会話者判断処理は、ＣＰＵ８５６が、メモリ８５７に格納されたプログラムに基づいて実行する。メモリ８５７は、上述した処理を実行するプログラムを格納する領域および処理において発生する中間データ等を格納するメモリ領域を有し、符号化、復号化に適用するメモリ領域についても、前述のリソースの配分処理に基づいて適宜変更設定される。
【０１２８】
通信処理装置８５０は、さらに、通信ネットワークとのインタフェースとして機能するネットワークインタフェース８５２、マウス８３７、キーボード８３６等の入力機器、これら入力機器に対する入力インタフェース８５３、ビデオカメラ８３３、マイク８３４、スピーカ８３５等のＡＶデータ入出力機器、これらＡＶデータ入出力機器からのデータ入出力を行なうＡＶインタフェース８５４、ディスプレイ８３２に対するデータ出力インタフェースとしてのディスプレイインタフェース８５５を有する。
【０１２９】
ＣＰＵ８５６は、各データ入出力インタフェース、コーデック８５１、ネットワークインタフェース８５２間のデータ転送制御、その他各種プログラム制御を実行する。メモリ８５７は、ＣＰＵ８５６により実行される各種プログラム、各種処理データ、ＣＰＵ８５６のワークエリアとして機能するＲＡＭ、ＲＯＭからなる。ＨＤＤ８５８は、データ格納、プログラム格納用の記憶媒体として機能する。これら各構成要素は、ＰＣＩバス８５９に接続され、相互のデータ送受信が可能な構成を持つ。
【０１３０】
送信データとしての符号化データは、ＣＰＵ８５６の制御の下にパケット生成処理（パケタイズ）を実行し、最終的に符号化データをペイロードとしたパケットをＰＣＩバス８５９上に出力し、ネットワークインタフェース８５２を介してネットワークに出力して、パケットのヘッダに設定された宛先アドレスに配信される。
【０１３１】
一方、ネットワークを介して入力するパケット化されたデータは、ネットワークインタフェース８５２を介して、ＣＰＵ８５６の制御の下、パケット展開処理（デパケタイズ）を実行し、さらにコーデック８５１あるいはＣＰＵ８５６の実行するデコードプログラムに従って復号処理を実行して、ディスプレイ８３２、スピーカ８３５において再生、出力する。
【０１３２】
なお、上述の実施例においては、通信を行うユーザの映像データの処理を中心として説明したが、本発明の構成において処理対象となる画像データは、カメラ以外の入力機器、例えばスキャナ等のデータ入力装置、あるいはフロッピーディスク、ＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｃＲｅａｄＯｎｌｙＭｅｍｏｒｙ），ＭＯ（Ｍａｇｎｅｔｏｏｐｔｉｃａｌ）ディスク，ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）、磁気ディスク、半導体メモリなどのリムーバブル記録媒体から入力したデータを符号化して送信する場合にも適用可能である。
【０１３３】
以上、特定の実施例を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が該実施例の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、限定的に解釈されるべきではない。本発明の要旨を判断するためには、冒頭に記載した特許請求の範囲の欄を参酌すべきである。
【０１３４】
なお、明細書中において説明した一連の処理はハードウェア、またはソフトウェア、あるいは両者の複合構成によって実行することが可能である。ソフトウェアによる処理を実行する場合は、処理シーケンスを記録したプログラムを、専用のハードウェアに組み込まれたコンピュータ内のメモリにインストールして実行させるか、あるいは、各種処理が実行可能な汎用コンピュータにプログラムをインストールして実行させることが可能である。
【０１３５】
例えば、プログラムは記録媒体としてのハードディスクやＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）に予め記録しておくことができる。あるいは、プログラムはフレキシブルディスク、ＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｃＲｅａｄＯｎｌｙＭｅｍｏｒｙ），ＭＯ（Ｍａｇｎｅｔｏｏｐｔｉｃａｌ）ディスク，ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）、磁気ディスク、半導体メモリなどのリムーバブル記録媒体に、一時的あるいは永続的に格納（記録）しておくことができる。このようなリムーバブル記録媒体は、いわゆるパッケージソフトウエアとして提供することができる。
【０１３６】
なお、プログラムは、上述したようなリムーバブル記録媒体からコンピュータにインストールする他、ダウンロードサイトから、コンピュータに無線転送したり、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、インターネットといったネットワークを介して、コンピュータに有線で転送し、コンピュータでは、そのようにして転送されてくるプログラムを受信し、内蔵するハードディスク等の記録媒体にインストールすることができる。
【０１３７】
なお、明細書に記載された各種の処理は、記載に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。また、本明細書においてシステムとは、複数の装置の論理的集合構成であり、各構成の装置が同一筐体内にあるものには限らない。
【０１３８】
【発明の効果】
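As described above, according to the configuration of the present invention, the resource allocation of the communication processing apparatus is changed on the basis of a processing weighting derived from the main/sub determination of the conversation, and the transmission data or the reception data is processed accordingly. The audio and video data of the main speaker can therefore be transmitted and received as high-quality data and presented to the communicating user. In other words, the data that matters most to the conversation can be selectively raised in quality, realizing data communication in which the perceived quality of the conversation is improved.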
【０１３９】
さらに、本発明の構成によれば、送信データおよび受信データに含まれる音声データのパワーレベル比較を実行し、パワーレベルが大きい音声データを含む送信データまたは受信データのいずれかを選択して重点処理データとして識別する処理を実行する構成としたので、実際に話を行っているユーザを主会話者として判断し、実際に話を行っているユーザの音声データおよび映像データを選択的に高品質化することが可能となる。
【図面の簡単な説明】
【図１】符号化データの通信処理構成を説明する図である。
【図２】符号化データの通信処理を実行する通信処理装置における送信データ処理部の構成を示す図である。
【図３】符号化データの通信処理を実行する通信処理装置における受信データ処理部の構成を示す図である。
【図４】本発明の通信処理装置の構成を示す図である。
【図５】本発明の通信処理装置の会話主従判断部の処理シーケンスを説明するフローチャートである。
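【図６】FIG. 6 is a flowchart illustrating the processing sequence of the encoder/decoder control unit of the communication processing apparatus of the present invention.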
【図７】本発明の通信処理装置のオーディオ符号化器および復号化器の構成を示す図である。
【図８】本発明の通信処理装置のビデオ符号化器および復号化器の構成を示す図である。
【図９】本発明の通信処理装置のハードウェア構成例を示す図である。
【符号の説明】
１１０端末Ａ
１１１マイクＡ
１１２カメラＡ
１１３スピーカＡ
１１４ディスプレイＡ
１１５送信部Ａ
１１６受信部Ａ
１２０端末Ｂ
１２１マイクＢ
１２２カメラＢ
１２３スピーカＢ
１２４ディスプレイＢ
１２５送信部Ｂ
１２６受信部Ｂ
１３０ネットワーク
２０１マイク
２０２カメラ
２１０送信データ処理部
２１１オーディオ符号化器
２１２オーディオパケット生成部
２１３ビデオ符号化器
２１４ビデオパケット生成部
２１５ネットワーク送出部
２２０ネットワーク
３０１ネットワーク
３１０受信データ処理部
３１１ネットワーク受信部
３１２オーディオパケット解析部
３１３オーディオ復号化器
３１４ビデオパケット解析部
３１５ビデオ復号化器
３２１スピーカ
３２２ディスプレイ
４００通信処理装置
４０１ネットワーク
４１０受信データ処理部
４１１ネットワーク受信部
４１２ビデオパケット解析部
４１３ビデオ復号化器
４１４ディスプレイ
４１５オーディオパケット解析部
４１６オーディオ復号化器
４１７スピーカ
４２０会話主従判断部
４２１符号化器／復号化器制御部
４２２復号化プリプロセス部
４２３符号化プリプロセス部
４３０送信データ処理部
４３１マイク
４３２オーディオ符号化器
４３３オーディオパケット生成部
４３４カメラ
４３５ビデオ符号化器
４３６ビデオパケット生成部
４３７ネットワーク送出部
５１１オーディオ符号化部コア
５１２オーディオ符号化帯域制御部
５４１オーディオ復号化部コア
５４２オーディオ復号化帯域制御部
６１１ビデオ符号化部コア
６１２ビデオ符号化フレームレート制御部
６４１ビデオ復号化部コア
６４２ビデオ復号化フレームレート制御部
８０９ＰＣＩバス
８３２ディスプレイ
８３３ビデオカメラ
８３４マイク
８３５スピーカ
８３７マウス
８３８キーボード
８５０データ送受信装置
８５１コーデック
８５２ネットワークインタフェース
８５３入力インタフェース
８５４ＡＶインタフェース
８５５ディスプレイインタフェース
８５６ＣＰＵ
８５７メモリ
８５８ＨＤＤ
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a communication processing apparatus, a communication processing method, and a computer program. More specifically, it relates to a communication processing apparatus, a communication processing method, and a computer program that make it possible to achieve high quality in two-way real-time communication between individuals using audio and video simultaneously (referred to as "two-way visual communication"), as typified by videophone applications.
[0002]
[Prior art]
Recently, methods that let individuals communicate easily using text, audio, video, and the like, such as IP (Internet Protocol) telephony and instant messengers, are becoming established, and some have already been deployed commercially as videophone applications. As networks gain bandwidth and fall in price, these means of communication between individuals are expected to be developed further and to come into wide use.
[0003]
One way of realizing two-way visual communication using audio and video simultaneously is a videophone-like application that uses AV streaming. In this form, a terminal for communication (hereinafter referred to as a communication terminal), constituted by a communication processing apparatus such as a personal computer, a portable information terminal, or a mobile phone, is equipped with a microphone, a camera, a speaker, and a video display, and the participants' audio and video are exchanged in real time so that each participant communicates while seeing and hearing the other party.
[0004]
With reference to FIG. 1, a configuration example of two-way visual communication by a conventional communication method will be described.
[0005]
The user of the terminal A110 (user A) and the user of the terminal B120 (user B) are connected via the network 130 and are engaged in visual communication. The procedure by which the terminals become connected via the network is not essential to the invention, so its description is omitted.
[0006]
In the terminal A110, the audio and video of the user A are taken into the terminal A110 from the video and audio acquisition unit including the camera A112 and the microphone A111. The audio / video data is subjected to predetermined processing such as encoding and packetization in the transmission unit A115, and then transmitted from the terminal A110 to the network. Terminal B120 receives audio / video data from terminal A110 via network 130.
[0007]
After receiving the data, the terminal B120 performs predetermined processing in the receiving unit B126, such as extracting the data from the packets storing the audio / video data of the user A and decoding it, and then outputs the image and audio data via the display B124 and the speaker B123 to present them to the user B. The specific processing in the transmission unit and the reception unit of each terminal will be described later.
[0008]
In two-way data communication, data transmission from the terminal B120 to the terminal A110 is performed in parallel with data transmission from the terminal A110 to the terminal B120. In the terminal B120, the audio / video of the user B is acquired by using the camera B122 and the microphone B121, and the acquired data is transmitted to the network 130 after predetermined processing in the transmission unit B125, as in the case of the terminal A110. The terminal A110 transmits the audio / video data of the user A to the network 130 as described above, and simultaneously receives the audio / video data of the user B transmitted from the terminal B120 to the network 130. After that, image and audio data are output via the display A114 and the speaker A113 on the terminal A110 and presented to the user A.
[0009]
Normally, in two-way visual communication, audio / video data is compressed (encoded) by a factor of several to several tens using a codec provided in each terminal, and then transmitted to the network, in order to use network bandwidth efficiently. For example, encoding processing such as MPEG (Moving Picture Experts Group) and ATRAC (Adaptive Transform Acoustic Coding) is performed.
[0010]
Also, when outputting data to the network, the encoded data is not streamed as it is, but rather is packetized using a protocol suitable for network transmission and reception. For example, it generates a packet according to TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) and outputs the generated packet to the network. The transmitting unit and the receiving unit of each terminal perform these processes.
[0011]
FIGS. 2 and 3 show configuration examples of the transmission unit and the reception unit. As shown in FIG. 2, the transmission data processing unit 210 inputs audio data obtained from the microphone 201 to the audio encoder 211, which encodes the data according to a predetermined format such as ATRAC. The audio packet generation unit 212 stores the encoded data as a payload, generates a packet in which predetermined header information such as a source address and a destination address is set, and transmits the packet to the network 220 via the network transmission unit 215.
[0012]
The video data is acquired by the camera 202 and input to the video encoder 213, which encodes the data according to a predetermined format such as MPEG. The video packet generation unit 214 stores the encoded data as a payload, generates a packet in which predetermined header information such as a source address and a destination address is set, and transmits the packet to the network 220 via the network transmission unit 215.
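The packet generation described above can be pictured with a minimal sketch. It assumes, purely for illustration, that the encoded data is split into fixed-size payloads; the `media` and `seq` header fields, the `max_payload` parameter, and the names `Packet` and `packetize` are hypothetical and do not appear in this specification, which names only the source address and destination address as examples of header information.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    source: str       # source address (header field named in the text)
    destination: str  # destination address (header field named in the text)
    media: str        # "audio" or "video" (hypothetical header field)
    seq: int          # sequence number (hypothetical header field)
    payload: bytes    # slice of the encoded data


def packetize(encoded: bytes, source: str, destination: str,
              media: str, max_payload: int = 4) -> List[Packet]:
    """Split encoded data into packets of at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        Packet(source, destination, media, seq,
               encoded[offset:offset + max_payload])
        for seq, offset in enumerate(range(0, len(encoded), max_payload))
    ]


packets = packetize(b"encoded-audio", "terminal-A", "terminal-B", "audio")
```

Concatenating the payloads in `seq` order recovers the original encoded data.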
[0013]
The processing in the terminal receiving these packets will be described with reference to FIG. 3. The packets received via the network 301 by the network receiving unit 311 of the received data processing unit 310 are input to the audio packet analyzing unit 312 and the video packet analyzing unit 314, which extract the encoded data from each packet and execute processing such as rearrangement of the extracted encoded data. The data is then input to the audio decoder 313 and the video decoder 315, respectively, where decoding is executed according to the decoding sequence of a format such as ATRAC or MPEG. The decoded data is output via the speaker 321 and the display 322, respectively.
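As a rough illustration of the extraction and rearrangement performed by the packet analyzing units, the sketch below sorts received payloads per media type before they would be handed to a decoder. It assumes each packet carries a media label and a sequence number; both fields and the name `reassemble` are hypothetical, since the specification does not say how rearrangement is keyed.

```python
from typing import Dict, Iterable, List, Tuple


def reassemble(packets: Iterable[Tuple[str, int, bytes]]) -> Dict[str, bytes]:
    """Group (media, seq, payload) packets by media and restore payload order."""
    streams: Dict[str, List[Tuple[int, bytes]]] = {}
    for media, seq, payload in packets:
        streams.setdefault(media, []).append((seq, payload))
    return {
        media: b"".join(payload
                        for _, payload in sorted(chunks, key=lambda c: c[0]))
        for media, chunks in streams.items()
    }


# Packets may arrive out of order and interleaved across media types.
received = [("audio", 1, b"cd"), ("video", 0, b"XY"), ("audio", 0, b"ab")]
streams = reassemble(received)
```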
[0014]
As described above, a communication terminal used by a participant in two-way visual communication must simultaneously perform, in real time, the processing of the transmission unit, which encodes the audio and video of the user of that terminal and transmits them to the network, the processing of the reception unit, which decodes the audio / video data transmitted via the network from the other participants in the communication, and the processing of outputting the decoded audio / video data using a speaker and a display.
[0015]
As a conventional technology that discloses a data communication device that communicates a plurality of media data such as images and sounds, there is, for example, Patent Document 1. Patent Document 1 discloses a configuration for performing efficient data communication by selecting and switching an encoding unit when a plurality of media data are compressed and transmitted. Further, for communication of image data, Patent Document 2 discloses a processing method for extracting and transmitting only a specific area image.
[0016]
[Patent Document 1]
JP-A-11-177436
[Patent Document 2]
JP-A-2002-51315
[0017]
[Problems to be solved by the invention]
Examples of the communication processing device used as a communication terminal include a personal computer, a portable information terminal, and a mobile phone. Of these, portable information terminals and mobile phones are, in many cases, systems with lower processing capability than personal computers because of various factors such as case size, power consumption, the mounted processor, and terminal price. That is, the CPU power of the terminal and the amount of available memory are often inferior to those of a PC or the like.
[0018]
As described above, in order to perform interactive visual communication, high-speed processing capability for maintaining real-time properties is required. As a result, data quality is sacrificed on a terminal such as a portable information terminal or a mobile phone using a system with low processing capability in order to maintain real-time performance. In other words, real-time characteristics are maintained at the expense of data quality, such as communication data being video data at a low frame rate and narrow-band audio data. Therefore, there has been a problem that it is difficult to maintain communication with real-time properties and high quality.
[0019]
The present invention has been made in view of such a situation, and an object thereof is to provide a communication processing apparatus, a communication processing method, and a computer program capable of realizing two-way visual communication without deteriorating communication data quality even in a communication terminal having low processing capability.
[0020]
[Means for Solving the Problems]
According to a first aspect of the present invention, there is provided
A communication processing device that executes transmission / reception processing of encoded data, comprising:
A transmission data processing unit that performs transmission data encoding processing;
A reception data processing unit that performs reception data decoding processing;
A master / slave determination unit that identifies either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data; and
A control unit that, based on the identification information of the master / slave determination unit, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit,
Wherein the transmission data processing unit and the reception data processing unit are configured to change the encoding mode of transmission data and the decoding mode of reception data based on a control signal from the control unit.
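The relationship between the identification result and the two commands can be sketched as follows. This is only an illustration of the control rule stated in the first aspect; the names `Side`, `Command`, and `issue_commands` are hypothetical and are not taken from the specification.

```python
from enum import Enum
from typing import Dict


class Side(Enum):
    TRANSMIT = "transmit"  # transmission data processing unit
    RECEIVE = "receive"    # reception data processing unit


class Command(Enum):
    IMPROVE_QUALITY = "improve_quality"  # processing quality improvement command
    SIMPLIFY = "simplify"                # processing simplification command


def issue_commands(priority: Side) -> Dict[Side, Command]:
    """Return the command for each processing unit given the priority side."""
    return {
        side: Command.IMPROVE_QUALITY if side is priority else Command.SIMPLIFY
        for side in Side
    }


commands = issue_commands(Side.RECEIVE)
```

Exactly one unit receives the quality improvement command and the other receives the simplification command, whichever side is identified as priority.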
[0021]
Further, in one embodiment of the communication processing device of the present invention, the master / slave determination unit compares the power levels of the audio data included in the transmission data and the reception data, selects whichever of the transmission data and the reception data contains the audio data with the higher power level, and identifies it as the priority processing data.
[0022]
Further, in one embodiment of the communication processing device of the present invention, the transmission data processing unit and the reception data processing unit each have an encoding unit or a decoding unit for audio data, expand the encoding band or the decoding band based on a processing quality improvement command from the control unit, and reduce the encoding band or the decoding band based on a processing simplification command from the control unit.
[0023]
Further, in one embodiment of the communication processing device of the present invention, the transmission data processing unit and the reception data processing unit each have an encoding unit or a decoding unit for video data, increase the encoding frame rate or the decoding frame rate based on a processing quality improvement command from the control unit, and reduce the encoding frame rate or the decoding frame rate based on a processing simplification command from the control unit.
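A minimal sketch of the mode change described in the two preceding embodiments is shown below. The specific band and frame rate values, and the names `CodecMode` and `select_mode`, are assumptions chosen only for illustration; the specification states that the band and frame rate are expanded or reduced but gives no numbers here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecMode:
    audio_band_hz: int  # encoding or decoding band for audio data
    video_fps: int      # encoding or decoding frame rate for video data


# Hypothetical operating points; the actual values are not given in the text.
HIGH_QUALITY = CodecMode(audio_band_hz=16000, video_fps=30)
SIMPLIFIED = CodecMode(audio_band_hz=4000, video_fps=5)


def select_mode(command: str) -> CodecMode:
    """Map a control command to the mode a processing unit should switch to."""
    if command == "improve_quality":
        return HIGH_QUALITY
    if command == "simplify":
        return SIMPLIFIED
    raise ValueError(f"unknown command: {command!r}")


priority_mode = select_mode("improve_quality")
other_mode = select_mode("simplify")
```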
[0024]
Further, in one embodiment of the communication processing device of the present invention, the transmission data processing unit and the reception data processing unit execute processing at an increased resource application rate in the communication processing device based on a processing quality improvement command from the control unit, and execute processing at a reduced resource application rate in the communication processing device based on a processing simplification command from the control unit.
[0025]
Further, in one embodiment of the communication processing apparatus of the present invention, the resource application rate includes a processing time of a CPU and a usage rate of a memory.
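One way to picture a resource application rate that covers both CPU processing time and memory usage is a simple split of a fixed budget. The sketch below assumes a single percentage applied to both resources; the 70 percent default, the budget units, and the name `allocate` are hypothetical and are not specified in the text.

```python
from typing import Dict


def allocate(priority: str, cpu_ms: int, memory_kb: int,
             priority_percent: int = 70) -> Dict[str, Dict[str, int]]:
    """Split a CPU-time and memory budget between the two processing units."""
    if priority not in ("transmit", "receive"):
        raise ValueError("priority must be 'transmit' or 'receive'")
    if not 50 <= priority_percent <= 100:
        raise ValueError("priority_percent must be between 50 and 100")
    other = "receive" if priority == "transmit" else "transmit"
    cpu_main = cpu_ms * priority_percent // 100
    mem_main = memory_kb * priority_percent // 100
    return {
        priority: {"cpu_ms": cpu_main, "memory_kb": mem_main},
        # The other side gets whatever remains, so the budget is never exceeded.
        other: {"cpu_ms": cpu_ms - cpu_main, "memory_kb": memory_kb - mem_main},
    }


plan = allocate("receive", cpu_ms=20, memory_kb=512)
```

Integer arithmetic keeps the two shares summing exactly to the budget.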
[0026]
Further, in one embodiment of the communication processing device of the present invention, the communication processing device transmits the identification information of the master / slave determination unit to the communication destination terminal, thereby notifying that terminal of the identification information.
[0027]
Further, in one embodiment of the communication processing device of the present invention, the control unit, based on master / slave determination identification information received from the communication destination terminal, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit.
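When a terminal acts on identification information received from its communication partner, the partner's transmission data is the local terminal's reception data. The sketch below assumes the notification is expressed from the notifying terminal's point of view, which the specification does not state explicitly; the name `local_priority_from_remote` is hypothetical.

```python
def local_priority_from_remote(remote_priority: str) -> str:
    """Translate the partner's priority side into the local terminal's view."""
    mirror = {"transmit": "receive", "receive": "transmit"}
    if remote_priority not in mirror:
        raise ValueError(f"unknown side: {remote_priority!r}")
    return mirror[remote_priority]


# The partner reports that its own transmission data is the priority data,
# so locally the reception data processing unit should be prioritized.
local_priority = local_priority_from_remote("transmit")
```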
[0028]
Further, in one embodiment of the communication processing device of the present invention, the communication processing device further includes switch means that can arbitrarily set either the transmission data or the reception data as priority processing data, and the control unit, based on the setting information of the switch means, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit.
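The switch means can be thought of as a manual setting that the control unit consults. The sketch below assumes that a switch setting, when present, takes precedence over the automatic determination; that precedence rule and the name `choose_priority` are assumptions, since the specification only says the control unit acts on the switch setting.

```python
from typing import Optional

VALID_SIDES = ("transmit", "receive")


def choose_priority(automatic: str, switch_setting: Optional[str] = None) -> str:
    """Return the priority side, preferring a manual switch setting if given."""
    for side in (automatic, switch_setting):
        if side is not None and side not in VALID_SIDES:
            raise ValueError(f"unknown side: {side!r}")
    return switch_setting if switch_setting is not None else automatic


manual = choose_priority(automatic="transmit", switch_setting="receive")
fallback = choose_priority(automatic="transmit")
```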
[0029]
Further, a second aspect of the present invention provides
A communication processing method comprising:
A master / slave determination step of identifying either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data;
A control step of, based on the identification information of the master / slave determination step, outputting a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputting a processing simplification command to the other data processing unit; and
A processing change step of, in the transmission data processing unit and the reception data processing unit, changing the encoding mode of transmission data and the decoding mode of reception data based on the processing quality improvement command or the processing simplification command.
[0030]
Further, in one embodiment of the communication processing method of the present invention, the master / slave determination step compares the power levels of the audio data included in the transmission data and the reception data, selects whichever of the transmission data and the reception data contains the audio data with the higher power level, and identifies it as the priority processing data.
[0031]
Further, in one embodiment of the communication processing method of the present invention, the transmission data processing unit and the reception data processing unit each have an encoding unit or a decoding unit for audio data, expand the encoding band or the decoding band based on the processing quality improvement command, and reduce the encoding band or the decoding band based on the processing simplification command.
[0032]
Further, in one embodiment of the communication processing method of the present invention, the transmission data processing unit and the reception data processing unit each have an encoding unit or a decoding unit for video data, increase the encoding frame rate or the decoding frame rate based on the processing quality improvement command, and reduce the encoding frame rate or the decoding frame rate based on the processing simplification command.
[0033]
Further, in one embodiment of the communication processing method of the present invention, the transmission data processing unit and the reception data processing unit execute processing at an increased resource application rate in the communication processing device based on the processing quality improvement command, and execute processing at a reduced resource application rate in the communication processing device based on the processing simplification command.
[0034]
Further, in one embodiment of the communication processing method of the present invention, the resource application rate includes a processing time of a CPU and a usage rate of a memory.
[0035]
Further, in one embodiment of the communication processing method of the present invention, the communication processing method further includes a step of transmitting the identification information of the master / slave determination step to the communication destination terminal, thereby notifying that terminal of the identification information.
[0036]
Further, in one embodiment of the communication processing method of the present invention, the communication processing method further includes a step of, based on master / slave determination identification information received from the communication destination terminal, outputting a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputting a processing simplification command to the other data processing unit.
[0037]
Further, in one embodiment of the communication processing method of the present invention, the communication processing method further includes a priority processing data setting step using switch means that can arbitrarily set either the transmission data or the reception data as priority processing data, and the control step, based on the setting information of the switch means, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit.
[0038]
Further, a third aspect of the present invention provides
A computer program that executes communication processing of encoded data, the computer program comprising:
A master / slave determination step of identifying either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data;
A control step of, based on the identification information of the master / slave determination step, outputting a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputting a processing simplification command to the other data processing unit; and
A processing change step of, in the transmission data processing unit and the reception data processing unit, changing the encoding mode of transmission data and the decoding mode of reception data based on the processing quality improvement command or the processing simplification command.
[0039]
[Action]
According to the configuration of the present invention, the processing of transmission data or reception data is executed with the resource allocation of the communication processing device changed according to the importance of the processing, based on the master / slave determination of the conversation. The audio and video data of the main talker can therefore be transmitted and received as high-quality data and presented to the user who is communicating. In other words, the quality of the data in the portion important for the conversation can be selectively improved, realizing data communication in which the perceived quality of the conversation is high.
[0040]
Further, according to the configuration of the present invention, the power levels of the audio data included in the transmission data and the reception data are compared, and whichever of the transmission data and the reception data contains the audio data with the higher power level is selected and identified as the priority processing data. The user who is actually speaking is thus determined to be the main talker, and the audio data and video data of that user can be selectively improved in quality.
[0041]
The computer program of the present invention is, for example, a computer program that can be provided in a computer-readable format to a general-purpose computer system capable of executing various program codes, by a storage medium such as a CD, FD, or MO, or by a communication medium such as a network. By providing such a program in a computer-readable format, processing according to the program is realized on the computer system.
[0042]
Further objects, features, and advantages of the present invention will become apparent from a more detailed description based on embodiments of the present invention described below and the accompanying drawings. In this specification, the term “system” refers to a logical set of a plurality of devices, and is not limited to a device having each component in the same housing.
[0043]
[Best mode for carrying out the invention]
Hereinafter, the details of a communication processing device, a communication processing method, and a computer program of the present invention will be described with reference to the drawings.
[0044]
The communication processing device of the present invention executes two-way visual communication involving voice and image, and determines the master-slave relationship of the conversation while data communication is in progress. By preferentially allocating the information processing resources of the communication processing device to the main side of the conversation, smooth two-way visual communication is realized even on a terminal having low processing capability.
[0045]
The “resources for information processing” are the control unit (for example, a CPU) of the communication processing device and the available memory needed to execute two-way visual communication processing. Specifically, they are, for example, the data processing means, including the CPU and memory, needed for encoding transmission data, decoding reception data, and so on when performing two-way visual communication. The communication processing device executes processing by preferentially allocating these information processing resources to the main side of the conversation.
[0046]
That is, the communication processing device determines whether the main side of the conversation is the user of the own device or the user of the communication processing device at the communication destination. Based on that determination, it changes the application rate of the data processing means including the CPU and the amount of memory used, and allocates more resources to the data processing for the main side of the conversation.
[0047]
Hereinafter, the configuration of the communication processing device of the present invention and the procedure of the communication processing will be described in detail with reference to the drawings.
[0048]
In an environment in which a plurality of communication processing devices communicate via a wired or wireless network, when users hold a conversation between the communication processing devices, for example, there is a talking side and a listening side, and two-way communication is established as these roles change from moment to moment.
[0049]
In the present invention, among a plurality of users who are performing data communication via a network using a plurality of communication processing devices, a user who is mainly talking, that is, a user who is transmitting a large amount of voice data, is referred to as the main talker, and a user who is listening to the main talker, that is, a user who is receiving and reproducing the voice data, is referred to as the slave talker. In two-way communication, the main talker and the slave talker are usually not fixed, and the conversation proceeds as the roles are exchanged from time to time. Each participant in the conversation can therefore be either the main talker or the slave talker depending on the content of the conversation.
[0050]
In the communication processing device of the present invention, the input / output status of the voice data of the user as the communication data is monitored, the master-slave relationship is determined at any time based on the monitoring information, and the resource allocation is updated and changed as needed based on the determination result. It is intended to improve the quality of communication data by optimizing the allocation of resources.
[0051]
When two-way visual communication is performed using communication processing devices, the communication processing device, such as a mobile phone, used by each of the main talker and the slave talker acquires the voice and video data of its user with a microphone and a camera, encodes the data, and transmits it to the other party through the network.
[0052]
In a conventional communication processing device (terminal), the various processes required for transmitting, receiving, and reproducing communication data, such as audio and video data encoding, encoded data transmission, encoded data reception, and received data decoding, are executed by simple time-division processing, without distinguishing whether the person using the terminal is the main talker or the slave talker. That is, in the general data processing configuration, the processor (CPU or the like) of each communication terminal simply processes the accumulated data in order.
[0053]
However, considering the content of the data exchanged in actual two-way visual communication, the data from the main talker is the content of the conversation itself, so both the audio and the video of the main talker are often important data for the conversation being carried out via the communication processing device. On the other hand, the side listening to the main talker, that is, the slave talker, often only gives brief back-channel responses or replies to what the main talker says, or says nothing at all. In many cases, therefore, neither the audio nor the video data of the slave talker is highly important for the conversation being carried out via the communication processing device.
[0054]
Therefore, allocating the same system resources to processing the audio and video data of the slave talker as to processing the audio and video data of the main talker cannot be said to be optimal in view of the importance of the data. Providing a processor with enough processing capability and enough memory capacity that the data processing for the main talker and the slave talker can be performed equally without problems would increase the cost and the size of the device.
[0055]
Therefore, the communication processing device of the present invention monitors the input / output status of the voice data of the users performing communication, determines the master-slave relationship as needed based on the monitoring information, and updates the resource allocation based on the determination result. By applying more resources to the data processing for the main talker, that data is processed preferentially, so that the more important data, that is, the voice data and image data of the main talker, can be transmitted and received as higher quality data and provided to the user.
[0056]
The configuration and processing of the communication processing device of the present invention will be described with reference to FIG. 4. The communication processing device 400 of the present invention includes a reception data processing unit 410 that processes data received via the network 401, a display 414 and a speaker 417 that output the data processed by the reception data processing unit, a microphone 431 and a camera 434 that acquire the audio and video data of the user to be output to the network, and a transmission data processing unit 430 that generates transmission data based on the acquired data.
[0057]
In the reception data processing unit 410, the packets received by the network reception unit 411 via the network 401 are input to the video packet analysis unit 412 and the audio packet analysis unit 415, which extract the encoded data from each packet and execute processing such as rearrangement of the extracted encoded data. The data is then input to the video decoder 413 and the audio decoder 416, respectively, where decoding is executed according to the decoding sequence of a format such as MPEG or ATRAC. The decoded data is output via the display 414 and the speaker 417, respectively.
[0058]
The transmission data processing unit 430 inputs the audio data obtained from the microphone 431 to the audio encoder 432, which encodes the data according to a predetermined format such as ATRAC. The audio packet generation unit 433 stores the encoded data as a payload, generates a packet in which predetermined header information such as a source address and a destination address is set, and transmits the packet to the network 401 via the network transmission unit 437.
[0059]
The video data is acquired by the camera 434 and input to the video encoder 435, which encodes the data according to a predetermined format such as MPEG. The video packet generator 436 stores the encoded data as a payload, generates a packet in which predetermined header information such as a source address and a destination address is set, and transmits the packet to the network 401 via the network transmission unit 437.
[0060]
In addition to the reception data processing unit 410 and the transmission data processing unit 430, the communication processing apparatus according to the present invention includes a conversation master / slave determination unit 420, an encoder / decoder control unit 421, an encoding preprocessing unit 423, and a decoding preprocessing unit 422.
[0061]
The conversation master / slave determination unit 420 executes a process of identifying which of the transmission data and the reception data is to be the priority processing data. The conversation master / slave determination unit 420 receives as input the transmission voice data, which is the voice data acquired by the microphone 431, and the reception voice data, which is the voice data of the communication partner received via the network 401. Here, the transmission voice data is the voice uttered by the user of the communication processing device 400 and acquired by the microphone 431. The reception voice data is the voice data of the communication partner user: it is received by the network receiving unit 411 via the network 401, extracted from the packets as encoded data in the audio packet analyzing unit 415, and decoded in the audio decoder 416.
[0062]
The conversation master / slave determining unit 420 receives as input the reception voice data, which is the voice data of the communication partner user output from the audio decoder 416 of the reception data processing unit 410, and the transmission voice data, which is the voice data acquired by the microphone 431, and compares the power levels of these two sets of voice data.
[0063]
When no voice is input, the power level is 0; when voice is input, the conversation master / slave determination unit 420 measures the power level based on the input voice data. When the user who is mainly talking is the user of the communication processing device 400, the power level of the transmission voice data, which is the voice data acquired by the microphone 431, is greater than the power level of the reception voice data, which is the voice data of the communication partner user. That is, the power levels of the two input voice data satisfy
Transmission voice data > Reception voice data
.
[0064]
On the other hand, if the communication partner user is mainly talking, the power level of the reception voice data, which is the voice data of the communication partner user, is greater than the power level of the transmission voice data, which is the voice data acquired by the microphone 431. That is, the power levels of the two input voice data satisfy
Transmission voice data < Reception voice data
.
[0065]
The conversation master / slave determination unit 420 makes the following determination. If
Transmission voice data > Reception voice data
holds, the user on the communication processing device 400 side is determined to be the main talker, and the communication partner user communicating with the communication processing device 400 is determined to be the slave talker. If
Transmission voice data < Reception voice data
holds, the user on the communication processing device 400 side is determined to be the slave talker, and the communication partner user communicating with the communication processing device 400 is determined to be the main talker.
[0066]
The conversation master / slave determination unit 420 receives the transmission voice data and the reception voice data continuously or at predetermined sampling timings, compares the power levels of the two, and determines from the comparison result which user is the main talker and which is the follower.
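The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the determination unit 420: the source does not say how the power level is measured, so mean squared amplitude over one frame of samples is assumed here, and the function names `power_level` and `main_talker` are hypothetical. The source also does not define the result when both levels are equal; this sketch arbitrarily reports the remote side in that case.

```python
from typing import Sequence


def power_level(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed measure); 0.0 if empty."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def main_talker(tx_samples: Sequence[float], rx_samples: Sequence[float]) -> str:
    """Return "local" if the transmission voice is louder, else "remote"."""
    if power_level(tx_samples) > power_level(rx_samples):
        return "local"   # user of the local device is the main talker
    return "remote"      # partner is the main talker (ties: assumption)
```

In use, `tx_samples` would be a frame from the microphone and `rx_samples` a frame of decoded partner audio taken at the same sampling timing.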
[0067]
FIG. 5 is a flowchart illustrating a conversation master-slave determination processing sequence executed by the conversation master-slave determination unit 420.
[0068]
In step S101, the conversation master / slave determination unit 420 determines whether or not the transmission voice data acquired by the microphone 431 has been input. This is determined by the conversation master / slave determination unit 420 based on the input level of the transmission voice data.
[0069]
If it is determined that there is input of transmission voice data, the process proceeds to step S102, and it is determined whether or not voice data of a communication partner user output by the audio decoder 416 of the reception data processing unit 410 has been input. This is determined by the conversation master / slave determination unit 420 based on the input level of the received voice data.
[0070]
If there is an input of transmission audio data and an input of reception audio data, in step S103, a comparison determination process is performed between the transmission audio data power level and the reception audio data power level.
[0071]
If the relation
Transmitted voice data power > Received voice data power
holds, the process proceeds to step S104, and an identification signal indicating that the local terminal side user is the main talker is output to the encoder / decoder controller 421. In FIG. 5, the “local terminal” is the communication processing terminal whose conversation master / slave determination unit executes the determination process of this flowchart, and the “remote terminal” is the communication processing terminal that communicates with the local terminal via the network.
[0072]
If the relation
Transmitted voice data power > Received voice data power
does not hold, the process proceeds to step S112, and an identification signal indicating that the remote terminal side user is the main talker is output to the encoder / decoder controller 421.
[0073]
If the determination in step S102 is No, that is, if there is input of transmission voice data but no input of reception voice data, the process proceeds to step S104 without performing the power level comparison of step S103, and an identification signal indicating that the local terminal side user is the main talker is output to the encoder / decoder controller 421.
[0074]
If it is determined in step S101 that there is no input of the transmission voice data, the process proceeds to step S111, and it is determined whether or not the voice data of the communication partner output by the audio decoder 416 of the reception data processing unit 410 has been input. This is determined by the conversation master / slave determination unit 420 based on the input level of the received voice data.
[0075]
If the determination in step S111 is Yes, that is, if there is no input of transmission voice data and only reception voice data is input, the process proceeds to step S112, and an identification signal indicating that the remote terminal side user is the main talker is output to the encoder / decoder controller 421.
[0076]
If the determination in step S111 is No, that is, if neither transmission nor reception voice data is input, there is no need to execute resource control, and the process ends without outputting an identification signal to the encoder / decoder controller 421.
[0077]
Note that the process illustrated in FIG. 5 is executed by the conversation master / slave determination unit 420 either continuously or repeatedly at predetermined sampling timings.
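The branching of FIG. 5 can be summarized in a short sketch. It assumes that "input present" means a measured power above a silence threshold; the threshold value and the names `SILENCE_THRESHOLD` and `determine_main_talker` are hypothetical and do not appear in the source.

```python
from typing import Optional

# Assumption: any power above this value counts as "voice input present".
SILENCE_THRESHOLD = 0.0


def determine_main_talker(tx_power: float, rx_power: float) -> Optional[str]:
    """Follow steps S101 to S112; return "local", "remote", or None."""
    tx_present = tx_power > SILENCE_THRESHOLD      # step S101
    rx_present = rx_power > SILENCE_THRESHOLD      # step S102 or S111
    if tx_present:
        if rx_present:
            # Step S103: compare the two power levels.
            return "local" if tx_power > rx_power else "remote"  # S104 / S112
        return "local"                             # S102 is No -> S104
    if rx_present:
        return "remote"                            # S111 is Yes -> S112
    return None                                    # no input: no signal output
```

A return value of `None` corresponds to the case in which the process ends without an identification signal, so the current resource allocation would simply be left unchanged.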
[0078]
As described above, once the conversation master / slave determination unit 420 has determined the master / slave state of the conversation from the two input voice power levels, it outputs to the encoder / decoder control unit 421 an identification signal indicating which terminal side is the main talker.
[0079]
The encoder / decoder control unit 421 outputs control commands corresponding to the identification signal input from the conversation master / slave determination unit 420 to the decoding preprocessing unit 422 and the encoding preprocessing unit 423. Based on the identification information, the encoder / decoder control unit 421 outputs a high-quality processing instruction to whichever of the transmission data processing unit 430 and the reception data processing unit 410 processes the priority processing data, and outputs a processing simplification instruction to the other data processing unit.
[0080]
If the communication processing device (local terminal) shown in FIG. 4 is the terminal of the main talker, the voice and image of the follower at the remote terminal, transmitted via the network, are data of lesser importance for the communication. The encoder / decoder control unit 421 therefore outputs a control command to the decoding preprocessing unit 422 so that the processing of the follower's voice and video data is reduced, that is, so that fewer resources are applied to it.
[0081]
Upon receiving a resource reduction command as a control command from the encoder / decoder control unit 421, the decoding preprocessing unit 422 outputs to the video decoder 413 a processing change instruction to lower the frame rate of the decoded video data, and the video decoder 413 lowers the decoding frame rate accordingly. As a result, the processing load is reduced, and the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to other processes.
[0082]
In other words, in response to the processing change instruction that the decoding preprocessing unit 422 issues on receiving the resource reduction command, the video decoder 413 performs processing with a reduced resource application rate, that is, it lowers the decoding frame rate so that less CPU processing time and less memory are used.
[0083]
Furthermore, upon receiving a resource reduction command as a control command from the encoder / decoder control unit 421, the decoding preprocessing unit 422 outputs to the audio decoder 416 a processing change instruction, such as an instruction to narrow the decoding band of the audio data. The audio decoder 416 changes its decoding processing mode accordingly, for example by narrowing the decoding band, and thereby reduces the load of the audio decoding process. As a result, the processing load is reduced, and the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to other processes.
[0084]
In other words, in response to the processing change instruction that the decoding preprocessing unit 422 issues on receiving the resource reduction command, the audio decoder 416 performs processing with a reduced resource application rate, that is, it narrows the decoding band so that less CPU processing time and less memory are used.
[0085]
If the communication processing device (local terminal) shown in FIG. 4 is the terminal of the main talker, the voice and image captured at the communication processing device (local terminal) are the data that matter for the communication. The encoder / decoder control unit 421 therefore outputs a control command to the encoding preprocessing unit 423 so that processing is concentrated on the main talker's voice and video data, that is, so that more resources are applied to it.
[0086]
Upon receiving a resource increase instruction as a control instruction from the encoder / decoder controller 421, the encoding preprocessor 423 outputs to the video encoder 435 a processing change instruction to raise the frame rate of the encoded video data as far as possible, and the video encoder 435 raises the encoding frame rate accordingly. Although the processing load increases in this case, the load on the reception data processing unit 410 has been reduced as described above, so the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to the transmission data processing unit 430, which can thus accommodate the higher encoding frame rate. As a result, the quality of the encoded data improves, and the remote terminal at the communication destination can reproduce and output higher-quality video data of the main talker.
[0087]
In other words, in response to the processing change instruction that the encoding preprocessing unit 423 issues on receiving the resource increase instruction, the video encoder 435 performs processing with an increased resource application rate, that is, it raises the encoding frame rate as far as possible by using more CPU processing time and more memory.
[0088]
Further, upon receiving a resource increase instruction as a control instruction from the encoder / decoder control unit 421, the encoding preprocessing unit 423 outputs to the audio encoder 432 a processing change instruction to widen the encoding band of the audio data as far as possible. The audio encoder 432 changes its encoding processing mode accordingly so as to improve the quality of the encoded audio data, for example by widening the encoding band.
[0089]
In other words, in response to the processing change instruction that the encoding preprocessor 423 issues on receiving the resource increase instruction, the audio encoder 432 performs processing with an increased resource application rate, that is, it widens the encoding band by using more CPU processing time and more memory.
[0090]
Although the processing load increases in this case, the load on the reception data processing unit 410 has been reduced as described above, so the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to the transmission data processing unit 430, and processing such as widening the encoding band as far as possible becomes feasible. As a result, the quality of the encoded data improves, and the remote terminal at the communication destination can reproduce and output higher-quality voice data of the main talker.
[0091]
On the other hand, when the communication processing device (local terminal) shown in FIG. 4 is the terminal of the follower and the remote terminal is that of the main talker, the voice and image of the main talker sent from the remote terminal via the network are the data that matter for the communication. The encoder / decoder control unit 421 therefore outputs a control command to the decoding preprocessing unit 422 so that processing is concentrated on the voice and video data received from the remote terminal, that is, so that more resources are applied to it.
[0092]
Upon receiving a resource increase command as a control command from the encoder / decoder control unit 421, the decoding preprocessing unit 422 outputs to the video decoder 413 a processing change instruction to raise the frame rate of the decoded video data as far as possible, and the video decoder 413 raises the decoding frame rate accordingly. The processing load increases as a result, but, as described below, the load on the transmission data processing unit 430 is reduced, so the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to the reception data processing unit 410, which can thus accommodate the higher decoding frame rate. As a result, the quality of the decoded data improves, and the local terminal can output the video data received from the remote terminal on the display 414 as high-quality video.
[0093]
Further, upon receiving a resource increase instruction as a control instruction from the encoder / decoder control unit 421, the decoding pre-processing unit 422 outputs to the audio decoder 416 a processing change instruction to widen the decoding band of the audio data as far as possible, and the audio decoder 416 changes its decoding processing mode accordingly, thereby improving the quality of the decoded audio data. The processing load increases as a result, but, as described below, the load on the transmission data processing unit 430 is reduced, so the resources (CPU, memory, etc.) of the communication processing device can be preferentially applied to the reception data processing unit 410, and processing such as widening the decoding band of the audio data as far as possible becomes feasible. As a result, the quality of the decoded data improves, and the local terminal can output the audio data received from the remote terminal through the speaker 417 as high-quality audio.
[0094]
Further, if the communication processing device (local terminal) shown in FIG. 4 is the terminal of the follower, the voice and image captured at the communication processing device (local terminal) are data of lesser importance for the communication. The encoder / decoder controller 421 therefore outputs a control command to the encoding pre-processing unit 423 so that the processing of the follower's voice and video data transmitted from the communication processing device 400 as the local terminal is reduced, that is, so that fewer resources are applied to it.
[0095]
Upon receiving a resource reduction instruction as a control instruction from the encoder / decoder controller 421, the encoding preprocessor 423 outputs to the video encoder 435 a processing change instruction to lower the frame rate of the encoded video data, and the video encoder 435 lowers the encoding frame rate accordingly. As a result, the processing load decreases, which offsets the increase in processing load in the reception data processing unit 410 described above.
[0096]
Further, upon receiving a resource reduction instruction as a control instruction from the encoder / decoder control unit 421, the encoding pre-processing unit 423 outputs to the audio encoder 432 a processing change instruction to narrow the encoding band of the audio data. The audio encoder 432 changes its encoding processing mode accordingly and encodes the audio data at reduced quality. As a result, the processing load decreases, which offsets the increase in processing load in the reception data processing unit 410 described above.
[0097]
The processing modes changed to vary the load of the audio and video encoders and decoders are, as described above, for example the processing band for audio data and the frame rate for video data. Specifically, to reduce the processing load for audio data, the high-frequency range is cut and only the low-frequency range is encoded or decoded, which saves the resources that high-frequency processing would require. To reduce the processing load of encoding or decoding video data, the frame rate is lowered, which reduces the encoding and decoding resources required per second.
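As a rough illustration of the frame rate reduction, the sketch below picks which frames within one second of video would still be processed at a lower target rate, spreading them evenly. It is a simplification under stated assumptions: the function name `frames_to_process` is hypothetical, and a real inter-frame codec such as MPEG cannot drop arbitrary frames without regard to the dependencies between them, which this sketch ignores.

```python
from typing import List


def frames_to_process(source_fps: int, target_fps: int) -> List[int]:
    """Indices of the frames kept within one second at the reduced rate."""
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be in the range 1..source_fps")
    kept: List[int] = []
    accumulator = 0
    for index in range(source_fps):
        accumulator += target_fps
        if accumulator >= source_fps:   # time to emit one more frame
            accumulator -= source_fps
            kept.append(index)
    return kept
```

Exactly `target_fps` of the `source_fps` frames are kept, so the per-second encoding or decoding work falls roughly in proportion to the ratio of the two rates.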
[0098]
The processing procedure of the encoder / decoder controller 421 described above will now be described with reference to the corresponding flowchart. In step S201, the encoder / decoder controller 421 receives the identification signal, determined by the conversation master / slave determination unit 420 from the voice power levels, that indicates whether the user of the local terminal or of the remote terminal is the main talker.
[0099]
If it is determined in step S202, based on the input identification signal, that the local terminal side is the main talker, that is, that the signal indicates that the user of the terminal executing this processing flow is the main talker, then in step S203 a processing simplification instruction, that is, a resource reduction instruction as a control instruction, is output to the decoding units in the reception data processing unit 410, namely the video decoder 413 and the audio decoder 416. Further, in step S204, a high-quality processing instruction, that is, a resource increase instruction as a control instruction, is output to the encoding units in the transmission data processing unit 430, namely the audio encoder 432 and the video encoder 435.
[0100]
By this processing, more resources are applied to processing the voice and video of the user of the terminal executing this processing flow, that is, the main talker, and high-quality data is output via the network and reproduced at the remote terminal. If the remote terminal is also a device that determines the main talker with a conversation master / slave determination unit like the one shown in FIG. 4 and allocates resources accordingly, the remote terminal determines that its own user is the follower and applies more resources to processing the audio and video received from the local terminal. The high-quality data transmitted from the local terminal can therefore be decoded and reproduced without degradation, so that the voice and video of the main talker are reproduced with high quality.
[0101]
That is, if both communication processing apparatuses communicating via the network have the conversation master / slave determination unit shown in FIG. 4 and allocate resources based on the master / slave determination, the data of the main talker is processed preferentially in both devices, and the generation, transmission, reception, decoding, and reproduction of high-quality encoded data are all carried out. Even when only one of the devices has the conversation master / slave determination unit shown in FIG. 4 and allocates resources based on the master / slave determination, that device can give priority to processing the main talker's data, which makes processing in that device more efficient and raises the quality of the main talker's data.
[0102]
On the other hand, if it is determined in step S202, based on the input identification signal, that the local terminal side is not the main talker, that is, that the signal indicates that the user of the remote terminal communicating with the terminal executing this processing flow is the main talker, the process proceeds to step S211, where a high-quality processing instruction, that is, a resource increase instruction as a control instruction, is output to the decoding units in the reception data processing unit 410, namely the video decoder 413 and the audio decoder 416. Further, in step S212, a processing simplification instruction, that is, a resource reduction instruction as a control instruction, is output to the encoding units in the transmission data processing unit 430, namely the audio encoder 432 and the video encoder 435.
[0103]
This processing reduces the resources applied to processing the voice and video of the user of the terminal executing this processing flow, that is, the follower, and applies more resources to processing the data received from the remote terminal. The audio and video data received from the remote terminal are output with high quality via the display 414 and the speaker 417.
[0104]
If the remote terminal is also a device that determines the main talker with a conversation master / slave determination unit like the one shown in FIG. 4 and allocates resources accordingly, then in this case the remote terminal determines that its own user is the main talker and applies more resources to processing the audio and video transmitted from its own device. The data to be transmitted is therefore processed into high-quality data, and the generation, transmission, reception, decoding, and reproduction of high-quality encoded data are all carried out across the two communication processing devices.
[0105]
That is, if both communication processing apparatuses communicating via the network have the conversation master / slave determination unit shown in FIG. 4 and allocate resources based on the master / slave determination, the data of the main talker is processed preferentially in both devices, and the generation, transmission, reception, decoding, and reproduction of high-quality encoded data are all carried out. Even when only one of the devices has the conversation master / slave determination unit shown in FIG. 4 and allocates resources based on the master / slave determination, that device can give priority to processing the main talker's data, which improves both the processing efficiency of that device and the quality of the main talker's data.
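The controller procedure of steps S201 to S212 amounts to a two-way dispatch, sketched below. The `Command` enum, the dictionary keys, and the function name `control_commands` are hypothetical names chosen for illustration; only the mapping itself comes from the description above.

```python
from enum import Enum
from typing import Dict


class Command(Enum):
    INCREASE = "resource increase (high-quality processing)"
    REDUCE = "resource reduction (processing simplification)"


def control_commands(main_talker: str) -> Dict[str, Command]:
    """Map the identification signal (step S201) to commands for each side."""
    if main_talker == "local":                    # step S202: local user is main
        return {
            "decoders": Command.REDUCE,           # step S203: decoders 413, 416
            "encoders": Command.INCREASE,         # step S204: encoders 432, 435
        }
    if main_talker == "remote":                   # step S202: remote user is main
        return {
            "decoders": Command.INCREASE,         # step S211: decoders 413, 416
            "encoders": Command.REDUCE,           # step S212: encoders 432, 435
        }
    raise ValueError("main_talker must be 'local' or 'remote'")
```

The two branches are mirror images, which reflects the idea that resources freed on one side of the device are what pay for the quality increase on the other.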
[0106]
FIGS. 7 and 8 show the detailed configurations of the audio encoder, audio decoder, video encoder, and video decoder included in the communication processing device of the present invention.
[0107]
FIG. 7 shows the configurations of (a) an audio encoder and (b) an audio decoder. The audio encoder 432 shown in (a) has an audio encoding unit core 511 that executes an audio data encoding process such as ATRAC, and an audio coding band control unit 512 that controls the encoding processing mode of the audio encoding unit core 511, specifically, for example, by setting the coding band.
[0108]
The audio coding band control unit 512 receives from the encoding preprocessing unit 423 a processing change instruction to widen the coding band of the audio data as far as possible or to narrow it. Based on the input command, the audio coding band control unit 512 outputs coding band setting information to the audio encoding unit core 511, and the audio encoding unit core 511 executes the encoding process according to the set band.
[0109]
The audio decoder 416 shown in (b) has an audio decoding unit core 541 that executes a decoding process for audio data such as ATRAC, and an audio decoding band control unit 542 that controls the decoding processing mode of the audio decoding unit core 541, specifically, for example, by setting the decoding band.
[0110]
The audio decoding band control unit 542 receives from the decoding pre-processing unit 422 a processing change instruction to widen the decoding band of the audio data as far as possible or to narrow it. Based on the input command, the audio decoding band control unit 542 outputs decoding band setting information to the audio decoding unit core 541, and the audio decoding unit core 541 executes the decoding process according to the set band.
[0111]
As described above, the audio encoder 432 and the audio decoder 416 change the setting of the encoding or decoding band, appropriately change the processing load, and change the resources used.
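The relationship between a band control unit and its codec core can be pictured with a small sketch. All class names, command strings, and band values here are hypothetical; the source states only that the control unit turns a widen or narrow instruction into band setting information for the core.

```python
class AudioCodecCore:
    """Stand-in for an encoding or decoding core that honours a band setting."""

    def __init__(self, band_hz: int) -> None:
        self.band_hz = band_hz


class AudioBandControl:
    """Stand-in for a band control unit such as 512 or 542."""

    def __init__(self, core: AudioCodecCore, narrow_hz: int, wide_hz: int) -> None:
        self.core = core
        self.narrow_hz = narrow_hz   # assumed band used when resources are reduced
        self.wide_hz = wide_hz       # assumed widest band the core supports

    def on_command(self, command: str) -> None:
        """Translate a processing change instruction into a band setting."""
        if command == "widen":
            self.core.band_hz = self.wide_hz
        elif command == "narrow":
            self.core.band_hz = self.narrow_hz
        else:
            raise ValueError("command must be 'widen' or 'narrow'")
```

Keeping the setting logic outside the core mirrors the split shown in FIG. 7, where the core only executes the codec at whatever band it has been given.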
[0112]
FIG. 8 shows the configurations of (a) a video encoder and (b) a video decoder. The video encoder 435 shown in (a) has a video encoding unit core 611 that executes an encoding process for video data, such as MPEG, and a video encoding frame rate control unit 612 that controls the encoding processing mode of the video encoding unit core 611, specifically, for example, by setting the encoding frame rate.
[0113]
The video encoding frame rate control unit 612 receives from the encoding preprocessing unit 423 a processing change instruction to raise or lower the encoding frame rate of the video data. Based on the input command, it outputs encoding frame rate setting information to the video encoding unit core 611, and the video encoding unit core 611 executes the encoding process according to the set frame rate.
[0114]
The video decoder 413 shown in (b) has a video decoding unit core 641 that executes a decoding process for video data, such as MPEG, and a video decoding frame rate control unit 642 that controls the decoding processing mode of the video decoding unit core 641, specifically, for example, by setting the decoding frame rate.
[0115]
The video decoding frame rate control unit 642 receives from the decoding pre-processing unit 422 a processing change instruction to raise or lower the decoding frame rate of the video data. Based on the input command, it outputs decoding frame rate setting information to the video decoding unit core 641, and the video decoding unit core 641 executes the decoding process according to the set frame rate.
[0116]
As described above, the video encoder 435 and the video decoder 413 change the setting of the encoding or decoding frame rate, appropriately change the processing load, and change the resources used.
[0117]
As described above, in the configuration of the present invention, the allocation of resources to each process is changed according to the importance of the process, as established by the master / slave determination of the conversation, so that the voice and video data of the main talker are processed preferentially and output as high-quality data. The audio and video data sent from the follower are output through the display and speaker of the main talker's terminal after the frame rate has been lowered or the band narrowed. Since the main talker devotes most of his or her attention to speaking, a reduction in the quality of the audio and video data from the follower is not expected to cause any perceptible problem.
[0118]
On the other hand, the voice and video data of the main talker can use the resources freed by simplifying the processing of the follower's data; their quality is raised, for example by increasing the frame rate and widening the band, before they are sent to the follower. The follower's terminal decodes the main talker's audio and video data with the highest possible quality and presents them to the follower. Since the follower can view the voice and video of the main talker, which are what matter for the conversation, as high-quality data, the follower can readily perceive fine changes in the main talker's facial expression and quiet utterances. The follower's own audio and video data are encoded with reduced quality and transmitted to the main talker.
[0119]
Here, it is acceptable for the main talker side terminal, which receives encoded data whose quality has already been reduced, to process that data with further reduced quality; in actual use, however, a minimum quality line is preferably set in advance. For example, a minimum frame rate for video data processing and a minimum processing band for audio data processing are set, and even when resources decrease, processing is executed so as not to fall below these minimum lines.
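The minimum quality line can be expressed as a simple clamp. The floor values and names below are hypothetical placeholders; the source specifies only that such minimums exist, not what they are.

```python
from typing import Tuple

# Hypothetical floors; the source does not give concrete values.
MIN_FRAME_RATE_FPS = 5
MIN_AUDIO_BAND_HZ = 3400


def apply_quality_floor(requested_fps: int, requested_band_hz: int) -> Tuple[int, int]:
    """Return settings that never drop below the preset minimum lines."""
    fps = max(requested_fps, MIN_FRAME_RATE_FPS)
    band_hz = max(requested_band_hz, MIN_AUDIO_BAND_HZ)
    return fps, band_hz
```

Applying the clamp at the point where a resource reduction is turned into concrete codec settings would keep repeated reductions from compounding below the floor.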
[0120]
In addition, since each terminal can lower quality on its own in both encoding and decoding, the terminals do not necessarily have to share a common view of the master / slave relationship of the conversation. However, the state information of each communication processing apparatus may be exchanged in real time, for example by using the "Application-defined RTCP packet" (RFC 1889) specified in the real-time transport control protocol RTCP (Real-Time Transport Control Protocol), so that the master / slave relationship determined by each communication processing apparatus is exchanged as needed and processing based on a unified master / slave relationship is performed on both sides.
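One way such an exchange could look is sketched below using the application-defined (APP) packet layout of RFC 1889: a header with version 2, a 5-bit subtype, packet type 204, and a length field counted in 32-bit words minus one, followed by the SSRC, a four-character ASCII name, and application-dependent data. The name `MSTR`, the one-word payload encoding, and the function name are assumptions made for this sketch; the source does not define the payload.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_APP = 204  # packet type for application-defined RTCP packets


def build_master_slave_app_packet(ssrc: int, local_is_main: bool,
                                  subtype: int = 0,
                                  name: bytes = b"MSTR") -> bytes:
    """Pack a minimal APP packet carrying a master / slave flag (assumed payload)."""
    if len(name) != 4:
        raise ValueError("name must be exactly four ASCII bytes")
    if not 0 <= subtype <= 31:
        raise ValueError("subtype must fit in five bits")
    payload = struct.pack("!I", 1 if local_is_main else 0)  # one 32-bit word
    first_byte = (RTCP_VERSION << 6) | subtype              # padding bit is 0
    total_bytes = 12 + len(payload)                         # header + SSRC + name
    length_field = total_bytes // 4 - 1                     # 32-bit words minus one
    header = struct.pack("!BBHI4s", first_byte, RTCP_PT_APP,
                         length_field, ssrc, name)
    return header + payload
```

Because the payload is a single 32-bit word, the packet needs no padding, and a receiver that does not recognise the name can skip the packet using the length field.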
[0121]
In the case of two-way visual communication between two terminals, the master / slave determination may instead be made by only one of the terminals, which notifies the other, non-determining terminal of the result using a network protocol as described above; the two terminals then share the master / slave determination information and perform processing based on the unified master / slave relationship.
[0122]
In this case, in the configuration of FIG. 4, the encoder / decoder control unit 421 receives the master / slave determination information from the communication destination and, based on it, outputs the high-quality processing instruction or the processing simplification instruction to the encoding processing unit or the decoding processing unit.
[0123]
In the above-described embodiment, a case has been described in which the terminal determines the master / slave relationship of the conversation automatically. However, the present invention is not limited to this. For example, a mode in which the master / slave roles of the conversation are changed over with a switch may be provided. If the user wants to view the other party's audio and video with higher quality during a conversation, this can be achieved by providing a switch with which the user sets himself / herself as the follower.
[0124]
Further, the above-described embodiment describes an example of communication between two terminals. However, when three or more terminals communicate, it is likewise possible to compare the power levels of the data transmitted by one communication terminal and the data it receives and to specify one of the participants as the main talker. The configuration of the present invention is thus not limited to processing between two terminals and can also be applied to interactive visual communication among three or more terminals.
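For three or more terminals, the same comparison generalizes to picking the loudest active stream. The sketch below is one possible reading, with assumed names and an assumed rule that silent streams are ignored; the source does not describe how ties are resolved, and here the first stream encountered with the highest level is reported.

```python
from typing import Dict, Optional


def main_talker_among(power_by_terminal: Dict[str, float]) -> Optional[str]:
    """Return the terminal with the highest voice power, or None if all are silent."""
    active = {name: power for name, power in power_by_terminal.items()
              if power > 0.0}
    if not active:
        return None
    return max(active, key=lambda name: active[name])
```

The keys would include the local terminal's own transmission voice alongside each received stream, so that the local user can also be selected as the main talker.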
[0125]
The series of processes described in the above embodiment can be executed by hardware, by software, or by a combination of both. When the processing is executed by software, a program in which the processing sequence is recorded can be installed and executed in a memory of a data processing device built into dedicated hardware, or can be installed and executed on a general-purpose computer capable of executing various kinds of processing. That is, the program constituting the software is installed in, for example, a general-purpose computer or a microcomputer.
[0126]
FIG. 9 illustrates an example of a hardware configuration of a communication processing device that executes the series of processes described in the above embodiment. As described above, the transmitted and received data is encoded data: an encoding process is executed on data to be transmitted, and a decoding process is executed on received data. The encoded data is transmitted and received as packets via a network. The data transmission side therefore executes packet generation (packetizing), and the data reception side executes packet expansion and analysis (depacketizing).
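The packetizing and depacketizing steps can be illustrated with a small sketch. The header layout below (media type, sequence number, payload length) is a hypothetical format chosen for illustration only; the source does not specify a packet format, and this is not RTP or any other standard protocol.

```python
import struct

# Hypothetical header: 1-byte media type, 2-byte sequence number,
# 2-byte payload length, all in network byte order (5 bytes total).
HEADER = struct.Struct("!BHH")

MEDIA_AUDIO = 0
MEDIA_VIDEO = 1


def packetize(media_type: int, seq: int, payload: bytes) -> bytes:
    """Wrap encoded data in a packet with the encoded data as payload."""
    # The 16-bit length field limits the payload to 65535 bytes.
    return HEADER.pack(media_type, seq & 0xFFFF, len(payload)) + payload


def depacketize(packet: bytes):
    """Expand a packet and return (media_type, seq, payload)."""
    media_type, seq, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return media_type, seq, payload
```

On the receiving side, the media type recovered by `depacketize` would decide whether the payload is handed to the audio decoder or the video decoder.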
[0127]
In the communication processing device 850, such as a PC, shown in FIG. 9, the encoding and decoding processes are executed by the CPU 856 or the codec 851. The CPU 856 executes the resource allocation processing and the main talker determination processing based on a program stored in the memory 857. The memory 857 has an area for storing the program that executes the above-described processing and an area for storing intermediate data and the like generated during the processing. The memory area applied to encoding and decoding is changed and set as appropriate based on the resource allocation processing described above.
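The resource allocation mentioned here can be sketched as splitting a fixed CPU-time and memory budget between the encoder and the decoder. This is a minimal sketch under assumptions: the function name `allocate_resources`, the budget units, and the 75/25 split are all hypothetical values chosen for illustration and do not come from the source.

```python
def allocate_resources(priority_side: str,
                       cpu_budget_ms: float,
                       memory_budget_kb: int,
                       priority_share: float = 0.75) -> dict:
    """Split CPU time and memory between the encoder and the decoder.

    priority_side:  "transmit" (encoder favored) or "receive" (decoder favored).
    priority_share: fraction of each budget given to the favored side
                    (0.75 is an assumed value, not taken from the source).
    """
    if priority_side not in ("transmit", "receive"):
        raise ValueError(f"unknown side: {priority_side!r}")
    if not 0.5 <= priority_share <= 1.0:
        raise ValueError("priority_share must be between 0.5 and 1.0")

    cpu_high = cpu_budget_ms * priority_share
    mem_high = int(memory_budget_kb * priority_share)
    high = {"cpu_ms": cpu_high, "memory_kb": mem_high}
    # The non-priority side receives whatever remains of each budget.
    low = {"cpu_ms": cpu_budget_ms - cpu_high,
           "memory_kb": memory_budget_kb - mem_high}

    if priority_side == "transmit":
        return {"encoder": high, "decoder": low}
    return {"encoder": low, "decoder": high}
```

Calling the function again whenever the main talker changes would move the larger share of CPU time and memory from one processing path to the other.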
[0128]
The communication processing device 850 further includes a network interface 852 functioning as an interface with a communication network; an input interface 853 for input devices such as a mouse 837 and a keyboard 836; an AV interface 854 for data input and output with AV data input/output devices such as a video camera 833, a microphone 834, and a speaker 835; and a display interface 855 as a data output interface to the display 832.
[0129]
The CPU 856 executes data transfer control among the data input/output interfaces, the codec 851, and the network interface 852, as well as various other program controls. The memory 857 consists of a RAM and a ROM that store the programs executed by the CPU 856 and various processing data, and that function as a work area of the CPU 856. The HDD 858 functions as a storage medium for storing data and programs. Each of these components is connected to a PCI bus 859 so that they can exchange data with one another.
[0130]
Encoded data to be transmitted is subjected to packet generation processing (packetizing) under the control of the CPU 856. The resulting packet, which carries the encoded data as its payload, is output onto the PCI bus 859, sent to the network via the network interface 852, and delivered to the destination address set in the packet header.
[0131]
Packetized data received from the network, on the other hand, is passed through the network interface 852 and subjected to packet expansion processing (depacketizing) under the control of the CPU 856. It is then decoded by the codec 851 or by a decoding program executed by the CPU 856, and reproduced and output on the display 832 and the speaker 835.
[0132]
The above-described embodiment has been explained with a focus on processing video data of a user who is communicating. However, the configuration of the present invention is also applicable when the image data to be processed comes from a source other than a camera, for example when encoding and transmitting data input from a data input device such as a scanner, or from a removable recording medium such as a floppy disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory.
[0133]
The present invention has been described in detail with reference to the specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiment without departing from the spirit of the present invention. That is, the present invention has been disclosed by way of example, and should not be construed as limiting. In order to determine the gist of the present invention, the claims described at the beginning should be considered.
[0134]
Note that the series of processes described in the specification can be executed by hardware, by software, or by a combination of both. When the processing is executed by software, a program in which the processing sequence is recorded can be installed and executed in a memory of a computer built into dedicated hardware, or can be installed and executed on a general-purpose computer capable of executing various kinds of processing.
[0135]
For example, the program can be recorded in advance on a hard disk or in a ROM (Read Only Memory) serving as a recording medium. Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium can be provided as so-called package software.
[0136]
In addition to being installed on the computer from a removable recording medium as described above, the program can be transferred wirelessly from a download site to the computer, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way and install it on a recording medium such as a built-in hard disk.
[0137]
The various processes described in the specification may be executed not only in chronological order according to the description but also in parallel or individually according to the processing capability of the device that executes the processes or as necessary. Further, in this specification, a system is a logical set configuration of a plurality of devices, and is not limited to a device having each configuration in the same housing.
[0138]
[Effects of the invention]
As described above, according to the configuration of the present invention, the resource allocation of the communication processing device is changed according to processing weights based on the master/slave determination of the conversation when transmission data or reception data is processed. The audio and video data of the main talker can therefore be transmitted and received as high-quality data and presented to the communicating user. In other words, the quality of the data that matters most to the conversation can be selectively improved, realizing data communication with a higher perceived quality of conversation.
[0139]
Further, according to the configuration of the present invention, the power levels of the audio data included in the transmission data and the reception data are compared, and whichever of the transmission data and the reception data includes the audio data with the higher power level is selected and identified as priority processing data. The user who is actually speaking is thus determined to be the main talker, and the quality of that user's audio data and video data can be selectively improved.
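The power-level comparison can be sketched as follows. This is a minimal sketch under assumptions: power is taken to be the mean square of PCM samples over one frame, ties go to the received data, and the function names `mean_power` and `identify_priority_data` are hypothetical. The source does not specify how the power level is measured.

```python
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame of PCM samples (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def identify_priority_data(tx_samples: Sequence[float],
                           rx_samples: Sequence[float]) -> str:
    """Return "transmit" or "receive", whichever carries the louder audio.

    On equal power the received data is chosen; this tie rule is an
    assumption made for the sketch.
    """
    if mean_power(tx_samples) > mean_power(rx_samples):
        return "transmit"
    return "receive"
```

A practical implementation would likely smooth the power over several frames so that the main talker does not flip on short pauses, but that refinement is outside what this passage states.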
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a communication processing configuration of encoded data.
FIG. 2 is a diagram illustrating a configuration of a transmission data processing unit in a communication processing device that executes communication processing of encoded data.
FIG. 3 is a diagram illustrating a configuration of a reception data processing unit in a communication processing device that executes communication processing of encoded data.
FIG. 4 is a diagram showing a configuration of a communication processing device of the present invention.
FIG. 5 is a flowchart illustrating a processing sequence of a conversation master / slave determination unit of the communication processing device of the present invention.
FIG. 6 is a flowchart illustrating a processing sequence of the encoder/decoder control unit of the communication processing device of the present invention.
FIG. 7 is a diagram showing a configuration of an audio encoder and a decoder of the communication processing device of the present invention.
FIG. 8 is a diagram showing a configuration of a video encoder and a decoder of the communication processing device of the present invention.
FIG. 9 is a diagram illustrating an example of a hardware configuration of a communication processing device according to the present invention.
[Explanation of symbols]
110 Terminal A
111 Microphone A
112 Camera A
113 Speaker A
114 Display A
115 Transmission part A
116 Receiver A
120 Terminal B
121 Microphone B
122 Camera B
123 Speaker B
124 Display B
125 Transmission section B
126 Receiver B
130 Network
201 microphone
202 camera
210 transmission data processing unit
211 audio encoder
212 audio packet generator
213 Video Encoder
214 Video packet generator
215 Network sending unit
220 network
301 Network
310 reception data processing unit
311 Network receiver
312 Audio packet analyzer
313 audio decoder
314 Video packet analyzer
315 Video Decoder
321 speaker
322 display
400 communication processing device
401 Network
410 reception data processing unit
411 Network receiver
412 Video packet analyzer
413 Video Decoder
414 display
415 Audio Packet Analyzer
416 Audio Decoder
417 Speaker
420 Conversation master / slave judgment unit
421 Encoder / Decoder Control Unit
422 Decoding pre-processing unit
423 Encoding preprocessing unit
430 transmission data processing unit
431 microphone
432 audio encoder
433 Audio Packet Generator
434 camera
435 video encoder
436 Video packet generator
437 Network sending unit
511 Audio Encoding Unit Core
512 Audio coding band control unit
541 Audio Decoding Unit Core
542 audio decoding band control unit
611 Video Encoding Unit Core
612 Video coding frame rate control unit
641 video decoder core
642 video decoding frame rate control unit
809 PCI bus
832 display
833 video camera
834 microphone
835 speaker
837 mouse
838 keyboard
850 data transceiver
851 codec
852 network interface
853 input interface
854 AV interface
855 display interface
856 CPU
857 memory
858 HDD

Claims

A communication processing device that executes transmission and reception processing of encoded data, comprising:
a transmission data processing unit that performs encoding processing of transmission data;
a reception data processing unit that performs decoding processing of reception data;
a master/slave determination unit that identifies either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data; and
a control unit that, based on identification information from the master/slave determination unit, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit,
wherein the transmission data processing unit and the reception data processing unit are configured to change an encoding mode of the transmission data and a decoding mode of the reception data based on a control signal from the control unit.

The communication processing device according to claim 1, wherein the master/slave determination unit compares power levels of audio data included in the transmission data and the reception data, selects whichever of the transmission data and the reception data includes the audio data having the higher power level, and identifies the selected data as the priority processing data.

The communication processing device according to claim 1, wherein the transmission data processing unit and the reception data processing unit include an encoding unit or a decoding unit for audio data, and are configured to expand an encoding band or a decoding band based on a processing quality improvement command from the control unit, and to reduce the encoding band or the decoding band based on a processing simplification command from the control unit.

The communication processing device according to claim 1, wherein the transmission data processing unit and the reception data processing unit each include an encoding unit or a decoding unit for video data, and are configured to increase an encoding frame rate or a decoding frame rate based on a processing quality improvement command from the control unit, and to reduce the encoding frame rate or the decoding frame rate based on a processing simplification command from the control unit.

The communication processing device according to claim 1, wherein the transmission data processing unit and the reception data processing unit are configured to execute processing with an increased resource application rate in the communication processing device based on a processing quality improvement command from the control unit, and to execute processing with a reduced resource application rate in the communication processing device based on a processing simplification command from the control unit.

The communication processing apparatus according to claim 5, wherein the resource application rate includes a processing time of a CPU and a usage rate of a memory.

The communication processing device according to claim 1, wherein the communication processing device transmits the identification information of the master/slave determination unit to a communication destination terminal, thereby executing notification processing of the identification information.

The communication processing device according to claim 1, wherein the control unit, based on master/slave determination identification information received from a communication destination terminal, outputs a processing quality improvement command to one of the transmission data processing unit and the reception data processing unit, and outputs a processing simplification command to the other data processing unit.

The communication processing device according to claim 1, further comprising switch means capable of arbitrarily setting either the transmission data or the reception data as the priority processing data,
wherein the control unit, based on setting information of the switch means, outputs a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit.

A communication processing method for transmitting and receiving encoded data, comprising:
a master/slave determination step of identifying either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data;
a control step of outputting, based on identification information from the master/slave determination step, a processing quality improvement command to whichever of a transmission data processing unit and a reception data processing unit processes the priority processing data, and outputting a processing simplification command to the other data processing unit; and
a process change step of changing, in the transmission data processing unit and the reception data processing unit, an encoding mode of the transmission data and a decoding mode of the reception data based on the processing quality improvement command or the processing simplification command.

The communication processing method according to claim 10, wherein the master/slave determination step compares power levels of audio data included in the transmission data and the reception data, selects whichever of the transmission data and the reception data includes the audio data having the higher power level, and identifies the selected data as the priority processing data.

The communication processing method according to claim 10, wherein the transmission data processing unit and the reception data processing unit have an encoding unit or a decoding unit for audio data, expand an encoding band or a decoding band based on the processing quality improvement command, and reduce the encoding band or the decoding band based on the processing simplification command.

The communication processing method according to claim 10, wherein the transmission data processing unit and the reception data processing unit include an encoding unit or a decoding unit for video data, increase an encoding frame rate or a decoding frame rate based on the processing quality improvement command, and reduce the encoding frame rate or the decoding frame rate based on the processing simplification command.

The communication processing method according to claim 10, wherein the transmission data processing unit and the reception data processing unit execute processing with an increased resource application rate in the communication processing device based on the processing quality improvement command, and execute processing with a reduced resource application rate in the communication processing device based on the processing simplification command.

The communication processing method according to claim 14, wherein the resource application rate includes a processing time of a CPU and a usage rate of a memory.

The communication processing method according to claim 10, further comprising a step of transmitting the identification information from the master/slave determination step to a communication destination terminal, thereby executing notification processing of the identification information.

The communication processing method according to claim 10, further comprising a step of outputting, based on master/slave determination identification information received from a communication destination terminal, a processing quality improvement command to one of the transmission data processing unit and the reception data processing unit, and outputting a processing simplification command to the other data processing unit.

The communication processing method according to claim 10, further comprising a priority processing data setting step in which switch means arbitrarily sets either the transmission data or the reception data as the priority processing data,
wherein the control step outputs, based on setting information of the switch means, a processing quality improvement command to whichever of the transmission data processing unit and the reception data processing unit processes the priority processing data, and outputs a processing simplification command to the other data processing unit.

A computer program that executes communication processing of encoded data, comprising:
a master/slave determination step of identifying either the transmission data or the reception data as priority processing data based on a comparison between the transmission data and the reception data;
a control step of outputting, based on identification information from the master/slave determination step, a processing quality improvement command to whichever of a transmission data processing unit and a reception data processing unit processes the priority processing data, and outputting a processing simplification command to the other data processing unit; and
a process change step of changing, in the transmission data processing unit and the reception data processing unit, an encoding mode of the transmission data and a decoding mode of the reception data based on the processing quality improvement command or the processing simplification command.