JPWO2005122575A1

JPWO2005122575A1 - Communication device

Info

Publication number: JPWO2005122575A1
Application number: JP2006514390A
Authority: JP
Inventors: 昌之馬場; 小川　文伸; 文伸小川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-06-08
Filing date: 2004-06-08
Publication date: 2008-04-10
Also published as: WO2005122575A1

Abstract

Transmission audio data is encoded by the audio encoding unit 2 and transmitted by the transmission unit 3. Received encoded audio data is decoded by the audio decoding unit 5. The storage unit 7 stores the transmission audio data as it is before being encoded by the audio encoding unit 2, together with the encoded reception audio data received by the reception unit 4. When the transmitted and received audio is reproduced, the encoded reception audio data stored in the storage unit 7 is decoded by the audio decoding unit 5. The synthesis unit 8 mixes the reception audio data output from the audio decoding unit 5 with the transmission audio data stored in the storage unit 7 and outputs the result as reproduction data.

Description

The present invention relates to a communication apparatus that stores transmitted and received media data such as audio data.

Conventionally, in a communication apparatus such as a radio telephone, as disclosed in, for example, Japanese Patent Laid-Open No. 10-271061, transmission data before encoding and reception data after decoding are mixed, so that the transmitted and received audio is stored as synthesized audio in the form of unencoded data.
Some conventional communication apparatuses store encoded data, but they store only the received encoded data.
Because conventional communication apparatuses are configured as described above, storing the transmitted and received audio as encoded data and then reproducing it enlarges the apparatus configuration. For example, two encoders are needed, one for the transmission data and one for storage.
The present invention has been made to solve the above problem, and its object is to obtain a communication apparatus that can be realized with a minimum apparatus configuration even when transmitted and received data is reproduced.

The communication apparatus according to the present invention stores, in a storage unit, transmission media data before encoding and received encoded reception media data, synthesizes the data obtained by decoding the stored encoded reception media data with the transmission media data stored in the storage unit, and outputs the result as reproduction data.
As a result, even when transmitted and received data is reproduced, the apparatus can be realized with a minimum number of components such as encoding units, decoding units, and data storage units.

FIG. 1 is a block diagram showing a communication apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing a communication apparatus according to Embodiment 2 of the present invention.
FIG. 3 is a block diagram showing a communication apparatus according to Embodiment 3 of the present invention.
FIG. 4 is a block diagram showing a communication apparatus according to Embodiment 4 of the present invention.
FIG. 5 is an explanatory diagram of the operation of the communication apparatus according to Embodiment 4 when the next data is output at the first timing of the silent data.

Hereinafter, in order to describe the present invention in more detail, the best mode for carrying out the invention will be described with reference to the accompanying drawings.
Embodiment 1.
FIG. 1 is a block diagram showing a communication apparatus according to Embodiment 1 of the present invention.
The communication device 101a is connected to the communication device 101b via the network 200. The communication devices 101a and 101b are terminals that transmit and receive images and sounds, such as so-called videophones. Also, in the figure, the configuration of the communication device 101b is the same as that of the communication device 101a, and therefore only the internal configuration of the communication device 101a is shown. Note that the communication device 101b is a communication partner device of the communication device 101a.
The communication apparatus 101a includes a video encoding unit 1, an audio encoding unit 2, a transmission unit 3, a reception unit 4, an audio decoding unit 5, a video decoding unit 6, a storage unit 7, and a synthesis unit 8.
The video encoding unit 1 is a functional unit that encodes a video signal from a video input device such as the camera 111. The audio encoding unit 2 is a functional unit that encodes an audio signal from an audio input device such as the microphone 112. Also, the video encoding unit 1 and the audio encoding unit 2 realize an encoding unit. The transmission unit 3 is a functional unit for outputting the encoded data from the video encoding unit 1 and the audio encoding unit 2 to the network 200 in accordance with a communication protocol.
The reception unit 4 is a functional unit that receives data addressed to the communication apparatus 101a via the network 200, and has a function of separating the received data into video data and audio data. The audio decoding unit 5 is a functional unit that decodes the encoded reception audio data output from the reception unit 4 and outputs the decoded audio to an audio output device such as the speaker 113. The video decoding unit 6 is a functional unit that decodes the encoded video data from the reception unit 4 and outputs the decoded video to a video output device such as the monitor 114. The audio decoding unit 5 and the video decoding unit 6 are also configured to decode the encoded reception audio data and the encoded reception video data stored in the storage unit 7. Together, the audio decoding unit 5 and the video decoding unit 6 constitute the decoding unit.
The storage unit 7 is a functional unit that stores the encoded reception data from the reception unit 4 and the transmission audio data from the microphone 112 before encoding, and is configured to output the stored data as necessary. The synthesis unit 8 has a function of synthesizing the decoded audio data from the audio decoding unit 5 with the transmission audio data from the storage unit 7 and outputting the result to an audio output device such as the speaker 113.
Each configuration of the video encoding unit 1 to the synthesizing unit 8 is realized by dedicated hardware or software corresponding to each function and hardware such as a CPU or a memory that executes the software.
Next, the operation of the communication apparatus configured as described above will be described.
First, an operation during communication of the communication devices 101a and 101b will be described.
Now, it is assumed that the communication device 101a and the communication device 101b perform communication using media data such as voice and video via the network 200.
A video signal from the camera 111 is encoded by the video encoding unit 1 and sent to the transmission unit 3. The audio signal from the microphone 112 is encoded by the audio encoding unit 2 and sent to the transmission unit 3. At the same time, the audio signal is sent to the storage unit 7, where it is stored as an unencoded audio signal.
The transmission unit 3 transmits encoded video and audio data according to a communication protocol.
Meanwhile, the encoded video and audio data transmitted from the communication apparatus 101b via the network 200 is received by the reception unit 4 in accordance with the communication protocol and separated into video data and audio data, which are passed to the video decoding unit 6 and the audio decoding unit 5, respectively. At this time, the encoded video and audio data are simultaneously passed to the storage unit 7 and stored there as encoded data.
The video decoding unit 6 decodes the encoded video data from the reception unit 4 and outputs it to the monitor 114 as a video signal. Similarly, the audio decoding unit 5 decodes the encoded audio data from the reception unit 4, and the decoded data passes through the synthesis unit 8 and is output to the speaker 113 as an audio signal.
The above is the operation during communication. Next, the operation when data accumulated during communication is reproduced during non-communication will be described.
Of the encoded data that the storage unit 7 stored from the output of the reception unit 4, the encoded audio data is output to the audio decoding unit 5 and the encoded video data is output to the video decoding unit 6. The video decoding unit 6 decodes the encoded video data and outputs it as a video signal to the monitor 114, where the received video is reproduced. Meanwhile, the audio decoding unit 5 decodes the encoded audio data and outputs it as an audio signal to the synthesis unit 8.
Because the storage unit 7 stored the transmission data together with the reception data during communication, it simultaneously outputs the audio signal that constitutes the transmission data to the synthesis unit 8. The synthesis unit 8 mixes the audio signal from the audio decoding unit 5 with the audio signal from the storage unit 7 and outputs the result to the speaker 113 as a synthesized audio signal, and the speaker 113 reproduces the transmitted and received audio.
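The store-then-mix flow of Embodiment 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the 8-bit quantizing `encode`/`decode` pair is a stand-in codec chosen for brevity, and the names `Storage`, `on_call_frame`, and `play_back` are hypothetical, not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


def encode(pcm: List[int]) -> bytes:
    """Stand-in lossy codec (assumption): keep the top 8 bits of each 16-bit sample."""
    return bytes(((s >> 8) + 128) & 0xFF for s in pcm)


def decode(data: bytes) -> List[int]:
    """Inverse of encode(); the low 8 bits of each sample are lost."""
    return [(b - 128) << 8 for b in data]


def mix(a: List[int], b: List[int]) -> List[int]:
    """Sample-wise sum clipped to the 16-bit range; the shorter input is zero-padded."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


@dataclass
class Storage:
    tx_pcm: List[int] = field(default_factory=list)           # unencoded transmission audio
    rx_encoded: bytearray = field(default_factory=bytearray)  # encoded reception audio


def on_call_frame(store: Storage, mic_pcm: List[int], rx_frame: bytes) -> bytes:
    """During the call: store both directions and return the frame to transmit."""
    store.tx_pcm.extend(mic_pcm)       # stored without encoding
    store.rx_encoded.extend(rx_frame)  # stored still encoded
    return encode(mic_pcm)


def play_back(store: Storage) -> List[int]:
    """After the call: decode the stored reception audio, then mix in the stored transmission audio."""
    return mix(decode(bytes(store.rx_encoded)), store.tx_pcm)
```

Note that only one `decode` path is exercised at playback time, which mirrors the point that no additional decoder is required for the stored data.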
As described above, the communication apparatus of Embodiment 1 includes an encoding unit that encodes transmission media data into encoded transmission media data, a decoding unit that decodes received encoded reception media data, a storage unit that stores the encoded reception media data and the transmission media data, and a synthesis unit that synthesizes the data obtained when the decoding unit decodes the stored encoded reception media data with the stored transmission media data and outputs the result as reproduction data. A communication apparatus that can reproduce transmitted and received data with a minimum apparatus configuration is therefore obtained. Specifically, owing to the effect of encoding, the stored data is smaller than when the transmitted and received audio is stored as unencoded data. In addition, because a single decoding unit suffices to reproduce the transmitted and received data, no additional decoding unit is needed, and such reproduction is realized with a minimum apparatus configuration.
Also in Embodiment 1, the storage unit outputs either one or both of the stored encoded reception media data and transmission media data, so only one of them can be reproduced if desired. Compared with the conventional approach of storing the audio after mixing, only the desired data can be reproduced, which improves convenience. The user can thus choose among reproduction methods such as transmission data only, reception data only, or both simultaneously.
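Because the two directions are stored separately, the choice of what to reproduce can be deferred to playback time. The following Python sketch illustrates that choice; `PlaybackMode` and `render` are hypothetical names, and the reception audio is assumed to have been decoded already.

```python
from enum import Enum
from typing import List


class PlaybackMode(Enum):
    TX_ONLY = "tx"   # transmission audio only
    RX_ONLY = "rx"   # reception audio only
    BOTH = "both"    # both directions mixed


def render(tx_pcm: List[int], rx_pcm: List[int], mode: PlaybackMode) -> List[int]:
    """Return the samples to send to the speaker for the chosen playback mode."""
    if mode is PlaybackMode.TX_ONLY:
        return list(tx_pcm)
    if mode is PlaybackMode.RX_ONLY:
        return list(rx_pcm)
    # BOTH: zero-pad the shorter stream, then add sample-wise with 16-bit clipping.
    n = max(len(tx_pcm), len(rx_pcm))
    tx = list(tx_pcm) + [0] * (n - len(tx_pcm))
    rx = list(rx_pcm) + [0] * (n - len(rx_pcm))
    return [max(-32768, min(32767, t + r)) for t, r in zip(tx, rx)]
```

Had the audio been mixed before storage, as in the conventional approach, the two single-direction modes would not be available.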
Embodiment 2.
In Embodiment 1, the separately stored transmission and reception data are reproduced simultaneously. In Embodiment 2, a single piece of synthesized data is reproduced instead.
FIG. 2 is a configuration diagram of the communication apparatus according to the second embodiment.
The communication device 102a is connected to the communication device 102b via the network 200, and the connection relationship between them is the same as in the first embodiment. Also in the second embodiment, since the communication device 102a and the communication device 102b have the same configuration, only the communication device 102a is shown as the internal configuration.
The communication device 102a includes a video encoding unit 1, an audio encoding unit 2, a transmission unit 3, a reception unit 4, an audio decoding unit 5, a video decoding unit 6, a storage unit 7, and an editing unit 9. Since the configuration other than the editing unit 9 is the same as in Embodiment 1, the corresponding parts are denoted by the same reference numerals and their description is omitted.
The editing unit 9 has a function of synthesizing the reception audio data, obtained by decoding the encoded reception audio data stored in the storage unit 7, with the transmission audio data, encoding the synthesized data, and storing it in the storage unit 7 again.
Each configuration of the video encoding unit 1 to the editing unit 9 is realized by dedicated hardware or software corresponding to each function and hardware such as a CPU or a memory that executes the software.
Next, the operation of the second embodiment will be described.
The operation during communication is the same as in Embodiment 1: while video and audio data are transmitted and received, the storage unit 7 stores the transmission audio data before encoding and the encoded video and audio data before decoding.
After the end of communication, the storage unit 7 transfers the transmission voice data before encoding and the encoded reception voice data before decoding stored during communication to the editing unit 9.
The editing unit 9 decodes the encoded reception audio data. This decoding may be performed using the audio decoding unit 5. Next, the editing unit 9 synthesizes the decoded reception audio data with the transmission audio data before encoding stored in the storage unit 7 to generate transmission/reception synthesized audio data. The editing unit 9 then encodes the generated synthesized audio data to obtain encoded transmission/reception synthesized audio data. This encoding may be performed by the audio encoding unit 2 under instruction from the editing unit 9.
The editing unit 9 transfers the encoded transmission/reception synthesized audio data thus obtained to the storage unit 7. The storage unit 7 stores the encoded synthesized audio data sent from the editing unit 9 together with the received encoded video data. During reproduction, the encoded audio and video data are sent from the storage unit 7 to the audio decoding unit 5 and the video decoding unit 6, respectively, and these decoding units decode the received video and the transmitted and received audio.
When storing the encoded transmission/reception synthesized audio data sent from the editing unit 9 together with the received encoded video data, the storage unit 7 can delete the reception audio data and transmission audio data from before synthesis, leaving only the desired stored data. Alternatively, the received encoded audio and video data can be kept as they are, and the received video data copied and stored together with the encoded synthesized audio data from the editing unit 9. This yields two sets of data, reception-only audio and video data and synthesized-audio and video data, which can be reproduced according to the application.
As described above, the communication apparatus of Embodiment 2 includes an encoding unit that encodes transmission media data into encoded transmission media data, a decoding unit that decodes received encoded reception media data, an editing unit that synthesizes the reception media data obtained by decoding the encoded reception media data with the transmission media data and encodes the synthesized data to generate encoded synthesized data, and a storage unit that stores the encoded synthesized data output from the editing unit. The stored encoded synthesized data is decoded and output as reproduction data, so transmitted and received data can be reproduced with a minimum apparatus configuration.
Also in Embodiment 2, the temporarily stored data is edited after communication to generate the transmission/reception synthesized data, so no synthesis load is imposed during communication. The transmitted and received data can therefore be reproduced with only the number of encoding and decoding units required for communication.
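The post-call editing step of Embodiment 2 (decode, mix, re-encode, store again, and optionally delete the originals) might look like the following Python sketch. The toy codec, the dictionary-based store, and its key names are assumptions made for illustration only.

```python
from typing import Dict, List


def encode(pcm: List[int]) -> bytes:
    """Stand-in lossy codec (assumption): keep the top 8 bits of each 16-bit sample."""
    return bytes(((s >> 8) + 128) & 0xFF for s in pcm)


def decode(data: bytes) -> List[int]:
    """Inverse of encode(); the low 8 bits of each sample are lost."""
    return [(b - 128) << 8 for b in data]


def mix(a: List[int], b: List[int]) -> List[int]:
    """Sample-wise sum clipped to the 16-bit range; the shorter input is zero-padded."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


def edit_after_call(store: Dict[str, object], keep_originals: bool = False) -> None:
    """Run once communication has ended, so no mixing load falls on the call itself."""
    rx_pcm = decode(store["rx_audio_encoded"])   # decode the stored reception audio
    mixed = mix(rx_pcm, store["tx_audio_pcm"])   # mix with the unencoded transmission audio
    store["mixed_audio_encoded"] = encode(mixed) # re-encode and store again
    if not keep_originals:
        # Leave only the synthesized audio (the received video is untouched).
        del store["rx_audio_encoded"]
        del store["tx_audio_pcm"]
```

Passing `keep_originals=True` corresponds to the variant in which the reception-only data is retained alongside the synthesized data.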
Embodiment 3.
In the third embodiment, the encoded transmission voice data output from the encoding unit is accumulated.
FIG. 3 is a configuration diagram of the communication apparatus according to the third embodiment.
The communication device 103a is connected to the communication device 103b via the network 200, and the connection relationship between them is the same as in the first and second embodiments. Also in the third embodiment, since the communication device 103a and the communication device 103b have the same configuration, only the communication device 103a is shown as the internal configuration.
The communication device 103a includes a video encoding unit 1, an audio encoding unit 2, a transmission unit 3, a reception unit 4, an audio decoding unit 5, a video decoding unit 6, a storage unit 7, and an editing unit 10. The configuration is the same as in Embodiment 1 except for the editing unit 10 and the fact that the data input to the storage unit 7 is the output data of the video encoding unit 1 and the audio encoding unit 2, so description of the other components is omitted.
The storage unit 7 is configured to receive the encoded transmission video data output by the video encoding unit 1 and the encoded transmission audio data output by the audio encoding unit 2, and it stores these encoded data. The editing unit 10 has a function of taking out the encoded transmission audio data and the encoded reception audio data stored in the storage unit 7, mixing the audio data obtained by decoding them to generate synthesized audio data, and outputting the synthesized audio data to the storage unit 7.
Each configuration of the video encoding unit 1 to the editing unit 10 is realized by dedicated hardware or software corresponding to each function and hardware such as a CPU or a memory that executes the software.
Next, the operation of the third embodiment will be described.
During communication, while video and audio data are transmitted and received, the storage unit 7 stores the encoded transmission audio data output by the audio encoding unit 2 and the encoded video and audio data, before decoding, output by the reception unit 4.
After the communication is completed, the storage unit 7 transfers the encoded transmission voice data stored during the communication and the encoded reception voice data before decoding to the editing unit 10.
The editing unit 10 decodes these encoded audio data to obtain decoded reception audio data. This decoding may be performed using the audio decoding unit 5. Next, the editing unit 10 synthesizes the decoded reception audio data with the transmission audio data to generate transmission/reception synthesized audio data. The editing unit 10 then encodes the generated synthesized audio data to obtain encoded transmission/reception synthesized audio data. This encoding may be performed using the audio encoding unit 2.
The editing unit 10 transfers the encoded transmission / reception synthesized speech data thus obtained to the storage unit 7. The storage unit 7 stores the encoded transmission / reception synthesized speech data sent from the editing unit 10.
Also, the encoded transmission video data stored in the storage unit 7 and the received encoded reception video data are also processed by the editing unit 10 in the same manner as audio, and for example, a transmission image and a reception image are displayed simultaneously. Such composite video data is generated. Then, encoded transmission / reception composite video data obtained by encoding the composite video data is generated and stored in the storage unit 7. However, this function can be omitted when the synthesized video is not particularly required.
Because this series of processes in the editing unit 10 need not run in real time, the decoding can be performed sequentially, for example by decoding the encoded reception audio data after the encoded transmission audio data has been decoded. This keeps the number of functional units the apparatus requires to a minimum.
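The point about sequential decoding can be illustrated with a small Python sketch in which one decoder instance handles both stored streams, one after the other. The class `AudioDecoder`, its job counter, and the toy codec are hypothetical and serve only to make the reuse visible.

```python
from typing import List, Tuple


class AudioDecoder:
    """Stand-in for the single audio decoding unit (toy codec, an assumption)."""

    def __init__(self) -> None:
        self.jobs = 0  # number of streams this one decoder has processed

    def decode(self, data: bytes) -> List[int]:
        self.jobs += 1
        return [(b - 128) << 8 for b in data]


def decode_both_sequentially(
    decoder: AudioDecoder, tx_encoded: bytes, rx_encoded: bytes
) -> Tuple[List[int], List[int]]:
    """Offline editing: decode the transmission stream first, then the reception stream."""
    tx_pcm = decoder.decode(tx_encoded)  # first job
    rx_pcm = decoder.decode(rx_encoded)  # second job, same decoder instance
    return tx_pcm, rx_pcm
```

Since nothing here has to keep pace with a live call, a second decoder running in parallel would add hardware without adding capability.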
In this embodiment, only audio and video data are handled. However, if it is necessary to perform processing such as composition on other various media data, the editing unit 10 can perform the processing.
If the encoded synthesized data from the editing unit 10 stored in the storage unit 7 is only audio data, the encoded received video data can be copied and stored together. In this way, the encoded combined data having the same format as the encoded data for transmission / reception is stored in the storage unit 7.
In addition, when reproducing the stored data, the storage unit 7 outputs encoded data of transmission, reception, or synthesis to the audio decoding unit 5 and the video decoding unit 6, and outputs an audio signal and a video signal, respectively. .
As described above, the third embodiment includes an encoding unit that encodes transmission media data to obtain encoded transmission media data; a decoding unit that decodes received encoded reception media data; an editing unit that synthesizes the reception media data and the transmission media data obtained by decoding the encoded reception media data and the encoded transmission media data, and generates encoded synthesized data by encoding the synthesized data; and a storage unit that stores the encoded synthesized data output from the editing unit. Because the stored encoded synthesized data is decoded by the decoding unit and output as reproduction data, transmission/reception data can be reproduced with a minimum device configuration.
In the third embodiment, the encoded transmission data and the encoded reception data are stored during communication and are edited after communication to generate the transmission/reception synthesized data. The synthesis processing therefore places no load on the device during communication, and encoded data for transmission, reception, and synthesis can all be generated.
Embodiment 4.
In the fourth embodiment, transmission audio data and reception audio data are accumulated in a time division manner.
FIG. 4 is a configuration diagram of a communication apparatus according to the fourth embodiment.
The communication device 104a is connected to the communication device 104b via the network 200, and the connection relationship between them is the same as in the first to third embodiments. In the fourth embodiment as well, the communication device 104a and the communication device 104b have the same configuration, so only the internal configuration of the communication device 104a is shown.
The communication device 104a includes a video encoding unit 1, an audio encoding unit 2, a transmission unit 3, a reception unit 4, an audio decoding unit 5, a video decoding unit 6, a storage unit 7, and a selection unit 11. The configuration other than the selection unit 11 is the same as in the first embodiment, so the corresponding parts are denoted by the same reference numerals and their description is omitted. The selection unit 11 is a functional unit that selects between the encoded transmission audio data output from the audio encoding unit 2 and the encoded reception audio data output from the reception unit 4 according to their voiced/silent state, and outputs the selected data to the storage unit 7.
Each of the units from the video encoding unit 1 to the selection unit 11 is realized by dedicated hardware, or by software corresponding to each function together with hardware such as a CPU and memory that executes the software.
Next, the operation of the fourth embodiment will be described.
During communication, the selection unit 11 takes the output of the audio encoding unit 2 (encoded transmission audio data) and the audio output of the reception unit 4 (encoded reception audio data) as input and performs voiced/silent detection on both. It then replaces silent portions of the reception audio data with voiced portions of the transmission audio data to generate a single stream of audio data, which it outputs to the storage unit 7.
In general, a call contains a considerable amount of silence, so the encoded data of only the voiced portions of transmission and reception could be collected and joined, making the data amount smaller than that of the encoded audio data for one direction. Here, however, to keep the data amount equal to that normally stored for one direction, the selection unit 11 basically selects the encoded reception data. Only when the encoded reception data is silent and the encoded transmission data is voiced does it transfer the encoded transmission data to the storage unit 7 in place of the silent encoded reception data.
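The basic selection rule described above can be sketched in Python as follows. The sketch assumes that each encoded frame already carries a voiced/silent flag; how the selection unit 11 derives that flag from encoded data is not specified here. The Frame type and the select_frame function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    """Hypothetical encoded audio frame for one time slot."""
    source: str     # "tx" (transmission) or "rx" (reception)
    voiced: bool    # assumed result of voiced/silent detection
    payload: bytes  # encoded audio data, passed through unchanged

def select_frame(tx: Frame, rx: Frame) -> Frame:
    """Sketch of selection unit 11 for a single time slot.

    Reception is the default. Transmission replaces it only when the
    reception frame is silent and the transmission frame is voiced.
    """
    if not rx.voiced and tx.voiced:
        return tx
    return rx
```

Exactly one frame is stored per time slot, so the stored amount equals that of one direction. A transmission frame that is voiced at the same time as the reception frame is dropped under this basic rule.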
The storage unit 7 stores the encoded audio data selected by the selection unit 11 and the encoded video data from the reception unit 4. If necessary, a function for directly storing the encoded audio data from the reception unit 4 may also be provided. The other operations are the same as in the third embodiment, and their description is omitted.
When reproducing the stored data, the storage unit 7 outputs the encoded audio data to the audio decoding unit 5 and the encoded video data to the video decoding unit 6, so that the transmitted and received audio and the received video are reproduced.
As described above, the fourth embodiment includes an encoding unit that encodes transmission media data to obtain encoded transmission media data; a decoding unit that decodes received encoded reception media data; a selection unit that selects either the encoded reception media data or the encoded transmission media data depending on whether the media data is valid data; and a storage unit that stores the output data of the selection unit. Because the stored data is decoded by the decoding unit and output as reproduction data, transmission/reception data can be reproduced with a minimum device configuration, and only the valid portions of both audio streams are stored within the data amount of one direction.
Alternatively, when the encoded transmission audio data and the encoded reception audio data are both voiced, the selection unit 11 may, rather than simply selecting one of them, temporarily hold the encoded audio data that was not selected and transfer this delayed data to the storage unit 7 once the selected encoded audio data becomes silent.
With this configuration, encoded audio data sent to the storage unit 7 later than its original timing is stored at the time it is sent. During reproduction, such audio is therefore output slightly later than it actually occurred, but all voiced portions of the transmitted and received audio can be output without loss.
As described above, in the fourth embodiment, when the encoded transmission audio data and the encoded reception audio data are voiced at the same time, one of them is delayed and then stored, so that all voiced portions can be stored.
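The delayed storage described above can be sketched with a first-in, first-out queue. The sketch makes several assumptions that the text leaves open: reception always has priority when both sides are voiced, each frame carries a voiced/silent flag, and any frames still held when the input ends are appended to the stored stream. Frame and select_with_deferral are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    """Hypothetical encoded audio frame for one time slot."""
    source: str     # "tx" (transmission) or "rx" (reception)
    voiced: bool    # assumed result of voiced/silent detection
    payload: bytes  # encoded audio data, passed through unchanged

def select_with_deferral(tx_frames, rx_frames):
    """Store voiced reception frames first and defer voiced transmission
    frames until the reception side is silent."""
    pending = deque()  # voiced transmission frames waiting for a free slot
    stored = []
    for tx, rx in zip(tx_frames, rx_frames):
        if tx.voiced:
            pending.append(tx)
        if rx.voiced:
            stored.append(rx)                 # reception has priority
        elif pending:
            stored.append(pending.popleft())  # fill the silent slot
        else:
            stored.append(rx)                 # both silent: keep the slot
    stored.extend(pending)  # assumption: flush what is left at the end
    return stored
```

Deferred frames keep their original order, so on reproduction the transmitted audio is heard later than it occurred but without gaps in its voiced portions.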
Furthermore, when switching between the encoded transmission audio data and the encoded reception audio data, the selection unit 11 may switch only when the encoded data being switched away from has been silent for a certain period or longer. This prevents, for example, switching in the middle of an utterance contained in the audio data.
As described above, in the fourth embodiment, switching between the encoded transmission audio data and the encoded reception audio data is performed only when the audio data before switching has been silent for a certain period or longer, which prevents an utterance in the audio data from being interrupted.
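The switching condition described above can be sketched as a small state machine that counts consecutive silent frames on the currently selected side. The sketch covers only the switching decision and leaves out the buffering of unselected data. The function name choose_sources, the frame-based parameter min_silence, and the choice of reception as the initial source are assumptions made for illustration.

```python
def choose_sources(tx_voiced, rx_voiced, min_silence):
    """Return the selected source ("tx" or "rx") for each time slot.

    tx_voiced, rx_voiced: per-slot voiced flags (True means voiced).
    min_silence: number of consecutive silent slots on the current
                 source required before a switch is allowed.
    """
    current = "rx"   # assumption: reception is selected initially
    silent_run = 0   # consecutive silent slots on the current source
    chosen = []
    for tx_v, rx_v in zip(tx_voiced, rx_voiced):
        cur_v, other_v = (rx_v, tx_v) if current == "rx" else (tx_v, rx_v)
        silent_run = 0 if cur_v else silent_run + 1
        # Switch only after a long enough silence, and only if the
        # other side actually has voiced data to store.
        if silent_run >= min_silence and other_v:
            current = "tx" if current == "rx" else "rx"
            silent_run = 0
        chosen.append(current)
    return chosen
```

With min_silence set to 2, a single silent slot in the middle of received speech does not trigger a switch, which matches the stated aim of not interrupting an utterance.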
Incidentally, when switching requires silence of a certain period or longer, that silence is always present at the switching point. If both the transmission data and the reception data are voiced, one of them is delayed, and it is delayed further by the length of the silence needed for the switching determination. That is, the relative delay between the transmitted and received audio grows by the amount of the silent data.
Therefore, once silence of a certain period or longer has been detected, switching at the start of that silence reduces the delay of the data after switching. To achieve this, the selection unit 11 delays the data it outputs to the storage unit 7 by at least that period. When silence of a certain period or longer is detected, that is, when the data to be output next has already waited for at least that period at the switching timing, that data is output immediately, so the silent data used for switching detection is not output to the storage unit 7.
FIG. 5 is a diagram for explaining the operation when the next data is output at the first timing of the silent data.
The selection unit 11 basically selects the reception data from the reception unit 4, delays it by a delay time T₂ that is at least as long as the fixed period T₁, and outputs it to the storage unit 7. Suppose the transmission data from the audio encoding unit 2 becomes voiced at time t₁. Because the reception data is also voiced at time t₁, the selection unit 11 temporarily holds the transmission data. Since the reception data is output after being delayed by T₂, its end (time t₂) is output at time t₃ (= t₂ + T₂). The delayed reception data (the output data of the selection unit 11) is silent from time t₃ onward, so the transmission data is output immediately at time t₃. Without this processing, a silent interval of the fixed period T₁ would lie between the reception data and the transmission data. By outputting the transmission data immediately at the start of the silent data, the delay of the transmission data after switching is reduced.
As described above, according to the fourth embodiment, the selection unit outputs the selected data with a delay equal to or longer than the fixed period used for the switching determination and, when switching the selected data, immediately outputs the data to be selected next if that data has already been delayed for the fixed period or longer. This eliminates the silent portion at the switching point and therefore reduces the relative delay between the transmitted and received audio.
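The timing relationship explained with FIG. 5 can be checked with a short calculation. The sketch below uses hypothetical parameter names (rx_end for t₂, min_silence for T₁, output_delay for T₂) and assumes all times are in the same unit. It compares the start of the stored transmission data when it is output immediately at t₃ with the case where the silent interval T₁ is stored first.

```python
def switch_timing(rx_end, min_silence, output_delay):
    """Compute switching times for delayed output.

    rx_end:       t2, time at which the voiced reception data ends
    min_silence:  T1, silence needed for the switching determination
    output_delay: T2, delay applied to the output (must be >= T1)
    """
    if output_delay < min_silence:
        raise ValueError("output_delay (T2) must be at least min_silence (T1)")
    t3 = rx_end + output_delay  # end of reception data as output
    return {
        "t3": t3,
        # Transmission data output immediately at t3, as described.
        "tx_start_immediate": t3,
        # For comparison: the silent interval T1 is output first.
        "tx_start_after_silence": t3 + min_silence,
        "delay_saved": min_silence,
    }
```

For example, with hypothetical values t₂ = 1000, T₁ = 200, and T₂ = 300, the transmission data starts at t₃ = 1300 instead of 1500, so the delay after switching is shorter by T₁.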
In the fourth embodiment, whether the media data is valid is determined by whether the audio data is voiced or silent. However, the determination is not limited to this, and the criterion may be chosen as appropriate for the type of media data.
In each of the above embodiments, the transmission / reception media data to be reproduced is audio data or video data, but is not limited to these data, and can be applied to various data.

As described above, the communication device according to the present invention is applied to, for example, a videophone and is suitable for reproducing video data and audio data after communication.

Claims

1. A communication apparatus comprising:
an encoding unit that encodes transmission media data to obtain encoded transmission media data;
a decoding unit that decodes received encoded reception media data;
a storage unit that stores the encoded reception media data and the transmission media data; and
a synthesis unit that synthesizes data obtained by the decoding unit decoding the encoded reception media data stored in the storage unit with the transmission media data stored in the storage unit, and outputs the result as reproduction data.

2. The communication apparatus according to claim 1, wherein the storage unit outputs one or both of the stored encoded reception media data and transmission media data.

3. A communication apparatus comprising:
an encoding unit that encodes transmission media data to obtain encoded transmission media data;
a decoding unit that decodes received encoded reception media data;
an editing unit that synthesizes the transmission media data with reception media data obtained by decoding the encoded reception media data, and generates encoded synthesized data by encoding the synthesized data; and
a storage unit that stores the encoded synthesized data output from the editing unit,
wherein the stored encoded synthesized data is decoded by the decoding unit and output as reproduction data.

4. A communication apparatus comprising:
an encoding unit that encodes transmission media data to obtain encoded transmission media data;
a decoding unit that decodes received encoded reception media data;
an editing unit that synthesizes the reception media data and the transmission media data obtained by decoding the encoded reception media data and the encoded transmission media data, and generates encoded synthesized data by encoding the synthesized data; and
a storage unit that stores the encoded synthesized data output from the editing unit,
wherein the stored encoded synthesized data is decoded by the decoding unit and output as reproduction data.

5. A communication apparatus comprising:
an encoding unit that encodes transmission media data to obtain encoded transmission media data;
a decoding unit that decodes received encoded reception media data;
a selection unit that selects either the encoded reception media data or the encoded transmission media data depending on whether the media data is valid data; and
a storage unit that stores the output data of the selection unit,
wherein the stored data is decoded by the decoding unit and output as reproduction data.

6. The communication apparatus according to claim 5, wherein the selection unit delays the timing of outputting one of the data when both the encoded reception media data and the encoded transmission media data are valid data.

7. The communication apparatus according to claim 5, wherein the selection unit, when switching between the encoded reception media data and the encoded transmission media data, performs the switching when the data before switching has remained not valid for a certain period or longer.

8. The communication apparatus according to claim 7, wherein the selection unit outputs the data to be selected with a delay equal to or longer than the certain period used for the switching determination, and, when switching the data to be selected, immediately outputs the data to be selected next if that data has already been delayed for the certain period or longer.