JP4245575B2

JP4245575B2 - COMMUNICATION DEVICE, COMMUNICATION METHOD, AND COMMUNICATION PROGRAM

Info

Publication number: JP4245575B2
Application number: JP2005053540A
Authority: JP
Inventors: 篤信木村; 義弘島田; 稔小林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-02-28
Filing date: 2005-02-28
Publication date: 2009-03-25
Anticipated expiration: 2025-02-28
Also published as: JP2006238344A

Description

The present invention relates to a method for visualizing, on a remote communication device, how sound propagates in the space on the other party's side.

Volume is important in communication that includes voice, and methods of visualizing the volume so that the user can easily grasp it are sometimes used. For example, in a video and audio communication device, a peak meter is one method of visualizing the volume input to the device on the user's own side.

However, there has been no method for visualizing the volume reproduced from the voice presentation unit on the remote party's side. In conventional video and audio communication devices, the volume reproduced from the remote voice presentation unit is therefore unknown, which can hinder the establishment of communication (see Non-Patent Document 1).
Non-Patent Document 1: Fish, R. S., Kraut, R. E., and Chalfonte, B. L.: The VideoWindow System in Informal Communications, Proceedings of the ACM 1990 Conference on Computer Supported Cooperative Work (CSCW 90), ACM, pp. 1-11 (1990).

With the conventional methods, a speaker cannot tell at what volume his or her voice is reproduced by the other party's device, or at what volume it reaches the other party. It is therefore difficult to converse at the volume that the speaker or the other party desires.

An object of the present invention is to provide a communication device that makes it possible to grasp how loud the voice sent to the other party is on the other party's side.

To achieve the above object, the communication device of the present invention is a communication device that communicates by mutually transmitting and receiving at least audio, and comprises: a presentation sound collection unit that measures the volume level at which audio received from the counterpart device is reproduced; and a propagation state information generation unit that, with a propagation model of sound in the space where the audio is presented to the user obtained in advance, calculates, when audio from the counterpart device is reproduced, the propagation state of that audio in the space from the volume level measured by the presentation sound collection unit and the propagation model, and transmits information on the propagation state to the counterpart device.

According to the present invention, the receiving-side communication device obtains a sound propagation model in advance, uses that model during communication to determine the propagation state of the audio from the transmitting-side communication device, and sends the result to the transmitting-side communication device. The user of the transmitting-side communication device can therefore grasp how his or her voice is presented in the space around the user of the receiving-side communication device.

In addition to audio, video of the space may be mutually transmitted and received, and the propagation state information generation unit may add the propagation state information as an effect to the video of the space and transmit it to the counterpart device.

According to this, the receiving-side communication device adds the propagation state of the audio from the transmitting-side communication device to the video as an effect and sends it to the transmitting-side communication device. The user of the transmitting-side communication device can therefore visually grasp how his or her voice is presented in the space around the user of the receiving-side communication device.

The communication device may further comprise a sound field measurement unit that measures the volume levels at a plurality of locations in the space when a predetermined test sound is generated, and the propagation state information generation unit may calculate the propagation model using the plurality of volume levels measured by the sound field measurement unit.

According to this, the propagation model can be calculated from the volume levels that the sound field measurement unit measures at a plurality of locations, which makes it possible to estimate the actual propagation state of the audio.

The propagation model may be an n-th order approximation function representing the relationship between the volume level measured by the presentation sound collection unit and the volume level at an arbitrary point in the space, and the sound field measurement unit may measure the volume levels at a number of locations that depends on the order of the n-th order approximation function.

According to this, even when the sound shows a complicated propagation state in the space, the actual propagation state can be grasped with high accuracy by using a higher-order propagation model as necessary.

The communication device may further comprise a waveform match determination unit that determines, for audio received from the counterpart device, whether the waveform at the counterpart device matches the waveform at the presentation sound collection unit, and the propagation state information generation unit may transmit the propagation state information to the counterpart device for the periods in which the waveform match determination unit determines that the waveforms match.

According to this, the receiving-side communication device determines whether the transmitting-side waveform and the receiving-side waveform match, and transmits the propagation state information when they do. Even when loud noise occurs around the receiving-side communication device, the influence of the noise can thus be removed, and the user of the transmitting-side communication device can grasp the propagation state of the audio sent from that device.
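The text does not say how the waveform match is determined. As one possible illustration only, the sketch below compares the transmitted waveform with the waveform picked up by the presentation sound collection unit using a normalized correlation coefficient. The function names, the 0.8 threshold, and the assumption that the two sample sequences are already time-aligned are all hypothetical and are not taken from the patent.

```python
import math
from typing import Sequence


def normalized_correlation(sent: Sequence[float], captured: Sequence[float]) -> float:
    """Pearson-style correlation of two equally sampled, time-aligned waveforms."""
    n = min(len(sent), len(captured))
    if n == 0:
        return 0.0
    a, b = sent[:n], captured[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A silent (constant) waveform has no shape to compare, so report no match.
    return num / den if den > 0.0 else 0.0


def waveforms_match(sent: Sequence[float], captured: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Hypothetical rule: treat the waveforms as matching above the threshold."""
    return normalized_correlation(sent, captured) >= threshold
```

Because the coefficient is normalized, a quieter copy of the same waveform still matches, while unrelated room noise tends to lower the score. A practical system would also need to compensate for transmission and playback delay before comparing.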

An embodiment of the present invention will be described in detail with reference to the drawings. This embodiment illustrates a communication system that transmits and receives video and audio bidirectionally. When audio is sent from a speaker to a listener, the communication system of this embodiment presents the volume on the listener's side to the speaker by adding that information as an effect to the video of the listener's side that is shown to the speaker. An effect is a visual effect added to the video, for example an effect image displayed superimposed on the video.

FIG. 1 is a block diagram showing the schematic configuration of the communication system according to this embodiment. Referring to FIG. 1, in the communication system of this embodiment, a communication device 10 and a communication device 20 are connected to each other via a communication network 30. A configuration with two communication devices is illustrated here, but the present invention is not limited to this configuration; any configuration with a plurality of communication devices may be used.

FIG. 2 is a diagram showing the configuration of each device and the arrangement of each unit in the communication system according to this embodiment. FIG. 3 is a block diagram showing the configuration of the communication device according to this embodiment. The communication device 10 is used by user a 109, and the communication device 20 is used by user b 110. Communication between user a 109 and user b 110 is assumed here.

Referring to FIGS. 1 and 2, the communication devices 10 and 20 have the same configuration so as to enable bidirectional communication.

The communication device 10 includes a sound collection unit 101, a voice presentation unit 102, an imaging unit 103, a video presentation unit 104, a presentation sound collection unit 105, a sound field measurement unit 106, and an effect generation unit 108. The sound field measurement unit 106 includes a plurality of microphone input units 107 arranged in front of the voice presentation unit 102 and the video presentation unit 104.

Likewise, the communication device 20 includes a sound collection unit 201, a voice presentation unit 202, an imaging unit 203, a video presentation unit 204, a presentation sound collection unit 205, a sound field measurement unit 206, and an effect generation unit 208. The sound field measurement unit 206 includes a plurality of microphone input units 207. Each of these units is the same as the corresponding unit of the communication device 10, namely the sound collection unit 101, the voice presentation unit 102, the imaging unit 103, the video presentation unit 104, the presentation sound collection unit 105, the sound field measurement unit 106, and the effect generation unit 108. The following description therefore covers the communication device 10.

The sound collection unit 101 collects the utterances of user a 109 and sends the audio to the counterpart communication device 20.

The imaging unit 103 captures images of user a 109 and sends them to the effect generation unit 108.

The voice presentation unit 102 reproduces the audio received from the counterpart communication device 20 and presents it to user a 109.

The video presentation unit 104 reproduces the video from the counterpart communication device 20 and presents it to user a 109.

The presentation sound collection unit 105 is installed in front of the voice presentation unit 102. It measures the volume level of the sound reproduced by the voice presentation unit 102 and notifies the effect generation unit 108 of that level.

The sound field measurement unit 106 consists of a plurality of (ten in this embodiment) microphone input units 107 arranged in a plane within the angle of view of the imaging unit 103. It measures the volume level of the sound reproduced by the voice presentation unit 102 at each microphone input unit 107 and notifies the effect generation unit 108 of those levels.

Before the device is used for communication, the effect generation unit 108 calculates in advance a spatial propagation quadratic approximation function, which serves as the propagation model for calculating the propagation state of the sound emitted from the voice presentation unit 102. The spatial propagation quadratic approximation function is calculated by having the voice presentation unit 102 generate a predetermined test sound and using the volume level obtained by the presentation sound collection unit 105 together with the volume levels obtained by the sound field measurement unit 106.

When the device is used for communication, the effect generation unit 108 takes the volume level of the audio from the counterpart communication device 20, as reproduced by the voice presentation unit 102 and measured by the presentation sound collection unit 105, substitutes it into the spatial propagation quadratic approximation function obtained in advance, and calculates the volume level at an arbitrary point in the space. The effect generation unit 108 then superimposes the resulting volume level information, as an effect, on the video of the space captured by the imaging unit 103 and sends it to the counterpart communication device.

The installation position and shooting direction of the imaging unit 103 may be fixed relative to the sound field measurement unit 106, with the correspondence between the video from the imaging unit 103 and the plane coordinates of the sound field measurement unit 106 obtained in advance. Alternatively, the correspondence between the captured video and the plane coordinates of the sound field measurement unit 106 may be derived from the video captured by the imaging unit 103 and the arrangement of the microphone input units 107 of the sound field measurement unit 106, or from the installation position and shooting direction of the imaging unit 103.

Next, the process of calculating the spatial propagation quadratic approximation function will be described.

FIG. 4 is a flowchart showing the process by which the communication device according to this embodiment calculates the spatial propagation quadratic approximation function. A plurality of test sounds with different volumes are used here, and the number identifying each test sound is denoted m.

The spatial propagation quadratic approximation function is calculated before the communication device is used for communication. Referring to FIG. 4, the communication device of this embodiment first presents a predetermined test sound from the voice presentation unit 102 (step A101). In that state, the communication device collects the sound with the presentation sound collection unit 105 and measures its volume level (step A102), and also measures, with the ten microphone input units 107 of the sound field measurement unit 106, the volume level (V) at each microphone position (x, y) (step A103).

The test sound used here is preferably a voice at either 130 Hz, the average fundamental frequency of male speech, or 245 Hz, the average fundamental frequency of female speech. The test sound is also preferably such that the volume level (Wm) measured by the presentation sound collection unit 105 is one of 20, 30, 40, 50, 60, 70, and 80 dB, which cover the main volume range of human speech.

The test sound level (Wm) measured by the presentation sound collection unit 105, together with the coordinate positions (x, y) and the measured volume levels (V) of the ten microphone input units 107, is substituted into Equation (1) to obtain the coefficients (a, b, c) (step A104). For the volume (V), the average of the volumes obtained for each frequency (130 Hz, 245 Hz) from the plurality of test sounds may be used.

Next, based on the coefficients obtained, the communication device derives, as Equation (2), a quadratic function that approximates the propagation state of the sound at an arbitrary point (x, y) in the space when the volume level at the presentation sound collection unit 105 is W (step A105). The propagation state of the sound indicates which regions the sound effectively reaches. For example, in an environment with ordinary background noise, the criterion may be whether meaningful speech arrives at a level that human hearing can recognize.
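Equations (1) and (2) are not reproduced in this text, so the following is only a sketch of how steps A104 and A105 could be carried out. It assumes a hypothetical model form V = W + a·x² + b·y² + c and fits the coefficients by least squares; the model form, the function names, and the fitting method are illustrative assumptions, not the patent's formulas.

```python
from typing import List, Sequence, Tuple

# One measurement: microphone position (x, y), level W_m at the
# presentation sound collection unit, and level V at that microphone.
Sample = Tuple[float, float, float, float]
Coeffs = Tuple[float, float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: microphone layout is degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_propagation_model(samples: Sequence[Sample]) -> Coeffs:
    """Least-squares fit of (a, b, c) in the ASSUMED form V - W = a*x^2 + b*y^2 + c."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y, w, v in samples:
        basis = (x * x, y * y, 1.0)
        for i in range(3):
            atb[i] += basis[i] * (v - w)
            for j in range(3):
                ata[i][j] += basis[i] * basis[j]
    a, b, c = solve_linear(ata, atb)
    return a, b, c


def predict_level(coeffs: Coeffs, w: float, x: float, y: float) -> float:
    """Estimated level at (x, y) when the collection unit measures W (assumed form)."""
    a, b, c = coeffs
    return w + a * x * x + b * y * y + c
```

With ten microphones and three coefficients the system is overdetermined, which is why a least-squares fit is used here instead of solving for the coefficients exactly.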

Note that the spatial propagation quadratic approximation function in the communication device 10 (the side of user a 109) is calculated using the volume level at the presentation sound collection unit 105 of the communication device 10, not the input volume level at the sound collection unit 201 of the communication device 20 (user b 110). This is because, while the communication devices are in use, users may freely change settings such as the gain of the sound collection unit 201 in the communication device 20 or the volume of the voice presentation unit 102 in the communication device 10, and the spatial propagation quadratic approximation function should remain unaffected when such settings change.

Before the device is used, the steady noise level (Vavg) of the environment in which the device is installed may also be measured with the microphone input units 107 of the sound field measurement unit 106. This steady noise level can be used to determine, from the volume level at the presentation sound collection unit 105, whether user b 110 is speaking.

When the volume level (V) measured by a microphone input unit 107 is greater than the steady noise level (Vavg), the sound measured by that microphone input unit 107 can be regarded as sound reproduced by the voice presentation unit 102. For example, detection of reproduced sound may be made a condition for superimposing the effect 401 on the video.
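A minimal sketch of this check follows. It assumes that the steady noise level is taken as the arithmetic mean of idle readings from the microphone input units, which is a simplification suggested only by the symbol Vavg; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def steady_noise_level(idle_readings_db: Sequence[float]) -> float:
    """Assumed estimate of Vavg: mean of readings taken with no test sound playing."""
    return mean(idle_readings_db)


def is_reproduced_sound(level_db: float, noise_db: float) -> bool:
    """True when a microphone level exceeds the steady noise level."""
    return level_db > noise_db
```

The comparison is strict, matching the "greater than" condition in the description, so a reading exactly at the noise level is not treated as reproduced sound.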

In this embodiment, when the device is used for communication, the volume at an arbitrary point (x, y) is obtained not from volume levels measured in real time by the sound field measurement unit 106, but from the spatial propagation quadratic approximation function obtained in advance and the real-time volume obtained by the presentation sound collection unit 105. The sound field measurement unit 106 is therefore needed only when the spatial propagation quadratic approximation function is calculated and is not necessarily needed afterwards. Because the microphone input units 107 of the sound field measurement unit 106 may also pick up sounds other than the utterances of user b 110, the sound field measurement unit 106 may, for example, be removed once the process of obtaining the function has finished.

Next, the effect superimposition process performed by the effect generation unit 108 will be described.

FIG. 5 is a flowchart showing the effect superimposition process of the communication device according to this embodiment. Referring to FIG. 5, during communication between the communication device 10 and the communication device 20, when the voice of user b 110 is first input to the sound collection unit 201 of the communication device 20 (step B101), the voice presentation unit 102 of the communication device 10 reproduces that voice and presents it to user a 109 (step B102). The presentation sound collection unit 105 measures the volume level of the voice reproduced by the voice presentation unit 102 (step B103), and the effect generation unit 108 calculates the volume level (V) at an arbitrary point using the spatial propagation quadratic approximation function of Equation (2) (step B104).

In parallel with this, the imaging unit 103 of the communication device 10 acquires video (step C101). The effect generation unit 108 then determines the correspondence between the acquired video and the plane coordinates of the sound field measurement unit 106 and aligns the coordinate systems (step C102). Further, the effect generation unit 108 finds, from Equation (2), the region that satisfies V = Vavg (the steady noise level) with W = W + 5 (the initial value of W being 0), and creates one wave of the effect 401 that marks the boundary of that region (step C103). The effect 401 is an image effect that makes a region of given coordinates distinguishable from other regions; here, as one example, the region is marked by a wave drawn as a curve.

Next, it is determined whether the volume level (V) at the arbitrary point is greater than the steady noise level (Vavg). If it is not greater, the effect 401 is superimposed on the video with a transparency of 0 (step D102); if it is greater, the effect 401 is superimposed on the video with a transparency of 100 (step D103). Here, a transparency of 0 means fully transparent and a transparency of 100 means fully opaque. Thus, if the volume level is greater than the steady noise level, the sound is taken to propagate that far and the curve of the effect 401 is presented as opaque. If the volume level is not greater than the steady noise level, the sound is taken not to propagate that far and the curve of the effect 401 is made transparent. The video with the effect 401 superimposed is sent to the communication device 20, reproduced by the video presentation unit 204 of the communication device 20, and presented to user b 110 (step D104).

FIG. 6 is a diagram showing an example of the correspondence, in this embodiment, between the coordinate axes of the plane of the sound field measurement unit 106 in the communication device 10 and the coordinate axes of the video presented on the video presentation unit 204 of the communication device 20. FIG. 7 is a diagram showing an example of the video presented on the video presentation unit 204 of the communication device 20, associated with the coordinate axes of the plane of the sound field measurement unit 106 in the communication device 10.

Referring to FIGS. 6 and 7, the installation position and imaging direction of the imaging unit 103 are fixed, and the microphone input units 107 of the sound field measurement unit 106 are installed in a plane. From the video captured by the imaging unit 103, image recognition is used to locate three of the planar microphone input units 107 whose coordinates are known, and the correspondence between the coordinate axes of the captured video and the coordinates on the plane of the sound field measurement unit 106 is determined. These processes may be performed in advance, before the device is used for communication. As a result, the coordinates of the video presented to user b 110 on the video presentation unit 204 of the communication device 20 can be matched to the coordinates of the sound field measurement unit 106 of the communication device 10, and the effect 401 can be superimposed accurately.
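Three point correspondences are exactly enough to determine an affine map between two planes. The sketch below derives such a map from three microphone positions and their detected image positions using Cramer's rule. Treating the plane-to-image relation as affine is an assumption made for illustration (a camera viewing a plane obliquely is in general described by a projective map), and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Rows (p, q, r) such that u = p*x + q*y + r, and likewise for v.
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def fit_affine(plane: Sequence[Point], image: Sequence[Point]) -> Affine:
    """Affine map taking three plane points to their three image points."""
    (x1, y1), (x2, y2), (x3, y3) = plane
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("the three microphone positions are collinear")

    def solve(t1: float, t2: float, t3: float) -> Tuple[float, float, float]:
        # Cramer's rule for [x_i, y_i, 1] . (p, q, r) = t_i
        p = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        q = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        r = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    us = [pt[0] for pt in image]
    vs = [pt[1] for pt in image]
    return solve(us[0], us[1], us[2]), solve(vs[0], vs[1], vs[2])


def plane_to_image(affine: Affine, point: Point) -> Point:
    """Map a point on the microphone plane to image coordinates."""
    (pu, qu, ru), (pv, qv, rv) = affine
    x, y = point
    return pu * x + qu * y + ru, pv * x + qv * y + rv
```

Once the map is known, any point of a wave boundary computed in plane coordinates can be converted to pixel coordinates before the effect is drawn.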

FIG. 8 is a diagram showing an example of the video with the effect 401 superimposed, as presented to the user on the video presentation unit, when the volume is low. FIG. 9 is a diagram showing an example of the video with the effect superimposed, as presented to the user on the video presentation unit, when the volume is high.

When the spatial propagation quadratic approximation function is obtained, the voice presentation unit 102 generates a plurality of test sounds. Here, the test sounds have five volume levels, chosen so that the volume level (Wm) collected by the presentation sound collection unit 105 increases from 0 dB in steps of 5 dB.

Then, for each test sound, the effect generation unit 108 determines, from the volumes obtained at the microphone input units 107, the region in which the volume level (V) is estimated to exceed the steady noise level (Vavg), and determines the corresponding region of the video presented on the video presentation unit 204. Doing this for the five volume levels yields five regions (region a 501, region b 502, region c 503, region d 504, and region e 505).

The region obtained for each volume level can be regarded as the range over which the sound propagates when audio at that volume level is reproduced. Drawing the curves of the effect 401 along the boundaries of the regions gives the effect 401 the shape of ripples, so that on the video presentation unit 204 of the communication device 20 ripples appear to spread from the voice presentation unit 102 of the communication device 10.
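As a rough illustration of how the nested regions could be computed, the sketch below evaluates a propagation model on a grid of plane coordinates and collects, for each volume level, the grid cells where the estimated level exceeds the steady noise level. The model form V = W + a·x² + b·y² + c is a hypothetical stand-in, since Equation (2) is not reproduced in this text, and the grid sampling and function names are likewise assumptions.

```python
from typing import List, Sequence, Set, Tuple

Coeffs = Tuple[float, float, float]
Cell = Tuple[float, float]


def predicted_level(coeffs: Coeffs, w: float, x: float, y: float) -> float:
    """ASSUMED model form; stands in for Equation (2)."""
    a, b, c = coeffs
    return w + a * x * x + b * y * y + c


def audible_cells(coeffs: Coeffs, w: float, noise_db: float,
                  xs: Sequence[float], ys: Sequence[float]) -> Set[Cell]:
    """Grid cells where the estimated level exceeds the steady noise level."""
    return {(x, y) for x in xs for y in ys
            if predicted_level(coeffs, w, x, y) > noise_db}


def ripple_regions(coeffs: Coeffs, levels: Sequence[float], noise_db: float,
                   xs: Sequence[float], ys: Sequence[float]) -> List[Set[Cell]]:
    """One region per volume level; the boundary of each is one wave of the effect."""
    return [audible_cells(coeffs, w, noise_db, xs, ys) for w in levels]
```

With negative a and b, the estimated level falls off with distance from the origin, so each louder level yields a region that contains the previous one, which is what produces the ripple-like nesting.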

The effect 401 containing these curves is superimposed in advance on the video captured by the imaging unit 103 so that it can always be presented on the video presentation unit 204 together with the video. In the initial state, the transparency of the effect 401 is set to 0, so the effect 401 is drawn transparent.

When user b 110 speaks at the communication device 20, the transparency of the ripples of the effect 401 that lie within the sound propagation range is set to 100 on the video presentation unit 204 of the communication device 20, according to the volume level at the presentation sound collection unit 105 of the communication device 10.

If the volume level at the voice presentation unit 102 on the side of user a 109 is low, the effect 401 does not spread far and consists of only a few waves (see FIG. 8). If the volume level at the voice presentation unit 102 on the side of user a 109 is high, the effect 401 spreads widely and consists of many waves (see FIG. 9).
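The relationship between volume and the number of visible waves can be sketched as follows. Each wave is the boundary on which V equals Vavg for one threshold level W (5 dB, 10 dB, and so on, following step C103). Assuming the propagation model increases monotonically with W, which is an assumption made here, a wave lies inside the propagation range exactly when the current level exceeds that wave's threshold. The function names are hypothetical; the transparency values follow the convention stated in the description (0 is transparent, 100 is opaque).

```python
from typing import List


def wave_thresholds(step_db: float = 5.0, count: int = 5) -> List[float]:
    """Threshold level W for each wave: step, 2*step, ... (W starts from 0)."""
    return [step_db * (k + 1) for k in range(count)]


def wave_transparencies(current_w_db: float, thresholds: List[float]) -> List[int]:
    """Transparency per wave: 100 (opaque) if the sound reaches it, else 0."""
    return [100 if current_w_db > t else 0 for t in thresholds]


def visible_wave_count(current_w_db: float, thresholds: List[float]) -> int:
    """Number of waves drawn opaque at the current level."""
    return sum(1 for t in wave_transparencies(current_w_db, thresholds) if t == 100)
```

For example, a measured level of 12 dB makes the 5 dB and 10 dB waves opaque and leaves the outer three transparent, while any level above 25 dB shows all five.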

As described above, according to the present embodiment, the receiving-side communication device 10 obtains the spatial propagation quadratic approximation function of sound in advance. During communication, the receiving-side communication device 10 uses this function to determine the propagation state of the voice from the transmitting-side communication device 20, superimposes it as an effect on the video of the receiving-side communication device 10, and sends it to the transmitting-side communication device 20. The user b110 of the communication device 20 can therefore visually grasp how his or her voice is presented in the space around the user a109 of the communication device 10.

In the present embodiment, the actual propagation state of the voice in the space can be estimated by calculating the spatial propagation quadratic approximation function from the volume levels measured by the microphone input units 107 installed at a plurality of locations in the space. However, the present invention is not limited to a configuration that approximates the spatial propagation function by a quadratic function. Increasing the number of microphone input units 107 of the sound field measurement unit 106 makes it possible to calculate a higher-order function as the spatial propagation function of sound. A higher-order function, which can represent complex sound propagation, enables more accurate estimation of the sound propagation. Even when the propagation of voice in the space is so complex that it cannot be simply estimated from the volume reproduced by the voice presentation unit 102, the actual propagation state can be grasped with high accuracy by using a higher-order propagation model as necessary.
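A polynomial of degree n has n + 1 coefficients, so fitting it requires volume levels from at least n + 1 distinct measurement positions, which is why a higher-order function calls for more microphone input units 107. The following Python sketch illustrates this with an ordinary least-squares fit. Reducing each microphone position to a single distance value, and the function names `fit_polynomial` and `evaluate`, are simplifying assumptions for illustration and are not the embodiment's actual spatial propagation function.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ...

    xs: assumed distances of the microphones from the speaker (must be distinct)
    ys: volume levels measured at those microphones
    """
    n = degree + 1
    if len(xs) < n:
        raise ValueError("need at least degree + 1 measurement points")
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Estimated volume level at distance x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Four microphones allow a quadratic (degree 2) or cubic (degree 3) fit.
distances = [0.0, 1.0, 2.0, 3.0]
levels = [80.0, 74.5, 70.0, 66.5]
model = fit_polynomial(distances, levels, degree=2)
```

Raising `degree` without adding measurement points makes the function raise `ValueError`, which mirrors the need for additional microphones described above.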

In the present embodiment, an example was shown in which the propagation state of the voice is superimposed on the video as the effect 401 consisting of a plurality of curves, but the present invention is not limited to this. Any means may be used as long as it feeds back the propagation state of the voice on the receiving side to the transmitting side. For example, the propagation state of the voice may be mapped onto the video as an effect of another form, such as a change in color, and fed back. Nor is the means of feeding back the propagation state of the voice limited to video.

Although the present embodiment illustrated a communication system that transmits and receives video and voice bidirectionally, the present invention is not limited to this. Transmission and reception of video is not necessarily required as long as at least voice is transmitted and received. In the case of voice-only communication, the propagation state of the voice may be fed back by means other than video; for example, a simple display device capable of displaying only the propagation state of the voice suffices.

In the present embodiment, the presentation sound collection unit 105 measures the volume level of the voice reproduced by the voice presentation unit 102, but the present invention is not limited to this. The presentation sound collection unit 105 may instead measure the level of the input signal to the speaker (not shown) of the voice presentation unit 102 as the volume level, and that input signal level may be used in calculating the propagation model and the actual propagation state.

Next, another embodiment of the present invention will be described.

FIG. 10 is a block diagram showing the configuration of a communication device according to another embodiment.

Referring to FIG. 10, the communication device 60 of the present embodiment has a sound collection unit 101, a voice presentation unit 102, an imaging unit 103, a video presentation unit 104, a presentation sound collection unit 105, a sound field measurement unit 106, an effect generation unit 108, and a waveform match determination unit 601.

FIG. 10 shows only the communication device used by the user a109, but the communication device used by the user b110, which communicates with it, has the same configuration. Although not shown, the communication device 70 used by the user b110 is assumed to have, corresponding to the units of the communication device 60, a sound collection unit 201, a voice presentation unit 202, an imaging unit 203, a video presentation unit 204, a presentation sound collection unit 205, a sound field measurement unit 206, an effect generation unit 208, and a waveform match determination unit 701.

The sound collection unit 101, voice presentation unit 102, imaging unit 103, video presentation unit 104, presentation sound collection unit 105, and sound field measurement unit 106 in FIG. 10 are the same as those shown in FIG. 3.

The waveform match determination unit 601 determines whether the transmission-side waveform and the reception-side waveform match.

When the user b110 speaks, the voice is collected by the sound collection unit 201 of the communication device 70. The waveform obtained by recording the volume level at the sound collection unit 201 as a time series is the transmission-side waveform 1101. The transmission-side waveform 1101 is measured by the communication device 70 used by the user b110, and may be notified to the communication device 60 used by the user a109. Alternatively, the transmission-side waveform 1101 may be derived on the communication device 10 side from the voice data reproduced by the voice presentation unit 102.

Meanwhile, the waveform obtained by recording the volume level at the presentation sound collection unit 105 of the communication device 60 as a time series is the reception-side waveform 1102.

Acquisition of the transmission-side waveform 1101 starts when the user b110 speaks and the volume level at the sound collection unit 201 changes, and ends when the sound collection unit 201 has measured a one-second silent period. The time in between is taken as the input time (T).

Acquisition of the reception-side waveform 1102 starts when the voice is reproduced by the voice presentation unit 102 of the communication device 60 on the user a109 side, and ends when the input time (T) measured at the sound collection unit 201 of the communication device 70 on the user b110 side has elapsed.

Sound collection at the presentation sound collection unit 105 is delayed relative to reproduction at the voice presentation unit 102, so ending waveform acquisition after the input time (T) might appear to risk missing part of the required voice waveform. However, because the time from start to end at the sound collection unit 201 on the user b110 side includes the one-second silent period, the voice waveform of the user b110's utterance is not missed.
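The way the input time (T) is obtained can be sketched in Python as follows, purely for illustration. The function name `input_time`, the fixed sampling period, and the use of a threshold to decide what counts as silence are assumptions of this sketch; the embodiment only states that acquisition starts when the volume level changes and ends after one second of silence.

```python
import math


def input_time(levels, sample_period, silence_threshold, silence_duration=1.0):
    """Return the input time T in seconds, or None if no utterance has ended yet.

    levels: volume levels at the sound collection unit, one per sample_period seconds
    silence_threshold: levels at or below this value are treated as silence (assumed)
    """
    needed = math.ceil(silence_duration / sample_period)  # silent samples required
    start = None
    silent_run = 0
    for i, level in enumerate(levels):
        if start is None:
            if level > silence_threshold:
                start = i  # the volume level changed: acquisition starts
            continue
        if level > silence_threshold:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= needed:
                # T spans the utterance plus the trailing silent period.
                return (i - start + 1) * sample_period
    return None


# 0.75 s of speech followed by 1 s of silence, sampled every 0.25 s.
T = input_time([0, 0, 5, 6, 5, 0, 0, 0, 0, 0], 0.25, silence_threshold=1)
```

Because T includes the trailing one-second silent period, a reception-side window of length T still covers the whole utterance as long as the reproduction-to-collection delay is shorter than one second.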

To determine whether the extracted transmission-side waveform 1101 and reception-side waveform 1102 match, the waveform match determination unit 601 performs a matching process on the two waveforms with a fixed error tolerance (here, 3 dB). In this matching process, the two waveforms are judged to match if they agree within the fixed error tolerance. If they are judged not to match, the sound input to the presentation sound collection unit 105 is regarded as a sound other than the sound reproduced by the voice presentation unit 102, that is, not the voice input to the sound collection unit 201 on the user b110 side. If the two waveforms are judged to match, the sound is regarded as the voice input to the microphone input unit 107 of the sound field measurement unit 106.
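One possible reading of the matching process is a sample-by-sample comparison of the two volume-level waveforms, sketched below in Python. The function name `waveforms_match` and the assumptions that both waveforms are already time-aligned, equally sampled, and expressed in dB on a comparable scale are introduced here for illustration; the embodiment specifies only the 3 dB tolerance.

```python
def waveforms_match(sender_db, receiver_db, tolerance_db=3.0):
    """Return True if every pair of samples differs by at most tolerance_db.

    sender_db: transmission-side waveform (volume levels in dB)
    receiver_db: reception-side waveform (volume levels in dB), same length
    """
    if not sender_db or len(sender_db) != len(receiver_db):
        return False
    return all(abs(s - r) <= tolerance_db
               for s, r in zip(sender_db, receiver_db))


same_voice = waveforms_match([60.0, 62.0, 58.0], [61.0, 60.0, 59.5])
with_noise = waveforms_match([60.0, 62.0, 58.0], [61.0, 70.0, 59.0])
```

In the first call all differences stay within 3 dB, so the waveforms are judged to match. In the second call one sample differs by 8 dB, as might happen when another sound source is picked up, so the result is a mismatch.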

The determination result is notified from the waveform match determination unit 601 to the effect generation unit 108. When notified that the waveforms do not match, the effect generation unit 108 sets the transparency of the effect 401 to 0.

FIG. 11 is a flowchart showing the waveform match determination process performed by the waveform match determination unit. Referring to FIG. 11, when the user b110 inputs voice to the sound collection unit 201 of the communication device 70 (step E101), sound collection starts at the sound collection unit 201 of the communication device 70 (step E102). The waveform collected here is the transmission-side waveform 1101.

The voice is sent from the communication device 70 to the communication device 60. It is then reproduced by the voice presentation unit 102 of the communication device 60 and presented to the user a109 (step E103). The voice reproduced by the voice presentation unit 102 is collected by the presentation sound collection unit 105 (step E104). The waveform match determination unit 601 then starts the match determination between the transmission-side waveform and the reception-side waveform (step E105).

When the sound collection unit 201 of the communication device 70 detects one second of silence, the input time (T) is calculated at that point (step E106). When the time T has elapsed from the start of the match determination, the match determination by the waveform match determination unit 601 of the communication device 60 ends (step E107).

Next, the waveform match determination unit 601 of the communication device 60 determines, from the result of the matching process, whether the transmission-side waveform and the reception-side waveform match (step E108). If they match, the process simply repeats. If they do not match, the effect generation unit 108 sets the transparency of the effect 401 to 0 (step E109).

FIG. 12 shows the relationship between the transmission-side waveform and the reception-side waveform. Referring to FIG. 12, the transmission-side waveform 1101 and the reception-side waveform 1102 are shown. There is a delay of t (ms) between the two waveforms, but portion (a) of the transmission-side waveform and portion (b) of the reception-side waveform are similar. The waveform match determination unit 601 judges these portions to match, and the effect generation unit 108 presents the effect 401 for this portion. In contrast, the waveforms appearing in portions (c) and (d) of the reception-side waveform are absent from the transmission-side waveform 1101. The waveform match determination unit 601 judges that there is no match in these portions, and the effect generation unit 108 sets the transparency of the effect 401 to 0 for these portions, making the effect 401 transparent.
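The portion-by-portion behavior of FIG. 12 can be illustrated with the following Python sketch, which shifts the transmission-side waveform by the delay before comparing. The function name `effect_mask`, the representation of the delay as a whole number of samples, and the per-sample granularity are assumptions of this sketch; how the delay t is obtained is not described in the embodiment and is taken as given here.

```python
def effect_mask(sender_db, receiver_db, delay_samples, tolerance_db=3.0):
    """Return one flag per reception-side sample.

    True means the sample matches the delayed transmission-side waveform, so the
    effect may be presented; False means the effect is made transparent.
    """
    mask = []
    for i, received in enumerate(receiver_db):
        j = i - delay_samples  # index of the sample that was sent
        if 0 <= j < len(sender_db):
            mask.append(abs(sender_db[j] - received) <= tolerance_db)
        else:
            mask.append(False)  # nothing was sent for this instant
    return mask


sent = [60.0, 62.0, 20.0, 20.0, 20.0]
# Received one sample later; the 55 dB sample is a sound that was never sent.
received = [20.0, 61.0, 61.0, 21.0, 55.0, 20.0]
mask = effect_mask(sent, received, delay_samples=1)
```

Here the second and third received samples correspond to portions (a) and (b), while the 55 dB sample plays the role of portion (c) or (d) and yields `False`, for which the effect generation unit 108 would set the transparency of the effect 401 to 0.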

As described above, according to the present embodiment, the receiving-side communication device 10 determines whether the transmission-side waveform and the reception-side waveform match, and superimposes the opaque effect 401 when they match. Therefore, even when loud noise from a source other than the voice presentation unit 102 occurs around the receiving-side communication device 10, the influence of the noise is removed, and the user b110 of the transmitting-side communication device 10 can grasp the propagation state of the voice sent from the transmitting-side communication device 10.

The communication device in each of the embodiments described above can be realized by a software program and a computer that executes it. The software program can be recorded on a recording medium or provided through a network.

FIG. 1 is a block diagram showing the schematic configuration of the communication system according to the present embodiment.
FIG. 2 is a diagram showing the configuration of each device of the communication system according to the present embodiment and the arrangement of each unit.
FIG. 3 is a block diagram showing the configuration of the communication device according to the present embodiment.
FIG. 4 is a flowchart showing the spatial propagation quadratic approximation function calculation process of the communication device according to the present embodiment.
FIG. 5 is a flowchart showing the effect superimposition process of the communication device according to the present embodiment.
FIG. 6 is a diagram showing an example of the correspondence between the coordinate axes of the plane of the sound field measurement unit in the receiving-side communication device and the coordinate axes of the video presented on the video presentation unit of the transmitting-side communication device.
FIG. 7 is a diagram showing an example of the video presented on the video presentation unit of the transmitting-side communication device, associated with the coordinate axes of the plane of the sound field measurement unit in the receiving-side communication device.
FIG. 8 is a diagram showing an example of the video with the superimposed effect presented to the user on the video presentation unit when the volume is low.
FIG. 9 is a diagram showing an example of the video with the superimposed effect presented to the user on the video presentation unit when the volume is high.
FIG. 10 is a block diagram showing the configuration of a communication device according to another embodiment.
FIG. 11 is a flowchart showing the waveform match determination process performed by the waveform match determination unit.
FIG. 12 is a diagram showing the relationship between the transmission-side waveform and the reception-side waveform.

Explanation of symbols

10, 60 Communication device
101 Sound collection unit
102 Voice presentation unit
103 Imaging unit
104 Video presentation unit
105 Presentation sound collection unit
106 Sound field measurement unit
107 Microphone input unit
108 Effect generation unit
109 User a
110 User b
20, 70 Communication device
201 Sound collection unit
202 Voice presentation unit
203 Imaging unit
204 Video presentation unit
205 Presentation sound collection unit
206 Sound field measurement unit
207 Microphone input unit
208 Effect generation unit
30 Communication network
401 Effect
501 Region a
502 Region b
503 Region c
504 Region d
505 Region e
61 Waveform match determination unit
1101 Transmission-side waveform
1102 Reception-side waveform
A101 to A105, B101 to B104, C101 to C103, D101 to D1041, E101 to E109 Steps

Claims

A communication device that communicates by mutually transmitting and receiving at least voice, comprising:
a presentation sound collection unit that measures the volume level when voice received from a counterpart device is reproduced; and
a propagation state information generation unit that obtains in advance a propagation model of voice in the space where the voice is presented to a user, calculates, when voice from the counterpart device is reproduced, the propagation state of that voice in the space from the volume level measured by the presentation sound collection unit and the propagation model, and transmits information on the propagation state to the counterpart device.

The communication device according to claim 1, wherein video of the space is mutually transmitted and received in addition to voice, and
the propagation state information generation unit adds the propagation state information to the video of the space as an effect and transmits it to the counterpart device.

The communication device according to claim 1, further comprising a sound field measurement unit that measures volume levels at a plurality of locations in the space when a predetermined test sound is emitted,
wherein the propagation state information generation unit calculates the propagation model using the plurality of volume levels measured by the sound field measurement unit.

The communication device according to claim 3, wherein the propagation model is an n-th order approximation function indicating the relationship between the volume level measured by the presentation sound collection unit and the volume level at an arbitrary point in the space, and
the sound field measurement unit measures volume levels at a number of locations corresponding to the order of the n-th order approximation function.

The communication device according to claim 1, further comprising a waveform match determination unit that determines, for the voice received from the counterpart device, whether the waveform at the counterpart device matches the waveform at the presentation sound collection unit,
wherein the propagation state information generation unit transmits the propagation state information to the counterpart device for the period in which the waveform match determination unit determines that the waveforms match.

A communication method for communicating between communication devices by mutually transmitting and receiving at least voice, comprising:
a step of obtaining in advance a propagation model of voice in the space where the voice is presented to a user;
a step of measuring the volume level when voice received from a counterpart device is reproduced; and
a step of calculating the propagation state of the voice from the counterpart device in the space from the volume level measured when that voice is reproduced and the propagation model, and transmitting information on the propagation state to the counterpart device.

The communication method according to claim 6, wherein video of the space is mutually transmitted and received in addition to voice, and
the propagation state information is added to the video of the space as an effect and transmitted to the counterpart device.

The communication method according to claim 6 or 7, wherein volume levels are measured at a plurality of locations in the space when a predetermined test sound is emitted, and
the propagation model is calculated using the volume levels measured at the plurality of locations in the space.

The communication method according to claim 6, wherein, for the voice received from the counterpart device, it is determined whether the waveform at the counterpart device matches the waveform measured by the own device, and the propagation state information is transmitted to the counterpart device for the period in which the waveforms are determined to match.

A communication program for causing a computer to execute the operation of each step of the communication method according to any one of claims 6 to 9.