JP4992591B2

JP4992591B2 - Communication system and communication terminal

Info

Publication number: JP4992591B2
Application number: JP2007193042A
Authority: JP
Inventors: 謙一北谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-07-25
Filing date: 2007-07-25
Publication date: 2012-08-08
Anticipated expiration: 2027-07-25
Also published as: JP2009033298A

Description

The present invention relates to a communication system and a communication terminal. More specifically, it relates to a communication system that realizes multi-talk by connecting three or more communication terminals, each equipped with a sound source localization speaker, to one another via a communication line, and to a communication terminal used in that communication system.

Multi-talk systems, in which three or more users talk simultaneously through communication terminals, have already been realized, and they are expected to become more widespread as communication technology advances. When three or more people actually gather and converse, a listener can distinguish each speaker's speech even when several speakers talk at the same time. This phenomenon is called the cocktail party effect. It is attributed to the human brain's ability to separate multiple sound sources from what is currently heard, based on the time difference between the sound reaching the left and right ears, the perceived distance of each source, and visual information.

In the multi-talk system described above, the situation is different from an in-person gathering of three or more people. When several speakers talk at the same time and each voice is simply played through the speaker, cues such as the interaural time difference and the perceived distance of each source become uniform. As a result, the listener finds it hard to tell the voices apart and cannot follow who is speaking.

To solve this problem, Patent Document 1 proposes two measures for a multi-talk system: the speaker of each communication terminal is configured as a stereo speaker, and the sound source of each speaker is localized. Localizing a sound source means placing each speaker's sound source at a virtual, distinct position in the three-dimensional sound reproduced by a stereo speaker or the like, so that each voice is heard from a different position.

Patent Document 1: JP 2004-274147 A

The inventor carried out further research on multi-talk systems in which each speaker's sound source is localized, as described in Patent Document 1. This research showed a remaining problem. Even with localized sources, when several speakers talk at once, one utterance may be masked by another and become hard to hear.

In view of the above, an object of the present invention is to provide a communication system and a communication terminal used in it. The system realizes multi-talk by connecting three or more communication terminals, each equipped with a sound source localization speaker, to one another via a communication line. In this system, the listener can easily distinguish each speaker's speech even when several speakers talk simultaneously.

To achieve the above object, a communication system according to a first aspect of the present invention is a communication system that realizes multi-talk by connecting three or more communication terminals, each equipped with a sound source localization speaker, to one another via a communication line. The system comprises:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal it transmits; and
a sound image synthesizing device that:
- receives the communication state information;
- identifies, as receiving terminals, the communication terminals that receive the currently transmitted audio signals;
- performs sound image synthesis on the currently transmitted audio signals for each receiving terminal, based on a sound image synthesis rule set for that receiving terminal; and
- transmits the synthesized sound image to that receiving terminal.
The sound image synthesis rule includes rules for sound source localization and volume adjustment, and
the sound image synthesis rule specifies that the sound sources of audio signals exceeding a predetermined volume level are arranged isotropically.

A communication system according to a second aspect of the present invention is a communication system that realizes videophone multi-talk by connecting three or more communication terminals, each equipped with a sound source localization speaker and a video display device, to one another via a communication line. The system comprises:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio and video signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal it transmits;
a video synthesizing device that:
- receives the communication state information;
- identifies, as receiving terminals, the communication terminals that receive the currently transmitted audio and video signals;
- performs video synthesis on the currently transmitted video signals for each receiving terminal, based on a video synthesis rule set for that receiving terminal;
- transmits the synthesized video to that receiving terminal; and
- generates video synthesis information about the synthesized video; and
a sound image synthesizing device that, based on the video synthesis information, performs sound image synthesis on the currently transmitted audio signals for each receiving terminal and transmits the synthesized sound image to that receiving terminal.
The video synthesis rule includes a rule for video layout, and
the video synthesis rule specifies that the video corresponding to audio signals exceeding a predetermined volume level is arranged isotropically.

In the communication system according to the first aspect, the sound image synthesizing device synthesizes the audio signals from the other communication terminals in the call into a sound image. It does so based on the sound image synthesis rule received from the communication terminal, so it can produce an optimal sound image that is easy for the listener to hear. As a result, even when several speakers talk at the same time, the listener can easily distinguish each speaker's speech.

In the communication system according to the second aspect, the video and the sound image correspond to each other, so the listener can distinguish each speaker's speech even more easily.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram of a communication system according to a first embodiment of the present invention. The communication system 10 is a multi-talk system. It includes communication terminals 11 and a call exchange device 12 connected to the communication terminals 11 via a communication line 13. The call exchange device 12 can connect three or more communication terminals 11 simultaneously to realize multi-talk. The figure shows four communication terminals 11a to 11d connected simultaneously.

FIG. 2 is a block diagram of the communication terminal 11. The communication terminal 11 is configured as, for example, a mobile phone. It includes:
- an input device 21 that accepts user input operations;
- a display device 22 comprising a liquid crystal display panel;
- a storage device 23;
- a communication device 24;
- a receiver device 25 comprising a speaker;
- a transmitter device 26 comprising a microphone; and
- a control device 27.

The receiver device 25 is configured as a sound source localization speaker. A sound source localization speaker, commonly called a stereo speaker or surround speaker, is a speaker or speaker set that reproduces multiple sound sources so that they are heard as if from different directions and distances. The communication device 24 transmits and receives audio signals to and from the call exchange device 12. The control device 27 includes a temporary storage device and a central processing unit. It operates according to a predetermined control program and controls the operation of the units 21 to 26 of the communication terminal 11.

FIG. 3 is a block diagram of the call exchange device 12. The call exchange device 12 includes a communication device 31 that transmits and receives audio signals to and from the communication terminals 11, a storage device 32, and a control device 33. The control device 33 includes a temporary storage device and a central processing unit. It operates according to a predetermined control program and controls the operation of the units 31 and 32 of the call exchange device 12.

The control device 33 includes two units:
- a call state detection unit 34 that detects the current call state; and
- a sound image synthesis processing unit (sound image synthesizing device) 35 that synthesizes the audio signals received from the communication terminals 11 into a sound image.
Synthesizing into a sound image refers to processing the three-dimensional sound reproduced by the sound source localization speaker in one or both of two ways. The first is placing each speaker's sound source at a virtual, distinct position so that each voice is heard from a different position (sound source localization). The second is adjusting each speaker's volume (volume adjustment).
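The source does not say how sound source localization and volume adjustment are rendered. The sketch below assumes the simplest case: a two-channel stereo receiver device and constant-power amplitude panning, which is a common stand-in. A surround or HRTF-based renderer would be needed to separate front from back. The function names and the azimuth convention (0 degrees straight ahead, positive to the right) are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for a source azimuth.

    Azimuth is measured clockwise from straight ahead: 0 = front,
    90 = right, -90 (or 270) = left. A stereo pair can only render the
    lateral component, so front and back positions fold together.
    """
    lateral = math.sin(math.radians(azimuth_deg))  # -1 (left) .. +1 (right)
    theta = (lateral + 1.0) * math.pi / 4.0        # 0 .. pi/2
    return math.cos(theta), math.sin(theta)        # gl**2 + gr**2 == 1


def synthesize_sound_image(
    frames: Dict[str, List[float]],
    azimuths: Dict[str, float],
    gains: Dict[str, float],
) -> List[Tuple[float, float]]:
    """Mix one block of mono samples per terminal into a stereo block.

    frames:   terminal id -> mono samples (hypothetical block format)
    azimuths: terminal id -> azimuth in degrees (sound source localization)
    gains:    terminal id -> linear gain (volume adjustment), default 1.0
    """
    n = max((len(s) for s in frames.values()), default=0)
    left, right = [0.0] * n, [0.0] * n
    for tid, samples in frames.items():
        gl, gr = pan_gains(azimuths.get(tid, 0.0))
        g = gains.get(tid, 1.0)
        for i, x in enumerate(samples):
            left[i] += g * gl * x
            right[i] += g * gr * x
    return list(zip(left, right))
```

Constant-power panning keeps a source equally loud wherever it is placed. This separates localization from volume adjustment, so each can be changed by its own rule.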

FIG. 4 is a flowchart of a call method using the communication system 10 of FIG. 1. First, a sound image synthesis rule is stored in advance in the storage device 23 of the communication terminal 11 (step S11). Sound image synthesis combines the audio signals of multiple speakers into one sound image. It is performed based on, for example, the number of people currently in the call, the number of current utterances, the loudness of each utterance, or information about the people in the call. The sound image synthesis rule defines how this synthesis is performed.

The user can set the sound image synthesis rule in advance with the input device 21 and can change it even during a call. Alternatively, several candidate rules may be stored in the storage device 23 beforehand so that the user can choose among them via the input device 21.

Once three or more communication terminals 11, including the terminal itself, are connected simultaneously and multi-talk is established, the control device 27 of the communication terminal 11 reads the sound image synthesis rule from the storage device 23. It then transmits the rule to the call exchange device 12 via the communication device 24 (step S12). The control device 33 of the call exchange device 12 stores the received rule in the storage device 32 and starts the call state detection unit 34 and the sound image synthesis processing unit 35 (step S13).

The call state detection unit 34 detects the current call state and generates communication state information. This information associates two items for each communication terminal 11 connected to the terminal 11 that sent the sound image synthesis rule: the terminal's identification information and the volume of its audio signal. The unit outputs this information to the sound image synthesis processing unit 35 (step S14). The sound image synthesis processing unit 35 then synthesizes the audio signals received from the communication terminals 11 into a sound image, based on the communication state information and the sound image synthesis rule (step S15). In doing so, it performs sound source localization, volume adjustment, or both on each received audio signal.
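Step S14 can be illustrated with a minimal sketch. The sketch makes three assumptions, none of which the source specifies:
- the exchange sees one block of normalized samples per terminal;
- volume is measured as RMS level in dBFS;
- a terminal counts as "transmitting" when its level is above a hypothetical silence floor.

The receiving terminal's own signal is excluded, because the state information concerns the terminals connected to it.

```python
import math
from typing import Dict, List

# Hypothetical floor below which a terminal is treated as silent.
SILENCE_FLOOR_DBFS = -60.0


def level_dbfs(samples: List[float]) -> float:
    """RMS level of samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def communication_state(frames: Dict[str, List[float]], receiver: str) -> Dict[str, float]:
    """Map every other terminal that is currently transmitting to its level.

    frames:   terminal id -> current block of samples
    receiver: id of the terminal the sound image is being built for
    """
    state: Dict[str, float] = {}
    for tid, samples in frames.items():
        if tid == receiver:
            continue  # the listener does not hear their own voice back
        lvl = level_dbfs(samples)
        if lvl > SILENCE_FLOOR_DBFS:
            state[tid] = lvl
    return state
```

The exchange would call `communication_state` once per receiving terminal. Each result is then paired with that terminal's stored rule in step S15.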

The sound image synthesis processing unit 35 transmits the synthesized sound image via the communication device 31 to the communication terminal 11 that sent the sound image synthesis rule (step S16). The control device 27 of that communication terminal 11 receives the sound image from the call exchange device 12 and plays it through the receiver device 25 (step S17). The reproduced sound image has been localized and volume-adjusted. Each speaker therefore sounds to the listener as if speaking from a different position, and the listener can tell the speakers apart.

FIGS. 5(a) and 5(b) show examples of sound image synthesis rules. In principle, a sound image is synthesized by arranging the sound sources as isotropically as possible. Arranging sources isotropically means placing them, in a one-, two-, or three-dimensional space, in directions symmetric about the listener.

The placement depends on the number of sources:
- Two sources are placed on a straight line passing through the listener, symmetrically on either side of the listener, as shown in FIG. 5(a).
- Three sources are placed at the vertices of an equilateral triangle centered on the listener, as shown in FIG. 5(b).
- Four sources are placed at the vertices of a square or a regular tetrahedron centered on the listener.
Arranging sources isotropically in this way lets the listener easily distinguish each speaker's speech.
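In the two-dimensional case, the layouts of FIGS. 5(a) and 5(b) amount to spacing the sources at equal angles around the listener. The minimal sketch below shows this, together with the regular-tetrahedron option for four sources in three dimensions. The `offset_deg` parameter is an illustrative assumption. For example, an offset of 90 puts two sources at hard left and hard right, which suits a stereo receiver. `centroid` checks isotropy: the mean of the unit direction vectors is zero when the layout is symmetric about the listener.

```python
import math
from typing import List, Tuple


def isotropic_azimuths(n: int, offset_deg: float = 0.0) -> List[float]:
    """Azimuths in degrees for n sources spaced evenly around the listener.

    n = 2 gives two sources on a line through the listener, n = 3 the
    vertices of an equilateral triangle, n = 4 the corners of a square.
    """
    if n <= 0:
        return []
    step = 360.0 / n
    return [(offset_deg + k * step) % 360.0 for k in range(n)]


def tetrahedron_directions() -> List[Tuple[float, float, float]]:
    """Unit vectors to the vertices of a regular tetrahedron (3-D, 4 sources)."""
    s = 1.0 / math.sqrt(3.0)
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]


def centroid(azimuths: List[float]) -> Tuple[float, float]:
    """Mean of the 2-D unit direction vectors; (0, 0) for an isotropic layout."""
    xs = [math.sin(math.radians(a)) for a in azimuths]
    ys = [math.cos(math.radians(a)) for a in azimuths]
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

For example, `isotropic_azimuths(3)` returns `[0.0, 120.0, 240.0]`, the triangle of FIG. 5(b).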

As another example of a sound image synthesis rule, the distance between the listener and each sound source may be set after the source directions have been determined as in FIGS. 5(a) and 5(b). For example, based on each speaker's volume, louder sources may be placed near the listener and quieter sources farther away.
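The source says only that louder sources go nearer and quieter ones farther. The minimal sketch below assumes a linear mapping from level to virtual distance between two hypothetical limits, `near` and `far`. The loudest current talker sits at `near` and the quietest at `far`.

```python
from typing import Dict


def distances_from_levels(
    levels: Dict[str, float], near: float = 1.0, far: float = 3.0
) -> Dict[str, float]:
    """Virtual listener-to-source distance per terminal (louder = closer).

    levels: terminal id -> level in dB (any consistent scale)
    near/far: hypothetical distance limits in arbitrary units
    """
    if not levels:
        return {}
    lo, hi = min(levels.values()), max(levels.values())
    if hi == lo:
        return {tid: near for tid in levels}  # all equally loud: all near
    return {
        tid: far - (lvl - lo) / (hi - lo) * (far - near)
        for tid, lvl in levels.items()
    }
```

The direction of each source comes from the isotropic layout, and only the distance is derived from volume. Changing the distances therefore never breaks the symmetric arrangement.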

As another example, audio signals whose volume exceeds a certain level may be extracted as utterances, and only those signals may be synthesized into the sound image. Compared with fixing each speaker's position, this allows the utterances to be placed farther apart, so the listener can distinguish each speaker's speech more easily. In this case, the volume of each utterance may additionally be adjusted so that all utterances are equally loud.
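The rule above can be sketched as three small steps:
1. Keep only the signals above a threshold (level A).
2. Compute a per-utterance gain that equalizes loudness toward a hypothetical target level.
3. Spread only the current utterances evenly around the listener.

Levels are assumed to be in dB, and sorting by terminal id is an arbitrary choice to make the layout deterministic.

```python
from typing import Dict


def extract_utterances(levels: Dict[str, float], level_a: float) -> Dict[str, float]:
    """Keep only signals loud enough to count as utterances (level > A)."""
    return {tid: lvl for tid, lvl in levels.items() if lvl > level_a}


def equalizing_gains_db(utterances: Dict[str, float], target_db: float) -> Dict[str, float]:
    """Gain in dB that brings every utterance to the same target level."""
    return {tid: target_db - lvl for tid, lvl in utterances.items()}


def place_utterances(utterances: Dict[str, float]) -> Dict[str, float]:
    """Spread only the current utterances evenly around the listener (degrees)."""
    ids = sorted(utterances)
    if not ids:
        return {}
    step = 360.0 / len(ids)
    return {tid: k * step for k, tid in enumerate(ids)}
```

A dB gain `g` converts to a linear multiplier as `10 ** (g / 20)`. Because silent terminals take no slot, two simultaneous talkers always end up 180 degrees apart, however many terminals are connected.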

As another example, two thresholds may be used. Level A is the level at which a signal is regarded as an utterance, and level B is a higher level. An utterance whose volume exceeds level B is regarded as a loud utterance. The loud utterances are arranged isotropically first. The remaining utterances, which do not exceed level B, are then arranged as isotropically as possible in the spaces between the loud utterances.
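The source does not say how the quieter utterances are spread into the gaps. The minimal sketch below makes three assumptions:
- loud utterances are evenly spaced first;
- the remaining utterances are dealt round-robin, loudest first, into the gaps between consecutive loud ones, and each gap is then subdivided evenly;
- if nothing exceeds level B, the level-A utterances are simply laid out evenly.

```python
from typing import Dict, List


def two_level_layout(
    levels: Dict[str, float], level_a: float, level_b: float
) -> Dict[str, float]:
    """Azimuths in degrees: loud utterances (> B) first, the rest (> A) in the gaps."""
    by_level = lambda t: -levels[t]  # loudest first
    loud = sorted((t for t, v in levels.items() if v > level_b), key=by_level)
    normal = sorted((t for t, v in levels.items() if level_a < v <= level_b), key=by_level)
    if not loud:
        loud, normal = normal, []  # nothing exceeds B: lay out the A-level ones evenly
    if not loud:
        return {}
    width = 360.0 / len(loud)
    layout = {t: k * width for k, t in enumerate(loud)}
    # Deal the remaining utterances round-robin into the gaps after each loud one.
    gaps: List[List[str]] = [[] for _ in loud]
    for i, t in enumerate(normal):
        gaps[i % len(loud)].append(t)
    for g, members in enumerate(gaps):
        start = g * width
        for j, t in enumerate(members):
            layout[t] = start + (j + 1) * width / (len(members) + 1)
    return layout
```

With two loud and two quieter talkers, this yields the square layout 0, 90, 180, 270, with the loud talkers facing each other across the listener.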

As another example, the call partners are identified using the phone book function of the communication terminal 11, and information related to each call partner is extracted from the phone book data. Based on that information, the call partners are grouped or ranked. Sound source localization and volume adjustment of each partner's audio signal are then performed per group or according to rank.

As another example, the call partners are identified using the phone book function of the communication terminal 11, and two kinds of information are extracted for each partner:
- from the call log, such as the number of past calls and the call durations;
- from the email history, such as the number of emails exchanged and the size of those emails.
Based on this information, the call partners are grouped or ranked. Sound source localization and volume adjustment of each partner's audio signal are then performed per group or according to rank.
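The source does not say how the call-log and mail records are combined into a ranking. The sketch below assumes three things, all purely illustrative:
- a weighted sum with hypothetical weights;
- ranks mapped to volume in fixed steps of a few dB;
- groups formed as consecutive slices of the ranking.

The `ContactStats` fields stand in for whatever the terminal's call log and mail history actually provide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContactStats:
    calls: int         # number of past calls (call log)
    call_seconds: int  # total call duration in seconds (call log)
    mails: int         # number of mails sent and received (mail history)
    mail_bytes: int    # total size of those mails in bytes (mail history)


# Hypothetical weights: the source does not say how these records are combined.
W_CALLS, W_SECONDS, W_MAILS, W_BYTES = 1.0, 1.0 / 60.0, 0.5, 1.0 / 10_000.0


def closeness(s: ContactStats) -> float:
    """Weighted score; higher means a closer call partner."""
    return (W_CALLS * s.calls + W_SECONDS * s.call_seconds
            + W_MAILS * s.mails + W_BYTES * s.mail_bytes)


def rank_partners(stats: Dict[str, ContactStats]) -> List[str]:
    """Call partners ordered from closest to least close."""
    return sorted(stats, key=lambda t: closeness(stats[t]), reverse=True)


def gains_by_rank(ranked: List[str], step_db: float = 3.0) -> Dict[str, float]:
    """Top-ranked partner at 0 dB, each following rank step_db quieter."""
    return {t: 0.0 - step_db * k for k, t in enumerate(ranked)}


def group_by_rank(ranked: List[str], size: int) -> List[List[str]]:
    """Split the ranking into consecutive groups of `size` partners."""
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

Each group could then be assigned its own sector around the listener, using the same isotropic spacing as for individual talkers.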

Some sound image synthesis rules, as described above, rely on the phone book function or similar functions of the communication terminal 11. In that case, the control device 27 uses those functions to fill in the detailed settings of the rule before transmitting it in step S12. Sound image synthesis rules may combine the examples above. Beyond these examples, various other forms can be adopted depending on the communication system 10 and the user's purpose.

When the user changes the sound image synthesis rule, the new rule is stored in the storage device 32 of the call exchange device 12 through steps S12 and S13. From then on, the sound image synthesis processing unit 35 of the call exchange device 12 synthesizes sound images based on the new rule (steps S14 and S15).

The rule can be changed during a call in several ways:
- The user can select, from the multiple rules stored in the storage device 23 of the communication terminal 11, a rule different from the one currently selected.
- The user can swap the positions of the speakers' sound sources.
- The user can swap the speakers' relative volumes.
- The user can swap both the positional relationship and the volume relationship of the speakers at the same time.

In this embodiment, the listener can change the sound image synthesis rule during a call. If the listener finds it hard to distinguish the speakers under the current rule, changing the rule can make them easier to tell apart. Likewise, if the current rule does not match the listener's intention, the listener can switch to a preferred one and is not inconvenienced.

For example, suppose two speakers talk at the same time during multi-talk and their sound sources are placed close to each other. The listener then has difficulty distinguishing their speech. Placing the two speakers' sound sources far apart makes their speech easier to distinguish.

As another case, while several speakers talk at once, the listener may want to focus on one particular speaker whose speech is masked by the others. Rearranging the speakers' sound sources or adjusting their volumes so that this speaker stands out makes the speech easier to hear. In addition, the virtual distance between each speaker's sound source and the listener can be adjusted according to that speaker's volume. This makes the speakers even easier to distinguish and gives the call a sense of presence.

In the communication system 10 of this embodiment, the sound image synthesis processing unit 35 synthesizes the audio signals from the other communication terminals 11 in the call into a sound image. It does so based on the sound image synthesis rule received from the communication terminal 11, producing an optimal sound image that the listener can easily distinguish. Thus, even when several speakers talk at the same time, the listener can easily distinguish each speaker's speech.

In the communication system 10 of the first embodiment, the call state detection unit 34 and the sound image synthesis processing unit 35 need not both be provided in the call exchange device 12. For example, either or both may be provided in the control device 27 of the communication terminal 11.

The communication system according to the second embodiment of the present invention is configured as a videophone multi-talk system. In addition to the functions shown in FIG. 2, the communication terminal 11 has a camera (not shown). During a call, the communication terminal 11 captures video, and the communication device 24 transmits the captured video information to the call exchange device 12. The communication device 24 also receives video information from the call exchange device 12 during a call, and the video is displayed on the display device 22.

FIG. 6 is a block diagram of the control device 33 of the call exchange device 12 in this embodiment. In addition to the call state detection unit 34 and the sound image synthesis processing unit 35, the control device 33 includes a video synthesis processing unit (video synthesizing device) 36. This unit synthesizes the video received from the communication terminals 11 in the call.

FIG. 7 is a flowchart of a call method using the communication system of this embodiment. First, a video synthesis rule is stored in advance in the storage device 23 of the communication terminal 11 (step S21). Video synthesis combines the video of multiple speakers into one video, and the video synthesis rule defines how this synthesis is performed. Like the sound image synthesis rule, the video synthesis rule can be set in advance by the listener with the input device 21 and can be changed even during a call. Alternatively, several candidate rules may be stored in the storage device 23 beforehand so that the listener can choose among them via the input device 21.

Once three or more communication terminals 11, including the terminal itself, are connected simultaneously and videophone multi-talk is established, the control device 27 of the communication terminal 11 reads the video synthesis rule from the storage device 23. It then transmits the rule to the call exchange device 12 via the communication device 24 (step S22). The control device 33 of the call exchange device 12 stores the received rule in the storage device 32. It then starts the call state detection unit 34, the sound image synthesis processing unit 35, and the video synthesis processing unit 36 (step S23).

The call state detection unit 34 detects the current call state and generates communication state information. This information associates two items for each communication terminal 11 connected to the terminal 11 that sent the video synthesis rule: the terminal's identification information and the volume of its audio signal. The unit outputs this information to the video synthesis processing unit 36 (step S24).

The video synthesis processing unit 36 works from the communication state information and the video synthesis rule. From these, it determines some or all of the placement, size, motion, and display state of the video received from each communication terminal 11, and composes the received video accordingly (step S25). It then performs two actions (step S26):
- it transmits the composed video to the communication terminal 11 that sent the video synthesis rule; and
- it generates video synthesis information associating the identification information of each communication terminal 11 with the placement, size, motion, and display state of that terminal's video, and outputs this information to the sound image synthesis processing unit 35.

The sound image synthesis processing unit 35 synthesizes a sound image from the audio received from each communication terminal 11 on the basis of the video synthesis information (step S27), and transmits the synthesized sound image to the communication terminal 11 that sent the video composition rule (step S28). That communication terminal 11 receives the video information from the video synthesis processing unit 36 and displays the video on the display device 22 (step S29), and receives the sound image information from the sound image synthesis processing unit 35 and reproduces the sound image through the receiver device 25 (step S30).
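The ordering of steps S25 to S27 is the key point: the sound image is derived from the video synthesis information rather than computed independently. A minimal sketch under assumed simplifications (volumes given as dBFS levels, talkers above a hypothetical threshold shown side by side, one stereo pan value per talker) could look like this; `TilePlacement`, `compose_video`, `compose_sound_image`, and `exchange_step` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TilePlacement:
    """One entry of the video synthesis information (hypothetical layout)."""
    terminal_id: str
    x: float      # horizontal centre of the tile: 0.0 = left edge, 1.0 = right edge
    width: float  # tile width as a fraction of the screen


def compose_video(volumes: dict[str, float], threshold: float) -> list[TilePlacement]:
    """Steps S25-S26 (sketch): place talkers above the threshold side by side."""
    speaking = sorted(tid for tid, vol in volumes.items() if vol > threshold)
    n = len(speaking)
    return [TilePlacement(tid, (i + 0.5) / n, 1.0 / n) for i, tid in enumerate(speaking)]


def compose_sound_image(tiles: list[TilePlacement]) -> dict[str, float]:
    """Step S27 (sketch): pan each talker to its tile, -1.0 = hard left, +1.0 = hard right."""
    return {tile.terminal_id: 2.0 * tile.x - 1.0 for tile in tiles}


def exchange_step(volumes: dict[str, float], threshold: float = -40.0):
    """Run S25-S27 for one receiving terminal; the sound image follows the video layout."""
    tiles = compose_video(volumes, threshold)   # video synthesis information
    pans = compose_sound_image(tiles)           # derived from it, not from the audio alone
    return tiles, pans
```

For example, `exchange_step({"b": -20.0, "c": -10.0, "d": -60.0})` shows only terminals `b` and `c`, panned to -0.5 and +0.5 respectively.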

As one example of a video composition rule, the size and arrangement of each video may be set according to the volume of the corresponding audio signal. In this case, the video of a particular speaker may be overlaid on the videos of the other speakers. As another example, audio signals whose volume exceeds a certain level may be extracted as speech, and the videos of the speakers who are currently speaking may be displayed relatively large. Alternatively, only the videos of the speakers who are speaking may be placed on the screen. In these cases, the videos of the active speakers may be arranged isotropically on the screen.
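These volume-dependent rules admit many concrete forms. One hypothetical reading, sketched below, shows only talkers above a threshold, scales each tile linearly with its volume above that threshold, and interprets "arranged isotropically" as equal angular spacing on a circle around the screen centre. The size range, the radius, and the function name are assumed values for illustration.

```python
import math


def layout_by_volume(volumes, threshold, min_size=0.15, max_size=0.45, radius=0.3):
    """Return {terminal_id: (cx, cy, size)} in normalised screen coordinates (0..1).

    Only talkers whose volume exceeds `threshold` are shown. Tile size grows
    linearly from min_size to max_size with volume above the threshold, and
    the tiles sit at equal angles on a circle around the screen centre.
    """
    speaking = sorted((tid, vol) for tid, vol in volumes.items() if vol > threshold)
    if not speaking:
        return {}
    loudest = max(vol for _, vol in speaking)
    n = len(speaking)
    layout = {}
    for i, (tid, vol) in enumerate(speaking):
        # loudest > threshold, so the denominator is never zero
        frac = (vol - threshold) / (loudest - threshold)
        size = min_size + frac * (max_size - min_size)
        if n == 1:
            cx, cy = 0.5, 0.5  # a single talker goes to the centre
        else:
            angle = 2.0 * math.pi * i / n
            cx, cy = 0.5 + radius * math.cos(angle), 0.5 + radius * math.sin(angle)
        layout[tid] = (cx, cy, size)
    return layout
```

The overlay variant mentioned above could be obtained by drawing tiles in ascending order of size, so that the loudest talker ends up on top.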

As another example of a video composition rule, the video of each speaker may be placed at the edge of the screen, or not displayed at all. When the volume of a speaker's audio signal exceeds a certain level, this is regarded as speech, and that speaker's video is moved from the edge of the screen toward the center. In this case, the videos of the active speakers may be arranged isotropically near the center of the screen.
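A possible sketch of this edge-to-centre rule, assuming a per-frame exponential glide toward a target position: speaking talkers head for evenly spaced points near the centre, and silent ones return to a parked position at the edge (a parked position off-screen would model "not displayed"). The `rate`, the radius, and both function names are illustrative assumptions.

```python
import math


def centre_targets(speaking, centre=(0.5, 0.5), radius=0.15):
    """Evenly spaced target points on a small circle around the screen centre."""
    n = len(speaking)
    if n == 1:
        return {speaking[0]: centre}
    return {tid: (centre[0] + radius * math.cos(2 * math.pi * i / n),
                  centre[1] + radius * math.sin(2 * math.pi * i / n))
            for i, tid in enumerate(speaking)}


def step_positions(positions, parked, volumes, threshold, rate=0.2):
    """Advance every tile by one animation frame.

    positions: current (x, y) of each tile; parked: its home position at the
    screen edge. Talkers above the threshold glide toward the centre, and all
    others glide back home. Each frame covers `rate` of the remaining distance.
    """
    speaking = sorted(tid for tid in positions
                      if volumes.get(tid, float("-inf")) > threshold)
    targets = centre_targets(speaking)
    moved = {}
    for tid, (x, y) in positions.items():
        tx, ty = targets.get(tid, parked[tid])
        moved[tid] = (x + rate * (tx - x), y + rate * (ty - y))
    return moved
```

Calling `step_positions` once per display frame yields a smooth movement that slows as the tile approaches its target.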

As another example of a video composition rule, an utterance whose audio volume exceeds a level B, which is higher than the level A at which audio is regarded as speech, is treated as a loud utterance. The videos corresponding to loud utterances (volume above level B) are first arranged isotropically, with priority. The videos corresponding to utterances whose volume does not exceed level B are then arranged, as isotropically as possible, in the spaces between the positions of the loud utterances.
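The two-level rule can be sketched on a circle around the screen centre: talkers above level B take evenly spaced angles first, and talkers between levels A and B are dealt round-robin into the gaps between them and spread evenly within each gap. This is one assumed interpretation of "as isotropically as possible", not the only one, and `arrange_two_levels` is a hypothetical name.

```python
import math


def arrange_two_levels(volumes, level_a, level_b):
    """Return {terminal_id: angle in radians} on a circle around the screen centre.

    Volumes above level_b are "loud"; volumes above level_a but not above
    level_b are ordinary speech; anything at or below level_a is not shown.
    """
    if level_b <= level_a:
        raise ValueError("level B must be higher than level A")
    loud = sorted(tid for tid, vol in volumes.items() if vol > level_b)
    quiet = sorted(tid for tid, vol in volumes.items() if level_a < vol <= level_b)
    if not loud:
        # No loud talkers: spread the ordinary talkers evenly on their own.
        return {tid: 2.0 * math.pi * i / len(quiet) for i, tid in enumerate(quiet)}
    m = len(loud)
    gap = 2.0 * math.pi / m
    angles = {tid: i * gap for i, tid in enumerate(loud)}  # loud talkers placed first
    # Deal ordinary talkers round-robin into the m gaps between loud talkers ...
    buckets = [[] for _ in range(m)]
    for i, tid in enumerate(quiet):
        buckets[i % m].append(tid)
    # ... then spread each gap's share evenly inside that gap.
    for j, bucket in enumerate(buckets):
        for q, tid in enumerate(bucket):
            angles[tid] = j * gap + gap * (q + 1) / (len(bucket) + 1)
    return angles
```

With two loud and two ordinary talkers, the loud ones land at 0 and π and the ordinary ones at π/2 and 3π/2.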

As another example of a video composition rule, each call partner is identified using the telephone directory function of the communication terminal 11, and information related to that partner is extracted from the telephone directory data. The call partners are then grouped or ranked based on the extracted information, and the arrangement, size, movement, and display state of each partner's video are set per group or according to rank.
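As a hypothetical example of grouping by telephone directory data, the sketch below looks each caller up in a directory, maps the entry's group to a display style, and gives numbers that are not in the directory a default style. The record layout, the group names, and the sizes are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneBookEntry:
    """Hypothetical telephone directory record."""
    name: str
    group: str  # e.g. "family" or "work"


# Hypothetical per-group display settings chosen by the user.
GROUP_STYLE = {
    "family": {"size": 0.4, "row": 0},
    "work": {"size": 0.25, "row": 1},
}
DEFAULT_STYLE = {"size": 0.15, "row": 2}  # callers not found in the directory


def style_by_group(caller_numbers: list[str],
                   phone_book: dict[str, PhoneBookEntry]) -> dict[str, dict]:
    """Look up each caller, then map the entry's group to a display style."""
    styles = {}
    for number in caller_numbers:
        entry = phone_book.get(number)
        group = entry.group if entry is not None else None
        # copy so that callers cannot mutate the shared style tables
        styles[number] = dict(GROUP_STYLE.get(group, DEFAULT_STYLE))
    return styles
```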

As another example of a video composition rule, each call partner is identified using the telephone directory function of the communication terminal 11. Information such as the number of past calls with that partner and the call duration is extracted from the call log, and information such as the number of emails exchanged with that partner and the size of those emails is extracted from the email log. The call partners are then grouped or ranked based on this information, and the arrangement, size, movement, and display state of each partner's video are set per group or according to rank.
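Ranking by communication history could be done in many ways. The sketch below assumes one simple scheme: each of the four quantities named above is normalised by its maximum over all call partners, the normalised values are summed with (assumed) equal weights, and tile size then decreases with rank. `History`, `WEIGHTS`, `rank_partners`, and the size table are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class History:
    """Hypothetical per-partner counters taken from the call and email logs."""
    calls: int
    call_seconds: int
    mails: int
    mail_bytes: int


# Assumed weights: every metric counts equally.
WEIGHTS = {"calls": 1.0, "call_seconds": 1.0, "mails": 1.0, "mail_bytes": 1.0}


def rank_partners(history: dict[str, History]) -> list[str]:
    """Return partner ids, most familiar first (ties broken by id)."""
    ids = list(history)
    scores = {pid: 0.0 for pid in ids}
    for f in fields(History):
        column = [getattr(history[pid], f.name) for pid in ids]
        top = max(column, default=0) or 1  # avoid dividing by zero when nobody has history
        for pid, value in zip(ids, column):
            scores[pid] += WEIGHTS[f.name] * value / top
    return sorted(ids, key=lambda pid: (-scores[pid], pid))


def size_by_rank(ranked: list[str], sizes=(0.4, 0.3, 0.2)) -> dict[str, float]:
    """Largest tile for rank 0; everyone past the table gets the smallest size."""
    return {pid: sizes[min(rank, len(sizes) - 1)] for rank, pid in enumerate(ranked)}
```

Grouping instead of ranking would follow the same pattern, for example by cutting the score at fixed thresholds.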

With these rules, the added visual cue helps the listener distinguish each speaker's speech more easily, even when several speakers talk at the same time. When a video composition rule uses the telephone directory function or similar functions of the communication terminal 11, as described above, the control device 27 of the communication terminal 11 uses those functions and fills in the detailed settings of the video composition rule before transmitting the rule in step S22. A video composition rule may combine any of the examples above, and various other forms may also be adopted depending on the communication system and the user's purpose.

When synthesizing the sound image in step S27, the position and direction of each speaker's sound source can, for example, be set to correspond to the position of that speaker's video. Because the video and the sound image then coincide, the listener can distinguish each speaker's speech more easily when several speakers talk at the same time.
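To make the sound source direction follow the video position, one simple stand-in for a sound source localization speaker is a stereo pair driven with constant-power panning: a talker whose tile sits at horizontal position x (0 = left edge, 1 = right edge) gets gains cos(xπ/2) and sin(xπ/2), so the total power stays constant wherever the tile is. This is an assumed simplification; an actual terminal might instead use multichannel or HRTF-based rendering, and `pan_gains` and `mix_stereo` are hypothetical names.

```python
from __future__ import annotations

import math


def pan_gains(x: float) -> tuple[float, float]:
    """Constant-power (left, right) gains for a tile centred at x in [0, 1]."""
    x = min(max(x, 0.0), 1.0)  # clamp positions that fall off-screen
    theta = x * math.pi / 2.0
    return math.cos(theta), math.sin(theta)  # gl**2 + gr**2 == 1


def mix_stereo(frames: dict[str, list[float]],
               x_positions: dict[str, float]) -> tuple[list[float], list[float]]:
    """Mix equal-length mono frames into one stereo frame, panned to each tile."""
    n = len(next(iter(frames.values()))) if frames else 0
    left, right = [0.0] * n, [0.0] * n
    for tid, samples in frames.items():
        gl, gr = pan_gains(x_positions[tid])
        for i, s in enumerate(samples):
            left[i] += gl * s
            right[i] += gr * s
    return left, right
```

Feeding the tile positions from the video synthesis information into `x_positions` keeps each voice aligned with its face on the screen.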

According to the communication system of this embodiment, the video and the sound image can be made to correspond with each other during a videophone multitalk call. Combined with this visual effect, the listener can easily distinguish each speaker's speech even when several speakers talk at the same time.

In the communication system of the second embodiment, the call state detection unit 34, the sound image synthesis processing unit 35, and the video synthesis processing unit 36 need not all be provided in the call exchange device 12; for example, any of them may be provided in the control device 27 of the communication terminal 11.

Although the present invention has been described based on its preferred embodiments, the communication system and communication terminal of the present invention are not limited to the configurations of the above embodiments; configurations obtained by applying various modifications and changes to them are also included in the scope of the present invention.

FIG. 1 is a block diagram of a communication system according to a first embodiment of the present invention.
FIG. 2 is a block diagram of the communication terminal of FIG. 1.
FIG. 3 is a block diagram of the call exchange device of FIG. 1.
FIG. 4 is a flowchart showing the operation procedure of the communication system of FIG. 1.
FIGS. 5(a) and 5(b) are diagrams schematically showing an example of a sound image synthesis rule.
FIG. 6 is a block diagram of the control device of the call exchange device in a communication system according to a second embodiment of the present invention.
FIG. 7 is a flowchart showing the operation procedure of the communication system according to the second embodiment of the present invention.

Explanation of symbols

10: communication system
11, 11a to 11d: communication terminals
12: call exchange device
13: communication line
21: input device
22: display device
23: storage device
24: communication device
25: receiver device
26: transmitter device
27: control device
31: communication device
32: storage device
33: control device
34: call state detection unit
35: sound image synthesis processing unit
36: video synthesis processing unit

Claims

A communication system that realizes multitalk by connecting three or more communication terminals having sound source localization speakers to one another via a communication line, the communication system comprising:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal transmitted by that terminal; and
a sound image synthesizing device that receives the communication state information, identifies as a receiving terminal each communication terminal that receives the currently transmitted audio signals, performs sound image synthesis on the currently transmitted audio signals for each receiving terminal based on a sound image synthesis rule set for each identified receiving terminal, and transmits the synthesized sound image to the receiving terminal,
wherein the sound image synthesis rule includes rules for sound source localization and volume adjustment, and
the sound image synthesis rule defines that the sound sources of audio signals exceeding a predetermined volume level are arranged isotropically.

The communication system according to claim 1, wherein the sound image synthesis rule defines that the volumes of audio signals exceeding a predetermined volume level are made uniform.

The communication system according to claim 1 or 2, wherein the sound image synthesis rule defines that the distance of the sound source of an audio signal is set according to its volume.

The communication system according to claim 1, wherein the sound image synthesis rule defines that audio signals are sorted into a plurality of stages according to their volume, and the sound sources of the audio signals are arranged for each of the sorted stages.

A communication system that realizes videophone multitalk by connecting three or more communication terminals having sound source localization speakers and video display devices to one another via a communication line, the communication system comprising:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio signals and video signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal transmitted by that terminal;
a video synthesizing device that receives the communication state information, identifies as a receiving terminal each communication terminal that receives the currently transmitted audio and video signals, performs video synthesis on the currently transmitted video signals for each receiving terminal based on a video composition rule set for each identified receiving terminal, transmits the synthesized video to the receiving terminal, and generates video synthesis information on the synthesized video; and
a sound image synthesizing device that performs sound image synthesis on the currently transmitted audio signals for each receiving terminal based on the video synthesis information, and transmits the synthesized sound image to the receiving terminal,
wherein the video composition rule includes a rule for video layout, and
the video composition rule defines that videos corresponding to audio signals exceeding a predetermined volume level are arranged isotropically.

The communication system according to claim 5, wherein the video composition rule defines that only videos corresponding to audio signals exceeding a predetermined volume level are displayed.

The communication system according to claim 5 or 6, wherein the video composition rule defines that the size of each video is set according to the volume of the corresponding audio signal.

The communication system according to claim 5, wherein the video composition rule defines that audio signals exceeding a predetermined volume level are sorted into a plurality of stages according to their volume, and the videos corresponding to the audio signals are arranged for each of the sorted stages.

A communication terminal used in the communication system according to any one of claims 1 to 4, the communication terminal comprising:
a storage device that stores a sound image synthesis rule; and
a control device that reads the sound image synthesis rule from the storage device and transmits it.

The communication terminal according to claim 9, wherein the sound image synthesis rule stored in the storage device can be set or changed via an input device.

The communication terminal according to claim 9, wherein the storage device stores a plurality of sound image synthesis rule candidates, and
a desired sound image synthesis rule can be selected from the stored candidates via an input device.

The communication terminal according to claim 9, wherein the control device has:
a function of identifying, by using a telephone directory function, the user of a communication terminal connected via the communication line and extracting information on the identified user; and
a function of setting the sound image synthesis rule based on the information on the identified user.

The communication terminal according to claim 12, wherein the information on the identified user is the number of past calls with the identified user, the call duration, the number of emails sent and received, or the size of the emails sent and received, and
the control device groups or ranks the plurality of identified users when setting the sound image synthesis rule.

A communication terminal used in the communication system according to any one of claims 5 to 8, the communication terminal comprising:
a storage device that stores a video composition rule; and
a control device that reads the video composition rule from the storage device and transmits it.

The communication terminal according to claim 14, wherein the video composition rule stored in the storage device can be set or changed via an input device.

The communication terminal according to claim 14, wherein the storage device stores a plurality of video composition rule candidates, and
a desired video composition rule can be selected from the stored candidates via an input device.

The communication terminal according to claim 14, wherein the control device has:
a function of identifying, by using a telephone directory function, the user of a communication terminal connected via the communication line and extracting information on the identified user; and
a function of setting the video composition rule based on the information on the identified user.

The communication terminal according to claim 17, wherein the information on the identified user is the number of past calls with the identified user, the call duration, the number of emails sent and received, or the size of the emails sent and received, and
the control device groups or ranks the plurality of identified users when setting the video composition rule.

A communication terminal used in the communication system according to any one of claims 1 to 4,
the communication terminal comprising a sound image synthesizing device that receives the audio signals and the communication state information transmitted from the connected communication terminals, and performs sound image synthesis on the currently transmitted audio signals based on a sound image synthesis rule set in the terminal itself.

A communication terminal used in the communication system according to any one of claims 5 to 8,
the communication terminal comprising a video synthesizing device that receives the video signals and the communication state information transmitted from the connected communication terminals, and performs video synthesis on the currently transmitted video signals based on a video composition rule set in the terminal itself.

A communication terminal used in the communication system according to any one of claims 5 to 8,
the communication terminal comprising a sound image synthesizing device that receives the video signals and the video synthesis information transmitted by the connected communication terminals, and performs sound image synthesis on the currently transmitted audio signals based on the received video synthesis information.

A communication terminal used in a communication system,
the communication system
realizing multitalk by connecting three or more communication terminals having sound source localization speakers to one another via a communication line, and comprising:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal transmitted by that terminal; and
a sound image synthesizing device that receives the communication state information, identifies as a receiving terminal each communication terminal that receives the currently transmitted audio signals, performs sound image synthesis on the currently transmitted audio signals for each receiving terminal based on a sound image synthesis rule set for each identified receiving terminal, and transmits the synthesized sound image to the receiving terminal,
the communication terminal comprising:
a storage device that stores the sound image synthesis rule; and
a control device that reads the sound image synthesis rule from the storage device and transmits it,
wherein the control device has:
a function of identifying, by using a telephone directory function, the user of a communication terminal connected via the communication line and extracting information on the identified user; and
a function of setting the sound image synthesis rule based on the information on the identified user,
the information on the identified user is the number of past calls with the identified user, the call duration, the number of emails sent and received, or the size of the emails sent and received, and
the control device groups or ranks the plurality of identified users when setting the sound image synthesis rule.

A communication terminal used in a communication system,
the communication system
realizing videophone multitalk by connecting three or more communication terminals having sound source localization speakers and video display devices to one another via a communication line, and comprising:
a communication state detection unit that identifies, among the connected communication terminals, one or more communication terminals that are transmitting audio signals and video signals, and generates communication state information associating each identified communication terminal with the volume of the audio signal transmitted by that terminal;
a video synthesizing device that receives the communication state information, identifies as a receiving terminal each communication terminal that receives the currently transmitted audio and video signals, performs video synthesis on the currently transmitted video signals for each receiving terminal based on a video composition rule set for each identified receiving terminal, transmits the synthesized video to the receiving terminal, and generates video synthesis information on the synthesized video; and
a sound image synthesizing device that performs sound image synthesis on the currently transmitted audio signals for each receiving terminal based on the video synthesis information, and transmits the synthesized sound image to the receiving terminal,
the communication terminal comprising:
a storage device that stores the video composition rule; and
a control device that reads the video composition rule from the storage device and transmits it,
wherein the control device has:
a function of identifying, by using a telephone directory function, the user of a communication terminal connected via the communication line and extracting information on the identified user; and
a function of setting the video composition rule based on the information on the identified user,
the information on the identified user is the number of past calls with the identified user, the call duration, the number of emails sent and received, or the size of the emails sent and received, and
the control device groups or ranks the plurality of identified users when setting the video composition rule.