JP2009060220A

JP2009060220A - Communication system and communication program

Info

Publication number: JP2009060220A
Application number: JP2007223838A
Authority: JP
Inventors: Yorihiro Yamatani; 自広山谷
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2007-08-30
Filing date: 2007-08-30
Publication date: 2009-03-19

Abstract

PROBLEM TO BE SOLVED: To enable smooth communication in which persons having a dialogue between different points can easily hear each other's voices.

SOLUTION: The positions of the persons communicating between the different points X and Y are determined, and on the basis of the determination results, at least one of the following is performed: an operation of changing the sensitivity of microphones 104C, 204A to the voice at those positions, or an operation of increasing the volume of the voice output from speakers 105C, 205A toward those positions.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a communication system and a communication program that enable communication between different points via a network.

Communication systems that enable communication between different points via a network are advancing day by day, driven by increases in communication line capacity, as typified by broadband, and by the higher performance of the computers in such systems. Some communication systems exchange high-quality color images together with audio bidirectionally in real time. For example, among video conference systems used in business, there are systems that let a user converse, with a sense of presence, with several remote participants displayed at life size. A technique for such a bidirectional video conference system is described in Patent Document 1. In the video conference system of Patent Document 1, participants gathered at each conference location can talk to one another through displays and microphones.
Patent Document 1: JP-T-2001-517395

In a video conference system, multiple participants converse about a common agenda, so the voices of the other participants do not obstruct communication.

However, in a communication system in which remote offices are permanently connected through large-screen displays or the like, several people in an office may hold conversations with different remote partners at the same time. In that case, voices from people other than one's own conversation partner become an obstacle: the partner's voice may be inaudible or hard to hear, and smooth communication cannot be achieved.

Accordingly, an object of the present invention is to provide a communication system and a communication program in which persons having a dialogue between different points can easily hear each other's voices.

To achieve the above object, a communication system according to the present invention is
a communication system that enables communication between different points via a network, comprising:
a camera that photographs a talker;
a display that shows images captured by the camera;
a microphone that converts the talker's voice into an audio signal;
a speaker that outputs, as sound, the audio signal converted by the microphone; and
a control unit that controls operations within the communication system,
wherein the control unit determines the positions of interlocutors communicating between the different points and, based on the determination result, executes at least one of an operation of changing the sensitivity of the microphone to voice at those positions and an operation of increasing the volume of the voice output from the speaker toward those positions.

A communication program according to the present invention is
a communication program that uses a computer to enable communication between different points in a communication system having:
a camera that photographs a talker;
a display that shows images captured by the camera;
a microphone that converts the talker's voice into an audio signal; and
a speaker that outputs, as sound, the audio signal converted by the microphone,
the program causing the computer to execute:
a determination step of determining the positions of interlocutors communicating between the different points; and
an operation step of executing, based on the determination result of the determination step, at least one of an operation of changing the sensitivity of the microphone to voice at those positions and an operation of increasing the volume of the voice output from the speaker toward those positions.

With the communication system and communication program according to the present invention, persons having a dialogue between different points can easily hear each other's voices, and smooth communication can be achieved.

FIG. 1 is a schematic diagram of a communication system according to the present invention.

Room X and room Y, which are at different points, are connected via a network 3 so that two-way communication is possible.

A display 102 is installed in room X, and a display 202 is installed in room Y. The display 102 shows video of room Y, and the display 202 shows video of room X. For example, a person "A" in room X can talk with a person "B" in room Y using the communication system shown in FIG. 1.

FIG. 2 is a block diagram of the communication system according to the present invention and shows a representative control configuration.

A dialogue system 1 is installed in room X, and a dialogue system 2 is installed in room Y. The dialogue system 1 and the dialogue system 2 are connected via the network 3 and together constitute the communication system as a whole. Because the two dialogue systems have the same configuration, each component is described below with reference to the dialogue system 1.

The dialogue system 1 consists of a PC 101, a display 102, a camera 103, a microphone 104, and a speaker 105. The PC (computer) 101 sends signals to and receives signals from the dialogue system 2. The display 102, the speaker 105, and the like are connected to the PC 101, and the PC 101 controls their operation according to a predetermined program.

A CPU (Central Processing Unit) 101A controls the overall operation of the PC (computer) 101 and is connected to a ROM (Read Only Memory) 101B, a RAM (Random Access Memory) 101C, and the like. The CPU 101A reads the various control programs stored in the ROM 101B, loads them into the RAM 101C, and controls the operation of each unit. The CPU 101A also executes various processes according to the programs loaded in the RAM 101C, stores the results in the RAM 101C, and then saves those results to a predetermined storage destination. In this embodiment, the CPU 101A constitutes the control unit in cooperation with the ROM 101B and the RAM 101C.

The ROM 101B stores programs, data, and the like in advance. This recording medium is a magnetic or optical recording medium, or a semiconductor memory.

The RAM 101C provides a work area that temporarily stores data and the like processed by the various control programs executed by the CPU 101A.

An HDD (Hard Disk Drive) 101D stores predetermined data. It is built from a stack of metal disks, each coated or vapor-deposited with a magnetic material and spaced at regular intervals; a motor spins the disks at high speed, and a magnetic head is brought close to them to read and write data. The communication program according to the present invention is stored in the HDD 101D (it is also stored in the HDD of the PC 201).

The display 102 shows video of room Y and of the persons in room Y, as captured by the camera 203 of the dialogue system 2. To heighten the sense of presence, the display 102 is preferably a large, high-resolution screen.

The camera 103 captures room X and the persons in room X, and the captured video is delivered to the dialogue system 2 via the network 3.

The microphone 104 picks up the sound produced in room X, and that sound is delivered via the network 3 to the speaker 205 of the dialogue system 2. The microphone 104 needs at least two channels and, to heighten the sense of presence, is preferably a stereo microphone.

The speaker 105 reproduces in room X the sound produced in room Y, and it too needs at least two channels. The speaker 105 may be built into the display 102 or may be separate from it.

Next, the operation of communicating with the communication system of FIG. 2 is described: the positions of the interlocutors are determined, and the sound output from the speakers and related settings are adjusted accordingly. As a concrete example, assume that person A in room X and person B in room Y are talking while positioned relative to the displays as shown in FIG. 3.

As shown in FIG. 3, the camera 103 in room X is installed above the center of the display 102, and the camera 203 in room Y is installed above the center of the display 202. The display 102 is divided into three regions α, β, and γ, and a microphone and a speaker are installed for each region (for example, a microphone 104A and a speaker 105A are installed for the α region of the display). The display 202 in room Y is likewise divided into three regions α, β, and γ, each with its own microphone and speaker. The three regions are only an example; the displays 102 and 202 need only be divided into a plurality of regions.

The video captured by the camera 203 in room Y is shown on the display 102 in room X, and person B in room Y appears in the γ region of the display 102. Because it is easier to talk when standing directly in front of the image of one's partner, person A, who is talking with person B, stands in front of the γ region of the display 102. For the same reason, person B, who is talking with person A, stands in front of the α region of the display 202, where person A appears.

With reference to FIG. 4, the following describes how person A in room X and person B in room Y can, in this situation, talk smoothly without being disturbed by other people's voices.

FIG. 4 is a flowchart of the operation of determining the interlocutors' positions based on the direction of each talker's face and adjusting the audio.

First, the dialogue system 1 or the dialogue system 2 determines whether a human voice has been detected by a microphone (step S1). Detecting a human voice indicates that a dialogue has started, so this determination is made first. Because the purpose is only to determine that a dialogue has started, the method is not limited to voice detection; for example, the system may instead determine whether an interlocutor has pressed a predetermined button on the dialogue system 1 or the dialogue system 2 when starting to talk.
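The start-of-dialogue check in step S1 can be sketched as follows. This is only a minimal illustration: the patent does not say how a human voice is detected, so a simple RMS energy threshold is used here as a stand-in, and the function names and the threshold value are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of one audio frame (samples in the range -1.0..1.0)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dialogue_started(samples, button_pressed=False, threshold=0.05):
    """Step S1 sketch: a dialogue is considered started when the frame is loud
    enough (assumed stand-in for voice detection) or the button was pressed."""
    return button_pressed or rms(samples) >= threshold


silence = [0.0] * 100
speech_like = [0.5, -0.5] * 50
print(dialogue_started(silence))                       # quiet frame, no button
print(dialogue_started(speech_like))                   # loud frame
print(dialogue_started(silence, button_pressed=True))  # button alternative
```

A real system would likely use a proper voice activity detector rather than raw energy, since an energy threshold also fires on non-speech noise.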

When a human voice is detected in step S1 (step S1; Yes), the persons in front of the displays are first identified so that the speaker volume and related settings can be adjusted (step S2). In the example of FIG. 3, person A is in front of the display 102 and person B is in front of the display 202, so person A and person B are identified from the images captured by the camera 103 and the camera 203.

Next, the dialogue destination region on the display is detected from the direction of the identified person's face (step S3). The face direction is detected from the person's face in the images captured by the cameras 103 and 203. One concrete method is to compute image-width parameters for the person's left eye, right eye, mouth, and skin region and compare them with a pre-stored lookup table (for example, the technique described in JP-A-2000-97676). Once the face direction is known, the system works out which region of the display the person is looking at. In the example of FIG. 3, person A in room X is facing the γ region of the display 102 (the region where person B appears), so the γ region is taken as the dialogue destination region on the display 102. Likewise, person B in room Y is facing the α region of the display 202 (the region where person A appears), so the α region is taken as the dialogue destination region on the display 202.
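The last part of step S3, turning a detected face direction into a display region, might look like the sketch below. It assumes things the patent does not specify: that the regions α, β, γ are equal thirds of the display ordered left to right, that the person's horizontal position and distance from the display are known, and that the face direction is available as a yaw angle (0 degrees meaning straight at the display). All names are hypothetical.

```python
import math

REGIONS = ("alpha", "beta", "gamma")  # assumed left-to-right order


def region_at(x, display_width):
    """Region containing horizontal coordinate x, or None if x is off the display."""
    if not 0.0 <= x <= display_width:
        return None
    index = min(int(x / display_width * len(REGIONS)), len(REGIONS) - 1)
    return REGIONS[index]


def destination_region(person_x, distance, yaw_deg, display_width):
    """Project the face direction onto the display plane and look up the region hit."""
    hit_x = person_x + distance * math.tan(math.radians(yaw_deg))
    return region_at(hit_x, display_width)


# A person standing in front of the right third and facing straight ahead:
print(destination_region(person_x=2.5, distance=1.0, yaw_deg=0.0, display_width=3.0))
# A person in the middle turning 45 degrees to the left:
print(destination_region(person_x=1.5, distance=1.0, yaw_deg=-45.0, display_width=3.0))
```

The lookup-table face-direction estimation itself is outside this sketch; only the geometric mapping from direction to region is shown.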

Then, the dialogue destination region on the display 102 and the dialogue destination region on the display 202 detected in step S3 are collated, and the positions of the interlocutors talking between room X and room Y are determined (step S4, the determination step). This determination is performed by either the PC 101 in room X or the PC 201 in room Y.

In the example of FIG. 3, step S3 has detected the γ region of the display 102 and the α region of the display 202 as dialogue destination regions. The image shown in each of those regions corresponds to the person whose face is turned toward the other, so it can be concluded that a dialogue is established between room X and room Y. In the same way, a dialogue can be judged to be established when the β region of the display 102 and the β region of the display 202 are the dialogue destination regions, or when the α region of the display 102 and the γ region of the display 202 are.
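The three region pairings listed above (γ with α, β with β, α with γ) amount to a left-right mirror relation between the two displays. A minimal sketch of the collation in step S4 under that reading follows; the table and function names are hypothetical and are not taken from the patent.

```python
# Region on display 102 -> region on display 202 that must be the other side's
# dialogue destination for a dialogue to be judged established (from the text).
MIRROR = {"alpha": "gamma", "beta": "beta", "gamma": "alpha"}


def dialogue_established(dest_on_102, dest_on_202):
    """Step S4 sketch for one person per room: collate the two destination regions."""
    if dest_on_102 not in MIRROR:
        return False
    return MIRROR[dest_on_102] == dest_on_202


print(dialogue_established("gamma", "alpha"))  # the FIG. 3 situation
print(dialogue_established("gamma", "gamma"))  # regions that do not correspond
```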

In FIG. 3, the interlocutors are person A and person B. Person A's position is determined to be the γ region of the display 102 (where person A is standing), and person B's position is determined to be the α region of the display 202 (where person B is standing).

Once the interlocutors' positions have been determined in step S4, the sensitivity of the microphone in each region where an interlocutor is located is raised (step S5, an operation step), and the speaker in each such region outputs the conversation partner's voice at a higher volume (step S6, an operation step).

In the example of FIG. 3, the sensitivity of the microphone 104C in the γ region of room X (where person A is) is changed so that person A's voice is picked up well, and the sensitivity of the microphone 204A in the α region of room Y (where person B is) is changed so that person B's voice is picked up well. In addition, the speaker 105C in the γ region of room X outputs person B's voice at a higher volume, and the speaker 205A in the α region of room Y outputs person A's voice at a higher volume. It is not necessary to both change the microphone sensitivity and raise the speaker output; performing at least one of the two operations is sufficient. Also, instead of installing multiple microphones and speakers as in FIG. 3, a single microphone and a single speaker may be installed, with the microphone's sensitivity changed for voice coming from the interlocutor's position or the sound output from the speaker increased toward the interlocutor's position.
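Steps S5 and S6 can be sketched as a per-region gain update. The data structure, the gain values, and the boost factors below are illustrative assumptions; the patent only states that sensitivity is changed and output volume is increased, and that at least one of the two operations is performed.

```python
from dataclasses import dataclass


@dataclass
class RegionAudio:
    """Hypothetical per-region settings for one microphone/speaker pair."""
    mic_gain: float = 1.0
    speaker_gain: float = 1.0


def adjust_audio(room, talker_regions, adjust_mic=True, adjust_speaker=True,
                 mic_boost=2.0, speaker_boost=2.0):
    """Raise microphone sensitivity (S5) and/or speaker volume (S6) in the
    regions where interlocutors are located; other regions are left unchanged."""
    if not (adjust_mic or adjust_speaker):
        raise ValueError("at least one of the two operations must be performed")
    for name in talker_regions:
        if adjust_mic:
            room[name].mic_gain *= mic_boost
        if adjust_speaker:
            room[name].speaker_gain *= speaker_boost


room_x = {name: RegionAudio() for name in ("alpha", "beta", "gamma")}
adjust_audio(room_x, ["gamma"])  # person A stands in front of the gamma region
print(room_x["gamma"], room_x["alpha"])
```

Lowering the gains of the non-talker regions instead of, or in addition to, boosting the talker regions would be another way to reach the same goal, but that variant is not described in the text.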

As described above with reference to FIGS. 3 and 4, when the interlocutors' positions are determined and the audio is adjusted through the microphones and speakers based on that result, the voices of people other than the interlocutors do not get in the way even if those people are talking. The interlocutors can hear each other easily and communicate smoothly.

As another method, the interlocutors' positions may be determined from the direction of each talker's line of sight; this is described with reference to FIGS. 5 and 6. Unlike FIG. 3, assume here that persons A and C in room X and person B in room Y are holding a two-to-one conversation while positioned relative to the displays as shown in FIG. 5.

Steps S11 to S12 and S14 to S16 in FIG. 6 are the same as steps S1 to S2 and S4 to S6 in FIG. 4. In step S13 of FIG. 6, the dialogue destination region on the display is detected from the direction of the identified person's line of sight. The gaze direction is detected from the movement of the person's eyes in the images captured by the cameras 103 and 203 (for example, the technique described in JP-A-2000-146553).

In the example of FIG. 5, persons A and C in room X are both looking at the γ region of the display 102 (the region where person B appears), so the γ region is taken as the dialogue destination region on the display 102. Person B in room Y is looking at the α region (where person A appears) and the β region (where person C appears) of the display 202, so the α and β regions are taken as the dialogue destination regions on the display 202.

When the detection in step S13 is complete, the dialogue destination regions on the display 102 and on the display 202 are collated, and the positions of the interlocutors talking between room X and room Y are determined (step S14). Based on that result, the sensitivity of the microphone in each region where an interlocutor is located is raised (step S15), and the speaker in each such region outputs the conversation partner's voice at a higher volume (step S16).

In the example of FIG. 5, the sensitivities of the microphone 104B in the β region of room X (where person C is) and the microphone 104C in the γ region (where person A is) are changed so that the voices of persons A and C are picked up well, and the sensitivity of the microphone 204A in the α region of room Y (where person B is) is changed so that person B's voice is picked up well. In addition, the speaker 105B in the β region and the speaker 105C in the γ region of room X output person B's voice at a higher volume, and the speaker 205A in the α region of room Y outputs person A's voice at a higher volume.
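For the two-to-one case, the collation in step S14 has to handle several talkers and several gaze targets. The sketch below is one possible reading, not the patent's stated algorithm: it assumes the same mirror relation between the two displays as in the one-to-one example, and it treats two people as interlocutors when each is looking at the region where the other appears. All names are hypothetical.

```python
# A person standing in front of region r in one room is assumed to appear in
# region MIRROR[r] on the other room's display.
MIRROR = {"alpha": "gamma", "beta": "beta", "gamma": "alpha"}


def interlocutor_positions(gaze_x, gaze_y):
    """gaze_x maps each occupied region in room X to the set of regions on
    display 102 that the person there is looking at; gaze_y does the same for
    room Y and display 202. Returns the interlocutor regions in X and in Y."""
    talkers_x, talkers_y = set(), set()
    for pos_x, targets_x in gaze_x.items():
        for pos_y, targets_y in gaze_y.items():
            x_looks_at_y = MIRROR[pos_y] in targets_x
            y_looks_at_x = MIRROR[pos_x] in targets_y
            if x_looks_at_y and y_looks_at_x:
                talkers_x.add(pos_x)
                talkers_y.add(pos_y)
    return talkers_x, talkers_y


# FIG. 5: A (gamma) and C (beta) look at gamma; B (alpha) looks at alpha and beta.
gaze_x = {"gamma": {"gamma"}, "beta": {"gamma"}}
gaze_y = {"alpha": {"alpha", "beta"}}
print(interlocutor_positions(gaze_x, gaze_y))
```

With these inputs the function selects the β and γ regions in room X and the α region in room Y, which matches the microphones 104B, 104C, and 204A named in the text.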

When the interlocutors' positions are determined from the direction of each person's line of sight in this way, and the audio is adjusted through the microphones and speakers based on that result, the voices of people other than the interlocutors do not get in the way even if those people are talking. The interlocutors can hear each other easily and communicate smoothly.

Alternatively, as shown in step S23 of the flowchart of FIG. 7, the dialogue destination region on the display may be detected from the direction of the voice emitted by the talker. One concrete method is to detect the direction by frequency analysis of the sound input through the microphones (for example, the technique described in JP-A-9-251299). Steps S21 to S22 and S24 to S26 in FIG. 7 are the same as steps S1 to S2 and S4 to S6 in FIG. 4.

As shown in the flowchart of FIG. 8, the dialogue destination region on the display may also be decided from the location that the talker selects on the display.

When a dialogue takes place between room X and room Y, the user selects the conversation partner on the display. For example, the display 102 or 202 shows the message "Please select the person you want to talk to on the display" to prompt the user to make a selection.

In step S31 of FIG. 8, it is determined whether the user has selected a conversation partner on the display. If a selection has been made, the selected location is judged to be the region where the conversation partner appears, and the dialogue destination region is decided from that location (step S32). Which region was selected can be determined from the coordinates of the selected location on the display. Once the interlocutor's position has been determined in step S32, the sensitivity of the microphone in the region where the interlocutor is located is raised (step S34) and the speaker in that region outputs the conversation partner's voice at a higher volume (step S35), in the same way as the operation described with reference to FIG. 4 and the other figures.
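The coordinate-to-region lookup in step S32 can be sketched as below. The equal-width split into thirds, the left-to-right order of α, β, γ, and the pixel-based interface are assumptions made for illustration; the patent only says that the region is determined from the coordinates of the selected location.

```python
REGIONS = ("alpha", "beta", "gamma")  # assumed left-to-right order


def region_from_selection(x_px, display_width_px):
    """Map the horizontal pixel coordinate of the selected location to a region."""
    if not 0 <= x_px < display_width_px:
        raise ValueError("selection is outside the display")
    return REGIONS[x_px * len(REGIONS) // display_width_px]


def destination_from_selection(selection, display_width_px):
    """Steps S31/S32 sketch: return None while nothing has been selected,
    otherwise the dialogue destination region for the selected (x, y) point."""
    if selection is None:
        return None
    x_px, _y_px = selection
    return region_from_selection(x_px, display_width_px)


print(destination_from_selection(None, 1920))         # nothing selected yet
print(destination_from_selection((1700, 400), 1920))  # selection on the right third
```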

As shown in the flowchart of FIG. 9, the interlocutor's position may also be determined by extracting the conversation partner's face information from voice information about the partner's name input through the microphone, and then matching the extracted face information against the images captured by the camera.

First, it is detected whether the conversation partner has been called by name into the microphone (step S41). If so, the spoken name is recognized by speech recognition (step S42), and face information is extracted from the recognized name (step S43).

Speech recognition of the name is performed by analyzing properties, such as the wavelength, of the audio signal input through the microphone. The face information is extracted by referring to a database that defines the relationship between names and face information.

Once the face information has been extracted, the region where the corresponding person is located is identified from it (step S44). A person's face is extracted from the image captured by the camera, and the positions of the eyes and mouth are matched against the face information extracted in step S43 to identify the region where that person is located.
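Steps S43 and S44 can be sketched as a database lookup followed by a nearest-match search over the faces detected in each region. Everything concrete here is an assumption: the patent does not define the format of the face information, so short feature tuples are used, and the database contents, the distance measure, and the threshold are made up for illustration.

```python
import math

# Hypothetical database relating names to face information (feature tuples).
FACE_DB = {
    "B": (0.30, 0.52, 0.71),
    "D": (0.45, 0.40, 0.66),
}


def region_of_named_person(name, detected_faces, max_distance=0.1):
    """Look up the face information for a recognized name (S43) and return the
    region whose detected face matches it most closely (S44), or None."""
    reference = FACE_DB.get(name)
    if reference is None:
        return None
    best_region, best_distance = None, max_distance
    for region, features in detected_faces.items():
        distance = math.dist(reference, features)
        if distance <= best_distance:
            best_region, best_distance = region, distance
    return best_region


# Faces found by the camera, keyed by the region in which each person stands.
detected = {"alpha": (0.44, 0.41, 0.65), "gamma": (0.31, 0.52, 0.70)}
print(region_of_named_person("B", detected))
print(region_of_named_person("unknown name", detected))
```

The speech recognition of the name itself (step S42) is outside this sketch; the function starts from an already recognized name string.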

After the region where the corresponding person is located has been identified, that region is determined to be the interlocutor's position (step S45). The sensitivity of the microphone in that region is then raised (step S46), and the speaker in that region outputs the conversation partner's voice at a higher volume (step S47).

As described above with reference to FIGS. 6 to 9, the interlocutors' positions can be determined not only from the direction of the talker's face but also from the direction of the talker's line of sight, the direction of the voice emitted by the talker, the location selected by the talker on the display, and so on. Changing the microphone sensitivity or adjusting the speaker volume based on that determination makes it easier for the interlocutors to hear each other without being obstructed by other people's voices.

Although embodiments of the present invention have been described with reference to the drawings, the present invention is not limited to those embodiments; changes and additions that do not depart from the gist of the present invention are included in the present invention.

In FIGS. 3 and 5, for example, a person who is difficult to identify as an interlocutor may be regarded as an interlocutor, and for that person the voice sensitivity of the microphone may be increased or the voice of the conversation partner may be output at a higher volume from the speaker.
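The fallback described above amounts to treating an uncertain determination the same way as a positive one. A minimal sketch, assuming a three-valued determination result per area; the `Judgement` enum and the function `areas_to_boost` are hypothetical names introduced only for this illustration.

```python
from enum import Enum


class Judgement(Enum):
    INTERLOCUTOR = "interlocutor"
    NOT_INTERLOCUTOR = "not_interlocutor"
    UNCERTAIN = "uncertain"  # difficult to determine either way


def areas_to_boost(judgements: dict) -> list:
    """Return the areas whose audio should be adjusted: everyone judged to be
    an interlocutor, plus everyone whose status is uncertain."""
    return [
        area
        for area, judgement in judgements.items()
        if judgement is not Judgement.NOT_INTERLOCUTOR
    ]
```

With this choice, a person who cannot be classified still receives the raised microphone sensitivity or speaker volume rather than being left out of the conversation.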

FIG. 1 is a schematic diagram of the communication system.
FIG. 2 is a block diagram of the communication system.
FIG. 3 is an explanatory diagram showing a state in which person A in room X and person B in room Y are talking.
FIG. 4 is a flowchart explaining the operation of determining the position of the interlocutor based on the direction of the speaker's face and adjusting the sound.
FIG. 5 is an explanatory diagram showing a state in which person A and person C in room X and person B in room Y are talking.
FIG. 6 is a flowchart explaining the operation of determining the position of the interlocutor based on the direction of the speaker's line of sight and adjusting the sound.
FIG. 7 is a flowchart explaining the operation of determining the position of the interlocutor based on the direction of the voice emitted by the speaker and adjusting the sound.
FIG. 8 is a flowchart explaining the operation of determining the position of the interlocutor based on the location on the display selected by the speaker and adjusting the sound.
FIG. 9 is a flowchart explaining the operation of determining the position of the interlocutor based on voice information about the conversation partner's name and on face information, and adjusting the sound.

Explanation of symbols

1, 2 Dialog system
101, 201 PC
102, 202 Display
103, 203 Camera
104, 204 Microphone
105, 205 Speaker
101A, 201A CPU
101B, 201B ROM
101C, 201C RAM
101D, 201D HDD

Claims

A communication system that enables communication between different points via a network, comprising:
a camera that photographs a speaker;
a display that displays images captured by the camera;
a microphone that converts the voice of the speaker into an audio signal;
a speaker that outputs the audio signal converted by the microphone to the outside; and
a control unit that controls operation of the communication system,
wherein the control unit determines the position of an interlocutor communicating between the different points and, based on the determination result, performs at least one of an operation of changing the sensitivity of the microphone to voice at the position and an operation of increasing the voice output from the speaker toward the position.

The communication system according to claim 1, wherein the control unit detects a dialogue destination area on the display based on the direction of a speaker's face, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication system according to claim 1, wherein the control unit detects a dialogue destination area on the display based on the direction of a speaker's line of sight, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication system according to claim 1, wherein the control unit detects a dialogue destination area on the display based on the direction of a voice emitted by a speaker, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication system according to claim 1, wherein the control unit determines the position based on a selection location on the display selected by a speaker.

The communication system according to claim 1, wherein the control unit extracts face information of the conversation partner from voice information about the conversation partner's name input to the microphone, and determines the position by collating the extracted face information with an image captured by the camera.

A communication program that enables communication between different points using a computer together with
a camera that photographs a speaker,
a display that displays images captured by the camera,
a microphone that converts the voice of the speaker into an audio signal, and
a speaker that outputs the audio signal converted by the microphone to the outside,
the communication program causing the computer to execute:
a determination step of determining the position of an interlocutor communicating between the different points; and
an operation step of performing, based on the determination result of the determination step, at least one of an operation of changing the sensitivity of the microphone to voice at the position and an operation of increasing the voice output from the speaker toward the position.

The communication program according to claim 7, wherein the determination step detects a dialogue destination area on the display based on the direction of a speaker's face, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication program according to claim 7, wherein the determination step detects a dialogue destination area on the display based on the direction of a speaker's line of sight, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication program according to claim 7, wherein the determination step detects a dialogue destination area on the display based on the direction of a voice emitted by a speaker, and determines the position by collating the dialogue destination areas detected at each of the points.

The communication program according to claim 7, wherein the determination step determines the position based on a selection location on the display selected by a speaker.

The communication program according to claim 7, wherein the determination step extracts face information of the conversation partner from voice information about the conversation partner's name input to the microphone, and determines the position by collating the extracted face information with an image captured by the camera.