JP2006211504A - Conference terminal apparatus, conference system and computer program

Info

Publication number: JP2006211504A
Application number: JP2005023239A
Authority: JP
Inventors: Tetsuya Katsumata; 鉄弥勝股; Naoyuki Uramatsu; 尚之浦松; Takayuki Hoshino; 孝行星野; Yuki Okawa; 友樹大川
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2005-01-31
Filing date: 2005-01-31
Publication date: 2006-08-10

Abstract

PROBLEM TO BE SOLVED: To enable a conference among a plurality of sites to proceed efficiently and effectively.

SOLUTION: A conference control system installed at each site includes a conference terminal apparatus 100. The conference terminal apparatus 100 controls the proceedings of a conference according to a conference control program. The conference control program contains first and second selection algorithms, and a CPU 111 controls a selection section 130 to select one of a plurality of microphones according to one of the selection algorithms. Only the audio signal corresponding to the selected microphone is input to a CODEC 140 and transmitted, as audio data, from a communication section 150 to the other sites.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to the technical field of conference terminal devices, conference systems, and computer programs for holding a conference among, for example, a plurality of sites.

Various techniques of this type have been proposed (see, for example, Patent Document 1 or 2).

According to the video conference system disclosed in Patent Document 1 (hereinafter referred to as the first prior art), the signal levels of the audio signals from a plurality of microphones are compared to determine which microphone is receiving speech, and the camera operation parameters corresponding to that microphone are read from a memory to control the operation of a video camera. This is said to make it possible to send video suited to the progress of the conference to the other party.

According to the teleconference terminal device disclosed in Patent Document 2 (hereinafter referred to as the second prior art), among the audio signals input from a plurality of microphones, the gain of the variable-gain amplifier that amplifies an audio signal exceeding a predetermined level is set to a large value. This is said to emphasize the audio signal of the main speaker so that the main speaker's voice can be heard clearly at the other terminal.

Patent Document 1: JP-A-7-87470
Patent Document 2: JP-A-8-116353

When there are multiple conference participants at each site and these participants speak into the microphones at the same time, the audio transmitted to the other sites is a mixture of their voices, so that merely identifying who is speaking becomes a difficult task.

In response to this problem, the first and second prior arts each identify the speaker based on the level of the audio signal. However, in a conference the loudness of a voice does not necessarily correspond to the importance of what is said, and because loudness varies from person to person, such an approach is even unfair. In other words, these prior arts have the technical problem that it is difficult to make a conference proceed efficiently and effectively.

On the other hand, it is conceivable to manually select one audio signal from among the audio signals input from a plurality of microphones so that clear audio is always transmitted to the other sites, but the selection work becomes burdensome and may hinder the smooth progress of the conference.

The present invention has been made in view of, for example, the problems described above, and its object is to provide a conference terminal device, a conference system, and a computer program that allow a conference to proceed efficiently and effectively.

In order to solve the above problem, the conference terminal device of claim 1 is a conference terminal device installed at a site, for holding a conference among a plurality of sites, together with (i) a plurality of microphones that are installed in association with participants in the conference and that each collect sound and generate an audio signal corresponding to the collected sound, and (ii) at least one speaker, the conference terminal device comprising: selection means for selecting one microphone from among the plurality of microphones according to a predetermined type of algorithm; audio data generation means for generating, from the plurality of audio signals corresponding to the plurality of microphones, audio data corresponding to the audio signal generated by the selected one microphone; communication means for transmitting the generated audio data via a network to the sites other than the local site among the plurality of sites, and for receiving, via the network, audio data generated at the other sites; and control means for controlling the speaker so as to output sound based on the received audio data.

In order to solve the above problem, the conference system of claim 8 is installed at a site for holding a conference among a plurality of sites and includes (i) a plurality of microphones that are installed in association with participants in the conference and that collect sound and generate audio signals corresponding to the collected sound, (ii) at least one speaker, and (iii) a conference terminal device, wherein the conference terminal device comprises: selection means for selecting one microphone from among the plurality of microphones according to a predetermined type of algorithm; audio data generation means for generating, from the plurality of audio signals corresponding to the plurality of microphones, audio data corresponding to the audio signal generated by the selected one microphone; communication means for transmitting the generated audio data via a network to the sites other than the local site among the plurality of sites, and for receiving, via the network, audio data generated at the other sites; and control means for controlling the speaker so as to output sound based on the received audio data.

In order to solve the above problem, the computer program of claim 9 causes a computer system to function as the selection means according to any one of claims 1 to 7.

In order to solve the above problem, the computer program of claim 10 causes a computer system to function as the selection means, the audio data generation means, and the control means according to any one of claims 1 to 7.

<Embodiment of Conference Terminal Device>
An embodiment of the conference terminal device of the present invention is a conference terminal device installed at a site, for holding a conference among a plurality of sites, together with (i) a plurality of microphones that are installed in association with participants in the conference and that each collect sound and generate an audio signal corresponding to the collected sound, and (ii) at least one speaker. The device comprises: selection means for selecting one microphone from among the plurality of microphones according to a predetermined type of algorithm; audio data generation means for generating, from the plurality of audio signals corresponding to the plurality of microphones, audio data corresponding to the audio signal generated by the selected one microphone; communication means for transmitting the generated audio data via a network to the sites other than the local site among the plurality of sites, and for receiving, via the network, audio data generated at the other sites; and control means for controlling the speaker so as to output sound based on the received audio data.

In the present invention, a "site" refers to a single place for holding a conference. The concept covers every space in which a conference can be held, including, for example, conference rooms and halls in a company, office, hotel, or school, or a space at home or outdoors. A "conference" in the present invention refers to, for example, a debate, consultation, meeting, or discussion held among participants. The conference terminal device according to the present invention is installed at each of these sites, together with a plurality of microphones and at least one speaker, in order to hold a conference among a plurality of such sites.

Each site has at least one participant, and the plurality of microphones are installed in association with the participants. Here, "in association with participants" means that some correspondence exists between participants and microphones; for example, participants and microphones may correspond one-to-one, one-to-many, many-to-one, or many-to-many. Participants speak as appropriate while the conference proceeds. Their voices are collected by the microphones, and audio signals corresponding to the collected voices are generated.

According to the embodiment of the conference terminal device of the present invention, audio data of a predetermined type is generated by the audio data generation means during operation.

Here, "audio data of a predetermined type" is a collective term for data used to transmit the audio signal generated by a microphone at the local site to other sites; for example, it refers to digital audio data generated from an analog audio signal via an audio CODEC (COder/DECoder).

The generated audio data is transmitted by the communication means to the other sites via the network. The conference terminal devices installed at the other sites operate in the same way, so the conference terminal device at the local site receives, via the network, the audio data generated at the other sites.

Here, the "network" according to the present invention is a network that accommodates each of the conference terminals installed at the plurality of sites, and refers to a communication network such as a LAN (Local Area Network) or a WAN (Wide Area Network). Such a communication network is constructed, for example, as a wired network comprising some or all of ordinary telephone lines, ADSL (Asymmetric Digital Subscriber Line), ISDN lines, optical fiber, and the like, or with part of it as a wireless network.

The control means controls the speaker so as to output sound based on the received audio data. For example, when the audio data generation means includes the CODEC mentioned above, the audio data received as digital data is decoded and an analog audio signal is extracted. The speaker outputs (emits) the sound defined by that analog audio signal. Because this operation is carried out at every site, the conference proceeds bidirectionally.

In particular, when a plurality of audio signals are input to the conference terminal device via the microphones, the generated audio data would ordinarily contain all of them. The sound finally emitted from the speaker is therefore a mixture of the voices of several participants, which makes it difficult to identify which participant is speaking.

In addition, in such a case the noise arising in each input audio signal is accumulated in the audio data, so the final noise level tends to become too large to ignore. That is, the quality of the emitted sound tends to deteriorate, and as a result it becomes difficult for the conference to proceed smoothly.
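The noise argument above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that every microphone channel contributes uncorrelated noise of equal power, an assumption the source does not state, and the function name is hypothetical.

```python
import math


def noise_increase_db(num_mics: int) -> float:
    """Rise in noise power (dB) when num_mics channels are summed.

    Assumes uncorrelated, equal-power noise on every channel.
    """
    if num_mics < 1:
        raise ValueError("num_mics must be at least 1")
    # Powers of uncorrelated noise add, so total power scales with num_mics.
    return 10.0 * math.log10(num_mics)


for n in (1, 2, 4, 8):
    print(f"{n} microphone(s): +{noise_increase_db(n):.1f} dB noise")
```

Under these assumptions, transmitting only one selected channel keeps the noise at the single-microphone level instead of letting it grow with the number of open microphones.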

The embodiment of the conference terminal device of the present invention therefore solves this problem through the operation of the selection means and the audio data generation means.

The "selection means" according to the present invention is configured to be able to select one microphone from among the plurality of microphones, and it makes this selection according to a predetermined type of algorithm. Here, the "predetermined type of algorithm" may be any algorithm as long as it is based on some rule that can meaningfully (appropriately) grant the right to speak to one of the participants. Where an appropriate algorithm can be determined in advance experimentally, empirically, or by simulation, such an algorithm may be used.

For example, when one participant acts as the facilitator (chairperson) of the conference, the algorithm may select the microphone installed for the facilitator in preference to the microphones of the other participants. Alternatively, for a conference held in a relatively free atmosphere, such as a brainstorming session, the algorithm may select each of the plurality of microphones equally so as to give every participant as equal a right to speak as possible.

The audio data generation means generates, from the plurality of audio signals corresponding to the plurality of microphones, audio data corresponding to the audio signal generated by the selected one microphone. Here, "audio data corresponding to the audio signal generated by one microphone" includes audio data from which the other (non-selected) audio signals are completely shut out and audio data in which the volume (level) of the other audio signals is made relatively small; the concept covers any case in which the audio signal generated by that one microphone is dominant in the generated audio data.
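The two variants described above (shutting the other signals out completely, or merely attenuating them) can be sketched as a single mixing step. This is a minimal illustration under assumed conventions: each microphone supplies one frame of floating-point samples, and the names `mix_selected` and `other_gain` are hypothetical.

```python
from typing import List, Sequence


def mix_selected(frames: Sequence[Sequence[float]], selected: int,
                 other_gain: float = 0.0) -> List[float]:
    """Mix one frame per microphone so that the selected microphone dominates.

    other_gain = 0.0 shuts the other microphones out completely;
    a small positive value only attenuates them.
    """
    if not 0 <= selected < len(frames):
        raise IndexError("selected microphone index out of range")
    if not 0.0 <= other_gain < 1.0:
        raise ValueError("other_gain must be in [0, 1)")
    length = len(frames[selected])
    mixed = [0.0] * length
    for mic, frame in enumerate(frames):
        if len(frame) != length:
            raise ValueError("all frames must have the same length")
        # Full gain for the selected microphone, reduced gain for the rest.
        gain = 1.0 if mic == selected else other_gain
        for i, sample in enumerate(frame):
            mixed[i] += gain * sample
    return mixed


frames = [[0.5, -0.5], [0.2, 0.2], [0.0, 0.4]]
print(mix_selected(frames, selected=0))                  # others shut out
print(mix_selected(frames, selected=0, other_gain=0.5))  # others attenuated
```

In a real terminal the mixed frame would then be passed to the CODEC; that step is omitted here.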

According to the embodiment of the conference terminal device of the present invention, because audio data corresponding to one microphone is generated in this way, the quality of the sound finally emitted from the speaker is unlikely to deteriorate. Moreover, the microphone is selected automatically by the selection means, which is convenient. In other words, the conference can proceed efficiently and effectively.

If such selection were performed manually, a similar effect could be expected as far as audio quality is concerned, but it is difficult to always select the appropriate utterance at the appropriate time; the work is extremely burdensome and, because it requires a dedicated operator, highly uneconomical. In this respect the embodiment of the conference terminal device of the present invention is clearly advantageous.

The effect of the present invention described above is more pronounced when a plurality of microphones are installed at every one of the sites, but it is not necessary for all sites to have a plurality of microphones. For example, when a conference is held between two sites, one of the sites may have just one microphone and one speaker. Even in that case, the effect of the present invention described above is realized by the audio data transmitted from the other site. Accordingly, the audio data generated at another site and received via the network, as referred to above, is typically audio data corresponding to one selected microphone, but it may also be audio data generated from the audio signal of a single microphone.

In one aspect of the embodiment of the conference terminal device of the present invention, (i) imaging means that captures an image of a subject and generates a video signal corresponding to the captured subject and (ii) display means that displays video of the subject are further installed at the site; the conference terminal further comprises video data generation means for generating a predetermined type of video data corresponding to the generated video signal; the communication means further transmits the generated video data to the other sites via the network and receives, via the network, video data generated at the other sites; and the control means further controls the display means so as to output video based on at least one of the generated video data and the received video data.

In this aspect, imaging means and display means are further installed at the site. Here, the "imaging means" according to the present invention is a collective term for devices capable of generating a video signal of a subject, such as a digital camera or a digital video camera. The "display means" according to the present invention is a device having a display screen for displaying video; for example, a plasma display, liquid crystal display, or CRT (Cathode Ray Tube) display device having a large display area is suitably used. In this aspect, the conference proceeds in the form of a so-called "video conference" using these devices.

The subject captured by the imaging means may be anything as long as it is relevant to the conference. For example, it may be a view of the entire site, or it may be the individual participants.

The video data generation means generates a predetermined type of video data corresponding to the generated video signal. Here, "video data of a predetermined type" is a collective term for, for example, data for transmitting the video signal of the local site to other sites, or data for displaying video based on the local site's video signal on the local site's display means. The former includes, for example, encoded and compressed data generated from the video signal via a video CODEC. This video data is transmitted to the other sites via, for example, the communication means. Video data generated at the other sites is likewise received by the conference terminal device at the local site via the communication means.

The control means controls the display means so as to output video based on the video data generated at the local site or at another site. When the display means is controlled based on the local site's video data, video of the subject captured by the imaging means at the local site is displayed on the display means. When display is based on another site's video data, video of the other site is displayed on the display means based on, for example, the video signal extracted from the received video data via the video CODEC mentioned above. These two kinds of video may be displayed on a plurality of divided screens. The display mode of the display means may also be freely determined by the operation of the control means. According to this aspect, video is added to the audio, giving the conference a visual dimension, which is highly effective.

In this aspect, the control means may further control the imaging means so that the participant corresponding to the selected one microphone becomes the subject.

According to this aspect, when one microphone is selected by the selection means, the control means controls the imaging means so that the participant corresponding to that microphone is captured. In this case, video of the participant speaking at the local site is displayed on the display means at the local site or at the other sites, so audio and video can be combined effectively. This makes it easy for the conference to proceed smoothly.
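One way to picture this coupling between microphone selection and camera control is a lookup from microphone to camera setting. The sketch below is an assumption-laden illustration, not the patent's implementation: the pan/tilt/zoom representation, the seating table, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CameraPreset:
    """Hypothetical pan/tilt/zoom setting that frames one seat."""
    pan_deg: float
    tilt_deg: float
    zoom: float


# Assumed seating layout: microphone id -> preset framing that participant.
SEAT_PRESETS: Dict[int, CameraPreset] = {
    0: CameraPreset(pan_deg=-30.0, tilt_deg=0.0, zoom=2.0),
    1: CameraPreset(pan_deg=0.0, tilt_deg=0.0, zoom=2.0),
    2: CameraPreset(pan_deg=30.0, tilt_deg=0.0, zoom=2.0),
}
# Fallback: a wide shot of the whole site when no seat preset is known.
WIDE_SHOT = CameraPreset(pan_deg=0.0, tilt_deg=0.0, zoom=1.0)


def preset_for_microphone(mic_id: int) -> CameraPreset:
    """Return the camera setting that frames the selected microphone's participant."""
    return SEAT_PRESETS.get(mic_id, WIDE_SHOT)


print(preset_for_microphone(2))
print(preset_for_microphone(7))  # unknown microphone: falls back to the wide shot
```

The wide-shot fallback reflects the remark that the subject may also be a view of the entire site.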

In one aspect of the embodiment of the conference terminal device of the present invention in which the display means can be controlled, the device further comprises identification data generation means for generating a predetermined type of identification data concerning the participant corresponding to the selected one microphone; the communication means further transmits the generated identification data to the other sites via the network and receives, via the network, identification data generated at the other sites; and the control means further controls the display means based on at least one of the generated identification data and the received identification data.

According to this aspect, the identification data generation means generates identification data concerning the participant corresponding to the selected microphone. This identification data is transmitted to the other sites via the network by, for example, the communication means.

The control means controls the display means based on at least one of this generated identification data and the identification data received from another site. In the former case, visual information corresponding to the identification data of the speaker at the local site is displayed; in the latter case, visual information corresponding to the identification data of the speaker at the other site is displayed. Either way, value is added to the displayed content, and the conference can be held more effectively.

Here, the "predetermined type of identification data" is not limited in any way as long as it is data for identifying a participant, such as the participant's name or department, but information that adds value to the conference when displayed is preferable.
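The flow of identification data (generate it for the selected microphone, transmit it, and turn it into something displayable at the receiving site) can be sketched as follows. The JSON encoding, the field names, and the participant table are assumptions made for illustration; the source specifies only that the data identifies the participant, for example by name or department.

```python
import json
from typing import Dict

# Hypothetical registration table: microphone id -> participant details.
PARTICIPANTS: Dict[int, Dict[str, str]] = {
    0: {"name": "Participant A", "department": "Sales"},
    1: {"name": "Participant B", "department": "Engineering"},
}


def build_identification_data(site: str, mic_id: int) -> bytes:
    """Encode identification data for the participant at the selected microphone."""
    record = {"site": site, "microphone": mic_id, **PARTICIPANTS[mic_id]}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def caption_from_identification_data(payload: bytes) -> str:
    """Turn received identification data into a caption for the display means."""
    record = json.loads(payload.decode("utf-8"))
    return f'{record["name"]} ({record["department"]}, {record["site"]})'


payload = build_identification_data("Tokyo", 1)
print(caption_from_identification_data(payload))
```

The same caption function serves both cases described above, since locally generated and remotely received identification data share one format in this sketch.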

In another aspect of the embodiment of the conference terminal device of the present invention, a plurality of the algorithms are prepared in advance according to the form of the conference, and the selection means selects the one microphone according to one of the plurality of prepared algorithms.

According to this aspect, a plurality of algorithms for selecting a microphone are prepared in advance according to the form of the conference. Here, the "form of the conference" is a concept that broadly includes the elements that characterize a conference, such as its scale, type, or nature. According to this aspect, therefore, an appropriate algorithm can easily be used on each occasion, and the conference can be held more efficiently and effectively.
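Preparing several algorithms and picking one by conference form amounts to a lookup table of selection functions. The following sketch shows only that dispatch step; the form names and the two deliberately trivial stand-in algorithms are hypothetical placeholders, not the algorithms of the source.

```python
from typing import Callable, Dict, Sequence

# A selection algorithm maps (microphone ids, time slot) to one microphone id.
SelectionAlgorithm = Callable[[Sequence[int], int], int]


def take_turns(mics: Sequence[int], slot: int) -> int:
    """Stand-in for an equal-turns rule: step through the microphones."""
    return mics[slot % len(mics)]


def always_first(mics: Sequence[int], slot: int) -> int:
    """Stand-in for a priority rule: always the first (e.g. chairperson's) microphone."""
    return mics[0]


# Hypothetical conference forms mapped to the algorithm prepared for each.
ALGORITHMS: Dict[str, SelectionAlgorithm] = {
    "brainstorming": take_turns,
    "chaired": always_first,
}


def select_microphone(form: str, mics: Sequence[int], slot: int) -> int:
    """Select one microphone using the algorithm prepared for this conference form."""
    if form not in ALGORITHMS:
        raise ValueError(f"no algorithm prepared for conference form {form!r}")
    return ALGORITHMS[form](mics, slot)


print([select_microphone("brainstorming", [10, 11, 12], s) for s in range(4)])
print([select_microphone("chaired", [10, 11, 12], s) for s in range(4)])
```

Keeping the algorithms behind a common signature means a new conference form only needs a new table entry.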

In this aspect, the plurality of prepared algorithms may include an algorithm that cyclically selects each of the plurality of microphones.

Here, "cyclically selecting" in the present invention means selecting one microphone at a time, among at least some of the plurality of microphones, according to an order that is either predetermined or determined irregularly. The weighting of the time for which each microphone is selected may be equal or may differ among the microphones. Typically, however, it means selecting each microphone fairly with equal weighting.

For example, when one of the plurality of algorithms is of this kind, each of the plurality of microphones is selected cyclically, so the processing load is relatively light and the microphones can be selected fairly, or to a degree that can be regarded as fair. It is therefore particularly effective when, for example, the participants have roughly equal rights to speak.
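Cyclic selection as described above can be sketched with a generator that yields the selected microphone for each time slot. The slot-based timing and the optional per-microphone dwell table (for unequal time weighting) are assumptions made for illustration; the names are hypothetical.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, Optional, Sequence


def cyclic_selection(mic_ids: Sequence[int],
                     dwell_slots: Optional[Dict[int, int]] = None) -> Iterator[int]:
    """Yield the selected microphone for each time slot, indefinitely.

    Each microphone is held for dwell_slots[mic] consecutive slots
    (one slot by default), then the next microphone in order is selected.
    """
    if not mic_ids:
        raise ValueError("at least one microphone is required")
    for mic in cycle(mic_ids):
        slots = 1 if dwell_slots is None else dwell_slots.get(mic, 1)
        if slots < 1:
            raise ValueError("dwell time must be at least one slot")
        for _ in range(slots):
            yield mic


# Equal weighting: every microphone gets one slot per round.
print(list(islice(cyclic_selection([0, 1, 2]), 7)))
# Unequal weighting: microphone 0 is held for two slots per round.
print(list(islice(cyclic_selection([0, 1, 2], {0: 2}), 7)))
```

The selection needs no signal analysis at all, which matches the remark that the processing load is relatively light.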

Also in this aspect, the plurality of prepared algorithms may include an algorithm that selects each of the plurality of microphones according to a priority assigned in advance to each of the plurality of microphones.

In this case, for example, a priority is set in advance for each microphone, so the microphones, and hence the participants corresponding to them, can be differentiated from one another. For example, it is easy to preferentially select the microphone corresponding to a participant responsible for coordinating the conference, such as a chairperson or moderator. Such a priority may be given, for example, as the length of the period for which a microphone is selected, or as the frequency with which it is selected. The device may also be configured so that the parameters defining the priority can be set freely before or during the conference.
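Priority expressed as selection frequency can be sketched with a credit-based scheduler in the style of smooth weighted round-robin. This is a swapped-in technique chosen for illustration, not one named in the source; the integer priorities, the slot model, and the function name are all assumptions.

```python
from typing import Dict, List


def priority_schedule(priorities: Dict[str, int], slots: int) -> List[str]:
    """Return the microphone selected in each slot, in proportion to priority.

    Every slot, each microphone earns credit equal to its priority; the
    microphone with the most credit is selected and pays back the total.
    Ties go to the microphone listed first.
    """
    if not priorities or any(p < 1 for p in priorities.values()):
        raise ValueError("every microphone needs a priority of at least 1")
    total = sum(priorities.values())
    credits = {mic: 0 for mic in priorities}
    schedule: List[str] = []
    for _ in range(slots):
        for mic, priority in priorities.items():
            credits[mic] += priority
        chosen = max(credits, key=credits.get)
        credits[chosen] -= total
        schedule.append(chosen)
    return schedule


# The chairperson's microphone has three times the priority of the others.
print(priority_schedule({"chair": 3, "a": 1, "b": 1}, 5))
```

With these example priorities the chairperson's microphone is selected in three of every five slots, interleaved with the others rather than held in one long block. Expressing priority as the length of the selected period instead would correspond to holding each microphone for consecutive slots.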

According to this aspect, it is possible to select a microphone in a manner adapted to the size, type, or nature of the conference, and the conference can be conducted more smoothly.
<Embodiment of conference system>
An embodiment of the conference system of the present invention is installed at a base in order to hold a conference among a plurality of bases, and includes (i) a plurality of microphones that are installed in association with participants of the conference, collect sound, and generate audio signals corresponding to the collected sound, (ii) at least one speaker, and (iii) a conference terminal device. The conference terminal device comprises: selection means for selecting one microphone from among the plurality of microphones according to a predetermined type of algorithm; audio data generation means for generating, from the plurality of audio signals corresponding to the plurality of microphones, audio data corresponding to the audio signal generated by the selected microphone; communication means for transmitting the generated audio data via a network to the bases other than the local base, and for receiving via the network audio data generated at the other bases; and control means for controlling the speaker so as to output sound based on the received audio data.

According to the embodiment of the conference system of the present invention, the interaction among the conference terminal device, the microphones, and the speaker described above makes it possible to hold a conference efficiently and effectively among a plurality of bases.
<First Embodiment of Computer Program>
The first embodiment of the computer program of the present invention causes a computer system to function as any one of the selection means described above.

According to the first embodiment of the computer program of the present invention, the selection means in the embodiment of the conference terminal device of the present invention described above can be realized relatively easily by reading the computer program into a computer system from a recording medium that stores it, such as a ROM, CD-ROM, DVD-ROM, or hard disk, and executing it, or by executing the computer program after downloading it to the computer system via, for example, communication means.

The first embodiment of the computer program of the present invention can also take various aspects corresponding to the various aspects of the embodiment of the conference terminal device of the present invention described above.
<Second Embodiment of Computer Program>
The second embodiment of the computer program of the present invention causes a computer system to function as any one of the selection means, the audio data generation means, and the control means described above.

According to the second embodiment of the computer program of the present invention, the selection means, the audio data generation means, and the control means in the embodiment of the conference terminal device of the present invention described above can be realized relatively easily by reading the computer program into a computer system from a recording medium that stores it, such as a ROM, CD-ROM, DVD-ROM, or hard disk, and executing it, or by executing the computer program after downloading it to the computer system via, for example, communication means.

尚、上述した本発明の会議用端末装置に係る実施形態における各種態様に対応して、本発明のコンピュータプログラムに係る第２実施形態も各種態様を採ることが可能である。 Incidentally, the second embodiment according to the computer program of the present invention can also adopt various aspects in response to the various aspects of the embodiment according to the conference terminal apparatus of the present invention described above.

本発明のこのような作用及び他の利得は次に説明する実施例から明らかにされる。 These effects and other advantages of the present invention will become apparent from the embodiments described below.

以下、図面を参照して、本発明の好適な実施例について説明する。 Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.

<1: First embodiment>
<1.1: Configuration of the embodiment>
First, an outline of the conference system according to the first embodiment of the present invention will be described with reference to FIG. 1, which is a conceptual diagram of the conference system 10.

図１において、会議システム１０は、拠点Ａ、拠点Ｂ及び拠点Ｃの複数の拠点間でネットワーク２０を介して会議を行うためのシステムである。 In FIG. 1, a conference system 10 is a system for conducting a conference via a network 20 between a plurality of bases A, B, and C.

拠点Ａ、拠点Ｂ及び拠点Ｃは、例えば、夫々相互に離れた場所に存在する会議室などであり、ネットワーク２０は、それら拠点間を結ぶＬＡＮ又はＷＡＮなどのデータ通信網である。尚、ネットワーク２０は、一部又は全体が有線又は無線通信ネットワークとして構成されていてもよい。 The base A, the base B, and the base C are, for example, conference rooms that exist at locations away from each other, and the network 20 is a data communication network such as a LAN or WAN that connects these bases. The network 20 may be partly or wholly configured as a wired or wireless communication network.

Next, the details of a base will be described with reference to FIG. 2, which is a schematic diagram of the base A. In the figure, parts that also appear in FIG. 1 are given the same reference numerals and their description is omitted. In this embodiment, the base A, the base B, and the base C have the same configuration, so the description of the base A also serves as the description of the other bases.

図２において、拠点Ａには、複数の参加者Ａ１、Ａ２及びＡ３がおり、更に会議制御システム１０Ａが設置されている。会議制御システム１０Ａは、複数のマイクロフォン（以降、適宜「マイク」と称する）ＭＡ１、ＭＡ２及びＭＡ３、スピーカＳＰＡ並びに会議用端末装置１００を備えた、本発明に係る「会議システム」の一例である。 In FIG. 2, the site A has a plurality of participants A1, A2 and A3, and further a conference control system 10A is installed. The conference control system 10A is an example of the “conference system” according to the present invention, which includes a plurality of microphones (hereinafter, referred to as “microphones” as appropriate) MA1, MA2 and MA3, a speaker SPA, and the conference terminal device 100.

The participants A1, A2, and A3 are the participants of the conference at the base A, and the microphones MA1, MA2, and MA3 are assigned to them, respectively. The microphones MA1, MA2, and MA3 are examples of the "microphone" according to the present invention: each collects the voice of the participant A1, A2, or A3, respectively, and can generate an audio signal corresponding to the collected voice. The speaker SPA is an example of the "speaker" according to the present invention, configured to be able to emit sound relating to the conference under the control of the control unit 110 described later.

会議用端末装置１００は、制御部１１０、接続ポート１２１、１２２及び１２３、選択部１３０、ＣＯＤＥＣ１４０並びに通信部１５０を備えた、本発明に係る「会議用端末装置」の一例である。 The conference terminal device 100 is an example of a “conference terminal device” according to the present invention, which includes a control unit 110, connection ports 121, 122, and 123, a selection unit 130, a CODEC 140, and a communication unit 150.

制御部１１０は、ＣＰＵ（Central Processing Unit）１１１、ＲＯＭ（Read Only Memory）１１２及びＲＡＭ（Random Access Memory）１１３を備える。 The control unit 110 includes a CPU (Central Processing Unit) 111, a ROM (Read Only Memory) 112, and a RAM (Random Access Memory) 113.

ＣＰＵ１１１は、会議用端末装置１００の動作を制御する制御ユニットである。ＲＯＭ１１２は、会議用端末装置１００において実行される会議制御プログラムが格納された不揮発性メモリである。ＲＡＭ１１３は、会議用端末装置１００が会議制御プログラムを実行する過程で生じるデータを一時的に格納するための揮発性のメモリである。 The CPU 111 is a control unit that controls the operation of the conference terminal device 100. The ROM 112 is a non-volatile memory in which a conference control program executed in the conference terminal device 100 is stored. The RAM 113 is a volatile memory for temporarily storing data generated in the process in which the conference terminal device 100 executes the conference control program.

接続ポート１２１、１２２及び１２３は、夫々マイクＭＡ１、ＭＡ２及びＭＡ３と電気的に接続された入力インターフェイスである。夫々のマイクにおいて生成された音声信号は、各接続ポートを介して会議用端末装置１００に入力される。 The connection ports 121, 122, and 123 are input interfaces that are electrically connected to the microphones MA1, MA2, and MA3, respectively. The audio signal generated in each microphone is input to the conference terminal device 100 via each connection port.

The selection unit 130 is configured to be able to select one of the connection ports 121, 122, and 123 according to the conference control program and, together with the control unit 110, functions as an example of the "selection means" according to the present invention.

ＣＯＤＥＣ１４０は、選択部１３０を介して入力されるアナログ音声信号をデジタル信号に変換することが可能に構成された、本発明に係る「音声データ生成手段」の一例である。 The CODEC 140 is an example of an “audio data generation unit” according to the present invention configured to be able to convert an analog audio signal input via the selection unit 130 into a digital signal.

The communication unit 150 is an example of the "communication means" according to the present invention, configured to transmit various data input from the control unit 110 and the CODEC 140 to the other bases (here, the base B and the base C) via the network 20, and to receive, at its own base (here, the base A), such data transmitted from the other bases via the network 20.

<1.2: Operation of the embodiment>
<1.2.1: Progress of the conference>
Next, the progress of the conference at the base A will be described, again with reference to FIG. 2.

ＣＰＵ１１１は、ＲＯＭ１１２に格納される会議制御プログラムを実行することによって会議用端末装置１００の前述した各部の動作を制御し、会議の進行を制御している。 The CPU 111 controls the progress of the conference by executing the conference control program stored in the ROM 112 to control the operation of each unit described above of the conference terminal device 100.

拠点Ａにおいて、参加者Ａ１、Ａ２及びＡ３は、適宜発言を行う。発言が行われた際には、その音声は、夫々に対応するマイクＭＡ１、ＭＡ２及びＭＡ３によって集音される。この集音された音声は、夫々のマイクにおいて音声信号に変換され、各マイクに対応する接続ポート１２１、１２２及び１２３を介して会議用端末装置１００に入力される。 At site A, participants A1, A2 and A3 speak appropriately. When a speech is made, the sound is collected by the corresponding microphones MA1, MA2 and MA3. The collected sound is converted into a sound signal in each microphone and input to the conference terminal device 100 via the connection ports 121, 122, and 123 corresponding to each microphone.

本来、これら会議用端末装置１００に入力された音声信号は、ＣＯＤＥＣ１４０によってデジタルデータである音声データに変換され、通信部１５０を介して他の拠点における会議用端末装置に送信される。また、他の拠点における音声データは、通信部１５０を介して受信される。ＣＰＵ１１１は、受信した音声データを一時的にＲＡＭ１１３に格納すると共に、ＣＯＤＥＣ１４０を介して係る音声データを再び音声信号に変換する。ＣＰＵ１１１は、この変換された音声信号をスピーカＳＰＡに入力する。こうしてスピーカＳＰＡからは、他の拠点における音声が放音される。各拠点において同様の動作が行われることにより会議が進行する。 Originally, the audio signal input to the conference terminal device 100 is converted into audio data that is digital data by the CODEC 140, and transmitted to the conference terminal device in another base via the communication unit 150. In addition, audio data at other locations is received via the communication unit 150. The CPU 111 temporarily stores the received audio data in the RAM 113 and converts the audio data into an audio signal again via the CODEC 140. The CPU 111 inputs the converted audio signal to the speaker SPA. In this way, the sound at another base is emitted from the speaker SPA. The conference proceeds by performing the same operation at each site.

ここで、複数の参加者が同時に発言を行った場合、会議用端末装置１００に入力される音声信号も複数であり、ＣＯＤＥＣ１４０によって生成される音声データは、それらが混在した音声データとなる。これでは、他の拠点において、拠点Ａで誰が発言を行っているのかを特定し辛い。更に、各マイクにおいて音声信号を生成する時点で、相応のノイズが発生するため、それらが集約される音声データにおけるノイズは大きくなって、音声データの劣化が進行し易い。 Here, when a plurality of participants speak at the same time, a plurality of audio signals are input to the conference terminal device 100, and the audio data generated by the CODEC 140 is audio data in which they are mixed. This makes it difficult to specify who is speaking at the site A at other sites. Furthermore, since appropriate noise is generated at the time of generating the audio signal in each microphone, the noise in the audio data in which they are collected becomes large, and the audio data is likely to deteriorate.

Therefore, in the conference terminal device 100, the selection unit 130 selects one of the microphones MA1, MA2, and MA3, so that the audio signal input to the CODEC 140 is always limited to one. Specifically, the selection unit 130 is controlled by the CPU 111 so as to ignore input signals from the connection ports corresponding to microphones other than the selected one. The audio signal input to the CODEC 140 is thus limited to one, and at each base the conference always proceeds based on the voice of a single speaker. The method by which the selection unit 130 selects one microphone is not particularly limited as long as this effect is ensured. For example, a variable attenuator may be interposed between each connection port and the selection unit 130 and the attenuation of the signals other than the selected one increased, so that substantially only one audio signal is input to the CODEC 140; alternatively, the power to everything other than the selected one may simply be cut off.
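The attenuator variant mentioned in the preceding paragraph can be pictured with a minimal Python sketch. The function name `mix_with_attenuation`, the 60 dB default, and the representation of each connection port as a single sample value are assumptions made for illustration; the specification does not prescribe any of them.

```python
from typing import Sequence


def mix_with_attenuation(port_samples: Sequence[float],
                         selected: int,
                         attenuation_db: float = 60.0) -> float:
    """Sum one sample per port, attenuating every port except the selected one.

    Hypothetical model: a variable attenuator sits in front of each port, and
    the unselected ports are attenuated by `attenuation_db` decibels so that,
    in effect, only the selected signal reaches the CODEC.
    """
    gain = 10 ** (-attenuation_db / 20.0)  # amplitude gain for unselected ports
    total = 0.0
    for index, sample in enumerate(port_samples):
        total += sample if index == selected else sample * gain
    return total
```

With a large attenuation the unselected ports contribute only a small residue, which corresponds to "substantially" one audio signal reaching the CODEC 140.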

一方、選択部１３０はＣＰＵ１１１が実行する会議制御プログラムの中に予め複数設定されている選択アルゴリズムに従って接続ポートの選択を行っている。この選択アルゴリズムは、会議の趣旨、規模、又は性質などに応じて最適化されたアルゴリズムであり、予め複数のアルゴリズムのうち一が参加者などによって選択されている。 On the other hand, the selection unit 130 selects a connection port according to a plurality of selection algorithms set in advance in a conference control program executed by the CPU 111. This selection algorithm is an algorithm optimized in accordance with the purpose, scale, or nature of the conference, and one of a plurality of algorithms is selected in advance by a participant or the like.

<1.2.2: Details of the selection algorithms>
Next, the details of the selection algorithms will be described with reference to FIGS. 3 and 4. FIG. 3 is a flowchart of the first selection algorithm, and FIG. 4 is a flowchart of the second selection algorithm.

図３において、始めに、マイクＭＡ１に音声入力が有るか否かが判別される（ステップＡ１０）。マイクＭＡ１に音声入力が有るか否かは、接続ポート１２１に入力される音声信号のレベルが所定値以上であるか否かの判断に従って行われる。尚、係る所定値は、予め実験的に、経験的に或いはシミュレーションなどによって適正なレベルに設定されている。 In FIG. 3, first, it is determined whether or not there is an audio input to the microphone MA1 (step A10). Whether or not there is an audio input to the microphone MA1 is determined according to whether or not the level of the audio signal input to the connection port 121 is equal to or higher than a predetermined value. The predetermined value is set to an appropriate level in advance experimentally, empirically, or by simulation.
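The level test of step A10 can be sketched as a simple threshold comparison. This is an illustration only: the function name `has_voice_input`, the use of the peak absolute sample value as the "level", and the default threshold of 0.1 are assumptions, since the specification only says that the predetermined value is set experimentally, empirically, or by simulation.

```python
from typing import Sequence


def has_voice_input(samples: Sequence[float], threshold: float = 0.1) -> bool:
    """Return True when the frame's peak level is at or above the threshold.

    `samples` is one short frame of the audio signal from a connection port.
    An empty frame is treated as "no voice input".
    """
    if not samples:
        return False
    peak = max(abs(s) for s in samples)
    return peak >= threshold
```

A peak detector is the simplest choice; an RMS or smoothed envelope measure would fit the same description.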

マイクＭＡ１に音声入力が有る場合（ステップＡ１０：ＹＥＳ）、ＣＰＵ１１１は、選択部１３０を制御して接続ポート１２１を選択する（ステップＡ１１）。ここで、接続ポートを「選択する」とは、係る接続ポートからの入力のみをＣＯＤＥＣ１４０に出力する処理が実行されることを意味する。即ち、本実施例において、選択されていない接続ポートから入力された音声信号は、会議の進行に影響を与えない。ステップＡ１１において接続ポート１２１が選択された時点から、マイクＭＡ１から出力された音声信号のみがＣＯＤＥＣ１４０に入力され、他の拠点においては、マイクＭＡ１に入力された音声が放音される。 When the microphone MA1 has a voice input (step A10: YES), the CPU 111 controls the selection unit 130 to select the connection port 121 (step A11). Here, “selecting” a connection port means that processing for outputting only the input from the connection port to the CODEC 140 is executed. That is, in the present embodiment, the audio signal input from the connection port that is not selected does not affect the progress of the conference. From the time when the connection port 121 is selected in step A11, only the audio signal output from the microphone MA1 is input to the CODEC 140, and the audio input to the microphone MA1 is emitted at other sites.

接続ポート１２１が選択されると、ＣＰＵ１１１は、マイクＭＡ１に音声入力が有るか否かを判別する（ステップＡ１２）。マイクＭＡ１に音声入力が有る場合（ステップＡ１２：ＹＥＳ）、即ち、マイクＭＡ１への音声入力が継続している場合には、ＣＰＵ１１１は、所定の待機時間が経過するまで待機する（ステップＡ１３）。この待機期間においては、マイクＭＡ１が選択された状態が継続する。尚、係る待機期間の長さは、予め実験的に、経験的に、或いはシミュレーションなどによって、適切な値に定められている。 When the connection port 121 is selected, the CPU 111 determines whether or not there is an audio input to the microphone MA1 (step A12). When the microphone MA1 has a voice input (step A12: YES), that is, when the voice input to the microphone MA1 is continued, the CPU 111 stands by until a predetermined standby time elapses (step A13). In this standby period, the state where the microphone MA1 is selected continues. Note that the length of the waiting period is set to an appropriate value in advance by experiment, experience, simulation, or the like.

待機期間が経過すると、ＣＰＵ１１１は、再びステップＡ１２に処理を戻し、マイクＭＡ１への音声入力の有無を判別する。このように、ＣＰＵ１１１は、マイクＭＡ１への音声入力が継続している限り、ステップＡ１２及びステップＡ１３を繰り返す。これらがループ処理されている期間中は、参加者Ａ１の音声（マイクＭＡ１に入力される音声）に対応する音声信号がＣＯＤＥＣ１４０に入力される。 When the standby period elapses, the CPU 111 returns the process to step A12 again, and determines whether or not there is an audio input to the microphone MA1. As described above, the CPU 111 repeats step A12 and step A13 as long as the voice input to the microphone MA1 is continued. During the period when these are loop-processed, an audio signal corresponding to the audio of the participant A1 (audio input to the microphone MA1) is input to the CODEC 140.

On the other hand, when it is determined in step A10 or step A12 that there is no voice input to the microphone MA1 (when step A10 or step A12 is "NO"), the CPU 111 determines whether or not there is a voice input to the microphone MA2 (step A14).

When the microphone MA2 has a voice input (step A14: YES), the CPU 111 controls the selection unit 130 to select the connection port 122 (step A15), and again determines whether there is a voice input to the microphone MA2 (step A16). When the voice input to the microphone MA2 is continuing (step A16: YES), the CPU 111 waits until a predetermined standby time elapses (step A17). When the standby time has elapsed, the CPU 111 returns the process to step A16 and again determines whether or not there is a voice input to the microphone MA2. As long as the voice input to the microphone MA2 continues, step A16 and step A17 are repeated, and the audio signal corresponding to the voice of the participant A2 (the voice input to the microphone MA2) is input to the CODEC 140.

On the other hand, when it is determined in step A14 or step A16 that there is no voice input to the microphone MA2 (when step A14 or step A16 is "NO"), the CPU 111 determines whether or not there is a voice input to the microphone MA3 (step A18).

When the microphone MA3 has a voice input (step A18: YES), the CPU 111 controls the selection unit 130 to select the connection port 123 (step A19), and again determines whether there is a voice input to the microphone MA3 (step A20). When the voice input to the microphone MA3 is continuing (step A20: YES), the CPU 111 waits until a predetermined standby time elapses (step A21). When the standby time has elapsed, the CPU 111 returns the process to step A20 and again determines whether or not there is a voice input to the microphone MA3. As long as the voice input to the microphone MA3 continues, step A20 and step A21 are repeated, and the audio signal corresponding to the voice of the participant A3 (the voice input to the microphone MA3) is input to the CODEC 140.

ステップＡ１８又はステップＡ２０において、マイクＭＡ３への音声入力が無いと判別された場合（ステップＡ１８又はステップＡ２０が「ＮＯ」の場合）、ＣＰＵ１１１は、処理を再びステップＡ１０に戻し、これまで説明した各ステップが繰り返される。 When it is determined in step A18 or step A20 that there is no voice input to the microphone MA3 (when step A18 or step A20 is “NO”), the CPU 111 returns the process to step A10 again, The steps are repeated.

The first selection algorithm is thus an algorithm that selects the participants A1, A2, and A3 fairly and cyclically, and is effective, for example, in brainstorming and other cases where there is no ranking among the participants with respect to speaking.
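The flow of steps A10 to A21 can be approximated by a small simulation. The sketch below is not the patented implementation: it assumes that time is divided into polling ticks, that each tick supplies one boolean per microphone (True meaning "voice input present"), and that after the selected microphone falls silent the scan resumes at the next microphone in cyclic order. The function name `round_robin_select` is hypothetical.

```python
from typing import List, Optional, Sequence


def round_robin_select(frames: Sequence[Sequence[bool]]) -> List[Optional[int]]:
    """Return the selected microphone index for every polling tick.

    The selected microphone is held while it keeps receiving voice input.
    When it falls silent, the remaining microphones are checked in cyclic
    order and the first active one is selected. If nobody is speaking, the
    previous selection is simply kept.
    """
    count = len(frames[0]) if frames else 0
    selected: Optional[int] = None
    history: List[Optional[int]] = []
    for active in frames:
        if selected is None or not active[selected]:
            start = 0 if selected is None else selected + 1
            for offset in range(count):
                candidate = (start + offset) % count
                if active[candidate]:
                    selected = candidate
                    break
        history.append(selected)
    return history
```

For instance, with MA1 and MA2 both speaking at first, index 0 (MA1) stays selected until MA1 stops, and only then does the selection move on to index 1 (MA2).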

次に図４を参照して、第２選択アルゴリズムの詳細について説明する。尚、図４において、図３と重複する箇所には同一の符号を付してその説明を省略することとする。 Next, the details of the second selection algorithm will be described with reference to FIG. In FIG. 4, parts that are the same as those in FIG. 3 are given the same reference numerals, and descriptions thereof are omitted.

図４に示される第２選択アルゴリズムは、図３においてステップＡ１６、ステップＡ１７、ステップＡ２０及びステップＡ２１を削除したものとなっている。 The second selection algorithm shown in FIG. 4 is obtained by deleting step A16, step A17, step A20, and step A21 in FIG.

即ち、ステップＡ１４においてマイクＭＡ２に音声入力が無いか、又はステップＡ１５において接続ポート１２２が選択された場合に、処理はステップＡ１８に移行する。同様に、ステップＡ１８でマイクＭＡ３に音声入力が無いか、又はステップＡ１９において接続ポート１２３が選択された場合に、処理はステップＡ１０に復帰する。ステップＡ１０からステップＡ１３に至る一連処理は第１選択アルゴリズムと同様に実行される。 That is, if there is no voice input to the microphone MA2 in step A14, or if the connection port 122 is selected in step A15, the process proceeds to step A18. Similarly, if there is no voice input to the microphone MA3 in step A18, or if the connection port 123 is selected in step A19, the process returns to step A10. A series of processing from step A10 to step A13 is executed in the same manner as the first selection algorithm.

第２選択アルゴリズムにおいては、ステップＡ１５において接続ポート１２２が選択されたとしても、マイクＭＡ３に音声入力が有る場合には、選択される接続ポートは即座に接続ポート１２３に切り替わる。同様に、接続ポート１２３が選択されたとしても、マイクＭＡ１に音声入力がある場合には、選択される接続ポートは即座に接続ポート１２１に切り替わる。接続ポート１２１が選択されている場合のみ、マイクＭＡ１に音声が入力されている限り接続ポート１２１の選択が継続される。 In the second selection algorithm, even if the connection port 122 is selected in Step A15, if the microphone MA3 has an audio input, the selected connection port is immediately switched to the connection port 123. Similarly, even if the connection port 123 is selected, if there is an audio input to the microphone MA1, the selected connection port is immediately switched to the connection port 121. Only when the connection port 121 is selected, the selection of the connection port 121 is continued as long as sound is input to the microphone MA1.

That is, the participant A1 is given a higher priority than the participants A2 and A3. Such an algorithm is effective, for example, for a conference in which there is a facilitator such as a chairperson (corresponding to the participant A1) and at least some of the participants are ranked with respect to speaking.
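The second selection algorithm can be sketched in the same tick-based style. The assumptions here are stronger and should be read as such: one full pass through the flowchart of FIG. 4 is taken to happen per polling tick, and the selection is sampled at the end of each pass. Under that reading, MA1 holds the selection whenever it has input, while MA2 and MA3 are selected without any hold loop. The function name `priority_select` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def priority_select(frames: Sequence[Tuple[bool, bool, bool]]) -> List[Optional[int]]:
    """Return the selection at the end of each flowchart pass (one per tick)."""
    selected: Optional[int] = None
    history: List[Optional[int]] = []
    for ma1, ma2, ma3 in frames:
        if ma1:
            # Steps A10 to A13: MA1 keeps the selection while it has input.
            selected = 0
        else:
            if ma2:
                # Steps A14 and A15: select port 122, with no hold loop.
                selected = 1
            if ma3:
                # Steps A18 and A19: select port 123, with no hold loop.
                selected = 2
        history.append(selected)
    return history
```

In this model, a participant on MA2 or MA3 loses the selection as soon as MA1 has input, which mirrors the higher priority given to the participant A1.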

尚、図３に係る第１選択アルゴリズムにおいて、ステップＡ１７及びステップＡ２１に係る経過時間を「ゼロ」又はそれに準じるような極めて小さい値に設定した場合、マイクＭＡ２への音声入力が一瞬途切れた時点でマイクＭＡ３への入力があれば、マイクＭＡ３へと選択が切り替わることとなり、図４に係る第２選択アルゴリズムと類似したアルゴリズムを実現することも可能である。この場合、会議用端末装置１００において、係る待機時間が外部入力可能に構成されていれば、接続ポート１２２及び接続ポート１２３に対応する経過時間を短く設定することによって容易にこのような制御を実現することも可能である。また、この場合には、参加者各々に対し、優先度或いは発言の重要度などに応じて適切な経過時間を設定し、会議毎に最適なアルゴリズムを形成することも容易にして可能である。 In the first selection algorithm according to FIG. 3, when the elapsed time according to step A17 and step A21 is set to “zero” or an extremely small value corresponding thereto, the voice input to the microphone MA2 is momentarily interrupted. If there is an input to the microphone MA3, the selection is switched to the microphone MA3, and an algorithm similar to the second selection algorithm according to FIG. 4 can be realized. In this case, if the waiting time is configured to be externally input in the conference terminal device 100, such control can be easily realized by setting the elapsed time corresponding to the connection port 122 and the connection port 123 short. It is also possible to do. In this case, it is possible to easily set an appropriate elapsed time for each participant according to the priority level or the importance level of the speech, and to form an optimal algorithm for each conference.
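The effect of a per-microphone standby time can be illustrated by extending the cyclic selection with a wait counter. This is a sketch under assumptions: the standby time is expressed in polling ticks, the input of the selected microphone is re-checked only when its wait has run out, and the name `select_with_wait` and the parameter `wait_ticks` are hypothetical.

```python
from typing import List, Optional, Sequence


def select_with_wait(frames: Sequence[Sequence[bool]],
                     wait_ticks: Sequence[int]) -> List[Optional[int]]:
    """Cyclic selection in which microphone i is re-checked every wait_ticks[i] ticks."""
    count = len(wait_ticks)
    selected: Optional[int] = None
    remaining = 0
    history: List[Optional[int]] = []
    for active in frames:
        if selected is not None and remaining > 0:
            # Still inside the standby period: the selection cannot change.
            remaining -= 1
        elif selected is None or not active[selected]:
            # Selected microphone is silent: scan the others in cyclic order.
            start = 0 if selected is None else selected + 1
            for offset in range(count):
                candidate = (start + offset) % count
                if active[candidate]:
                    selected = candidate
                    remaining = wait_ticks[candidate]
                    break
        else:
            # Input continues: start another standby period.
            remaining = wait_ticks[selected]
        history.append(selected)
    return history
```

Setting a microphone's wait to zero makes it give up the selection at the first silent tick, while a longer wait lets it ride through short pauses, which is the behaviour the paragraph above describes for ports 122 and 123.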

次に、図５を参照して、これら２種類のアルゴリズムを視覚的に説明する。ここに、図５は、選択されるマイクロフォンの時間経過を表すチャートである。 Next, these two types of algorithms will be described visually with reference to FIG. FIG. 5 is a chart showing the time course of the selected microphone.

In FIG. 5, the "audio input level" is the level obtained by binarizing the audio signal generated by each microphone, and takes either the value "H" or the value "L". The "H" level indicates that the corresponding microphone has a voice input, and the "L" level indicates that it has none. The "selection result" is the result of selection by the selection unit 130 and is equivalent to the type of audio signal input to the CODEC 140. That is, during a period labeled "microphone XX", the audio signal corresponding to the voice input to microphone XX is input to the CODEC 140. A period labeled "(microphone XX)" is one in which no voice is being input to any microphone and the microphone is merely in the selected state.

In FIG. 5, when the first selection algorithm is followed, the selection result switches to the microphone MA1 at time T0, the microphone MA2 at time T2, the microphone MA3 at time T5, and the microphone MA1 at time T8. That is, while voice input to the selected microphone continues (while its audio input level is "H"), the selected microphone does not change. As a result, the speaker (the voice regarded as the speaker at the other bases) switches in a generally long cycle.

On the other hand, when the second selection algorithm is followed, the selection result switches to the microphone MA1 at time T0, the microphone MA2 at time T2, the microphone MA3 at time T3, the microphone MA1 at time T4, the microphone MA3 at time T6, and the microphone MA1 at time T8. That is, for the microphones MA2 and MA3, the selection switches when there is voice input from another microphone, even while voice is still being input. As a result, the speaker switches in a generally short cycle.

As described above, according to the conference terminal device 100 of this embodiment, switching the algorithm as appropriate for the scale, purpose, or nature of the conference allows a conference of any form to proceed efficiently and effectively. The conference control system 10A therefore makes it possible to conduct the conference efficiently and effectively.

<2: Second embodiment>
The first embodiment described the case where the conference terminal device according to the present invention is applied to an audio conference, but the conference terminal device according to the present invention is also effective for a conference that includes video. Such a second embodiment of the present invention is described below.

<2.1: Overview of the second embodiment>
FIG. 6 is a schematic diagram of the base A according to the second embodiment of the present invention. In the figure, parts that also appear in FIG. 2 are given the same reference numerals and their description is omitted.

図６において、拠点Ａには図２における会議制御システム１０Ａの代わりに会議制御システム１０ＡＡが設置されている。会議制御システム１０ＡＡは、カメラ３０及び表示装置４０を備える点で会議制御システム１０Ａと構成が異なる、本発明に係る「会議システム」の他の一例である。 In FIG. 6, a conference control system 10AA is installed at the site A instead of the conference control system 10A in FIG. The conference control system 10AA is another example of the “conference system” according to the present invention, which is different in configuration from the conference control system 10A in that the camera 30 and the display device 40 are provided.

カメラ３０は、例えば、デジタルビデオカメラであり、拠点Aにおける会議の様子を撮影することが可能に構成された、本発明に係る「撮像手段」の一例である。カメラ３０には三脚、可動式アタッチメント及びアタッチメント駆動部など（いずれも不図示）が備わっている。カメラ３０は、三脚に固定されると共に、可動式アタッチメントがアタッチメント駆動部によって駆動されて三次元的に回動するのに伴い、被写体を自由に変えることが可能に構成されている。アタッチメント駆動部を駆動するための制御信号は、制御部１１０から送信されている。 The camera 30 is, for example, a digital video camera, and is an example of an “imaging unit” according to the present invention configured to be able to photograph a meeting at the site A. The camera 30 includes a tripod, a movable attachment, an attachment drive unit, and the like (all not shown). The camera 30 is fixed to a tripod, and is configured to be able to freely change the subject as the movable attachment is driven by the attachment drive unit to rotate three-dimensionally. A control signal for driving the attachment driving unit is transmitted from the control unit 110.

The display device 40 is, for example, a relatively large display device such as a plasma display device, and is an example of the “display means” according to the present invention, configured to be able to display video related to the conference held between the sites on a screen unit 41. Video display signals for the video to be displayed on the screen unit 41 are input from the camera 30 and the control unit 110.

Meanwhile, personal information on the participants A1, A2, and A3 (that is, an example of the “identification data” according to the present invention) can be registered in the conference terminal device 100 in advance. This personal information is registered via input means (not shown) such as a keyboard or a mouse. The “personal information” referred to here broadly covers information that can serve as a reference for the conference, including name, age, department, and job title. The registered personal information is stored in the RAM 113. Separately, position information on the participants A1, A2, and A3 is also registered in the conference terminal device 100 and is likewise stored in the RAM 113. In the present embodiment, the stored position information is the amount of change in the control amounts when, with the camera 30 fixed to the tripod and the basic composition (a composition overlooking the site A) set, the camera 30 is moved (by an operation such as pan, tilt, or zoom) to a position where a participant, such as the participant A1, can be imaged. However, when the camera 30 is equipped with a positioning system such as GPS (Global Positioning System), the position information may instead be the absolute position of each participant.
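The two kinds of registered data described above can be pictured as one record per participant. The following is a minimal sketch in Python under stated assumptions, not the implementation of the conference terminal device 100: the names `PersonalInfo`, `CameraOffset`, `ParticipantRecord`, `registry`, and `register`, the choice of fields and units, and all sample values are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PersonalInfo:
    """Identification data for one participant (fields are illustrative)."""
    name: str
    department: str
    title: str


@dataclass(frozen=True)
class CameraOffset:
    """Change in camera control amounts relative to the basic composition."""
    pan_deg: float
    tilt_deg: float
    zoom: float


@dataclass(frozen=True)
class ParticipantRecord:
    info: PersonalInfo
    offset: CameraOffset


# Stand-in for the area of the RAM 113 that holds the registered data.
registry: Dict[str, ParticipantRecord] = {}


def register(participant_id: str, info: PersonalInfo, offset: CameraOffset) -> None:
    """Store personal information and position information for one participant."""
    registry[participant_id] = ParticipantRecord(info, offset)


# Hypothetical registrations for the three participants at the site A.
register("A1", PersonalInfo("Participant A1", "Sales", "Manager"),
         CameraOffset(pan_deg=-20.0, tilt_deg=0.0, zoom=1.5))
register("A2", PersonalInfo("Participant A2", "Engineering", "Engineer"),
         CameraOffset(pan_deg=0.0, tilt_deg=-5.0, zoom=2.0))
register("A3", PersonalInfo("Participant A3", "Planning", "Director"),
         CameraOffset(pan_deg=20.0, tilt_deg=0.0, zoom=1.5))
```

Storing the position information as an offset from the basic composition, rather than as an absolute position, matches the case in the text where the camera has no positioning system.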

<2.2: Detailed Operation of Camera 30 and Display Device 40>
Next, the operation of the camera 30 and the display device 40 in the present embodiment is described, again with reference to FIG. 6. In the present embodiment, the conference control program executed by the control unit 110 includes a video control program for controlling the camera 30 and the display device 40. The operations of the camera 30 and the display device 40 are controlled according to this video control program. The site B and the site C adopt the same configuration, so that a so-called “video conference” is realized among these sites.

The camera 30 basically captures an overhead view of the site A in accordance with the basic composition described above. The video signal for this view is input to the display device 40 and the CODEC 140. The CODEC 140 encodes and compresses the input video signal. That is, the CODEC 140 combines the audio CODEC function of the first embodiment with the video CODEC function of the present embodiment, and is configured to also function as an example of the “video data generation unit” according to the present invention.

The video data encoded and compressed by the CODEC 140 (that is, an example of the “video data” according to the present invention) is transmitted from the communication unit 150 to the other sites. Conversely, video data for video captured at the other sites is received via the communication unit 150 and input to the CODEC 140. The CODEC 140 decodes this video data into a video signal that the display device 40 can display, and the signal is input to the display device 40 while being temporarily stored in the RAM 113.

For the video of the local site, on the other hand, the video signal from the camera 30 is input directly to the display device 40. In this case, therefore, the camera 30 also functions as another example of the “video data generation unit” according to the present invention.

In addition, the CPU 111 reads from the RAM 113 a data file of application software that serves as material for the conference (material data), generates a video signal that the display device 40 can display, and inputs it to the display device 40. That is, three types of video signals are input to the display device 40: the video signal of the local site (site A), the video signals of the other sites, and the video signal for the material data.

The screen unit 41 of the display device 40 is now described in detail with reference to FIG. 7. FIG. 7 is a schematic diagram of the screen unit 41.

In FIG. 7, the screen unit 41 is divided into a total of four screens: a main screen 41a, a sub screen 41b, a sub screen 41c, and an identification information screen 41d.

A material video based on the material data is displayed on the main screen 41a. The participants in the conference hold their discussion on the basis of the material video displayed on the main screen 41a.

The video of the local site is displayed on the sub screen 41b. This video is displayed on the basis of the video signal for the local site that is input directly from the camera 30.

The video of the other sites is displayed on the sub screen 41c. This video is displayed on the basis of the video signal that the CODEC 140 obtains from the video data of the other sites received via the communication unit 150. The identification information screen 41d is described later.
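The correspondence between input signals and the four regions of the screen unit 41 can be summarized as a simple routing table. The sketch below is illustrative only: the `Source` enumeration, the `SCREEN_FOR_SOURCE` table, and the `compose_screen` function are hypothetical names, strings stand in for actual video signals, and the entry for the identification information screen 41d anticipates the personal information display that the text describes later.

```python
from enum import Enum
from typing import Dict


class Source(Enum):
    """Kinds of input that the display device 40 receives (illustrative)."""
    MATERIAL = "material data"
    LOCAL_CAMERA = "local site video"
    REMOTE_SITE = "other site video"
    IDENTIFICATION = "personal information"


# Which region of the screen unit 41 shows which input.
SCREEN_FOR_SOURCE: Dict[Source, str] = {
    Source.MATERIAL: "41a",        # main screen
    Source.LOCAL_CAMERA: "41b",    # sub screen, fed directly by the camera 30
    Source.REMOTE_SITE: "41c",     # sub screen, fed via the CODEC 140
    Source.IDENTIFICATION: "41d",  # identification information screen
}


def compose_screen(inputs: Dict[Source, str]) -> Dict[str, str]:
    """Return what each region shows; regions without input stay blank."""
    layout = {region: "" for region in SCREEN_FOR_SOURCE.values()}
    for source, content in inputs.items():
        layout[SCREEN_FOR_SOURCE[source]] = content
    return layout


layout = compose_screen({
    Source.MATERIAL: "slides",
    Source.LOCAL_CAMERA: "camera 30",
})
```

Keeping the mapping in a single table makes it easy to see that each input has exactly one destination region.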

Meanwhile, in the course of executing the conference control program, the CPU 111 selects a speaker on the basis of the selection algorithm described above. In the present embodiment, the CPU 111 is configured so that the participant corresponding to the currently selected microphone can be imaged by the camera 30 and displayed on the sub screen 41b of the screen unit 41.

Specifically, the CPU 111 reads from the RAM 113 the position information corresponding to the selected connection port (equivalently, the selected microphone or participant), generates a control signal corresponding to the read position information, and transmits it to the camera 30. This control signal is generated in accordance with the amount of change in the control amounts described above, and changes the composition of the camera 30 from the basic composition to one that frames the selected participant.

On the basis of this control signal, the camera 30 drives the attachment drive unit and changes its composition to the one corresponding to the participant who is currently speaking. In this way, the sub screen 41b shows, from moment to moment, the participant who is selected (that is, speaking) at that time.
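The lookup from the selected connection port to a camera composition can be sketched as follows. This is an illustration under assumptions, not the control logic of the CPU 111: the `Composition` type, the numeric values, the additive treatment of zoom, and the fallback to the basic composition when no port is selected are all hypothetical. Only the connection port numbers 121, 122, and 123 are taken from the document.

```python
from typing import Dict, NamedTuple, Optional


class Composition(NamedTuple):
    """Camera control amounts (units are illustrative)."""
    pan_deg: float
    tilt_deg: float
    zoom: float


# Basic composition: the overhead view of the site A (values are made up).
BASIC_COMPOSITION = Composition(pan_deg=0.0, tilt_deg=-10.0, zoom=1.0)

# Registered position information: change in control amounts per connection port.
OFFSETS_BY_PORT: Dict[int, Composition] = {
    121: Composition(-20.0, 5.0, 1.0),
    122: Composition(0.0, 5.0, 1.5),
    123: Composition(20.0, 5.0, 1.0),
}


def target_composition(selected_port: Optional[int]) -> Composition:
    """Return the composition the camera should move to.

    Assumption: with no port selected, the camera stays at the basic composition.
    """
    if selected_port is None:
        return BASIC_COMPOSITION
    delta = OFFSETS_BY_PORT[selected_port]
    return Composition(
        BASIC_COMPOSITION.pan_deg + delta.pan_deg,
        BASIC_COMPOSITION.tilt_deg + delta.tilt_deg,
        BASIC_COMPOSITION.zoom + delta.zoom,
    )
```

Because the stored values are changes relative to the basic composition, the target is obtained by adding the offset to the basic composition each time, so errors do not accumulate as the selection moves from one participant to another.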

Furthermore, at the same time as it controls the operation of the camera 30, the CPU 111 reads from the RAM 113 the personal information corresponding to the currently selected participant. The CPU 111 generates a display signal for displaying the read personal information on the identification information screen 41d and inputs it to the display device 40. On the basis of this display signal, the display device 40 displays the personal information of the currently selected participant on the identification information screen 41d. The identification information screen 41d in FIG. 7 shows this state. The read personal information is also transmitted to the other sites via the communication unit 150. At the other sites, the video of the speaker at the site A is displayed on the sub screen 41c on the basis of the transmitted personal information.
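The two uses of the personal information, local display and transmission to the other sites, can be sketched as below. The document does not specify any data format, so the JSON encoding, the field names, and the functions `identification_text` and `identification_payload` are assumptions made purely for illustration.

```python
import json
from typing import Dict

# Stand-in for the personal information registered in the RAM 113 (sample values).
PERSONAL_INFO: Dict[str, Dict[str, str]] = {
    "A1": {"name": "Participant A1", "department": "Sales", "title": "Manager"},
    "A2": {"name": "Participant A2", "department": "Engineering", "title": "Engineer"},
}


def identification_text(participant_id: str) -> str:
    """Text to show on the identification information screen 41d."""
    info = PERSONAL_INFO[participant_id]
    return f"{info['name']} / {info['department']} / {info['title']}"


def identification_payload(site: str, participant_id: str) -> bytes:
    """Bytes to hand to the communication unit 150 (hypothetical JSON format)."""
    message = {"site": site, "participant": participant_id}
    message.update(PERSONAL_INFO[participant_id])
    return json.dumps(message, sort_keys=True).encode("utf-8")


text = identification_text("A1")
payload = identification_payload("A", "A1")
```

A receiving site could decode the payload with `json.loads` and show the same text next to the video of the site A.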

As described above, the conference control system 10AA according to the second embodiment makes it possible to hold a richer conference in which audio is accompanied by video. The camera 30 is controlled in conjunction with the result of the audio (microphone) selection so that the participant who is speaking is always the one being captured, which is very effective. In addition, because the personal information of the selected participant can also be displayed, the conference can proceed even more smoothly. That is, a conference among a plurality of sites can easily be conducted efficiently and effectively.

The present invention is not limited to the embodiments described above, and may be modified as appropriate without departing from the gist or concept of the invention that can be read from the claims and the specification as a whole. A conference terminal device, a conference system, and a computer program involving such modifications are also included in the technical scope of the present invention.

A conceptual diagram of the conference system according to the first embodiment of the present invention.
A schematic diagram of one site in the conference system of FIG. 1.
A flowchart of the first selection algorithm of the conference control program executed by the conference terminal device in the conference control system of FIG. 2.
A flowchart of the second selection algorithm of the conference control program executed by the conference terminal device in the conference control system of FIG. 2.
A chart showing, for each of the algorithms of FIG. 4 and FIG. 5, which microphone is selected over time.
A schematic diagram of one site according to the second embodiment of the present invention.
A schematic diagram of the screen unit of the display device provided in the conference control system of FIG. 6.

Explanation of symbols

10 ... conference system, 10A ... conference control system, 10AA ... conference control system, 20 ... network, 30 ... camera, 40 ... display device, 100 ... conference terminal device, 110 ... control unit, 121, 122, 123 ... connection ports, 130 ... selection unit, 140 ... CODEC, 150 ... communication unit, SPA ... speaker, MA1, MA2, MA3 ... microphones, A1, A2, A3 ... participants.

Claims

A conference terminal device that is installed at a base, in order to conduct a conference among a plurality of bases, together with (i) a plurality of microphones that are each installed in association with a participant of the conference at the base and that each collect voice and generate a voice signal corresponding to the collected voice, and (ii) at least one speaker, the conference terminal device comprising:
selecting means for selecting one microphone from the plurality of microphones according to a predetermined type of algorithm;
voice data generating means for generating, from among the plurality of voice signals corresponding to the plurality of microphones, voice data corresponding to the voice signal generated by the selected one microphone;
communication means for transmitting the generated voice data via a network to the other bases, other than the own base, among the plurality of bases, and for receiving via the network the voice data generated at the other bases; and
control means for controlling the speaker so as to perform voice output based on the received voice data.

The conference terminal device according to claim 1, wherein
the base is further provided with (i) an imaging unit that images a subject and generates a video signal corresponding to the imaged subject, and (ii) a display unit that displays video of the subject,
the conference terminal device further comprises video data generating means for generating a predetermined type of video data corresponding to the generated video signal,
the communication means further transmits the generated video data to the other bases via the network and receives via the network the video data generated at the other bases, and
the control means further controls the display unit so as to perform video output based on at least one of the generated video data and the received video data.

The conference terminal device according to claim 2, wherein the control means further controls the imaging unit such that the participant corresponding to the selected one microphone becomes the subject.

The conference terminal device according to claim 2, further comprising identification data generating means for generating a predetermined type of identification data related to the participant corresponding to the selected one microphone, wherein
the communication means further transmits the generated identification data to the other bases via the network and receives via the network the identification data generated at the other bases, and
the control means further controls the display unit based on at least one of the generated identification data and the received identification data.

The conference terminal device according to claim 1, wherein a plurality of algorithms are prepared in advance according to the form of the conference, and the selecting means selects the one microphone according to one of the plurality of prepared algorithms.

The conference terminal device according to claim 5, wherein the plurality of prepared algorithms include an algorithm that cyclically selects each of the plurality of microphones.

The conference terminal device according to claim 5, wherein the plurality of prepared algorithms include an algorithm that selects each of the plurality of microphones according to a priority given in advance to each of the plurality of microphones.

A conference system installed at a base for conducting a conference among a plurality of bases, the conference system comprising (i) a plurality of microphones that are each installed in association with a participant of the conference and that each collect voice and generate a voice signal corresponding to the collected voice, (ii) at least one speaker, and (iii) a conference terminal device,
wherein the conference terminal device comprises:
selecting means for selecting one microphone from the plurality of microphones according to a predetermined type of algorithm;
voice data generating means for generating, from among the plurality of voice signals corresponding to the plurality of microphones, voice data corresponding to the voice signal generated by the selected one microphone;
communication means for transmitting the generated voice data via a network to the other bases, other than the own base, among the plurality of bases, and for receiving via the network the voice data generated at the other bases; and
control means for controlling the speaker so as to perform voice output based on the received voice data.

A computer program that causes a computer system to function as the selecting means according to any one of claims 1 to 7.

A computer program that causes a computer system to function as the selecting means, the voice data generating means, and the control means according to any one of claims 1 to 7.