JP2013008210A

JP2013008210A - Conference system, image display device, and image voice processing method

Info

Publication number: JP2013008210A
Application number: JP2011140627A
Authority: JP
Inventors: Mitsuru Kubota; 満久保田
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2011-06-24
Filing date: 2011-06-24
Publication date: 2013-01-10

Abstract

PROBLEM TO BE SOLVED: To provide a conference system that improves convenience. SOLUTION: A conference system 1 displays, on an image display device 3, a composite image obtained by combining the images handled by the respective terminals 2. Each terminal 2 includes: input acceptance means for accepting input of the display states, in the composite image, of the image handled by that terminal 2 and the images handled by the other terminals 2; and transmission control means for transmitting, to the image display device 3, audio information on the sound collected by a microphone, display information on the display states, and image information on the image handled by the terminal 2. The image display device 3 includes: reception means for receiving the audio information, the display information, and the image information; first control means for generating the composite image on the basis of the display information and causing display means to display it; and second control means for generating, on the basis of the display information, synthesized audio from the sounds collected by the respective microphones and causing a speaker to output it.

Description

The present invention relates to a conference system, an image display device, and an image and audio processing method.

Conventionally, as a conference system used for holding a conference, a configuration is known in which the display screens of the terminals used by the conference participants are displayed simultaneously on a screen (split display) (see, for example, Patent Document 1).
The conference system described in Patent Document 1 has a configuration in which a plurality of computers (terminals) used by the conference participants, a moderator's computer (server) used by the conference moderator, and a projector are connected via a network.
The server receives, via the network, the image data handled by each terminal (image data for each terminal's display screen), generates a composite image (split display screen) by combining the images based on that image data, and transmits composite image data for the composite image to the projector.
The projector then receives the composite image data via the network and displays the composite image based on that data on the screen.

Patent Document 1: JP 2010-193521 A

In a conference, there is typically a participant who does most of the explaining while referring to the composite image on the screen, that is, a participant who explains on the basis of material he or she created that is included in the composite image (the image handled by the terminal device that participant uses). This participant is hereinafter called the main speaker.
However, in the conference system described in Patent Document 1, each participant's material (each image handled by each terminal device) is shown in split display on the screen, so it is difficult for the other participants to determine which of the displayed images the main speaker is basing the explanation on.
In addition, if some of the other participants are talking among themselves while the main speaker is explaining, the main speaker's voice becomes hard to hear.
For these reasons, the conference system described in Patent Document 1 has the problem that it is difficult to improve convenience.

An object of the present invention is to provide a conference system, an image display device, and an image and audio processing method that can improve convenience.

The conference system of the present invention includes a plurality of terminal devices and an image display device communicably connected to the plurality of terminal devices, and causes the image display device to display a composite image obtained by combining the images handled by the plurality of terminal devices. Each terminal device includes: audio collection means for collecting ambient sound; input acceptance means for accepting input of the display states, in the composite image, of the image handled by that terminal device and the images handled by the other terminal devices; and transmission control means for transmitting, to the image display device, audio information on the sound collected by the audio collection means, display information on the display states, and image information on the image handled by that terminal device. The image display device includes: display means for displaying an image; reception means for receiving the audio information, the display information, and the image information; first control means for generating the composite image by combining, on the basis of the display information, the images based on the image information from the plurality of terminal devices, and for causing the display means to display the generated composite image; audio output means for outputting sound; and second control means for generating synthesized audio by combining, on the basis of the display information, the sounds based on the audio information from the plurality of terminal devices, and for causing the audio output means to output the generated synthesized audio.

In the present invention, the conference system includes the plurality of terminal devices and the image display device configured as described above. Therefore, as described below, the images handled by the terminal devices can be displayed on the image display device, and the sounds collected by the audio collection means of the terminal devices, such as microphones, can be output from the image display device.
For example, when one of the participants uses the input means of his or her terminal device to enter the display states of the image handled by that terminal device and the images handled by the other terminal devices, the input acceptance means of that terminal device accepts the input of the display states (input acceptance step).
Here, the display state means, for example, the display position and display size of each image handled by each terminal device in the composite image (split display screen).
The transmission control means of each terminal device then transmits, to the image display device, the audio information output from the audio collection means (such as a microphone) provided for that terminal device and the image information on the image handled by that terminal device. The display information on the display states is transmitted as well, but only by the terminal device that accepted the display states (transmission step).
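The transmission step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the names `DisplayInfo`, `TerminalMessage`, and `build_message`, the terminal ids, and the byte payloads are all hypothetical, and the source does not specify any message format. The one point it mirrors from the text is that every terminal sends audio and image information, while display information is attached only by the terminal that accepted the input.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class DisplayInfo:
    # Hypothetical encoding of the "display state":
    # terminal id -> (display position, display size).
    layout: Dict[str, Tuple[str, str]]


@dataclass
class TerminalMessage:
    terminal_id: str
    audio: bytes                            # audio information from the microphone
    image: bytes                            # image information handled by the terminal
    display: Optional[DisplayInfo] = None   # set only by the terminal that accepted input


def build_message(terminal_id: str, audio: bytes, image: bytes,
                  accepted: Optional[DisplayInfo] = None) -> TerminalMessage:
    """Assemble what one terminal sends in the transmission step."""
    return TerminalMessage(terminal_id, audio, image, accepted)


state = DisplayInfo({"2A": ("left", "large"), "2B": ("right", "small")})
msg_a = build_message("2A", b"\x00\x01", b"img-a", state)  # this terminal accepted the input
msg_b = build_message("2B", b"\x02\x03", b"img-b")         # this one did not
```

Making the display field optional keeps a single message shape for all terminals, so the receiving side can simply take the display information from whichever message carries it.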

Meanwhile, in the image display device, the reception means receives the audio information, display information, and image information transmitted from the terminal devices (reception step).
The first control means combines the images based on the image information from the terminal devices on the basis of the display information (the display position, display size, and so on of each image) and causes the display means to display the composite image (first control step).
Furthermore, the second control means combines the sounds based on the audio information from the terminal devices on the basis of the display information and causes the audio output means, such as a speaker, to output the synthesized audio (second control step).
For example, suppose the display information specifies the following: of the images handled by the plurality of terminal devices, the two images handled by the first and second terminal devices are shown side by side in a left-right split; the image handled by the first terminal device, used by the main speaker, has display position "left" and display size "large"; and the image handled by the second terminal device, used by another participant, has display position "right" and display size "small". In that case, the first and second control means generate the composite image and the synthesized audio as follows.

That is, on the basis of this display information, the first control means generates a composite image in which the image based on the image information transmitted from the first terminal device (the main speaker's material, for example) is large and on the left, and the image transmitted from the second terminal device (another participant's material, for example) is small and on the right.
Likewise on the basis of the display information, the second control means raises the output level of the sound based on the audio information transmitted from the first terminal device (the main speaker's voice), lowers the output level of the sound based on the audio information transmitted from the second terminal device (the other participant's voice), and combines the adjusted sounds to generate the synthesized audio.
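The image side of this example can be sketched in Python as a layout calculation. This is a minimal sketch, not the patent's method: the function name `split_left_right`, the canvas size, and the 2:1 width ratio between "large" and "small" are assumptions chosen for illustration, since the source gives no concrete dimensions.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in pixels


def split_left_right(canvas_w: int, canvas_h: int,
                     left_size: str, right_size: str) -> Dict[str, Rect]:
    """Place two images side by side; a "large" pane gets twice the width of a "small" one."""
    weight = {"large": 2, "small": 1}   # assumed ratio, not taken from the source
    total = weight[left_size] + weight[right_size]
    left_w = canvas_w * weight[left_size] // total
    return {
        "left": (0, 0, left_w, canvas_h),
        "right": (left_w, 0, canvas_w - left_w, canvas_h),
    }


# Main speaker's image: left and large. Other participant's image: right and small.
panes = split_left_right(1200, 600, "large", "small")
```

With these assumed numbers the left pane is 800 pixels wide and the right pane 400 pixels wide, so the main speaker's material visibly dominates the composite image.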

Thus, by entering the display states as appropriate on a terminal device, the main speaker's material (the image handled by the terminal device used by the main speaker) can be given any display position, any display size, and so on within the composite image (split display screen). This lets the other participants easily determine which image the main speaker is basing the explanation on.
In addition, because each participant's voice is collected by a microphone or the like and, as described above, synthesized audio in which the main speaker's voice is emphasized over the other participants' voices can be generated and output, the main speaker's voice (the sound delivered through the speaker or the like) remains easy to hear even if some of the other participants are talking while the main speaker is explaining.
Convenience is therefore improved.
Moreover, because the image display device itself generates the composite image and the synthesized audio, no server of the conventional kind is required, and the configuration of the conference system can be simplified.

The conference system of the present invention includes a plurality of terminal devices, an image display device, and an information processing device communicably connected to the plurality of terminal devices and the image display device, and causes the image display device to display a composite image obtained by combining the images handled by the plurality of terminal devices. Each terminal device includes: audio collection means for collecting ambient sound; input acceptance means for accepting input of the display states, in the composite image, of the image handled by that terminal device and the images handled by the other terminal devices; and transmission control means for transmitting, to the information processing device, audio information on the sound collected by the audio collection means, display information on the display states, and image information on the image handled by that terminal device. The information processing device includes: first control means for generating the composite image by combining, on the basis of the display information, the images based on the image information from the plurality of terminal devices; second control means for generating synthesized audio by combining, on the basis of the display information, the sounds based on the audio information from the plurality of terminal devices; and third control means for transmitting, via the communication path, composite image information on the composite image and synthesized audio information on the synthesized audio to the image display device. The image display device includes: display means for displaying the composite image based on the composite image information from the information processing device; and audio output means for outputting the synthesized audio based on the synthesized audio information from the information processing device.

In the present invention, the conference system includes the plurality of terminal devices, the image display device, and the information processing device configured as described above.
In other words, compared with the above-described conference system that has no information processing device, this conference system omits part of the functions of the image display device (the functions of generating the composite image and the synthesized audio) and provides those functions in the information processing device instead.
It can therefore provide the same effects as the above-described conference system that has no information processing device.
In addition, because the composite image and the synthesized audio are generated by the information processing device rather than by the image display device, the image display device does not need a separate function for generating them, and a conference system can be built using a general-purpose image display device.

In the conference system of the present invention, it is preferable that the display state include the display size of each image handled by the plurality of terminal devices, and that the second control means adjust the output level of the sound corresponding to each image on the basis of the display size of that image to generate the synthesized audio.
In the present invention, since the display state includes the display size described above, if the first and second control means generate the composite image and the synthesized audio on the basis of that display size, the main speaker's material (the image handled by the terminal device used by the main speaker) and voice can be emphasized over the other participants' materials and voices.
The main speaker's material and voice therefore become easier to see and hear, and convenience is improved.
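As a rough sketch of how output levels might be derived from display sizes, the following Python snippet scales each terminal's gain by its share of the total displayed area. The proportional rule, the function name `gains_from_sizes`, and the pixel areas are assumptions for illustration; the source states only that the output level is adjusted on the basis of the display size.

```python
from typing import Dict


def gains_from_sizes(areas: Dict[str, float], base_gain: float = 1.0) -> Dict[str, float]:
    """Scale each terminal's gain by its share of the displayed area.

    A terminal whose image has exactly the average area keeps base_gain;
    larger images get a higher gain and smaller images a lower one.
    """
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("at least one image must have a positive display area")
    n = len(areas)
    return {tid: base_gain * n * area / total for tid, area in areas.items()}


# Assumed areas: terminal 2A shown at 800x600, terminal 2B at 400x600.
gains = gains_from_sizes({"2A": 800 * 600, "2B": 400 * 600})
```

Under this assumed rule the terminal with the larger image ends up with a gain above 1 and the other with a gain below 1, which matches the intent of emphasizing the main speaker's voice.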

The image display device of the present invention is communicably connected to a plurality of terminal devices and displays a composite image in which the images handled by the plurality of terminal devices are combined. The image display device includes: display means for displaying an image; reception means for receiving audio information on the sound collected at each of the plurality of terminal devices, display information on the display state, in the composite image, of the image handled by each of the plurality of terminal devices, and image information handled by each of the plurality of terminal devices; first control means for generating the composite image by combining, on the basis of the display information, the images based on the image information from the plurality of terminal devices, and for causing the display means to display the generated composite image; audio output means for outputting sound; and second control means for generating synthesized audio by combining, on the basis of the display information, the sounds based on the audio information from the plurality of terminal devices, and for causing the audio output means to output the generated synthesized audio.
Since the image display device of the present invention is used in the conference system described above, it provides the same operations and effects as that conference system.

The image and audio processing method of the present invention uses a conference system that includes a plurality of terminal devices and an image display device communicably connected to the plurality of terminal devices, and that causes the image display device to display a composite image obtained by combining the images handled by the plurality of terminal devices. The method includes: an input acceptance step in which a terminal device accepts input of the display states, in the composite image, of the image handled by that terminal device and the images handled by the other terminal devices; a transmission control step in which the terminal device transmits, to the image display device, audio information on the sound collected by audio collection means, display information on the display states, and image information on the image handled by that terminal device; a reception step in which the image display device receives the audio information, the display information, and the image information; a first control step in which the image display device generates the composite image by combining, on the basis of the display information, the images based on the image information from the plurality of terminal devices, and displays the generated composite image; and a second control step in which the image display device generates synthesized audio by combining, on the basis of the display information, the sounds based on the audio information from the plurality of terminal devices, and outputs the generated synthesized audio.
Since the image and audio processing method of the present invention uses the conference system described above, it provides the same operations and effects as that conference system.

The image and audio processing method of the present invention is a method for an image display device that is communicably connected to a plurality of terminal devices and displays a composite image in which the images handled by the plurality of terminal devices are combined. The method includes: a reception step of receiving audio information on the sound collected at each of the plurality of terminal devices, display information on the display state, in the composite image, of the image handled by each of the plurality of terminal devices, and image information handled by each of the plurality of terminal devices; a first control step of generating the composite image by combining, on the basis of the display information, the images based on the image information from the plurality of terminal devices, and displaying the generated composite image; and a second control step of generating synthesized audio by combining, on the basis of the display information, the sounds based on the audio information from the plurality of terminal devices, and outputting the generated synthesized audio.
Since the image and audio processing method of the present invention is carried out by the image display device described above, it provides the same operations and effects as that image display device.

FIG. 1 is a block diagram showing the conference system in the first embodiment.
FIG. 2 is a block diagram showing the configuration of the terminal device in the first embodiment.
FIG. 3 is a block diagram showing the configuration of the projector in the first embodiment.
FIG. 4 is a block diagram showing the configuration of the audio processing means in the first embodiment.
FIG. 5 is a flowchart explaining the image and audio processing method in the first embodiment.
FIG. 6 is a diagram showing an example of the setting window in the first embodiment.
FIG. 7 is a block diagram showing the conference system in the second embodiment.
FIG. 8 is a block diagram showing the configuration of the projector in the second embodiment.
FIG. 9 is a block diagram showing the configuration of the server device in the second embodiment.
FIG. 10 is a flowchart explaining the image and audio processing method in the second embodiment.

[First embodiment]
A first embodiment of the present invention is described below with reference to the drawings.
[Conference system configuration]
FIG. 1 is a block diagram showing the conference system 1.
The conference system 1 is a system used for holding a conference, and simultaneously displays on a screen (split display) the display screens of the terminals used by the conference participants.
The conference system 1 has a configuration in which a plurality of terminal devices 2 and a projector 3 serving as the image display device are connected via a predetermined communication path.

In this embodiment, for convenience of explanation, there are four terminal devices 2 (first to fourth terminal devices 2A to 2D) as shown in FIG. 1, but the number is not limited to four and may be any other number.
Also in this embodiment, as shown in FIG. 1, a local area network LAN (hereinafter, network LAN) is used as the communication path, but this is not a limitation, and another communication path may be used depending on the usage environment, the required communication speed, and so on. The communication method used in the conference system 1 may be wired or wireless.

[Configuration of terminal device]
FIG. 2 is a block diagram showing the configuration of the terminal device 2.
The terminal device 2 is a personal computer used by each conference participant (four participants in this embodiment).
The terminal devices 2A to 2D in this embodiment have the same configuration.
As shown in FIG. 2, the terminal device 2 includes a memory 22 such as a hard disk and a CPU (Central Processing Unit) 21 that executes programs stored in the memory 22, as well as a network interface (network I/F) 23, input means 24, image processing means 25, a VRAM (Video Random Access Memory) 26, a display 27, a microphone 28 serving as the audio collection means, audio processing means 29, and so on.

The network interface 23 is an interface for connecting to the network LAN.
The input means 24 consists of a mouse, a keyboard, and the like, and outputs operation signals corresponding to operations by the participant.
The image processing means 25 consists of, for example, a GPU (Graphics Processing Unit), and, under the control of the CPU 21, uses the VRAM 26 as a buffer to show on the display 27 the image to be displayed.
The microphone 28 collects sound around the terminal device 2 (in particular, the voice of the participant using the terminal device 2) and outputs an audio signal.
The audio processing means 29 converts the analog audio signal output from the microphone 28 into digital audio data (audio information).
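The digitizing stage of this conversion can be illustrated with a short Python sketch. It assumes the analog signal has already been sampled into floating-point values in the range -1.0 to 1.0 and quantizes them to signed 16-bit little-endian PCM. The bit depth, the byte order, and the function name `to_pcm16` are assumptions, since the source does not state the format of the audio data.

```python
from typing import List


def to_pcm16(samples: List[float]) -> bytes:
    """Quantize samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    out = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))        # guard against out-of-range input
        value = int(round(clipped * 32767))     # map to the 16-bit integer range
        out += value.to_bytes(2, byteorder="little", signed=True)
    return bytes(out)


# The last sample is out of range and is clipped to full scale.
pcm = to_pcm16([0.0, 1.0, -1.0, 2.0])
```

Each sample becomes two bytes, so the four samples above yield eight bytes of audio data ready to be sent over the network.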

By executing the programs stored in the memory 22, the CPU 21 functions as input acceptance means 211, GUI (Graphical User Interface) control means 212, communication control means 213 serving as the transmission control means, and so on, as shown in FIG. 2.
The input acceptance means 211 recognizes the participant's operations on the input means 24 on the basis of the operation signals from the input means 24.
The GUI control means 212 controls the operation of the image processing means 25 and causes the display 27 to display a setting window W, described later.
The communication control means 213 establishes a connection with the projector 3 via the network LAN and exchanges information with the projector 3 via the network LAN.
For example, the communication control means 213 transmits to the projector 3 the audio data described above, the information recognized by the input acceptance means 211 (hereinafter, display data (display information)), and the image data handled by the terminal device 2, that is, the image data stored in the VRAM 26 for the image shown on the display 27 (excluding the setting window W).

[Configuration of projector]
FIG. 3 is a block diagram showing the configuration of the projector 3.
The projector 3 projects an image and displays the projected image on a screen.
As shown in FIG. 3, the projector 3 includes a memory 32 comprising a ROM (Read Only Memory), a RAM (Random Access Memory), and the like, and a CPU 31 that executes programs stored in the memory 32, as well as a network interface (network I/F) 33, image processing means 34, a VRAM 35, image projection means 36 serving as the display means, audio processing means 37, a speaker 38 serving as the audio output means, and so on.

ネットワークインターフェース３３は、ネットワークＬＡＮに接続するためのインターフェースである。
画像処理手段３４は、例えばＧＰＵ等で構成され、ＣＰＵ３１による制御の下、ＶＲＡＭ３５をバッファとして使用して、画像投射手段３６に画像を投射させる。
画像投射手段３６は、種々の一般的なプロジェクターで使用される光学系で構成され、光源装置と、光源装置から出射された光束を変調する液晶パネル等の光変調装置と、光変調装置にて変調された光束（画像）を投射する投射レンズ等で構成されている。 The network interface 33 is an interface for connecting to a network LAN.
The image processing means 34 is composed of, for example, a GPU or the like, and, under the control of the CPU 31, causes the image projection means 36 to project an image using the VRAM 35 as a buffer.
The image projection means 36 is composed of an optical system used in various general projectors, and includes a light source device, a light modulation device such as a liquid crystal panel that modulates the light beam emitted from the light source device, a projection lens that projects the light beam (image) modulated by the light modulation device, and the like.

図４は、音声処理手段３７の構成を示すブロック図である。
音声処理手段３７は、ＣＰＵ３１による制御の下、各端末装置２から送信された各音声データに基づいて、各端末装置２の各マイクロフォン２８にて収集された各参加者の声を合成し、当該合成した合成音声を、スピーカー３８を介して出力させる。
この音声処理手段３７は、図４に示すように、複数の音声増幅部３７１と、音声合成部３７２等を備える。
なお、本実施形態では、会議システム１に用いられる端末装置２の数を４つとしたため、音声増幅部３７１も４つ（第１〜第４音声増幅部３７１Ａ〜３７１Ｄ）としているが、その数は、４つに限らず、端末装置２に応じた数だけ設ければよい。 FIG. 4 is a block diagram showing the configuration of the audio processing means 37.
Under the control of the CPU 31, the voice processing means 37 synthesizes the voices of the participants collected by the microphones 28 of the terminal devices 2, based on the voice data transmitted from the terminal devices 2, and outputs the resulting synthesized voice through the speaker 38.
As shown in FIG. 4, the voice processing unit 37 includes a plurality of voice amplification units 371, a voice synthesis unit 372, and the like.
In the present embodiment, the number of terminal devices 2 used in the conference system 1 is four, so four audio amplifying units 371 (first to fourth audio amplifying units 371A to 371D) are provided. The number is not limited to four; it is sufficient to provide as many units as there are terminal devices 2.

音声増幅部３７１は、アンプ等を備えて構成され、各端末装置２のうち、対象となる端末装置２から送信された音声データをアナログの音声信号に変換し、ＣＰＵ３１の制御の下、変換した音声信号の信号レベル（音声の出力レベル）を調整する（種々の増幅度で増幅する）。
なお、第１〜第４音声増幅部３７１Ａ〜３７１Ｄは、第１〜第４端末装置２Ａ〜２Ｄにそれぞれ対応する。
音声合成部３７２は、各音声増幅部３７１にて増幅度が調整された各音声信号を合成して合成音声信号を生成し、当該合成音声信号に基づく合成音声を、スピーカー３８を介して出力させる。 Each audio amplifying unit 371 includes an amplifier and the like, converts the audio data transmitted from its target terminal device 2 into an analog audio signal, and, under the control of the CPU 31, adjusts the signal level (audio output level) of the converted audio signal (amplifies it at various amplification degrees).
The first to fourth audio amplification units 371A to 371D correspond to the first to fourth terminal devices 2A to 2D, respectively.
The voice synthesis unit 372 generates a synthesized voice signal by synthesizing the voice signals whose amplification degrees have been adjusted by the audio amplifying units 371, and outputs a synthesized voice based on the synthesized voice signal via the speaker 38.
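The amplify-then-synthesize arrangement can be pictured with a short Python sketch. It is only a digital analogue under stated assumptions: the description has the audio amplifying units 371 work on analog signals, whereas here each channel is a list of float samples in the range -1.0 to 1.0, and the function names `amplify` and `mix` as well as the final clipping step are hypothetical.

```python
def amplify(samples, gain):
    """Scale one terminal's samples by its amplification degree (role of a unit 371)."""
    return [s * gain for s in samples]


def mix(channels, gains):
    """Sum the amplified channels into one signal (role of the voice synthesis unit 372).

    channels: one list of float samples per terminal device.
    gains:    one amplification degree per channel, in the same order.
    """
    if len(channels) != len(gains):
        raise ValueError("one gain is required per channel")
    length = max((len(c) for c in channels), default=0)
    mixed = [0.0] * length
    for samples, gain in zip(channels, gains):
        for i, s in enumerate(amplify(samples, gain)):
            mixed[i] += s
    # Clip to the nominal range so a loud mix does not overflow (an added assumption).
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

For example, `mix([[0.5, 0.5], [0.25, -0.25]], [1.0, 2.0])` doubles the second channel before summing it with the first.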

ＣＰＵ３１は、メモリー３２に記憶されたプログラムを実行することで、図３に示すように、受信手段としての通信制御手段３１１、第１制御手段３１２、及び第２制御手段３１３等として機能する。
通信制御手段２１３は、ネットワークＬＡＮを介した各端末装置２との接続を確立するとともに、ネットワークＬＡＮを介して各端末装置２との間で情報を送受信する。 As shown in FIG. 3, by executing a program stored in the memory 32, the CPU 31 functions as a communication control unit 311 serving as a reception unit, a first control unit 312, a second control unit 313, and the like.
The communication control unit 311 establishes a connection with each terminal device 2 via the network LAN and exchanges information with each terminal device 2 via the network LAN.

第１制御手段３１２は、端末装置２から送信された表示データに基づいて、画像処理手段３４の動作を制御し、各端末装置２からの各画像データに基づく各画像を合成し、当該合成画像をスクリーン上に表示させる。
第２制御手段３１３は、端末装置２から送信された表示データに基づいて、音声処理手段３７の動作を制御し、各端末装置２からの各音声データに基づく各音声を合成し、当該合成音声をスピーカー３８から出力させる。 The first control unit 312 controls the operation of the image processing unit 34 based on the display data transmitted from the terminal device 2, combines the images based on the image data from the terminal devices 2, and displays the resulting composite image on the screen.
The second control unit 313 controls the operation of the voice processing unit 37 based on the display data transmitted from the terminal device 2, synthesizes the voices based on the voice data from the terminal devices 2, and outputs the resulting synthesized voice from the speaker 38.

〔会議システムの動作〕
次に、上述した会議システム１の動作（画像音声処理方法）について説明する。
図５は、画像音声処理方法を説明するフローチャートである。
なお、以下では、説明の便宜上、端末装置２とプロジェクター３とのネットワークＬＡＮを介した接続が既に確立されているものとする。
また、以下では、説明の便宜上、ネットワークＬＡＮを介してプロジェクター３と接続が確立されている各端末装置２を第１，第２端末装置２Ａ，２Ｂとし、第３，第４端末装置２Ｃ，２Ｄについては接続が確立されていないものとする。
すなわち、会議システム１を利用した会議への参加者は、第１端末装置２Ａを利用する第１参加者と、第２端末装置２Ｂを利用する第２参加者の２名とする。 [Operation of the conference system]
Next, the operation (video / audio processing method) of the conference system 1 described above will be described.
FIG. 5 is a flowchart for explaining the image / audio processing method.
In the following, for convenience of explanation, it is assumed that the connection between the terminal device 2 and the projector 3 via the network LAN has already been established.
In the following, for convenience of explanation, the terminal devices 2 that have established a connection with the projector 3 via the network LAN are the first and second terminal devices 2A and 2B, and no connection has been established for the third and fourth terminal devices 2C and 2D.
In other words, the conference using the conference system 1 has two participants: a first participant using the first terminal device 2A and a second participant using the second terminal device 2B.

図６は、設定ウィンドウＷの一例を示す図である。
例えば、第１参加者が第１端末装置２Ａの入力手段２４により「設定ウィンドウＷを表示させる」旨の入力操作を実施すると、入力受付手段２１１は、当該入力操作を認識する（ステップＳ１０１）。
ステップＳ１０１の後、ＧＵＩ制御手段２１２は、画像処理手段２５の動作を制御し、ディスプレイ２７に図６に示す設定ウィンドウＷを表示させる（ステップＳ１０２）。 FIG. 6 is a diagram illustrating an example of the setting window W.
For example, when the first participant performs an input operation to “display the setting window W” using the input unit 24 of the first terminal device 2A, the input receiving unit 211 recognizes the input operation (step S101).
After step S101, the GUI control unit 212 controls the operation of the image processing unit 25 to display the setting window W shown in FIG. 6 on the display 27 (step S102).

設定ウィンドウＷは、図６に示すように、第１〜第３ウィンドウＷ１〜Ｗ３を備える。
第１ウィンドウＷ１は、会議システム１を利用する全参加者に対して、当該参加者を認識させる領域である。
なお、当該参加者については、各端末装置２とプロジェクター３とのネットワークＬＡＮを介した接続が確立された際に、接続確立済みの各端末装置２（本実施形態では第１，第２端末装置２Ａ，２Ｂ）に関する情報がプロジェクター３から当該各端末装置２に送信される。そして、各端末装置２（各ＣＰＵ２１）は、当該送信された情報（接続確立済みの各端末装置２に関する情報）により、当該端末装置２に対して予め設定された各参加者を認識し、第１ウィンドウＷ１に当該各参加者に関する参加者画像を表示する。例えば、本実施形態では、第１端末装置２Ａは、プロジェクター３から送信された情報に基づき、第２端末装置２Ｂとプロジェクター３との接続が確立済みであることを把握するとともに、第２端末装置２Ｂに対して設定された参加者を認識する。
本実施形態では、上述したように、各参加者が第１，第２参加者の２名であるため、第１ウィンドウＷ１には、第１参加者に応じた第１参加者画像ＦＰ１、及び第２参加者に応じた第２参加者画像ＦＰ２が表示される。 As shown in FIG. 6, the setting window W includes first to third windows W1 to W3.
The first window W1 is an area that lets the user identify all the participants using the conference system 1.
When the connection between each terminal device 2 and the projector 3 via the network LAN is established, the projector 3 transmits information on the connected terminal devices 2 (in this embodiment, the first and second terminal devices 2A and 2B) to each of those terminal devices 2. Based on the transmitted information, each terminal device 2 (each CPU 21) recognizes the participant set in advance for each connected terminal device 2 and displays a participant image for each participant in the first window W1. For example, in the present embodiment, the first terminal device 2A determines from the information transmitted by the projector 3 that the connection between the second terminal device 2B and the projector 3 has been established, and recognizes the participant set for the second terminal device 2B.
In the present embodiment, as described above, the participants are the first and second participants, so the first window W1 displays a first participant image FP1 corresponding to the first participant and a second participant image FP2 corresponding to the second participant.

第２ウィンドウＷ２は、プロジェクター３に表示させる合成画像において、各端末装置２が扱う画像の表示数を選択させる領域である。
具体的に、第２ウィンドウＷ２には、図６に示すように、第１〜第３選択画像ＦＣ１〜ＦＣ３が表示されている。
第１選択画像ＦＣ１は、接続確立済みの各端末装置２のうち、いずれかの端末装置２が扱う画像のみを全画面で表示させる旨（表示数が１）を選択させる画像である。
第２選択画像ＦＣ２は、接続確立済みの各端末装置２のうち、２つの端末装置２が扱う各画像を左右に２画面で表示させる旨（表示数が２）を選択させる画像である。
第３選択画像ＦＣ３は、接続確立済みの各端末装置２のうち、４つの端末装置２が扱う各画像を上下左右に４画面で表示させる旨（表示数が４）を選択させる画像である。 The second window W2 is an area for selecting the display number of images handled by each terminal device 2 in the composite image to be displayed on the projector 3.
Specifically, as shown in FIG. 6, first to third selection images FC1 to FC3 are displayed in the second window W2.
The first selection image FC1 is an image for selecting that only the image handled by one of the connected terminal devices 2 be displayed on the full screen (the display number is 1).
The second selection image FC2 is an image for selecting that the images handled by two of the connected terminal devices 2 be displayed side by side on two screens (the display number is 2).
The third selection image FC3 is an image for selecting that the images handled by four of the connected terminal devices 2 be displayed on four screens arranged top, bottom, left, and right (the display number is 4).

第３ウィンドウＷ３は、プロジェクター３に表示させる合成画像において、接続確立済みの各端末装置２が扱う各画像の表示位置及び表示サイズを選択させる領域である。
例えば、第１参加者による入力手段２４への操作（マウス操作）により、第２選択画像ＦＣ２が選択された場合には、図６に示すように、第３ウィンドウＷ３には、プロジェクター３が表示する２画面に相当する２つの領域Ａｒ１，Ａｒ２が表示される。
なお、具体的な図示は省略したが、第１選択画像ＦＣ１が選択された場合には、第３ウィンドウＷ３には、プロジェクター３が表示する１画面に相当する１つの領域のみが表示される。また、第３選択画像ＦＣ３が選択された場合には、第３ウィンドウＷ３には、プロジェクター３が表示する４画面に相当する４つの領域が表示される。 The third window W3 is an area for selecting the display position and display size of each image handled by each connected terminal device 2 in the composite image displayed by the projector 3.
For example, when the first participant selects the second selection image FC2 by operating the input means 24 (mouse operation), two areas Ar1 and Ar2 corresponding to the two screens displayed by the projector 3 are displayed in the third window W3, as shown in FIG. 6.
Although specific illustration is omitted, when the first selection image FC1 is selected, only one area corresponding to one screen displayed by the projector 3 is displayed in the third window W3. When the third selection image FC3 is selected, four areas corresponding to four screens displayed by the projector 3 are displayed in the third window W3.
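The relationship between the selected display number and the areas shown in the third window W3 can be sketched as follows. This is a minimal Python illustration, assuming normalized coordinates and equal-sized initial areas; the function name `default_regions` and the `(x, y, width, height)` tuple layout are hypothetical.

```python
def default_regions(display_count, width=1.0, height=1.0):
    """Return the initial areas as (x, y, w, h) tuples for a display number of 1, 2, or 4."""
    if display_count == 1:
        # One area covering the full screen.
        return [(0.0, 0.0, width, height)]
    if display_count == 2:
        # Two areas side by side (left, right).
        half = width / 2
        return [(0.0, 0.0, half, height), (half, 0.0, half, height)]
    if display_count == 4:
        # Four areas in a 2 x 2 grid, listed row by row from the top left.
        hw, hh = width / 2, height / 2
        return [(x, y, hw, hh) for y in (0.0, hh) for x in (0.0, hw)]
    raise ValueError("display_count must be 1, 2, or 4")
```

Resizing by dragging an edge, as described for the third window, would then amount to adjusting the width or height of neighbouring tuples.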

そして、第１参加者は、接続確立済みの各端末装置２が扱う各画像の表示位置を選択する際には、入力手段２４を操作し、第１ウィンドウＷ１に表示された各参加者画像のうちいずれかの参加者画像を第３ウィンドウＷ３に表示された各領域のうちいずれかの領域にドラッグアンドドロップする。
例えば、図６（Ａ）に示す例では、第３ウィンドウＷ３における左側の領域Ａｒ１に第１参加者画像ＦＰ１がドラッグアンドドロップされ、右側の領域Ａｒ２に第２参加者画像ＦＰ２がドラッグアンドドロップされた状態を示している。
このように表示位置を選択することで、第１端末装置２Ａが扱う画像（以下の説明では、「第１の画像」とも称する）を左側に表示し、第２端末装置２Ｂが扱う画像（以下の説明では、「第２の画像」とも称する）を右側に表示する旨を指示することとなる。 To select the display position of each image handled by each connected terminal device 2, the first participant operates the input unit 24 and drags and drops one of the participant images displayed in the first window W1 onto one of the areas displayed in the third window W3.
For example, FIG. 6A shows a state in which the first participant image FP1 has been dragged and dropped onto the left area Ar1 in the third window W3, and the second participant image FP2 onto the right area Ar2.
Selecting the display positions in this way instructs that the image handled by the first terminal device 2A (hereinafter also referred to as the “first image”) be displayed on the left side and the image handled by the second terminal device 2B (hereinafter also referred to as the “second image”) be displayed on the right side.

また、第１参加者は、接続確立済みの各端末装置２が扱う各画像の表示サイズを選択する際には、入力手段２４を操作し、第３ウィンドウＷ３に表示された各領域のうちいずれかの領域の端縁にカーソル（図示略）を合わせ、当該領域の大きさを変えるようにドラッグアンドドロップする。
例えば、図６（Ｂ）に示す例では、上記操作により第３ウィンドウＷ３における左側の領域Ａｒ１が大きくされ、当該操作に伴い右側の領域Ａｒ２が小さくなった状態を示している。
このように表示サイズを選択することで、第１端末装置２Ａが扱う画像（第１参加者（主発言者）の資料等）を大きく表示し、第２端末装置２Ｂが扱う画像（他の参加者の資料等）を小さく表示する旨を指示することとなる。 To select the display size of each image handled by each connected terminal device 2, the first participant operates the input unit 24, places the cursor (not shown) on the edge of one of the areas displayed in the third window W3, and drags and drops so as to change the size of that area.
For example, FIG. 6B shows a state in which the above operation has enlarged the left area Ar1 in the third window W3 and correspondingly reduced the right area Ar2.
Selecting the display sizes in this way instructs that the image handled by the first terminal device 2A (such as the material of the first participant (main speaker)) be displayed large and the image handled by the second terminal device 2B (such as the material of another participant) be displayed small.

ステップＳ１０２の後、第１参加者により上述したように入力手段２４が操作された場合には、入力受付手段２１１は、プロジェクター３に表示させる合成画像において、接続確立済みの各端末装置２が扱う各画像の表示状態（表示数、表示位置、表示サイズ）の入力を受け付ける（ステップＳ１０３：入力受付ステップ）。
そして、入力受付手段２１１は、表示状態（表示数、表示位置、表示サイズ）に関する表示データをメモリー２２に記憶させる。
例えば、図６（Ｂ）に示す表示状態が入力された場合には、入力受付手段２１１は、表示数を「２」とし、第１端末装置２Ａが扱う画像の表示位置を「左側」とし、第２端末装置２Ｂが扱う画像の表示位置を「右側」とし、第１，第２端末装置２Ａ，２Ｂが扱う各画像の表示サイズ（比率）を「Ａ：Ｂ」とする旨の表示データをメモリー２２に記憶させる。
本実施形態では、「Ａ」はプロジェクター３に表示させる合成画像における第１の画像の比率を、「Ｂ」は第２の画像の比率を表すものとする。この比率は、第１，第２の画像の幅、高さ、または対角線の長さの比率や、面積の比率等を採用することができる。また、表示サイズに関する表示データは、第１，第２の画像の表示サイズを表す情報であれば、比率以外の情報であってもよい。例えば、表示サイズに関する表示データは、第１，第２の画像の幅、高さ、対角線の長さや、面積等を表す情報であってもよく、これらの差を表す情報であってもよい。 After step S102, when the first participant operates the input means 24 as described above, the input receiving means 211 accepts input of the display state (display number, display position, display size) of each image handled by each connected terminal device 2 in the composite image displayed by the projector 3 (step S103: input reception step).
Then, the input receiving unit 211 stores display data relating to the display state (display number, display position, display size) in the memory 22.
For example, when the display state shown in FIG. 6B is input, the input reception unit 211 stores in the memory 22 display data indicating that the display number is “2”, the display position of the image handled by the first terminal device 2A is “left”, the display position of the image handled by the second terminal device 2B is “right”, and the display size (ratio) of the images handled by the first and second terminal devices 2A and 2B is “A:B”.
In the present embodiment, “A” represents the proportion of the first image in the composite image displayed by the projector 3, and “B” represents the proportion of the second image. This ratio may be the ratio of the widths, heights, or diagonal lengths of the first and second images, the ratio of their areas, or the like. The display data relating to the display size may be information other than a ratio as long as it indicates the display sizes of the first and second images. For example, it may be information indicating the width, height, diagonal length, area, or the like of the first and second images, or information indicating the difference between them.
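One way to picture the display data for the FIG. 6B example is a small record holding the display number, the positions, and the A:B ratio. The Python sketch below is illustrative only: the class name `DisplayData`, its field names, the terminal identifiers used as keys, and the concrete ratio 3:1 are assumptions, since the description leaves the encoding open.

```python
from dataclasses import dataclass


@dataclass
class DisplayData:
    """Display state entered in the setting window W (hypothetical encoding)."""
    display_number: int  # 1, 2, or 4
    positions: dict      # terminal id -> position label, e.g. "left" / "right"
    size_ratio: dict     # terminal id -> relative size (the "A" and "B" of "A:B")

    def share(self, terminal_id):
        """Fraction of the composite image assigned to one terminal's image."""
        total = sum(self.size_ratio.values())
        return self.size_ratio[terminal_id] / total


# The FIG. 6B state, with an assumed ratio of A:B = 3:1.
fig_6b = DisplayData(
    display_number=2,
    positions={"2A": "left", "2B": "right"},
    size_ratio={"2A": 3, "2B": 1},
)
```

Storing relative values rather than pixel sizes keeps the record independent of the projector's resolution, which matches the description's remark that a ratio of widths, heights, diagonals, or areas may be used.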

ステップＳ１０３の後、第１参加者が第１端末装置２Ａの入力手段２４により「入力した表示状態で合成画像をプロジェクター３に表示させる」旨の入力操作を実施すると、入力受付手段２１１は、当該入力操作を認識する。
そして、通信制御手段２１３は、メモリー２２に記憶された表示データを、ネットワークＬＡＮを介してプロジェクター３に送信する（ステップＳ１０４：送信制御ステップ）。
また、ＣＰＵ２１は、画像処理手段２５の動作を制御し、設定ウィンドウＷの表示を止め、通常の画像をディスプレイ２７に表示させる。 After step S103, when the first participant uses the input unit 24 of the first terminal device 2A to perform an input operation to “display the composite image on the projector 3 in the input display state”, the input receiving unit 211 recognizes the input operation.
Then, the communication control unit 213 transmits the display data stored in the memory 22 to the projector 3 via the network LAN (step S104: transmission control step).
Further, the CPU 21 controls the operation of the image processing means 25, stops displaying the setting window W, and causes the display 27 to display a normal image.

ステップＳ１０４の後、プロジェクター３の通信制御手段３１１は、ネットワークＬＡＮを介して、第１端末装置２Ａから表示データを受信する（ステップＳ１０５：受信ステップ）。
そして、ＣＰＵ３１は、受信した表示データをメモリー３２に記憶させる。
ステップＳ１０５の後、通信制御手段３１１は、ネットワークＬＡＮを介して、第１，第２端末装置２Ａ，２Ｂに対して、画像データ及び音声データの送信要求を行う（ステップＳ１０６）。 After step S104, the communication control means 311 of the projector 3 receives display data from the first terminal device 2A via the network LAN (step S105: reception step).
Then, the CPU 31 stores the received display data in the memory 32.
After step S105, the communication control unit 311 makes a transmission request for image data and audio data to the first and second terminal apparatuses 2A and 2B via the network LAN (step S106).

ステップＳ１０６の後、第１，第２端末装置２Ａ，２Ｂの各通信制御手段２１３は、プロジェクター３からデータ送信要求を受信（ステップＳ１０７，Ｓ１０８）すると、ネットワークＬＡＮを介してプロジェクター３に対して画像データ及び音声データを送信する（ステップＳ１０９，Ｓ１１０：送信制御ステップ）。
ここで、画像データは、第１，第２端末装置２Ａ，２Ｂにおいて、現時点で各ディスプレイ２７に表示されている画像（表示画面）に関する画像データであり、具体的には、現時点で各ＶＲＡＭ２６に記憶されている画像データである。
また、音声データは、第１，第２端末装置２Ａ，２Ｂにおいて、現時点で第１，第２参加者の声が各マイクロフォン２８にて集音され、各マイクロフォン２８からの音声信号が各音声処理手段２９にて変換された音声データである。 After step S106, when the communication control means 213 of each of the first and second terminal devices 2A and 2B receives the data transmission request from the projector 3 (steps S107 and S108), it transmits image data and audio data to the projector 3 via the network LAN (steps S109 and S110: transmission control step).
Here, the image data is image data for the image (display screen) currently displayed on each display 27 of the first and second terminal devices 2A and 2B; specifically, it is the image data currently stored in each VRAM 26.
The audio data is the audio data obtained when, in the first and second terminal devices 2A and 2B, the voices of the first and second participants are currently collected by the microphones 28 and the audio signals from the microphones 28 are converted by the audio processing means 29.

ステップＳ１０９，Ｓ１１０の後、プロジェクター３の通信制御手段３１１は、ネットワークＬＡＮを介して、第１，第２端末装置２Ａ，２Ｂから画像データ及び音声データを受信する（ステップＳ１１１：受信ステップ）。
そして、ＣＰＵ３１は、端末装置２を識別する識別情報（ＩＰアドレス等）にて送信元（端末装置２）を特定し、受信した画像データ及び音声データを当該送信元の端末装置２に関連付けてメモリー２２に記憶させる。
なお、以下では、メモリー２２に記憶された画像データ及び音声データのうち、送信元が第１端末装置２Ａであるデータを第１画像データ及び第１音声データと記載し、送信元が第２端末装置２Ｂであるデータを第２画像データ及び第２音声データと記載する。 After steps S109 and S110, the communication control means 311 of the projector 3 receives image data and audio data from the first and second terminal devices 2A and 2B via the network LAN (step S111: reception step).
Then, the CPU 31 identifies the transmission source (terminal device 2) from identification information (an IP address or the like) that identifies the terminal device 2, and stores the received image data and audio data in the memory 32 in association with the source terminal device 2.
In the following, among the image data and audio data stored in the memory 32, data whose transmission source is the first terminal device 2A is referred to as first image data and first audio data, and data whose transmission source is the second terminal device 2B is referred to as second image data and second audio data.
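Associating received data with its transmission source can be sketched as a lookup keyed by the sender's address. In this Python illustration the class name `ReceivedDataStore`, its methods, and the example IP addresses (taken from a documentation-only address range) are hypothetical; the description only says that identification information such as an IP address is used.

```python
class ReceivedDataStore:
    """Keeps the latest image and audio data per source terminal (hypothetical helper)."""

    def __init__(self, known_terminals):
        # known_terminals maps identification information (here an IP address)
        # to a terminal identifier such as "2A".
        self._terminals = dict(known_terminals)
        self._latest = {}

    def store(self, source_ip, image_data, audio_data):
        """Identify the sender and file its data under that terminal."""
        terminal_id = self._terminals.get(source_ip)
        if terminal_id is None:
            raise KeyError("unknown transmission source: " + source_ip)
        self._latest[terminal_id] = (image_data, audio_data)
        return terminal_id

    def latest(self, terminal_id):
        """Return (image_data, audio_data) most recently received from a terminal."""
        return self._latest[terminal_id]
```

With this arrangement, the "first image data" and "first audio data" of the text correspond to `store.latest("2A")` in the sketch.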

ステップＳ１１１の後、第１制御手段３１２は、メモリー２２に記憶された表示データに基づいて、画像処理手段３４の動作を制御し、合成画像を生成させ（ステップＳ１１２）、当該合成画像をスクリーン上に表示させる（ステップＳ１１３：第１制御ステップ）。
例えば、ステップＳ１０３において図６（Ｂ）に示す表示状態が入力された場合には、第１制御手段３１２は、表示データに基づいて、以下に示すような合成画像を生成及び表示させる。
すなわち、第１制御手段３１２は、表示データ（表示数、表示位置）に基づいて、第１画像データに基づく画像（第１の画像）の表示位置を「左側」、第２画像データに基づく画像（第２の画像）の表示位置を「右側」とする。
また、第１制御手段３１２は、第１，第２の画像の表示サイズを表示データに基づく比率とした合成画像（第１の画像（主発言者（第１参加者）の資料等）が第２の画像（他の参加者（第２参加者）よりも大きい合成画像）を生成させ（当該合成画像に関する合成画像データをＶＲＡＭ３５上に生成させ）、スクリーン上に当該合成画像を表示させる。 After step S111, the first control unit 312 controls the operation of the image processing unit 34 based on the display data stored in the memory 32, generates a composite image (step S112), and displays the composite image on the screen (step S113: first control step).
For example, when the display state shown in FIG. 6B is input in step S103, the first control unit 312 generates and displays a composite image as shown below based on the display data.
That is, based on the display data (display number, display position), the first control means 312 sets the display position of the image based on the first image data (first image) to “left” and the display position of the image based on the second image data (second image) to “right”.
In addition, the first control means 312 generates a composite image in which the display sizes of the first and second images follow the ratio given by the display data (a composite image in which the first image, such as the material of the main speaker (first participant), is larger than the second image, such as the material of another participant (second participant)), generating the corresponding composite image data on the VRAM 35, and displays the composite image on the screen.
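A minimal Python sketch of the two-image case follows. It assumes that the ratio A:B is applied to the widths, that both images fill the full screen height, and that images are lists of pixel rows scaled by nearest-neighbour sampling; the function names `scale_nearest` and `compose_side_by_side` are hypothetical, and the actual image processing means 34 may work quite differently.

```python
def scale_nearest(image, out_w, out_h):
    """Resize a non-empty image (list of pixel rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def compose_side_by_side(left, right, ratio_a, ratio_b, screen_w, screen_h):
    """Place `left` and `right` next to each other with widths in the ratio A:B."""
    if ratio_a <= 0 or ratio_b <= 0 or screen_w < 2:
        raise ValueError("ratios must be positive and the screen at least 2 pixels wide")
    left_w = round(screen_w * ratio_a / (ratio_a + ratio_b))
    left_w = max(1, min(screen_w - 1, left_w))  # keep both images visible
    right_w = screen_w - left_w
    left_scaled = scale_nearest(left, left_w, screen_h)
    right_scaled = scale_nearest(right, right_w, screen_h)
    # Each output row is the left image's row followed by the right image's row.
    return [l_row + r_row for l_row, r_row in zip(left_scaled, right_scaled)]
```

The resulting list of rows plays the role of the composite image data held in the frame buffer.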

また、ステップＳ１１１の後（図５では説明の便宜上、ステップＳ１１２，Ｓ１１３の後の処理としている）、第２制御手段３１３は、メモリー２２に記憶された表示データに基づいて、音声処理手段３７の動作を制御し、合成音声を生成させ（ステップＳ１１４）、当該合成音声をスピーカー３８から出力させる（ステップＳ１１５：第２制御ステップ）。
例えば、ステップＳ１０３において図６（Ｂ）に示す表示状態が入力された場合には、第２制御手段３１３は、表示データに基づいて、以下に示すような合成音声を生成させる。
すなわち、第２制御手段３１３は、表示データ（各画像の表示サイズの比率）に基づいて、第１音声増幅部３７１Ａに第１音声データを処理させ、当該第１音声データを変換した後の音声信号を増幅させる。本実施形態では、第１，第２の画像が合成画像に含まれており、第１の画像の方が第２の画像よりも大きい。このため、第２制御手段３１３は、第１音声増幅部３７１Ａにデフォルト値よりも大きい増幅度で音声信号を増幅させる。
また、第２制御手段３１３は、表示データ（各画像の表示サイズの比率）に基づいて、第２音声増幅部３７１Ｂに第２音声データを処理させ、当該第２音声データを変換した後の音声信号を増幅させる。本実施形態では、第１，第２の画像が合成画像に含まれており、第２の画像の方が第１の画像よりも小さい。このため、第２制御手段３１３は、第２音声増幅部３７１Ｂにデフォルト値よりも小さい増幅度で音声信号を増幅させる。換言すると、第２制御手段３１３は、合成画像において第１の画像が他の画像よりも大きく表示されるほど、第１の画像に対応する音声を他の画像に対応する音声よりも大きくする。 Further, after step S111 (shown in FIG. 5, for convenience of explanation, as processing after steps S112 and S113), the second control unit 313 controls the operation of the audio processing unit 37 based on the display data stored in the memory 32, generates a synthesized voice (step S114), and outputs the synthesized voice from the speaker 38 (step S115: second control step).
For example, when the display state shown in FIG. 6B is input in step S103, the second control unit 313 generates synthesized speech as shown below based on the display data.
That is, based on the display data (the display size ratio of the images), the second control means 313 causes the first audio amplifying unit 371A to process the first audio data and amplify the audio signal obtained by converting the first audio data. In the present embodiment, the first and second images are included in the composite image, and the first image is larger than the second image. For this reason, the second control unit 313 causes the first audio amplification unit 371A to amplify the audio signal with an amplification degree larger than the default value.
In addition, based on the display data (the display size ratio of the images), the second control unit 313 causes the second audio amplification unit 371B to process the second audio data and amplify the audio signal obtained by converting the second audio data. In the present embodiment, the first and second images are included in the composite image, and the second image is smaller than the first image. For this reason, the second control unit 313 causes the second audio amplification unit 371B to amplify the audio signal with an amplification degree smaller than the default value. In other words, the larger the first image is displayed relative to the other images in the composite image, the louder the second control means 313 makes the sound corresponding to the first image relative to the sound corresponding to the other images.
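The description fixes only the direction of the adjustment (above the default for a larger image, below it for a smaller one), not a formula. The Python sketch below shows one possible mapping, assumed purely for illustration: each gain is the default scaled by the image's size relative to the average size. The names `DEFAULT_GAIN` and `gains_from_sizes` are hypothetical.

```python
DEFAULT_GAIN = 1.0  # stands in for the default amplification degree (value assumed)


def gains_from_sizes(sizes, default_gain=DEFAULT_GAIN):
    """Map relative display sizes to amplification degrees.

    sizes: terminal id -> relative display size (e.g. the A and B of "A:B").
    An image larger than average gets a gain above the default,
    a smaller one a gain below it, and equal sizes keep the default.
    """
    if not sizes or any(size <= 0 for size in sizes.values()):
        raise ValueError("sizes must be non-empty and positive")
    mean = sum(sizes.values()) / len(sizes)
    return {tid: default_gain * size / mean for tid, size in sizes.items()}
```

With a ratio of 3:1, this linear rule gives the first terminal 1.5 times the default and the second terminal half of it; a real implementation might prefer a gentler or logarithmic mapping.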

そして、音声合成部３７２は、第１，第２音声増幅部３７１Ａ，３７１Ｂにて増幅された各音声信号を合成して合成音声信号を生成し、当該合成音声信号に基づく合成音声を、スピーカー３８を介して出力させる。
すなわち、ステップＳ１０３において図６（Ｂ）に示す表示状態が入力された場合には、第２制御手段３１３は、主発言者（第１参加者）の声を他の参加者（第２参加者）の声に対して強調した合成音声を生成及び出力させる。
以降、ステップＳ１０９〜Ｓ１１５が順次、繰り返し実行され、現時点での第１，第２端末装置２Ａ，２Ｂの各表示画面が合成画像としてスクリーンに表示され、現時点での第１，第２端末装置２Ａ，２Ｂの各マイクロフォン２８に集音された各参加者の声が合成音声としてスピーカー３８から出力されることとなる。 Then, the voice synthesizer 372 generates a synthesized voice signal by synthesizing the voice signals amplified by the first and second voice amplifiers 371A and 371B, and outputs a synthesized voice based on the synthesized voice signal via the speaker 38.
That is, when the display state shown in FIG. 6B is input in step S103, the second control means 313 generates and outputs a synthesized voice in which the voice of the main speaker (first participant) is emphasized over the voice of the other participant (second participant).
Thereafter, steps S109 to S115 are executed repeatedly in sequence, so that the current display screens of the first and second terminal devices 2A and 2B are displayed on the screen as a composite image, and the voices of the participants currently collected by the microphones 28 of the first and second terminal devices 2A and 2B are output from the speaker 38 as a synthesized voice.

上述した第１実施形態によれば、以下の効果がある。
本実施形態では、会議システム１は、複数の端末装置２及びプロジェクター３を備えるので、端末装置２を利用して表示状態を適宜、入力することで、主発言者の資料（主発言者の利用に供される端末装置２が扱う画像）を合成画像（分割表示画面）中の任意の表示位置や、任意の表示サイズ等に設定できる。このため、他の参加者に主発言者がどの画像に基づいて説明を行っているかを容易に判断させることができる。
また、各参加者の声を各マイクロフォン２８にて集音し、主発言者の声を他の参加者の声に対して強調した合成音声を生成及び出力できるので、主発言者が説明を行っている際に、他の参加者の一部が会話等をしていた場合であっても、主発言者の声を聞き取りやすいものとすることができる。
したがって、利便性の向上が図れる。
また、合成画像及び合成音声の生成をプロジェクター３自身が実行するため、従来のようなサーバーを必要とせず、会議システム１の構成を簡素化できる。 The first embodiment described above has the following effects.
In the present embodiment, the conference system 1 includes a plurality of terminal devices 2 and the projector 3, so by entering the display state as appropriate using a terminal device 2, the material of the main speaker (the image handled by the terminal device 2 used by the main speaker) can be set to any display position, any display size, and so on in the composite image (divided display screen). This lets the other participants easily determine which image the main speaker's explanation is based on.
In addition, the voice of each participant is collected by each microphone 28, and a synthesized voice in which the voice of the main speaker is emphasized over the voices of the other participants can be generated and output. Therefore, even when some of the other participants are talking while the main speaker is giving an explanation, the voice of the main speaker can be made easy to hear.
Therefore, the convenience can be improved.
In addition, since the projector 3 itself generates the composite image and the synthesized voice, a server such as that used conventionally is not required, and the configuration of the conference system 1 can be simplified.

さらに、表示状態が上述した表示サイズを含むものであるので、当該表示サイズに基づいて第１，第２制御手段３１２，３１３が合成画像及び合成音声を生成すれば、主発言者の資料（主発言者の利用に供される端末装置２が扱う画像）や声を他の参加者の資料や声に対して強調できる。
したがって、主発言者の資料や声を視聴し易いものとなり、利便性の向上が図れる。 Furthermore, since the display state includes the display size described above, when the first and second control means 312 and 313 generate the composite image and the synthesized voice based on the display size, the material of the main speaker (the image handled by the terminal device 2 used by the main speaker) and the main speaker's voice can be emphasized over the materials and voices of the other participants.
Therefore, the material and voice of the main speaker become easy to see and hear, and convenience can be improved.

[第２実施形態]
次に、本発明の第２実施形態を図面に基づいて説明する。
以下の説明では、前記第１実施形態と同様の構成及び同一部材には同一の符号を付して、その詳細な説明は省略または簡略化する。
図７は、第２実施形態における会議システム１を示すブロック図である。
図８は、第２実施形態におけるプロジェクター３の構成を示すブロック図である。
図９は、第２実施形態におけるサーバー装置４の構成を示すブロック図である。
本実施形態では、前記第１実施形態に対して、図７ないし図９に示すように、会議システム１の構成として端末装置２及びプロジェクター３の他、サーバー装置４を追加した点、及びプロジェクター３におけるＣＰＵ３１の一部の機能を省略し、当該一部の機能をサーバー装置４のＣＰＵ４１に追加した点が異なるのみである。 [Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to the drawings.
In the following description, the same configurations and the same members as those in the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted or simplified.
FIG. 7 is a block diagram showing the conference system 1 in the second embodiment.
FIG. 8 is a block diagram illustrating a configuration of the projector 3 according to the second embodiment.
FIG. 9 is a block diagram illustrating a configuration of the server device 4 according to the second embodiment.
The present embodiment differs from the first embodiment only in that, as shown in FIGS. 7 to 9, a server device 4 is added to the configuration of the conference system 1 in addition to the terminal devices 2 and the projector 3, and in that some of the functions of the CPU 31 of the projector 3 are omitted and added instead to the CPU 41 of the server device 4.

サーバー装置４は、端末装置２と同様のパーソナルコンピューターで構成されており、図９に示すように、ＣＰＵ４１、メモリー４２、ネットワークインターフェース（ネットワークI/F）４３、入力手段４４、画像処理手段４５、ＶＲＡＭ４６、ディスプレイ４７、及び音声処理手段４８を備える。
ＣＰＵ４１は、メモリー４２に記憶されたプログラムを実行することで、図９に示すように、第３制御手段としての通信制御手段４１１、前記第１実施形態で説明したプロジェクター３（ＣＰＵ３１）における第１，第２制御手段３１２，３１３と同様の第１，第２制御手段４１２，４１３等として機能する。
音声処理手段４８は、前記第１実施形態で説明したプロジェクター３の音声処理手段３７と同様の構成を有するものである。 The server device 4 is a personal computer similar to the terminal device 2 and, as shown in FIG. 9, includes a CPU 41, a memory 42, a network interface (network I/F) 43, an input means 44, an image processing means 45, a VRAM 46, a display 47, and a sound processing means 48.
As shown in FIG. 9, by executing the program stored in the memory 42, the CPU 41 functions as a communication control unit 411 serving as a third control unit, as first and second control means 412 and 413 similar to the first and second control means 312 and 313 of the projector 3 (CPU 31) described in the first embodiment, and the like.
The sound processing means 48 has the same configuration as the sound processing means 37 of the projector 3 described in the first embodiment.

次に、第２実施形態における画像音声処理方法について説明する。
図１０は、第２実施形態における画像音声処理方法を説明するフローチャートである。
なお、以下では、説明の便宜上、端末装置２とサーバー装置４とのネットワークＬＡＮを介した接続、及びプロジェクター３とサーバー装置４とのネットワークＬＡＮを介した接続が既に確立されているものとする。
また、以下では、前記第１実施形態と同様に、ネットワークＬＡＮを介してサーバー装置４と接続が確立されている各端末装置２を第１，第２端末装置２Ａ，２Ｂとする。 Next, a video / audio processing method according to the second embodiment will be described.
FIG. 10 is a flowchart for explaining a video / audio processing method according to the second embodiment.
In the following, for convenience of explanation, it is assumed that the connection between the terminal device 2 and the server device 4 via the network LAN and the connection between the projector 3 and the server device 4 via the network LAN have already been established.
In the following, similarly to the first embodiment, the terminal devices 2 that are connected to the server device 4 via the network LAN are referred to as first and second terminal devices 2A and 2B.

本実施形態の画像音声処理方法では、上述したようにプロジェクター３（ＣＰＵ３１）の一部の機能をサーバー装置４（ＣＰＵ４１）に持たせたことに伴い、前記第１実施形態で説明したプロジェクター３（ＣＰＵ３１）が実行していた処理の一部をサーバー装置４（ＣＰＵ４１）が実行することとなる。
すなわち、サーバー装置４（ＣＰＵ４１）は、図１０に示すように、前記第１実施形態でプロジェクター３（ＣＰＵ３１）が実行していたステップＳ１０５，Ｓ１０６，Ｓ１１１，Ｓ１１２，Ｓ１１４を実行することとなる。 In the image/audio processing method of this embodiment, because some of the functions of the projector 3 (CPU 31) are given to the server device 4 (CPU 41) as described above, the server device 4 (CPU 41) executes part of the processing that the projector 3 (CPU 31) executed in the first embodiment.
That is, as shown in FIG. 10, the server apparatus 4 (CPU 41) executes steps S105, S106, S111, S112, and S114 that were executed by the projector 3 (CPU 31) in the first embodiment.

In this embodiment, in step S112, the image processing means 45 generates the composite image data under the control of the first control means 412. In step S114, the voice processing means 48 generates a synthesized voice signal under the control of the second control means 413 and converts the synthesized voice signal (analog) into digital data (synthesized voice data).
In addition, when each terminal device 2 executes steps S104 and S107 to S110, its transmission and reception partner is the server device 4.
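As an illustration of the conversion in step S114 from a synthesized voice signal to digital data, the following Python sketch quantizes sample values into bytes and converts them back. The 16-bit little-endian PCM format and the function names to_pcm16 and from_pcm16 are assumptions made for this example; the embodiment does not specify a sample format.

```python
import struct
from typing import List, Sequence


def to_pcm16(samples: Sequence[float]) -> bytes:
    """Quantize samples in [-1.0, 1.0] to 16-bit little-endian PCM (assumed format)."""
    out = bytearray()
    for sample in samples:
        # Clip first so that out-of-range values cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, sample))
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)


def from_pcm16(data: bytes) -> List[float]:
    """Convert 16-bit little-endian PCM bytes back to samples in [-1.0, 1.0]."""
    count = len(data) // 2
    values = struct.unpack(f"<{count}h", data[: count * 2])
    return [value / 32767 for value in values]
```

In this sketch, to_pcm16 stands in for the digitizing done on the server device side, and from_pcm16 stands in for the reverse conversion done before the voice is output.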

Then, the communication control means 411 of the server device 4 transmits the composite image data and the synthesized voice data generated in steps S112 and S114 to the projector 3 via the network LAN (step S201).
Meanwhile, the communication control means 311 of the projector 3 receives the composite image data and the synthesized voice data from the server device 4 via the network LAN (step S202).
Then, the CPU 31 stores the composite image data in the VRAM 35 and displays a composite image based on the composite image data on the screen (step S203). The CPU 31 also causes the voice processing means 37A to convert the synthesized voice data into an analog synthesized voice signal and causes the speaker 38 to output a synthesized voice based on the synthesized voice signal (step S204).
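The division of work in steps S201 to S204 can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the CompositePacket payload, the Projector class, and the in-memory queue standing in for the network LAN are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import List, Optional


@dataclass
class CompositePacket:
    """Hypothetical payload sent from the server device to the projector."""
    image_data: bytes  # composite image data generated in step S112
    voice_data: bytes  # synthesized voice data generated in step S114


@dataclass
class Projector:
    """Stand-in for the projector 3; it only presents what it receives."""
    vram: Optional[bytes] = None
    log: List[str] = field(default_factory=list)

    def receive(self, lan: Queue) -> CompositePacket:
        # Step S202: receive the composite data from the server device.
        return lan.get()

    def present(self, packet: CompositePacket) -> None:
        # Step S203: store the composite image data in VRAM and display it.
        self.vram = packet.image_data
        self.log.append(f"display {len(packet.image_data)} image bytes")
        # Step S204: convert the voice data for output and play it.
        self.log.append(f"play {len(packet.voice_data)} voice bytes")


def server_send(lan: Queue, image: bytes, voice: bytes) -> None:
    # Step S201: transmit the composite image data and synthesized voice data.
    lan.put(CompositePacket(image_data=image, voice_data=voice))


def run_once(image: bytes, voice: bytes) -> Projector:
    lan: Queue = Queue()  # in-memory stand-in for the network LAN (assumption)
    projector = Projector()
    server_send(lan, image, voice)
    projector.present(projector.receive(lan))
    return projector
```

The projector in this sketch contains no compositing or mixing logic, which mirrors the point that all generation is done on the server device side.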

The second embodiment described above provides the following effect in addition to the same effects as the first embodiment.
In this embodiment, the composite image and the synthesized voice are generated by the server device 4 rather than by the projector 3. There is therefore no need to separately provide the projector 3 with a function for generating the composite image and the synthesized voice, and the conference system 1 can be built using a general-purpose projector 3.

The present invention is not limited to the embodiments described above, and modifications, improvements, and the like within a scope in which the object of the present invention can be achieved are included in the present invention.
In each of the above embodiments, the first terminal device 2A executes steps S101 to S104. However, if a participant other than the first participant operates the input means 24 of another terminal device 2, that terminal device 2 executes steps S101 to S104.
In each of the above embodiments, the voice corresponding to the image whose display size is increased is emphasized relative to the voices corresponding to the other images. However, the present invention is not limited to this, and the voice corresponding to an image placed at a predetermined display position may instead be emphasized relative to the voices corresponding to the other images.
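The two ways of choosing which voice to emphasize (by display size, or by a predetermined display position) can be compared in a short Python sketch. The DisplayState record, its fields, and the terminal labels are hypothetical and serve only as an illustration of the display information; the embodiments do not define a concrete data format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DisplayState:
    """Hypothetical per-terminal entry of the display information."""
    position: Tuple[int, int]  # top-left corner of the image in the composite image
    size: Tuple[int, int]      # (width, height) of the image in the composite image


def emphasized_by_size(states: Dict[str, DisplayState]) -> str:
    """Pick the terminal whose image has the largest display area."""
    return max(states, key=lambda t: states[t].size[0] * states[t].size[1])


def emphasized_by_position(
    states: Dict[str, DisplayState], target: Tuple[int, int]
) -> Optional[str]:
    """Pick the terminal whose image sits at the predetermined position, if any."""
    for terminal, state in states.items():
        if state.position == target:
            return terminal
    return None
```

Either function returns only the terminal to emphasize; how strongly its voice is amplified relative to the others is a separate choice.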

In the second embodiment, the terminal device 2 has the function of executing steps S101 to S103 (the input receiving means 211 and the GUI control means 212). However, the present invention is not limited to this, and the server device 4 may have this function instead.
In each of the above embodiments, the projector 3 is employed as the image display device according to the present invention. However, the present invention is not limited to this, and a liquid crystal display, a plasma television, an organic EL (Electro Luminescence) display, or the like may be employed.

In each of the above embodiments, the amplification of the first voice amplification unit 371A is set larger than the default value and the amplification of the second voice amplification unit 371B is set smaller than the default value. However, the difference in volume may instead be created by leaving one amplification at the default value and increasing (or decreasing) the other.
In each of the above embodiments, the volume is adjusted by the projector 3 or the server device 4. However, the present invention is not limited to this. A configuration may be adopted in which each terminal device 2 performs the processing up to and including the volume adjustment, and the projector 3 or the server device 4 synthesizes the volume-adjusted voices.
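The amplification variants described above can be illustrated with a small Python sketch. The default gain of 1.0, the step of 0.5, the policy names, and the clipping range are assumptions chosen for the example and do not appear in the embodiments.

```python
from typing import List, Sequence, Tuple

DEFAULT_GAIN = 1.0  # assumed default amplification


def gain_pair(policy: str, step: float = 0.5) -> Tuple[float, float]:
    """Return (gain for the emphasized voice, gain for the other voice)."""
    if policy == "raise_and_lower":  # both gains move away from the default
        return DEFAULT_GAIN + step, DEFAULT_GAIN - step
    if policy == "raise_only":       # the other voice stays at the default
        return DEFAULT_GAIN + step, DEFAULT_GAIN
    if policy == "lower_only":       # the emphasized voice stays at the default
        return DEFAULT_GAIN, DEFAULT_GAIN - step
    raise ValueError(f"unknown policy: {policy}")


def mix(
    emphasized: Sequence[float],
    other: Sequence[float],
    gains: Tuple[float, float],
) -> List[float]:
    """Mix two voices sample by sample with the given gains, clipping to [-1, 1]."""
    g_emph, g_other = gains
    return [
        max(-1.0, min(1.0, g_emph * a + g_other * b))
        for a, b in zip(emphasized, other)
    ]
```

If the volume is adjusted on the terminal device side instead, each terminal would apply its own gain before transmission, and the mixing side would simply call mix with gains of (1.0, 1.0).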

The present invention can be used in a conference system in which a conference is held using an image display device such as a projector.

DESCRIPTION OF SYMBOLS 1 ... conference system, 2 ... terminal device, 3 ... projector, 4 ... server device, 28 ... microphone (voice collecting means), 36 ... image projection means (display means), 38 ... speaker (audio output means), 211 ... input receiving means, 213 ... communication control means (transmission control means), 312, 412 ... first control means, 313, 413 ... second control means, 411 ... communication control means (third control means), LAN ... network (communication path), S103 ... input receiving step, S104, S109, S110 ... transmission control step, S105, S111 ... receiving step, S113 ... first control step, S115 ... second control step.

Claims

A conference system comprising a plurality of terminal devices and an image display device communicably connected to the plurality of terminal devices, the conference system displaying on the image display device a composite image obtained by combining the images handled by the plurality of terminal devices, wherein
The terminal device includes:
Voice collecting means for collecting ambient sounds;
An input receiving means for receiving an input of a display state of an image handled by the terminal device and an image handled by another terminal device in the composite image;
Transmission control means for transmitting to the image display device audio information relating to the sound collected by the sound collecting means, display information relating to the display state, and image information relating to an image handled by the terminal device;
The image display device includes:
Display means for displaying an image;
Receiving means for receiving the audio information, the display information, and the image information;
First control means for generating the composite image by combining, based on the display information, the images based on the image information from the plurality of terminal devices, and for displaying the generated composite image on the display means;
Audio output means for outputting audio;
Second control means for generating a synthesized voice by synthesizing, based on the display information, the voices based on the voice information from the plurality of terminal devices, and for outputting the generated synthesized voice from the audio output means.

A conference system comprising a plurality of terminal devices, an image display device, and an information processing apparatus communicably connected to the plurality of terminal devices and the image display device via a communication path, the conference system displaying on the image display device a composite image obtained by combining the images handled by the plurality of terminal devices, wherein
The terminal device includes:
Voice collecting means for collecting ambient sounds;
An input receiving means for receiving an input of a display state of an image handled by the terminal device and an image handled by another terminal device in the composite image;
Transmission control means for transmitting to the information processing apparatus audio information relating to the voice collected by the voice collecting means, display information relating to the display state, and image information relating to an image handled by the terminal device;
The information processing apparatus includes:
First control means for generating the composite image by combining the images based on the image information from the plurality of terminal devices based on the display information;
Second control means for generating synthesized speech by synthesizing each voice based on each of the voice information from the plurality of terminal devices based on the display information;
Third control means for transmitting, via the communication path, composite image information relating to the composite image and synthesized voice information relating to the synthesized voice to the image display device;
The image display device includes:
Display means for displaying the composite image based on the composite image information from the information processing apparatus;
Voice output means for outputting the synthesized voice based on the synthesized voice information from the information processing apparatus.

The conference system according to claim 1 or 2, wherein
The display state includes the display size of each image handled by the plurality of terminal devices, and
The second control means generates the synthesized voice by adjusting the output level of each voice corresponding to each image based on the display size of each image.

An image display device that is communicably connected to a plurality of terminal devices and displays a composite image in which the images handled by the plurality of terminal devices are combined,
The image display device comprising:
Display means for displaying an image;
Receiving means for receiving audio information relating to the sound collected by each of the plurality of terminal devices, display information relating to a display state, in the composite image, of the image handled by each of the plurality of terminal devices, and image information relating to the image handled by each of the plurality of terminal devices;
First control means for generating the composite image by combining, based on the display information, the images based on the image information from the plurality of terminal devices, and for displaying the generated composite image on the display means;
Audio output means for outputting audio;
Second control means for generating a synthesized voice by synthesizing, based on the display information, the voices based on the audio information from the plurality of terminal devices, and for outputting the generated synthesized voice from the audio output means.

An image and audio processing method used in a conference system that comprises a plurality of terminal devices and an image display device communicably connected to the plurality of terminal devices, and that displays on the image display device a composite image obtained by combining the images handled by the plurality of terminal devices, the method comprising:
An input receiving step in which the terminal device receives an input of a display state, in the composite image, of the image handled by the terminal device and the images handled by the other terminal devices;
A transmission control step in which the terminal device transmits to the image display device audio information relating to the sound collected by the sound collecting means, display information relating to the display state, and image information relating to the image handled by the terminal device;
A receiving step in which the image display device receives the audio information, the display information, and the image information;
A first control step in which the image display device generates the composite image by combining, based on the display information, the images based on the image information from the plurality of terminal devices, and displays the generated composite image; and
A second control step in which the image display device generates a synthesized voice by synthesizing, based on the display information, the voices based on the audio information from the plurality of terminal devices, and outputs the generated synthesized voice.

An image and audio processing method for an image display device that is communicably connected to a plurality of terminal devices and displays a composite image in which the images handled by the plurality of terminal devices are combined, the method comprising:
A receiving step of receiving audio information relating to the sound collected by each of the plurality of terminal devices, display information relating to a display state, in the composite image, of the image handled by each of the plurality of terminal devices, and image information relating to the image handled by each of the plurality of terminal devices;
A first control step of generating the composite image by combining, based on the display information, the images based on the image information from the plurality of terminal devices, and displaying the generated composite image; and
A second control step of generating a synthesized voice by synthesizing, based on the display information, the voices based on the audio information from the plurality of terminal devices, and outputting the generated synthesized voice.