JP4862645B2

JP4862645B2 - Video conferencing equipment

Info

Publication number: JP4862645B2
Application number: JP2006341175A
Authority: JP
Inventors: 紀行畑; 卓也田丸
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-12-19
Filing date: 2006-12-19
Publication date: 2012-01-25
Anticipated expiration: 2026-12-19
Also published as: CN101518049A; JP2008154055A; WO2008075726A1

Description

The present invention relates to a video conference apparatus that communicates video, images, and audio for a video conference held between conference rooms located apart from each other.

Conventionally, when a video conference is held between multiple remote sites, a video conference apparatus (teleconference apparatus) such as that described in Patent Document 1 is placed at each site, and the conference participants sit around the apparatus to hold the meeting.

In the video conference apparatus of Patent Document 1, each participant wears a microphone equipped with a radio wave generator, and a radio wave is emitted from the microphone that picks up the highest sound level. The camera for photographing people detects the speaker direction by receiving this radio wave, turns toward that direction, and captures video centered on the speaker. The video data and audio data are encoded and transmitted to the counterpart video conference apparatus.
Patent Document 1: JP-A-6-276514

In a video conference, participants sometimes want to refer jointly to documents and other materials across the remote sites, in addition to seeing video of the speaker and other participants as described above. The apparatus of Patent Document 1 can switch between and capture video of different speakers, but as it stands it cannot show documents. To show a document with the configuration of Patent Document 1, a participant could hold the document up in front of the camera by hand, but because the document cannot be held completely still, the image is blurred. In addition, lens distortion prevents the document from being captured as it is (as the original image). As another way of sharing materials, the document could be converted to data and transmitted, but that approach cannot provide intuitive, flexible materials, such as a document that is written on and explained during the meeting.

Accordingly, an object of the present invention is to provide a video conference apparatus that can transmit, accurately and clearly, even such flexible materials together with audio and video.

The present invention is a video conference apparatus comprising: a camera that images a predetermined area; video data generation means for generating video data based on the video captured by the camera; sound collection means for collecting sound around the apparatus and generating collected sound data; sound emission means for emitting sound emission data; a housing in which the sound collection means and the sound emission means are provided; communication means for forming communication data from the collected sound data and the video data, transmitting the communication data to the outside, and acquiring sound emission data from externally received communication data and supplying it to the sound emission means; and support means for supporting the camera in a fixed relation to the housing. The camera simultaneously images a participant imaging area and an area close to the camera in the vicinity of the housing. The video data generation means cuts out, from first partial video data corresponding to the participant imaging area, only the azimuth area corresponding to the collected sound data and corrects the cut-out first partial video data by a first correction process, and corrects second partial video data corresponding to the area close to the camera by a second correction process different from the first correction process.

In a video conference apparatus with this configuration, the first partial video data corresponding to the participant imaging area and the second partial video data corresponding to the area close to the camera, where the document is placed, are acquired simultaneously by a single camera. From the first partial video data, only the azimuth area corresponding to the collected sound data is cut out and corrected as appropriate by the first correction process. The second partial video data is corrected by the corresponding second correction process so that it is easy to view.

As a result, the participant video and a still image such as a document are acquired simultaneously, and each is corrected according to its own shooting conditions. Consequently, the properly corrected participant video and document image can also be transmitted to the counterpart apparatus simultaneously.

The video conference apparatus of the present invention includes selection means for selecting the partial video data to be used in the communication data. The video data generation means supplies the partial video data selected by the selection means to the communication means.

In this configuration, whichever of the participant video and the still image is selected is transmitted. Because a still image, which hardly changes over time, can be transmitted only when needed, no unnecessary load is placed on the communication system.

Further, in the video conference apparatus of the present invention, the camera has a fisheye lens; the central region of the area imaged through the fisheye lens is the area close to the camera, and at least the peripheral region outside the central region is the participant imaging area.

A video conference apparatus with this configuration uses a fisheye lens as the specific camera specification. The region corresponding to the center of the fisheye lens is treated as the area close to the camera and is corrected by a correction process suited to that region. The participant imaging area mainly uses the peripheral region, although the central region may also be used when the mode is switched. The participant video is therefore corrected, case by case, by a correction process suited to the selected region. As a result, even though the video (image) of the nearby area and the video of the participant imaging area are both captured through the fisheye lens, each is corrected appropriately.
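The split between the central region and the peripheral region of the fisheye image can be illustrated with a minimal Python sketch. The function name, the assumption of a circular image centered in the frame, and the `center_fraction` value are all hypothetical; the description does not specify where the boundary between the two regions lies.

```python
import math


def classify_fisheye_pixel(x, y, width, height, center_fraction=0.4):
    """Label a pixel of a circular fisheye image.

    Returns "center" (area close to the camera), "peripheral"
    (participant imaging area), or "outside" (beyond the image circle).
    center_fraction is an assumed tuning value: the radius of the central
    region as a fraction of the image-circle radius.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    image_radius = min(width, height) / 2.0
    r = math.hypot(x - cx, y - cy)
    if r > image_radius:
        return "outside"
    if r <= center_fraction * image_radius:
        return "center"
    return "peripheral"
```

In such a sketch, pixels labeled "center" would be passed to the correction process for the nearby area and pixels labeled "peripheral" to the correction process for participant video.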

Further, the video data generation means of the video conference apparatus of the present invention is formed integrally with the camera. Further, the communication means of the video conference apparatus of the present invention is formed integrally in the housing together with the sound emission means and the sound collection means. Further, the video data generation means of the video conference apparatus of the present invention is formed integrally in the housing together with the sound emission means and the sound collection means. These arrangements make the video conference apparatus compact.

The video conference apparatus of the present invention further includes a display monitor that reproduces video data. The communication means of this video conference apparatus acquires the video data contained in the communication data and supplies it to the display monitor.

Thus, simply by placing and connecting the video conference apparatus of the present invention at each site taking part in the teleconference, both sides can easily share participant video and documents.

According to the present invention, with a simple operation of the camera direction, the speaker video is corrected by a correction process suited to speaker video and the document image is corrected by a correction process suited to document images, so both can be transmitted to the counterpart apparatus accurately and clearly. A video conference using this apparatus can therefore easily be made more realistic and easier for both sides to follow.

A video conference apparatus according to a first embodiment of the present invention will be described with reference to the drawings.
FIGS. 1 and 2 are external views of the video conference apparatus of this embodiment, in which (A) is a plan view and (B) is a side view. FIGS. 1 and 2 show only the mechanically distinctive components, namely the sound emitting and collecting device, the camera, and the stay; the cables that electrically connect the communication terminal, the sound emitting and collecting device, and the camera are not shown. FIG. 1 shows the mechanical state in the participant shooting mode, and FIG. 2 shows the mechanical state in the document shooting mode.
FIG. 3 is a block diagram showing the main configuration of the video conference apparatus of this embodiment.

In FIGS. 1, 2, and 3 and in the figures referred to later in this specification, microphones are denoted individually or collectively by "MC" and speakers by "SP".
The video conference apparatus of this embodiment includes a sound emitting and collecting device 1 that is disk-shaped in plan view, a camera 2 having an imaging function and a video data generation function, and a stay 3 that holds the camera 2 at a predetermined position relative to the sound emitting and collecting device 1. Although not shown in FIGS. 1 and 2, the sound emitting and collecting device 1 and the camera 2 are electrically connected, and the video conference apparatus further includes a communication terminal electrically connected to the sound emitting and collecting device 1 and the camera 2.

The communication terminal 5 demodulates communication data received over the network 500 from the communication terminal of a counterpart video conference apparatus, obtains the sound emission audio signals, the counterpart device ID, and the speaker azimuth data, and supplies them to the cable-connected sound emitting and collecting device 1 on its own side. The communication terminal 5 also generates communication data based on the collected audio signal and speaker position data received from its own sound emitting and collecting device 1 and the video data received from the camera 2, and transmits the generated communication data to the communication terminal of the counterpart video conference apparatus. Where necessary, the communication terminal 5 also relays speaker position data between the sound emitting and collecting device 1 and the camera 2.

The sound emitting and collecting device 1 has a disk-shaped housing 11. Specifically, the housing 11 is circular in plan view, and its top and bottom surfaces are smaller in area than its cross-section partway up. In side view, the housing narrows from one point along its height toward the top surface and also narrows from that point toward the bottom surface; that is, it has inclined surfaces both above and below that point. The top surface of the housing 11 has a recess 110 of a predetermined depth that is smaller in area than the top surface, and the center of the recess 110 in plan view coincides with the center of the top surface.

The sixteen microphones MC1 to MC16 are installed inside the housing 11 on its top-surface side, along the side wall of the recess 110, at an equal angular pitch (about 22.5° in this case) about the center of the sound emitting and collecting device 1 in plan view. The microphone MC1 lies in the θ = 0° direction, and the microphones MC1 to MC16 are placed in order along the direction in which θ increases in steps of 22.5°. For example, the microphone MC5 is placed in the θ = 90° direction, the microphone MC9 in the θ = 180° direction, and the microphone MC13 in the θ = 270° direction. Each of the microphones MC1 to MC16 is unidirectional and is oriented so that its directivity is strongest toward the center in plan view. For example, the microphone MC1 has its directivity centered on the θ = 180° direction, the microphone MC5 on the θ = 270° direction, the microphone MC9 on the θ = 0° (360°) direction, and the microphone MC13 on the θ = 90° direction. The number of microphones is not limited to sixteen and may be set as appropriate for the specification.
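The microphone arrangement above follows a simple rule: microphone MCn sits at θ = (n − 1) × 22.5°, and its directivity points through the center, that is, 180° away from its position. A short Python sketch of that rule follows; the function name and the returned dictionary layout are illustrative only.

```python
def mic_layout(num_mics=16):
    """Return {mic_number: (position_deg, directivity_deg)} for MC1..MCn.

    Position follows the equal angular pitch described in the text;
    directivity points toward the center of the device, i.e. the
    position angle plus 180 degrees, wrapped into [0, 360).
    """
    pitch = 360.0 / num_mics
    layout = {}
    for n in range(1, num_mics + 1):
        position = (n - 1) * pitch
        directivity = (position + 180.0) % 360.0
        layout[n] = (position, directivity)
    return layout
```

For sixteen microphones this reproduces the examples given in the text, such as MC5 at 90° with its directivity centered on 270°.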

The four speakers SP1 to SP4 are each installed so that their sound emitting surfaces are flush with the lower inclined surface of the housing 11, at an equal angular pitch (about 90° in this case) about the center of the sound emitting and collecting device 1 in plan view. The speaker SP1 is placed in the same θ = 0° direction as the microphone MC1, the speaker SP2 in the same θ = 90° direction as the microphone MC5, the speaker SP3 in the same θ = 180° direction as the microphone MC9, and the speaker SP4 in the same θ = 270° direction as the microphone MC13. Each of the speakers SP1 to SP4 has strong directivity toward the front of its sound emitting surface: the speaker SP1 emits sound centered on the θ = 0° direction, the speaker SP2 on the θ = 90° direction, the speaker SP3 on the θ = 180° direction, and the speaker SP4 on the θ = 270° direction.

Placing the speakers SP1 to SP4 on the lower side of the housing 11, placing the microphones MC1 to MC16 on the upper side, and orienting the sound collection direction of the microphones toward the center of the housing 11 makes it harder for the microphones MC1 to MC16 to pick up sound that wraps around from the speakers SP1 to SP4. The speaker position detection described later is therefore less affected by wraparound sound and can be performed more accurately.

The operation unit 111 is installed on the upper inclined surface of the housing 11 and includes various operation buttons and a liquid crystal display panel (not shown).
The input/output I/F 102 (not shown in FIGS. 1 and 2) is installed on the lower inclined surface of the housing 11 at a position where none of the speakers SP1 to SP4 is installed, and has a terminal capable of communicating audio data and various control data. Connecting the terminal of the input/output I/F 102 to the communication terminal with a cable or the like allows the sound emitting and collecting device 1 and the communication terminal to communicate.

In addition to this structural configuration, the sound emitting and collecting device 1 has the functional configuration shown in FIG. 3.
The control unit 101 performs overall control of the sound emitting and collecting device 1, such as settings, sound collection, and sound emission, and controls each unit of the sound emitting and collecting device 1 according to the operation instructions entered through the operation unit 111.

(1) Sound emission
The input/output I/F 102 outputs the sound emission audio signals S1 to S3 received from the communication terminal 5 to the channels CH1 to CH3, respectively. The channel assignment may be set as appropriate according to the number of sound emission audio signals received. The input/output I/F 102 also receives the counterpart device IDs from the communication terminal 5 and assigns a channel CH to each counterpart device ID. For example, when one counterpart device is connected, the audio data from that device is assigned to the channel CH1 as the sound emission audio signal S1. When two counterpart devices are connected, the audio data from the two devices is assigned individually to the channels CH1 and CH2 as the sound emission audio signals S1 and S2. Similarly, when three counterpart devices are connected, the audio data from the three devices is assigned individually to the channels CH1, CH2, and CH3 as the sound emission audio signals S1, S2, and S3. The channels CH1 to CH3 are connected to the sound emission control unit 103 via the echo cancellation unit 107.
The input/output I/F 102 also extracts, from the data received from the communication terminal 5, the speaker azimuth data Py of the counterpart sound emitting and collecting device and supplies it to the sound emission control unit 103 together with the channel information.
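The per-device channel assignment described above can be sketched in Python as follows. The function name, the use of plain strings for device IDs, and the error raised when more than three devices are connected are assumptions made for illustration; the description only covers one to three counterpart devices.

```python
def assign_channels(device_ids, max_channels=3):
    """Map each connected counterpart device ID to a channel CH1..CH3.

    Devices are assigned in the order given, one channel per device.
    Rejecting more devices than channels is an assumption of this sketch;
    the text does not say how that case is handled.
    """
    if len(device_ids) > max_channels:
        raise ValueError("more counterpart devices than available channels")
    return {dev: f"CH{i}" for i, dev in enumerate(device_ids, start=1)}
```

With two connected devices, for example, the first is mapped to CH1 and the second to CH2, matching the assignment of S1 and S2 in the text.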

The sound emission control unit 103 generates the speaker output signals SPD1 to SPD4 supplied to the speakers SP1 to SP4, based on the sound emission audio signals S1 to S3 and the speaker azimuth information Py.

The D/A-AMP 104 converts each of the speaker output signals SPD1 to SPD4 from digital to analog, amplifies them at a fixed gain, and supplies them to the speakers SP1 to SP4, respectively. The speakers SP1 to SP4 convert the supplied speaker output signals SPD1 to SPD4 into sound and emit it.

With this sound emission processing, the sounds emitted from the speakers SP1 to SP4 have a predetermined delay relationship and amplitude relationship, which gives the participants the impression that the sound is being emitted from the configured virtual sound source.
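The description does not state how the delay and amplitude relationships are derived. One common approach, shown here purely as an assumed illustration, treats the virtual sound source as a point source and gives each speaker a delay proportional to its extra distance from that point and a gain inversely proportional to its distance. The speaker radius, the speed of sound constant, and the function name are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, room-temperature approximation (assumed)


def virtual_source_params(source_xy, speaker_radius=0.15,
                          speaker_angles_deg=(0, 90, 180, 270)):
    """Return [(delay_seconds, gain), ...], one pair per speaker.

    Delays and gains are relative to the speaker nearest the virtual
    source: that speaker gets delay 0 and gain 1, and farther speakers
    are delayed more and attenuated more. speaker_radius is an assumed
    distance of each speaker from the center of the device.
    """
    distances = []
    for angle in speaker_angles_deg:
        sx = speaker_radius * math.cos(math.radians(angle))
        sy = speaker_radius * math.sin(math.radians(angle))
        distances.append(math.hypot(source_xy[0] - sx, source_xy[1] - sy))
    nearest = min(distances)
    if nearest == 0.0:
        raise ValueError("virtual source must not coincide with a speaker")
    return [((d - nearest) / SPEED_OF_SOUND, nearest / d) for d in distances]
```

In a sketch like this, the speaker azimuth information Py would determine the direction in which the virtual source is placed; how the actual apparatus maps Py to a source position is not stated in this part of the description.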

(2) Sound collection
The microphones MC1 to MC16 pick up external sound, such as speech from the participants, and generate the collected sound signals MS1 to MS16. Each A/D-AMP 105 amplifies the corresponding collected sound signal MS1 to MS16 at a predetermined gain, converts it from analog to digital, and outputs it to the sound collection control unit 106.

The sound collection control unit 106 combines the acquired collected sound signals MS1 to MS16 using different delay control patterns and amplitude patterns to generate collected sound beam signals, each with its directivity centered on a different direction. For example, it generates eight collected sound beam signals whose directivity centers are shifted in 45° steps, that is, the full 360° around the sound emitting and collecting device 1 divided into eight. The sound collection control unit 106 compares the amplitude levels of these collected sound beam signals, selects the collected sound beam signal MBS with the highest amplitude level, and outputs it to the echo cancellation unit 107. The sound collection control unit 106 also obtains the speaker azimuth corresponding to the selected collected sound beam signal, generates the speaker azimuth information Pm, and supplies it to the input/output I/F 102.
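The beam formation and selection described above can be sketched as a delay-and-sum beamformer in plain Python. This is a simplified illustration under several assumptions that are not in the description: a far-field source, omnidirectional microphones on a circle of assumed radius, integer sample delays, uniform amplitude weights instead of per-beam amplitude patterns, and mean power as the amplitude measure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_delays(beam_deg, num_mics=16, radius_m=0.1, fs=16000):
    """Integer sample delays that align a far-field wave arriving from beam_deg.

    Microphones nearer the source hear the wave earlier, so they are
    delayed more. radius_m and fs are assumed values.
    """
    delays = []
    for n in range(num_mics):
        mic_deg = n * 360.0 / num_mics
        lead = radius_m * (1.0 + math.cos(math.radians(mic_deg - beam_deg)))
        delays.append(round(fs * lead / SPEED_OF_SOUND))
    return delays


def delay_and_sum(mic_signals, delays):
    """Delay each microphone signal by its sample count and average them."""
    length = len(mic_signals[0])
    beam = [0.0] * length
    for signal, d in zip(mic_signals, delays):
        for t in range(d, length):
            beam[t] += signal[t - d]
    return [v / len(mic_signals) for v in beam]


def select_beam(mic_signals, num_beams=8, radius_m=0.1, fs=16000):
    """Form num_beams beams and return (azimuth_deg, beam) with the highest mean power."""
    best = None
    for b in range(num_beams):
        azimuth = b * 360.0 / num_beams
        delays = steering_delays(azimuth, len(mic_signals), radius_m, fs)
        beam = delay_and_sum(mic_signals, delays)
        power = sum(v * v for v in beam) / len(beam)
        if best is None or power > best[0]:
            best = (power, azimuth, beam)
    return best[1], best[2]
```

The azimuth returned by `select_beam` plays the role of the speaker azimuth information Pm, and the returned beam plays the role of the collected sound beam signal MBS.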

The echo cancellation unit 107 consists of an adaptive filter that generates, for the input collected sound beam signal MBS, pseudo return sound signals based on the sound emission audio signals S1 to S3, and a post-processor that subtracts the pseudo return sound signals from the collected sound beam signal MBS. While successively optimizing the filter coefficients of the adaptive filter, the echo cancellation circuit subtracts the pseudo return sound signals from the collected sound beam signal MBS to be output, thereby removing the components that wrap around from the speakers SP1 to SP4 to the microphones MC1 to MC16. The collected sound beam signal MBS with the wraparound components removed is output to the input/output I/F 102.
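The description does not name the adaptation algorithm used by the adaptive filter. As an assumed illustration, the following Python sketch uses normalized least mean squares (NLMS), a widely used choice for echo cancellation, with a single reference signal instead of the three signals S1 to S3. The tap count, step size, and function name are hypothetical.

```python
def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    reference: samples sent to the speaker (stands in for S1..S3).
    mic: samples of the collected signal (stands in for MBS).
    Returns (echo-reduced samples, final filter coefficients).
    """
    w = [0.0] * taps        # adaptive filter coefficients
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Pseudo return sound: the filter's current estimate of the echo.
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        # Subtraction stage: remove the estimated echo from the mic sample.
        e = d - echo_est
        # NLMS update: step scaled by the energy of the recent reference.
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out, w
```

When the collected signal contains only a delayed, scaled copy of the reference, the coefficients converge toward that echo path and the output approaches zero, which corresponds to removal of the wraparound component.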

The input/output I/F 102 associates the collected sound beam signal MBS, from which the return sound has been removed by the echo cancellation unit 107, with the speaker azimuth information Pm from the sound collection control unit 106, and outputs them to the communication terminal 5.

As shown in FIGS. 1 and 2, the camera 2 is held by the stay 3 at a fixed position relative to the sound emitting and collecting device 1. The stay 3 holds the camera 2 so that it can rotate between the horizontal direction (the direction the camera 2 faces in FIG. 1) and the vertically downward direction (the direction the camera 2 faces in FIG. 2).

The stay 3 includes a main body portion 31, a camera support portion 32, a main body support portion 33, and a sound emitting and collecting device mounting portion 34. The main body portion 31 is a straight member of a predetermined width and is held by the main body support portion 33 so that it extends at a predetermined angle to the vertical. The camera support portion 32 is attached to one end of the main body portion 31 in its extending direction via a hinge 203, and the sound emitting and collecting device mounting portion 34 is provided at the other end. The sound emitting and collecting device mounting portion 34 is a flat plate with openings shaped to receive the legs 12 of the housing 11 and is, for example, formed integrally with the main body portion 31.

At the end of the main body portion 31 on the camera support portion 32 side, only the two end walls in the width direction remain and the center in the width direction is open. This opening is shaped so that the camera 2 mounted on the camera support portion 32 does not touch the main body portion 31 when it rotates between the horizontal direction and the vertically downward direction.

The hinge 203 is structured so that the camera support portion 32 can rotate relative to the main body portion 31. The hinge 203 and the camera support portion 32 are also structured so that the camera 2 and the camera support portion 32 are semi-fixed when facing the horizontal direction and when facing the vertically downward direction. For example, the hinge 203 is fixed to the main body portion 31, and recesses are formed in the hinge 203 at the horizontal position and at the vertically downward position. The hinge-side end of the camera support portion 32 is provided with a protrusion shaped to fit into the recesses, and the protrusion is urged outward from inside the camera support portion 32 by a spring or the like. The camera 2 can thus rotate between the horizontal direction and the vertically downward direction and can hold its mechanical state in each of those two directions.

The mechanism formed by the hinge 203 and the camera support portion 32 functions as a switch 4. For example, electrodes are placed on the recesses and the protrusion, and electrical continuity or an open circuit between them is detected. The wiring or the detection signals are set so that the recess for the horizontal direction and the recess for the vertically downward direction yield different signals. The switch 4 is formed by this structure, and its detection result is supplied to the camera 2. The camera 2 can thereby determine whether it is facing the horizontal direction or the vertically downward direction when acquiring video.

The camera 2 includes an imaging unit 21 and a video processing unit 22. The imaging unit 21 has a fisheye lens and images, in all directions about the front direction of the camera 2, the region from infinity to the plane in which the fisheye lens is mounted. The imaging data is supplied to the video processing unit 22.

映像処理部２２は、ステー３のスイッチ４（ヒンジ２０３およびカメラ支持部３２）から検出したカメラ２の向く方向（以下、撮影方向と称する）を取得する。映像処理部２２は、取得した撮影方向および通信端末５を介して放収音装置１からの話者方位データＰｍに基づいて、撮像データから必要部のみを抽出して画像補整し、映像データを生成する。生成された映像データは、通信端末５に与えられる。 The video processing unit 22 acquires the direction in which the camera 2 faces (hereinafter, the shooting direction) as detected by the switch 4 of the stay 3 (the hinge 203 and the camera support portion 32). Based on the acquired shooting direction and on the speaker orientation data Pm received from the sound emitting and collecting apparatus 1 via the communication terminal 5, the video processing unit 22 extracts only the necessary part of the imaging data, corrects the image, and generates video data. The generated video data is given to the communication terminal 5.

次に、当該ビデオ会議装置の使用方法および映像処理部２２での映像データ生成方法について、より具体的に説明する。なお、以下の説明では、自装置側の会議者が５名である場合について示すが、会議者数が特にこれに限るものではない。 Next, the usage method of the video conference apparatus and the video data generation method in the video processing unit 22 will be described more specifically. In the following description, the case where there are five conference members on the own device side is shown, but the number of conference members is not particularly limited to this.

図４は本実施形態のビデオ会議装置を配置して、ネットワーク接続された他地点とビデオ会議を行う状況を示す図であり、カメラ２が会議者６０１〜６０５を撮像している場合を示した図である。 FIG. 4 is a diagram showing the video conference apparatus of this embodiment arranged for a video conference with another site connected over the network, in the case where the camera 2 is imaging the conference participants 601 to 605.
図５は映像データ生成の説明に用いる説明図であり、（Ａ）は魚眼レンズを介して撮像された映像（画像）を示し、（Ｂ）、（Ｃ）は会議者方位毎の画像補整概念を示す。 FIG. 5 is an explanatory diagram of video data generation: (A) shows the video (image) captured through the fisheye lens, and (B) and (C) show the concept of image correction for each conference participant direction.

図６は本実施形態のビデオ会議装置を配置して、ネットワーク接続された他地点とビデオ会議を行う状況を示す図であり、カメラ２が資料６５０を撮像している場合を示した図である。 FIG. 6 is a diagram showing the video conference apparatus of this embodiment arranged for a video conference with another site connected over the network, in the case where the camera 2 is imaging the material 650.
図７は映像データ生成の説明に用いる説明図であり、（Ａ）は魚眼レンズを介して撮像された映像（画像）を示し、（Ｂ）は資料撮像時の画像補整概念を示す。 FIG. 7 is an explanatory diagram of video data generation: (A) shows the video (image) captured through the fisheye lens, and (B) shows the concept of image correction when the material is imaged.

ビデオ会議を行う場合には、会議者６０１〜６０５は、長円形のテーブル７００に対して長手方向の片端を除く位置に着席する。テーブル７００上には、円形の放収音装置１とこれにステー３により固定されたカメラ２との一体部材が設置される。この際、カメラ２は、水平方向に向いた状態で、テーブル７００の長手方向に平行な軸が魚眼レンズの中心軸と一致するように設置されている。テーブル７００の下には、通信端末５が設置されている。通信端末５は、放収音装置１、カメラ２と電気的に接続し、且つネットワーク５００に接続している。また、通信端末５は、ディスプレイ６に電気的に接続している。ディスプレイ６は、例えば液晶ディスプレイ等からなり、テーブル７００の会議者６０１〜６０５が着席していない側の端部付近に設置される。この際、ディスプレイ６は、テーブル７００方向に表示面が向くように設置されている。 When a video conference is performed, the conference participants 601 to 605 are seated at a position excluding one end in the longitudinal direction with respect to the oval table 700. On the table 700, an integrated member of a circular sound emitting and collecting apparatus 1 and a camera 2 fixed to the same by a stay 3 is installed. At this time, the camera 2 is installed so that the axis parallel to the longitudinal direction of the table 700 coincides with the central axis of the fisheye lens in a state of being oriented in the horizontal direction. Under the table 700, the communication terminal 5 is installed. The communication terminal 5 is electrically connected to the sound emission and collection device 1 and the camera 2, and is connected to the network 500. The communication terminal 5 is electrically connected to the display 6. The display 6 includes a liquid crystal display, for example, and is installed near the end of the table 700 on the side where the participants 601 to 605 are not seated. At this time, the display 6 is installed so that the display surface faces in the direction of the table 700.

このような状態でビデオ会議が行われると、放収音装置１、カメラ２、通信端末５を含むビデオ会議装置は、二つのモードで会議の映像を相手先のビデオ会議装置に送信する。 When a video conference is performed in such a state, the video conference device including the sound emission and collection device 1, the camera 2, and the communication terminal 5 transmits the video of the conference to the destination video conference device in two modes.

（１）会議者撮影モード (1) Conference participant shooting mode
会議者６０１〜６０５のいずれかが、カメラ２を水平方向にセットすると、スイッチ４からの検出信号により、カメラ２の映像処理部２２は、会議者撮影モードが選択されたことを検出する。映像処理部２２は、会議者撮影モードを検出すると、当該モードの選択信号を通信端末５に与える。 When any of the conference participants 601 to 605 sets the camera 2 in the horizontal direction, the video processing unit 22 of the camera 2 detects from the detection signal of the switch 4 that the conference participant shooting mode has been selected. On detecting the conference participant shooting mode, the video processing unit 22 gives a selection signal for that mode to the communication terminal 5.

カメラ２の撮像部２１は、魚眼レンズを通して、自装置側に在席する全会議者６０１〜６０５を撮像した撮像データを取得し、映像処理部２２に出力する。ここで、撮像データは、魚眼レンズを通しているので、撮像領域が図５（Ａ）のように円形になる。会議者撮影モードが選択されている場合、映像処理部２２は、円形の撮像データに対して、円弧状に曲がる水平方向を方位角φで表し、垂直方向を仰角ψで表す座標系で取得する。すなわち、魚眼レンズの正面方向で、レンズ軸と同じ高さがφ＝０°、ψ＝０°に設定される。さらに、当該座標から左方向に広がる方向でφが負方向に増加し、右方向に広がる方向でφが正方向に増加するように設定されている。したがって、カメラ２の魚眼レンズの最先端から、撮影方向に対して左方向で魚眼レンズの軸に垂直な方向がφ＝−９０°となり、カメラ２の魚眼レンズの最先端から、撮影方向に対して右方向で魚眼レンズの軸に垂直な方向がφ＝＋９０°となる。また、当該座標から上方向に広がる方向でψが正方向に増加し、下方向に広がる方向でψが負方向に増加するように設定されている。したがって、カメラ２の魚眼レンズの最先端から、撮影方向に対して上方向で魚眼レンズの軸に垂直な方向がψ＝＋９０°となり、カメラ２の魚眼レンズの最先端から、撮影方向に対して下方向で魚眼レンズの軸に垂直な方向がψ＝−９０°となる。 The imaging unit 21 of the camera 2 acquires, through the fisheye lens, imaging data covering all the conference participants 601 to 605 present on the own-device side, and outputs it to the video processing unit 22. Because the imaging data is captured through the fisheye lens, the imaging area is circular, as shown in FIG. 5(A). When the conference participant shooting mode is selected, the video processing unit 22 handles the circular imaging data in a coordinate system in which the horizontal direction, which curves in an arc, is represented by the azimuth angle φ and the vertical direction is represented by the elevation angle ψ. That is, the point in the front direction of the fisheye lens at the same height as the lens axis is set to φ = 0°, ψ = 0°. From that point, φ increases in the negative direction toward the left and in the positive direction toward the right. Therefore, as seen from the tip of the fisheye lens of the camera 2, the direction to the left of the shooting direction and perpendicular to the lens axis is φ = −90°, and the direction to the right of the shooting direction and perpendicular to the lens axis is φ = +90°. Likewise, ψ increases in the positive direction upward and in the negative direction downward from that point. Therefore, as seen from the tip of the fisheye lens of the camera 2, the direction above the shooting direction and perpendicular to the lens axis is ψ = +90°, and the direction below the shooting direction and perpendicular to the lens axis is ψ = −90°.

放収音装置１は、前述の処理により、発言中の会議者の音声を取得するとともに、会議者方位を検出して、収音音声データと話者方位情報θとを通信端末５に与える。例えば、図４に示す会議者６０１が発言すれば、放収音装置１は、会議者６０１の方位θ１を検出して、会議者６０１方向からの音声に基づく収音音声データと話者方位情報θ１とを通信端末５に与える。また、会議者６０５が発言すれば、放収音装置１は、会議者６０５の方位θ２を検出して、会議者６０５方向からの音声に基づく収音音声データと話者方位情報θ２を通信端末５に与える。通信端末５は、話者方位情報θをカメラ２の映像処理部２２に与える。 Through the processing described above, the sound emitting and collecting apparatus 1 acquires the voice of the conference participant who is speaking, detects that participant's direction, and gives the collected sound data and the speaker orientation information θ to the communication terminal 5. For example, if the conference participant 601 shown in FIG. 4 speaks, the sound emitting and collecting apparatus 1 detects the direction θ1 of the conference participant 601 and gives the collected sound data based on the voice from that direction, together with the speaker orientation information θ1, to the communication terminal 5. Likewise, if the conference participant 605 speaks, the sound emitting and collecting apparatus 1 detects the direction θ2 of the conference participant 605 and gives the collected sound data based on the voice from that direction, together with the speaker orientation information θ2, to the communication terminal 5. The communication terminal 5 gives the speaker orientation information θ to the video processing unit 22 of the camera 2.

映像処理部２２は、通信端末５からの話者方位情報θに基づいて、撮像データを補整する。映像処理部２２は、話者方位情報θと、撮像データに設定された方位角φとの関係を予め記憶している。そして、映像処理部２２は、話者方位情報θを受け付けると、対応する方位角φを読み出す。例えば、映像処理部２２は、会議者６０１に対する話者方位情報θ１を受け付けると、対応する方位角φ＝０°を読み出す。また、例えば、映像処理部２２は会議者６０５に対する話者方位情報θ２を受け付けると、対応する方位角φ＝−９０°を読み出す。 The video processing unit 22 corrects the imaging data based on the speaker orientation information θ from the communication terminal 5. The video processing unit 22 stores in advance the relationship between the speaker orientation information θ and the azimuth angle φ defined on the imaging data. On receiving the speaker orientation information θ, the video processing unit 22 reads out the corresponding azimuth angle φ. For example, on receiving the speaker orientation information θ1 for the conference participant 601, it reads out the corresponding azimuth angle φ = 0°. Likewise, on receiving the speaker orientation information θ2 for the conference participant 605, it reads out the corresponding azimuth angle φ = −90°.
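A minimal sketch of such a stored θ-to-φ relationship, assuming it is held as a small calibration table and looked up by nearest bearing. The table values, the 0° to 360° convention for θ, and the name `phi_for_theta` are illustrative assumptions; the patent does not give numeric values for θ1 or θ2.

```python
# Hypothetical calibration table: speaker bearing theta (degrees, in the
# frame of the sound emitting and collecting apparatus) -> image azimuth
# phi (degrees). The numbers are made up for illustration.
THETA_TO_PHI = {0.0: 0.0, 45.0: -45.0, 90.0: -90.0, 315.0: 45.0, 270.0: 90.0}


def phi_for_theta(theta_deg: float) -> float:
    """Return the stored phi whose theta entry is closest on the circle."""

    def circular_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    wrapped = theta_deg % 360.0
    nearest = min(THETA_TO_PHI, key=lambda t: circular_distance(t, wrapped))
    return THETA_TO_PHI[nearest]
```

Nearest-entry lookup is one simple choice; an implementation could equally interpolate between entries or use a closed-form relation.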

映像処理部２２は、読み出した方位角φを含む所定方位角幅からなる画像抽出方位角範囲を設定する。また、映像処理部２２は、仰角ψ＝０°を含む所定仰角幅からなる画像抽出仰角範囲を設定する。そして、映像処理部２２は、設定した方位角範囲と仰角範囲とにより画像抽出領域を決定し、当該領域に対応する撮像データを画像データとして取得する。 The video processing unit 22 sets an image extraction azimuth range of a predetermined azimuth width that contains the azimuth angle φ it has read out. It also sets an image extraction elevation range of a predetermined elevation width that contains the elevation angle ψ = 0°. The video processing unit 22 then determines the image extraction region from the azimuth range and the elevation range, and acquires the imaging data corresponding to that region as image data.

例えば、映像処理部２２は、方位角φ＝０°を読み出すと、φ＝０°を含み方位角φ１〜方位角φ２（φ１＜０°＜φ２）の範囲を方位角範囲に設定する。また、映像処理部２２は、ψ＝０°を含み仰角ψ１〜仰角ψ２（ψ１＜ψ２）の範囲を仰角範囲に設定する。そして、映像処理部２２は、方位角範囲φ１〜φ２、仰角範囲ψ１〜ψ２により画像抽出領域を設定して、画像データ６２１を取得する。また、例えば、映像処理部２２は、方位角φ＝−９０°を読み出すと、φ＝−９０°を含み方位角φ３〜方位角φ４（φ３＜−９０°＜φ４）の範囲を方位角範囲に設定する。また、映像処理部２２は、ψ＝０°を含み仰角ψ３〜仰角ψ４（ψ３＜ψ４）の範囲を仰角範囲に設定する。そして、映像処理部２２は、方位角範囲φ３〜φ４、仰角範囲ψ３〜ψ４により画像抽出領域を設定して、画像データ６２２を取得する。 For example, on reading out the azimuth angle φ = 0°, the video processing unit 22 sets the range from azimuth angle φ1 to azimuth angle φ2 (φ1 < 0° < φ2), which contains φ = 0°, as the azimuth range. It also sets the range from elevation angle ψ1 to elevation angle ψ2 (ψ1 < ψ2), which contains ψ = 0°, as the elevation range. The video processing unit 22 then sets the image extraction region from the azimuth range φ1 to φ2 and the elevation range ψ1 to ψ2, and acquires the image data 621. Similarly, on reading out the azimuth angle φ = −90°, the video processing unit 22 sets the range from azimuth angle φ3 to azimuth angle φ4 (φ3 < −90° < φ4), which contains φ = −90°, as the azimuth range. It also sets the range from elevation angle ψ3 to elevation angle ψ4 (ψ3 < ψ4), which contains ψ = 0°, as the elevation range. The video processing unit 22 then sets the image extraction region from the azimuth range φ3 to φ4 and the elevation range ψ3 to ψ4, and acquires the image data 622.
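The range setting can be illustrated with a short sketch. It assumes the "predetermined" widths are fixed constants and that the window is centered on the read-out azimuth; the patent only requires that the ranges contain φ and ψ = 0°, so the centering, the default widths, and the name `extraction_window` are assumptions.

```python
def extraction_window(phi_center: float,
                      azimuth_width: float = 60.0,
                      elevation_width: float = 80.0):
    """Return ((phi_lo, phi_hi), (psi_lo, psi_hi)) in degrees.

    The azimuth range is centered on phi_center and the elevation range
    on psi = 0. The range is deliberately not clamped to +/-90 degrees,
    because the text allows a range such as phi3 < -90 < phi4.
    """
    half_az = azimuth_width / 2.0
    half_el = elevation_width / 2.0
    return (phi_center - half_az, phi_center + half_az), (-half_el, half_el)
```

With these hypothetical widths, a speaker at φ = 0° yields the window φ from −30° to 30° and ψ from −40° to 40°, playing the role of φ1 to φ2 and ψ1 to ψ2 in the text.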

映像処理部２２は、取得した画像抽出領域毎に画像の補整変換を行う。具体的には、二つの角度方向であるφ方向とψ方向で定義される各画素を、直交二次元の平面座標（Ｘ−Ｙ座標系）の画素に当てはめるように補整変換する。この際、映像処理部２２は、φ−ψ座標系とＸ−Ｙ座標系との変換処理テーブルを予め記憶しており、取得した各画素のφ−ψ座標に基づいて、Ｘ−Ｙ座標を算出し、補整変換する。なお、映像処理部２２は、予め座標変換演算式を記憶しており、当該座標変換演算式を用いて補整変換を行っても良い。 The video processing unit 22 performs correction conversion of the image for each acquired image extraction region. Specifically, it performs correction conversion so that each pixel defined by the two angular directions, the φ direction and the ψ direction, is mapped to a pixel of orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the φ-ψ coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the φ-ψ coordinates of each acquired pixel, and performs the correction conversion. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
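One way such a conversion table could be precomputed is sketched below. The patent does not state the lens projection, so the sketch assumes an ideal equidistant fisheye (image radius proportional to the angle from the lens axis) and a linear spread of φ and ψ across the output image. The names `fisheye_pixel`, `build_table`, and `radius_90` (the image radius, in pixels, of directions 90° off axis) are all hypothetical.

```python
import math


def fisheye_pixel(phi_deg, psi_deg, cx, cy, radius_90):
    """Map a direction (phi right-positive, psi up-positive) to a source pixel.

    Assumes an equidistant fisheye centered at (cx, cy), with image y
    increasing downward.
    """
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    dx = math.cos(psi) * math.sin(phi)   # right
    dy = math.sin(psi)                   # up
    dz = math.cos(psi) * math.cos(phi)   # along the lens axis
    alpha = math.acos(max(-1.0, min(1.0, dz)))  # angle from the lens axis
    beta = math.atan2(dy, dx)                   # angle around the lens axis
    r = radius_90 * alpha / (math.pi / 2.0)
    return cx + r * math.cos(beta), cy - r * math.sin(beta)


def build_table(phi_range, psi_range, width, height, cx, cy, radius_90):
    """table[y][x] is the source pixel for output pixel (x, y).

    width and height must be at least 2. The top output row is psi_hi.
    """
    (phi_lo, phi_hi), (psi_lo, psi_hi) = phi_range, psi_range
    table = []
    for y in range(height):
        psi = psi_hi - (psi_hi - psi_lo) * y / (height - 1)
        row = []
        for x in range(width):
            phi = phi_lo + (phi_hi - phi_lo) * x / (width - 1)
            row.append(fisheye_pixel(phi, psi, cx, cy, radius_90))
        table.append(row)
    return table
```

The table is built in the inverse direction (output pixel to source pixel), which is the usual way to resample an image without leaving holes; the patent text describes the mapping from φ-ψ to X-Y but does not prescribe the lookup direction.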

例えば、図５（Ｂ）に示すように、映像処理部２２は、方位角範囲φ１〜φ２、仰角範囲ψ１〜ψ２で設定される画像データ６２１を、平面座標系であり水平方向をＸ軸として垂直方向をＹ軸とするｘ１〜ｘ２，ｙ１〜ｙ２で設定される補整画像データ６２１’に変換する。この変換により、φ−ψ座標系で取得した会議者６０１の人物像６１１が、Ｘ−Ｙ座標系（平面座標系）の補整人物像６３１に変換される。このようにＸ−Ｙ座標系に変換することで、補整人物像６３１は、会議者６０１の自然体像に近いものとなる。 For example, as shown in FIG. 5(B), the video processing unit 22 converts the image data 621, defined by the azimuth range φ1 to φ2 and the elevation range ψ1 to ψ2, into corrected image data 621' defined by x1 to x2 and y1 to y2 in a plane coordinate system whose X axis is horizontal and whose Y axis is vertical. By this conversion, the person image 611 of the conference participant 601 acquired in the φ-ψ coordinate system is converted into a corrected person image 631 in the X-Y coordinate system (plane coordinate system). As a result of the conversion to the X-Y coordinate system, the corrected person image 631 is close to the natural appearance of the conference participant 601.

また、例えば、図５（Ｃ）に示すように、映像処理部２２は、方位角範囲φ３〜φ４、仰角範囲ψ３〜ψ４で設定される画像データ６２２を、平面座標系であり水平方向をＸ軸として垂直方向をＹ軸とするｘ３〜ｘ４，ｙ３〜ｙ４で設定される補整画像データ６２２’に変換する。この変換により、φ−ψ座標系で取得した会議者６０５の人物像６１５が、Ｘ−Ｙ座標系（平面座標系）の補整人物像６３５に変換される。このようにＸ−Ｙ座標系に変換することで、補整人物像６３５は、会議者６０５の自然体像に近いものとなる。 Similarly, as shown in FIG. 5(C), the video processing unit 22 converts the image data 622, defined by the azimuth range φ3 to φ4 and the elevation range ψ3 to ψ4, into corrected image data 622' defined by x3 to x4 and y3 to y4 in a plane coordinate system whose X axis is horizontal and whose Y axis is vertical. By this conversion, the person image 615 of the conference participant 605 acquired in the φ-ψ coordinate system is converted into a corrected person image 635 in the X-Y coordinate system (plane coordinate system). As a result of the conversion to the X-Y coordinate system, the corrected person image 635 is close to the natural appearance of the conference participant 605.

映像処理部２２は、このように自然体に近づいた補整人物像を含む補整画像データに時間情報を添付して映像データとして通信端末５に出力する。このような補整画像データの生成および出力は、逐次行われており、受け付けた話者方位情報θが変化すれば、この変化に応じて、補整画像データの中心方向も切り替わる。 The video processing unit 22 attaches time information to the corrected image data including the corrected human image approaching the natural body in this way, and outputs the corrected image data to the communication terminal 5 as video data. Such generation and output of the corrected image data are sequentially performed. If the received speaker orientation information θ changes, the center direction of the corrected image data is switched in accordance with the change.

通信端末５は、映像処理部２２からの映像データと収音音声データと話者方位情報θとを関連付けして通信データを生成し、ネットワーク５００を介して相手先のビデオ会議装置に送信する。これにより、相手先のビデオ会議装置の周囲に在席する会議者には、発言中の会議者の自然体に近い映像と当該会議者の発言とを提供することができる。 The communication terminal 5 generates communication data by associating the video data from the video processing unit 22, the collected sound data, and the speaker orientation information θ, and transmits the communication data to the video conference device of the other party via the network 500. Accordingly, it is possible to provide a conference person who is present around the other party's video conference apparatus with an image close to the natural body of the conference participant who is speaking and the speech of the conference participant.

（２）資料撮影モード (2) Material shooting mode
会議者６０１〜６０５のいずれかが、図６に示すように、カメラ２を垂直下方向にセットすると、スイッチ４からの検出信号により、カメラ２の映像処理部２２は、資料撮影モードが選択されたことを検出する。映像処理部２２は、資料撮影モードを検出すると、当該モードの選択信号を通信端末５に与える。 When any of the conference participants 601 to 605 sets the camera 2 vertically downward as shown in FIG. 6, the video processing unit 22 of the camera 2 detects from the detection signal of the switch 4 that the material shooting mode has been selected. On detecting the material shooting mode, the video processing unit 22 gives a selection signal for that mode to the communication terminal 5.

また、会議者６０１〜６０５のいずれかは、テーブル７００におけるヒンジ２０３の垂直下方向位置を中心にして、資料６５０を載置する。この際、テーブル７００上に資料載置用マーキングを予め行っておけば、資料６５０を容易に且つ適切に載置することができる。 In addition, any of the conference participants 601 to 605 places the material 650 around the vertical downward position of the hinge 203 in the table 700. At this time, if the material placement marking is performed on the table 700 in advance, the material 650 can be placed easily and appropriately.

カメラ２の撮像部２１は、魚眼レンズを通して、テーブル７００上に載置された資料６５０を撮像した撮像データを取得し、映像処理部２２に出力する。ここで、撮像データは、魚眼レンズを通しているので、撮像領域が図７（Ａ）のように円形になる。 The imaging unit 21 of the camera 2 acquires imaging data obtained by imaging the material 650 placed on the table 700 through the fisheye lens, and outputs the acquired imaging data to the video processing unit 22. Here, since the imaging data passes through the fisheye lens, the imaging area is circular as shown in FIG.

資料撮影モードが選択されている場合、映像処理部２２は、円形の撮像データに対して、撮像データの中心を原点とし、原点から放射方向に延びる距離ｒと、所定方向（図７では原点から撮像データに向かって右方向を０°方向）に対する角度ηとで表されるｒ−η座標系で取得する。映像処理部２２は、取得した撮像データから、予め設定された範囲の画像データ６８０を切り出す。 When the material shooting mode is selected, the video processing unit 22 handles the circular imaging data in an r-η coordinate system whose origin is the center of the imaging data, where r is the distance from the origin in the radial direction and η is the angle measured from a predetermined direction (in FIG. 7, the direction to the right of the origin, as one faces the imaging data, is the 0° direction). The video processing unit 22 cuts out image data 680 of a preset range from the acquired imaging data.

映像処理部２２は、ｒ−η座標系の画像データ６８０をＸ−Ｙ平面座標系の補整画像データ６８０’に変換することで補整する。この際、映像処理部２２は、ｒ−η座標系とＸ−Ｙ座標系との中心座標を一致させた座標変換処理テーブルを予め記憶しており、取得した各画素のｒ−η座標に基づいてＸ−Ｙ座標を算出し、補整変換する。なお、映像処理部２２は、予め座標変換演算式を記憶しており、当該座標変換演算式を用いて補整変換を行っても良い。 The video processing unit 22 corrects the image data 680 in the r-η coordinate system by converting it into corrected image data 680 'in the XY plane coordinate system. At this time, the video processing unit 22 stores in advance a coordinate conversion processing table in which the center coordinates of the r-η coordinate system and the XY coordinate system are matched, and is based on the acquired r-η coordinates of each pixel. X-Y coordinates are calculated and correction conversion is performed. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and may perform correction conversion using the coordinate conversion calculation formula.
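As an illustration of what a coordinate conversion formula for this mode might look like, the sketch below maps a point on the table, given in X-Y coordinates centered directly under the lens, to r-η coordinates in the fisheye image. It assumes an equidistant fisheye looking straight down from a known height; the patent specifies neither, and `table_point_to_r_eta`, `height_mm`, and `pixels_per_radian` are hypothetical names.

```python
import math


def table_point_to_r_eta(x_mm, y_mm, height_mm, pixels_per_radian):
    """Return (r in pixels, eta in degrees) for a point on the table plane.

    The table point lies at angle atan(rho / height) from the lens axis,
    and an equidistant fisheye places it at a radius proportional to that
    angle. eta = 0 is the +X direction, matching the "right is 0 degrees"
    convention in the text; the sense of increasing eta is an assumption.
    """
    rho = math.hypot(x_mm, y_mm)
    angle_from_axis = math.atan2(rho, height_mm)
    eta = math.degrees(math.atan2(y_mm, x_mm))
    return pixels_per_radian * angle_from_axis, eta
```

Under this model, equal steps across the table map to progressively smaller steps in r toward the image edge, which is the compression that the correction conversion has to undo so that characters are not distorted.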

この変換により、ｒ−η座標系で取得した資料６５０の資料像６６０が、Ｘ−Ｙ座標系（平面座標系）の補整資料像６７０に変換される。このようにＸ−Ｙ座標系に変換することで、補整資料像６７０は、資料６５０の自然体像に近いものとなる。すなわち、文字が歪んでいない資料６５０の画像データを取得することができる。 By this conversion, the material image 660 of the material 650 acquired in the r-η coordinate system is converted into a corrected material image 670 in the XY coordinate system (planar coordinate system). By converting into the XY coordinate system in this way, the corrected material image 670 becomes close to the natural body image of the material 650. That is, the image data of the material 650 in which the characters are not distorted can be acquired.

通信端末５は、映像処理部２２から取得した資料６５０の画像データを含む通信データを生成し、ネットワーク５００を介して相手先のビデオ会議装置に送信する。これにより、相手先のビデオ会議装置の周囲に在席する会議者には、鮮明で見やすい資料の画像を提供することができる。なお、この際、通信端末５は、収音音声データを放収音装置１から取得していれば、資料６５０の画像データとともに収音音声データを含む通信データを生成し、送信するようにしてもよい。 The communication terminal 5 generates communication data containing the image data of the material 650 acquired from the video processing unit 22 and transmits it to the destination video conference apparatus via the network 500. The conference participants seated around the destination video conference apparatus can thereby be given a clear, easy-to-read image of the material. If the communication terminal 5 has also acquired collected sound data from the sound emitting and collecting apparatus 1, it may generate and transmit communication data that contains the collected sound data together with the image data of the material 650.

以上のように、本実施形態の構成および処理を用いることで、会議者の映像と資料の画像とを、それぞれの仕様に適した状態で取得し、送信することができる。この際、カメラを水平方向と垂直下方向との二方向に可変させるだけで、会議者映像と資料画像とのそれぞれの仕様に応じた映像を容易に取得することができる。 As described above, by using the configuration and processing of the present embodiment, it is possible to acquire and transmit a conference person's video and material image in a state suitable for each specification. At this time, the video corresponding to the specifications of the conference video and the document image can be easily acquired by simply changing the camera in two directions, the horizontal direction and the vertical downward direction.

次に、第２の実施形態に係るビデオ会議装置について図を参照して説明する。 Next, a video conference apparatus according to the second embodiment will be described with reference to the drawings.
図８は、本実施形態のビデオ会議装置の内の放収音装置１とカメラ２と支持体７とからなる組み立て部材の外観図であり、（Ａ）は平面図、（Ｂ）は側面図である。 FIG. 8 is an external view of the assembly consisting of the sound emitting and collecting apparatus 1, the camera 2, and the support 7 in the video conference apparatus of this embodiment; (A) is a plan view and (B) is a side view.
図９は、本実施形態のビデオ会議装置を用いたビデオ会議装置の状況を示した図であり、（Ａ）は平面図、（Ｂ）は側面図である。なお、図８、図９では、放収音装置１、カメラ２に接続されるケーブル類については、図示を省略している。 FIG. 9 shows the video conference apparatus of this embodiment in use; (A) is a plan view and (B) is a side view. In FIGS. 8 and 9, the cables connected to the sound emitting and collecting apparatus 1 and the camera 2 are not shown.

図１０は、本実施形態のビデオ会議装置による映像データの生成を説明する図であり、（Ａ）は撮像データを示す図、（Ｂ）は撮像データの中心部の画像補整の概念図、（Ｃ）は撮像データの周囲部の画像補整の概念図である。 FIG. 10 illustrates the generation of video data by the video conference apparatus of this embodiment: (A) shows the imaging data, (B) is a conceptual diagram of image correction for the central part of the imaging data, and (C) is a conceptual diagram of image correction for the peripheral part of the imaging data.
本実施形態のビデオ会議装置は、放収音装置１および通信端末５の構成および処理は、第１の実施形態のビデオ会議装置と同じである。一方、本実施形態のビデオ会議装置は、カメラ２の設置構造すなわち支持体７の構造、およびカメラ２の映像処理部２２での映像処理方法が、第１の実施形態と異なり、スイッチ４が省略されたものである。 In the video conference apparatus of this embodiment, the configurations and processing of the sound emitting and collecting apparatus 1 and the communication terminal 5 are the same as in the video conference apparatus of the first embodiment. The apparatus differs from the first embodiment in the installation structure of the camera 2, that is, the structure of the support 7, and in the video processing method of the video processing unit 22 of the camera 2; in addition, the switch 4 is omitted.

図８に示すように、円板状の放収音装置１の周囲には、支持体７が配置されている。支持体７は、垂直方向に延びる四本の垂直支軸と、放収音装置１の上面から距離ｈ１の位置に配置された二本の水平支軸と、放収音装置１の上面から距離ｈ２（＞ｈ１）の位置に配置された四本の水平支軸とからなる。距離ｈ１に配置される二本の水平支軸は、放収音装置１を平面視した時の略中心の位置で交わる構造からなり、四本の垂直支軸により距離ｈ１に保持されている。距離ｈ２に配置される水平支軸は、平面視して略正方形となるように組まれ、四本の垂直支軸により距離ｈ２に保持されている。 As shown in FIG. 8, a support 7 is disposed around the disc-shaped sound emitting and collecting apparatus 1. The support 7 has four vertical support shafts extending in the vertical direction, two horizontal support shafts arranged at a distance h1 from the upper surface of the sound emitting and collecting device 1, and a distance from the upper surface of the sound emitting and collecting device 1. It consists of four horizontal spindles arranged at the position of h2 (> h1). The two horizontal support shafts arranged at the distance h1 have a structure that intersects at a substantially central position when the sound emitting and collecting apparatus 1 is viewed in plan, and are held at the distance h1 by the four vertical support shafts. The horizontal support shafts arranged at the distance h2 are assembled so as to be substantially square in a plan view, and are held at the distance h2 by four vertical support shafts.

カメラ２は、距離ｈ１にある二本の水平支軸の交点に設置されている。カメラ２は、撮像方向が垂直上向きになるように設置されている。 The camera 2 is installed at the intersection of two horizontal spindles at a distance h1. The camera 2 is installed so that the imaging direction is vertically upward.

載置テーブル８は、距離ｈ２にある四本の水平支軸により支持されており、載置テーブル８は、透過性の高いガラスやアクリル板等により形成されている。この際、平面視した状態で、載置テーブル８の中心とカメラ２の魚眼レンズの軸とが略一致するように、載置テーブル８とカメラ２が設置される。 The mounting table 8 is supported by four horizontal support shafts at a distance h2, and the mounting table 8 is formed of a highly transmissive glass, an acrylic plate, or the like. At this time, the mounting table 8 and the camera 2 are installed so that the center of the mounting table 8 and the axis of the fisheye lens of the camera 2 substantially coincide with each other in a plan view.

載置テーブル８の上には、資料６５０が、印刷面を垂直下方向すなわち載置テーブル８に接する向きで置かれる。 On the mounting table 8, the material 650 is placed with the printing surface in a vertically downward direction, that is, in a direction in contact with the mounting table 8.

ここで、カメラ２の高さおよび載置テーブル８の高さ、すなわち、距離ｈ１，ｈ２は、図９に示すように、会議者６０１〜６０４の少なくとも顔が、カメラ２で撮影可能で、且つ、載置テーブル８を支持する水平支軸で隠れないように設定するとよい。 Here, the height of the camera 2 and the height of the mounting table 8, that is, the distances h1 and h2, are preferably set so that, as shown in FIG. 9, at least the faces of the conference participants 601 to 604 can be captured by the camera 2 and are not hidden by the horizontal support shafts that support the mounting table 8.

このような構成のビデオ会議装置を用いた場合、カメラ２の撮像部２１で取得される撮像データは、図１０（Ａ）のようになる。すなわち、撮像データは、魚眼レンズを通して撮像されたものであるので、全撮像領域が円形の全領域画像データ６１０となり、その中心に資料６５０の資料像６６０が映され、その周辺部に各会議者６０１〜６０４の人物像６４１〜６４４が映される。 When a video conference apparatus with this configuration is used, the imaging data acquired by the imaging unit 21 of the camera 2 is as shown in FIG. 10(A). Because the imaging data is captured through the fisheye lens, the entire imaging area forms circular full-area image data 610, with the material image 660 of the material 650 at its center and the person images 641 to 644 of the conference participants 601 to 604 in its peripheral part.

映像処理部２２は、円形の撮像データに対して、撮像データの中心を原点とし、原点から放射方向に延びる距離ｒと、所定方向（図１０では原点から撮像データに向かって右方向を０°方向）に対する角度ηとで表されるｒ−η座標系で取得する。映像処理部２２は、取得した撮像データから、予め設定された範囲の画像データ６８１を切り出す。 The video processing unit 22 handles the circular imaging data in an r-η coordinate system whose origin is the center of the imaging data, where r is the distance from the origin in the radial direction and η is the angle measured from a predetermined direction (in FIG. 10, the direction to the right of the origin, as one faces the imaging data, is the 0° direction). The video processing unit 22 cuts out image data 681 of a preset range from the acquired imaging data.

映像処理部２２は、ｒ−η座標系の画像データ６８１をＸ−Ｙ平面座標系の補整画像データ６８１’に変換することで補整する。この際、映像処理部２２は、ｒ−η座標系とＸ−Ｙ座標系との中心座標を一致させた座標変換処理テーブルを予め記憶しており、取得した各画素のｒ−η座標に基づいてＸ−Ｙ座標を算出し、補整変換する。なお、映像処理部２２は、予め座標変換演算式を記憶しており、当該座標変換演算式を用いて補整変換を行っても良い。 The video processing unit 22 corrects the image data 681 in the r-η coordinate system by converting it into corrected image data 681 'in the XY plane coordinate system. At this time, the video processing unit 22 stores in advance a coordinate conversion processing table in which the center coordinates of the r-η coordinate system and the XY coordinate system are matched, and is based on the acquired r-η coordinates of each pixel. X-Y coordinates are calculated and correction conversion is performed. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and may perform correction conversion using the coordinate conversion calculation formula.

この変換により、図１０（Ｂ）に示すように、ｒ−η座標系で取得した資料６５０の資料像６６０が、Ｘ−Ｙ座標系（平面座標系）の補整資料像６７０に変換される。このようにＸ−Ｙ座標系に変換することで、補整資料像６７０は、資料６５０の自然体像に近いものとなる。すなわち、文字が歪んでいない資料６５０の画像データを取得することができる。 By this conversion, as shown in FIG. 10B, the material image 660 of the material 650 acquired in the r-η coordinate system is converted into a corrected material image 670 in the XY coordinate system (planar coordinate system). By converting into the XY coordinate system in this way, the corrected material image 670 becomes close to the natural body image of the material 650. That is, the image data of the material 650 in which the characters are not distorted can be acquired.

また、映像処理部２２は、全領域画像データ６１０から中心付近の画像データ６８１を取り除いた周辺部画像データ６８２を取得する。映像処理部２２は、通信端末５を介して放収音装置１から取得した話者位置情報に基づいて、第１の実施形態と同様に、抽出する領域を設定する。すなわち、映像処理部２２は、発言中の会議者の像を含む領域を抽出し、部分画像データ６８３を取得する。この際、映像処理部２２は、部分画像データをｒ−η座標系で取得する。具体的には、図１０（Ｃ）に示すように、映像処理部２２は、話者方位情報に基づいて、該当する会議者の像を含む扇形状の四箇所の角部の座標を、（ｒ１０，η１０），（ｒ１０，η２０），（ｒ２０，η２０），（ｒ２０，η１０）に設定して取得する。 The video processing unit 22 also acquires peripheral image data 682, obtained by removing the image data 681 near the center from the full-area image data 610. As in the first embodiment, the video processing unit 22 sets the region to be extracted based on the speaker position information acquired from the sound emitting and collecting apparatus 1 via the communication terminal 5. That is, it extracts a region containing the image of the conference participant who is speaking, and acquires partial image data 683. The video processing unit 22 acquires the partial image data in the r-η coordinate system. Specifically, as shown in FIG. 10(C), based on the speaker orientation information, it sets the coordinates of the four corners of a fan-shaped region containing the image of the relevant conference participant to (r10, η10), (r10, η20), (r20, η20), (r20, η10) and acquires that region.
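The four corner coordinates can be derived from a speaker bearing as in the sketch below. It assumes the bearing has already been expressed as an angle η in the image and that the angular width and the two radii are fixed presets; `sector_corners` and its parameters are hypothetical names.

```python
def sector_corners(eta_center, eta_width, r_inner, r_outer):
    """Return the corners (r10, eta10), (r10, eta20), (r20, eta20), (r20, eta10).

    Angles are in degrees and wrapped to [0, 360); r_inner and r_outer
    play the roles of r10 and r20 in the text.
    """
    eta10 = (eta_center - eta_width / 2.0) % 360.0
    eta20 = (eta_center + eta_width / 2.0) % 360.0
    return [(r_inner, eta10), (r_inner, eta20),
            (r_outer, eta20), (r_outer, eta10)]
```

Note that after wrapping, `eta10` can be numerically larger than `eta20` when the sector straddles the 0° direction, so code that iterates over the sector should step from `eta10` by the width rather than compare the two values.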

映像処理部２２は、取得した部分画像データ６８３の補整変換を行う。具体的には、ｒ−η座標系で定義される各画素を、直交二次元の平面座標（Ｘ−Ｙ座標系）の画素に当てはめるように補整変換する。この際、映像処理部２２は、ｒ−η座標系とＸ−Ｙ座標系との変換処理テーブルを予め記憶しており、取得した各画素のｒ−η座標に基づいて、Ｘ−Ｙ座標を算出し、補整変換する。なお、映像処理部２２は、予め座標変換演算式を記憶しており、当該座標変換演算式を用いて補整変換を行っても良い。 The video processing unit 22 performs correction conversion of the acquired partial image data 683. Specifically, it performs correction conversion so that each pixel defined in the r-η coordinate system is mapped to a pixel of orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the r-η coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the r-η coordinates of each acquired pixel, and performs the correction conversion. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
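A simple form of this sector-to-rectangle conversion is sketched below, assuming a linear mapping in which output columns step through η and output rows step through r. The orientation choices are assumptions: the top output row is taken from the inner radius, on the reasoning that with an upward-facing camera a participant's head lies closer to the image center than the torso, and η is taken to increase counterclockwise with image y pointing down. `unwrap_sector` is a hypothetical name, and the sketch does not handle any left-right mirroring.

```python
import math


def unwrap_sector(r_inner, r_outer, eta_lo_deg, eta_hi_deg,
                  width, height, cx, cy):
    """table[y][x] is the source pixel for output pixel (x, y).

    width and height must be at least 2. (cx, cy) is the center of the
    circular image, i.e. the origin of the r-eta coordinate system.
    """
    table = []
    for y in range(height):
        r = r_inner + (r_outer - r_inner) * y / (height - 1)
        row = []
        for x in range(width):
            eta_deg = eta_lo_deg + (eta_hi_deg - eta_lo_deg) * x / (width - 1)
            eta = math.radians(eta_deg)
            row.append((cx + r * math.cos(eta), cy - r * math.sin(eta)))
        table.append(row)
    return table
```

Resampling the source image at the positions in this table yields the rectangular region x10 to x20, y10 to y20; a real lens would additionally need a nonlinear radial term, which is omitted here.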

例えば、図１０（Ｃ）に示すように、映像処理部２２は、距離範囲ｒ１０〜ｒ２０、方位角範囲η１０〜η２０で設定される部分画像データ６８３を、平面座標系であり水平方向をＸ軸として垂直方向をＹ軸とするｘ１０〜ｘ２０，ｙ１０〜ｙ２０で設定される補整画像データ６８３’に変換する。この変換により、ｒ−η座標系で取得した会議者６０４の人物像６４４が、Ｘ−Ｙ座標系（平面座標系）の補整人物像６５４に変換される。このようにＸ−Ｙ座標系に変換することで、補整人物像６５４は、会議者６０４の自然体像に近いものとなる。 For example, as shown in FIG. 10C, the video processing unit 22 converts the partial image data 683 set in the distance range r10 to r20 and the azimuth angle range η10 to η20 into the plane coordinate system and the horizontal direction as the X axis. Is converted into corrected image data 683 ′ set by x10 to x20 and y10 to y20 with the vertical direction as the Y axis. By this conversion, the person image 644 of the conference person 604 acquired in the r-η coordinate system is converted into a corrected person image 654 in the XY coordinate system (planar coordinate system). By converting to the XY coordinate system in this way, the corrected person image 654 becomes close to the natural body image of the conference person 604.

The video processing unit 22 attaches time information to the corrected image data containing the corrected document image 670 and to the corrected image data containing the corrected person image 654, and outputs them to the communication terminal 5 as video data. This generation and output of corrected image data is performed continuously. When the received speaker orientation information θ changes, the output video data changes accordingly, with only the corrected image data containing the corrected person image being switched.

The communication terminal 5 generates communication data by associating the video data from the video processing unit 22, the collected sound data, and the speaker orientation information θ with one another, and transmits the communication data to the destination video conference apparatus via the network 500. As a result, the participants seated around the destination video conference apparatus can be provided with the document image together with a natural-looking video of the participant who is speaking and that participant's speech.
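
The association of video data, collected sound data, and speaker orientation information can be pictured as a simple record. The sketch below is only an illustration: the patent does not define a data format, so the class names, field names, and the use of raw bytes for images and audio are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoData:
    """Corrected images with the time information attached (hypothetical layout)."""
    timestamp: float
    document_image: bytes  # corrected document image
    person_image: bytes    # corrected image of the current speaker


@dataclass(frozen=True)
class CommunicationData:
    """Video, collected sound, and speaker azimuth kept together."""
    video: VideoData
    audio: bytes
    speaker_azimuth_deg: float


def make_communication_data(timestamp: float, document_image: bytes,
                            person_image: bytes, audio: bytes,
                            speaker_azimuth_deg: float) -> CommunicationData:
    """Bundle one time slice of data for transmission."""
    video = VideoData(timestamp, document_image, person_image)
    return CommunicationData(video, audio, speaker_azimuth_deg)
```

Keeping the azimuth alongside the media lets the receiving side know which direction the speaker image was cut from, without having to infer it from the image itself.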

Thus, by using the configuration and processing of this embodiment, a video conference apparatus that simultaneously acquires and transmits video of the speaking participant and a document image can be realized with a relatively simple structure.

This embodiment has shown an example in which the participant video and the document image are acquired and transmitted simultaneously. However, the document image need not be acquired continuously; it may be acquired only at certain times and transmitted only at those times. Because the document image does not change except when the document is replaced, the information delivered to the other party is no less than when the document image is transmitted continuously. Meanwhile, while the document image is not being transmitted, the processing and network loads are reduced by the data volume of the document image, so processing and transmission can be performed faster. The document image may be acquired when an acquisition operation is entered from the operation unit after a new document is placed, or an image analysis unit may be provided so that acquisition is triggered when the captured image differs from the previous one.
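
The image-analysis trigger mentioned above could, for instance, compare each captured document image with the last one that was sent. The following sketch assumes a mean absolute pixel difference and a fixed threshold; neither the metric, the threshold value, nor the class name `DocumentSender` comes from the patent.

```python
from typing import List, Optional

Image = List[List[int]]  # image[row][column], one intensity per pixel


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute per-pixel difference between two same-sized images."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


class DocumentSender:
    """Decide when the document image should be acquired and sent again."""

    def __init__(self, threshold: float = 8.0) -> None:
        self.threshold = threshold            # hypothetical tuning value
        self.last_sent: Optional[Image] = None

    def should_send(self, current: Image) -> bool:
        changed = (self.last_sent is None or
                   mean_abs_diff(self.last_sent, current) > self.threshold)
        if changed:
            # Remember a copy of what was sent for the next comparison.
            self.last_sent = [row[:] for row in current]
        return changed
```

With this approach the document image is transmitted once when it first appears and again only after it changes noticeably, which matches the load reduction described in the text. A threshold above zero is used so that sensor noise alone does not trigger a new transmission.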

In each of the embodiments described above, the video processing unit is provided inside the camera, but it may instead be implemented as a device independent of the camera, or be built into the sound emitting and collecting device or the communication terminal. This simplifies the camera structure, so a general-purpose video camera can be used as long as it has a lens capable of capturing the required area described above.

In the above description, the communication terminal is provided separately from the sound emitting and collecting device, but the functions of the communication terminal may instead be built into the sound emitting and collecting device. This reduces the number of components of the video conference apparatus, so a simpler and more compact video conference apparatus can be realized.

FIG. 1 is an external view of the video conference apparatus of the first embodiment in the conference participant shooting mode.
FIG. 2 is an external view of the video conference apparatus of the first embodiment in the document shooting mode.
FIG. 3 is a block diagram showing the main configuration of the video conference apparatus of the first embodiment.
FIG. 4 is a diagram showing a situation (conference participant shooting mode) in which the video conference apparatus of the first embodiment is installed and a video conference is held with another site connected via a network.
FIG. 5 is an explanatory diagram used to describe video data generation in the conference participant shooting mode.
FIG. 6 is a diagram showing a situation (document shooting mode) in which the video conference apparatus of the first embodiment is installed and a video conference is held with another site connected via a network.
FIG. 7 is an explanatory diagram used to describe video data generation in the document shooting mode.
FIG. 8 is an external view of the assembly consisting of the sound emitting and collecting device 1, the camera 2, and the support 7 in the video conference apparatus of the second embodiment.
FIG. 9 is a diagram showing a situation in which the video conference apparatus of the second embodiment is in use.
FIG. 10 is a diagram explaining the generation of video data by the video conference apparatus of the second embodiment.

Explanation of symbols

1 - sound emitting and collecting device, 2 - camera, 3 - stay, 4 - switch, 5 - communication terminal, 6 - display, 7 - support, 8 - mounting table, 11 - housing, 12 - leg, 21 - imaging unit, 22 - video processing unit, 31 - main body, 32 - camera support, 33 - main body support, 34 - sound emitting and collecting device mounting portion, 102 - input/output I/F, 103 - sound emission control unit, 105 - A/D-AMP, 106 - sound collection control unit, 107 - echo cancellation unit, 110 - recess, 111 - operation unit, 203 - hinge, 500 - network, 601 to 605 - conference participants, 610 - entire-region image data, 611, 615 - person images, 621 - corrected image data, 622 - corrected image data, 631, 635 - corrected person images, 641 to 644 - person images, 650 - document, 654 - corrected person image, 660 - document image, 670 - corrected document image, 680, 681 - corrected image data, 682 - peripheral image data, 683 - partial image data, 700 - table

Claims

1. A video conference apparatus comprising:
a camera that images a predetermined area;
video data generating means for generating video data based on video captured by the camera;
sound collecting means for collecting sound around the apparatus and generating collected sound data;
sound emitting means for emitting sound based on sound emission data;
a housing provided with the sound collecting means and the sound emitting means;
communication means for forming communication data from the collected sound data and the video data and transmitting the communication data to the outside, and for obtaining sound emission data from communication data received from the outside and supplying it to the sound emitting means; and
support means for supporting the camera with respect to the housing,
wherein the camera simultaneously images a conference participant imaging area and an area close to the camera in the vicinity of the housing, and
the video data generating means
cuts out, from first partial video data corresponding to the conference participant imaging area, only the azimuth region corresponding to the collected sound data, and corrects the cut-out first partial video data by a first correction process, and
corrects second partial video data corresponding to the area close to the camera by a second correction process different from the first correction process.

2. The video conference apparatus according to claim 1, further comprising selection means for selecting the partial video data used for the communication data,
wherein the video data generating means provides the communication means with the partial video data selected by the selection means.

3. The video conference apparatus according to claim 1, wherein the camera has a fisheye lens, a central area of the area imaged by the fisheye lens is the area close to the camera, and at least a peripheral area outside the central area is the conference participant imaging area.

4. The video conference apparatus according to any one of claims 1 to 3, wherein the video data generating means is formed integrally with the camera.

5. The video conference apparatus according to any one of claims 1 to 3, wherein the video data generating means is formed integrally in the housing together with the sound emitting means and the sound collecting means.

6. The video conference apparatus according to any one of claims 1 to 5, wherein the communication means is formed integrally in the housing together with the sound emitting means and the sound collecting means.

7. The video conference apparatus according to any one of claims 1 to 6, further comprising a display monitor that plays back video data,
wherein the communication means acquires the image data included in the communication data and supplies it to the display monitor.