WO2008075726A1 - Video conferencing device - Google Patents

Publication number
WO2008075726A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
sound
data
unit
video data
Application number
PCT/JP2007/074449
Other languages
French (fr)
Japanese (ja)
Inventor
Toshiyuki Hata
Takuya Tamaru
Original Assignee
Yamaha Corporation
Application filed by Yamaha Corporation
Publication of WO2008075726A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0007 Image acquisition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture

Definitions

  • the present invention relates to a video conferencing apparatus that communicates the video, images, and audio used when a video conference is held between conference rooms at separate locations.
  • conventionally, a video conference device as shown in Patent Document 1 is arranged at each point, and the conference participants sit around it to hold the conference.
  • in that device, each conference participant is equipped with a microphone with a radio wave generator, and radio waves are radiated from the microphone that picks up the highest level of sound.
  • the person photographing camera detects the direction of the speaker by receiving this radio wave, directs the camera toward the direction of the speaker, and captures an image centered on the speaker.
  • the video data and audio data are encoded and transmitted to the destination video conference apparatus.
  • Patent Document 1: JP-A-6-276514
  • an object of the present invention is to provide a video conferencing device that can transmit materials, along with audio and video, with a high degree of flexibility, and that transmits them accurately and clearly even when such materials are present.
  • the present invention relates to a video conferencing apparatus including: an imaging unit that captures a predetermined area; a video data generation unit that generates video data based on the video captured by the imaging unit; a housing including a sound emission and collection unit that generates collected sound data and emits sound based on sound emission data; a communication unit that generates communication data including the collected sound data and the video data, transmits the communication data to the outside, acquires sound emission data from external communication data, and gives it to the sound emission and collection unit; and a support unit that supports the imaging unit in a predetermined manner.
  • the support unit supports the imaging unit in either a first mode, in which the imaging unit is directed toward the conference participant imaging area around the housing, or a second mode, in which the imaging unit is directed toward an area close to the imaging unit in the vicinity of the housing.
  • in the first mode, the video data generation unit of this video conference apparatus cuts out from the video data only the azimuth area corresponding to the sound collection direction information of the collected sound data, and corrects the cut-out video data by a first correction process corresponding to the first mode.
  • in the second mode, the video data generation unit cuts out from the video data a predetermined area centered on the front direction of the imaging unit, and corrects the cut-out video data by a second correction process, different from the first correction process, corresponding to the second mode.
  • in this configuration, when the imaging unit is set to the first mode facing the conference participant imaging area, the video conference device cuts out only the video data in the sound collection direction and applies the first correction process so that it is easy to view. The video conference device then generates communication data from the video data and the collected sound data and transmits it to the counterpart device.
  • in the second mode, the imaging unit captures a document or the like placed in the close area near the housing.
  • the image, taken by the imaging unit from the front, is corrected by the second correction process so that it is easy to view.
  • the video conference device generates communication data including the video data and transmits it to the counterpart device.
  • in this way, the video is corrected by the first correction process or the second correction process, which are different correction processes suited to the respective modes.
  • as a result, the conference participant video and still images such as documents are each corrected according to their shooting conditions, so appropriately corrected participant video and document images can be transmitted to the destination device.
  • the support section of the video conference apparatus of the present invention is characterized by including a joint mechanism for switching between the first mode and the second mode, and forming a switch by the joint mechanism.
  • the video data generation unit of this video conference apparatus is characterized by detecting the selection between the first mode and the second mode based on the switch selection status by the joint mechanism.
  • in this configuration, operating the joint mechanism of the support unit switches the switch, which selects and sets either the first mode or the second mode.
  • the present invention also relates to a video conferencing apparatus comprising: an imaging unit that images a predetermined area; a video data generation unit that generates video data based on the video captured by the imaging unit; a housing including a sound emission and collection unit that collects sound around the device itself to generate collected sound data and emits sound based on sound emission data; a communication unit that generates communication data including the collected sound data and the video data, transmits the communication data to the outside, obtains sound emission data from communication data received from outside, and gives it to the sound emission and collection unit; and a support unit that supports the imaging unit with respect to the housing. In this video conference apparatus, the imaging unit simultaneously captures the conference participant imaging area and the area close to the imaging unit in the vicinity of the housing.
  • the video data generation unit cuts out only the azimuth area corresponding to the sound collection direction information of the collected sound data from the first partial video data, which corresponds to the conference participant imaging area, and corrects it by a third correction process.
  • the second partial video data, which corresponds to the area close to the imaging unit, is corrected by a fourth correction process different from the third correction process.
  • in this configuration, the first partial video data corresponding to the conference participant imaging area and the second partial video data corresponding to the area close to the imaging unit, where materials are placed, are acquired simultaneously by a single imaging unit. From the first partial video data, only the azimuth area corresponding to the collected sound data is cut out and appropriately corrected by the third correction process. The second partial video data is adjusted by the corresponding fourth correction process so that it is easy to view.
  • as a result, the conference video and still images such as documents are acquired at the same time, and each is adjusted according to its shooting conditions.
  • the video conference apparatus includes a selection unit that selects partial video data used for communication data.
  • the video data generation unit of the video conference apparatus gives the partial video data selected by the selection unit to the communication unit.
  • the imaging unit has a fisheye lens.
  • the central region of the area imaged by the fisheye lens is set as the area close to the imaging unit, and at least the peripheral region outside the central region is set as the conference participant imaging area.
  • a fisheye lens is used as a specific specification of the imaging unit. Then, an area corresponding to the center of the fisheye lens is set as an area close to the imaging unit, and correction is appropriately performed by correction processing according to this area.
  • in the conference participant imaging area, the peripheral region is mainly used, although the central region may also be used when the mode is changed. The video of the conference participant imaging area is therefore appropriately adjusted by correction processing suited to the region selected in each case. As a result, even though the image of the area close to the imaging unit and the image of the conference participant imaging area are both captured through the fisheye lens, each is appropriately corrected.
  • the video data generation unit of the video conference apparatus of the present invention is integrally formed with the imaging unit.
  • the communication unit of the video conference apparatus according to the present invention is integrally formed with the housing together with the sound emission and collection unit.
  • the video data generation unit of the video conference apparatus according to the present invention is integrally formed with the casing together with the sound emission and collection unit.
  • the video conference apparatus of the present invention further includes a display monitor for reproducing video data.
  • the communication unit of this video conference apparatus acquires video data included in the communication data and supplies it to the display monitor.
  • by arranging and connecting the video conference apparatus of the present invention at each point where the conference is held, conference video and materials can easily be shared between the parties.
  • with a simple operation of the imaging unit, the video of the speaker is corrected by correction processing suited to the speaker video, and the image of the material is corrected by correction processing suited to the material image. Both the speaker video and the material image can therefore be transmitted to the other device accurately and clearly. As a result, a video conference using this apparatus can be conducted more easily.
  • FIG. 1 is an external view of a video conferencing apparatus according to a first embodiment in a conference shooting mode.
  • FIG. 2 is an external view of the video conference apparatus according to the first embodiment in a document shooting mode.
  • FIG. 3 is a block diagram illustrating a main configuration of the video conference apparatus according to the first embodiment.
  • FIG. 4 is a diagram illustrating a situation (conference shooting mode) in which the video conference apparatus according to the first embodiment is arranged and a video conference is performed with another point connected to the network.
  • FIG. 5 is an explanatory diagram used for explaining video data generation in the conference shooting mode.
  • FIG. 6 is a diagram showing a situation (document shooting mode) in which the video conference apparatus according to the first embodiment is arranged and a video conference is performed with another point connected to the network.
  • FIG. 7 is an explanatory diagram used for explaining video data generation in the document shooting mode.
  • FIG. 8 is an external view of an assembly member including a sound emission and collection device 1, a camera 2, and a support 7 in a video conference device according to a second embodiment.
  • FIG. 9 is a diagram showing a usage situation of the video conference apparatus of the second embodiment.
  • FIG. 10 is a diagram for explaining generation of video data by the video conference apparatus according to the second embodiment.
  • FIGS. 1 and 2 are external views of the video conference apparatus of the present embodiment; in each, (B) is a side view.
  • FIGS. 1 and 2 show only the mechanically characteristic structure of the sound emission and collection device, the camera, and the stay; the communication terminal and the cables that electrically connect it to the sound emission and collection device and the camera are omitted.
  • Fig. 1 shows the mechanism state in the conference shooting mode
  • Fig. 2 shows the mechanism state in the document shooting mode.
  • FIG. 3 is a block diagram showing the main configuration of the video conference apparatus according to the present embodiment.
  • the video conference apparatus includes a sound emitting and collecting apparatus 1 that is disk-shaped in plan view, a camera 2 having an imaging function and a video data generating function, and a stay 3 that installs the camera 2 at a predetermined position relative to the sound emitting and collecting apparatus 1.
  • the sound emitting and collecting apparatus 1 and the camera 2 are electrically connected, and the video conferencing apparatus also includes a communication terminal 5 that is electrically connected to the sound emitting and collecting apparatus 1 and the camera 2.
  • the communication terminal 5 demodulates the communication data received from the communication terminal of the counterpart video conferencing apparatus connected via the network 500, acquires the sound signal for sound emission, the counterpart device ID, and the speaker orientation data, and gives them to the sound emitting and collecting apparatus 1 on its own side, to which it is connected by cable.
  • the communication terminal 5 generates communication data based on the collected sound signal and speaker position data received from the sound emitting and collecting apparatus 1 on its own side and the video data received from the camera 2.
  • the communication terminal 5 transmits the generated communication data to the communication terminal of the destination video conference device. The communication terminal 5 also relays the speaker position data between the sound emitting and collecting apparatus 1 and the camera 2 as the situation requires.
  • the sound emission and collection device 1 includes a disk-shaped housing 11.
  • the casing 11 is circular in plan view. In side view, the top surface and the bottom surface are smaller in area than the middle part in the vertical direction: the casing narrows from one point in the height direction toward the top surface, and narrows from that point toward the bottom surface. That is, it has inclined surfaces on the upper side and the lower side of that point.
  • a concave portion 110, smaller in area than the top surface and of a predetermined depth, is formed in the top surface of the casing 11 so that the center of the concave portion 110 coincides with the center of the top surface.
  • each of the microphones MC1 to MC16 is unidirectional.
  • each microphone is arranged so that, as viewed from above, it has strong directivity toward the center; that direction is the center of its directivity.
  • the number of microphones is not limited to this, and may be set as appropriate according to specifications.
  • each of the speakers SP1 to SP4 has strong directivity in the front direction of its sound emitting surface.
  • the speakers SP1 to SP4 are arranged on the lower side of the casing 11, and the microphones MC1 to MC16 are arranged and housed on the upper side of the casing 11.
  • as a result, the microphones MC1 to MC16 are unlikely to pick up wraparound sound from the speakers SP1 to SP4.
  • speaker position detection which will be described later, is less likely to be affected by wraparound speech, and the speaker position can be detected with higher accuracy.
  • the operation unit 111 is installed on an inclined surface on the upper side of the casing 11, and includes various operation buttons and a liquid crystal display panel (not shown).
  • the input/output I/F 102 (not shown in FIGS. 1 and 2) is installed on the lower inclined surface of the casing 11, at a position where the speakers SP1 to SP4 are not installed, and is equipped with a terminal capable of communicating various control data. Connecting the terminal of the input/output I/F 102 to the communication terminal with a cable or the like enables communication between the sound emission and collection device 1 and the communication terminal.
  • the sound emitting and collecting apparatus 1 has a functional configuration as shown in FIG. 3 in addition to such a structural configuration.
  • the control unit 101 performs general control of the sound emission and collection device 1, such as setting, sound collection, and sound emission, and controls each part of the sound emission and collection device 1 based on the operation instructions input through the operation unit 111.
  • the input/output I/F 102 outputs the sound emission audio signals S1 to S3 received from the communication terminal 5 to the channels CH1 to CH3, respectively.
  • the channel assignment may be set as appropriate according to the number of received sound signals for sound emission.
  • the input/output I/F 102 receives the counterpart device ID from the communication terminal 5 and assigns a channel CH to each counterpart device ID. For example, when there is one connected counterpart device, the audio data from that device is assigned to channel CH1 as sound emission audio signal S1. When there are two connected counterpart devices, the audio data from the two devices are individually assigned to channels CH1 and CH2 as sound emission audio signals S1 and S2, respectively.
  • similarly, when there are three connected counterpart devices, the audio data from the three devices are individually assigned to channels CH1, CH2, and CH3 as sound emission audio signals S1, S2, and S3, respectively.
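The channel assignment described above can be sketched as follows. This is only an illustration: the function name `assign_channels`, the device ID strings, and the choice to assign channels in connection order are assumptions, not details given in the source.

```python
def assign_channels(device_ids, max_channels=3):
    """Map each counterpart device ID to a (channel, signal) pair.

    Channels are handed out in the order the IDs are given (an assumption);
    the source only states that each ID gets its own channel.
    """
    if len(device_ids) > max_channels:
        raise ValueError("more counterpart devices than available channels")
    return {
        device_id: (f"CH{n}", f"S{n}")
        for n, device_id in enumerate(device_ids, start=1)
    }
```

With two hypothetical IDs, `assign_channels(["site-a", "site-b"])` pairs the first with CH1/S1 and the second with CH2/S2, matching the two-device case in the text.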
  • the channels CH1 to CH3 are connected to the sound emission control unit 103 via the echo cancellation unit 107.
  • the input/output I/F 102 extracts the speaker orientation data Py of the counterpart sound emission and collection device from the data supplied by the communication terminal 5 and provides it to the sound emission control unit 103 together with the channel information.
  • the sound emission control unit 103 generates speaker output signals SPD1 to SPD4 to be given to the speakers SP1 to SP4, based on the sound emission audio signals S1 to S3 and the speaker orientation information Py.
  • the D/A-AMP 104 converts each of the speaker output signals SPD1 to SPD4 from digital to analog, amplifies it with a constant amplification factor, and supplies it to the corresponding speaker SP1 to SP4.
  • the speakers SP1 to SP4 convert the given speaker output signals SPD1 to SPD4 into sound and emit it.
  • the sounds emitted from the speakers SP1 to SP4 have predetermined delay and amplitude relationships with one another, which can give the conference participants a sense of sound localization.
  • the microphones MC1 to MC16 collect external sound, such as the voices of the conference participants, and generate sound collection signals MS1 to MS16.
  • each A/D-AMP 105 amplifies the corresponding sound collection signal MS1 to MS16 with a predetermined amplification factor, converts it from analog to digital, and outputs it to the sound collection control unit 106.
  • the sound collection control unit 106 synthesizes the acquired sound collection signals MS1 to MS16 with different delay control patterns and amplitude patterns to generate sound collection beam signals, each with a different direction as the center of its directivity. For example, with the sound emitting and collecting apparatus 1 at the center, eight sound collection beam signals are generated that divide the full 360° circumference into eight, that is, whose directivity centers are shifted by 45° from one another.
  • the sound collection control unit 106 compares the amplitude levels of these sound collection beam signals, selects the sound collection beam signal MBS having the highest amplitude level, and outputs it to the echo cancellation unit 107.
  • the sound collection control unit 106 acquires the speaker orientation corresponding to the selected sound collection beam signal, generates the speaker orientation information Pm, and provides it to the input/output I/F 102.
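The beam forming and beam selection performed by the sound collection control unit 106 can be illustrated with a minimal delay-and-sum sketch. The function names, the use of integer sample delays, and the use of RMS as the "amplitude level" are assumptions made for illustration; the source does not specify how the delay and amplitude patterns are derived or how levels are compared.

```python
import math

def delay_and_sum(mic_signals, delays, gains):
    """Form one beam: delay each microphone signal by whole samples, scale, and sum."""
    length = len(mic_signals[0])
    beam = [0.0] * length
    for signal, delay, gain in zip(mic_signals, delays, gains):
        for n in range(delay, length):
            beam[n] += gain * signal[n - delay]
    return beam

def rms(signal):
    """Root-mean-square level, used here as the amplitude measure (an assumption)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def select_loudest_beam(beams):
    """Pick the beam with the highest level.

    Beam k is assumed to be centered on k * (360 / number of beams) degrees,
    so eight beams give 45-degree steps as in the text.
    Returns (direction_deg, beam_signal).
    """
    step = 360.0 / len(beams)
    index = max(range(len(beams)), key=lambda k: rms(beams[k]))
    return index * step, beams[index]
```

In this sketch the returned direction plays the role of the speaker orientation information Pm, and the returned signal plays the role of the selected beam signal MBS.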
  • the echo cancellation unit 107 consists of an adaptive filter, which generates a pseudo-regression sound signal based on the sound emission audio signals S1 to S3 for the input sound collection beam signal MBS, and a post processor, which subtracts the pseudo-regression sound signal from the sound collection beam signal MBS.
  • the echo cancellation unit subtracts the pseudo-regression sound signal from the sound collection beam signal MBS while successively optimizing the filter coefficients of the adaptive filter, thereby removing from the sound collection beam signal MBS the wraparound component that travels from the speakers SP1 to SP4 to the microphones MC1 to MC16.
  • the sound collection beam signal MBS from which the wraparound component has been removed is output to the input/output I/F 102.
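As a rough illustration of this adaptive echo cancellation, the sketch below uses a normalized LMS (NLMS) update. The source names only "an adaptive filter", so NLMS is a swapped-in, commonly used choice rather than the method of the patent; the single reference signal (instead of S1 to S3), the tap count, the step size, and the simulated test data are likewise assumptions.

```python
import random

def nlms_echo_cancel(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Return the mic signal minus an adaptively estimated echo of the reference."""
    weights = [0.0] * taps   # adaptive filter coefficients
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Estimated wraparound (pseudo-regression) signal.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtraction step, corresponding to the post processor.
        error = d - echo_estimate
        # Normalized LMS coefficient update.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual

def simulated_echo(n, gain=0.5, delay=1, seed=0):
    """Hypothetical test data: a white-noise reference and its delayed, scaled echo."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.0] * delay + [gain * x for x in reference[:n - delay]]
    return reference, mic
```

Feeding `simulated_echo(2000)` into `nlms_echo_cancel` should leave a residual that shrinks toward zero as the coefficients adapt, since the simulated microphone signal contains nothing but echo.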
  • the input/output I/F 102 associates the sound collection beam signal MBS, from which the wraparound sound has been removed by the echo cancellation unit 107, with the speaker orientation information Pm from the sound collection control unit 106, and outputs them to the communication terminal 5.
  • the camera 2 is installed at a fixed position relative to the sound emitting and collecting apparatus 1 by the stay 3, as shown in the figures. The stay 3 holds the camera 2 so that it can rotate between a horizontal direction (the direction the camera 2 faces in FIG. 1) and a vertically downward direction (the direction the camera 2 faces in FIG. 2).
  • the stay 3 includes a main body part 31, a camera support part 32, a main body support part 33, and a sound emitting and collecting device attachment part 34.
  • the main body 31 is formed of a linear member having a predetermined width, and is installed in a shape extending in a direction of a predetermined angle with respect to the vertical direction by the main body support 33.
  • a camera support portion 32 is installed at one end in the extending direction of the main body portion 31 via a hinge 203, and a sound emission and collection device mounting portion 34 is installed at the other end.
  • the sound emitting and collecting device mounting portion 34 is formed of a flat plate having an opening portion into which the leg portion 12 of the housing 11 is fitted, and is integrally formed with the main body portion 31, for example.
  • the end of the main body portion 31 on the camera support portion 32 side has a shape in which only both end walls in the width direction remain and the center portion in the width direction is open.
  • the opening is shaped so that the camera 2 installed in the camera support portion 32 does not contact the main body portion 31 when it rotates between the horizontal direction and the vertically downward direction.
  • the hinge 203 has a structure in which the camera support portion 32 is rotatably installed with respect to the main body portion 31. Further, the hinge 203 and the camera support portion 32 have a structure that is semi-fixed when the camera 2 and the camera support portion 32 face in the horizontal direction and when they face in the vertical downward direction. For example, the hinge 203 is fixed to the main body 31, and the recesses are formed at the horizontal position and the vertically downward position of the hinge 203, respectively.
  • the hinge-side end of the camera support portion 32 is provided with a protrusion that fits into these recesses, and the protrusion is biased from within the camera support portion 32 by a spring or the like. As a result, the camera 2 can rotate between the horizontal direction and the vertically downward direction, and is mechanically held in place in each of those two directions.
  • the mechanism unit including the hinge 203 and the camera support unit 32 functions as the switch 4.
  • an electrical connection or detection signal is set so that different signals are obtained at the horizontal recess and at the vertically downward recess.
  • the switch 4 is formed in this way, and the detection result of the switch 4 is given to the camera 2.
  • from this result, the camera 2 can identify whether it is capturing video while facing in the horizontal direction or while facing vertically downward.
  • the camera 2 includes an imaging unit 21 and a video processing unit 22.
  • the imaging unit 21 includes a fisheye lens and images, in all directions around the front direction of the camera 2, the area extending from infinity up to the installation plane of the fisheye lens.
  • the imaging data is given to the video processing unit 22.
  • the video processing unit 22 acquires the direction in which the camera 2 is facing (hereinafter referred to as the shooting direction) from the switch 4 (hinge 203 and camera support unit 32) of the stay 3. Based on the acquired shooting direction and the speaker orientation data Pm supplied from the sound emission and collection device 1 via the communication terminal 5, the video processing unit 22 extracts only the necessary part of the imaging data and corrects the image, thereby generating video data. The generated video data is given to the communication terminal 5.
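How the switch state and the speaker orientation together determine the region to extract can be sketched as below. Everything concrete here is hypothetical: the source says only that the switch yields different signals for the two camera positions, so the signal values 0 and 1, the `ShootingMode` names, and the 30° half-width of the cut-out are illustrative assumptions.

```python
from enum import Enum

class ShootingMode(Enum):
    CONFERENCE = "conference"  # camera facing horizontally
    DOCUMENT = "document"      # camera facing vertically downward

# Hypothetical detection signals from the switch; the real values are not given.
SWITCH_TO_MODE = {0: ShootingMode.CONFERENCE, 1: ShootingMode.DOCUMENT}

def select_extraction(switch_signal, speaker_azimuth_deg=None, half_width_deg=30.0):
    """Return (mode, azimuth_range) for the part of the imaging data to cut out.

    In document mode a fixed area centered on the camera's front direction is
    used, so no azimuth range is returned. In conference mode the range is
    centered on the speaker direction.
    """
    mode = SWITCH_TO_MODE[switch_signal]
    if mode is ShootingMode.DOCUMENT:
        return mode, None
    if speaker_azimuth_deg is None:
        raise ValueError("conference mode needs a speaker azimuth")
    start = (speaker_azimuth_deg - half_width_deg) % 360.0
    end = (speaker_azimuth_deg + half_width_deg) % 360.0
    return mode, (start, end)
```

The modulo keeps the range on the 0° to 360° circle, so a speaker near 0° produces a range that wraps around, for example 340° to 40°.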
  • the following description covers the case of five conference participants on the own-device side, but the number of conference participants is not limited to this.
  • FIG. 4 is a diagram showing a situation in which the video conference apparatus according to the present embodiment is arranged and a video conference is performed with another point connected to the network, in the case where the camera 2 captures the conference participants 601 to 605.
  • FIG. 5 is an explanatory diagram used to explain video data generation.
  • (A) shows the video (image) taken through the fisheye lens
  • (B) and (C) show the concept of image correction for each conference participant direction.
  • FIG. 6 is a diagram illustrating a situation where the video conference apparatus according to the present embodiment is arranged and a video conference is performed with another point connected to the network, and the case where the camera 2 captures the document 650 is illustrated.
  • Fig. 7 is an explanatory diagram used to explain the video data generation.
  • (A) shows the video (image) taken through the fisheye lens
  • (B) shows the concept of image correction during image capture.
  • the conference participants 601 to 605 are seated at the oval table 700 at positions other than one end in the longitudinal direction.
  • an integrated member of a circular sound emission and collection device 1 and a camera 2 fixed to the same by a stay 3 is installed on the table 700.
  • the camera 2 is installed so that, when it is oriented horizontally, the central axis of the fisheye lens coincides with the axis parallel to the longitudinal direction of the table 700.
  • a communication terminal 5 is installed under the table 700.
  • the communication terminal 5 is electrically connected to the sound emission and collection device 1 and the camera 2 and is connected to the network 500.
  • the communication terminal 5 is electrically connected to the display 6.
  • the display 6 is composed of, for example, a liquid crystal display, and is installed near the end of the table 700 where the conference participants 601 to 605 are not seated. The display 6 is installed so that its display surface faces the table 700.
  • the video conference device including the sound emission and collection device 1, the camera 2, and the communication terminal 5 transmits the conference video to the destination video conference device in one of two modes.
  • the video processing unit 22 of the camera 2 detects from the detection signal of the switch 4 that the conference shooting mode has been selected.
  • when the video processing unit 22 detects the conference shooting mode, it provides the communication terminal 5 with a selection signal for that mode.
  • the imaging unit 21 of the camera 2 acquires imaging data in which all the conference participants 601 to 605 present on the own-device side are imaged through the fisheye lens, and outputs the acquired imaging data to the video processing unit 22.
  • the imaging area becomes circular as shown in Fig. 5 (A).
  • the sound emission and collection device 1 collects the voice of the conference participant who is speaking by the above-described processing, detects the direction of that participant, and gives the collected sound data and the speaker orientation information Θ to the communication terminal 5.
  • for the conference participant 601, the sound emission and collection device 1 detects the direction Θ1 and gives the collected sound data based on the voice from the direction of the conference participant 601, together with the speaker orientation information Θ1, to the communication terminal 5.
  • for the conference participant 605, the sound emission and collection device 1 detects the direction Θ2 and gives the collected sound data based on the voice from the direction of the conference participant 605, together with the speaker orientation information Θ2, to the communication terminal 5.
  • the communication terminal 5 gives the speaker orientation information Θ to the video processing unit 22 of the camera 2.
  • the video processing unit 22 corrects the imaging data based on the speaker orientation information Θ from the communication terminal 5.
  • the video processing unit 22 stores in advance the relationship between the speaker orientation information Θ and the azimuth angle θ set in the imaging data.
  • the video processing unit 22 reads the corresponding azimuth angle θ.
  • the video processing unit 22 receives the speaker orientation information Θ1 for the conference participant 601
  • the video processing unit 22 performs image correction conversion for each acquired image extraction area. Specifically, each pixel defined by two angular directions, the θ direction and the φ direction, is corrected and converted so that it maps to a pixel in orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the θ-φ coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the acquired θ-φ coordinates of each pixel, and performs the correction conversion. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
  • the video processing unit 22 converts the image data 621, set by the azimuth angle range θ1 to θ2 and the elevation angle range φ1 to φ2, into corrected image data 621' set by x1 to x2 and y1 to y2 in a plane coordinate system with the horizontal direction as the X axis and the vertical direction as the Y axis.
  • the person image 611 of the conference participant 601 acquired in the θ-φ coordinate system is converted into a corrected person image 631 in the X-Y coordinate system (plane coordinate system).
  • the corrected person image 631 thereby comes close to the natural appearance of the conference participant 601.
  • the video processing unit 22 likewise converts the image data 622, set by the azimuth angle range θ3 to θ4 and the elevation angle range φ3 to φ4, into corrected image data 622' set by x3 to x4 and y3 to y4 in a plane coordinate system with the horizontal direction as the X axis and the vertical direction as the Y axis.
  • the person image 615 of the conference participant 605 acquired in the θ-φ coordinate system is converted into a corrected person image 635 in the X-Y coordinate system (plane coordinate system).
  • the corrected person image 635 thereby comes close to the natural appearance of the conference participant 605.
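The conversion from azimuth and elevation angles to plane coordinates can be illustrated with a short sketch. The text does not state the conversion formula, so the example below assumes a rectilinear (perspective) projection centered on the extraction window. The function names and the `focal_px` scale parameter are hypothetical and are not taken from the source.

```python
import math

def angles_to_plane(azimuth, elevation, focal_px):
    """Project a viewing direction (radians, relative to the window center)
    onto a plane. Assumed rectilinear model, valid for |azimuth| < 90 degrees."""
    x = focal_px * math.tan(azimuth)
    y = focal_px * math.tan(elevation) / math.cos(azimuth)
    return x, y

def plane_to_angles(x, y, focal_px):
    """Inverse mapping: plane coordinates back to (azimuth, elevation)."""
    azimuth = math.atan2(x, focal_px)
    elevation = math.atan2(y * math.cos(azimuth), focal_px)
    return azimuth, elevation

def build_conversion_table(width, height, focal_px):
    """Precompute (azimuth, elevation) for every pixel of a width x height
    output image, in the spirit of a stored conversion table."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return {
        (px, py): plane_to_angles(px - cx, cy - py, focal_px)  # rows grow downward
        for py in range(height)
        for px in range(width)
    }
```

A table of this kind is computed once and reused for every frame, whereas the formula variant recomputes the mapping per pixel; which of the two the device would favor is not stated in the text.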
  • the video processing unit 22 attaches time information to the corrected image data containing the corrected person image obtained in this way, and outputs it to the communication terminal 5 as video data. The corrected image data is generated and output sequentially. If the received speaker orientation information Θ changes, the center direction of the corrected image data is switched accordingly.
  • the communication terminal 5 generates communication data by associating the video data from the video processing unit 22 with the collected sound data and the speaker orientation information Θ, and transmits it to the destination video conference device via the network 500.
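The flow described above, from the reported speaker direction to the associated communication data, might be sketched as follows. The table values, the nearest-entry lookup, and all names (`WINDOW_TABLE`, `window_for_speaker`, `CommunicationData`) are illustrative assumptions; the source only says that a relationship is stored in advance and that the three items are associated with each other.

```python
import time
from dataclasses import dataclass

# Assumed table: speaker direction in degrees ->
# ((azimuth range), (elevation range)) of the area to extract. Values are made up.
WINDOW_TABLE = {
    45.0: ((30.0, 60.0), (-10.0, 30.0)),
    135.0: ((120.0, 150.0), (-10.0, 30.0)),
}

def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def window_for_speaker(direction_deg):
    """Return the stored window whose key is closest to the reported direction."""
    key = min(WINDOW_TABLE, key=lambda k: angular_distance(k, direction_deg))
    return WINDOW_TABLE[key]

@dataclass
class CommunicationData:
    timestamp: float              # time information attached to the video data
    speaker_direction_deg: float  # speaker orientation information
    video: bytes                  # corrected image data
    audio: bytes                  # collected sound data

def build_communication_data(video, audio, direction_deg, timestamp=None):
    """Associate video, audio, and speaker direction in one record."""
    ts = time.time() if timestamp is None else timestamp
    return CommunicationData(ts, direction_deg, video, audio)
```

Keeping the speaker direction in the record alongside the audio and video lets the receiving side use it as well, although how the destination device uses it is outside this sketch.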
  • the video processing unit 22 of the camera 2 detects, from the detection signal of the switch 4, that the document shooting mode has been selected.
  • when the video processing unit 22 detects the document shooting mode, it gives a selection signal for that mode to the communication terminal 5.
  • one of the conference participants 601 to 605 places the material 650 on the table 700 around the position vertically below the hinge 203. If a material placement mark is provided on the table 700 in advance, the material 650 can be placed easily and appropriately.
  • the imaging unit 21 of the camera 2 acquires imaging data obtained by imaging the material 650 placed on the table 700 through the fisheye lens, and outputs it to the video processing unit 22.
  • because the imaging data is captured through the fisheye lens, the imaging area is circular, as shown in the figure.
  • the video processing unit 22 acquires the imaging data in an r-η coordinate system whose origin is the center of the imaging data, expressed by the distance r extending radially from the origin and the angle η measured from a predetermined direction (in FIG. 7, the rightward direction from the origin as the imaging data is viewed, taken as the 0° direction).
  • the video processing unit 22 cuts out image data 680 in a preset range from the acquired imaging data.
  • the video processing unit 22 corrects the image data 680 in the r-η coordinate system by converting it into corrected image data 680' in the X-Y plane coordinate system. For this purpose, the video processing unit 22 stores in advance a coordinate conversion table in which the center coordinates of the r-η coordinate system and the X-Y coordinate system coincide, calculates the X-Y coordinates from the acquired r-η coordinates of each pixel, and performs the correction conversion. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
  • the material image 660 of the material 650 acquired in the r-η coordinate system is converted into a corrected material image 670 in the X-Y coordinate system (plane coordinate system).
  • the corrected material image 670 thereby comes close to the natural appearance of the material 650. That is, the image data of the material 650 can be acquired.
  • the communication terminal 5 generates communication data containing the image data of the material 650 acquired from the video processing unit 22 and transmits it to the destination video conference device via the network 500. This provides clear, easy-to-see material images to the conference participants around the destination video conference device. If collected sound data is acquired from the sound emission and collection device 1 at this time, the communication terminal 5 may generate and transmit communication data containing the collected sound data together with the image data of the material 650.
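A coordinate conversion table for the document shooting mode could look like the sketch below. It assumes an equidistant fisheye projection (image radius proportional to the angle off the optical axis) and a camera looking straight down at the table from a known height; neither assumption comes from the source, and `focal_px`, `mm_per_pixel`, and `camera_height_mm` are hypothetical parameters.

```python
import math

def build_table(width, height, mm_per_pixel, camera_height_mm, focal_px):
    """Map every output (x, y) pixel to a source position (r, eta) in the
    polar system whose origin is the center of the imaging data."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    table = {}
    for y in range(height):
        for x in range(width):
            # Offsets on the table plane, centered on the optical axis.
            plane_x = (x - cx) * mm_per_pixel
            plane_y = (cy - y) * mm_per_pixel  # image rows grow downward
            rho = math.hypot(plane_x, plane_y)
            theta = math.atan2(rho, camera_height_mm)  # angle off the optical axis
            r = focal_px * theta                  # assumed equidistant projection
            eta = math.atan2(plane_y, plane_x)    # 0 rad points to the right
            table[(x, y)] = (r, eta)
    return table

def correct(source, table, width, height):
    """Resample a 2-D list `source` into a width x height corrected image
    using nearest-neighbor lookup through the precomputed table."""
    src_h, src_w = len(source), len(source[0])
    src_cx, src_cy = (src_w - 1) / 2.0, (src_h - 1) / 2.0
    out = [[0] * width for _ in range(height)]
    for (x, y), (r, eta) in table.items():
        sx = int(round(src_cx + r * math.cos(eta)))
        sy = int(round(src_cy - r * math.sin(eta)))
        if 0 <= sx < src_w and 0 <= sy < src_h:
            out[y][x] = source[sy][sx]
    return out
```

Because the centers of the two coordinate systems coincide, the center pixel of the output always maps to the center of the imaging data, which matches the table property the text describes.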
  • FIG. 8 is an external view of an assembly including the sound emission and collection device 1, the camera 2, and the support 7 in the video conference device of the present embodiment; (A) is a plan view and (B) is a side view.
  • FIG. 9 is a diagram showing the video conference device of the present embodiment in use; (A) is a plan view and (B) is a side view. In FIGS. 8 and 9, the cables connected to the sound emission and collection device 1 and the camera 2 are not shown.
  • FIG. 10 is a diagram for explaining generation of video data by the video conference device of the present embodiment; (A) shows the imaging data, (B) is a conceptual diagram of image correction at the center of the imaging data, and (C) is a conceptual diagram of image correction at the periphery of the imaging data.
  • the configuration and processing of the sound emission and collection device 1 and the communication terminal 5 are the same as in the video conference device of the first embodiment.
  • the video conference device of the present embodiment differs from the first embodiment in the installation structure of the camera 2, that is, the structure of the support 7, and in the video processing method of the video processing unit 22 of the camera 2. The switch 4 is omitted.
  • a support 7 is disposed around the disc-shaped sound emission and collection device 1.
  • the support 7 consists of four vertical support shafts extending in the vertical direction, two horizontal support shafts arranged at a distance h1 from the top surface of the sound emission and collection device 1, and four horizontal support shafts arranged at a distance h2 (> h1) from the top surface of the sound emission and collection device 1.
  • the two horizontal support shafts at the distance h1 intersect at a substantially central position of the sound emission and collection device 1 in plan view, and are held at the distance h1 by the four vertical support shafts.
  • the four horizontal support shafts at the distance h2 are assembled into a substantially square shape in plan view, and are held at the distance h2 by the four vertical support shafts.
  • the camera 2 is installed at the intersection of the two horizontal support shafts at the distance h1, with its shooting direction vertically upward.
  • a mounting table 8 is supported by the four horizontal support shafts at the distance h2. The mounting table 8 is formed of highly transparent glass, an acrylic plate, or the like. The mounting table 8 and the camera 2 are installed so that the center of the mounting table 8 and the axis of the fisheye lens of the camera 2 substantially coincide in plan view.
  • the material 650 is placed with its printed surface facing vertically downward, that is, in contact with the mounting table 8.
  • the height of the camera 2 and the height of the mounting table 8, that is, the distances h1 and h2, should be set, as shown in the figure, so that the subject can be photographed without being hidden by the horizontal support shafts that support the mounting table 8.
  • when the video conference device having such a configuration is used, the imaging data acquired by the imaging unit 21 of the camera 2 is as shown in FIG. 10(A).
  • the entire imaging area forms circular whole-area image data 610, in which the material image 660 of the material 650 appears at the center and the person images 641 to 644 of the conference participants 601 to 604 appear in the surrounding area.
  • the video processing unit 22 acquires the imaging data in an r-η coordinate system whose origin is the center of the imaging data, expressed by the distance r extending radially from the origin and the angle η measured from a predetermined direction (in FIG. 10, the rightward direction from the origin as the imaging data is viewed, taken as the 0° direction). The video processing unit 22 cuts out image data 681 in a predetermined range from the acquired imaging data.
  • the video processing unit 22 corrects the image data 681 in the r-η coordinate system by converting it into corrected image data 681' in the X-Y plane coordinate system.
  • for this purpose, the video processing unit 22 stores in advance a coordinate conversion table in which the center coordinates of the r-η coordinate system and the X-Y coordinate system coincide, calculates the X-Y coordinates from the acquired r-η coordinates of each pixel, and performs the correction conversion.
  • alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
  • the material image 660 of the material 650 acquired in the r-η coordinate system is converted into a corrected material image 670 in the X-Y coordinate system (plane coordinate system).
  • by converting to the X-Y coordinate system in this way, the corrected material image 670 comes close to the natural appearance of the material 650. In other words, undistorted image data of the material 650 can be obtained.
  • the video processing unit 22 acquires peripheral image data 682 by removing the image data 681 near the center from the whole-area image data 610. Based on the speaker orientation information acquired from the sound emission and collection device 1 via the communication terminal 5, the video processing unit 22 sets the area to be extracted, as in the first embodiment. That is, the video processing unit 22 extracts the area containing the image of the conference participant who is speaking and acquires partial image data 683. Here the video processing unit 22 acquires the partial image data in the r-η coordinate system. Specifically, as shown in FIG. 10(C), the video processing unit 22 obtains, based on the speaker orientation information, the coordinates (r10, η10), (r10, η20), (r20, η20), (r20, η10) of the four corners of the fan-shaped area containing the image of the corresponding conference participant.
  • the video processing unit 22 performs correction conversion on the acquired partial image data 683. Specifically, each pixel defined in the r-η coordinate system is corrected and converted so that it maps to a pixel in orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the r-η coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the acquired r-η coordinates of each pixel, and performs the correction conversion. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction conversion using that formula.
  • the video processing unit 22 converts the partial image data 683, set by the distance range r10 to r20 and the azimuth angle range η10 to η20, into corrected image data 683' set by x10 to x20 and y10 to y20 in a plane coordinate system with the horizontal direction as the X axis and the vertical direction as the Y axis. By this conversion, the person image 644 of the conference participant 604 acquired in the r-η coordinate system is converted into a corrected person image 654 in the X-Y coordinate system (plane coordinate system). By converting to the X-Y coordinate system in this way, the corrected person image 654 comes close to the natural appearance of the conference participant 604.
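The conversion of a fan-shaped area into a rectangular image can be sketched as a simple polar unwrap. This is only one possible realization: it interpolates linearly in radius and angle and uses nearest-neighbor sampling, and it assumes that the inner arc becomes the top row of the output. The function names are hypothetical.

```python
import math

def sector_corners(r_in, r_out, eta_start, eta_end):
    """Four corners of the fan-shaped area, in the order used in the text."""
    return [(r_in, eta_start), (r_in, eta_end), (r_out, eta_end), (r_out, eta_start)]

def unwrap_sector(source, r_in, r_out, eta_start, eta_end, width, height):
    """Resample the fan-shaped area of `source` (a 2-D list) into a
    width x height rectangle. Angles are in radians, measured from the
    rightward direction around the center of the source image."""
    src_h, src_w = len(source), len(source[0])
    cx, cy = (src_w - 1) / 2.0, (src_h - 1) / 2.0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        # Row 0 samples the inner arc, the last row samples the outer arc (assumed).
        fy = y / (height - 1) if height > 1 else 0.0
        r = r_in + (r_out - r_in) * fy
        for x in range(width):
            fx = x / (width - 1) if width > 1 else 0.0
            eta = eta_start + (eta_end - eta_start) * fx
            sx = int(round(cx + r * math.cos(eta)))
            sy = int(round(cy - r * math.sin(eta)))
            if 0 <= sx < src_w and 0 <= sy < src_h:
                out[y][x] = source[sy][sx]
    return out
```

Whether the output needs to be mirrored or flipped depends on how the upward-facing camera sees the participants, which this sketch does not attempt to model.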
  • the video processing unit 22 attaches time information to the corrected image data containing the corrected material image 670 and to the corrected image data containing the corrected person image 654, and outputs them to the communication terminal 5 as video data. Such corrected image data is generated and output sequentially. If the received speaker orientation information Θ changes, only the corrected image data containing the corrected person image is switched accordingly, and the video data is output.
  • the communication terminal 5 generates communication data by associating the video data from the video processing unit 22 with the collected sound data and the speaker orientation information Θ, and transmits it to the destination video conference device via the network 500.
  • the processing and network load are reduced by the amount of data of the document image, so that processing and transmission can be performed at higher speed.
  • the material image may be acquired when an acquisition operation is input from an operation unit upon placement of new material. Alternatively, an image analysis unit may be provided, and the time at which a difference from the previous image is detected may be taken as the new acquisition timing.
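Detecting a new material image by comparison with the previous one might be sketched as follows. The mean absolute pixel difference and the threshold value are assumptions chosen for illustration; the source does not describe how the image analysis unit decides that the image has changed.

```python
def mean_abs_difference(prev, curr):
    """Mean absolute difference between two equally sized grayscale images
    given as 2-D lists of numbers."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0

class MaterialImageTracker:
    """Remembers the last transmitted material image and reports when a new
    acquisition (and transmission) is warranted."""

    def __init__(self, threshold=10.0):
        self.threshold = threshold  # hypothetical tuning value
        self.last_sent = None

    def should_send(self, image):
        changed = (
            self.last_sent is None
            or mean_abs_difference(self.last_sent, image) > self.threshold
        )
        if changed:
            self.last_sent = [row[:] for row in image]  # keep a private copy
        return changed
```

With this kind of check the material image is retransmitted only when it differs noticeably from the last one sent, which is in line with the stated aim of reducing processing and network load.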
  • although an example in which the video processing unit is provided in the camera has been shown, the video processing unit may be realized as a device independent of the camera, or may be provided in the sound emission and collection device or the communication terminal.
  • a general-purpose video camera can be used as long as it has a lens capable of shooting the necessary area described above.
  • the communication terminal is provided independently of the sound emission and collection device, but the function of the communication terminal may be provided in the sound emission and collection device.
  • the number of components of the video conference apparatus is reduced, so that a simpler and smaller video conference apparatus can be realized.
  • the present invention is based on a Japanese patent application filed on December 19, 2006 (Japanese Patent Application No. 2006-341175), the contents of which are incorporated herein by reference.

Abstract

Provided is a video conferencing device capable of accurately and clearly transmitting conference participants' audio and video together with associated materials. A camera (2) is held in position by a stay (3) relative to a disc-shaped sound emitting/collecting device (1). In this position, the camera (2) can be rotated and semi-fixed in either a horizontally oriented state or a vertically oriented state. The state of the camera (2) is detected using a switch. When the camera (2) is set in the horizontal direction, it images the conference participants and extracts and acquires video of the participant who is talking. When the camera (2) is set in the vertical direction, it images a material that has been set in place in advance. Because the camera (2) includes a fish-eye lens, the acquired video is corrected according to the respective state to generate the image data to be transmitted.

Description

Specification
Video conferencing device
Technical field
[0001] The present invention relates to a video conferencing device that communicates audio together with the video and images used when a video conference is held between conference rooms remote from each other.
Background art
[0002] Conventionally, when a video conference is held between multiple sites remote from each other, a video conferencing device (television conference device) such as that shown in Patent Document 1 is placed at each site, and the conference participants sit around the device to hold the conference.
[0003] In the video conferencing device of Patent Document 1, each conference participant wears a microphone with a radio wave generator, and radio waves are emitted from the microphone that picked up the highest-level sound. The person-photographing camera detects the speaker direction by receiving these radio waves, turns toward the speaker direction, and captures video centered on the speaker. The video data and audio data are encoded and transmitted to the destination video conferencing device.
Patent Document 1: JP-A-6-276514
Disclosure of the invention
Problems to be solved by the invention
[0004] When holding a video conference, participants may want to refer in common, between remote sites, not only to video of conference participants such as the speaker, as described above, but also to materials. The device of Patent Document 1 can switch between and acquire video of speakers, but as it stands it cannot show materials. To show a material using the configuration of Patent Document 1, a conference participant could hold the material by hand in front of the camera, but because the material cannot be held completely still, the image blurs. In addition, owing to the distortion of the lens, the material cannot be captured as it is (as the original image). As another way of referring to material in common, the material can be converted into data and transmitted, but this cannot provide intuitive, highly flexible material, such as material that is written on and explained during the conference.
[0005] Therefore, an object of the present invention is to provide a video conferencing device that can accurately and clearly transmit, along with audio and video, even highly flexible materials.
Means for solving the problem
[0006] The present invention relates to a video conferencing device comprising: an imaging unit that images a predetermined area; a video data generation unit that generates video data based on the video captured by the imaging unit; a housing including a sound emission and collection unit that collects sound around the device to generate collected sound data and emits emitted sound data; a communication unit that generates communication data containing the collected sound data and the video data and transmits the communication data to the outside, and that acquires emitted sound data from external communication data and gives it to the sound emission and collection unit; and a support unit that supports the imaging unit in a predetermined mode. In this video conferencing device, the support unit supports the imaging unit in either a first mode, in which the imaging unit is directed to a conference participant imaging area around the housing, or a second mode, in which the imaging unit is directed to an area in the vicinity of the housing and close to the imaging unit. (A) When selection of the first mode is detected, the video data generation unit cuts out from the video data only the azimuth region corresponding to the sound collection direction information of the collected sound data, and corrects the cut-out video data by a first correction process suited to the first mode. (B) When selection of the second mode is detected, the video data generation unit cuts out from the video data a predetermined region centered on the front direction of the imaging unit, and corrects the cut-out video data by a second correction process that is suited to the second mode and differs from the first correction process.
[0007] In this configuration, when the imaging unit is set to the first mode, facing the conference participant imaging area, the video conferencing device cuts out only the video data in the sound collection direction and corrects it by the first correction process so that it is easy to view. The device then generates communication data from this video data and the collected sound data and transmits it to the destination device. When the imaging unit is set to the second mode, in which it photographs material or the like placed in the close area near the housing, the device corrects the video that the imaging unit captured from the front by the second correction process so that it is easy to view. The device then generates communication data containing this video data and transmits it to the destination device. Because the area photographed may differ between the first mode and the second mode, the video is corrected by the first correction process or the second correction process, which are different correction processes suited to the respective modes.
[0008] As a result, the conference participant video and still images of material and the like are each corrected according to their shooting conditions, so properly corrected participant video and material images can be transmitted to the destination device.
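The selection between the two correction paths can be summarized in a short sketch. The cropping and correction steps are passed in as functions because the text does not fix their implementation; the `Mode` enum and the name `generate_video_data` are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    FIRST = "conference participants"  # imaging unit faces the participants
    SECOND = "material"                # imaging unit faces the nearby material

def generate_video_data(mode, frame, sound_direction, *,
                        crop_by_direction, crop_front,
                        first_correction, second_correction):
    """Apply the crop and correction that belong to the detected mode."""
    if mode is Mode.FIRST:
        # Cut out only the region in the sound collection direction.
        return first_correction(crop_by_direction(frame, sound_direction))
    if mode is Mode.SECOND:
        # Cut out a fixed region centered on the front of the imaging unit.
        return second_correction(crop_front(frame))
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the two paths separate reflects the point made above: the photographed areas differ between the modes, so each mode has its own correction process.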
[0009] また、この発明のビデオ会議装置の支持部は、第 1態様と第 2態様とを切り替える関 節機構を備えるとともに、該関節機構によるスィッチを形成することを特徴としている。 さらに、このビデオ会議装置の映像データ生成部は、関節機構によるスィッチの選択 状況に基づいて第 1態様と第 2態様との選択を検出することを特徴としている。  [0009] Further, the support section of the video conference apparatus of the present invention is characterized by including a joint mechanism for switching between the first mode and the second mode, and forming a switch by the joint mechanism. Furthermore, the video data generation unit of this video conference apparatus is characterized by detecting the selection between the first mode and the second mode based on the switch selection status by the joint mechanism.
[0010] この構成のビデオ会議装置は、支持部の関節機構を動作させてスィッチを切り替え ることで、第 1態様と第 2態様とが選択されるので、機構的に簡単に第 1態様と第 2態 様とが設定される。  [0010] In the video conferencing apparatus having this configuration, the first mode and the second mode are selected by switching the switch by operating the joint mechanism of the support unit. Second mode is set.
[0011] また、この発明は、所定領域を撮像する撮像部と、該撮像部の撮像した映像に基 づいて映像データを生成する映像データ生成部と、自装置周囲の音声を収音して収 音音声データを生成し、放音音声データを放音する放収音部と、収音音声データと 映像データとを有する通信データを生成し、当該通信データを外部に送信するととも に、外部からの通信データから放音音声データを取得して前記放収音部に与える通 信部と、撮像部を筐体に対して一定に支持する支持部と、を備えたビデオ会議装置 に関するものである。このビデオ会議装置では、撮像部は、会議者撮像領域と、筐体 の近傍の前記撮像部に近接する領域とを同時に撮像する。映像データ生成部は、 会議者撮像領域に対応する第 1部分映像データから、収音音声データの収音方位 情報に対応する方位領域のみを切り出して、切り出した第 1部分映像データを第 3補 整処理により補整し、撮像部に近接する領域に対応する第 2部分映像データを、第 3 補整処理と異なる第 4補整処理により補整する。  [0011] The present invention also includes an imaging unit that images a predetermined area, a video data generation unit that generates video data based on the video captured by the imaging unit, and a sound around the device itself. Generates collected sound data, generates communication data including a sound emission / collection unit that emits sound output sound data, sound collection sound data and video data, and transmits the communication data to the outside. The present invention relates to a video conferencing apparatus comprising: a communication unit that obtains sound emission sound data from communication data from and provides the sound emission and collection unit; and a support unit that supports the imaging unit with respect to the housing. is there. In this video conference apparatus, the imaging unit simultaneously captures the conference person imaging area and the area close to the imaging unit in the vicinity of the housing. The video data generation unit cuts out only the azimuth area corresponding to the sound collection direction information of the collected sound data from the first partial video data corresponding to the conference person imaging area, and the third partial video data is extracted from the first partial video data. The second partial video data corresponding to the area close to the imaging unit is corrected by a fourth correction process different from the third correction process.
[0012] この構成のビデオ会議装置は、会議者撮像領域に対応する第 1部分映像データと 、撮像部に近接する資料が配置された領域に対応する第 2部分映像データとが、一 台の撮像部で同時に取得される。そして、第 1部分映像データは、収音音声データ に対応する方位領域のみが切り出され、第 3補整処理により適宜補整される。第 2部 分映像データは、対応する第 4補整処理により適宜見やすいように補整される。  [0012] In the video conferencing apparatus having this configuration, the first partial video data corresponding to the conference person imaging area and the second partial video data corresponding to the area where the material close to the imaging unit is arranged are provided as one unit. Acquired simultaneously by the imaging unit. In the first partial video data, only the azimuth area corresponding to the collected sound data is cut out and appropriately corrected by the third correction process. The second part video data is adjusted so that it can be easily viewed by the corresponding fourth correction process.
[0013] これにより、会議者映像と資料等の静止画とが、同時に取得され、且つ、それぞれ の撮影仕様に応じて補整される。この結果、相手先装置に対して、それぞれ適正に 補整された会議者映像と資料画像とを同時に送信することもできる。 [0013] Thereby, the conference video and the still image such as the document are acquired at the same time, and each It is adjusted according to the shooting specifications. As a result, it is also possible to simultaneously transmit the conference participant video and the material image appropriately corrected to the destination device.
[0014] この発明のビデオ会議装置は、通信データに用いる部分映像データを選択する選 択部を備える。ビデオ会議装置の映像データ生成部は、選択部により選択された部 分映像データを通信部に与える。  [0014] The video conference apparatus according to the present invention includes a selection unit that selects partial video data used for communication data. The video data generation unit of the video conference apparatus gives the partial video data selected by the selection unit to the communication unit.
[0015] この構成では、会議者映像と静止画とのうちの選択されたいずれか一方が送信さ れる。これにより、経時変化の殆どない静止画を必要なときにのみ送信することができ るので、通信系に余分な負荷を掛けることがない。  [0015] In this configuration, one of the conference video and the still image selected is transmitted. As a result, a still image that hardly changes with time can be transmitted only when necessary, so that no extra load is applied to the communication system.
[0016] また、この発明のビデオ会議装置では、撮像部に魚眼レンズを有し、該魚眼レンズ により撮像される領域の中心領域を撮像部に近接する領域とし、少なくとも中心領域 から外の周辺領域を会議者撮像領域とすることを特徴としている。  [0016] Further, in the video conference apparatus of the present invention, the imaging unit has a fisheye lens, the central region of the region imaged by the fisheye lens is set as a region close to the imaging unit, and at least a peripheral region outside the central region is conferenced It is characterized by a person imaging area.
[0017] この構成のビデオ会議装置では、具体的な撮像部の仕様として魚眼レンズを利用 する。そして、魚眼レンズの中心に対応する領域を撮像部が近接する領域とし、この 領域に応じた補整処理により適宜補整を行う。会議者撮像領域は、態様の切り替え を行う場合は中心領域も使用することがあるが、周辺領域を使用することが主となる。 したがって、会議者領域の映像に関しては、それぞれの場合に応じて、選択した領 域に応じた補整処理により適宜補整を行う。これにより、撮像部近傍の近接領域の映 像 (画像)と会議者撮像領域の映像とを、魚眼レンズを介して撮像しても、それぞれの 映像が適宜補整される。  [0017] In the video conferencing apparatus having this configuration, a fisheye lens is used as a specific specification of the imaging unit. Then, an area corresponding to the center of the fisheye lens is set as an area close to the imaging unit, and correction is appropriately performed by correction processing according to this area. As the conference area, the center area may be used when the mode is changed, but the peripheral area is mainly used. Therefore, the video in the conference area is appropriately adjusted by the correction processing according to the selected area in each case. As a result, even if an image (image) in the vicinity area near the imaging unit and an image in the conference area are captured through the fisheye lens, the respective images are appropriately corrected.
[0018] Further, the video data generation unit of the video conference apparatus of the present invention is formed integrally with the imaging unit. The communication unit of the video conference apparatus of the present invention is formed integrally in a housing together with the sound emitting and collecting unit. The video data generation unit of the video conference apparatus of the present invention is likewise formed integrally in the housing together with the sound emitting and collecting unit. These arrangements make the video conference apparatus compact.
[0019] Further, the video conference apparatus of the present invention includes a display monitor that reproduces video data. The communication unit of this video conference apparatus acquires the video data contained in the communication data and supplies it to the display monitor.
[0020] Thus, simply by placing and connecting the video conference apparatus of the present invention at each site taking part in a teleconference, both sides can easily share conferee video and documents.
Effect of the Invention
[0021] According to the present invention, with a simple operation of the direction of the imaging unit, the video of the speaker is corrected by a correction process suited to speaker video and the image of a document is corrected by a correction process suited to document images, so both the speaker video and the document image can be transmitted to the counterpart apparatus accurately and clearly. A video conference using this apparatus can therefore easily be made more realistic and easier for both sides to follow.
Brief Description of the Drawings
[0022] [FIG. 1] An external view of the video conference apparatus of the first embodiment in the conferee shooting mode.
[FIG. 2] An external view of the video conference apparatus of the first embodiment in the document shooting mode.
[FIG. 3] A block diagram showing the main configuration of the video conference apparatus of the first embodiment.
[FIG. 4] A diagram showing the video conference apparatus of the first embodiment placed in a room and holding a video conference with another network-connected site (conferee shooting mode).
[FIG. 5] An explanatory diagram of video data generation in the conferee shooting mode.
[FIG. 6] A diagram showing the video conference apparatus of the first embodiment placed in a room and holding a video conference with another network-connected site (document shooting mode).
[FIG. 7] An explanatory diagram of video data generation in the document shooting mode.
[FIG. 8] An external view of the assembly consisting of the sound emitting and collecting apparatus 1, the camera 2, and the support 7 of the video conference apparatus of the second embodiment.
[FIG. 9] A diagram showing the video conference apparatus of the second embodiment in use.
[FIG. 10] A diagram explaining the generation of video data by the video conference apparatus of the second embodiment.
Explanation of Reference Numerals
[0023] 1 sound emitting and collecting apparatus, 2 camera, 3 stay, 4 switch, 5 communication terminal, 6 display, 7 support, 8 mounting table, 11 housing, 12 leg, 21 imaging unit, 22 video processing unit, 31 main body, 32 camera support, 33 main body support, 34 sound emitting and collecting apparatus mounting portion, 102 input/output I/F, 103 sound emission control unit, 105 A/D-AMP, 106 sound collection control unit, 107 echo cancellation unit, 110 recess, 111 operation unit, 203 hinge, 500 network, 601 to 605 conferees, 610 full-area image data, 611, 615 person images, 621 corrected image data, 622 corrected image data, 631, 635 corrected person images, 641 to 644 person images, 650 document, 654 corrected person image, 660 document image, 670 corrected document image, 680, 681 corrected image data, 682 peripheral image data, 683 partial image data, 700 table
Best Mode for Carrying Out the Invention
[0024] A video conference apparatus according to the first embodiment of the present invention will be described with reference to the drawings.
FIGS. 1 and 2 are external views of the video conference apparatus of this embodiment, in which (A) is a plan view and (B) is a side view. FIGS. 1 and 2 show only the mechanically distinctive parts, namely the sound emitting and collecting apparatus, the camera, and the stay; the cables that electrically connect the communication terminal, the sound emitting and collecting apparatus, and the camera are omitted. FIG. 1 shows the mechanical state in the conferee shooting mode, and FIG. 2 shows the mechanical state in the document shooting mode.
FIG. 3 is a block diagram showing the main configuration of the video conference apparatus of this embodiment.
[0025] In FIGS. 1, 2, and 3 and in the figures referred to later in this specification, a microphone is denoted representatively or collectively by "MC" and a speaker by "SP". The video conference apparatus of this embodiment includes a sound emitting and collecting apparatus 1 that is disk-shaped in plan view, a camera 2 that has an imaging function and a video data generation function, and a stay 3 that holds the camera 2 at a predetermined position relative to the sound emitting and collecting apparatus 1. Although not shown in FIGS. 1 and 2, the sound emitting and collecting apparatus 1 and the camera 2 are electrically connected, and the video conference apparatus further includes a communication terminal that is electrically connected to the sound emitting and collecting apparatus 1 and the camera 2.
[0026] The communication terminal 5 demodulates communication data received from the communication terminal of a counterpart video conference apparatus connected via the network 500, obtains the sound-emission audio signal, the counterpart apparatus ID, and the speaker orientation data, and supplies them to the sound emitting and collecting apparatus 1 on its own side, to which it is connected by cable. The communication terminal 5 also generates communication data from the collected audio signal and speaker position data received from the sound emitting and collecting apparatus 1 on its own side and from the video data received from the camera 2, and transmits the generated communication data to the communication terminal of the counterpart video conference apparatus. Where necessary, the communication terminal 5 also relays speaker position data between the sound emitting and collecting apparatus 1 and the camera 2.
[0027] The sound emitting and collecting apparatus 1 includes a disk-shaped housing 11. Specifically, the housing 11 is circular in plan view, and the areas of its top and bottom surfaces are smaller than the area of its cross-section partway up. In side view, the housing narrows from a point along its height toward the top surface and also narrows from that point toward the bottom surface; that is, it has an inclined surface above that point and another below it. A recess 110 of predetermined depth, smaller in area than the top surface, is formed in the top surface of the housing 11, and is positioned so that its center in plan view coincides with the center of the top surface.
[0028] Sixteen microphones MC1 to MC16 are installed inside the top side of the housing 11 along the side wall of the recess 110. The microphones MC1 to MC16 are arranged at an equal angular pitch (about 22.5° in this case) about the center of the sound emitting and collecting apparatus 1 in plan view. Taking the microphone MC1 to lie in the θ = 0° direction, the microphones MC1 to MC16 are arranged in order in the direction in which θ increases by 22.5° at a time. For example, the microphone MC5 lies in the θ = 90° direction, the microphone MC9 in the θ = 180° direction, and the microphone MC13 in the θ = 270° direction. Each of the microphones MC1 to MC16 is unidirectional and is oriented so that its directivity is strongest toward the center in plan view. For example, the directivity of the microphone MC1 is centered on the θ = 180° direction, that of the microphone MC5 on the θ = 270° direction, that of the microphone MC9 on the θ = 0° (360°) direction, and that of the microphone MC13 on the θ = 90° direction. The number of microphones is not limited to sixteen and may be chosen according to the specification.
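The microphone layout of paragraph [0028] can be summarized in a short sketch. This is only an illustration of the stated geometry; the function names are hypothetical and do not appear in the specification.

```python
NUM_MICS = 16
PITCH_DEG = 360.0 / NUM_MICS  # 22.5 degrees between adjacent microphones


def mic_position_deg(n):
    """Placement angle (theta) of microphone MCn, with MC1 at 0 degrees (n is 1-based)."""
    if not 1 <= n <= NUM_MICS:
        raise ValueError("microphone index out of range")
    return ((n - 1) * PITCH_DEG) % 360.0


def mic_directivity_deg(n):
    """Center of directivity of MCn: toward the housing center, opposite its placement angle."""
    return (mic_position_deg(n) + 180.0) % 360.0
```

With these definitions, MC5 sits at 90° and points toward 270°, matching the examples in the text.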
[0029] Four speakers SP1 to SP4 are installed so that their sound emitting surfaces are flush with the lower inclined surface of the housing 11. The speakers SP1 to SP4 are arranged at an equal angular pitch (about 90° in this case) about the center of the sound emitting and collecting apparatus 1 in plan view. The speaker SP1 lies in the θ = 0° direction, the speaker SP2 in the θ = 90° direction relative to the speaker SP1, the speaker SP3 in the θ = 180° direction relative to the speaker SP1, and the speaker SP4 in the θ = 270° direction relative to the speaker SP1. Each of the speakers SP1 to SP4 has strong directivity in the direction facing its sound emitting surface: the speaker SP1 emits sound centered on the θ = 0° direction, the speaker SP2 on the θ = 90° direction, the speaker SP3 on the θ = 180° direction, and the speaker SP4 on the θ = 270° direction.
[0030] Because the speakers SP1 to SP4 are placed on the lower side of the housing 11, the microphones MC1 to MC16 are placed on the upper side, and the sound collection direction of the microphones MC1 to MC16 is toward the center of the housing 11, the microphones MC1 to MC16 are unlikely to pick up sound that wraps around from the speakers SP1 to SP4. The speaker position detection described later is therefore less affected by wraparound sound and can detect the speaker position more accurately.
[0031] The operation unit 111 is installed on the upper inclined surface of the housing 11 and includes various operation buttons and a liquid crystal display panel (not shown).
The input/output I/F 102 (not shown in FIGS. 1 and 2) is installed on the lower inclined surface of the housing 11 at a position where none of the speakers SP1 to SP4 is installed, and has terminals capable of communicating audio data and various control data. Connecting the terminals of the input/output I/F 102 to the communication terminal by a cable or the like allows the sound emitting and collecting apparatus 1 and the communication terminal to communicate.
[0032] In addition to this structural configuration, the sound emitting and collecting apparatus 1 has the functional configuration shown in FIG. 3.
The control unit 101 performs overall control of the sound emitting and collecting apparatus 1, such as setup, sound collection, and sound emission, and applies control based on the operating instructions entered through the operation unit 111 to each unit of the sound emitting and collecting apparatus 1.
[0033] (1) Sound emission
The input/output I/F 102 outputs the sound-emission audio signals S1 to S3 received from the communication terminal 5 to the channels CH1 to CH3, respectively. The channel assignment may be set according to the number of sound-emission audio signals received. The input/output I/F 102 also receives the counterpart apparatus IDs from the communication terminal 5 and assigns a channel CH to each counterpart apparatus ID. For example, when one counterpart apparatus is connected, the audio data from that apparatus is assigned to the channel CH1 as the sound-emission audio signal S1. When two counterpart apparatuses are connected, the audio data from the two apparatuses is assigned individually to the channels CH1 and CH2 as the sound-emission audio signals S1 and S2. Similarly, when three counterpart apparatuses are connected, the audio data from the three apparatuses is assigned individually to the channels CH1, CH2, and CH3 as the sound-emission audio signals S1, S2, and S3. The channels CH1 to CH3 are connected to the sound emission control unit 103 via the echo cancellation unit 107.
The input/output I/F 102 also extracts, from the data supplied by the communication terminal 5, the speaker orientation data Py detected at the counterpart sound emitting and collecting apparatus, and supplies it to the sound emission control unit 103 together with the channel information.
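The per-apparatus channel assignment described above can be sketched as follows. The mapping order (first connected apparatus to CH1, and so on) and the form of the apparatus IDs are assumptions made for illustration; the specification only states that one channel is assigned per counterpart apparatus ID.

```python
MAX_CHANNELS = 3  # channels CH1 to CH3


def assign_channels(device_ids):
    """Map each connected counterpart apparatus ID to a channel name.

    Assumption: channels are handed out in connection order, CH1 first.
    """
    if len(device_ids) > MAX_CHANNELS:
        raise ValueError("at most three counterpart apparatuses are supported")
    return {dev: "CH%d" % i for i, dev in enumerate(device_ids, start=1)}
```

For two connected apparatuses with the hypothetical IDs "A" and "B", this yields CH1 for "A" and CH2 for "B", which corresponds to the signals S1 and S2 in the text.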
[0034] The sound emission control unit 103 generates the speaker output signals SPD1 to SPD4 to be supplied to the speakers SP1 to SP4 from the sound-emission audio signals S1 to S3 and the speaker orientation information Py.
[0035] The D/A-AMP 104 converts each of the speaker output signals SPD1 to SPD4 from digital to analog, amplifies it at a fixed gain, and supplies it to the corresponding speaker SP1 to SP4. The speakers SP1 to SP4 convert the supplied speaker output signals SPD1 to SPD4 into sound and emit it.
[0036] With this sound emission processing, the sounds emitted from the speakers SP1 to SP4 have a predetermined delay relationship and amplitude relationship, so the conferees perceive the sound as if it were emitted from the virtual sound source that has been set.
[0037] (2) Sound collection
The microphones MC1 to MC16 collect external sound, such as the voices of the conferees, and generate the collected sound signals MS1 to MS16. Each A/D-AMP 105 amplifies the corresponding collected sound signal MS1 to MS16 at a predetermined gain, converts it from analog to digital, and outputs it to the sound collection control unit 106.
[0038] The sound collection control unit 106 combines the acquired collected sound signals MS1 to MS16 using different delay control patterns and amplitude patterns to generate collected sound beam signals whose directivities are centered on different directions. For example, it generates eight collected sound beam signals whose directivity centers are shifted in steps of 45°, that is, the full 360° around the sound emitting and collecting apparatus 1 divided into eight. The sound collection control unit 106 compares the amplitude levels of these collected sound beam signals, selects the collected sound beam signal MBS with the highest amplitude level, and outputs it to the echo cancellation unit 107. The sound collection control unit 106 also obtains the speaker orientation corresponding to the selected collected sound beam signal, generates the speaker orientation information Pm, and supplies it to the input/output I/F 102.
[0039] The echo cancellation unit 107 consists of an adaptive filter that generates, for the input collected sound beam signal MBS, a pseudo echo signal based on the sound-emission audio signals S1 to S3, and a post-processor that subtracts the pseudo echo signal from the collected sound beam signal MBS. While successively optimizing the filter coefficients of the adaptive filter, the echo cancellation circuit subtracts the pseudo echo signal from the collected sound beam signal MBS, thereby removing the wraparound component from the speakers SP1 to SP4 to the microphones MC1 to MC16 that is contained in the collected sound beam signal MBS. The collected sound beam signal MBS from which the wraparound component has been removed is output to the input/output I/F 102.
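The adaptive-filter structure of paragraph [0039] can be illustrated with a minimal sketch. The specification does not name the adaptation algorithm; normalized LMS (NLMS) is used here purely as an assumption, with a single reference signal standing in for the sound-emission audio signals S1 to S3. The function and parameter names are hypothetical.

```python
def cancel_echo(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    reference: samples of the sound-emission signal (what the speakers play)
    mic:       samples of the collected beam signal (contains the echo)
    Returns the residual signal after the pseudo echo has been subtracted.
    NLMS is an assumed choice; the source only says the coefficients are
    optimized successively.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        pseudo_echo = sum(w * h for w, h in zip(weights, history))
        e = d - pseudo_echo  # post-processor: subtract the pseudo echo
        power = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

In a real apparatus the filter would need many more taps to cover the acoustic path from the speakers to the microphones; four taps are used here only to keep the sketch short.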
[0040] The input/output I/F 102 associates the collected sound beam signal MBS, from which the echo has been removed by the echo cancellation unit 107, with the speaker orientation information Pm from the sound collection control unit 106, and outputs them to the communication terminal 5.
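The selection of the collected sound beam signal MBS and the associated speaker orientation information Pm can be sketched as follows. The sketch assumes the eight beam signals have already been formed, that beam k is centered on k × 45° starting from 0°, and that the amplitude level is measured as the mean absolute sample value; none of these details is fixed by the specification.

```python
BEAM_STEP_DEG = 45  # eight beams covering 360 degrees


def select_beam(beams):
    """Return (Pm in degrees, MBS) for the beam with the highest level.

    beams: list of non-empty sample lists, one per beam direction.
    Level measure (mean absolute amplitude) is an assumption.
    """
    levels = [sum(abs(s) for s in beam) / len(beam) for beam in beams]
    best = max(range(len(beams)), key=levels.__getitem__)
    return best * BEAM_STEP_DEG, beams[best]
```

The returned pair corresponds to what the input/output I/F 102 forwards to the communication terminal 5: the chosen beam signal together with the direction it points in.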
[0041] As shown in FIGS. 1 and 2, the camera 2 is held by the stay 3 at a fixed position relative to the sound emitting and collecting apparatus 1. The stay 3 holds the camera 2 so that it can rotate between the horizontal direction (the direction in which the camera 2 faces in FIG. 1) and the vertically downward direction (the direction in which the camera 2 faces in FIG. 2).
[0042] The stay 3 includes a main body 31, a camera support 32, a main body support 33, and a sound emitting and collecting apparatus mounting portion 34. The main body 31 is a straight member of predetermined width and is held by the main body support 33 so that it extends at a predetermined angle to the vertical. The camera support 32 is attached via a hinge 203 to one end of the main body 31 in its direction of extension, and the sound emitting and collecting apparatus mounting portion 34 is provided at the other end. The sound emitting and collecting apparatus mounting portion 34 is a flat plate with an opening shaped to receive the leg 12 of the housing 11, and is, for example, formed integrally with the main body 31.
[0043] At the end of the main body 31 on the camera support 32 side, only the two end walls in the width direction remain and the central portion in the width direction is open. This opening is shaped so that the camera 2 mounted on the camera support 32 does not touch the main body 31 when it rotates between the horizontal direction and the vertically downward direction.
[0044] The hinge 203 is structured so that the camera support 32 can rotate relative to the main body 31. The hinge 203 and the camera support 32 are also structured so that the camera 2 and the camera support 32 are held semi-fixed when they face the horizontal direction and when they face the vertically downward direction. For example, the hinge 203 is fixed to the main body 31, and recesses are formed in the hinge 203 at the horizontal position and at the vertically downward position. The hinge-side end of the camera support 32 is provided with a protrusion shaped to fit into these recesses, and the protrusion is urged outward from inside the camera support 32 by a spring or the like. The camera 2 can thus rotate between the horizontal direction and the vertically downward direction and can hold its mechanical state in each of those two directions.
[0045] The mechanism formed by the hinge 203 and the camera support 32 functions as the switch 4.
For example, electrodes are provided on the recesses and the protrusion, and whether they are electrically connected or open is detected. The wiring or the detection signals are arranged so that the horizontal recess and the vertically downward recess produce different signals. The switch 4 is formed by this structure, and its detection result is supplied to the camera 2. The camera 2 can thus acquire video while knowing whether it is facing the horizontal direction or the vertically downward direction.
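How the two contact signals of the switch 4 might be turned into a shooting mode can be sketched as follows. The boolean inputs, the `Mode` names, and the handling of the in-between position (no contact closed) are assumptions for illustration; the specification only requires that the two detent positions yield distinguishable signals.

```python
from enum import Enum


class Mode(Enum):
    CONFEREE = "conferee shooting mode"   # camera facing horizontally (FIG. 1)
    DOCUMENT = "document shooting mode"   # camera facing vertically downward (FIG. 2)


def detect_mode(horizontal_closed, downward_closed):
    """Return the mode indicated by the hinge contacts, or None between detents."""
    if horizontal_closed and not downward_closed:
        return Mode.CONFEREE
    if downward_closed and not horizontal_closed:
        return Mode.DOCUMENT
    return None  # camera is mid-rotation, or the signals are inconsistent
```

Returning `None` while the camera is being rotated lets the caller keep the previous mode until the protrusion settles in one of the recesses.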
[0046] The camera 2 includes an imaging unit 21 and a video processing unit 22. The imaging unit 21 has a fisheye lens and images, in all directions about the front direction of the camera 2, the region extending from infinity to the plane in which the fisheye lens is mounted. The imaging data is supplied to the video processing unit 22.
[0047] The video processing unit 22 acquires the direction in which the camera 2 is facing (hereinafter, the shooting direction) as detected by the switch 4 of the stay 3 (the hinge 203 and the camera support 32). Based on the acquired shooting direction and on the speaker orientation data Pm supplied from the sound emitting and collecting apparatus 1 via the communication terminal 5, the video processing unit 22 extracts only the required part of the imaging data, corrects the image, and generates video data. The generated video data is supplied to the communication terminal 5.
[0048] Next, the way the video conference apparatus is used and the way the video processing unit 22 generates video data will be described more concretely. The following description assumes five conferees on the apparatus's own side, but the number of conferees is not limited to five.
[0049] FIG. 4 shows the video conference apparatus of this embodiment placed in a room and holding a video conference with another network-connected site, in the case where the camera 2 is imaging the conferees 601 to 605.
FIG. 5 is an explanatory diagram of video data generation, in which (A) shows the video (image) captured through the fisheye lens and (B) and (C) show the concept of image correction for each conferee orientation.
[0050] FIG. 6 shows the video conference apparatus of this embodiment placed in a room and holding a video conference with another network-connected site, in the case where the camera 2 is imaging the document 650.
FIG. 7 is an explanatory diagram of video data generation, in which (A) shows the video (image) captured through the fisheye lens and (B) shows the concept of image correction when a document is imaged.
[0051] When a video conference is held, the conferees 601 to 605 are seated around the oval table 700 at positions other than one of its longitudinal ends. The assembly formed by the circular sound emitting and collecting apparatus 1 and the camera 2 fixed to it by the stay 3 is placed on the table 700. The camera 2 is set facing horizontally so that the central axis of the fisheye lens coincides with an axis parallel to the longitudinal direction of the table 700. The communication terminal 5 is placed under the table 700. The communication terminal 5 is electrically connected to the sound emitting and collecting apparatus 1 and the camera 2, and is connected to the network 500. The communication terminal 5 is also electrically connected to the display 6. The display 6 is, for example, a liquid crystal display, and is placed near the end of the table 700 where none of the conferees 601 to 605 is seated, with its display surface facing the table 700.
[0052] When a video conference is held in this arrangement, the video conference apparatus comprising the sound emitting and collecting apparatus 1, the camera 2, and the communication terminal 5 transmits video of the conference to the counterpart video conference apparatus in one of two modes.
[0053] (1)会議者撮影モード  [0053] (1) Conference shooting mode
When any of the conference participants 601 to 605 sets the camera 2 in the horizontal direction, the video processing unit 22 of the camera 2 detects, from the detection signal of the switch 4, that the conference participant shooting mode has been selected. On detecting this mode, the video processing unit 22 supplies a selection signal for the mode to the communication terminal 5.
[0054] The imaging unit 21 of the camera 2 captures, through the fisheye lens, all the conference participants 601 to 605 present on its own side, and outputs the resulting imaging data to the video processing unit 22. Because the imaging data is captured through the fisheye lens, the imaging area is circular, as shown in Fig. 5(A). When the conference participant shooting mode is selected, the video processing unit 22 acquires the circular imaging data in a coordinate system in which the horizontal direction, which curves in an arc, is represented by an azimuth angle φ and the vertical direction is represented by an elevation angle ψ. That is, the point straight ahead of the fisheye lens at the same height as the lens axis is set to φ = 0°, ψ = 0°. From this point, φ increases in the negative direction toward the left and in the positive direction toward the right. Accordingly, seen from the tip of the fisheye lens of the camera 2, the direction to the left of the shooting direction and perpendicular to the lens axis is φ = −90°, and the direction to the right of the shooting direction and perpendicular to the lens axis is φ = +90°. Likewise, ψ increases in the positive direction upward and in the negative direction downward. Accordingly, seen from the tip of the fisheye lens of the camera 2, the direction above the shooting direction and perpendicular to the lens axis is ψ = +90°, and the direction below the shooting direction and perpendicular to the lens axis is ψ = −90°.
[0055] Through the processing described earlier, the sound emission and collection device 1 picks up the voice of the conference participant who is speaking, detects that participant's direction, and supplies the collected sound data and the speaker direction information θ to the communication terminal 5. For example, if the conference participant 601 shown in Fig. 4 speaks, the sound emission and collection device 1 detects the direction θ1 of the participant 601 and supplies the collected sound data based on the voice from that direction, together with the speaker direction information θ1, to the communication terminal 5. If the conference participant 605 speaks, the sound emission and collection device 1 detects the direction θ2 of the participant 605 and supplies the collected sound data based on the voice from that direction, together with the speaker direction information θ2, to the communication terminal 5. The communication terminal 5 passes the speaker direction information θ to the video processing unit 22 of the camera 2.
[0056] The video processing unit 22 corrects the imaging data based on the speaker direction information θ from the communication terminal 5. The video processing unit 22 stores in advance the relationship between the speaker direction information θ and the azimuth angle φ defined on the imaging data. On receiving speaker direction information θ, the video processing unit 22 reads out the corresponding azimuth angle φ. For example, on receiving the speaker direction information θ1 for the conference participant 601, the video processing unit 22 reads out the corresponding azimuth angle φ = 0°. Similarly, on receiving the speaker direction information θ2 for the conference participant 605, it reads out the corresponding azimuth angle φ = −90°.
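The stored relationship between speaker direction and azimuth can be pictured as a small lookup table. The following is a minimal sketch under stated assumptions: the name `THETA_TO_PHI`, the function `azimuth_for_speaker`, the numeric θ values, and the nearest-entry matching are all hypothetical; the text only says that the relationship is stored in advance and that θ1 maps to φ = 0° and θ2 to φ = −90°.

```python
# Hypothetical calibration table: speaker direction theta (degrees, as reported
# by the sound emission and collection device) -> azimuth phi (degrees) in the
# fisheye image. The theta values are invented for illustration only.
THETA_TO_PHI = {
    90.0: 0.0,     # stands in for theta1 (participant 601) -> phi = 0 deg
    180.0: -90.0,  # stands in for theta2 (participant 605) -> phi = -90 deg
    0.0: 90.0,     # an extra illustrative entry
}


def azimuth_for_speaker(theta_deg: float) -> float:
    """Return the stored azimuth whose theta key is closest to theta_deg."""

    def circular_distance(a: float, b: float) -> float:
        # Distance between two directions on a 360-degree circle.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    nearest = min(THETA_TO_PHI, key=lambda t: circular_distance(t, theta_deg))
    return THETA_TO_PHI[nearest]
```

Nearest-entry matching is one possible design choice; a device that reports only a fixed set of directions could index the table directly instead.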
[0057] The video processing unit 22 sets an image extraction azimuth range of a predetermined azimuth width that contains the azimuth angle φ it has read out. It also sets an image extraction elevation range of a predetermined elevation width that contains the elevation angle ψ = 0°. The video processing unit 22 then determines an image extraction region from the azimuth range and the elevation range, and acquires the imaging data corresponding to that region as image data.
[0058] For example, when the video processing unit 22 reads out the azimuth angle φ = 0°, it sets the azimuth range to the range from azimuth φ1 to azimuth φ2 (φ1 < 0° < φ2), which contains φ = 0°. It also sets the elevation range to the range from elevation ψ1 to elevation ψ2 (ψ1 < ψ2), which contains ψ = 0°. The video processing unit 22 then sets the image extraction region from the azimuth range φ1 to φ2 and the elevation range ψ1 to ψ2, and acquires the image data 621. Similarly, when the video processing unit 22 reads out the azimuth angle φ = −90°, it sets the azimuth range to the range from azimuth φ3 to azimuth φ4 (φ3 < −90° < φ4), which contains φ = −90°. It also sets the elevation range to the range from elevation ψ3 to elevation ψ4 (ψ3 < ψ4), which contains ψ = 0°. The video processing unit 22 then sets the image extraction region from the azimuth range φ3 to φ4 and the elevation range ψ3 to ψ4, and acquires the image data 622.
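The range setting above can be sketched in a few lines. This is an illustration only: the names `ExtractionRegion` and `extraction_region`, the default widths of 40° and 30°, and the choice to center each range on its reference angle are assumptions; the text requires only that each range has a predetermined width and contains the reference angle.

```python
from typing import NamedTuple


class ExtractionRegion(NamedTuple):
    """Angular bounds of the image extraction region, in degrees."""
    az_min: float
    az_max: float
    el_min: float
    el_max: float


def extraction_region(azimuth_deg: float,
                      az_width_deg: float = 40.0,
                      el_width_deg: float = 30.0) -> ExtractionRegion:
    """Build a region centered on the speaker azimuth and on elevation 0."""
    half_az = az_width_deg / 2.0
    half_el = el_width_deg / 2.0
    return ExtractionRegion(
        az_min=azimuth_deg - half_az,
        az_max=azimuth_deg + half_az,
        el_min=-half_el,  # the elevation range always contains 0 degrees
        el_max=half_el,
    )
```

With these assumed widths, an azimuth of −90° gives a range of −110° to −70°, which matches the stated condition that the lower bound lies below −90° and the upper bound above it.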
[0059] The video processing unit 22 performs a correction transformation on each acquired image extraction region. Specifically, each pixel defined by the two angular directions, the φ direction and the ψ direction, is transformed so that it maps onto a pixel in orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the φ-ψ coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the φ-ψ coordinates of each acquired pixel, and performs the correction transformation. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction transformation using that formula.
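One way to picture such a conversion table is as a map from each output pixel to the angles it samples. The sketch below is hypothetical: the function name `build_conversion_table` is invented, the table is built in the output-to-angle direction (the usual direction for resampling, which the text does not specify), and the linear spacing of angles across the output is an assumption rather than the method described here. Looking up the fisheye pixel for each angle pair would additionally need the lens projection model, which the text does not give.

```python
def build_conversion_table(az_range, el_range, width, height):
    """Map each output pixel (x, y) to the (azimuth, elevation) it samples.

    Angles are spread linearly over the output image; width and height
    must both be at least 2.
    """
    az_min, az_max = az_range
    el_min, el_max = el_range
    table = {}
    for y in range(height):
        # Row 0 is the top of the image, so it takes the highest elevation.
        el = el_max - (el_max - el_min) * y / (height - 1)
        for x in range(width):
            az = az_min + (az_max - az_min) * x / (width - 1)
            table[(x, y)] = (az, el)
    return table
```

Precomputing the table once per extraction region trades memory for per-frame speed, which is consistent with the text offering a stored table and a stored formula as alternatives.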
[0060] For example, as shown in Fig. 5(B), the video processing unit 22 converts the image data 621, defined by the azimuth range φ1 to φ2 and the elevation range ψ1 to ψ2, into corrected image data 621' defined by x1 to x2 and y1 to y2 in a plane coordinate system whose horizontal direction is the X axis and whose vertical direction is the Y axis. Through this conversion, the person image 611 of the conference participant 601 acquired in the φ-ψ coordinate system is converted into a corrected person image 631 in the X-Y coordinate system (plane coordinate system). Converting to the X-Y coordinate system in this way brings the corrected person image 631 close to the natural appearance of the conference participant 601.
[0061] Similarly, as shown in Fig. 5(C), the video processing unit 22 converts the image data 622, defined by the azimuth range φ3 to φ4 and the elevation range ψ3 to ψ4, into corrected image data 622' defined by x3 to x4 and y3 to y4 in a plane coordinate system whose horizontal direction is the X axis and whose vertical direction is the Y axis. Through this conversion, the person image 615 of the conference participant 605 acquired in the φ-ψ coordinate system is converted into a corrected person image 635 in the X-Y coordinate system (plane coordinate system). Converting to the X-Y coordinate system in this way brings the corrected person image 635 close to the natural appearance of the conference participant 605.
[0062] The video processing unit 22 attaches time information to the corrected image data, which contains the corrected person image brought close to a natural appearance, and outputs it to the communication terminal 5 as video data. The corrected image data is generated and output continuously, so when the received speaker direction information θ changes, the center direction of the corrected image data switches accordingly.
[0063] The communication terminal 5 generates communication data by associating the video data from the video processing unit 22, the collected sound data, and the speaker direction information θ with one another, and transmits it to the destination video conference device via the network 500. As a result, the conference participants seated around the destination video conference device are provided with both a natural-looking video of the participant who is speaking and that participant's speech.
[0064] (2) Document shooting mode
When any of the conference participants 601 to 605 sets the camera 2 to face vertically downward, as shown in Fig. 6, the video processing unit 22 of the camera 2 detects, from the detection signal of the switch 4, that the document shooting mode has been selected. On detecting the document shooting mode, the video processing unit 22 supplies a selection signal for the mode to the communication terminal 5.
[0065] One of the conference participants 601 to 605 also places the document 650 on the table 700, centered on the position vertically below the hinge 203. If a document placement marking is made on the table 700 in advance, the document 650 can be placed easily and correctly.
[0066] The imaging unit 21 of the camera 2 captures, through the fisheye lens, the document 650 placed on the table 700, and outputs the resulting imaging data to the video processing unit 22. Because the imaging data is captured through the fisheye lens, the imaging area is circular, as shown in Fig. 7(A).
[0067] When the document shooting mode is selected, the video processing unit 22 acquires the circular imaging data in an r-η coordinate system whose origin is the center of the imaging data and which is expressed by the distance r extending radially from the origin and the angle η relative to a predetermined direction (in Fig. 7, the rightward direction from the origin, as seen facing the imaging data, is the 0° direction). The video processing unit 22 cuts out image data 680 of a preset range from the acquired imaging data.
[0068] The video processing unit 22 corrects the image data 680 in the r-η coordinate system by converting it into corrected image data 680' in the X-Y plane coordinate system. For this purpose, the video processing unit 22 stores in advance a coordinate conversion table in which the center coordinates of the r-η coordinate system and the X-Y coordinate system coincide, calculates the X-Y coordinates from the r-η coordinates of each acquired pixel, and performs the correction transformation. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction transformation using that formula.
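As a rough illustration of a coordinate conversion formula with a shared center, the sketch below applies the ordinary polar-to-Cartesian relation. The function name `polar_to_xy` is hypothetical, and taking η as counterclockwise-positive with Y pointing up is an assumption. A real fisheye correction would also rescale r according to the lens projection, which the text does not specify, so this shows only the geometric part of the conversion.

```python
import math


def polar_to_xy(r, eta_deg, center=(0.0, 0.0)):
    """Convert a point (r, eta) to X-Y coordinates sharing the same center.

    eta is measured in degrees from the rightward (0 degree) direction,
    counterclockwise positive, with Y pointing up (both are assumptions).
    """
    eta = math.radians(eta_deg)
    cx, cy = center
    return cx + r * math.cos(eta), cy + r * math.sin(eta)
```

Evaluating this once per pixel and storing the results would give the table-based variant; evaluating it on the fly corresponds to the formula-based variant.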
[0069] Through this conversion, the document image 660 of the document 650 acquired in the r-η coordinate system is converted into a corrected document image 670 in the X-Y coordinate system (plane coordinate system). Converting to the X-Y coordinate system in this way brings the corrected document image 670 close to the natural appearance of the document 650. In other words, image data of the document 650 in which the characters are not distorted can be obtained.
[0070] The communication terminal 5 generates communication data containing the image data of the document 650 acquired from the video processing unit 22, and transmits it to the destination video conference device via the network 500. As a result, the conference participants seated around the destination video conference device are provided with a clear, easy-to-read image of the document. If the communication terminal 5 has also acquired collected sound data from the sound emission and collection device 1, it may generate and transmit communication data that contains the collected sound data together with the image data of the document 650.
[0071] As described above, the configuration and processing of this embodiment make it possible to acquire and transmit the video of the conference participants and the image of the document, each in a form suited to its purpose. Simply switching the camera between two orientations, horizontal and vertically downward, is enough to obtain video suited to either the participant video or the document image.
[0072] Next, a video conference device according to a second embodiment will be described with reference to the drawings.
Fig. 8 is an external view of the assembly consisting of the sound emission and collection device 1, the camera 2, and the support 7 of the video conference device of this embodiment: (A) is a plan view and (B) is a side view.
Fig. 9 shows the video conference device of this embodiment in use: (A) is a plan view and (B) is a side view. In Figs. 8 and 9, the cables connected to the sound emission and collection device 1 and the camera 2 are not shown.
[0073] Fig. 10 illustrates the generation of video data by the video conference device of this embodiment: (A) shows the imaging data, (B) is a conceptual diagram of image correction for the central part of the imaging data, and (C) is a conceptual diagram of image correction for the peripheral part of the imaging data.
In the video conference device of this embodiment, the configuration and processing of the sound emission and collection device 1 and the communication terminal 5 are the same as in the video conference device of the first embodiment. This embodiment differs from the first in the mounting structure of the camera 2, that is, the structure of the support 7, and in the video processing method used by the video processing unit 22 of the camera 2. The switch 4 is also omitted.
[0074] As shown in Fig. 8, the support 7 is arranged around the disc-shaped sound emission and collection device 1. The support 7 consists of four vertical struts extending in the vertical direction, two horizontal struts placed at a distance h1 from the top surface of the sound emission and collection device 1, and four horizontal struts placed at a distance h2 (> h1) from the top surface of the sound emission and collection device 1. The two horizontal struts at the distance h1 cross at approximately the center of the sound emission and collection device 1 in plan view, and are held at the distance h1 by the four vertical struts. The horizontal struts at the distance h2 are assembled into an approximately square frame in plan view, and are held at the distance h2 by the four vertical struts.
[0075] The camera 2 is installed at the intersection of the two horizontal struts at the distance h1, with its imaging direction facing vertically upward.
[0076] The placement table 8 is supported by the four horizontal struts at the distance h2 and is made of a highly transparent material such as glass or an acrylic plate. The placement table 8 and the camera 2 are installed so that, in plan view, the center of the placement table 8 approximately coincides with the axis of the fisheye lens of the camera 2.
[0077] The document 650 is placed on the placement table 8 with its printed side facing vertically downward, that is, in contact with the placement table 8.
[0078] The height of the camera 2 and the height of the placement table 8, that is, the distances h1 and h2, should be set so that, as shown in Fig. 9, at least the faces of the conference participants 601 to 604 can be captured by the camera 2 and are not hidden by the horizontal struts supporting the placement table 8.
[0079] With a video conference device configured in this way, the imaging data acquired by the imaging unit 21 of the camera 2 is as shown in Fig. 10(A). Because the imaging data is captured through the fisheye lens, the whole imaging area forms circular full-area image data 610, with the document image 660 of the document 650 at its center and the person images 641 to 644 of the conference participants 601 to 604 around its periphery.
[0080] The video processing unit 22 acquires the circular imaging data in an r-η coordinate system whose origin is the center of the imaging data and which is expressed by the distance r extending radially from the origin and the angle η relative to a predetermined direction (in Fig. 10, the rightward direction from the origin, as seen facing the imaging data, is the 0° direction). The video processing unit 22 cuts out image data 681 of a preset range from the acquired imaging data.
[0081] The video processing unit 22 corrects the image data 681 in the r-η coordinate system by converting it into corrected image data 681' in the X-Y plane coordinate system. For this purpose, the video processing unit 22 stores in advance a coordinate conversion table in which the center coordinates of the r-η coordinate system and the X-Y coordinate system coincide, calculates the X-Y coordinates from the r-η coordinates of each acquired pixel, and performs the correction transformation. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction transformation using that formula.
[0082] Through this conversion, as shown in Fig. 10(B), the document image 660 of the document 650 acquired in the r-η coordinate system is converted into a corrected document image 670 in the X-Y coordinate system (plane coordinate system). Converting to the X-Y coordinate system in this way brings the corrected document image 670 close to the natural appearance of the document 650. In other words, image data of the document 650 in which the characters are not distorted can be obtained.
[0083] The video processing unit 22 also acquires peripheral image data 682, which is the full-area image data 610 with the image data 681 near the center removed. Based on the speaker position information acquired from the sound emission and collection device 1 via the communication terminal 5, the video processing unit 22 sets the region to be extracted, as in the first embodiment. That is, the video processing unit 22 extracts a region containing the image of the conference participant who is speaking, and acquires partial image data 683. The video processing unit 22 acquires this partial image data in the r-η coordinate system. Specifically, as shown in Fig. 10(C), the video processing unit 22 uses the speaker direction information to set the coordinates of the four corners of a fan-shaped region containing the image of the relevant participant to (r10, η10), (r10, η20), (r20, η20), and (r20, η10), and acquires the region.
[0084] The video processing unit 22 performs a correction transformation on the acquired partial image data 683. Specifically, each pixel defined in the r-η coordinate system is transformed so that it maps onto a pixel in orthogonal two-dimensional plane coordinates (the X-Y coordinate system). For this purpose, the video processing unit 22 stores in advance a conversion table between the r-η coordinate system and the X-Y coordinate system, calculates the X-Y coordinates from the r-η coordinates of each acquired pixel, and performs the correction transformation. Alternatively, the video processing unit 22 may store a coordinate conversion formula in advance and perform the correction transformation using that formula.
[0085] For example, as shown in Fig. 10(C), the video processing unit 22 converts the partial image data 683, defined by the distance range r10 to r20 and the angular range η10 to η20, into corrected image data 683' defined by x10 to x20 and y10 to y20 in a plane coordinate system whose horizontal direction is the X axis and whose vertical direction is the Y axis. Through this conversion, the person image 644 of the conference participant 604 acquired in the r-η coordinate system is converted into a corrected person image 654 in the X-Y coordinate system (plane coordinate system). Converting to the X-Y coordinate system in this way brings the corrected person image 654 close to the natural appearance of the conference participant 604.
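The fan-to-rectangle conversion can be sketched as a simple resampling loop. Everything named here is hypothetical: `unwrap_sector` is an invented helper, the image is a plain list of rows, sampling is nearest-neighbor, and the orientation choices (inner radius on the top row, η counterclockwise-positive, image rows growing downward) are assumptions that would depend on the actual lens and mounting. It illustrates the idea of mapping a distance range and an angular range onto a rectangle, not the stored table used by the device.

```python
import math


def unwrap_sector(image, center, r_range, eta_range, out_w, out_h):
    """Resample a fan-shaped region of `image` onto an out_w x out_h grid.

    image     : list of rows, indexed as image[y][x]
    center    : (cx, cy) pixel position of the r-eta origin
    r_range   : (inner radius, outer radius) in pixels
    eta_range : (start angle, end angle) in degrees
    """
    cx, cy = center
    r_in, r_out = r_range
    eta_a, eta_b = eta_range
    out = [[0] * out_w for _ in range(out_h)]
    for row in range(out_h):
        # Assumption: the top output row samples the inner radius.
        r = r_in + (r_out - r_in) * row / max(out_h - 1, 1)
        for col in range(out_w):
            eta_deg = eta_a + (eta_b - eta_a) * col / max(out_w - 1, 1)
            eta = math.radians(eta_deg)
            sx = int(round(cx + r * math.cos(eta)))
            sy = int(round(cy - r * math.sin(eta)))  # rows grow downward
            if 0 <= sy < len(image) and 0 <= sx < len(image[0]):
                out[row][col] = image[sy][sx]
    return out
```

A production implementation would more likely interpolate between neighboring source pixels and precompute the source coordinates once per sector, since the sector only changes when the speaker direction changes.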
[0086] The video processing unit 22 attaches time information to the corrected image data containing the corrected document image 670 and to the corrected image data containing the corrected person image 654, and outputs them to the communication terminal 5 as video data. The corrected image data is generated and output continuously, so when the received speaker direction information θ changes, video data is output in which only the corrected image data containing the corrected person image has switched accordingly.
[0087] The communication terminal 5 generates communication data by associating the video data from the video processing unit 22, the collected sound data, and the speaker direction information θ with one another, and transmits it to the destination video conference device via the network 500. As a result, the conference participants seated around the destination video conference device are provided with the document image at the same time as a natural-looking video of the participant who is speaking and that participant's speech.
[0088] In this way, the configuration and processing of this embodiment make it possible to realize, with a relatively simple structure, a video conference device that simultaneously acquires and transmits the video of the participant who is speaking and the document image.
[0089] Although this embodiment shows an example in which the participant video and the document image are acquired and transmitted simultaneously, the document image may instead be acquired only occasionally rather than continuously, and transmitted only at those times. Because the document image does not change except when the document is replaced, the information delivered to the other party is no less than when the document image is transmitted continuously. Meanwhile, while the document image is not being transmitted, the processing load and network load are reduced by the data volume of the document image, so processing and transmission can be performed faster. The document image may be acquired when an acquisition operation is entered from an operation unit after a new document is placed. Alternatively, an image analysis unit may be provided, and a new acquisition may be triggered when the acquired image differs from the previous image.
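The image-analysis trigger described in [0089] could be as simple as comparing successive document frames. The following is a hypothetical sketch: the function name `document_changed`, the grayscale list-of-rows representation, the mean-absolute-difference measure, and the threshold value are all assumptions; the text states only that a new acquisition may be triggered when the image differs from the previous one.

```python
def document_changed(previous, current, threshold=10.0):
    """Return True when the document image should be acquired again.

    previous, current : grayscale images as lists of rows of numbers
    threshold         : mean absolute pixel difference that counts as a change
    """
    if previous is None:
        return True  # nothing sent yet, so the first image always counts
    total = 0.0
    count = 0
    for prev_row, cur_row in zip(previous, current):
        for p, c in zip(prev_row, cur_row):
            total += abs(p - c)
            count += 1
    return count > 0 and total / count > threshold
```

The threshold keeps sensor noise and small lighting changes from retriggering transmission, which preserves the load reduction the paragraph describes.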
[0090] Each of the embodiments above shows an example in which the video processing unit is provided in the camera, but the video processing unit may instead be implemented as a device independent of the camera, or built into the sound emission and collection device or the communication terminal. This gives the camera a simpler structure, so a general-purpose video camera can be used as long as it has a lens capable of capturing the required area described above.
[0091] The above description shows an example in which the communication terminal is provided separately from the sound emitting and collecting device; however, the functions of the communication terminal may be provided in the sound emitting and collecting device. This reduces the number of components of the video conferencing apparatus, so a simpler and more compact video conferencing apparatus can be realized.
[0092] Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit, scope, or intent of the present invention.
The present invention is based on Japanese Patent Application No. 2006-341175, filed on December 19, 2006, the contents of which are incorporated herein by reference.

Claims

[1] A video conferencing apparatus comprising:
an imaging unit that images a predetermined area;
a video data generation unit that generates video data based on the video captured by the imaging unit;
a housing including a sound emitting and collecting unit that collects sound around the apparatus to generate collected sound data and emits sound based on emission sound data;
a communication unit that generates communication data including the collected sound data and the video data, transmits the communication data to the outside, and acquires emission sound data from communication data received from the outside and supplies it to the sound emitting and collecting unit; and
a support unit that supports the imaging unit in a predetermined mode,
wherein the support unit supports the imaging unit in either a first mode, in which the imaging unit is directed toward a conference participant imaging area around the housing, or a second mode, in which the imaging unit is directed toward an area near the housing and close to the imaging unit, and
wherein the video data generation unit,
when selection of the first mode is detected, cuts out from the video data only the azimuth region corresponding to sound collection azimuth information of the collected sound data, and corrects the cut-out video data by a first correction process corresponding to the first mode, and,
when selection of the second mode is detected, cuts out from the video data a predetermined region centered on the front direction of the imaging unit, and corrects the cut-out video data by a second correction process that corresponds to the second mode and differs from the first correction process.
[2] The video conferencing apparatus according to claim 1, wherein the support unit includes a joint mechanism that switches between the first mode and the second mode, the joint mechanism forming a switch, and
the video data generation unit detects which of the first mode and the second mode is selected based on the state of the switch formed by the joint mechanism.
[3] A video conferencing apparatus comprising:
an imaging unit that images a predetermined area;
a video data generation unit that generates video data based on the video captured by the imaging unit;
a housing including a sound emitting and collecting unit that collects sound around the apparatus to generate collected sound data and emits sound based on emission sound data;
a communication unit that generates communication data including the collected sound data and the video data, transmits the communication data to the outside, and acquires emission sound data from communication data received from the outside and supplies it to the sound emitting and collecting unit; and
a support unit that supports the imaging unit in a fixed manner relative to the housing,
wherein the imaging unit simultaneously images a conference participant imaging area and an area near the housing and close to the imaging unit, and
wherein the video data generation unit
cuts out, from first partial video data corresponding to the conference participant imaging area, only the azimuth region corresponding to sound collection azimuth information of the collected sound data, and corrects the cut-out first partial video data by a third correction process, and
corrects second partial video data corresponding to the area close to the imaging unit by a fourth correction process different from the third correction process.
[4] The video conferencing apparatus according to claim 3, further comprising a selection unit that selects the partial video data to be used for the communication data,
wherein the video data generation unit supplies the partial video data selected by the selection unit to the communication unit.
[5] The video conferencing apparatus according to any one of claims 1 to 4, wherein the imaging unit has a fisheye lens, a central region of the area imaged by the fisheye lens is the area close to the imaging unit, and at least a peripheral region outside the central region is the conference participant imaging area.
[6] The video conferencing apparatus according to any one of claims 1 to 5, wherein the video data generation unit is formed integrally with the imaging unit.
[7] The video conferencing apparatus according to any one of claims 1 to 6, wherein the communication unit is formed integrally with the housing together with the sound emitting and collecting unit.
[8] The video conferencing apparatus according to any one of claims 1 to 5 and 7, wherein the video data generation unit is formed integrally with the housing together with the sound emitting and collecting unit.
[9] The video conferencing apparatus according to any one of claims 1 to 8, further comprising a display monitor that reproduces video data,
wherein the communication unit acquires video data included in the communication data and supplies it to the display monitor.
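As a reading aid only, and not as part of the claims, the mode-dependent processing recited in claims 1 and 2 might be sketched as follows. The switch polarity, the region labels, the correction labels, and every identifier (`Mode`, `CropPlan`, `mode_from_switch`, `plan_crop`) are hypothetical; the application does not define the correction processes at this level of detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    CONFEREE = 1  # first mode: camera aimed at the participants around the housing
    DOCUMENT = 2  # second mode: camera aimed at the area close to the camera


@dataclass(frozen=True)
class CropPlan:
    region: str                   # which part of the frame to cut out
    azimuth_deg: Optional[float]  # centre of the cut-out sector, if any
    correction: str               # label of the correction process to apply


def mode_from_switch(switch_closed: bool) -> Mode:
    """Map the joint-mechanism switch state to a mode (polarity is an assumption)."""
    return Mode.DOCUMENT if switch_closed else Mode.CONFEREE


def plan_crop(mode: Mode, sound_azimuth_deg: float) -> CropPlan:
    """Choose the cut-out region and correction process for the detected mode."""
    if mode is Mode.CONFEREE:
        # Cut out only the sector that matches the sound collection azimuth.
        return CropPlan("azimuth_sector", sound_azimuth_deg % 360.0, "first_correction")
    # In the second mode the sound azimuth is ignored: cut out the fixed
    # region centred on the front direction of the imaging unit.
    return CropPlan("front_center", None, "second_correction")
```

The point of the sketch is that the sound collection azimuth drives the cut-out only in the first mode, while the second mode always uses the fixed front-facing region with a different correction.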
PCT/JP2007/074449 2006-12-19 2007-12-19 Video conferencing device WO2008075726A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006341175A JP4862645B2 (en) 2006-12-19 2006-12-19 Video conferencing equipment
JP2006-341175 2006-12-19

Publications (1)

Publication Number Publication Date
WO2008075726A1 true WO2008075726A1 (en) 2008-06-26

Family

ID=39536354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/074449 WO2008075726A1 (en) 2006-12-19 2007-12-19 Video conferencing device

Country Status (3)

Country Link
JP (1) JP4862645B2 (en)
CN (1) CN101518049A (en)
WO (1) WO2008075726A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101969541A (en) * 2010-10-28 2011-02-09 上海杰图软件技术有限公司 Panoramic video communication system and method
JP2013009304A (en) 2011-05-20 2013-01-10 Ricoh Co Ltd Image input device, conference device, image processing control program, and recording medium
CN104932665B (en) * 2014-03-19 2018-07-06 联想(北京)有限公司 A kind of information processing method and a kind of electronic equipment
CN105100677A (en) * 2014-05-21 2015-11-25 华为技术有限公司 Method for presenting video conference, devices for presenting video conference and system for presenting video conference
CN104410778A (en) * 2014-10-09 2015-03-11 深圳市金立通信设备有限公司 Terminal
CN104320729A (en) * 2014-10-09 2015-01-28 深圳市金立通信设备有限公司 Pickup method
JP6450604B2 (en) * 2015-01-28 2019-01-09 オリンパス株式会社 Image acquisition apparatus and image acquisition method
CN105163024A (en) * 2015-08-27 2015-12-16 华为技术有限公司 Method for obtaining target image and target tracking device
CN106791538B (en) * 2016-12-25 2019-08-27 重庆警蜂科技有限公司 Digital display circuit for circuit court
CN107066039B (en) * 2016-12-25 2020-02-18 重庆警蜂科技有限公司 Portable multifunctional digital court trial terminal for patrol
CN108200515B (en) * 2017-12-29 2021-01-22 苏州科达科技股份有限公司 Multi-beam conference pickup system and method
JP7135360B2 (en) 2018-03-23 2022-09-13 ヤマハ株式会社 Light-emitting display switch and sound collecting device
CN113923305B (en) * 2021-12-14 2022-06-21 荣耀终端有限公司 Multi-screen cooperative communication method, system, terminal and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5436654A (en) * 1994-02-07 1995-07-25 Sony Electronics, Inc. Lens tilt mechanism for video teleconferencing unit
JPH07327217A (en) * 1994-06-02 1995-12-12 Canon Inc Picture input device
JPH11331827A (en) * 1998-05-12 1999-11-30 Fujitsu Ltd Television camera

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005311619A (en) * 2004-04-20 2005-11-04 Yakichiro Sakai Communication system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012039195A (en) * 2010-08-03 2012-02-23 Kokuyo Co Ltd Table system for television conference
CN104580992A (en) * 2014-12-31 2015-04-29 广东欧珀移动通信有限公司 Control method and mobile terminal
CN104967777A (en) * 2015-06-11 2015-10-07 广东欧珀移动通信有限公司 Method for controlling camera to carry out photographing, and terminal
CN104967777B (en) * 2015-06-11 2018-03-27 广东欧珀移动通信有限公司 One kind control camera image pickup method and terminal

Also Published As

Publication number Publication date
JP2008154055A (en) 2008-07-03
JP4862645B2 (en) 2012-01-25
CN101518049A (en) 2009-08-26

Similar Documents

Publication Publication Date Title
WO2008075726A1 (en) Video conferencing device
US7852369B2 (en) Integrated design for omni-directional camera and microphone array
CN109218651B (en) Optimal view selection method in video conference
US5612733A (en) Optics orienting arrangement for videoconferencing system
JP2007228070A (en) Video conference apparatus
JP6551155B2 (en) Communication system, communication apparatus, communication method and program
JP3798799B2 (en) Video phone equipment
US20040008423A1 (en) Visual teleconferencing apparatus
EP1513345A1 (en) Communication apparatus and conference apparatus
JP2017034502A (en) Communication equipment, communication method, program, and communication system
JP2008288785A (en) Video conference apparatus
WO2001011881A1 (en) Videophone device
JP2007274463A (en) Remote conference apparatus
JP2006121709A (en) Ceiling microphone assembly
WO2008047804A1 (en) Voice conference device and voice conference system
JP4411959B2 (en) Audio collection / video imaging equipment
JP2007274462A (en) Video conference apparatus and video conference system
US7940923B2 (en) Speakerphone with a novel loudspeaker placement
JP2009171486A (en) Video conference system
JP2005151471A (en) Voice collection/video image pickup apparatus and image pickup condition determination method
JP2008005346A (en) Sound reflecting device
KR20100006029A (en) A remote video conference system
JP2002107805A (en) Portable device with camera
CN213213667U (en) Interactive conference device based on visual and sound fusion
CN213213666U (en) Video and audio communication equipment

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780034288.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07850919

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07850919

Country of ref document: EP

Kind code of ref document: A1