WO2015198964A1 - Imaging device provided with audio input/output function and videoconferencing system - Google Patents

Imaging device provided with audio input/output function and videoconferencing system

Info

Publication number
WO2015198964A1
Authority
WO
WIPO (PCT)
Prior art keywords
output function
speaker
omnidirectional camera
image data
voice input
Prior art date
Application number
PCT/JP2015/067628
Other languages
French (fr)
Japanese (ja)
Inventor
大坪 宏安
Original Assignee
日立マクセル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日立マクセル株式会社
Publication of WO2015198964A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • the present invention relates to an imaging apparatus with a voice input / output function and a video conference system.
  • a conference room has a voice input / output device including a microphone and a speaker arranged on a table, a display arranged near the table, and a video camera (for example, recording) arranged near the display.
  • a video conference system is provided in which a so-called video conference using an image and sound is possible between another conference room and a remote conference room.
  • the angle of view of the TV camera is often adjusted so that all participants in the conference fall within the shooting range.
  • the seating position of the participant may be limited, or it may be difficult to keep all the participants within the shooting range.
  • It may take some time to adjust the angle of view, zoom, etc. of the TV camera before the conference, so that there is a delay between all the participants gathering and the conference starting.
  • If the main speaker is determined in advance, measures can be taken such as seating that person as close as possible to the center of the shooting range of the TV camera. However, when it is not known which participant will speak, there is a problem that the speaker may be near the edge of the shooting range and cannot be seen well.
  • A proposal has been made in which a plurality of microphones for voice input and a plurality of wide-angle cameras are provided, the position of the speaker is identified from the audio signals of the microphones and the image data of the wide-angle cameras, and, based on that position, a microphone is controlled so as to mainly input the voice uttered by the speaker and a camera is controlled so as to mainly photograph the speaker (see Patent Document 1).
  • A PTZ camera is a camera capable of pan (P), which swings the camera left and right, tilt (T), which swings the camera up and down, and zoom (Z), which enlarges the image.
  • With a PTZ camera, the camera direction and zoom can be controlled. Further, in a system in which the position of the speaker can be specified as described above, the PTZ camera can be automatically directed to the speaker.
  • In Patent Document 1, the position of the speaker is specified using a plurality of microphones and cameras, the camera is controlled so that the speaker is mainly photographed based on the specified position, and the microphone is controlled so that the voice of the speaker is mainly input. Patent Document 1 therefore requires a plurality of microphones and cameras as well as a control device that controls them, which increases the cost of the conference system.
  • In such a system it may be necessary to specify the position of the speaker, to control the camera so as to image the specified speaker, and to control the microphone so as to extract the speaker's voice signal.
  • The present invention has been made in view of the above circumstances, and its object is to provide an imaging device with a voice input / output function for a video conference that can be manufactured at low cost, and a video conference system having the imaging device with a voice input / output function.
  • An imaging apparatus with a voice input / output function includes: an omnidirectional camera that captures the surroundings; an audio output device that is provided in the vicinity of the omnidirectional camera and outputs an externally input audio signal as sound; and an audio input device that is provided in the vicinity of the omnidirectional camera and inputs ambient sound as an audio signal. The image data captured by the omnidirectional camera and the audio signal input by the audio input device are output.
  • When the imaging device with a voice input / output function is used as the imaging device of a conference system, with a speaker as the audio output device and a microphone as the audio input device, all the participants can be photographed with the omnidirectional camera by placing the device on a table and having the participants sit around the table. In this case, each participant sitting around the table looks either at the imaging device with a voice input / output function on the table or at a display on which the other venues of the video conference are shown.
  • A person who is speaking basically speaks toward the microphone serving as the audio input device, and is also likely to face the loudspeaker from which the voices of other participants are output.
  • When a sound source is in front of the face, the sound is easier to hear, and people often look toward the sound source so as to hear it better. That is, when conference participants sit around the omnidirectional camera on the table, at least the person speaking faces the microphone or loudspeaker and therefore faces the omnidirectional camera in their vicinity. The omnidirectional camera thus photographs that participant from the front, and in the image data the speaker appears to be speaking to the participants at the other conference room who are watching the image data.
  • the omnidirectional camera is arranged in a table and takes pictures of participants sitting around the table, the distance from the participants is short, and the difference in distance between the participants is small. Therefore, even if it is not an omnidirectional camera having a high resolution, the participants can be sufficiently photographed, and the cost can be reduced as compared with the case of using a high resolution omnidirectional camera.
  • the omnidirectional camera includes, for example, a fisheye camera using a fisheye lens, a camera using a mirror having a shape close to a conical shape, and an omnidirectional camera.
  • the audio output device is, for example, a speaker.
  • the voice input device is a microphone, for example.
  • Preferably, a plurality of displays for displaying image data input from the outside are provided in the vicinity of the omnidirectional camera, at positions that do not interfere with imaging of the surroundings by the omnidirectional camera, so as to be visible from a plurality of directions.
  • In this configuration, the participants of the conference basically face the display on which the participants at other venues are shown, the speaker from which the speech of those participants is output, and the microphone through which they talk to those participants. Because the display, microphone, and speaker are close together, most participants naturally face the omnidirectional camera, and the participants are displayed as facing the participants at the other venues.
  • Preferably, the voice input device includes at least three microphones each facing a different surrounding direction, and the apparatus further includes a sound source direction recognition device that identifies the direction of the sound source from the volume of the sound input to each microphone, and an image processing device that converts the omnidirectional image data captured by the omnidirectional camera into image data centered on the direction of the sound source identified by the sound source direction recognition device.
  • a speaker who is speaking is identified from among the participants, and a panoramic image with the speaker as a substantially central left and right is displayed on the display of another conference hall, or the speaker is extracted. It is possible to display the image of the state on the display of another venue.
  • The direction of a speaker as a sound source can be identified relatively easily by comparing the volumes of the microphones, even if they are not highly directional microphones.
  • Because specifying the direction of the sound source is sufficient to locate the speaker on the omnidirectional image, it is not necessary to specify the position of the sound source, and therefore not necessary to use a microphone array for that purpose. In addition, since microphones with high directivity are not required, the cost can be reduced. Furthermore, when creating image data centered on the speaker (main subject) identified by the microphones, image data mainly showing the speaker can be created easily by specifying the direction on the omnidirectional image data.
  • Preferably, the apparatus includes an image recognition device that recognizes the faces of the persons being imaged in the image data captured by the omnidirectional camera and identifies, from the movement of the mouth of each recognized face, the person who is speaking, and an image processing device that converts the omnidirectional image data captured by the omnidirectional camera into image data centered on the person identified as speaking by the image recognition device.
  • In this case, as when the speaker is identified by voice, it is sufficient to specify the direction of the speaker in order to create image data mainly showing the speaker. There is no need to specify the position of the speaker, and the cost can be reduced.
  • As when the direction of the speaker is specified by voice, image data mainly showing the identified speaker can be created easily by specifying the direction on the omnidirectional image data.
  • The video conference system includes a plurality of imaging devices with audio input / output functions according to the present invention, and each of the imaging devices includes a communication device that outputs the image data and the audio signal to another imaging device with an audio input / output function and inputs the image data and the audio signal output from that other imaging device.
  • the video conference system of the present invention can achieve the above-described operational effects of each imaging apparatus with a voice input / output function.
  • An imaging device with a voice input / output function may be configured without a display. In that case, image data captured by another imaging device with a voice input / output function is input, and that image data can be output to an external display.
  • The imaging device with audio input / output function and the video conference system of the present invention can be manufactured at low cost, and when a speaker is shown on the display, the speaker appears to be facing the person looking at the display, which makes the conference feel more natural.
  • FIG. 4 is a diagram for explaining images output from an imaging apparatus with a voice input / output function, in which (a) illustrates an outline of a panoramic image converted from an omnidirectional image, (b) illustrates the panoramic image converted from the omnidirectional image divided into two rows, (c) illustrates a display to which an image of the speaker is added, and (d) illustrates omnidirectional images taken at three different locations.
  • The video conference system uses a plurality of the imaging devices 1 with audio input / output functions shown in FIGS. 1A and 1B, and is constructed by arranging an imaging device 1 with an audio input / output function in each of a plurality of remote conference rooms.
  • An imaging apparatus 1 with a voice input / output function shown in FIG. 1 includes a substantially disc-shaped base plate 2, a substantially dome-shaped cover 3 that covers the base plate 2, and an outer peripheral portion of the base plate 2 along the circumferential direction.
  • The microphones 5, the speaker 6, and the omnidirectional camera 7 are arranged close to each other. Further, the speaker 6 and the omnidirectional camera 7 are arranged so that their central axes substantially coincide, and the microphones 5 are arranged at positions substantially equidistant from that central axis.
  • the base plate 2 is provided with an attachment structure for attaching the microphone 5, the speaker 6 and the control board 4 on the upper surface thereof.
  • An attachment structure for attaching a circular lower edge portion (outer peripheral edge portion) of the cover 3 having the same diameter as the base plate 2 is provided on the outer peripheral portion of the disc-shaped base plate 2.
  • the cover 3 is provided with one or a plurality of holes (not shown) at positions corresponding to the microphone 5 so as not to interfere with voice input to the microphone 5.
  • the dome-shaped cover 3 is provided with an opening 3a for outputting sound from the speaker 6 at the upper part (central part).
  • the opening 3 a of the cover 3 is provided with a bridge-like camera fixing portion 3 b for fixing the omnidirectional camera 7 to the central portion of the upper portion of the cover 3.
  • the microphone 5 has, for example, directivity, and the direction with the highest sensitivity is matched with the radial direction orthogonal to the central axis of the hemispherical surface or cylindrical surface of the omnidirectional camera 7, for example.
  • the microphones 5 are arranged at equal positions in the radial direction with respect to the center axis of the photographing range and at positions shifted by 90 degrees (equal intervals in the circumferential direction).
  • an omnidirectional microphone 5 may be used as the microphone 5.
  • Each microphone 5 is connected to the control board 4 and converts sound into an audio signal and inputs it to the control board 4.
  • the audio signal may be analog or digital.
  • the speaker 6 is of an omnidirectional type, and a single speaker 6 outputs audio in almost all directions. A plurality of non-omnidirectional speakers such as three or four may be used.
  • the speaker 6 is connected to the control board 4 and converts a sound signal output from the control board 4 into sound and outputs the sound to the surroundings.
  • the omnidirectional camera 7 is, for example, a fisheye camera having a hemispherical imaging range, and the surrounding area is an imaging target.
  • The omnidirectional camera 7 may instead be one in which the omnidirectional image data F (shown in FIG. 2) is obtained from images captured by a plurality of cameras, a camera that captures the surroundings through a substantially conical mirror, or an omnidirectional camera.
  • the omnidirectional camera 7 only needs to be able to image a participant as a subject sitting around the table T from the imaging apparatus 1 with a voice input / output function placed on the table T. For example, upward image data is not required.
  • the arrangement position of the omnidirectional camera 7 is high, for example, when it has a height higher than the head of the participant who sits down, it becomes impossible to photograph the bust of the participant in the hemispherical photographing range.
  • an omnidirectional camera can be preferably used.
  • the control board 4 specifies the direction of the sound source from the volume level (sound volume) of the audio signals input from the four microphones 5.
  • Here, the position of the sound source (that is, both its direction and its distance) is not specified; only the direction of the sound source is determined from the volume levels of the four microphones 5. For example, the two adjacent microphones 5 with the highest volume levels are identified, and the direction between these two microphones 5 is determined from the difference in volume between them.
  • Alternatively, the sound source is taken to be in the direction in which the microphone 5 with the highest volume faces.
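The volume-based direction estimate described above can be sketched as follows. This is an illustrative sketch only, not the implementation disclosed in the source: the function name `estimate_direction`, the microphone angles of 0, 90, 180, and 270 degrees, and the linear interpolation between the two loudest adjacent microphones are all assumptions made for the example.

```python
def estimate_direction(levels, mic_angles=(0.0, 90.0, 180.0, 270.0)):
    """Estimate the sound-source azimuth (degrees) from per-microphone volume levels.

    Hypothetical helper: interpolates linearly between the loudest microphone
    and its louder circular neighbour (an assumption, not the source's formula).
    """
    n = len(levels)
    # Index of the loudest microphone.
    first = max(range(n), key=lambda i: levels[i])
    # The two microphones adjacent to it on the circle; keep the louder one.
    left, right = (first - 1) % n, (first + 1) % n
    second = left if levels[left] > levels[right] else right
    total = levels[first] + levels[second]
    if total <= 0:
        return None  # silence: no direction can be estimated
    # Weight 0 means "exactly at the loudest microphone", 0.5 means "midway".
    weight = levels[second] / total
    # Signed shortest angular step from the loudest microphone to its neighbour.
    step = (mic_angles[second] - mic_angles[first] + 180.0) % 360.0 - 180.0
    return (mic_angles[first] + weight * step) % 360.0
```

With equal levels on two adjacent microphones the estimate falls midway between them; with sound on a single microphone it falls on the direction that microphone faces.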
  • the sound source may be specified from the phase shift of the sound in each microphone 5. That is, a well-known method for specifying the direction of the sound source based on the difference in sound arrival time in each microphone 5 due to the difference in distance from the sound source may be used.
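The arrival-time approach mentioned above can be illustrated for a single pair of microphones. This is a minimal sketch under stated assumptions: a far-field (plane-wave) source, a speed of sound of 343 m/s, and a hypothetical function name `bearing_from_delay`; the source only refers to a well-known method and does not give a formula.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed value)


def bearing_from_delay(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Angle of arrival in degrees, measured from the broadside of a two-microphone pair.

    delay_s is the arrival-time difference between the two microphones.
    Far-field assumption: sin(theta) = speed * delay / spacing.
    """
    ratio = speed * delay_s / mic_spacing_m
    # Clamp to the valid range of asin to absorb measurement noise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly broadside to the pair, and the maximum possible delay (spacing divided by the speed of sound) corresponds to a source on the axis through both microphones.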
  • the control board 4 as an image recognition device is adapted to specify the direction of the speaker from the omnidirectional image data F input from the omnidirectional camera 7. Basically, the direction of each participant is specified by recognizing the face of each participant (imaged person) from the omnidirectional image data F by well-known face recognition. Further, each participant's mouth is image-recognized, it is determined whether or not the mouth (lips) is moving, and the direction of the face determined that the mouth is moving is set as the direction of the speaker.
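The step from a recognized face to a speaker direction can be sketched as below. This is a hedged illustration, not the disclosed implementation: it assumes an upward-facing fisheye image in which azimuth corresponds to the angle around the image center, it assumes that face detection and mouth-movement detection have already produced a list of `(x, y, mouth_moving)` tuples, and the function names are hypothetical.

```python
import math


def azimuth_of_pixel(x, y, cx, cy):
    """Azimuth in degrees [0, 360) of pixel (x, y) around the image centre (cx, cy)."""
    return math.degrees(math.atan2(y - cy, x - cx)) % 360.0


def speaker_direction(faces, cx, cy):
    """Return the azimuth of the first face whose mouth is moving, or None.

    faces: list of (x, y, mouth_moving) tuples, one per recognized face
    (a hypothetical output format for the recognition stage).
    """
    for x, y, mouth_moving in faces:
        if mouth_moving:
            return azimuth_of_pixel(x, y, cx, cy)
    return None
```

Choosing the first moving mouth is a simplification; a real system would need a rule for several participants speaking at once.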
  • Image processing and image recognition can be implemented easily using OpenCV (Intel Open Source Computer Vision Library).
  • An object detection program provided in OpenCV can be used.
  • Image recognition has a learning phase and a recognition phase.
  • image recognition such as face recognition becomes possible.
  • Haar-like feature values are used as image feature values, and an algorithm called AdaBoost is used as the learning algorithm.
  • By causing the object detection program to perform machine learning based on the feature points, the program becomes able to recognize a face image as a face.
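To make the notion of a Haar-like feature value concrete, the following sketch computes a two-rectangle feature with an integral image, which is the standard way such features are evaluated quickly. It is written in plain Python rather than with OpenCV so that it stays self-contained; the function names are hypothetical, and it does not reproduce the OpenCV object detection program or the AdaBoost training step.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    img is a list of rows of pixel intensities.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half of the window."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)
```

A strongly negative value indicates a window whose lower half is brighter than its upper half; a boosted classifier combines many such weak cues.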
  • OpenCV need not be used for the image recognition program; an existing program or a chip equipped with an existing image recognition circuit may be used instead.
  • The movement of the speaker's mouth can also be recognized by applying machine learning with the above-mentioned OpenCV object detection program, for example so that it recognizes the difference between a speaking mouth and a silent mouth.
  • the control board 4 specifies the direction of the sound source as a speaker even by voice.
  • The direction of the speaker is determined from both sound source direction recognition and image recognition. When the direction of the sound source obtained by sound source direction recognition and the direction of the speaker obtained by image recognition match within a predetermined angle range (for example, within 0 to 10 degrees), the direction obtained by image recognition is used as the speaker direction.
  • When the direction of the sound source by sound source direction recognition and the direction of the speaker by image recognition do not match within the predetermined angle range, it is determined that there is no speaker.
  • This prevents a situation in which participants who talk privately, participants who yawn, participants who make loud noises when moving a chair, and the like are recognized as speakers, even temporarily, and are displayed large on the display 8 at another venue.
  • the direction of the speaker may be determined only by sound source direction recognition, or the direction of the speaker may be determined only by image recognition.
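The agreement check between the two direction estimates can be sketched as follows. The 10 degree tolerance is the example value given in the text; the function names and the use of `None` for "no speaker" are assumptions made for this illustration.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def fuse_speaker_direction(audio_deg, image_deg, tolerance_deg=10.0):
    """Return the image-based direction if both estimates agree, else None.

    audio_deg: direction from sound source direction recognition (or None).
    image_deg: direction from mouth-movement image recognition (or None).
    """
    if audio_deg is None or image_deg is None:
        return None
    if angular_difference(audio_deg, image_deg) <= tolerance_deg:
        return image_deg
    return None  # the estimates disagree: treat as "no speaker"
```

The wrap-around handling matters here: estimates of 355 and 2 degrees differ by only 7 degrees and should be accepted as a match.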
  • control board 4 functions as an image processing device that converts the omnidirectional image data F input from the omnidirectional camera 7 into a panoramic image by known image processing.
  • the panoramic image data is created from the omnidirectional image data F by determining the positions at the right and left ends of the panoramic image from the omnidirectional image data F.
  • When a speaker has been identified, the omnidirectional image data F is cut open at a position 180 degrees from the direction of the speaker, that is, in the direction opposite to the speaker, and that position is used as the right end and the left end of the panoramic image.
  • the interval of each participant whose face is recognized as described above is determined, and the center of the widest interval is set as the position of the left end and the right end of the panoramic image.
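The two rules above for choosing where to cut the omnidirectional image can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, and the mapping from panorama column to azimuth assumes a panorama that spans exactly 360 degrees across its width.

```python
def cut_opposite_speaker(speaker_deg):
    """Cut position 180 degrees away from the speaker, so the speaker lands at the centre."""
    return (speaker_deg + 180.0) % 360.0


def cut_in_widest_gap(participant_degs):
    """Cut position at the centre of the widest angular gap between participants."""
    angles = sorted(a % 360.0 for a in participant_degs)
    if not angles:
        return 0.0  # nobody detected: any cut position will do
    best_gap, best_start = -1.0, 0.0
    for i, start in enumerate(angles):
        nxt = angles[(i + 1) % len(angles)]
        # A single participant leaves the full circle as the gap.
        gap = 360.0 if len(angles) == 1 else (nxt - start) % 360.0
        if gap > best_gap:
            best_gap, best_start = gap, start
    return (best_start + best_gap / 2.0) % 360.0


def column_to_azimuth(col, width, cut_deg):
    """Azimuth shown at panorama column col when the image is cut open at cut_deg."""
    return (cut_deg + 360.0 * col / width) % 360.0
```

For a speaker at 30 degrees the cut falls at 210 degrees, and the centre column of the resulting panorama then shows the speaker's azimuth.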
  • When the direction of the speaker has been specified, the control board 4 creates image data mainly showing the participant recognized in that direction, as the image data of the speaker.
  • The image portion of the face-recognized participant may be extracted and used as the image data, or the image portion within a predetermined angle range around the direction of the specified speaker may be used as the speaker's image data.
  • The control board 4, as a communication device, performs data communication with another imaging device 1 with a voice input / output function using a local area network (LAN), the Internet, a public telephone line network, a mobile phone line network, a dedicated communication line, etc.
  • Through this communication, the audio signal input from the microphones 5, and the panoramic image data and speaker image data obtained by applying the image processing described above to the omnidirectional image data F captured by the omnidirectional camera 7, are transmitted to the other imaging apparatus 1 with a voice input / output function.
  • Since the imaging apparatus 1 with the voice input / output function does not have the display 8, the received image data is output to a connection terminal for the display 8 and is displayed on the display 8 connected to that terminal. As will be described later, when the imaging apparatus with the voice input / output function includes the display 8, the received image data may be output to that display 8.
  • In the above description, the control board 4 performs sound source direction recognition, image recognition, image processing, and the like. However, the control board 4 may mainly control the input / output of audio signals and image data, and sound source direction recognition, image recognition, and image processing may be performed by a personal computer (PC, illustrated in FIG. 2) connected to the control board 4 by wired LAN, wireless LAN, USB, or the like. In addition, the various image processing need not be performed by the imaging apparatus 1 whose omnidirectional camera 7 captured the omnidirectional image; it may be carried out by the receiving imaging device 1 or a personal computer PC connected to it. That is, the omnidirectional image data F taken by the omnidirectional camera 7 may be transmitted as it is, and the receiving imaging apparatus 1 with a voice input / output function may process the image and display it on the display 8.
  • The imaging device 1 with a voice input / output function of such a video conference system is used by being placed on a table T in a conference room, for example, as shown in FIG. 2.
  • the conference participant P sits around the table T.
  • the participants P sit in two rows on the two long sides of the rectangular table T, respectively.
  • the personal computer PC is used as described above, and the display 8 is connected via the personal computer PC, and image data processed by the personal computer PC is displayed on the display 8.
  • the omnidirectional image data F captured by the omnidirectional camera 7 in the state shown in FIG. 2 is in the state shown in FIG.
  • the three-dimensional omnidirectional image data F is shown in a simplified manner in a state projected onto a plane.
  • The control board 4 performs image processing on this omnidirectional image data F to produce either the single panoramic image G1 or the two divided panoramic images G2 and G3, which are displayed on the display 8 as shown in FIG. 4A or 4B.
  • The interval between the participants P in the omnidirectional image data F is determined, and if there is an interval greater than a predetermined interval (angle), the panoramic image G1 is separated at that interval into the panoramic images G2 and G3.
  • The left and right widths of the panoramic images G2 and G3 are compressed by cutting out the interval at the separated part. Note that when creating the panoramic images G1, G2, and G3, all the intervals between the participants P may be cut.
  • the panoramic image may be created by creating image data of each participant P with a predetermined width (predetermined angle range) and arranging the data side by side. Also in this case, the interval between the participants P can be prevented from being displayed.
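The gap-cutting idea described above can be sketched as a search for angular ranges that contain no participant. This is an illustrative sketch only: the function name `gaps_to_cut`, the 60 degree threshold, and the 15 degree margin kept on each side of a participant are assumptions, not values from the source, and the threshold is assumed to be larger than twice the margin.

```python
def gaps_to_cut(participant_degs, min_gap_deg=60.0, margin_deg=15.0):
    """Return (start, end) azimuth ranges that can be removed from the panorama.

    A range is returned for every gap between neighbouring participants that is
    at least min_gap_deg wide; margin_deg is kept on both sides so that the
    participants themselves are not cropped.
    """
    angles = sorted(a % 360.0 for a in participant_degs)
    cuts = []
    n = len(angles)
    if n < 2:
        return cuts
    for i in range(n):
        start = angles[i]
        gap = (angles[(i + 1) % n] - start) % 360.0
        if gap >= min_gap_deg:
            cuts.append(((start + margin_deg) % 360.0,
                         (start + gap - margin_deg) % 360.0))
    return cuts
```

For two rows of participants facing each other across a rectangular table, this yields two removable ranges at the short ends of the table, which corresponds to splitting the panorama into two narrower images.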
  • the panoramic images G2 and G3 are displayed in a large size by displaying the image data separated into two in the upper and lower stages.
  • When the speaker is specified, as shown in FIG. 4C, an image G10 mainly showing the speaker is displayed separately, in addition to the panoramic image G1 shown in FIG. 4A.
  • the video conference is not necessarily held in only two places and may be held in three or more places, in that case, for example, as shown in FIG. And panoramic images G1, G4, and G5 are displayed at the respective divided portions.
  • In FIG. 4D, a video conference is performed by connecting four places, and images of the three conference rooms other than the conference room with the display 8 are displayed.
  • In such a video conference system, the control boards 4, as communication devices of the imaging devices 1 with the voice input / output function installed in the conference rooms, communicate as described above, so that in each conference room the images of the participants in the other conference rooms are displayed on the display 8 as described above, and the audio signals input in the other conference rooms are output from the speaker 6.
  • The omnidirectional camera 7, the microphone 5, and the speaker 6 are substantially integrated as described above, and a participant who speaks (the speaker) basically tries to speak into the microphone 5.
  • the speaker since there is the omnidirectional camera 7 in the vicinity of the microphone 5, the speaker is in a state of speaking toward the omnidirectional camera 7, and the speaker is in a state of being photographed from the front.
  • When the speaker's image G10 is displayed on the display 8, there is a high possibility that the speaker appears to be talking to the participants in the other conference room who are looking at the display 8.
  • Because the voice of a speaker in another conference room is heard from the speaker 6 near the omnidirectional camera 7, participants tend to face the speaker 6 so that the sound can be heard easily.
  • The configuration thus leads the speaker to speak toward the omnidirectional camera 7, and it is inexpensive. Therefore, as described above, it is easy to obtain an image in which a speaker is speaking toward the participants in another conference room. For these reasons, it is possible to suppress the sense of incongruity peculiar to a video conference that arises when a speaker on the screen of the display 8 is speaking in a direction other than toward the camera. In other words, the speaker is led to face the omnidirectional camera 7 naturally, without making a conscious effort to face the camera.
  • the above-described omnidirectional camera 7 can be used without any particular control. Thus, if the participant who speaks is specified, the image of the speaker can be easily obtained.
  • the imaging apparatus 1a with the voice input / output function of the second embodiment is similar to the imaging apparatus 1 with the voice input / output function of the first embodiment.
  • a base plate 11, a cover 12, a control board (not shown) (control board 4 in FIG. 1), a microphone 5, a speaker 6, and an omnidirectional camera 7 are provided.
  • The imaging apparatus 1a with a voice input / output function of the second embodiment further includes a display 8. That is, the difference between the imaging apparatus 1 with a voice input / output function of the first embodiment and the imaging apparatus 1a with a voice input / output function of the second embodiment is whether the display 8 is separate from the imaging device 1 or is provided in the imaging device 1a.
  • the base plate 11 is formed in a rectangular plate shape, and the microphone 5 is provided at each of the four corners.
  • a display for example, a liquid crystal display
  • a control board and a speaker 6 are disposed between the two displays 8 on the base plate 11.
  • the cover 12 is formed in a rectangular parallelepiped shape corresponding to the rectangular base plate 11 and is attached so as to cover the base plate 11.
  • a window portion 12a that allows the display screen of the display 8 to be visually recognized from the outside is provided.
  • the top plate of the cover 12 is provided with an opening 12 b corresponding to the speaker 6.
  • a camera fixing portion 12c is provided in a bridge shape at the opening 12b portion of the cover 12, and the omnidirectional camera 7 is attached to the camera fixing portion 12c.
  • One or a plurality of holes may be provided at a position corresponding to the microphone 5 of the cover 12.
  • The imaging apparatus 1a with a voice input / output function includes two displays 8 arranged facing in opposite directions, so that it can be used preferably when the participants P sit side by side along the two parallel side edges of the table T. Further, a display 8 having a relatively small screen of about 7 inches to 15 inches, for example, is used, so that when the apparatus is placed on the table T it does not block the lines of sight of participants sitting facing each other. This also reduces the cost of the display 8.
  • the imaging apparatus with a voice input / output function 1a of the second embodiment it is possible to obtain substantially the same operational effects as the imaging apparatus 1 with a voice input / output function of the first embodiment.
  • the display 8 is provided in the vicinity of the omnidirectional camera 7, and when the participant is seated around the table T as described above, the head is turned to the front without being inclined. The display 8 can be seen without difficulty.
  • When the participant P faces the display 8, the participant is also looking toward the omnidirectional camera 7 in the vicinity of the display 8, so image data can be obtained in which the participant appears to be looking at the participants at the other venues. That is, in the first embodiment, the structure mainly urges the speaker to speak while looking at the omnidirectional camera 7, but because the display 8 is in a different place from the omnidirectional camera 7, participants other than the speaker may look at the display 8 rather than at the omnidirectional camera 7, and it was difficult to prevent them from being imaged while facing away from the camera.
  • The display 8 is arranged in the vicinity of the omnidirectional camera 7, so when a participant looks at the display 8, the participant's face is directed toward the omnidirectional camera 7.
  • To view the display 8, the person speaking need not turn the face away from the microphone 5, the speaker 6, and the omnidirectional camera 7, so the face remains directed toward the omnidirectional camera 7 while speaking.
  • The imaging apparatus 1b with a voice input/output function of the third embodiment is similar to the imaging apparatus 1 with a voice input/output function of the first embodiment.
  • A base plate 21, a cover 22, a control board (control board 4 in FIG. 1), microphones 5, a speaker 6, and an omnidirectional camera 7 are provided.
  • The imaging apparatus 1b with a voice input/output function of the third embodiment includes displays 8, as in the second embodiment.
  • The base plate 21 is formed in a triangular plate shape, and a microphone 5 is provided at each of its three corners.
  • A display 8 is attached to each of the three side edges of the base plate 21 with its display screen facing outward. Further, a control board and a speaker 6 (not shown) are arranged on the base plate 21 inside the three displays 8.
  • The cover 22 is formed in a triangular prism shape corresponding to the triangular base plate 21 and is attached so as to cover the base plate 21. Each of the three side surfaces of the cover 22 has, at a position corresponding to a display 8, a window portion 22a through which the display screen of the display 8 can be seen from the outside. The top plate of the cover 22 has an opening 22b corresponding to the speaker 6. A Y-shaped bridge forming a camera fixing portion 22c spans the opening 22b of the cover 22, and the omnidirectional camera 7 is attached to the camera fixing portion 22c. One or more holes may be provided in the cover 22 at positions corresponding to the microphones 5.
  • Except for the number of displays 8 and microphones 5 and whether the planar shape is a square or a triangle, the imaging apparatus 1b with a voice input/output function of the third embodiment is basically the same as that of the second embodiment.
  • That is, it has substantially the same structure as the imaging apparatus 1a with a voice input/output function and exhibits the same effects.
  • An imaging apparatus with a voice input/output function having four displays 8 may also be obtained by providing a display 8 on every side surface of the cover 12 in the shape of the second embodiment.
  • The imaging apparatus 1c with a voice input/output function of the fourth embodiment is similar to the imaging apparatus 1 with a voice input/output function of the first embodiment.
  • A control board (control board 4 in FIG. 1), a microphone 5a, a speaker 6a, and an omnidirectional camera 7a are provided.
  • The imaging apparatus 1c with a voice input/output function of the fourth embodiment includes a display 8a, as in the second and third embodiments.
  • The imaging apparatus 1c with a voice input/output function may use, for example, a display 8a of about 20 to 32 inches (or larger) that houses the control board and the microphone 5a.
  • The speaker 6a is also incorporated, and an omnidirectional camera 7a is attached to the center of the upper surface of the display 8a. In other words, the control board and the omnidirectional camera 7a are added to a display, such as one for a personal computer, that incorporates a speaker and a microphone.
  • The display 8a may be connected to a personal computer PC, and the personal computer PC may provide the functions of the control board other than data input/output. In this case, the display 8a can be connected to the personal computer PC in the same manner as a display of a type that includes a speaker, a microphone, and a camera.
  • The display 8a has display screens 14a and 14b on its front and back surfaces, respectively, so participants sitting facing each other view different display screens 14a and 14b.
  • Alternatively, the display screen 14b on the back side may be omitted and a plurality of imaging apparatuses 1c with a voice input/output function used instead.
  • A plurality of omnidirectional cameras 7a is not necessarily required; a type having an omnidirectional camera 7a may therefore be used in combination with a type having no omnidirectional camera 7a.
  • The position of the omnidirectional camera 7a may be too high relative to the participants, so that a hemispherical shooting range captures only the upper part of each participant's bust, or only the upper part of the face. It is therefore preferable that the omnidirectional camera 7a have a shooting range wider than a hemisphere and close to a full sphere.
  • With the imaging apparatus 1c with a voice input/output function of the fourth embodiment, substantially the same operational effects as those of the first and second embodiments can be obtained.
  • Reference signs: imaging apparatus with voice input/output function; 4: control board (image recognition device, image processing device, sound source direction recognition device, communication device); 5, 5a: microphone (voice input device); 6, 6a: speaker (audio output device); 7, 7a: omnidirectional camera; 8, 8a: display

Abstract

Provided are an imaging device provided with an audio input/output function for videoconferencing, and a videoconferencing system, which are inexpensive to produce. An imaging device (1) provided with an audio input/output function that is used in a videoconferencing system is provided with microphones (5), a speaker (6) and an omnidirectional camera (7), and the microphones (5), the speaker (6) and the omnidirectional camera (7) are arranged at positions that are close to one another. The imaging device (1) provided with an audio input/output function is used while placed on a table around which participants participating in a videoconference sit. A person speaking at the conference speaks towards the microphones (5) and is highly likely to face the speaker in order to hear the sound of the speaker (6). Thus, the circumstances are such that the person speaking is imaged from the front by the omnidirectional camera (7).

Description

Imaging apparatus with voice input/output function and video conference system
The present invention relates to an imaging apparatus with a voice input/output function and to a video conference system.
In recent years, some conference rooms have been equipped with a video conference system comprising a voice input/output device that includes a microphone and a speaker arranged on a table, a display arranged near the table, and a video camera (for example, a television camera without a recording function) arranged near the display. Such a system enables a so-called video conference, using images and sound, with another conference room at a remote location.
In such a video conference system, the angle of view of the television camera is often adjusted so that all conference participants fall within the shooting range. In this case, the seating positions of the participants may be restricted, or it may be difficult to fit all participants within the shooting range. In addition, adjusting the angle of view, zoom, and so on of the television camera before the conference can take some time, causing a delay between the moment all participants have gathered and the start of the conference.
When the main speakers at a conference are determined in advance, measures such as asking them to sit as close as possible to the center of the television camera's shooting range are possible. When it is not known which participant will speak, however, problems arise: for example, the speaker may be near the edge of the shooting range and hard to see.
Therefore, a proposal has been made to provide a plurality of voice input microphones or a plurality of wide-angle cameras, identify the position of the speaker from the audio signals of the microphones or the image data of the wide-angle cameras, and, based on that position, control the microphones so that mainly the speaker's voice is input and control the cameras so that mainly the speaker is photographed (see Patent Document 1).
In recent conference systems, a PTZ camera is used as the camera. A PTZ camera is capable of pan (P), which swings the camera left and right; tilt (T), which swings it up and down; and zoom (Z), which enlarges the image. For example, the camera direction and zoom can be controlled so that the speaker is at the center of the image. In a system that can identify the position of the speaker as described above, the PTZ camera can be pointed at the speaker automatically.
Patent Document 1: Japanese Patent Laid-Open No. 10-145763
In the invention of Patent Document 1, the position of the speaker is identified using a plurality of microphones and cameras, and, based on the identified position, the cameras are controlled so that mainly the speaker is photographed and the microphones are controlled so that mainly the speaker's voice is input. Patent Document 1 therefore requires a plurality of microphones and cameras as well as a control device for them, which increases the cost of the conference system.
For example, when one conference room holds more than several tens of participants, it may be necessary to identify the position of the speaker and then control the camera to image that speaker and control the microphones to extract the speaker's voice. When one conference room holds a dozen or so participants or fewer, however, such a system is poor in cost performance.
The present invention has been made in view of the above circumstances, and an object thereof is to provide an imaging apparatus with a voice input/output function for video conferencing that can be manufactured at low cost, and a video conference system having this imaging apparatus.
In order to solve the above problem, an imaging apparatus with a voice input/output function according to the present invention comprises: an omnidirectional camera that images its surroundings;
an audio output device that is provided in the vicinity of the omnidirectional camera and outputs an externally input audio signal to the surroundings as sound; and
an audio input device that is provided in the vicinity of the omnidirectional camera and inputs surrounding sound as an audio signal,
wherein the imaging apparatus outputs the image data captured by the omnidirectional camera and the audio signal input by the audio input device.
With such a configuration, when the imaging apparatus with a voice input/output function serves as the imaging device, the loudspeaker (audio output device), and the microphone (audio input device) of a conference system, placing it on a table and having the conference participants sit around that table allows the omnidirectional camera to photograph all of the participants. In this case, each participant around the table looks either at the imaging apparatus on the table or at a display showing the other venues of the video conference.
However, people who speak generally tend to speak toward the microphone serving as the audio input device, and they are also likely to face the loudspeaker from which the other participants' voices are output. Sound is generally easier to hear when its source is directly in front of the face, so people often look toward a sound source in order to hear it better. That is, when the conference participants sit around the omnidirectional camera on the table, at least the person speaking faces the microphone and the loudspeaker and therefore faces the omnidirectional camera in their vicinity. The omnidirectional camera thus photographs that person from the front, and in the resulting image data the person appears to be speaking toward the participants at the other venues who are viewing the image data.
That is, arranging the microphone, the loudspeaker, and the camera at substantially the same position encourages a participant to face the camera at least while speaking, so that a clear image of the person speaking is obtained.
In addition, because the omnidirectional camera is placed on the table and photographs the participants sitting around it, the distance to each participant is short and varies little from participant to participant. The participants can therefore be photographed adequately even without a high-resolution omnidirectional camera, which reduces cost compared with using one.
Note that omnidirectional image data captured by the omnidirectional camera appears distorted if it is output as projected directly onto a plane. It is therefore necessary to convert it, for example, into a panoramic image or into an image of each conference participant as a subject, and to apply image processing that removes the distortion. Omnidirectional cameras include, for example, a fisheye camera using a fisheye lens, a camera using a mirror of nearly conical shape, and a full-spherical camera. The audio output device is, for example, a speaker, and the audio input device is, for example, a microphone.
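The conversion from an omnidirectional image to a panoramic image mentioned above can be sketched roughly as follows. This is only an illustrative polar-to-rectangular unwrapping with nearest-neighbour sampling, not the processing actually used by the control board; the function name `unwrap_to_panorama`, its parameters, the angle convention, and the assumption that the top panorama row corresponds to the inner radius of the circular image are all hypothetical. Lens-specific distortion correction is not attempted.

```python
import math


def unwrap_to_panorama(image, center, r_inner, r_outer, out_width, out_height):
    """Unwrap a circular omnidirectional image (list of pixel rows) into a panorama.

    Assumption: each panorama column corresponds to one angle around `center`,
    and each panorama row to one radius between r_inner (top) and r_outer (bottom).
    """
    cx, cy = center
    src_h, src_w = len(image), len(image[0])
    panorama = []
    for row in range(out_height):
        # Interpolate the sampling radius from the inner to the outer circle.
        t = row / (out_height - 1) if out_height > 1 else 0.0
        radius = r_inner + (r_outer - r_inner) * t
        out_row = []
        for col in range(out_width):
            theta = 2.0 * math.pi * col / out_width
            # Nearest-neighbour sample, clamped to the source image bounds.
            x = min(max(int(round(cx + radius * math.cos(theta))), 0), src_w - 1)
            y = min(max(int(round(cy + radius * math.sin(theta))), 0), src_h - 1)
            out_row.append(image[y][x])
        panorama.append(out_row)
    return panorama
```

A practical implementation would more likely interpolate between pixels and apply the calibration of the particular lens or mirror; the sketch only shows how angle and radius in the circular image map to columns and rows of the panorama.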
In the above configuration of the present invention, it is preferable that a plurality of displays be provided near the omnidirectional camera, at positions that do not obstruct its imaging of the surroundings, so that externally input image data is displayed visibly from a plurality of surrounding directions.
With such a configuration, conference participants are likely to face the display showing the participants at the other venues, the loudspeaker outputting their remarks, and the microphone used to talk to them. Because the display, the microphone, and the loudspeaker are grouped close together, most participants naturally face the omnidirectional camera, and the displays at the other venues show participants who appear to be facing the viewers there.
In addition, when the omnidirectional camera is placed on the table, the distance between each participant and the display becomes short, so the participants at another venue can be recognized even on a relatively small display. Using several such displays can therefore cost less than using one large display. When participants sit facing each other in two rows at a rectangular table, two displays suffice. When participants sit in a circle around a round table, or sit along three or more of the four sides of a rectangular table, three or more displays are preferable.
In the above configuration of the present invention, it is preferable that the audio input device comprises at least three microphones each facing a different surrounding direction, and that the imaging apparatus comprises:
a sound source direction recognition device that identifies the direction of a sound source from the volume of the sound input to each microphone; and
an image processing device that converts the omnidirectional image data captured by the omnidirectional camera into image data centered on the direction of the sound source identified by the sound source direction recognition device.
With such a configuration, the participant who is speaking can be identified, and a panoramic image with the speaker at approximately its horizontal center, or an image in which the speaker is cropped out, can be shown on the displays at the other venues. In the present invention, the participants are located around the omnidirectional camera and the microphones near it, so the direction of the speaker as a sound source can be identified relatively easily by comparing the volumes of the microphones, even if the microphones are not highly directional. Moreover, locating the speaker in the omnidirectional image requires only the direction of the sound source, not its position, so a microphone array or highly directional microphones are unnecessary and cost can be reduced. When image data centered on the speaker identified by the microphones is created, it can be produced easily by specifying the direction on the omnidirectional image data.
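As a rough illustration of why a direction alone is enough, the following sketch recenters a 360-degree panorama on a given direction by rotating its columns. It assumes, hypothetically, that the panorama is a list of pixel rows whose column 0 corresponds to 0 degrees and whose angle increases with the column index; the function name `center_panorama` is likewise only illustrative.

```python
def center_panorama(rows, direction_deg):
    """Rotate panorama columns so that `direction_deg` lands at the horizontal center.

    Assumption: the panorama spans 360 degrees, column 0 is 0 degrees, and the
    angle grows linearly with the column index.
    """
    width = len(rows[0])
    # Column that currently shows the requested direction.
    target_col = int(round((direction_deg % 360.0) / 360.0 * width)) % width
    # Cyclic shift that moves target_col to the middle column.
    shift = (target_col - width // 2) % width
    return [row[shift:] + row[:shift] for row in rows]
```

Because the panorama wraps around, no pixels are lost by the rotation; cropping a window around the center column afterwards would give an image of the speaker alone.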
In the above configuration of the present invention, it is preferable that the imaging apparatus comprises: an image recognition device that recognizes the faces of the imaged persons in the image data captured by the omnidirectional camera and identifies, from the mouth movement of the recognized faces, which imaged person is speaking; and
an image processing device that converts the omnidirectional image data captured by the omnidirectional camera into image data centered on the imaged person identified as speaking by the image recognition device.
With such a configuration, as in the case of voice, image data centered on the speaker can be created once the speaker's direction is identified; the position need not be identified, so a plurality of cameras is unnecessary and cost can be reduced. As when the direction is identified by voice, image data centered on the identified speaker can be created easily by specifying the direction on the omnidirectional image data.
A video conference system according to the present invention comprises a plurality of the imaging apparatuses with a voice input/output function according to the present invention, wherein each imaging apparatus comprises a communication device for outputting the image data and the audio signal to the other imaging apparatuses and for receiving the image data and the audio signal output from the other imaging apparatuses.
With such a configuration, the video conference system of the present invention provides the above-described operational effects of each imaging apparatus with a voice input/output function. Although an imaging apparatus with a voice input/output function may be configured without a display, it receives the image data captured by the other imaging apparatuses and can therefore output that image data to an external display.
According to the imaging apparatus with a voice input/output function and the video conference system of the present invention, manufacture at low cost is possible, and a speaker shown on a display is more likely to appear facing the person viewing the display.
A view of the imaging apparatus with a voice input/output function of the first embodiment with the cover rendered translucent, in which (a) is a plan view and (b) is a side view.
A diagram for explaining how the same imaging apparatus with a voice input/output function is used.
A diagram for explaining an image captured by the omnidirectional camera of the same imaging apparatus with a voice input/output function.
Diagrams for explaining images output from the same imaging apparatus with a voice input/output function, in which (a) outlines a panoramic image converted from an omnidirectional image, (b) shows the panoramic image converted from the omnidirectional image divided into two rows, (c) shows it with an image of the speaker added, and (d) shows omnidirectional images captured at three different locations, each converted into a panoramic image.
A view of the imaging apparatus with a voice input/output function of the second embodiment with the cover rendered translucent, in which (a) is a plan view and (b) is a side view.
A view of the imaging apparatus with a voice input/output function of the third embodiment with the cover rendered translucent, in which (a) is a plan view and (b) is a side view.
A view of the imaging apparatus with a voice input/output function of the fourth embodiment, in which (a) is a front view and (b) is a rear view.
A first embodiment of the present invention will be described below with reference to the drawings.
The video conference system of the present embodiment uses a plurality of the imaging apparatuses 1 with a voice input/output function shown in FIGS. 1(a) and 1(b); the system is constructed by placing an imaging apparatus 1 with a voice input/output function in each of a plurality of conference rooms at separate locations.
The imaging apparatus 1 with a voice input/output function shown in FIG. 1 comprises a substantially disc-shaped base plate 2; a substantially dome-shaped cover 3 that covers the base plate 2; microphones (audio input devices) 5 arranged at equal intervals along the circumference of the outer peripheral portion of the base plate 2 and connected to a control board 4 described later; a speaker (audio output device) 6 arranged between the base plate 2 and the cover 3 so as to be covered by the cover 3; and an omnidirectional camera 7 fixed on the cover 3. The microphones 5, the speaker 6, and the omnidirectional camera 7 are provided close to one another. The speaker 6 and the omnidirectional camera 7 are arranged so that their central axes substantially coincide, and the microphones 5 are arranged at positions substantially equidistant from that central axis.
The upper surface of the base plate 2 is provided with mounting structures for attaching the microphones 5, the speaker 6, and the control board 4. The outer peripheral portion of the disc-shaped base plate 2 is provided with a mounting structure for attaching the circular lower edge (outer peripheral edge) of the cover 3, which has substantially the same diameter as the base plate 2.
The cover 3 has one or more holes (not shown) at positions corresponding to the microphones 5 so as not to obstruct voice input to the microphones 5. An opening 3a for sound output from the speaker 6 is provided at the top (center) of the dome-shaped cover 3. The opening 3a of the cover 3 is provided with a bridge-shaped camera fixing portion 3b for fixing the omnidirectional camera 7 at the center of the top of the cover 3.
The microphones 5 are, for example, directional, and the direction of highest sensitivity of each is aligned with a radial direction orthogonal to the central axis of, for example, the hemispherical or cylindrical surface that forms the shooting range of the omnidirectional camera 7. The microphones 5 are placed equidistant in the radial direction from the central axis of the shooting range, at positions offset from one another by 90 degrees (at equal intervals in the circumferential direction). Non-directional microphones may also be used as the microphones 5. Each microphone 5 is connected to the control board 4, converts sound into an audio signal, and inputs it to the control board 4. The audio signal may be analog or digital.
The speaker 6 is of an omnidirectional type, and a single speaker 6 outputs sound substantially equally in all directions. Alternatively, a plurality of non-omnidirectional speakers, such as three or four, may be used. The speaker 6 is connected to the control board 4, converts the audio signal output from the control board 4 into sound, and outputs it to the surroundings.
The omnidirectional camera 7 is, for example, a fisheye camera with a hemispherical imaging range that images its surroundings. It may instead be, for example, a camera that obtains omnidirectional image data F (shown in FIG. 2) from images captured by a plurality of cameras, a camera that photographs the surroundings via a substantially conical mirror, or a full-spherical camera. The omnidirectional camera 7 need only be able to image the participants, as subjects, sitting around the table T from the imaging apparatus 1 with a voice input/output function placed on the table T; image data in the upward direction, for example, is not required.
 When the omnidirectional camera 7 is mounted high, for example at or above the head height of a seated participant, a hemispherical imaging range can no longer capture the participants from the chest up. In that case, a full-spherical camera is preferably used.
 The control board 4, acting as a sound source direction recognition device, identifies the direction of a sound source from the volume levels (loudness) of the audio signals input from the four microphones 5. In this embodiment, the position of the sound source is not determined by identifying both its direction and its distance, so the sound source is located from the volume levels of the four microphones 5 alone. For example, the two adjacent microphones 5 with the highest volume levels are identified, and a direction lying between these two microphones 5 is determined from the difference between their volume levels.
 For example, if the two microphones 5 show no difference in volume, the sound source is determined to lie in the direction approximately midway between them. If one of the two is louder, the sound source lies between that midway direction and the direction of the louder microphone 5. If the microphone 5 with the second-highest volume and the microphone 5 with the third-highest volume are at substantially the same level, the sound source lies in the direction that the loudest microphone 5 faces.
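The volume-based rule above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the microphone facing angles, the function name `estimate_direction`, and the linear interpolation weight are all assumptions chosen only so that the two limiting cases in the text hold (equal top-two levels give the midway direction; equal second and third levels give the direction of the loudest microphone).

```python
# Hypothetical layout: four microphones facing 0, 90, 180 and 270 degrees.
MIC_ANGLES_DEG = (0.0, 90.0, 180.0, 270.0)
MIC_SPACING_DEG = 90.0


def estimate_direction(levels):
    """Estimate the sound source azimuth in degrees from four volume levels.

    Returns None when the levels give no directional cue.
    """
    n = len(MIC_ANGLES_DEG)
    if len(levels) != n:
        raise ValueError("expected one volume level per microphone")

    # Loudest microphone.
    first = max(range(n), key=lambda i: levels[i])
    # Its two neighbours; the louder one forms the "top two adjacent" pair.
    left, right = (first - 1) % n, (first + 1) % n
    if levels[left] > levels[right]:
        second, weaker, step = left, right, -MIC_SPACING_DEG
    else:
        second, weaker, step = right, left, MIC_SPACING_DEG

    # Excess of each of the top two over the weaker neighbour (assumed weighting).
    a = levels[first] - levels[weaker]
    b = levels[second] - levels[weaker]
    if a + b <= 0.0:
        return None  # all relevant levels equal: direction undetermined

    # t = 0 -> loudest microphone's direction, t = 0.5 -> midway between the pair.
    t = b / (a + b)
    return (MIC_ANGLES_DEG[first] + t * step) % 360.0
```

For instance, levels of `[1.0, 1.0, 0.2, 0.2]` yield the midway direction between the first two microphones, while `[1.0, 0.3, 0.1, 0.3]` yields the direction the first microphone faces.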
 The sound source direction may instead be identified from the phase shift of the sound at each microphone 5. That is, a well-known method may be used that identifies the direction of the sound source from the differences in arrival time of the sound at the microphones 5, which result from their different distances from the sound source.
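As one illustration of the arrival-time approach, the following sketch uses the standard far-field relation for a single pair of microphones, sin(theta) = c * dt / d. The text does not say which well-known method is intended, so this formula, the function name `arrival_angle_deg`, and the assumed speed of sound are illustrative assumptions only.

```python
import math

# Approximate speed of sound in air at room temperature (assumed value).
SPEED_OF_SOUND_M_S = 343.0


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Angle of a far-field source relative to the broadside of a microphone pair.

    delay_s is the arrival-time difference between the two microphones and
    mic_spacing_m is the distance between them. 0 degrees means the source is
    equidistant from both microphones; +/-90 degrees means it lies on the
    line through them.
    """
    if mic_spacing_m <= 0.0:
        raise ValueError("microphone spacing must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to the valid domain of asin to absorb measurement noise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

With more than two microphones, the angles from several pairs could be combined to resolve the front/back ambiguity that a single pair leaves open.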
 The control board 4, acting as an image recognition device, identifies the direction of the speaker from the omnidirectional image data F input from the omnidirectional camera 7. Basically, the direction of each participant is identified by recognizing the face of each participant (imaged person) in the omnidirectional image data F using well-known face recognition. In addition, the mouth of each participant is recognized in the image, whether the mouth (lips) is moving is determined, and the direction of a face whose mouth is determined to be moving is taken as the direction of the speaker.
 The image processing and image recognition can be implemented easily using Intel (registered trademark) OpenCV (Intel Open Source Computer Vision Library). For example, a face recognition program can be built on the object detection program provided in OpenCV. Image recognition works in two phases, a learning phase and a recognition phase: feature values are extracted from images and a learning algorithm learns the features of an object, which makes image recognition such as face recognition possible. OpenCV uses Haar-like features as the image features and an algorithm called AdaBoost as the learning algorithm. By training the object detection program through machine learning to decide, from the feature points, whether an image is a face, the program becomes able to recognize a face image as a face. OpenCV need not be used for the image recognition program; an existing program or a chip carrying an existing image recognition circuit may be used instead. Movement of the speaker's mouth can likewise be recognized by training the OpenCV object detection program through machine learning, for example to distinguish a speaking mouth from a silent one.
 In this embodiment, face recognition is performed to recognize the direction of each participant, and mouth movement is detected to recognize the direction of the speaker. As described above, the control board 4 also identifies the direction of the sound source, as the speaker, from sound. In this embodiment, therefore, when the speaker directions obtained by sound source direction recognition and by image recognition agree to within a predetermined angular range (for example, within 0 to 10 degrees), one of the two directions, for example the one obtained by image recognition, is taken as the direction of the speaker.
 If the sound source direction obtained by sound source direction recognition and the speaker direction obtained by image recognition do not fall within the predetermined angular range of each other, it is determined that there is no speaker. This prevents a participant who is whispering privately, yawning, or making a loud noise while moving a chair from being recognized as the speaker, even momentarily, and shown enlarged on, for example, the display 8 at another site. The direction of the speaker may also be determined by sound source direction recognition alone, or by image recognition alone.
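The agreement check between the two recognizers can be sketched as below. The function names and the use of `None` for "no direction found" are assumptions; the 10 degree default follows the example range given in the text, and the comparison handles wraparound at 360 degrees.

```python
def angular_difference_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fuse_speaker_direction(audio_deg, image_deg, tolerance_deg=10.0):
    """Return the speaker direction, or None when there is judged to be no speaker.

    audio_deg: direction from sound source direction recognition (or None).
    image_deg: direction of a face with a moving mouth (or None).
    """
    if audio_deg is None or image_deg is None:
        return None
    if angular_difference_deg(audio_deg, image_deg) <= tolerance_deg:
        # The two estimates agree; adopt the image-based direction,
        # as in the example given in the text.
        return image_deg % 360.0
    # Disagreement (e.g. a chair noise with no moving mouth nearby).
    return None
```

Under these assumptions, an audio estimate of 358 degrees and an image estimate of 3 degrees agree (5 degrees apart across the wraparound), whereas 90 and 120 degrees do not.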
 The control board 4 also functions as an image processing device that converts the omnidirectional image data F input from the omnidirectional camera 7 into a panoramic image by well-known image processing. To do so, it determines the position in the omnidirectional image data F that becomes the right and left ends of the panoramic image, and creates the panoramic image data from the omnidirectional image data F. When the direction of the speaker has been identified as described above, the omnidirectional image data F is cut open at the position 180 degrees from the direction of the speaker, that is, in the direction opposite the speaker, and this position becomes the right and left ends of the panoramic image. When there is no speaker, the spacing between the participants whose faces were recognized as described above is evaluated, for example, and the center of the widest gap becomes the left and right ends of the panoramic image.
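The choice of where to cut the omnidirectional image open can be sketched as follows. The function name `panorama_seam_deg` and the fallback of 0 degrees when nobody is detected are assumptions made for illustration; the two main branches follow the rules in the text.

```python
def panorama_seam_deg(speaker_deg, participant_degs):
    """Azimuth (degrees) at which to cut the omnidirectional image open.

    speaker_deg: direction of the speaker, or None when there is no speaker.
    participant_degs: directions of all face-recognized participants.
    """
    if speaker_deg is not None:
        # Cut directly opposite the speaker so the speaker lands mid-panorama.
        return (speaker_deg + 180.0) % 360.0

    if not participant_degs:
        return 0.0  # assumed default when no participant is detected

    angles = sorted(a % 360.0 for a in participant_degs)
    if len(angles) == 1:
        return (angles[0] + 180.0) % 360.0

    # Find the widest angular gap between neighbouring participants,
    # including the gap that wraps around from the last to the first.
    best_gap, best_start = -1.0, 0.0
    for i, start in enumerate(angles):
        end = angles[(i + 1) % len(angles)]
        gap = (end - start) % 360.0
        if gap > best_gap:
            best_gap, best_start = gap, start

    # Cut at the centre of that gap.
    return (best_start + best_gap / 2.0) % 360.0
```

For participants at 10, 50, 90, 200 and 240 degrees with no speaker, the widest gap runs from 240 degrees around to 10 degrees, so the cut falls at 305 degrees.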
 When the control board 4 has identified the direction of the speaker, it also creates image data of the speaker, in which the participant whose face was recognized in that direction is the main subject. This image data may be created by extracting the image portion of the face-recognized participant, or by taking the image portion within a predetermined angular range around the identified speaker direction.
 The control board 4, acting as a communication device, performs data communication with another imaging apparatus 1 with a voice input/output function at a remote location over a local area network (LAN), the Internet, a public telephone network, a mobile phone network, a dedicated communication line, or the like. It transmits to the other imaging apparatus 1 the audio signal input from the microphones 5, together with the panoramic image data and the speaker image data obtained by processing, as described above, the omnidirectional image data F captured by the omnidirectional camera 7.
 It also receives the audio signal, panoramic image data, speaker image data, and the like transmitted from the other imaging apparatus 1 with a voice input/output function. The speaker image data is transmitted and received only when it has been created. In this embodiment, the imaging apparatus 1 with a voice input/output function has no display 8, so the received image data is output to a connection terminal for the display 8 and shown on the display 8 connected to that terminal. As described later, the imaging apparatus with a voice input/output function may instead include the display 8 and output the received image data to its own display 8.
 Although the control board 4 has been described as performing sound source direction recognition, image recognition, image processing, and so on, the control board 4 may instead mainly control only the input and output of audio signals and image data, with the sound source direction recognition, image recognition, and image processing performed by a personal computer (personal computer PC, shown in FIG. 2) connected to the control board 4 by wired LAN, wireless LAN, USB, or the like. Likewise, although the various kinds of image processing have been described as being performed by the imaging apparatus 1 with a voice input/output function that has the omnidirectional camera 7 that captured the omnidirectional image, the image processing may be performed by the imaging apparatus 1 on the receiving side or by a personal computer PC connected to it. That is, the omnidirectional image data F captured by the omnidirectional camera 7 may be transmitted as is, and the receiving imaging apparatus 1 with a voice input/output function may process it and show it on the display 8.
 The imaging apparatus 1 with a voice input/output function of such a teleconference system is used, for example, placed on a table T in a conference room as shown in FIG. 2. The conference participants P sit around the table T. Here, the participants P sit in two rows, one along each long side of the rectangular table T. In FIG. 2, the personal computer PC is used as described above; the display 8 is connected via the personal computer PC, and image data processed by the personal computer PC is shown on the display 8.
 In the state shown in FIG. 2, the omnidirectional image data F captured by the omnidirectional camera 7 is as shown in FIG. 3. FIG. 3 is a simplified view in which the three-dimensional omnidirectional image data F is projected onto a plane. The control board 4 processes this omnidirectional image data F into the panoramic image G1 shown on the display 8 in FIG. 4(a), or into the panoramic images G2 and G3, obtained by dividing the panoramic image G1 in two, shown in FIG. 4(b).
 In this embodiment, as shown in FIG. 4(b), the spacing between the participants P in the omnidirectional image data F is evaluated, and where a gap is equal to or greater than a predetermined interval (angle), the panoramic image G1 is split there and the gap is cut out, which reduces the horizontal width of the panoramic images G2 and G3. When creating the panoramic images G1, G2, and G3, all the gaps between participants P may be cut. Alternatively, image data of each participant P may be created with a predetermined width (predetermined angular range) and arranged side by side to form the panoramic image; in that case, too, the gaps between participants P are not displayed. In FIG. 4(b), the two separated pieces of image data are displayed in upper and lower rows so that each of the panoramic images G2 and G3 is shown large.
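The splitting of the panorama at wide gaps can be sketched as a grouping of participant directions. This is only an illustration under assumptions: the function name `split_into_groups`, the representation of each participant by a single azimuth, and the threshold value in the usage note are all hypothetical, and the actual cropping of pixels is left out.

```python
def split_into_groups(participant_degs, seam_deg, min_gap_deg):
    """Group participants into panorama segments, splitting at wide gaps.

    Directions are first expressed as offsets from the seam, so the panorama
    runs from 0 to 360 degrees. Neighbouring participants closer together than
    min_gap_deg stay in the same segment; a gap of min_gap_deg or more starts
    a new segment (the gap itself would be cut from the displayed image).
    """
    offsets = sorted((a - seam_deg) % 360.0 for a in participant_degs)
    groups = []
    for off in offsets:
        if groups and off - groups[-1][-1] < min_gap_deg:
            groups[-1].append(off)
        else:
            groups.append([off])
    return groups
```

With participants at 10, 50, 90, 200 and 240 degrees, a seam at 305 degrees and an assumed threshold of 60 degrees, the result is two groups, which would correspond to two separate panoramic images such as G2 and G3.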
 When a speaker has been identified, an image G10 centered on the speaker is displayed separately, as shown in FIG. 4(c), in addition to the panoramic image G1 shown in FIG. 4(a). A video conference is not necessarily held between only two sites and may involve three or more. In that case, the screen of the display 8 is divided, for example as shown in FIG. 4(d), and the panoramic images G1, G4, and G5 are shown in the respective areas. In FIG. 4(d), a video conference connects four sites, and images of the three conference rooms other than the one containing the display 8 are displayed.
 In a video conference system using this imaging apparatus 1 with a voice input/output function, the control board 4 of the imaging apparatus 1 installed in each conference room, acting as a communication device as described above, transmits and receives the image data captured and the audio signals input in each conference room. As a result, images of the participants in the other conference rooms are shown on the display 8 as described above, and the audio signals input in the other conference rooms are output from the speaker 6.
 In such an imaging apparatus 1 with a voice input/output function and video conference system, the omnidirectional camera 7, the microphones 5, and the speaker 6 are configured substantially as one unit as described above, and a participant who speaks (the speaker) basically tries to speak toward the microphones 5. Because the omnidirectional camera 7 is close to the microphones 5, the speaker ends up speaking toward the omnidirectional camera 7 and is imaged from the front. Consequently, when the speaker image G10 is shown on the display 8, the speaker is likely to appear to be talking to the participants in the other conference room who are watching the display 8.
 Furthermore, during a discussion with participants in another conference room, the voice of the speaker in the other room is heard from the speaker 6 near the omnidirectional camera 7, so participants turn toward the speaker 6 to hear it better. This makes the speaker tend to talk toward the omnidirectional camera 7. It therefore becomes easy to obtain, as described above, an image in which the speaker is talking while facing the participants in the other conference room. This suppresses the sense of incongruity peculiar to video conferences that arises on the screen of the display 8 when the speaker talks while facing away from the omnidirectional camera 7. In other words, the speaker is led to face the omnidirectional camera 7 naturally, without making a conscious effort to face the camera.
 In addition, because the omnidirectional camera 7 basically images all the participants sitting around the table T at substantially the same size, an image of the speaker can be obtained easily, without any particular control of the omnidirectional camera 7, simply by identifying the speaking participant as described above.
Next, a second embodiment of the present invention will be described.
As shown in FIGS. 5(a) and 5(b), the imaging apparatus 1a with a voice input/output function of the second embodiment includes, like the imaging apparatus 1 with a voice input/output function of the first embodiment, a base plate 11, a cover 12, a control board not shown (the control board 4 in FIG. 1), microphones 5, a speaker 6, and an omnidirectional camera 7. The imaging apparatus 1a of the second embodiment further includes displays 8. That is, the difference between the imaging apparatus 1 of the first embodiment and the imaging apparatus 1a of the second embodiment is whether the display 8 is separate from the imaging apparatus 1 or is provided in the imaging apparatus 1a.
 In this embodiment, the base plate 11 is formed as a rectangular plate, with a microphone 5 at each of its four corners. A display 8 (for example, a liquid crystal display) is attached to each of a pair of mutually separated side edges of the base plate 11, with the display screens facing in opposite directions (outward). The control board (not shown) and the speaker 6 are placed between the two displays 8 on the base plate 11.
 The cover 12 is formed as a rectangular parallelepiped corresponding to the rectangular base plate 11 and is attached so as to cover the base plate 11. The two mutually parallel side faces of the cover 12 that correspond to the two displays 8 each have a window 12a through which the display screen of the display 8 can be seen from outside. The top plate of the cover 12 has an opening 12b corresponding to the speaker 6. A bridge-shaped camera fixing part 12c spans the opening 12b of the cover 12, and the omnidirectional camera 7 is attached to this camera fixing part 12c. One or more holes may be provided in the cover 12 at positions corresponding to the microphones 5.
 In this imaging apparatus 1a with a voice input/output function, the two displays 8 face in opposite directions so that the apparatus is well suited to the case, shown in FIG. 2, where participants P sit side by side along each of two mutually parallel edges of the table T. The displays 8 are relatively small, for example about 7 to 15 inches, so that when the apparatus is placed on the table T it does not block the line of sight between participants sitting opposite each other. This also reduces the cost of the displays 8.
 The imaging apparatus 1a with a voice input/output function of the second embodiment provides substantially the same effects as the imaging apparatus 1 with a voice input/output function of the first embodiment. In addition, because the displays 8 are provided near the omnidirectional camera 7, participants seated around the table T as described above can view a display 8 comfortably while facing forward, without turning their heads at an angle.
 Moreover, when a participant P faces a display 8, the participant is also looking at the omnidirectional camera 7, because the camera is close to and roughly above the display 8. Image data can therefore be obtained in which almost all participants appear to be looking at the participants at the other site. In the first embodiment, the structure mainly led the speaker to look at the omnidirectional camera 7 while speaking, but the other participants might be looking at a display 8 located away from the omnidirectional camera 7. It was therefore difficult to avoid capturing images in which participants other than the speaker are looking elsewhere.
 In the second embodiment, by contrast, the displays 8 are placed near the omnidirectional camera 7, so that when a participant looks at a display 8, the participant's face is turned toward the omnidirectional camera 7. The speaker, too, no longer needs to turn away from the microphones 5, the speaker 6, and the omnidirectional camera 7 in order to see a display 8, and so keeps facing the omnidirectional camera 7 while speaking.
Next, a third embodiment of the present invention will be described.
As shown in FIGS. 6(a) and 6(b), the imaging apparatus 1b with a voice input/output function of the third embodiment includes, like the imaging apparatus 1 with a voice input/output function of the first embodiment, a base plate 21, a cover 22, a control board (the control board 4 in FIG. 1), microphones 5, a speaker 6, and an omnidirectional camera 7. Like the second embodiment, the imaging apparatus 1b of the third embodiment also includes displays 8.
 In this embodiment, the base plate 21 is formed as a triangular plate, with a microphone 5 at each of its three corners. A display 8 is attached to each of the three side edges of the base plate 21 with its display screen facing outward. The control board (not shown) and the speaker 6 are placed inside the three displays 8 on the base plate 21.
 The cover 22 is formed as a triangular prism corresponding to the triangular base plate 21 and is attached so as to cover the base plate 21. Each of the three side faces of the cover 22 has, at the position corresponding to its display 8, a window 22a through which the display screen of the display 8 can be seen from outside. The top plate of the cover 22 has an opening 22b corresponding to the speaker 6. A Y-shaped bridge forming a camera fixing part 22c spans the opening 22b of the cover 22, and the omnidirectional camera 7 is attached to this camera fixing part 22c. One or more holes may be provided in the cover 22 at positions corresponding to the microphones 5.
 The imaging apparatus 1b with a voice input/output function of the third embodiment has substantially the same structure as the imaging apparatus 1a with a voice input/output function of the second embodiment, apart from the number of displays 8 and microphones 5 and the plan shape being triangular rather than rectangular, and provides the same effects. In the third embodiment, the displays 8 face three directions 120 degrees apart, which reduces the blind-spot directions around the table T from which no screen of a display 8 can be seen. In the shape of the second embodiment, displays 8 may also be provided on all side faces of the cover 12 so that the imaging apparatus 1a with a voice input/output function has four displays 8.
Next, a fourth embodiment of the present invention will be described.
As shown in FIGS. 7(a) and 7(b), the imaging apparatus 1c with a voice input/output function of the fourth embodiment includes, like the imaging apparatus 1 with a voice input/output function of the first embodiment, a control board (the control board 4 in FIG. 1), microphones 5a, speakers 6a, and an omnidirectional camera 7a. Like the second and third embodiments, the imaging apparatus 1c of the fourth embodiment also includes a display 8a.
 In this embodiment, the display 8a is larger than 15 inches, for example about 20 to 32 inches (or larger). The control board, microphones 5a, and speakers 6a of the imaging apparatus 1c with a voice input/output function are built into the display 8a, and the omnidirectional camera 7a is attached at the center of the top surface of the display 8a. In other words, the configuration is a display with built-in speakers and microphone, such as a personal computer display, to which a control board and the omnidirectional camera 7a have been added. However, the display 8a may be connected to a personal computer, and the functions of the control board other than data input/output may be given to the personal computer PC. In that case, the display 8a can be connected to the personal computer PC in the same way as a display of the type equipped with a speaker, microphone, and camera.
 As shown in FIGS. 7(a) and 7(b), in the fourth embodiment the display 8a has display screens 14a and 14b on its front and back faces, so that participants sitting opposite each other as in FIG. 2 each view a different display screen 14a or 14b. The imaging apparatus 1c with a voice input/output function may instead omit the display screen 14b on the back face, with a plurality of imaging apparatuses 1c being used. In that case there would be a plurality of omnidirectional cameras 7a, but since more than one is not necessarily needed, a type with the omnidirectional camera 7a may be combined with a type without it. Depending on the size of the display 8a, the omnidirectional camera 7a may sit too high relative to the participants, so that a hemispherical imaging range captures only the upper part of each participant's bust and, in the worst case, only the upper part of the face. It is therefore preferable to use as the omnidirectional camera 7a a full-spherical camera whose imaging range is wider than a hemisphere and close to a full sphere.
 The imaging device 1c with a voice input/output function of the fourth embodiment provides substantially the same operational effects as the first and second embodiments.
1, 1a, 1b, 1c Imaging device with voice input/output function
4 Control board (image recognition device, image processing device, sound source direction recognition device, communication device)
5, 5a Microphone (voice input device)
6, 6a Speaker (voice output device)
7, 7a Omnidirectional camera
8, 8a Display

Claims (5)

  1.  An imaging device with a voice input/output function, comprising:
     an omnidirectional camera that images its surroundings;
     a voice output device that is provided in the vicinity of the omnidirectional camera and outputs an externally input audio signal to the surroundings as sound; and
     a voice input device that is provided in the vicinity of the omnidirectional camera and inputs surrounding sound as an audio signal,
     wherein the imaging device outputs image data captured by the omnidirectional camera and the audio signal input by the voice input device.
  2.  The imaging device with a voice input/output function according to claim 1, wherein a plurality of displays that display externally input image data so as to be visible from a plurality of surrounding directions are provided in the vicinity of the omnidirectional camera, at positions that do not obstruct imaging of the surroundings by the omnidirectional camera.
  3.  The imaging device with a voice input/output function according to claim 1, wherein the voice input device includes at least three microphones each facing a different surrounding direction, the imaging device further comprising:
     a sound source direction recognition device that identifies the direction of a sound source from the volume of the sound input to each microphone; and
     an image processing device that converts omnidirectional image data captured by the omnidirectional camera into image data centered on the direction of the sound source identified by the sound source direction recognition device.
  4.  The imaging device with a voice input/output function according to claim 1, further comprising:
     an image recognition device that recognizes the faces of imaged persons appearing in the image data captured by the omnidirectional camera and identifies, from the mouth movement of the recognized faces, which of the imaged persons is speaking; and
     an image processing device that converts omnidirectional image data captured by the omnidirectional camera into image data centered on the imaged person identified as speaking by the image recognition device.
  5.  A video conference system comprising a plurality of the imaging devices with a voice input/output function according to any one of claims 1 to 4, wherein each imaging device with a voice input/output function includes a communication device for outputting the image data and the audio signal to another of the imaging devices with a voice input/output function and for inputting the image data and the audio signal output from the other imaging device with a voice input/output function.
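
The processing recited in claim 3 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation. The claim says only that the direction is identified from the volume at each microphone; the volume-weighted vector sum used here is one possible way to do that. The function names, the equirectangular layout (column 0 at azimuth 0, columns increasing with azimuth), and the representation of the image as a list of rows are all hypothetical.

```python
import math

def estimate_source_azimuth(mic_azimuths_deg, mic_volumes):
    """Estimate the sound source azimuth in degrees from per-microphone volumes.

    Assumption: each microphone's facing direction is treated as a unit vector
    weighted by its measured volume, and the direction of the vector sum is
    taken as the estimated source direction.
    """
    if len(mic_azimuths_deg) < 3 or len(mic_azimuths_deg) != len(mic_volumes):
        raise ValueError("need at least three microphones with one volume each")
    x = sum(v * math.cos(math.radians(a)) for a, v in zip(mic_azimuths_deg, mic_volumes))
    y = sum(v * math.sin(math.radians(a)) for a, v in zip(mic_azimuths_deg, mic_volumes))
    if math.hypot(x, y) < 1e-12:
        return None  # volumes cancel out: no dominant direction
    return math.degrees(math.atan2(y, x)) % 360.0

def recenter_panorama(rows, azimuth_deg):
    """Rotate a panorama horizontally so that azimuth_deg lands in the centre column.

    Assumption: `rows` is a non-empty list of equal-length pixel rows in an
    equirectangular layout spanning 360 degrees.
    """
    if not rows or not rows[0]:
        raise ValueError("panorama must contain at least one non-empty row")
    width = len(rows[0])
    src_col = int(round(azimuth_deg / 360.0 * width)) % width
    shift = (src_col - width // 2) % width
    return [row[shift:] + row[:shift] for row in rows]

# Hypothetical usage: three microphones 120 degrees apart, loudest at 120 degrees.
azimuth = estimate_source_azimuth([0.0, 120.0, 240.0], [0.2, 1.0, 0.2])
panorama = [list(range(12))]  # one row, 12 columns of 30 degrees each
centered = recenter_panorama(panorama, azimuth)
```

A plain horizontal rotation is enough here because, in an equirectangular image, a change of azimuth corresponds to a column shift. A real device would additionally need to unwarp the raw fisheye image, which this sketch does not attempt.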
PCT/JP2015/067628 2014-06-24 2015-06-18 Imaging device provided with audio input/output function and videoconferencing system WO2015198964A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014129638A JP2016010010A (en) 2014-06-24 2014-06-24 Imaging apparatus with voice input and output function and video conference system
JP2014-129638 2014-06-24

Publications (1)

Publication Number Publication Date
WO2015198964A1 (en) 2015-12-30

Family

ID=54938049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/067628 WO2015198964A1 (en) 2014-06-24 2015-06-18 Imaging device provided with audio input/output function and videoconferencing system

Country Status (2)

Country Link
JP (1) JP2016010010A (en)
WO (1) WO2015198964A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7017045B2 (en) * 2016-09-30 2022-02-08 株式会社リコー Communication terminal, display method, and program
JP6846753B2 (en) 2017-06-28 2021-03-24 株式会社オプティム Computer system, web conferencing audio assistance methods and programs
JP7100824B2 (en) 2018-06-20 2022-07-14 カシオ計算機株式会社 Data processing equipment, data processing methods and programs
JP7245034B2 (en) * 2018-11-27 2023-03-23 キヤノン株式会社 SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003018561A (en) * 2001-07-04 2003-01-17 Ricoh Co Ltd Pantoscopic video image recording/reproducing system, conference recording/reproducing system, pantoscopic video image transmitting apparatus, conference video image transmitting apparatus, pantoscopic video image reproducing apparatus, conference video image reproducing apparatus, pantoscopic video image recording/reproducing method, conference video image reproducing method, pantoscopic video image transmitting method, conference video image transmitting method, pantoscopic video image reproducing method, conference video image reproducing method and program
JP2005274707A (en) * 2004-03-23 2005-10-06 Sony Corp Information processing apparatus and method, program, and recording medium
JP2007228070A (en) * 2006-02-21 2007-09-06 Yamaha Corp Video conference apparatus
JP2008085930A (en) * 2006-09-29 2008-04-10 Nec Engineering Ltd Video conference apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887652A (en) * 2021-01-21 2021-06-01 宁波市鄞州声科电子有限公司 System and method for improving quality of network conference based on intelligent microphone array
CN112887652B (en) * 2021-01-21 2023-03-14 宁波市鄞州声科电子有限公司 System and method for improving quality of network conference based on intelligent microphone array

Also Published As

Publication number Publication date
JP2016010010A (en) 2016-01-18

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15810911; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 15810911; Country of ref document: EP; Kind code of ref document: A1)