EP1110398A1 - Systeme automatique de prise de son et d'images - Google Patents

Systeme automatique de prise de son et d'images (Automatic sound and image pickup system)

Info

Publication number
EP1110398A1
EP1110398A1 EP99940237A
Authority
EP
European Patent Office
Prior art keywords
remote control
scene
person
people
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99940237A
Other languages
German (de)
English (en)
French (fr)
Inventor
Jean-Emmanuel Viallet
Raphaël Feraud
Michel Collobert
Olivier Bernier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA
Publication of EP1110398A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 — Television systems
    • H04N7/14 — Systems for two-way working
    • H04N7/15 — Conference systems
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 — Television systems
    • H04N7/14 — Systems for two-way working

Definitions

  • the invention relates to an automatic sound and image pickup system, in particular for videoconferencing.
  • videoconferencing systems are equipped with image and sound pickup means, i.e. equipment (cameras and microphones) that is either fixed or whose orientation is controlled by means of a remote control.
  • the remote control makes it possible to vary the tilt ("site") and azimuth of the camera continuously, as well as its zoom. Pointing the camera in the direction occupied by a person or a group of people is possible, but difficult.
  • directions of space (six for the two cameras) can be stored by the camera, which can then be pointed in one of these directions by pressing a button on the remote control or through the serial port. The benefit of this function is direct access to a direction of space without combining successive tilt and azimuth key presses.
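The store-and-recall behaviour described above can be sketched in a few lines. This is an illustrative model only (the class and method names are hypothetical, not from the patent): the camera holds up to six (tilt, azimuth) pairs, and a single key press recalls one of them.

```python
class PresetCamera:
    """Illustrative model of a camera with six position memories."""
    MAX_PRESETS = 6  # the text notes only six directions can be memorised

    def __init__(self):
        self.presets = {}                  # key number -> (tilt_deg, azimuth_deg)
        self.tilt, self.azimuth = 0.0, 0.0

    def store(self, key, tilt_deg, azimuth_deg):
        # Storing a direction of space under a remote-control key.
        if key not in self.presets and len(self.presets) >= self.MAX_PRESETS:
            raise ValueError("only six directions of space can be memorised")
        self.presets[key] = (tilt_deg, azimuth_deg)

    def recall(self, key):
        # One key press points the camera directly, without stepping
        # through successive tilt/azimuth key presses.
        self.tilt, self.azimuth = self.presets[key]
        return self.tilt, self.azimuth
```

The six-preset limit in the sketch mirrors the drawback discussed further down: a seventh direction cannot be stored without replacing one of the six.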
  • the user of the remote control can simply switch from one person to another.
  • the acoustic analysis of the scene is obtained from several microphones, which make it possible to determine the direction of sound sources, in particular speech sources.
  • once the directions of the speech sources have been identified, the sources can be selected one by one and then tracked dynamically.
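The patent does not say how the direction is computed from several microphones. One classical technique, offered here purely as an illustration, is the time difference of arrival (TDOA) between a pair of microphones under a far-field assumption, where sin θ = c · Δt / d:

```python
import math

def direction_from_tdoa(delay_s, mic_spacing_m, speed_of_sound=343.0):
    """Bearing (degrees from broadside) of a far-field sound source,
    from the arrival delay between two microphones. Illustrative only."""
    s = speed_of_sound * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))        # clamp against measurement noise
    return math.degrees(math.asin(s))
```

For example, with microphones 20 cm apart, a source 30° off broadside arrives with a delay of d·sin(30°)/c, roughly 0.29 ms.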
  • the Lime Light function of PictureTel, a company that manufactures and markets videoconferencing systems, is based on acoustic localization and allows a sound source to be detected and tracked and a camera to be oriented dynamically.
  • the first drawback is that the positions must be prerecorded; they therefore cannot be changed quickly or continuously.
  • the second drawback is the assumption that people will indeed occupy the prerecorded positions and not move from them. In practice, even with the chairs fixed to the floor, people move and are therefore rarely in the centre of the frame, or even leave the frame if it is tight on one person. This drawback is manifest in videoconferencing, where people spontaneously leave the framing defined by the prerecorded directions of space.
  • access to predetermined directions of space may suit certain stable situations (remote monitoring), but does not make it possible to adapt to the actual situation.
  • the camera points in a direction of space but knows nothing about the content of that space, whether it is occupied by a person or empty.
  • another, secondary, drawback is that only six directions of space can be memorized by the camera and therefore be accessed from the remote control. This is generally solved by storing the directions in a computer and using a remote control with more keys.
  • acoustic speech activity is by nature intermittent (when a person stops speaking to listen).
  • the acoustic location is sensitive to the amplitude of the sound source.
  • visual localization has the following drawbacks: its main drawback is the complexity of the algorithms, their speed and their robustness. Nevertheless, several systems are operational on a workstation or personal computer (PC), such as those developed by the applicant, or those in the publications previously cited by the applicant.
  • the automatic framing of a group of people, implemented by the applicant, proves in use particularly useful although complex.
  • the framing constantly adapts to the number and position of the participants in a videoconference.
  • the invention therefore proposes an intelligent interface capable of selecting a person (or a group of people) from those in the filmed scene, on a speaker's command, and of automatically framing the selected person (or group) from the information provided by the scene analysis.
  • the subject of the invention is therefore an automatic sound and image pickup system, in particular for videoconferencing, comprising means for controlling image and sound sensors, and scene analysis means driving these control means so as to obtain automatic framing of the filmed scene.
  • the system includes means for selecting a person or a group of people from those in the filmed scene, and means for automatically framing the selected person or group from the information provided by the scene analysis means.
  • more particularly, the subject of the invention is an automatic system for taking sound and images, in particular for videoconferencing, comprising means for controlling image and sound sensors, scene analysis means for supplying position signals to the control means, and means for selecting a person or a group from among the people in the filmed scene,
  • the selection means comprise a physical interface including a remote control with which the user can select any one of the people in the scene, or a group, to obtain automatic framing around that person or group, or select all the people to obtain a general framing of the scene;
  • the framing means comprise a logical interface capable of matching the person selected with the remote control against the position information from the scene analysis, so as to provide the control means with the position of that person or group relative to the filmed scene.
  • the remote control is a universal remote control, activating a device capable of transmitting control signals to the logical interface.
  • the signals emitted by the remote control can be infrared or electromagnetic.
  • the control signals from said remote control can be received and re-transmitted by a transceiver.
  • the control signals of said remote control can be received and re-emitted by a speech recognition or gesture recognition device.
  • the remote-control function can be provided by the remote control of the image analysis camera, the control signals of said remote control being received by the analysis camera and re-transmitted to the logical interface.
  • the remote control is a universal remote control, the control signals of said remote control being received and retransmitted by the analysis camera.
  • the remote control comprises a graphical interface.
  • the remote control also comprises, in this case, a screen on which the scene and the various selectable zones are viewed.
  • the remote control includes a computer input / output device to select the areas identified.
  • provision may be made for the scene analysis means to receive a local analysis signal (A), for the selection means to select a person or a group of people from the locally filmed scene, and for the automatic framing means to use the information from the locally filmed scene.
  • alternatively, the analysis means receive a signal (A') from a remote system, for or corresponding to the scene analysis; the selection means then make it possible to select a person or a group of people from the remotely filmed scene, and the automatic framing means make it possible to control the framing of the remotely filmed scene, the control signals being transported to the remote system.
  • FIG. 1 represents a block diagram of the invention
  • FIG. 2 represents a more detailed diagram of the invention
  • FIG. 3 represents a particular embodiment for the physical interface
  • FIG. 4 represents another embodiment for the physical interface
  • FIG. 5 represents another embodiment of the physical interface
  • FIG. 6 represents another embodiment of the physical interface
  • FIG. 7 shows another embodiment of the physical interface.
  • FIG. 1 schematically shows an automatic sound and image pick-up system in which there are audiovisual resources 10 for filming and capturing the sound of a scene 50.
  • the scene is made up of one or more people P1-Pn, called speakers, on a site, wishing to communicate with other people at a remote site.
  • the audiovisual resources 10 are constituted by audio and visual sensors.
  • the audio sensors are for example a series of microphones placed close to the speakers.
  • the video sensors consist of one or more cameras filming the scene.
  • the audiovisual resources 10 are controlled by a conventional control device 20, capable of supplying the control signals to the sensors 10 according to the information received at the input by the interface 30 as detailed below.
  • the information received as input is provided by the interface 30 from the scene analysis device 40 and from the selection made by a speaker.
  • the scene analysis device can be audio, visual or audiovisual, associated with visual or audiovisual sensors.
  • this device is an existing visual device.
  • a fixed analysis camera 60 is used (the camera can be mobile), which makes it possible to provide the signal required to perform an analysis of the visual scene observed.
  • the scene analysis device therefore comprises for this purpose, the camera 60 and means 40 for processing the signal A supplied by this camera.
  • these means are implemented, for example, by a microcomputer or a workstation equipped with an existing program specific to scene analysis.
  • the faces of the people present in the visual field are detected by a neural network; said program then runs an algorithm that tracks the detected faces.
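The patent only states that a neural network detects faces and that an algorithm then follows them; the tracking algorithm itself is not disclosed. As a minimal stand-in, a greedy nearest-neighbour association between existing tracks and new face detections could look like this (all names are hypothetical):

```python
import math

def follow_faces(tracks, detections, max_dist=50.0):
    """Associate previous face tracks with new detections.

    tracks:     dict of track id -> (x, y) face centre from the last frame
    detections: list of (x, y) face centres detected in the current frame
    Returns an updated dict of track id -> (x, y).
    """
    updated, free = {}, list(detections)
    for tid, (tx, ty) in tracks.items():
        if not free:
            break
        # Pick the closest unclaimed detection for this track.
        best = min(free, key=lambda d: math.hypot(d[0] - tx, d[1] - ty))
        if math.hypot(best[0] - tx, best[1] - ty) <= max_dist:
            updated[tid] = best
            free.remove(best)
    # Unmatched detections start new tracks (a new face entered the field).
    next_id = max(tracks, default=0) + 1
    for d in free:
        updated[next_id] = d
        next_id += 1
    return updated
```

A real system would add track deletion and motion prediction; this sketch only shows the matching step that keeps each detected face attached to the same person across frames.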
  • Other known techniques can be used.
  • a scene analysis device 40 can be used with a mobile camera.
  • a scene analysis device using several fixed or mobile cameras can be used or produced.
  • the various sensors 10 are controlled by a control device 20 which receives control signals from the interface 30 in accordance with the present invention.
  • it is a device 20 for controlling a motorized camera 11 which takes the picture and an acoustic antenna 12 which provides sound recording.
  • shooting and sound pickup are described for a set of people and for a single person, which corresponds to systems actually produced by the applicant.
  • the same techniques can be used for shooting and sound pickup for a group of people; a group is a subset of the set of all people.
  • the analysis of the scene is visual, that is to say that the position of the people is determined but it is not known whether they are speaking.
  • the sound pickup devices will be selected on the basis of audiovisual information.
  • the control device 20 controls the camera 11 so that all the people present in the field of analysis are framed, respecting the rules of the art of shooting insofar as the constraints of the camera 11 allow.
  • the device 20 controls the camera 11 so that, in compliance with the rules of shooting, the person is laterally centered and his eyes are approximately at the upper third of the image, for example.
  • the shooting seeks to isolate this person from the others in the image, insofar as the constraints linked to the camera and the rules of shooting allow it.
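The single-person framing rule above (laterally centred, eyes at roughly the upper third of the image) translates directly into a frame-placement computation. A hypothetical sketch in scene coordinates:

```python
def tight_frame(eye_x, eye_y, frame_w, frame_h):
    """Top-left corner of a tight frame so that the given eye position
    is laterally centred and at the upper third of the image.
    Coordinates and the function itself are illustrative assumptions."""
    left = eye_x - frame_w / 2.0   # lateral centring
    top = eye_y - frame_h / 3.0    # eyes one third down from the top
    return left, top
```

The control device would then convert such a target frame into pan, tilt and zoom commands for the motorized camera 11.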
  • the device 20 controls the sound pickup so as to capture the sound field of the different participants. This sound field can be obtained in different ways:
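The patent leaves these "different ways" unspecified in this passage. One classical possibility for an acoustic antenna such as device 12, offered here purely as an assumption, is delay-and-sum beamforming: each microphone channel is shifted so that sound arriving from the selected direction adds coherently.

```python
def delay_and_sum(channels, delays):
    """Steer an acoustic antenna by delaying and averaging channels.

    channels: list of equal-length lists of samples, one per microphone
    delays:   integer sample shifts aligning the selected direction
    Illustrative sketch only; real beamformers use fractional delays.
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i - d
            if 0 <= j < n:       # samples shifted out of range are dropped
                acc += ch[j]
        out.append(acc / len(channels))
    return out
```

Sound from the steered direction is reinforced, while sources off-axis add incoherently and are attenuated.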
  • the interface 30 allows the user of the system to obtain the shot and sound pickup he requests (a wide shot of all the people, a tight shot of a particular person).
  • the sending of a command from the interface triggers the orientation command of the sound and image pickup sensors, as a function of the audiovisual scene, analyzed by the scene analysis device.
  • the interface includes a logical interface 31 and a physical interface 32.
  • the physical interface 32 can be produced according to different embodiments described below in connection with FIGS. 3 to 7.
  • the logic interface 31 is, in a preferred embodiment, a program loaded in the scene-analysis processing system 40. This logic interface 31 retrieves the position information of the people in the scene resulting from the scene analysis and matches this position information with the selection information given by the operator through the physical interface.
  • this logic interface 31 interprets (that is to say, decodes) the information received from the unit 40 to supply position control signals interpretable by the control device 20, in order to carry out the desired framing around the selected person or group.
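The correspondence the logic interface establishes, between a selection from the physical interface and the positions delivered by the scene analysis, can be sketched as follows; the function name and the bounding-box convention are assumptions, not from the patent:

```python
def framing_command(selection, face_boxes):
    """Map a user selection onto a framing command for the control device.

    selection:  "all" for a general framing, or a person id for a tight shot
    face_boxes: dict of person id -> (x, y, w, h) from the scene analysis
    Returns (shot_type, bounding box to frame).
    """
    if selection == "all":
        # General framing: the smallest box enclosing every detected face.
        xs = [b[0] for b in face_boxes.values()]
        ys = [b[1] for b in face_boxes.values()]
        x2 = [b[0] + b[2] for b in face_boxes.values()]
        y2 = [b[1] + b[3] for b in face_boxes.values()]
        return ("wide", (min(xs), min(ys), max(x2) - min(xs), max(y2) - min(ys)))
    # Tight framing around the selected person.
    return ("tight", face_boxes[selection])
```

The returned box would then be turned into pan, tilt and zoom signals by the control device 20.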
  • a first embodiment comprises a graphic interface 32A installed on a microcomputer or workstation P as shown in FIG. 3.
  • with a mouse 320, the user chooses to obtain picture and sound pickup of all the people in the scene by clicking on a window named "Ensemble", referenced E.
  • the user chooses to obtain a shot and sound pickup of one person in the scene by clicking on a window carrying the number of the desired person P1-Pn, or of the group of people.
  • the numeric labels of the people can be replaced by the image of the person 321 obtained by the analysis system. This image is captured either at a time set by the system user, or refreshed automatically during the meeting.
  • a graphical interface 32A with the image of the people 321 is more ergonomic for the user, because the interface displays the shots that the user can select.
  • the mouse 320 can be replaced by a touch screen and/or by a speech recognition device R.
  • another embodiment of the physical interface 32 is represented in FIG. 4.
  • the remote control 32B of the visual scene analysis camera 60 is diverted from its normal use to allow the user of the system to send control signals to camera 60.
  • the diversion and use of this remote control has been carried out for reasons of ease and speed of implementation.
  • the infrared remote control 32B is in communication (CDE commands) with the analysis camera 60.
  • This analysis camera remote control has a certain number of keys including in particular keys corresponding to position memories and a "home" key H corresponding to the rest position of the camera.
  • the position memories are not used as such to point to directions of space; only the fact that a key has been pressed is used.
  • the position memories are initialized beforehand by the system to the rest position of the camera.
  • the analysis camera being fixed in one of the embodiments, triggering positions 1 to 6 or the "home" key H has no effect on the position of this analysis camera 60.
  • by pressing the "home" button H, for example, the user triggers, via the devices 60, 40, 30 and 20, shooting and sound pickup covering all the people present in the scene.
  • by pressing one of the keys 1 to 6 corresponding to the position memories, the user triggers, via the devices 60, 40, 30 and 20, a shot of the corresponding person (six people maximum in this version).
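The diverted remote-control keys thus reduce to a small dispatch: the "home" key H requests a general framing, and keys 1 to 6 request a tight shot on one person. A sketch (the function name and return convention are hypothetical):

```python
def interpret_key(key, detected_people):
    """Translate a diverted remote-control key press into a selection.

    key:             "H" for home, or an integer key 1..6
    detected_people: set of person ids currently reported by scene analysis
    """
    if key == "H":
        # Home key: general framing of everyone present.
        return ("all", sorted(detected_people))
    if 1 <= key <= 6 and key in detected_people:
        # Position-memory keys are reused as person selectors (max six).
        return ("person", key)
    return ("ignore", None)   # key with no matching person
```

Because only the key press matters, not the stored direction, the same scheme works with a universal remote control, as the next embodiment notes.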
  • This embodiment is not illustrated because it corresponds to the diagram in FIG. 4 except that the remote control 32B is in this case a universal remote control.
  • FIG. 5 corresponds to another embodiment according to the invention.
  • This transceiver 70 receives infrared CDE signals from the remote control 32B and returns codes to the logical interface 31, for example through an RS232 communication port, connected to the interface 30.
  • FIG. 6 illustrates an embodiment in which the physical interface 32 comprises a voice-based remote control 32B associated with an existing speech recognition device 80.
  • FIG. 7 illustrates an embodiment in which the physical interface 32 comprises a gesture-based remote control 32B associated with an existing gesture recognition device 90.
  • the interfaces 31, 32 previously described make it possible to control shooting and sound sensors physically present in a remote room (where the user is not located), for example the room with which he is in videoconference.
  • the user participating in a videoconference thus selects and obtains the desired shots and sound.
  • the signal A '(remote) for scene analysis or corresponding to the analysis will be applied to an input of the analysis device 40.
  • the signals C emitted by the infrared remote control or by the graphical interface are transported with the image, the sound and the other signals of videoconferencing.
  • any sensor-control conflict between the local room and the remote room must be managed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • User Interface Of Digital Computer (AREA)
  • Selective Calling Equipment (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Closed-Circuit Television Systems (AREA)
EP99940237A 1998-08-31 1999-08-26 Systeme automatique de prise de son et d'images Withdrawn EP1110398A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9810888A FR2782877B1 (fr) 1998-08-31 1998-08-31 Systeme automatique de prise de son et d'images
FR9810888 1998-08-31
PCT/FR1999/002047 WO2000013417A1 (fr) 1998-08-31 1999-08-26 Systeme automatique de prise de son et d'images

Publications (1)

Publication Number Publication Date
EP1110398A1 true EP1110398A1 (fr) 2001-06-27

Family

ID=9530001

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99940237A Withdrawn EP1110398A1 (fr) 1998-08-31 1999-08-26 Systeme automatique de prise de son et d'images

Country Status (4)

Country Link
EP (1) EP1110398A1 (en)
JP (1) JP2002524936A (en)
FR (1) FR2782877B1 (en)
WO (1) WO2000013417A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010055058A1 (en) * 2000-06-08 2001-12-27 Rajko Milovanovic Method and system for video telephony
US6937266B2 (en) * 2001-06-14 2005-08-30 Microsoft Corporation Automated online broadcasting system and method using an omni-directional camera system for viewing meetings over a computer network
JP5395716B2 (ja) * 2010-03-25 2014-01-22 株式会社デンソーアイティーラボラトリ 車外音提供装置、車外音提供方法およびプログラム

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2389865B1 (en) * 1977-05-06 1981-11-20 Realisa Electroniques Et
US4286289A (en) * 1979-10-31 1981-08-25 The United States Of America As Represented By The Secretary Of The Army Touch screen target designator
GB9119863D0 (en) * 1991-09-17 1991-10-30 Radamec Epo Ltd Pictorial based shot and recall method and equipment for remotely controlled camera systems
CA2148231C (en) * 1993-01-29 1999-01-12 Michael Haysom Bianchi Automatic tracking camera control system
US5745161A (en) * 1993-08-30 1998-04-28 Canon Kabushiki Kaisha Video conference system
JPH09506217A (ja) * 1993-10-20 1997-06-17 ヴィデオコンファレンスィング システムズ インコーポレイテッド 適応型テレビ会議システム
US5508734A (en) * 1994-07-27 1996-04-16 International Business Machines Corporation Method and apparatus for hemispheric imaging which emphasizes peripheral content
WO1996014587A2 (en) * 1994-11-04 1996-05-17 Telemedia A/S A method in an image recording system
US5805745A (en) * 1995-06-26 1998-09-08 Lucent Technologies Inc. Method for locating a subject's lips in a facial image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0013417A1 *

Also Published As

Publication number Publication date
FR2782877A1 (fr) 2000-03-03
FR2782877B1 (fr) 2000-10-13
WO2000013417A1 (fr) 2000-03-09
JP2002524936A (ja) 2002-08-06

Similar Documents

Publication Publication Date Title
US8159519B2 (en) Personal controls for personal video communications
US8154578B2 (en) Multi-camera residential communication system
US8063929B2 (en) Managing scene transitions for video communication
US8253770B2 (en) Residential video communication system
US8154583B2 (en) Eye gazing imaging for video communications
US9274744B2 (en) Relative position-inclusive device interfaces
CN101247461B (zh) 为照相机提供区域缩放功能
US7559026B2 (en) Video conferencing system having focus control
US6972787B1 (en) System and method for tracking an object with multiple cameras
US9239627B2 (en) SmartLight interaction system
CA2284884C (fr) Systeme de visioconference
US8340258B2 (en) System, method and apparatus for controlling image access in a video collaboration system
US20150208032A1 (en) Content data capture, display and manipulation system
US20120306995A1 (en) Ambulatory Presence Features
US20120081504A1 (en) Audio source locator and tracker, a method of directing a camera to view an audio source and a video conferencing terminal
JP2013504933A (ja) 時間シフトされたビデオ通信
US9374554B1 (en) Display selection for video conferencing
US11019272B2 (en) Automatic dynamic range control for audio/video recording and communication devices
CN102316269A (zh) 成像控制设备、成像控制方法和程序
CN107439002A (zh) 深度成像
US20250164858A1 (en) Systems and methods for video camera systems for smart tv applications
EP1110398A1 (fr) Systeme automatique de prise de son et d'images
WO2008066705A1 (en) Image capture apparatus with indicator
US12407789B2 (en) Automated video conference system with multi-camera support
EP1168810A2 (fr) Téléphone mobile muni d'une caméra

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010227

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17Q First examination report despatched

Effective date: 20011004

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030205

RBV Designated contracting states (corrected)

Designated state(s): DE FI FR GB