WO2000013417A1 - Systeme automatique de prise de son et d'images - Google Patents
- Publication number
- WO2000013417A1 WO2000013417A1 PCT/FR1999/002047 FR9902047W WO0013417A1 WO 2000013417 A1 WO2000013417 A1 WO 2000013417A1 FR 9902047 W FR9902047 W FR 9902047W WO 0013417 A1 WO0013417 A1 WO 0013417A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- remote control
- scene
- person
- people
- analysis
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
Definitions
- the invention relates to an automatic sound and image pickup system, in particular for videoconferencing.
- videoconferencing systems are equipped with image and sound recording means (cameras and microphones) which are either non-orientable or whose orientation is controlled by means of a remote control.
- the remote control makes it possible to continuously vary the elevation and the azimuth of the camera, as well as its zoom. Orienting the camera toward a person or a group of people is possible, but difficult.
- spatial directions (six for the two cameras) can be stored by the camera. The camera can be pointed in one of these directions by pressing a button on the remote control or by a command on the serial port. The interest of this function is direct access to a direction in space without having to combine successive key presses (elevation, azimuth).
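The preset mechanism described above can be sketched in code. This is a hypothetical illustration (the class and method names are invented, not taken from any real camera API): a camera stores a handful of (elevation, azimuth, zoom) triples, and a single key press or serial-port command recalls one of them directly.

```python
class PresetCamera:
    """Toy model of a motorized camera with direct-access presets."""

    MAX_PRESETS = 6  # the limit of six stored directions mentioned in the text

    def __init__(self):
        self.presets = {}              # key number -> (elevation, azimuth, zoom)
        self.position = (0.0, 0.0, 1.0)

    def store_preset(self, key, elevation, azimuth, zoom):
        if len(self.presets) >= self.MAX_PRESETS and key not in self.presets:
            raise ValueError("camera can only store six directions")
        self.presets[key] = (elevation, azimuth, zoom)

    def recall_preset(self, key):
        # one key press replaces a combination of successive
        # elevation/azimuth/zoom adjustments
        self.position = self.presets[key]
        return self.position
```

This is the behavior the text attributes to the remote control: recalling a stored direction in one action instead of steering incrementally.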
- the user of the remote control can simply switch from one person to another.
- the acoustic analysis of the scene is obtained from several microphones which make it possible to determine the direction of the sound sources, and in particular of the speech sources.
- the direction of the speech sources being identified, they can be selected one by one, then followed dynamically.
- the LimeLight function from PictureTel, a company that manufactures and markets videoconferencing systems, is based on acoustic localization and allows the detection and tracking of a sound source and the dynamic orientation of a camera.
- the first drawback is related to the fact that the positions must be prerecorded; they therefore cannot be changed quickly or continuously.
- the second drawback is the assumption that people will indeed occupy the prerecorded positions and will not move from them. In practice, even when the chairs are fixed to the floor, people move and are therefore rarely in the center of the frame, or even leave the frame if it is tight on the person. This drawback is manifest in videoconferencing, where people spontaneously leave the frame defined by the prerecorded directions of space.
- the functionality of access to predetermined directions of space may be suitable for certain stable situations (remote monitoring), but does not make it possible to adapt to the actual situation.
- the camera points in a direction of space, but knows nothing about the content of that space, whether it is occupied by a person or empty.
- another, secondary, drawback is the limited number (six) of directions of space which can be memorized by the camera and therefore accessed from the remote control. This disadvantage is generally solved by memorizing these directions in a computer and by using a remote control with a greater number of keys.
- acoustic speech activity is by nature intermittent (when a person stops speaking to listen).
- the acoustic localization is sensitive to the amplitude of the sound source.
- visual localization has the following drawbacks: the main one is related to the complexity of the algorithms, their speed and their robustness. However, several systems are operational, either on a workstation or a personal computer (PC), such as the systems developed by the applicant, or as in the publications previously cited by the applicant.
- the automatic shooting function for a group of people, implemented by the applicant, is in use particularly useful, although complex.
- the framing constantly adapts to the number and position of the participants in a videoconference.
- the invention therefore proposes an intelligent interface capable of selecting a person (or a group of people) from among the people on the filmed scene, on the order of a speaker, and of automatically framing the selected person (or group of people) from information provided by the scene analysis.
- the subject of the invention is therefore an automatic sound and image pickup system, in particular for videoconferencing, comprising means for controlling image and sound sensors, and scene analysis means driving these control means to obtain an automatic framing of the filmed scene.
- the system includes means for selecting a person or a group of people from among the people on the filmed scene, and automatic framing means which, from the information provided by the scene analysis means, frame the selected person or group of people.
- the subject of the invention is more particularly an automatic sound and image pickup system, in particular for videoconferencing, comprising means for controlling image and sound sensors, scene analysis means for supplying position signals to the control means, and means for selecting a person or a group from among the people on the filmed scene,
- the selection means comprise a physical interface comprising a remote control able to allow the selection of any one of the people on the scene, or of a group, to obtain an automatic framing around this person or group, or the selection of all the people to obtain a general framing of the scene;
- the framing means comprise a logical interface capable of establishing a correspondence between the person selected by the remote control and the position information from the scene analysis, so as to provide the control means with the position of this person or group relative to the filmed scene.
- the remote control is a universal remote control, activating a device capable of transmitting control signals to the logical interface.
- the signals emitted by the remote control can be infrared or electromagnetic.
- the control signals from said remote control can be received and re-transmitted by a transceiver.
- the control signals of said remote control can be received and re-emitted by a speech recognition or gesture recognition device.
- the remote control function can be provided by the remote control of the image-analysis camera, the control signals of said remote control being received and retransmitted by the analysis camera to the logical interface.
- the remote control is a universal remote control, the control signals of said remote control being received and retransmitted by the analysis camera.
- the remote control comprises a graphical interface.
- the remote control also comprises, in this case, a screen on which the scene and the various selectable zones are viewed.
- the remote control includes a computer input/output device to select the identified areas.
- provision may be made for the scene analysis means to receive a local analysis signal (A), for the selection means to select a person or a group of people from the locally filmed scene, and for the automatic framing means to use the information from the locally filmed scene.
- the analysis means may receive a signal (A') from a remote system, for or corresponding to the scene analysis; the selection means then make it possible to select a person or a group of people from the remotely filmed scene, and the automatic framing means make it possible to control the framing of the remotely filmed scene, the control signals being transported to the remote system.
- FIG. 1 represents a block diagram of the invention
- FIG. 2 represents a more detailed diagram of the invention
- FIG. 3 represents a particular embodiment for the physical interface
- FIG. 4 represents another embodiment for the physical interface
- FIG. 5 represents another embodiment of the physical interface
- FIG. 6 represents another embodiment of the physical interface
- FIG. 7 shows another embodiment of the physical interface.
- FIG. 1 schematically shows an automatic sound and image pick-up system in which there are audiovisual resources 10 for filming and capturing the sound of a scene 50.
- the scene is made up of one or more people P1-Pn, called speakers, on a site, wishing to communicate with other people at a remote site.
- the audiovisual resources 10 are constituted by audio and visual sensors.
- the audio sensors are for example a series of microphones placed close to the speakers.
- the video sensors consist of one or more cameras filming the scene.
- the audiovisual resources 10 are controlled by a conventional control device 20, capable of supplying the control signals to the sensors 10 according to the information received at the input by the interface 30 as detailed below.
- the information received as input is provided by the interface 30 from the scene analysis device 40 and from the selection made by a speaker.
- the scene analysis device can be either audio, visual or audiovisual associated with visual or audiovisual sensors.
- this device is an existing visual device.
- a fixed analysis camera 60 is used (the camera may also be mobile), which provides the signal required to perform an analysis of the observed visual scene.
- the scene analysis device therefore comprises for this purpose, the camera 60 and means 40 for processing the signal A supplied by this camera.
- these means are constituted, for example, by a microcomputer or a workstation equipped with a specific, existing scene-analysis program.
- the faces of the people present in the visual field are detected by a neural network, then the program implements an algorithm which tracks the detected faces.
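The detect-then-track pipeline described above can be sketched as a simple data-association loop. The face detector itself (the neural network mentioned in the text) is abstracted away here; detected boxes are matched from one frame to the next by nearest centroid. All names and the distance threshold are illustrative, not from the patent.

```python
import math

def centroid(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def track_faces(tracks, detections, max_dist=50.0):
    """Associate each known track with the nearest new detection.

    `tracks` maps a person id to the last known (x, y, w, h) box;
    `detections` is the list of boxes returned by the (abstracted)
    face detector for the current frame. Unmatched detections start
    new tracks, modeling people entering the field of view.
    """
    updated, unmatched = {}, list(detections)
    for pid, box in tracks.items():
        cx, cy = centroid(box)
        best, best_d = None, max_dist
        for det in unmatched:
            dx, dy = centroid(det)
            d = math.hypot(dx - cx, dy - cy)
            if d < best_d:
                best, best_d = det, d
        if best is not None:
            updated[pid] = best
            unmatched.remove(best)
    next_id = max(tracks, default=0) + 1
    for det in unmatched:  # new faces get fresh ids
        updated[next_id] = det
        next_id += 1
    return updated
```

A real tracker would also handle disappearing faces and use appearance cues, but this captures the core idea of following detected faces over time.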
- Other known techniques can be used.
- a scene analysis device 40 can be used with a mobile camera.
- a scene analysis device using several fixed or mobile cameras can be used or produced.
- the various sensors 10 are controlled by a control device 20 which receives control signals from the interface 30 in accordance with the present invention.
- it is a device 20 for controlling a motorized camera 11, which provides the image pickup, and an acoustic antenna 12, which provides the sound pickup.
- the image and sound pickup for a set of people and for a single person corresponds to actual implementations by the applicant.
- the same techniques can be used for shooting and sound concerning a group of people; the group is a subset of all people.
- the analysis of the scene is visual, that is to say that the position of the people is determined, but it is not known whether they are speaking.
- the sound pickup devices will be selected from audiovisual information.
- the control device 20 controls the camera 11 so that all of the people present in the field of analysis are framed, respecting the rules of the art of shooting insofar as the constraints of the camera 11 allow.
- the device 20 controls the camera 11 so that the person is, in compliance with the rules of shooting, laterally centered, with his eyes approximately at the upper third of the image, for example.
- the shooting seeks to isolate this person from the others in the image, insofar as the constraints linked to the camera and the rules of shooting authorize it.
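The framing rules just stated (person laterally centered, eyes at roughly the upper third of the image) amount to a small geometric computation. Below is a hypothetical sketch working from a face bounding box in the analysis image; the eye-line approximation and the `tightness` ratio are invented parameters, not values from the patent.

```python
def framing_command(face_box, frame_w, frame_h, tightness=3.0):
    """Compute a pan/tilt/zoom target that centers the face laterally
    and places the eye line near the upper third of the frame.

    `face_box` is (x, y, w, h) in pixels of the analysis image; the
    eye line is approximated as one third from the top of the face
    box. `tightness` is the desired ratio of frame height to face
    height. Offsets are returned in normalized image coordinates.
    """
    x, y, w, h = face_box
    eye_y = y + h / 3.0                  # rough eye-line estimate
    pan = (x + w / 2.0) / frame_w - 0.5  # 0 when laterally centered
    tilt = eye_y / frame_h - 1.0 / 3.0   # 0 when eyes sit on the upper third
    zoom = frame_h / (tightness * h)     # >1 means zoom in
    return pan, tilt, zoom
```

The control device would drive the motorized camera until pan and tilt offsets vanish and the zoom ratio is reached.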
- the device 20 controls the sound recording so as to capture the sound field of the different participants. This sound field can be obtained in different ways:
- the interface 30 allows the user of the system to obtain a shot and sound in accordance with his request (a wide shot of all of the people, a tight shot of a particular person).
- the sending of a command from the interface triggers the orientation command of the sound and image pickup sensors, as a function of the audiovisual scene, analyzed by the scene analysis device.
- the interface includes a logical interface 31 and a physical interface 32.
- the physical interface 32 can be produced according to different embodiments described below in connection with FIGS. 3 to 7.
- the logical interface 31 is, according to a preferred embodiment, constituted by a program loaded in the scene-analysis processing system 40. This logical interface 31 recovers the position information of the people on the scene resulting from the scene-analysis processing and establishes a correspondence between this position information and the selection information given by the operator through the physical interface.
- this logical interface 31 interprets (that is to say, decodes) the information received from the unit 40 to supply position control signals interpretable by the control device 20, in order to carry out the desired framing around the selected person or group.
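The role of the logical interface 31, matching the operator's selection against the positions delivered by scene analysis and producing a framing target for the control device 20, might be sketched as follows. Function and argument names are invented for illustration.

```python
def logical_interface(selection, positions):
    """Map a selection from the physical interface to a framing target.

    `positions` maps person ids to (x, y, w, h) boxes coming from
    scene analysis; `selection` is either the string "all" (general
    framing) or a list of person ids (a single person or a group).
    Returns the bounding box that the control device should frame,
    or None if the selected people are no longer in the scene.
    """
    if selection == "all":
        chosen = list(positions.values())
    else:
        chosen = [positions[pid] for pid in selection if pid in positions]
    if not chosen:
        return None  # the selected person has left the analyzed scene
    # union of the selected boxes: frames one person or a whole group
    x0 = min(b[0] for b in chosen)
    y0 = min(b[1] for b in chosen)
    x1 = max(b[0] + b[2] for b in chosen)
    y1 = max(b[1] + b[3] for b in chosen)
    return (x0, y0, x1 - x0, y1 - y0)
```

Because positions are refreshed by the scene analysis at every frame, the same selection keeps producing an up-to-date target even when people move, which is the key difference from prerecorded camera presets.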
- a first embodiment comprises a graphic interface 32A installed on a microcomputer or workstation P as shown in FIG. 3.
- using a mouse 320, the user chooses to obtain an image and sound pickup of all of the people on the scene by clicking on a window named "Ensemble", referenced E.
- the user chooses to obtain a shot and sound pickup of one person on the scene by clicking on a window bearing the number of the desired person P1-Pn or of the group of people.
- the numeric labels of the people can be replaced by the image of the person 321 obtained by the analysis system. This image is obtained either at a time set by the system user, or is automatically refreshed during the meeting.
- a graphical interface 32A with the image of the people 321 is more ergonomic for the user, because the interface displays the shots that the user can select.
- the mouse 320 can be replaced by a touch screen and / or by a speech recognition device R.
- another embodiment of the physical interface 32 is represented in FIG. 4.
- the remote control 32B of the visual scene analysis camera 60 is diverted from its normal use to allow the user of the system to send control signals to the camera 60.
- the diversion and use of this remote control has been carried out for reasons of ease and speed of implementation.
- the infrared remote control 32B is in communication (CDE commands) with the analysis camera 60.
- this analysis-camera remote control has a number of keys, including in particular keys corresponding to position memories and a "home" key H corresponding to the rest position of the camera.
- the position memories are not used as such to point to directions of space; only the fact that the keys are activated is used.
- the positions of the position memories are initialized beforehand by the system, at the rest position of the camera.
- the analysis camera being fixed in one of the embodiments, triggering positions 1 to 6 or the "home" key H has no effect on the position of this analysis camera 60.
- by pressing, for example, the "home" button H, the user triggers, via the devices 60, 40, 30 and 20, an image and sound pickup of all the people present in the scene.
- by pressing one of the keys 1 to 6 corresponding to a position memory, the user triggers, via the devices 60, 40, 30 and 20, a shot of the corresponding person (six people maximum in this version).
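The diverted key protocol described above might be sketched as a small decoding function. Key labels and return values are invented for illustration; the point is that preset keys are reinterpreted as selection commands rather than camera positions.

```python
def decode_key(key):
    """Translate an analysis-camera remote key into a selection.

    "home" selects everyone (wide shot of the whole scene); keys
    "1".."6" select the corresponding person (six people maximum
    in this version, matching the six position-memory keys).
    """
    if key == "home":
        return "all"
    if key in {"1", "2", "3", "4", "5", "6"}:
        return [int(key)]
    raise ValueError(f"unmapped remote key: {key}")
```

The result of this decoding is what the logical interface would then match against the positions coming from scene analysis.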
- This embodiment is not illustrated because it corresponds to the diagram in FIG. 4 except that the remote control 32B is in this case a universal remote control.
- FIG. 5 corresponds to another embodiment according to the invention, in which a transceiver 70 receives the infrared CDE signals from the remote control 32B and returns codes to the logical interface 31, for example through an RS232 communication port connected to the interface 30.
- FIG. 6 illustrates an embodiment according to which the physical interface 32 comprises a voice remote control 32B associated with an existing speech recognition device 80.
- FIG. 7 illustrates an embodiment according to which the physical interface 32 comprises a gesture remote control 32B associated with an existing gesture recognition device 90.
- the interfaces 31, 32 previously described make it possible to control the image and sound pickup sensors physically present in a remote room (where the user is not located), for example the room with which he is in videoconference.
- the user participating in a videoconference selects and obtains the desired shots and sound.
- the signal A' (remote), for the scene analysis or corresponding to the analysis, will be applied to an input of the analysis device 40.
- the signals C emitted by the infrared remote control or by the graphical interface are transported with the image, the sound and the other signals of videoconferencing.
- the possible sensor control conflict between the local room and the remote room must be managed.
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP99940237A EP1110398A1 (fr) | 1998-08-31 | 1999-08-26 | Systeme automatique de prise de son et d'images |
JP2000568257A JP2002524936A (ja) | 1998-08-31 | 1999-08-26 | 音声および画像自動記録システム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR98/10888 | 1998-08-31 | ||
FR9810888A FR2782877B1 (fr) | 1998-08-31 | 1998-08-31 | Systeme automatique de prise de son et d'images |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000013417A1 true WO2000013417A1 (fr) | 2000-03-09 |
Family
ID=9530001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR1999/002047 WO2000013417A1 (fr) | 1998-08-31 | 1999-08-26 | Systeme automatique de prise de son et d'images |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1110398A1 (fr) |
JP (1) | JP2002524936A (fr) |
FR (1) | FR2782877B1 (fr) |
WO (1) | WO2000013417A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6937266B2 (en) * | 2001-06-14 | 2005-08-30 | Microsoft Corporation | Automated online broadcasting system and method using an omni-directional camera system for viewing meetings over a computer network |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010055058A1 (en) * | 2000-06-08 | 2001-12-27 | Rajko Milovanovic | Method and system for video telephony |
JP5395716B2 (ja) * | 2010-03-25 | 2014-01-22 | 株式会社デンソーアイティーラボラトリ | 車外音提供装置、車外音提供方法およびプログラム |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4274609A (en) * | 1977-05-06 | 1981-06-23 | Societe D'etudes Et De Realisations Electroniques | Target and missile angle tracking method and system for guiding missiles on to targets |
GB2252473A (en) * | 1991-09-17 | 1992-08-05 | Radamec Epo Limited | Remote control system for robotic camera |
WO1995011566A1 (fr) * | 1993-10-20 | 1995-04-27 | Videoconferencing Systems, Inc. | Systeme adaptable de video conferences |
US5434617A (en) * | 1993-01-29 | 1995-07-18 | Bell Communications Research, Inc. | Automatic tracking camera control system |
WO1996014587A2 (fr) * | 1994-11-04 | 1996-05-17 | Telemedia A/S | Procede relatif a un systeme d'enregistrement d'images |
EP0751473A1 (fr) * | 1995-06-26 | 1997-01-02 | Lucent Technologies Inc. | Localisation de caractéristiques dans une image |
US5686957A (en) * | 1994-07-27 | 1997-11-11 | International Business Machines Corporation | Teleconferencing imaging system with automatic camera steering |
US5745161A (en) * | 1993-08-30 | 1998-04-28 | Canon Kabushiki Kaisha | Video conference system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4286289A (en) * | 1979-10-31 | 1981-08-25 | The United States Of America As Represented By The Secretary Of The Army | Touch screen target designator |
-
1998
- 1998-08-31 FR FR9810888A patent/FR2782877B1/fr not_active Expired - Fee Related
-
1999
- 1999-08-26 JP JP2000568257A patent/JP2002524936A/ja not_active Abandoned
- 1999-08-26 EP EP99940237A patent/EP1110398A1/fr not_active Withdrawn
- 1999-08-26 WO PCT/FR1999/002047 patent/WO2000013417A1/fr not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
FR2782877B1 (fr) | 2000-10-13 |
EP1110398A1 (fr) | 2001-06-27 |
FR2782877A1 (fr) | 2000-03-03 |
JP2002524936A (ja) | 2002-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 1999940237 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2000 568257 Kind code of ref document: A Format of ref document f/p: F |
|
WWP | Wipo information: published in national office |
Ref document number: 1999940237 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1999940237 Country of ref document: EP |