WO2001072034A1 - Hands-free home video production camcorder - Google Patents

Hands-free home video production camcorder

Info

Publication number
WO2001072034A1
Authority
WO
WIPO (PCT)
Prior art keywords
field
view
camera
information parameters
change
Prior art date
Application number
PCT/EP2001/002758
Other languages
French (fr)
Inventor
Mi-Suen Lee
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to JP2001570070A priority Critical patent/JP2003528548A/en
Priority to EP01913866A priority patent/EP1269746A1/en
Priority to KR1020017014729A priority patent/KR20020008191A/en
Publication of WO2001072034A1 publication Critical patent/WO2001072034A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H04N23/661 Transmitting camera control signals through networks, e.g. control via the Internet
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects

Definitions

  • The audio processing system 160 performs a discriminating and locating function similar to that of the image processing system 150, based on audio signals received from an audio system 120.
  • The audio system 120 may be integral to the camera 110 or the base unit 130, or it may be a discrete component that is attached to the base unit 130.
  • The audio system includes two or more microphones 122, 124, so that the location of a sound source can be determined via differential volume and phase analysis techniques that are common in the art. Also common in the art are sound source discrimination techniques that are used by the processing system 160 to distinguish and locate multiple simultaneous sound sources.
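The differential volume and phase analysis mentioned above can be sketched as a time-difference-of-arrival (TDOA) estimate between the two microphones 122, 124. The following is a hypothetical illustration only; the patent does not specify an algorithm, and the function name, microphone spacing, and far-field approximation are assumptions:

```python
import numpy as np

def locate_sound_source(left, right, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Estimate the bearing (degrees) of a sound source from two microphones.

    The time difference of arrival between channels is taken from the peak
    of their cross-correlation, then converted to an angle with a far-field
    approximation. A hypothetical sketch, not the patent's method.
    """
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # delay in samples (signed)
    tdoa = lag / sample_rate                   # delay in seconds
    # Far-field geometry: sin(theta) = speed_of_sound * tdoa / mic_spacing
    sin_theta = np.clip(speed_of_sound * tdoa / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

In practice the multiple-source discrimination mentioned above would require more than a single correlation peak, but the same lag-to-angle geometry applies per source.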
  • Associated parameters, such as the received volume level, the "world" coordinates of each source, the speed of moving sound sources, and other audio information parameters, are provided to the field of view controller 170, which uses them to determine an appropriate subsequent field of view for the camera 110.
  • The audio processing system 160 includes a voice recognition system that is configured to recognize one or more of a plurality of predefined speech patterns within the audio signals.
  • The audio processing system 160 may be configured to recognize an initiating keyword, such as "Camera!", and in response provides a signal to the image processing system 150 to initiate the aforementioned gesture recognition process. At the same time, this keyword may initiate the recognition of other keywords, such as "left", "right", "zoom-in", "zoom-out", "track", and so on.
  • The speech recognition process may be partitioned between the audio processing system and the field of view controller, so that conventional speech processing devices can be employed in the audio processing system 160, and keyword recognition processes specific to the control of the camera control system 100 can be employed in the controller 170.
  • The audio processing system provides a "transcript" of received speech signals continuously to the controller 170, and the controller 170 initiates and controls the keyword recognition process upon receipt of a keyword phrase within the transcript.
  • The speech recognition system may be used in conjunction with or independent of the gesture recognition system to further facilitate the processing of user directives for the control of the camera system 100.
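The partitioned arrangement above, in which the controller scans a continuously supplied transcript for a keyword phrase and subsequent commands, might look like the following sketch. The trigger word, command vocabulary, and function name are illustrative assumptions:

```python
# Hypothetical sketch of controller-side keyword recognition: the controller
# scans the continuously supplied speech transcript and collects camera
# commands that follow the initiating keyword. Vocabulary is illustrative.
CAMERA_COMMANDS = {"left", "right", "zoom-in", "zoom-out", "track"}

def extract_commands(transcript, trigger="camera"):
    """Return camera commands spoken after the trigger keyword, in order."""
    commands = []
    armed = False
    for word in transcript.lower().replace("!", "").split():
        if word == trigger:
            armed = True           # initiating keyword heard: start listening
        elif armed and word in CAMERA_COMMANDS:
            commands.append(word)
        elif armed:
            armed = False          # any other word ends the command phrase
    return commands
```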
  • The field of view controller 170 uses the image information parameters from the image processing system 150 and the audio information parameters from the audio processing system 160 to determine whether a change of orientation or perspective is appropriate, based on these information parameters.
  • FIG. 2 illustrates an example flow diagram of a camera control system in accordance with this invention. This flow diagram is presented for illustrative purposes, and is not intended to be an exhaustive representation of the features that can be incorporated by one of ordinary skill in the art in view of this disclosure. Illustrated is a continuous process that includes two parallel processes, an image information process 220-228, and an audio information process 240-246.
  • The process starts, at 210, with an orientation of the camera that provides images and sounds that are converted to image information and audio information, at 220 and 240, respectively.
  • The image information is processed to identify individual figures and clusters of figures, at 222.
  • The image processing system 150 of FIG. 1 provides parameters related to each figure in the current image, and the controller 170 processes these parameters to identify key figures, based on the location of such figures relative to other figures, and identifies clusters of figures based on their spatial relationship to each other.
  • The key figure and clustering process 222 may use audio information and speaker identifiers, discussed below, to facilitate the identification process.
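The identification of clusters from figure locations could be sketched as a simple spatial grouping of figure centroids. This is a hypothetical stand-in for the spatial-relationship analysis at 222; the distance threshold and function name are assumptions:

```python
def cluster_figures(centroids, max_gap=1.0):
    """Group figure centroids into clusters of spatially adjacent figures.

    Two figures belong to the same cluster when some chain of figures,
    each within `max_gap` of the next, connects them. A hypothetical
    stand-in for the spatial-relationship analysis described above.
    """
    clusters = []
    unassigned = list(range(len(centroids)))
    while unassigned:
        seed = unassigned.pop(0)
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unassigned
                    if (centroids[i][0] - centroids[j][0]) ** 2
                     + (centroids[i][1] - centroids[j][1]) ** 2 <= max_gap ** 2]
            for j in near:
                unassigned.remove(j)   # claim neighbors for this cluster
            cluster.extend(near)
            frontier.extend(near)      # and grow the cluster from them
        clusters.append(sorted(cluster))
    return clusters
```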
  • the audio processing process may also provide a keyword recognition process that initiates a gesture recognition process in the image processing system. If a command gesture is detected, at 224, the appropriate command is executed, at 254, either directly or via a determination of new orientation parameters, at 250. At the same time, the audio information is processed to identify a primary speaker, based on the audio information from one speaker relative to other speakers, and to identify other speakers or clusters of speakers, at 224. If a voice command, discussed above, is given, at 244, the command is executed, at 254, either directly or via a determination of new orientation parameters, at 250.
  • The term "orientation" as used herein includes control of pan, tilt, or zoom settings to effect a desired field of view.
  • If a command gesture is not given, at 224, the image information process continues at 226. If no clusters of figures have been identified in the image, at 226, a pan is initiated, or continued, to find an image that contains a figure, at 256. For the purposes of this disclosure, a cluster includes both single and multiple figures within the camera field of view.
  • The pan process at 256 uses the processed audio information and speaker identification to determine a preferred direction of panning. That is, for example, if the audio information indicates that voices are detected at an area to the right of the camera, the pan at 256 is initiated to turn the camera to the right.
  • If the tilt of the camera causes the pan at 256 to orient the camera inappropriately for figure detection, the tilt is automatically adjusted to provide a pan operation that sweeps the area with a substantially level field of view.
  • The image and audio information are used to determine which cluster to choose as a focal cluster. For example, if one cluster is particularly active, or particularly loud, this cluster will preferably be selected as the focal cluster. Using techniques common in the art, such as weighted sampling, the likelihood of each cluster being selected can be made dependent on a variety of factors, such as the audio volume, the activity level, the time since this cluster was last selected, whether the figures are all oriented toward a central point, and so on. If a single cluster is located, at 228, cluster selection is not required.
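A weighted-sampling selection of the focal cluster, as suggested above, might be sketched as follows. The factor names and weights are illustrative assumptions, not the patent's formula:

```python
import random

def select_focal_cluster(clusters, rng=random):
    """Pick the index of a focal cluster by weighted sampling.

    Each cluster is a dict of scores; the factor names and weights are
    illustrative assumptions, not the patent's actual formula.
    """
    def weight(c):
        return (1.0 * c["audio_volume"]               # louder clusters preferred
                + 0.5 * c["activity"]                 # more active clusters preferred
                + 0.2 * c["seconds_since_selected"])  # rotate attention over time
    weights = [weight(c) for c in clusters]
    return rng.choices(range(len(clusters)), weights=weights, k=1)[0]
```

Because the choice is sampled rather than maximized, quieter clusters are still visited occasionally, which matches the varied coverage a human operator would provide.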
  • The select cluster, or the single cluster, is "framed" using the image and audio information associated with this cluster. For example, initially the entire cluster will be centered within the camera's field of view. Thereafter, if the sounds are emanating from a particular point within the cluster, the camera orientation is adjusted toward that sound and the field of view is narrowed to magnify the region surrounding the predominant sound source.
  • Directional microphones are also employed that allow for a widening and narrowing of the field of audio reception in concert with the zoom settings. If there is a single sound source within the cluster, the figure corresponding to the sound source is framed, using established image framing techniques.
  • A solo speaker is preferably framed so that the image contains a part of the upper torso, the entire head, including any headdress, plus a space above the head. If the sounds shift from figure to figure, the field of view is enlarged and centered so as to include all of the speaking figures.
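The solo-speaker framing rule above (part of the upper torso, the entire head including any headdress, plus headroom) can be expressed as a hypothetical computation of a framing rectangle from a detected head bounding box. The ratios and aspect-ratio assumption are illustrative; the patent states the rule, not the numbers:

```python
def frame_solo_speaker(head_box, headroom=0.15, torso_depth=1.2, aspect=4.0 / 3.0):
    """Compute a framing rectangle (x, y, w, h) around a solo speaker.

    `head_box` is (x, y, w, h) for the detected head (headdress included),
    with y growing downward. The frame keeps `headroom` (as a fraction of
    frame height) above the head and extends `torso_depth` head-heights
    below the chin to include part of the upper torso. All ratios are
    illustrative assumptions.
    """
    x, y, w, h = head_box
    frame_h = (h + torso_depth * h) / (1.0 - headroom)  # head + torso + headroom
    frame_y = y - headroom * frame_h                    # leave space above the head
    frame_w = aspect * frame_h
    frame_x = x + w / 2.0 - frame_w / 2.0               # center on the head
    return (frame_x, frame_y, frame_w, frame_h)
```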
  • Rules or instructions for framing clusters and figures are preferably stored in a knowledge-based system 275, illustrated in FIG. 1.
  • The knowledge-based system 275 also includes a learning system for updating such rules, via a playback of prior recordings accompanied by feedback from the user as to the appropriateness of individual actions taken by the controller 170 based on the existing knowledge base and other factors.
  • Also included in the preferred knowledge-based system 275 are instructions and rules regarding how long to remain zoomed in on a speaker, how long to maintain a cluster as the select cluster, and so on. Rules are also provided to control changes of focal points.
  • A rule may be provided that requires that the camera be zoomed-out completely before a change of clusters, or that a fade be introduced for such changes.
  • Rules may be provided that terminate a recording of an image after a predetermined time period, or after a specified period of inactivity, and so on.
  • Recording can be temporarily suspended while the control system pans to locate other clusters, other predominant speakers, and so on.
  • User controls are provided to invoke particular rule sets for different types of events. For example, there may be a rule set that is created for recording a particular type of sports event, another rule set for recording a house party, another rule set for recording a theatrical performance, and so on.
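Such per-event rule sets might be represented as simple lookup tables. The event types, rule names, and values below are illustrative assumptions about what the knowledge-based system 275 could hold:

```python
# Hypothetical per-event rule sets of the kind the knowledge-based system 275
# might hold: timing and transition rules selected via user controls.
# Event names, rule names, and values are illustrative assumptions.
RULE_SETS = {
    "house_party": {
        "max_zoom_seconds": 20,          # how long to stay zoomed in on a speaker
        "max_cluster_seconds": 60,       # how long to keep the focal cluster
        "transition": "zoom_out",        # zoom out fully before changing clusters
        "inactivity_stop_seconds": 300,  # suspend recording after inactivity
    },
    "theatrical_performance": {
        "max_zoom_seconds": 90,
        "max_cluster_seconds": 600,
        "transition": "fade",            # introduce a fade for scene changes
        "inactivity_stop_seconds": None, # never stop on quiet passages
    },
}

def rule(event_type, name):
    """Look up one rule value for the user-selected event type."""
    return RULE_SETS[event_type][name]
```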
  • A preferred embodiment of the controller 170 of FIG. 1 also includes an image tracking capability.
  • The controller 170 can be set to a tracking mode wherein an identified figure is continually tracked, regardless of the occurrence of other figures or sounds during the tracking period. That is, for example, if the user desires to fill the role of narrator for a particular event, the user can place the control system 100 into a tracking mode and identify himself or herself as the tracking target. Thereafter, the user may travel as desired to different locations, with the camera's field of view being automatically adjusted to keep the user within each image, unless otherwise directed.
  • The commands for tracking or otherwise directing the camera can be communicated via keywords or gestures, or via a conventional remote control device.
  • The aforementioned rule sets in a preferred embodiment include options for different tracking modes as well. For example, in a narrator mode, the framing rules may direct the camera to keep the narrator appropriately framed within the field of view.
  • The appropriate orientation requirements from the command block 254, the pan block 256, or the framing block 259 are processed at 250 to determine the required camera and base unit orientation parameters to achieve the desired camera field of view. These parameters are used to orient the camera, at 210.
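One pass of the FIG. 2 flow can be reduced to a minimal hypothetical sketch. Block numbers in the comments refer to the figure; the state fields, pan step, and zoom heuristic are assumptions:

```python
def control_step(state):
    """One pass of the FIG. 2 flow, reduced to a hypothetical sketch.

    `state` summarizes the processed image and audio information; returns
    the next (pan_deg, tilt_deg, zoom) orientation parameters (block 250).
    Field names and step sizes are illustrative assumptions.
    """
    # Voice (244) or gesture (224) commands take precedence -> block 254.
    if state.get("voice_command") or state.get("gesture_command"):
        return state["command_orientation"]
    # No clusters found (226): pan toward detected voices (block 256).
    if not state.get("clusters"):
        direction = 1 if state.get("voices_to_right") else -1
        return (state["pan_deg"] + direction * 10.0, 0.0, 1.0)  # level, wide sweep
    # Otherwise frame the focal cluster (block 259): center on it, zoom to fit.
    cx, cy, extent_deg = state["focal_cluster"]
    return (cx, cy, max(1.0, 30.0 / extent_deg))
```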

Abstract

A portable camera control system uses image and audio content information to control the orientation of a camera. In a preferred embodiment, a modular and portable pan-tilt apparatus is provided that is configured to receive a conventional camcorder. A control system receives the audio and video information from the camcorder, and provides pan and tilt commands to the pan-tilt apparatus to orient the camcorder appropriately. If the camcorder has a controllable zoom, the controller also provides zoom commands to adjust the camcorder's field of view to properly frame the image, based on an analysis of the image and/or audio content. A preferred system allows remote and direct control of the camera as desired, and can be configured to provide auto-tracking based on image content. The camera control system also includes one or more knowledge based systems and learning systems that regulate the control of the camera consistent with skilled camera-operator techniques.

Description

Hands-free home video production camcorder
This invention relates to the field of video systems, and in particular to the control of a camcorder to facilitate hands-free video recording.
Production video recording systems are available that include a remote control of a pan/tilt/zoom camera so that a user/operator is free to control the camera while also appearing in the captured camera images. In a typical production, the user is the "narrator" for the events being recorded, and uses the remote control to reorient the camera to point to desired scenes. In this manner the user is able to create a video recording without the aid of a second person to operate the camera.
U.S. patent 5,432,597, "Remote Controlled Tracking System for Tracking a Remote-Control Unit and Positioning and Operating a Camera and Method", issued 11 July 1995, discloses a remote control camera system that also includes a tracking capability, and is incorporated by reference herein. The referenced patent discloses an infrared emitter that sweeps an area, and an infrared detector associated with the narrator that signals the camera controller when the infrared signal is received. The infrared detector can be contained in the remote control device that the narrator uses to control the camera, or it can be worn by the narrator. When the camera control system is switched to the tracking mode, the camera is continually adjusted so that its field of view is along the line of sight corresponding to the location of the infrared detector. The rate of change of the camera orientation can be limited, and the controls can be configured to maintain a rate and direction of change during interim periods of a loss of tracking signaling. The referenced patent also discloses the ability to store and retrieve camera settings to return to prior fields of view, using remote control commands, or automated sequences. Japan patent JP 09009365 A, "Remote Controller and Image Pickup System", filed 19 June 1995, discloses a remote control system that includes a motion detector within a remote control device and provides camera orientation commands corresponding to the detected motion.
Each of these prior art devices operates based on a premise that a single person, the person holding the remote control device, is a primary subject for recording or the primary director for determining the views to be recorded. This requires that such a person be designated, and requires that person's deliberate attention to direct the recording. Although having a primary narrator or director is commonplace for professional or semi-professional recordings, there are a number of occasions when an unattended recording may be preferred. At a family celebration, for example, the person designated to record the event is not able to freely participate in the celebration, or is not able to properly direct the recording, because of the division of roles between celebration-participant and camera-director. Additionally, the person designated as the camera director may not be skilled in the art, and the resultant recorded images may not adequately capture the event, and may actually be discomforting to view, because of rapid camera movement, changes of scenes, and so on. Furthermore, in certain situations, such as wedding ceremonies, the to-and-fro motions of a camera operator can be distracting.
It is an object of this invention to provide a camera automation system that facilitates an unattended operation of a recording system. It is a further object of this invention to provide a camera control system that controls a camera consistent with recording techniques that are known to skilled camera operators. It is a further object of this invention to provide an unobtrusive means of recording an event while still allowing for changing views and images.
These objects and others are achieved by providing a camera control system that uses image and audio content information to control the orientation of a camera. In a preferred embodiment, a modular and portable pan-tilt apparatus is provided that is configured to receive a conventional camcorder. A control system receives the audio and video information from the camcorder, and provides pan and tilt commands to the pan-tilt apparatus to orient the camera appropriately. If the camcorder has a controllable zoom, the controller also provides zoom commands to adjust the camera's field of view to properly frame the image, based on an analysis of the image and/or audio content. A preferred system allows remote and direct control of the camera as desired, and can be configured to provide auto-tracking based on image content. The camera control system also includes one or more knowledge based systems and learning systems that regulate the control of the camera consistent with skilled camera-operator techniques. The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
Fig. 1 illustrates an example block diagram of a camera control system in accordance with this invention. Fig. 2 illustrates an example flow diagram of a camera control system in accordance with this invention.
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
Fig. 1 illustrates an example block diagram of a camera control system 100 in accordance with this invention. The camera control system 100 includes a base unit 130 that is configured to control the orientation of a camera 110, based on commands received from a field of view controller 170. As in a conventional camera control system, a remote control device 180 allows a user to communicate commands directly to the field of view controller to control the orientation of the camera. In a preferred embodiment, the base unit 130 is configured to allow the camera 110 to rotate vertically (tilt) and horizontally (pan), individually or in combination, so that it can be oriented as desired. These changes of orientation are effected by communicating activation commands to motors that effect the rotation of the camera through the desired plane of rotation. If the camera 110 has a controllable and adjustable zoom, the controller 170 can adjust the focal length of the camera 110 as required, via zoom-in and zoom-out activation commands, to achieve a desired field of view. That is, the field of view of the camera is defined by the line of sight of the camera 110 as adjusted by the pan and tilt controls, and by the magnification provided by the zoom control.
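The pan, tilt, and zoom commands described above would in practice be bounded by the base unit's mechanical range before being issued as activation commands. A minimal sketch, with illustrative limits that are assumptions rather than values from the patent:

```python
def apply_orientation(pan_deg, tilt_deg, zoom,
                      pan_limit=170.0, tilt_limit=30.0, zoom_range=(1.0, 10.0)):
    """Clamp requested pan/tilt/zoom to assumed mechanical limits of the
    base unit 130 before issuing activation commands. Limits illustrative."""
    pan = max(-pan_limit, min(pan_limit, pan_deg))
    tilt = max(-tilt_limit, min(tilt_limit, tilt_deg))
    zoom = max(zoom_range[0], min(zoom_range[1], zoom))
    return (pan, tilt, zoom)

def field_of_view_deg(zoom, wide_fov_deg=60.0):
    """Approximate horizontal field of view at a given zoom factor,
    assuming (illustratively) a 60-degree view at the widest setting."""
    return wide_fov_deg / zoom
```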
In accordance with this invention, the camera control system 100 also includes an image processing system 150 and/or an audio processing system 160. The image processing system 150 and audio processing system 160 facilitate an unattended control of the camera 110 by providing parameters to the field of view controller 170 for controlling the camera 110 based on the information contained in the image or audio information from the camera 110. In a preferred embodiment, this unattended control emulates the operations that a human camera operator might perform, based on the images and sounds received while viewing a scene through the view-finder of the camera. For example, a human camera operator will typically zoom-out to capture group scenes, zoom-in to capture solo speakers, pan to follow select individuals or groups, and so on. A skilled camera operator avoids sudden camera movements, zooms-out to minimize back-and-forth camera movements, and practices other techniques that produce the visually appealing results that distinguish good video recordings from poorer quality recordings. The image processing system 150 analyzes the images from the camera 110 to provide image information parameters to the controller 170. These parameters will depend upon the information requirements of the algorithms used within the controller 170. The controller 170 in a preferred embodiment, for example, includes a figure targeting and tracking system that frames a target figure within the field of view of the camera, via commands to the base unit 130 and the camera 110. To effect such a targeting and tracking function, the controller 170 requires a determination of the location and size of each figure, or each major figure, within the camera's field of view. 
The image processing system 150 in this system identifies each figure in the image from the camera 110, using, for example, flesh tone identification processes and the like, and provides the location and size parameters to the controller 170. As image processing techniques continue to advance, the image processing system 150 is also configured to provide other related image information, such as the estimated "world" coordinates of each figure, the estimated physical size of each figure, the estimated speed of each moving figure, and other estimates, as required by the algorithms of controller 170. Although illustrated as containing separate components for ease of understanding, a preferred embodiment of the camera control system 100 is a portable unit that contains the base unit 130, the image processing system 150, the audio processing system 160, and the field of view controller 170. This portable unit is configured to be a camera "accessory" upon which a conventional camcorder can be mounted. If the camcorder does not contain a stereo audio system, or if the discrimination provided by the camcorder's audio system is found to be insufficient for sound isolation and locating, as discussed below, a modular audio system 120 is provided that can be mounted on the base unit 130 as well. By providing this portable unit, the user can place the unit at any convenient location, and initiate an unattended recording of events within the potential fields of view provided by this location.
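The patent does not specify its flesh tone identification process; a common published rule-of-thumb RGB skin classifier can stand in for it in a sketch of figure location:

```python
def is_flesh_tone(r, g, b):
    """Rule-based RGB skin test (a widely used published heuristic); it
    stands in for the unspecified flesh tone identification above."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def figure_location(pixels):
    """Return the centroid (x, y) of flesh-tone pixels, or None if absent.

    `pixels` is a list of rows, each a list of (r, g, b) tuples."""
    hits = [(x, y) for y, row in enumerate(pixels)
            for x, (r, g, b) in enumerate(row) if is_flesh_tone(r, g, b)]
    if not hits:
        return None
    return (sum(x for x, _ in hits) / len(hits),
            sum(y for _, y in hits) / len(hits))
```

A real system would group skin pixels into per-figure regions (and, as the text notes, swap in other detectors for uniforms, animals, or sails); the centroid here illustrates only the location-and-size reporting role.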
As would be evident to one of ordinary skill in the art, other image information besides an identification of figures can be provided by the image processing system 150. For example, in a system that is configured for unattended home video recordings, the image processing system 150 may be configured to distinguish and report the location of dining tables, seating arrangements, and other common focal points. In like manner, in a system that is configured for unattended recordings of special events, the image processing system 150 may be configured to recognize and report the location of distinctive items, such as football uniforms instead of flesh tones when recording a football game, animal figures in addition to or in lieu of human figures when recording a dog show, sail shapes when recording a yacht race, and so on.
In a preferred embodiment of this invention, the image processing system 150 includes a gesture recognition system that is configured to recognize one or more of a plurality of predefined visual gestures within the field of view of the camera. This gesture recognition may operate in conjunction with the audio processing system, such that the user initiates the gesture recognition process via a vocal keyword, such as "Camera!", and thereafter points in a direction that the camera is to pan, or points to an individual that the camera is to track, or provides a gesture that causes the camera to initiate or terminate some other action. After recognizing a gesture, the image processing system 150 provides information parameters to the controller 170 to effect the appropriate action corresponding to the particular gesture. As would be evident to one of ordinary skill in the art, the image processing system may also be configured to effect the appropriate actions directly, to control any or all of the components in the camera control system 100 in response to the recognized gesture. Alternatively, the field of view controller 170, or other processing device, may provide the gesture recognition function, and the image processing system 150 may merely distinguish and report the location of select body parts, such as hands and arms, to the appropriate device for gesture recognition control.
The audio processing system 160 performs a similar discriminating and locating function as the image processing system 150, based on audio signals received from an audio system 120. The audio system 120 may be integral to the camera 110 or the base unit 130, or it may be a discrete component that is attached to the base unit 130. Preferably, the audio system includes two or more microphones 122, 124, so that the location of a sound source can be determined via differential volume and phase analysis techniques that are common in the art. Also common in the art are sound source discrimination techniques that are used by the processing system 160 to distinguish and locate multiple simultaneous sound sources. The processing system 160 provides associated parameters, such as the received volume level, the "world" coordinates of each source, the speed of moving sound sources, and other audio information parameters that the field of view controller 170 uses to determine an appropriate subsequent field of view for the camera 110.
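The differential phase analysis mentioned above can be illustrated with a minimal time-delay-of-arrival sketch: the inter-microphone lag that maximizes the cross-correlation of the two channels is converted to a bearing. All values here (sampling rate, microphone spacing, the synthetic pulse) are invented for illustration, and the sign convention treating a positive bearing as toward microphone 122 is an assumption, not part of the disclosure.

```python
import math
import random

def estimate_delay(left, right, max_lag):
    """Lag (in samples) at which the right channel best matches the left;
    a positive lag means the sound reached the left microphone first."""
    n = len(left)
    def corr(lag):
        return sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
    return max(range(-max_lag, max_lag + 1), key=corr)

def bearing_degrees(delay_samples, fs=8000, spacing_m=0.2, c=343.0):
    """Bearing of the source for a two-microphone array (invented geometry)."""
    s = max(-1.0, min(1.0, delay_samples * c / (fs * spacing_m)))
    return math.degrees(math.asin(s))

# Synthetic test: the right channel hears the same pulse 3 samples later.
random.seed(1)
pulse = [random.uniform(-1, 1) for _ in range(64)]
left = pulse + [0.0] * 8
right = [0.0] * 3 + pulse + [0.0] * 5
d = estimate_delay(left, right, max_lag=6)
print(d, round(bearing_degrees(d), 1))  # lag of 3 samples, bearing near 40 degrees
```

A real system 160 would run this continuously over short audio windows and report the resulting bearings as audio information parameters to the controller 170.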
In a preferred embodiment of this invention, the audio processing system 160 includes a voice recognition system that is configured to recognize one or more of a plurality of predefined speech patterns within the audio signals. As mentioned above, the audio processing system 160 may be configured to recognize an initiating keyword, such as "Camera!", and, in response, provide a signal to the image processing system 150 to initiate the aforementioned gesture recognition process. At the same time, this keyword may initiate the recognition of other keywords, such as "left", "right", "zoom-in", "zoom-out", "track", "terminate", and so on. This recognition process is automatically terminated after a predetermined interval containing no recognized keywords. This controlled initiation and termination process is employed so that the control system 100 is not inadvertently "controlled" by the random occurrence of keywords in received speech signals. The speech recognition process may be partitioned between the audio processing system and the field of view controller, so that conventional speech processing devices can be employed in the audio processing system 160, and keyword recognition processes specific to the control of the camera control system 100 can be employed in the controller 170. In this embodiment, the audio processing system provides a "transcript" of received speech signals continuously to the controller 170, and the controller 170 initiates and controls the keyword recognition process upon receipt of a keyword phrase within the transcript. The speech recognition system may be used in conjunction with or independent of the gesture recognition system to further facilitate the processing of user directives for the control of the camera system 100.

The field of view controller 170 uses the image information parameters from the image processing system 150 and the audio information parameters from the audio processing system 160 to determine whether a change of orientation or perspective is appropriate, based on these information parameters. FIG. 2 illustrates an example flow diagram of a camera control system in accordance with this invention. This flow diagram is presented for illustrative purposes, and is not intended to be an exhaustive representation of the features that can be incorporated by one of ordinary skill in the art in view of this disclosure. Illustrated is a continuous process that includes two parallel processes: an image information process 220-228, and an audio information process 240-246. The process starts, at 210, with an orientation of the camera that provides images and sounds that are converted to image information and audio information, at 220 and 240, respectively. The image information is processed to identify individual figures and clusters of figures, at 222.
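The keyword-gated control described above, in which a vocal keyword opens a command window that expires after a quiet interval, can be sketched as a small state machine. The command vocabulary and the timeout value are invented for illustration; the actual predefined speech patterns and the predetermined interval are implementation choices of the system 160 and controller 170.

```python
# Hypothetical sketch of keyword gating: command words are ignored until the
# activation word "camera" is heard, and the listening window closes after a
# quiet interval with no recognized keyword. Vocabulary and timeout invented.

COMMANDS = {"left", "right", "zoom-in", "zoom-out", "track", "terminate"}
TIMEOUT = 5.0  # seconds without a recognized keyword before re-arming

def accepted_commands(transcript):
    """transcript: list of (time_seconds, word). Returns commands acted upon."""
    accepted = []
    listening = False
    last_keyword_time = None
    for t, word in transcript:
        if listening and t - last_keyword_time > TIMEOUT:
            listening = False                # window expired
        if word == "camera":
            listening, last_keyword_time = True, t
        elif listening and word in COMMANDS:
            accepted.append(word)
            last_keyword_time = t            # each keyword extends the window
    return accepted

words = [(0.0, "pass"), (1.0, "left"),       # ignored: window not yet open
         (2.0, "camera"), (3.0, "left"),     # accepted
         (4.0, "zoom-in"),                   # accepted
         (12.0, "right")]                    # ignored: window expired
print(accepted_commands(words))  # ['left', 'zoom-in']
```

This gating is what prevents the control system 100 from being inadvertently "controlled" by keywords that occur in ordinary conversation.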
For example, the image processing system 150 of FIG. 1 provides parameters related to each figure in the current image, and the controller 170 processes these parameters to identify key figures, based on the location of such figures relative to other figures, and identifies clusters of figures based on their spatial relationship to each other. Although not illustrated in FIG. 2, the key figure and clustering process 222 may use audio information and speaker identifiers, discussed below, to facilitate the identification process. As discussed above, the audio processing process may also provide a keyword recognition process that initiates a gesture recognition process in the image processing system. If a command gesture is detected, at 224, the appropriate command is executed, at 254, either directly or via a determination of new orientation parameters, at 250. At the same time, the audio information is processed to identify a primary speaker, based on the audio information from one speaker relative to other speakers, and to identify other speakers or clusters of speakers, at 242. If a voice command, discussed above, is given, at 244, the command is executed, at 254, either directly or via a determination of new orientation parameters, at 250. For ease of reference, the term orientation as used herein includes a control of pan, tilt, or zoom settings to effect a desired field of view. If a command gesture is not given, at 224, the image information process continues at 226. If no clusters of figures have been identified in the image, at 226, a pan is initiated, or continued, to find an image that contains a figure, at 256. For the purposes of this disclosure, a cluster includes both single and multiple figures within the camera field of view. The pan process at 256 uses the processed audio information and speaker identification to determine a preferred direction of panning.
That is, for example, if the audio information indicates that voices are detected at an area to the right of the camera, the pan at 256 is initiated to turn the camera to the right. Though not illustrated, if the tilt of the camera causes the pan at 256 to orient the camera inappropriately for figure detection, the tilt is automatically adjusted to provide a pan operation that sweeps the area with a substantially level field of view. If multiple clusters of figures are located, at 228, the image and audio information is used to determine which cluster to choose as a focal cluster. For example, if one cluster is particularly active, or particularly loud, this cluster will preferably be selected as the focal cluster. Using techniques common in the art, such as weighted sampling, the likelihood of each cluster being selected can be made to be dependent on a variety of factors, such as the audio volume, the activity level, the time since this cluster was last selected, whether the figures are all oriented toward a central point, and so on. If a single cluster is located, at 228, cluster selection is not required.
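The weighted-sampling cluster selection described above might look like the following sketch. The weighting factors and the example clusters are invented for illustration; the disclosure only requires that the selection probability depend on factors such as audio volume, activity, and time since last selection.

```python
import random

# Hedged sketch of weighted-sampling focal-cluster choice: each candidate
# cluster gets a score from factors the text mentions, and the focal cluster
# is drawn with probability proportional to that score. Weights are invented.

def cluster_weight(c):
    return 1.0 * c["volume"] + 0.5 * c["activity"] + 0.1 * c["idle_seconds"]

def pick_focal_cluster(clusters, rng=random):
    weights = [cluster_weight(c) for c in clusters]
    return rng.choices(clusters, weights=weights, k=1)[0]

clusters = [
    {"name": "dinner table", "volume": 0.9, "activity": 0.8, "idle_seconds": 30},
    {"name": "sofa",         "volume": 0.2, "activity": 0.1, "idle_seconds": 5},
]
rng = random.Random(0)
picks = [pick_focal_cluster(clusters, rng)["name"] for _ in range(1000)]
print(picks.count("dinner table") > picks.count("sofa"))  # True: louder, busier
```

Because the draw is probabilistic rather than winner-take-all, quieter clusters still receive occasional attention, which mimics how a human operator periodically revisits other parts of a scene.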
At 259, the select cluster, or the single cluster, is "framed", using the image and audio information associated with this cluster. For example, initially the entire cluster will be centered within the camera's field of view. Thereafter, if the sounds are emanating from a particular point within the cluster, the camera orientation is adjusted toward that sound and the field of view is narrowed to magnify the region surrounding the predominant sound source. In a preferred embodiment of the invention, directional microphones are also employed that allow for a widening and narrowing of the field of audio reception in concert with the zoom settings. If there is a single sound source within the cluster, the figure corresponding to the sound source is framed, using established image framing techniques. For example, a solo speaker is preferably framed so that the image contains a part of the upper torso, the entire head, including any headdress, plus a space above the head. If the sounds shift from figure to figure, the field of view is enlarged and centered so as to include all of the speaking figures.
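The framing rule described above, centering the figures while leaving space above the head, can be sketched as a small geometric computation. The margin fractions and coordinate convention are invented for illustration; in practice such values would come from the framing rules stored in the knowledge-based system.

```python
# Sketch of the framing step at 259: given the bounding box of the figures to
# be framed (normalized coordinates, y growing downward), centre the field of
# view on the box and widen it so the frame keeps headroom above the figures.
# The margin fractions are invented for illustration only.

HEADROOM = 0.25   # extra space above the figures, as a fraction of box height
SIDE_PAD = 0.10   # horizontal padding on each side, as a fraction of box width

def frame(box):
    """box = (x_min, y_min, x_max, y_max).
    Returns (centre_x, centre_y, view_width, view_height)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    view_w = w * (1 + 2 * SIDE_PAD)
    view_h = h * (1 + HEADROOM)              # all extra height goes above the head
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2 - (HEADROOM * h) / 2  # shift the view up for headroom
    return cx, cy, view_w, view_h

print(tuple(round(v, 3) for v in frame((0.4, 0.3, 0.6, 0.9))))
# (0.5, 0.525, 0.24, 0.75): centred horizontally, raised for headroom
```

The resulting centre and view size would then be translated into pan, tilt, and zoom targets by the orientation-determination step at 250.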
Rules or instructions for framing clusters and figures are preferably stored in a knowledge-based system 175, illustrated in FIG. 1. In a preferred embodiment, the knowledge-based system 175 also includes a learning system for updating such rules, via a playback of prior recordings accompanied by feedback from the user as to the appropriateness of individual actions taken by the controller 170 based on the existing knowledge base and other factors. Also included in the preferred knowledge-based system 175 are instructions and rules regarding how long to remain zoomed in on a speaker, how long to maintain a cluster as the select cluster, and so on. Rules are also provided to control changes of focal points. For example, a rule may be provided that requires that the camera be zoomed-out completely before a change of clusters, or that a fade be introduced for such changes. In like manner, rules may be provided that terminate a recording of an image after a predetermined time period, or after a specified period of inactivity, and so on. Similarly, recording can be temporarily suspended while the control system pans to locate other clusters, other predominant speakers, and so on. In a preferred embodiment, user controls are provided to invoke particular rule sets for different types of events. For example, there may be a rule set that is created for recording a particular type of sports event, another rule set for recording a house party, another rule set for recording a theatrical performance, and so on. As mentioned above, a preferred embodiment of the controller 170 of FIG. 1 also includes an image tracking capability.
In addition to maintaining a predominant speaker in the central portion of the camera's field of view based on the image and audio information, the controller 170 can be set to a tracking mode wherein an identified figure is continually tracked, regardless of the occurrence of other figures or sounds during the tracking period. That is, for example, if the user desires to fill the role of narrator for a particular event, the user can place the control system 100 into a tracking mode and identify himself or herself as the tracking target. Thereafter, the user may travel as desired to different locations, with the camera's field of view being automatically adjusted to keep the user within each image, unless otherwise directed. As discussed above, the commands for tracking or otherwise directing the camera can be communicated via keywords or gestures, or via a conventional remote control device. The aforementioned rule sets in a preferred embodiment include options for different tracking modes as well. For example, in a narrator mode, the framing rules may direct the camera orientation so as to place the narrator off to one side or the other of the captured image frames. In other tracking modes, the rules may direct the camera orientation so as to keep the tracked figure centrally located within the captured image frames.
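The tracking-mode placement rules described above can be sketched as a pan correction toward a mode-dependent target position within the frame. The mode names and offset values are invented for illustration; the disclosure only requires that different tracking modes place the tracked figure at different positions in the captured frames.

```python
# Hedged sketch of mode-dependent figure placement: in a "narrator" mode the
# tracked figure is held at a horizontal offset from the frame centre, while
# in a "centered" mode it is held at the centre. Offsets are invented.

PLACEMENT = {"narrator": 0.25, "centered": 0.0}  # target offset, in frame widths

def pan_correction(figure_x, frame_centre_x, frame_width, mode):
    """Signed pan adjustment that moves the figure to its target position;
    positive means pan right, negative means pan left."""
    target_x = frame_centre_x + PLACEMENT[mode] * frame_width
    return figure_x - target_x

print(pan_correction(0.5, 0.5, 1.0, "centered"))  # 0.0: already in place
print(pan_correction(0.5, 0.5, 1.0, "narrator"))  # -0.25: pan left to offset figure
```

Feeding such corrections to the orientation-determination step at 250 keeps the tracked figure at the position the active rule set prescribes.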
The appropriate orientation requirements from the command block 254, the pan block 256, or the framing block 259 are processed at 250 to determine the required camera and base unit orientation parameters to achieve the desired camera field of view. These parameters are used to orient the camera, at 210.
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within the spirit and scope of the following claims.

CLAIMS:
1. A portable camera control system (100) for controlling a field of view of a camera (110) comprising: at least one of: an image processing system (150) that is configured to receive and process images from the camera (110) corresponding to the field of view of the camera (110) and to provide therefrom image information parameters, and an audio processing system (160) that is configured to receive and process audio signals corresponding to the field of view of the camera (110) and to provide therefrom audio information parameters, a field of view controller (170), operably coupled to the at least one of the image processing system (150) and the audio processing system (160), that is configured to effect a change of the field of view of the camera (110) based on at least one of: the image information parameters and the audio information parameters.
2. The portable camera control system (100) of claim 1, further including a base unit (130) that is configured to accept a fixed attachment of the camera (110), the base unit (130) including at least one orientation motor that is operably connected to the field of view controller (170) to effect the change of the field of view of the camera (110).
3. The portable camera control system (100) of claim 1, further including a remote control device (180) that provides remote orientation commands, and wherein the field of view controller (170) is further configured to change the field of view of the camera (110) based on the remote orientation commands.
4. The portable camera control system (100) of claim 1, wherein the image processing system (150) includes a gesture-recognition system that is configured to provide one or more of the image information parameters based on a recognition of one or more of a plurality of predefined visual gestures within the field of view of the camera (110).
5. The portable camera control system (100) of claim 1, wherein the audio processing system (160) includes a voice-recognition system that is configured to provide one or more of the audio information parameters based on a recognition of one or more of a plurality of predefined speech patterns within the audio signals.
6. The portable camera control system (100) of claim 1, wherein the field of view controller (170) includes at least one of: an expert system, a knowledge-based system (175), a rules-based system, and a learning system that is configured to facilitate a determination of the change of the field of view.
7. The portable camera control system (100) of claim 6, wherein the at least one expert system, knowledge-based system (175), rules-based system, and learning system contain a plurality of instruction sets that each facilitate a determination of the change of the field of view based on the image information parameters and the audio information parameters, and the field of view controller (170) is configured to allow a selection of an instruction set of the plurality of instruction sets for use in determining the change of the field of view.
8. A method of controlling a field of view of a portable camera (110), comprising at least one of: receiving and processing images (220) from the portable camera
(110) corresponding to the field of view of the portable camera (110) and to provide therefrom image information parameters, and receiving and processing audio signals (240) corresponding to the field of view of the portable camera (110) and to provide therefrom audio information parameters, and effecting (250) a change of the field of view of the portable camera (110) based on at least one of: the image information parameters and the audio information parameters.
9. The method of claim 8, wherein effecting (250) the change of the field of view of the portable camera (110) is via a change of at least one of a pan orientation, a tilt orientation, and a zoom setting.
10. The method of claim 8, further including receiving remote orientation commands, and wherein effecting the change of the field of view of the portable camera (110) is further based on the remote orientation commands.
11. The method of claim 8, further including processing (224, 244) at least one of the images and the audio signals to provide a control of the portable camera (110) based on a recognition of at least one of: one or more of a plurality of predefined visual gestures within the field of view of the portable camera (110), and one or more of a plurality of predefined speech patterns within the audio signals.
12. The method of claim 8, wherein effecting (250) the change of the field of view of the portable camera (110) includes a use of at least one of: an expert system, a knowledge-based system (175), a rules-based system, and a learning system that is configured to facilitate a determination of the change of the field of view.
13. The method of claim 12, wherein the at least one expert system, knowledge-based system (175), rules-based system, and learning system contain a plurality of instruction sets that each facilitate a determination of the change of the field of view based on the image information parameters and the audio information parameters, and the method further includes selecting an instruction set of the plurality of instruction sets for use in determining the change of the field of view.
14. A portable base unit (130) that is configured to receive a hand-held camcorder, the portable base unit (130) comprising: at least one motor that is configured to adjust an orientation of the camcorder in at least one plane of rotation based on a receipt of corresponding motor activation signals, at least one of: an image processing system (150) that is configured to receive and process images from the hand-held camcorder corresponding to the field of view of the hand-held camcorder and to provide therefrom image information parameters, and an audio processing system (160) that is configured to receive and process audio signals corresponding to the field of view of the hand-held camcorder and to provide therefrom audio information parameters, a field of view controller (170), operably coupled to the at least one of the image processing system (150) and the audio processing system (160), that is configured to effect a change of the field of view of the hand-held camcorder via the motor activation signals, based on at least one of: the image information parameters and the audio information parameters.
15. The portable base unit (130) of claim 14, wherein the field of view controller (170) effects the motor activation signals based on a determination of a desired change of at least one of: a pan orientation, a tilt orientation, and a zoom setting.
16. The portable base unit (130) of claim 14, further including a remote control device (180) that provides remote orientation commands, and wherein the field of view controller (170) is further configured to change the field of view of the hand-held camcorder based on the remote orientation commands.
17. The portable base unit (130) of claim 14, wherein the image processing system (150) includes a gesture-recognition system that is configured to provide one or more of the image information parameters based on a recognition of one or more of a plurality of predefined visual gestures within the field of view of the hand-held camcorder.
18. The portable base unit (130) of claim 17, wherein the audio processing system (160) includes a voice-recognition system that is configured to provide one or more of the audio information parameters based on a recognition of one or more of a plurality of predefined speech patterns within the audio signals.
19. The portable base unit (130) of claim 17, wherein the field of view controller (170) includes at least one of: an expert system, a knowledge-based system (175), a rules-based system, and a learning system that is configured to facilitate a determination of the change of the field of view.
20. The portable base unit (130) of claim 19, wherein the at least one expert system, knowledge-based system (175), rules-based system, and learning system contain a plurality of instruction sets that each facilitate a determination of the change of the field of view based on the image information parameters and the audio information parameters, and the field of view controller (170) is configured to allow a selection of an instruction set of the plurality of instruction sets for use in determining the change of the field of view.
PCT/EP2001/002758 2000-03-21 2001-03-12 Hands-free home video production camcorder WO2001072034A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2001570070A JP2003528548A (en) 2000-03-21 2001-03-12 Hand-free home video production camcorder
EP01913866A EP1269746A1 (en) 2000-03-21 2001-03-12 Hands-free home video production camcorder
KR1020017014729A KR20020008191A (en) 2000-03-21 2001-03-12 Hands-free home video production camcorder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US53282000A 2000-03-21 2000-03-21
US09/532,820 2000-03-21

Publications (1)

Publication Number Publication Date
WO2001072034A1 true WO2001072034A1 (en) 2001-09-27

Family

ID=24123302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/002758 WO2001072034A1 (en) 2000-03-21 2001-03-12 Hands-free home video production camcorder

Country Status (5)

Country Link
EP (1) EP1269746A1 (en)
JP (1) JP2003528548A (en)
KR (1) KR20020008191A (en)
CN (1) CN1381131A (en)
WO (1) WO2001072034A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101442149B1 (en) 2008-01-17 2014-09-23 삼성전자 주식회사 Apparatus and method for acquiring image based on expertise
JP5495612B2 (en) * 2008-04-23 2014-05-21 キヤノン株式会社 Camera control apparatus and method
CN102158680A (en) * 2010-02-11 2011-08-17 北京华旗随身数码股份有限公司 Telephone conference terminal with visualization function
CN103248633B (en) * 2012-02-01 2017-05-24 深圳中兴力维技术有限公司 PTZ control method and system
JP2013196047A (en) * 2012-03-15 2013-09-30 Omron Corp Gesture input apparatus, control program, computer-readable recording medium, electronic device, gesture input system, and control method of gesture input apparatus
CN102799191B (en) * 2012-08-07 2016-07-13 通号通信信息集团有限公司 Cloud platform control method and system based on action recognition technology
CN104243894A (en) * 2013-06-09 2014-12-24 中国科学院声学研究所 Audio and video fused monitoring method
CN106992004B (en) * 2017-03-06 2020-06-26 华为技术有限公司 Method and terminal for adjusting video
CN109194918B (en) * 2018-09-17 2022-04-19 东莞市丰展电子科技有限公司 Shooting system based on mobile carrier

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5432597A (en) * 1990-05-31 1995-07-11 Parkervision, Inc. Remote controlled tracking system for tracking a remote-control unit and positioning and operating a camera and method
EP0689356A2 (en) * 1994-06-20 1995-12-27 AT&T Corp. Voice-following video system
WO1999004557A1 (en) * 1997-07-18 1999-01-28 Interval Research Corporation Visual user interface for controlling the interaction of a device with a spatial region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MASAAKI FUKUMOTO ET AL: "FINGER-POINTER: POINTING INTERFACE BY IMAGE PROCESSING", COMPUTERS AND GRAPHICS, PERGAMON PRESS LTD. OXFORD, GB, vol. 18, no. 5, 1 September 1994 (1994-09-01), pages 633 - 642, XP000546603, ISSN: 0097-8493 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2407635A (en) * 2003-10-31 2005-05-04 Hewlett Packard Development Co Control of camera field of view with user hand gestures recognition
GB2407635B (en) * 2003-10-31 2006-07-12 Hewlett Packard Development Co Improvements in and relating to camera control
US7483057B2 (en) 2003-10-31 2009-01-27 Hewlett-Packard Development Company, L.P. Camera control
WO2008107733A1 (en) * 2007-03-07 2008-09-12 Sony Ericsson Mobile Communications Ab Method and system for a self timer function for a camera and camera equipped mobile radio terminal
US7990421B2 (en) * 2008-07-18 2011-08-02 Sony Ericsson Mobile Communications Ab Arrangement and method relating to an image recording device
US8350931B2 (en) 2008-07-18 2013-01-08 Sony Ericsson Mobile Communications Ab Arrangement and method relating to an image recording device

Also Published As

Publication number Publication date
CN1381131A (en) 2002-11-20
KR20020008191A (en) 2002-01-29
JP2003528548A (en) 2003-09-24
EP1269746A1 (en) 2003-01-02


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWE Wipo information: entry into national phase

Ref document number: 2001913866

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 570070

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020017014729

Country of ref document: KR

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 018012825

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 1020017014729

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2001913866

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001913866

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1020017014729

Country of ref document: KR