METHOD AND APPARATUS FOR VISION-BASED COUPLING BETWEEN POINTER ACTIONS AND PROJECTED IMAGES
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computerized mapping of pointers and, in particular, to the computerized mapping of a user's pointer between a computer display and a computer virtual environment.
2. Description of the Prior Art
Traditional methods of controlling computer-based presentations (such as Microsoft® PowerPoint® talks) require the user to send commands to the computer using either the keyboard or the mouse. This can be awkward because it diverts the attention of the presenter and the audience from the projected presentation (screen). A more natural mode of controlling such presentations would be by performing operations directly on the presentation area. Existing systems for accepting user input from the presentation surface include expensive electronic whiteboards and pointing devices such as remote mice. Electronic whiteboards are not portable and either require laborious manual calibration or force the use of specially coded markers. Remote mice lack the transparency and immediacy of pointing actions and suffer from other problems. For instance, infrared mice require the user to point the mouse at a small target, radio mice are subject to interference, and mice with long cables are unwieldy.
It is, therefore, an object of the present invention to allow the user to directly control the presentation in a more natural manner, either at a distance from the screen (using a laser pointer or shadow gestures) or at a distance from the computer (using a telescoping pointer). It is another object of the present invention to create an inexpensive and portable presentation control system.
SUMMARY OF THE INVENTION
Accordingly, we have developed a method and an apparatus to overcome the problems with the prior art systems.
The present invention includes an inexpensive low-resolution camera which is connected to the user's notebook computer and which observes the presentation area. Computer-vision algorithms determine where the user is pointing on the
presentation surface, and this provides a correspondence between the user's actions (as seen by the camera) and active regions in the image being displayed on the screen.
The user may control the presentation using a variety of methods such as laser-pointer motions on the projected image, shadows cast by the user's fingers or a traditional telescoping pointer (optionally augmented by a lighted or reflective tip). The user is not, however, restricted to this mode. Control can be augmented using standard keyboard/mouse events, speech recognition or vision-based facial-expression analysis. The method of the present invention includes determining the mapping between a point in the projected image frame and the corresponding point in the camera image frame; given an image of the scene as taken by the camera, identifying the point(s), if any, that were targeted by the user (in the case of a laser pointer, this corresponds to the bright red dot in the image; for a shadow of the user's finger, this is slightly more complicated); given the mapping computed above, mapping the user-selected point from the camera image frame to the source image (pre-projection) frame; and performing a programmed action in response to the location and characteristics of the user's pointer action (for instance, drawing a dot or moving to the next slide of a presentation). Further, the present invention includes an apparatus that is capable of performing the above-described method.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of one embodiment of an apparatus according to the present invention; and
Fig. 2 is a flow diagram illustrating the general method according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention is directed to mapping a user's pointer between a computer display and a computer virtual environment. According to the present invention, there is provided a method and apparatus for the computerized mapping of pointers in a projection/camera system.
Referring to Fig. 1, the apparatus includes a pointer 10, a computer 12 with a source image frame, a projection surface 14 with a projected image frame, and a camera 16 with a camera image frame and oriented to observe the projection surface 14.
Further, the present invention includes a projector 18 which projects an image generated by the computer 12 onto the projection surface 14. The pointer 10, in use, creates a pointer targeted point 20 on the projection surface 14 which is detected by the camera 16. It is also envisioned that the display of the computer 12 can be used as the projection surface 14, obviating the need for a separate projector 18.
As shown in Fig. 2, mapping a user's pointer 10 between a computer 12 with a source image frame and projection surface 14 with a projected image frame using a camera 16 with a camera image frame is achieved through the steps of determining the mapping between a point in the projected image frame and the corresponding point in the camera image frame (projector-camera system calibration step 22), identifying a pointer targeted point within the camera image frame (feature extraction step 24), determining a mapping between the pointer targeted point from the camera image frame to the source image frame (coordinate mapping step 26), and performing a programmed action in response to the location of the pointer targeted point within the source image frame (program action step 28).
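By way of illustration, these four steps can be read as a per-frame processing loop. The following sketch is not part of the disclosed apparatus; the stage functions are supplied as callables, and their names (calibrate, extract_pointer, map_point, act) are assumptions introduced only to show how the steps compose.

```python
def run_pipeline(frames, calibrate, extract_pointer, map_point, act):
    """Glue code for steps 22-28; each stage is passed in as a callable."""
    H = calibrate()                      # step 22: projector-camera calibration
    for frame in frames:                 # stream of camera images
        target = extract_pointer(frame)  # step 24: feature extraction
        if target is not None:           # a pointer targeted point was found
            act(map_point(target, H))    # steps 26 and 28: map, then respond
```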
To clarify the following discussion, the following frames of reference are defined. The "source image frame" refers to coordinates in the internal representation of the screen (typically pixels) in the computer 12. The "projected image frame" refers to coordinates on the projection surface 14. Finally, the "camera image frame" refers to coordinates in the image captured by the camera 16 (of which the projected image frame is a sub-region).
The first step 22 of the present invention can be referred to as the projector-camera system calibration step. Since the relative positions of the camera 16, projection surface 14 and projector 18 may vary, it is important to determine a mapping between a given point in the camera image frame and the corresponding point in the projected image frame, which is the projection surface 14 containing the user's presentation (e.g., given the observed position of the pointer 10, determine whether it falls inside the projected image of a button). Calibration of such a system may be manual, semi-automatic or fully automatic. For manual calibration, the camera 16 and projector 18 are mounted in a known position and orientation, and the internal optical parameters of the components are specified. For semi-automatic calibration, the
computer projects a presentation slide depicting several calibration regions, and the user manually identifies the locations of these regions in the camera image. From these correspondences, the system automatically computes the above mapping. Finally, for automatic calibration, calibration can be performed without any user interaction. It is also envisioned that if the apparatus is built such that the camera 16 and projector 18 optics are shared (i.e., the lens is simultaneously used to project and acquire images), the above mapping is simply a known similarity mapping regardless of the position and orientation of the apparatus with respect to the projection surface 14. Such an apparatus requires no calibration.
The next step 24 of the present invention can be referred to as the feature extraction step. Feature extraction involves identifying the position of the pointer targeted point 20 in the camera image frame utilizing the mapping determined in step 22. Different pointer 10 types may require different image processing approaches. As an example, described herein is the process used for detecting laser dots and telescoping pointers with illuminated or highly reflective tips according to the present invention. In a dark environment, such as a presentation theater, the pointer 10 creates a saturated region of pixels in the camera image. This spot may occupy several pixels in the image and can be extracted by appropriately thresholding the image (e.g., for a typical laser pointer, the red channel of the image). The centroid of these saturated pixels provides an estimate of the pointer 10 location with potentially sub-pixel accuracy.
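To make the two steps concrete, the calibration of step 22 and the feature extraction of step 24 can each be sketched in a few lines (assuming Python with OpenCV and NumPy; all coordinate and threshold values below are invented for illustration, not taken from the disclosure). For the semi-automatic calibration, the mapping can be fit as a planar homography from four or more user-identified correspondences:

```python
import numpy as np
import cv2

# Centers of the projected calibration regions in the source image (pixels),
# paired with the locations the user identified in the camera image.
source_pts = np.float32([[0, 0], [1023, 0], [1023, 767], [0, 767]])
camera_pts = np.float32([[112, 87], [534, 95], [528, 402], [105, 390]])

# H maps camera image frame coordinates to source image frame coordinates.
H, _ = cv2.findHomography(camera_pts, source_pts)
```

The thresholding and centroid computation of step 24 can be sketched in the same vein; the threshold of 240 on the red channel is an assumed value that would be tuned for the camera and lighting:

```python
def find_laser_dots(frame_bgr, red_threshold=240):
    """Return the centroid of each saturated spot in a camera image."""
    red = frame_bgr[:, :, 2]                  # red channel of an 8-bit BGR frame
    _, binary = cv2.threshold(red, red_threshold, 255, cv2.THRESH_BINARY)
    # One connected component per pointer; centroids are sub-pixel estimates.
    _, _, _, centroids = cv2.connectedComponentsWithStats(binary)
    return [tuple(c) for c in centroids[1:]]  # skip label 0, the background
```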
If multiple laser pointers are to be simultaneously employed, the centroid of each connected component in the thresholded image can be separately computed. When the pointer 10 cannot be located using simple color or region-based techniques, the feature extraction phase can employ methods such as template matching (searching the image for a known shape) or image differencing (comparing the current image with a prior image) to locate the pointer. In many cases, the present invention can exploit the fact that displayed scene characteristics can be optimized for feature extraction.
The next step 26 can be referred to as the coordinate mapping step. Once the pointer targeted point 20 has been located in the camera image, its position must be converted from the camera image frame to the source image frame, or the computer's internal coordinates, using the results of the previous steps. The mapping between points
in the camera image frame and points in the source image (pre-projection) frame is represented as a projective transform that converts 2-D points (represented as 3 x 1 vectors in homogeneous coordinates) from one coordinate frame to the other. Thus, a matrix multiplication enables the system to determine the point in the source image corresponding to the observed location of the pointer targeted point 20 on the projection surface 14.
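In code, this mapping is a single matrix-vector product followed by division by the homogeneous scale factor. A minimal sketch, reusing the 3 x 3 homography H fit in the calibration sketch above (NumPy assumed):

```python
import numpy as np

def camera_to_source(point_xy, H):
    """Map a 2-D camera image frame point into the source image frame."""
    x, y = point_xy
    p = H @ np.array([x, y, 1.0])      # 3 x 1 vector in homogeneous coordinates
    return (p[0] / p[2], p[1] / p[2])  # divide out the homogeneous scale
```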
Finally, the program action step 28 performs a programmed action of the computer 12 in response to the physical location and/or characteristics of the pointer targeted point 20. The present invention provides a general method for specifying pointer 10 actions in projected images. Through the techniques and steps described above, the present invention can track the visible pointer targeted point 20 (e.g., laser pointer dot, physical pointing device or a shadow) on the projection surface 14, enabling the user to control a virtual mouse cursor on the display. Event activation can be triggered by one of three general strategies:
1. change in visible pointer 10 state (color changes, e.g., red versus green light sources, or a change in pointer shape, flashing, etc.);
2. specific motion patterns (pausing in a specific region of the screen, distinctive gestures, etc.; a dwell-based sketch of this strategy follows the list); and
3. augmented with other modes of communication (voice activation or button presses communicated through alternate devices, e.g., radio or infrared, etc.). These strategies permit a user to easily emulate established user-interface paradigms such as those employed for computer mice.
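As an illustration of the second strategy, "pausing in a specific region" can be detected by a dwell test over successive mapped pointer positions. The sketch below is an assumed example, not the disclosed method; the 10-pixel radius and one-second dwell time are invented defaults.

```python
import time

class DwellDetector:
    """Report an activation when the pointer pauses within a small radius."""

    def __init__(self, radius_px=10.0, dwell_s=1.0):
        self.radius_px = radius_px  # how far the point may wander while "paused"
        self.dwell_s = dwell_s      # how long the pause must last to fire
        self.anchor = None          # position where the current pause began
        self.since = 0.0            # time when the current pause began

    def update(self, point, now=None):
        now = time.monotonic() if now is None else now
        if self.anchor is None or self._dist(point, self.anchor) > self.radius_px:
            self.anchor, self.since = point, now  # pointer moved: restart timer
            return False
        if now - self.since >= self.dwell_s:
            self.since = now                      # fire once, then re-arm
            return True
        return False

    @staticmethod
    def _dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
```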
The pointer 10 events enable a variety of applications. The user may deliver a presentation (e.g., Microsoft® PowerPoint® slides) without physically interacting with his/her computer 12. By pressing virtual buttons (using either a laser pointer, pointing stick or finger), the user could change slides, activate animations or follow hyperlinks while focusing on the presentation area and audience. Additionally, the user can use the presentation area as a virtual sketchpad or drafting table and draw shapes using a laser pointer or his/her finger. In the context of a presentation, the user would be able to make freehand annotations or easily highlight regions on a slide. More structured
interactions, such as dragging and dropping virtual objects, are also possible. Further, the present invention also works with standard desktop monitors and notebook screens. By providing the computer 12 with a camera 16 that looks at its own display device, virtual touch-screens that require no hardware modification of the display device are enabled. It is envisioned that multiple projection surfaces 14 may be used, with each projection surface 14 having a camera 16 aimed at it. Additionally, the projection surface 14 may be a presentation screen, a white wall or any other suitable projection surface. Further, the computer 12 display may be a CRT monitor or a flat-panel display having the camera 16 aimed directly at the computer 12 display. In this manner, the computer 12 display becomes the projection surface 14.
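Once pointer positions have been mapped into the source image frame, the virtual buttons described above reduce to a simple rectangle hit test. A minimal sketch; the button layout and the "next_slide" action name are hypothetical:

```python
def hit_test(point_xy, buttons):
    """Return the action bound to the virtual button under the pointer, if any."""
    x, y = point_xy
    for (left, top, right, bottom), action in buttons:
        if left <= x <= right and top <= y <= bottom:
            return action
    return None

# Hypothetical layout: a "next slide" button in the lower-right corner
# of a 1024 x 768 source image.
buttons = [((924, 700, 1023, 767), "next_slide")]
```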
The user may use a pointer 10 that creates light, either within or outside the human visual spectrum, which may include laser pointers, rods, telescoping pointers with illuminated tips, pointers with reflective tips or a rod with a highly visible tip. Absent such a pointer 10, the user may create the pointer targeted point 20 by interposing an object between the camera 16 and projection surface 14. Similarly, the pointer 10 actions are created from shadows formed by this opaque object between the projector 18 and the projection surface 14.
When combined with other technologies, such as speech recognition, the present invention could enable entry of text while the user is away from the keyboard. Such an extension may be suitable for classroom use and in collaborative environments where several users could share a large work space. Overall, the present invention is a computer-controlled, vision-based mapping system between pointer 10 actions and projected images.
The invention itself, both as to its construction and its method of operation, together with the additional objects and advantages thereof, will best be understood from the previous description of specific embodiments when read in connection with the accompanying drawings. Although the specific description of the herein disclosed invention has been described in detail above, it may be appreciated that those skilled in the art may make other modifications and changes in the invention disclosed above without departing from the spirit and scope thereof.