METHOD AND APPARATUS FOR VISION-BASED COUPLING BETWEEN POINTER ACTIONS AND PROJECTED IMAGES
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computerized mapping of pointers and, in particular, to the computerized mapping of a user's pointer between a computer display and a computer virtual environment.
2. Description of the Prior Art
Traditional methods of controlling computer-based presentations (such as Microsoft® PowerPoint® talks) require the user to send commands to the computer using either the keyboard or the mouse. This can be awkward because it diverts the attention of the presenter and the audience from the projected presentation (screen). A more natural mode of controlling such presentations would be by performing operations directly on the presentation area. Existing systems for accepting user input from the presentation surface include expensive electronic whiteboards and pointing devices such as remote mice. Electronic whiteboards are not portable and either require laborious manual calibration or force the use of specially coded markers. Remote mice lack the transparency and immediacy of pointing actions and suffer from other problems. For instance, infrared mice require the user to point the mouse at a small target, radio mice are subject to interference, and mice with long cables are unwieldy.
It is, therefore, an object of the present invention to allow the user to directly control the presentation in a more natural manner, either at a distance from the screen (using a laser pointer or shadow gestures) or at a distance from the computer (using a telescoping pointer). It is another object of the present invention to create an inexpensive and portable presentation control system.
SUMMARY OF THE INVENTION
Accordingly, we have developed a method and an apparatus to overcome the problems with the prior art systems.
The present invention includes an inexpensive low-resolution camera which is connected to the user's notebook computer and which observes the presentation area. Computer-vision algorithms determine where the user is pointing on the
presentation surface, and this provides a correspondence between the user's actions (as seen by the camera) and active regions in the image being displayed on the screen.
The user may control the presentation using a variety of methods such as laser-pointer motions on the projected image, shadows cast by the user's fingers or a traditional telescoping pointer (optionally augmented by a lighted or reflective tip). The user is not, however, restricted to this mode. Control can be augmented using standard keyboard/mouse events, speech recognition or vision-based facial-expression analysis. The method of the present invention includes determining the mapping between a point in the projected image frame and the corresponding point in the camera image frame; given an image of the scene as taken by the camera, identifying the point(s), if any, that were targeted by the user (in the case of a laser pointer, this corresponds to the bright red dot in the image; for a shadow of the user's finger, this is slightly more complicated); given the mapping computed above, mapping the user-selected point from the camera image frame to the source image (pre-projection) frame; and performing a programmed action in response to the location and characteristics of the user's pointer action (for instance, drawing a dot or moving to the next slide of a presentation). Further, the present invention includes an apparatus that is capable of performing the above-described method.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of one embodiment of an apparatus according to the present invention; and
Fig. 2 is a flow diagram illustrating the general method according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention is directed to mapping a user's pointer between a computer display and a computer virtual environment. According to the present invention, there is provided a method and apparatus for the computerized mapping of pointers in a projection/camera system.
Referring to Fig. 1, the apparatus includes a pointer 10, a computer 12 with a source image frame, a projection surface 14 with a projected image frame, and a camera 16 with a camera image frame and oriented to observe the projection surface 14.
Further, the present invention includes a projector 18 which projects an image generated by the computer 12 onto the projection surface 14. The pointer 10, in use, creates a pointer targeted point 20 on the projection surface 14 which is detected by the camera 16. It is also envisioned that the display of the computer 12 can be used as the projection surface 14, obviating the need for a separate projector 18.
As shown in Fig. 2, mapping a user's pointer 10 between a computer 12 with a source image frame and projection surface 14 with a projected image frame using a camera 16 with a camera image frame is achieved through the steps of determining the mapping between a point in the projected image frame and the corresponding point in the camera image frame (projector-camera system calibration step 22), identifying a pointer targeted point within the camera image frame (feature extraction step 24), determining a mapping between the pointer targeted point from the camera image frame to the source image frame (coordinate mapping step 26), and performing a programmed action in response to the location of the pointer targeted point within the source image frame (program action step 28).
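By way of illustration, these four steps can be read as a per-frame processing loop. The following sketch is not part of the disclosed apparatus; the stage functions are supplied as callables, and their names (calibrate, extract_pointer, map_point, act) are assumptions introduced only to show how the steps compose.

```python
def run_pipeline(frames, calibrate, extract_pointer, map_point, act):
    """Glue code for steps 22-28; each stage is passed in as a callable."""
    H = calibrate()                      # step 22: projector-camera calibration
    for frame in frames:                 # stream of camera images
        target = extract_pointer(frame)  # step 24: feature extraction
        if target is not None:           # a pointer targeted point was found
            act(map_point(target, H))    # steps 26 and 28: map, then respond
```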
To clarify the following discussion, the following frames of reference are defined. The "source image frame" refers to coordinates in the internal representation of the screen (typically pixels) in the computer 12. The "projected image frame" refers to coordinates on the projection surface 14. Finally, the "camera image frame" refers to coordinates in the image captured by the camera 16 (of which the projected image frame is a sub-region).
The first step 22 of the present invention can be referred to as the projector-camera system calibration step. Since the relative positions of the camera 16, projection surface 14 and projector 18 may vary, it is important to determine a mapping between a given point in the camera image frame and the corresponding point in the projected image frame, which is the projection surface 14 containing the user's presentation (e.g., given the observed position of the pointer 10, determine whether it falls inside the projected image of a button). Calibration of such a system may be manual, semi-automatic or fully automatic. For manual calibration, the camera 16 and projector 18 are mounted in a known position and orientation, and the internal optical parameters of the components are specified. For semi-automatic calibration, the
computer projects a presentation slide depicting several calibration regions, and the user manually identifies the locations of these regions in the camera image. From these correspondences, the system automatically computes the above mapping. Finally, for automatic calibration, calibration can be performed without any user interaction. It is also envisioned that if the apparatus is built such that the camera 16 and projector 18 optics are shared (i.e., the lens is simultaneously used to project and acquire images), the above mapping is simply a known similarity mapping regardless of the position and orientation of the apparatus with respect to the projection surface 14. Such an apparatus requires no calibration.
The next step 24 of the present invention can be referred to as the feature extraction step. Feature extraction involves identifying the position of the pointer targeted point 20 in the camera image frame utilizing the mapping determined in step 22. Different pointer 10 types may require different image processing approaches. As an example, described herein is the process used for detecting laser dots and telescoping pointers with illuminated or highly reflective tips according to the present invention. In a dark environment, such as a presentation theater, the pointer 10 creates a saturated region of pixels in the camera image. This spot may occupy several pixels in the image and can be extracted by appropriately thresholding the image (e.g., for a typical laser pointer, the red channel of the image). The centroid of these saturated pixels provides an estimate of the pointer 10 location with potentially sub-pixel accuracy.
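To make the two steps concrete, the calibration of step 22 and the feature extraction of step 24 can each be sketched in a few lines (assuming Python with OpenCV and NumPy; all coordinate and threshold values below are invented for illustration, not taken from the disclosure). For the semi-automatic calibration, the mapping can be fit as a planar homography from four or more user-identified correspondences:

```python
import numpy as np
import cv2

# Centers of the projected calibration regions in the source image (pixels),
# paired with the locations the user identified in the camera image.
source_pts = np.float32([[0, 0], [1023, 0], [1023, 767], [0, 767]])
camera_pts = np.float32([[112, 87], [534, 95], [528, 402], [105, 390]])

# H maps camera image frame coordinates to source image frame coordinates.
H, _ = cv2.findHomography(camera_pts, source_pts)
```

The thresholding and centroid computation of step 24 can be sketched in the same vein; the threshold of 240 on the red channel is an assumed value that would be tuned for the camera and lighting:

```python
def find_laser_dots(frame_bgr, red_threshold=240):
    """Return the centroid of each saturated spot in a camera image."""
    red = frame_bgr[:, :, 2]                  # red channel of an 8-bit BGR frame
    _, binary = cv2.threshold(red, red_threshold, 255, cv2.THRESH_BINARY)
    # One connected component per pointer; centroids are sub-pixel estimates.
    _, _, _, centroids = cv2.connectedComponentsWithStats(binary)
    return [tuple(c) for c in centroids[1:]]  # skip label 0, the background
```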
If multiple laser pointers are to be simultaneously employed, the centroid of each connected component in the thresholded image can be separately computed. When the pointer 10 cannot be located using simple color or region-based techniques, the feature extraction phase can employ methods such as template matching (searching the image for a known shape) or image differencing (comparing the current image with a prior image) to locate the pointer. In many cases, the present invention can exploit the fact that displayed scene characteristics can be optimized for feature extraction.
The next step 26 can be referred to as the coordinate mapping step. Once the pointer targeted point 20 has been located in the camera image, its position must be converted from the camera image frame to the source image frame, or the computer's internal coordinates, using the results of the previous steps. The mapping between points
in the camera image frame and points in the source image (pre-projection) frame is represented as a projective transform that converts 2-D points (represented as 3 x 1 vectors in homogeneous coordinates) from one coordinate frame to the other. Thus, a matrix multiplication enables the system to determine the point in the source image corresponding to the observed location of the pointer targeted point 20 on the projection surface 14.
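In code, this mapping is a single matrix-vector product followed by division by the homogeneous scale factor. A minimal sketch, reusing the 3 x 3 homography H fit in the calibration sketch above (NumPy assumed):

```python
import numpy as np

def camera_to_source(point_xy, H):
    """Map a 2-D camera image frame point into the source image frame."""
    x, y = point_xy
    p = H @ np.array([x, y, 1.0])      # 3 x 1 vector in homogeneous coordinates
    return (p[0] / p[2], p[1] / p[2])  # divide out the homogeneous scale
```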
Finally, the program action step 28 performs a programmed action of the computer 12 in response to the physical location and/or characteristics of the pointer targeted point 20. The present invention provides a general method for specifying pointer 10 actions in projected images. Through the techniques and steps described above, the present invention can track the visible pointer targeted point 20 (e.g., laser pointer dot, physical pointing device or a shadow) on the projection surface 14, enabling the user to control a virtual mouse cursor on the display. Event activation can be triggered by one of three general strategies:
1. change in visible pointer 10 state (color changes, e.g., red versus green light sources, or a change in pointer shape, flashing, etc.);
2. specific motion patterns (pausing in a specific region of the screen, distinctive gestures, etc.; a dwell-based sketch of this strategy follows the list); and
3. augmented with other modes of communication (voice activation or button presses communicated through alternate devices, e.g., radio or infrared, etc.). These strategies permit a user to easily emulate established user-interface paradigms such as those employed for computer mice.
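As an illustration of the second strategy, "pausing in a specific region" can be detected by a dwell test over successive mapped pointer positions. The sketch below is an assumed example, not the disclosed method; the 10-pixel radius and one-second dwell time are invented defaults.

```python
import time

class DwellDetector:
    """Report an activation when the pointer pauses within a small radius."""

    def __init__(self, radius_px=10.0, dwell_s=1.0):
        self.radius_px = radius_px  # how far the point may wander while "paused"
        self.dwell_s = dwell_s      # how long the pause must last to fire
        self.anchor = None          # position where the current pause began
        self.since = 0.0            # time when the current pause began

    def update(self, point, now=None):
        now = time.monotonic() if now is None else now
        if self.anchor is None or self._dist(point, self.anchor) > self.radius_px:
            self.anchor, self.since = point, now  # pointer moved: restart timer
            return False
        if now - self.since >= self.dwell_s:
            self.since = now                      # fire once, then re-arm
            return True
        return False

    @staticmethod
    def _dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
```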
The pointer 10 events enable a variety of applications. The user may deliver a presentation (e.g., Microsoft® PowerPoint® slides) without physically interacting with his/her computer 12. By pressing virtual buttons (using either a laser pointer, pointing stick or finger), the user could change slides, activate animations or follow hyperlinks while focusing on the presentation area and audience. Additionally, the user can use the presentation area as a virtual sketchpad or drafting table and draw shapes using a laser pointer or his/her finger. In the context of a presentation, the user would be able to make freehand annotations or easily highlight regions on a slide. More structured
interactions, such as dragging and dropping virtual objects, are also possible. Further, the present invention also works with standard desktop monitors and notebook screens. By providing the computer 12 with a camera 16 that looks at its own display device, virtual touch-screens that require no hardware modification of the display device are enabled. It is envisioned that multiple projection surfaces 14 may be used, with each projection surface 14 having a camera 16 aimed at it. Additionally, the projection surface 14 may be a presentation screen, a white wall or any other suitable projection surface. Further, the computer 12 display may be a CRT monitor or a flat-panel display having the camera 16 aimed directly at the computer 12 display. In this manner, the computer 12 display becomes the projection surface 14.
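Once pointer positions have been mapped into the source image frame, the virtual buttons described above reduce to a simple rectangle hit test. A minimal sketch; the button layout and the "next_slide" action name are hypothetical:

```python
def hit_test(point_xy, buttons):
    """Return the action bound to the virtual button under the pointer, if any."""
    x, y = point_xy
    for (left, top, right, bottom), action in buttons:
        if left <= x <= right and top <= y <= bottom:
            return action
    return None

# Hypothetical layout: a "next slide" button in the lower-right corner
# of a 1024 x 768 source image.
buttons = [((924, 700, 1023, 767), "next_slide")]
```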
The user may use a pointer 10 that creates light, either within or outside the human visual spectrum, which may include laser pointers, rods, telescoping pointers with illuminated tips, pointers with reflective tips or a rod with a highly visible tip. Absent such a pointer 10, the user may create the pointer targeted point 20 by interposing an object between the camera 16 and projection surface 14. Similarly, the pointer 10 actions are created from shadows formed by this opaque object between the projector 18 and the projection surface 14.
When combined with other technologies, such as speech recognition, the present invention could enable entry of text while the user is away from the keyboard. Such an extension may be suitable for classroom use and in collaborative environments where several users could share a large work space. Overall, the present invention is a computer-controlled, vision-based mapping system between pointer 10 actions and projected images.
The invention itself, both as to its construction and its method of operation, together with the additional objects and advantages thereof, will best be understood from the previous description of specific embodiments when read in connection with the accompanying drawings. Although the specific description of the herein disclosed invention has been described in detail above, it may be appreciated that those skilled in the art may make other modifications and changes in the invention disclosed above without departing from the spirit and scope thereof.