CN112578983B

CN112578983B - Finger orientation touch detection

Info

Publication number: CN112578983B
Application number: CN202011036081.2A
Authority: CN
Inventors: 王乐晶; M·梅罗恩; 林志昡
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2019-09-27
Filing date: 2020-09-27
Publication date: 2024-07-23
Anticipated expiration: 2040-09-27
Also published as: US20240168566A1; US11934584B2; CN112578983A; CN118708103A; US20210096652A1

Abstract

The present disclosure relates to finger orientation touch detection. Touch detection includes acquiring image data of a finger and a target surface, and determining a touch area of the image data on which the finger tip is in contact with the target surface. The method includes determining a pointing direction of a finger in the image data and estimating a target area on the target surface based on the touch area and the pointing direction, wherein the touch area includes a portion of the target surface extending from the touch area in a direction of the pointing direction.

Description

Finger orientation touch detection

Background

The present disclosure relates generally to the field of touch detection, and more particularly to finger orientation based touch detection.

Today's electronic devices provide many ways for users to interact with the surrounding world. For example, a user may interact with the electronic device using a virtual or physical keyboard, mouse, trackball, joystick, touch screen, or the like. One way users often interact with digital information on their devices is through a touch screen interface. Touch screen interfaces allow a user to interact with a display surface using a finger, stylus, or other object. The touch sensor recognizes the touched area and provides a response to the user.

As mixed reality environments increase, users typically provide input in an additional manner to enable virtual objects to interact with real objects. For example, a user may touch a real object to interact with the real object in a mixed reality manner. However, real objects typically do not include touch sensors that are traditionally used to detect touches from a user. While cameras may be used for visual touch detection, users may typically interact with the touch screen rather than the plane in a different manner when registering their intent.

Disclosure of Invention

In one embodiment, a method of touch detection is described. The method may include acquiring image data of a finger and a target surface, and determining a touch area of the image data on which the finger tip is in contact with the target surface. The pointing direction of the finger in the image data is determined and a target area on the target surface is estimated based on the touch area and the finger direction, wherein the target area on the target surface comprises a portion of the target surface based on the touch area and the pointing direction.

In another embodiment, the method may be embodied in computer executable program code and stored in a non-transitory storage device. In another embodiment, the method may be implemented in an electronic device.

Drawings

FIG. 1 illustrates, in block diagram form, a simplified system diagram in accordance with one or more embodiments.

FIG. 2 illustrates an example system setting for determining touch detection in accordance with one or more embodiments.

FIG. 3 illustrates an exemplary target surface and touch object in accordance with one or more embodiments.

FIG. 4 illustrates, in flow diagram form, an exemplary technique for detecting touches using finger orientations in accordance with one or more embodiments.

FIG. 5 illustrates, in flow diagram form, an exemplary technique for triggering events based on a detected touch in accordance with one or more embodiments.

FIG. 6 illustrates, in flow diagram form, an exemplary technique for detecting touches with gaze directions in accordance with one or more embodiments.

FIG. 7 illustrates, in block diagram form, a simplified multi-function device in accordance with one or more embodiments.

Detailed Description

The present disclosure relates to systems, methods, and computer-readable media for detecting touches in a physical environment. Enhancing any physical surface as a touch screen will allow intuitive interaction between the user, the computing system, and the real surface. The interactive experience may be improved by incorporating the user's intent into determining the touched surface. According to one or more embodiments, the finger direction may be used to determine the user's intent. For example, the direction in which the user's finger is pointing indicates the location where the user will touch the real surface. Further, according to one or more embodiments, a user may not occlude a target area on a surface (i.e., the portion of the surface that is intended to be noticed by the user by touching the surface) with his or her finger. A touch detection method is presented that can provide a user-friendly touch experience.

The following description describes a touch detection method that can provide a user-friendly and intuitive touch detection experience. In one embodiment, the touch state is determined based on a depth image including a finger and a target surface. In one or more embodiments, the depth image may be captured by a depth camera, or may be acquired from other types of images, such as RGB images. Fingertip position and finger direction may be determined in the depth image. Furthermore, the fingertip position and finger direction may be determined relative to a common coordinate system, such as a global coordinate system, a coordinate system of a camera, a coordinate system of an electronic device, etc. The finger direction may be determined by determining a hand position in the depth image and determining the finger direction based on the hand position and the fingertip position. In some embodiments, the direction of the finger may be a vector in 3D space, e.g., originating from a knuckle on the finger. The target surface may be identified from the image and a target region in the depth image may be identified based on the determined finger direction and fingertip position. Further, geometric characteristics of the target region (such as pose) may be determined.

Once the finger position and orientation are determined, a fingertip area in the depth image may be determined based on the fingertip position. Depth information of the fingertip area may be processed based on the estimated geometric characteristics. In one or more embodiments, the fingertip region may include a region extending from the fingertip along a direction in which the finger is pointed (such as a finger direction). The touch state may be determined based on the processed depth information of the fingertip area. The touch state may be determined based on an expected touch area on the target surface. In one or more embodiments, the target region may intersect the finger direction in a 2D image coordinate system.

In one or more embodiments, the touch area may include digital information, and the detected touch may trigger an event based on the associated digital information. In one or more implementations, a gaze direction of a user associated with a finger may be determined, for example, using a headset. The location of the target area may be further based on the gaze direction and the finger direction.

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed concepts. As part of this specification, some of the drawings of the present disclosure represent structures and devices in block diagram form in order to avoid obscuring novel aspects of the disclosed embodiments. In this context, it should be understood that references to numbered drawing elements without an associated identifier (e.g., 100) refer to all instances of drawing elements having identifiers (e.g., 100a and 100 b). Additionally, as part of this specification, some of the figures of the present disclosure may be provided in the form of a flow chart. Blocks in any particular flowchart may be presented in a particular order. However, it should be understood that the specific flow of any flow chart is merely illustrative of one embodiment. In other embodiments, any of the various components depicted in the flowcharts may be deleted, or components may be executed in a different order, or even concurrently. Further, other embodiments may include additional steps not shown as part of the flowchart. The language used in the present disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter. Reference in the present disclosure to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment, and multiple references to "one embodiment" or "an embodiment" should not be understood as necessarily all referring to the same or different embodiments.

It will be appreciated that in the development of any such actual implementation (as in any development project), numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, and that these goals will vary from one implementation to another. It will be further appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

For purposes of this disclosure, the term "camera system" refers to one or more lens assemblies as well as one or more sensor elements and other circuitry for capturing images. For purposes of this disclosure, a "camera" may include more than one camera system, such as a stereoscopic camera system, a multi-camera system, or a camera system capable of sensing the depth of a captured scene.

A physical environment refers to a physical world in which people can sense and/or interact without the assistance of an electronic system. Physical environments such as physical parks include physical objects such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with a physical environment, such as by visual, tactile, auditory, gustatory, and olfactory.

Conversely, a Computer Generated Reality (CGR) environment refers to a completely or partially simulated environment in which people sense and/or interact via an electronic system. In the CGR, a subset of the physical movements of the person, or a representation thereof, is tracked and in response one or more characteristics of one or more virtual objects simulated in the CGR environment are adjusted in a manner consistent with at least one physical law. For example, the CGR system may detect human head rotation and, in response, adjust the graphical content and sound field presented to the human in a manner similar to the manner in which such views and sounds change in the physical environment. In some cases (e.g., for reachability reasons), the adjustment of the characteristics of the virtual object in the CGR environment may be made in response to a representation of physical motion (e.g., a voice command).

A person may utilize any of their senses to sense and/or interact with a CGR object, including vision, hearing, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides a perception of point audio sources in 3D space. As another example, an audio object may enable audio transparency that selectively introduces environmental sounds from a physical environment with or without computer generated audio. In some CGR environments, a person may sense and/or interact with only audio objects.

Examples of CGR include virtual reality and mixed reality. A Virtual Reality (VR) environment refers to a simulated environment designed to be based entirely on computer-generated sensory input for one or more senses. The VR environment includes a plurality of virtual objects that a person can sense and/or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the presence of the person within the computer-generated environment, and/or through a simulation of a subset of the physical movements of the person within the computer-generated environment.

In contrast to VR environments designed to be based entirely on computer-generated sensory input, a Mixed Reality (MR) environment refers to a simulated environment designed to introduce sensory input from a physical environment or a representation thereof in addition to including computer-generated sensory input (e.g., virtual objects). On a virtual continuum, a mixed reality environment is any condition between, but not including, a full physical environment as one end and a virtual reality environment as the other end.

In some MR environments, the computer-generated sensory input may be responsive to changes in sensory input from the physical environment. In addition, some electronic systems for rendering MR environments may track the position and/or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical objects or representations thereof from the physical environment). For example, the system may cause the motion such that the virtual tree appears to be stationary relative to the physical ground.

Examples of mixed reality include augmented reality and augmented virtualization. An Augmented Reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment or representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present the virtual object on a transparent or semi-transparent display such that a person perceives the virtual object superimposed over the physical environment with the system. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or videos of the physical environment, which are representations of the physical environment. The system combines the image or video with the virtual object and presents the composition on an opaque display. A person utilizes the system to indirectly view the physical environment via an image or video of the physical environment and perceive a virtual object superimposed over the physical environment. As used herein, video of a physical environment displayed on an opaque display is referred to as "pass-through video," meaning that the system captures images of the physical environment using one or more image sensors and uses those images when rendering an AR environment on the opaque display. Further alternatively, the system may have a projection system that projects the virtual object into the physical environment, for example as a hologram or on a physical surface, such that a person perceives the virtual object superimposed on top of the physical environment with the system.

An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing a passthrough video, the system may transform one or more sensor images to apply a selected viewing angle (e.g., a viewpoint) that is different from the viewing angle captured by the imaging sensor. As another example, the representation of the physical environment may be transformed by graphically modifying (e.g., magnifying) portions thereof such that the modified portions may be representative but not real versions of the original captured image. For another example, the representation of the physical environment may be transformed by graphically eliminating or blurring portions thereof.

Enhanced virtual (AV) environments refer to simulated environments in which a virtual or computer-generated environment incorporates one or more sensory inputs from a physical environment. The sensory input may be a representation of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but the face of a person is realistically reproduced from an image taken of a physical person. As another example, the virtual object may take the shape or color of a physical object imaged by one or more imaging sensors. For another example, the virtual object may employ shadows that conform to the location of the sun in the physical environment.

Referring to fig. 1, a simplified block diagram of an electronic device 100 in accordance with one or more embodiments of the present disclosure is shown. The electronic device 100 may be part of a multi-function device such as a phone, tablet, personal digital assistant, portable music/video player, wearable device, base station, laptop computer, desktop computer, network device, or any other electronic device capable of capturing image data. FIG. 1 illustrates, in block diagram form, an overall view of a system diagram of a system capable of providing touch detection using a vision apparatus. Although not shown, the electronic device 100 may be connected to additional devices capable of providing similar or additional functionality across a network, wired connection, bluetooth or other short-range connection, and the like. Accordingly, the various components and functions described herein with respect to fig. 1 may alternatively be distributed across multiple devices that may be communicatively coupled across a network.

The electronic device 100 may include one or more processors, such as a processing unit (CPU) 120. Processor 120 may be a system-on-chip such as those found in mobile devices, and may include one or more specialized Graphics Processing Units (GPUs). In addition, the processor 120 may include multiple processors of the same or different types. The electronic device 100 may also include memory 130. Memory 130 may include one or more different types of memory that may be used to perform device functions in conjunction with processor 120. For example, memory 130 may include a cache, ROM, RAM, or any kind of transitory or non-transitory computer-readable storage medium capable of storing computer-readable code. The memory 130 may store various programming modules for execution by the processor 120, including a touch module 135. The electronic device 100 may also include a storage 140. Storage 140 may include one or more non-transitory media including, for example, magnetic disks (fixed, floppy, and removable disks) and tapes, optical media such as CD-ROMs, and Digital Video Disks (DVDs), and semiconductor memory devices such as electrically programmable read-only memories (EPROMs) and electrically erasable programmable read-only memories (EEPROMs). The storage 140 may include a model store 145, which model store 145 may include a model of a touch object, such as a user's finger. It should be appreciated that the touch module 135 and the model store 145 can be stored or hosted in different locations from the electronic device 100, according to one or more embodiments. Further, in one or more embodiments, the touch module 135 and the model store 145 can be stored in alternative or additional locations, such as in a network storage device.

In one or more embodiments, the electronic device 100 can include other components for vision-based touch detection, such as one or more cameras 105 and/or other sensors, such as a depth sensor 110. In one or more implementations, each of the one or more cameras 105 can be a conventional RGB camera, a depth camera, or the like. Additionally, the camera 105 may include a stereo or other multi-camera system, a time-of-flight camera system, or the like, that captures images from which depth information of a scene may be determined.

In one or more embodiments, electronic device 100 may allow a user to interact with a CGR environment. There are many different types of electronic systems that enable a person to sense and/or interact with various CGR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields integrated with display capabilities, windows integrated with display capabilities, displays formed as lenses designed for placement on a human eye (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smart phones, tablet computers, and desktop/laptop computers. The head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, the head-mounted system may be configured to accept an external opaque display (e.g., a smart phone). The head-mounted system may incorporate one or more imaging sensors for capturing images or video of the physical environment, and/or one or more microphones for capturing audio of the physical environment. The head-mounted system may have a transparent or translucent display instead of an opaque display. The transparent or translucent display may have a medium through which light representing an image is directed to the eyes of a person.

In one or more implementations, the touch module 135 can estimate whether a touch has occurred (e.g., a contact has been made) between the touch object and the target surface. The touch module 135 may determine the likelihood that contact has occurred between a touch object (such as a finger or fingertip) and the target surface. The touch module 135 may determine when a touch event occurs, for example, by acquiring depth information of the touch object and the target surface. For example, the touch module 135 may receive or acquire depth information from the camera 105, the depth sensor 110, or other sensors. Further, the touch module 135 may determine touch information from other data, such as stereoscopic images captured by one or more cameras 105, and the like (such as by generating a depth map). The touch module 135 may then determine an estimate based on the signal that the touch event has occurred. In one or more embodiments, the estimation may be based on a number of factors, such as by utilizing a predetermined model of a finger or other touch object (such as from model store 145). In one or more implementations, the touch module 135 can also estimate a distance between the touch object and the target surface. According to one or more implementations, raw touch data may indicate a likelihood that a touch has occurred based on, for example, a determined measured distance between a touch object and a target surface. The determination that a touch has occurred may be based, for example, on a predetermined or dynamically determined threshold estimate for determining a touch. In addition, the touch state may also be determined relative to the target surface. In one or more embodiments, determining the pose of the target surface and the touch object in a common coordinate system may allow the touch module 135 to determine the relative distance between the touch object and the target surface and/or other components in the environment.

According to one or more embodiments, the touch module 135 may not only determine when a touch has occurred, but may also identify a touch area of the target surface. In one or more implementations, the touch module 135 can determine a touch area in the image on which a fingertip or other touch object is in contact with the target surface, e.g., based on the depth information. Further, the touch module 135 may determine a pointing direction of the touch object in the scene based on the first image data. For example, the touch module 135 may determine a 3D ray that indicates a 3D direction in which a finger or touch object is pointing when in contact with a target surface. The 3D rays may be determined, for example, based on depth information collected from the camera 105 and/or the depth sensor 110. The orientation of the finger may be used to determine the direction of the finger. Then, a target area on the target surface may be estimated based on the touch area and the finger orientation. Thus, according to one or more embodiments, the target area of the target surface may be different from the touch area of the target surface.

Although electronic device 100 is described as including numerous components described above, in one or more embodiments, the various components and functions of the components may be distributed across multiple devices. In particular, in one or more embodiments, one or more of the touch module 135 and the model store 145 may be disputed differently across the electronic device 100 or elsewhere in additional systems communicatively coupled to the electronic device 100. Further, in one or more embodiments, the electronic device 100 may be comprised of multiple devices in the form of an electronic system. Thus, although certain calls and transmissions are described herein with respect to a particular system, various calls and transmissions may be directed differently based on differently distributed functions. In addition, additional components may be used, and some combinations of the functions of any of the components may be combined.

Fig. 2 illustrates an exemplary system setup 200 in which techniques for estimating a target area may be determined. In particular, FIG. 2 illustrates a user 265 utilizing the electronic device 100 to detect a touch between a touch object (e.g., the user's finger 220) and a target surface 235 (e.g., a menu). It should be appreciated that the system arrangement 200 is depicted primarily as an example to facilitate an understanding of the techniques described herein.

In one or more embodiments, the target surface 235 may include one or more regions of interest. For the purposes of the depicted example, the region of interest may include a region of interest 240 (e.g., the "Mixed Greens" portion of the menu) and a region of interest 255 (e.g., the "STEAK TIP SALAD" portion of the menu). As shown, the touch object 220 may be in physical contact with the target surface 235 at the touch area 230. However, the target area 250 may include a different portion of the target surface than the touch area 230. Further, in one or more embodiments, the target region 250 and the touch region 230 may overlap, in whole or in part, or not.

According to one or more embodiments, the touch area 230 may be determined based on depth information captured by one or more cameras 105 and/or other depth sensors 110 of the electronic device 100. According to one or more embodiments, the electronic device 100 may capture images and/or other depth data including the touch object 220 and the target surface 235, and may estimate a distance between the touch object 220 and the target surface 235. In one or more embodiments, the one or more cameras 105 can capture image data of the touch object 220 and the target surface 235. The electronic device 100 may then utilize the model of the touch object 220 to determine the location of the touch object in 3D space. That is, by utilizing the model of the touch object 220, the electronic device can determine where the finger pad is located in space, even though the finger pad may not be visible in the image data due to occlusion by the top of the finger and/or the top of the hand.

According to one or more embodiments, the touch module may utilize the determined touch area 230 to determine a target area 250 that is the subject of user selection. In one or more embodiments, the target area 250 may be determined using an orientation of the touch object 220, such as a finger orientation. In one or more embodiments, the finger orientation may be defined, at least in part, by a directional 3D ray 225 indicating the pointing direction of the finger. For example, the target area 250 may be determined based on the finger orientation at the time of the touch determination. Further, in one or more embodiments, the target region 250 may be determined based in part on the touch region 230. For example, the target region 250 may be determined by adjusting the touch region 230 based on the finger orientation 225.

By utilizing depth information captured from the camera 105 and/or the depth sensor 110, the electronic device can determine not only that a touch event has occurred between the touch object 220 and the target surface 235, but also the location on the target surface where the contact was made. Thus, the model store 145 can include a model of the touch object 220, as shown in the exemplary illustration, with contact made at the touch area 230. According to one or more implementations, the touch area 230 may or may not indicate a particular portion of the target surface 235 that the user is attempting to select. In the example shown, the target surface 235 includes a menu with various menu items that may be considered regions of interest on the target surface. Those menu items include Mixed Greens and Steak Tip salad as depicted, in this example, the user may contact the target surface 235 as follows: the touch area 230 overlaps the step tip salad portion of the menu at 255. However, according to the scene, it is shown that: the user's finger 220 is pointed at Mixed Greens to the salad 240 even though contact is made with the wake to salad portion of the menu at 255. According to one or more embodiments, the user may contact STEAK TIPS salad area 255 of target surface 235 while aiming at showing an interest in Mixed Greens salad area 240 of target surface 235. Thus, the target area 250 must be determined based on the finger orientation and the touch area.

According to one or more embodiments, the target region 250 may also be determined based on the gaze direction of the user. For example, the gaze vector 260 may be determined by the electronic device 100. For example, image data may be captured by a user-facing camera, where the direction of the user's eyes may be determined. The direction of the user's eyes may be determined, for example, in a coordinate system associated with the electronic device 100 or some other coordinate system in which the finger orientation 225 may be utilized to determine the gaze vector 260. In one or more embodiments, the gaze direction may be used to adjust the target area. For example, if the user views in a direction substantially similar to the finger orientation, the target region may be more accurate than if the gaze direction is in a direction that is much different from the finger orientation. If the gaze direction 260 and the finger orientation 225 are substantially different (e.g., if the difference in the 3D direction of the gaze and the 3D direction of the finger orientation meets a predetermined threshold), the target region may be adjusted along the direction of the gaze 260.

FIG. 3 illustrates an exemplary target surface and touch object in accordance with one or more embodiments. Fig. 3 shows an alternative view of the environment described above with respect to fig. 2. Specifically, FIG. 3 shows a view of target surface 235 and touch object 220. Further, FIG. 3 depicts a touch region 230 and a target region 250 that overlaps with the region of interest 240.

In one or more embodiments, the target region 250 may be determined based on the touch region 230 and the finger orientation 225. As described above, the target area may be determined based on the finger orientation at the time of touch detection. Further, according to one or more embodiments, the target area 250 may be determined by adjusting the touch area 230. That is, the target area 250 may be defined as a touch area and then adjusted based on additional factors such as pointing and gaze direction.

According to one or more embodiments, the target region 250 may be determined based on the direction of the finger orientation 225. According to one or more embodiments, the finger orientation may be determined based on a 3D direction vector through one or more joints of the finger (such as joint 305 and joint 310) and/or the tip of the finger 220. According to one or more embodiments, the target region 250 may be determined based on spatial points of the 3D ray 225 through the target surface 235. According to one or more embodiments, the geometric characteristics of the surface may be estimated to determine the depth of the surface relative to the electronic device 100 in order to determine the point in space at which the finger direction vector intersects the target surface 235.

As shown, it becomes clear that even though the touch area 230 overlaps the "STEAK TIP SALAD" portion of the target surface 235, the object that is intended to be selected is the "Mixed Greens" portion 240. By taking into account the finger orientation and the detected touch, the target region 250 provides a more accurate estimate of the selected portion of the target surface 235. According to one or more implementations, the selected portion (e.g., "Mixed Greens" 240) may thus be associated with digital information, and upon detection of selection of a particular region of interest (e.g., "Mixed Greens" 240), an event may be triggered based on the digital information associated with the region of interest.

FIG. 4 illustrates a flow diagram form of an exemplary technique for estimating a target area of a touch between a touch object and a target surface in accordance with one or more embodiments. For purposes of explanation, the following steps will be described in the context of fig. 1. However, it should be understood that various actions may be performed by alternative components. Further, the various actions are performed in a different order. In addition, some acts may be performed concurrently, and some acts may not be required, or other acts may be added.

The flowchart begins at 405, where the electronic device 100 captures image data of a touch object and a target surface in an environment from a first perspective through the camera 105. For example, the camera 105 may face the environment in a manner that the touch object and target surface are positioned. Further, in one or more embodiments, additional data, such as depth information, etc., may be captured.

The flowchart continues at 410, where the touch module 135 determines a fingertip area in the image. As described above, the electronic device 100 may access a model of a touch object (such as a finger). The model may be a model of a general finger or a specific finger and may be used in combination with image data and/or depth data to determine the location of the fingertip in the environment.

At 415, the touch module 135 estimates geometric characteristics of the surface based on the determined fingertip area. According to one or more embodiments, a depth of a surface may be determined at which a finger may touch or hover. In one or more embodiments, depth information for a surface may be determined, for example, using a model of the surface or depth information for the surface captured in association with image data and/or other depth data. The geometric characteristics may include a point in space at which the target surface is behind the touch object relative to the electronic device 100.

The flowchart continues at 420, where the touch module 135 processes surface depth information for the fingertip area based on the geometric characteristics. According to one or more embodiments, the gesture of the touch object may be determined, for example, with respect to a target surface, with respect to an electronic device, and/or the like. Further, in one or more embodiments, the gesture of the touch object may be determined based on, for example, a coordinate system of the electronic device, a real world coordinate system, a coordinate system of the target surface, and the like.

At 425, the touch module 135 determines a touch state based on the depth information. In accordance with one or more embodiments, the touch module 135 determines the touch area of the image on which the finger tip is in contact with the surface. Determining the location of the fingertip in contact with the surface may involve determining the gesture of the finger. In one or more embodiments, depth information for a surface may be determined, for example, using a model of the surface or depth information for the surface captured in association with image data and/or other depth data. Thus, the gap distance between the touch object and the target surface is calculated based on the determined depth of the fingertip or other touch object compared to the target surface on which the touch object is located. In one or more embodiments, the gap distance may be used to estimate the likelihood of a touch or otherwise determine whether a touch has occurred.

The flow chart continues at 430, where the touch module 135 determines the touch area in the image on which the finger tip is in contact with the surface. In one or more embodiments, the touch area may be determined as the portion of the target surface that is hovered/touched by the fingertip when it is determined at 425 that a touch has occurred.

The flowchart continues at 435, where the touch module 135 determines the pointing direction of the touch object in the scene based on the image data. As described above, the pointing direction may be determined as a 3D directed ray corresponding to the pointing direction of the finger. In one or more embodiments, the pointing direction (e.g., finger orientation) may be determined by identifying joints and/or fingertips in the finger from which the rays were determined.

At 425, a target region is estimated based on the touch region and the pointing direction. The target area may be determined in various ways. For example, the target region 250 may be determined based on the finger orientation 220 at the time of the touch determination. Further, in one or more embodiments, the target region 250 may be determined based in part on the touch region 230. For example, the target region 250 may be determined by adjusting the touch region 230 based on the finger orientation 225. The target area may be determined as a portion of the target surface that is offset from the touch area in the direction of the finger orientation. As another example, the target region may be a portion of the target surface that intersects the target surface with a 3D directed ray associated with the pointing of the finger, or may be a portion of the target surface that is offset from the touch region 230 in the direction of the finger orientation 225.

FIG. 5 illustrates, in flow diagram form, an exemplary technique for triggering events based on a detected touch in accordance with one or more embodiments. For purposes of explanation, the following steps will be described in the context of fig. 1. However, it should be understood that various actions may be performed by alternative components. Further, the various actions are performed in a different order. In addition, some acts may be performed concurrently, and some acts may not be required, or other acts may be added.

The flow chart begins at 505, where a determination is made as to whether a touch has been detected. A touch may be detected, for example, as described above with respect to fig. 4. If no touch is detected, the flow chart continues to 510 and the touch module 135 continues to monitor for touches of the touch object and the target surface.

Returning to 505, if it is determined that a touch is detected, the flowchart continues at 515, and the touch module 135 detects an object of interest at the target region. In one or more embodiments, the target region may be determined, for example, as described above with respect to fig. 4. When the target region is detected, an object of interest may be determined. The object of interest may be, for example, a specific part of the target surface or a physical object.

The flow chart continues at 520, where the touch module 135 obtains digital information associated with the object of interest. In one or more embodiments, the digital information may be information related to a visual item at the target area. As another example, the target area may include an indication associated with the additional content, such as a QR code or other indication. In some embodiments, the digital information may be obtained based on the indication.

The flowchart ends at 525, where the touch module 135 triggers an event based on the digital information. As another example, the digital information may be computer code for activating an application, accessing network-based content, and the like. As another example, notifications may be generated and transmitted based on digital information. Returning to the menu example, if the user points to a particular menu item, a message may be transmitted to the restaurant that the user wishes to purchase the selected menu item.

FIG. 6 illustrates, in flow diagram form, an exemplary technique for detecting touches with gaze directions in accordance with one or more embodiments. More specifically, FIG. 6 depicts a detailed technique for estimating a target area on a surface based on touch area and pointing direction in accordance with one or more embodiments. According to one or more embodiments, a more accurate determination of the target area may be determined, for example, by considering the gaze direction of a user utilizing the techniques. For example, in one or more embodiments, if the touch area is above the user's gaze (e.g., the user is stretching his or her hand to make contact with the target surface), the target area may be adjusted to be farther from the user than the target surface. As another example, if the touch area is below the user's eyes, the target area may be adjusted to be closer to the touch area relative to the user.

The flowchart begins at 605, where the touch module 135 determines the user's gaze direction in a particular coordinate system. In one or more embodiments, the gaze direction may be determined in a coordinate system of the detection apparatus, in a coordinate system associated with the target surface, in a universal coordinate system, etc. The gaze direction may be determined, for example, based on image data captured by the camera 105 of the electronic device 100. For example, the camera 105 may include a front camera and a rear camera, where one camera captures an image of the target surface and the touch object and the other camera may capture an image of the user's eyes to determine the direction in which the user is looking. In one or more embodiments, additional or alternative data may be used to determine the gaze direction of the user, for example, using the depth sensor 110 or other sensor.

The flow chart continues at 610, where the touch module 135 compares the gaze direction with the pointing in the coordinate system from 605. In one or more implementations, the gaze direction and the pointing direction may be determined in different coordinate systems, and the touch module 135 may convert the multiple coordinate systems into a single common coordinate system.

At 615, it is determined whether the gaze direction and the pointing direction are substantially similar. The gaze direction and the pointing direction may be substantially similar, for example, if the 3D ray determined for the pointing direction is substantially similar to the ray determined for the gaze direction. For example, if the difference between the gaze ray and the pointing ray meets a predetermined threshold, the gaze direction and the pointing may be substantially similar.

If, at 615, it is determined that the gaze direction and the pointing direction are substantially similar, the flowchart continues at 620, and the touch module 135 adjusts the target area to be closer to the touch area relative to the user. In one or more implementations, an initial determination for a target area may be determined based on a touch direction of a user. Here, the target area may be adjusted to be closer to the user than the touch area. As described above, if the target area is above the user's eyes, the target area may be farther from the user than the default determination of the target area.

Returning to 615, if it is determined that the gaze direction and the pointing direction are not substantially similar, the flowchart continues at 625, where the touch module 135 adjusts the target area to be away from the touch area relative to the user. That is, the target area may be adjusted to be farther from the user than the touch area. According to one or more embodiments, for example, if the target area is below the user's line of sight, the gaze direction and the pointing direction may not be in substantially similar directions.

In accordance with one or more embodiments, additionally or alternatively, a gaze direction of a user may be determined. For example, if the gaze direction is downward (e.g., in a direction below the head of the user), the target area may be adjusted to be closer to the touch area relative to the user. The gaze direction may be determined, for example, based on a real world coordinate system. Conversely, if the gaze direction is determined to be upward (e.g., in a direction above the user's head), the target region may be adjusted to be farther away from the touch region relative to the user.

Referring now to fig. 7, a simplified functional block diagram of an exemplary multi-function electronic device 700 is shown, according to one embodiment. Each of the electronic devices 100 may be a multi-function electronic device or may have some or all of the components described for a multi-function electronic device described herein. The multi-function electronic device 700 may include a processor 705, a display 710, a user interface 715, graphics hardware 720, a device sensor 725 (e.g., a proximity sensor/ambient light sensor, an accelerometer, and/or a gyroscope), a microphone 730, one or more audio codecs 735, one or more speakers 740, a communication circuit 745, a digital image capture circuit 750 (e.g., including a camera system), one or more video codecs 755 (e.g., supporting a digital image capture unit), a memory 760, a storage device 765, and a communication bus 770. The multi-function electronic device 700 may be, for example, a digital camera or a personal electronic device such as a Personal Digital Assistant (PDA), a personal music player, a mobile phone, or a tablet computer.

Processor 705 may execute instructions (e.g., such as the generation and/or processing of images disclosed herein) necessary to implement or control the operation of many of the functions performed by device 700. The processor 705 may, for example, drive the display 710 and may receive user input from the user interface 715. The user interface 715 may allow a user to interact with the device 700. For example, the user interface 715 may take a variety of forms, such as buttons, a keypad, a dial, a click touch dial, a keyboard, a display screen, and/or a touch screen. Processor 705 can also be, for example, a system on a chip, such as those found in mobile devices, and include a dedicated Graphics Processing Unit (GPU). Processor 705 may be based on a Reduced Instruction Set Computer (RISC) or Complex Instruction Set Computer (CISC) architecture or any other suitable architecture and may include one or more processing cores. Graphics hardware 720 may be special purpose computing hardware used to process graphics and/or auxiliary processor 705 to process graphics information. In one implementation, graphics hardware 720 may include a programmable GPU.

Image capture circuit 750 may include two (or more) lens assemblies 780A and 780B, where each lens assembly may have a separate focal length. For example, lens assembly 780A may have a short focal length relative to the focal length of lens assembly 780B. Each lens assembly may have a separate associated sensor element 790. Alternatively, two or more lens assemblies may share a common sensor element. Image capture circuit 750 may capture still images and/or video images. The output from image capture circuit 750 may be at least partially processed by: one or more video codecs 755 and/or processor 705 and/or graphics hardware 720, and/or a dedicated image processing unit or pipeline incorporated within circuitry 765. The image thus captured may be stored in memory 760 and/or storage 765.

Sensor and camera circuitry 750 may capture still and video images that may be processed according to the present disclosure at least in part by: one or more video codecs 755 and/or processor 705 and/or graphics hardware 720 and/or specialized image processing units incorporated within circuitry 750. The image thus captured may be stored in memory 760 and/or storage 765. Memory 760 may include one or more different types of media used by processor 705 and graphics hardware 720 to perform device functions. For example, memory 760 may include a memory cache, read Only Memory (ROM), and/or Random Access Memory (RAM). Storage 765 may store media (e.g., audio files, image files, and video files), computer program instructions or software, preference information, device profile information, and any other suitable data. Storage 765 may include one or more non-transitory computer-readable storage media including, for example, magnetic disks (fixed, floppy, and removable disks) and tapes, optical media such as CD-ROMs and Digital Video Disks (DVDs), and semiconductor memory devices such as electrically programmable read-only memories (EPROMs) and electrically erasable programmable read-only memories (EEPROMs). Memory 760 and storage 765 may be used to tangibly retain computer program instructions or code organized into one or more modules and written in any desired computer programming language. Such computer program code, when executed by, for example, processor 705, may implement one or more of the methods described herein.

The scope of the presently disclosed subject matter should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein".

Claims

1. A non-transitory computer-readable medium comprising computer-readable code executable by one or more processors to:

acquiring image data of a finger and a target surface;

identifying a touch area on the target surface where the finger is in contact with the target surface;

Determining a directed 3D ray consistent with the pointing of the finger in the image data;

Estimating a target region based on a spatial point of the directed 3D ray passing through the target surface, wherein the spatial point is determined based on an estimated depth of the target surface, and wherein the target region is different from the touch region;

determining a touch state based on the estimated depth of the target surface; and

An event is triggered based on the touch state, wherein the event is based on the target area.

2. The non-transitory computer-readable medium of claim 1, further comprising computer-readable code for:

Estimating a geometric characteristic of the target surface based on the estimated target region;

Processing surface depth information of the target region based on the geometric characteristics; and

The touch state is determined based on the target surface depth information.

3. The non-transitory computer-readable medium of claim 2, further comprising computer-readable code for:

providing digital information associated with the target area; and

An event is triggered based on the associated digital information and the touch state.

4. The non-transitory computer-readable medium of claim 1, wherein the computer-readable code for determining the pointing direction of the finger further comprises computer-readable code for:

detecting a first joint and a second joint of the finger in the image data,

Wherein the pointing direction is further determined from the spatial point of each of the first joint and the second joint.

5. The non-transitory computer-readable medium of claim 1, wherein the computer-readable code for determining the pointing direction of the finger further comprises computer-readable code for:

hand positions in the image data are detected,

Wherein the pointing direction is further determined according to the hand position.

6. The non-transitory computer-readable medium of claim 1, further comprising computer-readable code for:

detecting a gaze vector of a user associated with the finger; and

And adjusting the target area according to the gazing vector.

7. The non-transitory computer-readable medium of claim 1, wherein the image data comprises a depth image acquired by a head-mounted device.

8. A system for detecting a touch, comprising:

One or more processors; and

One or more computer-readable media comprising computer-readable code executable by the one or more processors to:

acquiring image data of a finger and a target surface;

Determining a directed 3D ray consistent with the pointing of the finger in the image data; and

Determining a touch state based on the estimated depth information; and

An event is triggered based on the touch state, wherein the event is determined based on the target area.

9. The system of claim 8, further comprising computer readable code for:

The touch state is determined based on the target surface depth information.

10. The system of claim 9, further comprising computer readable code for:

providing digital information associated with the target area; and

11. The system of claim 8, wherein the computer readable code for determining the pointing direction of the finger further comprises computer readable code for:

detecting a first joint and a second joint of the finger in the image data,

12. The system of claim 8, wherein the computer readable code for determining the pointing direction of the finger further comprises computer readable code for:

hand positions in the image data are detected,

13. The system of claim 8, further comprising computer readable code for:

detecting a gaze vector of a user associated with the finger; and

And adjusting the target area according to the gazing vector.

14. A method for detecting a touch, comprising:

acquiring image data of a finger and a target surface;

15. The method of claim 14, further comprising:

The touch state is determined based on the target surface depth information.

16. The method of claim 15, further comprising:

providing digital information associated with the target area; and

17. The method of claim 14, wherein determining the pointing direction of the finger further comprises:

detecting a first joint and a second joint of the finger in the image data,

18. The method of claim 14, wherein determining the pointing direction of the finger further comprises:

hand positions in the image data are detected,

19. The method of claim 14, wherein determining the pointing direction of the finger further comprises:

detecting a gaze vector of a user associated with the finger; and

And adjusting the target area according to the gazing vector.

20. The method of claim 14, wherein the image data comprises a depth image acquired by a head-mounted device.