WO2024008519A1 - Method for real time object selection in a virtual environment - Google Patents


Info

Publication number
WO2024008519A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
selector
object selector
selection
semantic
Application number
PCT/EP2023/067539
Other languages
French (fr)
Inventor
Kai Hübner
Dan SONG
Jakob WAY
Original Assignee
Gleechi Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-07-07
Filing date
2023-06-27
Publication date
2024-01-11
Application filed by Gleechi Ab
Publication of WO2024008519A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements

Abstract

A computer-implemented method for real time object selection in a virtual environment is provided. The method comprises a first step of obtaining, from a database, 3D geometry data representing semantic parts of objects in the environment and their bounds, for example boundaries and holes of the objects. In a next step, the method comprises obtaining, from a database, object selector data, wherein the object selector data comprises a representation of an object selector intended to select an object. Further, in a next step, the method comprises obtaining position data from a position sensor based on a user interaction with the position sensor, for example a hand movement or an eye movement of a user. Yet further, in a next step, the method comprises selecting an object in the environment based on the 3D geometry data, the object selector data and the position data.

Description

METHOD FOR REAL TIME OBJECT SELECTION IN A VIRTUAL ENVIRONMENT
Technical Field
The present disclosure generally relates to the field of virtual reality, and more particularly to a method for real time object selection in a virtual environment.
Background
Object selection describes the mechanism of singling out at least one object from a potentially cluttered set of objects for a certain task. Object selection relates to identification of an object, rather than interaction with a selected object. In virtual two-dimensional scenarios, this may involve hovering a mouse pointer over a two-dimensional object on the desktop of a computer in order to start a program or open a folder, for example through a mouse click. In virtual three-dimensional scenarios, selection of three-dimensional objects may involve hovering a three-dimensional pointer over a three-dimensional object, providing the user with feedback about the possibility of an interaction with the selected object. An example of feedback is to visually highlight the selected object in a different colour. An example of an interaction is to manipulate the object, such as moving it in the virtual space by clicking and dragging the mouse.
Most object selection research comes from the two-dimensional domain, and applications in the three-dimensional domain often try to simply adapt two-dimensional theories and standards. However, these theories and standards do not fit well into the three-dimensional domain and its use cases.
In light of the observations above, the present inventors have realized that there is a need for improvements when it comes to real time object selection in a virtual environment.
Summary
It is accordingly an object of the present disclosure to mitigate, alleviate or eliminate at least some of the problems referred to above, by providing a method for real time object selection in a virtual environment.
In a first aspect, a computer-implemented method for real time object selection in a virtual environment is provided. The method comprises a first step of obtaining, from a database, 3D geometry data representing semantic parts of objects in the environment and their bounds, for example boundaries and holes of the objects. In a next step, the method comprises obtaining, from a database, object selector data, wherein the object selector data comprises a representation of an object selector intended to select an object. Further, in a next step, the method comprises obtaining position data from a position sensor based on a user interaction with the position sensor, for example a hand movement or an eye movement of a user. Yet further, in a next step, the method comprises selecting an object in the environment based on the 3D geometry data, the object selector data and the position data.
Optionally, the semantic parts and their bounds of the objects are determined using mesh segmentation. Optionally, the semantic parts comprise primitive shapes, for example cubes, spheres, or capsules. Optionally, the primitive shapes are determined based on a classification method for the semantic parts.
Optionally, the object selector data is determined based on a kinematic chain and/or a mesh of the object selector. Optionally, the object selector data comprises a coordinate in three dimensions, a point in three dimensions and a vector in three dimensions, or a vector in three dimensions and/or a geometric or solid primitive.
Optionally, the coordinate in three dimensions is determined based on one or more designated points on a hand of a user, for example a selected number of fingertips. Optionally, the vector in three dimensions is determined based on the average direction or mean direction from a designated point on the wrist of a user to a designated point on one or more fingertips of the user.
Optionally, the selection of the object is further based on an intersection between a bound of a semantic part of the object and a bound of the representation of the object selector. Optionally, the selection of the object is further based on an intersection between a ray from a representation of the object selector and a bound of a semantic part of the object. Optionally, the selection of the object is further based on a shortest distance from a bound of the representation of the object selector to a bound of a semantic part of the object.
Optionally, the selection of the object is further based on weights assigned to the semantic parts of the object, wherein the object associated with the semantic part having the highest weight is chosen in a case where several semantic parts are candidates for selection.
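As a non-limiting illustration of this optional weighting rule, the Python sketch below resolves the case where several semantic parts are candidates by returning the object whose part carries the highest weight; the Candidate record and its field names are invented for this example and are not terminology from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    object_id: str      # object the semantic part belongs to
    part_id: str        # semantic part that matched the selector
    weight: float       # weight assigned to that semantic part

def resolve_by_weight(candidates: list[Candidate]) -> str | None:
    """Return the object whose matching semantic part has the highest weight."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.weight)
    return best.object_id

# Example: two parts are candidates; the bottle neck carries the higher weight.
print(resolve_by_weight([
    Candidate("glass", "rim", 0.4),
    Candidate("bottle", "neck", 0.9),
]))  # -> "bottle"
```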
In a second aspect, a computer program is provided having stored thereon instructions that, when the program is executed by a computer, cause the computer to carry out the method described herein.
In a third aspect, a computer-readable medium is provided having stored thereon instructions that, when executed by one or more processors, cause execution of the method described herein.
Other aspects and embodiments are defined by the appended claims and are further explained in the detailed description section as well as on the drawings. It should be emphasized that the term “comprises/comprising” when used in this specification is taken to specify the presence of stated features, integers, steps, or components, but does not preclude the presence or addition of one or more other features, integers, steps, components, or groups thereof. All terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the” element, device, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, device, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
Brief Description of the Drawings
Objects, features and advantages of embodiments of the disclosure will appear from the following detailed description, reference being made to the accompanying drawings, in which:
Fig. 1 is a schematic illustration of a non-limiting example of a virtual reality environment in which embodiments of the present disclosure may be exercised.
Fig. 2 is a schematic illustration of a non-limiting example virtual reality system in which embodiments of the present disclosure may be exercised.
Fig. 3 is a flow chart of a method according to exemplary embodiments.
Fig. 4 illustrates an exemplary process to obtain 3D geometry data representing semantic parts and their bounds of objects in the environment.
Fig. 5 illustrates a process for creating object selector data according to embodiments of the present disclosure.
Fig. 6 is a schematic illustration of a non-limiting example of object selection.
Fig. 7 is a schematic illustration of a non-limiting example of object selection.
Fig. 8 is a schematic illustration of a non-limiting example of object selection.
In the drawings, like numbers refer to like elements.
Detailed Description
Fig. 1 illustrates a virtual environment 1 in which embodiments of the present disclosure may be exercised. The virtual environment 1 comprises a number of objects 2. The virtual environment 1 may be a virtual environment 1 simulating a real world environment in which certain tasks should be performed, for instance surgery, assembling or manufacturing. The purpose of the virtual environment 1 may then be training of operators or health care personnel performing surgery, assembling or manufacturing in the real world environment. This disclosure is however not limited to virtual environments for training, but instead the virtual environment may be for other purposes, such as gaming or guidance.
In Fig. 1, three objects 2a-c are illustrated: a glass 2a, an apple 2b and a bottle 2c. The disclosure is however not limited to only three objects 2 and there can be more or fewer objects 2 in the virtual environment. In Fig. 1, 3D geometry data representing semantic parts 5a-c of objects 2a-c in the environment 1 and their bounds are illustrated. The semantic parts 5 will be described in more detail in relation to Fig. 3.
Before an interaction with an object 2 in the virtual environment 1 may be performed, the object 2 may be selected in the virtual environment 1. As will be described later on, objects 2 in the virtual environment 1 may be selected based on the 3D geometry data, the object selector data and the position data. The object 2 may be selected by an object selector 3. In Fig. 1, the object selector 3 is illustrated as a human hand and a representation 4 of the object selector is represented as a sphere. The representation of the object selector 3 in the virtual environment 1 may be moved in the virtual environment 1 based on position data from a position sensor (not shown) in the real world environment.
Fig. 2 illustrates a virtual reality system 10, in which embodiments of the present disclosure may be implemented. As seen in Fig. 2, the virtual reality system 10 comprises a computing unit 20, at least one database 30, a position sensor 40, and an external device 50.
The computing unit 20 may be a cloud-computing unit 20 being included in a distributed cloud network widely and publicly available, or limited to an enterprise cloud. For instance, cloud-computing technologies include, but are not limited to, Amazon EC2, Google App Engine, Firebase or Apple iCloud. The database 30 may be run on a cloud-computing platform, and connection may be established using DBaaS (Database-as-a-service). For instance, the database 30 may be deployed as a SQL data model such as MySQL, PostgreSQL or Oracle RDBMS. Alternatively, deployments based on NoSQL data models such as MongoDB, Hadoop or Apache Cassandra may be used. DBaaS technologies include, but are not limited to, Amazon Aurora, EnterpriseDB, Oracle Database Cloud Service or Google Cloud.
In some embodiments, the database 30 is deployed on the same platform as the computing unit 20 deployment. The computing unit 20 is at least configured to obtain 3D geometry data from the database 30, the 3D geometry data representing semantic parts of objects in the virtual environment, for example boundaries and holes of the objects, as will be described in relation to Fig. 3.
Further, the computing unit 20 is configured to obtain object selector data from the database 30, wherein the object selector data comprises a representation of the object selector for selecting an object. By storing the 3D geometry data and object selector data, re-computing of the object selector data and the 3D geometry data can be avoided. An advantage of this is reduced need for processing resources. Another advantage is reduced response time since the 3D geometry data and the object selector data are directly available to the computing unit 20. The 3D geometry data and the object selector data hence do not need to be computed every time the computing unit 20 requests the 3D geometry data and the object selector data. Whilst the 3D geometry data and the object selector data are described as being stored in the same database 30, it is envisaged that different data may be stored in different databases.
The computing unit 20 may further be configured to obtain position data from the position sensor 40, wherein the position data is based on a user interaction with the position sensor 40, for example a hand movement or an eye movement of a user. Additionally, the computing unit 20 may also be configured to select an object in the virtual environment based on the 3D geometry data, the object selector data and the position data, as will be described more in detail in relation to Fig. 3. The computing unit 20 is further configured to communicate the selected object to an external device 50. The external device 50 may be embodied as a virtual reality headset, a computer display, a mobile terminal, for instance a mobile phone, laptop computer, stationary computer or a tablet computer. Preferably, the external device 50 has a display 60. The display 60 may be a touch screen display or a non-touch screen display. The display 60 is configured to present information of the virtual environment processed by the computing unit 20. It should be noted that the processing performed in the computing unit 20 could be performed partly or fully in the external device 50.
A flow chart of a method for real time object selection in a virtual environment is illustrated in Fig. 3. The method may be a computer-implemented method. As discussed above, object selection relates to identification of an object, rather than interaction with a selected object. For example, object selection may involve hovering a three-dimensional pointer over a three-dimensional object providing a user with feedback about the possibility of an interaction with the selected object. An example for feedback is to visually highlight the selected object in a different colour. An example for an interaction is to manipulate the object, such as moving it in the virtual space by clicking and dragging the mouse.
The method comprises obtaining, 210, 3D geometry data from a database, the 3D geometry data representing parts of objects in the virtual environment, for example boundaries and holes of the objects. In some embodiments, the object may be an entity that exists in the virtual environment and is a subject of interest with which a user desires to interact. Before the object can be interacted with, the object may be selected in the virtual environment. An example of an object is a 3D model in the virtual environment, such as in a game or in a virtual environment related to training. In some embodiments, before the objects have been divided into the 3D geometry data representing parts of the objects, each object may be represented as a mesh of the object. A mesh is the mathematical definition of the geometry of the object, most commonly described by points or vertices (for example, in 3D, a set of (x, y, z) coordinates) and the surfaces or polygons connecting these points (a tuple of (i1, i2, ..., in) connecting n points to one surface patch). The method further comprises obtaining, 220, object selector data from a database, wherein the object selector data comprises a representation of an object selector for selecting an object. A process for creating object selector data according to some embodiments is illustrated in Fig. 5, which will be described later on.
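As a non-limiting sketch of the data obtained in steps 210 and 220, the Python fragment below models a mesh as vertices plus index tuples and a semantic part as a labelled box bound; the class and field names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Mesh:
    """Mathematical definition of an object's geometry: vertices plus surface patches."""
    vertices: np.ndarray          # (N, 3) array of (x, y, z) points
    faces: list                   # each entry is a tuple (i1, i2, ..., in) indexing one patch

@dataclass
class SemanticPart:
    """A segmented part of an object together with a simple bound."""
    label: str                    # e.g. "body", "rim", "handle"
    bound_min: np.ndarray         # min corner of an axis-aligned box bound
    bound_max: np.ndarray         # max corner of the box bound

@dataclass
class ObjectGeometry:
    """What step 210 fetches from the database for one object."""
    object_id: str
    parts: list = field(default_factory=list)   # list of SemanticPart

# A unit quad as a toy mesh: four vertices connected by a single face.
quad = Mesh(
    vertices=np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float),
    faces=[(0, 1, 2, 3)],
)
```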
In a next step, 230, the method further comprises obtaining position data from a position sensor 40, wherein the position data is based on a user interaction with the position sensor 40, for example a hand movement or an eye movement of the user.
The method further comprises selecting, 240, an object in the virtual environment based on the 3D geometry data, the object selector data and the position data. Object selection will be described in more detail in relation to Figs. 6 to 8.
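Schematically, steps 210 to 240 can be chained as in the following sketch; fetch_geometry, fetch_selector, read_position and choose are hypothetical placeholders for the database accesses, the sensor read-out and the selection test described in relation to Figs. 6 to 8.

```python
def select_object(database, position_sensor, choose):
    """Schematic version of steps 210-240: gather the three inputs, then delegate the test."""
    geometry = database.fetch_geometry()        # step 210: semantic parts and their bounds
    selector = database.fetch_selector()        # step 220: representation of the object selector
    position = position_sensor.read_position()  # step 230: position data from user interaction
    # Step 240: intersection-, distance- or ray-based test (see the sketches for Figs. 6-8).
    return choose(geometry, selector, position)
```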
In embodiments of the present disclosure, the 3D geometry data representing parts of the object may be geometries or primitives. Geometries or primitives may be basic geometrically simple forms that have a simple mathematical formulation. Examples of geometries or primitives are a cube, a sphere, a cylinder, a capsule, etc.
In embodiments of the present disclosure, the 3D geometry data representing parts of the object may be bounds related to the parts of the objects. The bounds may be different for the different parts of the object. A bound for one part of the object may for instance be a sphere and a bound for another part of the object may for instance be a cylinder.
In some embodiments, the 3D geometry data representing parts of the object may be bounds in the form of primitives that envelop the parts of the object.
In some embodiments, one process of identifying 3D geometry data representing semantic parts and their bounds of objects may be mesh segmentation. Different types of mesh segmentation may be used depending on the type of object. If the object is represented as a 3D point cloud, one type of mesh segmentation is used. If the object is represented as a 3D connected surface, another type of mesh segmentation is used. Mesh segmentation may be a necessary step of identifying 3D geometry data representing semantic parts and their bounds of objects in the environment.
By representing objects as 3D geometry data representing semantic parts and their bounds of the objects, fewer processing resources are required for object selection in real time applications. In comparison, if the entire mesh of the object itself were used, it would require more processing resources to select an object in the environment based on the 3D geometry data, the object selector data and the position data. Therefore, in some embodiments, the 3D geometry data representing semantic parts and their bounds of objects in the environment are convex bounds. In some embodiments, the 3D geometry data representing semantic parts and their bounds of objects are free-form up to a limited number of polygons.
In some embodiments, the 3D geometry data representing semantic parts and their bounds of objects is a result of a segmentation process. In the segmentation process, the 3D geometry data have undergone a classification process, to identify the 3D geometry data and instances of primitive shapes. Any type of mesh segmentation technique may be combined with any semantic part classification method, in order to provide 3D geometry data representing semantic parts and their bounds of objects in the environment.
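One possible, purely illustrative classification heuristic is sketched below: a segmented part is labelled as a box-like or sphere-like primitive depending on which candidate bound encloses its points more tightly. This heuristic is an assumption made for the sake of the example and not the classification method of the disclosure.

```python
import numpy as np

def classify_part(points: np.ndarray) -> str:
    """Label a segmented part 'box' or 'sphere' by which candidate bound is tighter."""
    points = np.asarray(points, dtype=float)

    lo, hi = points.min(axis=0), points.max(axis=0)
    box_volume = float(np.prod(hi - lo))

    centre = points.mean(axis=0)
    radius = float(np.linalg.norm(points - centre, axis=1).max())
    sphere_volume = 4.0 / 3.0 * np.pi * radius ** 3

    return "box" if box_volume <= sphere_volume else "sphere"

# Points sampled on a sphere surface come out as 'sphere'; a flat slab as 'box'.
rng = np.random.default_rng(0)
shell = rng.normal(size=(500, 3))
shell /= np.linalg.norm(shell, axis=1, keepdims=True)
print(classify_part(shell))                                    # 'sphere'
print(classify_part(rng.random((500, 3)) * [1.0, 1.0, 0.05]))  # 'box'
```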
In some embodiments, the segmentation may be performed manually by an operator or designer of the environment. In other embodiments, an automatic “baking” algorithm may be used. Baking is the process of sending object mesh data to a computer program (run locally or on a dedicated cloud architecture), processing it, and returning a database including an interaction model for the object. Baking in the cloud allows better and faster performance and enables updating and improving the automatic mesh segmentation continuously over time. In this manner, mesh segmentation may be performed automatically. Results of the mesh segmentation are stored in a database for quick access during runtime, meaning no real-time access to the baking method is needed. Fig. 4 illustrates an exemplary process to obtain 3D geometry data representing semantic parts and their bounds of objects in the environment.
In the first step, a, the object is represented as a mesh. In the next step, b, the mesh is represented as a set of meshes, where each mesh represents a part of the original mesh. In the next steps, c, d, a bounding computation algorithm processes the meshes, in order to form box bounds (bounding boxes) for each mesh. The meshes can also be processed for other 3D geometry data representations, such as through concavity and/or a hole computation algorithm, steps e and f.
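A minimal sketch of the box-bound steps c and d of this baking process, assuming each segmented sub-mesh is available as an array of vertices; the concavity and hole computations of steps e and f are not shown.

```python
import numpy as np

def bake_box_bounds(part_meshes: list) -> list:
    """Steps c, d of Fig. 4: one axis-aligned bounding box per segmented sub-mesh."""
    bounds = []
    for vertices in part_meshes:                 # each sub-mesh given as an (N, 3) vertex array
        v = np.asarray(vertices, dtype=float)
        bounds.append((v.min(axis=0), v.max(axis=0)))   # (min corner, max corner)
    return bounds

# Two toy sub-meshes; in the described pipeline the result would be stored in the
# database so the computation is not repeated at runtime.
sub_meshes = [np.random.rand(50, 3), np.random.rand(80, 3) + 2.0]
for lo, hi in bake_box_bounds(sub_meshes):
    print("box bound:", lo.round(2), "->", hi.round(2))
```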
In some embodiments, the 3D geometry data representing semantic parts of the objects and their bounds is stored in the database 30. An advantage of storing the 3D geometry data representing semantic parts of the objects and their bounds in the database 30 is that the computations of the 3D geometry data do not need to be repeated for the same object. In some embodiments, the computation of the 3D geometry data does not need to be repeated for other objects of the same type as the first object, for which the computation already has been performed.
As mentioned above, in step 220, the method further comprises obtaining object selector data from a database, wherein the object selector data comprises a representation of an object selector for selecting the object.
In some embodiments, there is a relation between a kinematic chain of an object selector and the mesh of the object selector. For instance, a small two-fingered robotic manipulator (represented in the virtual environment) may have object selector data representing a short distance between the two fingers. In contrast, a hand may have object selector data representing a sphere. In addition to size, in some embodiments, there are at least two other factors that influence the object selector data, namely the shape and relative pose of the object selector. Relative pose is the combination of position and orientation. The object selector data may be computed from the kinematic structure of the object selector. The object selector data may be of different types. In some embodiments, the object selector data may be a point. In other embodiments, the object selector data may be a ray. In yet other embodiments, the object selector data may be a direction or a vector. In yet other embodiments, the object selector data is a volume, i.e. an extension of a point or a ray cursor by a shape. In some embodiments, the shape is a primitive.
In some embodiments, the object selector data may therefore be a 3D point. In some embodiments, the object selector data may be a 3D point and a shape. In some embodiments, the object selector data may be a 3D vector and a 3D primitive.
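These variants can be captured with a few small record types, as in the sketch below; the type names are chosen here for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PointSelector:
    position: np.ndarray      # a 3D point / coordinate

@dataclass
class VolumeSelector:
    position: np.ndarray      # a 3D point extended by a shape,
    radius: float             # here a sphere given by its radius

@dataclass
class RaySelector:
    origin: np.ndarray        # a 3D point
    direction: np.ndarray     # a 3D vector defining the ray

# A sphere-shaped selector roughly matching the representation 4 shown in Fig. 1.
hand_selector = VolumeSelector(position=np.array([0.0, 1.2, 0.4]), radius=0.06)
```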
In some embodiments, the object selector data is generated from a kinematic chain associated with the object selector. In some embodiments, the object selector data is generated from a kinematic chain associated with the object selector and a 3D Mesh of the object selector. In some embodiments, the object selector data is generated based on the average or mean for some positions of parts of a hand of a user, such as of all the fingertips or only a selected number of fingertips. This can be achieved by associating designated points or nodes with different parts of the hand, for example the wrist, fingers, knuckles, fingertips, etc. In some embodiments, such as using gloves or finger tracking cameras where it is possible to measure finger movements during use, one or more designated points on the hand of the user, for example the knuckles or fingertips, are used to define the object selector geometry. In some embodiments, such as using motion controllers where it is only possible to measure wrist movement, one or more designated points on the wrist and/or hand of the embodied hand are used to define the object selector geometry. By use of designated points in this way, it is not required to monitor the entire hand of the user. The designated points can be selected by a user or operator dependent on the specific application. In some embodiments, the object selector data is generated based on the average or mean of the directions between designated points, for example from a designated point on the wrist of the user to one or more fingertips. In some embodiments, the object selector has a visible geometrical relationship to the hand, i.e. it visually overlaps with a part or the whole visual hand representation (i.e. wrist, fingers, knuckles, fingertips, etc.) and thus also a part of or the whole hand visually needs to overlap with object bounds to generate a result of the object selection process. In other embodiments, the object selector has no visible geometrical relationship to the hand, i.e. it is a geometrical projection (point, ray, primitive, etc., abstracted from the hand as depicted in Fig. 5) that does not visually overlap with a part of the visual hand representations and thus also does not need to visually overlap with object bounds to generate a result of the object selection process.
In some embodiments, the object selector data may be stored in the database 30. An advantage of storing the object selector data in the database 30 is that the computations of the object selector data do not need to be repeated for the object selector. In some embodiments, the computation of the object selector data does not need to be repeated for other instances of the same object selector.
Fig. 5 illustrates a process for creating object selector data according to some embodiments. In these embodiments, the object selector data is created from a kinematic chain of a hand, which includes a designated point at the wrist and a number of additional designated points at the knuckles and/or fingertips of one or more fingers. In steps a and b, a position is determined from an average of finger positions. In a next step c, the direction is determined, for example by determining the average of the finger positions relative to the wrist. Next, in step d, a shape, for example a sphere, is computed from the thumb and the closest finger. The radius of the sphere may be reduced by the thickness of the fingers.
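A sketch of the Fig. 5 computation, assuming wrist, fingertip and thumb-tip positions are available as 3D coordinates from the kinematic chain; the halving of the thumb-to-finger gap and the finger_thickness value are illustrative assumptions rather than details from the disclosure.

```python
import numpy as np

def selector_from_hand(wrist, fingertips, thumb_tip, finger_thickness=0.01):
    """Derive a selector position, direction and sphere radius from designated hand points.

    wrist and thumb_tip are (3,) coordinates; fingertips is an (N, 3) array of the
    remaining fingertip positions. finger_thickness is an illustrative parameter.
    """
    wrist = np.asarray(wrist, dtype=float)
    thumb_tip = np.asarray(thumb_tip, dtype=float)
    fingertips = np.asarray(fingertips, dtype=float)

    # Steps a, b: position as the average of the fingertip positions.
    position = fingertips.mean(axis=0)

    # Step c: direction as the mean of the directions from the wrist to the fingertips.
    directions = fingertips - wrist
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    direction = directions.mean(axis=0)
    direction /= np.linalg.norm(direction)

    # Step d: sphere radius from the thumb and the closest finger, reduced by the
    # finger thickness (halving the gap is an interpretation made for this sketch).
    gap = np.linalg.norm(fingertips - thumb_tip, axis=1).min()
    radius = max(gap / 2.0 - finger_thickness, 0.0)

    return position, direction, radius

# Example with made-up coordinates (in metres).
print(selector_from_hand(
    wrist=[0.0, 0.0, 0.0],
    fingertips=[[0.02, 0.10, 0.02], [0.00, 0.11, 0.02], [-0.02, 0.10, 0.02]],
    thumb_tip=[0.05, 0.06, 0.01],
))
```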
In some embodiments, the position sensor 40 is a device that provides position data based on user interaction with the position sensor 40, for example a hand movement or an eye movement of the user. The position sensor 40 may be a computer mouse that senses movement over a surface and button clicks and provides position data based on the user interaction with the mouse. In other embodiments, the position sensor 40 may be a virtual reality hand controller that senses movements of the user’s hands and button clicks and converts them into position data. In other embodiments, the position sensor 40 may be a sensor glove that senses movements of the user’s fingers and converts it into position data.
In some embodiments, the position sensor 40 may be a finger tracker comprising camera devices that track the fingers of a user. In some embodiments, the position sensor 40 may be an eye tracker comprising camera devices that track eye movements of the user.
In some embodiments, the position sensor 40 may be further used to trigger one or more actions. Examples of triggers of various position sensors 40 are a button, the crossing of a threshold on a measured value, a gesture with a hand (such as grasp, pinch, point, etc.), or a blink of the eye. In the step of selecting, the object selector is moved in the virtual environment. In some embodiments, feedback may be provided to a user about a currently selected object to inform a user about the currently selected object. In some embodiments, the feedback is provided by highlighting the object in a different colour on a display. In some embodiments, the feedback is provided by enlarging the object in the virtual environment. Providing feedback to the user is often useful, and sometimes even necessary, to resolve ambiguities and inaccuracies in the selecting step of the method. As mentioned above, in the selecting step of the method, the selection of the object is based on the 3D geometry data, the object selector data and the position data.
One method for selecting an object 2 is illustrated in Fig. 6. In this method, the selection of the object 2 is based on an intersection between a bound of a semantic part 5 of the object 2 and a bound of the representation 4 of the object selector 3. As can be seen in Fig. 6, a glass 2a is determined as the selected object 2 since there is an intersection between a bound of a semantic part 5a of the glass 2a and a bound of the representation 4 of the object selector 3.
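A sketch of this intersection test, under the assumption that the representation 4 of the object selector is bounded by a sphere and each semantic part 5 by an axis-aligned box; other bound combinations would use correspondingly different overlap tests.

```python
import numpy as np

def sphere_intersects_box(centre, radius, box_min, box_max) -> bool:
    """True if a sphere overlaps an axis-aligned box (closest-point test)."""
    closest = np.clip(centre, box_min, box_max)
    return float(np.linalg.norm(np.asarray(centre) - closest)) <= radius

def select_by_intersection(selector_centre, selector_radius, objects) -> str | None:
    """objects: dict mapping object id -> list of (box_min, box_max) part bounds."""
    for object_id, part_bounds in objects.items():
        for box_min, box_max in part_bounds:
            if sphere_intersects_box(selector_centre, selector_radius, box_min, box_max):
                return object_id  # e.g. the glass 2a in Fig. 6
    return None
```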
Another method for selecting an object 2 is illustrated in Fig. 7. In this method, the selection of the object 2 is further based on a shortest distance from a bound of the representation 4 of the object selector 3 to a bound of a semantic part 5 of the object 2. As can be seen in Fig. 7, the apple 2b is determined as the selected object 2 since the shortest distance is between the bound of the semantic part 5b of the apple 2b and a bound of the representation 4 of the object selector 3 (i.e. the bounds of the semantic parts 5 of the other objects 2 are further away).
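A sketch of the shortest-distance variant, reusing the sphere and box bound assumption from the previous example; the distance is taken as the gap between the sphere surface and the closest point on the box bound.

```python
import numpy as np

def distance_sphere_to_box(centre, radius, box_min, box_max) -> float:
    """Gap between a sphere bound and an axis-aligned box bound (0 if they overlap)."""
    closest = np.clip(centre, box_min, box_max)
    return max(float(np.linalg.norm(np.asarray(centre) - closest)) - radius, 0.0)

def select_by_shortest_distance(selector_centre, selector_radius, objects) -> str | None:
    """objects: dict mapping object id -> list of (box_min, box_max) part bounds."""
    best_id, best_distance = None, float("inf")
    for object_id, part_bounds in objects.items():
        for box_min, box_max in part_bounds:
            d = distance_sphere_to_box(selector_centre, selector_radius, box_min, box_max)
            if d < best_distance:
                best_id, best_distance = object_id, d  # e.g. the apple 2b in Fig. 7
    return best_id
```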
Yet another method for selecting an object 2 is illustrated in Fig. 8. In this method, the selection of the object 2 is based on a ray from a representation 4 of the object selector 3 to a bound of a semantic part 5 of the object 2. The ray may be defined by a directional vector from the object selector data. As can be seen in Fig. 8, the apple 2b is also here determined as the selected object 2 since the ray from the representation 4 of the object selector 3 intersects a bound of the semantic part 5b of the apple 2b.
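A sketch of the ray-based variant using a standard slab test against axis-aligned box bounds; the ray origin and directional vector are assumed to come from the object selector data as described above (degenerate axis-parallel grazing cases are not handled).

```python
import numpy as np

def ray_hits_box(origin, direction, box_min, box_max) -> bool:
    """Slab test: does a ray from `origin` along `direction` hit the axis-aligned box?"""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    with np.errstate(divide="ignore"):
        inv = 1.0 / direction                   # +/- inf where a component is zero
    t1 = (np.asarray(box_min, dtype=float) - origin) * inv
    t2 = (np.asarray(box_max, dtype=float) - origin) * inv
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    return t_far >= max(t_near, 0.0)

def select_by_ray(origin, direction, objects) -> str | None:
    """objects: dict mapping object id -> list of (box_min, box_max) part bounds."""
    for object_id, part_bounds in objects.items():
        for box_min, box_max in part_bounds:
            if ray_hits_box(origin, direction, box_min, box_max):
                return object_id  # e.g. the apple 2b in Fig. 8
    return None
```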
Since the 3D geometry data represents semantic parts and their bounds of objects in the environment, the object selection method is more reliable and accurate.
The invention has been described above in detail with reference to embodiments thereof. However, as is readily understood by those skilled in the art, other embodiments are equally possible within the scope of the present invention, as defined by the appended claims.

Claims

Claims
1. A computer-implemented method (200) for real time object selection in a virtual environment (1), the method comprising the steps of: obtaining (210) from a database (30), 3D geometry data representing semantic parts (5) of objects (2) in the environment (1) and their bounds, for example boundaries and holes of the objects; obtaining (220), from a database (30), object selector data, wherein the object selector data comprises a representation (4) of an object selector (3) intended to select an object (2); obtaining (230) position data from a position sensor (40) based on a user interaction with the position sensor (40), for example a hand movement or an eye movement of a user; and selecting (240) an object (2) in the environment based on the 3D geometry data, the object selector data and the position data.
2. The method (200) according to claim 1, wherein the semantic parts (5) of the objects and their bounds are determined using mesh segmentation.
3. The method (200) according to claim 2, wherein the semantic parts (5) comprise primitive shapes, for example cubes, spheres, or capsules.
4. The method (200) according to any of claims 1 to 3, wherein the object selector data is determined based on a kinematic chain and/or a mesh of the object selector.
5. The method (200) according to claim 4, wherein the object selector data comprises: a coordinate in three dimensions; a point in three dimensions and a vector in three dimensions; or a vector in three dimensions and/or a geometric or solid primitive.
6. The method (200) according to claim 5, wherein the coordinate in three dimensions is determined based on one or more designated points on a hand of a user, for example a selected number of fingertips.
7. The method (200) according to claim 5, wherein the vector in three dimensions is determined based on the average direction or mean direction from a designated point on the wrist of a user to a designated point on one or more fingertips of the user.
8. The method (200) according to any of claims 1 to 7, wherein the selection of the object is further based on an intersection between a bound of a semantic part (5) of the object (2) and a bound of the representation of the object selector.
9. The method (200) according to any of claims 1 to 7, wherein the selection of the object (2) is further based on an intersection between a ray from a representation of the object selector (3) and a bound of a semantic part (5) of the object (2).
10. The method (200) according to any of claims 1 to 7, wherein the selection of the object (2) is further based on a shortest distance from a bound of the representation of the object selector (3) to a bound of a semantic part (5) of the object (2).
11. The method (200) according to any of claims 1 to 7, wherein the selection of the object (2) is further based on weights assigned to the semantic parts (5) of the object (2), and wherein the object associated with the semantic part having the highest weight is chosen in a case where several semantic parts (5) are selected by the object selector (3).
12. A computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause execution of the method steps according to any of claims 1 to 11.
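Regarding the weight-based disambiguation of claim 11, a minimal sketch follows (hypothetical object names, part names and weight values; the claim does not specify how weights are assigned):

```python
def pick_by_weight(hits):
    """hits: (object_name, part_name, weight) for each semantic part currently
    selected by the object selector; return the object whose selected part
    carries the highest weight, or None if nothing is selected."""
    return max(hits, key=lambda h: h[2])[0] if hits else None

# The selector overlaps both the handle of a mug and the body of a glass;
# the handle carries the higher weight, so the mug is chosen.
print(pick_by_weight([("mug", "handle", 0.9), ("glass", "body", 0.4)]))  # mug
```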
PCT/EP2023/067539 2022-07-07 2023-06-27 Method for real time object selection in a virtual environment WO2024008519A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE2250856 2022-07-07
SE2250856-8 2022-07-07

Publications (1)

Publication Number Publication Date
WO2024008519A1 true WO2024008519A1 (en) 2024-01-11

Family

ID=87070821

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/067539 WO2024008519A1 (en) 2022-07-07 2023-06-27 Method for real time object selection in a virtual environment

Country Status (1)

Country Link
WO (1) WO2024008519A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190098005A1 (en) * 2016-08-19 2019-03-28 Tencent Technology (Shenzhen) Company Limited Virtual reality (vr) scene-based authentication method, vr device, and storage medium
US20200226814A1 (en) * 2019-01-11 2020-07-16 Microsoft Technology Licensing, Llc Holographic palm raycasting for targeting virtual objects
US20220111295A1 (en) * 2020-10-09 2022-04-14 Contact Control Interfaces, LLC Virtual object interaction scripts
CN112206515A (en) * 2020-10-21 2021-01-12 深圳市瑞立视多媒体科技有限公司 Game object state switching method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JANKOWSKI J ET AL: "Advances in Interaction with 3D Environments", COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS, WILEY-BLACKWELL, OXFORD, vol. 34, no. 1, 20 October 2014 (2014-10-20), pages 152 - 190, XP071488788, ISSN: 0167-7055, DOI: 10.1111/CGF.12466 *

Similar Documents

Publication Publication Date Title
Wang et al. Real-virtual components interaction for assembly simulation and planning
US9552673B2 (en) Grasping virtual objects in augmented reality
EP3629129A1 (en) Method and apparatus of interactive display based on gesture recognition
KR101318244B1 (en) System and Method for Implemeting 3-Dimensional User Interface
US20120117514A1 (en) Three-Dimensional User Interaction
Tian et al. Realtime hand-object interaction using learned grasp space for virtual environments
Song et al. Grasp planning via hand-object geometric fitting
Kyota et al. Fast grasp synthesis for various shaped objects
Zubrycki et al. Using integrated vision systems: three gears and leap motion, to control a 3-finger dexterous gripper
JP4267508B2 (en) Optimization of ergonomic movement of virtual dummy
JP2021131848A (en) Selection of edge using immersive gesture in 3d modeling
Nasim et al. Physics-based interactive virtual grasping
Rydén et al. A method for constraint-based six degree-of-freedom haptic interaction with streaming point clouds
Raees et al. VEN-3DVE: vision based egocentric navigation for 3D virtual environments
Matthews et al. Shape aware haptic retargeting for accurate hand interactions
Cohen et al. A 3d virtual sketching system using NURBS surfaces and leap motion controller
CN113126748A (en) Selecting vertices with immersive gestures in 3D modeling
WO2024008519A1 (en) Method for real time object selection in a virtual environment
CN108829248B (en) Moving target selection method and system based on user performance model correction
CN112486319B (en) VR (virtual reality) interaction method, device, equipment and medium based on touch rendering equipment
CN104750905B (en) Computer-implemented method for designing a three-dimensional modeled object
Büttner et al. Teaching by demonstrating–how smart assistive systems can learn from users
Chen et al. Active grasping control of virtual-dexterous-robot hand with open inventor
US20230264349A1 (en) Grasp Planning Of Unknown Object For Digital Human Model
Moldovan et al. Motion leap compared to data gloves in human hand tracking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23736273

Country of ref document: EP

Kind code of ref document: A1