CN116820228A - Object selection and virtual grasp generation method and system for virtual reality

Object selection and virtual grasp generation method and system for virtual reality

Info

Publication number
CN116820228A
CN116820228A (application CN202310542655.0A)
Authority
CN
China
Prior art keywords
virtual
hand
probability
user
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310542655.0A
Other languages
Chinese (zh)
Inventor
翁冬冬 (Weng Dongdong)
江海燕 (Jiang Haiyan)
东野啸诺 (Dongye Xiaonuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202310542655.0A
Publication of CN116820228A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Abstract

The invention discloses an object selection and virtual grasp generation method and system for virtual reality. The real hand pose of the user is acquired in real time to control the hand pose of a virtual hand, and a first neural network outputs the probability of the grasp type of the user's gesture. The real wrist pose of the user and the poses of the objects in the virtual environment are tracked, the correlation probabilities of the related objects in the scene are calculated, and the selected virtual object is inferred. A second neural network outputs the position and rotation values of the selected object in the wrist coordinate system. Finally, a third neural network outputs optimized rotation values for the user's hand joints, and the virtual hand is adjusted so that it grasps the virtual object. The system is built on common human knowledge and reasons over the interaction environment and the user's gestures: the user selects a virtual object according to the object's shape and the experience of grasping the corresponding real object, i.e., the gesture used when grasping the real object, so the system can be used immediately without training.

Description

Object selection and virtual grasp generation method and system for virtual reality
Technical Field
The invention relates to the technical field of virtual reality, in particular to an object selection and virtual grasp generation method and system for virtual reality.
Background
When grasping objects in a virtual reality scene, grasp gestures generally require manual calibration: the hand shape must be adjusted during calibration, adjusting one hand shape requires adjusting five fingers, each finger has multiple joints, and the procedure is cumbersome.
The paper VirtualGrasp: Leveraging Experience of Interacting with Physical Objects to Facilitate Digital Object Retrieval provides a gesture-based object selection method in which different objects are selected by user-defined gestures.
In this technique, the user selects different objects by performing grasp gestures, but many objects share the same grasp gesture, so distinct gestures must be defined, which reduces the intuitiveness of the gestures and imposes a large memory burden on the user. Moreover, when the user performs the same gesture, gesture errors mean that only a recommendation of possibly selected objects can be produced, and disambiguation is needed. Further, each selectable object must be placed in one-to-one correspondence with a gesture, which makes it difficult to scale up the set of selectable objects.
The paper Hand Interfaces: Using Hands to Imitate Objects in AR/VR for Expressive Interactions uses the hand to imitate the shape of an object to achieve object selection. Since a large number of objects have similar shapes, this method can only support the selection of a small number of objects.
Therefore, for a virtual reality scene with a large number of virtual objects, efficient virtual object selection and grasping requires avoiding cumbersome gesture definitions while still supporting efficient selection among many different virtual objects, which the prior art cannot achieve.
Disclosure of Invention
In view of the above, the invention provides an object selection and virtual grasp generation method and system for virtual reality, which avoid the gesture design and iterative verification workflow and reduce development cost. In addition, by exploiting virtual environment context knowledge, they can be used to select among a large number of different virtual objects, reducing development difficulty. The system is built on common human knowledge and reasons over the interaction environment and the user's gestures; the user selects the virtual object according to the object's shape and the experience of grasping the corresponding real object, i.e., the gesture used when grasping the real object, so the user can use the system quickly without training.
In order to achieve the above purpose, the technical scheme of the invention comprises the following steps:
step one: rendering a virtual hand in the virtual scene, wherein the virtual hand is an avatar of a real hand of a user in the virtual scene.
The real hand pose of the user is tracked and acquired in real time and used to control the hand pose of the virtual hand in real time. The hand pose of the virtual hand comprises the positions J of the hand key points of the virtual hand relative to its wrist and the hand joint rotation values θ of the virtual hand. The hand joint rotation values θ are input into a pre-trained first neural network, which outputs the probability P_grasptype of the grasp type of the current user gesture.
Step two: tracking the wrist pose of the user's real hand in real time and controlling the virtual wrist pose in real time; tracking the poses of the related virtual objects in the virtual environment in real time, calculating the Euclidean distances between the related virtual objects and the wrist of the virtual hand in real time, and then calculating the correlation probability P_relative of the related objects in the scene.
Step three: combining the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related virtual objects in the scene, the virtual object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, the probability of each virtual object being selected is determined, and the selected virtual object is inferred from it.
Step four: taking the hand pose of the current virtual hand and the pose of the selected virtual object as the input of a pre-trained second neural network, outputting the position and rotation values of the selected virtual object in the wrist coordinate system of the virtual hand, and rendering the virtual object based on them; the point cloud pose O' and the point cloud normal vectors n' of the selected virtual object are updated according to the output of the second neural network.
Step five: taking the hand joint rotation values θ of the current virtual hand, the updated point cloud normal vectors n' of the selected object, the distances D between the sampling points on the surface mesh of the virtual hand model and the object point cloud, and the three-dimensional coordinates v of the sampling points on the surface mesh of the virtual hand model as the input of a pre-trained third neural network, the optimized hand joint rotation values of the virtual hand are output, such that the contact area between the virtual hand and the selected object is larger than before optimization and the distance by which the virtual hand penetrates into the selected virtual object is smaller than before optimization.
Finally, the virtual hand is adjusted according to the optimized hand joint rotation values, so that the virtual hand grasps the virtual object.
Further, the wrist coordinate system of the virtual hand takes the wrist of the virtual hand as the origin, the direction parallel to the four fingers and the palm as the y axis, the direction perpendicular to the four fingers and parallel to the palm as the x axis, and the direction perpendicular to both the x and y axes, i.e., perpendicular to the palm, as the z axis.
Further, the pre-trained first neural network specifically comprises:
the network structure for constructing the first neural network comprises five layers of full-connection layers, a residual error module and a logistic regression layer; the first to fifth full-connection layers are sequentially connected, a residual error module is arranged between the second layer and the third layer, a residual error module is arranged between the fourth layer and the fifth layer, and the fifth full-connection layer is connected with the logistic regression layer.
A first training sample set is constructed, taking the hand joint rotation values θ of the virtual hand as input and the probabilities of the different grasp types of the virtual hand as output.
The first neural network is trained on the first training samples to obtain the pre-trained first neural network.
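As an illustration, a minimal PyTorch sketch of such a network is given below; the hidden width, the 45-dimensional joint-rotation input and the activation choices are assumptions, since only the layer arrangement is specified above (five fully connected layers, residual modules after the second and fourth layers, a logistic-regression output over the grasp types, of which the embodiment uses 33):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Identity-skip module placed between fully connected layers."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.fc(x) + x)

class GraspTypeNet(nn.Module):
    """Five fully connected layers with residual modules after FC2 and
    FC4, followed by a softmax (logistic regression) layer."""
    def __init__(self, n_joint_rot=45, n_grasp_types=33, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_joint_rot, hidden), nn.ReLU(),  # FC1
            nn.Linear(hidden, hidden), nn.ReLU(),       # FC2
            ResidualBlock(hidden),                      # between FC2 and FC3
            nn.Linear(hidden, hidden), nn.ReLU(),       # FC3
            nn.Linear(hidden, hidden), nn.ReLU(),       # FC4
            ResidualBlock(hidden),                      # between FC4 and FC5
            nn.Linear(hidden, n_grasp_types),           # FC5
        )
        self.head = nn.Softmax(dim=-1)

    def forward(self, theta):
        # theta: (batch, n_joint_rot) hand joint rotation values
        return self.head(self.backbone(theta))  # P_grasptype per grasp type

# One tracked hand pose -> probability over the 33 grasp types
p_grasptype = GraspTypeNet()(torch.randn(1, 45))
```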
Further, in step two, the correlation probability P_relative of the related objects in the scene is calculated as follows: P_relative equals the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object. The correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist.
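Written out as a small function, the rule reads as below; the fixed correlation table, object names and coordinates are illustrative placeholders, not values prescribed by the patent:

```python
import numpy as np

def p_relative(candidate, related_objects, fixed_corr, wrist_pos):
    """Product over related objects of
    (fixed correlation probability) * (1 / Euclidean distance to wrist)."""
    p = 1.0
    for name, pos in related_objects.items():
        dist = float(np.linalg.norm(np.asarray(pos) - np.asarray(wrist_pos)))
        p *= fixed_corr[(candidate, name)] / dist
    return p

# Worked example from the embodiments: milk 2 m from the wrist,
# fixed correlation 0.9 with the cup -> P_relative(cup) = 0.45
print(p_relative("cup", {"milk": (2.0, 0.0, 0.0)},
                 {("cup", "milk"): 0.9}, (0.0, 0.0, 0.0)))
```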
Further, step three: combining the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related virtual objects in the scene, the virtual object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, determining the probability of each virtual object being selected, and inferring the selected virtual object, specifically:
the related virtual object in the virtual environment refers to a virtual object having a related relationship with the selected virtual object.
Wherein, when the i-th grasp type is performed, the probability that each virtual object is selected is: P_i = P_scene × P_relative × P_like × P_grasp_i.
P_scene is the probability that the current virtual object exists in the specific virtual scene. P_like is the user preference probability, equal to the probability that the user favors the current virtual object; its initial value is 1/k, where k is the number of all selectable virtual objects, it grows with the number of times the user selects the current virtual object, and the preference rates of all virtual objects are then obtained by normalization. P_grasp_i is the probability of the user grasping the current virtual object when performing the gesture of the i-th grasp type: P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the i-th grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the i-th grasp type, obtained by statistics over the dataset.
The probability P_i that each virtual object is selected when the user performs the i-th grasp type is calculated accordingly.
The selected virtual object is obtained with the following inference process: whether the user performs a grasp gesture is determined using a persistence technique. When the user maintains the hand pose and wrist position for 1 s, the user is considered to be using the persistence technique and a grasp gesture is recognized. If the user does not continue using the persistence technique, the virtual object with the highest selection probability is selected by default; if the user continues using the persistence technique, the virtual objects with the second to fifth highest probabilities are offered, the selection switching to the next candidate in probability order after each further 3 s hold of the hand pose and wrist position. When the user's wrist position changes while the hand pose remains unchanged, the current virtual object is confirmed as the selected virtual object.
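A condensed sketch of this combination and inference logic is given below; the per-object probability inputs are assumed to be available, and the dwell thresholds (1 s to trigger, 3 s per switch, candidates limited to the top five) follow the description above:

```python
def selection_probabilities(candidates, p_scene, p_relative, p_like,
                            p_grasptype_i, p_grasp_i_object):
    """P_i = P_scene * P_relative * P_like * P_grasp_i per candidate,
    with P_grasp_i = P_grasptype_i * P_grasp_i_object."""
    return {obj: p_scene[obj] * p_relative[obj] * p_like[obj]
                 * p_grasptype_i * p_grasp_i_object[obj]
            for obj in candidates}

def infer_selected(ranked, hold_seconds):
    """ranked: candidate objects sorted by descending P_i.
    A 1 s hold triggers the grasp gesture; each further 3 s hold
    switches to the next of the second to fifth most probable objects."""
    if hold_seconds < 1.0:
        return None                            # no grasp gesture yet
    switches = int((hold_seconds - 1.0) // 3.0)
    return ranked[min(switches, 4)]            # top-1 by default
```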
Further, the pre-trained second neural network is specifically:
the network structure of the second neural network is constructed and comprises four input full-connection layers connected in parallel, three densely connected blocks connected in series and one output full-connection layer.
The method comprises the steps that a current virtual hand pose and a selected virtual object pose are used as inputs, wherein the current virtual hand pose comprises a position J of a hand key point of a virtual hand relative to a wrist of the virtual hand and a hand joint rotation value theta of the virtual hand; the pose of the selected virtual object comprises a point cloud pose O of the selected virtual object and a normal vector n of the point cloud of the selected virtual object; the four input full connection layers respectively receive the following four inputs: the virtual hand pose θ, the finger joint position J of the virtual hand, the point cloud pose O of the selected virtual object, and the normal vector n of the point cloud of the selected virtual object.
The output ends of the four input full-connection layers are connected to the dense connecting blocks, the three dense connecting blocks are connected in series, and the last output end is connected with the output full-connection layer.
The output of the output full-connection layer is the position and rotation of the selected virtual object in the wrist coordinate system of the virtual hand.
Setting a second training sample, taking the pose of the virtual hand and the position of the joints of the virtual finger, selecting the pose of the point cloud of the virtual object and the normal vector of the point cloud as input, and taking the position and rotation of the virtual object under the coordinate system of the virtual wrist as output.
And training the second neural network by adopting a second training sample to obtain a pre-trained second neural network.
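A minimal PyTorch sketch of this architecture is shown below; the embedding width, dense-block growth rate, 1000-point cloud size and a 7-value output (3 position values plus a 4-value quaternion rotation) are assumptions, since only the layer arrangement is specified:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected block: concatenates its input with new features."""
    def __init__(self, in_dim, growth):
        super().__init__()
        self.fc = nn.Linear(in_dim, growth)
        self.act = nn.ReLU()
        self.out_dim = in_dim + growth

    def forward(self, x):
        return torch.cat([x, self.act(self.fc(x))], dim=-1)

class ObjectPoseNet(nn.Module):
    """Four parallel input FC layers -> three DenseBlocks in series ->
    one output FC layer predicting the object pose in the wrist frame."""
    def __init__(self, d_theta=45, d_joints=60, n_points=1000,
                 embed=128, growth=128):
        super().__init__()
        self.in_theta = nn.Linear(d_theta, embed)       # joint rotations θ
        self.in_joints = nn.Linear(d_joints, embed)     # key point positions J
        self.in_cloud = nn.Linear(3 * n_points, embed)  # point cloud pose O
        self.in_norm = nn.Linear(3 * n_points, embed)   # point normals n
        dims = 4 * embed
        self.blocks = nn.ModuleList()
        for _ in range(3):
            blk = DenseBlock(dims, growth)
            self.blocks.append(blk)
            dims = blk.out_dim
        self.out = nn.Linear(dims, 7)  # 3 position + 4 rotation values

    def forward(self, theta, joints, cloud, normals):
        x = torch.cat([self.in_theta(theta), self.in_joints(joints),
                       self.in_cloud(cloud.flatten(1)),
                       self.in_norm(normals.flatten(1))], dim=-1)
        for blk in self.blocks:
            x = blk(x)
        return self.out(x)

# One hand pose plus a 1000-point object cloud -> (1, 7) pose output
pose = ObjectPoseNet()(torch.randn(1, 45), torch.randn(1, 60),
                       torch.randn(1, 1000, 3), torch.randn(1, 1000, 3))
```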
Further, the pre-trained third neural network specifically includes:
the network structure of the third neural network is constructed and comprises five input full-connection layers connected in parallel, three densely connected blocks connected in series and one output full-connection layer.
The five input full-connection layers respectively and correspondingly receive the following five inputs: the current virtual hand joint rotation value theta, the updated selected virtual object point cloud normal vector n 'of the updated selected virtual object point cloud pose O', the distance D between the sampling point on the virtual hand model surface mesh and the virtual object point cloud, and the three-dimensional coordinate v of the sampling point on the virtual hand model surface mesh.
The output ends of the four input full-connection layers are connected to the dense connecting blocks, the three dense connecting blocks are connected in series, and the last output end is connected with the output full-connection layer.
Outputting the output of the full-connection layer as an optimized virtual hand joint rotation value; the rotation value of the joints of the virtual hand is optimized, so that the contact area of the virtual hand and the selected virtual object is larger than that before optimization, and the Euclidean distance of the virtual hand immersed in the selected virtual object is smaller than that before optimization.
And training the third neural network by adopting a third training sample to obtain a pre-trained third neural network.
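A matching sketch of the third network follows, reusing the DenseBlock from the previous sketch; the input dimensions, including 778 mesh sampling points, are assumptions:

```python
# Assumes torch, nn and DenseBlock from the previous sketch are in scope.
class GraspRefineNet(nn.Module):
    """Five parallel input FC layers -> three DenseBlocks in series ->
    one output FC layer returning the optimized joint rotation values."""
    def __init__(self, d_theta=45, n_points=1000, n_samples=778,
                 embed=128, growth=128):
        super().__init__()
        in_dims = (d_theta,          # θ: current joint rotations
                   3 * n_points,     # O': updated object point cloud
                   3 * n_points,     # n': updated point cloud normals
                   n_samples,        # D: hand-mesh sample-to-cloud distances
                   3 * n_samples)    # v: hand-mesh sample coordinates
        self.ins = nn.ModuleList([nn.Linear(d, embed) for d in in_dims])
        dims = 5 * embed
        self.blocks = nn.ModuleList()
        for _ in range(3):
            blk = DenseBlock(dims, growth)
            self.blocks.append(blk)
            dims = blk.out_dim
        self.out = nn.Linear(dims, d_theta)

    def forward(self, theta, cloud_upd, norm_upd, dist, verts):
        inputs = (theta, cloud_upd, norm_upd, dist, verts)
        x = torch.cat([f(t.flatten(1)) for f, t in zip(self.ins, inputs)],
                      dim=-1)
        for blk in self.blocks:
            x = blk(x)
        return self.out(x)  # optimized θ

refined = GraspRefineNet()(torch.randn(1, 45), torch.randn(1, 1000, 3),
                           torch.randn(1, 1000, 3), torch.randn(1, 778),
                           torch.randn(1, 778, 3))  # (1, 45)
```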
The invention also provides an object selection and virtual grasp generation system for virtual reality, which comprises a hand pose estimation module, a wrist pose tracking module, a virtual environment information extraction module, a grasp type estimation module, an environment context information extraction module, an object selection module, a grasp gesture prediction module, an object grasping module and a virtual environment rendering module.
The hand pose estimation module is used for tracking and acquiring the real hand pose information of the user in real time and controlling the hand pose of the virtual hand in real time, wherein the hand pose of the virtual hand comprises the positions J of the hand key points relative to the wrist and the hand joint rotation values θ of the virtual hand; the hand joint rotation values θ are fed into the grasp type estimation module.
The grasp type estimation module takes the hand joint rotation values θ of the virtual hand as the input of the pre-trained first neural network and outputs the probability of the grasp type of the current user gesture.
The wrist pose tracking module is used for tracking the wrist pose of the user's real hand in real time and controlling the virtual wrist pose in real time.
The virtual environment information extraction module is used for extracting the pose of the related virtual object in the virtual environment;
the environment context information extraction module is used for calculating Euclidean distance between the relevant virtual object and the wrist of the virtual hand in real time according to the wrist pose of the virtual hand and the pose of the relevant virtual object in the virtual environment, and then calculating the relevant probability P of the relevant virtual object in the scene relative
P_relative is equal to the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object; the correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist.
An object selection module for combining the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related objects in the scene, the object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, determining the probability of each object being selected, and further inferring the selected object.
The grasp gesture prediction module is used for taking the hand pose of the current virtual hand and the pose of the selected virtual object as the input of the pre-trained second neural network and outputting the position and rotation values of the selected virtual object in the wrist coordinate system of the virtual hand; the point cloud pose O' and the point cloud normal vectors n' of the selected virtual object are updated according to the output of the second neural network.
The object grasping module takes the hand joint rotation values θ of the current virtual hand, the updated point cloud normal vectors n' of the selected object, the distances D between the sampling points on the surface mesh of the virtual hand model and the object point cloud, and the three-dimensional coordinates v of the sampling points on the surface mesh of the virtual hand model as the input of the pre-trained third neural network, and outputs the optimized hand joint rotation values of the virtual hand, which make the contact area between the virtual hand and the selected object larger than before optimization.
Finally, the virtual hand is adjusted according to the optimized hand joint rotation values, so that the virtual hand grasps the virtual object.
Further, the object selection module combines the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related objects in the scene, the object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, determines the probability of each object being selected, and infers the selected object, specifically:
the related virtual object in the virtual environment refers to a virtual object having a related relationship with the selected virtual object.
Wherein, when the i-th grasp type is performed, the probability that each virtual object is selected is: P_i = P_scene × P_relative × P_like × P_grasp_i.
P_scene is the probability that the current virtual object exists in the specific virtual scene. P_like is the user preference probability, equal to the probability that the user favors the current virtual object; its initial value is 1/k, where k is the number of all selectable virtual objects, it grows with the number of times the user selects the current virtual object, and the preference rates of all virtual objects are then obtained by normalization. P_grasp_i is the probability of the user grasping the current virtual object when performing the gesture of the i-th grasp type: P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the i-th grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the i-th grasp type, obtained by statistics over the dataset.
The probability P_i that each virtual object is selected when the user performs the i-th grasp type is calculated accordingly.
The selected virtual object is obtained with the following inference process: whether the user performs a grasp gesture is determined using a persistence technique. When the user maintains the hand pose and wrist position for 1 s, the user is considered to be using the persistence technique and a grasp gesture is recognized. If the user does not continue using the persistence technique, the virtual object with the highest selection probability is selected by default; if the user continues using the persistence technique, the virtual objects with the second to fifth highest probabilities are offered, the selection switching to the next candidate in probability order after each further 3 s hold of the hand pose and wrist position. When the user's wrist position changes while the hand pose remains unchanged, the current virtual object is confirmed as the selected virtual object.
The beneficial effects are that:
1: The invention provides an object selection and virtual grasp generation method and system for virtual reality in which the grasp gestures are common-sense gestures obtained from big data, so the user does not need to learn them. From the information of the virtual environment the user is in, the system computes, for the grasp gesture the user is currently performing, the probability of every candidate object, transferring the user's real-life experience to object selection in the virtual environment. One gesture can thus select different objects in different environments, and several identical objects can be selected with different gestures in the same environment, breaking the limitation of the one-to-one correspondence mechanism of current gesture-based selection and realizing common-sense-based selection of virtual objects. The proposed method and system help developers accelerate the development process, avoid the gesture design and iterative verification workflow, and reduce development cost; in addition, because they are developed using virtual environment context knowledge, they can be used for selecting a large number of different virtual objects, reducing development difficulty. Since the system is built on common human knowledge, users can use it quickly without training.
2: The invention provides an object selection and virtual grasp generation method and system for virtual reality based on common human knowledge and environmental context. It can select objects that the hand cannot touch directly in the virtual environment, including objects beyond arm's reach and objects in a virtual repository of the virtual environment. Since the grasp gestures are extracted from common human knowledge, problems such as a large gesture memory burden and unintuitive gestures caused by user- or developer-defined gestures are avoided, and the user can select objects without training; the environmental context is used to resolve the ambiguity of selecting among a large number of objects. In addition, the scene-based object selection data structure breaks the one-to-one correspondence limitation of existing gesture selection: with real-time grasp gesture and wrist position tracking, the user can, based on the environment context and everyday experience, select the same object with different grasp gestures and select different objects with the same grasp gesture.
Drawings
FIG. 1 is a flowchart of the virtual object selection method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a hand joint;
FIG. 3 is a schematic view of the 33 grasp types;
FIG. 4 is a block diagram of a first pre-trained neural network;
FIG. 5 is a block diagram of a second pre-trained neural network;
FIG. 6 is a block diagram of a third pre-trained neural network;
fig. 7 is a block diagram of a hardware system configuration.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides an object selection and virtual grasp generation method for virtual reality, the flow of which is shown in FIG. 1 and which comprises the following steps:
step (a)And (3) a step of: rendering a virtual hand in the virtual scene, wherein the virtual hand is an virtual image of a real hand of a user in the virtual scene; real hand pose of a user is tracked and obtained in real time, the hand pose of the virtual hand is controlled in real time, the real-time tracking and obtaining the real hand pose of the user, the real-time tracking and obtaining the real hand pose of the virtual hand comprises a position J of a key point of the hand of the virtual hand relative to the wrist of the virtual hand and a hand joint rotation value theta of the virtual hand, the hand joint rotation value theta of the virtual hand is input into a pre-trained first neural network, and probability P of the gripping type of the gesture of the current user is output grasptype
The hand key points of the virtual hand are the 20 points of the hand joints and fingertips (as shown in FIG. 2). The hand pose can be predicted from RGB images by a deep learning method. The hand pose of the user refers to the six-degree-of-freedom poses of the 20 hand key points in the wrist coordinate system, comprising translation and rotation.
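As a sketch, the tracked quantities can be held in a simple container; treating θ as per-joint axis-angle values is an assumption, since the description only calls them rotation values:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HandPose:
    """Virtual-hand state in the wrist coordinate system.
    positions: (20, 3) key point positions J (joints and fingertips);
    rotations: (20, 3) per-joint rotation values θ."""
    positions: np.ndarray  # J
    rotations: np.ndarray  # θ
```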
There are 33 grasp types, as shown in FIG. 3.
In the embodiment of the present invention, the structure of the pre-trained first neural network is shown in fig. 4, which specifically includes:
the network structure for constructing the first neural network comprises five layers of full-connection layers, a residual error module and a logistic regression layer; wherein the first to fifth full-connection layers are sequentially connected, a residual error module is arranged between the second layer and the third layer, a residual error module is arranged between the fourth layer and the fifth layer, and the fifth full-connection layer is connected with the logistic regression layer;
setting a first training sample, taking a hand joint rotation value theta of a virtual hand as input, and taking probabilities of different grasp types of the virtual hand as output;
and training the first neural network by adopting a first training sample to obtain a pre-trained first neural network.
Step two: tracking the wrist pose of the user's real hand in real time and controlling the virtual wrist pose in real time; tracking the poses of the related virtual objects in the virtual environment in real time, calculating the Euclidean distances between the related virtual objects and the wrist of the virtual hand in real time, and then calculating the correlation probability P_relative of the related objects in the scene.
Calculating the correlation probability P_relative of the related objects in the scene specifically comprises:
preplayer is equal to the product of the correlation probabilities of all the correlated virtual objects in the scene that are correlated to the selected virtual object. The correlation probability for each correlated virtual object is equal to the fixed correlation probability for the correlated virtual object and the currently selected virtual object multiplied by the inverse of the Euclidean distance of the correlated virtual object from the virtual wrist.
Step three: combining the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related virtual objects in the scene, the virtual object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, determining the probability of each virtual object being selected, and inferring the selected virtual object, specifically:
the related virtual objects of the virtual environment are virtual objects, such as televisions and cooking tops, which have correlation with the virtual objects to be selected in the virtual environment, such as a remote controller and a television commonly appear together, the probability of the fixed correlation of the remote controller and the television is high, and the probability of the fixed correlation of the remote controller and the cooking tops is low.
Wherein, when the i-th grasp type is performed, the probability that each virtual object is selected is: P_i = P_scene × P_relative × P_like × P_grasp_i.
P_scene is the probability that the current virtual object exists in the specific virtual scene; it can be obtained by statistics over a large amount of scene data. For example, the probability that a kitchen knife appears in a kitchen is higher than the probability that it appears in a living room.
P_relative is equal to the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object. The correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist. For example, when the related object milk is present in the virtual environment, the fixed correlation probability between milk and the candidate cup is 0.9 and between milk and the candidate knife is 0.1; if the distance between the milk and the wrist is 2 m, then P_relative(cup) = 0.9 × 1/2 = 0.45 and P_relative(knife) = 0.1 × 1/2 = 0.05.
P_like is the user preference probability, meaning that virtual objects the user favors have a higher probability of being selected; it equals the probability that the user favors the current virtual object. Its initial value is 1/k, where k is the number of all selectable virtual objects; it grows with the number of times the user selects the current virtual object, and the preference rates of all virtual objects are then obtained by normalization.
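A sketch of this preference update follows; adding the raw selection count to the 1/k prior before renormalizing is one plausible reading of the rule, not a formula given in the patent:

```python
def update_preferences(select_counts, k):
    """P_like starts at 1/k per selectable object, grows with the number
    of times the object has been selected, then is renormalized."""
    raw = {obj: 1.0 / k + n for obj, n in select_counts.items()}
    total = sum(raw.values())
    return {obj: v / total for obj, v in raw.items()}

# Three selectable objects, the cup chosen twice so far
print(update_preferences({"cup": 2, "knife": 0, "plate": 0}, k=3))
```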
P_grasp_i is the probability of the user grasping the current virtual object when performing the gesture of the i-th grasp type;
P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the i-th grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the i-th grasp type, obtained by statistics over the dataset.
The probability P_i that each virtual object is selected when the user performs the i-th grasp type is calculated;
the selected virtual object is obtained by adopting the following reasoning process: determining whether the user performs the grip gesture using the persistence technique, considering that the user uses the persistence technique when the user maintains the hand pose and the wrist position for 1s, determining that the user performs the grip gesture, and if the user does not continue using the persistence technique, selecting a virtual object having the highest selected probability by default; if the user continues to use the persistence technique, selecting virtual objects with second to fifth probabilities, switching the selected virtual objects according to probabilities in sequence after each time the hand pose and the wrist position last for 3 seconds, and determining the current virtual object as the selected virtual object after the hand pose and the wrist position of the user change but the hand pose remains unchanged.
Step four: taking the hand pose of the current virtual hand and the pose of the selected virtual object as the input of the pre-trained second neural network, the position and rotation values of the selected virtual object in the wrist coordinate system of the virtual hand are output; the point cloud pose O' and the point cloud normal vectors n' of the selected virtual object are updated according to the output of the second neural network. The wrist coordinate system of the virtual hand takes the wrist of the virtual hand as the origin, the direction parallel to the four fingers and the palm as the y axis, the direction perpendicular to the four fingers and parallel to the palm as the x axis, and the direction perpendicular to both the x and y axes, i.e., perpendicular to the palm, as the z axis.
The pretrained second neural network specifically comprises:
the network structure of the second neural network is constructed, as shown in fig. 5, and comprises four input full-connection layers connected in parallel, three densely connected blocks connected in series and one output full-connection layer;
the method comprises the steps that a current virtual hand pose and a selected virtual object pose are used as inputs, wherein the current virtual hand pose comprises a position J of a hand key point of a virtual hand relative to a wrist of the virtual hand and a hand joint rotation value theta of the virtual hand; the pose of the selected virtual object comprises a point cloud pose O of the selected virtual object and a normal vector n of the point cloud of the selected virtual object; the four input full connection layers respectively receive the following four inputs: the method comprises the steps of a virtual hand pose theta, a finger joint position J of a virtual hand, a point cloud pose O of a selected virtual object and a normal vector n of the point cloud of the selected virtual object;
The output ends of the four input full-connection layers are connected to the dense connecting blocks, the three dense connecting blocks are connected in series, and the last output end is connected with the output full-connection layer;
outputting the position and rotation of the selected virtual object under the wrist coordinate system of the virtual hand by the output full-connection layer;
setting a second training sample, taking the pose of the virtual hand and the position of the joint of the virtual finger, selecting the pose of the point cloud of the virtual object and the normal vector of the point cloud as input, and taking the position and rotation of the selected virtual object under the coordinate system of the virtual wrist as output;
and training the second neural network by adopting a second training sample to obtain a pre-trained second neural network.
Step five: taking the hand joint rotation values θ of the current virtual hand, the updated point cloud pose O' and updated point cloud normal vectors n' of the selected object, the distances D between the sampling points on the surface mesh of the virtual hand model and the object point cloud, and the three-dimensional coordinates v of the sampling points on the surface mesh of the virtual hand model as the input of the pre-trained third neural network, the optimized hand joint rotation values of the virtual hand are output; the optimized hand joint rotation values make the contact area between the virtual hand and the selected object larger than before optimization;
Finally, the virtual hand is adjusted according to the optimized hand joint rotation values, so that the virtual hand grasps the virtual object.
The pre-trained third neural network specifically comprises the following components:
the network structure of the third neural network is constructed, as shown in fig. 6, and comprises five input full-connection layers connected in parallel, three dense connection blocks connected in series and one output full-connection layer;
the five input full-connection layers respectively and correspondingly receive the following five inputs: the method comprises the steps that a current virtual hand joint rotation value theta, an updated selected virtual object point cloud normal vector n 'of an updated selected virtual object point cloud pose O', a distance D between a sampling point on a virtual hand model surface mesh and a virtual object point cloud, and a three-dimensional coordinate v of the sampling point on the virtual hand model surface mesh are carried out;
the output ends of the four input full-connection layers are connected to the dense connecting blocks, the three dense connecting blocks are connected in series, and the last output end is connected with the output full-connection layer;
outputting the output of the full-connection layer as an optimized virtual hand joint rotation value; the optimized virtual hand joint rotation value enables the contact area of the virtual hand and the selected virtual object to be larger than before optimization, and the Euclidean distance of the virtual hand immersed in the selected virtual object is smaller than before optimization;
And training the third neural network by adopting a third training sample to obtain a pre-trained third neural network.
The invention also provides an object selection and virtual grasp generation system for virtual reality, comprising a hand pose estimation module, a wrist pose tracking module, a virtual environment information extraction module, a grasp type estimation module, an environment context information extraction module, an object selection module, a grasp gesture prediction module, an object grasping module and a virtual environment rendering module;
the hand gesture estimation module is used for tracking and acquiring real hand gesture information of a user in real time and controlling the hand gesture of the virtual hand in real time, wherein the hand gesture information of the virtual hand comprises a position J of a hand key point of the virtual hand relative to a wrist and a hand joint rotation value theta of the virtual hand; the hand joint rotation value θ of the virtual hand is fed into the grasp type estimation module.
The grasp type estimation module takes the hand joint rotation values θ of the virtual hand as the input of the pre-trained first neural network and outputs the probability of the grasp type of the current user gesture;
the wrist pose tracking module is used for tracking the wrist pose of the user in real time;
the virtual environment information extraction module is used for extracting the pose of the related virtual object in the virtual environment;
The environment context information extraction module is used for calculating in real time the Euclidean distances between the related virtual objects and the wrist of the virtual hand, based on the wrist pose of the virtual hand and the poses of the related virtual objects in the virtual environment, and then calculating the correlation probability P_relative of the related virtual objects in the scene.
P_relative is equal to the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object; the correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist;
an object selection module for combining the probability P that the virtual object exists in the scene scene Probability of grip type, in sceneRelated probability of related object, object selection probability P when user performs gesture of specific grabbing type grasp And user preference probability P like Determining the probability of each object being selected, and further reasoning to obtain the selected object;
wherein, the related virtual objects of the virtual environment are objects which have relativity with the virtual objects to be selected in the virtual environment, such as televisions and cooking tops, for example, a remote controller and a television usually appear together, the probability of the fixed relativity of the remote controller and the television is high, and the probability of the fixed relativity of the remote controller and the cooking tops is low;
Wherein, when the i-th grasp type is performed, the probability that each object is selected is:
P_i = P_scene × P_relative × P_like × P_grasp_i
P_scene is the probability that the current virtual object exists in the specific virtual scene; it can be obtained by statistics over a large amount of scene data. For example, the probability that a kitchen knife appears in a kitchen is 1, while the probability that it appears in a living room is 0.6.
P_relative is equal to the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object; the correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist. For example, when the related object milk is present in the virtual environment, the fixed correlation probability between milk and the candidate cup is 0.9 and between milk and the candidate knife is 0.1; if the distance between the milk and the wrist is 2 m, then P_relative(cup) = 0.9 × 1/2 = 0.45 and P_relative(knife) = 0.1 × 1/2 = 0.05.
P_like is the user preference probability, meaning that virtual objects the user favors have a higher probability of being selected; it equals the probability that the user favors the current virtual object. Its initial value is 1/k, where k is the number of all selectable virtual objects; it grows with the number of times the user selects the current virtual object, and the preference rates of all virtual objects are then obtained by normalization.
P_grasp_i is the probability of the user grasping the current virtual object when performing the gesture of the i-th grasp type; P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the i-th grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the i-th grasp type, obtained by statistics over the dataset.
The probability P_i that each virtual object is selected when the user performs the i-th grasp type is calculated. The selected virtual object is obtained with the following inference process: whether the user performs a grasp gesture is determined using a persistence technique. When the user maintains the hand pose and wrist position for 1 s, the user is considered to be using the persistence technique and a grasp gesture is recognized. If the user does not continue using the persistence technique, the virtual object with the highest selection probability is selected by default; if the user continues using the persistence technique, the virtual objects with the second to fifth highest probabilities are offered, the selection switching to the next candidate in probability order after each further 3 s hold of the hand pose and wrist position. When the user's wrist position changes while the hand pose remains unchanged, the current virtual object is confirmed as the selected virtual object.
The grasp gesture prediction module is used for taking the hand pose of the current virtual hand and the pose of the selected virtual object as the input of the pre-trained second neural network and outputting the position and rotation values of the selected virtual object in the wrist coordinate system of the virtual hand; the point cloud pose O' and the point cloud normal vectors n' of the selected virtual object are updated according to the output of the second neural network;
the object grasping module takes a hand joint rotation value theta of a current virtual hand, a point cloud normal vector n' of the selected object after updating, a distance D between a sampling point on a hand model surface mesh of the virtual hand and an object point cloud, and a three-dimensional coordinate v of the sampling point on the hand model surface mesh of the virtual hand as input of a pre-trained third neural network, outputs an optimized hand joint rotation value of the virtual hand, and enables the contact area of the virtual hand and the selected object to be larger than that before the optimization by the optimized hand joint rotation value of the virtual hand.
Finally, the virtual hand is adjusted according to the optimized hand joint rotation values, so that the virtual hand grasps the virtual object.
The pre-trained first to third neural networks are the same as those in the aforementioned object selection and virtual grasp generation method for virtual reality.
The hardware system of the invention is shown in fig. 7, and comprises an RGB camera, a wrist pose tracker, a selection module processor, a virtual environment processor, a head-mounted display, storage hardware and communication hardware.
The RGB camera is used for acquiring images of the user's hand in real time and can be fixed on the head-mounted display.
The wrist pose tracker is used for tracking the user's wrist pose in real time and can be implemented with a commercial tracker, such as an HTC VIVE tracker.
The selection module processor is used for computing the virtual object currently selected by the user from the available information; it may be located on the head-mounted display, in the cloud, or on a local processor.
The virtual environment processor processes the information of the virtual environment where the user is located; it may likewise be located on the head-mounted display, in the cloud, or on a local processor, and may be the same processor as the selection module processor or a separate one.
The head mounted display is used for displaying the current virtual environment.
The storage hardware is used for storing the virtual environment information and all digital information produced by the other hardware, including information on the selected objects; it can be located in the cloud or locally.
The communication hardware is used for information transfer between the individual modules and can be wireless or wired.
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An object selection and virtual grasp generation method for virtual reality is characterized by comprising the following steps:
step one: rendering a virtual hand in a virtual scene, wherein the virtual hand is an avatar of a real hand of a user in the virtual scene;
real hand pose of a user is tracked and obtained in real time, the hand pose of the virtual hand is controlled in real time, the real-time tracking and obtaining the real hand pose of the user, the real-time tracking and obtaining the real hand pose of the virtual hand comprises a position J of a key point of the hand of the virtual hand relative to the wrist of the virtual hand and a hand joint rotation value theta of the virtual hand, the hand joint rotation value theta of the virtual hand is input into a pre-trained first neural network, and probability P of the gripping type of the gesture of the current user is output grasptype
Step two: tracking the wrist pose of the user's real hand in real time and controlling the virtual wrist pose in real time; tracking the poses of the related virtual objects in the virtual environment in real time, calculating the Euclidean distances between the related virtual objects and the wrist of the virtual hand in real time, and then calculating the correlation probability P_relative of the related objects in the scene;
Step three: combining the probability P_scene that a virtual object exists in the scene, the grasp-type probability, the correlation probabilities of the related virtual objects in the scene, the virtual object selection probability P_grasp when the user performs a gesture of a specific grasp type, and the user preference probability P_like, determining the probability of each virtual object being selected, and inferring the selected virtual object;
step four: based on the hand pose of the current virtual hand and the pose of the selected virtual object as the input of the pre-trained second neural network, outputting the position and the rotation value of the selected virtual object under the wrist coordinate system of the virtual hand, and rendering the virtual object based on the position and the rotation value; updating the point cloud pose O 'and the point cloud normal vector n' of the selected virtual object according to the output of the second neural network;
step five: the method comprises the steps of taking a hand joint rotation value theta of a current virtual hand, an updated selected object point cloud normal vector n 'of an updated selected object point cloud pose O', a distance D between a sampling point on a hand model surface mesh of the virtual hand and an object point cloud, and a three-dimensional coordinate v of the sampling point on the hand model surface mesh of the virtual hand as input of a pre-trained third neural network, outputting an optimized hand joint rotation value of the virtual hand, wherein the optimized hand joint rotation value of the virtual hand is used for enabling the contact area of the virtual hand and the selected object to be larger than before optimization, and the distance of the virtual hand immersed in the selected virtual object to be smaller than before optimization;
Finally, the virtual hand is adjusted according to the optimized hand joint rotation values, so that the virtual hand grasps the virtual object.
2. The object selection and virtual grasp generation method for virtual reality according to claim 1, wherein the wrist coordinate system of the virtual hand takes the wrist of the virtual hand as the origin, the direction parallel to the four fingers and the palm as the y axis, the direction perpendicular to the four fingers and parallel to the palm as the x axis, and the direction perpendicular to both the x and y axes, i.e., perpendicular to the palm, as the z axis.
3. The object selection and virtual grasp generation method for virtual reality according to claim 2, wherein the pre-trained first neural network is specifically:
the network structure for constructing the first neural network comprises five layers of full-connection layers, a residual error module and a logistic regression layer; wherein the first to fifth full-connection layers are sequentially connected, a residual error module is arranged between the second layer and the third layer, a residual error module is arranged between the fourth layer and the fifth layer, and the fifth full-connection layer is connected with the logistic regression layer;
setting a first training sample, taking a hand joint rotation value theta of a virtual hand as input, and taking probabilities of different grasp types of the virtual hand as output;
The first neural network is trained on the first training samples to obtain the pre-trained first neural network.
4. The object selection and virtual grasp generation method for virtual reality according to any one of claims 1-3, wherein in step two, calculating the correlation probability P_relative of the related objects in the scene specifically comprises:
P_relative is equal to the product of the correlation probabilities of all related virtual objects in the scene that are correlated with the selected virtual object; the correlation probability of each related virtual object equals the fixed correlation probability between the related virtual object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the related virtual object and the virtual wrist.
5. The object selection and virtual grasp generation method for virtual reality according to claim 4, wherein step three, namely combining the probability P_scene that a virtual object exists in the scene, the grasp type probability, the correlation probability of the correlated virtual objects in the scene, the selection probability P_grasp of a virtual object when the user performs a gesture of a specific grasp type, and the user preference probability P_like to determine the probability of each virtual object being selected and then infer the selected virtual object, is specifically:
the correlated virtual objects in the virtual environment are the virtual objects that have a correlation relationship with the selected virtual object;
when the ith grasp type is performed, the probability that each virtual object is selected is: P_i = P_scene × P_relative × P_like × P_grasp_i;
P_scene is the probability that the current virtual object exists in the specific virtual scene; P_like is the user preference probability, i.e. the probability that the user favors the current virtual object; its initial value is 1/k, where k is the number of all selectable virtual objects; the more often the user selects the current virtual object, the larger its preference probability becomes, and the preference rates of all virtual objects are then obtained by normalization; P_grasp_i is the probability that the user is grasping the current virtual object when performing the gesture of the ith grasp type, with P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the ith grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the ith grasp type, obtained by statistics from a dataset;
the probability P_i of each virtual object being selected is calculated for the grasp type i the user performs;
the selected virtual object is obtained by the following inference process: it is first determined whether the user performs the grasp gesture using the persistence (dwell) technique; when the user holds the hand pose and wrist position for 1 s, the user is considered to be using the persistence technique and to be performing the grasp gesture; if the user does not continue using the persistence technique, the virtual object with the highest selection probability is selected by default; if the user continues using the persistence technique, the virtual objects ranked second to fifth by probability are offered, the selection switches to the next object in probability order each time the hand pose and wrist position are held for a further 3 s, and once the user's wrist position changes while the hand pose remains unchanged, the currently offered virtual object is confirmed as the selected virtual object.
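A minimal sketch of the scoring step of this inference, under illustrative data structures (the per-object probability terms are assumed precomputed; the dwell logic is sketched separately after claim 9):

```python
def selection_scores(objects, p_grasptype):
    """Rank candidates by P_i = P_scene * P_relative * P_like * P_grasp_i,
    evaluated for the grasp type i the user is most likely performing.

    objects     -- list of dicts holding the per-object probability terms
    p_grasptype -- grasp-type distribution output by the first neural network
    """
    i = max(range(len(p_grasptype)), key=p_grasptype.__getitem__)
    scores = []
    for obj in objects:
        p_grasp_i = p_grasptype[i] * obj["p_grasp_object"][i]  # P_grasp_i
        scores.append(obj["p_scene"] * obj["p_relative"]
                      * obj["p_like"] * p_grasp_i)
    # indices sorted by descending selection probability: the first is the
    # default choice, positions 2-5 are reachable via the dwell technique
    return sorted(range(len(objects)), key=lambda j: -scores[j])
```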
6. The object selection and virtual grasp generation method for virtual reality according to claim 4, wherein the pre-trained second neural network is obtained as follows:
the network structure of the second neural network comprises four parallel input fully connected layers, three densely connected blocks in series, and one output fully connected layer;
the current virtual hand pose and the pose of the selected virtual object serve as inputs, where the current virtual hand pose comprises the positions J of the hand key points of the virtual hand relative to its wrist and the hand joint rotation values θ of the virtual hand, and the pose of the selected virtual object comprises its point cloud pose O and the normal vectors n of its point cloud; the four input fully connected layers respectively receive the following four inputs: the hand joint rotation values θ, the finger joint positions J of the virtual hand, the point cloud pose O of the selected virtual object, and the normal vectors n of its point cloud;
the output ends of the four input fully connected layers are connected to the first densely connected block, the three densely connected blocks are connected in series, and the output of the last block is connected to the output fully connected layer;
the output of the output fully connected layer is the position and rotation of the selected virtual object in the wrist coordinate system of the virtual hand;
the second training samples are constructed with the virtual hand pose, the virtual finger joint positions, the point cloud pose of the selected virtual object, and the point cloud normal vectors as input, and the position and rotation of the selected virtual object in the virtual wrist coordinate system as output;
the second neural network is trained on the second training samples to obtain the pre-trained second neural network.
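A minimal PyTorch sketch of this topology; the hidden widths, the rotation parameterization (quaternion), and the internal layout of a densely connected block (DenseNet-style feature concatenation) are assumptions not fixed by the claim:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """DenseNet-style MLP block: each layer sees all earlier features."""
    def __init__(self, dim, growth=64, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList()
        d = dim
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(nn.Linear(d, growth), nn.ReLU()))
            d += growth
        self.out_dim = d
    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=-1)
        return x

class ObjectPoseNet(nn.Module):
    """Four parallel input FC layers -> three dense blocks -> output FC.
    Predicts the selected object's position (3) plus rotation (quaternion, 4)
    in the wrist frame."""
    def __init__(self, d_theta, d_joints, d_cloud, d_normals, hidden=128):
        super().__init__()
        self.in_theta = nn.Linear(d_theta, hidden)
        self.in_joints = nn.Linear(d_joints, hidden)
        self.in_cloud = nn.Linear(d_cloud, hidden)
        self.in_norm = nn.Linear(d_normals, hidden)
        b1 = DenseBlock(4 * hidden)
        b2 = DenseBlock(b1.out_dim)
        b3 = DenseBlock(b2.out_dim)
        self.blocks = nn.Sequential(b1, b2, b3)
        self.out = nn.Linear(b3.out_dim, 3 + 4)

    def forward(self, theta, J, O, n):
        x = torch.cat([torch.relu(self.in_theta(theta)),
                       torch.relu(self.in_joints(J)),
                       torch.relu(self.in_cloud(O)),
                       torch.relu(self.in_norm(n))], dim=-1)
        return self.out(self.blocks(x))
```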
7. The object selection and virtual grasp generation method for virtual reality according to claim 4, wherein the pre-trained third neural network is obtained as follows:
the network structure of the third neural network comprises five parallel input fully connected layers, three densely connected blocks in series, and one output fully connected layer;
the five input fully connected layers respectively receive the following five inputs: the current virtual hand joint rotation values θ, the updated point cloud pose O' of the selected virtual object, the updated point cloud normal vectors n' of the selected virtual object, the distances D between sampling points on the virtual hand model surface mesh and the virtual object point cloud, and the three-dimensional coordinates v of the sampling points on the virtual hand model surface mesh;
the output ends of the five input fully connected layers are connected to the first densely connected block, the three densely connected blocks are connected in series, and the output of the last block is connected to the output fully connected layer;
the output of the output fully connected layer is the optimized virtual hand joint rotation values; the optimized values make the contact area between the virtual hand and the selected virtual object larger than before optimization and the Euclidean distance by which the virtual hand penetrates the selected virtual object smaller than before optimization;
the third neural network is trained on third training samples to obtain the pre-trained third neural network.
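The same parallel-input pattern extends to the third network; a minimal sketch reusing DenseBlock and the imports from the previous example, with a fifth input branch and the output head sized to the joint rotation vector θ (all widths again assumptions):

```python
class GraspRefineNet(nn.Module):
    """Five parallel input FC layers -> three dense blocks -> output FC.
    Trained so the refined joint rotations increase hand-object contact
    area and reduce interpenetration, per the claim's objective."""
    def __init__(self, d_theta, d_cloud, d_normals, d_dist, d_verts, hidden=128):
        super().__init__()
        dims = [d_theta, d_cloud, d_normals, d_dist, d_verts]
        self.inputs = nn.ModuleList(nn.Linear(d, hidden) for d in dims)
        b1 = DenseBlock(5 * hidden)  # DenseBlock as defined in the sketch above
        b2 = DenseBlock(b1.out_dim)
        b3 = DenseBlock(b2.out_dim)
        self.blocks = nn.Sequential(b1, b2, b3)
        self.out = nn.Linear(b3.out_dim, d_theta)  # optimized θ

    def forward(self, theta, O_upd, n_upd, D, v):
        feats = [torch.relu(f(x))
                 for f, x in zip(self.inputs, (theta, O_upd, n_upd, D, v))]
        return self.out(self.blocks(torch.cat(feats, dim=-1)))
```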
8. An object selection and virtual grasp generation system for virtual reality, characterized by comprising a hand pose estimation module, a wrist pose tracking module, a virtual environment information extraction module, a grasp type estimation module, an environment context information extraction module, an object selection module, a grasp pose prediction module, an object grasping module, and a virtual environment rendering module;
the hand pose estimation module is used for tracking the real hand pose of the user in real time and controlling the hand pose of the virtual hand in real time, where the hand pose information of the virtual hand comprises the positions J of the hand key points of the virtual hand relative to the wrist and the hand joint rotation values θ of the virtual hand; the hand joint rotation values θ are sent to the grasp type estimation module;
the grasp type estimation module is used for taking the hand joint rotation values θ of the virtual hand as input to the pre-trained first neural network and outputting the probability of the grasp type of the user's current gesture;
the wrist pose tracking module is used for tracking the wrist pose of the user's real hand in real time and controlling the wrist pose of the virtual hand accordingly;
the virtual environment information extraction module is used for extracting the pose of the related virtual object in the virtual environment;
the environment context information extraction module is used for calculating in real time the Euclidean distance between each correlated virtual object and the wrist of the virtual hand, based on the wrist pose of the virtual hand and the poses of the correlated virtual objects in the virtual environment, and then calculating the correlation probability P_relative of the correlated virtual objects in the scene;
P_relative is the product of the correlation probabilities of all virtual objects in the scene correlated with the selected virtual object; the correlation probability of each correlated virtual object equals the fixed correlation probability between that correlated object and the currently selected virtual object multiplied by the reciprocal of the Euclidean distance between the correlated object and the virtual wrist;
the object selection module is used for combining the probability P_scene that a virtual object exists in the scene, the grasp type probability, the correlation probability of the correlated objects in the scene, the selection probability P_grasp of an object when the user performs a gesture of a specific grasp type, and the user preference probability P_like to determine the probability of each object being selected and then infer the selected object;
the grasp pose prediction module is used for taking the hand pose of the current virtual hand and the pose of the selected virtual object as input to the pre-trained second neural network, outputting the position and rotation value of the selected virtual object in the wrist coordinate system of the virtual hand, and updating the point cloud pose O' and the point cloud normal vectors n' of the selected virtual object according to that output;
the object grasping module takes the hand joint rotation values θ of the current virtual hand, the updated point cloud pose O' and point cloud normal vectors n' of the selected object, the distances D between sampling points on the virtual hand model surface mesh and the object point cloud, and the three-dimensional coordinates v of those sampling points as input to the pre-trained third neural network and outputs optimized hand joint rotation values for the virtual hand; the optimized values make the contact area between the virtual hand and the selected object larger than before optimization and the distance by which the virtual hand penetrates the selected virtual object smaller than before optimization;
finally, the virtual hand is adjusted according to the optimized hand joint rotation values so that it grasps the virtual object.
9. The object selection and virtual grasp generation system for virtual reality according to claim 8, wherein the object selection module combines the probability P_scene that a virtual object exists in the scene, the grasp type probability, the correlation probability of the correlated objects in the scene, the selection probability P_grasp of an object when the user performs a gesture of a specific grasp type, and the user preference probability P_like to determine the probability of each object being selected and then infers the selected object, specifically:
the correlated virtual objects in the virtual environment are the virtual objects that have a correlation relationship with the selected virtual object;
when the ith grasp type is performed, the probability that each virtual object is selected is: P_i = P_scene × P_relative × P_like × P_grasp_i;
P_scene is the probability that the current virtual object exists in the specific virtual scene; P_like is the user preference probability, i.e. the probability that the user favors the current virtual object; its initial value is 1/k, where k is the number of all selectable virtual objects; the more often the user selects the current virtual object, the larger its preference probability becomes, and the preference rates of all virtual objects are then obtained by normalization; P_grasp_i is the probability that the user is grasping the current virtual object when performing the gesture of the ith grasp type, with P_grasp_i = P_grasptype_i × P_grasp_i_object, where P_grasptype_i is the probability of the ith grasp type output by the first neural network and P_grasp_i_object is the probability of the virtual object being grasped with the ith grasp type, obtained by statistics from a dataset;
the probability P_i of each virtual object being selected is calculated for the grasp type i the user performs;
the selected virtual object is obtained by the following inference process: it is first determined whether the user performs the grasp gesture using the persistence (dwell) technique; when the user holds the hand pose and wrist position for 1 s, the user is considered to be using the persistence technique and to be performing the grasp gesture; if the user does not continue using the persistence technique, the virtual object with the highest selection probability is selected by default; if the user continues using the persistence technique, the virtual objects ranked second to fifth by probability are offered, the selection switches to the next object in probability order each time the hand pose and wrist position are held for a further 3 s, and once the user's wrist position changes while the hand pose remains unchanged, the currently offered virtual object is confirmed as the selected virtual object.
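For illustration, a self-contained sketch of the dwell timing described in this inference process; the 1 s confirmation and 3 s switching thresholds come from the claim, while the per-frame update API and flag names are assumptions:

```python
import time

class DwellSelector:
    """Dwell-based selection: holding the grasp pose for 1 s confirms the
    persistence technique; each further 3 s of holding advances through the
    candidates ranked 2nd-5th; moving the wrist while keeping the hand pose
    unchanged confirms the currently offered candidate."""
    CONFIRM_S, SWITCH_S = 1.0, 3.0

    def __init__(self, ranked_objects):
        self.ranked = ranked_objects[:5]     # default choice plus ranks 2-5
        self.idx = 0
        self.hold_start = None

    def update(self, pose_held, wrist_moved, now=None):
        """Call once per frame; returns the selected object or None."""
        now = time.monotonic() if now is None else now
        if pose_held and wrist_moved:
            return self.ranked[self.idx]     # wrist moved, pose kept: confirm
        if pose_held:
            if self.hold_start is None:
                self.hold_start = now
            held = now - self.hold_start
            # advance one rank every SWITCH_S after the initial CONFIRM_S
            while (self.idx + 1 < len(self.ranked)
                   and held >= self.CONFIRM_S + self.SWITCH_S * (self.idx + 1)):
                self.idx += 1
            return None                      # still dwelling
        self.hold_start, self.idx = None, 0  # pose released: reset
        return None
```

A caller would rank the candidates with a function like selection_scores above and feed DwellSelector one pair of pose/wrist flags per frame.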
CN202310542655.0A 2023-05-15 2023-05-15 Object selection and virtual grasp generation method and system for virtual reality Pending CN116820228A (en)

Priority Applications (1)

Application Number: CN202310542655.0A; Priority/Filing Date: 2023-05-15; Title: Object selection and virtual grasp generation method and system for virtual reality

Publications (1)

Publication Number: CN116820228A; Publication Date: 2023-09-29

Family ID: 88115624 (Country: CN)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination