CN115686193A - Virtual model three-dimensional gesture control method and system in augmented reality environment - Google Patents

Virtual model three-dimensional gesture control method and system in augmented reality environment

Info

Publication number
CN115686193A
Authority
CN
China
Prior art keywords
virtual
model
gripping
pair
grasping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211099115.1A
Other languages
Chinese (zh)
Inventor
胡耀光
王敬飞
杨晓楠
王鹏
毛婉婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202211099115.1A priority Critical patent/CN115686193A/en
Publication of CN115686193A publication Critical patent/CN115686193A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a three-dimensional gesture manipulation method and system for a virtual model in an augmented reality environment, and belongs to the field of augmented-reality human-machine interaction. The system comprises a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture intention recognition module. A "gripping pair" condition is constructed from the physical characteristics of real object grasping and from the augmented reality environment, and a gripping intention recognition algorithm is built on the "gripping pair". The gripping state is judged from this algorithm and the gripping pair condition, so deciding whether a grasp is complete no longer requires contact computations over many contact points. Gripping-intention judgment therefore becomes more flexible, closer to real three-dimensional gesture manipulation, better suited to complex gesture interaction scenes and more consistent with the user's intuitive interaction feeling. Moreover, if several gripping pairs exist, all contact points forming them participate in interaction intention recognition, which improves the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition.

Description

Virtual model three-dimensional gesture control method and system in augmented reality environment
Technical Field
The invention belongs to the field of man-machine interaction of augmented reality, and particularly relates to a virtual model gesture control method in an augmented reality environment.
Background
Augmented Reality (AR) is a technology that superimposes virtual information on the real environment so that the two are fused. In an augmented reality environment, information is presented stereoscopically in three dimensions, so the traditional way of interacting through additional input devices such as a keyboard and mouse is no longer suitable: such devices hinder a seamless interaction experience. More natural interaction methods are therefore being researched and applied in augmented reality, including gestures, voice, body language and eye tracking. Compared with other interaction modes, gesture interaction has clear advantages for direct interaction with a three-dimensional model, and multi-degree-of-freedom manipulation tasks on the model can be carried out through gestures. In an augmented reality assembly system, for example, gesture interaction can provide a natural and intuitive user interface for manipulating virtual parts or fixtures during virtual assembly.
Current gesture interaction schemes are either two-dimensional or three-dimensional. Two-dimensional schemes are generally aimed at AR systems on mobile devices such as phones and tablets, and support only simple interactions with the model through planar gestures, such as dragging a virtual object. Since AR information is three-dimensional, this two-dimensional approach is neither intuitive nor accurate. Three-dimensional gesture interaction, by contrast, lets the user interact with a virtual object in three-dimensional space and matches people's intuitive feeling and experience better. For example, Chinese patent publication No. CN110221690B discloses a gesture interaction method for AR scenes that can accurately present the occlusion relationship between the hand and a virtual object as well as their "contact", enabling more interaction actions and a more realistic interaction experience between the user and the virtual object. However, existing three-dimensional gesture interaction schemes still have shortcomings: most of them drive the virtual object to move or rotate based on a fixed gesture recognition result, they cannot support natural two-handed interaction with the virtual object such as grasping, and the user cannot naturally adjust the pose of the virtual object during operation, which makes it difficult to place the virtual object precisely where the user intends.
Disclosure of Invention
The main purpose of the invention is to provide a virtual model three-dimensional gesture manipulation method and system for the augmented reality environment. The method efficiently and accurately recognizes the user's natural gripping intention toward a three-dimensional virtual model, supports the user in moving and rotating the virtual object through natural gestures, improves the robustness of the three-dimensional gesture interaction process, makes the gesture interaction experience more intuitive and natural, further improves the virtual effect of three-dimensional gesture manipulation in augmented reality, and increases the user's sense of immersion.
The purpose of the invention is realized by the following technical scheme.
The invention discloses a virtual model three-dimensional gesture control method in an augmented reality environment, which comprises the following steps:
Step one: acquire images of both hands for the current frame, and determine the position and posture data of the key nodes of both hands relative to the AR device based on a convolutional neural network two-hand pose estimation algorithm. This algorithm estimates the two-hand pose from 2D to 3D and consists of two convolutional neural networks. The first network is trained to locate the hand and estimates the 2D position of the hand center in the image; the localized hand image, together with the corresponding input depth values, is then used to generate a normalized cropped image, which is passed to the second network to regress the relative 3D hand joint positions in real time.
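To make the two-stage pipeline above concrete, the following sketch chains a hand-localization network and a joint-regression network. The callables `locator_net` and `regressor_net`, the crop size and the depth normalization are illustrative assumptions, not part of the patent.

```python
import numpy as np

def estimate_hand_pose(rgb, depth, locator_net, regressor_net, crop_size=128):
    """Two-stage 2D-to-3D hand pose estimation (sketch).

    locator_net:   callable returning the 2D pixel centre of each detected hand
                   in the RGB image (hypothetical interface).
    regressor_net: callable regressing relative 3D joint positions from a
                   normalized crop (hypothetical interface).
    """
    poses = []
    for cx, cy in locator_net(rgb):                      # stage 1: locate each hand
        half = crop_size // 2
        x0, y0 = max(cx - half, 0), max(cy - half, 0)
        crop_rgb = rgb[y0:y0 + crop_size, x0:x0 + crop_size]
        crop_d = depth[y0:y0 + crop_size, x0:x0 + crop_size].astype(np.float32)
        d_ref = np.median(crop_d[crop_d > 0])            # reference depth of the hand centre
        crop_d = (crop_d - d_ref) / max(d_ref, 1e-6)     # normalize the cropped depth values
        joints_rel = regressor_net(crop_rgb, crop_d)     # stage 2: regress relative 3D joints
        poses.append(joints_rel + np.array([0.0, 0.0, d_ref]))  # anchor at the hand depth
    return poses
```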
Step two: and superposing the virtual hand model at the key nodes of the two hands identified by the current frame, and determining the position and the posture of the virtual hand model according to the position and the posture of the key nodes to realize the mapping of the real two hands in the virtual space.
The virtual hand model consists of a plurality of virtual joint models; each virtual joint model is a cylinder that approximates a finger joint of the real hands. A topological relation exists between the virtual joint models: a higher-level virtual joint model contains lower-level virtual joint models, and when the higher-level joint model moves it drives the lower-level joint models with it. The virtual hand model is parameterized as follows:
Joint_i = {p_i, e_i, size_i, children_i}, with p_i = (x_i, y_i, z_i), e_i = (w_i, r_i, l_i), size_i = (l_i, d_i), children_i = {J_k}

where Joint_i is the ith virtual joint model; p_i is the position of the virtual joint model, expressed by the vector (x_i, y_i, z_i) in the augmented reality environment coordinate system; e_i is the posture of the virtual joint model, expressed by the vector (w_i, r_i, l_i) in the same coordinate system; size_i holds the geometric parameters of the joint model, with l_i the length of the cylinder and d_i its diameter; children_i is the set of driven sub-joint models of this joint model, and J_k is the kth virtual joint model.
Each virtual joint model in the virtual hand model corresponds to a key node identified by the gesture tracking of step one. The recognized position and posture data of each key node of the hand are used to update the position and posture of the corresponding virtual joint model in the current frame; the position and posture of the virtual hand model are thereby determined from the key nodes, and the real hands are mapped into the virtual space, according to the following formulas:
(Equation (4), image not reproduced: it maps the key-node position P_i and key-node rotation R_i from the camera coordinate system into the joint position p_i and rotation r_i through the transformation matrix T.)

r_i = R_z(w_i) · R_y(r_i) · R_x(l_i)   (5)

where p_i is the position vector of the ith virtual joint model and r_i is its rotation matrix; the rotation matrix and the Euler angles are related as in formula (5), in which R_z(w_i) denotes a rotation of w_i degrees about the z axis. T is the transformation matrix between the augmented reality environment coordinate system in which the virtual joint model lies and the camera coordinate system. P_i is the position vector of the key node corresponding to the ith virtual joint model, and R_i is the rotation matrix of that key node.
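A minimal sketch of the joint parameterization and the per-frame update from tracked key nodes follows. It assumes T is a 4x4 homogeneous transform from the camera frame to the AR frame and that key-node rotations are 3x3 matrices; the class and field names are illustrative, not taken from the patent.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class VirtualJoint:
    position: np.ndarray                          # p_i = (x_i, y_i, z_i) in AR coordinates
    euler:    np.ndarray                          # e_i = (w_i, r_i, l_i) in AR coordinates
    length:   float                               # cylinder length l_i
    diameter: float                               # cylinder diameter d_i
    children: list = field(default_factory=list)  # driven sub-joint models J_k

def euler_to_matrix(w, r, l):
    """r_i = R_z(w_i) R_y(r_i) R_x(l_i), as in equation (5) (angles in radians)."""
    cz, sz = np.cos(w), np.sin(w)
    cy, sy = np.cos(r), np.sin(r)
    cx, sx = np.cos(l), np.sin(l)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def update_joint(joint, key_position, key_rotation, T_cam_to_ar):
    """Map a tracked key node from camera coordinates into the AR frame."""
    p_h = T_cam_to_ar @ np.append(key_position, 1.0)   # homogeneous point transform
    joint.position = p_h[:3]
    R_ar = T_cam_to_ar[:3, :3] @ key_rotation           # rotate the key-node orientation
    # recover z-y-x Euler angles, matching the convention of equation (5)
    joint.euler = np.array([
        np.arctan2(R_ar[1, 0], R_ar[0, 0]),
        np.arcsin(-R_ar[2, 0]),
        np.arctan2(R_ar[2, 1], R_ar[2, 2]),
    ])
```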
Step three: based on a collision detection algorithm, it is calculated in real time every frame whether contact or collision occurs between the virtual hand model and other virtual models to be manipulated.
Step four: and constructing a 'gripping pair' condition according to the characteristics of the gripping physical process of the real object and the augmented reality environment, and constructing a gripping intention recognition algorithm based on the 'gripping pair'. If the touch detection algorithm detects that the hands are in contact with other virtual models, calculating whether a 'gripping pair' can be formed between the hands and a plurality of contact points of the manipulated model according to a gripping intention recognition algorithm, judging whether a gripping condition exists between the hands and the virtual models, wherein the 'gripping pair' consists of two contact points, if more than one 'gripping pair' exists, judging that the gripped virtual models are in a gripping state, judging whether gripping is finished based on contact calculation of the contact points is not needed, so that the gripping intention judgment is more flexible, the gripping condition is closer to a real three-dimensional gesture manipulation condition, the complex gesture interaction scene is more suitable, the intuitive interaction feeling of a user is more met, and meanwhile, if more pairs of 'gripping pairs' exist, a plurality of contact points forming the gripping pair all participate in the interaction intention recognition, and the robustness, flexibility, efficiency and immersion feeling of the gesture interaction intention recognition are improved.
The physical characteristics of real object grasping refer to applying the basic laws of Newtonian rigid-body mechanics: whether the grasped object is in force balance, and whether the friction between the virtual hand model and the contact surfaces of the manipulated model is sufficient, determine whether the object can be gripped. This is realized by analyzing the force state of the object with a simplified Coulomb friction model.
A "gripping pair" is formed by two contact points between a virtual hand model and a gripped model that satisfy the following condition: if the angle between the line connecting the two contact points and the normal of each respective contact surface does not exceed a fixed angle α, the two contact points form a stable gripping pair g(a, b). The fixed angle α is the friction angle.
The gripping intention recognition algorithm is built on the "gripping pair" condition and cycles over all current contact points, judging whether each of them can form a gripping pair with another contact point. In one cycle, for any two contact points a and b between the virtual hand and the virtual object, if the angle between the line connecting the two contact points and the normal of each respective contact surface does not exceed the fixed angle α, the two contact points form a stable gripping pair g(a, b). This fixed angle α is the friction angle, i.e. the gripping pair g(a, b) must satisfy

max( ∠(l_ab, n_a), ∠(l_ab, n_b) ) ≤ α   (6)

where n_a and n_b are the normal vectors at contact points a and b, namely the normals of the cylindrical surface of the joint virtual model at the contact points; l_ab is the line connecting contact points a and b; and α is the friction angle, whose value is set by testing for the specific manipulated model so that the virtual part can be gripped stably and naturally.
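A sketch of the "gripping pair" test of equation (6) and the cyclic gripping-intention check follows. It assumes each contact point carries a position and an outward surface normal; the function names are illustrative, and the default friction angle of 75° is only the value used later in the embodiment.

```python
import numpy as np
from itertools import combinations

def angle_between(u, v):
    """Unsigned angle between two vectors, in radians."""
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def forms_grip_pair(pa, na, pb, nb, alpha):
    """Equation (6): the contact-point connecting line must stay within the
    friction angle alpha of both contact normals."""
    l_ab = pb - pa
    # the connecting line has no preferred direction, so take the smaller angle
    ang_a = min(angle_between(na, l_ab), angle_between(na, -l_ab))
    ang_b = min(angle_between(nb, l_ab), angle_between(nb, -l_ab))
    return ang_a <= alpha and ang_b <= alpha

def detect_grip_pairs(contacts, alpha=np.deg2rad(75)):
    """Cycle over all current contact points (position, normal) and collect
    every pair satisfying the gripping-pair condition; a non-empty result means
    the virtual model is judged to be in the gripping state."""
    pairs = []
    for (pa, na), (pb, nb) in combinations(contacts, 2):
        if forms_grip_pair(pa, na, pb, nb, alpha):
            pairs.append((pa, pb))
    return pairs
```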
Step five: and constructing a grabbing center acquisition method according to the 'grabbing pair' condition constructed in the fourth step so as to acquire the grabbing center. If the virtual model is judged to be in the gripping state based on the gripping intention recognition algorithm in the fourth step, the virtual force or moment exerted on the virtual model by the two hands is calculated based on the displacement and posture transformation of the gripping center of the two hands on the manipulated model according to the manipulation intention recognition algorithm, and the virtual force or moment drives the movement or rotation of the virtual model. After a manipulation intention identification algorithm is adopted and a grasping center judgment condition is added, all contact points participate in the manipulation intention identification process, so that the manipulation intention is more flexibly identified, and the robustness of the manipulation intention identification is improved.
The grabbing center is a central point representing the motion of the whole hand, the whole hand is regarded as a complete rigid body, and the position, the posture and the speed of the grabbing center represent the motion parameters of the whole virtual hand.
The grasping center is determined as follows: determine the positions and number of "gripping pairs" according to the "gripping pair" condition constructed in step four. A "gripping pair" is treated as one unified rigid body whose position and posture are represented by the grasping center. If exactly one "gripping pair" exists, the grasping center is the midpoint of the line connecting the contact points forming the pair, and the grasping-center position and posture are calculated as follows:
P_c = (p_1 + p_2) / 2   (7)

(Equation (8), image not reproduced: the three Euler angles w_c, r_c and l_c of the grasping center are obtained from the orientation of the contact-point connecting line relative to the unit vectors x̂, ŷ and ẑ of the current coordinate system.)

where P_c denotes the grasping-center position, p_1 and p_2 denote the positions of the contact points constituting the "gripping pair", w_c, r_c and l_c denote the three Euler angle parameters of the grasping center, and x̂, ŷ and ẑ denote the unit vectors along the x, y and z axes of the current coordinate system.
If several "gripping pairs" exist, they are compared by the length of the line connecting their contact points; the "gripping pair" with the longest connecting line is taken as the main gripping pair, and the grasping center is constructed from it according to formulas (7) and (8).
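The following sketch implements the grasping-center acquisition just described: the midpoint of the main gripping pair (the pair with the longest connecting line) gives the position. The orientation construction is an illustrative stand-in for equation (8), which is only available as an image; the world up-vector used here is an assumption.

```python
import numpy as np

def grasp_center(grip_pairs):
    """grip_pairs: list of (p1, p2) contact-point positions forming gripping pairs.
    Returns the grasping-center position and a 3x3 orientation matrix."""
    # main gripping pair = the pair whose contact-point connecting line is longest
    p1, p2 = max(grip_pairs, key=lambda pair: np.linalg.norm(pair[1] - pair[0]))
    center = 0.5 * (p1 + p2)                      # equation (7): midpoint of the line
    # illustrative orientation: x axis along the connecting line, the remaining
    # axes completed against an assumed world up vector
    x_axis = (p2 - p1) / np.linalg.norm(p2 - p1)
    up = np.array([0.0, 0.0, 1.0])
    if abs(np.dot(x_axis, up)) > 0.99:            # avoid a degenerate cross product
        up = np.array([0.0, 1.0, 0.0])
    y_axis = np.cross(up, x_axis)
    y_axis /= np.linalg.norm(y_axis)
    z_axis = np.cross(x_axis, y_axis)
    return center, np.column_stack([x_axis, y_axis, z_axis])
```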
Step 5.1: judge whether the "gripping pair" satisfies the "gripping pair" cancellation condition. If it does, the user is considered to have put down the manipulated virtual model; the subsequent steps are not executed and the position and posture of the virtual model are not updated in the next frame. If it does not, execute step 5.2.
the "grip sub" cancellation condition is calculated as follows:
Figure BDA0003835225540000045
wherein the content of the first and second substances,
Figure BDA0003835225540000046
the distance between two contact points constituting a "grip pair" for the current ith frame,
Figure BDA0003835225540000047
the distance between two contact points constituting the "grip pair" for the i-1 th frame, k is a fixed value. That is, when the two contact points constituting the "grasping pair" are separated between the two frames and the separation degree satisfies a certain threshold, it is considered that the grasping is cancelled.
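A minimal sketch of the cancellation test in equation (9) follows; the default threshold of 3 mm is only the example value given later in the embodiment, and the argument names are illustrative.

```python
import numpy as np

def grip_cancelled(p_a_curr, p_b_curr, p_a_prev, p_b_prev, k=0.003):
    """Equation (9): compare the contact-point distance of the current frame with
    that of the previous frame; k is a fixed separation threshold (metres)."""
    d_curr = np.linalg.norm(p_a_curr - p_b_curr)
    d_prev = np.linalg.norm(p_a_prev - p_b_prev)
    return (d_curr - d_prev) > k
```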
Step 5.2: compute the virtual force or torque applied to the virtual model by the two hands according to the manipulation intention recognition algorithm, then continue with step 5.3. The manipulation intention recognition algorithm computes, from the pose-change trend of the grasping center, the virtual force or virtual torque applied by the two hands to the virtual model in the current frame, and from this virtual force or torque derives the movement and rotation parameters of the virtual model, namely the movement direction and distance and the rotation direction and angle. With the manipulation intention recognition algorithm and the added "grasping center" condition judgment, all contact points participate in the manipulation-intention recognition process, so the manipulation intention is recognized more flexibly and the recognition becomes more robust.
The manipulation intention recognition algorithm is built on virtual linear and torsional spring-damper models. Its calculation formulas are as follows.

(Equations (11) and (12), images not reproduced.) Equation (11) computes the virtual manipulation force f_vf with the linear spring-damper model, driven by the change of the grasping-center position between frames i and i+1 and damped by the linear velocity of the manipulated model; equation (12) computes the virtual manipulation torque τ_vf with the torsional spring-damper model, driven by the change of the grasping-center orientation and damped by the angular velocity of the manipulated model.

Here the pose of the two-hand contact center point in the current ith frame is written (q_l^i, q_o^i), and in frame i+1 it is written (q_l^(i+1), q_o^(i+1)), where q_l^i is the three-dimensional position of the hands in the ith frame and q_o^i is the quaternion describing the hand orientation; v^i and ω^i are the linear and angular velocities of the manipulated virtual model at frame i. K_sl (K_so) and K_Dl (K_Do) are the coefficients of the linear (torsional) spring-damper models. Adjusting the K_sl (K_so) and K_Dl (K_Do) coefficients yields stable and smooth dynamic motion of the virtual part that matches the user's intuitive interaction feeling.
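A sketch of one common form of the linear and torsional spring-damper coupling follows, to illustrate equations (11) and (12). The exact formulas in the patent are only available as images, so the sign conventions and the quaternion-difference handling below are assumptions.

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_to_rotvec(q):
    """Convert a unit quaternion (w, x, y, z) to a rotation vector (axis * angle)."""
    w, xyz = q[0], np.asarray(q[1:])
    angle = 2.0 * np.arctan2(np.linalg.norm(xyz), w)
    if angle < 1e-9:
        return np.zeros(3)
    return angle * xyz / np.linalg.norm(xyz)

def virtual_force_torque(q_l_i, q_l_next, q_o_i, q_o_next, v_i, w_i,
                         K_sl, K_dl, K_so, K_do):
    """Spring-damper estimate of the virtual manipulation force and torque.
    q_l_*: grasping-center positions at frames i and i+1;
    q_o_*: grasping-center orientation quaternions (w, x, y, z);
    v_i, w_i: linear / angular velocity of the manipulated model at frame i."""
    f_vf = K_sl * (q_l_next - q_l_i) - K_dl * v_i
    q_o_i_conj = np.array([q_o_i[0], -q_o_i[1], -q_o_i[2], -q_o_i[3]])
    delta = quat_to_rotvec(quat_mul(q_o_next, q_o_i_conj))   # relative rotation i -> i+1
    tau_vf = K_so * delta - K_do * w_i
    return f_vf, tau_vf
```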
Step 5.3: from the virtual force or torque computed by the manipulation intention recognition algorithm in step 5.2, calculate the displacement variation and rotation variation of the virtual model using rigid-body dynamics. Update the position and pose of the manipulated virtual model in the current frame from the displacement and rotation variations, and render the virtual model at the new position and pose.
The displacement variation is calculated as follows:

(Equations (13) and (14), images not reproduced.) Equation (13) computes the displacement S_i of the manipulated virtual model in the current ith frame from its velocity v_i, the time difference Δt between the current ith frame and the next (i+1)th frame, the virtual manipulation force f_vf identified by the manipulation intention recognition algorithm, and the mass m of the manipulated virtual model. Equation (14) assembles the corresponding displacement matrix ΔT_i of the virtual model, with Z, Y and X denoting the axes of the augmented reality environment coordinate system.

The rotation variation is calculated as follows:

(Equation (15), image not reproduced.) Equation (15) computes the rotation angle θ_i of the manipulated virtual model in the current ith frame from the virtual manipulation torque τ_vf identified by the manipulation intention recognition algorithm, the time difference Δt between the current ith frame and the next (i+1)th frame, and the moment of inertia J of the manipulated virtual model.

ΔR_i = R_z(θ_iz) · R_y(θ_iy) · R_x(θ_ix)   (16)

where ΔR_i is the rotation matrix of the virtual model and θ_iz, θ_iy and θ_ix are the components of the rotation angle θ_i about the z, y and x axes of the augmented reality environment coordinate system.
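A sketch of the rigid-body update of step 5.3 follows, under a simple explicit-Euler assumption: equations (13)–(15) are only available as images, so treating the force and torque as constant over Δt and using a scalar moment of inertia are assumptions, while equation (16) is implemented as written.

```python
import numpy as np

def euler_rot(theta_z, theta_y, theta_x):
    """Equation (16): DeltaR_i = R_z(theta_iz) R_y(theta_iy) R_x(theta_ix)."""
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def update_model_pose(position, rotation, v_i, w_i, f_vf, tau_vf, m, J, dt):
    """Advance the manipulated model by one frame from the virtual force/torque."""
    v_next = v_i + (f_vf / m) * dt        # linear acceleration from f_vf (assumed explicit Euler)
    s_i = v_next * dt                     # displacement over the frame, cf. equation (13)
    w_next = w_i + (tau_vf / J) * dt      # angular acceleration from tau_vf (assumed)
    theta_i = w_next * dt                 # per-axis rotation angles (theta_ix, theta_iy, theta_iz)
    dR = euler_rot(theta_i[2], theta_i[1], theta_i[0])   # equation (16)
    return position + s_i, dR @ rotation, v_next, w_next
```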
Step six: and repeating the first step to the fifth step, and according to the virtual hand model, the gripping intention recognition method and the manipulation intention recognition method, performing three-dimensional gesture manipulation in the augmented reality environment, efficiently and accurately recognizing the natural gripping intention of the user for the three-dimensional virtual model, supporting the user to move and rotate the virtual object by the natural gesture, improving the robustness of the three-dimensional gesture interaction process, enabling the gesture interaction experience to be more visual and natural, further improving the virtual effect of the three-dimensional gesture manipulation in the augmented reality environment, and improving the immersion feeling of the user.
The invention also discloses a gesture interaction system for implementing the above virtual model three-dimensional gesture manipulation method in an augmented reality environment. The system comprises a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture intention recognition module.
The data acquisition and processing module is used for acquiring the RGB image and the depth image of the current frame, and acquiring the position and posture information of the key hand node of the current frame according to the RGB image and the depth image based on the convolutional neural network two-hand pose estimation algorithm;
The virtual hand module superposes the virtual hand model, determines the position and posture of the virtual hand model from the position and posture of the key nodes, and maps the real hands into the virtual space. It updates and maintains the position and posture of the virtual hand model in the AR device coordinate system in real time from the hand key-node positions and postures acquired by the data acquisition and processing module.
The collision calculation module calculates in real time, in each frame, whether contact or collision occurs between the virtual hand model and the other virtual models to be manipulated, based on a collision detection algorithm.
The gesture intention recognition module comprises a gripping intention recognition submodule and a manipulation intention recognition submodule. The gripping intention recognition submodule constructs the "gripping pair" condition from the physical characteristics of real object grasping and from the augmented reality environment, builds the gripping intention recognition algorithm on the "gripping pair", and recognizes whether the two hands grasp or release the virtual model. If the collision calculation module detects that the two hands contact another virtual model, the gripping intention recognition algorithm checks whether a "gripping pair" is formed among the contact points between the hands and the manipulated model, so as to judge whether a gripping situation exists between the two hands and the virtual model. A "gripping pair" consists of two contact points; if at least one "gripping pair" exists, the gripped virtual model is judged to be in the gripping state. Deciding whether a grasp is complete therefore does not require contact computations over many contact points, which makes gripping-intention judgment more flexible, closer to real three-dimensional gesture manipulation, better suited to complex gesture interaction scenes and more consistent with the user's intuitive interaction feeling. Moreover, if several pairs of "gripping pairs" exist, all contact points forming them participate in interaction intention recognition, improving the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition.
The manipulation intention recognition submodule recognizes the user's manipulation intention toward the manipulated virtual model, including movement and rotation. After the gripping intention recognition submodule recognizes the gripping intention, the manipulation intention recognition submodule is called: it recognizes the intention toward the manipulated virtual model with the manipulation intention recognition algorithm, expresses the result as a virtual driving force and torque, computes the displacement variation and rotation variation of the manipulated virtual model from this force or torque, updates the motion state of the manipulated model and drives it. Compared with predicting the model displacement simply from the displacement of the two hands, this virtual-force-based manipulation intention recognition matches the physical motion process better and subsequently achieves more accurate manipulation.
Beneficial effects:
1. The invention discloses a virtual model three-dimensional gesture manipulation method and system in an augmented reality environment. If the collision detection algorithm detects that the two hands contact another virtual model, the gripping intention recognition algorithm calculates whether a "gripping pair" can be formed among the contact points between the hands and the manipulated model, so as to judge whether a gripping situation exists between the two hands and the virtual model. A "gripping pair" consists of two contact points; if at least one "gripping pair" exists, the gripped virtual model is judged to be in the gripping state. Deciding whether a grasp is complete therefore does not require contact computations over the contact points, which makes gripping-intention judgment more flexible, closer to real three-dimensional gesture manipulation, better suited to complex gesture interaction scenes and more consistent with the user's intuitive interaction feeling. Moreover, if several "gripping pairs" exist, all contact points forming them participate in interaction intention recognition, improving the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition.
2. The invention discloses a virtual model three-dimensional gesture control method and a virtual model three-dimensional gesture control system in an augmented reality environment.
Drawings
Fig. 1 is a schematic view of a grip pair.
FIG. 2 is a flowchart of a method for manipulating a three-dimensional gesture of a virtual model in an augmented reality environment according to the present disclosure.
FIG. 3 is a system block diagram of a virtual model three-dimensional gesture manipulation system in an augmented reality environment disclosed in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
The gesture interaction method in the augmented reality environment disclosed by the embodiment can be applied to AR terminal equipment equipped with a camera, including a mobile phone, a tablet, AR glasses or an AR helmet. The present invention will take AR glasses as an example to explain the gesture interaction method in detail.
Referring to fig. 2, the gesture interaction method in the augmented reality environment disclosed in this embodiment includes the following specific implementation steps:
step 1: and acquiring images of both hands of the current frame, and identifying and tracking the position and posture data of the key nodes of both hands relative to the AR equipment based on a convolutional neural network two-hand pose estimation algorithm.
The two-hand images are acquired with a pair of cameras on the AR terminal device that face the same side: one is a color camera acquiring color images and the other acquires depth images. In this embodiment, the color and depth images are acquired with the RGB camera and the TOF module of the AR glasses, and each depth frame is registered to the color image according to the extrinsic parameters between the two cameras, which simplifies subsequent processing. The extrinsic parameters are the translation and rotation between the camera coordinate systems of the depth camera and the color camera, each with its optical center as origin.
The convolutional neural network two-hand pose estimation algorithm identifies and tracks the position and posture information of the two-hand key nodes in real time. It uses two convolutional neural networks: the first locates the two-hand regions on the RGB image; the RGB image is cropped according to the localization result, normalized together with the depth values of the corresponding region, and fed to the second network, which regresses in real time the position and posture information of the two-hand key nodes in the camera coordinate system. Finally, the key-node position and posture results are converted from the camera coordinate system into the AR device coordinate system using the pose transformation matrix between the two coordinate systems.
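The depth-to-color registration mentioned above can be sketched as follows, assuming pinhole intrinsics for both cameras; the matrix and parameter names are illustrative and occlusion handling is omitted.

```python
import numpy as np

def register_depth_to_color(depth, K_depth, K_color, T_depth_to_color, color_shape):
    """Reproject a depth image into the color camera so both are pixel-aligned.
    depth: HxW depth image in metres; K_*: 3x3 intrinsics; T_depth_to_color: 4x4 extrinsics."""
    h, w = depth.shape
    registered = np.zeros(color_shape[:2], dtype=np.float32)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    # back-project valid depth pixels to 3D points in the depth-camera frame
    pix = np.stack([us.ravel()[valid], vs.ravel()[valid], np.ones(valid.sum())])
    pts = np.linalg.inv(K_depth) @ pix * z[valid]
    # move the points into the color-camera frame via the extrinsics
    pts = (T_depth_to_color @ np.vstack([pts, np.ones(pts.shape[1])]))[:3]
    # project into the color image and keep points that land inside it
    proj = K_color @ pts
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    inside = (u >= 0) & (u < color_shape[1]) & (v >= 0) & (v < color_shape[0]) & (proj[2] > 0)
    registered[v[inside], u[inside]] = proj[2][inside]
    return registered
```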
Step 2: and overlapping the virtual hand models at the key nodes of the two hands identified by the current frame, and determining the positions and postures of the virtual hand models according to the positions and postures of the key nodes.
The virtual hand model is composed of 19 virtual joint models, each virtual joint model is a cylinder and approximately simulates fingers of both hands. The topological relation exists between the virtual hand joint models, namely the high-level virtual joint model comprises the low-level virtual joint model, and the high-level virtual joint model moves to drive the low-level virtual joint model to move, for example, the motion of the root joint of a finger drives the motion of the middle joint and the top joint. The parameterized representation of the virtual hand model is as follows:
Joint_i = {p_i, e_i, size_i, children_i}, with p_i = (x_i, y_i, z_i), e_i = (w_i, r_i, l_i), size_i = (l_i, d_i), children_i = {J_k}

where Joint_i is the ith virtual joint model; p_i is the position of the virtual joint model, expressed by the vector (x_i, y_i, z_i) in the augmented reality environment coordinate system; e_i is the posture of the virtual joint model, expressed by the vector (w_i, r_i, l_i) in the same coordinate system; size_i holds the geometric parameters of the joint model, with l_i the length of the cylinder and d_i its diameter; children_i is the set of driven sub-joint models of this joint model, and J_k is the kth virtual joint model.
Each virtual joint model in the virtual hand model corresponds to the key node identified by gesture tracking in step 1, and the position and posture data of each identified key node of the hand is used for updating the position and posture of the virtual joint model of the current frame, so as to realize the mapping of the virtual hand model, as follows:
(Equation (4), image not reproduced: it maps the key-node position P_i and key-node rotation R_i from the camera coordinate system into the joint position p_i and rotation r_i through the transformation matrix T.)

r_i = R_z(w_i) · R_y(r_i) · R_x(l_i)   (5)

where p_i is the position vector of the ith virtual joint model and r_i is its rotation matrix; the rotation matrix and the Euler angles are related as in formula (5), in which R_z(w_i) denotes a rotation of w_i degrees about the z axis. T is the transformation matrix between the augmented reality environment coordinate system in which the virtual joint model lies and the camera coordinate system. P_i is the position vector of the key node corresponding to the ith virtual joint model, and R_i is the rotation matrix of that key node.
Step 3: start the collision detection algorithm and calculate in real time, in each frame, whether contact or collision occurs between the virtual hand model and the other manipulated virtual models.
the collision detection algorithm adopted in this embodiment is a hierarchical Bounding Box algorithm, an OBB (organized Bounding Box) Bounding Box is generated around the virtual joint model and other virtual models in the current AR environment, and whether collision between each virtual joint model in the virtual hand model and the virtual model that can be manipulated occurs is calculated in real time according to the Bounding Box. Since the virtual joint model is a regular convex polyhedron (cylinder), a cylindrical bounding box consistent with the geometrical parameters of the virtual joint model is generated, and other virtual models generate an approximate convex polyhedron bounding box according to the shape.
If a collision occurs between a virtual joint model and a virtual model, the specific contact position and penetration direction are calculated with the three-dimensional GJK (Gilbert-Johnson-Keerthi) algorithm and the EPA (Expanding Polytope Algorithm), and step 4 is executed.
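A sketch of an OBB overlap test using the separating-axis theorem follows, as one standard way to realize the bounding-box broad phase described above; the GJK/EPA narrow phase is not reproduced here, and the box representation (center, half-extents, axis matrix) is an assumption.

```python
import numpy as np

def obb_overlap(c_a, e_a, R_a, c_b, e_b, R_b, eps=1e-9):
    """Separating-axis test for two oriented bounding boxes.
    c_*: centres (3,); e_*: half extents (3,); R_*: 3x3 matrices whose columns
    are the box axes. Returns True if the boxes overlap."""
    t = c_b - c_a
    # candidate separating axes: 3 axes of A, 3 axes of B, 9 cross products
    axes = [R_a[:, i] for i in range(3)] + [R_b[:, i] for i in range(3)]
    for i in range(3):
        for j in range(3):
            cross = np.cross(R_a[:, i], R_b[:, j])
            if np.linalg.norm(cross) > eps:       # skip near-parallel axis pairs
                axes.append(cross)
    for axis in axes:
        axis = axis / np.linalg.norm(axis)
        r_a = sum(e_a[i] * abs(np.dot(R_a[:, i], axis)) for i in range(3))
        r_b = sum(e_b[i] * abs(np.dot(R_b[:, i], axis)) for i in range(3))
        if abs(np.dot(t, axis)) > r_a + r_b:      # found a separating axis
            return False
    return True
```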
Step 4: if contact between the hands and another virtual model is detected, calculate according to the gripping intention recognition algorithm whether the contact points between the hands and the model can form a "gripping pair", so as to judge whether a gripping situation exists between the hands and the virtual model; if a "gripping pair" exists, the gripped virtual model is judged to be in the gripping state and the method jumps to step 5.
the 'gripping pair' is formed by a virtual hand model meeting the condition and two contact points of a gripped model, as shown in the attached figure 1. The condition of the "grip pair" is that for two contact points a, b of the virtual hand and the virtual object, the angle between the line connecting the two contact points and the normal of the respective contact surface does not exceed a fixed angle α, and the two contact points will form a stable grip pair g (a, b). The fixed angle α is the friction angle, i.e. the grip pair g (a, b) is determined as follows:
Figure BDA0003835225540000091
wherein n is a And n b The normal vectors of the contact point a and the contact point b are normal vectors of the cylindrical surface of the joint virtual model at the contact point; l ab Is a connecting line of the contact points a and b; α is a friction angle, and in the present embodiment, in order to ensure that the user can grip a virtual object with a large outline in the AR environment, the friction angle is set to a large fixed value α =75 °.
The grasping intention recognition algorithm is constructed according to a grasping pair condition, all current contact points are circulated, whether a pair of grasping pairs can be formed by the current contact points and another contact point is judged, and when at least one pair of grasping pairs exists between the virtual hand and the virtual object, the virtual object is judged to be successfully grasped.
Step 5: if the virtual model is in the gripping state, calculate the virtual force or torque acting on the manipulated virtual model with the manipulation intention recognition algorithm, based on the position and posture change trend of the two-hand grasping center on the manipulated virtual model, and let this virtual force or torque drive the movement or rotation of the virtual model.
the grabbing center is a central point representing the motion of the whole hand, the whole hand is regarded as a complete rigid body, and the position, the posture and the speed of the grabbing center represent the motion parameters of the whole virtual hand.
The grasping center is acquired as follows: determine the positions and number of "gripping pairs" according to the "gripping pair" condition constructed in step four. A "gripping pair" is treated as one unified rigid body whose position and posture are represented by the grasping center. If exactly one "gripping pair" exists, the grasping center is the midpoint of the line connecting the contact points forming the pair, and the grasping-center position and posture are calculated as follows:

P_c = (p_1 + p_2) / 2   (7)

(Equation (8), image not reproduced: the three Euler angles w_c, r_c and l_c of the grasping center are obtained from the orientation of the contact-point connecting line relative to the unit vectors x̂, ŷ and ẑ of the current coordinate system.)

where P_c denotes the grasping-center position, p_1 and p_2 denote the positions of the contact points constituting the "gripping pair", w_c, r_c and l_c denote the three Euler angle parameters of the grasping center, and x̂, ŷ and ẑ denote the unit vectors along the x, y and z axes of the current coordinate system.
If several "gripping pairs" exist, they are compared by the length of the line connecting their contact points; the "gripping pair" with the longest connecting line is taken as the main gripping pair, and the grasping center is constructed from it according to formulas (7) and (8).
In this embodiment, specifically, step 5 may include the following steps:
step 5.1, judging whether the 'grasping pair' meets a cancellation condition, if so, regarding that the user puts down the manipulated virtual model, not executing the subsequent steps, and not updating the position and the posture of the virtual model in the subsequent frames; if not, executing step 5.2;
the "grip sub" cancellation condition is calculated as follows:
Figure BDA0003835225540000104
wherein the content of the first and second substances,
Figure BDA0003835225540000105
the distance between the two contact points constituting the "grip pair" for the current frame (i-th frame),
Figure BDA0003835225540000106
the distance between two contact points constituting the "grip pair" for the i-1 th frame, k is a fixed value, set to 3mm in this example. That is, when the two contact points constituting the "grasping pair" are separated between the two frames and the separation degree satisfies a certain threshold, it is considered that the grasping is cancelled.
Step 5.2: obtain the current grasping center with the grasping-center acquisition method described above. Calculate the virtual force or torque exerted on the virtual model by the two-hand grasping center according to the manipulation intention recognition algorithm and the pose-change trend of the grasping center, then continue with step 5.3.
The manipulation intention recognition algorithm is used for calculating virtual force or virtual moment applied by the two hands of the current frame to the virtual model based on the pose transformation trend of the grabbing center, and calculating the moving and rotating parameters of the virtual model according to the virtual force or the virtual moment, wherein the parameters comprise the moving direction and distance and the rotating direction and angle.
The manipulation intention recognition algorithm is built on virtual linear and torsional spring-damper models. Its calculation formulas are as follows:

(Equations (11) and (12), images not reproduced.) Equation (11) computes the virtual manipulation force with the linear spring-damper model, driven by the change of the grasping-center position between frames i and i+1 and damped by the linear velocity of the virtual part; equation (12) computes the virtual manipulation torque with the torsional spring-damper model, driven by the change of the grasping-center orientation and damped by the angular velocity of the virtual part.

Here the pose of the two-hand contact center point in the ith (current) frame is written (q_l^i, q_o^i), and in frame i+1 it is written (q_l^(i+1), q_o^(i+1)), where q_l^i is the three-dimensional position of the hands in the ith frame and q_o^i is the quaternion describing the hand orientation; v^i and ω^i are the linear and angular velocities of the virtual part at frame i. K_sl (K_so) and K_Dl (K_Do) are the coefficients of the linear (torsional) spring-damper models, adjusted through experience and dedicated test experiments to achieve stable and smooth dynamic motion of the virtual part.
Step 5.3: from the virtual force or torque computed by the manipulation intention recognition algorithm in step 5.2, calculate the displacement variation and rotation variation of the virtual model using rigid-body dynamics. Update the position and pose of the manipulated virtual model in the current frame from the displacement and rotation variations, render the virtual model at the new position and pose, and continue with step 5.1 in the next frame.
The displacement variation is calculated as follows:

(Equations (13) and (14), images not reproduced.) Equation (13) computes the displacement S_i of the manipulated virtual model in the current ith frame from its velocity v_i, the time difference Δt between the current ith frame and the next (i+1)th frame, the virtual manipulation force f_vf identified by the manipulation intention recognition algorithm, and the mass m of the manipulated virtual model, set to a fixed value of 10 in this example. Equation (14) assembles the corresponding displacement matrix ΔT_i of the virtual model, with Z, Y and X denoting the axes of the augmented reality environment coordinate system.
The rotation variation is calculated as follows:

(Equation (15), image not reproduced.) Equation (15) computes the rotation angle θ_i of the manipulated virtual model in the current ith frame from the virtual manipulation torque τ_vf identified by the manipulation intention recognition algorithm, the time difference Δt between the current ith frame and the next (i+1)th frame, and the moment of inertia J of the manipulated virtual model.

ΔR_i = R_z(θ_iz) · R_y(θ_iy) · R_x(θ_ix)   (16)

where ΔR_i is the rotation matrix of the virtual model and θ_iz, θ_iy and θ_ix are the components of the rotation angle θ_i about the z, y and x axes of the augmented reality environment coordinate system.
Step 6: repeat steps 1 to 5. With the virtual hand model, the gripping intention recognition method and the manipulation intention recognition method, three-dimensional gesture manipulation is carried out in the augmented reality environment: the user's natural gripping intention toward the three-dimensional virtual model is recognized efficiently and accurately, the user is supported in moving and rotating the virtual object with natural gestures, the robustness of the three-dimensional gesture interaction process is improved, the gesture interaction experience becomes more intuitive and natural, the virtual effect of three-dimensional gesture manipulation in augmented reality is further improved, and the user's sense of immersion is increased.
Further, referring to fig. 3, this embodiment also provides a gesture interaction system in an augmented reality environment, which comprises a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture interaction intention recognition module.
The data acquisition and processing module is used for acquiring RGB images and depth images of the current frame and acquiring key node positions and posture data of both hands based on the convolutional neural network two-hand pose estimation algorithm. The module calls a depth camera and an RGB camera to obtain a depth image and an RGB image, applies a convolutional neural network, estimates the position and the attitude information of a key node of a hand of a current frame based on the RGB image and the depth image and transmits the position and the attitude information to the virtual hand module.
And the virtual hand module updates the positions and the postures of all virtual joint models forming the virtual hand under an AR equipment coordinate system according to the positions and the postures of the key nodes of the hand of the current frame acquired by the data processing module.
The collision calculation module is used for detecting whether the virtual hand model is in contact with the virtual model to be manipulated in the AR environment or not, maintaining the OBB bounding box of the virtual hand joint model and the AR model in each frame, calculating collision and contact between the virtual hand model and other models in the AR environment, and calculating the contact position and direction according to the GJK algorithm after the contact.
The gesture interaction intention recognition module comprises a gripping intention recognition submodule and a manipulation intention recognition submodule. The gripping intention recognition submodule constructs the "gripping pair" condition from the physical characteristics of real object grasping and from the augmented reality environment, builds the gripping intention recognition algorithm on the "gripping pair", and recognizes whether the two hands grasp or release the virtual model. If the collision calculation module detects that the two hands contact another virtual model to be manipulated, the gripping intention recognition algorithm calculates whether a "gripping pair" can be formed among the contact points between the hands and the manipulated model, so as to judge whether a gripping situation exists between the two hands and the virtual model. A "gripping pair" consists of two contact points; if at least one "gripping pair" exists, the gripped virtual model is judged to be in the gripping state. Deciding whether a grasp is complete therefore does not require contact computations over many contact points, which makes gripping-intention judgment more flexible, closer to real three-dimensional gesture manipulation, better suited to complex gesture interaction scenes and more consistent with the user's intuitive interaction feeling. Moreover, if several pairs of "gripping pairs" exist, all contact points forming them participate in interaction intention recognition, improving the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition.
The manipulation intention recognition submodule recognizes the user's manipulation intention toward the manipulated virtual model, including movement and rotation. After the gripping intention recognition submodule recognizes the gripping intention, the manipulation intention recognition submodule is called: it recognizes the intention toward the manipulated virtual model with the manipulation intention recognition algorithm, expresses the result as a virtual driving force and torque, computes the displacement variation and rotation variation of the manipulated virtual model from this force or torque, updates the motion state of the manipulated model and drives it. Compared with predicting the model displacement simply from the displacement of the two hands, this virtual-force-based manipulation intention recognition matches the physical motion process better and subsequently achieves more accurate manipulation.
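To show how the four modules cooperate each frame, the following schematic main loop sketches the data flow described above; the module interfaces (method names, return types) are hypothetical and only illustrate the order of operations, not an API defined by the patent.

```python
def run_frame(acquisition, virtual_hand, collision, intent, scene_models, state):
    """One frame of the gesture-interaction pipeline (schematic).
    `state` carries the gripping pairs and grasping-center pose of the previous frame."""
    rgb, depth = acquisition.capture()                       # data acquisition and processing module
    key_nodes = acquisition.estimate_hand_pose(rgb, depth)   # CNN two-hand pose estimation
    virtual_hand.update(key_nodes)                           # map key nodes onto the virtual hand model

    contacts = collision.detect(virtual_hand, scene_models)  # collision calculation module
    grip_pairs = intent.detect_grip_pairs(contacts)          # gripping-intention recognition

    if grip_pairs and not intent.grip_cancelled(grip_pairs, state):
        center = intent.grasp_center(grip_pairs)             # grasping center of the main pair
        f_vf, tau_vf = intent.virtual_force_torque(center, state)  # manipulation-intention recognition
        for model in intent.gripped_models(contacts):
            model.apply_virtual_force(f_vf, tau_vf)          # drive movement / rotation of the model
        state.update(grip_pairs, center)
    else:
        state.clear()                                        # model is released; keep its last pose
    return state
```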
The above detailed description is further intended to illustrate the objects, technical solutions and advantages of the present invention, and it should be understood that the above detailed description is only an example of the present invention and should not be used to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A virtual model three-dimensional gesture control method in an augmented reality environment, characterized by comprising the following steps:
the method comprises the following steps: acquiring images of both hands of a current frame, and determining the position and posture data of key nodes of both hands relative to AR equipment based on a convolutional neural network two-hand pose estimation algorithm; the convolutional neural network algorithm is an algorithm for estimating the pose of two hands from 2D to 3D, which consists of two convolutional neural networks; the first convolutional neural network is trained to realize the positioning of the hand, and the 2D position of the center of the hand in the image is estimated; then the localized image of the hand position is used for generating a normalized cutting image together with the corresponding input depth value, and the image is transmitted to a second convolution neural network to return the relative 3D hand joint position in real time;
step two: superposing a virtual hand model at the key nodes of the two hands identified by the current frame, and determining the position and the posture of the virtual hand model according to the position and the posture of the key nodes to realize the mapping of the real two hands in a virtual space;
step three: calculating whether contact or collision occurs between the virtual hand model and other virtual models to be manipulated in real time in each frame based on a collision detection algorithm;
step four: establishing a 'grabbing pair' condition according to the characteristics of a real object grabbing physical process and an augmented reality environment, and establishing a grabbing intention recognition algorithm based on the 'grabbing pair'; if the third step is that the collision detection algorithm detects that the two hands contact other virtual models, a 'gripping pair' is calculated between the hands and the multiple contact points of the manipulated model according to the gripping intention recognition algorithm, whether the gripping condition exists between the two hands and the virtual model is judged, if more than one 'gripping pair' exists, the gripped virtual model is judged to be in a gripping state, whether gripping is finished or not is judged without contact calculation based on the multiple contact points, so that the gripping intention judgment is more flexible, the gripping condition is closer to the real three-dimensional gesture manipulation condition, the complex gesture interaction scene is more suitable, the intuitive interaction feeling of the user is more met, and meanwhile, if more pairs of 'gripping pairs' exist, the multiple contact points forming the gripping pair all participate in the interaction intention recognition, and the robustness, flexibility, efficiency and immersion of the gesture interaction intention recognition are improved;
step five: constructing a grabbing center acquisition method according to the 'grabbing pair' condition constructed in the step four so as to acquire a grabbing center; if the virtual model is judged to be in the gripping state based on the gripping intention recognition algorithm in the fourth step, calculating virtual force or moment exerted on the virtual model by the two hands based on the displacement and posture transformation of the gripping center of the two hands on the manipulated model according to the manipulation intention recognition algorithm, and driving the virtual model to move or rotate by the virtual force or moment; after a manipulation intention identification algorithm is adopted and a grasping center judgment condition is added, all contact points participate in the manipulation intention identification process, so that the manipulation intention is more flexibly identified, and the robustness of the manipulation intention identification is improved;
step six: and repeating the first step to the fifth step, and according to the virtual hand model, the gripping intention recognition method and the manipulation intention recognition method, performing three-dimensional gesture manipulation in the augmented reality environment, efficiently and accurately recognizing the natural gripping intention of the user for the three-dimensional virtual model, supporting the user to move and rotate the virtual object by the natural gesture, improving the robustness of the three-dimensional gesture interaction process, enabling the gesture interaction experience to be more visual and natural, further improving the virtual effect of the three-dimensional gesture manipulation in the augmented reality environment, and improving the immersion feeling of the user.
2. The method for manipulating the three-dimensional gestures of the virtual model in the augmented reality environment according to claim 1, further comprising: the virtual hand model consists of a plurality of virtual joint models, each virtual joint model is a cylinder and approximately simulates finger joints of real hands; a topological relation exists between the virtual hand joint models, namely the high-level virtual joint model comprises a low-level virtual joint model, and the low-level virtual joint model is driven to move when the high-level virtual joint model moves; the virtual hand model is represented by the following parameterization:
Joint_i = {p_i, e_i, size_i, children_i}, with p_i = (x_i, y_i, z_i), e_i = (w_i, r_i, l_i), size_i = (l_i, d_i), children_i = {J_k}

wherein Joint_i is the ith virtual joint model; p_i is the position of the virtual joint model, expressed by the vector (x_i, y_i, z_i) in the augmented reality environment coordinate system; e_i is the posture of the virtual joint model, expressed by the vector (w_i, r_i, l_i) in the same coordinate system; size_i holds the geometric parameters of the joint model, with l_i the length of the cylinder and d_i its diameter; children_i is the set of driven sub-joint models of this joint model, and J_k is the kth virtual joint model;
each virtual joint model in the virtual hand model corresponds to a key node identified by the gesture tracking of step one; the position and posture data of each identified hand key node are used to update the position and posture of the corresponding virtual joint model in the current frame, so that the pose of the virtual hand model is determined from the key nodes and the real hands are mapped into the virtual space, according to the following formulas:
$$p_i = T \cdot P_i,\qquad r_i = T \cdot R_i \tag{4}$$
$$r_i = R_z(w_i)\,R_y(r_i)\,R_x(l_i) \tag{5}$$
wherein $p_i$ is the position vector of the ith virtual joint model and $r_i$ its rotation matrix; the rotation matrix is related to the Euler angles as shown in formula (5), where $R_z(w_i)$ denotes a rotation of $w_i$ degrees about the z axis; $T$ is the transformation matrix between the camera coordinate system and the augmented reality environment coordinate system in which the virtual joint model is located; $P_i$ is the position vector, and $R_i$ the rotation matrix, of the key node corresponding to the ith virtual joint model.
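To make the parameterization of the joint model and the update of formulas (4) and (5) concrete, the following is a minimal sketch of the virtual joint structure and its per-frame update from tracked key nodes. All class, function and parameter names are illustrative rather than identifiers from the patent, and the Euler-angle recovery order is an assumption.

```python
import numpy as np
from dataclasses import dataclass, field

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

@dataclass
class VirtualJoint:
    """One cylindrical joint of the virtual hand (claim 2)."""
    position: np.ndarray                      # P_i in AR-environment coordinates
    euler: np.ndarray                         # E_i = (w_i, r_i, l_i)
    length: float                             # cylinder length l_i
    diameter: float                           # cylinder diameter d_i
    children: list = field(default_factory=list)  # driven child joint models

    def rotation(self) -> np.ndarray:
        # formula (5): r_i = R_z(w_i) R_y(r_i) R_x(l_i)
        w, r, l = self.euler
        return rot_z(w) @ rot_y(r) @ rot_x(l)

def update_joint(joint: VirtualJoint, node_pos, node_rot, T):
    """Map one tracked hand key node into the AR coordinate system (formula (4)).

    T is the 4x4 camera-to-AR-environment transform; node_pos / node_rot are the
    key-node position and rotation matrix from the two-hand pose estimation step."""
    joint.position = (T @ np.append(node_pos, 1.0))[:3]
    r = T[:3, :3] @ node_rot
    # recover (w_i, r_i, l_i) from the rotation matrix, assuming z-y-x composition
    joint.euler = np.array([np.arctan2(r[1, 0], r[0, 0]),
                            np.arcsin(-r[2, 0]),
                            np.arctan2(r[2, 1], r[2, 2])])
```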
3. The method for manipulating three-dimensional gestures of a virtual model in an augmented reality environment according to claim 2, further comprising: the physical process of gripping a real object is characterized by applying the basic laws of Newtonian rigid-body mechanics, judging whether the object can be gripped from the force balance of the gripped object and the friction between the virtual hand model and the contact surfaces of the manipulated model; the force state of the object is analyzed with a simplified Coulomb friction model;
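As background for the fixed angle α used below (this relation is standard for Coulomb friction, whereas the claim itself only states that α is set per manipulated model by testing): for a Coulomb friction coefficient $\mu$, the half-angle of the friction cone is

$$\alpha = \arctan\mu, \qquad \text{e.g. } \mu = 0.5 \;\Rightarrow\; \alpha \approx 26.6^{\circ}.$$

A contact force lying inside this cone can be resisted by friction alone, which is why two contact points whose connecting line stays inside both friction cones can hold the object stably.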
the 'gripping pair' is formed by two contact points between a virtual hand model satisfying the conditions and the gripped model; the 'gripping pair' condition is as follows: if the angle between the line connecting the two contact points and the normal of each respective contact surface does not exceed a fixed angle α, the two contact points form a stable gripping pair g(a, b); the fixed angle α is the friction angle;
the gripping intention recognition algorithm is constructed on the basis of the 'gripping pair' condition, and each current contact point is cyclically checked for whether it forms a gripping pair with another contact point; in one cycle of judgment, for any two contact points a and b between the virtual hand and the virtual object, if the angle between their connecting line and the normal of each respective contact surface does not exceed the fixed angle α, the two contact points form a stable gripping pair g(a, b); this fixed angle α is the friction angle, i.e. the gripping pair g(a, b) must satisfy
$$\arccos\frac{\lvert n_a \cdot l_{ab}\rvert}{\lVert n_a\rVert\,\lVert l_{ab}\rVert} \le \alpha \quad\text{and}\quad \arccos\frac{\lvert n_b \cdot l_{ab}\rvert}{\lVert n_b\rVert\,\lVert l_{ab}\rVert} \le \alpha \tag{6}$$
wherein $n_a$ and $n_b$ are the normal vectors at contact points a and b, each normal vector being the normal of the cylindrical surface of the virtual joint model at the contact point; $l_{ab}$ is the line connecting contact points a and b; α is the friction angle, whose value is set by testing for a given manipulated model so as to achieve stable and natural gripping of the virtual part.
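The gripping-pair check of formula (6) and the cyclic pairing of contact points described in the claim could be sketched as follows; the function names and the example value of α are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def forms_grip_pair(p_a, n_a, p_b, n_b, alpha_deg=30.0):
    """Formula (6): the line between two contact points must lie within the friction
    angle alpha of both contact normals; alpha_deg = 30 is an illustrative value,
    the patent tunes alpha per manipulated model."""
    l_ab = p_b - p_a
    l_ab = l_ab / np.linalg.norm(l_ab)
    cos_alpha = np.cos(np.radians(alpha_deg))
    within_a = abs(np.dot(n_a / np.linalg.norm(n_a), l_ab)) >= cos_alpha
    within_b = abs(np.dot(n_b / np.linalg.norm(n_b), l_ab)) >= cos_alpha
    return within_a and within_b

def find_grip_pairs(contacts, alpha_deg=30.0):
    """contacts: list of (position, normal) pairs for every hand/model contact point.
    Returns all stable gripping pairs g(a, b); one or more pairs means a grip exists."""
    return [(a, b) for a, b in combinations(contacts, 2)
            if forms_grip_pair(a[0], a[1], b[0], b[1], alpha_deg)]
```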
4. The method for manipulating three-dimensional gestures of a virtual model in an augmented reality environment according to claim 3, further comprising: the grasping center is a central point representing the motion of the whole hand; the whole hand is regarded as a single rigid body, and the position, posture and velocity of the grasping center represent the motion parameters of the whole virtual hand;
the grasping center is determined as follows: the positions and the number of 'gripping pairs' are obtained according to the 'gripping pair' condition constructed in step four; each 'gripping pair' is regarded as a single rigid body whose position and posture are represented by the grasping center; if one 'gripping pair' exists, the grasping center is the midpoint of the line connecting the contact points forming the 'gripping pair', and the grasping center position and posture are calculated as:
$$P_c = \frac{p_1 + p_2}{2} \tag{7}$$

$$(w_c,\ r_c,\ l_c) = \bigl(\angle(p_2 - p_1,\ \hat{x}),\ \angle(p_2 - p_1,\ \hat{y}),\ \angle(p_2 - p_1,\ \hat{z})\bigr) \tag{8}$$

wherein $P_c$ denotes the grasping center position, $p_1$ and $p_2$ denote the positions of the contact points constituting the 'grasping pair', $w_c$, $r_c$ and $l_c$ denote the three Euler angle parameters of the grasping center, and $\hat{x}$, $\hat{y}$ and $\hat{z}$ are unit vectors along the x, y and z axes of the current coordinate system;
if several 'grasping pairs' exist, they are compared by the length of the line connecting their contact points: the 'grasping pair' with the longest connecting line is taken as the main grasping pair, and the grasping center is constructed from it according to formulas (7) and (8);
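A minimal sketch of the grasp-center computation of formulas (7) and (8), reusing the grip-pair representation sketched above; reading formula (8) as the angles between the contact line and the coordinate axes is an assumption of this sketch.

```python
import numpy as np

def grasp_center(grip_pairs):
    """grip_pairs: list of ((p1, n1), (p2, n2)) contact-point pairs.
    Selects the main grasping pair (longest contact line), returns its center
    position P_c (formula (7)) and the Euler-angle parameters (w_c, r_c, l_c)
    taken as angles between the contact line and the x, y, z axes (formula (8))."""
    (p1, _), (p2, _) = max(grip_pairs,
                           key=lambda gp: np.linalg.norm(gp[1][0] - gp[0][0]))
    p_c = 0.5 * (p1 + p2)
    l12 = (p2 - p1) / np.linalg.norm(p2 - p1)
    w_c, r_c, l_c = (float(np.arccos(np.clip(np.dot(l12, axis), -1.0, 1.0)))
                     for axis in np.eye(3))
    return p_c, (w_c, r_c, l_c)
```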
step 5.1, judging whether the 'grasping pair' meets the 'grasping pair' cancellation condition; if it does, the user is regarded as having put down the manipulated virtual model, the subsequent steps are not executed, and the position and posture of the virtual model are not updated in the next frame; if it does not, step 5.2 is executed;
the "grip pair" cancellation condition is calculated as follows:
$$d_{ab}^{\,i} - d_{ab}^{\,i-1} > k \tag{9}$$

wherein $d_{ab}^{\,i}$ is the distance between the two contact points constituting the 'grip pair' in the current ith frame, $d_{ab}^{\,i-1}$ is the distance between those two contact points in the (i-1)th frame, and k is a fixed threshold; that is, when the two contact points forming the 'grabbing pair' move apart between two frames by more than this threshold, the grip is regarded as cancelled;
step 5.2, calculating the virtual force or torque exerted on the virtual model by the two hands according to the manipulation intention recognition algorithm, and then continuing with step 5.3; the manipulation intention recognition algorithm calculates, from the pose-change trend of the grasping center, the virtual force or virtual torque applied by the two hands of the current frame to the virtual model, and from this virtual force or torque derives the movement and rotation parameters of the virtual model, namely the movement direction and distance and the rotation direction and angle; with the manipulation intention recognition algorithm and the added 'grasping center' judgment, all contact points participate in the manipulation intention recognition process, so that manipulation intention recognition is more flexible and more robust;
the manipulation intention recognition algorithm is constructed on the basis of a virtual linear and torsional spring-damper model; its calculation formulas are as follows:
$$f_{vf} = K_{sl}\,\bigl(q_l^{\,i+1} - q_l^{\,i}\bigr) - K_{Dl}\,v^{\,i} \tag{11}$$

$$\tau_{vf} = K_{so}\,\theta\bigl(q_o^{\,i+1} \otimes (q_o^{\,i})^{-1}\bigr) - K_{Do}\,\omega^{\,i} \tag{12}$$

formula (11) gives the virtual manipulation force $f_{vf}$ and formula (12) the virtual manipulation torque $\tau_{vf}$; the pose of the two-hand contact center point in the current ith frame is denoted $(q_l^{\,i}, q_o^{\,i})$ and in the (i+1)th frame $(q_l^{\,i+1}, q_o^{\,i+1})$, where $q_l^{\,i}$ is the three-dimensional position of the hand in the ith frame and $q_o^{\,i}$ is a quaternion describing the hand orientation; $\otimes$ denotes quaternion multiplication and $\theta(\cdot)$ the rotation vector of the relative quaternion; $v^{\,i}$ and $\omega^{\,i}$ are the linear and angular velocities of the manipulated virtual model in the ith frame; $K_{sl}$ ($K_{so}$) and $K_{Dl}$ ($K_{Do}$) are the coefficients of the linear and torsional spring-damper models, tuned for a specific manipulated virtual model so that the virtual part moves stably and smoothly and the interaction matches the user's intuitive feeling;
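A sketch of the spring-damper force/torque evaluation of formulas (11) and (12), under the reconstruction above; the gain values are illustrative, and quaternion handling uses SciPy's Rotation class.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def manipulation_force_torque(q_l_i, q_o_i, q_l_next, q_o_next, v_model, w_model,
                              K_sl=50.0, K_Dl=5.0, K_so=2.0, K_Do=0.2):
    """q_l_*: grasp-center positions in frames i and i+1; q_o_*: grasp-center
    quaternions (x, y, z, w); v_model / w_model: linear and angular velocity of
    the manipulated model. Returns (f_vf, tau_vf). All gains are illustrative."""
    # linear spring on the grasp-center displacement, damped by the model velocity
    f_vf = K_sl * (np.asarray(q_l_next) - np.asarray(q_l_i)) - K_Dl * np.asarray(v_model)
    # torsional spring on the relative rotation, damped by the model angular velocity
    dq = R.from_quat(q_o_next) * R.from_quat(q_o_i).inv()
    tau_vf = K_so * dq.as_rotvec() - K_Do * np.asarray(w_model)
    return f_vf, tau_vf
```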
step 5.3, according to the virtual force or torque calculated by the manipulation intention recognition algorithm in step 5.2, calculating the displacement variation and rotation variation of the virtual model using rigid-body dynamics; updating the position and posture of the manipulated virtual model in the current frame according to the displacement and rotation variations, and rendering the virtual model at the new position and posture;
the displacement variation calculation formula is as follows:
$$S_i = v_i\,\Delta t + \frac{f_{vf}}{2m}\,\Delta t^2 \tag{13}$$

$$\Delta T_i = \begin{bmatrix} 1 & 0 & 0 & S_{i,X} \\ 0 & 1 & 0 & S_{i,Y} \\ 0 & 0 & 1 & S_{i,Z} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{14}$$

wherein $S_i$ denotes the displacement of the manipulated virtual model at the current ith frame, $v_i$ its velocity at the current ith frame, $\Delta t$ the time difference between the current ith frame and the next (i+1)th frame, $f_{vf}$ the virtual manipulation force identified by the manipulation intention recognition algorithm, and m the mass of the manipulated virtual model; $\Delta T_i$ is the displacement matrix of the virtual model, and X, Y and Z denote the axes of the augmented reality environment coordinate system;
the rotation variation calculation formula is as follows:
$$\theta_i = \omega_i\,\Delta t + \frac{\tau_{vf}}{2J}\,\Delta t^2 \tag{15}$$

$$\Delta R_i = R_z(\theta_{iz})\,R_y(\theta_{iy})\,R_x(\theta_{ix}) \tag{16}$$

wherein $\theta_i$ denotes the rotation angle of the manipulated virtual model at the current ith frame, $\omega_i$ its angular velocity at the ith frame, $\tau_{vf}$ the virtual manipulation torque identified by the manipulation intention recognition algorithm, $\Delta t$ the time difference between the current ith frame and the next (i+1)th frame, and J the moment of inertia of the manipulated virtual model; $\Delta R_i$ is the rotation matrix of the virtual model, and $\theta_{iz}$, $\theta_{iy}$ and $\theta_{ix}$ denote the components of the rotation angle $\theta_i$ about the z, y and x axes of the augmented reality environment coordinate system.
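A compact sketch of the per-frame rigid-body update of formulas (13) to (16); treating the moment of inertia as a scalar and integrating the velocities at the end of the frame are simplifying assumptions of this sketch.

```python
import numpy as np

def _rot(axis, a):
    c, s = np.cos(a), np.sin(a)
    if axis == 'x':
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 'y':
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # 'z'

def update_model_pose(pos, rot, v, w, f_vf, tau_vf, m, J, dt):
    """pos: position, rot: 3x3 rotation, v/w: linear/angular velocity of the model."""
    # displacement over the frame (formula (13)), applied as a translation (formula (14))
    S = v * dt + (f_vf / (2.0 * m)) * dt * dt
    pos = pos + S
    # rotation angle components over the frame (formula (15)), composed per formula (16)
    theta = w * dt + (tau_vf / (2.0 * J)) * dt * dt
    dR = _rot('z', theta[2]) @ _rot('y', theta[1]) @ _rot('x', theta[0])
    rot = dR @ rot
    # integrate velocities so the damping terms of formulas (11)-(12) see the new state
    v = v + (f_vf / m) * dt
    w = w + (tau_vf / J) * dt
    return pos, rot, v, w
```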
5. A gesture interaction system for implementing the virtual model three-dimensional gesture manipulation method in the augmented reality environment according to claim 1, 2, 3 or 4, wherein: the system comprises a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture intention recognition module;
the data acquisition and processing module is used for acquiring the RGB image and the depth image of the current frame and, using a convolutional-neural-network-based two-hand pose estimation algorithm, obtaining the position and posture information of the hand key nodes of the current frame from the RGB image and the depth image;
the virtual hand module is used for overlaying the virtual hand model, determining the position and posture of the virtual hand model according to the positions and postures of the key nodes and thereby mapping the real hands into the virtual space; in the AR device coordinate system, it updates and maintains the position and posture of the virtual hand model in real time according to the hand key node positions and postures acquired by the data acquisition and processing module;
the collision calculation module is used for calculating in real time, in each frame, whether the virtual hand model contacts or collides with the other virtual models to be manipulated, based on a collision detection algorithm;
the gesture intention recognition module comprises a gripping intention recognition submodule and a manipulation intention recognition submodule; the gripping intention recognition submodule constructs the 'gripping pair' condition according to the physical characteristics of real-object gripping and the augmented reality environment, constructs the gripping intention recognition algorithm based on the 'gripping pair', and recognizes whether the virtual model is gripped or released by the two hands; if the collision detection module detects that the two hands are in contact with other virtual models, whether 'gripping pairs' (each composed of two contact points) can be formed among the contact points between the hands and the manipulated model is calculated according to the gripping intention recognition algorithm, so as to judge whether a gripping condition exists between the two hands and the virtual model; if at least one 'gripping pair' exists, the grasped virtual model is judged to be in a gripping state; because gripping is judged without performing contact-force calculation over the contact points, the gripping intention judgment is more flexible, closer to real three-dimensional gesture manipulation, better suited to complex gesture interaction scenes and more consistent with the user's intuitive interaction feeling; meanwhile, if several 'gripping pairs' exist, all the contact points forming them participate in interaction intention recognition, which improves the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition;
the manipulation intention recognition submodule is used for recognizing the user's manipulation intention toward the manipulated virtual model, including movement and rotation; after the gripping intention recognition submodule recognizes a gripping intention, the manipulation intention recognition submodule is invoked, recognizes the intention for the manipulated virtual model based on the manipulation intention recognition algorithm, expresses that intention as a virtual driving force and torque, calculates the displacement and rotation variations of the manipulated virtual model from that force or torque, updates the motion state of the manipulated model and thereby drives it; compared with predicting the model displacement directly from the displacement of the two hands, this virtual-force-based manipulation intention recognition better matches the physical motion process and achieves more accurate manipulation.
CN202211099115.1A 2022-09-06 2022-09-06 Virtual model three-dimensional gesture control method and system in augmented reality environment Pending CN115686193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211099115.1A CN115686193A (en) 2022-09-06 2022-09-06 Virtual model three-dimensional gesture control method and system in augmented reality environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211099115.1A CN115686193A (en) 2022-09-06 2022-09-06 Virtual model three-dimensional gesture control method and system in augmented reality environment

Publications (1)

Publication Number Publication Date
CN115686193A true CN115686193A (en) 2023-02-03

Family

ID=85062758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211099115.1A Pending CN115686193A (en) 2022-09-06 2022-09-06 Virtual model three-dimensional gesture control method and system in augmented reality environment

Country Status (1)

Country Link
CN (1) CN115686193A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204945A (en) * 2023-04-28 2023-06-02 北京求解科技有限公司 Object observation method, device, terminal and medium based on three-dimensional observation body
CN116301481A (en) * 2023-05-12 2023-06-23 北京天图万境科技有限公司 Multi-multiplexing visual bearing interaction method and device
CN118097564A (en) * 2024-04-19 2024-05-28 南京国电南自轨道交通工程有限公司 Subway scene image sample simulation construction method based on virtual reality technology

Similar Documents

Publication Publication Date Title
Qian et al. Developing a gesture based remote human-robot interaction system using kinect
CN115686193A (en) Virtual model three-dimensional gesture control method and system in augmented reality environment
CN110570455B (en) Whole body three-dimensional posture tracking method for room VR
CN111443619B (en) Virtual-real fused human-computer cooperation simulation method and system
Zhao et al. Robust realtime physics-based motion control for human grasping
JP5835926B2 (en) Information processing apparatus, information processing apparatus control method, and program
Li et al. Survey on mapping human hand motion to robotic hands for teleoperation
CN106406518A (en) Gesture control device and gesture recognition method
Zhang et al. A real-time upper-body robot imitation system
CN111158476B (en) Key recognition method, system, equipment and storage medium of virtual keyboard
Han et al. Vr-handnet: A visually and physically plausible hand manipulation system in virtual reality
Xiong et al. Predictive display and interaction of telerobots based on augmented reality
Ogawara et al. Acquiring hand-action models in task and behavior levels by a learning robot through observing human demonstrations
Du et al. A novel natural mobile human-machine interaction method with augmented reality
Liu et al. Virtual reality based tactile sensing enhancements for bilateral teleoperation system with in-hand manipulation
Lopez et al. Taichi algorithm: Human-like arm data generation applied on non-anthropomorphic robotic manipulators for demonstration
CN115481489A (en) System and method for verifying suitability of body-in-white and production line based on augmented reality
Infantino et al. A cognitive architecture for robotic hand posture learning
Jiang et al. Flexible virtual fixture enhanced by vision and haptics for unstructured environment teleoperation
Berti et al. Kalman filter for tracking robotic arms using low cost 3d vision systems
Yu et al. Real-time multitask multihuman–robot interaction based on context awareness
Jayasurya et al. Gesture controlled AI-robot using Kinect
CN113420752A (en) Three-finger gesture generation method and system based on grabbing point detection
Reis et al. Visual servoing for object manipulation with a multifingered robot hand
Che et al. Real-time 3d hand gesture based mobile interaction interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination