CN110554774B

CN110554774B - AR-oriented navigation type interactive normal form system

Info

Publication number: CN110554774B
Application number: CN201910660335.9A
Authority: CN
Inventors: 冯志全; 肖梦婷
Original assignee: University of Jinan
Current assignee: University of Jinan
Priority date: 2019-07-22
Filing date: 2019-07-22
Publication date: 2022-11-04
Anticipated expiration: 2039-07-22
Also published as: CN110554774A

Abstract

The invention provides an AR-oriented navigation type interactive system, which comprises an input and sensing module, a fusion module and an application expression module; the input and sensing module acquires hand skeleton depth information through kinect to obtain a gesture depth map and complete gesture recognition; classifying the experiment keywords through voice keyword recognition and extraction, and calculating to obtain similarity probability through similarity to complete voice recognition; the fusion module classifies user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression, determines multi-mode intentions, and sets an intention expert knowledge base; and the expression module is used for carrying out voice navigation, visual presentation and perception feedback expression on the virtual chemical experiment. The invention adopts a natural interaction technology to integrate gestures and voice multimode, explores a natural interaction mode, reduces the load of the user and brings different user experiences.

Description

AR-oriented navigation type interactive normal form system

Technical Field

The invention belongs to the technical field of experimental interaction, and particularly relates to an AR-oriented navigation type interactive paradigm system.

Background

In the existing middle school chemistry, a plurality of chemical experiments have dangerous experiment phenomena such as explosion, corrosion and the like, so that students cannot operate the experiments, the students cannot easily observe the experiments in the experiments with gas, and the middle school students only observe the experiment phenomena through textbooks or videos, so that a new experiment method is needed, the requirements of the outline of the middle school chemical experiments are met, and the provided problems are solved. The student can operate the experiment at any time, so that the chemical reagent resources are saved, and the student can really explore and discover the true theory and mechanism of the experiment.

With the development of human-computer interaction technology, augmented reality is becoming a hot research in the application field as a technology for improving the interaction capability between reality and virtual. And the gesture interaction also becomes an important interaction form in the man-machine interaction, and the combination of the gesture interaction technology and the augmented reality technology is the necessary requirement embodiment of the intelligent era. Gesture interaction technology has been widely used in various fields including educational teaching. According to researches, the traditional education teaching can not gradually meet the requirements and targets of a new period on the teaching effect, and the traditional chemical experiment has dangerous phenomena of explosion, corrosion and the like, so more and more education teaching focuses on novel virtual experiment teaching, the thinking ability of students is improved to a certain extent, the imagination ability of virtual space is enhanced, but most of the education teaching only completes experiment operation by means of a model in a single virtual scene, the sense of reality of a real environment is lacked, and the students can not be integrated into the virtual experiment teaching. In the face of the existing augmented reality technology, three-dimensional reproduction of a virtual model is mostly completed by using card identification with a label, and the fusion of a real world and a virtual scene is completed by using wearing equipment such as a data glove or a gesture identification sensor in a few cases, but the three-dimensional reproduction cannot be completely immersed in a virtual-real interaction experiment. Therefore, the use of bare-hand interaction is becoming an important form in the field of augmented reality, and it is important to combine the bare-hand gesture interaction technology with the augmented reality technology. Therefore, how to perform bare hand gesture recognition in the augmented reality technology is an important issue for application.

Disclosure of Invention

The invention provides an AR-oriented navigation type interactive normal form system, which adopts gesture recognition and voice recognition information input, performs voice navigation and visual presentation on a virtual chemical experiment by an application expression module according to the expression of a fusion module through information fusion, and performs perception force feedback expression through a force feedback part arranged on a finger.

In order to achieve the above object, the present invention provides an AR-oriented navigation interactive paradigm system, which includes an input and sensing module, a fusion module, and an application expression module;

the input and perception module acquires hand skeleton depth information through kinect image acquisition equipment to obtain a gesture depth map, gesture recognition is completed in Unity 3D by means of the gesture depth map, and real-time interaction between a gesture position and a virtual model is realized according to coordinate conversion; performing voice keyword recognition through the SDK, extracting keywords through voice input, classifying the keywords required by the experiment, calculating the similarity probability through the similarity, forming a complete voice command, and completing voice recognition;

the fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module; determining a multi-modal intention by judging the relationship between the user intention and different states; setting an intention expert knowledge base, wherein sufficient necessary conditions between user intention and user behaviors are set in the intention expert knowledge base;

the application expression module performs voice navigation and visual presentation on the virtual chemical experiment according to the expression of the fusion module, and performs perception force feedback expression through a force feedback component arranged on a finger.

Further, the method for obtaining the gesture depth map by obtaining the hand skeleton depth information through the kinect image acquisition device comprises the following steps:

s1: acquiring hand skeleton depth information by using kinect image acquisition equipment, segmenting a hand region and acquiring a gesture depth map;

s2: proportionally dividing the gesture depth map into a training set and a testing set; the proportion of the training set to the testing set is 7:3, and the gesture depth map is cut into n; the n is equal in size and is a natural number;

s3: inputting the gesture depth map of the training set into a CNN Alexnet network, and extracting gesture depth features by continuously updating weights, wherein the expression of the gesture depth features is as follows:

l is the current layer number, n is the neuron number of the previous layer, and w _ij ^m Is the connection weight between layer l neuron j and the previous layer neuron i, b _i ^m Is biased for the jth feature after l convolutional layers.

Further, the method for inputting the gesture depth maps acquired by the kinect image acquisition equipment under different frames into the gesture recognition model to complete recognition in the Unity 3D and realizing the real-time interaction between the gesture position and the virtual model according to the coordinate conversion comprises the following steps:

s4: dividing the gesture depth map into six-dimensional coordinate vectors through a softmax classifier, wherein the six-dimensional coordinate vectors respectively represent six gestures; sequentially taking the maximum probability in the vectors as a training model; inputting the test set into the training model for testing;

s5: respectively obtaining an nth frame of gesture depth map and an nth-1 th frame of gesture depth map through kinect image acquisition equipment, and obtaining coordinates S of two joint points at different moments _n (theta) and S _n-1 (θ); the theta is a three-dimensional coordinate of the depth of the hand node; judgment S _n (theta) and S _n-1 (θ) is equal, if equal, the current gesture is unchanged; if not, judging that the current gesture corresponds to the gesture recognition result in the training model again, and outputting the gesture recognition result to G ^m The m is a gesture type;

s6: according to the mapping of the hand joint point coordinates in the real space and the depth three-dimensional coordinates, determining the mapping relation between the hand joint point coordinates and the real space as follows:

the (Kinect) _X ,Kinect _Y ,Kinect _Z ) Obtaining hand joint coordinates in a real space by using a depth camera; said (U) _X ,U _Y ,U _Z ) Is the coordinate of the virtual scene in the unity environment, the W is the proportional relation corresponding to the coordinate of the real scene and the virtual scene in the field, and the (val) _X ,val _Y ,val _Z ) The corresponding relation between the human hand in the real space and the origin of the viewpoint of the virtual object.

Further, the method for speech recognition comprises:

s1: classifying the keywords required by implementation into verb vocabularies D = { m = ₁ ,m ₂ ...,m _i And attribute words S = { n = } ₁ ,n ₂ ...,n _j }；

S2: matching the set D and the set S pairwise to obtain a matched keyword library n, calculating the similarity of the matched keyword library n and the extracted keywords to obtain the probability P (S) of all the similarities of the keywords in the set, if P is P _i (s)>P _j (s); then the P is _i (s) is the probability maximum;

s3: setting a threshold value t, and expressing the semantics as S ⁿ N represents a speech keyword, and the maximum probability P is judged _i (s)，

S is ⁿ Sensing different keyword signals of a voice channel;

s4: and forming a complete voice command according to the received keywords to complete voice recognition.

Further, the method for expressing the perception force feedback through the force feedback component arranged on the finger comprises the following steps: the method comprises the following steps that a vibrator is worn on a finger, when a gesture collides with an object, the vibrator vibrates, and when a chip receives a force feedback expression signal, a user adopts force feedback expression; setting a feedback force value t; force feedback is expressed as R ^z Upon receipt of R ^z When the intention is expressed, the feedback is sent to the vibration vibrator to vibrate.

Further, the method implemented by the fusion module is as follows: for the acquired gesture recognition G ^m Speech recognition s ⁿ And force feedback expression R ^z According to G ^m ，S ⁿ ，R ^z There is an intersection, phase and independent relationship to obtain a relationship formula of

F(x)∈{(G ^m ∩S ⁿ ),(G ^m ∪S ⁿ ),(G ^m ,S ⁿ ,R ^z )}；

Establishing an expert knowledge base, matching F (x) with the expert knowledge base to obtain an intention E ⁱ 。

Further, the method for performing voice navigation and visual presentation on the virtual chemical experiment by the application expression module according to the expression of the fusion module and performing perception force feedback expression through a force feedback component arranged on a finger comprises the following steps:

in the interaction process of the gesture and the virtual object, setting the virtual object as V ^t T is the number of classes of the subject(ii) a Let the voice navigation be A ^v V is the number of categories navigated; building E from a database ⁱ And V ^t And E ⁱ And A ^v The relationship of (1);

according to intention E ⁱ If multiple V's are displayed in the scene ^t By intention then to obtain a force feedback R ^z To perform a force feedback effect R ^z (ii) a corresponding expression; according to intention E ⁱ Carry out voice navigation A ^v 。

The effect provided in the summary of the invention is only the effect of the embodiment, not all the effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:

the embodiment of the invention provides an AR-oriented navigation type interactive normal form system, which comprises an input and perception module, a fusion module and an application expression module; the input and perception module acquires hand skeleton depth information through kinect image acquisition equipment to obtain a gesture depth map, gesture recognition is completed in Unity 3D by means of the gesture depth map, and real-time interaction between a gesture position and a virtual model is realized according to coordinate conversion; performing voice keyword recognition through the SDK, extracting keywords through voice input, classifying the keywords required by the experiment, calculating the similarity probability through the similarity, forming a complete voice command, and completing voice recognition; the fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module; determining a multi-modal intention by judging the relationship between the user intention and different states; setting an intention expert knowledge base, wherein the intention expert knowledge base sets sufficient necessary conditions between the user intention and the user behavior; the application expression module performs voice navigation and visual presentation on the virtual chemical experiment according to the expression of the fusion module, and performs perception force feedback expression through a force feedback part arranged on a finger. In the invention, a virtual experimental device is operated by a real hand to simulate a chemical experiment in an augmented reality environment, a realistic drawing method is adopted in the experimental process to present an experimental phenomenon, the mechanism of the chemical experiment is reflected, a natural interaction technology is utilized to fuse gestures and voice multimode, a novel natural interaction mode is explored, the user load is reduced, and different user experiences are brought.

Drawings

FIG. 1 is a schematic diagram of an AR-oriented navigation paradigm system according to embodiment 1 of the present invention;

FIG. 2 is a system diagram of multimodal fusion proposed in embodiment 1 of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.

Example 1

The embodiment 1 of the invention provides an AR-oriented navigation type interactive paradigm system, which comprises an input and perception module, a fusion module and an application expression module;

the input and perception module acquires hand skeleton depth information through kinect image acquisition equipment to obtain a gesture depth map, gesture recognition is completed in Unity 3D by means of the gesture depth map, and real-time interaction between a gesture position and a virtual model is realized according to coordinate conversion; wherein kinect image acquisition equipment adopts kinect high definition digtal camera. And voice keyword recognition is carried out through the SDK, the keywords are extracted through voice input, the keywords required by the experiment are classified, the similarity probability is obtained through similarity calculation, a complete voice command is formed, and the voice recognition is completed.

The fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module; determining a multi-modal intention by judging the relationship between the user intention and different states; setting an intention expert knowledge base, wherein the intention expert knowledge base sets sufficient necessary conditions between the user intention and the user behavior;

the application expression module performs voice navigation and visual presentation on the virtual chemical experiment according to the expression of the fusion module, and performs perception force feedback expression through a force feedback part arranged on a finger.

FIG. 1 is a schematic diagram of an AR-oriented navigation paradigm system according to embodiment 1 of the present invention; in the input and perception module, the method for obtaining the gesture depth map by acquiring the hand skeleton depth information through kinect image acquisition equipment comprises the following steps:

s2: dividing the gesture depth map into a training set and a test set according to a proportion; the proportion of the training set to the testing set is 7:3, and the gesture depth map is cut into n; wherein n is equal in size and is a natural number;

wherein l is the current layer number, n is the neuron number of the previous layer, and w _ij ^m Is the connection weight between layer l neuron j and the previous layer neuron i, b _i ^m Is the first layer after the laminationj feature offsets.

The method for inputting the gesture depth maps acquired by a kinect image acquisition device under different frames into a gesture recognition model to complete recognition in the Unity 3D and realizing real-time interaction between a gesture position and the virtual model according to coordinate conversion comprises the following steps:

s4: dividing the gesture depth map into six-dimensional coordinate vectors through a softmax classifier, wherein the six-dimensional coordinate vectors respectively represent six gestures; sequentially taking the maximum probability in the vectors as a training model; inputting a test set into the training model for testing;

s5: respectively obtaining an nth frame of gesture depth map and an nth-1 th frame of gesture depth map through kinect image acquisition equipment, and obtaining coordinates S of two joint points at different moments _n (theta) and S _n-1 (θ); wherein theta is a three-dimensional coordinate of the depth of the hand node;

judgment S _n (theta) and S _n-1 (θ) is equal, if equal, the current gesture is unchanged; if not, judging that the current gesture corresponds to the gesture recognition result in the training model again, and outputting the gesture recognition result to G ^m Wherein m is a gesture type;

wherein (Kinect) _X ,Kinect _Y ,Kinect _Z ) Obtaining hand joint coordinates in a real space by using a depth camera; (U) _X ,U _Y ,U _Z ) Is the coordinate of the virtual scene in the unity environment, and W is the proportional relation between the coordinate of the real scene and the coordinate of the virtual scene in the field, (val) _X ,val _Y ,val _Z ) The corresponding relation between the human hand in the real space and the origin of the viewpoint of the virtual object.

In the input and perception module, the voice recognition method comprises the following steps:

s1: to implementation facilityClassifying the required keywords into verb vocabularies D = { m = ₁ ,m ₂ ...,m _i And attribute vocabulary S = { n = } ₁ ,n ₂ ...,n _j }；

s3: setting a threshold value t, and expressing the semantics as S ⁿ Wherein n represents a speech keyword, determining the maximum probability P _i (s)，

s ⁿ Sensing different keyword signals of a voice channel;

In addition, the method for expressing the perception force feedback through the force feedback component arranged on the finger comprises the following steps: the vibrator is worn on the finger, when the gesture collides with an object, the vibrator vibrates, and when the chip receives a signal of force feedback expression, a user adopts the force feedback expression; setting a feedback force value t; force feedback is expressed as R ^z Upon receipt of R ^z When the intention is expressed, the feedback is sent to the vibration vibrator to vibrate.

FIG. 2 is a system diagram of the multi-modal fusion proposed in embodiment 1 of the present invention. The fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module, and determines a multi-mode intention by judging the relationship between the user intention and the different states; and setting an intention expert knowledge base, wherein sufficient necessary conditions between the user intention and the user behavior are set in the intention expert knowledge base.

The method for realizing the fusion module comprises the following steps: for the acquired gesture recognition G ^m Speech recognition s ⁿ And force feedback expression R ^z According to G ^m ，S ⁿ ，R ^z There is an intersection, phase and independent relationship to obtain a relationship formula of

F(x)∈{(G ^m ∩S ⁿ ),(G ^m ∪S ⁿ ),(G ^m ,S ⁿ ,R ^z )}；

The method for performing voice navigation and visual presentation on the virtual chemical experiment by using the expression module according to the expression of the fusion module and performing perception force feedback expression by using a force feedback component arranged on a finger comprises the following steps:

in the interaction process of the gesture and the virtual object, a virtual object is set as V ^t T is the number of classes of the subject; let the voice navigation be A ^v V is the number of categories navigated; building E from a database ⁱ And V ^t And E ⁱ And A ^v The relationship of (1);

The invention adopts gesture recognition and voice recognition information input, carries out voice navigation and visual presentation on a virtual chemical experiment by an application expression module according to the expression of a fusion module through information fusion, and carries out perception force feedback expression through a force feedback part arranged on a finger.

The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the present invention as defined in the accompanying claims.

Claims

1. An AR-oriented navigation type interactive system is characterized by comprising an input and perception module, a fusion module and an application expression module;

the input and perception module acquires hand skeleton depth information through kinect image acquisition equipment to obtain a gesture depth map, gesture recognition is completed in Unity 3D by means of the gesture depth map, and real-time interaction between a gesture position and a virtual model is realized according to coordinate conversion; performing voice keyword recognition through the SDK, extracting keywords through voice input, classifying the keywords required by the experiment, and calculating the similarity probability through the similarity to form a complete voice command to complete voice recognition;

the fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module; determining a multi-modal intention by judging the relationship between the user intention and different states; setting an intention expert knowledge base, wherein sufficient necessary conditions between user intentions and user behaviors are set in the intention expert knowledge base;

2. The AR-oriented navigational interactive paradigm system of claim 1, wherein the method for obtaining the gesture depth map by obtaining the hand skeleton depth information through a kinect image capture device comprises:

3. The AR-oriented navigational interactive paradigm system of claim 2, wherein the gesture depth maps obtained by means of a kinect image capturing device in different frames are input into the gesture recognition model in Unity 3D to complete recognition, and the method for realizing real-time interaction between the gesture location and the virtual model according to coordinate transformation comprises:

s6: according to the mapping of the hand joint point coordinates in the real space and the depth three-dimensional coordinates, determining that the mapping relation between the hand joint point coordinates and the real space is as follows:

4. The AR-oriented navigational interactive paradigm system of claim 1, wherein said speech recognition method comprises:

s1: classifying the keywords required by implementation into verb vocabularies D = { m ₁ ,m ₂ ...,m _i And attribute vocabulary S = { n = } ₁ ,n ₂ ...,n _j }；

s3: setting a threshold value t, expressing semanteme as S ⁿ N represents a speech keyword, and the maximum probability P is judged _i (s)，

S is ⁿ Sensing different keyword signals of a voice channel;

5. The AR-oriented navigational interactive range system as recited in claim 1, wherein the sensory feedback is expressed by a force feedback component disposed on the finger: a vibrator is worn on a finger, when the gesture collides with an object, the vibrator vibrates, and when a chip receives a signal expressed by force feedback, a user adopts the force feedback expression; setting a feedback force value t; force feedback is expressed as R ^z Upon receipt of R ^z When the intention is expressed, the feedback is sent to the vibration vibrator to vibrate.

6. The AR-oriented navigational interactive paradigm system of claim 1, wherein said fusion module implements the method comprising: for the acquired gesture recognition G ^m Speech recognition s ⁿ And force feedback expression R ^z According to G ^m ，S ⁿ ，R ^z There is an intersection, phase and independent relationship to obtain a relationship formula of

F(x)∈{(G ^m ∩S ⁿ ),(G ^m ∪S ⁿ ),(G ^m ,S ⁿ ,R ^z )}；

7. The AR-oriented navigational interactive paradigm system of claim 1 or 6, wherein the method for performing voice navigation, visual presentation and sensory feedback representation of a virtual chemistry experiment according to the representation of the fusion module by the application representation module comprises:

in the interaction process of the gesture and the virtual object, setting the virtual object as V ^t T is the number of classes of the subject; let the voice navigation be A ^v V is the number of categories navigated; building E from a database ⁱ And V ^t And E ⁱ And A ^v The relationship of (a);