CN110554774B - AR-oriented navigation-type interactive paradigm system - Google Patents

AR-oriented navigation-type interactive paradigm system

Info

Publication number
CN110554774B
CN110554774B · CN201910660335.9A · CN201910660335A
Authority
CN
China
Prior art keywords
gesture
recognition
expression
voice
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910660335.9A
Other languages
Chinese (zh)
Other versions
CN110554774A (en)
Inventor
冯志全
肖梦婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN201910660335.9A priority Critical patent/CN110554774B/en
Publication of CN110554774A publication Critical patent/CN110554774A/en
Application granted granted Critical
Publication of CN110554774B publication Critical patent/CN110554774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00Simulators for teaching or training purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides an AR-oriented navigation-type interactive paradigm system comprising an input and perception module, a fusion module, and an application expression module. The input and perception module acquires hand-skeleton depth information through a Kinect to obtain gesture depth maps and complete gesture recognition; it classifies the experiment keywords through voice keyword recognition and extraction, and computes similarity probabilities to complete voice recognition. The fusion module classifies user intentions by computing the signal perceptions of different states under different modalities according to gesture recognition, voice recognition and force-feedback expression, determines the multi-modal intention, and sets up an intention expert knowledge base. The application expression module performs voice navigation, visual presentation and perceptual force-feedback expression for the virtual chemistry experiment. The invention uses natural interaction technology to fuse the gesture and voice modalities, explores a natural interaction mode, reduces the user's load and brings a different user experience.

Description

AR-oriented navigation-type interactive paradigm system
Technical Field
The invention belongs to the technical field of experimental interaction, and particularly relates to an AR-oriented navigation-type interactive paradigm system.
Background
In existing middle-school chemistry, many experiments involve dangerous phenomena such as explosion and corrosion, so students cannot operate them themselves, and experiments involving gases are difficult to observe; middle-school students therefore only see such experiments through textbooks or videos. A new experimental method is needed that meets the requirements of the middle-school chemistry syllabus and solves these problems, so that students can operate experiments at any time, chemical reagents are saved, and students can genuinely explore and discover the underlying principles and mechanisms of the experiments.
With the development of human-computer interaction technology, augmented reality, as a technology that improves the interaction between the real and the virtual, is becoming a research hotspot in applied fields. Gesture interaction has likewise become an important form of human-computer interaction, and combining gesture interaction with augmented reality is a natural requirement of the intelligent era. Gesture interaction technology has been widely used in many fields, including education. Research shows that traditional teaching gradually fails to meet the teaching-effect requirements and goals of the new period, and traditional chemistry experiments carry dangers such as explosion and corrosion, so education is increasingly turning to novel virtual experiment teaching, which improves students' thinking ability to a certain extent and strengthens their spatial imagination. However, most of this teaching completes experimental operation only with models in a purely virtual scene, lacks the realism of a real environment, and students cannot become immersed in it. In existing augmented reality technology, three-dimensional reproduction of virtual models is mostly achieved with labelled marker cards, and in a few cases the fusion of the real world with the virtual scene is achieved with wearable devices such as data gloves or gesture-recognition sensors, but users still cannot be fully immersed in a virtual-real interaction experiment. Bare-hand interaction is therefore becoming an important form in the augmented reality field, and combining bare-hand gesture interaction with augmented reality technology is important. How to perform bare-hand gesture recognition within augmented reality is thus an important application problem.
Disclosure of Invention
The invention provides an AR-oriented navigation-type interactive paradigm system that takes gesture-recognition and voice-recognition information as input; according to the expression produced by the fusion module through information fusion, the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment and performs perceptual force-feedback expression through a force-feedback component arranged on the finger.
In order to achieve the above object, the present invention provides an AR-oriented navigation-type interactive paradigm system, which includes an input and perception module, a fusion module, and an application expression module;
the input and perception module acquires hand-skeleton depth information through a Kinect image acquisition device to obtain gesture depth maps, completes gesture recognition in Unity 3D with the gesture depth maps, and realizes real-time interaction between the gesture position and the virtual model through coordinate conversion; it performs voice keyword recognition through the SDK, extracts keywords from the voice input, classifies the keywords required by the experiment, computes the similarity probability, forms a complete voice command and completes voice recognition;
the fusion module classifies user intentions by computing the signal perceptions of different states under different modalities according to the gesture recognition, voice recognition and force-feedback expression acquired by the input and perception module; it determines the multi-modal intention by judging the relationship between the user intention and the different states; and it sets up an intention expert knowledge base in which necessary and sufficient conditions between user intentions and user behaviours are defined;
the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment according to the expression of the fusion module, and performs perceptual force-feedback expression through a force-feedback component arranged on the finger.
Further, the method for obtaining the gesture depth map by acquiring hand-skeleton depth information through the Kinect image acquisition device comprises the following steps:
S1: acquiring hand-skeleton depth information with the Kinect image acquisition device, segmenting the hand region and obtaining the gesture depth map;
S2: dividing the gesture depth maps into a training set and a test set in a 7:3 ratio, and cutting each gesture depth map into n patches of equal size, where n is a natural number;
S3: inputting the gesture depth maps of the training set into a CNN (AlexNet) network and extracting gesture depth features by continuously updating the weights, where the expression of the gesture depth features is:
x_j^l = f( Σ_{i=1}^{n} w_ij^l · x_i^(l-1) + b_j^l )
where f(·) is the activation function, l is the index of the current layer, n is the number of neurons in the previous layer, w_ij^l is the connection weight between neuron j of layer l and neuron i of the previous layer, and b_j^l is the bias of the j-th feature after l convolutional layers.
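For illustration, the feature-extraction step above can be sketched as follows. This is a minimal PyTorch stand-in, not the patented network: the single-channel depth input, the layer sizes and the six-class softmax head are assumptions made only to show the AlexNet-style structure named in the text.

import torch
import torch.nn as nn

class GestureDepthNet(nn.Module):
    """AlexNet-style classifier for single-channel gesture depth maps.
    Layer sizes are illustrative; the patent only specifies an AlexNet-type CNN
    whose softmax head covers six gesture classes."""
    def __init__(self, num_gestures: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2),  # depth map has a single channel
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.classifier = nn.Linear(256 * 6 * 6, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x)                         # gesture depth features x_j^l
        logits = self.classifier(torch.flatten(feats, 1))
        return torch.softmax(logits, dim=1)              # probabilities over the six gestures

if __name__ == "__main__":
    net = GestureDepthNet()
    depth_batch = torch.rand(4, 1, 224, 224)             # stand-in for cropped Kinect depth maps
    print(net(depth_batch).shape)                        # torch.Size([4, 6])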
Further, the method of inputting the gesture depth maps acquired by the Kinect image acquisition device in different frames into the gesture recognition model in Unity 3D to complete recognition, and of realizing real-time interaction between the gesture position and the virtual model through coordinate conversion, comprises the following steps:
S4: mapping each gesture depth map to a six-dimensional vector through a softmax classifier, the six components respectively representing the six gestures; taking the class with the maximum probability in each vector in turn to form the trained model; inputting the test set into the trained model for testing;
S5: obtaining the gesture depth maps of the n-th frame and the (n-1)-th frame through the Kinect image acquisition device, and obtaining the coordinates S_n(θ) and S_{n-1}(θ) of the same joint point at the two moments, where θ is the three-dimensional coordinate of the hand-joint depth; judging whether S_n(θ) and S_{n-1}(θ) are equal; if they are equal, the current gesture is unchanged; if not, the current gesture is matched again against the gesture recognition results of the trained model, and the recognition result is output as G_m, where m is the gesture type;
S6: according to the mapping between the hand-joint coordinates in real space and the depth three-dimensional coordinates, the mapping relation between the hand-joint coordinates and real space is determined as:
(U_X, U_Y, U_Z) = W · (Kinect_X, Kinect_Y, Kinect_Z) + (val_X, val_Y, val_Z)
where (Kinect_X, Kinect_Y, Kinect_Z) are the hand-joint coordinates in real space obtained with the depth camera, (U_X, U_Y, U_Z) are the coordinates of the virtual scene in the Unity environment, W is the proportional relation between the real-scene and virtual-scene coordinates in the field of view, and (val_X, val_Y, val_Z) is the correspondence between the human hand in real space and the viewpoint origin of the virtual object.
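Steps S5 and S6 can be illustrated with the short sketch below. The tolerance eps, the helper names and the linear scale-plus-offset form of the Kinect-to-Unity mapping follow the reconstruction above and are assumptions for illustration, not the patent's exact formulation.

from typing import Tuple

Vec3 = Tuple[float, float, float]

def gesture_changed(joint_prev: Vec3, joint_curr: Vec3, eps: float = 1e-3) -> bool:
    """Step S5: compare the same hand-joint coordinate in frames n-1 and n; if it
    moved beyond a small tolerance, the gesture is classified again."""
    return any(abs(a - b) > eps for a, b in zip(joint_prev, joint_curr))

def kinect_to_unity(kinect_xyz: Vec3, w: float, val: Vec3) -> Vec3:
    """Step S6 (assumed linear form): scale the Kinect hand-joint coordinate by W
    and shift it by the viewpoint-origin offset (val_X, val_Y, val_Z)."""
    return (w * kinect_xyz[0] + val[0],
            w * kinect_xyz[1] + val[1],
            w * kinect_xyz[2] + val[2])

if __name__ == "__main__":
    s_prev, s_curr = (0.10, 0.42, 1.30), (0.11, 0.42, 1.31)
    print(gesture_changed(s_prev, s_curr))                       # True -> re-run the recognizer
    print(kinect_to_unity(s_curr, w=2.0, val=(0.0, 1.0, -0.5)))  # Unity-space position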
Further, the method for voice recognition comprises:
S1: classifying the keywords required by the experiment into a verb vocabulary D = {m_1, m_2, ..., m_i} and an attribute vocabulary S = {n_1, n_2, ..., n_j};
S2: matching the sets D and S pairwise to obtain a matched keyword library n; computing the similarity between the matched keyword library n and the extracted keywords to obtain the probabilities P(s) of all keyword similarities in the set; if P_i(s) > P_j(s), then P_i(s) is the maximum probability;
S3: setting a threshold t and expressing the semantics as S_n, where n denotes a voice keyword; judging the maximum probability P_i(s):
S_n = keyword_n,  if P_i(s) ≥ t;  otherwise no keyword is accepted,
where S_n is the perception of the different keyword signals of the voice channel;
S4: forming a complete voice command from the received keywords to complete the voice recognition.
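A minimal sketch of the S1-S4 keyword logic is given below. The verb and attribute vocabularies are hypothetical examples, and difflib's ratio is used purely as a stand-in similarity measure, since the patent does not name a specific one.

from difflib import SequenceMatcher
from itertools import product
from typing import Optional

# Hypothetical experiment vocabularies; the patent defines D and S only abstractly.
VERBS = ["pour", "heat", "stir"]                      # D = {m_1, m_2, ...}
ATTRIBUTES = ["acid", "beaker", "alcohol lamp"]       # S = {n_1, n_2, ...}
KEYWORD_LIBRARY = [f"{m} {n}" for m, n in product(VERBS, ATTRIBUTES)]  # pairwise matching of D and S

def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure; the patent does not name a specific one."""
    return SequenceMatcher(None, a, b).ratio()

def recognize_keyword(spoken: str, threshold: float = 0.6) -> Optional[str]:
    """Steps S2-S3: score the spoken phrase against the matched keyword library,
    keep the maximum-probability candidate P_i(s), and accept it only above the threshold t."""
    best_p, best_k = max((similarity(spoken, k), k) for k in KEYWORD_LIBRARY)
    return best_k if best_p >= threshold else None

if __name__ == "__main__":
    print(recognize_keyword("pour the acid"))     # likely "pour acid"
    print(recognize_keyword("open the window"))   # likely None (below threshold)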
Further, the method for perceptual force-feedback expression through the force-feedback component arranged on the finger comprises the following steps: a vibrator is worn on the finger; when the gesture collides with an object, the vibrator vibrates, and when the chip receives a force-feedback expression signal, force-feedback expression is applied to the user; a feedback force value t is set; the force-feedback expression is denoted R_z, and when the intention expression R_z is received, the feedback is sent to the vibrator to make it vibrate.
Further, the method implemented by the fusion module is as follows: for the acquired gesture recognition G_m, voice recognition S_n and force-feedback expression R_z, according to the intersection, union and independence relationships among G_m, S_n and R_z, the relationship formula is obtained as
F(x) ∈ {(G_m ∩ S_n), (G_m ∪ S_n), (G_m, S_n, R_z)};
an expert knowledge base is established, and F(x) is matched against the expert knowledge base to obtain the intention E_i.
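The fusion step can be sketched as a lookup against a small rule table. The concrete rules below are hypothetical; the patent only states that the expert knowledge base stores the necessary and sufficient conditions linking user behaviour (G_m, S_n, R_z) to the intention E_i.

from typing import Optional

# Hypothetical expert knowledge base: (gesture G_m, voice keyword S_n, force flag R_z) -> intention E_i.
# None acts as a wildcard so a rule can cover intersection, union or independent channel combinations.
EXPERT_KB = [
    ({"grasp"}, {"pour acid"}, None, "E_pour_acid_into_beaker"),
    ({"point"}, {"heat beaker"}, None, "E_light_alcohol_lamp"),
    ({"grasp"}, None, True, "E_holding_apparatus"),
]

def fuse(gesture: Optional[str], keyword: Optional[str], force: Optional[bool]) -> Optional[str]:
    """Match the perceived channel states F(x) against the knowledge base and return E_i."""
    for gestures, keywords, force_flag, intent in EXPERT_KB:
        if gestures is not None and gesture not in gestures:
            continue
        if keywords is not None and keyword not in keywords:
            continue
        if force_flag is not None and force != force_flag:
            continue
        return intent
    return None  # no rule fires; fall back to single-channel handling

if __name__ == "__main__":
    print(fuse("grasp", "pour acid", False))  # E_pour_acid_into_beaker
    print(fuse("grasp", None, True))          # E_holding_apparatus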
Further, the method by which the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment according to the expression of the fusion module, and performs perceptual force-feedback expression through the force-feedback component arranged on the finger, comprises the following steps:
during the interaction between the gesture and the virtual objects, the virtual objects are denoted V_t, where t is the number of object categories; the voice navigation is denoted A_v, where v is the number of navigation categories; the relationships between E_i and V_t and between E_i and A_v are built from a database;
according to the intention E_i, if multiple V_t are displayed in the scene, the force feedback R_z is obtained according to the intention and the corresponding force-feedback effect R_z is expressed; voice navigation A_v is performed according to the intention E_i.
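As a sketch of the expression stage, the snippet below dispatches a fused intention E_i to the three output channels. The lookup tables and returned field names are placeholders; the patent only requires that E_i be related to a virtual object V_t, a voice prompt A_v and, where applicable, a force-feedback cue R_z.

# Hypothetical lookup tables; the relationships E_i -> V_t, E_i -> A_v and E_i -> R_z
# are defined in the patent only abstractly, via a database.
INTENT_TO_OBJECT = {"E_pour_acid_into_beaker": "V_beaker"}
INTENT_TO_PROMPT = {"E_pour_acid_into_beaker": "Slowly pour the acid along the glass rod."}
INTENT_TO_FORCE = {"E_holding_apparatus": "vibrate"}

def express(intent: str) -> dict:
    """Produce the visual, voice-navigation and force-feedback expressions for one intention."""
    return {
        "highlight_object": INTENT_TO_OBJECT.get(intent),  # visual presentation V_t in the scene
        "voice_navigation": INTENT_TO_PROMPT.get(intent),  # spoken guidance A_v
        "force_feedback": INTENT_TO_FORCE.get(intent),     # vibration cue R_z, if any
    }

if __name__ == "__main__":
    print(express("E_pour_acid_into_beaker"))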
The effects described in this summary are only those of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
the embodiment of the invention provides an AR-oriented navigation type interactive normal form system, which comprises an input and perception module, a fusion module and an application expression module; the input and perception module acquires hand skeleton depth information through kinect image acquisition equipment to obtain a gesture depth map, gesture recognition is completed in Unity 3D by means of the gesture depth map, and real-time interaction between a gesture position and a virtual model is realized according to coordinate conversion; performing voice keyword recognition through the SDK, extracting keywords through voice input, classifying the keywords required by the experiment, calculating the similarity probability through the similarity, forming a complete voice command, and completing voice recognition; the fusion module classifies the user intentions by calculating signal perceptions of different states under different modes according to gesture recognition, voice recognition and force feedback expression acquired by the input perception module; determining a multi-modal intention by judging the relationship between the user intention and different states; setting an intention expert knowledge base, wherein the intention expert knowledge base sets sufficient necessary conditions between the user intention and the user behavior; the application expression module performs voice navigation and visual presentation on the virtual chemical experiment according to the expression of the fusion module, and performs perception force feedback expression through a force feedback part arranged on a finger. In the invention, a virtual experimental device is operated by a real hand to simulate a chemical experiment in an augmented reality environment, a realistic drawing method is adopted in the experimental process to present an experimental phenomenon, the mechanism of the chemical experiment is reflected, a natural interaction technology is utilized to fuse gestures and voice multimode, a novel natural interaction mode is explored, the user load is reduced, and different user experiences are brought.
Drawings
FIG. 1 is a schematic diagram of the AR-oriented navigation-type interactive paradigm system according to embodiment 1 of the present invention;
FIG. 2 is a system diagram of multimodal fusion proposed in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
Example 1
Embodiment 1 of the invention provides an AR-oriented navigation-type interactive paradigm system, which comprises an input and perception module, a fusion module and an application expression module;
the input and perception module acquires hand-skeleton depth information through a Kinect image acquisition device to obtain gesture depth maps, completes gesture recognition in Unity 3D with the gesture depth maps, and realizes real-time interaction between the gesture position and the virtual model through coordinate conversion; here the Kinect image acquisition device is a Kinect high-definition camera. Voice keyword recognition is performed through the SDK, keywords are extracted from the voice input, the keywords required by the experiment are classified, the similarity probability is computed, a complete voice command is formed, and voice recognition is completed.
The fusion module classifies user intentions by computing the signal perceptions of different states under different modalities according to the gesture recognition, voice recognition and force-feedback expression acquired by the input and perception module; it determines the multi-modal intention by judging the relationship between the user intention and the different states; and it sets up an intention expert knowledge base in which necessary and sufficient conditions between user intentions and user behaviours are defined;
the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment according to the expression of the fusion module, and performs perceptual force-feedback expression through a force-feedback component arranged on the finger.
FIG. 1 is a schematic diagram of the AR-oriented navigation-type interactive paradigm system according to embodiment 1 of the present invention. In the input and perception module, the method for obtaining the gesture depth map by acquiring hand-skeleton depth information through the Kinect image acquisition device comprises the following steps:
S1: acquiring hand-skeleton depth information with the Kinect image acquisition device, segmenting the hand region and obtaining the gesture depth map;
S2: dividing the gesture depth maps into a training set and a test set in a 7:3 ratio, and cutting each gesture depth map into n patches of equal size, where n is a natural number;
S3: inputting the gesture depth maps of the training set into a CNN (AlexNet) network and extracting gesture depth features by continuously updating the weights, where the expression of the gesture depth features is:
x_j^l = f( Σ_{i=1}^{n} w_ij^l · x_i^(l-1) + b_j^l )
where f(·) is the activation function, l is the index of the current layer, n is the number of neurons in the previous layer, w_ij^l is the connection weight between neuron j of layer l and neuron i of the previous layer, and b_j^l is the bias of the j-th feature after l convolutional layers.
The method of inputting the gesture depth maps acquired by the Kinect image acquisition device in different frames into the gesture recognition model in Unity 3D to complete recognition, and of realizing real-time interaction between the gesture position and the virtual model through coordinate conversion, comprises the following steps:
S4: mapping each gesture depth map to a six-dimensional vector through a softmax classifier, the six components respectively representing the six gestures; taking the class with the maximum probability in each vector in turn to form the trained model; inputting the test set into the trained model for testing;
S5: obtaining the gesture depth maps of the n-th frame and the (n-1)-th frame through the Kinect image acquisition device, and obtaining the coordinates S_n(θ) and S_{n-1}(θ) of the same joint point at the two moments, where θ is the three-dimensional coordinate of the hand-joint depth;
judging whether S_n(θ) and S_{n-1}(θ) are equal; if they are equal, the current gesture is unchanged; if not, the current gesture is matched again against the gesture recognition results of the trained model, and the recognition result is output as G_m, where m is the gesture type;
S6: according to the mapping between the hand-joint coordinates in real space and the depth three-dimensional coordinates, the mapping relation between the hand-joint coordinates and real space is determined as:
(U_X, U_Y, U_Z) = W · (Kinect_X, Kinect_Y, Kinect_Z) + (val_X, val_Y, val_Z)
where (Kinect_X, Kinect_Y, Kinect_Z) are the hand-joint coordinates in real space obtained with the depth camera, (U_X, U_Y, U_Z) are the coordinates of the virtual scene in the Unity environment, W is the proportional relation between the real-scene and virtual-scene coordinates in the field of view, and (val_X, val_Y, val_Z) is the correspondence between the human hand in real space and the viewpoint origin of the virtual object.
In the input and perception module, the voice recognition method comprises the following steps:
S1: classifying the keywords required by the experiment into a verb vocabulary D = {m_1, m_2, ..., m_i} and an attribute vocabulary S = {n_1, n_2, ..., n_j};
S2: matching the sets D and S pairwise to obtain a matched keyword library n; computing the similarity between the matched keyword library n and the extracted keywords to obtain the probabilities P(s) of all keyword similarities in the set; if P_i(s) > P_j(s), then P_i(s) is the maximum probability;
S3: setting a threshold t and expressing the semantics as S_n, where n denotes a voice keyword; judging the maximum probability P_i(s):
S_n = keyword_n,  if P_i(s) ≥ t;  otherwise no keyword is accepted,
where S_n is the perception of the different keyword signals of the voice channel;
S4: forming a complete voice command from the received keywords to complete the voice recognition.
In addition, the method for perceptual force-feedback expression through the force-feedback component arranged on the finger comprises the following steps: a vibrator is worn on the finger; when the gesture collides with an object, the vibrator vibrates, and when the chip receives a force-feedback expression signal, force-feedback expression is applied to the user; a feedback force value t is set; the force-feedback expression is denoted R_z, and when the intention expression R_z is received, the feedback is sent to the vibrator to make it vibrate.
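How the force-feedback signal reaches the finger-worn vibrator is not detailed beyond the chip receiving an R_z expression signal; the sketch below assumes, purely for illustration, a vibrator driven over a serial link (the port name, baud rate and one-byte command protocol are hypothetical).

import serial  # pyserial

def send_vibration(port: str = "/dev/ttyUSB0", duration_ms: int = 200) -> None:
    """On receiving the force-feedback expression R_z (e.g. after a gesture/object
    collision), send a hypothetical 'vibrate' command to the finger-worn unit."""
    with serial.Serial(port, baudrate=9600, timeout=1) as link:
        link.write(bytes([0x01]))                    # assumed command byte: start vibration
        link.write(duration_ms.to_bytes(2, "big"))   # assumed payload: duration in milliseconds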
FIG. 2 is a system diagram of the multi-modal fusion proposed in embodiment 1 of the present invention. The fusion module classifies user intentions by computing the signal perceptions of different states under different modalities according to the gesture recognition, voice recognition and force-feedback expression acquired by the input and perception module, and determines the multi-modal intention by judging the relationship between the user intention and the different states; an intention expert knowledge base is set up, in which necessary and sufficient conditions between user intentions and user behaviours are defined.
The method implemented by the fusion module is as follows: for the acquired gesture recognition G_m, voice recognition S_n and force-feedback expression R_z, according to the intersection, union and independence relationships among G_m, S_n and R_z, the relationship formula is obtained as
F(x) ∈ {(G_m ∩ S_n), (G_m ∪ S_n), (G_m, S_n, R_z)};
an expert knowledge base is established, and F(x) is matched against the expert knowledge base to obtain the intention E_i.
The method by which the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment according to the expression of the fusion module, and performs perceptual force-feedback expression through the force-feedback component arranged on the finger, comprises the following steps:
during the interaction between the gesture and the virtual objects, the virtual objects are denoted V_t, where t is the number of object categories; the voice navigation is denoted A_v, where v is the number of navigation categories; the relationships between E_i and V_t and between E_i and A_v are built from a database;
according to the intention E_i, if multiple V_t are displayed in the scene, the force feedback R_z is obtained according to the intention and the corresponding force-feedback effect R_z is expressed; voice navigation A_v is performed according to the intention E_i.
The invention adopts gesture-recognition and voice-recognition information input; according to the expression produced by the fusion module through information fusion, the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment and performs perceptual force-feedback expression through a force-feedback component arranged on the finger.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the present invention as defined in the accompanying claims.

Claims (7)

1. An AR-oriented navigation-type interactive paradigm system, characterized by comprising an input and perception module, a fusion module and an application expression module;
the input and perception module acquires hand-skeleton depth information through a Kinect image acquisition device to obtain gesture depth maps, completes gesture recognition in Unity 3D with the gesture depth maps, and realizes real-time interaction between the gesture position and the virtual model through coordinate conversion; it performs voice keyword recognition through the SDK, extracts keywords from the voice input, classifies the keywords required by the experiment, computes the similarity probability, forms a complete voice command and completes voice recognition;
the fusion module classifies user intentions by computing the signal perceptions of different states under different modalities according to the gesture recognition, voice recognition and force-feedback expression acquired by the input and perception module; it determines the multi-modal intention by judging the relationship between the user intention and the different states; and it sets up an intention expert knowledge base in which necessary and sufficient conditions between user intentions and user behaviours are defined;
the application expression module performs voice navigation and visual presentation for the virtual chemistry experiment according to the expression of the fusion module, and performs perceptual force-feedback expression through a force-feedback component arranged on the finger.
2. The AR-oriented navigation-type interactive paradigm system of claim 1, wherein the method for obtaining the gesture depth map by acquiring hand-skeleton depth information through the Kinect image acquisition device comprises:
S1: acquiring hand-skeleton depth information with the Kinect image acquisition device, segmenting the hand region and obtaining the gesture depth map;
S2: dividing the gesture depth maps into a training set and a test set in a 7:3 ratio, and cutting each gesture depth map into n patches of equal size, where n is a natural number;
S3: inputting the gesture depth maps of the training set into a CNN (AlexNet) network and extracting gesture depth features by continuously updating the weights, where the expression of the gesture depth features is:
x_j^l = f( Σ_{i=1}^{n} w_ij^l · x_i^(l-1) + b_j^l )
where f(·) is the activation function, l is the index of the current layer, n is the number of neurons in the previous layer, w_ij^l is the connection weight between neuron j of layer l and neuron i of the previous layer, and b_j^l is the bias of the j-th feature after l convolutional layers.
3. The AR-oriented navigation-type interactive paradigm system of claim 2, wherein the gesture depth maps acquired by the Kinect image acquisition device in different frames are input into the gesture recognition model in Unity 3D to complete recognition, and the method for realizing real-time interaction between the gesture position and the virtual model through coordinate conversion comprises:
S4: mapping each gesture depth map to a six-dimensional vector through a softmax classifier, the six components respectively representing the six gestures; taking the class with the maximum probability in each vector in turn to form the trained model; inputting the test set into the trained model for testing;
S5: obtaining the gesture depth maps of the n-th frame and the (n-1)-th frame through the Kinect image acquisition device, and obtaining the coordinates S_n(θ) and S_{n-1}(θ) of the same joint point at the two moments, where θ is the three-dimensional coordinate of the hand-joint depth; judging whether S_n(θ) and S_{n-1}(θ) are equal; if they are equal, the current gesture is unchanged; if not, the current gesture is matched again against the gesture recognition results of the trained model, and the recognition result is output as G_m, where m is the gesture type;
S6: according to the mapping between the hand-joint coordinates in real space and the depth three-dimensional coordinates, the mapping relation between the hand-joint coordinates and real space is determined as:
(U_X, U_Y, U_Z) = W · (Kinect_X, Kinect_Y, Kinect_Z) + (val_X, val_Y, val_Z)
where (Kinect_X, Kinect_Y, Kinect_Z) are the hand-joint coordinates in real space obtained with the depth camera, (U_X, U_Y, U_Z) are the coordinates of the virtual scene in the Unity environment, W is the proportional relation between the real-scene and virtual-scene coordinates in the field of view, and (val_X, val_Y, val_Z) is the correspondence between the human hand in real space and the viewpoint origin of the virtual object.
4. The AR-oriented navigation-type interactive paradigm system of claim 1, wherein the voice recognition method comprises:
S1: classifying the keywords required by the experiment into a verb vocabulary D = {m_1, m_2, ..., m_i} and an attribute vocabulary S = {n_1, n_2, ..., n_j};
S2: matching the sets D and S pairwise to obtain a matched keyword library n; computing the similarity between the matched keyword library n and the extracted keywords to obtain the probabilities P(s) of all keyword similarities in the set; if P_i(s) > P_j(s), then P_i(s) is the maximum probability;
S3: setting a threshold t and expressing the semantics as S_n, where n denotes a voice keyword; judging the maximum probability P_i(s):
S_n = keyword_n,  if P_i(s) ≥ t;  otherwise no keyword is accepted,
where S_n is the perception of the different keyword signals of the voice channel;
S4: forming a complete voice command from the received keywords to complete the voice recognition.
5. The AR-oriented navigation-type interactive paradigm system of claim 1, wherein the perceptual force feedback is expressed through the force-feedback component arranged on the finger as follows: a vibrator is worn on the finger; when the gesture collides with an object, the vibrator vibrates, and when the chip receives a force-feedback expression signal, force-feedback expression is applied to the user; a feedback force value t is set; the force-feedback expression is denoted R_z, and when the intention expression R_z is received, the feedback is sent to the vibrator to make it vibrate.
6. The AR-oriented navigation-type interactive paradigm system of claim 1, wherein the method implemented by the fusion module comprises: for the acquired gesture recognition G_m, voice recognition S_n and force-feedback expression R_z, according to the intersection, union and independence relationships among G_m, S_n and R_z, the relationship formula is obtained as
F(x) ∈ {(G_m ∩ S_n), (G_m ∪ S_n), (G_m, S_n, R_z)};
an expert knowledge base is established, and F(x) is matched against the expert knowledge base to obtain the intention E_i.
7. The AR-oriented navigation-type interactive paradigm system of claim 1 or 6, wherein the method by which the application expression module performs voice navigation, visual presentation and perceptual force-feedback expression for the virtual chemistry experiment according to the expression of the fusion module comprises:
during the interaction between the gesture and the virtual objects, the virtual objects are denoted V_t, where t is the number of object categories; the voice navigation is denoted A_v, where v is the number of navigation categories; the relationships between E_i and V_t and between E_i and A_v are built from a database;
according to the intention E_i, if multiple V_t are displayed in the scene, the force feedback R_z is obtained according to the intention and the corresponding force-feedback effect R_z is expressed; voice navigation A_v is performed according to the intention E_i.
CN201910660335.9A 2019-07-22 2019-07-22 AR-oriented navigation-type interactive paradigm system Active CN110554774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910660335.9A CN110554774B (en) 2019-07-22 2019-07-22 AR-oriented navigation-type interactive paradigm system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910660335.9A CN110554774B (en) 2019-07-22 2019-07-22 AR-oriented navigation-type interactive paradigm system

Publications (2)

Publication Number Publication Date
CN110554774A CN110554774A (en) 2019-12-10
CN110554774B true CN110554774B (en) 2022-11-04

Family

ID=68735724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910660335.9A Active CN110554774B (en) 2019-07-22 2019-07-22 AR-oriented navigation-type interactive paradigm system

Country Status (1)

Country Link
CN (1) CN110554774B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111665941B (en) * 2020-06-07 2023-12-22 济南大学 Virtual experiment-oriented multi-mode semantic fusion human-computer interaction system and method
CN111814095A (en) * 2020-06-23 2020-10-23 济南大学 Exploration type interactive algorithm in virtual experiment
CN111595349A (en) * 2020-06-28 2020-08-28 浙江商汤科技开发有限公司 Navigation method and device, electronic equipment and storage medium
CN111968470B (en) * 2020-09-02 2022-05-17 济南大学 Pass-through interactive experimental method and system for virtual-real fusion
CN112101219B (en) * 2020-09-15 2022-11-04 济南大学 Intention understanding method and system for elderly accompanying robot
CN112748800B (en) * 2020-09-16 2022-11-04 济南大学 Intelligent glove-based experimental scene perception interaction method
CN112099632B (en) * 2020-09-16 2024-04-05 济南大学 Human-robot cooperative interaction method for helping old accompany
CN112462940A (en) * 2020-11-25 2021-03-09 苏州科技大学 Intelligent home multi-mode man-machine natural interaction system and method thereof
CN112486322A (en) * 2020-12-07 2021-03-12 济南浪潮高新科技投资发展有限公司 Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition
CN112579758A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program product
CN113297955B (en) * 2021-05-21 2022-03-18 中国矿业大学 Sign language word recognition method based on multi-mode hierarchical information fusion
CN114881179B (en) * 2022-07-08 2022-09-06 济南大学 Intelligent experiment method based on intention understanding

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993135A (en) * 2019-03-29 2019-07-09 济南大学 A kind of gesture identification method based on augmented reality, system and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101687017B1 (en) * 2014-06-25 2016-12-16 한국과학기술원 Hand localization system and the method using head worn RGB-D camera, user interaction system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993135A (en) * 2019-03-29 2019-07-09 济南大学 A kind of gesture identification method based on augmented reality, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implicit gesture interaction modeling and algorithms for smart TVs; Xu Zhipeng et al.; Journal of Computer-Aided Design & Computer Graphics; 2017-02-15 (Issue 02); full text *

Also Published As

Publication number Publication date
CN110554774A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
CN110554774B (en) AR-oriented navigation-type interactive paradigm system
Tolentino et al. Static sign language recognition using deep learning
US10664060B2 (en) Multimodal input-based interaction method and device
CN111563502B (en) Image text recognition method and device, electronic equipment and computer storage medium
CN109992107B (en) Virtual control device and control method thereof
CN110286763A (en) A kind of navigation-type experiment interactive device with cognitive function
CN105159452B (en) A kind of control method and system based on human face modeling
Oszust et al. Recognition of signed expressions observed by Kinect Sensor
CN113449700B (en) Training of video classification model, video classification method, device, equipment and medium
CN109035297A (en) A kind of real-time tracing method based on dual Siam's network
CN110286764A (en) A kind of multi-modal fusion experimental system and its application method
Xu et al. Review of hand gesture recognition study and application
Ryumin et al. Automatic detection and recognition of 3D manual gestures for human-machine interaction
Zhang et al. Teaching chinese sign language with a smartphone
CN113506377A (en) Teaching training method based on virtual roaming technology
CN109784140A (en) Driver attributes' recognition methods and Related product
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
Rozaliev et al. Methods and Models for Identifying Human Emotions by Recognition Gestures and Motion
Barbhuiya et al. Alexnet-cnn based feature extraction and classification of multiclass asl hand gestures
Wang et al. MFA: A Smart Glove with Multimodal Intent Sensing Capability.
Rozaliev et al. Detailed analysis of postures and gestures for the identification of human emotional reactions
Rozaliev et al. Recognizing and analyzing emotional expressions in movements
Wren Understanding expressive action
Axyonov et al. Method of multi-modal video analysis of hand movements for automatic recognition of isolated signs of Russian sign language
Dhamanskar et al. Human computer interaction using hand gestures and voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant