WO2019029100A1 - Multi-interaction implementation method for mining operation based on virtual reality and augmented reality - Google Patents

Multi-interaction implementation method for mining operation based on virtual reality and augmented reality

Info

Publication number
WO2019029100A1
WO2019029100A1 (PCT/CN2017/118923)
Authority
WO
WIPO (PCT)
Prior art keywords
model
function
probability
virtual
speech
Prior art date
Application number
PCT/CN2017/118923
Other languages
French (fr)
Chinese (zh)
Inventor
彭延军
王美玲
王元红
卢新明
Original Assignee
山东科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东科技大学
Publication of WO2019029100A1 publication Critical patent/WO2019029100A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Definitions

  • the invention belongs to the field of virtual reality and augmented reality technology, and particularly relates to a multi-interaction realization method for mining operations based on virtual reality and augmented reality.
  • AR Augmented Reality
  • the existing traditional training and teaching systems are basically theoretical introductions with physical model displays or two-dimensional image displays, delivered mainly through classroom lectures supplemented by simple animations and audio and video introductions, with insufficient practice and a lack of real scenes. Even when viewing a physical model, trainees cannot grasp the actual operating procedure of the tools.
  • various training systems applied to coal mining have been developed accordingly, but they also suffer from problems such as poor authenticity of the system scene, poor immersive effect and few interactive functions, so they can only provide simple demonstrations.
  • the present invention proposes a multi-interaction realization method for mining operations based on virtual reality and augmented reality, which is reasonable in design, overcomes the deficiencies of the prior art, and has good effects.
  • a multi-interaction realization method for mining operations based on virtual reality and augmented reality adopts a multi-interaction simulation system for underground mining operations; the system includes two modes, a virtual reality mode and an augmented reality mode; the virtual reality mode includes modeling and roaming of specific scenes, replacement of models and their materials, embedding of video into the virtual scene, model movement, intent interaction in the application scene, two-dimensional code generation and voice interaction; the augmented reality mode includes model selection, model explanation, dynamic model demonstration, gesture-controlled model interaction, screenshot icon generation, 360-degree rotation and stop, function mode switching and function expansion; the system provides two kinds of hidden menus, namely the tool and material replacement selection menu in the virtual reality mode and the model selection menu in the augmented reality mode; the first kind is displayed only when the user enters a specific area and is hidden when the user leaves; the second kind displays a secondary menu somewhere when clicked, and the menu is hidden when it is clicked again;
  • the mining operation multiple interaction implementation method specifically includes the following steps:
  • Step 1 Construction of the entire environmental scene of the mining operation
  • the modeling tool 3DMax is used for 1:1 equal ratio modeling to realize the environment simulation of the whole underground mining operation;
  • the UE4 engine is used to edit the models, including creating and editing textures and materials; physical collision is added, lighting, lighting effects and special effects are added to the overall environment, and baking and rendering are performed;
  • Step 2 Roaming of the virtual reality application scenario
  • the upper, lower, left, and right keys of the keyboard are set, and the Up, Down, Right, and Left direction control functions are bound, and the Turnaround control function is bound to the mouse to realize the roaming of the virtual reality scene of the entire underground mining operation;
  • Step 3 Replace the tool model of the underground mining operation and the simulated material of the mine geology
  • Step 4 Embed the video material into the 3D application scene and control playback and stop
  • Step 5 Select the model and move to any location
  • Step 6 Implement the intent interaction of the application scenario
  • the environment light is automatically turned on to implement natural interaction in the virtual scenario
  • Step 7 QR code generation
  • Bind the F key of the keyboard, add the two-dimensional code generation function, and set keyboard-key control of two-dimensional code generation; when the user presses the F key, the system generates a two-dimensional code of the virtual scene panorama built from the set sampling points;
  • Step 8 Implement voice interaction
  • the user controls the coal mining machine in the virtual reality scene through keywords including forward rotation, reverse rotation, boom raising, lowering arm and stopping, and simulates the operation effect;
  • Step 9 AR dynamic demonstration function mode switching
  • in step 3, the model is instantiated into a specific Actor; the SetMesh function and the SetMaterial function are added to replace the model and the model material, and the Widget Blueprint user interface and Box collision detection are set to realize the hidden-menu function in the three-dimensional space.
  • in step 5, a mouse event is added for the model to be operated; the model is selected through the GetHitResult function, and the coordinate value of the model's SetActorLocation function is changed according to the coordinates of the mouse in the three-dimensional space; when the mouse is clicked again, the coordinate values of the mouse in the x, y and z directions at that moment are assigned to the model, and the GetHitResult function sets the model to the unselected mode.
  • the TriggerBox trigger is set.
  • when the first-person character triggers the TriggerBox and the system detects that the user intends to enter an area, a device in that area is automatically enabled.
  • in step 7, the user presses the keyboard F key and the system generates a two-dimensional code of the virtual scene panorama built from the set sampling points; the user scans the two-dimensional code with a mobile phone and jumps to the virtual application scene display page on the mobile terminal; on the mobile phone, the user can enable the gyroscope, switch to VR split-screen mode and set the phone parameters, and can then use VR glasses to experience the virtual underground mining operation environment scene, realizing a 720-degree view as well as a multi-scene, multi-angle roaming experience on the mobile terminal.
  • the speech recognition is implemented on the basis of the Pocket-sphinx library; with an improved Chinese keyword dictionary, the recognition function is realized through preprocessing, feature extraction, acoustic model training, language model training, and speech decoding and search, and finally control functions written in the UE4 engine realize voice control of the model in the three-dimensional space; the specific implementation steps of the speech recognition are as follows:
  • Pre-emphasis is achieved by a first-order FIR high-pass digital filter.
  • the transfer function of the first-order FIR high-pass digital filter is H(z) = 1 - a*z^(-1);
  • where a is the coefficient of the pre-emphasis filter, with a value in the range 0.9 to 1.0; if the speech sample value at time n is x(n), the pre-emphasized signal is y(n) = x(n) - a*x(n-1).
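  • A minimal sketch of this pre-emphasis step, assuming speech samples held in a NumPy array (the value a = 0.97 follows the coefficient given later in this description):

```python
import numpy as np

def pre_emphasize(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    """Apply the first-order FIR high-pass filter y(n) = x(n) - a*x(n-1)."""
    y = x.astype(np.float64).copy()
    y[1:] = x[1:] - a * x[:-1]
    return y
```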
  • Step 8.2 Feature Extraction
  • Feature extraction is performed by the Mel Frequency Cepstral Coefficient (MFCC) method, according to the following steps:
  • Step 8.2.1 using the critical band effect of human hearing, using MEL cepstrum analysis technique to obtain a MEL cepstral coefficient vector sequence for speech signal processing;
  • Step 8.2.2 use the MEL cepstral coefficient vector sequence to represent the spectrum of the input speech, and set a number of bandpass filters with triangular or sinusoidal filtering characteristics in the speech spectrum range;
  • Step 8.2.3 Find the output data of each band pass filter through the band pass filter bank
  • Step 8.2.4 Take the logarithm of the output data of each band-pass filter and perform a discrete cosine transform (DCT);
  • Step 8.2.5 Obtain the MFCC coefficient; the solution formula is as follows:
  • C i is a characteristic parameter
  • k is the number of triangular filters
  • F(k) is the output data of each filter
  • P is the filter order
  • i is the data length
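  • One possible reading of steps 8.2.1 to 8.2.5 is sketched below: a triangular mel filter bank is applied to the power spectrum of one windowed frame, the logarithm is taken, and a DCT yields the cepstral coefficients; the frame length, sample rate and filter count are illustrative assumptions, not values fixed by the method.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=13, n_fft=512):
    """Compute MFCC-style coefficients C_i for one pre-emphasized speech frame."""
    # Power spectrum of the Hamming-windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular band-pass filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Filter outputs F(k), logarithm, then DCT to obtain the cepstral coefficients
    filter_out = np.maximum(fbank @ spectrum, 1e-10)
    return dct(np.log(filter_out), type=2, norm='ortho')[:n_ceps]
```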
  • the acoustic model parameters are trained according to the characteristic parameters of the training speech library
  • the characteristic parameters of the speech to be recognized can be matched with the acoustic model to obtain the recognition result.
  • the mixed Gaussian model-Hidden Markov Model (GMM-HMM) is used as the acoustic model, which includes the following steps:
  • Step 8.3.1 Find the joint probability density function of the mixed Gaussian model:
  • M is the number of Gaussian in the mixed Gaussian model
  • C m is the weight
  • u m is the mean
  • ⁇ m is the covariance matrix
  • D is the observation vector dimension
  • EM maximum expected value algorithm
  • N is the number of elements in the training data set
  • x (t) is the feature vector at time t
  • h_m(t) is the posterior probability of component C_m at time t
  • the GMM parameters are estimated by the EM algorithm, which maximizes the probability of generating the speech observation features on the training data
  • the observation probability distribution b_i(o_t) = P(o_t | q_t = i) of each state is described by the GMM model; according to step 8.3.1, the formula is:
  • N is the number of states
  • i and j are states
  • a_ij is the probability of jumping from state i at time t-1 to state j at time t
  • o t is the observed value at time t
  • C i,m is the mixing coefficient , indicating the weight between different Gaussians
  • u i,m represents the mean between different Gaussians
  • ⁇ i,m represents the covariance matrix between different Gaussians
  • the parameters of the HMM are estimated by the Baum-Welch algorithm, and the acoustic model file is finally generated;
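  • The EM re-estimation described above can be illustrated with the following sketch of one iteration for a diagonal-covariance Gaussian mixture; this is a simplification, since in the full acoustic model such mixtures are tied to HMM states and trained together with the Baum-Welch algorithm.

```python
import numpy as np

def gmm_em_step(X, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM.

    X: (N, D) feature vectors; weights: (M,); means, variances: (M, D).
    """
    N, D = X.shape
    M = weights.shape[0]
    # E-step: posterior h_m(t) of every mixture component for each frame
    log_prob = np.zeros((N, M))
    for m in range(M):
        diff = X - means[m]
        log_prob[:, m] = (np.log(weights[m])
                          - 0.5 * np.sum(np.log(2.0 * np.pi * variances[m]))
                          - 0.5 * np.sum(diff ** 2 / variances[m], axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)
    h = np.exp(log_prob)
    h /= h.sum(axis=1, keepdims=True)
    # M-step: re-estimate the weights C_m, means u_m and diagonal covariances
    occ = h.sum(axis=0)
    new_weights = occ / N
    new_means = (h.T @ X) / occ[:, None]
    new_vars = (h.T @ (X ** 2)) / occ[:, None] - new_means ** 2
    return new_weights, new_means, np.maximum(new_vars, 1e-6)
```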
  • the N-Gram model is used to train the language model; the probability that the i-th word appears in a sentence depends on the N-1 words in front of it, that is, the context of a word is defined as the N-1 words appearing in front of it; for the bigram case the expression is:
  • P(sentence) = P(w1)*P(w2|w1)*P(w3|w2)*...*P(wn|wn-1)
  • where P(w1) is the probability that w1 appears in the text, P(w1, w2) is the probability that w1 and w2 appear consecutively, and P(w2|w1) is the conditional probability that w2 appears given that w1 has appeared; P(S) = P(w1, w2, ..., wn) represents the probability that the word set w1, w2, ..., wn appears consecutively and generates S;
  • the conditional probabilities are obtained as P(wi|wi-1) = P(wi-1, wi)/P(wi-1), where P(wi-1, wi) and P(wi-1) can all be obtained from corpus statistics;
  • the language model stores the probability statistics of P(wi-1, wi), and the entire recognition process is realized by finding the maximum value of P(sentence);
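  • A minimal sketch of the bigram case of this N-Gram training, estimating P(wi|wi-1) from corpus counts and scoring a sentence with the product formula above; the toy corpus of control keywords is purely illustrative.

```python
from collections import Counter

def train_bigram(corpus_sentences):
    """Count unigrams and word pairs so that P(wi|wi-1) = C(wi-1, wi) / C(wi-1)."""
    unigrams, bigrams = Counter(), Counter()
    for words in corpus_sentences:
        unigrams.update(words)
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams

def sentence_probability(words, unigrams, bigrams):
    """P(sentence) = P(w1) * P(w2|w1) * ... * P(wn|wn-1); unseen events give zero."""
    total = sum(unigrams.values())
    if not words or unigrams[words[0]] == 0:
        return 0.0
    p = unigrams[words[0]] / total
    for prev, cur in zip(words[:-1], words[1:]):
        if unigrams[prev] == 0:
            return 0.0
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

# Illustrative usage with a toy corpus
corpus = [["forward", "rotation"], ["reverse", "rotation"], ["raise", "arm"]]
uni, bi = train_bigram(corpus)
print(sentence_probability(["forward", "rotation"], uni, bi))
```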
  • Step 8.5 Speech decoding and search algorithm
  • an identification network is established from the trained acoustic model, the language model and the dictionary mapping file created by the g2p tool, and the best path is found in this network by the search algorithm; this path outputs the word string of the speech signal with maximum probability, so that the text contained in the speech sample is determined.
  • the Viterbi algorithm is used to implement speech decoding. The specific process is as follows:
  • Step 8.5.5 Output the optimal hidden state path
  • δ_t(i) is the joint probability of all nodes through which the optimal path passes when the recursion reaches time t
  • ψ_t(i) is the implicit state recorded at time t
  • T is the total time
  • P* is the probability of the optimal path
  • the pre-emphasis coefficient a takes 0.97.
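  • A compact sketch of the Viterbi search described above, recording δ_t(i), the best-path joint probability up to time t, and a back-pointer table used to output the optimal hidden state path; the HMM parameters passed in are placeholders rather than the trained acoustic model.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: observation indices; pi: initial probs (N,); A: transitions (N, N); B: emissions (N, K)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # delta[t, i]: best joint probability ending in state i at time t
    psi = np.zeros((T, N), dtype=int)   # psi[t, i]: predecessor state on that best path
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    # Backtrack from the most probable final state to output the optimal path
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()        # optimal state sequence and P*
```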
  • step 9 specifically, the following steps are included:
  • Each type of model is a 1:1 modeling simulation of the real coal mining tool
  • after the user selects the model and then selects the tool model option to be learned through this menu, the system plays the corresponding voice explanation; clicking the voice button again stops it;
  • Step 9.4 Screenshot generation icon
  • Step 9.7 Dynamic gesture control of the model: the real environment is superimposed with the virtual model, and interaction between the gesture and the model is controlled, including the following steps:
  • Step 9.7.1 Initialize video capture, and read the marker file and camera parameters
  • Step 9.7.2 Grab the video frame image
  • Step 9.7.3 Detect markers and identify the marker template in the video frame, and perform motion detection on the acquired video frame image using OpenCV library functions to determine whether a motion track is detected;
  • if the result of the judgment is that a motion track is detected, proceed to step 9.7.4; otherwise, step 9.7.12 is performed;
  • u t is the corresponding pixel point of the background image
  • u t+1 is the updated background image pixel point
  • I t is the pixel point of the current frame image
  • I_f is the mask value of the current-frame image pixel, that is, whether to perform the background update
  • a ⁇ [0,1] is the background image model update speed
  • Step 9.7.4 Perform preprocessing including denoising on the image
  • in the motion detection step, if motion information is detected, the video frame image containing the motion gesture is preprocessed: median filtering is performed on the image by the medianBlur function of OpenCV to remove the salt-and-pepper noise;
  • Step 9.7.5 Convert to HSV space
  • the image is color-space converted by the cvtColor function to obtain the data of its HSV space, and the brightness v value in the HSV space is reset as shown below:
  • r, g are red and green pixels of the skin color region, and r>g;
  • Step 9.7.6 Split the hand area
  • Step 9.7.7 Perform morphological processing to remove impurities
  • Step 9.7.8 Obtain the hand contour
  • the gesture contour is obtained by OpenCV's findContours function, and then the pseudo contour is removed.
  • Step 9.7.9 Draw the outline of the hand and calibrate the information
  • Step 9.7.10 Comparison of contour information, setting direction vector
  • Step 9.7.11 Perform force simulation on the model according to vector coordinates to realize interaction between dynamic gestures and virtual models
  • the virtual model is subjected to the force simulation operation according to different judgment results.
  • the coordinate values of the model in the three-dimensional space are multiplied along the x, y and z axes; through the change of the coordinate values, the change of the model position is achieved to simulate the force;
  • Step 9.7.12 Calculate a conversion matrix of the camera relative to the detected mark
  • Step 9.7.13 Superimpose the virtual object on the detected mark and return to step 9.7.2 to realize the superimposed display of the real environment and the virtual model.
  • the three-dimensional models of the present invention are built in equal proportion; through editing on the UE4 engine platform, the texture maps closely match reality, and the ambient lighting of the application scene simulates real lighting, so that the entire virtual reality scene is more realistic and immersive.
  • the present invention realizes multiple functional interactions through its technical solutions, such as changing tool models through hidden menus while roaming the virtual underground mining scene, replacing mining materials to simulate different mining geology, freely moving the location of mining tools, embedding video information in the machine display to show the real scene, and using the voice function to control the forward rotation, reverse rotation, boom raising, boom lowering and stopping of the shearer.
  • the invention also provides a two-dimensional code generation function that connects the PC-side display to the mobile-phone-side display; the mobile phone can use its built-in gyroscope for gravity sensing, and when set to VR glasses mode the user can simply use VR glasses to experience real-time scene immersion.
  • the invention also realizes the AR dynamic demonstration function by using the AR development SDK-ARToolKit.
  • with the AR demonstration function, the user can select a mining tool model in real time, perform 360-degree rotation display, voice explanation and dynamic running display, save screenshots, and so on; importantly, the tool model is displayed in AR mode with the virtual model combined with the real environment, which shows not only the intuitive stereoscopic quality of the model but also its authenticity, achieving a better effect for learning and education.
  • in addition to its dynamic presentation function, the AR module of the present invention adds processing of the video stream.
  • when a dynamic gesture enters the camera view, it interacts with the model: a hand motion from far to near gives the model a forward simulated force in three-dimensional space; a top-to-bottom motion gives the model an upward simulated force; a forward flip of the hand gives the model a downward simulated force; and twisting or tilting the hand to the left or right gives the model a simulated force with the corresponding vector direction.
  • in addition to realizing the functions of the AR module in the coal mine application scenario, the present invention extends the AR display function to the field of astronomy.
  • FIG. 1 is a diagram showing the overall functional structure of an implementation of the present invention.
  • FIG. 2 is a schematic diagram of the function of generating a two-dimensional code of the present invention.
  • FIG. 3 is a schematic diagram of the interactive function of the speech recognition of the present invention.
  • FIG. 4 is a schematic diagram of an AR mode implementation of the present invention.
  • Figure 5 is a flow chart showing the implementation of the dynamic gesture interaction function of the present invention.
  • the invention provides a multi-interaction realization method for mining operation based on virtual reality and augmented reality.
  • the entire technical function encompassed by the present invention can be understood in conjunction with FIG. 1.
  • the specific implementation steps are as follows:
  • Step 1 The entire environmental scene of the underground mining operation is built. Models are created based on the real mining operation using the 3DMax modeling tool and imported into the UE4 engine by category. Through the UE4 platform, the models are edited, natural light and ambient light are simulated, physical collision detection is added, the system parameters are adjusted, and baking and rendering are performed.
  • Step 2 Add a first person role in the virtual application scenario and add a mouse and keyboard control event to the character. Bind the up, down, left and right keys of the keyboard to the Up, Down, Right, and Left functions to control the coordinates of the first person character in the virtual three-dimensional space to achieve roaming. Add a Turnaround function to the mouse to control the 720-degree rotation of the first person perspective in the virtual three-dimensional space.
  • Step 3 Set the interactive menu to realize the function interaction of replacing the tool model of the underground mining operation and the geological material of the mine.
  • when the first-person character enters the Box collision detection area, the created Widget Blueprint user interface is displayed; when the character leaves the Box collision detection area, the Widget Blueprint user interface is hidden.
  • the invention sets four types of mining tool models for the user to select, and sets the mine geology into a material selectable mode, and replaces the model and the material through the displayed style menu. After the replacement is completed, leaving the detection area, the menu is automatically hidden, which does not affect the overall roaming visual effect, and can achieve the function of real-time interaction.
  • Step 4 Video embedding, playing in three-dimensional space, simulating the monitoring display equipment of the mining environment.
  • the invention sets the keyboard X key binding to the MediaPlayer media class of the UE4 platform, and controls the playing and stopping of the video stream through the Open-Source and Close functions.
  • This operation can simulate the screen display of the underground mining control equipment and the real-time environmental monitoring screen display, highlighting the authenticity and dynamics of the three-dimensional scene, making the simulated virtual scene closer to reality.
  • Step 5 Select the model and drag it to any location where the user wants to place it, and implement the intent interaction function by which a device is opened automatically.
  • A mouse event is added to the model to be operated; the model is selected through the GetHitResult function, and the coordinate value of the model's SetActorLocation function is then changed according to the coordinates of the mouse in the three-dimensional space; when the mouse is clicked again, the coordinate values of the mouse at that moment are assigned to the model, and the GetHitResult function sets the model to the unselected mode.
  • the user can click on the shearer model in the scene and place it in other mining locations of the mining operation scenario.
  • the system adds a TriggerBox trigger to a specific area.
  • when the first-person character enters this area, the TriggerBox trigger fires, the ambient light control function SetVisible of the corresponding area is triggered, and the light is turned on, thereby realizing the automatic induction lamp function provided by the present invention.
  • This is also the function of the present invention designed to detect human intent, thereby enabling more natural system interaction.
  • Step 6 Two-dimensional code generation function.
  • a single PC-side display cannot satisfy the multi-user experience.
  • the present invention supports a multi-user mobile-phone experience by generating a two-dimensional code; after scanning the two-dimensional code, the mobile phone jumps through the encoded link to the panoramic display page of the coal mining operation.
  • the user can enable the gyroscope, switch to the VR split screen mode, set the mobile phone parameters, and then use the VR glasses to experience the virtual underground coal mining environment to achieve a 720-degree viewing angle display.
  • This function mainly adds the QR code generation and hiding function by binding the F and V keys of the keyboard.
  • Six collection points are added in the UE4 engine, and a panorama is generated by capturing the views at these point positions; this information, together with the related mobile phone settings, is then encoded as a network link in the form of a two-dimensional code to realize the PC-to-mobile conversion.
  • the process implemented by this function is shown in Figure 2.
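  • The encoding of the panorama link into a scannable image can be sketched as follows; the Python qrcode package and the example URL are assumptions used only for illustration, since the description does not name the QR library used inside UE4.

```python
import qrcode

# Hypothetical address of the generated panorama page served to the mobile terminal
panorama_url = "http://example.com/mining-panorama/scene1"

img = qrcode.make(panorama_url)    # build the QR code image encoding the link
img.save("scene1_qrcode.png")      # the saved image can then be displayed in the scene
```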
  • Step 7 Implement voice control.
  • the invention realizes Chinese keyword recognition by using Pocket-sphinx.
  • the specific voice control implementation principle flow is shown in FIG. 3.
  • the present invention adds a speech recognition function to the Actor created for the shearer model by enabling a speech recognition class after system initialization and saving a reference to this class; a method is then created and bound to the speech recognition function OnWordSpoken; whenever the user says one of the set control words, this method is triggered, and keyword matching realizes the related controls of the shearer such as forward rotation, reverse rotation, boom raising, boom lowering and stopping.
  • the speech recognition in this method is based on an improved version of the Sphinx speech recognition system developed by Carnegie Mellon University.
  • the speech recognition method of the present invention is an isolated-word recognition method for a large vocabulary, non-specific speakers and continuous Chinese syllables; it recognizes the set words well for different speakers.
  • after the speech vocabulary is recognized, the action control function corresponding to the matched words is triggered, and the corresponding action control of the model is realized.
  • This recognition system includes five parts: speech preprocessing, feature extraction, acoustic model training, language model training and speech decoding. The following is the specific process of speech recognition:
  • Step 7.1 Preprocessing.
  • the input original speech signal is processed, the unimportant information and background noise are filtered out, and the end point detection, speech framing, pre-emphasis and the like of the speech signal are performed.
  • the pre-emphasis of the speech signal is intended to emphasize the high-frequency part of the speech, remove the influence of the lip radiation, and increase the high-frequency resolution of the speech.
  • Step 7.2 Feature extraction.
  • Feature extraction is performed by the Mel Frequency Cepstral Coefficient (MFCC) method;
  • C i is a characteristic parameter
  • k is the number of triangular filters
  • F(k) is the output data of each filter
  • P is the filter order
  • i is the data length.
  • Step 7.3 Acoustic model training.
  • the acoustic model parameters are trained according to the characteristic parameters of the training speech library. At the time of identification, the feature parameters of the speech to be recognized can be matched with the acoustic model to obtain a recognition result.
  • a mixed Gaussian model-Hidden Markov Model (GMM-HMM) is used as the acoustic model.
  • Step 7.3.1 Find the joint probability density function of the mixed Gaussian model:
  • M represents the number of Gaussian in the mixed Gaussian model
  • C m represents the weight
  • u m represents the mean
  • ⁇ m represents the covariance matrix
  • D is the observed vector dimension.
  • EM maximum expected value algorithm
  • N is the number of elements in the training data set
  • x (t) is the feature vector at time t
  • h_m(t) is the posterior probability of component m at time t.
  • the GMM parameters are estimated by the EM algorithm, which maximizes the probability of generating speech observation features on the training data.
  • Step 7.3.2 Solve the three main components of the HMM.
  • the observation probability distribution b_i(o_t) = P(o_t | q_t = i) of each state is described by the GMM model; according to step 7.3.1, the formula is:
  • N is the number of states
  • i and j are states
  • a_ij is the probability of jumping from state i at time t-1 to state j at time t
  • o t is the observed value at time t
  • C i,m is the mixing coefficient , indicating the weight between different Gaussians
  • u i,m represents the mean between different Gaussians
  • ⁇ i,m represents the covariance matrix between different Gaussians
  • the parameters of the HMM are estimated by the Baum-Welch algorithm, and the acoustic model file is finally generated;
  • Step 7.4 Language Model Training.
  • the language model is used to constrain word search.
  • Language modeling can effectively combine the knowledge of Chinese grammar and semantics, and describe the intrinsic relationship between words, thereby improving the recognition rate and reducing the search range.
  • This paper uses the N-Gram model to implement the training of the language model.
  • the probability that the i-th word appears in a statement depends on the N-1 words in front of it.
  • the context of a word is defined as the N-1 words appearing in front of the word.
  • the expression is:
  • the language model is the model obtained from the statistical corpus.
  • the corpus is the text library used for training.
  • the dictionary file stores the corpus for training and the corresponding speech.
  • the language model is the combined probability of the expressed corpus.
  • let P(w1) be the probability that w1 appears in the text, P(w1, w2) the probability that w1 and w2 appear consecutively, and P(w2|w1) the conditional probability that w2 appears given that w1 has appeared; then
  • P(sentence) = P(w1)*P(w2|w1)*P(w3|w2)*...*P(wn|wn-1)
  • Step 7.5 Speech decoding and search algorithm.
  • an identification network is established from the trained acoustic model, language model and dictionary, and the best path is found in this network by the search algorithm; this path outputs the word string of the speech signal with maximum probability, thus determining the text contained in this speech sample.
  • the Viterbi algorithm is used to decode the speech. The specific process is as follows:
  • after the user speaks boom raising, boom lowering, forward rotation, reverse rotation or stop, the simulation system performs the corresponding operation of the shearer; the system recognizes the keyword spoken by the user and displays it in the upper left corner of the interface.
  • Step 8 AR dynamic demonstration function mode switching.
  • Step 9 Model selection, model explanation, and dynamic presentation in AR mode.
  • in the AR dynamic demonstration module of the present invention, the user interface is designed to be concise and convenient for AR display, and a second-level hidden menu is designed.
  • the additional sub-function selections of model selection, model explanation, model demonstration and function expansion are designed to be hidden.
  • in the second-level menu, model selection is divided into coal mining machine, roadheader, pneumatic coal drill, fully mechanized mining support and other models.
  • the submenu can be hidden, and the model explanation, model dynamic demonstration and function expansion menu are also realized.
  • the content of the specific implementation can be seen in the accompanying figures. This method takes NFT (Natural Feature Tracking) as an example to implement the AR technology; the principle is shown in Figure 4, and the specific process is as follows:
  • Step 9.1 Through camera calibration, the distortion parameters caused by deviations in the camera manufacturing process, that is, the intrinsic matrix of the camera, are obtained in order to restore the one-to-one correspondence between the 3D space of the camera model and the 2D image space.
  • Step 9.2 According to the hardware parameters of the camera itself, we can calculate the corresponding Projection Matrix.
  • Step 9.3 Feature extraction of the natural image to be recognized, and a set of feature points ⁇ p ⁇ is obtained.
  • Step 9.4 Feature extraction of the image acquired by the camera in real time is also a set of feature points ⁇ q ⁇ .
  • Step 9.5 Using the ICP (Iterative Closest Point) algorithm to iteratively solve the R, T matrix (Rotation & Translation) of the two sets of feature points, that is, the Pose matrix, which is the model view matrix commonly referred to in graphics.
  • let the two matched point sets in three-dimensional space be {p} and {q}; their Euclidean distance after the transformation is E = (1/N) * Σ_{i=1}^{N} ||q_i - (R*p_i + T)||², where E is the sum of distances between corresponding points of the two point sets after the transformation and N is the number of points in the point set; the R and T that minimize E are sought, and at this time R, T constitute the pose matrix used as the model view part of the MVP matrix.
  • Step 9.6 Obtain the MVP matrix (Model View Projection) for 3D graphics rendering.
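  • Inside each ICP iteration of step 9.5, the R and T for the currently matched correspondences are commonly obtained in closed form from the SVD of the cross-covariance of the centred point sets; the sketch below shows that single alignment step under the assumption that the correspondences {p} → {q} are already matched.

```python
import numpy as np

def rigid_transform(p, q):
    """Find R, T minimizing E = (1/N) * sum ||q_i - (R @ p_i + T)||^2 for matched 3D point sets."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)      # cross-covariance of the centred point sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = q_mean - R @ p_mean
    return R, T
```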
  • Step 10 Screenshot generation icon.
  • Step 11 Model rotation stops showing.
  • in AR mode, the user sees the superposition of the real scene and the virtual model.
  • This demonstration learning mode is more authentic and immersive.
  • Step 12 AR function expansion module.
  • the invention adds an AR education display extension function and controls Map switching by adding a secondary UI to realize the demonstration of different objects, including the Earth, Saturn, Mercury, other planets and the galaxy; the planets rotate, and through the AR mode the moving planets are displayed in front of the user; a knowledge introduction function is also added, improving the educational display capability of the system.
  • Step 13 Dynamic gestures interact with the model.
  • the AR mode adds OpenCV video information processing. After initializing the video stream, motion detection is performed first. If dynamic hand motion is detected, image processing is performed, and the gesture is subjected to graphics processing, denoising, converting to HSV mode, morphological processing, and drawing outline. The calibration information and the contour information are compared. Finally, the model force simulation is carried out to realize the interaction between the dynamic gesture and the virtual model.
  • the specific implementation principle flow is shown in FIG. 5 .
  • the dynamic gesture interaction realizes recognition and control with simulated three-dimensional gestures; the dynamic hand acquired from the video stream is two-dimensional information, and matrix calculation is combined with the computed transformation matrix of the camera relative to the detected marker.
  • the specific steps include the following steps:
  • Step 13.1 Motion Detection
  • the method performs motion detection based on the color histogram and background difference.
  • the program needs a certain amount of time to start the camera; during this time about 20 frames of images can be collected, and the background is cyclically updated with these 20 frames as follows; after the motion detection of each frame, pixels outside the motion gesture area are also updated as background.
  • u t is the corresponding pixel point of the background image
  • u t+1 is the updated background image pixel point
  • I t is the pixel point of the current frame image
  • I_f is the mask value of the current-frame image pixel, that is, whether to perform the background update
  • a ⁇ [0,1] is the background image model update speed, generally 0.8 to 1, and 0.8 for this method.
  • Step 13.2 If motion information is detected, the video frame image containing the motion gesture is preprocessed: median filtering is performed by the medianBlur function of OpenCV to remove the salt-and-pepper noise;
  • Step 13.3 The color space is converted by the cvtColor function to obtain its HSV data, and the brightness value v in the HSV space is reset to a relatively small value (reducing the interference of static skin color); the brightness value v in the HSV space is reset as shown below:
  • where r and g are the red and green pixel values of the skin color region of interest, with r > g;
  • Step 13.4 Divide the hand area and perform morphological processing
  • Step 13.5 Get the gesture outline
  • the gesture contour is obtained by OpenCV's findContours function, and then the pseudo contour is removed.
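  • A sketch of steps 13.3 to 13.5 with OpenCV's Python bindings (OpenCV 4 is assumed); the HSV skin-color thresholds and the minimum contour area used to discard pseudo contours are illustrative values, not parameters fixed by the method.

```python
import cv2
import numpy as np

def hand_contour(frame_bgr, min_area=1000):
    """Segment a skin-colored region in HSV space and return the largest hand contour, if any."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Rough skin-color range in HSV (illustrative values)
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
    # Morphological open/close to remove small impurities from the mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Discard pseudo contours that are too small to be the hand
    contours = [c for c in contours if cv2.contourArea(c) >= min_area]
    return max(contours, key=cv2.contourArea) if contours else None
```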
  • Step 13.6 Draw outlines, calibration information
  • Step 13.7 Contour information comparison, set direction vector
  • Step 13.8 Applying a force simulation to the virtual model through the direction vector
  • the virtual model is subjected to the force simulation operation according to different judgment results.
  • the coordinate values of the model in the three-dimensional space are multiplied along the x, y and z coordinate axes, and the change of the coordinate values realizes the change of the model position to achieve the force simulation.
  • a set of palm motions is selected, including motion from far to near, motion from bottom to top, and twisting of the palm in various directions, to simulate the different force effects produced on the model; the gesture motions respectively make the model move forward, move upward, and, according to the different twist directions of the hand, respond to forces in the corresponding directions.
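  • The force simulation of steps 13.7 and 13.8 can be sketched as scaling the model's coordinates along the axes selected by the gesture's direction vector; the scale factor is an illustrative assumption.

```python
def apply_simulated_force(model_position, direction, scale=1.05):
    """Multiply the x, y, z coordinates along the axes indicated by the direction vector."""
    x, y, z = model_position
    dx, dy, dz = direction              # e.g. (1, 0, 0) for a far-to-near (forward) gesture
    return (x * scale if dx else x,
            y * scale if dy else y,
            z * scale if dz else z)

# Example: a bottom-to-top gesture produces an upward simulated force on the model
print(apply_simulated_force((100.0, 50.0, 20.0), (0, 0, 1)))
```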
  • This feature demonstrates the interaction of dynamic gestures with virtual models. This interaction helps users view the model from multiple angles and realizes the interaction between teaching and users, increasing the fun.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Architecture (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Processing Or Creating Images (AREA)
  • Coloring Foods And Improving Nutritive Qualities (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention discloses a multi-interaction implementation method for a mining operation based on virtual reality and augmented reality, belonging to the technical field of virtual reality and augmented reality. The method comprises two modes: a virtual reality mode and an augmented reality mode. The following operations can be implemented in a virtual reality scene: displaying a model, selecting and replacing materials, scene exploration, arbitrary movement and placement of the model, embedding videos, QR code generation, and realizing natural interactions by means of a trigger as well as voice interactions. The following operations can be implemented in an augmented reality scene: model selection, voice playback, demonstration of model operation dynamics, model rotation and stop control, screen capturing, and functional expansions. Various manners of interaction, such as voice control, hand gesture control, and control via a keyboard and a mouse cursor, can be implemented in the two modes. The method can be applied to virtual simulations of applicable mine operation scenarios. It is used to train miners in a mining area and students specializing in mining engineering, to reduce training costs and to improve workers' skills, thereby providing a fast and effective means for guiding production, construction, and technical scientific research.

Description

Multi-interaction realization method for mining operation based on virtual reality and augmented reality

Technical Field

The invention belongs to the field of virtual reality and augmented reality technology, and particularly relates to a multi-interaction realization method for mining operations based on virtual reality and augmented reality.

Background Art
In the industry, 2016 was called "the first year of virtual reality", and some people may mistakenly believe that this technology is a new technology developed only in recent years. In fact, Virtual Reality (VR) technology emerged in the 1990s. After 2000, virtual reality technology introduced advanced technologies such as XML and JAVA in its integration and development, applied powerful 3D computing capability and interactive technology to improve rendering quality and transmission speed, and entered a new era of development. Virtual reality technology is a product of economic and social productivity development and has broad application prospects. The study of virtual reality technology in China started in the early 1990s. With the rapid development of computer graphics and computer system engineering, virtual reality technology has received considerable attention. The "China VR User Behavior Research Report for the First Half of 2016", jointly released by the National Advertising Research Institute and other organizations, shows that in the first half of 2016 the number of potential users of domestic virtual reality reached 450 million, with about 27 million light users and about 2.37 million heavy users, and the domestic virtual reality market is expected to see explosive growth. Augmented Reality (AR) technology is an emerging technology developed on the basis of virtual reality. Its application fields are also very extensive, and it has shown good application prospects in industry, medicine, the military, municipal administration, television, games, exhibitions and other fields.

At present, VR and AR technologies are developing continuously and their scope of application is becoming wider and wider, but these two technologies are mostly applied in fields such as the military and entertainment; for applications in education, industry, engineering and similar fields, which involve multidisciplinary factors such as physics and geography, more research and development is still needed. For the mining industry, the geological conditions of mines in China are relatively complex, and most mines are underground. During the mining process, because the mining environment is underground and the processes are quite complicated, disasters and accidents such as gas and water hazards occur from time to time. At the same time, mining is an industry with a long construction period, large investment and high safety hazards, in which safety accidents easily occur, so the safety training of mining employees has always been the top priority of mining work. However, the existing traditional training and teaching systems are basically theoretical introductions with physical model displays or two-dimensional image displays, delivered mainly through classroom lectures supplemented by simple animations and audio and video introductions, with insufficient practice and a lack of real scenes. Even when viewing a physical model, trainees cannot grasp the actual operating procedure of the tools well. With the continuous development of technology, various training systems applied to coal mining have been developed accordingly, but they also suffer from problems such as poor authenticity of the system scene, poor immersive effect and few interactive functions, so they can only provide simple demonstrations.
Summary of the Invention

In view of the above technical problems existing in the prior art, the present invention proposes a multi-interaction realization method for mining operations based on virtual reality and augmented reality, which is reasonable in design, overcomes the deficiencies of the prior art, and has good effects.

In order to achieve the above object, the present invention adopts the following technical solutions:

A multi-interaction realization method for mining operations based on virtual reality and augmented reality adopts a multi-interaction simulation system for underground mining operations; the system includes two modes, a virtual reality mode and an augmented reality mode; the virtual reality mode includes modeling and roaming of specific scenes, replacement of models and their materials, embedding of video into the virtual scene, model movement, intent interaction in the application scene, two-dimensional code generation and voice interaction; the augmented reality mode includes model selection, model explanation, dynamic model demonstration, gesture-controlled model interaction, screenshot icon generation, 360-degree rotation and stop, function mode switching and function expansion; the system provides two kinds of hidden menus, namely the tool and material replacement selection menu in the virtual reality mode and the model selection menu in the augmented reality mode; the first kind is displayed only when the user enters a specific area and is hidden when the user leaves; the second kind displays a secondary menu somewhere when clicked, and the menu is hidden when it is clicked again.

The mining operation multi-interaction realization method specifically includes the following steps:
Step 1: Construction of the entire environmental scene of the mining operation

According to the real environment of the underground mining operation, the modeling tool 3DMax is used for 1:1 equal-ratio modeling to realize the environment simulation of the whole underground mining operation; the UE4 engine is used to edit the models, including creating and editing textures and materials, adding physical collision, adding lighting, lighting effects and special effects to the overall environment, and baking and rendering;

Step 2: Roaming of the virtual reality application scene

In the UE4 engine, the up, down, left and right keys of the keyboard are set and bound to the Up, Down, Right and Left direction control functions, and the Turnaround control function is bound to the mouse, to realize roaming of the virtual reality scene of the entire underground mining operation;

Step 3: Replacing the tool models of the underground mining operation and the simulated material of the mine geology

A hidden menu is added in the virtual underground mining scene; when the user roams to the mining location, a model or material selection menu appears automatically, and the user can select a model or material from the menu for replacement as required;

Step 4: Embedding the video material into the three-dimensional application scene and controlling play and stop

The video material is embedded in the virtual reality scene and played in the three-dimensional space to simulate the monitoring display equipment of the mining environment; the keyboard X key is set and bound to the MediaPlayer media class of the UE4 platform, and play and stop of the video are controlled through the OpenSource and Close functions;

Step 5: Selecting the model and moving it to any position

The model is selected with the mouse and moved to any position where a simulated operation is required, achieving the simulation of mechanical movement in the real scene;

Step 6: Realizing intent interaction in the application scene

When the user roams to a specific location in the virtual reality application scene and the system detects that the user intends to enter, the environment light is automatically turned on, realizing natural interaction in the virtual scene;

Step 7: Two-dimensional code generation

The F key of the keyboard is bound, a two-dimensional code generation function is added, and keyboard-key control of two-dimensional code generation is set; when the user presses the F key, the system generates a two-dimensional code of the virtual scene panorama built from the set sampling points;

Step 8: Realizing voice interaction

The user controls the coal mining machine in the virtual reality scene through keywords including forward rotation, reverse rotation, boom raising, boom lowering and stop, simulating its operation;

Step 9: AR dynamic demonstration function mode switching

The user clicks the AR mode button in the upper right corner of the system to switch to the AR demonstration mode.
Preferably, in step 3, the model is instantiated into a specific Actor; the SetMesh function and the SetMaterial function are added to replace the model and the model material, and the Widget Blueprint user interface and Box collision detection are set to realize the hidden-menu function in the three-dimensional space.

Preferably, in step 5, a mouse event is added for the model to be operated; the model is selected through the GetHitResult function, and the coordinate value of the model's SetActorLocation function is changed according to the coordinates of the mouse in the three-dimensional space; when the mouse is clicked again, the coordinate values of the mouse in the x, y and z directions at that moment are assigned to the model, and the GetHitResult function sets the model to the unselected mode.

Preferably, in step 6, a TriggerBox trigger is set; when the first-person character triggers the TriggerBox and the system detects that the user intends to enter an area, a device in that area is automatically enabled.

Preferably, in step 7, the user presses the keyboard F key and the system generates a two-dimensional code of the virtual scene panorama built from the set sampling points; the user scans the two-dimensional code with a mobile phone and jumps to the virtual application scene display page on the mobile terminal; on the mobile phone, the user can enable the gyroscope, switch to VR split-screen mode and set the phone parameters, and can then use VR glasses to experience the virtual underground mining operation environment scene, realizing a 720-degree view as well as a multi-scene, multi-angle roaming experience on the mobile terminal.

Preferably, in step 8, the speech recognition is implemented on the basis of the Pocket-sphinx library; with an improved Chinese keyword dictionary, the recognition function is realized through preprocessing, feature extraction, acoustic model training, language model training, and speech decoding and search, and finally control functions written in the UE4 engine realize voice control of the model in the three-dimensional space; the specific implementation steps of the speech recognition are as follows:
步骤8.1:预处理Step 8.1: Pretreatment
对输入的原始语音信号进行处理,滤除掉其中的不重要的信息以及背景噪声,并对语音信号的端点检测、语音分帧和预加重进行处理;Processing the input original speech signal, filtering out unimportant information and background noise, and processing end point detection, speech framing and pre-emphasis of the speech signal;
通过一阶FIR高通数字滤波器来实现预加重,一阶FIR高通数字滤波器的传递函数为:Pre-emphasis is achieved by a first-order FIR high-pass digital filter. The transfer function of the first-order FIR high-pass digital filter is:
H(z) = 1 - a z^{-1};
其中,a为预加重滤波器的系数,取值范围为0.9~1.0,若设n时刻的语音采样值为x(n),则预加重后的信号为Where a is the coefficient of the pre-emphasis filter, and the value ranges from 0.9 to 1.0. If the speech sample value at time n is x(n), the pre-emphasized signal is
y(n) = x(n) - a * x(n-1);
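A minimal C++ sketch of this pre-emphasis step is given below; the function name, the use of std::vector and the default coefficient are illustrative assumptions rather than part of the original system:

    #include <vector>

    // Apply the first-order FIR pre-emphasis y(n) = x(n) - a * x(n-1) to a frame of samples.
    // 'a' is the pre-emphasis coefficient (typically 0.9 - 1.0, e.g. 0.97).
    std::vector<float> PreEmphasize(const std::vector<float>& x, float a = 0.97f)
    {
        std::vector<float> y(x.size());
        if (x.empty()) return y;
        y[0] = x[0];                          // no previous sample exists for n = 0
        for (std::size_t n = 1; n < x.size(); ++n)
            y[n] = x[n] - a * x[n - 1];
        return y;
    }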
步骤8.2:特征提取Step 8.2: Feature Extraction
通过梅尔频率倒谱系数(MFCC)的方法来进行特征提取;具体按照如下步骤进行:Feature extraction is performed by the method of Mel Frequency Cepstral Coefficient (MFCC); the following steps are performed as follows:
步骤8.2.1:利用人听觉的临界带效应,采用MEL倒谱分析技术对语音信号处理得到MEL倒谱系数矢量序列;Step 8.2.1: using the critical band effect of human hearing, using MEL cepstrum analysis technique to obtain a MEL cepstral coefficient vector sequence for speech signal processing;
步骤8.2.2:用MEL倒谱系数矢量序列表示输入语音的频谱,在语音频谱范围内设置若干个具有三角形或正弦形滤波特性的带通滤波器;Step 8.2.2: use the MEL cepstral coefficient vector sequence to represent the spectrum of the input speech, and set a number of bandpass filters with triangular or sinusoidal filtering characteristics in the speech spectrum range;
步骤8.2.3:通过带通滤波器组,求各个带通滤波器的输出数据;Step 8.2.3: Find the output data of each band pass filter through the band pass filter bank;
步骤8.2.4:对各个带通滤波器的输出数据取对数,并做离散余弦变换(DCT);Step 8.2.4: Logarithmically output data of each band pass filter and perform discrete cosine transform (DCT);
步骤8.2.5:得到MFCC系数;求解公式如下:Step 8.2.5: Obtain the MFCC coefficient; the solution formula is as follows:
C_i = \sqrt{\frac{2}{P}} \sum_{k=1}^{P} \log F(k) \cos\left(\frac{\pi i (2k-1)}{2P}\right)

where C_i is the i-th feature parameter, k indexes the triangular filters, F(k) is the output of the k-th filter, P is the filter order (the number of filters), and i runs over the data length;
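To make the final MFCC stages concrete, the following C++ sketch (an illustration only) takes the outputs F(k) of the P triangular filters, applies the logarithm and the discrete cosine transform, and returns the cepstral coefficients; framing, the FFT and the construction of the filter bank are omitted, and the function and parameter names are assumptions:

    #include <cmath>
    #include <vector>

    // Given the outputs F(k) of P triangular band-pass filters for one frame,
    // compute the first 'numCoeffs' MFCCs via a log followed by a DCT.
    std::vector<double> MfccFromFilterOutputs(const std::vector<double>& F, int numCoeffs)
    {
        const double kPi = 3.14159265358979323846;
        const int P = static_cast<int>(F.size());     // filter order / number of filters
        std::vector<double> C(numCoeffs, 0.0);
        for (int i = 0; i < numCoeffs; ++i)
        {
            double sum = 0.0;
            for (int k = 1; k <= P; ++k)
                sum += std::log(F[k - 1]) * std::cos(kPi * i * (2.0 * k - 1.0) / (2.0 * P));
            C[i] = std::sqrt(2.0 / P) * sum;
        }
        return C;
    }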
步骤8.3:声学模型训练Step 8.3: Acoustic Model Training
根据训练语音库的特征参数训练出声学模型参数;The acoustic model parameters are trained according to the characteristic parameters of the training speech library;
在识别时可以将待识别的语音的特征参数同声学模型进行匹配,得到识别结果;本文采用混合高斯模型-隐马尔科夫模型(GMM-HMM)作为声学模型,具体包括如下步骤:In the identification, the characteristic parameters of the speech to be recognized can be matched with the acoustic model to obtain the recognition result. In this paper, the mixed Gaussian model-Hidden Markov Model (GMM-HMM) is used as the acoustic model, which includes the following steps:
步骤8.3.1:求出混合高斯模型的联合概率密度函数:Step 8.3.1: Find the joint probability density function of the mixed Gaussian model:
p(x \mid \Theta) = \sum_{m=1}^{M} C_m \, \mathcal{N}(x; u_m, \Sigma_m) = \sum_{m=1}^{M} \frac{C_m}{(2\pi)^{D/2} |\Sigma_m|^{1/2}} \exp\left(-\tfrac{1}{2}(x-u_m)^{\mathrm T}\Sigma_m^{-1}(x-u_m)\right)

where M is the number of Gaussians in the mixture, C_m is the weight of the m-th component, u_m its mean, \Sigma_m its covariance matrix, and D the dimension of the observation vector; the parameter set \Theta = \{C_m, u_m, \Sigma_m\} of the Gaussian mixture model is estimated with the expectation-maximization (EM) algorithm using the following formulas:

C_m^{(j+1)} = \frac{1}{N}\sum_{t=1}^{N} h_m(t)

u_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\, x^{(t)}}{\sum_{t=1}^{N} h_m(t)}

\Sigma_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\,(x^{(t)}-u_m^{(j)})(x^{(t)}-u_m^{(j)})^{\mathrm T}}{\sum_{t=1}^{N} h_m(t)}

where j is the current iteration round, N is the number of elements in the training data set, x^{(t)} is the feature vector at time t, and h_m(t) is the posterior probability of component m at time t; estimating the GMM parameters with the EM algorithm maximizes the probability of generating the observed speech features on the training data;
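The sketch below illustrates one EM iteration for such a mixture model; to stay short it assumes diagonal covariance matrices and plain C++ containers, which is a simplification of the full-covariance formulas above and not the original implementation:

    #include <cmath>
    #include <vector>

    // One EM iteration for a GMM with diagonal covariances. data[t] is the feature vector x^(t).
    struct DiagGmm
    {
        std::vector<double> C;                      // component weights, size M
        std::vector<std::vector<double>> u, var;    // means and variances, M x D
    };

    static double LogGauss(const std::vector<double>& x,
                           const std::vector<double>& u, const std::vector<double>& var)
    {
        double lp = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d)
            lp += -0.5 * (std::log(2.0 * 3.141592653589793 * var[d])
                          + (x[d] - u[d]) * (x[d] - u[d]) / var[d]);
        return lp;
    }

    void EmIteration(DiagGmm& g, const std::vector<std::vector<double>>& data)
    {
        const std::size_t M = g.C.size(), N = data.size(), D = data[0].size();
        std::vector<std::vector<double>> h(N, std::vector<double>(M));   // posteriors h_m(t)
        for (std::size_t t = 0; t < N; ++t)                              // E-step
        {
            double denom = 0.0;
            for (std::size_t m = 0; m < M; ++m)
            {
                h[t][m] = g.C[m] * std::exp(LogGauss(data[t], g.u[m], g.var[m]));
                denom += h[t][m];
            }
            for (std::size_t m = 0; m < M; ++m) h[t][m] /= denom;
        }
        for (std::size_t m = 0; m < M; ++m)                              // M-step
        {
            double occ = 0.0;
            std::vector<double> mean(D, 0.0), variance(D, 0.0);
            for (std::size_t t = 0; t < N; ++t)
            {
                occ += h[t][m];
                for (std::size_t d = 0; d < D; ++d) mean[d] += h[t][m] * data[t][d];
            }
            for (std::size_t d = 0; d < D; ++d) mean[d] /= occ;
            for (std::size_t t = 0; t < N; ++t)
                for (std::size_t d = 0; d < D; ++d)
                    variance[d] += h[t][m] * (data[t][d] - mean[d]) * (data[t][d] - mean[d]);
            for (std::size_t d = 0; d < D; ++d) variance[d] /= occ;
            g.C[m] = occ / N;  g.u[m] = mean;  g.var[m] = variance;
        }
    }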
步骤8.3.2:求解HMM的三个主要组成部分Step 8.3.2: Solve the three main components of the HMM
Let the state sequence be q_1, q_2, ..., q_N and the transition probability matrix be A = [a_ij], i, j ∈ [1, N]; the jump probability between Markov-chain states is a_ij = P(q_t = j | q_{t-1} = i); the initial probability of the Markov chain is π = [π_i], i ∈ [1, N], where π_i = P(q_1 = i); let the observation probability distribution of each state be b_i(o_t) = P(o_t | q_t = i), described by the GMM model; following step 8.3.1, it is given by:

b_i(o_t) = \sum_{m=1}^{M} C_{i,m} \, \mathcal{N}(o_t; u_{i,m}, \Sigma_{i,m})

where N is the number of states, i and j denote states, a_ij is the probability of jumping from state i at time t-1 to state j at time t, o_t is the observation at time t, C_{i,m} is the mixture coefficient (the weight of each Gaussian), u_{i,m} the mean of each Gaussian, and \Sigma_{i,m} the covariance matrix of each Gaussian; the HMM parameters are estimated with the Baum-Welch algorithm, and an acoustic model file is finally generated;
步骤8.4:语言模型训练Step 8.4: Language Model Training
The N-Gram model is used to train the language model; the probability of the i-th word in a sentence depends only on the N-1 words preceding it, i.e. the context of a word is defined as the N-1 words that appear before it:

P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-N+1}, ..., w_{i-1})

Using the conditional probability chain rule, the probability of the sentence S is expanded as:

P(sentence) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_1, w_2) * ... * P(w_n | w_1, w_2, ..., w_{n-1})

where P(w_1) is the probability that w_1 appears in the text, P(w_1, w_2) is the probability that w_1 and w_2 appear consecutively, and P(w_2 | w_1) is the probability of w_2 given that w_1 has appeared; let the probability of recognizing the sentence be P(s), so that P(s) = P(w_1, w_2, ..., w_n) is the probability that the word sequence w_1, w_2, ..., w_n appears consecutively and generates S;

Under the Markov assumption this is simplified to:

P(sentence) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_2) * ... * P(w_n | w_{n-1})

where P(w_i | w_{i-1}) = P(w_{i-1}, w_i) / P(w_{i-1}); both P(w_{i-1}, w_i) and P(w_{i-1}) can be counted from the corpus, so P(sentence) can finally be obtained; the language model stores the probability statistics P(w_{i-1}, w_i), and the whole recognition process is realized by finding the maximum of P(sentence);
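As an illustration of how such bigram statistics can be collected and used, the following toy C++ sketch counts unigrams and bigrams from a whitespace-tokenised corpus and evaluates P(sentence); it is an assumption-laden simplification (no smoothing, no Chinese word segmentation), not the trained model used in the system:

    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Toy bigram language model: P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}).
    struct BigramLm
    {
        std::map<std::string, double> uni;
        std::map<std::pair<std::string, std::string>, double> bi;
        double total = 0.0;

        void Train(const std::vector<std::string>& sentences)
        {
            for (const std::string& s : sentences)
            {
                std::istringstream in(s);
                std::string prev, w;
                while (in >> w)
                {
                    ++uni[w]; ++total;
                    if (!prev.empty()) ++bi[{prev, w}];
                    prev = w;
                }
            }
        }

        double SentenceProb(const std::string& s) const
        {
            std::istringstream in(s);
            std::string prev, w;
            double p = 1.0;
            while (in >> w)
            {
                if (prev.empty())
                    p *= uni.count(w) ? uni.at(w) / total : 0.0;         // P(w_1)
                else
                {
                    auto it = bi.find({prev, w});
                    double joint = (it != bi.end()) ? it->second : 0.0;
                    p *= uni.count(prev) ? joint / uni.at(prev) : 0.0;   // P(w_i | w_{i-1})
                }
                prev = w;
            }
            return p;
        }
    };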
步骤8.5:语音解码和搜索算法Step 8.5: Speech decoding and search algorithm
For the input speech signal, a recognition network is built from the trained acoustic model, the language model and the dictionary mapping file created with the g2p tool; the search algorithm then looks for the best path through this network, i.e. the word string that outputs the speech signal with maximum probability, which determines the text contained in the speech sample; the Viterbi algorithm is used for speech decoding, and the specific process is as follows:

Step 8.5.1: Input the HMM parameters and the observation sequence O = {o_1, o_2, ..., o_T}; at t = 1 the state probabilities are:

\delta_1(i) = \pi_i b_i(o_1)

\psi_1(i) = 0

Step 8.5.2: Recurse for t = 2, 3, ..., T:

\delta_t(i) = \max_{1 \le j \le N} [\delta_{t-1}(j)\, a_{ji}] \, b_i(o_t)

\psi_t(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j)\, a_{ji}]

Step 8.5.3: Terminate the traversal:

P^* = \max_{1 \le i \le N} \delta_T(i)

i_T^* = \arg\max_{1 \le i \le N} \delta_T(i)

Step 8.5.4: Backtrack the optimal path for t = T-1, T-2, ..., 1:

i_t^* = \psi_{t+1}(i_{t+1}^*)

Step 8.5.5: Output the optimal hidden-state path I^* = (i_1^*, i_2^*, ..., i_T^*);

where \delta_t(i) is the joint probability, recursed up to time t, of all nodes on the optimal path ending in state i, \psi_t(i) is the hidden state at time t, T is the time length, P^* is the probability of the optimal path, and i_T^* is the end point of the optimal path.
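A compact C++ sketch of this Viterbi decoding is given below; it works directly with probabilities for clarity (a practical decoder would use log probabilities to avoid underflow), and the function signature is an assumption:

    #include <vector>

    // Viterbi decoding over an HMM with N states and T observations.
    // pi[i] is the initial probability, A[i][j] the transition probability a_ij,
    // and B[t][i] the observation likelihood b_i(o_t) (e.g. from the GMMs above).
    std::vector<int> Viterbi(const std::vector<double>& pi,
                             const std::vector<std::vector<double>>& A,
                             const std::vector<std::vector<double>>& B)
    {
        const int T = static_cast<int>(B.size());
        const int N = static_cast<int>(pi.size());
        std::vector<std::vector<double>> delta(T, std::vector<double>(N, 0.0));
        std::vector<std::vector<int>>    psi(T, std::vector<int>(N, 0));

        for (int i = 0; i < N; ++i)                      // initialisation, t = 1
            delta[0][i] = pi[i] * B[0][i];

        for (int t = 1; t < T; ++t)                      // recursion
            for (int i = 0; i < N; ++i)
            {
                double best = -1.0; int arg = 0;
                for (int j = 0; j < N; ++j)
                {
                    double v = delta[t - 1][j] * A[j][i];
                    if (v > best) { best = v; arg = j; }
                }
                delta[t][i] = best * B[t][i];
                psi[t][i]   = arg;
            }

        double bestP = -1.0; int last = 0;               // termination
        for (int i = 0; i < N; ++i)
            if (delta[T - 1][i] > bestP) { bestP = delta[T - 1][i]; last = i; }

        std::vector<int> path(T);                        // backtracking
        path[T - 1] = last;
        for (int t = T - 2; t >= 0; --t)
            path[t] = psi[t + 1][path[t + 1]];
        return path;                                     // optimal hidden-state path
    }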
优选地,a取0.97。Preferably, a takes 0.97.
优选地,在步骤9中,具体包括如下步骤:Preferably, in step 9, specifically, the following steps are included:
步骤9.1:模型选择Step 9.1: Model Selection
对采煤机模型、掘进机模型、风煤钻模型以及综采支架模型进行选择,每一类模型都是对真实采煤工具的1:1建模模拟;Select the shearer model, the roadheader model, the wind coal drill model and the fully mechanized support model. Each type of model is a 1:1 modeling simulation of the real coal mining tool;
步骤9.2:模型讲解Step 9.2: Model explanation
用户通过选择模型后,再通过此菜单选择需要学习的工具模型选项,系统会播放对应的语音讲解,再次点击按键语音停止;After the user selects the model, and then selects the tool model option to be learned through this menu, the system will play the corresponding voice explanation, and click the button voice again to stop;
步骤9.3:模型演示Step 9.3: Model Demo
将在3DMax建模过程中制作的工具模拟运行动画导入到Unreal Engine引擎中,设置相应的选择菜单,点击便可在AR模式下演示相应采煤工具的运行状态;Import the tool simulation running animation created in the 3DMax modeling process into the Unreal Engine engine, set the corresponding selection menu, and click to demonstrate the running state of the corresponding coal mining tool in AR mode;
步骤9.4:截图生成图标Step 9.4: Screenshot generation icon
In the main menu of the AR mode, a button is added and bound to the camera's screenshot function, and a scrolling menu bar is added on the right side of the menu; when the screenshot function is triggered successfully, the screenshot is converted through a preset dynamic material conversion function and displayed in the scrolling menu bar; during a demonstration, the user clicks the screenshot button and the system generates an icon at the side of the interface;
步骤9.5:旋转Step 9.5: Rotate
将设置的模型实例化为一个Actor,添加Rotation函数,实现模型顺时针旋转;Instantiate the set model as an Actor, add the Rotation function, and implement the model to rotate clockwise;
步骤9.6:功能扩展Step 9.6: Function Extension
添加二级UI,控制Map切换,实现包括地球、土星、水星、含大气层星球以及星系在内的运行演示功能;添加WidgetBlueprint编码实现了知识简介面板的显示或隐藏;设计返回键可以回到AR编辑主模块;Add a secondary UI, control Map switching, and implement running demo functions including Earth, Saturn, Mercury, Atmospheric Planet, and Galaxy; add WidgetBlueprint encoding to display or hide the knowledge panel; design return button to return to AR editing Main module
步骤9.7:动态手势控制模型,真实环境与虚拟模型叠加,手势与模型进行交互控制,具体包括如下步骤:Step 9.7: The dynamic gesture control model, the real environment is superimposed with the virtual model, and the gesture and the model are interactively controlled, including the following steps:
步骤9.7.1:初始化视频捕捉,读取标志文件和摄像相机参数;Step 9.7.1: Initialize video capture, read the logo file and camera camera parameters;
步骤9.7.2:抓取视频帧图像;Step 9.7.2: Grab the video frame image;
步骤9.7.3:执行探测标记以及识别视频帧中的标记模板,并利用OpenCV库函数对获取的视频帧图像进行运动检测,判断是否检测到运动轨迹;Step 9.7.3: performing the detection mark and identifying the mark template in the video frame, and performing motion detection on the acquired video frame image by using the OpenCV library function to determine whether the motion track is detected;
若:判断结果是检测到运动轨迹,则执行步骤9.7.4:If the result of the judgment is that the motion track is detected, proceed to step 9.7.4:
或判断结果是没有检测到运动轨迹,则继续执行探测标记以及识别视频帧中的标记模板,然后执行步骤9.7.12;Or the result of the judgment is that no motion track is detected, then the detection mark is continued and the mark template in the video frame is identified, and then step 9.7.12 is performed;
基于颜色直方图与背景差分进行运动检测,对采集的帧以及对每帧运动检测后得到除运动手势区域外的像素做背景更新,公式如下;Perform motion detection based on the color histogram and the background difference, and perform background update on the acquired frame and the pixels outside the motion gesture area after detecting each frame motion, and the formula is as follows;
u_{t+1} = \begin{cases} (1-a)\,u_t + a\,I_t, & I_f = 0 \ \text{(pixel outside the motion-gesture region)} \\ u_t, & I_f = 1 \ \text{(pixel inside the motion-gesture region)} \end{cases}

where u_t is the corresponding pixel of the background image, u_{t+1} is the updated background-image pixel, I_t is the pixel of the current frame, I_f is the mask value of the current-frame pixel, i.e. whether a background update is performed, and a ∈ [0, 1] is the update speed of the background-image model;
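A possible OpenCV sketch of this background update and differencing step is shown below; the threshold value, function names and the exact masking convention are assumptions:

    #include <opencv2/opencv.hpp>

    // Update the background model everywhere except the detected gesture region,
    // then obtain a foreground (motion) mask by background differencing.
    void UpdateBackgroundAndDetect(const cv::Mat& frame, cv::Mat& background,
                                   const cv::Mat& gestureMask, cv::Mat& motionMask,
                                   double a = 0.8)
    {
        cv::Mat gray, diff;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        gray.convertTo(gray, CV_32F);

        if (background.empty())
        {
            gray.copyTo(background);                    // first frame initialises the model
            motionMask = cv::Mat::zeros(gray.size(), CV_8U);
            return;
        }

        cv::absdiff(gray, background, diff);            // |I_t - u_t|
        diff.convertTo(diff, CV_8U);
        cv::threshold(diff, motionMask, 25, 255, cv::THRESH_BINARY);

        cv::Mat updateMask;                             // update only the non-gesture pixels
        cv::bitwise_not(gestureMask, updateMask);
        cv::accumulateWeighted(gray, background, a, updateMask);  // u_{t+1} = (1-a)u_t + a*I_t
    }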
步骤9.7.4:对图像进行包括去噪在内的预处理;Step 9.7.4: Perform preprocessing including denoising on the image;
通过运动检测步骤,如果检测到有运动信息,则开始对含有运动手势的视频帧图像进行预处理:通过OpenCV的medianBlur函数对图像进行中值滤波,去除椒盐噪声;Through the motion detection step, if motion information is detected, the video frame image containing the motion gesture is preprocessed: median filtering is performed on the image by the medianBlur function of OpenCV to remove the salt and pepper noise;
步骤9.7.5:转换到HSV空间;Step 9.7.5: Convert to HSV space;
The image is converted to the HSV colour space with the cvtColor function, and the brightness component v is then reset to a relatively small value derived from the red and green components r and g of the skin-colour pixels, where r > g, so that stationary skin-coloured regions cause less interference;
步骤9.7.6:分割手区域;Step 9.7.6: Split the hand area;
步骤9.7.7:进行形态学处理,去除杂点;Step 9.7.7: Perform morphological processing to remove impurities;
将得到的运动二值图和通过反投影得到的二值图相与,并进行图像形态学闭操作得到比较完整的运动肤色手势二值图;并去除图像中的杂点;Combining the obtained motion binary image with the binary image obtained by back projection, and performing image morphology closing operation to obtain a relatively complete motion skin color gesture binary image; and removing the noise points in the image;
步骤9.7.8:获取手轮廓;Step 9.7.8: Obtain the hand contour;
通过初步的形态学操作,去除噪声,并使手的边界更加清晰后,通过OpenCV的findContours函数得到手势轮廓,然后进行去除伪轮廓操作;After preliminary morphological operations, removing noise and making the boundary of the hand clearer, the gesture contour is obtained by OpenCV's findContours function, and then the pseudo contour is removed.
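The following OpenCV sketch illustrates the morphological clean-up, contour extraction and pseudo-contour removal described above; it assumes the input binary image is of type CV_8U, and the kernel size and area threshold are assumed values:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // From the skin/motion binary image, clean it up morphologically, extract contours
    // with findContours, and discard small pseudo-contours, keeping the largest as the hand.
    std::vector<cv::Point> ExtractHandContour(const cv::Mat& binary, double minArea = 2000.0)
    {
        cv::Mat cleaned;
        cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
        cv::morphologyEx(binary, cleaned, cv::MORPH_CLOSE, kernel);   // close small holes
        cv::medianBlur(cleaned, cleaned, 5);                          // remove speckle noise

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(cleaned, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Point> hand;
        double bestArea = minArea;                                    // area threshold rejects pseudo-contours
        for (const auto& c : contours)
        {
            double area = cv::contourArea(c);
            if (area > bestArea) { bestArea = area; hand = c; }
        }
        return hand;                                                  // empty if no hand-sized contour found
    }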
步骤9.7.9:画出手轮廓,标定信息;Step 9.7.9: Draw the outline of the hand and calibrate the information;
步骤9.7.10:轮廓信息比较,设置方向向量;Step 9.7.10: Comparison of contour information, setting direction vector;
将每一帧得到的轮廓进行比较,设定比较条件,通过比较给方向标志变量赋值;Compare the contours obtained in each frame, set the comparison conditions, and assign values to the direction marker variables by comparison;
步骤9.7.11:对模型根据矢量坐标进行受力模拟,实现动态手势与虚拟模型的交互;Step 9.7.11: Perform force simulation on the model according to vector coordinates to realize interaction between dynamic gestures and virtual models;
After the dynamic gesture has been classified from the contours, a force-simulation operation is applied to the virtual model according to the different judgement results; based on the value of the direction flag obtained during contour judgement, the model's coordinates in three-dimensional space are scaled along the x, y and z axes, and the resulting change of coordinate values moves the model so that the applied force is simulated;
步骤9.7.12:计算摄像头相对于探测到的标记的转换矩阵;Step 9.7.12: Calculate a conversion matrix of the camera relative to the detected mark;
步骤9.7.13:在探测到的标记上叠加虚拟物体,并返回执行步骤9.7.2,实现真实环境与虚拟模型的叠加显示。Step 9.7.13: Superimpose the virtual object on the detected mark and return to step 9.7.2 to realize the superimposed display of the real environment and the virtual model.
本发明所带来的有益技术效果:The beneficial technical effects brought by the invention:
(1)本发明的三维模型采用等比例建立,材质贴图通过UE4引擎平台的编辑贴近真实,应用场景的环境灯光采用真实灯光模拟烘焙渲染。整个虚拟现实场景都更真实,沉浸感极强。(1) The three-dimensional model of the present invention is established in equal proportions, and the texture map is closely related to the reality through the editing of the UE4 engine platform, and the ambient lighting of the application scene is simulated by the real lighting simulation. The entire virtual reality scene is more realistic and immersive.
(2)本发明通过技术方案实现了多种功能交互,例如在虚拟井下开采场景漫游的过程中通过隐藏菜单更换工具模型,更换矿山材质来模拟不同的开采地质,自由移动开采工具的位置,以及视频信息嵌入机器显示屏展现真实场景,利用语音功能实现控制采煤机的正转、反转、升臂、降臂、停止等。(2) The present invention realizes multiple functional interactions through technical solutions, such as changing tool models through hidden menus during the process of virtual underground mining scene roaming, replacing mining materials to simulate different mining geology, freely moving the location of mining tools, and The video information is embedded in the machine display to show the real scene, and the voice function is used to control the forward, reverse, lift, lowering, and stopping of the shearer.
(3)本发明还通过生成二维码功能,将PC端展示连接到手机端展示,手机端功能更是可以利用手机内置陀螺仪,产生重力感应,如果在设置成VR眼镜模式便可利用简单的VR眼镜体验实时的场景沉浸感。(3) The invention also generates a two-dimensional code function, and connects the PC end display to the mobile phone end display, and the mobile phone end function can utilize the built-in gyroscope of the mobile phone to generate gravity sensing, and can be utilized simply when set to the VR glasses mode. VR glasses experience real-time scene immersion.
(4)本发明还利用AR开发SDK—ARToolKit实现了AR动态演示功能,通过AR编辑演示功能,用户可是实时选择采矿工具模型,进行360旋转展示、语音讲解以及动态运行展示,截图保存等,更重要的是其将工具模型以AR的模式展示,虚拟模型与真实环境结合的展示效果,这不仅能展现模型的直观立体性,更能展现其真实性,使其具有更好的学习、教 育的效果。(4) The invention also realizes the AR dynamic demonstration function by using the AR development SDK-ARToolKit. Through the AR editing demonstration function, the user can select the mining tool model in real time, perform 360 rotation display, voice explanation and dynamic running display, save the screenshot, etc. What is important is that it displays the tool model in the AR mode, and the virtual model is combined with the real environment. This not only shows the intuitive stereoscopicity of the model, but also shows its authenticity, making it better for learning and education. effect.
(5)本发明的AR模块,除了其动态演示功能,更是对视频流添加了处理,当动态手势进入摄像头视角,它会产生与模型的交互,手从远到近的动态会传递给模型一个三维空间向前的一个模拟力,从上到下的动态会给模型一个向上的一个模拟力,向前翻转手的动态会给模型一个向下的模拟了,同样,如果手扭动或者左右倾斜,便会给模型一个具有矢量方向的模拟力。(5) The AR module of the present invention, in addition to its dynamic presentation function, adds processing to the video stream. When the dynamic gesture enters the camera perspective, it generates interaction with the model, and the dynamics of the hand from the far to the near are transmitted to the model. A three-dimensional space forwards an analog force. The top-to-bottom dynamics give the model an upward simulated force. The forward flipping of the hand's dynamics gives the model a downward simulation. Similarly, if the hand is twisted or left or right Tilting gives the model a simulated force with vector direction.
(6)本发明在AR模块除了在煤矿应用场景的功能实现,还扩展了AR在天文学领域的展示功能。添加地球、土星、水星、含有动态大气层的星球以及星系的AR展示功能,与此同时将知识简介面板显示功能添加到此AR展示的模块中,丰富了AR在教育展示领域的应用。(6) The present invention expands the display function of AR in the field of astronomy in addition to the function realization of the AR module in the coal mine application scenario. The addition of Earth, Saturn, Mercury, the planet with a dynamic atmosphere, and the AR display of the galaxy, while adding the knowledge panel display function to the module of the AR display, enriches the application of AR in the field of educational display.
附图说明DRAWINGS
图1是本发明实现的整体功能结构图。1 is a diagram showing the overall functional structure of an implementation of the present invention.
图2是本发明生成二维码功能的示意图。2 is a schematic diagram of the function of generating a two-dimensional code of the present invention.
图3是本发明语音识别实现交互功能的原理图。3 is a schematic diagram of the interactive function of the speech recognition of the present invention.
图4是本发明AR模式实现的原理图。4 is a schematic diagram of an AR mode implementation of the present invention.
图5是本发明动态手势交互功能实现的流程图。Figure 5 is a flow chart showing the implementation of the dynamic gesture interaction function of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
下面结合附图以及具体实施方式对本发明作进一步详细说明:The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
本发明提供一种基于虚拟现实与增强现实的采矿操作多交互实现方法。结合附图1可以了解本发明所包含的整个技术功能。其具体实施步骤如下:The invention provides a multi-interaction realization method for mining operation based on virtual reality and augmented reality. The entire technical function encompassed by the present invention can be understood in conjunction with FIG. The specific implementation steps are as follows:
步骤1:井下矿山开采作业的整个环境场景搭建。利用3DMax建模工具根据真实采矿操作环境创建相关模型。将模型分类导入UE4引擎,通过UE4平台,对模型进行材质编写、模拟自然灯光、环境光,添加物理碰撞检测,对系统进行参数调整,烘焙渲染。Step 1: The entire environmental scene of the underground mining operation is built. Create models based on real mining operations using 3DMax modeling tools. The model classification is imported into the UE4 engine. Through the UE4 platform, the model is written, the natural light and the ambient light are simulated, the physical collision detection is added, the parameters of the system are adjusted, and the baking is rendered.
步骤2:在虚拟应用场景添加第一人称角色,给角色添加鼠标键盘控制事件。将键盘的上下左右键绑定Up、Down、Right、Left函数,控制第一人称角色在虚拟三维空间的坐标改变,实现漫游。给鼠标添加Turnaround函数,控制第一人称视角在虚拟三维空间的720度旋转。Step 2: Add a first person role in the virtual application scenario and add a mouse and keyboard control event to the character. Bind the up, down, left and right keys of the keyboard to the Up, Down, Right, and Left functions to control the coordinates of the first person character in the virtual three-dimensional space to achieve roaming. Add a Turnaround function to the mouse to control the 720-degree rotation of the first person perspective in the virtual three-dimensional space.
Step 3: Set up the interaction menu to implement functional interactions such as replacing the tool models of the underground mining operation and the mine geological materials. First a Widget Blueprint user interface is created, menu options are set, and click events are added to the options. A Box collision detection area is then added to the model: when the character enters the Box collision area, the created Widget Blueprint user interface is displayed, and when the character leaves the area, it is hidden. The shearer model is instantiated as an Actor and a SetMesh function is added so that other tool models can be swapped in; likewise, a SetMaterial function is added to the mine geological model in the three-dimensional space so that its material can be replaced. The invention provides four types of mining tool models for the user to choose from and makes the mine geology selectable by material, so that models and materials are replaced through the displayed style menu. After the replacement is finished and the user leaves the detection area, the menu is hidden automatically, which does not disturb the overall roaming view while still providing real-time interaction.
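The following UE4 C++ sketch approximates this Blueprint logic for illustration only; ASwappableToolActor is an assumed AActor subclass whose components MeshComp (UStaticMeshComponent*), MenuTrigger (UBoxComponent*) and MenuWidget (UWidgetComponent*) are created in its constructor (omitted here), and all names other than engine APIs are assumptions:

    // Schematic C++ equivalent of the Blueprint menu / mesh / material swapping.
    void ASwappableToolActor::SetMesh(UStaticMesh* NewMesh)
    {
        MeshComp->SetStaticMesh(NewMesh);                 // swap the tool model
    }

    void ASwappableToolActor::SetMaterial(UMaterialInterface* NewMaterial)
    {
        MeshComp->SetMaterial(0, NewMaterial);            // swap the mine geology material
    }

    void ASwappableToolActor::BeginPlay()
    {
        Super::BeginPlay();
        MenuTrigger->OnComponentBeginOverlap.AddDynamic(this, &ASwappableToolActor::OnEnterMenuArea);
        MenuTrigger->OnComponentEndOverlap.AddDynamic(this, &ASwappableToolActor::OnLeaveMenuArea);
        MenuWidget->SetVisibility(false);                 // hidden until the character walks in
    }

    void ASwappableToolActor::OnEnterMenuArea(UPrimitiveComponent*, AActor*, UPrimitiveComponent*,
                                              int32, bool, const FHitResult&)
    {
        MenuWidget->SetVisibility(true);                  // show the Widget Blueprint menu
    }

    void ASwappableToolActor::OnLeaveMenuArea(UPrimitiveComponent*, AActor*, UPrimitiveComponent*, int32)
    {
        MenuWidget->SetVisibility(false);                 // hide it again when leaving the Box collision
    }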
步骤4:视频嵌入,在三维空间播放,模拟矿山开采环境的监控显示设备。本发明设置键盘X键绑定UE4平台的MediaPlayer媒体类,通过Open-Source和Close函数实现控制视频流的播放和停止。此操作可以模拟井下矿山开采控制设备的屏幕显示,以及实时环境监控的画面显示,凸显三维场景的真实性与动态性,使模拟的虚拟场景更加贴近现实。Step 4: Video embedding, playing in three-dimensional space, simulating the monitoring display equipment of the mining environment. The invention sets the keyboard X key binding to the MediaPlayer media class of the UE4 platform, and controls the playing and stopping of the video stream through the Open-Source and Close functions. This operation can simulate the screen display of the underground mining control equipment and the real-time environmental monitoring screen display, highlighting the authenticity and dynamics of the three-dimensional scene, making the simulated virtual scene closer to reality.
步骤5:选择模型可以拖动到任意用户想要放置的位置,以及实现设备自动开启的意向交互功能。为要操作的模型添加鼠标事件,通过GetHitResult函数将模型选中,然后根据鼠标在三维空间的坐标,改变模型的SetActorLocation函数的坐标值。当鼠标再次点击,将此时鼠标x、y、z三个方向的坐标值赋给模型,此时GetHitResult函数将模型设置为取消选中模式。本实施例用户可以点击场景中的采煤机模型,将其放到采矿操作场景的其他开采位置。Step 5: Select the model to drag to any location that the user wants to place, and to implement the intent interaction function that the device automatically opens. Add a mouse event to the model to be operated, select the model by the GetHitResult function, and then change the coordinate value of the model's SetActorLocation function according to the coordinates of the mouse in the three-dimensional space. When the mouse clicks again, the coordinate values of the three directions of the mouse x, y, and z are assigned to the model at this time, and the GetHitResult function sets the model to the unselected mode. In this embodiment, the user can click on the shearer model in the scene and place it in other mining locations of the mining operation scenario.
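The click-to-select and click-to-drop behaviour can be sketched in UE4 C++ as follows; SelectedActor is an assumed AActor* member of a custom player controller, and a full implementation would ignore hits on the selected actor itself while dragging:

    void AMinePlayerController::OnClick()
    {
        FHitResult Hit;
        if (!SelectedActor)
        {
            if (GetHitResultUnderCursor(ECC_Visibility, false, Hit) && Hit.GetActor())
                SelectedActor = Hit.GetActor();                 // first click: select the model
        }
        else
        {
            if (GetHitResultUnderCursor(ECC_Visibility, false, Hit))
                SelectedActor->SetActorLocation(Hit.Location);  // second click: assign x, y, z and drop
            SelectedActor = nullptr;                            // back to unselected mode
        }
    }

    void AMinePlayerController::PlayerTick(float DeltaTime)
    {
        Super::PlayerTick(DeltaTime);
        FHitResult Hit;
        if (SelectedActor && GetHitResultUnderCursor(ECC_Visibility, false, Hit))
            SelectedActor->SetActorLocation(Hit.Location);      // follow the cursor while selected
    }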
系统在特定区域添加TriggerBox触发器,第一人称角色进入此区域,触发TriggerBox触 发器,相应下一个区域的环境灯控制函数SetVisible触发,灯被打开,从而实现了本发明设置的自动感应灯功能。这也是本发明设计的检测人意向的功能,从而实现更自然的系统交互。The system adds a TriggerBox trigger to a specific area. The first person character enters this area, triggers the TriggerBox trigger, and the corresponding area's ambient light control function SetVisible triggers, and the light is turned on, thereby realizing the automatic induction lamp function set by the present invention. This is also the function of the present invention designed to detect human intent, thereby enabling more natural system interaction.
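An illustrative sketch of this intent-driven lighting is given below, assuming a TriggerBox-derived actor with an assumed RegionLight member (ALight*) pointing at the region's light actor; SetVisible in the Blueprint corresponds to SetVisibility here:

    void AMineRegionTrigger::BeginPlay()
    {
        Super::BeginPlay();
        OnActorBeginOverlap.AddDynamic(this, &AMineRegionTrigger::HandleEnter);
    }

    void AMineRegionTrigger::HandleEnter(AActor* OverlappedActor, AActor* OtherActor)
    {
        // When the first-person character enters the region, switch its lamp on automatically.
        if (OtherActor && OtherActor->ActorHasTag(TEXT("Player")) && RegionLight)
            RegionLight->GetLightComponent()->SetVisibility(true);
    }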
步骤6:二维码生成功能。单一的PC端展示不能满足多用户的体验,本发明通过添加二维码生成,扫描二维码便可实现多用户手机端的展示,通过二维码连接,手机跳转到煤矿开采作业的全景展示页面。在手机端,用户可以启用陀螺仪,切换到VR分屏模式,设置好手机参数,便可用VR眼镜体验虚拟井下煤矿开采环境,实现720度的视角展示。与此同时,可以实现手机端的多场景、多角度的漫游体验。本功能主要是通过绑定键盘的F、V键,添加二维码生成与隐藏函数。在UE4引擎中添加场景6个Point采集点,通过采集点位置生成全景图,再将信息和相关手机端设置以生成二维码形式生成网络连接,实现端与端的转换。此功能实现的流程如图2所示。Step 6: Two-dimensional code generation function. A single PC-side display cannot satisfy the multi-user experience. The present invention can generate a multi-user mobile phone terminal by adding a two-dimensional code and scanning the two-dimensional code, and the mobile phone jumps to the panoramic display of the coal mining operation through the two-dimensional code connection. page. On the mobile phone side, the user can enable the gyroscope, switch to the VR split screen mode, set the mobile phone parameters, and then use the VR glasses to experience the virtual underground coal mining environment to achieve a 720-degree viewing angle display. At the same time, a multi-scene, multi-angle roaming experience on the mobile terminal can be realized. This function mainly adds the QR code generation and hiding function by binding the F and V keys of the keyboard. Add 6 point collection points in the UE4 engine, generate a panorama by collecting the position of the point, and then generate information and related mobile phone settings to generate a network connection in the form of a two-dimensional code to realize end-to-end conversion. The process implemented by this function is shown in Figure 2.
步骤7:实现语音控制功能。本发明利用Pocket-sphinx实现中文的关键字识别。具体的语音控制实现原理流程如图3所示,本发明在以采煤机模型创建的Actor上添加语音识别函数,通过在系统初始化后启用语音识别类,并保存对此类的引用。之后创建并绑定一个方法到语音识别函数OnWordSpoken,每当用户说出设置好的控制词语时,便会触发此方法,通过关键词匹配实现采煤机的正转、反转、升臂、降臂以及停止等相关控制。本方法实现的语音识别是基于美国卡内基梅隆大学开发的英语语音识别系统Sphinx改进而实现的。本发明的语音识别方法是大量词汇、非特定人、连续中文音节的孤立词识别方法。能够很好的识别不同人发出的设定词汇。最终通过UE4的编码技术,实现了语音词汇识别后与匹配词对应动作控制函数的触发,实现模型的相应动作控制。此识别体系包括语音预处理、特征提取、声学模型训练、语言模型训练和语音解码五个部分。以下是语音识别的具体流程:Step 7: Implement voice control. The invention realizes Chinese keyword recognition by using Pocket-sphinx. The specific voice control implementation principle flow is shown in FIG. 3. The present invention adds a speech recognition function to an actor created by a shearer model, by enabling a speech recognition class after system initialization, and saving a reference to such a class. Then create and bind a method to the speech recognition function OnWordSpoken. Whenever the user says the set control words, this method will be triggered to realize the forward, reverse, lift, and descend of the shearer through keyword matching. Related controls such as arm and stop. The speech recognition realized by this method is realized based on the improved speech recognition system Sphinx developed by Carnegie Mellon University. The speech recognition method of the present invention is an isolated word recognition method for a large number of vocabulary, non-specific people, and continuous Chinese syllables. It is a good way to identify the set words from different people. Finally, through the coding technology of UE4, the triggering of the action control function corresponding to the matching words after the speech vocabulary recognition is realized, and the corresponding action control of the model is realized. This recognition system includes five parts: speech preprocessing, feature extraction, acoustic model training, language model training and speech decoding. The following is the specific process of speech recognition:
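The mapping from recognised keywords to shearer actions can be pictured with the following C++ sketch; it deliberately does not show the Pocket-sphinx or UE4 APIs, the English keys stand in for the Chinese control words (forward rotation, reverse rotation, raise arm, lower arm, stop), and the action bodies are placeholders:

    #include <functional>
    #include <map>
    #include <string>

    // Dispatch a recognised keyword to the corresponding shearer action,
    // mirroring the idea behind the OnWordSpoken callback.
    class ShearerVoiceControl
    {
    public:
        ShearerVoiceControl()
        {
            actions_["forward"] = [] { /* start drum rotating forward  */ };
            actions_["reverse"] = [] { /* start drum rotating backward */ };
            actions_["raise"]   = [] { /* raise the ranging arm        */ };
            actions_["lower"]   = [] { /* lower the ranging arm        */ };
            actions_["stop"]    = [] { /* stop all movement            */ };
        }

        // Called whenever the recogniser reports a keyword.
        bool OnWordSpoken(const std::string& keyword)
        {
            auto it = actions_.find(keyword);
            if (it == actions_.end()) return false;   // not one of the control words
            it->second();
            return true;
        }

    private:
        std::map<std::string, std::function<void()>> actions_;
    };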
步骤7.1:预处理。Step 7.1: Pretreatment.
对输入的原始语音信号进行处理,滤除掉其中的不重要的信息以及背景噪声,并进行语音信号的端点检测、语音分帧、预加重等处理。语音信号的预加重,目的是为了对语音的高频部分进行加重,去除口唇辐射的影响,增加语音的高频分辨率。一般通过传递函数为H(z)=1-az -1一阶FIR高通数字滤波器来实现预加重,a为预加重滤波器的系数,取值范围一般在0.9~1.0,本文取0.97。设n时刻的语音采样值为x(n),预加重后的信号为 The input original speech signal is processed, the unimportant information and background noise are filtered out, and the end point detection, speech framing, pre-emphasis and the like of the speech signal are performed. The pre-emphasis of the speech signal is intended to emphasize the high-frequency part of the speech, remove the influence of the lip radiation, and increase the high-frequency resolution of the speech. Pre-emphasis is generally realized by a transfer function of H(z)=1-az -1 first-order FIR high-pass digital filter, a is the coefficient of the pre-emphasis filter, and the value range is generally 0.9-1.0, which is 0.97. Let the value of the speech sample at time n be x(n), and the signal after pre-emphasis is
y(n)=x(n)-a*x(n-1)y(n)=x(n)-a*x(n-1)
步骤7.2:特征提取。Step 7.2: Feature extraction.
本文使用梅尔频率倒谱系数(MFCC)的方法来提取。MFCC参数是基于人的听觉特性的,他利用人听觉的临界带效应,采用MEL倒谱分析技术对语音信号处理得到MEL倒谱系数矢量序列,用MEL倒谱系数表示输入语音的频谱。在语音频谱范围内设置若干个具有三角形或正弦形滤波特性的带通滤波器,然后将语音能量谱通过该滤波器组,求各个滤波器输出,对其取对数,并做离散余弦变换(DCT),即可得到MFCC系数。求解公式如下:This paper uses the method of Mel Frequency Cepstral Coefficient (MFCC) to extract. The MFCC parameter is based on the human auditory characteristics. He uses the critical band effect of human hearing, uses the MEL cepstrum analysis technique to process the speech signal to obtain the MEL cepstral coefficient vector sequence, and uses the MEL cepstral coefficient to represent the spectrum of the input speech. Set a number of bandpass filters with triangular or sinusoidal filtering characteristics in the speech spectrum range, then pass the speech energy spectrum through the filter bank, find the output of each filter, take the logarithm of it, and do the discrete cosine transform ( DCT), you can get the MFCC coefficient. The solution formula is as follows:
C_i = \sqrt{\frac{2}{P}} \sum_{k=1}^{P} \log F(k) \cos\left(\frac{\pi i (2k-1)}{2P}\right)

where C_i is the i-th feature parameter, k indexes the triangular filters, F(k) is the output of the k-th filter, P is the filter order, and i runs over the data length.
步骤7.3:声学模型训练。Step 7.3: Acoustic model training.
根据训练语音库的特征参数训练出声学模型参数。在识别时可以将待识别的语音的特征参数同声学模型进行匹配,得到识别结果。本文采用混合高斯模型-隐马尔科夫模型(GMM-HMM)作为声学模型。The acoustic model parameters are trained according to the characteristic parameters of the training speech library. At the time of identification, the feature parameters of the speech to be recognized can be matched with the acoustic model to obtain a recognition result. In this paper, a mixed Gaussian model-Hidden Markov Model (GMM-HMM) is used as the acoustic model.
步骤7.3.1:求出混合高斯模型的联合概率密度函数:Step 7.3.1: Find the joint probability density function of the mixed Gaussian model:
p(x \mid \Theta) = \sum_{m=1}^{M} C_m \, \mathcal{N}(x; u_m, \Sigma_m)

\mathcal{N}(x; u_m, \Sigma_m) = \frac{1}{(2\pi)^{D/2} |\Sigma_m|^{1/2}} \exp\left(-\tfrac{1}{2}(x-u_m)^{\mathrm T}\Sigma_m^{-1}(x-u_m)\right)

where M is the number of Gaussians in the mixture, C_m is the weight, u_m the mean, \Sigma_m the covariance matrix, and D the dimension of the observation vector. The parameter set \Theta = \{C_m, u_m, \Sigma_m\} of the mixed Gaussian model is estimated with the expectation-maximization (EM) algorithm using the following formulas:

C_m^{(j+1)} = \frac{1}{N}\sum_{t=1}^{N} h_m(t)

u_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\, x^{(t)}}{\sum_{t=1}^{N} h_m(t)}

\Sigma_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\,(x^{(t)}-u_m^{(j)})(x^{(t)}-u_m^{(j)})^{\mathrm T}}{\sum_{t=1}^{N} h_m(t)}

where j is the current iteration round, N is the number of elements in the training data set, x^{(t)} is the feature vector at time t, and h_m(t) is the posterior probability of component m at time t. Estimating the GMM parameters with the EM algorithm maximizes the probability of generating the observed speech features on the training data.
步骤7.3.2:求解HMM三个主要组成部分。Step 7.3.2: Solve the three main components of the HMM.
设状态序列为q 1,q 2,…,q N,令转移概率矩阵A=[a ij]i,j∈[1,N],则求出的马尔科夫链状态间的跳转概率为:a ij=P(q t=j|q t-1=i);马尔科夫链的初始概率π=[π i]i∈[i,N],其中,π i=P(q 1=i);令每个状态的观察概率分布b i(o t)=P(o t|q t=i),采用GMM模型来描述状态的观察概率分布;根据步骤7.3.1,求解公式为: Let the state sequence be q 1 , q 2 ,...,q N , and let the transition probability matrix A=[a ij ]i,j∈[1,N], then the jump probability between the Markov chain states is :a ij =P(q t =j|q t-1 =i); the initial probability of Markov chain π=[π i ]i∈[i,N], where π i =P(q 1 = i); Let the observed probability distribution b i (o t )=P(o t |q t =i) of each state, use the GMM model to describe the observed probability distribution of the state; according to step 7.3.1, the formula is:
b_i(o_t) = \sum_{m=1}^{M} C_{i,m} \, \mathcal{N}(o_t; u_{i,m}, \Sigma_{i,m})

where N is the number of states, i and j denote states, a_ij is the probability of jumping from state i at time t-1 to state j at time t, o_t is the observation at time t, C_{i,m} is the mixture coefficient (the weight of each Gaussian), u_{i,m} the mean of each Gaussian, and \Sigma_{i,m} the covariance matrix of each Gaussian; the HMM parameters are estimated with the Baum-Welch algorithm, and an acoustic model file is finally generated;
步骤7.4:语言模型训练。Step 7.4: Language Model Training.
语言模型是用来约束单词搜索,语言建模能够有效的结合汉语语法和语义的知识,描述词之间的内在关系,从而提高识别率,减少搜索范围。本文采用N-Gram模型实现语言模型的训练。在一个语句中第i个词出现的概率,条件依赖于它前面的N-1个词,即将一个词的上下文定义为该词前面出现的N-1个词,其表达公式为:The language model is used to constrain word search. Language modeling can effectively combine the knowledge of Chinese grammar and semantics, and describe the intrinsic relationship between words, thereby improving the recognition rate and reducing the search range. This paper uses the N-Gram model to implement the training of the language model. The probability that the i-th word appears in a statement depends on the N-1 words in front of it. The context of a word is defined as the N-1 words appearing in front of the word. The expression is:
P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-N+1}, ..., w_{i-1})
本文取N=2和N=3,也就是通过前一个或两个单词来判定当前单词出现的概率P(w 2|w 1),P(w 3|w 2,w 1)。 In this paper, N=2 and N=3 are taken, that is, the probability P(w 2 |w 1 ), P(w 3 |w 2 , w 1 ) of the current word appearing is determined by the previous word or two words.
简单的说,语言模型就是统计语料得到的模型,语料是用于训练的文本库,字典文件存放的就是训练用的语料和对应发言。语言模型就是表达的语料的组合概率。如设P(w 1)是w 1在 文章中出现的概率,P(w 1,w 2)是w 1,w 2连续出现是概率,P(w 2|w 1)是已知w 1已出现的情况下出现w 2的概率,假设识别sentence的概率用P(s)表示,P(s)=P(w 1,w 2,…,w n)表示单词集w 1,w 2,…,w n连续出现并生成S的概率,使用条件概率公式S把整个公式替换成: To put it simply, the language model is the model obtained from the statistical corpus. The corpus is the text library used for training. The dictionary file stores the corpus for training and the corresponding speech. The language model is the combined probability of the expressed corpus. The set P (w 1) w 1 is the probability of appearing in the article, P (w 1, w 2 ) are w 1, w 2 consecutive probability, P (w 2 | w 1 ) w 1 are already known The probability of occurrence of w 2 occurs in the case of occurrence, assuming that the probability of identifying the sentence is represented by P(s), and P(s)=P(w 1 , w 2 , . . . , w n ) represents the word set w 1 , w 2 ,... , w n appears continuously and generates the probability of S, using the conditional probability formula S to replace the entire formula with:
P(sentence) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_1, w_2) * ... * P(w_n | w_1, w_2, ..., w_{n-1})

Applying the Markov assumption, this is simplified to:

P(sentence) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_2) * ... * P(w_n | w_{n-1})

We know that P(w_i | w_{i-1}) = P(w_{i-1}, w_i) / P(w_{i-1}); both P(w_{i-1}, w_i) and P(w_{i-1}) can be counted from the corpus, so P(sentence) can finally be obtained. The language model stores the probability statistics P(w_{i-1}, w_i), and the whole recognition process is realized by finding the maximum of P(sentence).
步骤7.5:语音解码和搜索算法。Step 7.5: Speech decoding and search algorithm.
针对输入的语音信号,根据己经训练好的声学模型、语言模型及字典建立一个识别网络,根据搜索算法在该网络中寻找最佳的一条路径,这个路径就是能够以最大概率输出该语音信号的词串,这样就确定这个语音样本所包含的文字。本文采用Viterbi算法实现语音的解码。具体过程如下:For the input speech signal, an identification network is established according to the trained acoustic model, language model and dictionary, and the best path is found in the network according to the search algorithm, and the path is capable of outputting the speech signal with maximum probability. The string of words, thus determining the text contained in this speech sample. In this paper, the Viterbi algorithm is used to decode the speech. The specific process is as follows:
(1)输入HMM模型的参数和观测序列O={o 1,o 2,…,o T},则t=1时所有的状态概率为: (1) Enter the parameters of the HMM model and the observation sequence O={o 1 ,o 2 ,...,o T }, then all state probabilities when t=1 are:
\delta_1(i) = \pi_i b_i(o_1)

\psi_1(i) = 0

(2) Recurse for t = 2, 3, ..., T:

\delta_t(i) = \max_{1 \le j \le N} [\delta_{t-1}(j)\, a_{ji}] \, b_i(o_t)

\psi_t(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j)\, a_{ji}]

(3) Terminate the traversal:

P^* = \max_{1 \le i \le N} \delta_T(i)

i_T^* = \arg\max_{1 \le i \le N} \delta_T(i)

(4) Backtrack the optimal path for t = T-1, T-2, ..., 1:

i_t^* = \psi_{t+1}(i_{t+1}^*)

and output the optimal hidden-state path I^* = (i_1^*, i_2^*, ..., i_T^*), where \delta_t(i) is the joint probability, recursed up to time t, of all nodes on the optimal path ending in state i, \psi_t(i) is the hidden state at time t, T is the time length, P^* is the probability of the optimal path, and i_T^* is the end point of the optimal path. Speech recognition is finally achieved through the optimal path.
用户说出升臂、降臂、正转、反转以及停止语音后,仿真系统实现采煤机的相应操作,系统识别出用户说出的关键字后会在界面的左上角显示。After the user speaks the boom, the down arm, the forward rotation, the reverse rotation, and the stop voice, the simulation system realizes the corresponding operation of the shearer, and the system recognizes the keyword spoken by the user and displays it in the upper left corner of the interface.
步骤8:AR动态演示功能模式切换。Step 8: AR dynamic demonstration function mode switching.
在界面设置一个widget blueprint,添加openLevel函数,切换到新的Map,即AR模式。进入AR演示模式,此模式具体实现采煤过程中的工具模型演示讲解,从而实现AR技术的学习、教育应用功能。Set a widget blueprint on the interface, add the openLevel function, and switch to the new Map, which is AR mode. Enter the AR demonstration mode, which realizes the demonstration of the tool model in the coal mining process, thus realizing the learning and educational application functions of the AR technology.
步骤9:AR模式下的模型选择、模型讲解以及动态演示。Step 9: Model selection, model explanation, and dynamic presentation in AR mode.
本发明的AR动态演示模块,用户界面为了更简洁和便于AR展示,设计二级隐含菜单,本实施例是将模型选择、模型讲解、模型演示以及功能扩展的附加子功能选择设计成隐藏的二级菜单,模型选择分为采煤机、掘进机、风煤钻、综采支架等模型,用户选择完毕,子菜单隐藏即可,模型讲解、模型动态演示以及功能扩展菜单亦是如此实现。具体实现包含内容可参照图1。本文以NFT(自然图片追踪,Natural Feature Tracking)为例实现AR技术,其原理如图4所示,具体流程如下:The AR dynamic demonstration module of the present invention, the user interface is designed to be more concise and convenient for AR display, and the second-level implicit menu is designed. In this embodiment, the additional sub-function selections of model selection, model explanation, model demonstration and function expansion are designed to be hidden. The second-level menu, model selection is divided into coal mining machine, roadheader, wind coal drill, fully mechanized mining bracket and other models. After the user selects, the submenu can be hidden, and the model explanation, model dynamic demonstration and function expansion menu are also realized. The specific implementation includes content can refer to FIG. This paper takes NFT (Natural Feature Tracking) as an example to implement AR technology. The principle is shown in Figure 4. The specific process is as follows:
步骤9.1:通过摄像头校准标定,获取到因为摄像头制造工艺偏差而造成的畸变参数,也 就是摄像头内参(intrinsic matrix),来复原相机模型的3D空间到2D空间的一一对应关系。Step 9.1: Through the camera calibration calibration, the distortion parameter caused by the deviation of the manufacturing process of the camera, that is, the intrinsic matrix of the camera, is obtained to restore the one-to-one correspondence of the 3D space of the camera model to the 2D space.
步骤9.2:根据摄像头本身的硬件参数,我们可以计算出相应的投影矩阵(Projection Matrix)。Step 9.2: According to the hardware parameters of the camera itself, we can calculate the corresponding Projection Matrix.
步骤9.3:对待识别的自然图片进行特征提取,获取到一组特征点{p}。Step 9.3: Feature extraction of the natural image to be recognized, and a set of feature points {p} is obtained.
步骤9.4:实时对摄像头获取到的图像进行特征提取,也是一组特征点{q}。Step 9.4: Feature extraction of the image acquired by the camera in real time is also a set of feature points {q}.
Step 9.5: Use the ICP (Iterative Closest Point) algorithm to iteratively solve for the R and T matrices (Rotation & Translation) relating the two sets of feature points, i.e. the pose matrix, commonly called the model view matrix (Model View Matrix) in computer graphics. Let a pair of corresponding points in three-dimensional space be p_i = (x_i, y_i, z_i)^T and q_i = (x'_i, y'_i, z'_i)^T; their Euclidean distance is:

d(p_i, q_i) = \| p_i - q_i \| = \sqrt{(x_i - x'_i)^2 + (y_i - y'_i)^2 + (z_i - z'_i)^2}

To find the matrices R and T that map p onto q, i.e. q_i ≈ R p_i + T for i, j = 1, 2, ..., N, the optimal solution is found by least squares, namely the R and T that minimize:

E(R, T) = \sum_{i=1}^{N} \| q_i - (R\,p_i + T) \|^2

The minimizing R and T give the pose, i.e. the model-view part of the MVP matrix, where E is the sum of the distances between corresponding points of the two point sets after the transformation and N is the number of points in the point sets.
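For reference, one standard closed-form way (not spelled out in the source) to obtain the minimising R and T once the point correspondences are fixed is the centroid-and-SVD (Kabsch) solution used inside each ICP iteration:

    \bar p = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
    \bar q = \frac{1}{N}\sum_{i=1}^{N} q_i, \qquad
    H = \sum_{i=1}^{N} (p_i - \bar p)(q_i - \bar q)^{\mathrm T}, \qquad
    H = U S V^{\mathrm T}, \qquad
    R = V U^{\mathrm T}, \qquad
    T = \bar q - R\,\bar p

with a reflection correction applied when det(R) < 0.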
步骤9.6:得到MVP矩阵(Model View Projection),进行三维图形绘制。Step 9.6: Obtain the MVP matrix (Model View Projection) for 3D graphics rendering.
步骤10:截图生成图标。Step 10: Screenshot generation icon.
In the main menu of the AR mode, a button is added and bound to the camera's screenshot function, and a scrolling menu bar is added on the right side of the menu; when the screenshot function is triggered successfully, the screenshot is converted through the configured dynamic material conversion function and displayed in the scrolling menu bar. During the demonstration, the user clicks the screenshot button and the system generates an icon on the left side of the interface, making it convenient to record and closely examine difficult or doubtful points during learning, which reinforces the learning effect.
步骤11:模型旋转停止展示。Step 11: Model rotation stops showing.
AR模式下,用户看到的是真实场景与虚拟模型的叠加。将设置的模型实例化为一个Actor,添加Rotation函数,实现模型顺时针旋转。此设计,设置模型旋转,用户对工具模型有一个360度观测、学习,可以更好的达到视觉效果,这种演示学习模式更具有真实性、沉浸感。In AR mode, the user sees the superposition of the real scene and the virtual model. Instantiate the set model into an Actor, add the Rotation function, and implement the model to rotate clockwise. This design, set the model rotation, the user has a 360-degree observation and learning of the tool model, which can better achieve the visual effect. This demonstration learning mode is more authentic and immersive.
步骤12:AR功能扩展模块。Step 12: AR function expansion module.
本发明添加AR教育展示扩展功能,通过添加二级UI,控制Map切换,实现不同物体的演示。其中包括地球、土星、水星、含大气层星球以及星系运行展示功能,星球做自转运动,通过AR模式,将运动的星球展现在用户眼前,并添加知识简介功能,完善了本系统扩展的教育展示功能。The invention adds an AR education display extension function, and controls the Map switching by adding a secondary UI to realize demonstration of different objects. These include the Earth, Saturn, Mercury, the Earth's planet and the galaxy's running display function. The planet does the rotation. Through the AR mode, the moving planet is displayed in front of the user, and the knowledge introduction function is added to improve the educational display function of the system. .
步骤13:动态手势与模型交互。Step 13: Dynamic gestures interact with the model.
AR模式添加OpenCV视频信息处理,初始化视频流后,先进行运动检测,如果检测到动态手运动,则进行图像处理,将手势进行图形处理去噪、转成HSV模式、形态学处理、画轮廓线,标定信息、轮廓信息比较,最后进行模型受力模拟,实现动态手势与虚拟模型的交互,具体实现原理流程如图5所示。特别的,此动态手势交互实现了模拟三维手势的识别控制,视频流获取的动态手为二维信息,这里通过矩阵运算,将与计算得到的摄像机相对于探测到的标识的转换矩阵做比较,得到一个三维运动手势运动信息,从而实现对模型在三维空间里不同方向上的受力模拟;具体包括如下步骤:The AR mode adds OpenCV video information processing. After initializing the video stream, motion detection is performed first. If dynamic hand motion is detected, image processing is performed, and the gesture is subjected to graphics processing, denoising, converting to HSV mode, morphological processing, and drawing outline. The calibration information and the contour information are compared. Finally, the model force simulation is carried out to realize the interaction between the dynamic gesture and the virtual model. The specific implementation principle flow is shown in FIG. 5 . In particular, the dynamic gesture interaction realizes the recognition control of the simulated three-dimensional gesture, and the dynamic hand acquired by the video stream is two-dimensional information, and the matrix calculation is used to compare the calculated camera with the detected transformation matrix of the detected identifier. Obtain a three-dimensional motion gesture motion information, thereby realizing the force simulation of the model in different directions in the three-dimensional space; the specific steps include the following steps:
步骤13.1:运动检测Step 13.1: Motion Detection
本方法是基于颜色直方图与背景差分的运动检测,程序在启动摄像头过程中,需要一定时间,这个时间差不多可以采集20帧的图像,对这20帧进行循环背景更新如下式,并对每 帧运动检测后得到除运动手势区域外的像素也做背景更新。The method is based on the motion detection of the color histogram and the background difference. The program needs a certain time in the process of starting the camera. This time, it is possible to collect images of 20 frames, and the cyclic background of the 20 frames is updated as follows, and each frame is After the motion detection, pixels other than the motion gesture area are also updated as background.
u_{t+1} = \begin{cases} (1-a)\,u_t + a\,I_t, & I_f = 0 \ \text{(pixel outside the motion-gesture region)} \\ u_t, & I_f = 1 \ \text{(pixel inside the motion-gesture region)} \end{cases}

where u_t is the corresponding pixel of the background image, u_{t+1} is the updated background-image pixel, I_t is the pixel of the current frame, I_f is the mask value of the current-frame pixel, i.e. whether a background update is performed, and a ∈ [0, 1] is the update speed of the background-image model, generally between 0.8 and 1; this method takes 0.8.
步骤13.2:图像预处理Step 13.2: Image Preprocessing
通过步骤13.1的简单运动检测步骤,如果检测到有运动信息,则开始对含有运动手势的视频帧图像进行预处理:通过OpenCV的medianBlur函数对图像进行中值滤波,去除椒盐噪声:Through the simple motion detection step of step 13.1, if motion information is detected, the video frame image containing the motion gesture is preprocessed: median filtering is performed by the medianBlur function of OpenCV to remove the salt and pepper noise:
步骤13.3:转换到HSV空间Step 13.3: Convert to HSV Space
The image is converted to the HSV colour space through the cvtColor function, and the brightness value v in HSV space is then reset to a relatively small value derived from the red and green components r and g of the skin-colour region of interest, where r > g, which reduces the interference from stationary skin-coloured regions;
步骤13.4:分割手区域,并进行形态学处理Step 13.4: Divide the hand area and perform morphological processing
将得到的运动二值图和通过反投影得到的二值图相与,在进行一些图像形态学闭操作得到比较完整的运动肤色手势二值图;去除图像中的杂点;Combining the obtained motion binary image with the binary image obtained by back projection, and performing some image morphology closing operation to obtain a relatively complete motion skin color gesture binary image; removing the noise points in the image;
步骤13.5:获取手势轮廓Step 13.5: Get the gesture outline
通过初步的形态学操作,去除噪声,并使手的边界更加清晰后,通过OpenCV的findContours函数得到手势轮廓,然后进行去除伪轮廓操作;After preliminary morphological operations, removing noise and making the boundary of the hand clearer, the gesture contour is obtained by OpenCV's findContours function, and then the pseudo contour is removed.
步骤13.6:画出轮廓,标定信息Step 13.6: Draw outlines, calibration information
步骤13.7:轮廓信息比较,设置方向矢量Step 13.7: Contour information comparison, set direction vector
由于手是不断运动的,所以我们得到的轮廓也是不断在改变。将每一帧得到的轮廓进行比较,设定比较条件。通过比较给方向标志变量赋值。状态比较和分析如表1:Since the hand is constantly moving, the contours we get are constantly changing. The contours obtained for each frame are compared and the comparison conditions are set. Assign values to the direction marker variables by comparison. State comparison and analysis are shown in Table 1:
表1:状态分析Table 1: Status Analysis
步骤13.8:通过方向矢量,作用到虚拟模型,产生受力模拟Step 13.8: Applying a force simulation to the virtual model through the direction vector
动态手势通过轮廓判断后,根据不同的判断结果对虚拟模型进行受力模拟操作。根据轮廓判断过程中方向标记的值,模型在三维空间的坐标值将会进行x、y、z三个坐标轴上的相乘计算,通过坐标值的改变,实现模型位置的改变而达到受力的模拟。After the dynamic gesture is judged by the contour, the virtual model is subjected to the force simulation operation according to different judgment results. According to the value of the direction marker in the contour judgment process, the coordinate values of the model in the three-dimensional space will be multiplied by the x, y, and z coordinate axes, and the change of the coordinate value will realize the change of the model position to achieve the force. Simulation.
In this embodiment, a set of palm motions is demonstrated: moving the palm from far to near, moving it from bottom to top, and twisting it in various directions, each producing a different simulated force on the model; correspondingly, the gesture-driven model moves forward, moves upward, or is pushed in the direction determined by the twist of the hand. This function demonstrates the interaction between dynamic gestures and the virtual model; the interaction helps the user observe the model from multiple angles, realizes interplay between the teaching content and the user, and adds interest.
当然,上述说明并非是对本发明的限制,本发明也并不仅限于上述举例,本技术领域的技术人员在本发明的实质范围内所做出的变化、改型、添加或替换,也应属于本发明的保护范围。The above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and variations, modifications, additions or substitutions made by those skilled in the art within the scope of the present invention should also belong to the present invention. The scope of protection of the invention.

Claims (8)

  1. 一种基于虚拟现实与增强现实的采矿操作多交互实现方法,其特征在于:采用井下采矿操作多交互仿真系统,该系统包含虚拟现实模式和增强现实模式两种模式;虚拟现实模式包含特定场景的建模、漫游、模型及其材质的更换、视频嵌入虚拟场景、模型移动、应用场景意向交互、二维码生成以及语音交互;增强现实模式包含模型选择、模型讲解、动态模型演示、手势控制模型交互、截图生成图标、360度旋转以及停止、功能模式切换以及功能扩展;系统设计了两种隐藏菜单,即虚拟现实模式下的更换工具、材质的选择菜单以及增强现实模式下的模型选择类菜单;第一种用户进入特定区域菜单才会显示,离开即会隐藏;第二种点击便可在某处显示二级菜单,再次点击菜单隐藏;A multi-interaction realization method for mining operations based on virtual reality and augmented reality, characterized in that: a multi-interaction simulation system is operated by underground mining operation, and the system comprises two modes: a virtual reality mode and an augmented reality mode; the virtual reality mode includes a specific scene. Modeling, roaming, replacement of models and their materials, video embedded virtual scenes, model movement, application scene intentional interaction, two-dimensional code generation and voice interaction; augmented reality mode including model selection, model explanation, dynamic model demonstration, gesture control model Interactive, screenshot generation icons, 360-degree rotation and stop, function mode switching, and function expansion; the system has designed two hidden menus, namely the replacement tool in the virtual reality mode, the material selection menu, and the model selection menu in the augmented reality mode. The first type of user enters the specific area menu will be displayed, leaving will be hidden; the second type of click can display the second level menu somewhere, click the menu again to hide;
    The multi-interaction implementation method for mining operations specifically comprises the following steps:
    Step 1: construction of the entire environmental scene of the mining operation
    According to the real environment of the underground mining operation, the modeling tool 3DMax is used for 1:1 scale modeling, realizing an environmental simulation of the whole underground mining operation; the UE4 engine is used to edit the models, including creating and editing textures and materials, adding physical collision, adding lights, effect lighting and special effects to the overall environment, and performing baking and rendering;
    Step 2: roaming of the virtual reality application scene
    In the UE4 engine, the up, down, left and right keys of the keyboard are set and bound to the Up, Down, Right and Left direction control functions, and the Turnaround control function is bound to the mouse, realizing roaming of the virtual reality scene of the entire underground mining operation;
    Step 3: replacing the tool models of the underground mining operation and the simulated materials of the mine geology
    Hidden menus are added in the virtual underground mining scene; when the user roams to the ore extraction area, a model or material selection menu appears automatically, and the user can select a model or material from the menu and replace it as required;
    Step 4: embedding video material into the three-dimensional application scene and controlling playback and stop
    The video material is embedded in the virtual reality scene and played in three-dimensional space to simulate the monitoring display equipment of the mining environment; the X key of the keyboard is set and bound to the MediaPlayer media class of the UE4 platform, and playback and stop of the video are controlled through the OpenSource and Close functions;
    Step 5: selecting a model and moving it to an arbitrary position
    A model is selected with the mouse and moved to any position where a simulated operation is required, achieving the simulation of machinery movement as in the real scene;
    Step 6: realizing intention-based interaction in the application scene
    When the user roams to a specific position in the virtual reality application scene and the system detects that the user intends to enter, the environment light is turned on automatically, realizing natural interaction in the virtual scene;
    Step 7: two-dimensional code generation
    The F key of the keyboard is bound and a two-dimensional code generation function is added, so that a keyboard key controls the generation of the two-dimensional code; when the user presses the F key, the system generates a two-dimensional code of a panorama of the virtual scene containing the preset sampling points;
    Step 8: realizing voice interaction
    The user controls the coal shearer in the virtual reality scene through keywords including forward rotation, reverse rotation, raise arm, lower arm and stop, simulating its operation;
    Step 9: switching to the AR dynamic demonstration function mode
    The user clicks the AR mode button in the upper right corner of the system to switch to the AR demonstration mode.
  2. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that: in step 3, the model is instantiated as a concrete Actor, the SetMesh function and the SetMaterial function are added to replace the model and the model material, and the Widget Blueprint user interface and Box Collision detection are set up to realize the hidden menu function in three-dimensional space.
  3. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that: in step 5, a mouse event is added for the model to be operated, the model is selected through the GetHitResult function, and the coordinate value of the model's SetActorLocation function is then changed according to the coordinates of the mouse in three-dimensional space; when the mouse is clicked again, the current x, y and z coordinate values of the mouse are assigned to the model, and the GetHitResult function then sets the model to the deselected mode.
  4. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that: in step 6, a TriggerBox trigger is set; when the first-person character triggers the TriggerBox, the system detects that the user intends to enter a certain area and automatically activates a device in that area.
  5. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that: in step 7, when the user presses the F key, the system generates a two-dimensional code of a panorama of the virtual scene containing the preset sampling points; the user scans the two-dimensional code with a mobile phone and jumps to the virtual application scene display page on the mobile phone; on the mobile phone, the user can enable the gyroscope, switch to the VR split-screen mode and set the phone parameters, and then use VR glasses to experience the virtual underground mining operation environment scene, realizing a 720-degree viewing angle display as well as a multi-scene, multi-angle roaming experience on the mobile phone.
  6. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that: in step 8, speech recognition is implemented based on the Pocket-sphinx library; by improving the Chinese keyword dictionary, the recognition function is realized through preprocessing, feature extraction, acoustic model training, language model training, and speech decoding and search, and finally the control functions written in the UE4 engine realize voice control of the models in three-dimensional space; the specific implementation steps of speech recognition are as follows:
    Step 8.1: preprocessing
    The input raw speech signal is processed to filter out unimportant information and background noise, and endpoint detection, speech framing and pre-emphasis of the speech signal are performed;
    Pre-emphasis is implemented with a first-order FIR high-pass digital filter, whose transfer function is:
    H(z) = 1 - a·z⁻¹;
    where a is the coefficient of the pre-emphasis filter, taking a value in the range 0.9 to 1.0; if the speech sample value at time n is x(n), the pre-emphasized signal is
    y(n) = x(n) - a·x(n-1);
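As an illustrative, non-limiting sketch (not part of the claimed method), the pre-emphasis step above can be written in a few lines of Python with NumPy; the function name and the default coefficient a = 0.97 (the value recited in claim 7) are assumptions made only for this example:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    """First-order FIR high-pass pre-emphasis: y(n) = x(n) - a*x(n-1)."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]                     # the first sample has no predecessor
    y[1:] = x[1:] - a * x[:-1]      # difference with the previous sample
    return y

# usage: emphasized = pre_emphasis(raw_signal, a=0.97)
```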
    Step 8.2: feature extraction
    Feature extraction is performed with the Mel-frequency cepstral coefficient (MFCC) method, specifically according to the following steps:
    Step 8.2.1: using the critical band effect of human hearing, MEL cepstrum analysis is applied to the speech signal to obtain a MEL cepstral coefficient vector sequence;
    Step 8.2.2: the spectrum of the input speech is represented by the MEL cepstral coefficient vector sequence, and several band-pass filters with triangular or sinusoidal filtering characteristics are set within the speech spectrum range;
    Step 8.2.3: the output data of each band-pass filter is obtained through the band-pass filter bank;
    Step 8.2.4: the logarithm of the output data of each band-pass filter is taken and a discrete cosine transform (DCT) is applied;
    Step 8.2.5: the MFCC coefficients are obtained; the solution formula is as follows:
    [formula image in the original: the coefficients C_i are obtained as the discrete cosine transform of the logarithm of the filter-bank outputs F(k)]
    where C_i is the characteristic parameter, k is the number of triangular filters, F(k) is the output data of each filter, P is the filter order, and i is the data length;
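The following NumPy sketch illustrates steps 8.2.2 to 8.2.5 for a single pre-emphasized, windowed frame: a triangular mel filter bank is applied to the power spectrum, the logarithm of each filter output F(k) is taken, and a discrete cosine transform yields the coefficients C_i. This is not the Pocket-sphinx implementation; the FFT size, filter count and number of coefficients are illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    """Compute MFCCs of one windowed frame (illustrative sketch, not Pocket-sphinx)."""
    n_fft = 512
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2            # power spectrum
    # Triangular mel filter bank between 0 Hz and the Nyquist frequency
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    filter_out = np.maximum(fbank @ spectrum, 1e-10)              # F(k): filter outputs
    log_energies = np.log(filter_out)                             # step 8.2.4: logarithm
    # DCT of the log filter-bank energies gives the cepstral coefficients C_i
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1)) / (2 * n_filters))
    return dct @ log_energies
```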
    Step 8.3: acoustic model training
    The acoustic model parameters are trained from the characteristic parameters of the training speech library;
    During recognition, the characteristic parameters of the speech to be recognized are matched against the acoustic model to obtain the recognition result; a Gaussian mixture model-hidden Markov model (GMM-HMM) is adopted as the acoustic model, which specifically comprises the following steps:
    Step 8.3.1: the joint probability density function of the Gaussian mixture model is obtained in the following form:
    p(x) = Σ_{m=1}^{M} C_m · N(x; u_m, Σ_m),  with  N(x; u_m, Σ_m) = (2π)^{-D/2} |Σ_m|^{-1/2} exp(-(1/2)(x - u_m)^T Σ_m^{-1} (x - u_m))
    where M is the number of Gaussians in the Gaussian mixture model, C_m is the weight, u_m is the mean, Σ_m is the covariance matrix, and D is the dimension of the observation vector; the parameter variables Θ = {C_m, u_m, Σ_m} of the Gaussian mixture model are estimated with the expectation-maximization (EM) algorithm, solving with the following formulas:
    C_m^{(j+1)} = (1/N) Σ_{t=1}^{N} h_m(t)
    u_m^{(j+1)} = Σ_{t=1}^{N} h_m(t)·x^{(t)} / Σ_{t=1}^{N} h_m(t)
    Σ_m^{(j+1)} = Σ_{t=1}^{N} h_m(t)·(x^{(t)} - u_m^{(j)})(x^{(t)} - u_m^{(j)})^T / Σ_{t=1}^{N} h_m(t)
    where j is the index of the current iteration, N is the number of elements in the training data set, x^{(t)} is the feature vector at time t, and h_m(t) is the posterior probability of component C_m at time t; estimating the GMM parameters with the EM algorithm maximizes the probability of generating the observed speech features on the training data;
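By way of a non-limiting illustration, one EM iteration for a diagonal-covariance GMM can be sketched as follows; the variable h mirrors the posteriors h_m(t) of the claim, while the diagonal-covariance simplification and the function name are assumptions, and the full GMM-HMM training performed by the SphinxTrain/Pocket-sphinx toolchain is considerably more involved:

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a diagonal-covariance GMM (illustrative sketch).

    X: (N, D) feature vectors; weights: (M,); means: (M, D); covs: (M, D) diagonal variances.
    """
    N, D = X.shape
    M = weights.shape[0]
    # E-step: posterior h_m(t) of each Gaussian component for each frame
    log_prob = np.empty((N, M))
    for m in range(M):
        diff = X - means[m]
        log_prob[:, m] = (np.log(weights[m])
                          - 0.5 * np.sum(np.log(2 * np.pi * covs[m]))
                          - 0.5 * np.sum(diff ** 2 / covs[m], axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)
    h = np.exp(log_prob)
    h /= h.sum(axis=1, keepdims=True)                    # (N, M) posteriors h_m(t)
    # M-step: re-estimate weights, means and variances from the posteriors
    nk = h.sum(axis=0)                                   # effective counts per component
    new_weights = nk / N
    new_means = (h.T @ X) / nk[:, None]
    new_covs = (h.T @ (X ** 2)) / nk[:, None] - new_means ** 2
    return new_weights, new_means, np.maximum(new_covs, 1e-6)
```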
    Step 8.3.2: solving the three components of the HMM
    Let the state sequence be q_1, q_2, …, q_N and the transition probability matrix A = [a_ij], i, j ∈ [1, N]; the jump probability between states of the Markov chain is then a_ij = P(q_t = j | q_{t-1} = i); the initial probability of the Markov chain is π = [π_i], i ∈ [1, N], where π_i = P(q_1 = i); let the observation probability distribution of each state be b_i(o_t) = P(o_t | q_t = i), and use the GMM model to describe the observation probability distribution of each state; according to step 8.3.1, the solution formula is:
    b_i(o_t) = Σ_{m=1}^{M} C_{i,m} · (2π)^{-D/2} |Σ_{i,m}|^{-1/2} exp(-(1/2)(o_t - u_{i,m})^T Σ_{i,m}^{-1} (o_t - u_{i,m}))
    where N is the number of states, i and j denote states, a_ij is the probability of jumping from state i at time t-1 to state j at time t, o_t is the observation at time t, C_{i,m} is the mixture coefficient representing the weights of the different Gaussians, u_{i,m} is the mean of the different Gaussians, and Σ_{i,m} is the covariance matrix of the different Gaussians; the parameters of the HMM are estimated with the Baum-Welch algorithm, and the acoustic model file is finally generated;
    Step 8.4: language model training
    The N-Gram model is used to train the language model; the probability that the i-th word appears in a sentence depends conditionally on the N-1 words preceding it, i.e. the context of a word is defined as the N-1 words appearing before it, expressed as:
    P(w_i | w_1, w_2, …, w_{i-1}) = P(w_i | w_{i-N+1}, …, w_{i-1})
    Using the conditional probability formula, the probability of a sentence S is expanded into the following formula:
    P(sentence) = P(w_1)·P(w_2|w_1)·P(w_3|w_1,w_2)·…·P(w_n|w_1, w_2, …, w_{n-1})
    where P(w_1) is the probability that w_1 appears in the text, P(w_1, w_2) is the probability that w_1 and w_2 appear consecutively, and P(w_2|w_1) is the probability that w_2 appears given that w_1 has already appeared; assuming the probability of recognizing the sentence is denoted P(s), P(s) = P(w_1, w_2, …, w_n) is the probability that the word set w_1, w_2, …, w_n appears consecutively and generates S;
    Through the Markov assumption this is simplified to the following formula:
    P(sentence) = P(w_1)·P(w_2|w_1)·P(w_3|w_2)·…·P(w_n|w_{n-1})
    where P(w_i|w_{i-1}) = P(w_{i-1}, w_i) / P(w_{i-1}); both P(w_{i-1}, w_i) and P(w_{i-1}) can be counted from the corpus, so P(sentence) can finally be obtained; the language model stores the probability statistics of P(w_{i-1}, w_i), and the whole recognition process is realized by finding the maximum value of P(sentence);
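A toy Python sketch of the bigram (N = 2) case follows: counts of word pairs and single words are collected from a corpus, P(w_i|w_{i-1}) is estimated as their ratio, and P(sentence) is the product of these conditional probabilities. The sentence markers, the absence of smoothing and the example keyword corpus are assumptions made only for illustration; a production language model for Pocket-sphinx would normally be an ARPA or JSGF file:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate bigram probabilities P(w_i | w_{i-1}) from a tokenized corpus (sketch)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded[:-1], padded[1:]))
    def prob(prev, word):
        # P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}); no smoothing here
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

def sentence_probability(prob, words):
    """P(sentence) under the bigram (Markov) approximation."""
    p = 1.0
    padded = ["<s>"] + words + ["</s>"]
    for prev, word in zip(padded[:-1], padded[1:]):
        p *= prob(prev, word)
    return p

# usage (hypothetical keyword corpus):
# prob = train_bigram_lm([["raise", "arm"], ["lower", "arm"], ["stop"]])
# sentence_probability(prob, ["raise", "arm"])
```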
    Step 8.5: speech decoding and search algorithm
    For the input speech signal, a recognition network is built from the trained acoustic model, the language model and the dictionary mapping file created with the g2p tool, and the search algorithm finds the best path through this network; this path is the word string that outputs the speech signal with the maximum probability, which determines the text contained in the speech sample; the Viterbi algorithm is adopted for speech decoding, and the specific process is as follows:
    Step 8.5.1: the parameters of the HMM model and the observation sequence O = {o_1, o_2, …, o_T} are input; at t = 1 all state probabilities are:
    δ_1(i) = π_i·b_i(o_1)
    ψ_1(i) = 0
    Step 8.5.2: recursion proceeds for t = 2, 3, …, T:
    δ_t(i) = max_{1≤j≤N} [δ_{t-1}(j)·a_{ji}]·b_i(o_t)
    ψ_t(i) = argmax_{1≤j≤N} [δ_{t-1}(j)·a_{ji}]
    Step 8.5.3: the traversal is terminated:
    P* = max_{1≤i≤N} δ_T(i)
    q*_T = argmax_{1≤i≤N} δ_T(i)
    Step 8.5.4: the optimal path is traced back, for t = T-1, T-2, …, 1:
    q*_t = ψ_{t+1}(q*_{t+1})
    Step 8.5.5: the optimal hidden state path is output:
    Q* = (q*_1, q*_2, …, q*_T)
    where δ_t(i) is the joint probability, accumulated by the recursion up to time t, of all nodes passed by the optimal path, ψ_t(i) is the hidden state at time t, T is the time, P* is the probability of the optimal path, and q*_T is the endpoint of the optimal path.
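As a compact, non-limiting illustration of steps 8.5.1 to 8.5.5, the Viterbi recursion can be written as follows; here the observation likelihood matrix B (with B[i, t] = b_i(o_t)) is assumed to have been evaluated beforehand with the GMM observation model, and no log-domain scaling is applied, which a practical decoder would need:

```python
import numpy as np

def viterbi(pi, A, B):
    """Return the most probable hidden state path and its probability.

    pi: (N,) initial probabilities; A: (N, N) transition matrix with A[i, j] = a_ij;
    B: (N, T) observation likelihoods with B[i, t] = b_i(o_t).
    """
    N, T = B.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, 0]                      # step 8.5.1: initialization
    for t in range(1, T):                        # step 8.5.2: recursion
        scores = delta[t - 1][:, None] * A       # scores[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, t]
    best_prob = delta[-1].max()                  # step 8.5.3: termination, P*
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):               # step 8.5.4: backtracking
        path[t] = psi[t + 1, path[t + 1]]
    return path, best_prob                       # step 8.5.5: optimal hidden state path
```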
  7. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 6, characterized in that a takes the value 0.97.
  8. The multi-interaction implementation method for mining operations based on virtual reality and augmented reality according to claim 1, characterized in that step 9 specifically comprises the following steps:
    Step 9.1: model selection
    A selection is made among the coal shearer model, the roadheader model, the pneumatic coal drill model and the fully mechanized mining support model; each type of model is a 1:1 modeling simulation of the real coal mining tool;
    Step 9.2: model explanation
    After selecting a model, the user selects through this menu the tool model option to be learned, and the system plays the corresponding voice explanation; clicking the button again stops the voice;
    Step 9.3: model demonstration
    The tool simulation animations produced during 3DMax modeling are imported into the UE4 engine and the corresponding selection menus are set, so that a click demonstrates the running state of the corresponding coal mining tool in AR mode;
    Step 9.4: generating icons from screenshots
    In the main menu of the AR mode, a button is added and bound to the camera's screenshot function, and a scrolling menu bar is added on the right side of the menu; when the screenshot function is triggered successfully, the screenshot is displayed in the scrolling menu bar on the right through the configured dynamic material conversion function; during the demonstration, the user clicks the screenshot button and the system generates an icon on that side of the interface;
    Step 9.5: rotation
    The configured model is instantiated as an Actor and the Rotation function is added, making the model rotate clockwise;
    Step 9.6: function extension
    A secondary UI is added to control Map switching, realizing running demonstrations including the Earth, Saturn, Mercury, planets with atmospheres and galaxies; WidgetBlueprint code is added to show or hide the knowledge introduction panel; a back button is designed to return to the AR editing main module;
    Step 9.7: dynamic gesture control of the model, with the real environment superimposed on the virtual model and gestures interactively controlling the model, specifically comprising the following steps:
    Step 9.7.1: video capture is initialized, and the marker file and camera parameters are read;
    Step 9.7.2: a video frame image is captured;
    Step 9.7.3: marker detection is performed and the marker template in the video frame is identified, and motion detection is performed on the captured video frame image with the OpenCV library functions to determine whether a motion trajectory is detected;
    If the judgment result is that a gesture motion trajectory is detected, step 9.7.4 is executed;
    If the judgment result is that no motion trajectory is detected, marker detection and identification of the marker template in the video frame continue, and then step 9.7.12 is executed;
    Motion detection is performed on the basis of the color histogram and background difference; for the captured frames, after motion detection of each frame, a background update is performed on the pixels outside the motion gesture region, with the following formula:
    u_{t+1} = (1 - a)·u_t + a·I_t,  if I_f = 1 (a background update is performed for this pixel);  u_{t+1} = u_t,  if I_f = 0;
    where u_t is the corresponding pixel of the background image and u_{t+1} is the updated background image pixel; I_t is the pixel of the current frame image, and I_f is the mask value of the current frame image pixel, i.e. whether a background update is performed; a ∈ [0,1] is the update speed of the background image model, here taken as 0.8;
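The background maintenance described above can be sketched as follows; the blending direction (the new frame weighted by a) and the use of a boolean gesture mask are assumptions consistent with the description, with a = 0.8 as stated:

```python
import numpy as np

def update_background(background, frame, gesture_mask, a=0.8):
    """Blend the background toward the current frame, except under the detected hand.

    background, frame: float arrays of equal shape; gesture_mask: boolean array that is
    True for pixels inside the motion gesture region (these keep their old background value).
    """
    updated = (1.0 - a) * background + a * frame
    updated[gesture_mask] = background[gesture_mask]
    return updated

def motion_mask(background, frame, threshold=25.0):
    """Simple background-difference motion detection (illustrative threshold)."""
    return np.abs(frame.astype(float) - background.astype(float)) > threshold
```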
    Step 9.7.4: preprocessing, including denoising, is performed on the image;
    If the motion detection step detects motion information, preprocessing of the video frame image containing the motion gesture begins: median filtering is applied to the image with OpenCV's medianBlur function to remove salt-and-pepper noise;
    Step 9.7.5: conversion to HSV space;
    The color space of the image is converted with the cvtColor function to obtain its HSV-space data, and the brightness value v in HSV space is reset as shown below:
    [formula image in the original: the brightness value v is recomputed from the red and green pixel values r and g of the skin color region]
    where r and g are the red and green pixel values of the skin color region, with r > g;
    Step 9.7.6: the hand region is segmented;
    Step 9.7.7: morphological processing is performed to remove noise points;
    The obtained motion binary image is ANDed with the binary image obtained by back projection, and a morphological closing operation is performed on the image to obtain a relatively complete binary image of the moving skin-color gesture; noise points in the image are then removed;
    Step 9.7.8: the hand contour is obtained;
    After the preliminary morphological operations remove the noise and make the boundary of the hand clearer, the gesture contour is obtained with OpenCV's findContours function, and pseudo-contours are then removed;
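Steps 9.7.4 to 9.7.8 can be sketched with the OpenCV Python bindings as follows; the skin-color range, kernel size and minimum contour area are illustrative assumptions, the motion mask is expected as an 8-bit binary image, and the two-value return of findContours corresponds to OpenCV 4:

```python
import cv2

def largest_hand_contour(frame_bgr, motion_mask):
    """Segment the moving skin-colored hand and return its largest contour, or None."""
    denoised = cv2.medianBlur(frame_bgr, 5)                      # step 9.7.4: remove salt-and-pepper noise
    hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)              # step 9.7.5: convert to HSV space
    skin = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255))         # step 9.7.6: rough skin-color segmentation
    fused = cv2.bitwise_and(skin, motion_mask)                   # step 9.7.7: AND with the motion binary image
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    closed = cv2.morphologyEx(fused, cv2.MORPH_CLOSE, kernel)    # morphological closing
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,    # step 9.7.8: hand contour
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)                 # discard small pseudo-contours
    return contour if cv2.contourArea(contour) > 1000 else None
```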
    Step 9.7.9: the hand contour is drawn and the information is calibrated;
    Step 9.7.10: the contour information is compared and the direction vector is set;
    The contours obtained in successive frames are compared, comparison conditions are set, and the direction flag variable is assigned through the comparison;
    Step 9.7.11: a force simulation is applied to the model according to the vector coordinates, realizing interaction between the dynamic gesture and the virtual model;
    After the dynamic gesture is judged from the contour, a force simulation operation is performed on the virtual model according to the different judgment results; according to the value of the direction flag obtained during contour judgment, the coordinate values of the model in three-dimensional space are multiplied along the x, y and z axes, and this change of coordinate values changes the model's position, thereby simulating the applied force;
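A hypothetical mapping from the direction flag of step 9.7.10 to the coordinate multiplication of step 9.7.11 is sketched below; the flag names, scale factors and axis convention are invented for illustration only, and in the actual system the resulting position would be applied to the Actor (for example through SetActorLocation in UE4):

```python
# Hypothetical per-axis scale factors keyed by the direction flag (illustrative values).
DIRECTION_SCALE = {
    "forward": (1.1, 1.0, 1.0),   # palm moving from far to near: push along +x
    "up":      (1.0, 1.0, 1.1),   # palm moving bottom-up: push along +z
    "left":    (1.0, 0.9, 1.0),   # twist to the left: push along -y
    "right":   (1.0, 1.1, 1.0),   # twist to the right: push along +y
}

def apply_force(position, direction_flag):
    """Multiply the model's (x, y, z) coordinates according to the direction flag."""
    sx, sy, sz = DIRECTION_SCALE.get(direction_flag, (1.0, 1.0, 1.0))
    x, y, z = position
    return (x * sx, y * sy, z * sz)

# usage: new_position = apply_force((120.0, 40.0, 10.0), "up")
```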
    Step 9.7.12: the transformation matrix of the camera relative to the detected marker is calculated;
    Step 9.7.13: a virtual object is superimposed on the detected marker, and execution returns to step 9.7.2, realizing the superimposed display of the real environment and the virtual model;
    Step 9.7.14: when the VR mode is clicked, the system switches the display mode, the camera is turned off, and the above steps stop executing.
PCT/CN2017/118923 2017-08-08 2017-12-27 Multi-interaction implementation method for mining operation based on virtual reality and augmented reality WO2019029100A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710668415.XA CN107515674B (en) 2017-08-08 2017-08-08 It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality
CN201710668415.X 2017-08-08

Publications (1)

Publication Number Publication Date
WO2019029100A1 true WO2019029100A1 (en) 2019-02-14

Family

ID=60722284

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118923 WO2019029100A1 (en) 2017-08-08 2017-12-27 Multi-interaction implementation method for mining operation based on virtual reality and augmented reality

Country Status (2)

Country Link
CN (1) CN107515674B (en)
WO (1) WO2019029100A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992118A (en) * 2019-02-18 2019-07-09 杭州同绘科技有限公司 Aerial lift device with insulated arm emulating operating system based on virtual reality technology
CN111968445A (en) * 2020-09-02 2020-11-20 上海上益教育设备制造有限公司 Elevator installation teaching virtual reality system
US11119569B2 (en) 2020-02-18 2021-09-14 International Business Machines Corporation Real-time visual playbacks
WO2022007565A1 (en) * 2020-07-10 2022-01-13 北京字节跳动网络技术有限公司 Image processing method and apparatus for augmented reality, electronic device and storage medium

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515674B (en) * 2017-08-08 2018-09-04 山东科技大学 It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality
TWI633500B (en) * 2017-12-27 2018-08-21 中華電信股份有限公司 Augmented reality application generation system and method
CN108198246A (en) * 2017-12-28 2018-06-22 重庆创通联达智能技术有限公司 A kind of method of controlling rotation and device for showing 3-D view
CN108230440A (en) * 2017-12-29 2018-06-29 杭州百子尖科技有限公司 Chemical industry whole process operating system and method based on virtual augmented reality
US11074292B2 (en) * 2017-12-29 2021-07-27 Realwear, Inc. Voice tagging of video while recording
CN110058673A (en) * 2018-01-17 2019-07-26 广西米克尔森科技股份有限公司 A kind of virtual reality and augmented reality show exchange technology
CN108509031A (en) * 2018-03-12 2018-09-07 中国科学院国家空间科学中心 A kind of space science task display systems based on augmented reality
CN108629076A (en) * 2018-03-22 2018-10-09 广东长亨石业有限公司 A kind of stone pit simulation system and its method based on 3D models
CN108399815A (en) * 2018-03-22 2018-08-14 河南职业技术学院 A kind of security risk based on VR looks into the method and its system except rehearsal
CN108563395A (en) * 2018-05-07 2018-09-21 北京知道创宇信息技术有限公司 The visual angles 3D exchange method and device
CN110489184B (en) * 2018-05-14 2023-07-25 北京凌宇智控科技有限公司 Virtual reality scene implementation method and system based on UE4 engine
CN109144256B (en) * 2018-08-20 2019-08-23 广州市三川田文化科技股份有限公司 A kind of virtual reality behavior interactive approach and device
CN110873901B (en) * 2018-08-29 2022-03-08 中国石油化工股份有限公司 Pseudo well curve frequency increasing method and system
CN109268010B (en) * 2018-09-22 2020-07-03 太原理工大学 Remote inspection intervention method for virtual reality mine fully-mechanized coal mining face
CN109407918A (en) * 2018-09-25 2019-03-01 苏州梦想人软件科技有限公司 The implementation method of augmented reality content multistage interactive mode
CN109191978A (en) * 2018-09-27 2019-01-11 常州工程职业技术学院 Shield machine manipulates driving analog system
CN109543072B (en) * 2018-12-05 2022-04-22 深圳Tcl新技术有限公司 Video-based AR education method, smart television, readable storage medium and system
CN110275610B (en) * 2019-05-27 2022-09-30 山东科技大学 Cooperative gesture control coal mining simulation control method based on LeapMotion somatosensory controller
CN110348370B (en) * 2019-07-09 2021-05-11 北京猫眼视觉科技有限公司 Augmented reality system and method for human body action recognition
CN110502121B (en) * 2019-07-24 2023-02-17 江苏大学 Frame type virtual keyboard with touch sense and high recognition resolution and input correction algorithm thereof
CN110740263B (en) * 2019-10-31 2021-03-12 维沃移动通信有限公司 Image processing method and terminal equipment
CN110969687B (en) * 2019-11-29 2023-07-28 中国商用飞机有限责任公司北京民用飞机技术研究中心 Collision detection method, device, equipment and medium
CN111241963B (en) * 2020-01-06 2023-07-14 中山大学 First person view video interactive behavior identification method based on interactive modeling
CN111309202B (en) * 2020-01-20 2021-09-21 深圳市赛易特信息技术有限公司 Dynamic display method, terminal and storage medium based on webpage
CN111367407B (en) * 2020-02-24 2023-10-10 Oppo(重庆)智能科技有限公司 Intelligent glasses interaction method, intelligent glasses interaction device and intelligent glasses
CN111300412A (en) * 2020-02-28 2020-06-19 华南理工大学 Method for controlling robot based on illusion engine
CN112419329A (en) * 2020-06-03 2021-02-26 中煤华晋集团有限公司王家岭矿 Bulk similarity simulation top coal migration monitoring method based on MATLAB
CN111784850B (en) * 2020-07-03 2024-02-02 深圳市瑞立视多媒体科技有限公司 Object grabbing simulation method based on illusion engine and related equipment
CN111894582B (en) * 2020-08-04 2021-09-24 中国矿业大学 Control method of coal mining machine
CN112382293A (en) * 2020-11-11 2021-02-19 广东电网有限责任公司 Intelligent voice interaction method and system for power Internet of things
CN112799507B (en) * 2021-01-15 2022-01-04 北京航空航天大学 Human body virtual model display method and device, electronic equipment and storage medium
CN113380088A (en) * 2021-04-07 2021-09-10 上海中船船舶设计技术国家工程研究中心有限公司 Interactive simulation training support system
CN113128716A (en) * 2021-04-25 2021-07-16 中国科学院计算机网络信息中心 Operation guidance interaction method and system
CN113160395B (en) * 2021-05-20 2022-06-24 北京知优科技有限公司 CIM-based urban multi-dimensional information interaction and scene generation method, device and medium
CN116744041A (en) * 2022-03-04 2023-09-12 北京字跳网络技术有限公司 Information display method, device, head-mounted display equipment and storage medium
CN114743554A (en) * 2022-06-09 2022-07-12 武汉工商学院 Intelligent household interaction method and device based on Internet of things
CN117316143A (en) * 2023-11-30 2023-12-29 深圳市金大智能创新科技有限公司 Method for human-computer interaction based on virtual person
CN117934674B (en) * 2024-02-05 2024-09-17 深圳萌想文化传播有限公司 Deep learning and three-dimensional animation interactive cooperation method and system
CN117873119B (en) * 2024-03-11 2024-05-28 北京数易科技有限公司 Mobile control method, system and medium for mobile equipment based on virtual reality

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3108287A4 (en) * 2014-02-18 2017-11-08 Merge Labs, Inc. Head mounted display goggles for use with mobile computing devices
US20160163063A1 (en) * 2014-12-04 2016-06-09 Matthew Ashman Mixed-reality visualization and method
CN106953900A (en) * 2017-03-09 2017-07-14 华东师范大学 A kind of industrial environment outdoor scene enhanced interactive terminal and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160090839A1 (en) * 2014-11-26 2016-03-31 Larry G. Stolarczyk Method of protecting the health and well-being of coal mine machine operators
CN105955456A (en) * 2016-04-15 2016-09-21 深圳超多维光电子有限公司 Virtual reality and augmented reality fusion method, device and intelligent wearable equipment
CN106019364A (en) * 2016-05-08 2016-10-12 大连理工大学 Floor water inrush early-warning system and method in coal mining
CN107515674A (en) * 2017-08-08 2017-12-26 山东科技大学 It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992118A (en) * 2019-02-18 2019-07-09 杭州同绘科技有限公司 Aerial lift device with insulated arm emulating operating system based on virtual reality technology
US11119569B2 (en) 2020-02-18 2021-09-14 International Business Machines Corporation Real-time visual playbacks
WO2022007565A1 (en) * 2020-07-10 2022-01-13 北京字节跳动网络技术有限公司 Image processing method and apparatus for augmented reality, electronic device and storage medium
US11756276B2 (en) 2020-07-10 2023-09-12 Beijing Bytedance Network Technology Co., Ltd. Image processing method and apparatus for augmented reality, electronic device, and storage medium
CN111968445A (en) * 2020-09-02 2020-11-20 上海上益教育设备制造有限公司 Elevator installation teaching virtual reality system

Also Published As

Publication number Publication date
CN107515674A (en) 2017-12-26
CN107515674B (en) 2018-09-04

Similar Documents

Publication Publication Date Title
WO2019029100A1 (en) Multi-interaction implementation method for mining operation based on virtual reality and augmented reality
Zhou et al. Virtual reality: A state-of-the-art survey
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN104331164B (en) A kind of gesture motion smoothing processing method of the similarity threshold analysis based on gesture identification
CN114144790A (en) Personalized speech-to-video with three-dimensional skeletal regularization and representative body gestures
CN104166851A (en) Multimedia interactive learning system and method for paper textbooks
CN112508750A (en) Artificial intelligence teaching device, method, equipment and storage medium
CN102930270A (en) Method and system for identifying hands based on complexion detection and background elimination
CN103649967A (en) Dynamic gesture recognition process and authoring system
CA3185810A1 (en) Systems and methods for augmented or mixed reality writing
CN111598996B (en) Article 3D model display method and system based on AR technology
CN106293099A (en) Gesture identification method and system
CN116863003A (en) Video generation method, method and device for training video generation model
CN104484034B (en) A kind of gesture motion primitive transition frames localization method based on gesture identification
CN105468574A (en) Decorative font synthesizing method
CN112764530A (en) Ammunition identification method based on touch handle and augmented reality glasses
Fang et al. SignLLM: Sign Languages Production Large Language Models
Putra et al. Designing translation tool: Between sign language to spoken text on kinect time series data using dynamic time warping
CN116841391A (en) Digital human interaction control method, device, electronic equipment and storage medium
Thakar et al. Hand gesture controlled gaming application
CN111860086A (en) Gesture recognition method, device and system based on deep neural network
US12067970B2 (en) Method and apparatus for mining feature information, and electronic device
CN112788390B (en) Control method, device, equipment and storage medium based on man-machine interaction
CN112764531A (en) Augmented reality ammunition identification method
KR20140078083A (en) Method of manufacturing cartoon contents for augemented reality and apparatus performing the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17920772

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17920772

Country of ref document: EP

Kind code of ref document: A1