CN108334199A - Mobile multi-modal interaction method and device based on augmented reality - Google Patents
- Publication number: CN108334199A
- Application number: CN201810144421.XA
- Authority
- CN
- China
- Prior art keywords
- augmented reality
- modal
- gesture
- virtual
- interactive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
Disclosed are a mobile multi-modal interaction method and device based on augmented reality. The method includes the following steps: displaying a human-computer interaction interface by means of augmented reality, the augmented reality virtual scene containing interactive information such as virtual objects; the user issues interactive instructions by way of gesture and voice, and a multi-modal fusion method interprets the semantics of the different modalities and fuses the gesture and voice modal data to generate a multi-modal fused interactive instruction; after the user's interactive instruction takes effect, the execution result is returned to the augmented reality virtual scene, and information is fed back through changes in the scene. The device of the invention comprises a gesture sensor, a PC, a microphone, an optical see-through augmented reality display device, and a WiFi router. The invention provides a method and device combining augmented reality with multi-modal interaction, embodies a human-centered design that is natural and intuitive, reduces the learning burden, and improves interaction efficiency.
Description
Technical field
The present invention relates to the field of human-computer interaction technology, and in particular to a mobile multi-modal interaction method and device based on augmented reality.
Background technology
Augmented reality (AR) has, with the rapid development of computer technology in recent years, attracted huge attention in the consumer market; various products emerge one after another, starting a wave of visual revolution. Augmented reality is a technology that merges real scenes with virtual scenes; its purpose is to synthesize, through computer graphics and image processing techniques, the real scene (the real environment or the user's image) with the virtual scene (the computer-generated virtual environment or virtual objects).
Similarly, multi-modal human-computer interaction technology is widely studied in the current human-computer interaction field. Multi-modal interaction applies multiple natural interaction styles so that human perception channels are fully used; multiple interaction modalities cooperate in different interactive modes to achieve freer and more natural communication. Multi-modal interaction does not merely use multiple channels to complete tasks independently; rather, it integrates the interactive information from the user's different channels through multi-modal integration technology, forms the final interaction intention through the interplay between the channels, and completes the task correctly.
However, although current augmented reality offers a visual display mode different from tradition, provides more information in a holographic manner, and wearable AR devices have good mobility and portability, AR lacks a natural, intuitive, and efficient interaction style: interaction typically relies only on a controller or on simple voice or gesture commands, so the user experience is poor. Conversely, current multi-modal interaction can unify different sensory modalities to achieve natural, intuitive, and efficient interaction, but it is only applied to desktop devices and lacks good portability and mobility.
Summary of the invention
The purpose of the present invention is to solve the above drawbacks in the prior art by providing a mobile multi-modal interaction method and device based on augmented reality, which fuse multiple perception modalities and realize information feedback through augmented reality. The enhancement of augmented reality technology is organically combined with the portability and interactivity of multi-modal interaction, realizing a human-computer interaction style that is intuitive and natural, has a low learning burden and high interaction efficiency, and at the same time possesses portability and mobility.
According to the disclosed embodiments, a first aspect of the present invention discloses a mobile multi-modal interaction method based on augmented reality, the method comprising the following steps:
S1. Displaying a human-computer interaction interface, i.e. the interactive information of the augmented reality virtual scene, by means of augmented reality;
S2. The user interacts with the virtual interactive objects in the augmented reality virtual scene through the multi-modal interaction of gesture and voice;
S3. Through a multi-modal fusion method, interpreting the semantics of the different modalities and fusing the gesture and voice modal data to generate a multi-modal fused interactive instruction;
S4. After the user's interactive instruction takes effect, the execution result is returned to the augmented reality virtual scene, and information is fed back through changes in the scene.
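The four steps S1 to S4 can be sketched as one interaction cycle. The following Python sketch is illustrative only; the event structure, field names, and function names are assumptions introduced here for clarity, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class ModalEvent:
    channel: str   # "gesture" or "voice"
    payload: dict

def fuse(events):
    """Minimal stand-in for the multi-modal fusion of step S3: combine the
    object selected by gesture with the command spoken by voice."""
    gesture = next(e for e in events if e.channel == "gesture")
    voice = next(e for e in events if e.channel == "voice")
    return {"command": voice.payload["command"],
            "target": gesture.payload["target"]}

# S2: the user points at a virtual object and says "rotate".
events = [ModalEvent("gesture", {"target": "virtual_cube"}),
          ModalEvent("voice", {"command": "rotate"})]
# S3: fuse the two channels into one multi-modal instruction.
instruction = fuse(events)
# S4: the execution result would then be rendered back into the AR scene.
assert instruction == {"command": "rotate", "target": "virtual_cube"}
```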
Further, the augmented reality virtual scene described in step S1 includes virtual interactive objects and virtual information objects, wherein the virtual interactive objects possess multi-modal interaction capability and information expression capability, and the virtual information objects possess information expression capability.
Further, in step S2 interactive instructions are issued by gesture, wherein the interactive targets of the gestures are the virtual interactive objects in the augmented reality virtual scene, and the interaction modes include: clicking, dragging, or touching the virtual interactive objects.
Further, to realize in step S2 the user's interaction through gesture with the virtual interactive objects in the augmented reality virtual scene, registration between the gesture sensor coordinate system and the augmented reality virtual scene coordinate system must be realized to obtain the coordinate transformation relation between the two. The intrinsic and extrinsic parameters of the gesture sensor and the augmented reality display device are calculated using Zhang Zhengyou's calibration method, whose camera model is as follows:

s [u, v, 1]^T = K [R t] [X_w, Y_w, Z_w, 1]^T    (1)

where s is a scale factor, [u, v, 1]^T is the pixel-plane coordinate, [X_w, Y_w, Z_w, 1]^T is the coordinate of a point in the world coordinate system, [R t] is the extrinsic parameter matrix with rotation matrix R and translation vector t, and the superscript T denotes the matrix transpose. The intrinsic parameter matrix is

K = [[f/dx, 0, u_0], [0, f/dy, v_0], [0, 0, 1]],  K = K1 K2

where f is the focal length of the camera, [u_0, v_0]^T is the coordinate of the camera coordinate system origin in the image coordinate system, and dx and dy are the side lengths of a pixel in mm;
According to homography, the relationship between the image of the planar calibration board and the camera is as follows:

s [u, v, 1]^T = K [r1 r2 r3 t] [X_w, Y_w, Z_w, 1]^T    (2)

where r1, r2, r3 are the columns of the rotation matrix R in the x, y, z directions. Assuming that the points on the planar calibration board have Z coordinate 0 in the world coordinate system, the homography relation of formula (2) simplifies to:

s [u, v, 1]^T = K [r1 r2 t] [X_w, Y_w, 1]^T    (3)

where K [r1 r2 t] is the homography matrix H. Letting λ = 1/s, the above formula can be reduced to [u, v, 1]^T = λ H [X_w, Y_w, 1]^T, where:

H = [h1 h2 h3] = λ K [r1 r2 t]    (4)

According to the properties of the rotation matrix, the following constraints hold: r1^T r2 = 0 and ‖r1‖ = ‖r2‖ = 1. From formula (4) it follows that:

r1 = (1/λ) K^{-1} h1,  r2 = (1/λ) K^{-1} h2    (5)

Substituting formula (5) into the above constraints gives:

h1^T K^{-T} K^{-1} h2 = 0,  h1^T K^{-T} K^{-1} h1 = h2^T K^{-T} K^{-1} h2    (6)

That is, each homography matrix provides 2 equations, while the intrinsic matrix contains 5 parameters, so the solution requires at least 3 homography matrices. Pictures of three planar calibration boards are therefore needed to obtain three groups of formula (6) and compute the intrinsic parameters; the extrinsic parameters are then computed from the relation (5) between the intrinsic and extrinsic parameters.
Further, the multi-modal fusion method described in step S3 uses a task-oriented hierarchical fusion model.
Further, the realization process of the task-oriented hierarchical fusion model is as follows: the lexical layer unifies the input forms of the different channels, so that the same content from different channels is expressed by the same primitives; the syntax layer divides the primitive information from the lexical layer, according to the syntax specification, into primitives expressing commands, primitives expressing objects, and primitives expressing object attributes; the semantic layer uses a task-driven mechanism to finally combine the primitives into various specific tasks.
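Under the assumption that a primitive is a simple (kind, value) pair, the three layers described above can be sketched as follows; the primitive vocabulary, the slot names, and the mapping table are invented here for illustration and are not given in the disclosure:

```python
# Lexical layer: map device-specific inputs to a unified primitive vocabulary,
# so the same content from different channels yields the same primitive.
LEXICON = {("voice", "delete"): ("command", "DELETE"),
           ("voice", "red"): ("attribute", "RED"),
           ("gesture", "point:cube"): ("object", "CUBE")}

def lexical(channel, raw):
    return LEXICON[(channel, raw)]

# Syntax layer: sort primitives by kind (command / object / attribute).
def syntax(primitives):
    slots = {"command": None, "object": None, "attribute": None}
    for kind, value in primitives:
        slots[kind] = value
    return slots

# Semantic layer: task-driven slot filling; a task is complete once its
# required slots are filled.
def semantic(slots, required=("command", "object")):
    if all(slots[k] is not None for k in required):
        return {"task": slots["command"], "object": slots["object"],
                "attribute": slots["attribute"]}
    return None

inputs = [("voice", "delete"), ("gesture", "point:cube")]
prims = [lexical(ch, raw) for ch, raw in inputs]
task = semantic(syntax(prims))
assert task == {"task": "DELETE", "object": "CUBE", "attribute": None}
```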
Further, in step S4 the methods of information feedback through the augmented reality virtual scene include: displaying text and graphic information through the virtual information objects; and feedback through the state and appearance of the virtual interactive objects.
According to the disclosed embodiments, a second aspect of the present invention discloses a mobile multi-modal interaction device based on augmented reality, the device comprising a gesture sensor, a PC, a microphone, and an augmented reality display device, wherein:
the gesture sensor is mounted on the augmented reality display device by a support structure, its data interface is connected to the PC by a USB data line, and it is used to capture the hand position and posture of the operator;
the microphone is mounted on the augmented reality display device, its data interface is connected to the PC by a USB data line, and it is used to capture the voice control commands of the operator;
the augmented reality display device is used to render and display the augmented reality virtual scene; through augmented reality technology it superimposes the virtual scene on the real environment and provides auxiliary information that cannot be obtained from the real world, enhancing the user's ability to perceive and interact with the real world;
the PC is used to recognize the data from the gesture modality and the voice modality and to perform multi-modal fusion; the PC transmits the interaction result of the multi-modal fused instruction to the augmented reality display device through a wireless network, and the feedback of the interactive information is realized through the virtual objects in the augmented reality virtual scene.
Further, the multi-modal interaction device also includes a WiFi router, and the PC communicates wirelessly with the augmented reality display device through the WiFi router.
Further, the gesture sensor uses a Leap Motion, and the augmented reality display device uses a HoloLens.
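The patent only states that the PC sends the fused interaction result to the display device over the wireless network; the concrete wire format below (JSON over a TCP connection, with these field names) is an assumption sketched for illustration, using the loopback interface to stand in for the WiFi link:

```python
import json
import socket
import threading

def serve_once(sock, results):
    # "HoloLens" side: accept one connection and decode one JSON message.
    conn, _ = sock.accept()
    with conn:
        results.append(json.loads(conn.recv(4096).decode("utf-8")))

server = socket.socket()
server.bind(("127.0.0.1", 0))          # loopback stands in for the WiFi link
server.listen(1)
port = server.getsockname()[1]
received = []
t = threading.Thread(target=serve_once, args=(server, received))
t.start()

# PC side: send the fused instruction to the display device.
msg = {"command": "drag", "target": "virtual_valve", "position": [0.1, 0.2, 0.5]}
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(json.dumps(msg).encode("utf-8"))

t.join()
server.close()
assert received[0] == msg
```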
Compared with the prior art, the present invention has the following advantages and effects:
1. The present invention provides a mobile multi-modal interaction method based on augmented reality that can effectively organize the gesture modality and the voice modality; unlike conventional serial interaction, it realizes parallel and cooperative operation between different modalities, achieving a more natural and intuitive interaction style.
2. Through the registration of the coordinate systems of the gesture sensor and the augmented reality display device, direct interaction between gestures and the virtual scene can be realized without extra equipment such as controllers, achieving highly efficient interaction between the user and augmented reality.
3. The execution result of the multi-modal fused interaction is fed back into the virtual scene through augmented reality, using the enhancement of augmented reality technology to provide three-dimensional, intuitive feedback information.
4. The mobile multi-modal interaction device based on augmented reality provided by the invention uses the optical see-through augmented reality head-mounted display HoloLens and the gesture sensor Leap Motion, combined by a connecting mechanism, realizing the portability and mobility of the interaction device, so that the user can also interact normally outdoors and in mobile working environments.
Description of the drawings
Fig. 1 is an interaction flow chart of a mobile multi-modal interaction method based on augmented reality in an embodiment of the present invention;
Fig. 2 is an interaction sequence diagram of a specific implementation scenario in an embodiment of the present invention;
Fig. 3 is a structure chart of the hierarchical task model in an embodiment of the present invention;
Fig. 4 is a structure chart of a general task slot in an embodiment of the present invention;
Fig. 5 is a flow chart of the multi-modal fusion algorithm in an embodiment of the present invention;
Fig. 6 is a system architecture diagram of the hierarchical task model in an embodiment of the present invention;
Fig. 7 is a composition diagram of a mobile multi-modal interaction device based on augmented reality in an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the attached drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Embodiment
This embodiment provides a mobile multi-modal interaction method and device based on augmented reality that realize a human-computer interaction style that is natural and intuitive, has a low learning burden and high interaction efficiency, and at the same time possesses portability and mobility. A human-computer interaction interface is displayed by means of augmented reality, the augmented reality virtual scene containing interactive information such as virtual objects; the user issues interactive instructions by way of gesture and voice, and a multi-modal fusion method interprets the semantics of the different modalities and fuses the gesture and voice modal data to generate a multi-modal fused interactive instruction; after the user's interactive instruction takes effect, the execution result is returned to the augmented reality virtual scene, and information is fed back through changes in the scene.
As shown in Fig. 7, the mobile multi-modal interaction device based on augmented reality provided in this embodiment includes a gesture sensor, a PC, a microphone, an optical see-through augmented reality display device, and a WiFi router, wherein:
The gesture sensor (Leap Motion in Fig. 7) is mounted on the augmented reality display device by a support structure; its data interface is connected to the PC by a USB data line, and it captures the hand position and posture of the operator.
The PC is connected by USB data lines to the gesture sensor and the microphone; it recognizes the data from the gesture modality and the voice modality and performs multi-modal fusion; it transmits the interaction result of the multi-modal fused instruction through the wireless network to the optical see-through augmented reality display device, where the feedback of the interactive information is realized through the virtual objects in the augmented reality virtual scene; and it communicates with the augmented reality display device through the WiFi router.
The microphone is mounted on the augmented reality display device; its data interface is connected to the PC by a USB data line, and it captures the voice control commands of the operator.
The augmented reality display device (HoloLens in Fig. 7) is responsible for rendering and displaying the augmented reality virtual scene; through augmented reality technology it superimposes the virtual scene on the real environment and provides auxiliary information that cannot be obtained from the real world, enhancing the user's ability to perceive and interact with the real world.
The WiFi router provides the wireless network environment for the communication between the PC and the augmented reality display device.
According to its functional characteristics, the device can further be divided into the following function modules: an augmented reality display module, a gesture input module, a voice input module, and a multi-modal understanding and fusion module.
Augmented reality display module: responsible for rendering and displaying the augmented reality virtual scene, mainly composed of optical see-through augmented reality glasses; Microsoft's HoloLens device is used in this embodiment. Through augmented reality technology it superimposes the virtual scene on the real environment, provides auxiliary information that cannot be obtained from the real world, and enhances the user's ability to perceive and interact with the real world. It owns and maintains a virtual scene coordinate system, used for rendering and handling virtual scene objects. At the same time, because the mobile multi-modal interaction device based on augmented reality should have portability and mobility, the augmented reality display module uses a head-mounted augmented reality display device.
Gesture input module: responsible for the acquisition and further processing of gesture interaction data; this embodiment uses the Leap Motion gesture sensor. The gesture input module is based on binocular-camera depth imaging and realizes the following steps: collecting hand images; performing gesture segmentation on the images to separate the gesture from the image background; establishing a gesture model that describes the gesture through a series of parameters; and extracting gesture features, i.e. extracting the corresponding characteristic parameters from the gesture according to the established model. It owns a gesture input module coordinate system, used to describe gestures and gesture feature data. Because the mobile multi-modal interaction device based on augmented reality should have portability and mobility, the gesture sensor used by the gesture input module is mounted on the upper side of the head-mounted augmented reality display device of the augmented reality display module; its sensing range moves with the head-mounted augmented reality display device, ensuring normal operation while moving.
Voice input module: responsible for the acquisition and further processing of voice interaction data, mainly composed of a microphone. After preprocessing the voice with pre-emphasis, endpoint detection, and so on, the redundancy in the voice data is removed and features such as mel-frequency cepstral coefficients are extracted, with which a statistical model is trained to obtain a speech library; the recognition result is finally obtained through pattern matching. Likewise, to ensure the portability and mobility of the mobile multi-modal interaction device based on augmented reality, the microphone used by the voice input module is mounted on the augmented reality display device.
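The first two preprocessing steps named above, pre-emphasis and framing, can be sketched in a few lines of NumPy; the coefficient 0.97 and the 25 ms / 10 ms frame layout are conventional choices in speech processing, not values given in the patent:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before analysis.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame(signal, sample_rate, frame_ms=25, hop_ms=10):
    # Slice the signal into overlapping analysis frames.
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n)])

sr = 16000
t = np.arange(sr) / sr                  # one second of a 440 Hz test tone
x = np.sin(2 * np.pi * 440 * t)
frames = frame(pre_emphasis(x), sr)
assert frames.shape == (98, 400)        # 98 frames of 400 samples (25 ms)
```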
Multi-modal understanding and fusion module: the multi-modal understanding part is responsible for the unified processing of the raw information from the device layer, giving identical information the same representation even when its input forms differ, and providing the syntax layer with device-independent information, i.e. interaction primitives. The multi-modal fusion part is based on the hierarchical task model: according to the task planned by the user, it fills the interaction primitives of the different channels into the corresponding task slots and finally fuses them into the goal task. The multi-modal understanding and fusion module is mainly realized on the PC.
The workflow of the mobile multi-modal interaction method based on augmented reality, built on the above device and function modules, is shown in Fig. 1 and includes the following steps:
S1. Displaying a human-computer interaction interface by means of augmented reality, the augmented reality virtual scene containing interactive information such as virtual objects.
In step S1, the augmented reality display module displays the information in the form of the virtual interactive objects and virtual information objects of the virtual scene.
S2. The user interacts with the virtual interactive objects in the augmented reality virtual scene through the multi-modal interaction of gesture and voice.
In this step the gesture input module and the voice input module interact with the virtual interactive objects in the augmented reality virtual scene in a multi-channel interaction manner.
In step S2, for the user's gesture to interact directly with the virtual interactive objects in the virtual scene, the gesture data must be converted from the coordinate system of the gesture input module to the augmented reality virtual scene coordinate system. Realizing the user's interaction through gesture with the virtual interactive objects in the augmented reality virtual scene requires the registration between the gesture sensor coordinate system and the augmented reality virtual scene coordinate system, obtaining the coordinate transformation relation between the two.
In this embodiment, the intrinsic and extrinsic parameters of the gesture input module and the augmented reality display module are calculated using Zhang Zhengyou's calibration method, whose camera model is as follows:

s [u, v, 1]^T = K [R t] [X_w, Y_w, Z_w, 1]^T    (1)

where s is a scale factor, [u, v, 1]^T is the pixel-plane coordinate, [X_w, Y_w, Z_w, 1]^T is the coordinate of a point in the world coordinate system, [R t] is the extrinsic parameter matrix with rotation matrix R and translation vector t, and the superscript T denotes the matrix transpose. The intrinsic parameter matrix is

K = [[f/dx, 0, u_0], [0, f/dy, v_0], [0, 0, 1]],  K = K1 K2

where f is the focal length of the camera, [u_0, v_0]^T is the coordinate of the camera coordinate system origin in the image coordinate system, and dx and dy are the side lengths of a pixel in mm.
Since Zhang Zhengyou's calibration method is based on a planar checkerboard, the calibration transformation is a projective mapping from one plane to another plane, i.e. a homography.
According to homography, the relationship between the image of the planar calibration board and the camera is as follows:

s [u, v, 1]^T = K [r1 r2 r3 t] [X_w, Y_w, Z_w, 1]^T    (2)

where r1, r2, r3 are the columns of the rotation matrix R in the x, y, z directions. Assuming that the points on the planar calibration board have Z coordinate 0 in the world coordinate system, the homography relation of formula (2) simplifies to:

s [u, v, 1]^T = K [r1 r2 t] [X_w, Y_w, 1]^T    (3)

where K [r1 r2 t] is the homography matrix H. Letting λ = 1/s, the above formula can be simplified to [u, v, 1]^T = λ H [X_w, Y_w, 1]^T, where:

H = [h1 h2 h3] = λ K [r1 r2 t]    (4)

According to the properties of the rotation matrix, the following constraints are easily obtained: r1^T r2 = 0 and ‖r1‖ = ‖r2‖ = 1. From formula (4) it is readily seen that:

r1 = (1/λ) K^{-1} h1,  r2 = (1/λ) K^{-1} h2    (5)

Substituting formula (5) into the above constraints gives:

h1^T K^{-T} K^{-1} h2 = 0,  h1^T K^{-T} K^{-1} h1 = h2^T K^{-T} K^{-1} h2    (6)

That is, each homography matrix provides 2 equations, while the intrinsic matrix contains 5 parameters, so the solution requires at least 3 homography matrices. Pictures of three planar calibration boards are therefore needed to compute the intrinsic parameters; the extrinsic parameters are then computed from the relation between the intrinsic and extrinsic parameters.
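The homography constraints above can be verified numerically. The sketch below builds a synthetic calibration-board view from an assumed intrinsic matrix K and assumed extrinsics (the numeric values are illustrative only), forms H = K [r1 r2 t] as in formula (4) with λ = 1, and checks both constraints of formula (6):

```python
import numpy as np

# Hypothetical intrinsics (f/dx, f/dy, principal point); illustrative values.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def rodrigues(axis, theta):
    # Rotation matrix from an axis-angle pair (Rodrigues formula).
    axis = axis / np.linalg.norm(axis)
    Kx = np.array([[0.0, -axis[2], axis[1]],
                   [axis[2], 0.0, -axis[0]],
                   [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

# Extrinsics of one calibration-board view (arbitrary illustrative values).
R = rodrigues(np.array([1.0, 2.0, 0.5]), 0.7)
t = np.array([0.1, -0.2, 1.5])

# Homography of the board plane Z_w = 0: H = K [r1 r2 t] (formula (4), λ = 1).
H = K @ np.column_stack([R[:, 0], R[:, 1], t])
h1, h2 = H[:, 0], H[:, 1]

# Constraints of formula (6), with B = K^{-T} K^{-1}.
B = np.linalg.inv(K).T @ np.linalg.inv(K)
assert abs(h1 @ B @ h2) < 1e-8                  # r1 ⊥ r2
assert abs(h1 @ B @ h1 - h2 @ B @ h2) < 1e-8    # ‖r1‖ = ‖r2‖
```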
Obtain after joining outside the interior participation of augmented reality display module and gesture sensor, you can with both calculate coordinate system it
Between transformational relation.
Let Pvs be the spatial coordinate of a point in the augmented reality display module coordinate system, pvs the projection of that point onto the image plane, and Hvs the intrinsic matrix of the augmented reality display module. From the pinhole camera model:
pvs = Hvs·Pvs (7)
Similarly:
ph = Hh·Ph (8)
wherein Ph is the spatial coordinate of a point in the gesture sensor coordinate system, ph the projection of that point onto the image plane, and Hh the intrinsic matrix of the gesture sensor. Suppose pvs and ph are projections of the same point in space; the transformation of this point between the two coordinate systems is then a rotation plus a translation, denoted Pvs = R·Ph + T, where R is the rotation and T the translation. At the same time, Pvs and Ph of this point can be expressed through the global coordinate system (the calibration board coordinate system) as Pvs = Rvs·P + Tvs and Ph = Rh·P + Th, where Rvs, Rh, Tvs and Th are the rotations and translations from the global coordinate system to the augmented reality display device camera and to the gesture sensor respectively; their values are obtained from the calibrated extrinsic matrices. Eliminating P from the two formulas gives:
Pvs = Rvs·Rh^-1·(Ph - Th) + Tvs (9)
Comparing with the target transformation relation Pvs = R·Ph + T yields:
R = Rvs·Rh^-1, T = Tvs - Rvs·Rh^-1·Th (10)
With the transformation relation of formula (10), coordinates can finally be transformed from the gesture sensor coordinate system into the augmented reality display device camera coordinate system, realizing the registration of the two coordinate systems.
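The registration above can be sketched in a few lines of numpy. The composition follows formula (10); `register` and `sensor_to_camera` are hypothetical helper names, not part of the patent's implementation.

```python
import numpy as np

def register(R_vs, T_vs, R_h, T_h):
    """Compose the gesture-sensor -> AR-camera transform of formula (10)
    from the two calibrated extrinsics (global -> camera, global -> sensor)."""
    R = R_vs @ R_h.T        # R_h is a rotation matrix, so R_h^-1 = R_h^T
    T = T_vs - R @ T_h
    return R, T

def sensor_to_camera(P_h, R, T):
    """Map a point from gesture sensor coordinates to AR display camera
    coordinates: Pvs = R Ph + T."""
    return R @ P_h + T
```

As a sanity check, a point P in the calibration board frame satisfies sensor_to_camera(R_h·P + T_h) = R_vs·P + T_vs, i.e. both routes into the camera frame agree.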
S3: through the multi-modal fusion method, the semantics of the different modalities are understood, and the modal data of gesture and voice are fused to generate a multi-modal fusion interactive instruction.
In this step, the raw data of the gesture input module and the voice input module are transferred to the multi-modal understanding and fusion module for processing and fusion, and the user's interactive task is generated from the multi-channel information.
In step S3, the data coming from the different input modules must be processed and converted into interaction primitives; the primitives are then classified and combined according to their semantics to form the final interactive task.
At the device layer, the output of the voice input module's speech recognition is a character string, while the device-layer output of the gesture input module is coordinate and click information. After multi-modal understanding, the information of the different modalities is represented by a common data structure, forming interaction primitives.
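One way such a common data structure could look is sketched below. The field names and categories are illustrative assumptions, not taken from the patent; the point is only that a recognized word and a gesture click end up in the same primitive shape.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple
import time

@dataclass
class Primitive:
    """A unified interaction primitive; field names are illustrative."""
    modality: str    # "speech" or "gesture"
    category: str    # e.g. "command", "object" or "attribute"
    value: str       # recognized word or gesture label
    position: Optional[Tuple[float, float, float]] = None  # gesture coordinate
    timestamp: float = field(default_factory=time.time)

# Device-layer outputs of both channels mapped to the common structure:
speech = Primitive(modality="speech", category="command", value="measure")
click = Primitive(modality="gesture", category="object", value="click",
                  position=(0.12, -0.04, 0.35))
```

The timestamp field matters later, since slot filling uses temporal correlation between the channels.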
As shown in Fig. 3, the multi-modal fusion part uses a layered task model. With tasks as a bridge, the task model lets the person organize the computer's behavior organically in the role of a task planner, turning the computer's arbitrary functions into means of achieving goals; put simply, the person's intention is conveyed to the computer in the form of tasks. The layering idea abstracts the modal information step by step, from concrete device information up to the final semantics to be filled, in four layers: the device layer, the lexical layer, the syntax layer and the semantic layer.
Since a task-oriented multi-modal fusion model is used, a certain structure must be defined to describe interactive tasks. As shown in Fig. 4, a general task structure consists of a task action and a series of task parameters; in this embodiment, according to the specific implementation scenario, the task structure is defined in the form of a task action, object attribute structures and parameters, and such a task structure is called a task slot. The object attribute structure refers to the object to be interacted with: in the interaction of this embodiment, in the voice input "measure the distance from here to there", "here" and "there" are primitives representing object attribute structures, used to indicate the position information of objects. The task action is the core of the task structure, connecting the different object attributes and parameters; in this embodiment the "measure" of the voice input is the task action. Parameters are the information required by the task action: for example, "measure" in this embodiment needs the "distance" parameter as a supplement, and different tasks can be combined by changing the parameter. It should be noted that different task actions may correspond to different task structures: "measure the distance from here to there" in this embodiment requires filling one task action, two object attribute structures and one parameter, while the task "mark this position" only requires filling one task action and one object attribute structure. Whether a task should be submitted for execution can be judged from the occupancy state of the task slot: if the task slot still lacks information, further user input is awaited; if the task slot is completely filled, it is immediately submitted for interpretation and execution.
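The task slot described above can be sketched as a small data structure. This is an illustrative sketch, not the patent's data structure; the slot shapes mirror the two example tasks in the text, and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskSlot:
    """Task slot = task action + object-attribute slots + parameter slots."""
    action: str
    objects_needed: int
    params_needed: int
    objects: List[str] = field(default_factory=list)
    params: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Occupancy check: submit for execution only when every slot is filled."""
        return (len(self.objects) == self.objects_needed
                and len(self.params) == self.params_needed)

# "measure the distance from here to there": 1 action, 2 objects, 1 parameter
measure = TaskSlot("measure", objects_needed=2, params_needed=1)
measure.objects += ["here", "there"]
measure.params.append("distance")

# "mark this position": 1 action, 1 object, no parameter
mark = TaskSlot("mark", objects_needed=1, params_needed=0)
mark.objects.append("this position")
```

An incompletely filled slot simply reports `is_complete() == False`, which corresponds to waiting for further user input.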
Referring to Fig. 5, which shows the multi-modal fusion algorithm flow, the steps include:
Step S31: classify the received input event; if a task keyword is extracted, go to step S32, otherwise fill the event into the speech event queue or the gesture event queue according to its input channel, forming the parameter stack;
Step S32: generate the corresponding task slot according to the task keyword and put it into the interaction context;
Step S33: fill the task slot according to the syntax rules of the task, the temporal correlation, the current interaction context and the constraints in the event queues, and judge whether filling is complete; if complete, submit it for interpretation and execution, otherwise return to waiting for further filling;
Step S34: interpret and execute the task, and clear the context; if interpretation fails and the task cannot be executed, throw an exception and record the context.
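The flow of steps S31 to S34 can be sketched as an event loop. This is a simplified illustration under stated assumptions, not the patent's algorithm: the event format (`channel`/`kind`/`value` dicts), the task keyword table, and the convention that objects are filled from the gesture queue and parameters from the speech queue are all invented for the sketch.

```python
from collections import deque

# Illustrative task-keyword table (slot shapes as in the two example tasks)
TASK_KEYWORDS = {"measure": {"objects": 2, "params": 1},
                 "mark":    {"objects": 1, "params": 0}}

def fuse(events):
    """Sketch of S31-S34: classify events, open a task slot on a task
    keyword, fill it from the event queues, execute when full."""
    speech_q, gesture_q = deque(), deque()
    context, executed = None, []
    for ev in events:
        # S31: classify; queue non-keyword events by input channel
        if ev["kind"] == "task_keyword":
            # S32: create the task slot and put it in the interaction context
            need = TASK_KEYWORDS[ev["value"]]
            context = {"action": ev["value"], "objects": [], "params": [],
                       "need": need}
        elif ev["channel"] == "speech":
            speech_q.append(ev)
        else:
            gesture_q.append(ev)
        # S33: fill the current slot from the queued events
        if context is not None:
            while gesture_q and len(context["objects"]) < context["need"]["objects"]:
                context["objects"].append(gesture_q.popleft()["value"])
            while speech_q and len(context["params"]) < context["need"]["params"]:
                context["params"].append(speech_q.popleft()["value"])
            if (len(context["objects"]) == context["need"]["objects"]
                    and len(context["params"]) == context["need"]["params"]):
                executed.append(context)  # S34: interpret/execute, clear context
                context = None
    return executed
```

Feeding in the keyword "measure", one spoken parameter and two gesture clicks yields a single completed task; with only one click the slot stays open, waiting for further filling.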
Referring to Fig. 6, which shows the architecture of the layered task model used in this embodiment, the structure includes: an event management subsystem, responsible for receiving and managing the input information of the input modules; a working event queue, responsible for the events input by the user; an event-action translation table, responsible for mapping events in the working event queue to actions; an event-parameter translation table; an event-object-attribute translation table; the interaction context, responsible for the current and historical task slot information; and a command integrator, responsible for integrating filled task slots into tasks.
S4: after the user's interactive instruction takes effect, the execution result is returned to the augmented reality virtual scene, and information feedback is given through changes of the scene.
In order to better describe the mobile multi-modal interaction method based on augmented reality proposed by the present invention, the steps of the method in a specific application scenario are introduced below. Fig. 2 shows the interaction flow of a specific implementation scenario of this embodiment; the scenario describes obtaining the distance between two points on a map by means of the interaction method of the present invention. Its main steps include:
Step R1: the augmented reality display module displays information in the form of the virtual interactive objects and virtual information objects of the virtual scene. Specifically, the augmented reality virtual scene of this embodiment's implementation scenario comprises a virtual map and an information overview. The virtual map is a virtual interactive object with which the user can interact through the multi-modal interaction modes, while the information overview is a virtual information object used to display the area covered by the map display region, the weather, the traffic conditions and so on;
Step R2: the user says by voice: "measure the distance from here to there", while clicking with the hand two positions on the virtual map of the virtual scene, as start point and end point respectively;
Step R3: the voice input module and the gesture input module pass the user's voice and gesture raw data of step R2 to the multi-modal understanding and fusion module, which converts the raw data into interaction primitives, divides them into different categories of primitives according to the syntax rules, and finally combines the different primitives using the task-driven mechanism to form the final interactive task, which is executed by the system;
Step R4: after the system completes the distance measurement task, the measured distance is returned to the virtual scene. Specifically, the information feedback can be displayed as a virtual information object in text form, or fed back to the user by voice broadcast.
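For the distance measurement scenario of steps R2 to R4, the executed task itself reduces to a Euclidean distance between the two clicked map positions plus a feedback string. The helpers below are hypothetical, as is the `scale` parameter converting map units to meters; the patent does not specify this computation.

```python
import math

def measure_distance(p1, p2):
    """Straight-line distance between the two clicked map positions
    (start and end point of step R2)."""
    return math.dist(p1, p2)

def feedback_text(p1, p2, scale=1.0):
    """Format the result for display by a virtual information object
    (text feedback of step R4)."""
    return f"distance: {measure_distance(p1, p2) * scale:.2f} m"
```

The same string could equally be handed to a text-to-speech engine for the voice broadcast variant of the feedback.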
The mobile multi-modal interaction method based on augmented reality proposed in this embodiment comprises the following steps: the human-computer interaction interface is displayed by means of augmented reality, the augmented reality virtual scene including virtual objects and other interactive information; the user sends interactive instructions by gesture and voice; through the multi-modal fusion method, the semantics of the different modalities are understood and the modal data of gesture and voice are fused, generating a multi-modal fusion interactive instruction; after the user's interactive instruction takes effect, the execution result is returned to the augmented reality virtual scene and information feedback is given through changes of the scene. The mobile multi-modal interaction device based on augmented reality proposed in this embodiment comprises a gesture sensor, a PC, a microphone, an optical see-through augmented reality display device and a WiFi router.
In this embodiment, augmented reality displays information in the form of the virtual interactive objects and virtual information objects of the virtual scene. Using the multi-channel interaction modes, the user interacts with the virtual interactive objects in the augmented reality virtual scene through the gesture input module and the voice input module. From the raw data generated by the interaction, the multi-modal understanding and fusion module realizes the fusion of the multi-modal interaction channels and generates the user's interactive task. After the system has executed the interactive task, the execution result is returned to the augmented reality virtual scene, realizing interactive feedback. The information enhancement of augmented reality extends the dimensions in which the user obtains information, while the multi-modal fusion technology realizes the parallel and cooperative use of multiple interaction channels, providing an integrated and flexible natural interaction style. Relying on the enhancement of augmented reality and the interactivity of multi-modal interaction, a natural, intuitive and efficient interaction mode is provided.
At the same time, since the mobile multi-modal interaction device based on augmented reality provided by the invention combines the optical see-through augmented reality head-mounted display HoloLens and the gesture sensor Leap Motion through a connecting mechanism, the portability and mobility of the interactive device are realized, so that the user can interact normally even outdoors and in mobile working environments; good portability and mobility are thus provided.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by the above embodiment. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A mobile multi-modal interaction method based on augmented reality, characterized in that the multi-modal interaction method comprises the following steps:
S1: displaying the human-computer interaction interface and the interactive information of the augmented reality virtual scene by means of augmented reality;
S2: the user interacting with the virtual interactive objects in the augmented reality virtual scene through the multi-modal interaction modes of gesture and voice;
S3: through a multi-modal fusion method, understanding the semantics of the different modalities, and fusing the modal data of gesture and voice to generate a multi-modal fusion interactive instruction;
S4: after the user's interactive instruction takes effect, returning the execution result to the augmented reality virtual scene, and giving information feedback through changes of the scene.
2. The mobile multi-modal interaction method based on augmented reality according to claim 1, characterized in that the augmented reality virtual scene in step S1 comprises virtual interactive objects and virtual information objects, wherein the virtual interactive objects possess multi-modal interaction capability and information expression capability, and the virtual information objects possess information expression capability.
3. The mobile multi-modal interaction method based on augmented reality according to claim 1, characterized in that in step S2 interactive instructions are sent by gesture, wherein the interactive objects of the gestures are the virtual interactive objects in the augmented reality virtual scene, and the interaction modes comprise: clicking, dragging or touching the virtual interactive objects.
4. The mobile multi-modal interaction method based on augmented reality according to claim 3, characterized in that, in step S2, realizing user interaction with the virtual interactive objects in the augmented reality virtual scene by gesture requires registration between the gesture sensor coordinate system and the augmented reality virtual scene coordinate system to obtain the coordinate transformation relation between the two; the intrinsic and extrinsic parameters of the gesture sensor and the augmented reality display device are computed by Zhang Zhengyou's calibration method, whose camera model is as follows:
s·[u, v, 1]^T = K·[R t]·[Xw, Yw, Zw, 1]^T (1)
wherein s is a scale factor, [u, v, 1]^T is a pixel plane coordinate, [Xw, Yw, Zw, 1]^T is a coordinate point of the world coordinate system, [R t] is the extrinsic parameter matrix, in which R is the rotation matrix and t the translation vector (the superscript T denotes matrix transposition), and K is the intrinsic parameter matrix, determined by the focal length f of the camera, the coordinate [u0, v0]^T of the camera coordinate system origin in the image coordinate system, and the pixel side lengths dx and dy (in mm), with K = K1K2;
According to the homography, the relationship between the image obtained from the plane calibration board and the camera is as follows:
s·[u, v, 1]^T = K·[r1 r2 r3 t]·[Xw, Yw, Zw, 1]^T (2)
wherein r1, r2, r3 are the columns of the rotation matrix R in the x, y, z directions; assuming that the points on the plane calibration board have Z coordinate 0 in the world coordinate system, the homography relationship of formula (2) simplifies to:
s·[u, v, 1]^T = K·[r1 r2 t]·[Xw, Yw, 1]^T (3)
wherein K·[r1 r2 t] is the homography matrix H, so that the formula above can be reduced to p = H·P, where:
H = [h1 h2 h3] = λK·[r1 r2 t] (4)
According to the properties of the rotation matrix, the following constraints hold: r1^T·r2 = 0 and ‖r1‖ = ‖r2‖ = 1; from formula (4):
r1 = λ^-1·K^-1·h1, r2 = λ^-1·K^-1·h2 (5)
Substituting formula (5) into the above constraints gives:
h1^T·K^-T·K^-1·h2 = 0, h1^T·K^-T·K^-1·h1 = h2^T·K^-T·K^-1·h2 (6)
That is, each homography matrix provides 2 equations while the intrinsic matrix contains 5 parameters, so at least 3 homography matrices are required for the solution; images of three plane calibration boards are therefore needed, giving three groups of formula (6) from which the intrinsic parameters are computed, after which the extrinsic parameters are computed from formula (5) and the relationship between the intrinsic and extrinsic parameters.
5. The mobile multi-modal interaction method based on augmented reality according to claim 1, characterized in that the multi-modal fusion method in step S3 uses a task-oriented hierarchical fusion model.
6. The mobile multi-modal interaction method based on augmented reality according to claim 5, characterized in that the implementation process of the task-oriented hierarchical fusion model is as follows: the input forms of the different channels are unified at the lexical layer, so that the same content in different channels is expressed by the same primitive; the primitive information from the lexical layer is divided, according to the syntax rules, into primitives representing commands, primitives representing objects and primitives representing object attributes; and the semantic layer finally combines the primitives into various specific tasks by means of the task-driven mechanism.
7. The mobile multi-modal interaction method based on augmented reality according to claim 1, characterized in that the methods of information feedback through the augmented reality virtual scene in step S4 comprise: displaying text and graphic information through the virtual information objects; and feedback through the state and appearance of the virtual interactive objects.
8. A mobile multi-modal interaction device based on augmented reality, characterized in that the multi-modal interaction device comprises a gesture sensor, a PC, a microphone and an augmented reality display device, wherein the gesture sensor is mounted on the augmented reality display device by a supporting structure, its data interface being connected to the PC by a USB data line, for capturing the operator's hand position and posture;
the microphone is mounted on the augmented reality display device, its data interface being connected to the PC by a USB data line, for capturing the operator's voice control commands;
the augmented reality display device is used for rendering and displaying the augmented reality virtual scene; by superimposing the virtual scene on the real environment through augmented reality, it provides auxiliary information that cannot be obtained from the real world, enhancing the user's ability to perceive the real world and to interact with it;
the PC is used for recognizing the data from the gesture modality and the voice modality and performing multi-modal fusion; the PC transmits the interaction results of the multi-modal fusion instructions to the augmented reality display device over a wireless network, and feedback of the interactive information is realized through the virtual objects in the augmented reality virtual scene.
9. The mobile multi-modal interaction device based on augmented reality according to claim 8, characterized in that the multi-modal interaction device further comprises a WiFi router, and wireless communication between the PC and the augmented reality display device is carried out through the WiFi router.
10. The mobile multi-modal interaction device based on augmented reality according to claim 8, characterized in that the gesture sensor uses a Leap Motion and the augmented reality display device uses a HoloLens.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810144421.XA CN108334199A (en) | 2018-02-12 | 2018-02-12 | The multi-modal exchange method of movable type based on augmented reality and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108334199A true CN108334199A (en) | 2018-07-27 |
Family
ID=62929256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810144421.XA Pending CN108334199A (en) | 2018-02-12 | 2018-02-12 | The multi-modal exchange method of movable type based on augmented reality and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334199A (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214006A (en) * | 2018-09-18 | 2019-01-15 | 中国科学技术大学 | The natural language inference method that the hierarchical semantic of image enhancement indicates |
CN109343703A (en) * | 2018-09-10 | 2019-02-15 | 中国科学院计算机网络信息中心 | Information processing method, device, system, storage medium and processor |
CN109395375A (en) * | 2018-09-18 | 2019-03-01 | 华南理工大学 | A kind of 3d gaming method of interface interacted based on augmented reality and movement |
CN109461351A (en) * | 2018-09-28 | 2019-03-12 | 中国科学院苏州生物医学工程技术研究所 | The augmented reality game training system of three screens interaction |
CN109766003A (en) * | 2018-12-29 | 2019-05-17 | 北京诺亦腾科技有限公司 | Object methods of exhibiting and device in a kind of VR scene |
CN109782907A (en) * | 2018-12-28 | 2019-05-21 | 西安交通大学 | A kind of virtual filling coorinated training system based on polyhybird real world devices |
CN109976519A (en) * | 2019-03-14 | 2019-07-05 | 浙江工业大学 | A kind of interactive display unit and its interactive display method based on augmented reality |
CN110109541A (en) * | 2019-04-25 | 2019-08-09 | 广州智伴人工智能科技有限公司 | A kind of method of multi-modal interaction |
CN110286762A (en) * | 2019-06-21 | 2019-09-27 | 济南大学 | A kind of Virtual Experiment Platform Based with multi-modal information processing function |
CN110288016A (en) * | 2019-06-21 | 2019-09-27 | 济南大学 | The multi-modal intention fusion method of one kind and application |
CN110471531A (en) * | 2019-08-14 | 2019-11-19 | 上海乂学教育科技有限公司 | Multi-modal interactive system and method in virtual reality |
CN111104470A (en) * | 2019-11-19 | 2020-05-05 | 青岛海信网络科技股份有限公司 | Method and system for linkage of electronic sand table and emergency platform |
CN111124116A (en) * | 2019-12-18 | 2020-05-08 | 佛山科学技术学院 | Method and system for interacting with remote object in virtual reality |
CN111124236A (en) * | 2018-10-30 | 2020-05-08 | 阿里巴巴集团控股有限公司 | Data processing method, device and machine readable medium |
CN111167115A (en) * | 2018-11-09 | 2020-05-19 | 致伸科技股份有限公司 | Interactive game system |
CN111367407A (en) * | 2020-02-24 | 2020-07-03 | Oppo(重庆)智能科技有限公司 | Intelligent glasses interaction method, intelligent glasses interaction device and intelligent glasses |
CN111724485A (en) * | 2020-06-11 | 2020-09-29 | 浙江商汤科技开发有限公司 | Method, device, electronic equipment and storage medium for realizing virtual-real fusion |
CN111832656A (en) * | 2020-07-17 | 2020-10-27 | 复旦大学 | Medical human-computer interaction assistance system and computer-readable storage medium containing the same |
CN112000219A (en) * | 2020-03-30 | 2020-11-27 | 华南理工大学 | Movable gesture interaction device and method for augmented reality game |
CN112069834A (en) * | 2020-09-02 | 2020-12-11 | 中国航空无线电电子研究所 | Integration method of multi-channel control instruction |
CN112148120A (en) * | 2020-08-18 | 2020-12-29 | 华为技术有限公司 | Method, equipment and storage medium for displaying virtual interface |
CN112486322A (en) * | 2020-12-07 | 2021-03-12 | 济南浪潮高新科技投资发展有限公司 | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition |
CN112527112A (en) * | 2020-12-08 | 2021-03-19 | 中国空气动力研究与发展中心计算空气动力研究所 | Multi-channel immersive flow field visualization man-machine interaction method |
CN113476835A (en) * | 2020-10-22 | 2021-10-08 | 青岛海信电子产业控股股份有限公司 | Picture display method and device |
CN113589929A (en) * | 2021-07-29 | 2021-11-02 | 和舆图(北京)科技有限公司 | Spatial distance measuring method and system based on HoloLens equipment |
US11176910B2 (en) | 2018-08-22 | 2021-11-16 | Google Llc | Smartphone providing radar-based proxemic context |
CN113703583A (en) * | 2021-09-08 | 2021-11-26 | 厦门元馨智能科技有限公司 | Multi-mode cross fusion virtual image fusion system, method and device |
US11204694B2 (en) | 2018-08-24 | 2021-12-21 | Google Llc | Radar system facilitating ease and accuracy of user interactions with a user interface |
CN114167994A (en) * | 2022-02-11 | 2022-03-11 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
US11314312B2 (en) | 2018-10-22 | 2022-04-26 | Google Llc | Smartphone-based radar system for determining user intention in a lower-power mode |
CN114454814A (en) * | 2022-01-26 | 2022-05-10 | 深圳时空科技集团有限公司 | Human-computer interaction method and device for augmented reality |
CN114842146A (en) * | 2022-05-10 | 2022-08-02 | 中国民用航空飞行学院 | Civil aviation engine maintenance manual and work card modeling method and storable medium |
US11435468B2 (en) | 2018-08-22 | 2022-09-06 | Google Llc | Radar-based gesture enhancement for voice interfaces |
WO2022183775A1 (en) * | 2021-03-05 | 2022-09-09 | 华中师范大学 | Method for fusing multiple locomotion mechanisms in hybrid reinforcement teaching scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102339129A (en) * | 2011-09-19 | 2012-02-01 | 北京航空航天大学 | Multichannel human-computer interaction method based on voice and gestures |
CN104615243A (en) * | 2015-01-15 | 2015-05-13 | 深圳市掌网立体时代视讯技术有限公司 | Head-wearable type multi-channel interaction system and multi-channel interaction method |
CN106997236A (en) * | 2016-01-25 | 2017-08-01 | 亮风台(上海)信息科技有限公司 | Based on the multi-modal method and apparatus for inputting and interacting |
CN107194972A (en) * | 2017-05-16 | 2017-09-22 | 成都通甲优博科技有限责任公司 | A kind of camera marking method and system |
Non-Patent Citations (1)
Title |
---|
Zhang Yi, "Research on Augmented Reality Assembly Technology Based on Image Features", China Master's Theses Full-text Database, Information Science and Technology Series *
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11435468B2 (en) | 2018-08-22 | 2022-09-06 | Google Llc | Radar-based gesture enhancement for voice interfaces |
US11176910B2 (en) | 2018-08-22 | 2021-11-16 | Google Llc | Smartphone providing radar-based proxemic context |
US11204694B2 (en) | 2018-08-24 | 2021-12-21 | Google Llc | Radar system facilitating ease and accuracy of user interactions with a user interface |
CN109343703A (en) * | 2018-09-10 | 2019-02-15 | 中国科学院计算机网络信息中心 | Information processing method, device, system, storage medium and processor |
CN109395375A (en) * | 2018-09-18 | 2019-03-01 | 华南理工大学 | A kind of 3d gaming method of interface interacted based on augmented reality and movement |
CN109214006A (en) * | 2018-09-18 | 2019-01-15 | 中国科学技术大学 | The natural language inference method that the hierarchical semantic of image enhancement indicates |
CN109461351A (en) * | 2018-09-28 | 2019-03-12 | 中国科学院苏州生物医学工程技术研究所 | The augmented reality game training system of three screens interaction |
CN109461351B (en) * | 2018-09-28 | 2021-04-02 | 中国科学院苏州生物医学工程技术研究所 | Three-screen interactive augmented reality game training system |
US11314312B2 (en) | 2018-10-22 | 2022-04-26 | Google Llc | Smartphone-based radar system for determining user intention in a lower-power mode |
CN111124236B (en) * | 2018-10-30 | 2023-04-28 | 斑马智行网络(香港)有限公司 | Data processing method, device and machine-readable medium |
CN111124236A (en) * | 2018-10-30 | 2020-05-08 | 阿里巴巴集团控股有限公司 | Data processing method, device and machine readable medium |
CN111167115A (en) * | 2018-11-09 | 2020-05-19 | 致伸科技股份有限公司 | Interactive game system |
CN109782907A (en) * | 2018-12-28 | 2019-05-21 | 西安交通大学 | A kind of virtual filling coorinated training system based on polyhybird real world devices |
CN109766003A (en) * | 2018-12-29 | 2019-05-17 | 北京诺亦腾科技有限公司 | Object methods of exhibiting and device in a kind of VR scene |
CN109976519A (en) * | 2019-03-14 | 2019-07-05 | 浙江工业大学 | A kind of interactive display unit and its interactive display method based on augmented reality |
CN109976519B (en) * | 2019-03-14 | 2022-05-03 | 浙江工业大学 | Interactive display device based on augmented reality and interactive display method thereof |
CN110109541B (en) * | 2019-04-25 | 2022-04-05 | 广州智伴人工智能科技有限公司 | Multi-modal interaction method |
CN110109541A (en) * | 2019-04-25 | 2019-08-09 | 广州智伴人工智能科技有限公司 | A kind of method of multi-modal interaction |
CN110286762B (en) * | 2019-06-21 | 2022-11-04 | 济南大学 | Virtual experiment platform with multi-mode information processing function |
CN110288016A (en) * | 2019-06-21 | 2019-09-27 | 济南大学 | The multi-modal intention fusion method of one kind and application |
CN110286762A (en) * | 2019-06-21 | 2019-09-27 | 济南大学 | A kind of Virtual Experiment Platform Based with multi-modal information processing function |
CN110288016B (en) * | 2019-06-21 | 2021-09-28 | 济南大学 | Multi-modal intention fusion method and application |
CN110471531A (en) * | 2019-08-14 | 2019-11-19 | 上海乂学教育科技有限公司 | Multi-modal interactive system and method in virtual reality |
CN111104470A (en) * | 2019-11-19 | 2020-05-05 | 青岛海信网络科技股份有限公司 | Method and system for linkage of electronic sand table and emergency platform |
CN111124116A (en) * | 2019-12-18 | 2020-05-08 | 佛山科学技术学院 | Method and system for interacting with remote object in virtual reality |
CN111367407B (en) * | 2020-02-24 | 2023-10-10 | Oppo(重庆)智能科技有限公司 | Intelligent glasses interaction method, intelligent glasses interaction device and intelligent glasses |
CN111367407A (en) * | 2020-02-24 | 2020-07-03 | Oppo(重庆)智能科技有限公司 | Intelligent glasses interaction method, intelligent glasses interaction device and intelligent glasses |
CN112000219A (en) * | 2020-03-30 | 2020-11-27 | 华南理工大学 | Movable gesture interaction device and method for augmented reality game |
CN112000219B (en) * | 2020-03-30 | 2022-06-14 | 华南理工大学 | Movable gesture interaction method for augmented reality game |
CN111724485A (en) * | 2020-06-11 | 2020-09-29 | 浙江商汤科技开发有限公司 | Method, device, electronic equipment and storage medium for realizing virtual-real fusion |
CN111832656A (en) * | 2020-07-17 | 2020-10-27 | 复旦大学 | Medical human-computer interaction assistance system and computer-readable storage medium containing the same |
CN112148120A (en) * | 2020-08-18 | 2020-12-29 | 华为技术有限公司 | Method, equipment and storage medium for displaying virtual interface |
CN112069834A (en) * | 2020-09-02 | 2020-12-11 | 中国航空无线电电子研究所 | Integration method of multi-channel control instruction |
CN113476835A (en) * | 2020-10-22 | 2021-10-08 | 青岛海信电子产业控股股份有限公司 | Picture display method and device |
CN112486322A (en) * | 2020-12-07 | 2021-03-12 | 济南浪潮高新科技投资发展有限公司 | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition |
CN112527112A (en) * | 2020-12-08 | 2021-03-19 | 中国空气动力研究与发展中心计算空气动力研究所 | Multi-channel immersive flow field visualization man-machine interaction method |
WO2022183775A1 (en) * | 2021-03-05 | 2022-09-09 | 华中师范大学 | Method for fusing multiple locomotion mechanisms in hybrid reinforcement teaching scene |
CN113589929A (en) * | 2021-07-29 | 2021-11-02 | 和舆图(北京)科技有限公司 | Spatial distance measuring method and system based on HoloLens equipment |
CN113703583A (en) * | 2021-09-08 | 2021-11-26 | 厦门元馨智能科技有限公司 | Multi-mode cross fusion virtual image fusion system, method and device |
CN114454814A (en) * | 2022-01-26 | 2022-05-10 | 深圳时空科技集团有限公司 | Human-computer interaction method and device for augmented reality |
CN114454814B (en) * | 2022-01-26 | 2023-08-11 | 深圳时空数字科技有限公司 | Human-computer interaction method and device for augmented reality |
CN114167994B (en) * | 2022-02-11 | 2022-06-28 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
CN114167994A (en) * | 2022-02-11 | 2022-03-11 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
CN114842146A (en) * | 2022-05-10 | 2022-08-02 | 中国民用航空飞行学院 | Civil aviation engine maintenance manual and work card modeling method and storable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334199A (en) | The multi-modal exchange method of movable type based on augmented reality and device | |
CN105075246B (en) | The method that Tele-immersion formula is experienced is provided using mirror metaphor | |
CN107491174A (en) | Method, apparatus, system and electronic equipment for remote assistance | |
Zhou et al. | Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR | |
Dai | Virtual reality for industrial applications | |
CN106157359B (en) | Design method of virtual scene experience system | |
Wang | Augmented reality in architecture and design: potentials and challenges for application | |
Azuma | Overview of augmented reality | |
US20140176607A1 (en) | Simulation system for mixed reality content | |
MacIntyre et al. | Future multimedia user interfaces | |
US20150301596A1 (en) | Method, System, and Computer for Identifying Object in Augmented Reality | |
CN107566793A (en) | Method, apparatus, system and electronic equipment for remote assistance | |
CN106648098B (en) | AR projection method and system for user-defined scene | |
US11727238B2 (en) | Augmented camera for improved spatial localization and spatial orientation determination | |
CN103516808A (en) | Virtual/reality interaction platform provided with mobile terminal for intelligent exhibition hall | |
Kim et al. | AR interfacing with prototype 3D applications based on user-centered interactivity | |
Jo et al. | Chili: viewpoint control and on-video drawing for mobile video calls | |
Zhang et al. | The Application of Folk Art with Virtual Reality Technology in Visual Communication. | |
CN105975259A (en) | Implementation method and device of 3D (Three-dimensional) space user interface | |
CN113989462A (en) | Railway signal indoor equipment maintenance system based on augmented reality | |
Siegl et al. | An augmented reality human–computer interface for object localization in a cognitive vision system | |
CN106909222A (en) | VR display method and device | |
Zhang et al. | Virtual Museum Scene Design Based on VRAR Realistic Interaction under PMC Artificial Intelligence Model | |
RE | Low cost augmented reality for industrial problems | |
US20240169568A1 (en) | Method, device, and computer program product for room layout |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180727 |