CN108363556A - Method and system for voice-based interaction with an augmented reality environment - Google Patents

Method and system for voice-based interaction with an augmented reality environment

Info

Publication number
CN108363556A
CN108363556A (application CN201810090559.6A)
Authority
CN
China
Prior art keywords
augmented reality
operation instruction
voice data
scene
subenvironment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810090559.6A
Other languages
Chinese (zh)
Inventor
谢高喜
滕禹桥
任大韫
姚淼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810090559.6A priority Critical patent/CN108363556A/en
Publication of CN108363556A publication Critical patent/CN108363556A/en
Priority to US16/177,060 priority patent/US11397559B2/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/003 Navigation within 3D models or images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Abstract

This application provides a method and system for voice-based interaction with an augmented reality environment. The method includes: acquiring voice data from a user and obtaining the operation instruction corresponding to the voice data; and, according to the operation instruction, processing the augmented reality environment and displaying the augmented reality processing result. Interacting with the augmented reality environment by voice improves the interaction efficiency of the augmented reality environment.

Description

Method and system for voice-based interaction with an augmented reality environment
【Technical field】
This application relates to the field of automation, and in particular to a method and system for voice-based interaction with an augmented reality environment.
【Background technology】
Augmented reality (AR) is a technology that calculates the position and angle of a camera image in real time and overlays corresponding images, video, or 3D models onto it. The goal of augmented reality is to superimpose the virtual world onto the real world on a screen and allow the two to interact.

With the popularity of mobile phones and handheld mobile devices, augmented reality environments (AR environments) based on mobile devices are increasingly accepted by users.

However, the interaction means of mobile-device-based augmented reality environments are limited: they support only gesture interaction or the GPS and attitude sensors built into the device. Interacting through gestures or device posture introduces unnecessary actions and reduces interaction efficiency.
【Invention content】
Various aspects of this application provide a method and system for voice-based interaction with an augmented reality environment, intended to improve the interaction efficiency of the augmented reality environment.

One aspect of this application provides a method for voice-based interaction with an augmented reality environment, including:

acquiring voice data from a user and obtaining the operation instruction corresponding to the voice data; and

according to the operation instruction, processing the augmented reality environment and displaying the augmented reality processing result.
In a further implementation of the above aspect and any possible implementation thereof, acquiring the user's voice data and obtaining the corresponding operation instruction includes:

starting an audio monitoring service and monitoring the user's voice data;

performing speech recognition on the voice data to obtain the recognized text corresponding to the voice data; and

performing semantic analysis on the recognized text to obtain the operation instruction corresponding to the recognized text.
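The three sub-steps above can be strung together as a single pipeline. This is a minimal sketch under stated assumptions: `recognize_speech` is a placeholder for a real ASR service call, and the instruction table is invented for the example; neither reflects the patent's actual implementation.

```python
# Minimal sketch of the three sub-steps as one pipeline. The instruction
# table and recognize_speech() are invented placeholders, not the patent's
# actual services.

PRESET_INSTRUCTIONS = {
    "rotate model": "ROTATE",
    "enlarge model": "ENLARGE",
    "shrink model": "SHRINK",
}

def recognize_speech(voice_data: bytes) -> str:
    # Stand-in for an ASR service call (sub-step S112); here the "audio"
    # is just UTF-8 text so the sketch stays self-contained.
    return voice_data.decode("utf-8")

def analyze_semantics(text: str):
    # Sub-step S113: look the recognized text up among preset instructions.
    return PRESET_INSTRUCTIONS.get(text.strip().lower())

def voice_to_instruction(voice_data: bytes):
    text = recognize_speech(voice_data)
    return analyze_semantics(text)

print(voice_to_instruction(b"rotate model"))  # prints: ROTATE
```

An unrecognized phrase simply yields no instruction, which is where the keyword-matching fallback described below would take over.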
In a further implementation, performing semantic analysis on the recognized text to obtain the corresponding operation instruction includes:

exactly matching the recognized text against the preset operation instructions and looking up the corresponding operation instruction; and/or

performing word segmentation on the recognized text to generate keywords, and looking up the operation instruction matching the keywords.
In a further implementation, when a keyword matches at least two operation instructions, the corresponding operation instruction is obtained according to a further selection by the user.

In a further implementation, the augmented reality environment includes: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the real scene captured by the camera.

In a further implementation, processing the augmented reality environment according to the operation instruction includes:

performing, according to the operation instruction, the corresponding augmented reality control operation on the augmented reality information in the augmented reality sub-environment scene.
Another aspect of this application provides a system for voice-based interaction with an augmented reality environment, including:

an operation instruction acquisition module, configured to acquire the user's voice data and obtain the operation instruction corresponding to the voice data; and

an augmented reality processing module, configured to process the augmented reality environment according to the operation instruction and display the augmented reality processing result.
In a further implementation, the operation instruction acquisition module specifically includes:

a voice acquisition submodule, configured to start an audio monitoring service and monitor the user's voice data;

a speech recognition submodule, configured to perform speech recognition on the voice data to obtain the recognized text corresponding to the voice data; and

a semantic analysis submodule, configured to perform semantic analysis on the recognized text to obtain the operation instruction corresponding to the recognized text.
In a further implementation, the semantic analysis submodule is specifically configured to:

exactly match the recognized text against the preset operation instructions and look up the corresponding operation instruction; and/or

perform word segmentation on the recognized text to generate keywords, and look up the operation instruction matching the keywords.

In a further implementation, the semantic analysis submodule is specifically configured to:

obtain the corresponding operation instruction according to a further selection by the user when a keyword matches at least two operation instructions.

In a further implementation, the augmented reality environment includes: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the real scene captured by the camera.

In a further implementation, the augmented reality processing module is specifically configured to:

perform, according to the operation instruction, the corresponding augmented reality control operation on the augmented reality information in the augmented reality sub-environment scene.
Another aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method described above when executing the program.

Another aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, where the program implements the method described above when executed by a processor.

It can be seen from the above technical solutions that the embodiments of this application can improve the interaction efficiency of the augmented reality environment.
【Description of the drawings】
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for voice-based interaction with an augmented reality environment provided by an embodiment of this application;

Fig. 2 is a structural diagram of the system for voice-based interaction with an augmented reality environment provided by an embodiment of this application;

Fig. 3 is a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention.
【Specific implementation mode】
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described below completely and clearly with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Fig. 1 is a flow diagram of the method for voice-based interaction with an augmented reality environment provided by an embodiment of this application. As shown in Fig. 1, the method includes the following steps:

Step S11: acquire voice data from the user and obtain the operation instruction corresponding to the voice data.

Step S12: according to the operation instruction, process the augmented reality environment and display the augmented reality processing result.

The method of this embodiment may be executed by a control apparatus for augmented reality. The apparatus may be implemented in software and/or hardware and integrated in a mobile terminal with augmented reality capability. Mobile terminals include, but are not limited to, user-held devices such as mobile phones and tablet computers.
In a preferred implementation of step S11, acquiring the user's voice data and obtaining the corresponding operation instruction includes the following sub-steps:

Sub-step S111: start the audio monitoring service and monitor the user's voice data.

Preferably, the audio capture device may be a handheld device, such as the microphone (MIC) of a mobile phone or tablet computer. The user's voice data may be monitored in real time, or monitored after the previous operation is completed, for example after the augmented reality function is turned on or after the display of augmented reality content is completed.

Preferably, if the current scene is a preset augmented reality sub-environment scene, the user may be guided to input preset voice operation instructions. For example, if the sub-environment scene is a car 3D model scene, prompts such as "rotate model", "enlarge model", and "shrink model" are displayed in the scene, and the user can input fixed-format voice according to the prompts, which yields higher recognition accuracy. A preset augmented reality sub-environment scene is entered through a dedicated entry of the augmented reality control apparatus. For example, the control apparatus's app presets multiple entries such as a car 3D model and a character 3D model; clicking a dedicated entry enters the corresponding preset sub-environment scene, in which the car 3D model is displayed.
Sub-step S112: perform speech recognition on the voice data to obtain the recognized text corresponding to the voice data.

Preferably, an automatic speech recognition (ASR) service is called to parse the user's voice data and obtain the corresponding speech recognition result, which is the recognized text corresponding to the voice.

Existing speech recognition techniques may be used. The process mainly includes: extracting features from the voice data, then decoding using the extracted feature data together with pre-trained acoustic and language models. During decoding, the syntactic units (such as phonemes or syllables) corresponding to the voice data can be determined, and the recognized text corresponding to the current speech is obtained from the decoding result.

Sub-step S113: perform semantic analysis on the recognized text to obtain the operation instruction corresponding to the recognized text.

Preferably, since the user inputs fixed-format voice according to the guidance in a preset augmented reality sub-environment scene, the recognized text can be matched exactly against the preset operation instructions to look up the corresponding operation instruction.

Preferably, for augmented reality sub-environment scenes other than the preset ones, the user may also input fixed-format voice, so the recognized text can likewise be matched exactly against the preset operation instructions to look up the corresponding operation instruction.

If no operation instruction exactly matching the recognized text is found, word segmentation is performed on the recognized text to generate keywords; according to the keywords, the operation instruction matching the keywords is looked up among the preset operation instructions.
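The two-stage lookup just described, exact match first and then word segmentation with keyword matching, can be sketched as follows. The instruction table and the whitespace "segmenter" are illustrative assumptions; a real system would use a proper Chinese word-segmentation tool.

```python
# Sketch of the two-stage lookup: exact match first, then keyword matching
# after word segmentation. The instruction table and the whitespace
# "segmenter" are invented for illustration.

PRESET = {
    "display sofa": {"display", "sofa"},
    "rotate model": {"rotate", "model"},
}

def segment(text: str) -> set:
    # Trivial stand-in for a real word-segmentation step.
    return set(text.lower().split())

def find_instructions(recognized: str) -> list:
    key = recognized.strip().lower()
    if key in PRESET:                       # stage 1: exact match
        return [key]
    keywords = segment(recognized)          # stage 2: keyword matching
    return [name for name, kws in PRESET.items() if kws & keywords]

print(find_instructions("rotate model"))         # exact hit
print(find_instructions("show me a nice sofa"))  # keyword hit via "sofa"
```

Returning a list rather than a single instruction keeps the door open for the multi-match case handled below, where the user makes a further selection.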
Preferably, semantic recognition techniques may be used to match the recognized text against the preset operation instructions. For example, the recognized text and a preset operation instruction are processed with a semantic recognition technique and the similarity between the two is computed; if the similarity exceeds a similarity threshold, the match is deemed successful; otherwise, the match fails. The similarity threshold is not specifically limited in this embodiment; for example, it may be 0.8.
Preferably, when a keyword matches at least two operation instructions, the corresponding operation instruction is obtained according to a further selection by the user. For example, multiple options corresponding to the matched operation instructions are presented in the augmented reality environment, and the user's selection determines the corresponding operation instruction.
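A hedged sketch of similarity-based matching with the example threshold of 0.8, using Python's `difflib.SequenceMatcher` ratio as a stand-in for a real semantic-similarity model. Returning every instruction that clears the threshold leaves room for the further user selection described above when more than one candidate matches.

```python
# Similarity matching with the example 0.8 threshold. difflib's ratio is a
# character-level stand-in for a real semantic-similarity measure; the preset
# instruction list is invented for the sketch.

from difflib import SequenceMatcher

PRESET = ["rotate model", "rotate camera", "shrink model"]
SIMILARITY_THRESHOLD = 0.8

def match_candidates(recognized: str) -> list:
    text = recognized.strip().lower()
    return [op for op in PRESET
            if SequenceMatcher(None, text, op).ratio() > SIMILARITY_THRESHOLD]

print(match_candidates("rotate modle"))  # a misspelling still matches
```

If the list comes back with more than one entry, the candidates would be shown in the AR environment for the user to choose from.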
In a preferred implementation of step S12,

according to the operation instruction, the augmented reality environment is processed and the augmented reality processing result is displayed.

Preferably, the augmented reality environment includes: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the real scene captured by the camera.

Preferably, in a preset augmented reality sub-environment scene, a preset operation is executed according to the fixed-format operation instruction input by the user. For example, in a preset car 3D model sub-environment scene, the displayed car 3D model is rotated, enlarged, or shrunk.

Preferably, feature analysis is performed on the real scene captured by the camera, and when the camera captures a specific object, the corresponding augmented reality sub-environment scene is loaded. For example, when the camera captures a specific advertisement space, the corresponding advertisement augmented reality sub-environment scene is loaded. According to the operation instruction, the corresponding augmented reality control operation is performed on the augmented reality information in the sub-environment scene. For example, the user can input the control instruction "repeat playing" to make the advertisement augmented reality information in the scene play repeatedly, or input the control instruction "rotate" to rotate the advertisement augmented reality information and choose the most suitable viewing angle for it.
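The mapping from a matched instruction to an augmented reality control operation can be sketched as a simple dispatch table. The operation names "repeat playing" and "rotate" follow the advertisement example above, but the scene class and its fields are invented for illustration.

```python
# Dispatch from a matched operation instruction to an AR control operation on
# the current sub-environment scene. AdScene, its fields, and the handler
# table are invented for this sketch.

class AdScene:
    def __init__(self):
        self.looping = False   # is the advertisement AR info playing on repeat?
        self.angle = 0         # current viewing angle of the AR info

    def repeat_playing(self):
        self.looping = True

    def rotate(self, degrees=90):
        self.angle = (self.angle + degrees) % 360

def apply_instruction(scene: AdScene, instruction: str):
    handlers = {
        "repeat playing": scene.repeat_playing,
        "rotate": scene.rotate,
    }
    if instruction not in handlers:
        raise ValueError("no AR control operation for %r" % instruction)
    handlers[instruction]()

scene = AdScene()
apply_instruction(scene, "repeat playing")
apply_instruction(scene, "rotate")
print(scene.looping, scene.angle)  # prints: True 90
```

Keeping the instruction-to-operation mapping in one table makes it easy for each sub-environment scene to register its own set of control operations.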
Preferably, when the camera does not capture a specific object, the default augmented reality sub-environment scene is entered and the system waits for the user's operation instruction. For example, if the user's voice input is "please recommend a sofa that matches my home's space and decoration style", word segmentation is performed on the recognized text to generate the keywords "space", "style", and "sofa"; according to the keywords, the matching operation instruction "display sofa" is found, and the augmented reality information of a sofa is then displayed in the current sub-environment scene. The user can adjust the sofa's augmented reality information through multiple rounds of voice input, for example changing the sofa's type, color, size, or angle.
Preferably, after the augmented reality environment is processed according to the operation instruction, the processed augmented reality information is drawn onto the image frames or video stream captured by the camera.

Specifically, computer graphics processing techniques are used to draw the AR information onto the image frames or video stream:

the processed augmented reality information and the image frames or video stream are rendered together to obtain the final image frames or video stream for output;

the rendered image frames or video stream are drawn into the memory used for display; and

the image frames or video stream in memory are mapped out and shown on the screen of the mobile terminal with augmented reality capability.
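The drawing path just listed, rendering the AR information onto the camera frames, copying the result into display memory, and mapping it to the screen, can be sketched with plain lists standing in for image buffers; the function names and buffer layout are assumptions for illustration only.

```python
# Sketch of the drawing path: overlay AR info on a camera frame, copy the
# rendered result into display memory, then map it to the screen. Plain 2-D
# lists stand in for image buffers; all names are illustrative.

def render(frame, ar_info):
    # Overlay AR pixels (non-None entries) onto a copy of the camera frame.
    return [[ar if ar is not None else px
             for px, ar in zip(frame_row, ar_row)]
            for frame_row, ar_row in zip(frame, ar_info)]

def draw_to_display_memory(rendered):
    # Copy the rendered frame into the buffer used for display.
    return [row[:] for row in rendered]

def map_to_screen(display_memory):
    # Final stage; a real implementation would hand the buffer to the screen.
    return display_memory

camera_frame = [[0, 0], [0, 0]]
ar_overlay = [[None, 7], [None, None]]   # one AR "pixel" at row 0, column 1
screen = map_to_screen(draw_to_display_memory(render(camera_frame, ar_overlay)))
print(screen)  # prints: [[0, 7], [0, 0]]
```

The same three stages would run once per captured frame when drawing onto a live video stream.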
According to this embodiment, interacting with the augmented reality environment by voice improves the interaction efficiency of the augmented reality environment.

It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should understand that this application is not limited by the described order of actions, since according to this application certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by this application.

The above introduces the method embodiments. The solution of the present invention is further described below through an apparatus embodiment.
Fig. 2 is a structural diagram of the system for voice-based interaction with an augmented reality environment provided by an embodiment of this application. As shown in Fig. 2, the system includes:

an operation instruction acquisition module 21, configured to acquire the user's voice data and obtain the operation instruction corresponding to the voice data; and

an augmented reality processing module 22, configured to process the augmented reality environment according to the operation instruction and display the augmented reality processing result.
The system described in this embodiment may be executed by a control apparatus for augmented reality. The apparatus may be implemented in software and/or hardware and integrated in a mobile terminal with augmented reality capability. Mobile terminals include, but are not limited to, user-held devices such as mobile phones and tablet computers.

In a preferred implementation of the operation instruction acquisition module 21, acquiring the user's voice data and obtaining the corresponding operation instruction involves the following submodules:
Voice acquisition submodule 211, configured to start the audio monitoring service and monitor the user's voice data.

Preferably, the audio capture device may be a handheld device, such as the microphone (MIC) of a mobile phone or tablet computer. The user's voice data may be monitored in real time, or monitored after the previous operation is completed, for example after the augmented reality function is turned on or after the display of augmented reality content is completed.

Preferably, if the current scene is a preset augmented reality sub-environment scene, the user may be guided to input preset voice operation instructions. For example, if the sub-environment scene is a car 3D model scene, prompts such as "rotate model", "enlarge model", and "shrink model" are displayed in the scene, and the user can input fixed-format voice according to the prompts, which yields higher recognition accuracy. A preset augmented reality sub-environment scene is entered through a dedicated entry of the augmented reality control apparatus. For example, the control apparatus's app presets multiple entries such as a car 3D model and a character 3D model; clicking a dedicated entry enters the corresponding preset sub-environment scene, in which the car 3D model is displayed.
Speech recognition submodule 212, configured to perform speech recognition on the voice data to obtain the recognized text corresponding to the voice data.

Preferably, an automatic speech recognition (ASR) service is called to parse the user's voice data and obtain the corresponding speech recognition result, which is the recognized text corresponding to the voice.

Existing speech recognition techniques may be used. The process mainly includes: extracting features from the voice data, then decoding using the extracted feature data together with pre-trained acoustic and language models. During decoding, the syntactic units (such as phonemes or syllables) corresponding to the voice data can be determined, and the recognized text corresponding to the current speech is obtained from the decoding result.

Semantic analysis submodule 213, configured to perform semantic analysis on the recognized text to obtain the operation instruction corresponding to the recognized text.

Preferably, since the user inputs fixed-format voice according to the guidance in a preset augmented reality sub-environment scene, the recognized text can be matched exactly against the preset operation instructions to look up the corresponding operation instruction.

Preferably, for augmented reality sub-environment scenes other than the preset ones, the user may also input fixed-format voice, so the recognized text can likewise be matched exactly against the preset operation instructions to look up the corresponding operation instruction.

If no operation instruction exactly matching the recognized text is found, word segmentation is performed on the recognized text to generate keywords; according to the keywords, the operation instruction matching the keywords is looked up among the preset operation instructions.

Preferably, semantic recognition techniques may be used to match the recognized text against the preset operation instructions. For example, the recognized text and a preset operation instruction are processed with a semantic recognition technique and the similarity between the two is computed; if the similarity exceeds a similarity threshold, the match is deemed successful; otherwise, the match fails. The similarity threshold is not specifically limited in this embodiment; for example, it may be 0.8.

Preferably, when a keyword matches at least two operation instructions, the corresponding operation instruction is obtained according to a further selection by the user. For example, multiple options corresponding to the matched operation instructions are presented in the augmented reality environment, and the user's selection determines the corresponding operation instruction.
In a preferred implementation of the augmented reality processing module 22,

the augmented reality processing module 22 processes the augmented reality environment according to the operation instruction and displays the augmented reality processing result.

Preferably, the augmented reality environment includes: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the real scene captured by the camera.

Preferably, in a preset augmented reality sub-environment scene, a preset operation is executed according to the fixed-format operation instruction input by the user. For example, in a preset car 3D model sub-environment scene, the displayed car 3D model is rotated, enlarged, or shrunk.

Preferably, feature analysis is performed on the real scene captured by the camera, and when the camera captures a specific object, the corresponding augmented reality sub-environment scene is loaded. For example, when the camera captures a specific advertisement space, the corresponding advertisement augmented reality sub-environment scene is loaded. According to the operation instruction, the corresponding augmented reality control operation is performed on the augmented reality information in the sub-environment scene. For example, the user can input the control instruction "repeat playing" to make the advertisement augmented reality information in the scene play repeatedly, or input the control instruction "rotate" to rotate the advertisement augmented reality information and choose the most suitable viewing angle for it.

Preferably, when the camera does not capture a specific object, the default augmented reality sub-environment scene is entered and the system waits for the user's operation instruction. For example, if the user's voice input is "please recommend a sofa that matches my home's space and decoration style", word segmentation is performed on the recognized text to generate the keywords "space", "style", and "sofa"; according to the keywords, the matching operation instruction "display sofa" is found, and the augmented reality information of a sofa is then displayed in the current sub-environment scene. The user can adjust the sofa's augmented reality information through multiple rounds of voice input, for example changing the sofa's type, color, size, or angle.

Preferably, after the augmented reality environment is processed according to the operation instruction, the processed augmented reality information is drawn onto the image frames or video stream captured by the camera.

Specifically, computer graphics processing techniques are used to draw the AR information onto the image frames or video stream:

the processed augmented reality information and the image frames or video stream are rendered together to obtain the final image frames or video stream for output;

the rendered image frames or video stream are drawn into the memory used for display; and

the image frames or video stream in memory are mapped out and shown on the screen of the mobile terminal with augmented reality capability.
According to this embodiment, the user can interact with the augmented reality environment by voice, improving the efficiency of interaction with the augmented reality environment.
In the embodiments described above, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of units is only a logical functional division; in actual implementation there may be other divisions, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 3, the computer system/server 012 takes the form of a general-purpose computing device. Components of the computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting the different system components (including the system memory 028 and the processing unit 016).
Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 012 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.
The system memory 028 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 034 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 3, commonly called a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable, non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 018 through one or more data media interfaces. The memory 028 may include at least one program product having a set of (e.g., at least one) program modules configured to carry out the functions of the embodiments of the present invention.
A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a networking environment. The program modules 042 generally carry out the functions and/or methods of the embodiments described in the present invention.
The computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.); in the present invention, the computer system/server 012 communicates with an external radar device. It may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 022. Furthermore, the computer system/server 012 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 020. As shown in Fig. 3, the network adapter 020 communicates with the other modules of the computer system/server 012 via the bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 016 executes the functions and/or methods of the embodiments described in the present invention by running the programs stored in the system memory 028.
The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention.
With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad; the transmission path of a computer program is no longer limited to tangible media and may, for example, be a direct download from a network. Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, or combinations thereof. These include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features therein may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (14)

1. A method based on interaction between speech and an augmented reality environment, characterized by comprising the following steps:
acquiring voice data of a user, and obtaining an operational instruction corresponding to the voice data;
processing an augmented reality environment according to the operational instruction, and displaying the augmented reality processing result.
2. The method according to claim 1, characterized in that acquiring the voice data of the user and obtaining the operational instruction corresponding to the voice data comprises:
starting an audio monitoring service, and monitoring the voice data of the user;
performing speech recognition on the voice data to obtain recognized text corresponding to the voice data;
performing semantic analysis on the recognized text to obtain the operational instruction corresponding to the recognized text.
3. The method according to claim 2, characterized in that performing semantic analysis on the recognized text to obtain the operational instruction corresponding to the recognized text comprises:
exactly matching the recognized text against preset operational instructions to find the corresponding operational instruction; and/or
performing word segmentation on the recognized text to generate keywords, and finding the operational instruction matching the keywords.
4. The method according to claim 3, characterized in that:
when the keywords successfully match at least two operational instructions, the corresponding operational instruction is obtained according to a further selection by the user.
5. The method according to claim 1, characterized in that the augmented reality environment comprises: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the reality scene captured by a camera.
6. The method according to claim 1, characterized in that processing the augmented reality environment according to the operational instruction comprises:
performing, according to the operational instruction, a corresponding augmented reality control operation on the augmented reality information in the augmented reality sub-environment scene.
7. A system based on interaction between speech and an augmented reality environment, characterized by comprising:
an operational instruction acquisition module, configured to acquire voice data of a user and obtain an operational instruction corresponding to the voice data;
an augmented reality processing module, configured to perform augmented reality processing on an augmented reality environment according to the operational instruction, and to display the augmented reality processing result.
8. The system according to claim 7, characterized in that the operational instruction acquisition module specifically comprises:
a voice acquisition submodule, configured to start an audio monitoring service and monitor the voice data of the user;
a speech recognition submodule, configured to perform speech recognition on the voice data to obtain recognized text corresponding to the voice data;
a semantic analysis submodule, configured to perform semantic analysis on the recognized text to obtain the operational instruction corresponding to the recognized text.
9. The system according to claim 8, characterized in that the semantic analysis submodule is specifically configured to:
exactly match the recognized text against preset operational instructions to find the corresponding operational instruction; and/or
perform word segmentation on the recognized text to generate keywords, and find the operational instruction matching the keywords.
10. The system according to claim 9, characterized in that the semantic analysis submodule is specifically configured to:
when the keywords successfully match at least two operational instructions, obtain the corresponding operational instruction according to a further selection by the user.
11. The system according to claim 7, characterized in that:
the augmented reality environment comprises: a preset augmented reality sub-environment scene; or an augmented reality sub-environment scene obtained by performing feature analysis on the reality scene captured by a camera.
12. The system according to claim 7, characterized in that the augmented reality processing module is specifically configured to:
perform, according to the operational instruction, a corresponding augmented reality control operation on the augmented reality information in the augmented reality sub-environment scene.
13. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 6.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201810090559.6A 2018-01-30 2018-01-30 Method and system based on speech and augmented reality environment interaction Pending CN108363556A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810090559.6A CN108363556A (en) 2018-01-30 2018-01-30 Method and system based on speech and augmented reality environment interaction
US16/177,060 US11397559B2 (en) 2018-01-30 2018-10-31 Method and system based on speech and augmented reality environment interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810090559.6A CN108363556A (en) 2018-01-30 2018-01-30 Method and system based on speech and augmented reality environment interaction

Publications (1)

Publication Number Publication Date
CN108363556A true CN108363556A (en) 2018-08-03

Family

ID=63007317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810090559.6A Pending CN108363556A (en) Method and system based on speech and augmented reality environment interaction

Country Status (2)

Country Link
US (1) US11397559B2 (en)
CN (1) CN108363556A (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11361676B2 (en) * 2019-06-14 2022-06-14 International Business Machines Corporation, Armonk, Ny Augmented reality techniques for simultaneously learning multiple languages
US11798550B2 (en) 2020-03-26 2023-10-24 Snap Inc. Speech-based selection of augmented reality content
CN111583946A (en) * 2020-04-30 2020-08-25 厦门快商通科技股份有限公司 Voice signal enhancement method, device and equipment
US11769500B2 (en) * 2020-06-30 2023-09-26 Snap Inc. Augmented reality-based translation of speech in association with travel
CN114371804A (en) * 2021-12-03 2022-04-19 国家能源集团新能源技术研究院有限公司 Electronic drawing browsing method and system
CN114861653B (en) * 2022-05-17 2023-08-22 马上消费金融股份有限公司 Language generation method, device, equipment and storage medium for virtual interaction
CN116719420B (en) * 2023-08-09 2023-11-21 世优(北京)科技有限公司 User action recognition method and system based on virtual reality

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1410298A (en) * 2001-09-25 2003-04-16 公信电子股份有限公司 Voice control method and device for controlling voice instruction by single key
CN102520788A (en) * 2011-11-16 2012-06-27 歌尔声学股份有限公司 Voice identification control method
CN103257703A (en) * 2012-02-20 2013-08-21 联想(北京)有限公司 Augmented reality device and method
CN103632664A (en) * 2012-08-20 2014-03-12 联想(北京)有限公司 A method for speech recognition and an electronic device
CN103793063A (en) * 2014-03-11 2014-05-14 哈尔滨工业大学 Multi-channel augmented reality system
CN105117195A (en) * 2015-09-09 2015-12-02 百度在线网络技术(北京)有限公司 Method and device for guiding voice input
CN105468142A (en) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 Interaction method and system based on augmented reality technique, and terminal
US20160124501A1 (en) * 2014-10-31 2016-05-05 The United States Of America As Represented By The Secretary Of The Navy Secured mobile maintenance and operator system including wearable augmented reality interface, voice command interface, and visual recognition systems and related methods
CN106200930A (en) * 2016-06-28 2016-12-07 广东欧珀移动通信有限公司 The control method of a kind of augmented reality, device and mobile terminal
CN106558310A (en) * 2016-10-14 2017-04-05 北京百度网讯科技有限公司 Virtual reality sound control method and device

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824310B2 (en) * 2012-12-20 2020-11-03 Sri International Augmented reality virtual personal assistant for external representation
US10430985B2 (en) * 2014-03-14 2019-10-01 Magic Leap, Inc. Augmented reality systems and methods utilizing reflections
CN104102412B (en) * 2014-07-24 2017-12-12 央数文化(上海)股份有限公司 A kind of hand-held reading device and method thereof based on augmented reality
KR20160144665A (en) * 2015-06-09 2016-12-19 에스케이플래닛 주식회사 User equipment for recognizing object and displaying database matching result, control method thereof and computer readable medium having computer program recorded therefor
US20170169611A1 (en) * 2015-12-09 2017-06-15 Lenovo (Singapore) Pte. Ltd. Augmented reality workspace transitions based on contextual environment
US20170337747A1 (en) * 2016-05-20 2017-11-23 Patrick M. HULL Systems and methods for using an avatar to market a product
US10298587B2 (en) * 2016-06-20 2019-05-21 International Business Machines Corporation Peer-to-peer augmented reality handlers
US20190258318A1 (en) * 2016-06-28 2019-08-22 Huawei Technologies Co., Ltd. Terminal for controlling electronic device and processing method thereof
US10042604B2 (en) * 2016-07-01 2018-08-07 Metrik LLC Multi-dimensional reference element for mixed reality environments
US10297085B2 (en) * 2016-09-28 2019-05-21 Intel Corporation Augmented reality creations with interactive behavior and modality assignments
US10297254B2 (en) * 2016-10-03 2019-05-21 Google Llc Task initiation using long-tail voice commands by weighting strength of association of the tasks and their respective commands based on user feedback
US11348475B2 (en) * 2016-12-09 2022-05-31 The Boeing Company System and method for interactive cognitive task assistance
US10360732B2 (en) * 2017-03-23 2019-07-23 Intel Corporation Method and system of determining object positions for image processing using wireless network angle of transmission
US10304239B2 (en) * 2017-07-20 2019-05-28 Qualcomm Incorporated Extended reality virtual assistant
US10553031B2 (en) * 2017-12-06 2020-02-04 Microsoft Technology Licensing, Llc Digital project file presentation
US10937240B2 (en) * 2018-01-04 2021-03-02 Intel Corporation Augmented reality bindings of physical objects and virtual objects


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈金华 (Chen Jinhua): "《智慧学习环境构建》" (Construction of a Smart Learning Environment), 1 September 2013 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065055A (en) * 2018-09-13 2018-12-21 三星电子(中国)研发中心 Method, storage medium and the device of AR content are generated based on sound
CN109065055B (en) * 2018-09-13 2020-12-11 三星电子(中国)研发中心 Method, storage medium, and apparatus for generating AR content based on sound
CN111966321A (en) * 2020-08-24 2020-11-20 Oppo广东移动通信有限公司 Volume adjusting method, AR device and storage medium
WO2022111282A1 (en) * 2020-11-24 2022-06-02 International Business Machines Corporation Ar (augmented reality) based selective sound inclusion from the surrounding while executing any voice command
GB2616765A (en) * 2020-11-24 2023-09-20 Ibm AR (augmented reality) based selective sound inclusion from the surrounding while executing any voice command
US11978444B2 (en) 2020-11-24 2024-05-07 International Business Machines Corporation AR (augmented reality) based selective sound inclusion from the surrounding while executing any voice command
CN112735413A (en) * 2020-12-25 2021-04-30 浙江大华技术股份有限公司 Instruction analysis method based on camera device, electronic equipment and storage medium

Also Published As

Publication number Publication date
US11397559B2 (en) 2022-07-26
US20190235833A1 (en) 2019-08-01

Similar Documents

Publication Publication Date Title
CN108363556A (en) Method and system based on speech and augmented reality environment interaction
US11100934B2 (en) Method and apparatus for voiceprint creation and registration
JP7029613B2 (en) Interfaces Smart interactive control methods, appliances, systems and programs
CN108877791B (en) Voice interaction method, device, server, terminal and medium based on view
CN107481720B (en) Explicit voiceprint recognition method and device
CN109036396A (en) A kind of exchange method and system of third-party application
US11164571B2 (en) Content recognizing method and apparatus, device, and computer storage medium
WO2019021088A1 (en) Navigating video scenes using cognitive insights
CN108683937A (en) Interactive voice feedback method, system and the computer-readable medium of smart television
CN110245348A (en) A kind of intension recognizing method and system
CN104282302A (en) Apparatus and method for recognizing voice and text
CN110232340A (en) Establish the method, apparatus of video classification model and visual classification
CN109785829A (en) A kind of customer service householder method and system based on voice control
CN109446907A (en) A kind of method, apparatus of Video chat, equipment and computer storage medium
CN107463929A (en) Processing method, device, equipment and the computer-readable recording medium of speech data
CN108495160A (en) Intelligent control method, system, equipment and storage medium
CN107862035A (en) Network read method, device, Intelligent flat and the storage medium of minutes
CN109800410A (en) A kind of list generation method and system based on online chatting record
CN108268602A (en) Analyze method, apparatus, equipment and the computer storage media of text topic point
CN111341307A (en) Voice recognition method and device, electronic equipment and storage medium
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
CN115422932A (en) Word vector training method and device, electronic equipment and storage medium
CN107944448A (en) A kind of image asynchronous edit methods and device
JP6944920B2 (en) Smart interactive processing methods, equipment, equipment and computer storage media
CN113655933A (en) Text labeling method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20180803