CN102737101A

CN102737101A - Combined activation for natural user interface systems

Info

Publication number: CN102737101A
Application number: CN2012100911763A
Authority: CN
Inventors: L. P. Heck; M. Chinthakunta; D. Mitby; L. Stifelman
Original assignee: Microsoft Corp
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2011-03-31
Filing date: 2012-03-30
Publication date: 2012-10-17
Anticipated expiration: 2032-03-30
Also published as: CN102750270A; EP2691885A4; CN102737104B; JP6105552B2; EP2691876A4; JP2014515853A; CN102750311B; WO2012135157A3; EP2691877A4; CN102737099B; KR101963915B1; JP2014512046A; JP2014509757A; EP2691870A2; WO2012135210A3; WO2012135218A2; JP2017123187A; EP2691875A2; WO2012135226A1; CN106383866B

Abstract

A user interaction activation may be provided. A plurality of signals received from a user may be evaluated to determine whether the plurality of signals are associated with a visual display. If so, the plurality of signals may be translated into an agent action and a context associated with the visual display may be retrieved. The agent action may be performed according to the retrieved context and a result associated with the performed agent action may be displayed to the user.

Description

Combined activation for natural user interface systems

Technical field

The present invention relates to user interaction systems and, more specifically, to combined activation for natural user interface systems.

Background

Combined activation for natural user interface systems may provide a multimodal natural user interface activation system that can use multiple modalities to activate or operate an application. In some situations, natural user interface systems focus on activation or operation through a single modality. For example, a user activates an application by a voice command or by tapping a screen. However, single-modality command activation in conventional systems can be overly sensitive or prone to various kinds of inaccuracy, such as unintentional activation.

Summary of the invention

This summary is provided to introduce, in simplified form, a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

User interaction activation may be provided. A plurality of signals received from a user may be evaluated to determine whether the plurality of signals are associated with a visual display. If so, the plurality of signals may be translated into an agent action, and a context associated with the visual display may be retrieved. The agent action may be performed according to the retrieved context, and a result associated with the performed agent action may be displayed to the user.

Both the foregoing general description and the following detailed description provide examples and are explanatory only. Accordingly, they should not be considered restrictive. Further, features or variations may be provided in addition to those set forth herein. For example, embodiments may be directed to various feature combinations and sub-combinations described in the detailed description.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention. In the drawings:

Fig. 1 is a block diagram of an operating environment;

Fig. 2 is a flow chart of a method for providing user interaction activation; and

Fig. 3 is a block diagram of a system including a computing device.

Detailed description

The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the following description to refer to the same or similar elements. While embodiments of the invention may be described, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. Accordingly, the following detailed description does not limit the invention. Instead, the proper scope of the invention is defined by the appended claims.

A spoken dialog system (SDS) enables people to interact with computers using their voice. The primary component that drives an SDS may comprise a dialog manager, which manages the dialog-based conversation with the user. The dialog manager may determine the user's intent by combining multiple input sources, such as speech recognition and natural language understanding component outputs, context from prior dialog turns, user context, and/or results returned from a knowledge base (e.g., a search engine). After determining the intent, the dialog manager may take an action, such as displaying final results to the user and/or continuing the dialog with the user to satisfy their intent.

Fig. 1 is a block diagram of an operating environment 100 comprising a user 105, a server 107, a network 120, and a user device 130. Server 107 may comprise a spoken dialog system (SDS) 110, a personal assistant program 112, and/or a search agent 118. SDS 110 may be operative to receive user phrases, queries, actions, and/or action requests via network 120. Network 120 may comprise a private network (e.g., a corporate intranet), a cellular network, and/or a public network such as the Internet. Operating environment 100 may further comprise a plurality of data sources 150(A)-(C). User device 130 may be operative to provide a displayed image 132, such as an image associated with a photo, video, and/or game. User device 130 may be coupled to a camera 135 operative to record user 105 and capture motions and/or gestures made by user 105. User device 130 may be further operative to capture words spoken by user 105, such as through a microphone 137, and/or to capture other inputs from user 105, such as through a keyboard and/or mouse (not shown). Consistent with other embodiments of the invention, camera 135 may comprise any motion detection device capable of detecting the movement of user 105. For example, camera 135 may comprise a Microsoft motion capture device comprising a plurality of cameras and a plurality of microphones.

Fig. 2 is a flow chart setting forth the general stages involved in a method 200, consistent with an embodiment of the invention, for providing personalization of a user query. Method 200 may be implemented using computing device 300, as described in more detail below with respect to Fig. 3. Ways to implement the stages of method 200 are described in greater detail below. Method 200 may begin at starting block 205 and proceed to stage 210, where computing device 300 may receive a plurality of signals from a user. For example, SDS 110 may receive a spoken query and identify, via camera 135, a first gesture performed by user 105. For example, the user may wave and speak a command, such as "hello, xbox".

Method 200 may then advance to stage 220, where computing device 300 may determine whether the signals are directed to the system. For example, user 105 pointing at the screen may comprise an activation gesture, while user 105 walking past camera 135 may not. Consistent with embodiments of the invention, user 105 may define any gesture to be associated with activation. If the identified gesture and/or voice signal is determined not to be directed to SDS 110, method 200 may end at stage 270.
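The determination at stage 220 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Signal` type, the keyword and gesture labels, and the policy of requiring both a spoken keyword and an activation gesture are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the description does not define concrete labels.
ACTIVATION_KEYWORDS = {"hello, xbox"}
ACTIVATION_GESTURES = {"wave", "point_at_screen"}


@dataclass(frozen=True)
class Signal:
    kind: str   # "speech" or "gesture"
    value: str  # recognized phrase or gesture label


def is_directed_at_system(signals):
    """Return True only if a keyword AND an activation gesture are present.

    Requiring both modalities is one possible policy (an assumption here);
    it is meant to reduce unintentional single-modality activation.
    """
    heard = any(
        s.kind == "speech" and s.value.lower() in ACTIVATION_KEYWORDS
        for s in signals
    )
    seen = any(
        s.kind == "gesture" and s.value in ACTIVATION_GESTURES
        for s in signals
    )
    return heard and seen
```

Under this policy, a user who merely walks past the camera, or who speaks the keyword without gesturing, would not activate the system.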

If the signals are directed to the system, method 200 may advance to stage 230, where computing device 300 may retrieve a context associated with the visual display. For example, metadata may be associated with a video stream, providing information such as title, actors, description, and rating. As another example, the context may be retrieved from one of data sources 150(A)-(C). For example, data source 150(A) may comprise a film information website.
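One way to picture stage 230 is a lookup that prefers metadata carried with the video stream and fills any missing fields from external data sources. The sketch below is illustrative only; the dictionary shapes, the field names, and the function `retrieve_context` are assumptions, not part of the disclosure.

```python
# Fields named in the description: title, actors, description, rating.
CONTEXT_FIELDS = ("title", "actors", "description", "rating")


def retrieve_context(stream_metadata, data_sources):
    """Build a context dict from stream metadata, then fill gaps from sources.

    `stream_metadata` is a dict attached to the video stream; `data_sources`
    is a list of dicts standing in for data sources 150(A)-(C).
    """
    context = {k: stream_metadata[k] for k in CONTEXT_FIELDS if k in stream_metadata}
    for source in data_sources:
        for field in CONTEXT_FIELDS:
            # Stream metadata and earlier sources take precedence.
            if field not in context and field in source:
                context[field] = source[field]
    return context
```

Giving the stream's own metadata precedence is a design choice for this sketch; a real system could weigh sources differently.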

Method 200 may then advance to stage 240, where computing device 300 may translate the received signals into an agent action. For example, camera 135 may capture a pointing gesture made by user 105, and the pointing gesture may be used to indicate a subset of the visual display. For example, if three actors appear in the current frame of a film video, the camera may identify which of the three actors user 105 is pointing at. This indication may be used to create an agent action associated with a spoken query such as "Who is that actor?". The agent action may thus selectively identify the one of the three actors indicated by the user.
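The translation at stage 240 can be sketched as a hit test that maps a pointing coordinate to a labeled region of the frame and pairs it with the spoken phrase. The `Region` type, the screen-coordinate convention, and the shape of the returned action are hypothetical; the description does not specify how gestures are mapped to display elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A labeled rectangle in the displayed frame (hypothetical structure)."""
    label: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def resolve_pointing(regions, x, y) -> Optional[str]:
    """Return the label of the first region containing (x, y), if any."""
    for region in regions:
        if region.contains(x, y):
            return region.label
    return None


def to_agent_action(phrase, regions, x, y):
    """Combine the spoken phrase and pointing target into one agent action."""
    return {
        "intent": "identify",
        "utterance": phrase,
        "target": resolve_pointing(regions, x, y),
    }
```

For a frame split into three side-by-side actor regions, pointing inside the middle region would yield an action whose `target` is that region's label.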

Method 200 may then advance to stage 250, where computing device 300 may perform the agent action according to the retrieved context and the received signals. For example, SDS 110 may retrieve from data source 150(A) a list of all actors appearing in the film, narrow the results to the three actors displayed when the signals were received, and identify the specific actor according to which actor user 105 is pointing at.
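The narrowing described for stage 250 can be illustrated with plain lists. In this sketch the full cast stands in for the result from data source 150(A), `on_screen` lists the actors visible when the signals were received (ordered by screen position), and `pointed_index` is the position the gesture resolved to. All three inputs and the function name are assumptions for illustration.

```python
def identify_pointed_actor(full_cast, on_screen, pointed_index):
    """Narrow the cast to on-screen actors, then select the pointed-at one.

    Returns (visible, chosen), where `chosen` is None if the index does not
    correspond to a visible actor.
    """
    known = set(full_cast)
    # Keep screen order so the gesture's position indexes into it directly.
    visible = [name for name in on_screen if name in known]
    if 0 <= pointed_index < len(visible):
        return visible, visible[pointed_index]
    return visible, None
```

The two-step structure mirrors the text: context first reduces many results to a few, and the gesture then picks one.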

Method 200 may then advance to stage 260, where computing device 300 may display a result associated with the performed query to the user. For example, a caption providing the result of the query may be displayed on user device 130. Method 200 may then end at stage 270.
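Taken together, stages 210 through 270 form a short pipeline. The sketch below shows only the control flow; each stage is passed in as a callable, and every name (`run_method_200` and its parameters) is hypothetical rather than drawn from the disclosure.

```python
def run_method_200(signals, is_directed, retrieve_context, translate, perform, display):
    """Control-flow sketch of method 200; stage numbers follow Fig. 2."""
    # Stage 210: `signals` have already been received from the user.
    if not is_directed(signals):        # stage 220: directed to the system?
        return None                     # stage 270: end without acting
    context = retrieve_context()        # stage 230: context of the visual display
    action = translate(signals)         # stage 240: signals -> agent action
    result = perform(action, context)   # stage 250: perform using the context
    display(result)                     # stage 260: show the result to the user
    return result                       # stage 270: end
```

Passing the stages as callables keeps the sketch independent of any particular recognizer, data source, or display.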

An embodiment consistent with the invention may comprise a system for providing user interaction activation. The system may comprise a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a query from a user, retrieve a context associated with a visual display, perform the query according to the retrieved context, and display a result associated with the performed query to the user. The visual display may comprise, for example, a still image, a video, and/or a game image. Being operative to perform the query according to the retrieved context may comprise being operative to narrow a plurality of results to a subset of the plurality of results according to the retrieved context. The processing unit may be further operative to receive a gesture (e.g., a pointing gesture) from the user, update the retrieved context according to the gesture, and perform the query according to the updated context. Being operative to update the retrieved context according to the pointing gesture may comprise being operative to identify an element of the visual display indicated by the pointing gesture.

Another embodiment consistent with the invention may comprise a system for providing user interaction activation. The system may comprise a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a request comprising a natural language phrase (e.g., a spoken phrase), retrieve a context associated with a visual display, identify a gesture made by the user, perform an action associated with the request according to the retrieved context and the identified gesture, and provide a result associated with the performed action to the user. Consistent with embodiments of the invention, a natural language phrase may comprise spoken and/or conversational grammar rather than a specially formatted query. For example, "What is that building?" may comprise a natural language phrase and may be associated with a visual display of the film "Inception". By contrast, a formatted query, such as may be provided to a search engine, may comprise "domain:imdb.com title:Inception time:1:32 'identify building' coordinates:132,425". The visual display may comprise an image captured by a recording device associated with the user. For example, the user may take a digital photo with a camera and view the image. The user's gesture may comprise an activation gesture. For example, user 105 may point directly at camera 135 to indicate that user 105 is about to make a query and/or request an action.
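The contrast between a natural language phrase and a formatted query can be made concrete with a small helper that assembles the formatted string from retrieved context and gesture coordinates. The function `build_formatted_query` and its parameters are hypothetical; only the shape of the output string follows the example given in the description.

```python
def build_formatted_query(domain, title, timestamp, task, x, y):
    """Assemble a search-engine style query from context and a pointing gesture.

    `title` and `timestamp` would come from the display context, `task` from
    the interpreted phrase, and (x, y) from the pointing gesture. All of these
    mappings are assumptions made for illustration.
    """
    return (
        f"domain:{domain} title:{title} time:{timestamp} "
        f"'{task}' coordinates:{x},{y}"
    )
```

The user only says the conversational phrase and points; the context and gesture supply every other field of the formatted query.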

Yet another embodiment consistent with the invention may comprise a system for providing user interaction activation. The system may comprise a memory storage and a processing unit coupled to the memory storage. The processing unit may be operative to receive a plurality of simultaneous signals from a user, wherein at least one first signal comprises a voice signal received via at least one microphone and at least one second signal comprises a gesture received via at least one camera, and to determine whether the plurality of signals are directed to the system. In response to determining that the plurality of signals are directed to the system, the processing unit may be operative to: receive a query from the user; retrieve a context associated with a visual display; identify a second gesture received from the user via the camera; translate the plurality of signals into at least one agent action associated with the visual display, wherein the gesture comprises a pointing gesture operative to select a subset of the visual display; perform the query agent action according to the retrieved context and the identified second gesture; and display a result associated with the performed query agent action to the user.
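The description does not define how "simultaneous" is measured. One plausible reading, sketched below, treats a voice signal and a gesture as simultaneous when their time intervals overlap, optionally within a small tolerance. The `TimedSignal` type, the use of seconds, and the `tolerance` parameter are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSignal:
    kind: str     # "speech" (microphone) or "gesture" (camera)
    start: float  # seconds, on a clock shared by both sensors
    end: float


def are_simultaneous(a, b, tolerance=0.0):
    """True if the two signals' intervals overlap, allowing `tolerance` slack."""
    return a.start <= b.end + tolerance and b.start <= a.end + tolerance
```

A nonzero tolerance would let a gesture that starts just after the spoken keyword still count as part of the same activation.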

Fig. 3 is a block diagram of a system including computing device 300. Consistent with an embodiment of the invention, the aforementioned memory storage and processing unit may be implemented in a computing device, such as computing device 300 of Fig. 3. Any suitable combination of hardware, software, or firmware may be used to implement the memory storage and processing unit. For example, the memory storage and processing unit may be implemented with computing device 300 or any of other computing devices 318, in combination with computing device 300. The aforementioned systems, devices, and processors are examples, and other systems, devices, and processors may comprise the aforementioned memory storage and processing unit, consistent with embodiments of the invention. Furthermore, computing device 300 may comprise operating environment 100 as described above. System 100 may operate in other environments and is not limited to computing device 300.

Referring to Fig. 3, a system consistent with an embodiment of the invention may include a computing device, such as computing device 300. In a basic configuration, computing device 300 may include at least one processing unit 302 and a system memory 304. Depending on the configuration and type of computing device, system memory 304 may comprise, but is not limited to, volatile memory (e.g., random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM)), flash memory, or any combination. System memory 304 may include an operating system 305 and one or more programming modules 306, and may include personal assistant program 112. Operating system 305, for example, may be suitable for controlling the operation of computing device 300. Furthermore, embodiments of the invention may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in Fig. 3 by those components within a dashed line 308.

Computing device 300 may have additional features or functionality. For example, computing device 300 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in Fig. 3 by a removable storage 309 and a non-removable storage 310. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. System memory 304, removable storage 309, and non-removable storage 310 are all examples of computer storage media (i.e., memory storage). Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by computing device 300. Any such computer storage media may be part of device 300. Computing device 300 may also have input device(s) 312 such as a keyboard, a mouse, a pen, a sound input device, a touch input device, etc. Output device(s) 314 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples, and others may be used.

Computing device 300 may also contain a communication connection 316 that may allow device 300 to communicate with other computing devices 318, such as over a network in a distributed computing environment (e.g., an intranet or the Internet). Communication connection 316 is one example of communication media. Communication media may typically be embodied by computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media. The term "computer-readable media" as used herein may include both storage media and communication media.

As stated above, a number of program modules and data files may be stored in system memory 304, including operating system 305. While executing on processing unit 302, programming modules 306 (e.g., personal assistant program 112) may perform processes including, for example, one or more of the stages of method 200 as described above. The aforementioned process is an example, and processing unit 302 may perform other processes. Other programming modules that may be used in accordance with embodiments of the invention may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.

Generally, consistent with embodiments of the invention, program modules may include routines, programs, components, data structures, and other types of structures that may perform particular tasks or that may implement particular abstract data types. Moreover, embodiments of the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Furthermore, embodiments of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. Embodiments of the invention may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the invention may be practiced within a general-purpose computer or in any other circuits or systems.

Embodiments of the invention may be implemented, for example, as a computer process (method), a computing system, or an article of manufacture such as a computer program product or computer-readable media. The computer program product may be a computer storage medium readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process. Accordingly, the present invention may be embodied in hardware and/or in software (including firmware, resident software, microcode, etc.). In other words, embodiments of the invention may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. A computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. As more specific examples (a non-exhaustive list), the computer-readable medium may include the following: an electrical connection having one or more wires, a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, and a portable compact disc read-only memory (CD-ROM). Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

Embodiments of the invention are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the invention. The functions/acts noted in the blocks may occur out of the order shown in any flow chart. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

While certain embodiments of the invention have been described, other embodiments may exist. Furthermore, although embodiments of the invention have been described as being associated with data stored in memory and other storage media, data can also be stored on or read from other types of computer-readable media, such as secondary storage devices (like hard disks, floppy disks, or a CD-ROM), a carrier wave from the Internet, or other forms of RAM or ROM. Further, the stages of the disclosed methods may be modified in any manner, including by reordering stages and/or inserting or deleting stages, without departing from the invention.

All rights, including copyrights, in the code included herein are vested in and are the property of the Applicant. The Applicant retains and reserves all rights in the code included herein, and grants permission to reproduce the material only in connection with reproduction of the granted patent and for no other purpose.

While this specification includes examples, the scope of the invention is indicated by the appended claims. Furthermore, while the specification has been described in language specific to structural features and/or methodological acts, the claims are not limited to the features or acts described above. Rather, the specific features and acts described above are disclosed as examples of embodiments of the invention.

Claims

1. A method (200) for providing user (105) interaction activation, the method (200) comprising:

receiving (210) a plurality of signals from a user (105);

determining (220) whether the plurality of signals are associated with a visual display; and

in response to determining (220) that the plurality of signals are associated with the visual display:

translating (240) the plurality of signals into an agent action,

retrieving a context associated with the visual display,

performing (250) the agent action according to the retrieved context, and

displaying (260) a result associated with the performed agent action to the user (105).

2. The method (200) of claim 1, wherein the plurality of signals comprise at least one of the following: a keyword and an activation gesture.

3. The method (200) of claim 1, wherein performing (250) the agent action according to the retrieved context comprises narrowing a plurality of results to a subset of the plurality of results according to the retrieved context.

4. The method (200) of claim 3, further comprising displaying (260) the subset of the plurality of results to the user (105).

5. The method (200) of claim 1, further comprising:

receiving (210) a gesture from the user (105), wherein the gesture comprises at least one of the plurality of signals;

updating (240) the retrieved context according to the gesture; and

performing (250) the agent action according to the updated context.

6. A computer-readable medium storing a set of instructions which, when executed, perform a method (200) for providing user (105) interaction activation, the method (200) executed by the set of instructions comprising:

receiving (210) a request from a user (105), wherein the request comprises a voice signal;

retrieving (230) a context associated with a visual display;

identifying (240) a gesture received from the user (105);

performing (250) an action associated with the request according to the retrieved context and the identified gesture; and

displaying (260) a result associated with the performed action to the user (105).

7. The computer-readable medium of claim 6, wherein the gesture and the voice signal are received from the user (105) simultaneously.

8. The computer-readable medium of claim 6, wherein performing (250) the action associated with the request comprises selecting, according to the identified gesture, a subset of the visual display associated with the request.

9. The computer-readable medium of claim 6, wherein the context associated with the visual display is retrieved from a plurality of metadata associated with the visual display.

10. A system for providing user (105) interaction activation, the system comprising:

at least one camera (135);

a memory storage (304); and

a processing unit (302) coupled to the memory storage (304) and the camera (135), wherein the processing unit (302) is operative to:

receive a plurality of simultaneous signals from a user (105), wherein at least one first signal comprises a voice signal received via at least one microphone and at least one second signal comprises a gesture received via the at least one camera (135),

determine (220) whether the plurality of signals are directed to the system, and

in response to determining (220) that the plurality of signals are directed to the system:

retrieve (230) a context associated with a visual display;

translate (240) the plurality of signals into at least one agent action associated with the visual display, wherein the gesture comprises a pointing gesture operative to select a subset of the visual display;

perform (250) the agent action according to the retrieved context and the gesture; and