CN107480161A - Intelligent automated assistant for media exploration - Google Patents

Intelligent automated assistant for media exploration Download PDF

Info

Publication number
CN107480161A
CN107480161A (application CN201710391293.4A)
Authority
CN
China
Prior art keywords
media item
user
speech input
media
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710391293.4A
Other languages
Chinese (zh)
Inventor
R. M. Orr
M. P. Bernardo
D. J. Mandel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Computer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/266,956 (granted as US10049663B2)
Application filed by Apple Computer Inc
Publication of CN107480161A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems

Abstract

Embodiments of the disclosure relate to an intelligent automated assistant for media exploration. The invention provides systems and processes for operating an intelligent automated assistant to explore media items. In an example process, a speech input representing a request for one or more media items is received from a user. The process determines whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items. In response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items, at least one media item is obtained from a user-specific corpus of media items. The user-specific corpus of media items is generated based on data associated with the user. The at least one media item is provided.

Description

Intelligent automated assistant for media exploration
Cross-reference to related applications
Entitled " the INTELLIGENT AUTOMATED that patent application claims were submitted on June 8th, 2016 ASSISTANT FOR MEDIA EXPLORATION " U.S.Provisional Serial 62/347,480;In September 15 in 2016 Day submit entitled " the INTELLIGENT AUTOMATED ASSISTANT FOR MEDIA EXPLORATION " U.S. is non- Provisional application Ser.No 15/266,956;And the entitled " " INTELLIGENT submitted on May 15th, 2017 AUTOMATED ASSISTANT FOR MEDIA EXPLORATION " Danish Patent Application sequence number PA201770338's is excellent First weigh, all these patent applications are incorporated by reference in its entirety herein for all purposes accordingly.
Technical field
The present invention relates generally to intelligent automated assistants and, more particularly, to intelligent automated assistants for media exploration.
Background
Intelligent automated assistants (or digital assistants) can provide an advantageous interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or textual form. For example, a user can provide a speech input containing a user request to a digital assistant operating on an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into a task. The task can then be performed by executing one or more services of the electronic device, and a relevant output responsive to the user request can be returned to the user.
When managing music or other media, digital assistants can be helpful for searching for or playing back specific media, particularly in a hands-free environment. Specifically, digital assistants can effectively respond to requests to play a specific media item, such as a song or album explicitly identified by title or artist. However, digital assistants can have difficulty finding relevant media items based on vague, open-ended natural language requests, such as requests for recommended songs or albums.
Summary
The invention provides systems and processes for operating an intelligent automated assistant to explore media items. In one example process, a speech input representing a request for one or more media items is received from a user. The process determines whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items. In response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items, at least one media item is obtained from a user-specific corpus of media items. The user-specific corpus of media items is generated based on data associated with the user. The at least one media item is provided.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a system and environment for implementing a digital assistant, according to various examples.
Fig. 2A is a block diagram illustrating a portable multifunction device implementing the client-side portion of a digital assistant, according to various examples.
Fig. 2B is a block diagram illustrating exemplary components for event handling, according to various examples.
Fig. 3 illustrates a portable multifunction device implementing the client-side portion of a digital assistant, according to various examples.
Fig. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface, according to various examples.
Fig. 5A illustrates an exemplary user interface for a menu of applications on a portable multifunction device, according to various examples.
Fig. 5B illustrates an exemplary user interface for a multifunction device with a touch-sensitive surface that is separate from the display, according to various examples.
Fig. 6A illustrates a personal electronic device, according to various examples.
Fig. 6B is a block diagram illustrating a personal electronic device, according to various examples.
Fig. 7A is a block diagram illustrating a digital assistant system or a server portion thereof, according to various examples.
Fig. 7B illustrates the functions of the digital assistant shown in Fig. 7A, according to various examples.
Fig. 7C illustrates a portion of an ontology, according to various examples.
Figs. 8A-C illustrate a process for operating a digital assistant for media exploration, according to various examples.
Figs. 9A-B illustrate a user operating a digital assistant for media exploration, according to various examples.
Fig. 10 illustrates a user operating a digital assistant for media exploration, according to various examples.
Fig. 11 illustrates a user operating a digital assistant for media exploration, according to various examples.
Fig. 12 illustrates a functional block diagram of an electronic device, according to various examples.
Detailed description
In the following description of examples, reference is made to the accompanying drawings, in which are shown, by way of illustration, specific examples that can be practiced. It is to be understood that other examples can be used and structural changes can be made without departing from the scope of the various examples.
Conventional techniques for exploring media content using digital assistants are generally cumbersome and inefficient. Specifically, media-related requests in natural language form can be overly broad or ambiguous, making it difficult to accurately infer the user intent corresponding to the request. For example, the media-related request "play me something nice" is ambiguous and open-ended; using conventional techniques, a digital assistant may therefore retrieve media items that are incompatible with the user's preferences, may present an excessive number of media items to the user, or may return nothing at all. This can result in numerous follow-up interactions between the user and the digital assistant to clarify the user's intent, which can negatively affect the user experience. Moreover, numerous follow-up interactions are inefficient with respect to the device's energy consumption. This consideration is particularly important for battery-operated devices.
According to some systems, computer-readable media, and processes described herein, digital assistants perform media exploration in a more efficient and accurate manner. In one example process, a speech input representing a request for one or more media items is received from a user. The process determines whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items. In response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items, at least one media item is obtained from a user-specific corpus of media items. The at least one media item is obtained using a user-specific media ranking model. The user-specific corpus of media items and the user-specific media ranking model are generated based on data associated with the user. The at least one media item is then provided to the user. By obtaining the at least one media item using the user-specific corpus of media items and the user-specific media ranking model, the likelihood that the at least one media item satisfies the user's preferences is increased. As a result, media items that are more relevant to the user are recommended, which improves the efficiency and usefulness of the digital assistant.
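The example process above can be sketched in code. This is a minimal illustration, not the patent's actual implementation: the keyword-based intent test, the genre-weight "ranking model," and the corpus contents are all invented stand-ins for the user-specific structures the disclosure describes.

```python
from dataclasses import dataclass

@dataclass
class MediaItem:
    title: str
    artist: str
    genre: str

# Hypothetical user-specific corpus, generated from data associated with
# the user (e.g., media items previously requested or collected).
USER_CORPUS = [
    MediaItem("Song A", "Artist X", "jazz"),
    MediaItem("Song B", "Artist Y", "rock"),
    MediaItem("Song C", "Artist Z", "jazz"),
]

def wants_personalized_recommendation(speech_text):
    """Crude stand-in for intent inference: open-ended phrasing with no
    explicit title or artist suggests a personalized-recommendation intent."""
    cues = ("something", "recommend", "surprise me", "anything")
    return any(cue in speech_text.lower() for cue in cues)

def rank(item, genre_weights):
    """Stand-in for a user-specific media ranking model: score items by
    how strongly the user's listening history weights their genre."""
    return genre_weights.get(item.genre, 0.0)

def handle_request(speech_text, genre_weights):
    if wants_personalized_recommendation(speech_text):
        # Obtain at least one media item from the user-specific corpus,
        # ordered by the user-specific ranking model.
        return max(USER_CORPUS, key=lambda item: rank(item, genre_weights))
    return None  # would fall through to an explicit media search (not shown)

print(handle_request("play me something nice", {"jazz": 0.9, "rock": 0.4}).title)
# prints "Song A" (the first of the top-ranked jazz items)
```

The key point the sketch captures is the branch: only open-ended requests are routed to the user-specific corpus and ranking model; explicitly identified media would take the ordinary search path.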
Although the following description uses the terms "first," "second," etc. to describe various elements, these elements should not be limited by the terms. The terms are only used to distinguish one element from another. For example, a first input could be termed a second input, and, similarly, a second input could be termed a first input, without departing from the scope of the various described examples. The first input and the second input are both inputs and, in some cases, are separate and different inputs.
The terminology used in the description of the various described examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "includes," "including," "comprises," and/or "comprising" specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "if" may be construed, depending on the context, to mean "when" or "upon" or "in response to determining" or "in response to detecting." Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be construed, depending on the context, to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]."
1. System and environment
Fig. 1 illustrates a block diagram of system 100 according to various examples. In some examples, system 100 implements a digital assistant. The terms "digital assistant," "virtual assistant," "intelligent automated assistant," and "automatic digital assistant" refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions based on the inferred user intent. For example, to act on an inferred user intent, the system performs one or more of the following: identifying a task flow with steps and parameters designed to accomplish the inferred user intent; inputting specific requirements from the inferred user intent into the task flow; executing the task flow by invoking programs, methods, services, APIs, or the like; and generating output responses to the user in an audible (e.g., speech) and/or visual form.
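The intent-to-task-flow pipeline described above can be sketched as follows. The intent names, required parameters, and execution steps are invented for illustration; a real assistant would invoke programs, services, or APIs at the `execute` step.

```python
# Minimal sketch of the intent-to-task-flow pipeline; all names are hypothetical.

def play_media(params):
    return "playing " + params["title"]

def set_alarm(params):
    return "alarm set for " + params["time"]

# Each task flow pairs the parameters it requires with the step that
# executes it.
TASK_FLOWS = {
    "play_media": {"required": ["title"], "execute": play_media},
    "set_alarm": {"required": ["time"], "execute": set_alarm},
}

def act_on_intent(intent, params):
    flow = TASK_FLOWS[intent]
    missing = [name for name in flow["required"] if name not in params]
    if missing:
        # A real assistant would start a clarifying dialogue here.
        return "need more info: " + ", ".join(missing)
    return flow["execute"](params)

print(act_on_intent("play_media", {"title": "Song A"}))  # playing Song A
print(act_on_intent("set_alarm", {}))                    # need more info: time
```

The second call shows why incomplete parameters matter: when the inferred intent does not fully specify the task flow, the assistant must elicit the missing requirements before executing.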
Specifically, a digital assistant can receive a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry. Typically, the user request seeks either an informational answer or the performance of a task by the digital assistant. A satisfactory response to the user request includes providing the requested informational answer, performing the requested task, or a combination of the two. For example, a user may ask the digital assistant a question, such as "Where am I right now?" Based on the user's current location, the digital assistant may answer, "You are in Central Park near the west gate." The user may also request the performance of a task, for example, "Please invite my friends to my girlfriend's birthday party next week." In response, the digital assistant can acknowledge the request by saying "Yes, right away," and then send a suitable calendar invite on behalf of the user to each of the user's friends listed in the user's electronic address book. During performance of a requested task, the digital assistant sometimes interacts with the user in a continuous dialogue involving multiple exchanges of information over an extended period of time. There are numerous other ways of interacting with a digital assistant to request information or the performance of various tasks. In addition to providing verbal responses and taking programmed actions, the digital assistant also provides responses in other visual or audio forms, e.g., as text, alerts, music, videos, animations, etc.
As shown in Fig. 1, in some examples, a digital assistant is implemented according to a client-server model. The digital assistant includes a client-side portion 102 (hereafter "DA client 102") executed on user device 104, and a server-side portion 106 (hereafter "DA server 106") executed on server system 108. DA client 102 communicates with DA server 106 through one or more networks 110. DA client 102 provides client-side functionalities, such as user-facing input and output processing, and communication with DA server 106. DA server 106 provides server-side functionalities for any number of DA clients 102, each residing on a respective user device 104.
In some examples, DA server 106 includes a client-facing I/O interface 112, one or more processing modules 114, data and models 116, and an I/O interface to external services 118. The client-facing I/O interface 112 facilitates the client-facing input and output processing for DA server 106. The one or more processing modules 114 utilize data and models 116 to process speech input and determine the user's intent based on natural language input. Further, the one or more processing modules 114 perform task execution based on the inferred user intent. In some examples, DA server 106 communicates with external services 120 (e.g., one or more media services 120-1, one or more navigation services 120-2, one or more messaging services 120-3, one or more information services 120-4, a calendar service 120-5, a telephony service 120-6, etc.) through the one or more networks 110 for task completion or information acquisition. The I/O interface to external services 118 facilitates such communications.
Specifically, DA server 106 communicates with the one or more media services to perform tasks that include searching for and obtaining media items. The one or more media services 120-1 are implemented on, for example, one or more remote media servers, and are configured to provide media items, such as songs, albums, playlists, videos, etc. For example, the one or more media services include media streaming services, such as Apple Music or iTunes Radio™ (Apple Inc., Cupertino, California). The one or more media services 120-1 are configured to receive media search queries (e.g., from DA server 106) and, in response, provide one or more media items satisfying the media search queries. Specifically, in accordance with a media search query, one or more corpuses of media items are searched to identify one or more media items, and the identified one or more media items are provided. In addition, the one or more media services are configured to provide information associated with media items, such as the name of the artist associated with a particular media item, the release date of a particular media item, or the lyrics of a particular media item.
The one or more media services 120-1 include various corpuses of media items. The corpuses of media items include user-specific corpuses of media items. Each user-specific corpus of media items is generated based on data associated with a respective user. The media-related data include, for example, user input indicating media items previously viewed, selected, requested, collected, or rejected by the user. In addition, the media-related data include media items found in a personal library of media items associated with the user. The media items contained in each user-specific corpus of media items thus reflect the media preferences of the respective user. In some examples, each user-specific corpus of media items is identified and accessed based on user credentials, such as user login information and/or user password information. In some examples, the corpuses of media items in the one or more media services 120-1 further include one or more second corpuses of media items generated based on the release dates of media items. For example, the one or more second corpuses of media items only include media items having release dates within a predetermined time range from the current date.
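The two corpus types just described can be sketched as simple filters over media-related data. The library contents, field names, and the 90-day window are illustrative assumptions; the disclosure only says "a predetermined time range from the current date."

```python
from datetime import date, timedelta

# Hypothetical media-related data associated with a user.
LIBRARY = [
    {"title": "Song A", "release": date(2017, 4, 1)},
    {"title": "Song B", "release": date(2015, 6, 1)},
]
REJECTED = {"Song B"}  # items the user previously rejected

def user_specific_corpus(library, rejected):
    """User-specific corpus: items from the user's personal library,
    excluding items the user previously rejected."""
    return [m for m in library if m["title"] not in rejected]

def recent_corpus(items, today, days=90):
    """Second corpus: only items whose release date falls within a
    predetermined time range (here, 90 days) of the current date."""
    cutoff = today - timedelta(days=days)
    return [m for m in items if m["release"] >= cutoff]

today = date(2017, 6, 1)
print([m["title"] for m in user_specific_corpus(LIBRARY, REJECTED)])  # ['Song A']
print([m["title"] for m in recent_corpus(LIBRARY, today)])            # ['Song A']
```

In a deployed service these corpuses would of course be maintained server-side and keyed to user credentials, as the paragraph above notes, rather than rebuilt per request.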
In some examples, each media item in a corpus of media items includes metadata indicating one or more media parameters. The media parameters include, for example, {title}, {artist}, {genre}, {release date}, {mood}, {occasion}, {editorial list}, {political orientation}, {occupational skill}, etc. Media items in a corpus of media items are thus searched and retrieved based on the media parameters indicated in the metadata of the media items. Additional description of the media parameters associated with media items is provided below with reference to Figs. 8A-C.
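Metadata-driven retrieval of this kind amounts to matching query parameters against each item's metadata fields. A minimal sketch, with invented items and values (the parameter names mirror those listed above):

```python
# Sketch of searching a corpus by media parameters carried in metadata.
CORPUS = [
    {"title": "Track 1", "artist": "X", "genre": "jazz", "mood": "calm"},
    {"title": "Track 2", "artist": "Y", "genre": "rock", "mood": "upbeat"},
    {"title": "Track 3", "artist": "X", "genre": "jazz", "mood": "upbeat"},
]

def search(corpus, **params):
    """Return items whose metadata matches every given media parameter."""
    return [
        item for item in corpus
        if all(item.get(key) == value for key, value in params.items())
    ]

print([m["title"] for m in search(CORPUS, genre="jazz")])             # ['Track 1', 'Track 3']
print([m["title"] for m in search(CORPUS, artist="X", mood="calm")])  # ['Track 1']
```

Because every parameter is just another metadata key, the same search function serves explicit requests ("play Track 1 by X") and parameterized ones ("something calm") alike.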
User device 104 can be any suitable electronic device. In some examples, the user device is a portable multifunction device (e.g., device 200, described below with reference to Fig. 2A), a multifunction device (e.g., device 400, described below with reference to Fig. 4), or a personal electronic device (e.g., device 600, described below with reference to Figs. 6A-B). A portable multifunction device is, for example, a mobile telephone that also contains other functions, such as PDA and/or music player functions. Specific examples of portable multifunction devices include the iPhone®, iPod Touch®, and iPad® devices from Apple Inc. (Cupertino, California). Other examples of portable multifunction devices include, without limitation, laptop or tablet computers. Further, in some examples, user device 104 is a non-portable multifunction device. In particular, user device 104 is a desktop computer, a game console, a television, or a television set-top box. In some examples, user device 104 includes a touch-sensitive surface (e.g., a touch screen display and/or a touchpad). Further, user device 104 optionally includes one or more other physical user-interface devices, such as a physical keyboard, a mouse, and/or a joystick. Various examples of electronic devices, such as multifunction devices, are described below in greater detail.
Examples of the one or more communication networks 110 include local area networks (LAN) and wide area networks (WAN), e.g., the Internet. The one or more communication networks 110 are implemented using any known network protocol, including various wired or wireless protocols, such as, for example, Ethernet, Universal Serial Bus (USB), FIREWIRE, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
Server system 108 is implemented on one or more standalone data processing apparatus or a distributed network of computers. In some examples, server system 108 also employs various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of server system 108.
In some examples, user device 104 communicates with DA server 106 via second user device 122. Second user device 122 is similar or identical to user device 104. For example, second user device 122 is similar to devices 200, 400, or 600, described below with reference to Figs. 2A, 4, and 6A-B. User device 104 is configured to communicatively couple to second user device 122 via a direct communication connection, such as Bluetooth, NFC, BTLE, etc., or via a wired or wireless network, such as a local Wi-Fi network. In some examples, second user device 122 is configured to act as a proxy between user device 104 and DA server 106. For example, DA client 102 of user device 104 is configured to transmit information (e.g., a user request received at user device 104) to DA server 106 via second user device 122. DA server 106 processes the information and returns relevant data (e.g., data content responsive to the user request) to user device 104 via second user device 122.
In some examples, user device 104 is configured to transmit abbreviated requests for data to second user device 122, to reduce the amount of information transmitted from user device 104. Second user device 122 is configured to determine supplemental information to add to the abbreviated request, to generate a complete request to transmit to DA server 106. This system architecture can advantageously allow user device 104 having limited communication capabilities and/or limited battery power (e.g., a watch or a similar compact electronic device) to access services provided by DA server 106 by using second user device 122, having greater communication capabilities and/or battery power (e.g., a mobile phone, laptop computer, tablet computer, etc.), as a proxy to DA server 106. While only two user devices 104 and 122 are shown in Fig. 1, it should be appreciated that system 100, in some examples, includes any number and type of user devices configured in this proxy configuration to communicate with DA server system 106.
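The abbreviated-request pattern above can be sketched as a merge of two dictionaries: the small device sends only the fields it must, and the companion device supplies the rest before forwarding. The field names here are hypothetical, not from the disclosure.

```python
# Sketch of the proxy pattern: a low-power device sends an abbreviated
# request; a companion device fills in supplemental fields before
# forwarding the complete request to the server.

ABBREVIATED = {"query": "play me something nice"}  # e.g., from a watch

def complete_request(abbreviated, companion_context):
    """Companion device merges in supplemental information the small
    device omitted to save bandwidth and battery; the abbreviated
    fields take precedence on any overlap."""
    full = dict(companion_context)  # e.g., user id, locale, location
    full.update(abbreviated)
    return full

context = {"user_id": "u123", "locale": "en_US"}
full = complete_request(ABBREVIATED, context)
print(sorted(full))  # ['locale', 'query', 'user_id']
```

The design choice worth noting is precedence: letting the originating device's fields win means the companion can cache stale context without ever overriding what the user actually asked.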
Although the digital assistant shown in Fig. 1 includes both a client-side portion (e.g., DA client 102) and a server-side portion (e.g., DA server 106), in some examples, the functions of a digital assistant are implemented as a standalone application installed on a user device. In addition, the division of functionalities between the client and server portions of the digital assistant can vary in different implementations. For instance, in some examples, the DA client is a thin client that provides only user-facing input and output processing functions, and delegates all other functionalities of the digital assistant to a backend server.
2. Electronic devices
Attention is now directed toward embodiments of electronic devices for implementing the client-side portion of a digital assistant. Fig. 2A is a block diagram illustrating portable multifunction device 200 with touch-sensitive display system 212, in accordance with some embodiments. Touch-sensitive display 212 is sometimes called a "touch screen" for convenience, and is sometimes known as or called a "touch-sensitive display system." Device 200 includes memory 202 (which optionally includes one or more computer-readable storage media), memory controller 222, one or more processing units (CPUs) 220, peripherals interface 218, RF circuitry 208, audio circuitry 210, speaker 211, microphone 213, input/output (I/O) subsystem 206, other input control devices 216, and external port 224. Device 200 optionally includes one or more optical sensors 264. Device 200 optionally includes one or more contact intensity sensors 265 for detecting the intensity of contacts on device 200 (e.g., on a touch-sensitive surface, such as touch-sensitive display system 212 of device 200). Device 200 optionally includes one or more tactile output generators 267 for generating tactile outputs on device 200 (e.g., generating tactile outputs on a touch-sensitive surface, such as touch-sensitive display system 212 of device 200 or touchpad 455 of device 400). These components optionally communicate over one or more communication buses or signal lines 203.
As used in the specification and claims, the term "intensity" of a contact on a touch-sensitive surface refers to the force or pressure (force per unit area) of a contact (e.g., a finger contact) on the touch-sensitive surface, or to a substitute (proxy) for the force or pressure of a contact on the touch-sensitive surface. The intensity of a contact has a range of values that includes at least four distinct values and more typically includes hundreds of distinct values (e.g., at least 256). The intensity of a contact is, optionally, determined (or measured) using various approaches and various sensors or combinations of sensors. For example, one or more force sensors underneath or adjacent to the touch-sensitive surface are, optionally, used to measure force at various points on the touch-sensitive surface. In some implementations, force measurements from multiple force sensors are combined (e.g., as a weighted average) to determine an estimated force of contact. Similarly, a pressure-sensitive tip of a stylus is, optionally, used to determine a pressure of the stylus on the touch-sensitive surface. Alternatively, the size of the contact area detected on the touch-sensitive surface and/or changes thereto, the capacitance of the touch-sensitive surface proximate to the contact and/or changes thereto, and/or the resistance of the touch-sensitive surface proximate to the contact and/or changes thereto are, optionally, used as a substitute for the force or pressure of the contact on the touch-sensitive surface. In some implementations, the substitute measurements for contact force or pressure are used directly to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is described in units corresponding to the substitute measurements). In some implementations, the substitute measurements for contact force or pressure are converted to an estimated force or pressure, and the estimated force or pressure is used to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is a pressure threshold measured in units of pressure). Using the intensity of a contact as an attribute of a user input allows for user access to additional device functionality that may otherwise not be accessible by the user on a reduced-size device with limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input (e.g., via a touch-sensitive display, a touch-sensitive surface, or a physical/mechanical control, such as a knob or a button).
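The weighted-average combination of force-sensor readings mentioned above can be sketched numerically. The sensor count, weight values, and threshold are illustrative assumptions, not specified by the disclosure.

```python
# Sketch of combining multiple force-sensor readings into an estimated
# contact force via a weighted average, then comparing it against an
# intensity threshold.

def estimated_force(readings, weights):
    """Weighted average of per-sensor force measurements; sensors nearer
    the contact point would typically receive larger weights."""
    assert len(readings) == len(weights)
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(readings, weights)) / total_weight

INTENSITY_THRESHOLD = 1.0  # in the same units as the readings

readings = [0.8, 1.6, 0.4]  # three force sensors under the surface
weights = [0.2, 0.6, 0.2]   # middle sensor is closest to the touch point
force = estimated_force(readings, weights)
print(force > INTENSITY_THRESHOLD)  # True: 0.16 + 0.96 + 1.6*0 ... = 1.2 > 1.0
```

Dividing by the total weight keeps the estimate in the sensors' own units, so the same threshold applies regardless of how many sensors contribute.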
As used in the specification and claims, the term "tactile output" refers to physical displacement of a device relative to a previous position of the device, physical displacement of a component (e.g., a touch-sensitive surface) of a device relative to another component (e.g., the housing) of the device, or displacement of the component relative to a center of mass of the device, that will be detected by a user with the user's sense of touch. For example, in situations where the device or the component of the device is in contact with a surface of a user that is sensitive to touch (e.g., a finger, palm, or other part of a user's hand), the tactile output generated by the physical displacement will be interpreted by the user as a tactile sensation corresponding to a perceived change in the physical characteristics of the device or the component of the device. For example, movement of a touch-sensitive surface (e.g., a touch-sensitive display or trackpad) is, optionally, interpreted by the user as a "down click" or "up click" of a physical actuator button. In some cases, a user will feel a tactile sensation, such as a "down click" or "up click," even when there is no movement of a physical actuator button associated with the touch-sensitive surface that is physically pressed (e.g., displaced) by the user's movements. As another example, movement of the touch-sensitive surface is, optionally, interpreted or sensed by the user as "roughness" of the touch-sensitive surface, even when there is no change in the smoothness of the touch-sensitive surface. While such interpretations of touch by a user will be subject to the individualized sensory perceptions of the user, there are many sensory perceptions of touch that are common to a large majority of users. Thus, when a tactile output is described as corresponding to a particular sensory perception of a user (e.g., an "up click," a "down click," "roughness"), unless otherwise stated, the generated tactile output corresponds to physical displacement of the device or a component thereof that will generate the described sensory perception for a typical (or average) user.
It should be appreciated that device 200 is only one example of a portable multifunction device, and that device 200 optionally has more or fewer components than shown, optionally combines two or more components, or optionally has a different configuration or arrangement of the components. The various components shown in FIG. 2A are implemented in hardware, software, or a combination of both hardware and software, including one or more signal processing and/or application-specific integrated circuits.
Memory 202 includes one or more computer-readable storage media. The computer-readable storage media are, for example, tangible and non-transitory. Memory 202 includes high-speed random access memory and also optionally includes non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Memory controller 222 controls access to memory 202 by other components of device 200.
In some examples, a non-transitory computer-readable storage medium of memory 202 is used to store instructions (e.g., for performing aspects of the processes described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other examples, the instructions (e.g., for performing aspects of the processes described below) are stored on a non-transitory computer-readable storage medium (not shown) of server system 108, or are divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108.
Peripherals interface 218 is used to couple input and output peripherals of the device to CPU 220 and memory 202. The one or more processors 220 run or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions for device 200 and to process data. In some embodiments, peripherals interface 218, CPU 220, and memory controller 222 are implemented on a single chip, such as chip 204. In some other embodiments, they are implemented on separate chips.
RF (radio frequency) circuitry 208 receives and sends RF signals, also called electromagnetic signals. RF circuitry 208 converts electrical signals to/from electromagnetic signals and communicates with communications networks and other communications devices via the electromagnetic signals. RF circuitry 208 optionally includes well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, a subscriber identity module (SIM) card, memory, and so forth. RF circuitry 208 optionally communicates by wireless communication with networks, such as the Internet (also referred to as the World Wide Web (WWW)), an intranet, and/or a wireless network (such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN)), and with other devices. RF circuitry 208 optionally includes well-known circuitry for detecting near field communication (NFC) fields, such as by a short-range communication radio. The wireless communication optionally uses any of a plurality of communications standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), high-speed downlink packet access (HSDPA), high-speed uplink packet access (HSUPA), Evolution, Data-Only (EV-DO), HSPA, HSPA+, Dual-Cell HSPA (DC-HSPDA), long term evolution (LTE), near field communication (NFC), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Bluetooth Low Energy, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, and/or IEEE 802.11ac), voice over Internet Protocol (VoIP), Wi-MAX, a protocol for e-mail (e.g., Internet message access protocol (IMAP) and/or post office protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
Audio circuitry 210, speaker 211, and microphone 213 provide an audio interface between a user and device 200. Audio circuitry 210 receives audio data from peripherals interface 218, converts the audio data to an electrical signal, and transmits the electrical signal to speaker 211. Speaker 211 converts the electrical signal to human-audible sound waves. Audio circuitry 210 also receives electrical signals converted by microphone 213 from sound waves. Audio circuitry 210 converts the electrical signal to audio data and transmits the audio data to peripherals interface 218 for processing. Audio data are retrieved from and/or transmitted to memory 202 and/or RF circuitry 208 by peripherals interface 218. In some embodiments, audio circuitry 210 also includes a headset jack (e.g., 312, FIG. 3). The headset jack provides an interface between audio circuitry 210 and removable audio input/output peripherals, such as output-only headphones or a headset with both output (e.g., a headphone for one or both ears) and input (e.g., a microphone).
I/O subsystem 206 couples input/output peripherals on device 200, such as touch screen 212 and other input control devices 216, to peripherals interface 218. I/O subsystem 206 optionally includes display controller 256, optical sensor controller 258, intensity sensor controller 259, haptic feedback controller 261, and one or more input controllers 260 for other input or control devices. The one or more input controllers 260 receive/send electrical signals from/to other input control devices 216. The other input control devices 216 optionally include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slider switches, joysticks, click wheels, and so forth. In some alternative embodiments, input controller(s) 260 are optionally coupled to any (or none) of the following: a keyboard, an infrared port, a USB port, and a pointer device such as a mouse. The one or more buttons (e.g., 308, FIG. 3) optionally include an up/down button for volume control of speaker 211 and/or microphone 213. The one or more buttons optionally include a push button (e.g., 306, FIG. 3).
A quick press of the push button disengages a lock of touch screen 212 or begins a process that uses gestures on the touch screen to unlock the device, as described in U.S. patent application Ser. No. 11/322,549, "Unlocking a Device by Performing Gestures on an Unlock Image," filed December 23, 2005, U.S. Pat. No. 7,657,849, which is hereby incorporated by reference in its entirety. A longer press of the push button (e.g., 306) turns power to device 200 on or off. The user is able to customize the functionality of one or more of the buttons. Touch screen 212 is used to implement virtual or soft buttons and one or more soft keyboards.
Touch-sensitive display 212 provides an input interface and an output interface between the device and a user. Display controller 256 receives and/or sends electrical signals from/to touch screen 212. Touch screen 212 displays visual output to the user. The visual output optionally includes graphics, text, icons, video, and any combination thereof (collectively termed "graphics"). In some embodiments, some or all of the visual output corresponds to user-interface objects.
Touch screen 212 has a touch-sensitive surface, sensor, or set of sensors that accepts input from the user based on haptic and/or tactile contact. Touch screen 212 and display controller 256 (along with any associated modules and/or sets of instructions in memory 202) detect contact (and any movement or breaking of the contact) on touch screen 212 and convert the detected contact into interaction with user-interface objects (e.g., one or more soft keys, icons, web pages, or images) that are displayed on touch screen 212. In an exemplary embodiment, a point of contact between touch screen 212 and the user corresponds to a finger of the user.
Touch screen 212 uses LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology, although other display technologies may be used in other embodiments. Touch screen 212 and display controller 256 detect contact and any movement or breaking thereof using any of a plurality of touch-sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch screen 212. In an exemplary embodiment, projected mutual capacitance sensing technology is used, such as that found in the iPhone® and iPod touch® from Apple Inc. of Cupertino, California.
A touch-sensitive display in some embodiments of touch screen 212 is analogous to the multi-touch sensitive touchpads described in the following U.S. Patents: 6,323,846 (Westerman et al.), 6,570,557 (Westerman et al.), and/or 6,677,932 (Westerman), and/or U.S. Patent Publication 2002/0015024A1, each of which is hereby incorporated by reference in its entirety. However, touch screen 212 displays visual output from device 200, whereas touch-sensitive touchpads do not provide visual output.
A touch-sensitive display in some embodiments of touch screen 212 is described in the following applications: (1) U.S. patent application Ser. No. 11/381,313, "Multipoint Touch Surface Controller," filed May 2, 2006; (2) U.S. patent application Ser. No. 10/840,862, "Multipoint Touchscreen," filed May 6, 2004; (3) U.S. patent application Ser. No. 10/903,964, "Gestures For Touch Sensitive Input Devices," filed July 30, 2004; (4) U.S. patent application Ser. No. 11/048,264, "Gestures For Touch Sensitive Input Devices," filed January 31, 2005; (5) U.S. patent application Ser. No. 11/038,590, "Mode-Based Graphical User Interfaces For Touch Sensitive Input Devices," filed January 18, 2005; (6) U.S. patent application Ser. No. 11/228,758, "Virtual Input Device Placement On A Touch Screen User Interface," filed September 16, 2005; (7) U.S. patent application Ser. No. 11/228,700, "Operation Of A Computer With A Touch Screen Interface," filed September 16, 2005; (8) U.S. patent application Ser. No. 11/228,737, "Activating Virtual Keys Of A Touch-Screen Virtual Keyboard," filed September 16, 2005; and (9) U.S. patent application Ser. No. 11/367,749, "Multi-Functional Hand-Held Device," filed March 3, 2006. All of these applications are incorporated by reference herein in their entirety.
Touch screen 212 has, for example, a video resolution in excess of 100 dpi. In some embodiments, the touch screen has a video resolution of approximately 160 dpi. The user makes contact with touch screen 212 using any suitable object or appendage, such as a stylus, a finger, and so forth. In some embodiments, the user interface is designed to work primarily with finger-based contacts and gestures, which can be less precise than stylus-based input due to the larger area of contact of a finger on the touch screen. In some embodiments, the device translates the rough finger-based input into a precise pointer/cursor position or command for performing the actions desired by the user.
In some embodiments, in addition to the touch screen, device 200 includes a touchpad (not shown) for activating or deactivating particular functions. In some embodiments, the touchpad is a touch-sensitive area of the device that, unlike the touch screen, does not display visual output. The touchpad is a touch-sensitive surface that is separate from touch screen 212, or an extension of the touch-sensitive surface formed by the touch screen.
Device 200 also includes power system 262 for powering the various components. Power system 262 includes a power management system, one or more power sources (e.g., battery, alternating current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a light-emitting diode (LED)), and any other components associated with the generation, management, and distribution of power in portable devices.
Device 200 also includes one or more optical sensors 264. FIG. 2A shows an optical sensor coupled to optical sensor controller 258 in I/O subsystem 206. Optical sensor 264 may include charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) phototransistors. Optical sensor 264 receives light from the environment, projected through one or more lenses, and converts the light to data representing an image. In conjunction with imaging module 243 (also called a camera module), optical sensor 264 captures still images or video. In some embodiments, an optical sensor is located on the back of device 200, opposite touch screen display 212 on the front of the device, so that the touch screen display is used as a viewfinder for still and/or video image acquisition. In some embodiments, an optical sensor is located on the front of the device so that the user's image is obtained for video conferencing while the user views the other video conference participants on the touch screen display. In some embodiments, the position of optical sensor 264 can be changed by the user (e.g., by rotating the lens and the sensor in the device housing) so that a single optical sensor 264 is used along with the touch screen display for both video conferencing and still and/or video image acquisition.
Device 200 optionally also includes one or more contact intensity sensors 265. FIG. 2A shows a contact intensity sensor coupled to intensity sensor controller 259 in I/O subsystem 206. Contact intensity sensor 265 optionally includes one or more piezoresistive strain gauges, capacitive force sensors, electric force sensors, piezoelectric force sensors, optical force sensors, capacitive touch-sensitive surfaces, or other intensity sensors (e.g., sensors used to measure the force (or pressure) of a contact on a touch-sensitive surface). Contact intensity sensor 265 receives contact intensity information (e.g., pressure information or a proxy for pressure information) from the environment. In some embodiments, at least one contact intensity sensor is collocated with, or proximate to, a touch-sensitive surface (e.g., touch-sensitive display system 212). In some embodiments, at least one contact intensity sensor is located on the back of device 200, opposite touch screen display 212, which is located on the front of device 200.
Device 200 also includes one or more proximity sensors 266. FIG. 2A shows proximity sensor 266 coupled to peripherals interface 218. Alternatively, proximity sensor 266 is coupled to input controller 260 in I/O subsystem 206. Proximity sensor 266 performs as described in the following U.S. patent applications: Ser. No. 11/241,839, "Proximity Detector In Handheld Device"; Ser. No. 11/240,788, "Proximity Detector In Handheld Device"; Ser. No. 11/620,702, "Using Ambient Light Sensor To Augment Proximity Sensor Output"; Ser. No. 11/586,862, "Automated Response To And Sensing Of User Activity In Portable Devices"; and Ser. No. 11/638,251, "Methods And Systems For Automatic Configuration Of Peripherals," which are hereby incorporated by reference in their entirety. In some embodiments, the proximity sensor turns off and disables touch screen 212 when the multifunction device is placed near the user's ear (e.g., when the user is making a phone call).
Device 200 optionally also includes one or more tactile output generators 267. FIG. 2A shows a tactile output generator coupled to haptic feedback controller 261 in I/O subsystem 206. Tactile output generator 267 optionally includes one or more electroacoustic devices such as speakers or other audio components, and/or electromechanical devices that convert energy into linear motion such as a motor, solenoid, electroactive polymer, piezoelectric actuator, electrostatic actuator, or other tactile output generating component (e.g., a component that converts electrical signals into tactile outputs on the device). Contact intensity sensor 265 receives tactile feedback generation instructions from haptic feedback module 233 and generates tactile outputs on device 200 that are capable of being sensed by a user of device 200. In some embodiments, at least one tactile output generator is collocated with, or proximate to, a touch-sensitive surface (e.g., touch-sensitive display system 212) and, optionally, generates a tactile output by moving the touch-sensitive surface vertically (e.g., in/out of a surface of device 200) or laterally (e.g., back and forth in the same plane as a surface of device 200). In some embodiments, at least one tactile output generator sensor is located on the back of device 200, opposite touch screen display 212, which is located on the front of device 200.
Device 200 may also include one or more accelerometers 268. FIG. 2A shows accelerometer 268 coupled to peripherals interface 218. Alternatively, accelerometer 268 is coupled to input controller 260 in I/O subsystem 206. Accelerometer 268 performs as described in U.S. Patent Publication No. 20050190059, "Acceleration-based Theft Detection System for Portable Electronic Devices," and U.S. Patent Publication No. 20060017692, "Methods And Apparatuses For Operating A Portable Device Based On An Accelerometer," both of which are incorporated by reference herein in their entirety. In some embodiments, information is displayed on the touch screen display in a portrait view or a landscape view based on an analysis of data received from the one or more accelerometers. Device 200 optionally includes, in addition to accelerometer(s) 268, a magnetometer (not shown) and a GPS (or GLONASS or other global navigation system) receiver (not shown) for obtaining information concerning the location and orientation (e.g., portrait or landscape) of device 200.
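The portrait/landscape decision described above can be illustrated with a minimal sketch. The specification does not prescribe how the accelerometer data are analyzed; the dominant-axis heuristic and the function name below are illustrative assumptions only.

```python
def display_orientation(ax: float, ay: float) -> str:
    """Classify display orientation from one accelerometer reading
    (ax across the screen, ay along it, in m/s^2). When the device is
    held still, gravity dominates the reading, so the axis with the
    larger magnitude indicates how the device is being held.
    Heuristic for illustration, not the patented analysis."""
    return "portrait" if abs(ay) >= abs(ax) else "landscape"

# Device upright: gravity acts mostly along the y axis.
assert display_orientation(0.1, -9.8) == "portrait"
# Device on its side: gravity acts mostly along the x axis.
assert display_orientation(9.8, 0.2) == "landscape"
```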
In some embodiments, the software components stored in memory 202 include operating system 226, communication module (or set of instructions) 228, contact/motion module (or set of instructions) 230, graphics module (or set of instructions) 232, text input module (or set of instructions) 234, Global Positioning System (GPS) module (or set of instructions) 235, digital assistant client module 229, and applications (or sets of instructions) 236. Further, memory 202 stores data and models, such as user data and models 231. Furthermore, in some embodiments, memory 202 (FIG. 2A) or 470 (FIG. 4) stores device/global internal state 257, as shown in FIGS. 2A and 4. Device/global internal state 257 includes one or more of the following: active application state, indicating which applications, if any, are currently active; display state, indicating what applications, views, or other information occupy various regions of touch screen display 212; sensor state, including information obtained from the device's various sensors and input control devices 216; and location information concerning the device's location and/or attitude.
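The four categories enumerated for device/global internal state 257 map naturally onto a small record type. The field names and types below are illustrative assumptions; the specification lists the categories, not a data layout.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class DeviceGlobalState:
    """Sketch of device/global internal state 257 (illustrative only)."""
    active_application: Optional[str] = None           # which app, if any, is active
    display_state: dict = field(default_factory=dict)  # view/app per display region
    sensor_state: dict = field(default_factory=dict)   # latest reading per sensor
    location: Optional[Tuple[float, float]] = None     # (latitude, longitude)
    attitude: Optional[str] = None                     # e.g., "portrait" / "landscape"

state = DeviceGlobalState(active_application="browser")
state.sensor_state["proximity"] = False
assert state.active_application == "browser"
```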
Operating system 226 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, iOS, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components.
Communication module 228 facilitates communication with other devices over one or more external ports 224 and also includes various software components for handling data received by RF circuitry 208 and/or external port 224. External port 224 (e.g., Universal Serial Bus (USB), FIREWIRE, etc.) is adapted for coupling directly to other devices or indirectly over a network (e.g., the Internet, wireless LAN, etc.). In some embodiments, the external port is a multi-pin (e.g., 30-pin) connector that is the same as, or similar to and/or compatible with, the 30-pin connector used on iPod® (trademark of Apple Inc.) devices.
Contact/motion module 230 optionally detects contact with touch screen 212 (in conjunction with display controller 256) and other touch-sensitive devices (e.g., a touchpad or physical click wheel). Contact/motion module 230 includes various software components for performing various operations related to detection of contact, such as determining if contact has occurred (e.g., detecting a finger-down event), determining an intensity of the contact (e.g., the force or pressure of the contact, or a substitute for the force or pressure of the contact), determining if there is movement of the contact and tracking the movement across the touch-sensitive surface (e.g., detecting one or more finger-dragging events), and determining if the contact has ceased (e.g., detecting a finger-up event or a break in contact). Contact/motion module 230 receives contact data from the touch-sensitive surface. Determining movement of the point of contact, which is represented by a series of contact data, optionally includes determining speed (magnitude), velocity (magnitude and direction), and/or acceleration (a change in magnitude and/or direction) of the point of contact. These operations are, optionally, applied to single contacts (e.g., one-finger contacts) or to multiple simultaneous contacts (e.g., "multitouch"/multiple-finger contacts). In some embodiments, contact/motion module 230 and display controller 256 detect contact on a touchpad.
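Deriving speed and direction from "a series of contact data" reduces to finite differences over successive touch samples. The following is a minimal sketch under that assumption; the sample format and function name are not part of the specification.

```python
import math

def contact_kinematics(samples):
    """samples: list of (t, x, y) touch samples in time order.
    Returns, per interval, (speed, direction) of the point of contact:
    speed is the magnitude of velocity, direction its angle in radians.
    Illustrative finite-difference sketch only."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        out.append((math.hypot(vx, vy), math.atan2(vy, vx)))
    return out

# 5 px of travel over 10 ms -> speed of 500 px/s for that interval.
speeds = contact_kinematics([(0.00, 0, 0), (0.01, 3, 4)])
assert abs(speeds[0][0] - 500.0) < 1e-9
```

Acceleration (a change in magnitude and/or direction) would follow by differencing these per-interval velocities again.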
In some embodiments, contact/motion module 230 uses a set of one or more intensity thresholds to determine whether an operation has been performed by a user (e.g., to determine whether a user has "clicked" on an icon). In some embodiments, at least a subset of the intensity thresholds is determined in accordance with software parameters (e.g., the intensity thresholds are not determined by the activation thresholds of particular physical actuators and can be adjusted without changing the physical hardware of device 200). For example, a mouse "click" threshold of a trackpad or touch screen display can be set to any of a large range of predefined threshold values without changing the trackpad or touch screen display hardware. Additionally, in some implementations, a user of the device is provided with software settings for adjusting one or more of the set of intensity thresholds (e.g., by adjusting individual intensity thresholds and/or by adjusting a plurality of intensity thresholds at once with a system-level "intensity" parameter).
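The key point above is that the click threshold is a software parameter rather than a hardware property. A minimal sketch of that idea, with illustrative names and units:

```python
class IntensityThresholds:
    """Click detection against a software-adjustable intensity threshold.
    The threshold is a plain parameter, so it can be retuned (e.g., from
    a settings UI) without any change to the physical hardware.
    Illustrative sketch only; units are arbitrary intensity units."""

    def __init__(self, click_threshold: float = 1.0):
        self.click_threshold = click_threshold

    def set_click_threshold(self, value: float) -> None:
        self.click_threshold = value  # adjusted purely in software

    def is_click(self, contact_intensity: float) -> bool:
        return contact_intensity >= self.click_threshold

t = IntensityThresholds()
assert not t.is_click(0.5)       # below the default threshold
t.set_click_threshold(0.4)       # user lowers the threshold in settings
assert t.is_click(0.5)           # the same press now registers as a click
```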
Contact/motion module 230 optionally detects a gesture input by a user. Different gestures on the touch-sensitive surface have different contact patterns (e.g., different motions, timings, and/or intensities of detected contacts). Thus, a gesture is, optionally, detected by detecting a particular contact pattern. For example, detecting a finger tap gesture includes detecting a finger-down event followed by detecting a finger-up (liftoff) event at the same position (or substantially the same position) as the finger-down event (e.g., at the position of an icon). As another example, detecting a finger swipe gesture on the touch-sensitive surface includes detecting a finger-down event, followed by detecting one or more finger-dragging events, and subsequently followed by detecting a finger-up (liftoff) event.
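The tap and swipe patterns just described can be sketched as pattern matching over an event sequence. The event encoding, the `slop` tolerance for "substantially the same position," and the function name are all illustrative assumptions.

```python
import math

def classify_gesture(events, slop: float = 10.0) -> str:
    """events: [('down', x, y), ('drag', x, y)..., ('up', x, y)].
    Tap: finger-up at (substantially) the finger-down position, no drags.
    Swipe: finger-down, one or more drags, then finger-up.
    Illustrative contact-pattern matcher, not the patented detector."""
    (_, x0, y0), (_, x1, y1) = events[0], events[-1]
    moved = math.hypot(x1 - x0, y1 - y0)
    has_drag = any(kind == "drag" for kind, *_ in events[1:-1])
    if moved <= slop and not has_drag:
        return "tap"
    return "swipe" if has_drag else "unknown"

assert classify_gesture([("down", 100, 100), ("up", 102, 101)]) == "tap"
assert classify_gesture([("down", 0, 0), ("drag", 40, 0), ("up", 120, 0)]) == "swipe"
```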
Graphics module 232 includes various known software components for rendering and displaying graphics on touch screen 212 or other display, including components for changing the visual impact (e.g., brightness, transparency, saturation, contrast, or other visual property) of graphics that are displayed. As used herein, the term "graphics" includes any object that can be displayed to a user, including, without limitation, text, web pages, icons (such as user-interface objects including soft keys), digital images, videos, animations, and the like.
In some embodiments, graphics module 232 stores data representing graphics to be used. Each graphic is, optionally, assigned a corresponding code. Graphics module 232 receives, from applications etc., one or more codes specifying graphics to be displayed, along with, if necessary, coordinate data and other graphic property data, and then generates screen image data to output to display controller 256.
Haptic feedback module 233 includes various software components for generating instructions used by tactile output generator(s) 267 to produce tactile outputs at one or more locations on device 200 in response to user interactions with device 200.
Text input module 234, which is, in some examples, a component of graphics module 232, provides soft keyboards for entering text in various applications (e.g., contacts 237, e-mail 240, IM 241, browser 247, and any other application that needs text input).
GPS module 235 determines the location of the device and provides this information for use in various applications (e.g., to telephone 238 for use in location-based dialing; to camera 243 as picture/video metadata; and to applications that provide location-based services, such as a weather widget, a local yellow-pages widget, and map/navigation widgets).
Digital assistant client module 229 includes various client-side digital assistant instructions to provide the client-side functionalities of the digital assistant. For example, digital assistant client module 229 is capable of accepting voice input (e.g., speech input), text input, touch input, and/or gestural input through various user interfaces (e.g., microphone 213, accelerometer(s) 268, touch-sensitive display system 212, optical sensor(s) 229, other input control devices 216, etc.) of portable multifunction device 200. Digital assistant client module 229 is also capable of providing output in audio (e.g., speech output), visual, and/or tactile forms through various output interfaces (e.g., speaker 211, touch-sensitive display system 212, tactile output generator(s) 267, etc.) of portable multifunction device 200. For example, output is provided as voice, sound, alerts, text messages, menus, graphics, videos, animations, vibrations, and/or combinations of two or more of the above. During operation, digital assistant client module 229 communicates with DA server 106 using RF circuitry 208.
User data and models 231 include various data associated with the user (e.g., user-specific vocabulary data, user preference data, user-specified name pronunciations, data from the user's electronic address book, to-do lists, shopping lists, etc.) to provide the client-side functionalities of the digital assistant. Further, user data and models 231 include various models (e.g., speech recognition models, statistical language models, natural language processing models, ontology, task flow models, service models, etc.) for processing user input and determining user intent.
In some examples, digital assistant client module 229 utilizes the various sensors, subsystems, and peripheral devices of portable multifunction device 200 to gather additional information from the surrounding environment of portable multifunction device 200 to establish a context associated with a user, the current user interaction, and/or the current user input. In some examples, digital assistant client module 229 provides the contextual information, or a subset thereof, with the user input to DA server 106 to help infer the user's intent. In some examples, the digital assistant also uses the contextual information to determine how to prepare and deliver outputs to the user. Contextual information is referred to as context data.
In some examples, the contextual information that accompanies the user input includes sensor information, e.g., lighting, ambient noise, ambient temperature, images or videos of the surrounding environment, etc. In some examples, the contextual information can also include the physical state of the device, e.g., device orientation, device location, device temperature, power level, speed, acceleration, motion patterns, cellular signal strength, etc. In some examples, information related to the software state of portable multifunction device 200, e.g., running processes, installed programs, past and present network activities, background services, error logs, resource usage, etc., is also provided to DA server 106 as contextual information associated with a user input.
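The three categories of context data enumerated above (sensor readings, device physical state, software state) can be pictured as a simple payload assembler. The field names and wire format below are illustrative assumptions; the specification lists the categories, not a concrete format.

```python
def build_context(sensors: dict, device: dict, app_state: dict) -> dict:
    """Assemble a context payload to accompany a user input, grouping the
    three kinds of contextual information the text enumerates.
    Field names are illustrative, not a defined protocol."""
    return {
        # sensor information: lighting, ambient noise, temperature, ...
        "sensor": {k: sensors[k] for k in ("lighting", "ambient_noise",
                                           "temperature") if k in sensors},
        # physical state: orientation, location, power level, signal, ...
        "device": {k: device[k] for k in ("orientation", "location",
                                          "battery", "signal") if k in device},
        # software state: running processes, network activity, logs, ...
        "software": app_state,
    }

ctx = build_context({"lighting": "dim"}, {"battery": 0.8}, {"foreground": "music"})
assert ctx["sensor"] == {"lighting": "dim"}
assert ctx["device"] == {"battery": 0.8}
```

In practice only a subset of this payload would be sent, as the preceding paragraph notes ("the contextual information, or a subset thereof").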
In some examples, digital assistant client module 229 selectively provides information (e.g., user data 231) stored on portable multifunction device 200 in response to requests from DA server 106. In some examples, digital assistant client module 229 also elicits additional input from the user via a natural language dialogue or other user interfaces upon request by DA server 106. Digital assistant client module 229 passes the additional input to DA server 106 to help DA server 106 in intent deduction and/or fulfillment of the user's intent expressed in the user request.
A more detailed description of the digital assistant is provided below with reference to FIGS. 7A-C. It should be recognized that digital assistant client module 229 can include any number of the sub-modules of digital assistant module 726 described below.
Applications 236 include the following modules (or sets of instructions), or a subset or superset thereof:
Contacts module 237 (sometimes called an address book or contact list);
Telephone module 238;
Video conference module 239;
E-mail client module 240;
Instant messaging (IM) module 241;
Workout support module 242;
Camera module 243 for still and/or video images;
Image management module 244;
Video player module;
Music player module;
Browser module 247;
Calendar module 248;
Widget modules 249, which in some examples include one or more of: weather widget 249-1, stocks widget 249-2, calculator widget 249-3, alarm clock widget 249-4, dictionary widget 249-5, and other widgets obtained by the user, as well as user-created widgets 249-6;
Widget creator module 250 for making user-created widgets 249-6;
Search module 251;
Video and music player module 252, which merges the video player module and the music player module;
Notes module 253;
Map module 254; and/or
Online video module 255.
Examples of other applications 236 that are stored in memory 202 include other word processing applications, other image editing applications, drawing applications, presentation applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, contacts module 237 is used to manage an address book or contact list (e.g., stored in application internal state 292 of contacts module 237 in memory 202 or memory 470), including: adding name(s) to the address book; deleting name(s) from the address book; associating telephone number(s), e-mail address(es), physical address(es), or other information with a name; associating an image with a name; categorizing and sorting names; providing telephone numbers or e-mail addresses to initiate and/or facilitate communications by telephone 238, video conference 239, e-mail 240, or IM 241; and so forth.
In conjunction with RF circuitry 208, audio circuitry 210, speaker 211, microphone 213, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, telephone module 238 is used to enter a sequence of characters corresponding to a telephone number, access one or more telephone numbers in contacts module 237, modify a telephone number that has been entered, dial a respective telephone number, conduct a conversation, and disconnect or hang up when the conversation is completed. As noted above, the wireless communication uses any of a plurality of communications standards, protocols, and technologies.
In conjunction with RF circuitry 208, audio circuitry 210, speaker 211, microphone 213, touch screen 212, display controller 256, optical sensor 264, optical sensor controller 258, contact/motion module 230, graphics module 232, text input module 234, contacts module 237, and telephone module 238, video conference module 239 includes executable instructions to initiate, conduct, and terminate a video conference between a user and one or more other participants in accordance with user instructions.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, e-mail client module 240 includes executable instructions to create, send, receive, and manage e-mail in response to user instructions. In conjunction with image management module 244, e-mail client module 240 makes it very easy to create and send e-mails with still or video images taken with camera module 243.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, instant messaging module 241 includes executable instructions to enter a sequence of characters corresponding to an instant message, to modify previously entered characters, to transmit a respective instant message (for example, using a Short Message Service (SMS) or Multimedia Message Service (MMS) protocol for telephony-based instant messages or using XMPP, SIMPLE, or IMPS for Internet-based instant messages), to receive instant messages, and to view received instant messages. In some embodiments, transmitted and/or received instant messages include graphics, photos, audio files, video files, and/or other attachments as supported in an MMS and/or an Enhanced Messaging Service (EMS). As used herein, "instant messaging" refers to both telephony-based messages (e.g., messages sent using SMS or MMS) and Internet-based messages (e.g., messages sent using XMPP, SIMPLE, or IMPS).
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, map module 254, and the music player module, workout support module 242 includes executable instructions to create workouts (e.g., with time, distance, and/or calorie-burning goals); communicate with workout sensors (sports devices); receive workout sensor data; calibrate sensors used to monitor a workout; select and play music for a workout; and display, store, and transmit workout data.
In conjunction with touch screen 212, display controller 256, optical sensor(s) 264, optical sensor controller 258, contact/motion module 230, graphics module 232, and image management module 244, camera module 243 includes executable instructions to capture still images or video (including a video stream) and store them into memory 202, modify characteristics of a still image or video, or delete a still image or video from memory 202.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, and camera module 243, image management module 244 includes executable instructions to arrange, modify (e.g., edit), or otherwise manipulate, label, delete, present (e.g., in a digital slide show or album), and store still and/or video images.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, browser module 247 includes executable instructions to browse the Internet in accordance with user instructions, including searching, linking to, receiving, and displaying web pages or portions thereof, as well as attachments and other files linked to web pages.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, e-mail client module 240, and browser module 247, calendar module 248 includes executable instructions to create, display, modify, and store calendars and data associated with calendars (e.g., calendar entries, to-do lists, etc.) in accordance with user instructions.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, and browser module 247, widget modules 249 are mini-applications that can be downloaded and used by a user (e.g., weather widget 249-1, stocks widget 249-2, calculator widget 249-3, alarm clock widget 249-4, and dictionary widget 249-5) or created by the user (e.g., user-created widget 249-6). In some embodiments, a widget includes an HTML (Hypertext Markup Language) file, a CSS (Cascading Style Sheets) file, and a JavaScript file. In some embodiments, a widget includes an XML (Extensible Markup Language) file and a JavaScript file (e.g., Yahoo! Widgets).
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, and browser module 247, widget creator module 250 is used by a user to create widgets (e.g., turning a user-specified portion of a web page into a widget).
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, search module 251 includes executable instructions to search for text, music, sound, image, video, and/or other files in memory 202 that match one or more search criteria (e.g., one or more user-specified search terms) in accordance with user instructions.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuitry 210, speaker 211, RF circuitry 208, and browser module 247, video and music player module 252 includes executable instructions that allow the user to download and play back recorded music and other sound files stored in one or more file formats, such as MP3 or AAC files, and executable instructions to display, present, or otherwise play back videos (e.g., on touch screen 212 or on an external display connected via external port 224). In some embodiments, device 200 optionally includes the functionality of an MP3 player, such as an iPod (trademark of Apple Inc.).
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, notes module 253 includes executable instructions to create and manage notes, to-do lists, and the like in accordance with user instructions.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, and browser module 247, map module 254 is used to receive, display, modify, and store maps and data associated with maps (e.g., driving directions, data on stores and other points of interest at or near a particular location, and other location-based data) in accordance with user instructions.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuitry 210, speaker 211, RF circuitry 208, text input module 234, e-mail client module 240, and browser module 247, online video module 255 includes instructions that allow the user to access, browse, receive (e.g., by streaming and/or download), play back (e.g., on the touch screen or on an external display connected via external port 224), send an e-mail with a link to a particular online video, and otherwise manage online videos in one or more file formats, such as H.264. In some embodiments, instant messaging module 241, rather than e-mail client module 240, is used to send a link to a particular online video. Additional description of the online video application can be found in U.S. Provisional Patent Application No. 60/936,562, "Portable Multifunction Device, Method, and Graphical User Interface for Playing Online Videos," filed Jun. 20, 2007, and U.S. patent application Ser. No. 11/968,067, "Portable Multifunction Device, Method, and Graphical User Interface for Playing Online Videos," filed Dec. 31, 2007, the contents of both of which are hereby incorporated by reference in their entirety.
Each of the above-identified modules and applications corresponds to a set of executable instructions for performing one or more of the functions described above and the methods described in this application (e.g., the computer-implemented methods and other information processing methods described herein). These modules (e.g., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules can be combined or otherwise rearranged in various embodiments. For example, the video player module can be combined with the music player module into a single module (e.g., video and music player module 252, FIG. 2A). In some embodiments, memory 202 stores a subset of the modules and data structures identified above. Furthermore, memory 202 stores additional modules and data structures not described above.
In some embodiments, device 200 is a device where operation of a predefined set of functions on the device is performed exclusively through a touch screen and/or a touchpad. By using a touch screen and/or a touchpad as the primary input control device for operation of device 200, the number of physical input control devices (such as push buttons, dials, and the like) on device 200 is optionally reduced.
The predefined set of functions that are performed exclusively through a touch screen and/or a touchpad optionally includes navigation between user interfaces. In some embodiments, the touchpad, when touched by the user, navigates device 200 from any user interface that is displayed on device 200 to a main, home, or root menu. In such embodiments, a "menu button" is implemented using a touchpad. In some other embodiments, the menu button is a physical push button or other physical input control device instead of a touchpad.
FIG. 2B is a block diagram illustrating exemplary components for event handling in accordance with some embodiments. In some embodiments, memory 202 (FIG. 2A) or memory 470 (FIG. 4) includes event sorter 270 (e.g., in operating system 226) and a respective application 236-1 (e.g., any of the aforementioned applications 237-251, 255, 480-490).
Event sorter 270 receives event information and determines the application 236-1, and the application view 291 of application 236-1, to which to deliver the event information. Event sorter 270 includes event monitor 271 and event dispatcher module 274. In some embodiments, application 236-1 includes application internal state 292, which indicates the current application view(s) displayed on touch-sensitive display 212 when the application is active or executing. In some embodiments, device/global internal state 257 is used by event sorter 270 to determine which application(s) is (are) currently active, and application internal state 292 is used by event sorter 270 to determine the application views 291 to which to deliver event information.
In some embodiments, application internal state 292 includes additional information, such as one or more of: resume information to be used when application 236-1 resumes execution, user interface state information that indicates information being displayed or that is ready for display by application 236-1, a state queue for enabling the user to go back to a prior state or view of application 236-1, and a redo/undo queue of previous actions taken by the user.
Event monitor 271 receives event information from peripherals interface 218. Event information includes information about a sub-event (e.g., a user touch on touch-sensitive display 212, as part of a multi-touch gesture). Peripherals interface 218 transmits information it receives from I/O subsystem 206 or a sensor, such as proximity sensor 266, accelerometer(s) 268, and/or microphone 213 (through audio circuitry 210). Information that peripherals interface 218 receives from I/O subsystem 206 includes information from touch-sensitive display 212 or a touch-sensitive surface.
In some embodiments, event monitor 271 sends requests to peripherals interface 218 at predetermined intervals. In response, peripherals interface 218 transmits event information. In other embodiments, peripherals interface 218 transmits event information only when there is a significant event (e.g., receiving an input above a predetermined noise threshold and/or for more than a predetermined duration).
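The "significant event" filtering described above can be illustrated with a brief sketch. This is a minimal illustration under stated assumptions, not the actual implementation; the function names and threshold values (significant, filter_events, noise_threshold) are invented for the example.

```python
# Illustrative sketch: forward an input only when it exceeds a noise
# threshold AND persists for at least a predetermined duration.
# Names and numeric thresholds are hypothetical.

def significant(amplitude, duration, noise_threshold=0.2, min_duration=0.05):
    """True for inputs above the noise floor that last long enough."""
    return amplitude > noise_threshold and duration >= min_duration

def filter_events(raw_events):
    """raw_events: iterable of (amplitude, duration, payload) tuples.
    Returns only the payloads of significant events, in order."""
    return [payload for amplitude, duration, payload in raw_events
            if significant(amplitude, duration)]
```

Under this sketch, a brief high-amplitude blip or a long sub-threshold rumble is suppressed, so downstream event handling only ever sees deliberate inputs.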
In some embodiments, event sorter 270 also includes a hit view determination module 272 and/or an active event recognizer determination module 273.
Hit view determination module 272 provides software procedures for determining where a sub-event has taken place within one or more views when touch-sensitive display 212 displays more than one view. Views are made up of controls and other elements that a user can see on the display.
Another aspect of the user interface associated with an application is a set of views, sometimes herein called application views or user interface windows, in which information is displayed and touch-based gestures occur. The application views (of a respective application) in which a touch is detected correspond to programmatic levels within a programmatic or view hierarchy of the application. For example, the lowest level view in which a touch is detected is called the hit view, and the set of events that are recognized as proper inputs is determined based, at least in part, on the hit view of the initial touch that begins a touch-based gesture.
Hit view determination module 272 receives information related to sub-events of a touch-based gesture. When an application has multiple views organized in a hierarchy, hit view determination module 272 identifies the hit view as the lowest view in the hierarchy that should handle the sub-event. In most circumstances, the hit view is the lowest level view in which an initiating sub-event occurs (e.g., the first sub-event in the sequence of sub-events that form an event or potential event). Once the hit view is identified by hit view determination module 272, the hit view typically receives all sub-events related to the same touch or input source for which it was identified as the hit view.
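The hit-view search described above — walking a view hierarchy to find the lowest view containing the initial touch — can be sketched as follows. All class and function names (View, hit_view, contains_point) are illustrative assumptions for this example, not the actual module's API.

```python
# Illustrative sketch of hit-view determination: recurse into subviews
# so that the deepest (lowest-level) view containing the point wins.

class View:
    def __init__(self, frame, subviews=()):
        self.frame = frame              # (x, y, width, height)
        self.subviews = list(subviews)  # ordered back-to-front

    def contains_point(self, x, y):
        fx, fy, fw, fh = self.frame
        return fx <= x < fx + fw and fy <= y < fy + fh

def hit_view(view, x, y):
    """Return the lowest view in the hierarchy containing (x, y), or None."""
    if not view.contains_point(x, y):
        return None
    # Check subviews front-to-back so the topmost drawn subview is tried first.
    for sub in reversed(view.subviews):
        found = hit_view(sub, x, y)
        if found is not None:
            return found
    return view  # no subview contains the point, so this view is the hit view
```

Consistent with the text, once a view is returned by this search, subsequent sub-events of the same touch would be routed to it rather than re-running the search per sub-event.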
Active event recognizer determination module 273 determines which view or views within a view hierarchy should receive a particular sequence of sub-events. In some embodiments, active event recognizer determination module 273 determines that only the hit view should receive a particular sequence of sub-events. In other embodiments, active event recognizer determination module 273 determines that all views that include the physical location of a sub-event are actively involved views, and therefore determines that all actively involved views should receive a particular sequence of sub-events. In other embodiments, even if touch sub-events are entirely confined to the area associated with one particular view, views higher in the hierarchy still remain as actively involved views.
Event dispatcher module 274 dispatches the event information to an event recognizer (e.g., event recognizer 280). In embodiments including active event recognizer determination module 273, event dispatcher module 274 delivers the event information to an event recognizer determined by active event recognizer determination module 273. In some embodiments, event dispatcher module 274 stores the event information in an event queue, from which it is retrieved by a respective event receiver 282.
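The queuing behavior described above — the dispatcher storing event information until a respective event receiver retrieves it — can be sketched minimally as follows. The class and method names (EventDispatcher, dispatch, drain) are assumptions made for illustration.

```python
# Illustrative sketch: a dispatcher queues (recognizer, event_info) pairs
# for the currently active recognizers; receivers later drain the queue.

from collections import deque

class EventDispatcher:
    def __init__(self):
        self.queue = deque()

    def dispatch(self, event_info, recognizers):
        """Queue the event information once per active recognizer."""
        for recognizer in recognizers:
            self.queue.append((recognizer, event_info))

    def drain(self):
        """Retrieve all queued entries in FIFO order, emptying the queue."""
        delivered = []
        while self.queue:
            delivered.append(self.queue.popleft())
        return delivered
```

Decoupling dispatch from retrieval in this way lets the sorter keep accepting sub-events while recognizers process earlier ones at their own pace.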
In some embodiments, operating system 226 includes event sorter 270. Alternatively, application 236-1 includes event sorter 270. In yet other embodiments, event sorter 270 is a stand-alone module, or a part of another module stored in memory 202, such as contact/motion module 230.
In some embodiments, application 236-1 includes a plurality of event handlers 290 and one or more application views 291, each of which includes instructions for handling touch events that occur within a respective view of the application's user interface. Each application view 291 of application 236-1 includes one or more event recognizers 280. Typically, a respective application view 291 includes a plurality of event recognizers 280. In other embodiments, one or more of event recognizers 280 are part of a separate module, such as a user interface kit (not shown) or a higher level object from which application 236-1 inherits methods and other properties. In some embodiments, a respective event handler 290 includes one or more of: data updater 276, object updater 277, GUI updater 278, and/or event data 279 received from event sorter 270. Event handler 290 utilizes or calls data updater 276, object updater 277, or GUI updater 278 to update application internal state 292. Alternatively, one or more of application views 291 include one or more respective event handlers 290. Also, in some embodiments, one or more of data updater 276, object updater 277, and GUI updater 278 are included in a respective application view 291.
A respective event recognizer 280 receives event information (e.g., event data 279) from event sorter 270 and identifies an event from the event information. Event recognizer 280 includes event receiver 282 and event comparator 284. In some embodiments, event recognizer 280 also includes at least a subset of: metadata 283 and event delivery instructions 288 (which include sub-event delivery instructions).
Event receiver 282 receives event information from event sorter 270. The event information includes information about a sub-event, for example, a touch or a touch movement. Depending on the sub-event, the event information also includes additional information, such as the location of the sub-event. When the sub-event concerns motion of a touch, the event information also includes the speed and direction of the sub-event. In some embodiments, events include rotation of the device from one orientation to another (e.g., from a portrait orientation to a landscape orientation, or vice versa), and the event information includes corresponding information about the current orientation (also called device attitude) of the device.
Event comparator 284 compares the event information to predefined event or sub-event definitions and, based on the comparison, determines an event or sub-event, or determines or updates the state of an event or sub-event. In some embodiments, event comparator 284 includes event definitions 286. Event definitions 286 contain definitions of events (e.g., predefined sequences of sub-events), for example, event 1 (287-1), event 2 (287-2), and others. In some embodiments, sub-events in an event (287) include, for example, touch begin, touch end, touch movement, touch cancellation, and multiple touching. In one example, the definition for event 1 (287-1) is a double tap on a displayed object. The double tap, for example, comprises a first touch (touch begin) on the displayed object for a predetermined phase, a first liftoff (touch end) for a predetermined phase, a second touch (touch begin) on the displayed object for a predetermined phase, and a second liftoff (touch end) for a predetermined phase. In another example, the definition for event 2 (287-2) is a dragging on a displayed object. The dragging, for example, comprises a touch (or contact) on the displayed object for a predetermined phase, a movement of the touch across touch-sensitive display 212, and liftoff of the touch (touch end). In some embodiments, the event also includes information for one or more associated event handlers 290.
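The double-tap definition above — a predefined sequence of sub-events matched one at a time — can be sketched as a small recognizer state machine. The states loosely mirror the "possible/failed/recognized" idea that runs through this discussion; the names and the simplified timing rule (a single maximum interval between sub-events instead of per-phase durations) are assumptions for illustration, not the actual implementation.

```python
# Illustrative sketch: match the sub-event stream against a predefined
# double-tap sequence, failing on a wrong sub-event or a too-long gap.

DOUBLE_TAP = ["touch_begin", "touch_end", "touch_begin", "touch_end"]

class DoubleTapRecognizer:
    def __init__(self, max_interval=0.3):
        self.max_interval = max_interval  # hypothetical timing bound (seconds)
        self.index = 0                    # next expected entry in DOUBLE_TAP
        self.last_time = None
        self.state = "possible"

    def feed(self, sub_event, timestamp):
        if self.state != "possible":
            return self.state  # a failed/recognized recognizer ignores input
        too_slow = (self.last_time is not None
                    and timestamp - self.last_time > self.max_interval)
        if too_slow or sub_event != DOUBLE_TAP[self.index]:
            self.state = "failed"  # subsequent sub-events will be disregarded
            return self.state
        self.last_time = timestamp
        self.index += 1
        if self.index == len(DOUBLE_TAP):
            self.state = "recognized"
        return self.state
```

A drag definition would differ only in its expected sequence (touch begin, one or more movements, touch end), which is why the text describes event definitions as data (sequences) consulted by a shared comparator rather than as separate code paths.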
In some embodiments, event definition 287 includes a definition of an event for a respective user-interface object. In some embodiments, event comparator 284 performs a hit test to determine which user-interface object is associated with a sub-event. For example, in an application view in which three user-interface objects are displayed on touch-sensitive display 212, when a touch is detected on touch-sensitive display 212, event comparator 284 performs a hit test to determine which of the three user-interface objects is associated with the touch (sub-event). If each displayed object is associated with a respective event handler 290, the event comparator uses the result of the hit test to determine which event handler 290 should be activated. For example, event comparator 284 selects the event handler associated with the sub-event and the object triggering the hit test.
In some embodiments, the definition for a respective event (287) also includes delayed actions that delay delivery of the event information until after it has been determined whether the sequence of sub-events does or does not correspond to the event recognizer's event type.
When a respective event recognizer 280 determines that the series of sub-events does not match any of the events in event definitions 286, the respective event recognizer 280 enters an event impossible, event failed, or event ended state, after which it disregards subsequent sub-events of the touch-based gesture. In this situation, other event recognizers, if any, that remain active for the hit view continue to track and process sub-events of the ongoing touch-based gesture.
In some embodiments, a respective event recognizer 280 includes metadata 283 with configurable properties, flags, and/or lists that indicate how the event delivery system should perform sub-event delivery to actively involved event recognizers. In some embodiments, metadata 283 includes configurable properties, flags, and/or lists that indicate how event recognizers interact, or are enabled to interact, with one another. In some embodiments, metadata 283 includes configurable properties, flags, and/or lists that indicate whether sub-events are delivered to varying levels in the view or programmatic hierarchy.
In some embodiments, a respective event recognizer 280 activates event handler 290 associated with an event when one or more particular sub-events of the event are recognized. In some embodiments, the respective event recognizer 280 delivers event information associated with the event to event handler 290. Activating an event handler 290 is distinct from sending (and deferring sending of) sub-events to a respective hit view. In some embodiments, event recognizer 280 throws a flag associated with the recognized event, and event handler 290 associated with the flag catches the flag and performs a predefined process.
In some embodiments, event delivery instructions 288 include sub-event delivery instructions that deliver event information about a sub-event without activating an event handler. Instead, the sub-event delivery instructions deliver event information to event handlers associated with the series of sub-events or to actively involved views. Event handlers associated with the series of sub-events or with actively involved views receive the event information and perform a predetermined process.
In some embodiments, data updater 276 creates and updates data used in application 236-1. For example, data updater 276 updates the telephone number used in contacts module 237, or stores a video file used in the video player module. In some embodiments, object updater 277 creates and updates objects used in application 236-1. For example, object updater 277 creates a new user-interface object or updates the position of a user-interface object. GUI updater 278 updates the GUI. For example, GUI updater 278 prepares display information and sends it to graphics module 232 for display on a touch-sensitive display.
In some embodiments, event handler(s) 290 includes, or has access to, data updater 276, object updater 277, and GUI updater 278. In some embodiments, data updater 276, object updater 277, and GUI updater 278 are included in a single module of a respective application 236-1 or application view 291. In other embodiments, they are included in two or more software modules.
It shall be understood that the foregoing discussion regarding event handling of user touches on touch-sensitive displays also applies to other forms of user inputs for operating multifunction device 200 with input devices, not all of which are initiated on touch screens. For example, mouse movement and mouse button presses, optionally coordinated with single or multiple keyboard presses or holds; contact movements such as taps, drags, scrolls, etc. on touchpads; pen stylus inputs; movement of the device; oral instructions; detected eye movements; biometric inputs; and/or any combination thereof are optionally utilized as inputs corresponding to sub-events that define an event to be recognized.
FIG. 3 illustrates portable multifunction device 200 having a touch screen 212 in accordance with some embodiments. The touch screen optionally displays one or more graphics within user interface (UI) 300. In this embodiment, as well as others described below, a user is enabled to select one or more of the graphics by making a gesture on the graphics, for example, with one or more fingers 302 (not drawn to scale in the figure) or one or more styluses 303 (not drawn to scale in the figure). In some embodiments, selection of one or more graphics occurs when the user breaks contact with the one or more graphics. In some embodiments, the gesture optionally includes one or more taps, one or more swipes (from left to right, right to left, upward, and/or downward), and/or a rolling of a finger (from right to left, left to right, upward, and/or downward) that has made contact with device 200. In some implementations or circumstances, inadvertent contact with a graphic does not select the graphic. For example, a swipe gesture that sweeps over an application icon optionally does not select the corresponding application when the gesture corresponding to selection is a tap.
Device 200 also includes one or more physical buttons, such as a "home" or menu button 304. As described previously, menu button 304 is used to navigate to any application 236 in a set of applications that are executed on device 200. Alternatively, in some embodiments, the menu button is implemented as a soft key in a GUI displayed on touch screen 212.
In some embodiments, device 200 includes touch screen 212, menu button 304, push button 306 for powering the device on/off and locking the device, volume adjustment button(s) 308, subscriber identity module (SIM) card slot 310, headset jack 312, and docking/charging external port 224. Push button 306 is optionally used to turn the power on/off on the device by depressing the button and holding the button in the depressed state for a predefined time interval; to lock the device by depressing the button and releasing the button before the predefined time interval has elapsed; and/or to unlock the device or initiate an unlock process. In an alternative embodiment, device 200 also accepts verbal input for activation or deactivation of some functions through microphone 213. Device 200 also optionally includes one or more contact intensity sensors 265 for detecting intensities of contacts on touch screen 212, and/or one or more tactile output generators 267 for generating tactile outputs for a user of device 200.
FIG. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with some embodiments. Device 400 need not be portable. In some embodiments, device 400 is a laptop computer, a desktop computer, a tablet computer, a multimedia player device, a navigation device, an educational device (such as a child's learning toy), a gaming system, or a control device (e.g., a home or industrial controller). Device 400 typically includes one or more processing units (CPUs) 410, one or more network or other communications interfaces 460, memory 470, and one or more communication buses 420 for interconnecting these components. Communication buses 420 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. Device 400 includes input/output (I/O) interface 430 comprising display 440, which is typically a touch screen display. I/O interface 430 also optionally includes a keyboard and/or mouse (or other pointing device) 450 and touchpad 455, tactile output generator 457 for generating tactile outputs on device 400 (e.g., similar to the one or more tactile output generators 267 described above with reference to FIG. 2A), and sensors 459 (e.g., optical, acceleration, proximity, touch-sensitive, and/or contact intensity sensors similar to the one or more contact intensity sensors 265 described above with reference to FIG. 2A). Memory 470 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid-state memory devices; and optionally includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 470 optionally includes one or more storage devices remotely located from the one or more CPUs 410. In some embodiments, memory 470 stores programs, modules, and data structures analogous to the programs, modules, and data structures stored in memory 202 of portable multifunction device 200 (FIG. 2A), or a subset thereof. Furthermore, memory 470 optionally stores additional programs, modules, and data structures not present in memory 202 of portable multifunction device 200. For example, memory 470 of device 400 optionally stores drawing module 480, presentation module 482, word processing module 484, website creation module 486, disk authoring module 488, and/or spreadsheet module 490, while memory 202 of portable multifunction device 200 (FIG. 2A) optionally does not store these modules.
In some examples, each of the above-identified elements in FIG. 4 is stored in one or more of the previously mentioned memory devices. Each of the above-identified modules corresponds to a set of instructions for performing a function described above. The above-identified modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules can be combined or otherwise rearranged in various embodiments. In some embodiments, memory 470 stores a subset of the modules and data structures identified above. Furthermore, memory 470 stores additional modules and data structures not described above.
Attention is now directed towards embodiments of user interfaces that can be implemented on, for example, portable multifunction device 200.
FIG. 5A illustrates an exemplary user interface for a menu of applications on portable multifunction device 200 in accordance with some embodiments. Similar user interfaces are implemented on device 400. In some embodiments, user interface 500 includes the following elements, or a subset or superset thereof:
Signal strength indicator(s) 502 for wireless communication(s), such as cellular and Wi-Fi signals;
Time 504;
Bluetooth indicator 505;
Battery status indicator 506;
Tray 508 with icons for frequently used applications, such as:
o Icon 516 for telephone module 238, labeled "Phone," which optionally includes an indicator 514 of the number of missed calls or voicemail messages;
o Icon 518 for e-mail client module 240, labeled "Mail," which optionally includes an indicator 510 of the number of unread e-mails;
o Icon 520 for browser module 247, labeled "Browser;" and
o Icon 522 for video and music player module 252, also referred to as iPod (trademark of Apple Inc.) module 252, labeled "iPod;" and
Icons for other applications, such as:
o Icon 524 for IM module 241, labeled "Messages;"
o Icon 526 for calendar module 248, labeled "Calendar;"
o Icon 528 for image management module 244, labeled "Photos;"
o Icon 530 for camera module 243, labeled "Camera;"
o Icon 532 for online video module 255, labeled "Online Video;"
o Icon 534 for stocks widget 249-2, labeled "Stocks;"
o Icon 536 for map module 254, labeled "Maps;"
o Icon 538 for weather widget 249-1, labeled "Weather;"
o Icon 540 for alarm clock widget 249-4, labeled "Clock;"
o Icon 542 for workout support module 242, labeled "Workout Support;"
o Icon 544 for notes module 253, labeled "Notes;" and
o Icon 546 for a settings application or module, labeled "Settings," which provides access to settings for device 200 and its various applications 236.
It should be noted that the icon labels illustrated in FIG. 5A are merely exemplary. For example, icon 522 for video and music player module 252 is optionally labeled "Music" or "Music Player." Other labels are optionally used for various application icons. In some embodiments, a label for a respective application icon includes a name of an application corresponding to the respective application icon. In some embodiments, a label for a particular application icon is distinct from a name of an application corresponding to the particular application icon.
FIG. 5B illustrates an exemplary user interface on a device (e.g., device 400, FIG. 4) with a touch-sensitive surface 551 (e.g., a tablet or touchpad 455, FIG. 4) that is separate from display 550 (e.g., touch screen display 212). Device 400 also optionally includes one or more contact intensity sensors (e.g., one or more of sensors 459) for detecting intensity of contacts on touch-sensitive surface 551, and/or one or more tactile output generators 457 for generating tactile outputs for a user of device 400.
Although some of the examples that follow will be given with reference to inputs on touch screen display 212 (where the touch-sensitive surface and the display are combined), in some embodiments, the device detects inputs on a touch-sensitive surface that is separate from the display, as shown in FIG. 5B. In some embodiments, the touch-sensitive surface (e.g., 551 in FIG. 5B) has a primary axis (e.g., 552 in FIG. 5B) that corresponds to a primary axis (e.g., 553 in FIG. 5B) on the display (e.g., 550). In accordance with these embodiments, the device detects contacts (e.g., 560 and 562 in FIG. 5B) with touch-sensitive surface 551 at locations that correspond to respective locations on the display (e.g., in FIG. 5B, 560 corresponds to 568 and 562 corresponds to 570). In this way, when the touch-sensitive surface (e.g., 551 in FIG. 5B) is separate from the display (e.g., 550 in FIG. 5B) of the multifunction device, user inputs (e.g., contacts 560 and 562, and movements thereof) detected by the device on the touch-sensitive surface are used by the device to manipulate the user interface on the display. It should be understood that similar methods are optionally used for other user interfaces described herein.
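The axis correspondence described above amounts to normalizing a contact's coordinates along each primary axis of the separate touch-sensitive surface and rescaling them to the display. The sketch below illustrates that mapping under stated assumptions; the function name, tuple layout, and sizes are illustrative, not part of the patent.

```python
def map_to_display(contact_xy, surface_size, display_size):
    """Map a contact on a separate touch-sensitive surface (e.g., 551)
    to the corresponding location on the display (e.g., 550) by
    normalizing each coordinate along the matching primary axis."""
    (cx, cy) = contact_xy
    (sw, sh) = surface_size
    (dw, dh) = display_size
    return (cx / sw * dw, cy / sh * dh)

# A contact at the center of the surface maps to the center of the display.
print(map_to_display((50, 25), (100, 50), (1024, 768)))  # → (512.0, 384.0)
```

Any monotonic per-axis mapping would serve; simple proportional scaling is the minimal choice consistent with the axis correspondence described above.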
Additionally, while the following examples are given primarily with reference to finger inputs (e.g., finger contacts, single-finger tap gestures, finger swipe gestures), it should be understood that, in some embodiments, one or more of the finger inputs are replaced with input from another input device (e.g., a mouse-based input or a stylus input). For example, a swipe gesture is optionally replaced with a mouse click (e.g., instead of a contact) followed by movement of the cursor along the path of the swipe (e.g., instead of movement of the contact). As another example, a tap gesture is optionally replaced with a mouse click while the cursor is located over the location of the tap gesture (e.g., instead of detection of the contact followed by ceasing to detect the contact). Similarly, when multiple user inputs are simultaneously detected, it should be understood that multiple computer mice are optionally used simultaneously, or a mouse and finger contacts are optionally used simultaneously.
FIG. 6A illustrates exemplary personal electronic device 600. Device 600 includes body 602. In some embodiments, device 600 includes some or all of the features described with respect to devices 200 and 400 (e.g., FIGS. 2A-4). In some embodiments, device 600 has touch-sensitive display screen 604, hereafter touch screen 604. Alternatively, or in addition to touch screen 604, device 600 has a display and a touch-sensitive surface. As with devices 200 and 400, in some embodiments, touch screen 604 (or the touch-sensitive surface) has one or more intensity sensors for detecting the intensity of contacts (e.g., touches) being applied. The one or more intensity sensors of touch screen 604 (or the touch-sensitive surface) can provide output data that represents the intensity of touches. The user interface of device 600 responds to touches based on their intensity, meaning that touches of different intensities can invoke different user interface operations on device 600.
Techniques for detecting and processing touch intensity are found, for example, in the following related applications: International Patent Application Serial No. PCT/US2013/040061, titled "Device, Method, and Graphical User Interface for Displaying User Interface Objects Corresponding to an Application," filed May 8, 2013, and International Patent Application Serial No. PCT/US2013/069483, titled "Device, Method, and Graphical User Interface for Transitioning Between Touch Input to Display Output Relationships," filed November 11, 2013, each of which is hereby incorporated by reference in its entirety.
In some embodiments, device 600 has one or more input mechanisms 606 and 608. Input mechanisms 606 and 608, if included, are physical. Examples of physical input mechanisms include push buttons and rotatable mechanisms. In some embodiments, device 600 has one or more attachment mechanisms. Such attachment mechanisms, if included, can permit attachment of device 600 with, for example, hats, eyewear, earrings, necklaces, shirts, jackets, bracelets, watch straps, chains, trousers, belts, shoes, purses, backpacks, and so forth. These attachment mechanisms permit device 600 to be worn by a user.
FIG. 6B depicts exemplary personal electronic device 600. In some embodiments, device 600 includes some or all of the components described with respect to FIGS. 2A, 2B, and 4. Device 600 has bus 612 that operatively couples I/O section 614 with one or more computer processors 616 and memory 618. I/O section 614 is connected to display 604, which can have touch-sensitive component 622 and, optionally, touch-intensity-sensitive component 624. In addition, I/O section 614 is connected with communication unit 630 for receiving application and operating system data using Wi-Fi, Bluetooth, near field communication (NFC), cellular, and/or other wireless communication techniques. Device 600 includes input mechanisms 606 and/or 608. Input mechanism 606 is, for example, a rotatable input device or a depressible and rotatable input device. In some examples, input mechanism 608 is a button.
In some examples, input mechanism 608 is a microphone. Personal electronic device 600 includes, for example, various sensors, such as GPS sensor 632, accelerometer 634, directional sensor 640 (e.g., compass), gyroscope 636, motion sensor 638, and/or a combination thereof, all of which are operatively connected to I/O section 614.
Memory 618 of personal electronic device 600 is a non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed by one or more computer processors 616, for example, cause the computer processors to perform the techniques and processes described below. The computer-executable instructions are also, for example, stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. Personal electronic device 600 is not limited to the components and configuration of FIG. 6B, but can include other or additional components in multiple configurations.
As used herein, the term "affordance" refers to a user-interactive graphical user interface object that is, for example, displayed on the display screen of devices 200, 400, and/or 600 (FIGS. 2, 4, and 6). For example, an image (e.g., an icon), a button, and text (e.g., a hyperlink) each constitute an affordance.
As used herein, the term "focus selector" refers to an input element that indicates a current part of a user interface with which a user is interacting. In some implementations that include a cursor or other location marker, the cursor acts as a "focus selector" so that when an input (e.g., a press input on a touch-sensitive surface (e.g., touchpad 455 in FIG. 4 or touch-sensitive surface 551 in FIG. 5B)) is detected while the cursor is over a particular user interface element (e.g., a button, window, slider, or other user interface element), the particular user interface element is adjusted in accordance with the detected input. In some implementations that include a touch screen display (e.g., touch-sensitive display system 212 in FIG. 2A or touch screen 212 in FIG. 5A) that enables direct interaction with user interface elements on the touch screen display, a detected contact on the touch screen acts as a "focus selector" so that when an input (e.g., a press input by the contact) is detected on the touch screen display at a location of a particular user interface element (e.g., a button, window, slider, or other user interface element), the particular user interface element is adjusted in accordance with the detected input. In some implementations, focus is moved from one region of a user interface to another region of the user interface without corresponding movement of a cursor or movement of a contact on a touch screen display (e.g., by using a tab key or arrow keys to move focus from one button to another button); in these implementations, the focus selector moves in accordance with movement of focus between different regions of the user interface. Without regard to the specific form taken by the focus selector, the focus selector is generally the user interface element (or contact on a touch screen display) that is controlled by the user so as to communicate the user's intended interaction with the user interface (e.g., by indicating, to the device, the element of the user interface with which the user intends to interact). For example, the location of a focus selector (e.g., a cursor, a contact, or a selection box) over a respective button while a press input is detected on the touch-sensitive surface (e.g., a touchpad or touch screen) indicates that the user intends to activate the respective button (as opposed to other user interface elements shown on a display of the device).
As used in the specification and claims, the term "characteristic intensity" of a contact refers to a characteristic of the contact based on one or more intensities of the contact. In some embodiments, the characteristic intensity is based on multiple intensity samples. The characteristic intensity is, optionally, based on a predefined number of intensity samples, or a set of intensity samples collected during a predetermined time period (e.g., 0.05, 0.1, 0.2, 0.5, 1, 2, 5, or 10 seconds) relative to a predefined event (e.g., after detecting the contact, prior to detecting liftoff of the contact, before or after detecting a start of movement of the contact, prior to detecting an end of the contact, before or after detecting an increase in intensity of the contact, and/or before or after detecting a decrease in intensity of the contact). A characteristic intensity of a contact is, optionally, based on one or more of: a maximum value of the intensities of the contact, a mean value of the intensities of the contact, an average value of the intensities of the contact, a top 10 percentile value of the intensities of the contact, a value at the half maximum of the intensities of the contact, a value at the 90 percent maximum of the intensities of the contact, or the like. In some embodiments, the duration of the contact is used in determining the characteristic intensity (e.g., when the characteristic intensity is an average of the intensity of the contact over time). In some embodiments, the characteristic intensity is compared to a set of one or more intensity thresholds to determine whether an operation has been performed by a user. For example, the set of one or more intensity thresholds includes a first intensity threshold and a second intensity threshold. In this example, a contact with a characteristic intensity that does not exceed the first threshold results in a first operation, a contact with a characteristic intensity that exceeds the first intensity threshold but does not exceed the second intensity threshold results in a second operation, and a contact with a characteristic intensity that exceeds the second threshold results in a third operation. In some embodiments, a comparison between the characteristic intensity and one or more thresholds is used to determine whether to perform one or more operations (e.g., whether to perform a respective operation or forgo performing the respective operation), rather than being used to determine whether to perform a first operation or a second operation.
In some embodiments, a portion of a gesture is identified for purposes of determining a characteristic intensity. For example, a touch-sensitive surface receives a continuous swipe contact transitioning from a start location and reaching an end location, at which point the intensity of the contact increases. In this example, the characteristic intensity of the contact at the end location is based on only a portion of the continuous swipe contact, and not the entire swipe contact (e.g., only the portion of the swipe contact at the end location). In some embodiments, a smoothing algorithm is applied to the intensities of the swipe gesture prior to determining the characteristic intensity of the contact. For example, the smoothing algorithm optionally includes one or more of: an unweighted sliding-average smoothing algorithm, a triangular smoothing algorithm, a median filter smoothing algorithm, and/or an exponential smoothing algorithm. In some circumstances, these smoothing algorithms eliminate narrow spikes or dips in the intensities of the swipe contact for purposes of determining a characteristic intensity.
The intensity of a contact on the touch-sensitive surface is characterized relative to one or more intensity thresholds, such as a contact-detection intensity threshold, a light press intensity threshold, a deep press intensity threshold, and/or one or more other intensity thresholds. In some embodiments, the light press intensity threshold corresponds to an intensity at which the device will perform operations typically associated with clicking a button of a physical mouse or a trackpad. In some embodiments, the deep press intensity threshold corresponds to an intensity at which the device will perform operations that are different from operations typically associated with clicking a button of a physical mouse or a trackpad. In some embodiments, when a contact is detected with a characteristic intensity below the light press intensity threshold (e.g., and above a nominal contact-detection intensity threshold below which the contact is no longer detected), the device will move a focus selector in accordance with movement of the contact on the touch-sensitive surface without performing an operation associated with the light press intensity threshold or the deep press intensity threshold. Generally, unless otherwise stated, these intensity thresholds are consistent between different sets of user interface figures.
An increase of characteristic intensity of the contact from an intensity below the light press intensity threshold to an intensity between the light press intensity threshold and the deep press intensity threshold is sometimes referred to as a "light press" input. An increase of characteristic intensity of the contact from an intensity below the deep press intensity threshold to an intensity above the deep press intensity threshold is sometimes referred to as a "deep press" input. An increase of characteristic intensity of the contact from an intensity below the contact-detection intensity threshold to an intensity between the contact-detection intensity threshold and the light press intensity threshold is sometimes referred to as detecting the contact on the touch surface. A decrease of characteristic intensity of the contact from an intensity above the contact-detection intensity threshold to an intensity below the contact-detection intensity threshold is sometimes referred to as detecting liftoff of the contact from the touch surface. In some embodiments, the contact-detection intensity threshold is zero. In some embodiments, the contact-detection intensity threshold is greater than zero.
In some embodiments described herein, one or more operations are performed in response to detecting a gesture that includes a respective press input, or in response to detecting the respective press input performed with a respective contact (or a plurality of contacts), where the respective press input is detected based at least in part on detecting an increase in intensity of the contact (or plurality of contacts) above a press-input intensity threshold. In some embodiments, the respective operation is performed in response to detecting the increase in intensity of the respective contact above the press-input intensity threshold (e.g., a "down stroke" of the respective press input). In some embodiments, the press input includes an increase in intensity of the respective contact above the press-input intensity threshold and a subsequent decrease in intensity of the contact below the press-input intensity threshold, and the respective operation is performed in response to detecting the subsequent decrease in intensity of the respective contact below the press-input threshold (e.g., an "up stroke" of the respective press input).
In some embodiments, the device employs intensity hysteresis to avoid accidental inputs sometimes termed "jitter," where the device defines or selects a hysteresis intensity threshold with a predefined relationship to the press-input intensity threshold (e.g., the hysteresis intensity threshold is X intensity units lower than the press-input intensity threshold, or the hysteresis intensity threshold is 75%, 90%, or some reasonable proportion of the press-input intensity threshold). Thus, in some embodiments, the press input includes an increase in intensity of the respective contact above the press-input intensity threshold and a subsequent decrease in intensity of the contact below the hysteresis intensity threshold that corresponds to the press-input intensity threshold, and the respective operation is performed in response to detecting the subsequent decrease in intensity of the respective contact below the hysteresis intensity threshold (e.g., an "up stroke" of the respective press input). Similarly, in some embodiments, the press input is detected only when the device detects an increase in intensity of the contact from an intensity at or below the hysteresis intensity threshold to an intensity at or above the press-input intensity threshold and, optionally, a subsequent decrease in intensity of the contact to an intensity at or below the hysteresis intensity, and the respective operation is performed in response to detecting the press input (e.g., the increase in intensity of the contact or the decrease in intensity of the contact, depending on the circumstances).
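The hysteresis scheme described above behaves like a small state machine: a press is recognized on a rise above the press-input threshold, and the detector is re-armed only after the intensity falls back below the lower hysteresis threshold, so oscillation between the two thresholds cannot retrigger it. The class below is an illustrative sketch under stated assumptions; the 75% ratio is just one of the proportions the text mentions, and the API is invented for the example.

```python
class PressDetector:
    """Detect press inputs with intensity hysteresis to suppress
    'jitter': the hysteresis threshold sits below the press-input
    threshold (here, 75% of it), so small oscillations around the
    press threshold do not produce repeated press inputs."""

    def __init__(self, press_threshold, hysteresis_ratio=0.75):
        self.press_threshold = press_threshold
        self.hysteresis_threshold = press_threshold * hysteresis_ratio
        self.armed = True  # ready to detect the next press

    def feed(self, intensity):
        """Return True exactly when a press input is detected."""
        if self.armed and intensity >= self.press_threshold:
            self.armed = False  # press detected; wait for release
            return True
        if not self.armed and intensity <= self.hysteresis_threshold:
            self.armed = True   # released below hysteresis; re-armed
        return False

d = PressDetector(press_threshold=1.0)
# Jitter between the two thresholds (0.95, 1.05) triggers only one press:
presses = [d.feed(i) for i in [0.2, 1.1, 0.95, 1.05, 0.5, 1.2]]
print(presses)  # → [False, True, False, False, False, True]
```

Without the hysteresis band (i.e., re-arming as soon as intensity drops below the press threshold itself), the same sample sequence would register a spurious second press at 1.05.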
For ease of explanation, the descriptions of operations performed in response to a press input associated with a press-input intensity threshold, or in response to a gesture including the press input, are, optionally, triggered in response to detecting any of the following: an increase in intensity of a contact above the press-input intensity threshold, an increase in intensity of a contact from an intensity below the hysteresis intensity threshold to an intensity above the press-input intensity threshold, a decrease in intensity of the contact below the press-input intensity threshold, and/or a decrease in intensity of the contact below the hysteresis intensity threshold corresponding to the press-input intensity threshold. Additionally, in examples where an operation is described as being performed in response to detecting a decrease in intensity of a contact below the press-input intensity threshold, the operation is, optionally, performed in response to detecting a decrease in intensity of the contact below a hysteresis intensity threshold corresponding to, and lower than, the press-input intensity threshold.
3. Digital Assistant System
FIG. 7A illustrates a block diagram of digital assistant system 700 in accordance with various examples. In some examples, digital assistant system 700 is implemented on a standalone computer system. In some examples, digital assistant system 700 is distributed across multiple computers. In some examples, some of the modules and functions of the digital assistant are divided into a server portion and a client portion, where the client portion resides on one or more user devices (e.g., devices 104, 122, 200, 400, or 600) and communicates with the server portion (e.g., server system 108) through one or more networks, as shown in FIG. 1. In some examples, digital assistant system 700 is an implementation of server system 108 (and/or DA server 106) shown in FIG. 1. It should be noted that digital assistant system 700 is only one example of a digital assistant system, and that digital assistant system 700 can have more or fewer components than shown, can combine two or more components, or can have a different configuration or arrangement of the components. The various components shown in FIG. 7A are implemented in hardware, software instructions for execution by one or more processors, firmware (including one or more signal processing integrated circuits and/or application specific integrated circuits), or a combination thereof.
Digital assistant system 700 includes memory 702, one or more processors 704, input/output (I/O) interface 706, and network communications interface 708. These components can communicate with one another over one or more communication buses or signal lines 710.
In some examples, memory 702 includes a non-transitory computer-readable medium, such as high-speed random access memory and/or a non-volatile computer-readable storage medium (e.g., one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).
In some examples, I/O interface 706 couples input/output devices 716 of digital assistant system 700, such as displays, keyboards, touch screens, and microphones, to user interface module 722. I/O interface 706, in conjunction with user interface module 722, receives user inputs (e.g., voice inputs, keyboard inputs, touch inputs, etc.) and processes them accordingly. In some examples, e.g., when the digital assistant is implemented on a standalone user device, digital assistant system 700 includes any of the components and I/O communication interfaces described with respect to devices 200, 400, or 600 in FIGS. 2A, 4, and 6A-B, respectively. In some examples, digital assistant system 700 represents the server portion of a digital assistant implementation, and can interact with the user through a client-side portion residing on a user device (e.g., device 104, 200, 400, or 600).
In some examples, network communications interface 708 includes one or more wired communication ports 712 and/or wireless transmission and reception circuitry 714. The one or more wired communication ports receive and send communication signals via one or more wired interfaces, e.g., Ethernet, Universal Serial Bus (USB), FireWire, etc. Wireless circuitry 714 receives RF signals and/or optical signals from communications networks and other communications devices, and transmits RF signals and/or optical signals to communications networks and other communications devices. The wireless communications use any of a plurality of communications standards, protocols, and technologies, such as GSM, EDGE, CDMA, TDMA, Bluetooth, Wi-Fi, VoIP, Wi-MAX, or any other suitable communication protocol. Network communications interface 708 enables communication between digital assistant system 700 and other devices via networks, such as the Internet, an intranet, and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN).
In some examples, memory 702, or the computer-readable storage media of memory 702, stores programs, modules, instructions, and data structures including all or a subset of: operating system 718, communications module 720, user interface module 722, one or more applications 724, and digital assistant module 726. In particular, memory 702, or the computer-readable storage media of memory 702, stores instructions for performing the processes described below. One or more processors 704 execute these programs, modules, and instructions, and read data from and write data to the data structures.
Operating system 718 (e.g., Darwin, RTXC, LINUX, UNIX, iOS, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communications between various hardware, firmware, and software components.
Communications module 720 facilitates communications between digital assistant system 700 and other devices over network communications interface 708. For example, communications module 720 communicates with RF circuitry 208 of electronic devices, such as devices 200, 400, and 600 shown in FIGS. 2A, 4, and 6A-B, respectively. Communications module 720 also includes various components for handling data received by wireless circuitry 714 and/or wired communications port 712.
User interface module 722 receives commands and/or inputs from a user via I/O interface 706 (e.g., from a keyboard, touch screen, pointing device, controller, and/or microphone), and generates user interface objects on a display. User interface module 722 also prepares and delivers outputs (e.g., speech, sound, animation, text, icons, vibrations, haptic feedback, light, etc.) to the user via I/O interface 706 (e.g., through displays, audio channels, speakers, touch pads, etc.).
Applications 724 include programs and/or modules that are configured to be executed by one or more processors 704. For example, if the digital assistant system is implemented on a standalone user device, applications 724 may include user applications, such as games, a calendar application, a navigation application, or an email application. If digital assistant system 700 is implemented on a server, applications 724 include, for example, resource management applications, diagnostic applications, or scheduling applications.
Memory 702 also stores digital assistant module 726 (or the server portion of a digital assistant). In some examples, digital assistant module 726 includes the following sub-modules, or a subset or superset thereof: input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740. Each of these modules has access to one or more of the following systems or data and models of digital assistant module 726, or a subset or superset thereof: ontology 760, vocabulary index 744, user data 748, task flow models 754, service models 756, and ASR systems.
In some examples, using the processing modules, data, and models implemented in digital assistant module 726, the digital assistant can perform at least some of the following operations: converting speech input into text; identifying a user's intent expressed in a natural language input received from the user; actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, games, intentions, etc.); determining the task flow for fulfilling the inferred intent; and executing the task flow to fulfill the inferred intent.
In some examples, as shown in FIG. 7B, I/O processing module 728 interacts with the user through I/O devices 716 in FIG. 7A, or with a user device (e.g., device 104, 200, 400, or 600) through network communications interface 708 in FIG. 7A, to obtain user input (e.g., a speech input) and to provide responses (e.g., as speech outputs) to the user input. I/O processing module 728 optionally obtains contextual information associated with the user input from the user device, along with or shortly after the receipt of the user input. The contextual information includes user-specific data, vocabulary, and/or preferences relevant to the user input. In some examples, the contextual information also includes the software and hardware states of the user device at the time the user request is received, and/or information related to the surrounding environment of the user at the time that the user request is received. In some examples, I/O processing module 728 also sends follow-up questions to, and receives answers from, the user regarding the user request. When a user request is received by I/O processing module 728 and the user request includes a speech input, I/O processing module 728 forwards the speech input to STT processing module 730 (or a speech recognizer) for speech-to-text conversion.
STT processing module 730 includes one or more ASR systems. The one or more ASR systems can process the speech input that is received through I/O processing module 728 to produce a recognition result. Each ASR system includes a front-end speech pre-processor. The front-end speech pre-processor extracts representative features from the speech input. For example, the front-end speech pre-processor performs a Fourier transform on the speech input to extract spectral features that characterize the speech input as a sequence of representative multi-dimensional vectors. Further, each ASR system includes one or more speech recognition models (e.g., acoustic models and/or language models) and implements one or more speech recognition engines. Examples of speech recognition models include hidden Markov models, Gaussian mixture models, deep neural network models, n-gram language models, and other statistical models. Examples of speech recognition engines include engines based on dynamic time warping and engines based on weighted finite-state transducers (WFST). The one or more speech recognition models and the one or more speech recognition engines are used to process the representative features extracted by the front-end speech pre-processor to produce intermediate recognition results (e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results (e.g., words, word strings, or sequences of tokens). In some examples, the speech input is processed at least partially by a third-party service or on the user's device (e.g., device 104, 200, 400, or 600) to produce the recognition result. Once STT processing module 730 produces a recognition result containing a text string (e.g., a word, a sequence of words, or a sequence of tokens), the recognition result is passed to natural language processing module 732 for intent deduction.
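The front-end feature extraction described above (framing the signal, then applying a Fourier transform to obtain a sequence of multi-dimensional spectral vectors) can be sketched as follows. This is an illustrative toy sketch, not the patent's implementation: the frame size, hop length, and naive DFT are assumptions chosen for brevity.

```python
import cmath
import math

def frame_signal(samples, frame_size, hop):
    """Split a sample sequence into overlapping frames."""
    return [samples[start:start + frame_size]
            for start in range(0, len(samples) - frame_size + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitude spectrum; only the first half is kept,
    since the spectrum of a real signal is symmetric."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def extract_features(samples, frame_size=8, hop=4):
    """Represent speech as a sequence of spectral feature vectors."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_size, hop)]

# A pure tone at 1/4 of the sampling rate concentrates its energy
# in bin k=2 of an 8-point DFT.
tone = [math.cos(2 * math.pi * 0.25 * t) for t in range(16)]
features = extract_features(tone)
```

A production front end would use an FFT, a window function, and perceptual features such as mel-frequency cepstral coefficients, but the framing-plus-transform shape of the pipeline is the same.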
More details on the speech-to-text processing are described in U.S. Utility application Ser. No. 13/236,942 for "Consolidating Speech Recognition Results," filed on Sep. 20, 2011, the entire disclosure of which is incorporated herein by reference.
In some examples, STT processing module 730 includes and/or accesses a vocabulary of recognizable words via phonetic alphabet conversion module 731. Each vocabulary word is associated with one or more candidate pronunciations of the word represented in a speech recognition phonetic alphabet. In particular, the vocabulary of recognizable words includes words that are associated with a plurality of candidate pronunciations. For example, the vocabulary includes the word "tomato," which is associated with the candidate pronunciations /tə'meɪroʊ/ and /tə'mɑtoʊ/. Further, vocabulary words are associated with custom candidate pronunciations that are based on previous speech inputs from the user. Such custom candidate pronunciations are stored in STT processing module 730 and are associated with a particular user via the user's profile on the device. In some examples, the candidate pronunciations for words are determined based on the spelling of the word and one or more linguistic and/or phonetic rules. In some examples, the candidate pronunciations are manually generated, e.g., based on known canonical pronunciations.
In some examples, the candidate pronunciations are ranked based on the commonness of the candidate pronunciation. For example, the candidate pronunciation /tə'meɪroʊ/ is ranked higher than /tə'mɑtoʊ/, because the former is a more commonly used pronunciation (e.g., among all users, for users in a particular geographic region, or for any other appropriate subset of users). In some examples, candidate pronunciations are ranked based on whether the candidate pronunciation is a custom candidate pronunciation associated with the user. For example, custom candidate pronunciations are ranked higher than canonical candidate pronunciations. This can be useful for recognizing proper nouns having a unique pronunciation that deviates from the canonical pronunciation. In some examples, candidate pronunciations are associated with one or more speech characteristics, such as geographic origin, nationality, or ethnicity. For example, the candidate pronunciation /tə'meɪroʊ/ is associated with the United States, whereas the candidate pronunciation /tə'mɑtoʊ/ is associated with Great Britain. Further, the rank of a candidate pronunciation is based on one or more characteristics (e.g., geographic origin, nationality, ethnicity, etc.) of the user stored in the user's profile on the device. For example, it can be determined from the user's profile that the user is associated with the United States. Based on the user being associated with the United States, the candidate pronunciation /tə'meɪroʊ/ (associated with the United States) is ranked higher than the candidate pronunciation /tə'mɑtoʊ/ (associated with Great Britain). In some examples, one of the ranked candidate pronunciations is selected as a predicted pronunciation (e.g., the most likely pronunciation).
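A minimal sketch of the candidate-pronunciation ranking described above: custom pronunciations outrank canonical ones, a match with a user-profile characteristic outranks raw commonness, and commonness breaks remaining ties. The dictionary field names (`custom`, `origin`, `frequency`) and the spelled-out pronunciation strings are illustrative assumptions, not names from the patent.

```python
def rank_pronunciations(candidates, user_profile):
    """Rank candidate pronunciations for a word.

    Sort key (descending): user-custom pronunciation first, then a match
    between the pronunciation's speech characteristic and the user profile,
    then overall commonness.
    """
    def score(c):
        trait_match = int(c.get("origin") == user_profile.get("origin"))
        return (int(c.get("custom", False)), trait_match, c.get("frequency", 0.0))
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"phonemes": "tah-MAH-toh", "origin": "GB", "frequency": 0.3},
    {"phonemes": "tah-MAY-toh", "origin": "US", "frequency": 0.7},
]
ranked = rank_pronunciations(candidates, {"origin": "US"})
predicted = ranked[0]  # the predicted (most likely) pronunciation
```

With a US profile the US variant wins on both trait match and commonness; with a GB profile, the trait match outweighs the frequency difference, mirroring the profile-based ranking in the text.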
When a speech input is received, STT processing module 730 is used to determine the phonemes corresponding to the speech input (e.g., using an acoustic model), and then attempts to determine words that match the phonemes (e.g., using a language model). For example, if STT processing module 730 first identifies the sequence of phonemes /tə'meɪ-/ corresponding to a portion of the speech input, it can then determine, based on vocabulary index 744, whether this sequence corresponds to the word "tomato."
In some examples, STT processing module 730 uses approximate matching techniques to determine words in an utterance. Thus, for example, STT processing module 730 can determine that a sequence of phonemes corresponds to the word "tomato," even if that particular sequence of phonemes is not one of the candidate sequences of phonemes for that word.
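The approximate matching described here can be sketched with a phoneme-level edit distance: an observed phoneme sequence is mapped to the vocabulary word whose candidate pronunciation it most closely matches, even when it matches none exactly. The Levenshtein metric and the toy ARPAbet-style vocabulary below are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

def best_word(observed, vocabulary):
    """Pick the word whose closest candidate pronunciation best matches."""
    return min(
        vocabulary,
        key=lambda w: min(edit_distance(observed, p) for p in vocabulary[w]),
    )

vocabulary = {
    "tomato": [["t", "ah", "m", "ey", "t", "ow"], ["t", "ah", "m", "aa", "t", "ow"]],
    "potato": [["p", "ah", "t", "ey", "t", "ow"]],
}
# An observed sequence that is not an exact candidate still resolves to "tomato".
word = best_word(["t", "ah", "m", "ey", "r", "ow"], vocabulary)
```

Real ASR systems use probabilistic scoring rather than a plain edit distance, but the principle is the same: tolerate small deviations from the stored candidate pronunciations.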
Natural language processing module 732 ("natural language processor") of the digital assistant takes the sequence of words or tokens ("token sequence") generated by STT processing module 730, and attempts to associate the token sequence with one or more "actionable intents" recognized by the digital assistant. An "actionable intent" represents a task that can be performed by the digital assistant, and can have an associated task flow implemented in task flow models 754. The associated task flow is a series of programmed actions and steps that the digital assistant takes in order to perform the task. The scope of a digital assistant's capabilities depends on the number and variety of task flows that have been implemented and stored in task flow models 754, or in other words, on the number and variety of "actionable intents" that the digital assistant recognizes. The effectiveness of the digital assistant, however, also depends on the assistant's ability to correctly infer the "actionable intent(s)" from a user request expressed in natural language.
In some examples, in addition to the sequence of words or tokens obtained from STT processing module 730, natural language processing module 732 also receives contextual information associated with the user request, e.g., from I/O processing module 728. Natural language processing module 732 optionally uses the contextual information to clarify, supplement, and/or further define the information contained in the token sequence received from STT processing module 730. The contextual information includes, for example, user preferences, hardware and/or software states of the user device, sensor information collected before, during, or shortly after the user request, and prior interactions (e.g., dialogue) between the digital assistant and the user, and the like. As described herein, contextual information is, in some examples, dynamic, and changes with the time, location, and content of the dialogue, and with other factors.
In some examples, the natural language processing is based on, e.g., ontology 760. Ontology 760 is a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or to other "properties." As noted above, an "actionable intent" represents a task that the digital assistant is capable of performing, i.e., it is "actionable" or can be acted on. A "property" represents a parameter associated with an actionable intent or a sub-aspect of another property. A linkage between an actionable intent node and a property node in ontology 760 defines how the parameter represented by the property node pertains to the task represented by the actionable intent node.
In some examples, ontology 760 is made up of actionable intent nodes and property nodes. Within ontology 760, each actionable intent node is linked to one or more property nodes either directly or through one or more intermediate property nodes. Similarly, each property node is linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes. For example, as shown in FIG. 7C, ontology 760 includes a "restaurant reservation" node (i.e., an actionable intent node). Property nodes "restaurant," "date/time" (for the reservation), and "party size" are each directly linked to the actionable intent node (i.e., the "restaurant reservation" node).
In addition, property nodes "cuisine," "price range," "phone number," and "location" are sub-nodes of the property node "restaurant," and are each linked to the "restaurant reservation" node (i.e., the actionable intent node) through the intermediate property node "restaurant." For another example, as shown in FIG. 7C, ontology 760 also includes a "set reminder" node (i.e., another actionable intent node). Property nodes "date/time" (for setting the reminder) and "subject" (for the reminder) are each linked to the "set reminder" node. Since the property "date/time" is relevant to both the task of making a restaurant reservation and the task of setting a reminder, the property node "date/time" is linked to both the "restaurant reservation" node and the "set reminder" node in ontology 760.
An actionable intent node, along with its linked concept nodes, is described as a "domain." In the present discussion, each domain is associated with a respective actionable intent, and refers to the group of nodes (and the relationships among them) associated with the particular actionable intent. For example, ontology 760 shown in FIG. 7C includes an example of restaurant reservation domain 762 and an example of reminder domain 764 within ontology 760. The restaurant reservation domain includes the actionable intent node "restaurant reservation," property nodes "restaurant," "date/time," and "party size," and sub-property nodes "cuisine," "price range," "phone number," and "location." Reminder domain 764 includes the actionable intent node "set reminder" and property nodes "subject" and "date/time." In some examples, ontology 760 is made up of many domains. Each domain shares one or more property nodes with one or more other domains. For example, the "date/time" property node is associated with many different domains (e.g., a scheduling domain, a travel reservation domain, a movie ticket domain, etc.), in addition to restaurant reservation domain 762 and reminder domain 764.
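The domain structure described above (actionable intent nodes linked to property nodes, with some property nodes shared across domains) can be sketched as a small graph. The `Node` class and link representation below are illustrative assumptions, not structures from the patent.

```python
class Node:
    """A node in the ontology: either an actionable intent or a property."""
    def __init__(self, name, actionable=False):
        self.name = name
        self.actionable = actionable  # True for "actionable intent" nodes
        self.links = set()            # names of directly linked nodes

def link(a, b):
    """Create a bidirectional linkage between two ontology nodes."""
    a.links.add(b.name)
    b.links.add(a.name)

# Restaurant reservation domain: an actionable intent plus its property nodes.
reserve = Node("restaurant reservation", actionable=True)
restaurant, date_time, party = Node("restaurant"), Node("date/time"), Node("party size")
for prop in (restaurant, date_time, party):
    link(reserve, prop)

# Reminder domain shares the "date/time" property node with the reservation domain.
remind = Node("set reminder", actionable=True)
link(remind, date_time)
link(remind, Node("subject"))
```

The single `date_time` object being linked into both domains mirrors how one property node can pertain to several actionable intents in the ontology.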
While FIG. 7C illustrates two example domains within ontology 760, other domains include, for example, "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," "provide an answer to a question," "read a list," "provide navigation instructions," "provide instructions for a task," and so on. A "send a message" domain is associated with a "send a message" actionable intent node, and further includes property nodes such as "recipient(s)," "message type," and "message body." The property node "recipient" is further defined, for example, by sub-property nodes such as "recipient name" and "message address."
In some instances, " lookup media item " domain includes super domain, and the super domain is included with searching or obtaining media item Associated many executable intention nodes.For example, " lookup media item " domain includes executable intention node, such as " tool is obtained Have the media item of nearest release data ", " obtain personalized digital media item recommend " or " obtaining the information associated with media item ".
In some examples, ontology 760 includes all the domains (and hence actionable intents) that the digital assistant is capable of understanding and acting upon. In some examples, ontology 760 is modified, such as by adding or removing entire domains or nodes, or by modifying relationships between the nodes within ontology 760.
In some examples, nodes associated with multiple related actionable intents are clustered under a "super domain" in ontology 760. For example, a "travel" super domain includes a cluster of property nodes and actionable intent nodes related to travel. The actionable intent nodes related to travel include "airline reservation," "hotel reservation," "car rental," "route planning," "find points of interest," and so on. The actionable intent nodes under the same super domain (e.g., the "travel" super domain) have many property nodes in common. For example, the actionable intent nodes for "airline reservation," "hotel reservation," "car rental," "route planning," and "find points of interest" share one or more of the property nodes "start location," "destination," "departure date/time," "arrival date/time," and "party size."
In some examples, each node in ontology 760 is associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node. The respective set of words and/or phrases associated with each node is the so-called "vocabulary" associated with the node. The respective set of words and/or phrases associated with each node is stored in vocabulary index 744 in association with the property or actionable intent represented by the node. For example, returning to FIG. 7B, the vocabulary associated with the node for the property "restaurant" includes words such as "food," "drinks," "cuisine," "hungry," "eat," "pizza," "fast food," "meal," and so on. For another example, the vocabulary associated with the node for the actionable intent "initiate a phone call" includes words and phrases such as "call," "phone," "dial," "speak on the phone with," "call this number," "make a call to," and so on. Vocabulary index 744 optionally includes words and phrases in different languages.
Natural language processing module 732 receives the token sequence (e.g., a text string) from STT processing module 730, and determines what nodes are implicated by the words in the token sequence. In some examples, if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase "triggers" or "activates" those nodes. Based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 selects one of the actionable intents as the task the user intended the digital assistant to perform. In some examples, the domain that has the most "triggered" nodes is selected. In some examples, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) is selected. In some examples, the domain is selected based on a combination of the number and the importance of the triggered nodes. In some examples, additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from the user.
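A hedged sketch of the triggering-and-selection logic described above: words activate nodes through a vocabulary index, and the domain with the highest combined importance of its activated nodes is selected. The vocabulary entries and the importance weights below are invented for illustration; a real index would be far larger and the scoring more sophisticated.

```python
VOCAB = {  # vocabulary index: word -> ontology nodes it activates
    "hungry": ["restaurant"],
    "eat": ["restaurant"],
    "remind": ["set reminder"],
    "tomorrow": ["date/time"],
}

DOMAINS = {  # domain -> its nodes with relative importance weights
    "restaurant reservation": {"restaurant": 1.0, "date/time": 0.5, "party size": 0.5},
    "set reminder": {"set reminder": 1.0, "date/time": 0.5, "subject": 0.5},
}

def select_domain(tokens):
    """Activate nodes via the vocabulary index, then score each domain by
    the summed importance of its triggered nodes and pick the highest."""
    activated = {node for t in tokens for node in VOCAB.get(t, [])}
    scores = {
        d: sum(w for node, w in nodes.items() if node in activated)
        for d, nodes in DOMAINS.items()
    }
    return max(scores, key=scores.get)

domain = select_domain(["i", "am", "hungry", "tomorrow"])
```

Here "hungry" triggers the shared "restaurant" node and "tomorrow" triggers the shared "date/time" node, so the restaurant reservation domain wins on combined weight, in the spirit of the count-plus-importance selection the text describes.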
User data 748 includes user-specific information, such as user-specific vocabulary, user preferences, the user's address, the user's default and secondary languages, the user's contact list, and other short-term or long-term information for each user. In some examples, natural language processing module 732 uses the user-specific information to supplement the information contained in the user input to further define the user intent. For example, for the user request "invite my friends to my birthday party," natural language processing module 732 is able to access user data 748 to determine who the "friends" are and when and where the "birthday party" would be held, rather than requiring the user to provide such information explicitly in the request.
Other details of searching an ontology based on a token string are described in U.S. Utility application Ser. No. 12/341,743 for "Method and Apparatus for Searching Using An Active Ontology," filed Dec. 22, 2008, the entire disclosure of which is incorporated herein by reference.
In some examples, once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 generates a structured query to represent the identified actionable intent. In some examples, the structured query includes parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request. For example, the user says "Make me a dinner reservation at a sushi place at 7." In this case, natural language processing module 732 is able to correctly identify the actionable intent to be "restaurant reservation" based on the user input. According to the ontology, a structured query for the "restaurant reservation" domain includes parameters such as {Cuisine}, {Time}, {Date}, {Party Size}, and the like. In some examples, based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 generates a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}. However, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain. Therefore, other necessary parameters, such as {Party Size} and {Date}, are not specified in the structured query based on the currently available information. In some examples, natural language processing module 732 populates some parameters of the structured query with received contextual information. For example, in some examples, if the user requested a sushi restaurant "near me," natural language processing module 732 populates a {Location} parameter in the structured query with GPS coordinates from the user device.
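The partial structured query described above can be sketched as a simple merge of utterance-derived parameters and contextual fall-backs, with any parameters that remain unfilled left for the dialogue flow to resolve. The parameter names and the shape of the context dictionary are illustrative assumptions.

```python
def build_structured_query(domain_params, parsed, context):
    """Populate a structured query for a domain: parameters parsed from the
    utterance take priority, then contextual information (e.g., device GPS
    for a location parameter); the rest stay unfilled (None)."""
    query = {p: None for p in domain_params}
    for p in domain_params:
        if p in parsed:
            query[p] = parsed[p]
        elif p in context:
            query[p] = context[p]
    return query

params = ["cuisine", "time", "date", "party size", "location"]
parsed = {"cuisine": "sushi", "time": "7 pm"}   # from "a sushi place at 7"
context = {"location": (37.33, -122.01)}        # device GPS, for "near me"
query = build_structured_query(params, parsed, context)
missing = [p for p, v in query.items() if v is None]
```

The resulting `missing` list ({Date} and {Party Size} in this example) is exactly what the dialogue flow step described below would need to elicit from the user.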
In some examples, natural language processing module 732 passes the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"). Task flow processing module 736 is configured to receive the structured query from natural language processing module 732, complete the structured query (if necessary), and perform the actions required to "complete" the user's ultimate request. In some examples, the various procedures necessary to complete these tasks are provided in task flow models 754. In some examples, task flow models 754 include procedures for obtaining additional information from the user, and task flows for performing actions associated with the actionable intent.
As described above, in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information and/or disambiguate potentially ambiguous utterances. When such interactions are necessary, task flow processing module 736 invokes dialogue flow processing module 734 to engage in a dialogue with the user. In some examples, dialogue flow processing module 734 determines how (and/or when) to ask the user for the additional information, and receives and processes the user responses. The questions are provided to, and the answers are received from, the user through I/O processing module 728. In some examples, dialogue flow processing module 734 presents dialogue output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., clicking) responses. Continuing the example above, when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the "restaurant reservation" domain, dialogue flow processing module 734 generates questions such as "For how many people?" and "On which day?" to pass to the user. Once answers are received from the user, dialogue flow processing module 734 populates the structured query with the missing information, or passes the information to task flow processing module 736 to complete the missing information in the structured query.
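One way to sketch the dialogue-flow step described above: iterate over the unfilled parameters, prompt the user for each, and populate the structured query with the answers. The prompt table and the callable-based user interface are assumptions for illustration; a real dialogue manager would also handle re-prompting and disambiguation.

```python
PROMPTS = {  # illustrative prompt text per missing parameter
    "party size": "For how many people?",
    "date": "On which day?",
}

def complete_query(query, ask):
    """Ask the user (via the supplied `ask` callable) for each unfilled
    parameter, then fill the structured query with the answers."""
    for param, value in list(query.items()):
        if value is None:
            query[param] = ask(PROMPTS.get(param, f"What is the {param}?"))
    return query

# Simulated user answers stand in for a real dialogue turn.
answers = iter(["5", "March 12"])
query = {"cuisine": "sushi", "time": "7 pm", "party size": None, "date": None}
completed = complete_query(query, lambda prompt: next(answers))
```

Passing `ask` as a callable keeps the question-asking mechanism (spoken prompt, on-screen form, etc.) separate from the filling logic, loosely mirroring how the dialogue flow module delegates I/O to I/O processing module 728.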
Once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 proceeds to perform the ultimate task associated with the actionable intent. Accordingly, task flow processing module 736 executes the steps and instructions in the task flow model according to the specific parameters contained in the structured query. For example, the task flow model for the actionable intent "restaurant reservation" includes steps and instructions for contacting a restaurant and actually requesting a reservation for a particular party size at a particular time. For example, using a structured query such as {restaurant reservation, restaurant=ABC Café, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 performs the following steps: (1) logging onto a server of the ABC Café or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar.
In some examples, task flow processing module 736 completes a task requested in the user input, or provides an informational answer requested in the user input, with the assistance of service processing module 738 ("service processing module"). For example, service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.). In some examples, the protocols and application programming interfaces (APIs) required by each service are specified by a respective service model among service models 756. Service processing module 738 accesses the appropriate service model for a service, and generates requests for the service in accordance with the protocols and APIs required by the service according to the service model.
For example, if a restaurant has enabled an online reservation service, the restaurant submits a service model specifying the necessary parameters for making a reservation and the APIs for communicating the values of the necessary parameters to the online reservation service. When requested by task flow processing module 736, service processing module 738 establishes a network connection with the online reservation service using the web address stored in the service model, and sends the necessary parameters of the reservation (e.g., time, date, party size) to the online reservation interface in a format according to the API of the online reservation service.
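A sketch, under assumed names, of generating a service request from a service model: the model specifies which parameters the service requires and how they map onto the service's API fields. The endpoint URL, field names, and JSON body format below are invented for illustration and do not come from the patent or any real reservation API.

```python
import json

SERVICE_MODEL = {  # illustrative service model entry for an online reservation service
    "endpoint": "https://reservations.example.com/book",
    "required": ["date", "time", "party_size"],
    "field_names": {"party_size": "covers"},  # the API renames some fields
}

def build_service_request(model, values):
    """Map completed structured-query values onto the service's API format,
    checking that every parameter the service model requires is present."""
    missing = [p for p in model["required"] if p not in values]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    body = {model["field_names"].get(p, p): values[p] for p in model["required"]}
    return {"url": model["endpoint"], "body": json.dumps(body, sort_keys=True)}

request = build_service_request(
    SERVICE_MODEL, {"date": "2012-03-12", "time": "7 pm", "party_size": 5}
)
```

Keeping the endpoint, required parameters, and field renames in the service model (rather than in code) is what lets one service processing module talk to many third-party services, as the text describes.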
In some examples, natural language processing module 732, dialogue flow processing module 734, and task flow processing module 736 are used collectively and iteratively to infer and define the user's intent, obtain information to further clarify and refine the user intent, and finally generate a response (i.e., an output to the user, or the completion of a task) to fulfill the user's intent. The generated response is a dialogue response to the speech input that at least partially fulfills the user's intent. Further, in some examples, the generated response is output as a speech output. In these examples, the generated response is sent to speech synthesis module 740 (e.g., a speech synthesizer), where it can be processed to synthesize the dialogue response in speech form. In other examples, the generated response is data content relevant to satisfying the user request in the speech input.
Speech synthesis module 740 is configured to synthesize speech outputs for presentation to the user. Speech synthesis module 740 synthesizes speech outputs based on text provided by the digital assistant. For example, the generated dialogue response is in the form of a text string. Speech synthesis module 740 converts the text string to an audible speech output. Speech synthesis module 740 uses any appropriate speech synthesis technique to generate speech outputs from text, including, but not limited to, concatenative synthesis, unit selection synthesis, diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov model (HMM) based synthesis, and sinewave synthesis. In some examples, speech synthesis module 740 is configured to synthesize individual words based on phonemic strings corresponding to the words. For example, a phonemic string is associated with a word in the generated dialogue response. The phonemic string is stored in metadata associated with the word. Speech synthesis module 740 is configured to directly process the phonemic string in the metadata to synthesize the word in speech form.
In some examples, instead of (or in addition to) using speech synthesis module 740, speech synthesis is performed on a remote device (e.g., server system 108), and the synthesized speech is sent to the user device for output to the user. For example, this can occur in some implementations where outputs for a digital assistant are generated at a server system. And because server systems generally have more processing power or resources than a user device, it is possible to obtain higher quality speech outputs than would be practical with client-side synthesis.
Additional details on digital assistants can be found in U.S. Utility application Ser. No. 12/987,982, entitled "Intelligent Automated Assistant," filed January 10, 2011, and U.S. Utility application Ser. No. 13/251,088, entitled "Generating and Processing Task Items That Represent Tasks to Perform," filed September 30, 2011, the entire disclosures of which are incorporated herein by reference.
4. Process for Operating a Digital Assistant for Media Exploration
FIGS. 8A-C illustrate a process 800 for operating a digital assistant for media exploration, according to various examples. FIGS. 9A-B, 10, and 11 illustrate interactions of a user 901 with a digital assistant operating on a user device 903 for media exploration, according to various examples. The process 800 is performed, for example, using one or more electronic devices implementing a digital assistant. In some examples, the process is performed at a client-server system (e.g., system 100) implementing a digital assistant. In some examples, the process is performed at a user device (e.g., device 104, 200, 400, or 600). In the process 800, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally omitted. Additionally, it should be appreciated that in some examples, only a subset of the features described below with reference to FIGS. 8A-C is performed in the process 800.
At block 802, a speech input is received from a user (e.g., at the I/O processing module 728 and via the microphone 213). The speech input represents a request for one or more media items. For example, with reference to FIG. 9A, the speech input is "Hey Siri, play some hip-hop music that I like." In another example, shown in FIG. 10, the speech input is "Hey Siri, play some music suitable for a barbecue." In yet another example, shown in FIG. 11, the speech input is "Hey Siri, play some newly released music." Other examples of speech inputs representing a request for one or more media items include: "What should I listen to," "Recommend some music," "What's on offer today," "Hey Siri, be my DJ," "Play some nice beats for me," "Find a recommended playlist," "Play any good albums," "Play something that I like," "Any recommended workout music," "Find the latest music releases," "Please play the new rock songs that are trending," and the like.
At block 804, it is determined (e.g., using the natural language processing module 732) whether the speech input of block 802 corresponds to a user intent of obtaining personalized recommendations for media items. Specifically, the determination includes determining a user intent (e.g., an actionable intent) corresponding to the speech input. The user intent is determined in the manner described above with reference to FIGS. 7A-C. Specifically, words or phrases in the speech input are parsed and compared against the words or phrases of a vocabulary index (e.g., vocabulary index 744). The words or phrases of the vocabulary index are associated with various nodes (e.g., actionable intents or domains) of an ontology (e.g., ontology 760), and thus, based on the comparison, the nodes corresponding to the words or phrases in the speech input are "triggered" or "activated." Among the activated nodes, the node with the highest confidence score is selected. The user intent determined to correspond to the speech input of block 802 is thus the actionable intent corresponding to the selected node.
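The triggering-and-selection step described above can be sketched as follows. This is an illustrative toy, not the disclosed implementation: the vocabulary entries, node names, and per-phrase confidence weights are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical vocabulary index: phrase -> list of (ontology node, confidence weight).
VOCABULARY_INDEX = {
    "play":    [("media playback", 0.3), ("personalized recommendation", 0.2)],
    "i like":  [("personalized recommendation", 0.6)],
    "hip-hop": [("media playback", 0.4)],
}

def select_actionable_intent(speech_input: str) -> Optional[str]:
    """Activate ontology nodes for each matched phrase, accumulate confidence,
    and select the node with the highest total confidence score."""
    text = speech_input.lower()
    confidence: dict = {}
    for phrase, nodes in VOCABULARY_INDEX.items():
        if phrase in text:
            for node, weight in nodes:
                confidence[node] = confidence.get(node, 0.0) + weight
    if not confidence:
        return None  # no node was triggered
    return max(confidence, key=confidence.get)
```

For the speech input of FIG. 9A, the phrases "play," "hip-hop," and "I like" would all trigger nodes, and the "personalized recommendation" node accumulates the highest confidence.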
Whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items is determined based on the selected actionable intent node. If the selected node has the corresponding actionable intent of obtaining personalized recommendations for media items, it is determined that the speech input corresponds to the user intent of obtaining personalized recommendations for media items. Conversely, if the selected node has a corresponding actionable intent other than obtaining personalized recommendations for media items, it is determined that the speech input does not correspond to the user intent of obtaining personalized recommendations for media items.
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the speech input includes one or more of a plurality of predetermined phrases. Specifically, the vocabulary index includes a plurality of predetermined phrases corresponding to the actionable intent node of obtaining personalized recommendations for media items. The plurality of predetermined phrases includes, for example: "Recommend ... [music] for me," "Be my DJ," "Play some tunes/beats for me," "What should I play," "Play [music] that I like," "Find some good [music] suitable for ...," and the like. Based on the speech input including one or more of these phrases, the speech input is mapped to the actionable intent of obtaining personalized recommendations for media items, and the speech input is determined to correspond to the user intent of obtaining personalized recommendations for media items. For example, in FIG. 9A, the speech input 902 includes the phrase "play some [music] that I like," which is one of the plurality of predetermined phrases corresponding to the actionable intent node of obtaining personalized recommendations for media items. Accordingly, in this example, the speech input 902 is determined to correspond to the user intent of obtaining personalized recommendations for media items.
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the number of parameters defined in the speech input is less than a predetermined threshold. Specifically, if the number of parameters (e.g., media parameters) defined in the speech input is less than the predetermined threshold, it is determined that the speech input corresponds to the user intent of obtaining personalized recommendations for media items. For example, the speech input "What should I play?" is a request related to playing music. However, the request is broad and ambiguous because it does not define any media parameters, such as a desired artist, album, genre, or release date. In this example, the speech input is determined to correspond to the user intent of obtaining personalized recommendations for media items because the number of parameters (e.g., media parameters) defined in the speech input is less than the predetermined threshold (e.g., one).
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized media recommendations includes determining whether the speech input refers to the user. Specifically, the speech input is parsed to determine whether it includes words or phrases that refer to the user (e.g., "I," "for me," "give me," "my," etc.). For example, each of the following phrases is determined to include a word referring to the user: "Got any recommendations for me," "Surprise me," "What do you recommend for me today?" In some examples, the determination is based on determining that the speech input includes both a word or phrase referring to the user and a word or phrase related to media (e.g., "listen," "music," "play," "tunes," "DJ," etc.). For example, each of the following phrases is determined to include a word referring to the user as well as a word or phrase related to media: "Recommend some hip-hop music for me," "Be my DJ," "What should I listen to," "Got anything to recommend to me," or "Play some tunes for me." Accordingly, based on the speech input including a word or phrase referring to the user, it is determined that the speech input corresponds to the user intent of obtaining personalized media recommendations.
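The three heuristics described above (predetermined phrases, a parameter count below a threshold, and a user-referring word combined with a media-related word) can be combined in a sketch like the one below. The phrase lists, parameter keywords, the threshold of one, and the requirement in the second heuristic that the request still be media-related are all illustrative assumptions.

```python
PREDETERMINED_PHRASES = ["be my dj", "what should i play", "that i like"]
USER_WORDS = ["i", "me", "my"]
MEDIA_WORDS = ["music", "play", "listen", "tunes", "dj"]
KNOWN_PARAMETERS = {"hip-hop": "genre", "jazz": "genre", "katy perry": "artist"}

def wants_personalized_recommendation(speech_input: str, threshold: int = 1) -> bool:
    text = speech_input.lower()
    tokens = text.replace(",", " ").replace("?", " ").split()
    # Heuristic 1: the input contains a predetermined trigger phrase.
    if any(phrase in text for phrase in PREDETERMINED_PHRASES):
        return True
    # Heuristic 2: a broad media request defining fewer parameters than the threshold.
    n_params = sum(1 for keyword in KNOWN_PARAMETERS if keyword in text)
    if n_params < threshold and any(word in tokens for word in MEDIA_WORDS):
        return True
    # Heuristic 3: the input refers to the user AND mentions media.
    return any(w in tokens for w in USER_WORDS) and any(w in tokens for w in MEDIA_WORDS)
```

A request such as "Play some jazz" defines a genre and refers neither to the user nor to a trigger phrase, so it is not mapped to the personalized-recommendation intent in this sketch.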
In response to determining that the speech input corresponds to the user intent of obtaining personalized media recommendations, block 806 is performed. At block 806, at least one media item is obtained from a user-specific corpus of media items (e.g., using the natural language processing module 732, the task flow processing module 736, and/or the service processing module 738). In some examples, the at least one media item includes a song, an album, a video, a movie, or a playlist. The user-specific corpus of media items is a personalized corpus of media items specific to the user. Specifically, the user-specific corpus of media items is generated based on data associated with the user. A more detailed description of the user-specific corpus of media items is provided below with reference to block 810. At block 806, obtaining the at least one media item from the user-specific corpus of media items includes performing one or more of blocks 808-816, described below. For example, blocks 808-816 are performed using one or more of the natural language processing module 732, the task flow processing module 736, and/or the service processing module 738.
At block 808, a media parameter defined in the speech input is determined (e.g., using the natural language processing module 732). The defined media parameter is then used to generate a structured query corresponding to the actionable intent of obtaining personalized recommendations for media items. Specifically, the vocabulary index (e.g., vocabulary index 744) includes words or phrases corresponding to each of a plurality of media parameters. The media parameter defined in the speech input is thus determined by comparing the words or phrases of the speech input against the words or phrases of the vocabulary index. For example, the vocabulary index includes words or phrases associated with the media parameter {genre}. The words or phrases include, for example: "hip-hop," "R&B," "jazz," "punk," "rock," "pop," "classical," "bluegrass," and the like. In the example shown in FIG. 9A, based on detecting the phrase "hip-hop" in the speech input 902, it is determined that the speech input 902 defines the media parameter {genre} as "hip-hop." Accordingly, in this example, the structured query corresponding to the actionable intent of obtaining personalized recommendations for media items is generated to include the media parameter {genre} = "hip-hop."
Another media parameter that can be determined from the speech input is {release date}. The media parameter {release date} refers to the release date of the media items of interest to the user. A release date is, for example, an exact date or a date range. Words or phrases associated with the media parameter {release date} include, for example: "the seventies," "the eighties," "the nineties," "the last decade," "2008," "after March 2016," and the like. In one example, based on the word "eighties" in the speech input "Play me some eighties tunes," it is determined that the speech input defines the media parameter {release date} as "1980-1989." Accordingly, in this example, the structured query corresponding to the actionable intent of obtaining personalized recommendations for media items is generated to include the media parameter {release date} = "1980-1989."
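Determining defined media parameters by matching against the vocabulary index can be sketched as follows; the vocabulary entries below are assumptions made for the sake of a small example.

```python
# Hypothetical vocabulary entries: genre terms, and period terms mapped to date ranges.
VOCAB = {
    "genre": ["hip-hop", "r&b", "jazz", "punk", "rock", "pop"],
    "release date": {"eighties": "1980-1989", "seventies": "1970-1979"},
}

def extract_media_parameters(speech_input: str) -> dict:
    """Compare words/phrases of the speech input against the vocabulary index
    and return the media parameters the input defines."""
    text = speech_input.lower()
    params = {}
    for word in VOCAB["genre"]:
        if word in text:
            params["genre"] = word
    for word, date_range in VOCAB["release date"].items():
        if word in text:
            params["release date"] = date_range
    return params
```

The extracted parameters would then populate the structured query for the personalized-recommendation intent.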
In some examples, based on the context of the speech input, a date or time period defined in the speech input is interpreted as specifying a sub-genre rather than a release date. For example, based on "seventies" in the speech input "Play me some seventies punk music," it is determined that the speech input defines the time period "1970-1979." In response to determining that the speech input defines a time period, it is determined whether the speech input defines a genre associated with the time period. In this example, the speech input includes the phrase "punk," which corresponds to the media parameter {genre}. Because the time period "seventies" in the speech input modifies the genre "punk," it is determined that the speech input defines the genre "punk" as being associated with the time period "seventies." In response to determining that the speech input defines a genre associated with the defined time period, a sub-genre is determined based on the defined time period and the defined genre. For example, the sub-genre "seventies punk music" is determined based on the defined time period "seventies" and the defined genre "punk." Accordingly, in this example, the structured query generated for the actionable intent of obtaining personalized recommendations for media items includes the media parameter {genre} = "seventies punk music." Notably, rather than interpreting the defined time period as the media parameter {release date}, the defined time period is more accurately interpreted as part of the media parameter {genre}. In this way, the speech input is interpreted to more accurately reflect the user's actual intent, so that more relevant media items can be provided to the user. For example, at block 806, the at least one media item is obtained based on the determined sub-genre "seventies punk music," including media items with release dates outside the period 1970-1979. Specifically, each media item of the at least one media item includes metadata indicating the genre "seventies punk music."
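The contextual rule above can be sketched as a small disambiguation step: when a defined time period directly modifies a defined genre, fold the pair into {genre} as a sub-genre; otherwise treat the period as {release date}. The tokenizer, vocabulary, and the adjacency test standing in for "modifies" are all assumptions for the example.

```python
PERIODS = {"seventies": "1970-1979", "eighties": "1980-1989"}
GENRES = {"punk", "rock", "jazz"}

def build_query(speech_input: str) -> dict:
    tokens = speech_input.lower().split()
    query = {}
    for i, token in enumerate(tokens):
        if token in PERIODS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in GENRES:
                # The period modifies a genre -> interpret as a sub-genre.
                query["genre"] = f"{token} {nxt}"
            else:
                # No associated genre -> interpret as a release date range.
                query["release date"] = PERIODS[token]
        elif token in GENRES and "genre" not in query:
            query["genre"] = token
    return query
```

Under this sketch, "seventies punk" yields a sub-genre, whereas "eighties tunes" yields a release-date range.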
Other media parameters that can be determined to be defined in the speech input include, for example, {activity}, {mood}, {occasion}, {curated list}, {political leaning}, or {technical expertise}. Each of these media parameters is described below in turn. For example, the media parameter {activity} refers to an activity performed by the user and is associated with words or phrases such as "working out," "studying," "barbecue," "sleeping," "driving," "learning," "painting," and the like. In one example, based on the word "studying" in the speech input "Find some music suitable for studying," it is determined that the speech input defines the media parameter {activity} as "studying." In another example, shown in FIG. 10, the phrase "barbecue," corresponding to the media parameter {activity}, is detected in the speech input 1002. Accordingly, in this example, it is determined that the speech input 1002 defines the media parameter {activity} as "barbecue."
The media parameter {mood} refers to a feeling or state of mind of the user and is associated with words or phrases such as "happy," "sad," "angry," "relaxing," "powerful," "excited," "romantic," and the like. In one example, based on the word "happy" in the speech input "Recommend me some happy music," it is determined that the speech input defines the media parameter {mood} as "happy."
The media parameter {occasion} refers to an occasion associated with a particular time period and is associated with words or phrases such as "Christmas," "birthday," "summer," "winter," "Halloween," "New Year," "Easter," and the like. In one example, based on the word "Christmas" in the speech input "Play some Christmas music," it is determined that the speech input defines the media parameter {occasion} as "Christmas."
The media parameter {curated list} refers to a predetermined list of media items compiled by a media authority, such as Rolling Stone magazine, Billboard magazine, Shazam, and the like. Exemplary curated lists include, for example: the Billboard Hot 100, the Billboard charts, the Billboard 200 albums list, American Top 40, Rolling Stone's 500 Greatest Songs, Rolling Stone's 500 Greatest Albums, Rolling Stone's 100 Greatest Artists, and the like. The media parameter {curated list} is associated with words or phrases corresponding to these lists. For example, based on the phrase "Billboard Hot 100" in the speech input "Play me songs from the Billboard Hot 100," it is determined that the speech input defines the media parameter {curated list} as "Billboard Hot 100."
The media parameter {political leaning} refers to the political leaning of the user and is associated with words or phrases such as "conservative," "liberal," "right-wing," "right-leaning," "left-wing," "left-leaning," and the like. In one example, based on the word "conservative" in the speech input "Find me some conservative news," it is determined that the speech input defines the media parameter {political leaning} as "conservative." In this example, the candidate media items determined at block 812 are more likely to be associated with conservative media sources (e.g., Fox News, Drudge Report, etc.) rather than liberal media sources (e.g., Huffington Post, New York Times, etc.).
The media parameter {technical expertise} refers to the user's level of proficiency with a technical subject. This media parameter is relevant, for example, when requesting documentaries discussing technical subjects. Specifically, the media parameter {technical expertise} is associated with words or phrases such as "highly technical," "layperson," "scientific," "easy to understand," "simple," "advanced," and the like. In one example, based on the phrase "highly technical" in the speech input "Find me some highly technical documentaries," it is determined that the speech input defines the media parameter {technical expertise} as "high." In some examples, the media parameter {technical expertise} is inferred based on the user's familiarity with the requested subject. Specifically, if the user frequently requests documentaries about spacecraft (e.g., based on a log of user requests), or if the user has a large number of documentaries about spacecraft in his/her personal media library, it can be determined that the user is very familiar with the subject of spacecraft, and thus, in this example, the media parameter {technical expertise} is inferred to be "high."
At block 810, the user-specific corpus of media items is determined. Determining the user-specific corpus of media items includes obtaining user identification information associated with the user. The user identification information includes, for example, login information or user password information for a user account used to access the corresponding user-specific corpus of media items. The user identification information is then used to identify and access the appropriate user-specific corpus of media items among a plurality of user-specific corpora of media items in order to obtain the at least one media item.
In some examples, the user device at which the speech input of block 802 is received is associated with a unique user profile (e.g., stored in user data 748) containing the user identification information. Accordingly, at block 810, the user identification information is retrieved based on the user profile associated with the user device. The user-specific corpus of media items is thereby identified based on the retrieved user identification information.
In some examples, the user identification information is retrieved upon verifying the identity of the user. Specifically, the identity of the user is verified by performing speaker authentication using the speech input of block 802. For example, speaker authentication is performed by comparing a voiceprint generated from the speech input of block 802 against a reference voiceprint associated with a particular user. If it is determined that the voiceprint generated from the speech input of block 802 matches the reference voiceprint with greater than a threshold confidence, the identity of the user is verified. It should be appreciated that other identity verification methods, such as fingerprint authentication, password verification, and the like, can be implemented. Upon successfully verifying the identity of the user, user identification information corresponding to the verified identity of the user is retrieved (e.g., retrieved from a user profile). The user identification information is then used to identify and access the corresponding user-specific corpus of media items. Based on the determined identity of the user, the corresponding user-specific corpus of media items is determined from among a plurality of user-specific corpora of media items.
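The voiceprint comparison above can be sketched as a similarity check against a confidence threshold. Representing a voiceprint as a plain feature vector, using cosine similarity, and the 0.9 threshold are illustrative assumptions; real speaker-verification systems use learned speaker embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify_speaker(input_voiceprint, reference_voiceprint, threshold=0.9):
    """Verify the user's identity only if the voiceprint generated from the
    speech input matches the reference voiceprint above the threshold."""
    return cosine_similarity(input_voiceprint, reference_voiceprint) > threshold
```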
In some examples, the user-specific corpus of media items is stored on a remote server separate from the user device. For example, the user-specific corpus of media items is stored as part of a media service (e.g., one or more media services 120-1) that provides media items. The user identification information is required to access the user-specific corpus of media items. In some examples, an encrypted token containing the user identification information is generated at the user device and sent to the media service. The media service then decrypts the token and uses the user identification information in the decrypted token to access the corresponding user-specific corpus of media items in order to obtain the at least one media item.
In some examples, the user-specific corpus of media items is customized according to the media preferences of the particular user. For example, the user-specific corpus of media items is generated using media-related data previously associated with the user. Specifically, the user-specific corpus of media items is generated based on media items previously selected, requested, or rejected by the user. For example, if it is determined that the user frequently requests, browses, selects, or plays media items having certain media parameters (e.g., {genre} = "pop" or {artist} = "Katy Perry"), the user-specific corpus of media items is generated to favor media items having those parameters. Similarly, if it is determined that the user consistently rejects recommended media items having certain other parameters (e.g., {mood} = "sad"), the user-specific corpus of media items is generated to disfavor media items having those other parameters.
In some examples, the user-specific corpus of media items is generated based on information from a user profile. The user profile includes information characterizing the user, such as the country/region associated with the user, the language spoken by the user, the user's age, or activities in which the user frequently participates. Based on this information, the user-specific corpus of media items is generated to favor media items having media parameters that complement the information. For example, if the user profile indicates that the user primarily speaks English and is 12 years old, the user-specific corpus of media items is generated to favor media items spoken or sung in English and having recent release dates (e.g., within the last 5 years).
Additionally, in some examples, the user-specific corpus of media items is generated based on a personal library of media items associated with the user. The personal library of media items includes media items (e.g., songs, movies, etc.) collected by the user. The personal library of media items is stored on the user device and/or on a remote server associated with a user account. The user-specific corpus of media items is generated to favor media items having media parameters similar to those of the media items in the user's personal media library. For example, if the user's personal media library includes many albums by the artist Katy Perry, the user-specific corpus is generated to favor media items associated with Katy Perry or with artists similar to Katy Perry, such as Avril Lavigne.
In some examples, the user-specific corpus of media items is generated such that the media items in the user-specific corpus include metadata indicating a plurality of media parameters corresponding to the respective media item. Specifically, the metadata of each media item defines any of the media parameters described above, such as {artist}, {genre}, {sub-genre}, {release date}, {activity}, {mood}, {occasion}, {curated list}, {political leaning}, or {technical expertise}. The metadata is used to recommend suitable media items to the user based on the media parameters defined in the user's speech input. For example, the user-specific corpus of media items includes the instrumental song "Chariots of Fire" with metadata indicating the following media parameters: {title} = "Chariots of Fire," {genre} = "original score; instrumental," {composer} = Vangelis, {release date} = "March 1981," {activity} = "running," and {mood} = "determined." Accordingly, if the speech input "Play me some determined instrumental music suitable for running" is received at block 802, then based on the media parameters defined in the speech input (i.e., {genre} = "instrumental," {activity} = "running," and {mood} = "determined"), the song "Chariots of Fire" is identified in the user-specific corpus of media items as a candidate media item to recommend to the user.
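Matching a structured query against per-item metadata, as in the "Chariots of Fire" example above, can be sketched as follows. The corpus contents and the containment-based match are assumptions made for the example.

```python
# Hypothetical user-specific corpus: each item carries per-parameter metadata.
CORPUS = [
    {"title": "Chariots of Fire", "genre": "original score; instrumental",
     "composer": "Vangelis", "release date": "March 1981",
     "activity": "running", "mood": "determined"},
    {"title": "Tipsy", "genre": "hip-hop", "activity": "party", "mood": "upbeat"},
]

def find_candidates(query: dict, corpus=CORPUS) -> list:
    """Return items whose metadata contains every requested parameter value."""
    return [item for item in corpus
            if all(value in item.get(param, "") for param, value in query.items())]
```

Querying with {genre} = "instrumental," {activity} = "running," and {mood} = "determined" identifies "Chariots of Fire" as the candidate media item.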
In some examples, the metadata of the media items in the user-specific corpus of media items is intelligently generated based on an analysis of specific properties associated with the media items. Specifically, the musical tempo (e.g., beats per minute) of each media item is determined by analyzing the audio data of the media item. The {activity} media parameter of a media item in the user-specific corpus of media items is determined based on the determined musical tempo. For example, media items with faster tempos are associated with more vigorous activities, such as working out, hiking, and the like. Conversely, media items with slower tempos are associated with more passive activities, such as sleeping, meditating, and the like. The relevant {activity} media parameter determined based on the tempo is thus included in the metadata of the corresponding media item.
Additionally, in some examples, the {mood} media parameter of a media item in the user-specific corpus of media items is determined based on the musical key of the media item. For example, the musical key of each media item is analyzed to determine the key associated with the audio data (e.g., C major, G major, A minor, etc.). Media items in a major key are associated with more positive and cheerful moods, such as "happy," "upbeat," "optimistic," "excited," and the like, whereas media items in a minor key are associated with more somber moods, such as "sad," "mournful," and the like.
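The metadata generation described in the two paragraphs above can be sketched as two simple rules: a tempo (BPM) cutoff assigns {activity}, and the key's mode assigns {mood}. The cutoffs and label sets are illustrative assumptions; actual tempo and key detection would require audio analysis.

```python
def infer_activity(bpm: float) -> str:
    """Faster tempos -> vigorous activities; slower tempos -> passive ones."""
    if bpm >= 120:
        return "working out"
    if bpm <= 70:
        return "sleeping"
    return "driving"

def infer_mood(key: str) -> str:
    """Major keys map to cheerful moods, minor keys to somber ones."""
    return "happy" if key.lower().endswith("major") else "sad"

def generate_metadata(title: str, bpm: float, key: str) -> dict:
    return {"title": title, "activity": infer_activity(bpm), "mood": infer_mood(key)}
```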
At block 812, a plurality of candidate media items in the user-specific corpus of media items is determined based on the media parameter determined at block 808. For example, a search is performed using the media parameter determined at block 808 to identify candidate media items in the user-specific corpus of media items that have metadata including the media parameter determined at block 808. For example, returning to FIG. 9A, at block 808 the media parameter {genre} = "hip-hop" is determined to be defined in the speech input 902. In response to determining that the speech input defines the media parameter {genre} as "hip-hop," the user-specific corpus of media items can be searched to identify media items having metadata that includes the media parameter {genre} = "hip-hop." For example, media items such as "Tipsy" by J-Kwon, "99 Problems" by Jay-Z, and "Over" by Drake each have metadata that includes the media parameter {genre} = "hip-hop." Accordingly, in this example, the plurality of candidate media items determined from the user-specific corpus of media items includes these media items.
In another example, shown in FIG. 10, the user 901 provides the speech input 1002, "Hey Siri, play some music suitable for a barbecue." In this example, it is determined at block 808 that the speech input defines the media parameter {activity} as "barbecue." In response to determining that the speech input defines the media parameter {activity} as "barbecue," the user-specific corpus of media items is searched to identify media items having metadata that includes the media parameter {activity} = "barbecue." For example, media items such as "She Moves in Her Own Way" by The Kooks, "Hot n Cold" by Katy Perry, and "Fun Fun Fun" by The Beach Boys each have metadata that includes the media parameter {activity} = "barbecue." Accordingly, in this example, the plurality of candidate media items determined from the user-specific corpus of media items includes these media items.
Although the examples of FIGS. 9A-B and 10 are described with respect to particular media parameters, it should be recognized that the plurality of candidate media items is determined from the user-specific corpus of media items based on any media parameter defined in the speech input of block 802. For example, in addition to the media parameters {genre} and {activity} described in the examples of FIGS. 9A-B and 10, the media parameters include {artist}, {media type}, {release date}, {mood}, {occasion}, {curated list}, {political leaning}, {technical expertise}, and the like.
At block 814, the plurality of candidate media items of block 812 is ranked using a user-specific media ranking model. The user-specific media ranking model is stored, for example, in user data 748 or in data and models 116. Using the user-specific media ranking model, a user-specific ranking score is generated for each candidate media item of the plurality of candidate media items. The plurality of candidate media items is thereby ranked based on the user-specific ranking scores. A user-specific ranking score represents the likelihood that the user would accept a candidate media item, given the media parameters associated with the candidate media item. The user-specific media ranking model is a statistical machine learning model (e.g., a neural network model, a Bayesian model, etc.) trained using user-specific data, such as information from the user profile, previous media-related inputs from the user, or media items associated with the user. Further, the user-specific media ranking model is continually updated based on subsequently received user-specific data. For example, the user-specific media ranking model is updated based on the speech input of block 802 or on any speech contained in the audio input of block 824, as described below.
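The ranking step can be sketched with a toy scoring function standing in for the trained statistical model: each candidate is scored by how well its media parameters match learned per-user preference weights, and the candidates are sorted by score. The preference weights are illustrative assumptions; an actual implementation would use a trained neural network or Bayesian model.

```python
# Hypothetical learned preferences: (parameter, value) -> weight.
USER_PREFERENCES = {("genre", "pop"): 0.8, ("genre", "rap"): -0.5,
                    ("artist", "Katy Perry"): 0.9}

def user_specific_score(item: dict) -> float:
    """Likelihood-like score that the user accepts the item, given its parameters."""
    return sum(USER_PREFERENCES.get((p, v), 0.0) for p, v in item["params"].items())

def rank_candidates(candidates: list) -> list:
    return sorted(candidates, key=user_specific_score, reverse=True)
```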
The information from the user profile includes, for example, the user's age, ethnicity, location, occupation, and the like. This information is used to generate the user-specific media ranking model. For example, if the information from the user profile indicates that the user is a scientist with conservative views living in Idaho, the user-specific media ranking model is trained to generate more favorable scores for media items associated with a higher level of technical expertise or a more conservative political leaning.
Previous media-related inputs from the user are used to generate the user-specific media ranking model. Specifically, previous media-related inputs from the user include media-related requests, selections, and rejections received prior to receiving the speech input of block 802. For example, if previous media-related requests from the user indicate that the user typically requests pop music and rejects rap music, the user-specific media ranking model is trained based on those previous media-related inputs to generate more favorable ranking scores for pop music and less favorable ranking scores for rap music. In another example, previous media-related inputs indicate that, when browsing an online music store, the user frequently views music items with release dates in the 1970s. Based on this determination, the user-specific media ranking model is trained to generate more favorable ranking scores for media items with release dates in the 1970s.
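The training signal described above can be sketched as a simple per-parameter preference accumulator. This is an illustrative reduction, not the statistical model the specification describes; the event names and weights are assumptions for this sketch.

```python
from collections import defaultdict

# Illustrative weights (assumed): requests and selections raise a
# parameter's preference weight, rejections lower it.
EVENT_WEIGHTS = {"request": 1.0, "selection": 0.5, "rejection": -1.5}

def train_preferences(history):
    """history: list of (event, {media_parameter: value}) tuples.
    Returns a dict mapping (parameter, value) -> accumulated weight."""
    prefs = defaultdict(float)
    for event, params in history:
        for key, value in params.items():
            prefs[(key, value)] += EVENT_WEIGHTS[event]
    return dict(prefs)

history = [
    ("request",   {"genre": "pop"}),
    ("selection", {"genre": "pop"}),
    ("rejection", {"genre": "rap"}),
]
prefs = train_preferences(history)
# Pop ends up favored (positive weight) and rap disfavored (negative).
```

A production model would instead feed such events into the neural network or Bayesian model mentioned above; the accumulator merely shows the direction of the training signal.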
Media items associated with the user include media items found in the user's personal media library. In some examples, the media items in the user's personal media library are used to generate the user-specific media ranking model. Specifically, the user-specific media ranking model is trained to favor media items having media parameters similar to those of the media items in the user's personal media library. For example, based on a user's personal media library containing many Jay-Z songs, the user-specific media ranking model is trained to generate more favorable scores for media items related to the artist Jay-Z or to artists similar to Jay-Z.
In some examples, a general media ranking model is additionally or alternatively used to perform the ranking of block 814. Specifically, a general ranking score is generated for each candidate media item of the plurality of candidate media items using the general media ranking model. The plurality of candidate media items are thus ranked based on the general ranking scores. The general media ranking model is similar to the user-specific media ranking model, except that the general media ranking model is trained using media-related data from a large number of users rather than from one specific user. A general ranking score represents the general popularity of a media item. Specifically, the general media ranking model generates more favorable ranking scores for the media items most frequently requested, viewed, or selected by a large number of users.
It should be appreciated that, in some examples, the ranking of block 814 is performed based on a combination of the user-specific ranking scores from the user-specific media ranking model and the general ranking scores from the general media ranking model. For example, the two ranking scores are interpolated to generate a combined ranking score for each candidate media item. The plurality of candidate media items are then ranked based on the combined ranking scores. Additionally, it should be recognized that, in some examples, the general media ranking model and the user-specific ranking model are integrated. For example, the user-specific ranking model is generated using media-related data from a large number of users, but is adjusted to favor the user preferences indicated in the user-specific data.
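The interpolation described above can be sketched in a few lines. The mixing weight `alpha` and the example scores are assumptions for illustration; the specification does not fix how the two scores are weighted.

```python
def combined_score(user_score, general_score, alpha=0.7):
    """Interpolate a user-specific ranking score with a general
    (popularity-based) ranking score. alpha (assumed) controls how
    strongly personal taste dominates general popularity."""
    return alpha * user_score + (1.0 - alpha) * general_score

# (title, user-specific score, general score) -- illustrative values.
candidates = [
    ("99 Problems", 0.9, 0.8),
    ("Tipsy",       0.6, 0.9),
]
ranked = sorted(
    candidates,
    key=lambda c: combined_score(c[1], c[2]),
    reverse=True,
)
# "99 Problems" ranks first: 0.7*0.9 + 0.3*0.8 = 0.87 vs 0.69 for "Tipsy".
```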
At block 816, at least one media item is selected from the plurality of candidate media items based on the ranking of block 814. For example, the at least one media item includes the highest-ranked candidate media item, or the N highest-ranked candidate media items of the plurality of candidate media items, where N is an integer greater than zero. The at least one media item obtained at block 806 is the at least one media item selected at block 816. The selected at least one media item is retrieved from the user-specific corpus of media items (e.g., at one or more media services 120-1) and provided to the user at block 818.
In some examples, the at least one media item is selected based on the user's inferred familiarity with one or more media parameters associated with the at least one media item. For example, the speech input received at block 802 is "Play me some Michael Jackson songs." In this example, it is determined at block 808 that the speech input defines the media parameter {artist} as "Michael Jackson." Based on this determination, a plurality of candidate Michael Jackson songs are identified from the user-specific corpus of media items at block 812. The plurality of candidate Michael Jackson songs are ranked at block 814 based on general popularity (e.g., according to the general media ranking model) and/or based on the user-specific media ranking model. The user's familiarity with the artist "Michael Jackson" is determined. The determination is based on user-specific data associated with the artist "Michael Jackson." For example, the user's familiarity with the artist "Michael Jackson" is determined to be high based on previous media-related inputs indicating that the user frequently browses, purchases, listens to, and/or requests Michael Jackson songs, or based on the user's personal media library containing a large number of Michael Jackson songs. Conversely, the user's familiarity with the artist "Michael Jackson" is determined to be low based on media-related inputs indicating that the user rarely browses, purchases, listens to, and/or requests Michael Jackson songs, or based on the user's personal media library containing very few Michael Jackson songs. Based on the determined familiarity, songs are selected from the plurality of candidate Michael Jackson songs at block 816. For example, if the user's familiarity with the artist "Michael Jackson" is determined to be low, the most popular or highest-ranked candidate Michael Jackson songs are selected. Specifically, the N highest-ranked candidate Michael Jackson songs are selected from the plurality of candidate Michael Jackson songs and played as a playlist. By contrast, if the user's familiarity with the artist "Michael Jackson" is determined to be high, a combination of popular (e.g., higher-ranked) and less popular (e.g., lower-ranked) candidate Michael Jackson songs is selected and played as a playlist. Specifically, based on the user's high familiarity with the artist "Michael Jackson," a greater proportion of less popular candidate Michael Jackson songs is selected. This is advantageous because a user who is very familiar with the artist Michael Jackson is likely already familiar with the most popular Michael Jackson songs. Such a user would wish to hear a combination of Michael Jackson songs that includes both popular, highly commercialized songs and less popular, less commercialized songs (e.g., "deep cuts"). Thus, in this example, the average popularity of the selected Michael Jackson songs is based on the user's determined familiarity with the artist "Michael Jackson" at block 816.
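The familiarity-dependent selection above can be sketched as follows. The familiarity threshold and the "deep cut" fraction are illustrative assumptions; the specification only states that a greater proportion of less popular songs is selected for a highly familiar user.

```python
def select_playlist(ranked_songs, familiarity, n=5, deep_cut_fraction=0.4):
    """ranked_songs: candidate songs ordered most popular first.
    familiarity: inferred user familiarity in [0, 1] (assumed scale).
    A low-familiarity user gets the n top-ranked songs; a
    high-familiarity user gets a mix of hits and lower-ranked deep cuts."""
    if familiarity < 0.5:  # threshold is an assumption for this sketch
        return ranked_songs[:n]
    n_deep = int(n * deep_cut_fraction)
    n_hits = n - n_deep
    hits = ranked_songs[:n_hits]
    deep_cuts = ranked_songs[-n_deep:] if n_deep else []
    return hits + deep_cuts

songs = [f"song{i}" for i in range(1, 11)]  # song1 = most popular
casual = select_playlist(songs, familiarity=0.2)
fan = select_playlist(songs, familiarity=0.9)
# casual -> song1..song5; fan -> song1..song3 plus song9 and song10
```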
It should be appreciated that in some instances, the user couple one or more media associated with least one media item The familiarity of parameter is directly included in specific in the media order models of user.For example, based on determination user to art Family " Michael Jackson " are very familiar, specific to user media order models be configured as it is less popular to some Michael Jackson song generates higher sequence score.In this way, the N head candidates of top ranked Michael Jackson song include height on commercialized popular Michael Jackson song and popularity compared with The combination of low Michael Jackson song.In these examples, selected at least one media item includes ranking The Michael Jackson of highest N candidates song.
Although block 806 described above is performed using the user-specific corpus of media items, it should be recognized that, in other examples, other corpora of media items can be used in place of the user-specific corpus of media items. For example, in some examples, the at least one media item is derived from a general (user-independent) corpus of media items, or from a corpus of media items generated based on one or more specific media parameters.
At block 818, the at least one media item is provided. Specifically, the at least one media item is provided at the user device. In some examples, the at least one media item is played on the user device (e.g., using speaker 211). In other examples, the at least one media item is displayed on the user device (e.g., on touchscreen 212) for the user to view and/or select. In still other examples, the at least one media item is provided to the user in the form of a spoken response (e.g., using speaker 211).
Referring again to the example shown in FIG. 9A, the candidate hip-hop media items determined at block 812 are ranked at block 814 using the user-specific media ranking model. Specifically, the candidate hip-hop media items are ranked such that the candidate media item "99 Problems" by Jay-Z is the highest-ranked of the candidate hip-hop media items determined at block 812. Thus, in this example, the at least one media item selected at block 816 includes the media item "99 Problems" by Jay-Z, and this media item is played to user 901 on user device 903.
Referring now to the example shown in FIG. 10, the "barbecue" candidate media items determined at block 812 are ranked at block 814 using the user-specific media ranking model. In this example, the candidate media item "She Moves in Her Own Way" by The Kooks is the highest-ranked of the candidate media items determined at block 812. Thus, the at least one media item selected at block 816 includes the media item "She Moves in Her Own Way" by The Kooks, and this media item is obtained and played to user 901 on user device 903. It should be appreciated that the selected at least one media item can include other media items. For example, the candidate media items "Hot n Cold" by Katy Perry and "Fun Fun Fun" by The Beach Boys have the second-highest and third-highest rankings of the candidate media items determined at block 812. The at least one media item selected at block 816 includes these media items. Thus, in these examples, after the media item "She Moves in Her Own Way" by The Kooks is played on the user device, the media items "Hot n Cold" by Katy Perry and "Fun Fun Fun" by The Beach Boys are played.
In some examples, process 800 enables the user to provide follow-up requests while the at least one media item is provided at block 818. For example, the user rejects the at least one media item provided at block 818, or requests additional information related to the at least one media item. Blocks 820-826 are described in terms of receiving a follow-up spoken request from the user and providing a response to that follow-up spoken request.
At block 820, it is determined whether the domain corresponding to the speech input is one of a plurality of predetermined domains. Specifically, only certain predetermined domains are likely to elicit follow-up requests from the user. Thus, to improve efficiency, the capability of receiving follow-up spoken requests from the user is implemented only for certain predetermined domains. For example, the plurality of predetermined domains includes domains having items with substantial amounts of metadata, such as the "find media item" domain or the "find restaurant" domain. Items with substantial amounts of metadata, such as media items and restaurant items, frequently elicit follow-up requests from users. In response to determining that the domain corresponding to the speech input is one of the plurality of predetermined domains, audio input is received at block 824 (e.g., microphone 213 is activated). Conversely, in response to determining that the domain corresponding to the speech input is not one of the plurality of predetermined domains, process 800 forgoes receiving audio input at block 822 (e.g., microphone 213 is not activated).
At block 824, audio input is received. Specifically, the audio input is received while the at least one media item is provided at block 818. For example, with reference to FIG. 9A, once the media item "99 Problems" by Jay-Z begins playing on user device 903, user device 903 begins receiving audio input via its microphone.
At block 826, it is determined whether the audio input contains speech. The determination is made as the audio input is received. Specifically, as the audio input is received, it is analyzed to determine whether it contains acoustic features corresponding to those of human speech. Specifically, time-domain features (e.g., zero-crossing rate, short-time energy, spectral energy, or spectral flatness) and/or frequency-domain features (e.g., mel-frequency cepstral coefficients, linear predictive cepstral coefficients, or mel-frequency discrete wavelet coefficients) are extracted from the received audio input and compared with a human speech model to determine the likelihood that the audio input contains speech. If the likelihood is determined to be greater than a predetermined value, the audio input is determined to contain speech. Conversely, if the likelihood is less than the predetermined value, the audio input is determined not to contain speech. In response to determining that the audio input does not contain speech, process 800 ceases receiving audio input at block 828 after a predetermined amount of time. For example, with reference to FIG. 9A, user device 903 ceases receiving audio input after receiving audio input of a predetermined duration that is determined not to contain any speech.
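Two of the time-domain features named above, zero-crossing rate and short-time energy, can be computed as follows. The thresholds are illustrative assumptions, and the final comparison is a crude stand-in for the trained human-speech model the specification describes.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(frame) - 1, 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def likely_speech(frame, energy_floor=0.01, zcr_max=0.5):
    """Toy speech check: enough energy, and a zero-crossing rate below
    that of broadband noise. Thresholds are assumptions for this sketch;
    a real system compares features against a trained speech model."""
    return (short_time_energy(frame) > energy_floor
            and zero_crossing_rate(frame) < zcr_max)

# A 200 Hz tone at an 8 kHz sample rate stands in for voiced speech;
# a near-zero frame stands in for silence.
voiced = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(400)]
silence = [0.001] * 400
# likely_speech(voiced) passes both checks; silence fails the energy check.
```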
In some examples, the predetermined amount of time is based on the level of ambient noise detected in the audio input. Specifically, block 826 includes determining the amount of ambient noise (e.g., background noise) in the audio input. The higher the ambient noise level detected in the audio input, the shorter the predetermined amount of time for which speech-free audio input is received at block 824. For example, if it is determined that the amplitude of the ambient noise in the audio input does not exceed a predetermined threshold, process 800 ceases receiving audio input at block 828 after a first predetermined amount of time (e.g., 7 seconds). However, if it is determined that the amplitude of the ambient noise in the audio input exceeds the predetermined threshold, process 800 ceases receiving audio input at block 828 after a second predetermined amount of time (e.g., 4 seconds) that is shorter than the first predetermined amount of time.
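The noise-dependent timeout reduces to a single comparison. The 7-second and 4-second values come from the example above; the amplitude threshold is an assumption for this sketch.

```python
def no_speech_timeout(noise_amplitude, threshold=0.2,
                      quiet_timeout=7.0, noisy_timeout=4.0):
    """Return how long (in seconds) to keep receiving speech-free audio.
    Louder ambient noise shortens the listening window, mirroring the
    7 s / 4 s example; the 0.2 amplitude threshold is assumed."""
    return noisy_timeout if noise_amplitude > threshold else quiet_timeout

# A quiet room keeps the longer window; a noisy one gets the shorter.
no_speech_timeout(0.05)  # 7.0
no_speech_timeout(0.6)   # 4.0
```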
In response to determining that the audio input contains speech, block 830 is performed. At block 830, it is determined whether the speech of the audio input corresponds to the same domain as the speech input. The determination includes determining the user intent corresponding to the speech of the audio input. The user intent is determined in a manner similar to that of block 804 described above. Determining the user intent corresponding to the speech of the audio input includes determining the domain corresponding to the speech of the audio input. It is then determined whether the domain corresponding to the speech of the audio input is the same as the domain corresponding to the speech input of block 802. In response to determining that the speech of the audio input does not correspond to the same domain as the speech input, process 800 forgoes providing a response to the audio input at block 832. This is desirable for filtering out babble noise. For example, with reference to FIG. 9A, speech input 902 corresponds to the "find media item" domain. If audio input is received while the Jay-Z song "99 Problems" is playing and the audio input contains babble noise unrelated to finding media items, it is determined that the babble noise is unrelated to speech input 902, and no follow-up response is provided to the user (block 832).
In response to determining that the speech of the audio input corresponds to the same domain as the speech input, block 834 is performed. At block 834, a response is provided in accordance with the user intent corresponding to the speech of the audio input. The response is provided in a manner similar to that described above with reference to FIGS. 7A-C. Specifically, a structured query is generated based on the determined user intent corresponding to the speech of the audio input. One or more tasks corresponding to the user intent are then performed in accordance with the generated structured query. The response is provided based on the one or more performed tasks.
Blocks 820-834 are further described with reference to the examples of FIGS. 9A-B and FIG. 10. In FIG. 9A, while the obtained at least one media item "99 Problems" by Jay-Z is playing on user device 903 (block 818), audio input containing the second speech input 904 "What's good besides Jay-Z?" is received from user 901 (block 824). The user intent corresponding to the second speech input 904 is determined (block 830). Based on the phrase "Jay-Z" in the second speech input 904 and the context of user device 903 playing the Jay-Z media item "99 Problems," it is determined that the second speech input 904 corresponds to the same domain as speech input 902. Specifically, the domain corresponding to speech input 904 is determined to be the "find media item" domain. In addition, it is determined whether the second speech input 904 corresponds to a rejection of the media item "99 Problems." The determination is based on the user intent corresponding to the second speech input 904. In this example, based on interpreting the phrase "what's good besides" in the context of playing the media item "99 Problems," it is determined that the second speech input 904 corresponds to a rejection of the media item "99 Problems" and a user intent of obtaining an alternative media item recommendation. One or more tasks corresponding to that user intent are then performed (block 834). Specifically, in response to determining that the second speech input corresponds to a rejection of the at least one media item, the previously determined and ranked candidate hip-hop media items (e.g., at blocks 812 and 814) are re-ranked based on the rejection. The re-ranking is similar to the ranking of block 814, except that unfavorable ranking scores are generated for media items having the media parameter {artist} = Jay-Z. For example, the candidate hip-hop media items are re-ranked such that the candidate media item "Tipsy" by J-Kwon becomes the highest-ranked of the candidate hip-hop media items and the candidate media item "99 Problems" by Jay-Z becomes the lowest-ranked of the candidate hip-hop media items. Based on the re-ranking, as shown in FIG. 9B, the media item "Tipsy" by J-Kwon is obtained and played on user device 903. In addition, as described above, the user-specific media ranking model is continuously updated based on any subsequent speech input received from the user. Accordingly, in response to determining that the second speech input corresponds to a rejection of the at least one media item, the user-specific media ranking model is updated in accordance with the rejection. For example, the user-specific media ranking model is updated such that less favorable ranking scores are subsequently generated for candidate media items having the media parameter {artist} = Jay-Z. Thus, when the user subsequently requests media item recommendations, the digital assistant is less likely to recommend media items associated with the artist Jay-Z.
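The rejection-driven re-ranking can be sketched as a score penalty applied to every candidate sharing the rejected parameter value. The penalty magnitude and the example scores are assumptions for illustration.

```python
def rerank_after_rejection(scored_items, rejected_params, penalty=1.0):
    """scored_items: list of (title, params, score) tuples, where params
    maps media parameters to values. Any candidate that shares a rejected
    parameter value has its score reduced before re-sorting; the penalty
    magnitude is an illustrative assumption."""
    adjusted = []
    for title, params, score in scored_items:
        if any(params.get(k) == v for k, v in rejected_params.items()):
            score -= penalty
        adjusted.append((title, params, score))
    return sorted(adjusted, key=lambda item: item[2], reverse=True)

candidates = [
    ("99 Problems", {"artist": "Jay-Z"},  0.9),
    ("Tipsy",       {"artist": "J-Kwon"}, 0.7),
]
reranked = rerank_after_rejection(candidates, {"artist": "Jay-Z"})
# "Tipsy" now ranks first; "99 Problems" drops to the bottom.
```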
Referring now to the example of FIG. 10, user 901 provides the second speech input 1004, "When was this released?" Specifically, while user device 903 is playing the media item "She Moves in Her Own Way" by The Kooks, the second speech input 1004 is received from user 901 (block 824). The user intent corresponding to the second speech input 1004 is determined (block 830). Based on the word "this" in the second speech input 1004 and the context of user device 903 playing the media item "She Moves in Her Own Way," it is determined that the second speech input 1004 corresponds to the same domain as speech input 902. Specifically, the domain corresponding to speech input 1004 is determined to be the "find media item" domain. In addition, in this example, based on interpreting the words "this" and "released" in the context of playing the media item "She Moves in Her Own Way," it is determined (block 830) that the second speech input 1004 corresponds to a user intent of obtaining the release date associated with the media item "She Moves in Her Own Way." In response to this determination, one or more tasks corresponding to the user intent are performed (block 834). Specifically, the release date of the song "She Moves in Her Own Way" is retrieved (e.g., from one or more media services 120-1) and provided to the user (block 834). For example, as shown in FIG. 10, in accordance with the determined user intent, spoken response 1006 is provided to user 901 at user device 903. Specifically, spoken response 1006 indicates that the release date of the song "She Moves in Her Own Way" is "June 2006." In some examples, in response to the second speech input 1004, the release date is additionally or alternatively displayed on user device 903.
Returning to block 804, in response to determining that the speech input of block 802 does not correspond to a user intent of obtaining personalized recommendations for media items, block 836 of FIG. 8C is performed. At block 836, it is determined whether the speech input of block 802 corresponds to a user intent of obtaining media items having recent release dates. As described above, the user intent corresponding to the speech input is determined at block 804. The determination of block 836 is based on the actionable intent node selected in the knowledge ontology (e.g., ontology 760). If the selected node has an actionable intent corresponding to obtaining media items having recent release dates, it is determined that the speech input corresponds to a user intent of obtaining media items having recent release dates. Conversely, if the node has an actionable intent other than obtaining media items having recent release dates, it is determined that the speech input does not correspond to a user intent of obtaining media items having recent release dates.
In some examples, determining whether the speech input corresponds to a user intent of obtaining media items having recent release dates includes determining whether the speech input includes one or more predetermined phrases of a second plurality of predetermined phrases. Specifically, the actionable intent node corresponding to the user intent of obtaining media items having recent release dates is associated with the second plurality of predetermined phrases. The second plurality of predetermined phrases is stored in the vocabulary index (e.g., vocabulary index 744) associated with that actionable intent node. The second plurality of predetermined phrases includes phrases such as "new music," "recently released," "latest releases," and "newly out." Based on the speech input of block 802 including one or more predetermined phrases of the second plurality of predetermined phrases, the speech input is mapped to the actionable intent node corresponding to the user intent of obtaining media items having recent release dates. Accordingly, it is determined that the speech input of block 802 corresponds to a user intent of obtaining media items having recent release dates. For example, with reference to FIG. 11, the speech input 1102 "Hey Siri, play me some pop music that's newly out" is received from user 901. Based on speech input 1102 including the phrase "newly out," the actionable intent node corresponding to the user intent of obtaining media items having recent release dates is selected. Accordingly, it is determined that speech input 1102 corresponds to the user intent of obtaining media items having recent release dates.
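The phrase check above can be sketched as simple substring matching against a phrase set. A production vocabulary index would be more sophisticated (tokenization, stemming, weighting); the phrase list mirrors the examples in the text.

```python
RECENT_RELEASE_PHRASES = {
    "new music", "recently released", "latest releases", "newly out",
}

def matches_recent_release_intent(utterance):
    """Return True when the utterance contains any predetermined phrase
    associated with the recent-release actionable intent node. Plain
    substring matching is a simplifying assumption for this sketch."""
    text = utterance.lower()
    return any(phrase in text for phrase in RECENT_RELEASE_PHRASES)

matches_recent_release_intent(
    "Hey Siri, play me some pop music that's newly out")  # True
matches_recent_release_intent("Play my workout playlist")  # False
```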
In response to determining that the speech input corresponds to a user intent of obtaining media items having recent release dates, block 838 is performed. Conversely, in response to determining that the speech input does not correspond to a user intent of obtaining media items having recent release dates, process 800 forgoes performing block 838. For example, as shown in FIG. 8C, in response to determining that the speech input does not correspond to a user intent of obtaining media items having recent release dates, process 800 ends.
At block 838, at least one second media item is obtained from a second corpus of media items. Block 838 is similar to block 806, except that block 838 is performed using the second corpus of media items rather than the user-specific corpus of media items. In addition, block 838 includes blocks similar to blocks 808-816, except that those blocks are performed with respect to the second corpus of media items rather than the user-specific corpus of media items. The second corpus of media items is, for example, a general corpus of media items generated based on the release dates of the media items. Specifically, each media item in the second corpus of media items has a release date within a predetermined time range of the current date. For example, the second corpus of media items includes only media items having release dates within three months of the current date. In some examples, the second corpus of media items is generated based on other factors, such as the popularity of each media item.
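The release-date filter can be sketched as follows; 90 days stands in for "three months," which is an assumption of this sketch.

```python
import datetime

def recent_corpus(media_items, current_date, window_days=90):
    """media_items: list of (title, release_date) tuples. Keep only
    items released within window_days of current_date (90 days is an
    assumed stand-in for the 'three months' example)."""
    cutoff = current_date - datetime.timedelta(days=window_days)
    return [item for item in media_items if item[1] >= cutoff]

items = [
    ("Dangerous Woman", datetime.date(2016, 3, 11)),
    ("99 Problems",     datetime.date(2003, 9, 23)),
]
recent = recent_corpus(items, datetime.date(2016, 6, 1))
# Only "Dangerous Woman" remains in the second corpus.
```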
At block 840, the at least one second media item is provided. Block 840 is similar to block 818. Specifically, the at least one second media item is provided at the user device. In some examples, the at least one media item is played at the user device. In other examples, the at least one media item is displayed on the user device (e.g., on touchscreen 212) for the user to view and/or select. In still other examples, the at least one media item is provided to the user in the form of a spoken response.
Blocks 838-840 are further described with reference to FIG. 11. For example, in response to determining that speech input 1102 corresponds to a user intent of obtaining media items having recent release dates, the digital assistant implemented on user device 903 obtains at least one second media item from the second corpus of media items. The second corpus of media items includes only media items having release dates within three months of the current date. In this example, if the current date is June 1, 2016, each media item in the second corpus of media items has a release date no earlier than March 1, 2016. Thus, the at least one second media item obtained from the second corpus of media items has a release date no earlier than March 1, 2016. In this example, the at least one second media item includes the song "Dangerous Woman" by Ariana Grande, which has a release date of March 11, 2016. As shown, in response to speech input 1102, the song "Dangerous Woman" is obtained (e.g., from one or more media services 120-1) and played on user device 903.
5. Other Electronic Devices
FIG. 12 shows a functional block diagram of an electronic device 1200 configured in accordance with the principles of the various described examples. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software that carries out the principles of the various described examples. It is understood by persons of skill in the art that the functional blocks described in FIG. 12 are optionally combined or separated into sub-blocks to implement the principles of the various described examples. Therefore, the description herein optionally supports any possible combination, separation, or further definition of the functional blocks described herein.
As shown in FIG. 12, electronic device 1200 includes a touchscreen display unit 1202 configured to display a graphical user interface and receive touch input from the user, an audio input unit 1204 configured to receive audio input (e.g., speech input), a speaker unit 1205 configured to output audio (e.g., speech and/or media content), and a communication unit 1206 configured to transmit and receive information. Electronic device 1200 also includes a processing unit 1208 coupled to touchscreen display unit 1202, audio input unit 1204, and communication unit 1206. In some examples, processing unit 1208 includes a receiving unit 1210, a determining unit 1212, an obtaining unit 1214, a providing unit 1216, a ranking unit 1218, an updating unit 1220, a ceasing unit 1222, a forgoing unit 1224, and a selecting unit 1226.
In accordance with some embodiments, processing unit 1208 is configured to receive, from the user, speech input representing a request for one or more media items (e.g., the speech input of block 802) (e.g., using receiving unit 1210 and via audio input unit 1204). Processing unit 1208 is further configured to determine whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items (e.g., using determining unit 1212) (e.g., block 804). Processing unit 1208 is further configured to, in response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items, obtain at least one media item from a user-specific corpus of media items (e.g., the at least one media item of block 806) (e.g., using obtaining unit 1214). The user-specific corpus of media items is generated based on data associated with the user (e.g., the user-specific corpus of media items of block 806). Processing unit 1208 is further configured to provide the at least one media item (e.g., using providing unit 1216 and using touchscreen display unit 1202 and/or speaker unit 1205) (e.g., block 818).
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the number of parameters defined in the speech input is less than a predetermined threshold (e.g., block 804).
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the speech input includes a phrase of a plurality of phrases corresponding to the user intent of obtaining personalized recommendations for media items (e.g., block 804).
In some examples, determining whether the speech input corresponds to a user intent of obtaining personalized media recommendations includes determining whether the speech input refers to the user (e.g., block 804).
In some examples, the user-specific corpus of media items is generated based on media items previously selected or requested by the user (e.g., the user-specific corpus of media items of block 806).
In some examples, the user-specific corpus of media items is generated based on media items previously rejected by the user (e.g., block 806).
In some instances, personal library based on the media item associated with user and generate media item specific to user Corpus (for example, frame 806).
In some examples, the processing unit 1208 is further configured to rank a plurality of candidate media items from the user-specific corpus of media items using a user-specific media ranking model (e.g., block 814) (e.g., using ranking unit 1218). The user-specific media ranking model is generated based on previous media-related requests from the user. Obtaining the at least one media item includes selecting, based on the ranking, at least one media item from the plurality of candidate media items (e.g., block 816).
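One minimal way a user-specific ranking model built from previous media-related requests could work is sketched below. The feature choice (counting attribute terms from past requests) and the scoring rule are illustrative assumptions only, not the patent's actual model.

```python
# Hedged sketch: rank candidate media items so that items whose tags resemble
# the user's past requests score higher. Features and scoring are assumptions.

from collections import Counter

def build_ranking_model(previous_requests: list) -> Counter:
    """Count attribute terms appearing in the user's past requests."""
    model = Counter()
    for request in previous_requests:
        model.update(request.lower().split())
    return model

def rank_candidates(candidates: list, model: Counter) -> list:
    """Order candidates by how often the user has asked for their tags."""
    def score(item: dict) -> int:
        return sum(model[tag] for tag in item["tags"])
    return sorted(candidates, key=score, reverse=True)

model = build_ranking_model(["play jazz", "some relaxing jazz please"])
candidates = [
    {"title": "Song A", "tags": ["rock"]},
    {"title": "Song B", "tags": ["jazz", "relaxing"]},
]
ranked = rank_candidates(candidates, model)
```

Here "Song B" outranks "Song A" because the user's history mentions "jazz" and "relaxing" but never "rock" — the selection step then simply takes the top of the ranked list.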
In some examples, the processing unit 1208 is further configured to receive a second speech input from the user (e.g., using receiving unit 1210 and via audio input unit 1204). The processing unit 1208 is further configured to determine whether the second speech input corresponds to a rejection of the at least one media item (e.g., using determining unit 1212). The processing unit 1208 is further configured to, in response to determining that the second speech input corresponds to a rejection of the at least one media item, update the user-specific media ranking model in accordance with the rejection (e.g., using updating unit 1220).
In some examples, the processing unit 1208 is further configured to re-rank, based on the rejection of the at least one media item, the plurality of candidate media items from the user-specific corpus of media items (e.g., using ranking unit 1218). The processing unit 1208 is further configured to select, based on the re-ranking, at least one second media item from the plurality of candidate media items (e.g., using selecting unit 1226).
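The rejection-driven update and re-ranking described above could, under the simple tag-weight model assumed earlier, look like the following. The penalty value and the starting weights are invented for the example.

```python
# Hedged sketch: when the user rejects the offered item, down-weight that
# item's tags in the user-specific model, then re-rank the candidates so a
# different item is offered next. Weights and penalty are assumptions.

def apply_rejection(model: dict, rejected_tags: list, penalty: int = 2) -> dict:
    """Return an updated model with the rejected item's tags penalized."""
    updated = dict(model)
    for tag in rejected_tags:
        updated[tag] = updated.get(tag, 0) - penalty
    return updated

def rerank(candidates: list, model: dict) -> list:
    return sorted(
        candidates,
        key=lambda item: sum(model.get(t, 0) for t in item["tags"]),
        reverse=True,
    )

model = {"jazz": 3, "rock": 2}
candidates = [
    {"title": "Jazz Tune", "tags": ["jazz"]},
    {"title": "Rock Tune", "tags": ["rock"]},
]
model = apply_rejection(model, rejected_tags=["jazz"])  # user said "not this"
second_choice = rerank(candidates, model)[0]
```

After the rejection, "jazz" drops from weight 3 to 1, so the re-ranking surfaces "Rock Tune" as the second media item to offer.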
In some examples, the plurality of candidate media items are ranked based on an approval rating of each media item of the plurality of candidate media items (e.g., block 814).
In some examples, each media item in the user-specific corpus of media items includes metadata indicating an activity associated with the media item. The activity is associated with the media item based on a musical tempo of the media item.
In some examples, each media item in the user-specific corpus of media items includes metadata indicating a mood associated with the media item. The mood is associated with the media item based on a musical tone of the media item.
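The tempo-to-activity and tone-to-mood metadata described above might be derived as follows. The BPM cutoffs and the key-to-mood mapping are invented thresholds for illustration; the patent does not specify them.

```python
# Illustrative sketch: tag a media item with activity metadata (from musical
# tempo) and mood metadata (from musical tone). Cutoffs are assumptions.

def activity_from_tempo(bpm: int) -> str:
    if bpm >= 120:
        return "exercising"
    if bpm >= 90:
        return "commuting"
    return "relaxing"

def mood_from_tone(key_quality: str) -> str:
    return {"major": "happy", "minor": "melancholy"}.get(key_quality, "neutral")

def tag_media_item(item: dict) -> dict:
    """Return a copy of the item with activity/mood metadata attached."""
    item = dict(item)
    item["metadata"] = {
        "activity": activity_from_tempo(item["bpm"]),
        "mood": mood_from_tone(item["key_quality"]),
    }
    return item

tagged = tag_media_item({"title": "Track X", "bpm": 128, "key_quality": "minor"})
```

A fast minor-key track thus ends up tagged for "exercising" with a "melancholy" mood, and those tags are what a request like "play me workout music" would later match against.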
In some examples, the processing unit 1208 is further configured to determine whether the speech input defines an occasion associated with a time period (e.g., using determining unit 1212) (e.g., block 804). The processing unit 1208 is further configured to, in response to determining that the speech input defines an occasion associated with a time period, obtain the at least one media item based on the occasion (e.g., using obtaining unit 1214), wherein the at least one media item includes metadata indicating the occasion (e.g., block 806).
In some examples, the processing unit 1208 is further configured to determine whether the speech input defines a curated list associated with a media representative (e.g., using determining unit 1212) (e.g., block 804). The processing unit 1208 is further configured to, in response to determining that the speech input defines a curated list associated with a media representative, obtain the at least one media item based on the curated list associated with the media representative (e.g., using obtaining unit 1214) (e.g., block 806). The at least one media item includes metadata indicating the curated list associated with the media representative.
In some examples, the processing unit 1208 is further configured to determine whether the speech input defines a mood (e.g., using determining unit 1212) (e.g., block 804). The processing unit 1208 is further configured to, in response to determining that the speech input defines a mood, obtain the at least one media item based on the mood (e.g., using obtaining unit 1214), wherein the at least one media item includes metadata indicating the mood (e.g., block 806).
In some examples, the processing unit 1208 is further configured to determine whether the speech input defines an activity (e.g., using determining unit 1212) (e.g., block 804). The processing unit 1208 is further configured to, in response to determining that the speech input defines an activity, obtain the at least one media item based on the activity (e.g., using obtaining unit 1214), wherein the at least one media item includes metadata indicating the activity (e.g., block 806).
In some examples, the processing unit 1208 is further configured to determine whether the speech input defines a time period (e.g., using determining unit 1212) (e.g., block 804). The processing unit 1208 is further configured to, in response to determining that the speech input defines a time period, determine whether the speech input defines a genre associated with the time period (e.g., using determining unit 1212). The processing unit 1208 is further configured to, in response to determining that the speech input defines a genre associated with the time period, determine a sub-genre based on the time period and the genre (e.g., using determining unit 1212). The at least one media item is obtained based on the sub-genre, and the at least one media item includes metadata indicating the sub-genre (e.g., block 806).
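The time-period + genre → sub-genre resolution (e.g., "play me some 90s rock") could be as simple as a lookup table. The table entries below are hypothetical example mappings, not data from the patent.

```python
# Sketch of sub-genre resolution from a time period and a genre.
# The mapping entries are invented examples.

from typing import Optional

SUBGENRE_TABLE = {
    ("1990s", "rock"): "grunge",
    ("1970s", "rock"): "classic rock",
    ("1980s", "pop"): "synth-pop",
}

def resolve_subgenre(time_period: str, genre: str) -> Optional[str]:
    """Return the sub-genre for a (time period, genre) pair, if one is known."""
    return SUBGENRE_TABLE.get((time_period, genre))
```

When no sub-genre is known for the pair, the lookup returns `None` and the assistant would fall back to obtaining media items matching the genre and time period directly.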
In some examples, the speech input defines a category of media items, and obtaining the at least one media item includes obtaining a plurality of media items associated with the category of media items. The processing unit 1208 is further configured to determine a familiarity of the user with the category of media items (e.g., familiarity of block 816) (e.g., using determining unit 1212). An average approval rating of the plurality of media items is based on the familiarity of the user with the category of media items.
In some examples, the processing unit 1208 is further configured to determine an identity of the user by performing speaker recognition using the speech input (e.g., using determining unit 1212). The processing unit 1208 is further configured to determine, based on the determined identity of the user, the user-specific corpus of media items from a plurality of user-specific corpora of media items (e.g., using determining unit 1212).
In some examples, obtaining the at least one media item includes sending an encrypted token to a remote server. The encrypted token includes user identification information. The encrypted token is required to access the user-specific corpus of media items via the remote server.
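The token-gated access to the user-specific corpus could be sketched as below. A real deployment would use proper encryption and a real server; HMAC signing of a user-identification payload with a shared secret is used here purely as an illustrative stand-in, and all names are assumptions.

```python
# Hedged sketch: the client presents a token carrying user identification,
# and the "server" refuses access to the user-specific corpus unless the
# token verifies. HMAC signing here is illustrative, not the patent's scheme.

import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # assumption for the sketch
CORPORA = {"alice": ["Song A", "Song B"]}  # hypothetical per-user corpora

def make_token(user_id: str) -> str:
    """Build a signed token embedding user identification information."""
    payload = json.dumps({"user": user_id}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.b64encode(payload).decode() + "." + sig

def corpus_for_token(token: str):
    """Server side: return the user-specific corpus only for a valid token."""
    payload_b64, sig = token.rsplit(".", 1)
    payload = base64.b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # invalid token: no access to the corpus
    user = json.loads(payload)["user"]
    return CORPORA.get(user)
```

A token with a bad signature yields no corpus at all, which matches the requirement that the token is needed to access the user-specific corpus via the remote server.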
In some examples, the processing unit 1208 is further configured to determine whether a domain corresponding to the speech input (e.g., domain of block 820) is one of a plurality of predetermined domains (e.g., using determining unit 1212). The processing unit 1208 is further configured to, in response to determining that the domain corresponding to the speech input is one of the plurality of predetermined domains, receive an audio input while providing the at least one media item (e.g., audio input of block 824) (e.g., using receiving unit 1210 and via audio input unit 1204). The processing unit 1208 is further configured to determine whether the audio input includes speech (e.g., block 826) (e.g., using determining unit 1212). The processing unit 1208 is further configured to, in response to determining that the audio input does not include speech, cease receiving audio input after a predetermined amount of time (e.g., using ceasing unit 1222) (e.g., block 828).
In some examples, the processing unit 1208 is further configured to, in response to determining that the audio input includes speech, determine whether the speech of the audio input corresponds to a same domain as the speech input (e.g., using determining unit 1212) (e.g., block 830). The processing unit 1208 is further configured to, in response to determining that the speech of the audio input corresponds to the same domain as the speech input, determine a user intent corresponding to the speech of the audio input (e.g., user intent of block 820) (e.g., using determining unit 1212). The processing unit 1208 is further configured to provide a response to the audio input (e.g., response of block 834) in accordance with the user intent corresponding to the speech of the audio input (e.g., using providing unit 1216).
In some examples, the processing unit 1208 is further configured to, in response to determining that the speech of the audio input does not correspond to the same domain as the speech input, forgo providing a response to the audio input (e.g., using forgoing unit 1224) (e.g., block 832).
In some examples, the predetermined amount of time is based on a level of background noise detected in the audio input.
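The noise-dependent listening window could be modeled as a simple monotone function: the noisier the environment, the longer the device keeps the microphone open before ceasing to receive audio. The base window and scaling constant below are assumptions made for illustration.

```python
# Sketch: the predetermined amount of time before ceasing to receive audio
# scales with the detected background-noise level. Constants are assumptions.

def listen_timeout_seconds(noise_db: float,
                           base: float = 1.0,
                           per_db: float = 0.05) -> float:
    """More ambient noise -> a longer wait before ceasing to receive audio."""
    return base + per_db * max(noise_db, 0.0)
```

In a quiet room the device would stop listening after about a second of non-speech audio, while at 40 dB of background noise it would wait roughly three seconds, tolerating noise that might otherwise mask the start of the user's follow-up speech.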
In some examples, providing the at least one media item includes playing a media item. The processing unit 1208 is further configured to receive a third speech input while playing the media item (e.g., speech input in the audio input of block 824) (e.g., using receiving unit 1210 and via audio input unit 1204). The processing unit 1208 is further configured to determine, based on the media item being played and the third speech input, a user intent corresponding to the third speech input (e.g., user intent of block 820) (e.g., using determining unit 1212). The processing unit 1208 is further configured to provide a response (e.g., response of block 834) in accordance with the user intent corresponding to the third speech input (e.g., using providing unit 1216).
In some examples, the processing unit 1208 is further configured to, in response to determining that the speech input does not correspond to a user intent of obtaining personalized recommendations for media items, determine whether the speech input corresponds to a user intent of obtaining media items having a recent release date (e.g., using determining unit 1212) (e.g., block 836). The processing unit 1208 is further configured to, in response to determining that the speech input corresponds to a user intent of obtaining media items having a recent release date, obtain at least one second media item from a second corpus of media items (e.g., at least one second media item of block 838) (e.g., using obtaining unit 1214). Each media item in the second corpus of media items has a release date within a predetermined time range of the current date. The processing unit 1208 is further configured to provide the at least one second media item (e.g., using providing unit 1216) (e.g., block 840).
In some examples, determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date includes determining whether the speech input includes a phrase of a second plurality of phrases corresponding to the user intent of obtaining media items having a recent release date (e.g., block 836).
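Building the second corpus — media items whose release date falls within a predetermined range of the current date — amounts to a date filter. The 30-day window and the sample items below are illustrative assumptions.

```python
# Minimal sketch of the "recent release" second corpus: keep only media items
# released within an assumed window (here 30 days) of the current date.

from datetime import date, timedelta

def recent_corpus(items: list, today: date, window_days: int = 30) -> list:
    """Return items whose release date is within window_days of today."""
    cutoff = today - timedelta(days=window_days)
    return [i for i in items if i["released"] >= cutoff]

items = [
    {"title": "New Single", "released": date(2017, 5, 20)},
    {"title": "Old Album", "released": date(2016, 1, 1)},
]
recent = recent_corpus(items, today=date(2017, 6, 1))
```

A request like "play me some new music" would then be answered from this filtered corpus rather than the user-specific one.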
In some examples, the processing unit 1208 is further configured to determine a political orientation associated with the user (e.g., using determining unit 1212) (e.g., block 814). The determination is based on media items previously requested or consumed by the user. The at least one media item is obtained based on the determined political orientation.
In some examples, the processing unit 1208 is further configured to determine a professional skill associated with the user (e.g., using determining unit 1212) (e.g., block 814). The determination is based on media items previously requested or consumed by the user. The at least one media item is obtained based on the determined professional skill.
The operations described above with reference to FIGS. 8A-C are optionally implemented by the components depicted in FIGS. 1-4, 6A-B, and 7A-C. For example, the operations of process 800 may be implemented by operating system 718, applications module 724, I/O processing module 728, STT processing module 730, natural language processing module 732, vocabulary index 744, task flow processing module 736, service processing module 738, one or more of media services 120-1, or one or more of processors 220, 410, 704. It would be clear to a person having ordinary skill in the art how other processes can be implemented based on the components depicted in FIGS. 1-4, 6A-B, and 7A-C.
In accordance with some implementations, a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) is provided, the computer-readable storage medium storing one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing any of the methods or processes described herein.
In accordance with some implementations, an electronic device (e.g., a portable electronic device) is provided that comprises means for performing any of the methods or processes described herein.
In accordance with some implementations, an electronic device (e.g., a portable electronic device) is provided that comprises a processing unit configured to perform any of the methods or processes described herein.
In accordance with some implementations, an electronic device (e.g., a portable electronic device) is provided that comprises one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for performing any of the methods or processes described herein.
The foregoing description has, for purposes of explanation, been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications, thereby enabling others skilled in the art to best utilize the techniques, and various embodiments with various modifications as are suited to the particular use contemplated.
Although the disclosure and examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the claims.
As described above, one aspect of the present technology is the gathering and use of data available from various sources to improve the delivery to users of content that may be of interest to them. The present disclosure contemplates that, in some instances, this gathered data may include personal information data that uniquely identifies, or can be used to contact or locate, a specific person. Such personal information data can include demographic data, location-based data, telephone numbers, email addresses, home addresses, or any other identifying information.
The present disclosure recognizes that the use of such personal information data in the present technology can be used to the benefit of users. For example, the personal information data can be used to deliver targeted content that is of interest to the user. Accordingly, use of such personal information data enables calculated control of the delivered content. Further, the present disclosure contemplates other uses for personal information data that benefit the user.
The present disclosure further contemplates that the entities responsible for the collection, analysis, disclosure, transfer, storage, or other use of such personal information data will comply with well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently adhere to privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining the privacy and security of personal information data. For example, personal information from users should be collected for legitimate and reasonable uses of the entity and not shared or sold outside of those legitimate uses. Further, such collection should occur only after receiving the informed consent of the users. Additionally, such entities should take any needed steps to safeguard and secure access to such personal information data and to ensure that others with access to the personal information data adhere to their privacy policies and procedures. Further, such entities can subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices.
Despite the foregoing, the present disclosure also contemplates embodiments in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware and/or software elements can be provided to prevent or block access to such personal information data. For example, in the case of advertisement delivery services, the present technology can be configured to allow users to select to "opt in" or "opt out" of participation in the collection of personal information data during registration for services. In another example, users can select not to provide location information for targeted content delivery services. In yet another example, users can select not to provide precise location information, but permit the transfer of location zone information.
Therefore, although the present disclosure broadly covers use of personal information data to implement one or more of the various disclosed embodiments, the present disclosure also contemplates that the various embodiments can be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data. For example, content can be selected and delivered to users by inferring preferences based on non-personal information data or a bare minimum amount of personal information, such as the content being requested by the device associated with the user, other non-personal information available to the content delivery services, or publicly available information.

Claims (91)

1. A method for operating a digital assistant to explore media items, the method comprising:
at an electronic device with memory and one or more processors:
receiving, from a user, a speech input representing a request for one or more media items;
determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items; and
in response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items:
obtaining at least one media item from a user-specific corpus of media items, the user-specific corpus of media items being generated based on data associated with the user; and
providing the at least one media item.
2. The method of claim 1, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether a number of parameters defined in the speech input is less than a threshold.
3. The method of claim 1, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the speech input includes a phrase of a plurality of phrases corresponding to the user intent of obtaining personalized recommendations for media items.
4. The method of claim 1, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether the speech input refers to the user.
5. The method of claim 1, wherein the user-specific corpus of media items is generated based on media items previously selected or requested by the user.
6. The method of claim 1, wherein the user-specific corpus of media items is generated based on media items previously rejected by the user.
7. The method of claim 1, wherein the user-specific corpus of media items is generated based on a personal library of media items associated with the user.
8. The method of claim 1, further comprising:
ranking, using a user-specific media ranking model, a plurality of candidate media items from the user-specific corpus of media items, the user-specific media ranking model being generated based on a plurality of previous media-related requests from the user, wherein obtaining the at least one media item includes selecting, based on the ranking, the at least one media item from the plurality of candidate media items.
9. The method of claim 8, wherein the plurality of candidate media items are ranked based on an approval rating of each media item of the plurality of candidate media items.
10. The method of claim 8, further comprising:
receiving a second speech input from the user;
determining whether the second speech input corresponds to a rejection of the at least one media item; and
in response to determining that the second speech input corresponds to a rejection of the at least one media item:
updating the user-specific media ranking model in accordance with the rejection.
11. The method of claim 10, further comprising:
re-ranking, based on the rejection of the at least one media item, the plurality of candidate media items from the user-specific corpus of media items; and
selecting, based on the re-ranking, at least one second media item from the plurality of candidate media items.
12. The method of claim 1, wherein each media item in the user-specific corpus of media items includes metadata indicating an activity associated with the media item, and wherein the activity is associated with the media item based on a musical tempo of the media item.
13. The method of claim 1, wherein each media item in the user-specific corpus of media items includes metadata indicating a mood associated with the media item, and wherein the mood is associated with the media item based on a musical tone of the media item.
14. The method of claim 1, further comprising:
determining whether the speech input defines an occasion associated with a time period; and
in response to determining that the speech input defines an occasion associated with a time period, obtaining the at least one media item based on the occasion, wherein the at least one media item includes metadata indicating the occasion.
15. The method of claim 1, further comprising:
determining whether the speech input defines a curated list associated with a media representative; and
in response to determining that the speech input defines a curated list associated with a media representative, obtaining the at least one media item based on the curated list associated with the media representative, wherein the at least one media item includes metadata indicating the curated list associated with the media representative.
16. The method of claim 1, further comprising:
determining whether the speech input defines a mood; and
in response to determining that the speech input defines a mood, obtaining the at least one media item based on the mood, wherein the at least one media item includes metadata indicating the mood.
17. The method of claim 1, further comprising:
determining whether the speech input defines an activity; and
in response to determining that the speech input defines an activity, obtaining the at least one media item based on the activity, wherein the at least one media item includes metadata indicating the activity.
18. The method of claim 1, further comprising:
determining whether the speech input defines a time period;
in response to determining that the speech input defines a time period, determining whether the speech input defines a genre associated with the time period; and
in response to determining that the speech input defines a genre associated with the time period, determining a sub-genre based on the time period and the genre, wherein the at least one media item is obtained based on the sub-genre, and wherein the at least one media item includes metadata indicating the sub-genre.
19. The method of claim 1, wherein the speech input defines a category of media items, wherein obtaining the at least one media item includes obtaining a plurality of media items associated with the category of media items, the method further comprising:
determining a familiarity of the user with the category of media items, wherein an average approval rating of the plurality of media items is based on the familiarity of the user with the category of media items.
20. The method of claim 1, further comprising:
determining an identity of the user by performing speaker recognition using the speech input; and
determining, based on the determined identity of the user, the user-specific corpus of media items from a plurality of user-specific corpora of media items.
21. The method of claim 1, wherein obtaining the at least one media item includes sending an encrypted token to a remote server, the encrypted token including user identification information, and wherein the encrypted token is required to access the user-specific corpus of media items via the remote server.
22. The method of claim 1, further comprising:
determining whether a domain corresponding to the speech input is one of a plurality of predetermined domains;
in response to determining that the domain corresponding to the speech input is one of the plurality of predetermined domains:
while providing the at least one media item, receiving an audio input;
determining whether the audio input includes speech; and
in response to determining that the audio input does not include speech, ceasing to receive audio input after a predetermined amount of time.
23. The method of claim 22, further comprising:
in response to determining that the audio input includes speech:
determining whether the speech of the audio input corresponds to a same domain as the speech input;
in response to determining that the speech of the audio input corresponds to the same domain as the speech input:
determining a user intent corresponding to the speech of the audio input; and
providing a response to the audio input in accordance with the user intent corresponding to the speech of the audio input.
24. The method of claim 23, further comprising:
in response to determining that the speech of the audio input does not correspond to the same domain as the speech input:
forgoing providing a response to the audio input.
25. The method of claim 22, wherein the predetermined amount of time is based on a level of background noise detected in the audio input.
26. The method of claim 1, wherein providing the at least one media item includes playing a media item of the at least one media item, the method further comprising:
while playing the media item, receiving a third speech input;
determining, based on the media item being played and the third speech input, a user intent corresponding to the third speech input; and
providing a response in accordance with the user intent corresponding to the third speech input.
27. The method of claim 1, further comprising:
in response to determining that the speech input does not correspond to a user intent of obtaining personalized recommendations for media items:
determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date; and
in response to determining that the speech input corresponds to a user intent of obtaining media items having a recent release date:
obtaining at least one second media item from a second corpus of media items, wherein each media item in the second corpus of media items has a release date within a predetermined time range of a current date; and
providing the at least one second media item.
28. The method of claim 27, wherein determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date includes determining whether the speech input includes a phrase of a second plurality of phrases corresponding to the user intent of obtaining media items having a recent release date.
29. The method of claim 1, further comprising:
determining a political orientation associated with the user, the determination being based on media items previously requested or consumed by the user, wherein the at least one media item is obtained based on the determined political orientation.
30. The method of claim 1, further comprising:
determining a professional skill associated with the user, the determination being based on media items previously requested or consumed by the user, wherein the at least one media item is obtained based on the determined professional skill.
31. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for:
receiving, from a user, a speech input representing a request for one or more media items;
determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items; and
in response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items:
obtaining at least one media item from a user-specific corpus of media items, the user-specific corpus of media items being generated based on data associated with the user; and
providing the at least one media item.
32. The computer-readable medium of claim 31, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items includes determining whether a number of parameters defined in the speech input is less than a threshold.
33. computer-readable medium according to claim 31, obtained wherein determining whether the phonetic entry corresponds to Include determining whether the phonetic entry includes being directed to media item with obtaining for the user view of the personalized recommendation of media item Personalized recommendation the user view corresponding to a phrase in multiple phrases.
34. computer-readable medium according to claim 31, obtained wherein determining whether the phonetic entry corresponds to Include determining whether the phonetic entry is related to the user for the user view of the personalized recommendation of media item.
35. computer-readable medium according to claim 31, wherein the corpus specific to user of the media item Generated based on the media item for previously having been selected or having been asked by the user.
36. computer-readable medium according to claim 31, wherein the corpus specific to user of the media item Based on previously being generated by the media item that the user refuses.
37. computer-readable medium according to claim 31, wherein the corpus specific to user of the media item Personal library based on the media item associated with the user and generate.
38. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
ranking, using a user-specific media ranking model, a plurality of candidate media items from the user-specific corpus of media items, the user-specific media ranking model being generated based on a plurality of previous media-related requests from the user, wherein obtaining the at least one media item comprises selecting the at least one media item from the plurality of candidate media items based on the ranking.
39. The computer-readable medium of claim 38, wherein the plurality of candidate media items are ranked based on a popularity rating of each media item of the plurality of candidate media items.
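The ranking of claims 38–39 can be sketched as follows. The per-genre preference weights and their additive combination with a popularity rating are assumptions standing in for whatever user-specific media ranking model the patent actually contemplates.

```python
# Illustrative user-specific ranking: preference weights learned from the
# user's prior requests (claim 38) combined with a per-item popularity
# rating (claim 39). Scoring scheme is an invented example.

def rank_candidates(candidates, genre_weights):
    """candidates: dicts with 'title', 'genre', 'popularity' in [0, 1].
    genre_weights: user-specific weights derived from prior requests."""
    def score(item):
        return genre_weights.get(item["genre"], 0.0) + item["popularity"]
    return sorted(candidates, key=score, reverse=True)


weights = {"jazz": 0.8, "rock": 0.2}  # stands in for the learned model
items = [
    {"title": "A", "genre": "rock", "popularity": 0.9},
    {"title": "B", "genre": "jazz", "popularity": 0.5},
]
best = rank_candidates(items, weights)[0]  # "B": 0.8 + 0.5 beats 0.2 + 0.9
```

Note how the user-specific weights can outvote raw popularity: the less popular jazz track wins because the model favors the user's preferred genre.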
40. The computer-readable medium of claim 38, wherein the one or more programs further include instructions for:
receiving a second speech input from the user;
determining whether the second speech input corresponds to a rejection of the at least one media item; and
in response to determining that the second speech input corresponds to a rejection of the at least one media item:
updating the user-specific media ranking model in accordance with the rejection.
41. The computer-readable medium of claim 40, wherein the one or more programs further include instructions for:
re-ranking the plurality of candidate media items from the user-specific corpus of media items based on the rejection of the at least one media item; and
selecting at least one second media item from the plurality of candidate media items based on the re-ranking.
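The rejection loop of claims 40–41 might look like the following sketch. The rejection phrases, the fixed penalty, and the genre-weight model are all illustrative assumptions.

```python
REJECTION_PHRASES = ("no thanks", "not that", "something else", "skip")


def is_rejection(second_speech_input: str) -> bool:
    """Claim 40: does the follow-up utterance reject the offered item?"""
    text = second_speech_input.lower()
    return any(p in text for p in REJECTION_PHRASES)


def update_on_rejection(genre_weights, rejected_item, penalty=0.5):
    """Claim 40: update the user-specific ranking model per the rejection
    (here, by penalizing the rejected item's genre)."""
    genre = rejected_item["genre"]
    genre_weights[genre] = genre_weights.get(genre, 0.0) - penalty
    return genre_weights


def rerank(candidates, genre_weights, rejected_title):
    """Claim 41: re-rank the remaining candidates and pick again."""
    remaining = [c for c in candidates if c["title"] != rejected_title]
    return sorted(
        remaining,
        key=lambda c: genre_weights.get(c["genre"], 0.0) + c["popularity"],
        reverse=True,
    )
```

After "Not that, something else please", the rejected item's genre is down-weighted and the re-ranking surfaces a second item from the same candidate pool.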
42. The computer-readable medium of claim 31, wherein each media item in the user-specific corpus of media items includes metadata indicating an activity associated with the media item, and wherein the activity is associated with the media item based on a musical tempo of the media item.
43. The computer-readable medium of claim 31, wherein each media item in the user-specific corpus of media items includes metadata indicating a mood associated with the media item, and wherein the mood is associated with the media item based on a musical key of the media item.
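Claims 42–43 tie an activity to a media item's musical tempo and a mood to its musical key. A hedged sketch of such tagging follows; the tempo cut-offs and the key-to-mood table are invented for illustration and are not values from the patent.

```python
def activity_from_tempo(bpm: float) -> str:
    """Claim 42: map musical tempo to an activity label (cut-offs assumed)."""
    if bpm >= 140:
        return "workout"
    if bpm >= 100:
        return "commute"
    return "relaxation"


# Claim 43: an assumed key-to-mood lookup; a real system would likely use
# a richer model than major=bright / minor=melancholy.
MOOD_BY_KEY = {"C major": "bright", "A minor": "melancholy", "D minor": "somber"}


def tag_media_item(item: dict) -> dict:
    """Attach the activity/mood metadata the claims describe."""
    item["metadata"] = {
        "activity": activity_from_tempo(item["tempo_bpm"]),
        "mood": MOOD_BY_KEY.get(item["key"], "neutral"),
    }
    return item
```

With metadata like this in place, the later claims (46–47) reduce to filtering the corpus on the mood or activity the speech input defines.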
44. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether the speech input defines an occasion associated with a time period; and
in response to determining that the speech input defines an occasion associated with a time period, obtaining the at least one media item based on the occasion, wherein the at least one media item includes metadata indicating the occasion.
45. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether the speech input defines a curated list associated with a media personality; and
in response to determining that the speech input defines a curated list associated with a media personality, obtaining the at least one media item based on the curated list associated with the media personality, wherein the at least one media item includes metadata indicating the curated list associated with the media personality.
46. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether the speech input defines a mood; and
in response to determining that the speech input defines a mood, obtaining the at least one media item based on the mood, wherein the at least one media item includes metadata indicating the mood.
47. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether the speech input defines an activity; and
in response to determining that the speech input defines an activity, obtaining the at least one media item based on the activity, wherein the at least one media item includes metadata indicating the activity.
48. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether the speech input defines a time period;
in response to determining that the speech input defines a time period, determining whether the speech input defines a genre associated with the time period; and
in response to determining that the speech input defines a genre associated with the time period, determining a sub-genre based on the time period and the genre, wherein the at least one media item is obtained based on the sub-genre, and wherein the at least one media item includes metadata indicating the sub-genre.
49. The computer-readable medium of claim 31, wherein the speech input defines a category of media items, wherein obtaining the at least one media item comprises obtaining a plurality of media items associated with the category of media items, and wherein the one or more programs further include instructions for:
determining a degree of familiarity of the user with the category of media items, wherein an average popularity rating of the plurality of media items is based on the degree of familiarity of the user with the category of media items.
50. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
performing speaker identification using the speech input to determine an identity of the user; and
determining, based on the determined identity of the user, the user-specific corpus of media items from a plurality of user-specific corpora of media items.
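Claim 50 selects the user-specific corpus by identifying the speaker from the speech input itself. The sketch below stands in for that step with a trivial sample lookup; a real system would compare audio embeddings against enrolled voiceprints, and every name here is hypothetical.

```python
def identify_speaker(speech_input: bytes, voiceprints: dict) -> str:
    """Stub for speaker identification: match the audio against enrolled
    samples. A production system would use embedding similarity, not
    byte equality; this only illustrates the control flow of claim 50."""
    for user_id, sample in voiceprints.items():
        if sample == speech_input:
            return user_id
    return "guest"


def corpus_for_speaker(speech_input: bytes, voiceprints: dict,
                       corpora_by_user: dict) -> list:
    """Pick the user-specific corpus for whoever is speaking."""
    user_id = identify_speaker(speech_input, voiceprints)
    return corpora_by_user.get(user_id, [])
```

An unrecognized voice falls through to a "guest" identity with an empty corpus, which is one plausible (assumed) fallback for a shared device.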
51. The computer-readable medium of claim 31, wherein obtaining the at least one media item comprises transmitting an encrypted token to a remote server, the encrypted token including user identification information, and wherein the encrypted token is required to access the user-specific corpus of media items via the remote server.
52. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining whether a domain corresponding to the speech input is one of a plurality of predetermined domains; and
in response to determining that the domain corresponding to the speech input is one of the plurality of predetermined domains:
while providing the at least one media item, receiving audio input;
determining whether the audio input includes speech; and
in response to determining that the audio input does not include speech, ceasing to receive audio input after a predetermined amount of time.
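Claims 52 and 55 together describe keeping the microphone open after a response in a predetermined domain, then cutting off after a silence timeout that depends on measured background noise. A sketch under stated assumptions (the base timeout and the linear noise scaling are invented; the claims only say the timeout is based on the detected noise level):

```python
BASE_TIMEOUT_S = 2.0  # assumed base listening window


def listen_timeout(noise_level: float) -> float:
    """Claim 55: the cut-off grows with background noise, on the assumption
    that speech detection is less reliable in noisy rooms.
    noise_level is a normalized measure in [0, 1]."""
    return BASE_TIMEOUT_S * (1.0 + noise_level)


def should_stop_listening(elapsed_s: float, speech_detected: bool,
                          noise_level: float) -> bool:
    """Claim 52: keep listening while speech is present; otherwise stop
    once the silence has outlasted the noise-adjusted timeout."""
    if speech_detected:
        return False
    return elapsed_s >= listen_timeout(noise_level)
```

At a moderate noise level of 0.5 the window stretches from 2.0 s to 3.0 s, so a 2.5 s silence keeps the microphone open while a 3.5 s silence closes it.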
53. The computer-readable medium of claim 52, wherein the one or more programs further include instructions for:
in response to determining that the audio input includes speech:
determining whether the speech of the audio input corresponds to a same domain as the speech input; and
in response to determining that the speech of the audio input corresponds to the same domain as the speech input:
determining a user intent corresponding to the speech of the audio input; and
providing a response to the audio input in accordance with the user intent corresponding to the speech of the audio input.
54. The computer-readable medium of claim 53, wherein the one or more programs further include instructions for:
in response to determining that the speech of the audio input does not correspond to the same domain as the speech input:
forgoing providing a response to the audio input.
55. The computer-readable medium of claim 52, wherein the predetermined amount of time is based on a level of background noise detected in the audio input.
56. The computer-readable medium of claim 31, wherein providing the at least one media item comprises playing a media item of the at least one media item, and wherein the one or more programs further include instructions for:
while playing the media item, receiving a third speech input;
determining a user intent corresponding to the third speech input based on the media item being played and the third speech input; and
providing a response in accordance with the user intent corresponding to the third speech input.
57. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
in response to determining that the speech input does not correspond to a user intent of obtaining personalized recommendations for media items:
determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date; and
in response to determining that the speech input corresponds to a user intent of obtaining media items having a recent release date:
obtaining at least one second media item from a second corpus of media items, wherein each media item in the second corpus of media items has a release date within a predetermined time range of a current date; and
providing the at least one second media item.
58. The computer-readable medium of claim 31, wherein determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date comprises determining whether the speech input includes a phrase of a second plurality of phrases corresponding to the user intent of obtaining media items having a recent release date.
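The second corpus of claims 57–58 contains only items released within a predetermined window of the current date. A minimal sketch; the 30-day window is an illustrative choice, as the claims leave the range unspecified.

```python
from datetime import date, timedelta


def recent_corpus(items, today, window_days=30):
    """Claim 57: keep only items whose release date falls within the
    predetermined time range of the current date (window assumed)."""
    cutoff = today - timedelta(days=window_days)
    return [i for i in items if i["release_date"] >= cutoff]


items = [
    {"title": "New", "release_date": date(2017, 5, 20)},
    {"title": "Old", "release_date": date(2016, 1, 1)},
]
fresh = recent_corpus(items, today=date(2017, 6, 1))
```

A phrase check like the one in claim 58 ("what's new", "latest releases") would gate entry into this path after the personalized-recommendation intent is ruled out.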
59. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining a political orientation associated with the user based on previous media items requested or consumed by the user, wherein the at least one media item is obtained based on the determined political orientation.
60. The computer-readable medium of claim 31, wherein the one or more programs further include instructions for:
determining an occupation associated with the user based on previous media items requested or consumed by the user, wherein the at least one media item is obtained based on the determined occupation.
61. An electronic device for operating a digital assistant to explore media items, the electronic device comprising:
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving, from a user, a speech input representing a request for one or more media items;
determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items; and
in response to determining that the speech input corresponds to a user intent of obtaining personalized recommendations for media items:
obtaining at least one media item from a user-specific corpus of media items, the user-specific corpus of media items being generated based on data associated with the user; and
providing the at least one media item.
62. The device of claim 61, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items comprises determining whether a number of parameters defined in the speech input is less than a threshold value.
63. The device of claim 61, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items comprises determining whether the speech input includes a phrase of a plurality of phrases corresponding to the user intent of obtaining personalized recommendations for media items.
64. The device of claim 61, wherein determining whether the speech input corresponds to a user intent of obtaining personalized recommendations for media items comprises determining whether the speech input refers to the user.
65. The device of claim 61, wherein the user-specific corpus of media items is generated based on media items previously selected or requested by the user.
66. The device of claim 61, wherein the user-specific corpus of media items is generated based on media items previously rejected by the user.
67. The device of claim 61, wherein the user-specific corpus of media items is generated based on a personal library of media items associated with the user.
68. The device of claim 61, wherein the one or more programs further include instructions for:
ranking, using a user-specific media ranking model, a plurality of candidate media items from the user-specific corpus of media items, the user-specific media ranking model being generated based on a plurality of previous media-related requests from the user, wherein obtaining the at least one media item comprises selecting the at least one media item from the plurality of candidate media items based on the ranking.
69. The device of claim 68, wherein the plurality of candidate media items are ranked based on a popularity rating of each media item of the plurality of candidate media items.
70. The device of claim 68, wherein the one or more programs further include instructions for:
receiving a second speech input from the user;
determining whether the second speech input corresponds to a rejection of the at least one media item; and
in response to determining that the second speech input corresponds to a rejection of the at least one media item:
updating the user-specific media ranking model in accordance with the rejection.
71. The device of claim 70, wherein the one or more programs further include instructions for:
re-ranking the plurality of candidate media items from the user-specific corpus of media items based on the rejection of the at least one media item; and
selecting at least one second media item from the plurality of candidate media items based on the re-ranking.
72. The device of claim 61, wherein each media item in the user-specific corpus of media items includes metadata indicating an activity associated with the media item, and wherein the activity is associated with the media item based on a musical tempo of the media item.
73. The device of claim 61, wherein each media item in the user-specific corpus of media items includes metadata indicating a mood associated with the media item, and wherein the mood is associated with the media item based on a musical key of the media item.
74. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether the speech input defines an occasion associated with a time period; and
in response to determining that the speech input defines an occasion associated with a time period, obtaining the at least one media item based on the occasion, wherein the at least one media item includes metadata indicating the occasion.
75. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether the speech input defines a curated list associated with a media personality; and
in response to determining that the speech input defines a curated list associated with a media personality, obtaining the at least one media item based on the curated list associated with the media personality, wherein the at least one media item includes metadata indicating the curated list associated with the media personality.
76. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether the speech input defines a mood; and
in response to determining that the speech input defines a mood, obtaining the at least one media item based on the mood, wherein the at least one media item includes metadata indicating the mood.
77. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether the speech input defines an activity; and
in response to determining that the speech input defines an activity, obtaining the at least one media item based on the activity, wherein the at least one media item includes metadata indicating the activity.
78. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether the speech input defines a time period;
in response to determining that the speech input defines a time period, determining whether the speech input defines a genre associated with the time period; and
in response to determining that the speech input defines a genre associated with the time period, determining a sub-genre based on the time period and the genre, wherein the at least one media item is obtained based on the sub-genre, and wherein the at least one media item includes metadata indicating the sub-genre.
79. The device of claim 61, wherein the speech input defines a category of media items, wherein obtaining the at least one media item comprises obtaining a plurality of media items associated with the category of media items, and wherein the one or more programs further include instructions for:
determining a degree of familiarity of the user with the category of media items, wherein an average popularity rating of the plurality of media items is based on the degree of familiarity of the user with the category of media items.
80. The device of claim 61, wherein the one or more programs further include instructions for:
performing speaker identification using the speech input to determine an identity of the user; and
determining, based on the determined identity of the user, the user-specific corpus of media items from a plurality of user-specific corpora of media items.
81. The device of claim 61, wherein obtaining the at least one media item comprises transmitting an encrypted token to a remote server, the encrypted token including user identification information, and wherein the encrypted token is required to access the user-specific corpus of media items via the remote server.
82. The device of claim 61, wherein the one or more programs further include instructions for:
determining whether a domain corresponding to the speech input is one of a plurality of predetermined domains; and
in response to determining that the domain corresponding to the speech input is one of the plurality of predetermined domains:
while providing the at least one media item, receiving audio input;
determining whether the audio input includes speech; and
in response to determining that the audio input does not include speech, ceasing to receive audio input after a predetermined amount of time.
83. The device of claim 82, wherein the one or more programs further include instructions for:
in response to determining that the audio input includes speech:
determining whether the speech of the audio input corresponds to a same domain as the speech input; and
in response to determining that the speech of the audio input corresponds to the same domain as the speech input:
determining a user intent corresponding to the speech of the audio input; and
providing a response to the audio input in accordance with the user intent corresponding to the speech of the audio input.
84. The device of claim 83, wherein the one or more programs further include instructions for:
in response to determining that the speech of the audio input does not correspond to the same domain as the speech input:
forgoing providing a response to the audio input.
85. The device of claim 82, wherein the predetermined amount of time is based on a level of background noise detected in the audio input.
86. The device of claim 61, wherein providing the at least one media item comprises playing a media item of the at least one media item, and wherein the one or more programs further include instructions for:
while playing the media item, receiving a third speech input;
determining a user intent corresponding to the third speech input based on the media item being played and the third speech input; and
providing a response in accordance with the user intent corresponding to the third speech input.
87. The device of claim 61, wherein the one or more programs further include instructions for:
in response to determining that the speech input does not correspond to a user intent of obtaining personalized recommendations for media items:
determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date; and
in response to determining that the speech input corresponds to a user intent of obtaining media items having a recent release date:
obtaining at least one second media item from a second corpus of media items, wherein each media item in the second corpus of media items has a release date within a predetermined time range of a current date; and
providing the at least one second media item.
88. The device of claim 61, wherein determining whether the speech input corresponds to a user intent of obtaining media items having a recent release date comprises determining whether the speech input includes a phrase of a second plurality of phrases corresponding to the user intent of obtaining media items having a recent release date.
89. The device of claim 61, wherein the one or more programs further include instructions for:
determining a political orientation associated with the user based on previous media items requested or consumed by the user, wherein the at least one media item is obtained based on the determined political orientation.
90. The device of claim 61, wherein the one or more programs further include instructions for:
determining an occupation associated with the user based on previous media items requested or consumed by the user, wherein the at least one media item is obtained based on the determined occupation.
91. An apparatus comprising means for performing the method of any one of claims 1-30.
CN201710391293.4A 2016-06-08 2017-05-27 Intelligent automated assistant for media exploration Pending CN107480161A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201662347480P 2016-06-08 2016-06-08
US62/347,480 2016-06-08
US15/266,956 2016-09-15
US15/266,956 US10049663B2 (en) 2016-06-08 2016-09-15 Intelligent automated assistant for media exploration
DKPA201770338A DK179760B1 (en) 2016-06-08 2017-05-15 Intelligent automatiseret assistent til udforskning af medier
DKPA201770338 2017-05-15

Publications (1)

Publication Number Publication Date
CN107480161A true CN107480161A (en) 2017-12-15

Family

ID=60594078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710391293.4A Pending CN107480161A (en) 2016-06-08 2017-05-27 Intelligent automated assistant for media exploration

Country Status (1)

Country Link
CN (1) CN107480161A (en)

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309136A (en) * 2018-06-03 2020-06-19 苹果公司 Accelerated task execution
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
CN113168336A (en) * 2018-08-27 2021-07-23 谷歌有限责任公司 Client application of phone based on experiment parameter adaptation function
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11531993B2 (en) 2018-09-25 2022-12-20 Capital One Services, Llc Machine learning-driven servicing interface
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11966764B2 (en) 2021-12-16 2024-04-23 Google Llc Adapting client application of feature phone based on experiment parameters

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060112810A1 (en) * 2002-12-20 2006-06-01 Eves David A Ordering audio signals
CN101992779A (en) * 2009-08-12 2011-03-30 福特全球技术公司 Method of intelligent music selection in vehicle
US20150371663A1 (en) * 2014-06-19 2015-12-24 Mattersight Corporation Personality-based intelligent personal assistant system and methods
CN105247511A (en) * 2013-06-07 2016-01-13 苹果公司 Intelligent automated assistant
US20160071521A1 (en) * 2010-02-25 2016-03-10 Apple Inc. User profiling for voice input processing

Cited By (82)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11076039B2 (en) 2018-06-03 2021-07-27 Apple Inc. Accelerated task performance
CN111309136A (en) * 2018-06-03 2020-06-19 Apple Inc. Accelerated task performance
CN111309136B (en) * 2018-06-03 2021-10-26 Apple Inc. Accelerated task performance
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
CN113168336A (en) * 2018-08-27 2021-07-23 Google Llc Adapting client application of feature phone based on experiment parameters
US11531993B2 (en) 2018-09-25 2022-12-20 Capital One Services, Llc Machine learning-driven servicing interface
US11715111B2 (en) * 2018-09-25 2023-08-01 Capital One Services, Llc Machine learning-driven servicing interface
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11966764B2 (en) 2021-12-16 2024-04-23 Google Llc Adapting client application of feature phone based on experiment parameters

Similar Documents

Publication Publication Date Title
CN107480161A (en) Intelligent automated assistant for media exploration
CN109328381B (en) Detecting a trigger of a digital assistant
CN107491285B (en) Intelligent device arbitration and control
CN107978313B (en) Intelligent automated assistant
JP6671497B2 (en) Intelligent automated assistant for media exploration
US11069347B2 (en) Intelligent automated assistant for media exploration
CN107491929B (en) Data driven natural language event detection and classification
CN107430501B (en) Competing devices responding to voice triggers
CN110364148A (en) Natural assistant interaction
CN110457000A (en) Intelligent automated assistant for delivering content from user experiences
CN108733438A (en) Application integration with a digital assistant
CN110019752A (en) Multi-directional dialog
CN107195306A (en) Identification of voice inputs providing credentials
CN108874766A (en) Methods and systems for phonetic matching in digital assistant services
CN110021301A (en) Far-field extension for digital assistant services
CN108292203A (en) Proactive assistance based on dialog communication between devices
CN108604449A (en) Speaker identification
CN107491284A (en) Digital assistant providing automated status report
CN107257950A (en) Virtual assistant continuity
CN107615276A (en) Virtual assistant for media playback
CN107491468A (en) Application integration with a digital assistant
CN107608998A (en) Application integration with a digital assistant
CN107491469A (en) Intelligent task discovery
CN108352006A (en) Intelligent automated assistant in a messaging environment
CN107735833A (en) Automatic accent detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171215