EP3095113B1 - Digital personal assistant interaction with impersonations and rich multimedia in responses - Google Patents
- Publication number
- EP3095113B1 (application EP15702033.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- utterance
- personal assistant
- digital personal
- response
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
- G10L15/04—Segmentation; Word boundary detection
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- One technique for keeping the level of engagement high entails making the responses provided by the digital personal assistant funny and entertaining.
- Some conventional implementations of digital personal assistants are programmed to generate funny text responses when a user asks questions having a casual tone.
- Conventional digital personal assistants typically do not leverage the full flexibility of the digital canvas when presenting playful responses. Neither do they leverage the power of modern-day text-to-speech synthesis techniques to sound funny or different when providing responses.
- WO2013155619A1 relates to a method to provide a conversation agent to process natural language queries expressed by a user and perform commands according to the derived intention of the user.
- A natural language processing (NLP) engine derives intent using conditional random fields to identify a domain and at least one task embodied in the query. The NLP engine may further identify one or more subdomains, and one or more entities related to the identified command.
- A template system creates a data structure for information relevant to the derived intent and passes a template to a services manager for interfacing with one or more services capable of accomplishing the task.
- A dialogue manager may elicit more entities from the user if required by the services manager and otherwise engage in conversation with the user.
- The conversational agent allows a user to engage in multiple conversations simultaneously.
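The pipeline described in this reference (intent derivation, then elicitation of any missing entities before service dispatch) can be illustrated with a minimal sketch. All names below are invented for illustration, and a simple keyword lookup stands in for the conditional-random-field model the reference actually describes:

```python
# Minimal sketch of an intent-derivation pipeline (illustrative only).
# A real system would use a trained conditional random field; here a
# keyword lookup stands in for the statistical model.

from dataclasses import dataclass, field

@dataclass
class Intent:
    domain: str
    task: str
    entities: dict = field(default_factory=dict)

# Stand-in for the NLP engine: map cue words to (domain, task).
CUES = {
    "weather": ("weather", "get_forecast"),
    "call": ("telephony", "place_call"),
    "remind": ("reminders", "create_reminder"),
}

def derive_intent(query: str) -> Intent:
    """Identify the domain and task embodied in the query."""
    for token in query.lower().split():
        if token in CUES:
            domain, task = CUES[token]
            return Intent(domain, task)
    return Intent("chat", "converse")  # fall through to general conversation

# Entities each task requires; the dialogue manager elicits what is missing.
REQUIRED = {("telephony", "place_call"): ["contact_name"]}

def missing_entities(intent: Intent) -> list:
    need = REQUIRED.get((intent.domain, intent.task), [])
    return [e for e in need if e not in intent.entities]
```

For example, `derive_intent("Please call my dentist")` yields the telephony domain with a `place_call` task, and `missing_entities` reports that a contact name still has to be elicited from the user.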
- US6721706B1 relates to an interaction simulator, such as a chatterbot, to simulate an awareness of the user to generate an interaction that is more natural and appropriate than prior art chatterbots.
- The device may employ machine vision to detect the number of persons present or the activity of the user and respond accordingly by interrupting its output or by inviting a conversation or other interaction when a user approaches.
- The device may modify its responses according to the user's activity, for example, by playing music when the user falls asleep or requesting an introduction when another user speaks.
- The device may also respond to unrecognized changes in the situation by inquiring about what is going on to stimulate interaction or generate new responses.
- US2006155765A1 relates to a chat information system having a voice recognition device for recognizing voices, a voice synthesizer, a humanoid robot, a microphone for receiving the voices and a speaker for pronouncing synthesized voices.
- The system comprises a headline sensor capturing news from the Internet, a news database for storing the captured news, and a conversation database including at least a general conversation database storing a set of inquiries and responses.
- The system also includes a chat engine configured to extract one or more keywords from a user's speech that has been recognized by the voice recognition device, to search at least one of the news database and the conversation database with the extracted keywords, and to output via the speaker the contents that have been hit by the search.
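The keyword-driven chat flow described above might be sketched as follows; the database contents, stopword list, and function names are invented for illustration:

```python
# Illustrative sketch of a keyword-driven chat engine: extract keywords
# from recognized speech, search a news database first, then fall back
# to a general conversation database. All data here is invented.

NEWS_DB = {
    "election": "Here is the latest election headline ...",
}
CONVERSATION_DB = {
    "hello": "Hello! How are you today?",
    "weather": "It looks sunny outside.",
}
STOPWORDS = {"the", "a", "about", "me", "tell"}

def extract_keywords(recognized_speech: str) -> list:
    """Keep only content-bearing words from the recognized utterance."""
    return [w for w in recognized_speech.lower().split() if w not in STOPWORDS]

def chat_response(recognized_speech: str) -> str:
    # Search the news database first, then general conversation.
    for keyword in extract_keywords(recognized_speech):
        if keyword in NEWS_DB:
            return NEWS_DB[keyword]
        if keyword in CONVERSATION_DB:
            return CONVERSATION_DB[keyword]
    return "Sorry, I have nothing on that yet."
```

In this sketch, "Tell me about the election" hits the news database, while "hello there" falls through to the general conversation database.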
- The digital personal assistant is capable of determining that a user has asked a question or made a statement that is intended to engage with a persona of the digital personal assistant as opposed to, for example, requesting that the digital personal assistant obtain information or perform some other task on behalf of the user.
- In response to such a determination, the digital personal assistant provides a response thereto by displaying or playing back a multimedia object associated with a popular culture reference within or by a user interface of the digital personal assistant.
- In further response to determining that the user has asked such a question or made such a statement, the digital personal assistant may provide the response thereto by generating or playing back speech that comprises an impersonation of a voice of a person associated with the popular culture reference. Still further, the digital personal assistant may provide the response by displaying within its user interface text that comprises a quotation associated with the popular culture reference, displaying within its user interface a visual representation of the digital personal assistant that evokes the popular culture reference, and/or displaying within its user interface a link that can be activated by the user to access content associated with the popular culture reference.
- References in the specification to "one embodiment," "an embodiment," "an example embodiment," or the like, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of persons skilled in the relevant art(s) to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- Embodiments described herein can advantageously increase the level of engagement between the user and the digital personal assistant and also establish an element of trust between the user and the assistant, thereby facilitating continued use of and interaction with the digital personal assistant. For example, by providing responses that include multimedia objects, voice impersonations, quotations, and links associated with popular culture references likely to be recognized and/or appreciated by the user, the digital personal assistant can both entertain and establish a sense of commonality with the user.
- Section II describes an example system that implements a digital personal assistant that utilizes impersonations and/or multimedia in responding to chit-chat type utterances in accordance with embodiments.
- Section III describes exemplary methods for implementing a digital personal assistant that utilizes impersonations and/or multimedia in responding to chit-chat type utterances in accordance with embodiments.
- Section IV describes an example mobile device that may be used to implement a digital personal assistant in accordance with embodiments described herein.
- Section V describes an example desktop computer that may be used to implement a digital personal assistant in accordance with embodiments described herein.
- Section VI provides some concluding remarks.
- FIG. 1 is a block diagram of an example system 100 that implements a digital personal assistant that utilizes impersonations and multimedia in responding to chit-chat type utterances in accordance with an example embodiment.
- System 100 includes an end user computing device 102 that is communicatively connected to a digital personal assistant backend 106 via one or more networks 104. Each of these components will now be described.
- End user computing device 102 is intended to represent a processor-based electronic device that is capable of executing a software-based digital personal assistant 130 that is installed thereon.
- Digital personal assistant 130 may be executed on behalf of a user of end user computing device 102.
- In one embodiment, end user computing device 102 comprises a mobile computing device such as a mobile phone (e.g., a smart phone), a laptop computer, a tablet computer, a netbook, a wearable computer such as a smart watch or a head-mounted computer, a portable media player, a handheld gaming console, a personal navigation assistant, a camera, or any other mobile device capable of executing a digital personal assistant on behalf of a user.
- In another embodiment, end user computing device 102 comprises a desktop computer, a gaming console, or other non-mobile computing platform that is capable of executing a digital personal assistant on behalf of a user.
- An example desktop computer that may incorporate the functionality of end user computing device 102 will be discussed below in reference to FIG. 15 .
- End user computing device 102 is capable of communicating with digital personal assistant backend 106 via network 104.
- Personal assistant backend 106 comprises one or more computers (e.g., servers) that are programmed to provide services in support of the operations of digital personal assistant 130 and other digital personal assistants executing on other end-user computing devices.
- Digital personal assistant backend 106 includes one or more computers configured to provide services to digital personal assistant 130 relating to speech recognition and query understanding and response. In particular, as shown in FIG. 1, these services are respectively provided by a speech recognition service 132 and a query understanding and response system 136.
- Digital personal assistant backend 106 may perform any number of other services on behalf of digital personal assistant 130, although such additional services may not be explicitly described herein.
- Digital personal assistant backend 106 may comprise a cloud-based backend in which any one of a large number of suitably-configured machines may be arbitrarily selected to render one or more desired services in support of digital personal assistant 130.
- Such a cloud-based implementation provides a reliable and scalable framework for providing backend services to digital personal assistants, such as digital personal assistant 130.
- Network(s) 104 is intended to represent any type of network or combination of networks suitable for facilitating communication between end user computing devices, such as end user computing device 102, and digital personal assistant backend 106.
- Network(s) 104 may include, for example and without limitation, a wide area network, a local area network, a private network, a public network, a packet network, a circuit-switched network, a wired network, and/or a wireless network.
- End user computing device 102 includes a plurality of interconnected components, including a processing unit 110, non-volatile memory 120, volatile memory 112, one or more user input devices 116, a display 118, and one or more network interfaces 114. Each of these components will now be described.
- Processing unit 110 is intended to represent one or more microprocessors, each of which may have one or more central processing units (CPUs) or microprocessor cores. Processing unit 110 operates in a well-known manner to execute computer programs (also referred to herein as computer program logic). The execution of such computer programs causes processing unit 110 to perform operations including operations that will be described herein.
- Each of non-volatile memory 120, volatile memory 112, user input device(s) 116, display 118, and network interface(s) 114 is connected to processing unit 110 via one or more suitable interfaces.
- Non-volatile memory 120 comprises one or more computer-readable memory devices that operate to store computer programs and data in a persistent manner, such that stored information will not be lost even when end user computing device 102 is without power or in a powered down state.
- Non-volatile memory 120 may be implemented using any of a wide variety of non-volatile computer-readable memory devices, including but not limited to, read-only memory (ROM) devices, solid state drives, hard disk drives, magnetic storage media such as magnetic disks and associated drives, optical storage media such as optical disks and associated drives, and flash memory devices such as USB flash drives.
- Volatile memory 112 comprises one or more computer-readable memory devices that operate to store computer programs and data in a non-persistent manner, such that the stored information will be lost when end user computing device 102 is without power or in a powered down state. Volatile memory 112 may be implemented using any of a wide variety of volatile computer-readable memory devices including, but not limited to, random access memory (RAM) devices.
- Display 118 comprises a device to which content, such as text and images, can be rendered so that it will be visible to a user of end user computing device 102. Some or all of the rendering operations required to display such content may be performed at least in part by processing unit 110. Some or all of the rendering operations may also be performed by a display device interface such as a video or graphics chip or card (not shown in FIG. 1) that is coupled between processing unit 110 and display 118.
- Display 118 may comprise a device that is integrated within the same physical structure or housing as processing unit 110, or may comprise a monitor, projector, or other type of device that is physically separate from a structure or housing that includes processing unit 110 and connected thereto via a suitable wired and/or wireless connection.
- User input device(s) 116 comprise one or more devices that operate to generate user input information in response to a user's manipulation or control thereof. Such user input information is passed via a suitable interface to processing unit 110 for processing thereof.
- User input device(s) 116 may include a touch screen (e.g., a touch screen integrated with display 118), a keyboard, a keypad, a mouse, a touch pad, a trackball, a joystick, a pointing stick, a wired glove, a motion tracking sensor, a game controller or gamepad, or a video capture device such as a camera.
- Each user input device 116 may be integrated within the same physical structure or housing as processing unit 110 (such as an integrated touch screen, touch pad, or keyboard on a mobile device) or physically separate from a physical structure or housing that includes processing unit 110 and connected thereto via a suitable wired and/or wireless connection.
- Network interface(s) 114 comprise one or more interfaces that enable end user computing device 102 to communicate over one or more networks 104.
- Network interface(s) 114 may comprise a wired network interface such as an Ethernet interface, or a wireless network interface such as an IEEE 802.11 ("Wi-Fi") interface or a 3G telecommunication interface.
- Non-volatile memory 120 stores a number of software components including a plurality of applications 122 and an operating system 124.
- Each application in the plurality of applications 122 comprises a computer program that a user of end user computing device 102 may cause to be executed by processing unit 110.
- The execution of each application causes certain operations to be performed on behalf of the user, wherein the type of operations performed will vary depending upon how the application is programmed.
- Applications 122 may include, for example and without limitation, a telephony application, an e-mail application, a messaging application, a Web browsing application, a calendar application, a utility application, a game application, a social networking application, a music application, a productivity application, a lifestyle application, a reference application, a travel application, a sports application, a navigation application, a healthcare and fitness application, a news application, a photography application, a finance application, a business application, an education application, a weather application, a books application, a medical application, or the like. As shown in FIG. 1 , applications 122 include a digital personal assistant 130, the functions of which will be described in more detail herein.
- Applications 122 may be distributed to and/or installed on end user computing device 102 in a variety of ways, depending upon the implementation. For example, in one embodiment, at least one application is downloaded from an application store and installed on end user computing device 102. In another embodiment in which end user device 102 is utilized as part of or in conjunction with an enterprise network, at least one application is distributed to end user computing device 102 by a system administrator using any of a variety of enterprise network management tools and then installed thereon. In yet another embodiment, at least one application is installed on end user computing device 102 by a system builder, such as by an original equipment manufacturer (OEM) or embedded device manufacturer, using any of a variety of suitable system builder utilities. In a further embodiment, an operating system manufacturer may include an application along with operating system 124 that is installed on end user computing device 102.
- Operating system 124 comprises a set of programs that manage resources and provide common services for applications that are executed on end user computing device 102, such as applications 122.
- Operating system 124 comprises an operating system (OS) user interface 132.
- OS user interface 132 comprises a component of operating system 124 that generates a user interface by which a user can interact with operating system 124 for various purposes, such as but not limited to finding and launching applications, invoking certain operating system functionality, and setting certain operating system settings.
- OS user interface 132 comprises a touch-screen based graphical user interface (GUI), although this is only an example.
- In accordance with such an implementation, each application 122 installed on end user computing device 102 may be represented as an icon or tile within the GUI and invoked by a user through touch-screen interaction with the appropriate icon or tile.
- However, any of a wide variety of alternative user interface models may be used by OS user interface 132.
- Although applications 122 and operating system 124 are shown as being stored in non-volatile memory 120, it is to be understood that during operation of end user computing device 102, applications 122, operating system 124, or portions thereof, may be loaded to volatile memory 112 and executed therefrom as processes by processing unit 110.
- Digital personal assistant 130 comprises a computer program that is configured to perform tasks, or services, for a user of end user computing device 102 based on user input as well as features such as location awareness and the ability to access information from a variety of sources including online sources (such as weather or traffic conditions, news, stock prices, user schedules, retail prices, etc.).
- Examples of tasks that may be performed by digital personal assistant 130 on behalf of the user may include, but are not limited to, placing a phone call to a user-specified person, launching a user-specified application, sending a user-specified e-mail or text message to a user-specified recipient, playing user-specified music, scheduling a meeting or other event on a user calendar, obtaining directions to a user-specified location, obtaining a score associated with a user-specified sporting event, posting user-specified content to a social media web site or microblogging service, recording user-specified reminders or notes, obtaining a weather report, obtaining the current time, setting an alarm at a user-specified time, obtaining a stock price for a user-specified company, finding a nearby commercial establishment, performing an Internet search, or the like.
- Digital personal assistant 130 may use any of a variety of artificial intelligence techniques to improve its performance over time through continued interaction with the user.
- Digital personal assistant 130 may also be referred to as an intelligent personal assistant.
- Digital personal assistant 130 is configured to provide a user interface by which a user can submit questions, commands, or other verbal input and by which responses to such input may be delivered to the user.
- The input may comprise user speech that is captured by one or more microphones of end user computing device 102 (each of which may comprise one of user input devices 116), although this example is not intended to be limiting and user input may be provided in other ways as well.
- The responses generated by digital personal assistant 130 may be made visible to the user in the form of text, images, or other visual content shown on display 118 within a graphical user interface of digital personal assistant 130.
- The responses may also comprise computer-generated speech or other audio content that is played back via one or more speakers of end user computing device 102 (not shown in FIG. 1).
- Digital personal assistant 130 is capable of determining that a user has asked a question or made a statement that is intended to engage with a persona of digital personal assistant 130 as opposed to, for example, requesting that the digital personal assistant obtain information or perform some other task on behalf of the user.
- Such questions or statements are often casual or playful in nature and may include, for example, "Will you marry me?", "What is your favorite color?", "Sing me a song", "Tell me a joke", "Knock knock", "How much wood could a woodchuck chuck if a woodchuck could chuck wood?", "Who makes the best phone?", "Where can I hide a body?", "What do you look like?", "You are beautiful", "How old are you?", "Who's your daddy?", "Do you have a boyfriend?", "What is the meaning of life?", "I'd like to get to know you better", or the like.
- Questions and statements such as these are referred to herein as chit-chat type utterances, or simply "chit-chat".
- Digital personal assistant 130 is further configured to take certain actions in response to determining that the user has made a chit-chat type utterance. For example, in response to determining that the user has made a chit-chat type utterance, digital personal assistant 130 may provide a response thereto by displaying a multimedia object associated with a popular culture reference within its user interface (when the multimedia object is visual in nature) or playing back by its user interface such a multimedia object (when the multimedia object is auditory in nature).
- As used herein, a popular culture reference is intended to broadly encompass a reference to any subject matter associated with the customs, arts, and/or social interactions of a large portion of a population.
- For example, a popular culture reference may include a reference to a well-known movie, television show, novel, short story, painting, video game, image, video, cartoon, celebrity, actor or actress, politician or other public figure, stereotype, meme, current event, historical event, or the like.
- digital personal assistant 130 may provide the response thereto by generating or playing back speech that comprises an impersonation of a voice of a person associated with the popular culture reference. Still further, digital personal assistant 130 may be configured to provide the response by displaying within its user interface text that comprises a quotation associated with the popular culture reference, displaying within its user interface a visual representation of the digital personal assistant that evokes the popular culture reference, and/or displaying within its user interface a link that can be activated by the user to access content associated with the popular culture reference.
- block diagram 200 shows how various components of system 100 operate together to enable digital personal assistant 130 to determine that a user has made a chit-chat type utterance and to provide a response thereto.
- the process begins after digital personal assistant 130 has been launched on end user computing device 102.
- a user speaks into one or more microphones of end user computing device 102.
- the user's utterance is captured by the microphone(s) and converted from analog to digital form in a well-known manner.
- Digital personal assistant 130 causes the digital representation of the utterance to be transmitted as an audio stream to speech recognition service 132 (which is part of digital personal assistant backend 106) via network(s) 104.
- digital personal assistant 130 periodically causes a digital representation of a portion of the user's utterance to be packetized and transmitted to speech recognition service 132 via network(s) 104.
- Speech recognition service 132 operates to receive the audio stream transmitted thereto by digital personal assistant 130 and to analyze the audio stream to determine the phonetic content thereof. Once speech recognition service 132 has determined the phonetic content of the audio stream, it then maps the phonetic content to one or more words, which taken together comprise a recognized utterance. Speech recognition service 132 then passes the recognized utterance to query understanding and response system 136.
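- The client-side streaming and server-side recognition described above can be sketched as follows. This is a minimal illustrative model only: the chunk size, the function names, and the mock recognition service are assumptions, not part of the patent's disclosure, and a real service would derive phonetic content from the audio rather than return a canned result.

```python
def packetize(pcm_samples, chunk_size=3200):
    """Split a captured utterance (raw PCM bytes) into fixed-size packets
    for periodic transmission to a speech recognition service."""
    return [pcm_samples[i:i + chunk_size]
            for i in range(0, len(pcm_samples), chunk_size)]

class MockSpeechRecognitionService:
    """Stands in for speech recognition service 132: accumulates the
    audio stream, then maps it to a recognized utterance."""
    def __init__(self):
        self.buffer = b""

    def receive(self, packet):
        # Accumulate the streamed audio packets.
        self.buffer += packet

    def recognize(self):
        # A real service would determine the phonetic content of the
        # buffer and map it to words; here we return a canned result.
        return "tell me a joke"

service = MockSpeechRecognitionService()
for packet in packetize(b"\x00" * 9600):
    service.receive(packet)
print(service.recognize())  # "tell me a joke"
```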
- speech recognition service 132 transmits the recognized utterance back to digital personal assistant 130 via network(s) 104.
- Digital personal assistant 130 displays a text version of the recognized utterance within its graphical user interface (visible via display 118) so that the user can view the recognized utterance and determine whether or not the recognized utterance accurately represents what he/she said.
- Digital personal assistant 130 further provides a means by which the user can edit the recognized utterance if he/she determines that the recognized utterance does not accurately represent what he/she said and transmit the edited version of the utterance to query understanding and response system 136 for further processing thereof.
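- The confirm-or-edit step can be sketched as a simple function: show the recognized utterance and forward either it or the user's corrected version for further processing. The edit interface here (a callback returning an empty string when no edit is made) is an illustrative assumption.

```python
def confirm_or_edit(recognized, get_user_edit):
    """Return the recognized utterance, or the user's edited version if
    the user determines the recognition was inaccurate."""
    edited = get_user_edit(recognized)
    return edited if edited else recognized

# Usage: the user corrects a misrecognition before it is transmitted.
print(confirm_or_edit("wreck a nice beach",
                      get_user_edit=lambda text: "recognize speech"))
```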
- Query understanding and response system 136 receives the recognized or corrected utterance and analyzes the words thereof to determine how such utterance should be handled thereby. For example, query understanding and response system 136 may determine that the recognized or corrected utterance comprises an invocation of a particular task within a predefined set of tasks. For example and without any limitation whatsoever, the task may comprise placing a phone call to a user-specified person (e.g., "call Brian"), sending a user-specified e-mail or text message to a user-specified recipient (e.g., "text Carol that I am running late"), or creating a reminder (e.g., "remind me to check the oven in an hour.”). If query understanding and response system 136 determines that the recognized or corrected utterance comprises an invocation of a particular task within the predefined set, then it will cause specialized logic (e.g., specialized logic within end user computing device 102) to perform the task.
- query understanding and response system 136 further analyzes the words of the utterance to determine how such utterance should be handled thereby. For example, query understanding and response system 136 may determine that the utterance should be handled by conducting a Web search or by offering the user an opportunity to conduct a Web search. In this case, the utterance may be handled by specialized logic for facilitating Web searching that is internal and/or external to query understanding and response system 136.
- query understanding and response system 136 may determine, based on an analysis of the words of the utterance, that the utterance comprises a chit-chat type utterance, which as noted above is an utterance intended to engage with a persona of digital personal assistant 130.
- query understanding and response system 136 may determine that the utterance comprises a chit-chat type utterance based upon an analysis of factors other than or in addition to an analysis of the words of the utterance. For example, query understanding and response system 136 may determine that the utterance comprises a chit-chat type utterance based in part upon an analysis of an intonation of the utterance, upon contextual clues obtained from a conversation history of the user, or upon any other factors that may be deemed helpful in determining that the utterance comprises a chit-chat type utterance.
- query understanding and response system 136 determines that the utterance comprises a chit-chat type utterance, then the utterance will be handled by a query understanding and response system for chit-chat 138, which is a part of query understanding and response system 136.
- Query understanding and response system for chit-chat 138 is configured to determine the subject matter of the chit-chat type utterance and then, based on the determined subject matter, take steps to cause an appropriate response to the chit-chat type utterance to be output by digital personal assistant 130. As shown in FIG. 2 , this involves sending all or part of a response from query understanding and response system for chit-chat 138 to digital personal assistant 130 via network(s) 104.
- the composition of the response and the manner in which it is conveyed to and/or generated by digital personal assistant 130 will be discussed in more detail below.
- the query understanding and response system for chit-chat 138 determines the subject matter of the chit-chat type utterance and then identifies a plurality of eligible responses that are suitable for responding to the utterance. Query understanding and response system for chit-chat 138 then selects one of the plurality of eligible responses as the response to be provided by digital personal assistant 130. Such selection may be performed at random, in a certain sequence, or by using some other selection methodology.
- query understanding and response system for chit-chat 138 can ensure that digital personal assistant 130 will not provide the same response to the same utterance in every instance, thereby providing some variety and unpredictability to the user's interaction with digital personal assistant 130.
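- The selection from a plurality of eligible responses can be sketched as follows, covering both the random and the sequential methodologies mentioned above. The example responses and function names are invented for illustration.

```python
import random

ELIGIBLE = ["Ask me again tomorrow.",
            "That's a secret.",
            "I could tell you, but I'd rather not."]

def select_random(responses):
    """Pick one eligible response at random."""
    return random.choice(responses)

def make_sequential_selector(responses):
    """Return a selector that cycles through the responses in order, so
    the same utterance does not yield the same response every time."""
    state = {"i": 0}
    def select():
        response = responses[state["i"] % len(responses)]
        state["i"] += 1
        return response
    return select

pick = make_sequential_selector(ELIGIBLE)
print(pick(), pick(), pick(), pick())  # cycles back to the first response
```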
- query understanding and response system for chit-chat 138 operates to match the chit-chat type utterance to a particular utterance type within a hierarchical tree of utterance types having one or more responses associated therewith. Query understanding and response system for chit-chat 138 then selects the response to the chit-chat type utterance from among the response(s) associated therewith.
- FIG. 3 depicts an example hierarchical tree 300 of utterance types that may be used to select a response to a chit-chat type utterance in accordance with an embodiment.
- the root node of hierarchical tree 300 is the general chit-chat utterance type. Every utterance type beneath this root node comprises a chit-chat type utterance.
- At one level below this root node are chit-chat type utterances that are assertions ("Assertion"), commands ("Command”), flirtatious in nature (“Flirt”), requesting information about digital personal assistant 130 ("Sys-info”), or requesting an opinion from digital personal assistant 130 (“Sys-opinion”).
- Beneath each of these nodes are further categories and sub-categories of chit-chat utterance types.
- utterance types generally go from being broader at the top of hierarchical tree 300 to narrower at the bottom of hierarchical tree 300.
- query understanding and response system for chit-chat 138 traverses hierarchical tree 300 and matches the utterance to one of the nodes. For example, query understanding and response system for chit-chat 138 may generate a confidence score that a certain chit-chat type utterance should be matched to "Assertion,” "Command,” “Flirt,” “Sys-info” and “Sys-opinion.” Query understanding and response system for chit-chat 138 then selects the node for which the highest confidence score has been obtained (assuming that some minimum confidence score has been obtained for at least one of the nodes).
- query understanding and response system for chit-chat 138 will traverse hierarchical tree 300 to the node "Sys-Opinion" and generate a confidence score that the chit-chat type utterance should be matched to each of the child nodes of "Sys-opinion"-namely, "Microsoft,” “Trending” and "Advice.”
- Query understanding and response system for chit-chat 138 selects the child node for which the highest confidence score has been achieved (again, assuming some minimum confidence score has been obtained for at least one of the child nodes). If the confidence score for each of the child nodes is less than some predefined minimum confidence score, then the traversal of hierarchical tree 300 stops at the node "Sys-opinion."
- query understanding and response system for chit-chat 138 may select a response from among the one or more responses associated with the matching node.
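- The greedy traversal described above can be sketched as follows. Node names follow FIG. 3, but the keyword-overlap scorer, the 0.5 threshold, and the responses are stand-in assumptions; a real system would generate confidence scores with a trained classifier.

```python
class UtteranceType:
    """A node in the hierarchical tree: an utterance type with zero or
    more associated responses and zero or more narrower child types."""
    def __init__(self, name, responses=None, children=None):
        self.name = name
        self.responses = responses or []
        self.children = children or []

MIN_CONFIDENCE = 0.5  # assumed minimum confidence score

def classify(node, utterance, score):
    """Descend while some child clears the minimum confidence score;
    otherwise stop and answer from the current (broader) node."""
    while node.children:
        best = max(node.children, key=lambda c: score(utterance, c))
        if score(utterance, best) < MIN_CONFIDENCE:
            break
        node = best
    return node

tree = UtteranceType("Chit-chat", children=[
    UtteranceType("Sys-opinion",
                  responses=["I have opinions on many things."],
                  children=[UtteranceType(
                      "Microsoft",
                      responses=["I think Microsoft is great!"])]),
    UtteranceType("Flirt", responses=["Why, thank you."]),
])

def toy_score(utterance, node):
    # Stand-in confidence scorer: keyword match against the node.
    keywords = {"Sys-opinion": ["think", "opinion"],
                "Microsoft": ["microsoft"],
                "Flirt": ["beautiful"]}
    text = utterance.lower()
    return 1.0 if any(k in text for k in keywords.get(node.name, [])) else 0.0

node = classify(tree, "What do you think about Microsoft?", toy_score)
print(node.name, "->", node.responses[0])
```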
- the foregoing approach to identifying suitable responses to chit-chat type utterances is advantageous in that it allows responses to be defined for both broad groups of chit-chat type utterances as well as more narrow groups within the broader groups.
- very specific responses to chit-chat type utterances can be crafted (e.g., "I think Microsoft is great!”), since the system has a high level of confidence that the user is asking for the opinion of digital personal assistant 130 about Microsoft.
- the types of utterances that may be included in hierarchical tree 300 may be determined through human examination of logs of user utterances and labeling of each utterance with an appropriate utterance type.
- a crowd sourcing platform such as the Universal Human Relevance System (UHRS), developed by Microsoft Corporation of Redmond, Washington, may be used to obtain human examination and labeling of thousands of user utterances. This crowd sourcing information can then be used to generate hierarchical tree 300.
- Still other methods for generating a hierarchical tree of utterance types such as hierarchical tree 300 may be used.
- query understanding and response system for chit-chat 138 is configured to maintain one or more responses associated with each of one or more trending topics.
- trending topics are topics that are becoming popular or have recently become popular with users and may be identified automatically (e.g., by automatically monitoring utterances submitted to digital personal assistants, search engine queries, microblogs such as TWITTER, social networking sites such as FACEBOOK, news publications, or other sources) or manually (e.g., through human observation of any or all of these sources).
- query understanding and response system for chit-chat 138 may select the response to the chit-chat type utterance from among the one or more responses associated with the particular trending topic.
- the trending topics may be represented within a hierarchal tree of utterance types that is used by query understanding and response system for chit-chat 138 to select a suitable response to a chit-chat type utterance.
- one of the nodes under "Sys-Opinion" is "Trending.”
- This node can be used to store responses to chit-chat type utterances that are soliciting an opinion of digital personal assistant 130 in regard to one or more trending topics.
- the "Trending" node may have multiple child nodes associated therewith, wherein each child node is associated with a particular trending topic and has one or more responses associated therewith.
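- The "Trending" node and its topic-specific children can be sketched as a simple topic-to-responses mapping. The topics and responses below are invented examples; in practice they would be identified from the monitored sources described above and refreshed as topics trend.

```python
# Assumed trending topics, each with one or more associated responses.
TRENDING = {
    "world cup": ["I'm rooting for whoever has the best uniforms."],
    "new phone launch": ["I hear it's all anyone can talk about."],
}

def trending_response(utterance):
    """Return a response for the first trending topic the utterance
    mentions, or None if it matches no trending topic."""
    text = utterance.lower()
    for topic, responses in TRENDING.items():
        if topic in text:
            return responses[0]
    return None

print(trending_response("What do you think about the World Cup?"))
```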
- query understanding and response system for chit-chat 138 is configured to maintain one or more responses to certain chit-chat type utterances that are intended to convey the persona of digital personal assistant 130. For example, there may be an interest in ensuring that digital personal assistant 130 has something to say about a particular word, phrase, or topic that is associated with its persona. In this case, an editorial team may generate predefined responses to certain chit-chat type utterances to ensure that digital personal assistant 130 provides characteristic responses whenever such a topic is discussed.
- query understanding and response system for chit-chat 138 determines that a chit-chat type utterance is an utterance for which there are one or more predefined responses intended to convey the persona of digital personal assistant 130
- query understanding and response system for chit-chat 138 will select the response to the chit-chat type utterance from among the one or more predefined responses.
- FIG. 4 is a block diagram that shows an example response 400 that may be provided by digital personal assistant 130 in response to a chit-chat type utterance in accordance with an embodiment.
- response 400 includes a number of components, including a display string 402, speech content 404, a speech impersonation component 406, a speech emotion component 408, a digital personal assistant animation 410, a multimedia component 412, and a link to content 414.
- each of the components within response 400 may be stored and/or generated by digital personal assistant backend 106 and transmitted to digital personal assistant 130 by query understanding and response system for chit-chat 138 at the time the response is to be provided to a user.
- one, more than one, or all of the components of response 400 may be stored on and/or generated by end user computing device 102 (e.g., in non-volatile memory 120) and query understanding and response system for chit-chat 138 may send digital personal assistant 130 information sufficient to identify or obtain the component(s) at the time the response is to be provided to a user, so that digital personal assistant 130 can obtain the component(s) locally.
- while response 400 includes seven different components, it is to be understood that a response to a chit-chat type utterance may include fewer than all of the components shown in FIG. 4 .
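- The structure of response 400 can be sketched as a data model in which every component is optional. The field names mirror components 402-414 of FIG. 4; the field types and the example values are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChitChatResponse:
    display_string: Optional[str] = None        # 402: text shown in the UI
    speech_content: Optional[str] = None        # 404: speech to generate or play back
    speech_impersonation: Optional[str] = None  # 406: voice to impersonate
    speech_emotion: Optional[str] = None        # 408: emotional element for TTS
    assistant_animation: Optional[str] = None   # 410: avatar animation to display
    multimedia: Optional[str] = None            # 412: image/video/audio object
    content_link: Optional[str] = None          # 414: activatable link

# A response need not include all seven components; for example, a
# response like the one in FIG. 5 might carry only these four:
star_wars = ChitChatResponse(
    display_string="May the force be with you",
    speech_content="May the force be with you",
    speech_impersonation="yoda",
    assistant_animation="light_saber_swing")
print(star_wars.multimedia is None)  # True: no multimedia object here
```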
- Display string 402 comprises text that is to be displayed within the user interface of digital personal assistant 130.
- the text may comprise a verbal response to the chit-chat type utterance of the user.
- display string 402 may comprise a quotation that is associated with a popular culture reference.
- Speech content 404 comprises speech that is to be generated or played back by the user interface of digital personal assistant 130.
- Digital personal assistant 130 may generate such speech by applying text-to-speech conversion to text that comprises part of speech content 404.
- digital personal assistant 130 may generate such speech by playing back an audio file that is included within or identified by speech content 404.
- speech content 404 comprises an audible version of the content included in display string 402, although this need not be the case.
- speech content 404 may comprise verbal information that is entirely different than verbal information included in display string 402.
- the content of speech content 404 may comprise a quotation that is associated with a popular culture reference.
- Speech impersonation component 406 is a component that indicates that digital personal assistant 130 should generate or play back speech content 404 in a manner that impersonates a voice of a person, such as a person associated with a popular culture reference. Speech impersonation component 406 may include or identify an audio file that should be played back by digital personal assistant 130 to perform the impersonation. Alternatively, speech impersonation component 406 may indicate that a special text-to-speech converter should be used by digital personal assistant 130 to generate speech content 404 in a manner that impersonates the voice of the desired person.
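- The two impersonation paths described above can be sketched as a single dispatch: play back an included audio file when one is present, otherwise route the speech content through the indicated special text-to-speech voice. The component shape, function names, and the stand-in player/synthesizer callbacks are assumptions.

```python
def render_impersonated_speech(speech_text, impersonation,
                               play_audio, synthesize):
    """If the impersonation component carries an audio file, play it;
    otherwise synthesize the speech text with the special TTS voice."""
    if impersonation.get("audio_file"):
        return play_audio(impersonation["audio_file"])
    return synthesize(speech_text, voice=impersonation["tts_voice"])

# Usage with stand-in backends that just log what they were asked to do:
log = []
render_impersonated_speech(
    "May the force be with you",
    {"audio_file": None, "tts_voice": "yoda"},
    play_audio=lambda f: log.append(("play", f)),
    synthesize=lambda text, voice: log.append(("tts", voice, text)))
print(log)  # [('tts', 'yoda', 'May the force be with you')]
```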
- Speech emotion component 408 comprises an emotional element that should be applied to speech content 404 when text-to-speech conversion is applied to such content to generate speech.
- Digital personal assistant animation 410 comprises an animation of an avatar that represents digital personal assistant 130 that is to be displayed within its user interface.
- the animation may be designed such that it evokes a popular culture reference.
- response 400 refers to digital personal assistant animation 410, it is to be appreciated that types of visual representations of the avatar other than animations may be used to evoke the popular culture reference, including static images or the like.
- Multimedia component 412 comprises one or more multimedia objects that are to be displayed within or played back by the user interface of digital personal assistant 130.
- Each multimedia object may be associated with a popular culture reference.
- each multimedia object may comprise, for example, an image to be displayed within the user interface of digital personal assistant 130, video content to be displayed within the user interface of digital personal assistant 130, or audio content to be played back by the user interface of digital personal assistant 130.
- Link to content 414 comprises a link that may be displayed within the user interface of digital personal assistant 130 and that can be activated by the user to access other content.
- the link can be activated by the user to access content associated with a popular culture reference.
- FIGS. 5-10 provide several examples of responses to chit-chat type utterances that may be delivered via the user interface of digital personal assistant 130. These examples help illustrate the various components that may be included in a response to a chit-chat type utterance in accordance with embodiments.
- end user computing device 102 is a smart phone and display 118 is an integrated display of the smart phone.
- end user computing device 102 is not limited to smart phones and may be any of a wide variety of mobile and non-mobile computing devices.
- FIG. 5 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "I am nervous about the stats test.”
- a display string 502 comprising the words "May the force be with you” is displayed within the graphical user interface of digital personal assistant 130.
- This display string text comprises a well-known quotation from the popular "Star Wars" movies.
- Visual representation 504 includes a light saber, and thus also evokes the "Star Wars" movies.
- visual representation 504 may comprise part of an animation of the avatar of digital personal assistant 130 that swings the light saber about, perhaps accompanied by audible light saber sounds that are played back via one or more speakers of end user computing device 102.
- the response provided in FIG. 5 also includes audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "May the force be with you.”
- the speech comprises an impersonation of a famous "Star Wars” character such as Yoda or Obi-Wan Kenobi.
- such speech may be rendered by playing back a designated audio file or by applying a special text-to-speech conversion process to the text "May the force be with you.”
- the impersonation may be rendered such that it is apparent that a person other than the "Star Wars" character is performing the impersonation (e.g., a default voice associated with digital personal assistant 130 is performing the impersonation).
- the impersonation may produce a voice that is indistinguishable from that of the "Star Wars" character, or may in fact be the voice of the actor that played the "Star Wars” character.
- display string 502, visual representation 504, and the speech delivered with an impersonation not only respond appropriately to the user's chit-chat type utterance by offering words of encouragement but also serve to strongly evoke a popular culture reference ("Star Wars") that will likely be instantly familiar to the user and help establish a sense of commonality therewith. Furthermore, since the response includes diverse forms of sensory output including the light saber animation and the impersonated voice, the response is more likely to engage and entertain the user than a flat text response.
- FIG. 6 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "How do I rob a bank?"
- a display string 602 comprising the words "It didn't end well for these guys” is displayed within the graphical user interface of digital personal assistant 130.
- the response may also include audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "It didn't end well for these guys.”
- the text of display string 602 is referring to a multimedia object 604 that is also displayed within the graphical user interface of digital personal assistant 130.
- multimedia object 604 comprises an image of the movie poster for the 1967 movie "Bonnie & Clyde,” which is a drama concerning the life and death of well-known bank robbers Bonnie Parker and Clyde Barrow.
- display string 602 and the corresponding audible speech, together with multimedia object 604, comprise a response to the chit-chat type utterance "How do I rob a bank?" that both responds appropriately by pointing out the perils of robbing a bank (Bonnie and Clyde were shot to death by police officers) and evokes a popular culture reference (Bonnie and Clyde and the movie of the same name) that is likely to be familiar to the user and helps establish a sense of commonality therewith. Furthermore, since the response includes forms of output other than flat text, it is more likely to engage and entertain the user.
- FIG. 7 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "What's your favorite car in the whole wide world?"
- a display string 702 comprising the words "I love Deloreans. Especially ones that travel through time” is displayed within the graphical user interface of digital personal assistant 130.
- the response may also include audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "I love Deloreans. Especially ones that travel through time.”
- This text and speech refers to the well-known "Back to the Future" movies.
- the response also includes a multimedia object 704 in the form of a YOUTUBE ® video called "Back to the Future - Clock Tower Scene.av" that may be played and viewed by the user within the context of the graphical user interface of digital personal assistant 130.
- the response includes a link 706 that, when activated by the user, enables the user to search the Web for the phrase "What's your favorite car in the whole wide world?," which is the original utterance. It is noted that in alternate embodiments, a link may be provided that, when activated by the user, enables the user to search the Web for content associated with the popular culture reference (e.g., the "Back to the Future" movies).
- the response of FIG. 7 strongly evokes a popular culture reference and thus may establish commonality with the user. Furthermore, the video content that is viewable directly from the graphical user interface of digital personal assistant 130 makes the response highly engaging.
- FIG. 8 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "You bitch.”
- a display string 802 comprising the words “I'm also a lover, a child and a mother” is displayed within the graphical user interface of digital personal assistant 130.
- the response may also include audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "I'm also a lover, a child and a mother.”
- This text and speech comprises a portion of the lyrics of the well-known song "Bitch" by Meredith Brooks.
- the response also includes a multimedia object 804 in the form of an image of Meredith Brooks.
- the response also includes a text portion 806 that provides information about the song "Bitch" and may also include one or more links that may be activated by the user to purchase a digital copy of the song from one or more sources, respectively.
- the response of FIG. 8 cleverly utilizes a pop culture reference to respond to (and somewhat deflect) the seemingly derogatory chit-chat type utterance. It also includes interesting multimedia content that can help engage the user.
- the response shown in FIG. 8 also illustrates how a response to a chit-chat type utterance can serve in some respects as an advertisement or commercial opportunity in that the user is enabled to purchase the song that is being referred to in the response.
- FIG. 9 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "Who is your Daddy?"
- a display string 902 comprising the words "These guys” is displayed within the graphical user interface of digital personal assistant 130.
- the response may also include audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "These guys.”
- the text of display string 902 is referring to a multimedia object 904 that is also displayed within the graphical user interface of digital personal assistant 130. As shown in FIG.
- multimedia object 904 comprises a video of Bill Gates (chairman of Microsoft) and Steve Ballmer (chief executive officer of Microsoft) dressed as the characters of Austin Powers and Dr. Evil, respectively, from the very popular "Austin Powers" movies. This video may be played and viewed by the user within the context of the graphical user interface of digital personal assistant 130.
- digital personal assistant 130 is published by Microsoft Corporation of Redmond, Washington
- the response shown in FIG. 9 is apt since the figures shown in the video are well-known personas associated with Microsoft. Furthermore, the video content is amusing and engaging for the user.
- FIG. 10 illustrates a response that may be provided by digital personal assistant 130 to the chit-chat type utterance "Show me something funny?"
- a display string 1002 comprising the words "A friend of mine has an opinion” is displayed within the graphical user interface of digital personal assistant 130.
- the response may also include audible speech that is played back via one or more speakers of end user computing device 102, wherein such speech also includes the words "A friend of mine has an opinion.”
- the text of display string 1002 is referring to a multimedia object 1004 that is also displayed within the graphical user interface of digital personal assistant 130. As shown in FIG.
- multimedia object 1004 comprises an image of a grumpy-looking cat with the tagline: "I had fun once. It was suspected.” This image is a reference to the popular "Grumpy Cat” internet meme, which may be instantly recognizable to the user and which may also serve to amuse and engage the user.
- FIGS. 5-10 have been provided herein by way of example only. Persons skilled in the relevant art(s) will appreciate that a wide variety of responses to chit-chat type utterances may be provided other than those depicted in FIGS. 5-10 . Such responses may include any one or more of the response components previously described in reference to FIG. 4 , as well as additional components.
- FIG. 11 depicts a flowchart 1100 of a method for implementing a digital personal assistant that utilizes impersonations and/or multimedia in responding to chit-chat type utterances in accordance with an embodiment.
- the method of flowchart 1100 may be performed, for example, by digital personal assistant backend 106 as discussed above in reference to FIG. 1 . Accordingly, the method of flowchart 1100 will now be described with continued reference to system 100 of FIG. 1 . However, the method is not limited to that implementation.
- the method of flowchart 1100 begins at step 1102 in which a digital representation of an utterance of a user of a digital personal assistant is received.
- the digital representation of the utterance may comprise, for example, the utterance that is generated by speech recognition service 132 or the corrected utterance generated through user interaction with digital personal assistant 130 as discussed above in reference to FIG. 2 .
- the digital representation of the utterance is received by query understanding and response system 136.
- step 1104 the digital representation of the utterance is analyzed. As discussed above in reference to FIG. 2 , this step may entail the analysis performed by query understanding and response system 136 to determine if the utterance comprises a chit-chat type utterance.
- step 1106 based on at least the analysis of the digital representation of the utterance, it is determined that the utterance comprises an utterance intended to engage with a persona of the digital personal assistant. As discussed above in reference to FIG. 2 , this step occurs when query understanding and response system 136 determines that the utterance is a chit-chat type utterance. As previously noted, this determination may be based on the analysis of the utterance performed by query understanding and response system 136.
- a response to the utterance is caused to be generated by the digital personal assistant that includes at least one of a multimedia object associated with a popular culture reference and speech that comprises an impersonation of a voice of a person associated with the popular culture reference.
- the multimedia object may comprise, for example, an image, video content, or audio content, and may be displayed within or played back by a user interface of the digital personal assistant.
- the speech may be generated or played back by the digital personal assistant.
- This step may be performed, for example, by query understanding and response system for chit-chat 138, which causes digital personal assistant 130 to provide a response that includes a multimedia object, such as multimedia component 412 described above in reference to FIG. 4 , or that includes impersonated speech as indicated by speech impersonation component 406 as described above in reference to FIG. 4 .
- the manner in which query understanding and response system for chit-chat 138 performs this function has been previously described.
- the response to the utterance discussed in step 1108 may further include text that is displayed within the user interface of the digital personal assistant, the text comprising a quotation associated with the popular culture reference, a visual representation of the digital personal assistant (e.g., an animation of the digital personal assistant) that is displayed within the user interface thereof and that evokes the popular culture reference, and/or a link that is displayed within the user interface of the digital personal assistant and that can be activated by the user to access content, such as content associated with the chit-chat type utterance or with the popular culture reference.
- the speech that is generated or played back by the digital personal assistant may comprise a quotation associated with the popular culture reference.
- causing a response to the utterance to be generated in step 1108 comprises identifying a plurality of eligible responses to the utterance and then selecting the response to the utterance from among the plurality of eligible responses to the utterance.
- causing a response to the utterance to be generated in step 1108 comprises matching the utterance to a particular utterance type within a hierarchical tree of utterance types (e.g., hierarchical tree 300 as discussed above in reference to FIG. 3 ), each utterance type in the hierarchical tree of utterance types having one or more responses associated therewith. After the matching, the response to the utterance is selected from among the response(s) associated with the particular utterance type.
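The tree-based approach just described, in which the utterance is matched to a type in a hierarchical tree and the response is then selected from that type's associated responses, can be sketched as follows. The tree contents, node names, and matching heuristic below are invented for illustration and are not the actual hierarchical tree 300.

```python
import random

# Hypothetical hierarchical tree of utterance types. Each leaf type
# carries one or more responses; matching finds a type, and the
# response is then selected from that type's response list.

TREE = {
    "about_assistant": {
        "who_made_you": {"responses": ["I was built by a team of engineers."]},
        "are_you_real": {"responses": ["As real as software gets.",
                                       "I live in the cloud."]},
    },
    "pop_culture": {
        "movie_quote": {"responses": ["I'll be back.",
                                      "May the Force be with you."]},
    },
}

def match_utterance_type(utterance: str) -> list[str]:
    """Toy matcher: return the responses of the first utterance type
    whose key words appear in the utterance (empty list if none)."""
    text = utterance.lower()
    for category, types in TREE.items():
        for utype, node in types.items():
            if any(word in text for word in utype.split("_")):
                return node["responses"]
    return []

responses = match_utterance_type("Who made you, anyway?")
reply = random.choice(responses) if responses else None
```

Selecting with `random.choice` illustrates picking one response from among the response(s) associated with the matched type.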
- causing a response to the utterance to be generated in step 1108 comprises determining that the utterance is associated with a trending topic and then, in response to determining that the utterance is associated with a trending topic, selecting the response to the utterance from among one or more responses associated with the trending topic.
- causing a response to the utterance to be generated in step 1108 comprises determining that the utterance is an utterance for which there are one or more predefined responses intended to convey the persona of the digital personal assistant and, in response to this determination, selecting the response to the utterance from among the one or more predefined responses.
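Both selection paths above, responses tied to a trending topic and predefined responses that convey the persona, reduce to a keyed lookup followed by a choice among the associated responses. A toy sketch, with invented topics and response text; the real system's content pipeline is not described at this level of detail.

```python
import random

# Hypothetical stores: responses curated for currently trending topics,
# and predefined responses intended to convey the assistant's persona.

TRENDING_RESPONSES = {
    "groundhog day": ["Six more weeks of winter? I can wait."],
}
PERSONA_RESPONSES = {
    "do you sleep": ["I rest when you do.", "No naps for me!"],
}

def select_response(utterance: str):
    text = utterance.lower()
    # First: is the utterance associated with a trending topic?
    for topic, responses in TRENDING_RESPONSES.items():
        if topic in text:
            return random.choice(responses)
    # Otherwise: is there a predefined persona response for it?
    for key, responses in PERSONA_RESPONSES.items():
        if key in text:
            return random.choice(responses)
    return None

answer = select_response("What do you think about Groundhog Day?")
```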
- causing a response to the utterance to be generated in step 1108 comprises sending an audio file that includes the speech or information that identifies the audio file to a computing device executing the digital personal assistant.
- This step may be performed, for example, when query understanding and response system for chit-chat 138 sends an audio file that includes the impersonated speech to end user computing device 102 so that it can be accessed and played back by digital personal assistant 130 or when query understanding and response system for chit-chat 138 sends information that identifies such an audio file to digital personal assistant 130 so that the audio file can be obtained locally by digital personal assistant 130.
- causing a response to the utterance to be generated in step 1108 comprises providing text to a computing device executing the digital personal assistant, wherein the text is to be processed by a text-to-speech component of the digital personal assistant to generate the speech.
- This step may be performed, for example, when query understanding and response system for chit-chat 138 sends an indication to digital personal assistant 130 that digital personal assistant 130 should apply a special text-to-speech converter to designated text to cause the text to be converted to speech in a manner that impersonates the voice of a particular person.
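The two delivery mechanisms above suggest a response payload that either references a prerecorded audio file or carries text plus an instruction to apply a special text-to-speech voice. The sketch below is a hedged illustration; every field name and value is an assumption, not the actual wire format.

```python
import json

def make_speech_payload(use_recorded_audio: bool) -> str:
    """Build a JSON payload the backend might send to the device.

    Either mode conveys impersonated speech: by identifying an audio
    file the assistant can obtain and play back, or by sending text
    together with the identifier of a special TTS voice to apply.
    All field names here are illustrative assumptions.
    """
    if use_recorded_audio:
        payload = {"speech": {"mode": "audio_file",
                              "audio_file_id": "yoda_greeting_07"}}
    else:
        payload = {"speech": {"mode": "tts",
                              "text": "Help you I can, yes!",
                              "voice": "yoda_impersonation"}}
    return json.dumps(payload)

wire = make_speech_payload(use_recorded_audio=False)
```

Sending only an identifier (rather than the audio itself) matches the variant in which the audio file is obtained locally by the digital personal assistant.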
- FIG. 12 depicts a flowchart 1200 of a method by which a digital personal assistant provides a response to a chit-chat type utterance that includes a voice impersonation in accordance with an embodiment.
- the method of flowchart 1200 may be performed, for example, by digital personal assistant 130 as discussed above in reference to FIG. 1 . Accordingly, the method of flowchart 1200 will now be described with continued reference to system 100 of FIG. 1 . However, the method is not limited to that implementation.
- the method of flowchart 1200 begins at step 1202, in which digital personal assistant 130 captures audio that represents an utterance of a user intended to engage with a persona of digital personal assistant 130.
- digital personal assistant 130 transmits the audio to digital personal assistant backend 106.
- digital personal assistant 130 provides a response to the utterance based at least on information received from digital personal assistant backend 106.
- Providing the response includes generating or playing back speech that comprises an impersonation of a voice of a persona associated with a popular culture reference.
- providing the response in step 1206 includes playing back an audio file that includes the speech.
- providing the response in step 1206 includes applying text-to-speech conversion to text to generate the speech.
- providing the response in step 1206 includes one or more of: displaying or playing back a multimedia object by a user interface of digital personal assistant 130, the multimedia object being associated with the popular culture reference; displaying text within the user interface of the digital personal assistant, the text comprising a quotation associated with the popular culture reference; displaying a visual representation of the digital personal assistant that evokes the popular culture reference within the user interface of the digital personal assistant; and displaying a link within the user interface of the digital personal assistant that can be activated by the user to access content, such as content associated with the utterance or with the popular culture reference.
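On the device side, step 1206 amounts to dispatching on the kind of speech information received from the backend, plus optionally surfacing rich content in the user interface. A minimal sketch under those assumptions; the dispatch keys mirror the hypothetical payload fields used for illustration and the recorded actions stand in for real audio and display output.

```python
def provide_response(info: dict, actions: list) -> None:
    """Dispatch backend response info: play a referenced audio file,
    or run text through a (possibly impersonation-capable) TTS voice.
    `actions` collects what would happen instead of real output."""
    speech = info.get("speech", {})
    if speech.get("mode") == "audio_file":
        actions.append(("play_file", speech["audio_file_id"]))
    elif speech.get("mode") == "tts":
        actions.append(("tts", speech["voice"], speech["text"]))
    if "multimedia_object" in info:  # optional rich content in the UI
        actions.append(("display", info["multimedia_object"]["url"]))

actions = []
provide_response(
    {"speech": {"mode": "tts", "voice": "pirate", "text": "Arr!"},
     "multimedia_object": {"url": "https://example.com/ship.png"}},
    actions)
```

The same dispatch covers the method of flowchart 1300, where the multimedia object rather than the speech is the required element.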
- FIG. 13 depicts a flowchart 1300 of a method by which a digital personal assistant provides a response to a chit-chat type utterance that includes a multimedia object in accordance with an embodiment.
- the method of flowchart 1300 may be performed, for example, by digital personal assistant 130 as discussed above in reference to FIG. 1 . Accordingly, the method of flowchart 1300 will now be described with continued reference to system 100 of FIG. 1 . However, the method is not limited to that implementation.
- the method of flowchart 1300 begins at step 1302, in which digital personal assistant 130 captures audio that represents an utterance of a user intended to engage with a persona of digital personal assistant 130.
- digital personal assistant 130 transmits the audio to digital personal assistant backend 106.
- digital personal assistant 130 provides a response to the utterance based at least on information received from digital personal assistant backend 106.
- Providing the response includes displaying or playing back a multimedia object associated with a popular culture reference by a user interface of digital personal assistant 130.
- displaying or playing back the multimedia object in step 1306 comprises displaying an image or video content or playing back audio content by the user interface of digital personal assistant 130.
- providing the response in step 1306 includes one or more of: generating or playing back speech that comprises an impersonation of a voice of a person associated with a popular culture reference; displaying text within the user interface of the digital personal assistant, the text comprising a quotation associated with the popular culture reference; displaying a visual representation of the digital personal assistant that evokes the popular culture reference within the user interface of the digital personal assistant; and displaying a link within the user interface of the digital personal assistant that can be activated by the user to access content, such as content associated with the utterance or with the popular culture reference.
- FIG. 14 is a block diagram of an exemplary mobile device 1402 that may be used to implement end user computing device 102 as described above in reference to FIG. 1 .
- mobile device 1402 includes a variety of optional hardware and software components. Any component in mobile device 1402 can communicate with any other component, although not all connections are shown for ease of illustration.
- Mobile device 1402 can be any of a variety of computing devices (e.g., cell phone, smartphone, handheld computer, Personal Digital Assistant (PDA), etc.) and can allow wireless two-way communications with one or more mobile communications networks 1404, such as a cellular or satellite network, or with a local area or wide area network.
- the illustrated mobile device 1402 can include a controller or processor 1410 (e.g., signal processor, microprocessor, ASIC, or other control and processing logic circuitry) for performing such tasks as signal coding, data processing, input/output processing, power control, and/or other functions.
- An operating system 1412 can control the allocation and usage of the components of mobile device 1402 and support for one or more application programs 1414 (also referred to as "applications" or "apps").
- Application programs 1414 may include common mobile computing applications (e.g., e-mail applications, calendars, contact managers, Web browsers, messaging applications) and any other computing applications (e.g., word processing applications, mapping applications, media player applications).
- application programs 1414 include digital personal assistant 130.
- the illustrated mobile device 1402 can include memory 1420.
- Memory 1420 can include non-removable memory 1422 and/or removable memory 1424.
- Non-removable memory 1422 can include RAM, ROM, flash memory, a hard disk, or other well-known memory devices or technologies.
- Removable memory 1424 can include flash memory or a Subscriber Identity Module (SIM) card, which is well known in GSM communication systems, or other well-known memory devices or technologies, such as "smart cards."
- Memory 1420 can be used for storing data and/or code for running operating system 1412 and applications 1414.
- Example data can include Web pages, text, images, sound files, video data, or other data to be sent to and/or received from one or more network servers or other devices via one or more wired or wireless networks.
- Memory 1420 can be used to store a subscriber identifier, such as an International Mobile Subscriber Identity (IMSI), and an equipment identifier, such as an International Mobile Equipment Identifier (IMEI). Such identifiers can be transmitted to a network server to identify users and equipment.
- Mobile device 1402 can support one or more input devices 1430, such as a touch screen 1432, a microphone 1434, a camera 1436, a physical keyboard 1438 and/or a trackball 1440 and one or more output devices 1450, such as a speaker 1452 and a display 1454.
- Touch screens such as touch screen 1432, can detect input in different ways. For example, capacitive touch screens detect touch input when an object (e.g., a fingertip) distorts or interrupts an electrical current running across the surface. As another example, touch screens can use optical sensors to detect touch input when beams from the optical sensors are interrupted. Physical contact with the surface of the screen is not necessary for input to be detected by some touch screens.
- Other possible output devices can include piezoelectric or other haptic output devices. Some devices can serve more than one input/output function. For example, touch screen 1432 and display 1454 can be combined in a single input/output device.
- the input devices 1430 can include a Natural User Interface (NUI).
- Wireless modem(s) 1460 can be coupled to antenna(s) (not shown) and can support two-way communications between the processor 1410 and external devices, as is well understood in the art.
- the modem(s) 1460 are shown generically and can include a cellular modem 1466 for communicating with the mobile communication network 1404 and/or other radio-based modems (e.g., Bluetooth 1464 and/or Wi-Fi 1462).
- At least one of the wireless modem(s) 1460 is typically configured for communication with one or more cellular networks, such as a Global System for Mobile communications (GSM) network for data and voice communications within a single cellular network, between cellular networks, or between the mobile device and a public switched telephone network (PSTN).
- Mobile device 1402 can further include at least one input/output port 1480, a power supply 1482, a satellite navigation system receiver 1484, such as a Global Positioning System (GPS) receiver, an accelerometer 1486, and/or a physical connector 1490, which can be a USB port, IEEE 1394 (FireWire) port, and/or RS-232 port.
- the illustrated components of mobile device 1402 are not required or all-inclusive, as any components can be deleted and other components can be added as would be recognized by one skilled in the art.
- certain components of mobile device 1402 are configured to perform the operations attributed to digital personal assistant 130 as described in preceding sections.
- Computer program logic for performing the operations attributed to digital personal assistant 130 as described above may be stored in memory 1420 and executed by processor 1410.
- processor 1410 may be caused to implement any of the features of digital personal assistant 130 as described above in reference to FIG. 1 .
- processor 1410 may be caused to perform any or all of the steps of any or all of the flowcharts depicted in FIGS. 12 and 13 .
- FIG. 15 depicts an example processor-based computer system 1500 that may be used to implement various embodiments described herein.
- system 1500 may be used to implement end user computing device 102 or any of the computers used to implement digital personal assistant backend 106 as described above in reference to FIG. 1 .
- System 1500 may also be used to implement any or all of the steps of any or all of the flowcharts depicted in FIGS. 11-13 .
- the description of system 1500 provided herein is provided for purposes of illustration, and is not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).
- system 1500 includes a processing unit 1502, a system memory 1504, and a bus 1506 that couples various system components including system memory 1504 to processing unit 1502.
- Processing unit 1502 may comprise one or more microprocessors or microprocessor cores.
- Bus 1506 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- System memory 1504 includes read only memory (ROM) 1508 and random access memory (RAM) 1510.
- a basic input/output system 1512 (BIOS) is stored in ROM 1508.
- System 1500 also has one or more of the following drives: a hard disk drive 1514 for reading from and writing to a hard disk, a magnetic disk drive 1516 for reading from or writing to a removable magnetic disk 1518, and an optical disk drive 1520 for reading from or writing to a removable optical disk 1522 such as a CD ROM, DVD ROM, BLU-RAY™ disk, or other optical media.
- Hard disk drive 1514, magnetic disk drive 1516, and optical disk drive 1520 are connected to bus 1506 by a hard disk drive interface 1524, a magnetic disk drive interface 1526, and an optical drive interface 1528, respectively.
- the drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer.
- Although a hard disk, a removable magnetic disk, and a removable optical disk are described, other types of computer-readable memory devices and storage structures can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like.
- program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These program modules include an operating system 1530, one or more application programs 1532, other program modules 1534, and program data 1536.
- the program modules may include computer program logic that is executable by processing unit 1502 to perform any or all of the functions and features of end user computing device 102 or any of the computers used to implement digital personal assistant backend 106 as described above in reference to FIG. 1 .
- the program modules may also include computer program logic that, when executed by processing unit 1502, performs any of the steps or operations shown or described in reference to the flowcharts of FIGS. 11-13 .
- a user may enter commands and information into system 1500 through input devices such as a keyboard 1538 and a pointing device 1540.
- Other input devices may include a microphone, joystick, game controller, scanner, or the like.
- a touch screen is provided in conjunction with a display 1544 to allow a user to provide user input via the application of a touch (as by a finger or stylus for example) to one or more points on the touch screen.
- These and other input devices are often connected to processing unit 1502 through a serial port interface 1542 that is coupled to bus 1506, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB).
- Such interfaces may be wired or wireless interfaces.
- a display 1544 is also connected to bus 1506 via an interface, such as a video adapter 1546.
- system 1500 may include other peripheral output devices (not shown) such as speakers and printers.
- System 1500 is connected to a network 1548 (e.g., a local area network or wide area network such as the Internet) through a network interface or adapter 1550, a modem 1552, or other suitable means for establishing communications over the network.
- Modem 1552, which may be internal or external, is connected to bus 1506 via serial port interface 1542.
- As used herein, the terms "computer program medium," "computer-readable medium," and "computer-readable storage medium" are used to generally refer to memory devices or storage structures such as the hard disk associated with hard disk drive 1514, removable magnetic disk 1518, and removable optical disk 1522, as well as other memory devices or storage structures such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like.
- Such computer-readable storage media are distinguished from and non-overlapping with communication media; that is, they do not include communication media.
- Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave.
- The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.
- computer programs and modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 1550, serial port interface 1542, or any other interface type. Such computer programs, when executed or loaded by an application, enable system 1500 to implement features of embodiments of the present invention discussed herein. Accordingly, such computer programs represent controllers of system 1500.
- Embodiments are also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a data processing device(s) to operate as described herein.
- Embodiments of the present invention employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable media include, but are not limited to, memory devices and storage structures such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnology-based storage devices, and the like.
- system 1500 may be implemented as hardware logic/electrical circuitry or firmware.
- one or more of these components may be implemented in a system-on-chip (SoC).
- An SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/156,009 US9514748B2 (en) | 2014-01-15 | 2014-01-15 | Digital personal assistant interaction with impersonations and rich multimedia in responses |
PCT/US2015/010711 WO2015108758A1 (en) | 2014-01-15 | 2015-01-09 | Digital personal assistant interaction with impersonations and rich multimedia in responses |
Publications (2)
Publication Number | Publication Date |
---|---|
- EP3095113A1 (en) | 2016-11-23 |
- EP3095113B1 (en) | 2022-06-15 |
Family
ID=52440848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15702033.0A Active EP3095113B1 (en) | 2014-01-15 | 2015-01-09 | Digital personal assistant interaction with impersonations and rich multimedia in responses |
Country Status (17)
Country | Link |
---|---|
- US (1) | US9514748B2 |
- EP (1) | EP3095113B1 |
- JP (1) | JP6505117B2 |
- KR (1) | KR102295935B1 |
- CN (1) | CN105917404B |
- AU (1) | AU2015206736B2 |
- BR (1) | BR112016015519B1 |
- CA (1) | CA2935469C |
- CL (1) | CL2016001788A1 |
- HK (1) | HK1223728A1 |
- IL (1) | IL246237B |
- MX (1) | MX360118B |
- MY (1) | MY180332A |
- PH (1) | PH12016501223A1 |
- RU (1) | RU2682023C1 |
- SG (1) | SG11201605642VA |
- WO (1) | WO2015108758A1 |
Families Citing this family (166)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US9634855B2 (en) * | 2010-05-13 | 2017-04-25 | Alexander Poltorak | Electronic personal interactive device that determines topics of interest using a conversational agent |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US9002322B2 (en) | 2011-09-29 | 2015-04-07 | Apple Inc. | Authentication with secondary approver |
US8769624B2 (en) | 2011-09-29 | 2014-07-01 | Apple Inc. | Access control utilizing indirect authentication |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
DE112014000709B4 (de) | 2013-02-07 | 2021-12-30 | Apple Inc. | Verfahren und vorrichtung zum betrieb eines sprachtriggers für einen digitalen assistenten |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
WO2014143776A2 (en) | 2013-03-15 | 2014-09-18 | Bodhi Technology Ventures Llc | Providing remote interactions with host device using a wireless device |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
EP3937002A1 (en) | 2013-06-09 | 2022-01-12 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
DE112014003653B4 (de) | 2013-08-06 | 2024-04-18 | Apple Inc. | Automatisch aktivierende intelligente Antworten auf der Grundlage von Aktivitäten von entfernt angeordneten Vorrichtungen |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
KR102193559B1 (ko) * | 2014-02-18 | 2020-12-22 | 삼성전자주식회사 | 대화형 서버 및 이의 제어 방법 |
USD801993S1 (en) * | 2014-03-14 | 2017-11-07 | Microsoft Corporation | Display screen with animated graphical user interface |
US20150350146A1 (en) | 2014-05-29 | 2015-12-03 | Apple Inc. | Coordination of message alert presentations across devices based on device modes |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
TWI566107B (zh) | 2014-05-30 | 2017-01-11 | 蘋果公司 | 用於處理多部分語音命令之方法、非暫時性電腦可讀儲存媒體及電子裝置 |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
EP3149554B1 (en) | 2014-05-30 | 2024-05-01 | Apple Inc. | Continuity |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9967401B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | User interface for phone call routing among devices |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10339293B2 (en) | 2014-08-15 | 2019-07-02 | Apple Inc. | Authenticated device used to unlock another device |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9786299B2 (en) * | 2014-12-04 | 2017-10-10 | Microsoft Technology Licensing, Llc | Emotion type classification for interactive dialog system |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) * | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9959866B2 (en) * | 2015-04-02 | 2018-05-01 | Panasonic Intellectual Property Management Co., Ltd. | Computer-implemented method for generating a response sentence by using a weight value of node |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10409550B2 (en) | 2016-03-04 | 2019-09-10 | Ricoh Company, Ltd. | Voice control of interactive whiteboard appliances |
US10417021B2 (en) * | 2016-03-04 | 2019-09-17 | Ricoh Company, Ltd. | Interactive command assistant for an interactive whiteboard appliance |
CN107293292A (zh) * | 2016-03-31 | 2017-10-24 | 深圳光启合众科技有限公司 | 基于云端的设备及其操作方法 |
US10291565B2 (en) * | 2016-05-17 | 2019-05-14 | Google Llc | Incorporating selectable application links into conversations with personal assistant modules |
US10263933B2 (en) | 2016-05-17 | 2019-04-16 | Google Llc | Incorporating selectable application links into message exchange threads |
DK179186B1 (en) | 2016-05-19 | 2018-01-15 | Apple Inc | REMOTE AUTHORIZATION TO CONTINUE WITH AN ACTION |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK201670622A1 (en) | 2016-06-12 | 2018-02-12 | Apple Inc | User interfaces for transactions |
US9990176B1 (en) * | 2016-06-28 | 2018-06-05 | Amazon Technologies, Inc. | Latency reduction for content playback |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
CN108075959B (zh) * | 2016-11-14 | 2021-03-12 | Tencent Technology (Shenzhen) Co., Ltd. | Session message processing method and apparatus |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11650791B2 (en) | 2017-01-11 | 2023-05-16 | Microsoft Technology Licensing, Llc | Relative narration |
US10574825B2 (en) * | 2017-02-15 | 2020-02-25 | Microsoft Technology Licensing, Llc | Assisted-communication with intelligent personal assistant |
CN109313649B (zh) * | 2017-03-24 | 2022-05-31 | Microsoft Technology Licensing, LLC | Method and apparatus for voice-based knowledge sharing for a chatbot |
US10853717B2 (en) | 2017-04-11 | 2020-12-01 | Microsoft Technology Licensing, Llc | Creating a conversational chat bot of a specific person |
US11170768B2 (en) * | 2017-04-17 | 2021-11-09 | Samsung Electronics Co., Ltd | Device for performing task corresponding to user utterance |
US10992795B2 (en) | 2017-05-16 | 2021-04-27 | Apple Inc. | Methods and interfaces for home media control |
US11431836B2 (en) | 2017-05-02 | 2022-08-30 | Apple Inc. | Methods and interfaces for initiating media playback |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | USER INTERFACE FOR CORRECTING RECOGNITION ERRORS |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | MULTI-MODAL INTERFACES |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US20220279063A1 (en) | 2017-05-16 | 2022-09-01 | Apple Inc. | Methods and interfaces for home media control |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
CN111343060B (zh) | 2017-05-16 | 2022-02-11 | Apple Inc. | Methods and interfaces for home media control |
US20200357382A1 (en) * | 2017-08-10 | 2020-11-12 | Facet Labs, Llc | Oral, facial and gesture communication devices and computing architecture for interacting with digital media content |
US10636424B2 (en) * | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US20190172240A1 (en) * | 2017-12-06 | 2019-06-06 | Sony Interactive Entertainment Inc. | Facial animation for social virtual reality (vr) |
CN107993657A (zh) * | 2017-12-08 | 2018-05-04 | 广东思派康电子科技有限公司 | Switching method based on multiple voice assistant platforms |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
WO2019161207A1 (en) * | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for conversational agent via adaptive caching of dialogue tree |
US11308312B2 (en) | 2018-02-15 | 2022-04-19 | DMAI, Inc. | System and method for reconstructing unoccupied 3D space |
KR102515023B1 (ko) * | 2018-02-23 | 2023-03-29 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10984799B2 (en) | 2018-03-23 | 2021-04-20 | Amazon Technologies, Inc. | Hybrid speech interface device |
US10777203B1 (en) | 2018-03-23 | 2020-09-15 | Amazon Technologies, Inc. | Speech interface device with caching component |
WO2019190812A1 (en) * | 2018-03-26 | 2019-10-03 | Microsoft Technology Licensing, Llc | Intelligent assistant device communicating non-verbal cues |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DK179822B1 (da) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11076039B2 (en) | 2018-06-03 | 2021-07-27 | Apple Inc. | Accelerated task performance |
KR20190142192 (ko) | 2018-06-15 | 2019-12-26 | Samsung Electronics Co., Ltd. | Electronic device and method of controlling the electronic device |
US11190465B2 (en) | 2018-08-06 | 2021-11-30 | Oracle International Corporation | Displaying data sets responsive to natural language messages received by chatbots |
WO2020060151A1 (en) | 2018-09-19 | 2020-03-26 | Samsung Electronics Co., Ltd. | System and method for providing voice assistant service |
KR20200033140 (ko) * | 2018-09-19 | 2020-03-27 | Samsung Electronics Co., Ltd. | System and method for providing a voice assistant service |
CN110942518B (zh) * | 2018-09-24 | 2024-03-29 | Apple Inc. | Contextual computer-generated reality (CGR) digital assistant |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
KR20200044175 (ko) | 2018-10-05 | 2020-04-29 | Samsung Electronics Co., Ltd. | Electronic device and method of providing assistant service thereof |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11657797B2 (en) * | 2019-04-26 | 2023-05-23 | Oracle International Corporation | Routing for chatbots |
US11133005B2 (en) | 2019-04-29 | 2021-09-28 | Rovi Guides, Inc. | Systems and methods for disambiguating a voice search query |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
US11620103B2 (en) | 2019-05-31 | 2023-04-04 | Apple Inc. | User interfaces for audio media control |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
DK201970510A1 (en) | 2019-05-31 | 2021-02-11 | Apple Inc | Voice identification in digital assistant systems |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US10996917B2 (en) | 2019-05-31 | 2021-05-04 | Apple Inc. | User interfaces for audio media control |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11481094B2 (en) | 2019-06-01 | 2022-10-25 | Apple Inc. | User interfaces for location-related communications |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11477609B2 (en) | 2019-06-01 | 2022-10-18 | Apple Inc. | User interfaces for location-related communications |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11941362B2 (en) * | 2020-04-27 | 2024-03-26 | Early Warning Services, Llc | Systems and methods of artificially intelligent sentiment analysis |
US11038934B1 (en) | 2020-05-11 | 2021-06-15 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11392291B2 (en) | 2020-09-25 | 2022-07-19 | Apple Inc. | Methods and interfaces for media control with dynamic feedback |
US11756574B2 (en) | 2021-03-11 | 2023-09-12 | Apple Inc. | Multiple state digital assistant for continuous dialog |
US11955137B2 (en) | 2021-03-11 | 2024-04-09 | Apple Inc. | Continuous dialog with a digital assistant |
US11847378B2 (en) | 2021-06-06 | 2023-12-19 | Apple Inc. | User interfaces for audio routing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6721706B1 (en) * | 2000-10-30 | 2004-04-13 | Koninklijke Philips Electronics N.V. | Environment-responsive user interface/entertainment device that simulates personal interaction |
US20060155765A1 (en) * | 2004-12-01 | 2006-07-13 | Takeuchi Johane | Chat information service system |
US20090210217A1 (en) * | 2008-02-14 | 2009-08-20 | Aruze Gaming America, Inc. | Gaming Apparatus Capable of Conversation with Player and Control Method Thereof |
WO2013155619A1 (en) * | 2012-04-20 | 2013-10-24 | Sam Pasupalak | Conversational agent |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5918222A (en) | 1995-03-17 | 1999-06-29 | Kabushiki Kaisha Toshiba | Information disclosing apparatus and multi-modal information input/output system |
NL1000679C2 (nl) * | 1995-06-28 | 1996-12-31 | Arie Van Wieringen Video Film | Motion editor/assembly unit. |
US6144938A (en) | 1998-05-01 | 2000-11-07 | Sun Microsystems, Inc. | Voice user interface with personality |
US9076448B2 (en) | 1999-11-12 | 2015-07-07 | Nuance Communications, Inc. | Distributed real time speech recognition system |
JP2003044088A (ja) * | 2001-07-27 | 2003-02-14 | Sony Corp | Program, recording medium, and voice interaction apparatus and method |
JP2005070721A (ja) * | 2003-08-27 | 2005-03-17 | Akihiko Shigeta | Cosmetic article with sound output function |
CN1943218A (zh) * | 2004-02-17 | 2007-04-04 | Voice Signal Technologies, Inc. | Method and apparatus for interchangeable customization of a multimodal embedded interface |
JP2006039120A (ja) * | 2004-07-26 | 2006-02-09 | Sony Corp | Interactive apparatus, interactive method, program, and recording medium |
JP2006048218A (ja) * | 2004-08-02 | 2006-02-16 | Advanced Media Inc | Voice and video response method and voice and video response system |
US7949529B2 (en) | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
US9318108B2 (en) * | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US7957976B2 (en) | 2006-09-12 | 2011-06-07 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US8073681B2 (en) | 2006-10-16 | 2011-12-06 | Voicebox Technologies, Inc. | System and method for a cooperative conversational voice user interface |
US8831977B2 (en) * | 2007-09-26 | 2014-09-09 | At&T Intellectual Property I, L.P. | Methods, systems, and computer program products for implementing personalized dissemination of information |
US8140335B2 (en) | 2007-12-11 | 2012-03-20 | Voicebox Technologies, Inc. | System and method for providing a natural language voice user interface in an integrated voice navigation services environment |
JP4547721B2 (ja) * | 2008-05-21 | 2010-09-22 | Denso Corporation | Automotive information providing system |
US8386929B2 (en) * | 2010-06-22 | 2013-02-26 | Microsoft Corporation | Personal assistant for task utilization |
US8640021B2 (en) | 2010-11-12 | 2014-01-28 | Microsoft Corporation | Audience-based presentation and customization of content |
SG184583A1 (en) * | 2011-03-07 | 2012-10-30 | Creative Tech Ltd | A device for facilitating efficient learning and a processing method in association thereto |
US20130061257A1 (en) * | 2011-09-02 | 2013-03-07 | Sony Corporation | Verbally communicating facially responsive television apparatus |
US8346563B1 (en) | 2012-04-10 | 2013-01-01 | Artificial Solutions Ltd. | System and methods for delivering advanced natural language interaction applications |
KR102056461B1 (ko) * | 2012-06-15 | 2019-12-16 | Samsung Electronics Co., Ltd. | Display apparatus and method of controlling the display apparatus |
RU2654789C2 (ru) * | 2014-05-30 | 2018-05-22 | Yandex LLC | Method (variants) and electronic device (variants) for processing a user's speech request |
- 2014
- 2014-01-15 US US14/156,009 patent/US9514748B2/en active Active
- 2015
- 2015-01-09 RU RU2016128739A patent/RU2682023C1/ru active
- 2015-01-09 CN CN201580004844.6A patent/CN105917404B/zh active Active
- 2015-01-09 EP EP15702033.0A patent/EP3095113B1/en active Active
- 2015-01-09 KR KR1020167019069A patent/KR102295935B1/ko active IP Right Grant
- 2015-01-09 MY MYPI2016702496A patent/MY180332A/en unknown
- 2015-01-09 BR BR112016015519-0A patent/BR112016015519B1/pt active IP Right Grant
- 2015-01-09 JP JP2016546938A patent/JP6505117B2/ja active Active
- 2015-01-09 SG SG11201605642VA patent/SG11201605642VA/en unknown
- 2015-01-09 MX MX2016009130A patent/MX360118B/es active IP Right Grant
- 2015-01-09 WO PCT/US2015/010711 patent/WO2015108758A1/en active Application Filing
- 2015-01-09 AU AU2015206736A patent/AU2015206736B2/en active Active
- 2015-01-09 CA CA2935469A patent/CA2935469C/en active Active
- 2016
- 2016-06-15 IL IL246237A patent/IL246237B/en active IP Right Grant
- 2016-06-22 PH PH12016501223A patent/PH12016501223A1/en unknown
- 2016-07-13 CL CL2016001788A patent/CL2016001788A1/es unknown
- 2016-10-19 HK HK16112030.9A patent/HK1223728A1/zh unknown
Also Published As
Publication number | Publication date |
---|---|
PH12016501223B1 (en) | 2016-08-22 |
JP6505117B2 (ja) | 2019-04-24 |
CA2935469A1 (en) | 2015-07-23 |
SG11201605642VA (en) | 2016-08-30 |
US20150199967A1 (en) | 2015-07-16 |
IL246237B (en) | 2019-03-31 |
BR112016015519B1 (pt) | 2023-01-17 |
MX360118B (es) | 2018-10-23 |
BR112016015519A8 (pt) | 2020-06-02 |
CN105917404A (zh) | 2016-08-31 |
IL246237A0 (en) | 2016-07-31 |
BR112016015519A2 (zh) | 2017-08-08 |
EP3095113A1 (en) | 2016-11-23 |
HK1223728A1 (zh) | 2017-08-04 |
AU2015206736B2 (en) | 2019-11-21 |
JP2017515134A (ja) | 2017-06-08 |
WO2015108758A1 (en) | 2015-07-23 |
MY180332A (en) | 2020-11-28 |
AU2015206736A1 (en) | 2016-07-07 |
RU2682023C1 (ru) | 2019-03-14 |
MX2016009130A (es) | 2016-10-13 |
KR20160108348A (ko) | 2016-09-19 |
PH12016501223A1 (en) | 2016-08-22 |
CL2016001788A1 (es) | 2017-01-20 |
CN105917404B (zh) | 2019-11-05 |
CA2935469C (en) | 2022-05-03 |
KR102295935B1 (ko) | 2021-08-30 |
US9514748B2 (en) | 2016-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3095113B1 (en) | Digital personal assistant interaction with impersonations and rich multimedia in responses | |
KR102331049B1 (ko) | Leveraging user signals for communication initiation | |
US20220059091A1 (en) | Voice assistant-enabled web application or web page | |
CN107430858A (zh) | Transmitting metadata that identifies a current speaker | |
JP2018503894A (ja) | Emotion type classification for interactive dialog systems | |
US20180365552A1 (en) | Cognitive communication assistant services | |
US20120185417A1 (en) | Apparatus and method for generating activity history | |
US10681402B2 (en) | Providing relevant and authentic channel content to users based on user persona and interest | |
CN117529773A (zh) | Voice generation for user-autonomous personalized text-to-speech | |
US11057332B2 (en) | Augmented expression sticker control and management | |
US10657692B2 (en) | Determining image description specificity in presenting digital content | |
CN110931014A (zh) | Speech recognition method and device based on regular-expression matching rules | |
US11318373B2 (en) | Natural speech data generation systems and methods | |
US12020683B2 (en) | Real-time name mispronunciation detection | |
US11889168B1 (en) | Systems and methods for generating a video summary of a virtual event | |
US20240303030A1 (en) | Dynamic audio content generation | |
CN110309270 (zh) | Singing reply techniques for a chatbot | |
TW201901487A (zh) | Method for parsing network data and simulating a specific object accordingly |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20160613 |
|
AK | Designated contracting states |
Kind code of ref document: A1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20190206 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC |
|
GRAJ | Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted |
Free format text: ORIGINAL CODE: EPIDOSDIGR1 |
|
INTG | Intention to grant announced |
Effective date: 20211025 |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: MALEKZADEH, SOGOL
Inventor name: HARRISON, DEBORAH B.
Inventor name: HOWARD, ROBERT J., III
Inventor name: REDDY, MOUNI |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTC | Intention to grant announced (deleted) | ||
INTG | Intention to grant announced |
Effective date: 20211213 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602015079453 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 1498863 Country of ref document: AT Kind code of ref document: T Effective date: 20220715 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20220615 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220915
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220916
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220915 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1498863 Country of ref document: AT Kind code of ref document: T Effective date: 20220615 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221017
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221015 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602015079453 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
26N | No opposition filed |
Effective date: 20230316 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230430 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230109 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20230131 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230131
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230131 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230131 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20231219 Year of fee payment: 10 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230109 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20231219 Year of fee payment: 10 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20231219 Year of fee payment: 10 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220615 |