WO2017070323A1 - Attentive assistant - Google Patents

Attentive assistant

Info

Publication number
WO2017070323A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
audio
server
user device
audio stream
Application number
PCT/US2016/057876
Other languages
French (fr)
Inventor
Jordon R. COHEN
Daniel L. Roth
David Leo Wright HALL
Jesse Daniel Eskes RUSAK
Andrew Robert VOLPE
Sean Daniel True
Damon R. PENDER
Laurence S. Gillick
Yan VIRIN
Original Assignee
Semantic Machines, Inc.
Application filed by Semantic Machines, Inc.
Publication of WO2017070323A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/527 Centralised call answering arrangements not requiring operator intervention
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1069 Session establishment or de-establishment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]

Definitions

  • This invention relates to a communication assistant, and in particular to an automated assistant for use by an operator of a motor vehicle, or of other equipment, in performing communication related tasks.
  • Mobile devices today may include voice-based interfaces, for instance, the Siri™ interface provided by Apple Inc., which may allow users to interface with their mobile devices using hands-free voice-based interactions. For example, a user may place a telephone call or dictate a text message by voice. Speech-recognition based telephone assistants have been attempted but are not ubiquitous. For example, a system developed by Wildfire Communication over twenty years ago attempted to provide telephone-based assistance, but did not relieve the user of having to use a conventional telephone to interact with the system. However, drivers may be distracted using such interfaces even if a hands-free telephone is used.
  • an approach to providing communication assistance to an operator of a vehicle makes use of software having a first component executing on a personal device of the operator as well as a second component executing on a server in communication with the personal device.
  • a method for assisting communication via a user device includes receiving at a server a voice-based call from a calling device for the user device, the voice-based call having been made to an address associated with the user device.
  • a first two-way audio link between the server and the calling device is established.
  • a second two-way audio link is also established between the server and the user device.
  • the server responds to the call by sending a first audio stream over the first link to the calling device.
  • the first audio stream includes a spoken message for alerting a calling party to the involvement of an automated assistant.
  • the server receives a second audio stream over the first link from the calling device, and sends a third audio stream over the second link to the user device, where the third audio stream includes a portion of the second audio stream.
  • Audio received over at least one of the first link and the second link is processed at the server. This processing includes waiting to receive a first voice response of a first predetermined type over the second link, and if the first voice response is received, causing the calling device and the user device to be joined by a two-way audio link.
  • aspects may include one or more of the following features.
  • the sending of the third audio stream is performed at least in part during receiving of the second audio stream.
  • the third audio stream is a delayed version of the second audio stream.
  • the first voice response consists of no spoken response (i.e., the user does not speak, for example, for a prescribed amount of time).
  • Processing the audio further includes waiting to receive a second voice response of a second predetermined type over the second link, and if the second voice response is received, causing the calling device and a voice messaging server to be joined by a two-way audio link.
  • establishing the second link is performed prior to receiving the voice-based call.
  • the second link comprises a packet-based link (e.g., a WebRTC based link).
  • Causing the calling device and the user device to be joined by a two-way audio link comprises bridging the first link and the second link, or redirecting the voice-based call to the user device.
  • in general, a method for assisting communication via a user device includes establishing a second two-way audio link between a server and a user device.
  • Audio received at the user device from a user is processed, including receiving a first voice response of a first predetermined type, wherein first voice response causes the calling device and the user device to be joined by a two-way audio link.
  • aspects may include one or more of the following features.
  • the receiving of the third audio stream is performed at least in part during receiving of the second audio stream at the server.
  • the third audio stream is a delayed version of the second audio stream.
  • An advantage of one or more embodiments is that there is little if any distraction to the user in causing a call either to be completed from a calling device to the user device or to be directed to a voice messaging system.
  • the requirement that the user merely remain silent to cause the call to be redirected, or utter a simple command to complete the call, provides a high degree of functionality with minimal distraction. More complex command input by the user can provide increased functionality without increasing distraction significantly.
  • FIG. 1 is a block diagram of a communication assistance system
  • FIG. 2 is a block diagram of components of the system of FIG. 1.
  • FIG. 1 shows a schematic block diagram of a communication assistance system 100.
  • a representative vehicle 120 is illustrated in FIG. 1, as are a set of representative remote telephones 175 (or other communication devices), but it should be understood that the system described herein is intended to support a large population of users.
  • a user 110, generally an operator of a vehicle 120, makes use of a personal device 125, such as a "smartphone".
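  • the device 125 includes a processor that can execute applications, and in particular, executes a client application 127, which is used in providing communication assistance to the user.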
  • the vehicle 120 may optionally include a built-in station 130, which communicates with the personal device 125 (e.g., via a Bluetooth radio frequency communication link 126) and extends interface functions of the personal device via a speaker 134, microphone 133, and/or touchscreen 132.
  • the personal device 125 is linked to a telephone and data network 140, for example, that includes a cellular based "3G" or "4G"/"LTE" network that provides communication services to the device, including call-based voice communication (i.e., a dedicated channel for voice data) and/or packet or message based communication.
  • the system 100 makes use of one or more server computers 150, which execute a server application 155.
  • the client application 127 executing on the user's personal device 125 is in data and/or voice based communication with the server application 155 during the providing of communication assistance to the user.
  • the user's device is associated with a conventional telephone number and/or other destination address (e.g., email address, Session Initiation Protocol (SIP) Uniform Resource Identifier (URI), etc.).
  • inbound communication, for example, from a remote telephone 175, is redirected to the server application 155 at the server 150.
  • redirection is selected by the user 110 when the user is operating the vehicle 120, or in some examples, redirection is initiated automatically when the personal device is used in the vehicle (e.g., paired with the built-in station 130).
  • One way that this redirection is accomplished is for the client application 127 executing on the personal device 125 to communicate with a component 145 (e.g., a switch, signaling node, gateway, etc.) of the telephone network to cause the redirection of inbound communication to the personal device.
  • the redirection may be turned on and off using dialing codes, such as "*72" to turn on forwarding and "*73" to turn it off.
  • the user may use built-in capabilities of the personal device 125 to cause the redirection, for example, using a "Settings>Phone>Call Forwarding" setting of a smartphone.
  • calls and optionally text messages are directed to the server application 155 as a result.
  • the server application 155 does not necessarily have a separate physical telephone line for each user 110.
  • dialed number information may be provided by the telephone network 140 when delivering a call for the user to the server application 155 in order to identify the destination (i.e., the user) for the call.
  • inbound communication may pass through a Voice-over-IP (VoIP) gateway in or at the edge of the network 140, and call setup as well as voice data may be provided to the server application 155 over a data network connection (e.g., as Internet Protocol communication).
  • Prior to receiving communication at the server application 155 for the user 110, a persistent data connection is established between the server application 155 and the client application 127, or alternatively, the client application 127 can accept new data connections that are initiated on demand by the server application 155 over a data network linking the server 150 and the personal device 125 (e.g., over a data network service of the mobile network 140).
  • When a voice call is received at the server application 155 for a particular user 110, the server accepts the call and establishes a voice communication channel between the server application and the remote telephone 175, making use of speech synthesis (either from recorded utterances, or using computer-implemented text-to-speech (TTS)) and speech recognition and/or telephone tone (DTMF) decoding capabilities at the server application 155. Handling of a received voice call by the server application generally involves audio communication between the server application and the calling telephone 175 on a first communication link, as well as audio communication between the user 110 and the server application 155 on a second communication link.
  • audio communication between the server application 155 and the user 110 makes use of a peer-to-peer audio protocol (e.g., WebRTC and/or RTP) to pass audio between the server application 155 and the client application 127.
  • the client application 127 interacts with the user via a microphone and speaker of the device 125 and/or the station 130.
  • the calling telephone 175 and the personal device 125 may at some point in the flow be linked by a bidirectional voice channel, for example, with the channel being bridged at the server application 155, or bridged or redirected via capabilities provided by the telephone network 140.
  • handling of an inbound telephone call involves the server application 155 performing steps including: (1) answering the call; (2) communicating with the caller advising the caller of its assistant nature; (3) announcing the call to the user 110, generally including forwarding of at least some audio of the communication with the caller to the user; and (4) causing the caller and the user to be in direct audio communication.
  • a call made to the user's telephone number while the user is using the system in the user's vehicle is delivered to the server application 155.
  • the server application implements the assistant function, and upon answering the call, the assistant announces itself, for instance, by saying "this is the assistant for [driver's ID]. May I help you?"
  • the caller may respond by saying "I'd like to speak with [driver's ID]", whereupon the assistant generates an audio response that says "He is driving. I'll see if he can take your call”.
  • the server application forwards the audio to the client application 127 in the vehicle, and the client application plays the audio (e.g., both the server application synthesized prompts as well as the caller's audio answers).
  • the assistant waits a few seconds for the driver to speak.
  • This functionality may be implemented at the client application 127, or alternatively, the monitored audio from within the vehicle may be passed to the server application 155, which makes this determination. In any case, this audio from the vehicle is not generally passed back to the caller.
  • Not hearing any response from the driver, the assistant then generates another audio response that says "[driver ID] is busy; may I forward your call to his voicemail?" If the caller speaks, the assistant detects the caller's verbal response and processes the response. If the driver speaks in response to the assistant's prompt indicating that the call should be completed, then the assistant connects the device 125 to the call, and the phone call proceeds normally. If the driver does not speak, or indicates that he cannot accept the call, the call is directed to voicemail.
  • connection of the call to the user may be performed in a variety of ways, including making a voice link using an Internet Protocol (e.g., SIP, WebRTC, etc.) connection, or using a cellular voice connection, for instance, with the personal device initiating a call to the server or the server initiating a voice call to the personal device (in a manner that is not subject to the forwarding setting for other calls made to the device), or using a call transfer function of the telephone network, thereby removing the server application from the call.
  • a typical interaction might involve the following exchange:
  • a remote calling device 175 makes a call via the Public Switched Telephone Network (PSTN) 240 to a Voice-over-IP (VoIP) gateway 245.
  • the user has previously redirected the telephone number of the user's personal device so that calls to it are redirected, in this case to the VoIP gateway.
  • Prior to the call being made, the server application 155 has registered with the VoIP gateway to be notified of calls made to the user's number.
  • the VoIP gateway uses the Session Initiation Protocol (SIP) to interact with the server application 155 over the public Internet 250.
  • the server application 155 accepts the call, at which point a Real-time Transport Protocol (RTP) audio connection is made between the VoIP gateway 245 and the server application 155 for the call.
  • the client application 127 has registered with the server application 155 using a WebRTC protocol over a mobile IP network 260 (e.g., a 4G cellular network) and over the public Internet 250, and upon receiving the call for the user, the server application initiates WebRTC audio communication with the client application (e.g., using a Secure RTP (SRTP) protocol set up as part of the WebRTC interaction between the server application and the client application).
  • When the server application "transfers" the call to the client, it either stays in the audio path (e.g., bridging the SIP-RTP connection and the WebRTC-SRTP connection), or alternatively, the server application sends a SIP command (e.g., REFER) to the VoIP gateway causing a redirection of the audio connection to pass directly between the VoIP gateway and the user's device 125.
  • the user interacts with the system (i.e., implemented at the client application 127 and/or the server application 155), generally using recognized speech input (or in some embodiments, a limited number of manual inputs, for example, using predefined buttons). For example, in response to hearing the initial exchange with the caller, the user may provide a command that causes one of a number of different actions to be taken.
  • Such actions may include, for example, completing the call (e.g., in a response such as "please put her through"), providing the caller with a predefined synthesized response, or a text message (i.e., a Short Message Service (SMS) message), providing a recorded response, forwarding the call to a predefined or selected alternate destination (e.g., to the user's secretary), etc.
  • the system also accepts text messages (e.g., SMS messages, email, etc.) at the server on behalf of the user, and announces their arrival in a similar manner as with incoming voice calls. For instance, the arrival of the text message is announced audibly to the user, and optionally (e.g., according to input from the user) the full content of the message is read to the user, and a response may be sent in return, either by default (such as "Dan is driving and can't answer right now") or by voice input (by speech-to-text or selection of predefined responses).
  • the server causes audio to be played to the user: "You have a text message from ZZZ. Shall I read it to you?" where ZZZ is the identity of the sender of the text message.
  • the assistant listens for a reply from the driver, and if the reply is not heard, the assistant leaves the message in the message queue on the cell phone. However, if the driver says something ("play me the message", for instance), then the assistant reads the message to the driver using a text-to-speech system, while marking the message in the message queue as "read”.
  • If the message is played to the driver, the assistant then asks "would you like me to send a delivery receipt?". Upon hearing a response from the driver, the assistant returns a text message to the sender saying "This message was delivered by [driver ID]'s voice assistant". If the driver does not respond, then the assistant simply terminates the transaction, leaving the message in the message inbox for later retrieval.
  • the assistant may be configured for more detailed replies, as described below.
  • the assistant can market itself to the caller as well.
  • the assistant announces itself to the caller and opens the channel to the user.
  • the assistant could also announce to the caller: "I am an automated assistant, freely available at YYYY.com”.
  • the assistant could say: “I'm an automated assistant. Stay on the line after the call and I can tell you about myself and send a link to download me to your phone for free.” or "This automated assistant is available - press 1 for more information”.
  • the assistant could provide some basic information on how the assistant works and, if the caller agrees, send an SMS with a WWW link to download the app.
  • the notifications are returned to the sender in text form.
  • the assistant may modify its actions based on the history of a particular user and on a record of past interactions. For instance, if a particular caller is always shunted to voicemail, the assistant may "learn" to recognize this situation, and if this caller calls it can automatically pass the call to voicemail (possibly subject to override by the driver). It may learn this circumstance using standard machine learning techniques, or with a neural network system.
  • Although buttons are not ordinarily used in user interactions involving the attentive assistant, they may provide "emergency" services. For instance, a call that has been connected through inadvertent miscommunication between the driver and the assistant may be terminated using the "hang up" button on the driver's steering wheel (as he might do after a standard Bluetooth-enabled phone call). On the other hand, if the driver did not respond verbally to an offer to connect a call, but wanted the call connected, a push of the "call" button on the steering wheel could be interpreted as a signal to the application that the driver wanted to take the call. Other uses of the steering wheel buttons may enhance the non-standard use of this attentive assistant.
  • the assistant also uses machine learning to better handle calls. It starts by creating a profile for each caller based on the incoming phone number.
  • the assistant detects that the caller is from an unrecognized number and introduces herself and explains how she works ("Hi. Dan is currently driving. I'm his AI assistant and help him answer his calls and take messages. Can you let me know what this is regarding?").
  • the assistant identifies the caller and recognizes that in a similar situation the user wanted to speak immediately, so does not ask what the call is regarding: "Hi, Steve. It's nice to talk to you again. Let me see if Dan's able to talk."
  • the software may include instructions for causing a processor at the user device or server computer to perform functions described above, with the software being stored on a non-transitory machine-readable medium, or transmitted (e.g., to the user device) from a storage to the user device or server computer over a communication network (e.g., downloading an application ("app") to the user's smartphone).
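The caller-history behavior described above (automatically routing a caller who is always sent to voicemail, and recognizing returning callers by incoming number) could start from something as simple as an outcome log keyed by phone number. The sketch below substitutes a plain frequency rule for the machine learning or neural network approach the text mentions; `CallerHistory`, `min_calls`, and the outcome strings are hypothetical, and in the described system the driver could still override the suggestion.

```python
from typing import Dict, List


class CallerHistory:
    """Outcome log keyed by the caller's phone number (hypothetical helper)."""

    def __init__(self, min_calls: int = 3) -> None:
        if min_calls < 1:
            raise ValueError("min_calls must be at least 1")
        self.min_calls = min_calls
        self._outcomes: Dict[str, List[str]] = {}

    def record(self, number: str, outcome: str) -> None:
        """Remember how a call from `number` ended, e.g. "voicemail" or "connected"."""
        self._outcomes.setdefault(number, []).append(outcome)

    def is_known(self, number: str) -> bool:
        """Known callers could get a shorter greeting; unknown ones get the introduction."""
        return number in self._outcomes

    def auto_voicemail(self, number: str) -> bool:
        """Suggest voicemail if the last `min_calls` calls from `number` all went there."""
        recent = self._outcomes.get(number, [])[-self.min_calls:]
        return len(recent) >= self.min_calls and all(o == "voicemail" for o in recent)
```

A single connected call resets the suggestion, since only the most recent `min_calls` outcomes are considered.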

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An approach to providing communication assistance to an operator of a vehicle makes use of software having a first component executing on a personal device of the operator as well as a second component executing on a server in communication with the personal device. In some implementations, handling a call involves establishing a first two-way audio link between the server and the calling device, and a second two-way audio link between the server and the user device. The server passes some of the audio from the calling device to the user device, and monitors a user's voice input, or lack thereof, to determine how to handle the call.

Description

ATTENTIVE ASSISTANT
Cross-Reference to Related Applications
[001] This application claims the benefit of U.S. Provisional Application No. 62/244,417, filed October 21, 2015, titled "THE ATTENTIVE ASSISTANT." This application is incorporated herein by reference.
Background
[002] This invention relates to a communication assistant, and in particular to an automated assistant for use by an operator of a motor vehicle, or of other equipment, in performing communication related tasks.
[003] Mobile devices are ubiquitous in today's connected environment. There are more cell phones in the United States than there are people. Drivers often use mobile communications to transact business, to access social media, or for other personal communications tasks. Some states have legislated that only hands-free communication devices may be used in cars, but scientific studies of distracted driving suggest that this constraint does not free the driver of substantial distraction. The rise of text communications among younger people has further exacerbated the problem, with findings that as many as 30% of traffic accidents are caused by texting-while-driving users.
[004] Mobile devices today may include voice-based interfaces, for instance, the Siri™ interface provided by Apple Inc., which may allow users to interface with their mobile devices using hands-free voice-based interactions. For example, a user may place a telephone call or dictate a text message by voice. Speech-recognition based telephone assistants have been attempted but are not ubiquitous. For example, a system developed by Wildfire Communication over twenty years ago attempted to provide telephone-based assistance, but did not relieve the user of having to use a conventional telephone to interact with the system. However, drivers may be distracted using such interfaces even if a hands-free telephone is used.
Summary
[005] In a general aspect, an approach to providing communication assistance to an operator of a vehicle makes use of software having a first component executing on a personal device of the operator as well as a second component executing on a server in communication with the personal device.
[006] In one aspect, a method for assisting communication via a user device includes receiving at a server a voice-based call from a calling device for the user device, the voice-based call having been made to an address associated with the user device. A first two-way audio link between the server and the calling device is established. A second two-way audio link is also established between the server and the user device. The server responds to the call by sending a first audio stream over the first link to the calling device. The first audio stream includes a spoken message for alerting a calling party to the involvement of an automated assistant. The server receives a second audio stream over the first link from the calling device, and sends a third audio stream over the second link to the user device, where the third audio stream includes a portion of the second audio stream. Audio received over at least one of the first link and the second link is processed at the server. This processing includes waiting to receive a first voice response of a first predetermined type over the second link, and if the first voice response is received, causing the calling device and the user device to be joined by a two-way audio link.
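The sequence in [006] can be sketched as a small server-side routine. This is a minimal illustration under stated assumptions, not the patented implementation: `AudioLink`, `handle_inbound_call`, and `wait_for_user_response` are hypothetical names, audio is modeled as text placeholders rather than encoded media, and, following the example in the detailed description, any spoken reply from the user joins the call while silence falls through to voicemail.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AudioLink:
    """Hypothetical stand-in for a two-way audio link; it only records what was sent."""
    name: str
    sent: List[str] = field(default_factory=list)

    def send(self, audio: str) -> None:
        self.sent.append(audio)


def handle_inbound_call(
    caller_link: AudioLink,    # first link: server <-> calling device
    user_link: AudioLink,      # second link: server <-> user device
    caller_audio: List[str],   # second audio stream, as placeholder chunks
    wait_for_user_response: Callable[[], Optional[str]],
    driver_id: str,
) -> str:
    """Return "joined" or "voicemail" for one inbound call (illustrative only)."""
    # First audio stream: alert the caller that an automated assistant is involved.
    caller_link.send(f"This is the assistant for {driver_id}. May I help you?")
    # Third audio stream: pass the caller's audio on to the user over the second link.
    for chunk in caller_audio:
        user_link.send(chunk)
    # The user's reply (None means silence) is never forwarded to the caller.
    reply = wait_for_user_response()
    if reply is not None and reply.strip():
        return "joined"     # join caller and user with a two-way audio link
    return "voicemail"      # silence: route the caller to voice messaging
```

In a deployment, the two links would correspond to the gateway-side and client-side audio legs the document describes; the sketch captures only the ordering of the streams and the final decision.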
[007] Aspects may include one or more of the following features.
[008] The sending of the third audio stream is performed at least in part during receiving of the second audio stream.
[009] The third audio stream is a delayed version of the second audio stream.
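Paragraphs [008] and [009] together describe relaying the caller's audio to the user while it is still arriving, with the relayed stream lagging the original. A fixed-length FIFO is one simple way to produce such a delayed copy. The sketch below assumes fixed-size audio frames and pads the start of the output with a caller-supplied silence frame; `delayed_relay` and its parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def delayed_relay(frames: Iterable[bytes], delay_frames: int, silence: bytes) -> Iterator[bytes]:
    """Yield the caller's frames `delay_frames` positions late (hypothetical helper).

    Because this is a generator, each output frame can be sent to the user
    while later input frames are still arriving from the caller.
    """
    if delay_frames < 0:
        raise ValueError("delay_frames must be non-negative")
    # Pre-fill the FIFO so the first outputs are silence.
    fifo = deque([silence] * delay_frames)
    for frame in frames:
        fifo.append(frame)
        yield fifo.popleft()
    # Caller stream ended: drain whatever is still buffered.
    while fifo:
        yield fifo.popleft()
```

With 20 ms frames (an assumed frame size), `delay_frames=2` would delay the relayed audio by 40 ms.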
[010] The voice response from the user device is not sent to the calling device.
[011] The first voice response consists of no spoken response (i.e., the user does not speak, for example, for a prescribed amount of time).
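Paragraph [011] treats the absence of speech for a prescribed time as a response in its own right. One simple way to detect that is an energy threshold applied frame by frame, sketched below under assumptions not found in the source: 16-bit signed PCM in native byte order, a fixed frame duration, and a hand-picked RMS threshold. `frame_rms` and `user_stayed_silent` are hypothetical names, and a production system would more likely use a trained voice-activity detector.

```python
import array
import math
from typing import Iterable


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of one frame of 16-bit signed PCM in native byte order."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def user_stayed_silent(
    frames: Iterable[bytes], frame_ms: int, timeout_ms: int, threshold: float
) -> bool:
    """True if no frame in the first `timeout_ms` of audio exceeds `threshold`.

    Returns False as soon as a loud frame appears, and also if the stream
    ends before a full `timeout_ms` of audio has been observed.
    """
    elapsed_ms = 0
    for frame in frames:
        if elapsed_ms >= timeout_ms:
            break
        if frame_rms(frame) > threshold:
            return False  # the user started speaking
        elapsed_ms += frame_ms
    return elapsed_ms >= timeout_ms
```

With 20 ms frames, `timeout_ms=3000` would approximate the "few seconds" wait mentioned in the description; both values are assumptions.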
[012] Processing the audio further includes waiting to receive a second voice response of a second predetermined type over the second link, and if the second voice response is received, causing the calling device and a voice messaging server to be joined by a two-way audio link.
[013] Establishing the second link is performed prior to receiving the voice-based call.
[014] The second link comprises a packet-based link (e.g., a WebRTC based link).
[015] Causing the calling device and the user device to be joined by a two-way audio link comprises bridging the first link and the second link, or redirecting the voice-based call to the user device.
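For the bridging option in [015], the server stays in the audio path and forwards frames in both directions between the two links. The asyncio sketch below models each direction of each link as a queue and uses `None` as an end-of-stream marker; the queue model and the `pump`/`bridge` names are assumptions for illustration. The other option in [015], redirecting the call so the server leaves the media path, would instead be handled through call signaling, for example the SIP REFER to the VoIP gateway mentioned in the description.

```python
import asyncio
from typing import Tuple


async def pump(src: asyncio.Queue, dst: asyncio.Queue) -> int:
    """Copy frames from src to dst until a None end-of-stream marker; return the frame count."""
    count = 0
    while True:
        frame = await src.get()
        await dst.put(frame)  # also propagate the end marker so the far side sees it
        if frame is None:
            return count
        count += 1


async def bridge(
    from_caller: asyncio.Queue,
    to_caller: asyncio.Queue,
    from_user: asyncio.Queue,
    to_user: asyncio.Queue,
) -> Tuple[int, int]:
    """Relay audio both ways between the first (caller) and second (user) links."""
    caller_to_user, user_to_caller = await asyncio.gather(
        pump(from_caller, to_user), pump(from_user, to_caller)
    )
    return caller_to_user, user_to_caller
```

A real bridge would also need to tear down both directions when either party hangs up; here each direction simply stops at its own end marker.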
[016] In another aspect, in general, a method for assisting communication via a user device includes establishing a second two-way audio link between a server and a user device. A call made to the user device (e.g., from a calling device to a number for the user device) is handled at the user device, including by receiving a third audio stream over the second link, where the third audio stream includes a portion of a second audio stream received from a calling device at the server. Audio received at the user device from a user is processed, including receiving a first voice response of a first predetermined type, wherein the first voice response causes the calling device and the user device to be joined by a two-way audio link.
[017] Aspects may include one or more of the following features.
[018] The receiving of the third audio stream is performed at least in part during receiving of the second audio stream at the server.
[019] The third audio stream is a delayed version of the second audio stream.
[020] Establishing the second link is performed prior to the server receiving the second audio stream.
[021] An advantage of one or more embodiments is that there is little if any distraction to the user in causing a call either to be completed from a calling device to the user device or to be directed to a voice messaging system. In a particularly simple embodiment, in response to "eavesdropping" on an interaction between the assistant and the caller, the requirement that the user merely remain silent to cause the call to be redirected, or utter a simple command to complete the call, provides a high degree of functionality with minimal distraction. More complex command input by the user can provide increased functionality without increasing distraction significantly.
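The interaction in [021] reduces to a small decision rule: silence sends the caller to voice messaging, a simple command completes the call, and richer commands (the description elsewhere lists replying by text message and forwarding to an alternate destination) add functionality. The sketch below assumes the user's speech has already been recognized into a transcript and uses hypothetical keyword lists in place of a real intent recognizer; `Action`, `RULES`, and `decide` are illustrative names, and unmatched input falls back to voicemail as a conservative default.

```python
import re
from enum import Enum
from typing import Optional


class Action(Enum):
    CONNECT = "connect"
    VOICEMAIL = "voicemail"
    SEND_TEXT = "send_text"
    FORWARD = "forward"


# Hypothetical vocabulary, checked in order; the source does not specify one.
RULES = [
    (Action.FORWARD, ("forward", "secretary")),
    (Action.SEND_TEXT, ("text",)),
    (Action.VOICEMAIL, ("voicemail", "no", "can't", "busy")),
    (Action.CONNECT, ("put her through", "put him through", "connect", "yes", "take it")),
]


def decide(transcript: Optional[str]) -> Action:
    """Map the user's recognized reply (None or blank means silence) to an action."""
    if transcript is None or not transcript.strip():
        return Action.VOICEMAIL  # silence: send the caller to voice messaging
    text = transcript.lower()
    for action, phrases in RULES:
        # Whole-word matching, so "no" does not fire inside "know".
        if any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases):
            return action
    return Action.VOICEMAIL  # unrecognized reply: conservative default
```

Checking the rules in a fixed order keeps the behavior predictable when a reply contains words from more than one list.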
[022] Other features and advantages of the invention are apparent from the following description, and from the claims.
Description of Drawings
[023] FIG. 1 is a block diagram of a communication assistance system;
[024] FIG. 2 is a block diagram of components of the system of FIG. 1.
Description
[025] FIG. 1 shows a schematic block diagram of a communication assistance system 100. A representative vehicle 120 is illustrated in FIG. 1, as are a set of representative remote telephones 175 (or other communication devices), but it should be understood that the system described herein is intended to support a large population of users. Generally, a user 110, typically an operator of a vehicle 120, makes use of a personal device 125, such as a "smartphone". The device 125 includes a processor that can execute applications, and in particular, executes a client application 127, which is used in providing communication assistance to the user. The vehicle 120 may optionally include a built-in station 130, which communicates with the personal device 125 (e.g., via a Bluetooth radio frequency communication link 126) and extends interface functions of the personal device via a speaker 134, microphone 133, and/or touchscreen 132.
[026] The personal device 125 is linked to a telephone and data network 140, for example, one that includes a cellular "3G" or "4G"/"LTE" network that provides communication services to the device, including call-based voice communication (i.e., a dedicated channel for voice data) and/or packet- or message-based communication.
[027] The system 100 makes use of one or more server computers 150, which execute a server application 155. In general, the client application 127 executing on the user's personal device 125 is in data and/or voice based communication with the server application 155 during the providing of communication assistance to the user.
[028] The user's device is associated with a conventional telephone number and/or other destination address (e.g., an email address, a Session Initiation Protocol (SIP) Uniform Resource Identifier (URI), etc.), based on which other devices, such as a remote telephone 175, can initiate communication to the user's personal device 125. Communication based on a conventional telephone number is described as a typical example.
[029] In general, inbound communication, for example, from a remote telephone 175, is redirected to the server application 155 at the server 150. In one approach, such redirection is selected by the user 110 when the user is operating the vehicle 120; in some examples, redirection is initiated automatically when the personal device is used in the vehicle (e.g., when it is paired with the built-in station 130). One way that this redirection is accomplished is for the client application 127 executing on the personal device 125 to communicate with a component 145 (e.g., a switch, signaling node, gateway, etc.) of the telephone network to cause the redirection of inbound communication directed to the personal device. Various approaches to causing this redirection may be used, depending at least in part on the capabilities of the telephone network 140. For example, in certain networks, the redirection may be turned on and off using dialing codes, such as "*72" to turn on forwarding and "*73" to turn it off. In some embodiments, rather than the client application 127 causing the redirection, the user may use built-in capabilities of the personal device 125 to cause the redirection, for example, using a "Settings>Phone>Call Forwarding" setting of a smartphone. In any case, calls, and optionally text messages, are directed to the server application 155 as a result. The server application 155 does not necessarily have a separate physical telephone line for each user 110. For example, dialed number identification service (DNIS) or other signaling information may be provided by the telephone network 140 when delivering a call for the user to the server application 155, in order to identify the destination (i.e., the user) of the call. In some implementations (not shown in FIG. 1), inbound communication may pass through a Voice-over-IP (VoIP) gateway in or at the edge of the network 140, and call setup as well as voice data may be provided to the server application 155 over a data network connection (e.g., as Internet Protocol communication).
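The redirection and routing described in paragraph [029] can be sketched as two small functions. The first builds the digit string for enabling or disabling forwarding, using the "*72"/"*73" codes from the description as an example (real codes vary by carrier, and the assumption that the forward-to number is appended after "*72" should be checked against the carrier). The second uses the dialed number delivered with the call (DNIS) to look up which user a forwarded call is for. The function names and the registry dictionary are hypothetical.

```python
from typing import Dict, Optional

# Example forwarding codes from the description; actual codes vary by carrier.
FORWARD_ON_CODE = "*72"
FORWARD_OFF_CODE = "*73"


def forwarding_dial_string(enable: bool, server_number: str = "") -> str:
    """Return the digits the client would dial to toggle call forwarding.

    Enabling forwarding appends the number of the server (or VoIP gateway)
    that should receive redirected calls.
    """
    if enable:
        if not server_number:
            raise ValueError("a forward-to number is required to enable forwarding")
        return FORWARD_ON_CODE + server_number
    return FORWARD_OFF_CODE


def user_for_call(dnis: str, registry: Dict[str, str]) -> Optional[str]:
    """Map the dialed number (DNIS) of an inbound call to a user ID.

    `registry` maps each subscriber's own number (digits only) to a user ID,
    so a single server line can serve many users.
    """
    digits = "".join(ch for ch in dnis if ch.isdigit())
    return registry.get(digits)
```

Keying the registry by the originally dialed number is what lets the server application share lines across users, as the description notes.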
[030] Prior to receiving communication at the server application 155 for the user 110, a persistent data connection is established between the server application 155 and the client application 127, or alternatively, the client application 127 can accept new data connections that are initiated on demand by the server application 155 over a data network linking the server 150 and the personal device 125 (e.g., over a data network service of the mobile network 140).
[031] When a voice call is received at the server application 155 for a particular user 110, the server accepts the call and establishes a voice communication channel between the server application and the remote telephone 175, making use of speech synthesis (either from recorded utterances, or using computer-implemented text-to-speech (TTS)) and speech recognition and/or telephone tone (DTMF) decoding capabilities at the server application 155. Handling of a received voice call by the server application generally involves audio communication between the server application and the calling telephone 175 on a first communication link, as well as audio communication between the user 110 and the server application 155 on a second communication link. In one implementation, audio communication between the server application 155 and the user 110 makes use of a peer-to-peer audio protocol (e.g., WebRTC and/or RTP) to pass audio between the server application 155 and the client application 127. The client application 127 interacts with the user via a microphone and speaker of the device 125 and/or the station 130.
Depending on the flow of call handling, as described more fully below, the calling telephone 175 and the personal device 125 may at some point in the flow be linked by a bidirectional voice channel, for example, with the channel being bridged at the server application 155, or bridged or redirected via capabilities provided by the telephone network 140.
[032] In general, handling of an inbound telephone call involves the server application 155 performing steps including: (1) answering the call; (2) advising the caller that it is an automated assistant; (3) announcing the call to the user 110, generally including forwarding at least some audio of the communication with the caller to the user; and (4) depending on the actions of the driver, either causing the caller and the user to be in direct audio communication (e.g., bridging the call to include the caller, the server, and the in-vehicle user) or forwarding the call to a voicemail repository.
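The four steps of paragraph [032] can be read as a small state machine. The sketch below encodes them as an enum with a transition function; the state names and the `user_accepted` flag are assumptions, and the work done in each step (prompting, forwarding audio, bridging) is left out.

```python
from enum import Enum, auto


class CallState(Enum):
    ANSWERED = auto()           # (1) server has answered the call
    CALLER_ADVISED = auto()     # (2) caller told it is talking to an assistant
    ANNOUNCED_TO_USER = auto()  # (3) caller audio forwarded to the user
    CONNECTED = auto()          # (4a) caller and user in direct audio communication
    VOICEMAIL = auto()          # (4b) call forwarded to a voicemail repository


def next_state(state: CallState, user_accepted: bool = False) -> CallState:
    """Advance an inbound call by one step; terminal states stay put."""
    if state is CallState.ANSWERED:
        return CallState.CALLER_ADVISED
    if state is CallState.CALLER_ADVISED:
        return CallState.ANNOUNCED_TO_USER
    if state is CallState.ANNOUNCED_TO_USER:
        # Step (4) depends on what the driver did during the announcement.
        return CallState.CONNECTED if user_accepted else CallState.VOICEMAIL
    return state
```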
[033] In an example of handling of an inbound call, a call made to the user's telephone number while the user is using the system in the user's vehicle is delivered to the server application 155. The server application implements the assistant function, and upon answering the call, the assistant announces itself, for instance, by saying "this is the assistant for [driver's ID]. May I help you?" The caller may respond by saying "I'd like to speak with [driver's ID]", whereupon the assistant generates an audio response that says "He is driving. I'll see if he can take your call". During this exchange with the caller (or optionally with a delay or after the completion of the interaction), the server application forwards the audio to the client application 127 in the vehicle, and the client application plays the audio (e.g., both the synthesized prompts from the server application and the caller's spoken answers). After this initial exchange, the assistant waits a few seconds for the driver to speak. Detecting whether the driver has spoken may be implemented at the client application 127, or alternatively, the monitored audio from within the vehicle may be passed to the server application 155, which makes this determination. In either case, this audio from the vehicle is not generally passed back to the caller. If it hears no response from the driver, the assistant then generates another audio response that says "[driver ID] is busy; may I forward your call to his voicemail?" If the caller speaks, the assistant detects the caller's verbal response and processes it. If, in response to the assistant's prompt, the driver speaks to indicate that the call should be completed, then the assistant connects the device 125 to the call, and the phone call proceeds normally. If the driver does not speak, or indicates that he cannot accept the call, the call is directed to voicemail.
As introduced above, the connection of the call to the user may be performed in a variety of ways, including making a voice link over an Internet Protocol-based connection (e.g., SIP, WebRTC, etc.); using a cellular voice connection, for instance, with the personal device initiating a call to the server or the server initiating a voice call to the personal device (in a manner that is not subject to the forwarding setting for other calls made to the device); or using a call transfer function of the telephone network, thereby removing the server application from the call. A typical interaction might involve the following exchange:
• [Assistant]: Hi. I'm Dan's assistant Samantha.
• [Caller]: This is Cora. I wanted to talk to Dan about the press release we're working on.
• [Assistant]: He's currently in his car. Would you like me to see if he's available to speak with you?
• [Caller]: That would be great.
• [Assistant]: ok. Hold on a second and I'll see.
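The decision the assistant makes after waiting for the driver, as described in paragraph [033], can be sketched as follows. This assumes a speech recognizer that returns a transcript, or None if no speech was detected during the waiting period; the keyword lists and the name `decide_after_wait` are hypothetical, and a deployed system would more plausibly use an intent classifier than keyword matching.

```python
import re
from typing import Optional, Sequence

# Hypothetical vocabularies; the description does not specify them.
ACCEPT_PHRASES = ("put her through", "put him through", "connect", "yes", "ok")
DECLINE_PHRASES = ("no", "can't", "cannot", "don't", "not now", "voicemail")


def _mentions(text: str, phrases: Sequence[str]) -> bool:
    # Whole-word matching, so "no" does not match inside "now" or "know".
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def decide_after_wait(transcript: Optional[str]) -> str:
    """Return "connect" or "voicemail" for the driver's response.

    `transcript` is None when the driver stayed silent for the whole
    waiting period; silence routes the call to voicemail.
    """
    if transcript is None:
        return "voicemail"
    text = transcript.lower()
    # Check declines first so "don't connect" is not read as acceptance.
    if _mentions(text, DECLINE_PHRASES):
        return "voicemail"
    if _mentions(text, ACCEPT_PHRASES):
        return "connect"
    return "voicemail"  # unrecognized reply: take the conservative path
```

Treating silence and unrecognized speech alike as "voicemail" mirrors the low-distraction goal of paragraph [021]: the driver only has to act to take the call.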
[034] Referring to FIG. 2, in an embodiment of the system 100 described above, a remote calling device 175 makes a call via the Public Switched Telephone Network (PSTN) 240 to a Voice-over-IP (VoIP) gateway 245. As discussed above, the user has previously configured the telephone number of the user's personal device so that calls to it are redirected, in this case to the VoIP gateway. Prior to the call being made, the server application 155 has registered with the VoIP gateway to be notified of calls made to the user's number. When the call comes in, in this example, the VoIP gateway uses the Session Initiation Protocol (SIP) to interact with the server application 155 over the public Internet 250. The server application 155 accepts the call, at which point a Real-time Transport Protocol (RTP) audio connection is made between the VoIP gateway 245 and the server application 155 for the call. Previously, the client application 127 has registered with the server application 155 using a WebRTC protocol over a mobile IP network 260 (e.g., a 4G cellular network) and over the public Internet 250, and upon receiving the call for the user, the server application initiates WebRTC audio communication with the client application (e.g., using the Secure RTP (SRTP) protocol set up as part of the WebRTC interaction between the server application and the client application). At this point the server application passes audio data between the caller and the client application. When the server application "transfers" the call to the client, it either stays in the audio path (e.g., bridging the SIP-RTP connection and the WebRTC-SRTP connection), or alternatively sends a SIP command (e.g., REFER) to the VoIP gateway, causing the audio connection to be redirected to pass directly between the VoIP gateway and the user's device 125.
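When the server application hands the call off rather than staying in the audio path, paragraph [034] says it may send a SIP REFER (the method defined in RFC 3515) asking the gateway to connect to the user's device. The sketch below only formats such an in-dialog request as text; every URI, tag, Call-ID, and branch value is a made-up placeholder, and a real deployment would rely on a SIP stack that also manages the dialog state, authentication, and the NOTIFY messages a REFER triggers.

```python
def build_refer(
    gateway_uri: str,
    server_uri: str,
    target_uri: str,
    call_id: str,
    from_tag: str,
    to_tag: str,
    cseq: int,
    via_host: str,
    branch: str,
) -> str:
    """Format an in-dialog SIP REFER asking the gateway to reach `target_uri`."""
    lines = [
        f"REFER {gateway_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{server_uri}>;tag={from_tag}",   # server's side of the dialog
        f"To: <{gateway_uri}>;tag={to_tag}",      # gateway's side of the dialog
        f"Call-ID: {call_id}",                    # same Call-ID as the original call
        f"CSeq: {cseq} REFER",
        f"Contact: <{server_uri}>",
        f"Refer-To: <{target_uri}>",              # where the gateway should send the call
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line before the (empty) body.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Per RFC 3261, the Via branch parameter should begin with the "z9hG4bK" magic cookie, and the CSeq number must increase within the dialog.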
[035] In other somewhat more complex call handling, the user interacts with the system (i.e., implemented at the client application 127 and/or the server application 155), generally using recognized speech input (or in some embodiments, a limited number of manual inputs, for example, using predefined buttons). For example, in response to hearing the initial exchange with the caller, the user may provide a command that causes one of a number of different actions to be taken. Such actions may include, for example, completing the call (e.g., in a response such as "please put her through"), providing the caller with a predefined synthesized response, or a text message (i.e., a Short Message Service (SMS) message), providing a recorded response, forwarding the call to a predefined or selected alternate destination (e.g., to the user's secretary), etc.
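The user commands listed in paragraph [035] can be dispatched with a lookup from recognized phrases to actions. In this sketch the phrases, the `Action` names, and exact matching after normalization are assumptions; the description does not specify a command vocabulary.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    COMPLETE_CALL = "complete_call"
    CANNED_REPLY = "canned_reply"      # predefined synthesized response
    SEND_SMS = "send_sms"
    RECORDED_REPLY = "recorded_reply"
    FORWARD = "forward"                # e.g., to the user's secretary


# Hypothetical command phrases mapped to actions.
COMMANDS: Dict[str, Action] = {
    "please put her through": Action.COMPLETE_CALL,
    "please put him through": Action.COMPLETE_CALL,
    "tell them i'll call back": Action.CANNED_REPLY,
    "text them": Action.SEND_SMS,
    "play my message": Action.RECORDED_REPLY,
    "send it to my secretary": Action.FORWARD,
}


def dispatch(utterance: str) -> Optional[Action]:
    """Map a recognized utterance to an action, or None if unrecognized."""
    # Lowercase, trim edge punctuation, and collapse internal whitespace.
    key = " ".join(utterance.lower().strip(" .!?").split())
    return COMMANDS.get(key)
```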
[036] The system also accepts text messages (e.g., SMS messages, email, etc.) at the server on behalf of the user, and announces their arrival in a similar manner as with incoming voice calls. For instance, the arrival of a text message is announced audibly to the user, optionally (e.g., according to input from the user) the full content of the message is read to the user, and a response may be sent in return: either a default response, such as "Dan is driving and can't answer right now", or one specified by voice input (via speech-to-text or selection of predefined responses).
[037] As an example interaction, when a text message is received for the user at the server, the server causes audio to be played to the user: "You have a text message from ZZZ. Shall I read it to you?", where ZZZ is the identity of the sender of the text message. The assistant then listens for a reply from the driver, and if no reply is heard, the assistant leaves the message in the message queue on the cell phone. However, if the driver says something ("play me the message", for instance), then the assistant reads the message to the driver using a text-to-speech system and marks the message in the message queue as "read".
[038] If the message is played to the driver, the assistant then asks "would you like me to send a delivery receipt?". Upon hearing a response from the driver, the assistant returns a text message to the sender saying "This message was delivered by [driver ID]'s voice assistant". If the driver does not respond, then the assistant simply terminates the transaction, leaving the message in the message inbox for later retrieval. The assistant may be configured for more detailed replies, as described below.
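The text-message flow of paragraphs [037] and [038] amounts to two yes/no questions with silence as the default "no". A minimal sketch, assuming the driver's reply to each prompt is reported as True (some response heard) or False (silence), and using a hypothetical `Message` record; the prompt and receipt wording is taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    sender: str
    body: str
    read: bool = False


def handle_text(
    msg: Message, driver_id: str, wants_read: bool, wants_receipt: bool
) -> List[str]:
    """Return what is spoken to the driver and any reply, updating msg.read.

    Items are tagged "say:" (played to the driver) or "sms:" (sent back
    to the sender).
    """
    out = [f"say:You have a text message from {msg.sender}. Shall I read it to you?"]
    if not wants_read:
        return out  # silence: the message stays unread in the queue
    out.append(f"say:{msg.body}")
    msg.read = True
    out.append("say:Would you like me to send a delivery receipt?")
    if wants_receipt:
        out.append(f"sms:This message was delivered by {driver_id}'s voice assistant")
    return out
```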
[039] The assistant can market itself to the caller as well. When a call or message is handled, the assistant announces itself to the caller and opens the channel to the user. Optionally, while waiting for the driver to respond, the assistant could also announce to the caller: "I am an automated assistant, freely available at YYYY.com". Alternatively, it might say: "I'm an automated assistant. Stay on the line after the call and I can tell you about myself and send a link to download me to your phone for free." or "This automated assistant is available - press 1 for more information". At the end of the call, the assistant could provide some basic information on how the assistant works and, if the caller agrees, send an SMS with a WWW link to download the app. Of course, for the messaging application, the notifications are returned to the sender in text form.
[040] The assistant may modify its actions based on the history of a particular user and on a record of past interactions. For instance, if a particular caller is always shunted to voicemail, the assistant may "learn" to recognize this situation, and when this caller calls, it can automatically pass the call to voicemail (possibly subject to override by the driver). It may learn this circumstance using standard machine learning techniques, or with a neural network system.
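As a much simpler stand-in for the machine learning that paragraph [040] mentions, the rule "this caller has always been sent to voicemail" can be tracked with per-caller counts. The minimum of three prior calls, the class name, and keying callers by phone number are assumptions; the driver's override is represented by a `driver_override` flag.

```python
from collections import defaultdict
from typing import Dict


class VoicemailHistory:
    """Per-caller outcome counts for auto-routing habitual voicemail callers."""

    def __init__(self, min_calls: int = 3) -> None:
        self.min_calls = min_calls
        self.calls: Dict[str, int] = defaultdict(int)
        self.to_voicemail: Dict[str, int] = defaultdict(int)

    def record(self, caller: str, went_to_voicemail: bool) -> None:
        self.calls[caller] += 1
        if went_to_voicemail:
            self.to_voicemail[caller] += 1

    def auto_voicemail(self, caller: str, driver_override: bool = False) -> bool:
        """True if every past call (at least min_calls of them) ended in voicemail."""
        if driver_override:
            return False
        n = self.calls.get(caller, 0)
        return n >= self.min_calls and self.to_voicemail.get(caller, 0) == n
```

A single accepted call resets the pattern, which keeps the rule conservative; a learned model could instead weigh recency and context.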
[041] While buttons are not ordinarily used in user interactions involving the attentive assistant, they may provide "emergency" services. For instance, a call that has been connected through inadvertent miscommunication between the driver and the assistant may be terminated using the "hang up" button on the driver's steering wheel (as the driver might do after a standard Bluetooth-enabled phone call). On the other hand, if the driver did not respond verbally to an offer to connect a call but wanted the call connected, a push of the "call" button on the steering wheel could be interpreted as a signal to the application that the driver wanted to take the call. Other uses of the steering wheel buttons may further enhance this non-standard use of the attentive assistant.
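The button semantics of paragraph [041] depend on the call state at the moment a button is pressed. A sketch with hypothetical button and state names:

```python
def on_wheel_button(button: str, call_state: str) -> str:
    """Interpret a steering-wheel button press for the attentive assistant.

    button: "hang_up" or "call". call_state: "offer_pending" while the
    assistant is waiting for the driver to accept, "connected" once the
    caller and driver are joined. Returns the action to take.
    """
    if button == "hang_up" and call_state == "connected":
        return "terminate_call"  # undo an inadvertent connection
    if button == "call" and call_state == "offer_pending":
        return "connect_call"    # driver accepts without speaking
    return "ignore"
```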
[042] The assistant also uses machine learning to better handle calls. It starts by creating a profile for each caller based on the incoming phone number.
[043] All available metadata (contacts in the user's address book, information in the user's social graph, lookups of the phone's location based on its exchange, etc.) and the responses the user gives are associated with this profile. This information, along with any context about the current call (date, time, location, how fast the user is driving, etc.), is used by machine learning models to predict how a new call should be handled.
[044] For example, the first time Steve calls into the system, the assistant detects that the caller is from an unrecognized number, introduces herself, and explains how she works ("Hi. Dan is currently driving. I'm his AI assistant and help him answer his calls and take messages. Can you let me know what this is regarding?"). The next time Steve calls, the assistant identifies the caller and recognizes that in a similar situation the user wanted to speak with him immediately, so it does not ask what the call is regarding: "Hi, Steve. It's nice to talk to you again. Let me see if Dan's able to talk."
[045] Over time, as more data is fed into the system to create better models, the AI assistant becomes better at predicting what the appropriate action is and simply does it automatically.
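Paragraphs [042] to [045] do not name a particular model. One plausible reading, sketched here under that assumption, is an online logistic regression that estimates whether the user will take a call from features of the caller's profile and the call context, updated after each call with what the user actually chose. The feature names, scaling, learning rate, and 0.5 threshold are illustrative choices, not details from the description.

```python
import math
from typing import Dict


def features(in_contacts: bool, hour: int, speed_kmh: float, past_accepts: int) -> Dict[str, float]:
    """Build a small feature vector from caller metadata and call context."""
    return {
        "bias": 1.0,
        "in_contacts": 1.0 if in_contacts else 0.0,
        "work_hours": 1.0 if 9 <= hour < 17 else 0.0,
        "speed": speed_kmh / 100.0,                # crude scaling of driving speed
        "past_accepts": math.log1p(past_accepts),  # damped count of calls taken
    }


class TakeCallModel:
    """Online logistic regression estimating P(user takes the call)."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: Dict[str, float] = {}

    def prob(self, x: Dict[str, float]) -> float:
        z = sum(self.w.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Dict[str, float], took_call: bool) -> None:
        # One stochastic gradient descent step on the logistic (log) loss.
        err = (1.0 if took_call else 0.0) - self.prob(x)
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) + self.lr * err * v

    def predict_take(self, x: Dict[str, float]) -> bool:
        return self.prob(x) >= 0.5
```

In the Steve example, a high predicted probability could let the assistant skip asking what the call is regarding and go straight to checking whether Dan is able to talk.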
[046] It should be understood that various alternative implementations can provide the functionality described above. For example, some or all of the functions described above as being implemented at the server may be hosted in the vehicle, for example, on the user's communication device. Therefore, there may not be separate client and server software. An example in which some, but not all, of the functionality described above for the server is hosted in the vehicle involves speech synthesis to the user and speech recognition of the user's speech being performed in the vehicle, with encoded information (e.g., text rather than audio) being passed between the client and the server. In some implementations, no software is required in the vehicle: the user's phone is set to automatically answer calls from the server, and the audio link between the server and the user device is formed over a cellular telephone connection rather than, for example, over the WebRTC connection described above. Furthermore, certain communication functions are described as using the Public Switched Telephone Network or the public Internet. Alternative implementations may use different communication infrastructure, for example, with the system being entirely hosted within a cellular telephone/communication infrastructure (e.g., within an LTE-based infrastructure).
[047] As described above, many features of the system are implemented in software that executes at a user device and/or at a server computer. The software may include instructions for causing a processor at the user device or server computer to perform the functions described above, with the software being stored on a non-transitory machine-readable medium, or transmitted from storage to the user device or server computer over a communication network (e.g., by downloading an application ("app") to the user's smartphone).
[048] It is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. Other embodiments are within the scope of the following claims.

Claims

What is claimed is:
1. A method for assisting communication via a user device, the method comprising: receiving at a server a voice-based call from a calling device for the user device, the voice-based call having been made to an address associated with the user device, including establishing a first two-way audio link between the server and the calling device;
establishing a second two-way audio link between the server and the user device; responding to the call, including
sending a first audio stream over the first link to the calling device, said audio stream including a spoken message for alerting a calling party to the involvement of an automated assistant,
receiving a second audio stream over the first link, and
sending a third audio stream over the second link, said third audio stream including a portion of the second audio stream;
processing audio received over at least one of the first link and the second link at the server, including
waiting to receive a first voice response of a first predetermined type over the second link, and
if the first voice response is received, causing the calling device and the user device to be joined by a two-way audio link.
2. The method of claim 1 wherein the sending of the third audio stream is performed at least in part during receiving of the second audio stream.
3. The method of claim 2 wherein the third audio stream is a delay of the second audio stream.
4. The method of claim 1 wherein the voice response from the user device is not sent to the calling device.
5. The method of claim 1 wherein the first voice response consists of no spoken response.
6. The method of claim 1 wherein processing the audio further includes
waiting to receive a second voice response of a second predetermined type over the second link, and
if the second voice response is received, causing the calling device and a voice messaging server to be joined by a two-way audio link.
7. The method of claim 1 wherein establishing the second link is performed prior to receiving the voice-based call.
8. The method of claim 7 wherein the second link comprises a packet-based link.
9. The method of claim 1 wherein causing the calling device and the user device to be joined by a two-way audio link comprises bridging the first link and the second link.
10. The method of claim 1 wherein causing the calling device and the user device to be joined by a two-way audio link comprises redirecting the voice-based call to the user device.
11. A method for assisting communication via a user device, the method comprising: establishing a second two-way audio link between a server and a user device; responding to a call made to the user device, including
receiving a third audio stream over the second link, said third audio stream including a portion of the second audio stream received from a calling device at the server;
processing audio received at the user device from a user, including
receiving a first voice response of a first predetermined type, wherein the first voice response causes the calling device and the user device to be joined by a two-way audio link.
12. The method of claim 11 wherein the receiving of the third audio stream is performed at least in part during receiving of the second audio stream at the server.
13. The method of claim 12 wherein the third audio stream is a delay of the second audio stream.
14. The method of claim 11 wherein establishing the second link is performed prior to the server receiving the second audio stream.
15. The method of claim 14 wherein the second link comprises a packet-based link.
16. The method of claim 1 wherein causing the calling device and the user device to be joined by a two-way audio link comprises causing bridging of the first link and the second link.
17. The method of claim 1 wherein causing the calling device and the user device to be joined by a two-way audio link comprises causing redirection of the voice-based call to the user device.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562244417P 2015-10-21 2015-10-21
US62/244,417 2015-10-21

Publications (1)

Publication Number Publication Date
WO2017070323A1 true WO2017070323A1 (en) 2017-04-27

Family

ID=57233882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/057876 WO2017070323A1 (en) 2015-10-21 2016-10-20 Attentive assistant

Country Status (2)

Country Link
US (1) US20170118344A1 (en)
WO (1) WO2017070323A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3503116A1 (en) * 2017-12-22 2019-06-26 Corevas GmbH & Co. KG Apparatus, method and system for obtaining information on an emergency situation

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023030619A1 (en) 2021-09-01 2023-03-09 Cariad Se Telephone service device for requesting services, vehicle and method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100166166A1 (en) * 2005-05-04 2010-07-01 Arona Ltd Call handling

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8213910B2 (en) * 2001-02-09 2012-07-03 Harris Technology, Llc Telephone using a connection network for processing data remotely from the telephone
US20080181373A1 (en) * 2007-01-31 2008-07-31 Brown Jr Thomas W Call Messaging System
US8139727B2 (en) * 2008-12-18 2012-03-20 At&T Intellectual Property I, Lp. Personalized interactive voice response system
WO2015157013A1 (en) * 2014-04-11 2015-10-15 Analog Devices, Inc. Apparatus, systems and methods for providing blind source separation services
US9412394B1 (en) * 2015-03-09 2016-08-09 Jigen Labs, LLC Interactive audio communication system
US10693920B2 (en) * 2015-04-29 2020-06-23 Secure Connection Ltd. Systems and methods for screening communication sessions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PARROT: "Parrot MKi9200 User guide Contents", 23 July 2012 (2012-07-23), XP055331552, Retrieved from the Internet <URL:https://parrotcontact.parrot.com/website/user-guides/download-user-guides.php?pdf=mki9200/MKi9200_User-guide_UK.pdf> [retrieved on 20161223] *

Also Published As

Publication number Publication date
US20170118344A1 (en) 2017-04-27

Similar Documents

Publication Publication Date Title
KR102303810B1 (en) Handling calls on a shared speech-enabled device
US9948772B2 (en) Configurable phone with interactive voice response engine
US9241059B2 (en) Callee rejection information for rejected voice calls
US7844262B2 (en) Method for announcing a calling party from a communication device
EP1296501A1 (en) Courtesy alerting feature for mobile electronic devices
JP2008539629A (en) Call control system and method
MXPA06011460A (en) Conversion of calls from an ad hoc communication network.
US10542147B1 (en) Automated intelligent personal representative
US7333803B2 (en) Network support for voice-to-text memo service
US8433041B2 (en) Method and system to enable touch-free incoming call handling and touch-free outgoing call origination
MX2011001919A (en) Method and system for scheduling phone call using sms.
CN112887194B (en) Interactive method, device, terminal and storage medium for realizing communication of hearing-impaired people
WO2013096102A1 (en) Method and apparatus for providing multiparty participation and management for a text message session
GB2578121A (en) System and method for hands-free advanced control of real-time data stream interactions
US20170118344A1 (en) Attentive assistant
CN106161716A (en) A kind of method of audio call, device and server
WO2007033459A1 (en) Method and system to enable touch-free incoming call handling and touch-free outgoing call origination
US10827068B1 (en) Method and apparatus of processing caller responses
CN112259073A (en) Voice and text direct connection communication method and device, electronic equipment and storage medium
JP5265587B2 (en) Call device and call method
JP2006186893A (en) Voice conversation control apparatus
JP2020061703A (en) Call support device
TR201918971A2 (en) Method and system of switching to rest mode on mobile devices
JP2024530891A (en) Method and device for making a call based on analysis of call connection tones
EP1534034A1 (en) Communication system and method for informing of a delay of a call establishment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16791189

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16791189

Country of ref document: EP

Kind code of ref document: A1