WO2023154080A1 - Providing contextual automated assistant action suggestion(s) via a vehicle computing device - Google Patents
- Publication number
- WO2023154080A1 (PCT/US2022/035946)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- application
- user
- computing device
- vehicle
- automated assistant
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
- G06F9/453—Help systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R16/00—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
- B60R16/02—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
- B60R16/037—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
- B60R16/0373—Voice control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Definitions
- Humans may engage in human-to-computer dialogs with interactive software applications referred to herein as “automated assistants” (also referred to as “digital agents,” “chatbots,” “interactive personal assistants,” “intelligent personal assistants,” “assistant applications,” “conversational agents,” etc.).
- humans (which when they interact with automated assistants may be referred to as “users”) may provide commands and/or requests to an automated assistant using spoken natural language input (i.e., utterances), which may in some cases be converted into text and then processed, and/or by providing textual (e.g., typed) natural language input.
- an automated assistant can be accessed in a vehicle via an integrated vehicle computing device — which can also provide access to other applications.
- Although the automated assistant can provide a number of benefits to a user, the user may not be entirely aware of all of the functionality of the automated assistant.
- the automated assistant may provide functionality that can assist the user with hands-free control of other applications that are available via the vehicle computing device.
- the user may only invoke and utilize the automated assistant when attempting to control functionality that the user considers to be limited to the automated assistant, and manually control features of other applications without invoking and utilizing the automated assistant.
- This manual control of the features of these applications can increase a quantity of user inputs received via the vehicle computing device, thereby wasting computational resources of the vehicle computing device. Further, this manual control of the features of these applications can also increase user distraction while driving the vehicle, since driving and manually interacting with the vehicle computing device can cause the user to direct their attention away from driving.
- a user that places a phone call via a vehicle phone application may necessarily input each digit of a phone number or each character of a contact name by gazing down at, and tapping, a graphical keypad rendered at a display interface of the vehicle computing device.
- a user that seeks to stream media from the internet may also navigate a media application by gazing down at, and tapping, GUI elements of the media application to initialize playback of some desired media. Accordingly, not only does this manual control of the features of these applications unnecessarily waste computational resources of the vehicle computing device, but it also increases user distraction.
- Implementations set forth herein relate to an automated assistant that can provide suggestions for assistant inputs to provide to the automated assistant to control certain other applications when a user is within a vehicle and is currently controlling, or is predicted to control, the certain other applications.
- the vehicle can include a vehicle computing device that can provide access to a variety of different applications, including an automated assistant application that is associated with the automated assistant.
- the automated assistant that is accessible via the vehicle computing device can be a counterpart of another automated assistant that is accessible via one or more other devices, such as a mobile computing device (e.g., a cellular phone).
- the automated assistant can operate as an interface between a user and a given vehicle application of the vehicle computing device, and/or between the user and a given mobile application of the mobile computing device.
- the automated assistant can provide, via the vehicle computing device, suggestions for assistant inputs that can be submitted by a user and to the automated assistant that, when submitted, can cause the automated assistant to control one or more features of these applications.
- This can streamline certain interactions between a user and an application by reducing a quantity of inputs that may otherwise be used to control features of certain applications through manual interactions with the certain applications. Additionally, this can reduce user distraction of the user while driving the vehicle or riding in the vehicle by enabling the user to rely on hands-free interactions with the automated assistant to control various applications while driving the vehicle or riding in their vehicle.
- a user can be riding in a vehicle while interacting with an application.
- the application can be, for example, a health application, and the user can be interacting with the health application to schedule an appointment with their primary care doctor.
- the vehicle can include a vehicle computing device, which can provide access to the automated assistant application and the health application, and can include a display interface.
- the automated assistant can also be accessible via a cellular phone of the user, which can include an automated assistant application that is a counterpart of the automated assistant that is accessible via the vehicle computing device.
- the health application can also be accessible via the cellular phone of the user.
- the automated assistant can determine, with prior permission from the user, that the user is interacting with the health application while riding in the vehicle via the vehicle computing device and/or the cellular phone of the user, and can process contextual data to provide a contextual automated assistant suggested action to the user via the display interface of the vehicle computing device.
- This contextual automated assistant suggested action can, for example, educate the user as to how to interact with the health application in a hands-free manner via the automated assistant.
- contextual data can include, for instance, contextual data associated with the user, contextual data associated with the cellular phone of the user, and/or contextual data associated with the vehicle computing device, such as a screenshot of the health application being interacted with by the user via the cellular phone of the user and/or the vehicle computing device, a destination of the vehicle, a current location of the vehicle, an identifier for the user (with prior permission from the user), data characterizing features of recent interactions between the user and one or more applications, and/or any other related data that can be processed by the automated assistant. Further, the contextual data can be processed to determine one or more operations that the user is currently engaged in, and/or is expected to initialize, via the health application.
- a screenshot of the health application can include graphics and/or text characterizing a graphical user interface (GUI) for scheduling an appointment with a doctor.
- Data characterizing this screenshot can be processed using one or more heuristic processes and/or one or more trained machine learning models to determine one or more operations (e.g., creating an appointment, canceling an appointment, rescheduling an appointment, etc.) that can be controlled via the GUI.
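- As one minimal, illustrative sketch of such processing (a keyword heuristic only; the operation names and keyword table below are assumptions and not details required by the disclosure), text recognized in the screenshot can be matched against operation keywords to yield candidate operations; a trained machine learning model could stand in for the keyword table:

```python
from typing import List

# Hypothetical mapping from operation identifiers to keywords that may appear in
# the text of a scheduling GUI; both sides of this table are illustrative.
OPERATION_KEYWORDS = {
    "create_appointment": ["schedule", "book", "new appointment"],
    "cancel_appointment": ["cancel"],
    "reschedule_appointment": ["reschedule", "change appointment"],
}

def candidate_operations(screenshot_text: str) -> List[str]:
    """Return operations whose keywords appear in text recognized in a screenshot."""
    text = screenshot_text.lower()
    return [op for op, keywords in OPERATION_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]

# Example: a screenshot of a scheduling GUI yields the "create_appointment" operation.
print(candidate_operations("Schedule an appointment with Dr. Chow"))
```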
- one or more particular operations can be selected as a basis for generating suggestion data, which can characterize an assistant input that can be submitted by the user to the automated assistant, in lieu of the user manually interacting with the health application, to control the one or more particular operations of the health application.
- suggestion data generated by the automated assistant can characterize a spoken utterance such as, “Try saying ‘Assistant, schedule an appointment with Dr. Chow for Tuesday at 3:00PM.’”
- the automated assistant can cause text of the spoken utterance to be visually rendered as a graphical element at the display interface of the vehicle computing device simultaneous to the user interacting with the health application and/or subsequent to the user interacting with the health application.
- the graphical element can put the user on notice, before, during, and/or after the user interacts with the health application, that the automated assistant can perform the specified operation (e.g., scheduling an appointment) for the current interaction with the health application and/or in lieu of future interactions with the health application in response to the user providing the rendered spoken utterance.
- the user can provide the spoken utterance to the automated assistant.
- Audio embodying the spoken utterance can be received at an audio interface (e.g., one or more microphones) of the vehicle computing device and/or the cellular phone to cause an instance of the automated assistant to interact with the health application.
- the automated assistant can thereby cause the health application to schedule the appointment for the user — without the user manually interacting (e.g., directly tapping a touch interface of the cellular phone) with their cellular phone and/or the vehicle computing device while they are in their vehicle.
- suggestions for assistant inputs can be rendered subsequent to the user interacting with the application. For instance, assume in the above example that the user manually interacts with the health application to schedule the appointment. Further assume that the user successfully schedules the appointment through this manual interaction and a confirmation for the appointment is provided for display via the cellular phone and/or the vehicle computing device. In this instance, the user may receive a suggestion from their automated assistant while riding in the vehicle when the confirmation is displayed.
- the suggestion may characterize a spoken utterance that, when provided to the automated assistant in the future, causes the automated assistant to control a feature of the health application that the user may have previously utilized while riding in the vehicle, such as “By the way, next time you can say ‘Assistant, schedule an appointment with Dr. Chow for Tuesday at 3:00PM.’”
- suggestions for assistant inputs can be rendered prior to the user interacting with another application. For instance, a user that interacts with a particular application while riding in their car one day may receive a suggestion from their automated assistant while riding in their car during a different day.
- the suggestion may characterize a spoken utterance that, when provided to the automated assistant, causes the automated assistant to control a feature of an application that the user may have previously utilized while riding in the vehicle.
- the automated assistant may cause a suggestion such as “Assistant, play my health podcast” to be rendered while the user is riding in their vehicle if, during a previous ride in their vehicle, the user accessed a podcast application to play a “health podcast.”
- applications accessed directly via the vehicle computing device can be controlled, with prior permission from the user, by the automated assistant, and can therefore be the subject of assistant suggestions.
- a user who is predicted to interact with a vehicle maintenance application of their vehicle computing device can receive a suggestion, via an interface of the vehicle computing device, regarding controlling a feature of the vehicle maintenance application.
- a user may typically access their vehicle maintenance application a few minutes into a drive that is over a threshold distance (e.g., 100 miles), in order to see whether there are any charging stations near their destination.
- the automated assistant can cause the vehicle computing device to render a suggestion such as, “Assistant, show me charging stations near my destination,” the next time the user selects to navigate to a destination that is over the threshold distance away (e.g., over 100 miles away), but prior to the user accessing the vehicle maintenance application.
- a given suggestion for a given assistant input may only be rendered for presentation to a given user a threshold quantity of times to reduce a quantity of computational resources consumed in generating the suggestions and to reduce user annoyance.
- the suggestion of “By the way, next time you can say ‘Assistant, schedule an appointment with Dr. Chow for Tuesday at 3:00PM’” in the above example may only be provided for presentation to the user once to educate the user with respect to the automated assistant functionality. Accordingly, if the user subsequently begins interacting with the health application via the vehicle computing device to schedule a subsequent appointment, the suggestion may not be provided.
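- A minimal sketch of such a per-suggestion render limit follows; the storage layout, identifiers, and default threshold of one are assumptions for illustration only:

```python
from collections import defaultdict

class SuggestionThrottle:
    """Tracks how many times each suggestion has been rendered for each user."""

    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.render_counts = defaultdict(int)  # (user_id, suggestion_id) -> count

    def should_render(self, user_id: str, suggestion_id: str) -> bool:
        return self.render_counts[(user_id, suggestion_id)] < self.threshold

    def record_render(self, user_id: str, suggestion_id: str) -> None:
        self.render_counts[(user_id, suggestion_id)] += 1

throttle = SuggestionThrottle(threshold=1)
if throttle.should_render("user_102", "schedule_appointment_tip"):
    throttle.record_render("user_102", "schedule_appointment_tip")
# The same tip is suppressed on subsequent checks for this user.
print(throttle.should_render("user_102", "schedule_appointment_tip"))  # False
```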
- a given suggestion for a given application may only be rendered for presentation to a given user a threshold quantity of times to reduce a quantity of computational resources consumed in generating the suggestions and to reduce user annoyance.
- the suggestion of “By the way, next time you can say ‘Assistant, schedule an appointment with Dr. Chow for Tuesday at 3:00PM’” in the above example may only be provided for presentation to the user once to educate the user with respect to the automated assistant functionality.
- the suggestion may additionally include other suggestions with respect to the health application.
- the suggestion may additionally include “You can also say ‘Assistant, cancel my appointment’ or ‘Assistant, reschedule my appointment’”, or include any additional functionality that the automated assistant can perform with respect to the health application.
- the automated assistant proactively educates the user with respect to multiple assistant inputs that may be provided to cause the automated assistant to control different features of the health application.
- techniques described herein enable a system to provide contextually relevant automated assistant action suggestion(s) in a vehicular environment to reduce consumption of computational resources and/or reduce driver distraction.
- techniques described herein can detect interactions with a vehicle computing device of a vehicle of a user and/or a mobile computing device of the user while the user is located in the vehicle. Further, techniques described herein can identify contextual information associated with the interactions. Based on the interactions and/or the contextual information associated with the interactions, the system can generate and provide the suggestion(s) for presentation to the user when the user completes the interaction and/or is predicted to initiate an interaction to enable the interaction to be initialized and completed in a hands-free manner.
- computational resources can be conserved based on at least a reduced quantity of user inputs to achieve the interactions, and user distraction can be reduced by obviating the need for those user inputs.
- FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D illustrate views of a user receiving a suggestion of an assistant input that can be utilized to direct an automated assistant to safely interact with an application while the user is in a vehicle, in accordance with various implementations.
- FIG. 2 illustrates a system that provides an automated assistant that can generate suggestions for invoking the automated assistant to control an application that may distract a user from driving their vehicle, in accordance with various implementations.
- FIG. 3 illustrates a method for providing an automated assistant that can generate suggestions for assistant inputs that can be provided by the user in furtherance of causing the automated assistant to control a separate application while the user is in a vehicle, in accordance with various implementations.
- FIG. 4 is a block diagram of an example computer system, in accordance with various implementations.
- FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D illustrate a view 100, a view 120, a view 140, and a view 160, respectively, of a user 102 receiving a suggestion of an assistant input that can be utilized to direct an automated assistant to safely interact with an application while the user 102 is in a vehicle 108.
- the user 102 can be interacting with an application via a mobile or portable computing device 104 (e.g., a cellular phone of the user 102) and/or a vehicle computing device 106 of the vehicle 108.
- the application can be separate from the automated assistant, which can be accessible via the portable computing device 104 and/or the vehicle computing device 106 via respective instances of an automated assistant application executing thereon.
- the user 102 can access respective instances of the application via the portable computing device 104 while the user 102 is driving and/or riding in the vehicle 108, and/or via the vehicle computing device 106 while the user 102 is driving and/or riding in the vehicle 108.
- the automated assistant can operate as an interface between the user 102 and the instances of the application to promote safe driving practices, and also reduce a number of inputs that may be necessary to control the respective instances of the application.
- the user 102 can be accessing an instance of an internet of things (IoT) application via the portable computing device 104 when the user 102 is driving the vehicle 108.
- the user 102 can be interacting with the instance of the IoT application to modify a temperature setting of a home thermostat via an application interface of the IoT application.
- the automated assistant can determine that the user 102 is located within the vehicle 108 and interacting with the instance of the IoT application. The automated assistant can process data from a variety of different sources to make this determination.
- data based on one or more sensors within the vehicle 108 can be utilized to determine (e.g., using facial recognition, voice signature, touch input, etc.), with prior permission from the user 102, that the user 102 is in the vehicle 108.
- data characterizing screen content 110 of the portable computing device 104 can be processed, with prior permission from the user 102, to determine one or more features of the instance of the IoT application, and/or one or more applications operating in a background of the portable computing device 104, that the user 102 may be controlling and/or seeking to control.
- the screen content 110 can include a GUI element for controlling a temperature of a living room in a home of the user 102.
- the screen content 110 can be processed using one or more heuristic processes and/or one or more trained machine learning models to identify one or more different controllable features of the screen content 110.
- the identified feature(s) can be utilized to generate a suggestion for an assistant input that can be provided, by the user 102 and to the automated assistant, to invoke the automated assistant to control the identified feature(s) of the instance of the IoT application.
- the suggestion can be generated using an application programming interface (API) for data to be communicated between the instance of the IoT application and the automated assistant.
- the suggestion data can be generated using content that is accessible via one or more interfaces of the instance of the IoT application.
- a feature of the screen content 110 can be a selectable GUI element, and the selectable GUI element can be rendered in association with natural language content (e.g., “72 degrees”).
- a GUI element can be determined to be selectable based on data that is available to the automated assistant, such as HTML code, XML code, document object model (DOM) data, application programming interface (API) data, and/or any other information that can indicate whether certain features of an application interface of the instance of the IoT application are selectable.
- a portion of the screen content 110 occupied by the selectable GUI element can be identified, along with the natural language content, for utilization in generating the suggestion of the assistant input to be provided for presentation to the user 102.
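- A hedged sketch of identifying selectable elements and their labels is shown below; the view-hierarchy nodes are modeled as plain dictionaries standing in for DOM/XML/accessibility data, and the field names are assumptions:

```python
from typing import Dict, List

def selectable_elements(nodes: List[Dict]) -> List[Dict]:
    """Keep nodes that are flagged as clickable and carry natural language content."""
    return [node for node in nodes if node.get("clickable") and node.get("text")]

view_hierarchy = [
    {"id": "temp_display", "clickable": False, "text": "72 degrees"},
    {"id": "temp_down", "clickable": True, "text": "Lower temperature",
     "bounds": (40, 300, 120, 340)},
]

for element in selectable_elements(view_hierarchy):
    # The element's screen bounds and label can seed the suggested assistant input.
    print(element["id"], element["text"], element.get("bounds"))
```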
- the automated assistant can interact with the application interface of the IoT application to control the suggested feature, and/or communicate API data to the IoT application for controlling the suggested feature.
- For example, and as illustrated in FIG. 1B, the automated assistant can cause an assistant suggestion 124 to be rendered at a display interface 122 of the vehicle computing device 106 prior to the user 102 accessing the instance of the IoT application, while the user 102 is accessing the instance of the IoT application, and/or subsequent to the user 102 causing some action to be performed via the instance of the IoT application. For instance, assume that the user 102 manually interacts with the instance of the IoT application to cause a temperature at a home of the user 102 to be modified. Subsequent to the user 102 manually interacting with the instance of the IoT application, the automated assistant can render an assistant suggestion 124 that includes natural language content corresponding to a suggested assistant input.
- the suggested assistant input (i.e., suggested command phrase) can be, “Next time, try saying: ‘Assistant, lower the temperature in my Home Control Application’”, which can refer to a request for the automated assistant to automatically modify a temperature value that is managed by the instance of the IoT application (i.e., “Home Control Application”).
- the assistant suggestion 124 can be stored in association with a command to be submitted by the automated assistant to the IoT application, such as “lowerTemp(HomeControlApp, Assistant, [0, -3 degrees]).”
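- The following is a minimal sketch of storing a rendered suggestion alongside the command the automated assistant would submit to the application; the registry structure is an assumption, and the command string merely mirrors the illustrative call above:

```python
suggestion_registry = {}

def register_suggestion(suggestion_id: str, utterance: str, app_command: str) -> None:
    suggestion_registry[suggestion_id] = {
        "utterance": utterance,      # what the user is invited to say
        "app_command": app_command,  # what the assistant submits to the application
    }

register_suggestion(
    "lower_temp_tip",
    "Assistant, lower the temperature in my Home Control Application",
    "lowerTemp(HomeControlApp, Assistant, [0, -3 degrees])",
)
print(suggestion_registry["lower_temp_tip"]["app_command"])
```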
- the automated assistant can additionally, or alternatively, provide other suggestions associated with the instance of the IoT application, such as “You can also say: (1) ‘Assistant, open the garage’; and (2) ‘Assistant, disarm the alarm system’”, and/or any other actions that the automated assistant can perform on behalf of the user 102 by interfacing with the instance of the IoT application even though these other actions are not directly related to the manual interaction of the user 102 with the instance of the IoT application.
- the suggestion 124 can be provided for presentation to the user 102 at the display interface 122 of the vehicle computing device 106 as shown in FIG. 1B.
- the suggestion 124 may additionally, or alternatively, be provided for presentation at a display interface of the portable computing device 104.
- the user 102 may not immediately utilize the suggestion 124. For example, when the user 102 is riding in their vehicle 108 and viewing the portable computing device 104, the user 102 may see the assistant suggestion 124 provided at the display interface 122 of the vehicle computing device 106 and place their phone down in acknowledgment of the ability of the automated assistant to control the IoT application.
- the user 102 may utilize the suggestion 124 the next time the user 102 would like to control one or more IoT devices utilizing the automated assistant.
- the user 102 may see the assistant suggestion 124 during a first trip (e.g., excursion) in the vehicle 108, but not utilize the suggestion 124 until a second, subsequent trip in the vehicle 108.
- the user 102 may be driving to a first destination during the first trip, and be presented with the assistant suggestion 124, but not utilize the suggestion 124 until the second trip.
- the user 102 can subsequently utilize the natural language content of the suggestion 124 to cause the automated assistant to perform some action on behalf of the user 102.
- the user 102 can provide a spoken utterance 142, corresponding to the assistant suggestion 124, during the same trip as illustrated in FIG. 1A and FIG. 1B, or a subsequent trip (e.g., a trip during a day that is subsequent to the day from FIG. 1A and FIG. 1B).
- the spoken utterance 142 can be, “Assistant, lower the temperature of my living room,” which can embody an invocation phrase (e.g., “Assistant”) and a request or command phrase (e.g., “lower the temperature of my living room”).
- the spoken utterance 142 provided by the user 102 is similar to the assistant suggestion 124, which included an invocation phrase (e.g., “Assistant”) and a request or command phrase (e.g., “lower the temperature in my Home Control Application”), as shown in FIG. 1B.
- the user 102 can provide a spoken utterance that is similar to, but not exactly like, the natural language content of the assistant suggestion 124, but cause the operations corresponding to the assistant suggestion 124 to be performed.
- contextual data associated with an instance when the user 102 provides the spoken utterance 142 can be processed to generate a similarity value that can quantify a similarity of the contextual data to the prior instances (i.e., prior contexts) of when the assistant suggestion 124 was previously rendered.
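- One way such a similarity value could be computed (a sketch only; the embedding function, feature layout, and threshold are assumptions) is a cosine similarity between an embedding of the current context and an embedding of a prior context in which the suggestion was rendered:

```python
import math

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

current_context = [0.9, 0.1, 0.4]           # e.g., in vehicle, IoT app in foreground
prior_suggestion_context = [0.8, 0.2, 0.5]  # context when the suggestion was rendered

similarity = cosine_similarity(current_context, prior_suggestion_context)
if similarity > 0.8:  # threshold value is an assumption
    print("Treat the utterance as targeting the previously suggested operation.")
```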
- the automated assistant can respond to the spoken utterance 142 by initializing performance of one or more operations corresponding to the assistant suggestion 124 (e.g., lowering a temperature of a living room of the user 102 that is associated with the Home Control Application).
- the automated assistant can respond to the spoken utterance 142 by initializing performance of one or more operations in furtherance of fulfilling the request or command phrase, and optionally also rendering an assistant output 162 to provide an indication that one or more of the operations were performed.
- the assistant output 162 can be an audible output and/or textual output that conveys, to the user, that the automated assistant fulfilled the request or command phrase.
- the assistant output 162 can be text and/or audio embodying the message, “The temperature of your living room has been modified through the Home Control Application,” as shown in FIG. 1D.
- training data can be generated based on this interaction between the user 102, the automated assistant, and/or the instance of the IoT application, in furtherance of training one or more trained machine learning models.
- the one or more trained machine learning models can then be utilized in subsequent contexts, for processing contextual data, to provide the user 102 with suggestions for safely streamlining interactions with the automated assistant and/or other applications.
- Although the example described above with respect to FIGS. 1A, 1B, 1C, and 1D is described with respect to the user 102 interacting with the IoT application via the portable computing device 104, it should be understood that this is for the sake of example and is not meant to be limiting.
- the manual interaction with the instance of the IoT application may occur via the vehicle computing device 106 rather than the portable computing device 104 as described.
- the suggestion 124 may be provided via the vehicle computing device 106 in the same or similar manner described above.
- the user 102 may additionally, or alternatively, interact with other applications via the portable computing device 104 and/or the vehicle computing device 106, and suggestions can be generated for those other applications in the same or similar manner.
- the user 102 can interact with an instance of a media application via the vehicle computing device 106 to cause a song to be provided for playback.
- a suggestion of, for example, “Tell me what song or artist you would like to hear” may be provided for audible and/or visual presentation to the user 102.
- the user 102 can simply provide a reference to a song or artist that he/she would like to hear, rather than navigating through the instance of the media application to cause music to be played back.
- the suggestion may not be provided until after the user manually selects a song or artist that they would like to hear and the song or artist is played back.
- FIG. 2 illustrates a system 200 that provides an automated assistant 204 that can provide suggestions for invoking the automated assistant 204 to control an application that the user may access from their vehicle (e.g., via the portable computing device 104 and/or via the vehicle computing device 106 from FIGS. 1A-1D).
- the automated assistant 204 can operate as part of an automated assistant application that is provided at one or more computing devices, such as a computing device 202 and/or a server device.
- a user can interact with the automated assistant 204 via automated assistant interface(s) 220, which can be a microphone, a camera, a touch screen display, a user interface, and/or any other apparatus capable of providing an interface between a user and an application.
- a user can initialize the automated assistant 204 by providing a verbal, textual, gestural, and/or a graphical input to an assistant interface 220 to cause the automated assistant 204 to initialize one or more actions (e.g., provide data, control a peripheral device, access an agent, generate an input and/or an output, and/or other actions).
- the automated assistant 204 can be initialized based on processing of contextual data 236 using one or more trained machine learning models and/or heuristics processes.
- the contextual data 236 can characterize one or more features of an environment in which the automated assistant 204 is accessible (e.g., one or more features of the computing device 202 and/or one or more features of a vehicle in which a user is located), and/or one or more features of a user that is predicted to be intending to interact with the automated assistant 204.
- the computing device 202 can include a display device, which can be a display panel that includes a touch interface for receiving touch inputs and/or gestures for allowing a user to control applications 234 of the computing device 202 via the touch interface.
- the computing device 202 can lack a display device, thereby providing an audible user interface output, without providing a graphical user interface output.
- the computing device 202 can provide a user interface, such as a microphone, for receiving spoken natural language inputs from a user.
- the computing device 202 can include a touch interface and can be void of a camera, but can optionally include one or more other sensors.
- the computing device 202 and/or other third-party client devices (e.g., that are provided by an entity that is in addition to an entity that provides the computing device 202 and/or the automated assistant 204) can be in communication with a server device over a network, such as the internet. Additionally, the computing device 202 and any other computing devices can be in communication with each other over a local area network (LAN), such as a Wi-Fi network or Bluetooth network.
- the computing device 202 can offload computational tasks to the server device in order to conserve computational resources at the computing device 202.
- the server device can host the automated assistant 204, and/or computing device 202 can transmit inputs received at one or more assistant interfaces 220 to the server device.
- the automated assistant 204 can be hosted at the computing device 202, and various processes that can be associated with automated assistant operations can be performed at the computing device 202.
- all or less than all aspects of the automated assistant 204 can be implemented on the computing device 202.
- aspects of the automated assistant 204 are implemented via the computing device 202 and can interface with a server device, which can implement other aspects of the automated assistant 204.
- the server device can optionally serve a plurality of users and their associated automated assistant applications via multiple threads.
- the automated assistant 204 can be an application that is separate from an operating system of the computing device 202 (e.g., installed “on top” of the operating system) - or can alternatively be implemented directly by the operating system of the computing device 202 (e.g., considered an application of, but integral with, the operating system).
- the automated assistant 204 can include an input processing engine 206, which can employ multiple different modules for processing inputs and/or outputs for the computing device 202 and/or a server device.
- the input processing engine 206 can include a speech processing engine 208, which can process audio data received at an assistant interface 220 to identify the text corresponding to a spoken utterance that is embodied in the audio data.
- the audio data can be transmitted from, for example, the computing device 202 to the server device in order to preserve computational resources at the computing device 202. Additionally, or alternatively, the audio data can be exclusively processed at the computing device 202.
- the process for converting the audio data to text can include an automatic speech recognition (ASR) algorithm, which can employ neural networks, and/or statistical models for identifying groups of audio data corresponding to words or phrases.
- the text converted from the audio data can be parsed by a data parsing engine 210 and made available to the automated assistant 204 as textual data that can be used to generate and/or identify command phrase(s), intent(s), action(s), slot value(s), and/or any other content specified by the user.
- output data provided by the data parsing engine 210 can be provided to a parameter engine 212 to determine whether the user provided an input that corresponds to a particular intent, action, and/or routine capable of being performed by the automated assistant 204 and/or an application or agent that is capable of being accessed via the automated assistant 204.
- assistant data 238 can be stored at the server device and/or the computing device 202, and can include data that defines one or more actions capable of being performed by the automated assistant 204, as well as parameters necessary to perform the actions.
- the parameter engine 212 can generate one or more parameters for an intent, action, and/or slot value, and provide the one or more parameters to an output generating engine 214.
- the output generating engine 214 can use the one or more parameters to communicate with an assistant interface 220 for providing an output to a user, and/or communicate with one or more applications 234 for providing an output to one or more applications 234.
- the automated assistant application includes, and/or has access to, on-device ASR, on-device natural language understanding (NLU), and on-device fulfillment.
- on-device ASR can be performed using an on-device ASR module that processes audio data (detected by the microphone(s)) using, for example, an end-to-end speech recognition machine learning model stored locally at the computing device 202.
- the on-device ASR module generates recognized text for spoken utterances (if any) present in the audio data.
- on-device NLU can be performed using an on-device NLU module that processes recognized text, generated using the on-device speech recognition, and optionally contextual data, to generate NLU data.
- the NLU data can include intent(s) that correspond to the spoken utterance and optionally parameter(s) (e.g., slot values) for the intent(s).
- on-device fulfillment can be performed using an on-device fulfillment module that utilizes the NLU data (from the on-device NLU module), and optionally other local data, to determine action(s) to take to resolve the intent(s) of the spoken utterance (and optionally the parameter(s) for the intent).
- This can include determining local and/or remote responses (e.g., answers) to the spoken utterance, interaction(s) with locally installed application(s) to perform based on the spoken utterance, command(s) to transmit to internet-of-things (IoT) device(s) (directly or via corresponding remote system(s)) based on the spoken utterance, and/or other resolution action(s) to perform based on the spoken utterance.
- the on-device fulfillment module can then initiate local and/or remote performance/execution of the determined action(s) to resolve the spoken utterance.
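- A highly simplified, assumed sketch of this on-device flow is shown below; the models are replaced with stubs, and the returned strings merely echo the examples used in this description:

```python
def on_device_asr(audio_bytes: bytes) -> str:
    # A real module would run a local speech recognition model over the audio.
    return "assistant lower the temperature of my living room"  # stub

def on_device_nlu(text: str) -> dict:
    # A real module would use a trained NLU model; this keyword check is a stand-in.
    if "lower the temperature" in text:
        return {"intent": "adjust_temperature", "slots": {"delta_degrees": -3}}
    return {"intent": "unknown", "slots": {}}

def on_device_fulfillment(nlu_result: dict) -> str:
    # Maps the intent and slot values to an application command (illustrative only).
    if nlu_result["intent"] == "adjust_temperature":
        delta = nlu_result["slots"]["delta_degrees"]
        return f"lowerTemp(HomeControlApp, Assistant, [0, {delta} degrees])"
    return "no_action"

print(on_device_fulfillment(on_device_nlu(on_device_asr(b"..."))))
```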
- remote ASR, remote NLU, and/or remote fulfillment can at least selectively be utilized.
- recognized text generated by the on-device ASR module can at least selectively be transmitted to remote automated assistant component(s) for remote NLU and/or remote fulfillment.
- the recognized text generated by the on-device ASR module can optionally be transmitted for remote performance in parallel with on-device performance, or responsive to failure of on-device NLU and/or on-device fulfillment.
- on-device ASR, on-device NLU, on-device fulfillment, and/or on-device execution can be prioritized at least due to the latency reductions they provide when resolving a spoken utterance (due to no client-server roundtrip(s) being needed to resolve the spoken utterance).
- on-device functionality can be the only functionality that is available in situations with no or limited network connectivity.
- the computing device 202 can include one or more applications 234 which can be provided by a first-party entity that is the same entity that provided the computing device 202 and/or the automated assistant 204 and/or provided by a third-party entity that is different from an entity that provided the computing device 202 and/or the automated assistant 204.
- An application state engine (not depicted) of the automated assistant 204 and/or the computing device 202 can access application data 230 to determine one or more actions capable of being performed by the one or more applications 234, as well as a state of each application of the one or more applications 234 and/or a state of a respective device that is associated with the one or more applications 234.
- a device state engine (not depicted) of the automated assistant 204 and/or the computing device 202 can access device data 232 to determine one or more actions capable of being performed by the computing device 202 and/or one or more devices that are associated with the computing device 202 and/or the one or more applications 234. Furthermore, the application data 230 and/or any other data (e.g., the device data 232) can be accessed by the automated assistant 204 to generate contextual data 236, which can characterize a context in which a particular application of the one or more applications 234 and/or device is executing, and/or a context in which a particular user is accessing the computing device 202, accessing a particular application of the one or more applications 234, and/or any other device or module.
- the device data 232 can characterize a current operating state of each of the one or more applications 234 executing at the computing device 202.
- the application data 230 can characterize one or more features of the one or more applications 234 while executing, such as content of one or more graphical user interfaces being rendered at the direction of the one or more applications 234.
- the application data 230 can characterize an action schema, which can be updated by a respective one of the one or more applications 234 and/or by the automated assistant 204, based on a current operating status of the respective application.
- one or more action schemas for the one or more applications 234 can remain static, but can be accessed by the application state engine in order to determine a suitable action to initialize via the automated assistant 204.
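- A minimal sketch of what such an action schema could look like follows; the field names, status values, and API identifiers are assumptions used only to illustrate how a state engine might pick a suitable action:

```python
media_app_schema = {
    "application": "Podcast Application",
    "status": "paused",  # can be updated by the application or the automated assistant
    "actions": {
        "resume_playback": {"api": "playMedia",
                            "args": ["Podcast Application", 1, "resume_playback()"]},
        "pause_playback": {"api": "pauseMedia", "args": ["Podcast Application"]},
    },
}

def suitable_action(schema: dict) -> str:
    """Pick an action that fits the application's current operating status."""
    return "resume_playback" if schema["status"] == "paused" else "pause_playback"

print(suitable_action(media_app_schema))
```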
- the computing device 202 can further include an assistant invocation engine 222 that can use one or more trained machine learning models to process inputs received via the assistant interface 220, the application data 230, the device data 232, the contextual data 236, and/or any other data that is accessible to the computing device 202.
- the assistant invocation engine 222 can process this data in order to determine whether or not to wait for a user to explicitly speak an invocation phrase to invoke the automated assistant 204 via the assistant interface 220, or consider the data to be indicative of an intent by the user to invoke the automated assistant — in lieu of requiring the user to explicitly speak the invocation phrase to invoke the automated assistant 204.
- the one or more trained machine learning models can be trained using instances of training data that are based on scenarios in which the user is in an environment where multiple devices and/or applications are exhibiting various operating states.
- the instances of training data can be generated in order to capture training data that characterizes contexts in which the user invokes the automated assistant and other contexts in which the user does not invoke the automated assistant 204.
- the assistant invocation engine 222 can cause the automated assistant 204 to detect, or limit detecting, spoken invocation phrases from a user based on features of a context and/or an environment.
- the system 200 can further include an interaction analysis engine 216, which can process various data for determining whether an interaction between a user and an application should be the subject of an assistant suggestion.
- the interaction analysis engine 216 can process data that indicates a number of user inputs that a user provided to an application 234 to cause the application 234 to perform a particular operation while the user is in a respective vehicle. Based on this processing, the interaction analysis engine 216 can determine whether the automated assistant 204 can effectuate performance of the particular operation with fewer inputs from the user.
- a user that switches between corresponding instances of a given application at their portable computing device to cause certain media to be rendered at their vehicle computing device may be able to cause the same media to be rendered by issuing a particular spoken utterance (e.g., a single input) to the automated assistant 204.
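- The comparison the interaction analysis engine 216 is described as making can be sketched as follows; the input counts and the assumption that a suggested utterance counts as a single input are illustrative:

```python
def worth_suggesting(manual_input_count: int, assistant_input_count: int = 1) -> bool:
    """True when the assistant could reach the same operation with fewer inputs."""
    return assistant_input_count < manual_input_count

# E.g., several taps and application switches to queue media vs. one spoken utterance.
print(worth_suggesting(manual_input_count=7))  # True -> candidate for a suggestion
```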
- a vehicle context engine 218 can determine when to render a corresponding assistant suggestion for the user based on the determination made by the interaction analysis engine 216 that the particular operation can be initialized via the automated assistant 204 and with fewer inputs.
- the vehicle context engine 218 can process data from one or more different sources to determine a suitable time, location, and/or computing device to render the assistant suggestion for the user.
- the application data 230 can be processed to determine whether the user has switched between applications at their vehicle computing device and/or portable computing device in furtherance of causing certain media to be rendered.
- the vehicle context engine 218 can determine that this instance of switching applications is suitable for rendering an assistant suggestion regarding invoking the automated assistant to interact with a particular application (e.g., the application for rendering media via a suggestion of “Next time, just say ‘Open media application’”) to cause the particular operation to be performed.
- data from one or more different sources can be processed using one or more heuristic processes and/or one or more trained machine learning models to determine a suitable time, location, and/or computing device to render a particular assistant suggestion.
- the data can be periodically and/or responsively processed to generate an embedding or other lower-dimensional representation, which can be mapped to a latent space in which other existing embeddings have been previously mapped.
- when the generated embedding is determined to be within a threshold distance of an existing embedding in the latent space, the assistant suggestion corresponding to that existing embedding can be rendered for the user.
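- A hedged sketch of that lookup is shown below; the embeddings, distance metric, suggestion strings, and threshold are all assumptions for illustration:

```python
import math

def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Previously mapped context embeddings and their associated suggestions (illustrative).
latent_space = {
    (0.9, 0.1, 0.3): "Next time, just say 'Open media application'",
    (0.1, 0.8, 0.2): "Assistant, show me charging stations near my destination",
}

def suggestion_for(context_embedding, max_distance: float = 0.3):
    nearest = min(latent_space, key=lambda existing: euclidean(existing, context_embedding))
    if euclidean(nearest, context_embedding) <= max_distance:
        return latent_space[nearest]
    return None  # no suggestion is rendered for unfamiliar contexts

print(suggestion_for((0.85, 0.15, 0.25)))
```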
- the system 200 can further include an assistant suggestion engine 226, which can generate assistant suggestions based on interactions between a user and one or more applications and/or devices.
- the application data 230 can be processed by the assistant suggestion engine 226 to determine one or more operations capable of being initialized by a particular application of the one or more applications 234.
- interaction data can be processed to determine whether the user has previously initialized performance of one or more operations of the particular application of the one or more applications 234.
- the assistant suggestion engine 226 can determine whether the automated assistant 204 is capable of initializing the one or more operations in response to an assistant input from the user that is directed to the automated assistant 204.
- an API for a particular application of the one or more applications 234 can be accessed by the automated assistant 204 for determining whether an API command can be utilized by the automated assistant 204 to control a particular operation of the particular application of the one or more applications 234.
- the assistant suggestion engine 226 can generate a request or command phrase and/or other assistant input that can be rendered as an assistant suggestion for the user.
- the automated assistant 204 can generate a suggestion of “Assistant, play my recent podcast” based on determining that an API call (e.g., playMedia(Podcast Application, 1, resume_playback())) can be utilized by the automated assistant to control a particular application feature of interest to the user, and where “Assistant” in this suggestion corresponds to an invocation phrase for the automated assistant 204 and “play my recent podcast” corresponds to the request or command phrase that, when detected, causes the automated assistant 204 to perform an operation on behalf of the user.
- the automated assistant 204 can cause the assistant suggestion to be rendered while the user interacts with the one or more applications 234, before the user interacts with the one or more applications 234, in response to completion of a particular application operation via the one or more applications 234, and/or before completion of the particular application operation via the one or more applications 234.
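- one way the suggestion generation described above could be sketched is as a lookup of assistant-controllable API calls paired with a command phrase; only the playMedia(...) call mirrors the example in the text, and the surrounding registry, class names, and phrase mapping are assumptions.

```python
# Minimal sketch of pairing a controllable API call with a command phrase.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssistantSuggestion:
    invocation: str      # e.g., "Assistant"
    command_phrase: str  # e.g., "play my recent podcast"
    api_call: str        # call issued when the command phrase is detected

# Operations the automated assistant has confirmed it can drive via an
# application's API; the registry structure is an assumption.
SUPPORTED_API_CALLS = {
    ("Podcast Application", "resume_recent"):
        "playMedia(Podcast Application, 1, resume_playback())",
}

def suggest(app: str, operation: str) -> Optional[AssistantSuggestion]:
    call = SUPPORTED_API_CALLS.get((app, operation))
    if call is None:
        return None  # the assistant cannot control this operation
    # In practice the phrase would be derived from the operation; hardcoded here.
    return AssistantSuggestion(
        invocation="Assistant",
        command_phrase="play my recent podcast",
        api_call=call,
    )

suggestion = suggest("Podcast Application", "resume_recent")
if suggestion:
    print(f"{suggestion.invocation}, {suggestion.command_phrase}")
```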
- an assistant suggestion can be generated and/or rendered based on whether a user is an owner of the vehicle, a driver of the vehicle, a passenger in the vehicle, a borrower of the vehicle, and/or any other person that can be associated with a vehicle.
- the automated assistant 204 can determine whether the user located in the vehicle belongs in one or more of these categories using any known technique (e.g., speaker identification, face identification, password identification, fingerprint identification, and/or other techniques). For instance, the automated assistant 204 can render assistant suggestions that are more personalized for an owner when an owner is driving the vehicle, and render other assistant suggestions that may be useful to a broader audience for passengers and/or borrowers of the vehicle.
- the automated assistant 204 can operate such that a determined familiarity of a user with various automated assistant features is inversely proportional to a frequency with which assistant suggestions are rendered for that user.
- the given suggestion may not be subsequently provided for presentation to the user, but may be subsequently provided for presentation to another user that utilizes the vehicle (e.g., a borrower of the vehicle).
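- a minimal sketch of this inverse relationship between familiarity and suggestion frequency might look like the following, assuming a normalized familiarity score and an illustrative per-trip cap.

```python
# Hedged sketch: familiarity score, scaling constant, and cap are hypothetical.
def suggestion_rate(familiarity: float, max_per_trip: int = 3) -> int:
    """Map a familiarity score in [0, 1] to a number of suggestions to render
    per trip; more familiar users see fewer suggestions."""
    familiarity = min(max(familiarity, 0.0), 1.0)
    return round(max_per_trip * (1.0 - familiarity))

print(suggestion_rate(0.9))  # frequent assistant user -> 0 suggestions
print(suggestion_rate(0.1))  # unfamiliar borrower     -> 3 suggestions
```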
- the system 200 can further include a suggestion training engine 224, which can be utilized to generate training data that can be used to train one or more different machine learning models, which can be used for providing assistant suggestions to different users associated with a vehicle.
- the suggestion training engine 224 can generate training data based on whether or not a particular user interacted with a rendered assistant suggestion. In this way, one or more models can be further trained to provide assistant suggestions that the user considers relevant, and avoid distracting the user with too many suggestions that the user may not have a strong interest in and/or may already be aware of.
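- the feedback-to-training-data step could be sketched as follows; the feature layout and label convention are assumptions rather than the specification's format.

```python
# Illustrative sketch of building training examples from suggestion feedback.
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    context_features: List[float]  # e.g., time of day, foreground app, user role
    suggestion_id: str
    label: int                     # 1 = user acted on the suggestion, 0 = ignored

def record_feedback(context_features, suggestion_id, user_interacted):
    """Create one training example from whether the user followed a rendered
    assistant suggestion."""
    return TrainingExample(
        context_features=list(context_features),
        suggestion_id=suggestion_id,
        label=1 if user_interacted else 0,
    )

example = record_feedback([0.2, 1.0, 0.0], "open_media_app", user_interacted=True)
print(example)
```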
- assistant suggestions can embody a command phrase that, when detected by the automated assistant 204, causes the automated assistant 204 to interact with the one or more applications 234 to effectuate performance of one or more different operations and/or routines.
- multiple different assistant suggestions can be rendered concurrently, for a particular application of the one or more applications 234 and/or for multiple applications of the one or more applications 234, thereby allowing the user to learn multiple different command phrases at a time for controlling the one or more applications 234.
- assistant suggestions can be rendered more frequently for borrowers of a vehicle than for an owner of the vehicle.
- a determination to render more assistant suggestions for a borrower and/or other guest can be based on a frequency by which an owner of the vehicle utilizes the automated assistant 204 via the vehicle computing device.
- the assistant suggestion engine 226 may cause fewer assistant suggestions to appear for a user who frequently uses features of their automated assistant 204, and more assistant suggestions for a user (with prior permission from the user) who does not frequently use features of the automated assistant 204.
- the owner of the vehicle can be differentiated from other users using various techniques.
- FIG. 3 illustrates a method 300 for providing an automated assistant that can generate suggestions for assistant inputs that can be provided by the user in furtherance of causing the automated assistant to control an application while the user is in a vehicle.
- the method 300 can be performed by one or more computing devices, applications, and/or any other apparatus or module that can be associated with an automated assistant.
- the method 300 can include an operation 302 of determining whether a user is located within a vehicle.
- the vehicle can be, for example, a vehicle that includes a vehicle computing device, which can provide access to the automated assistant.
- the automated assistant can determine that the user is within the vehicle based on data from one or more different sources, such as the vehicle computing device and/or a portable computing device (e.g., a cellular phone).
- device data can indicate that the user has entered the vehicle, at least based on the device data indicating that a portable computing device (e.g., a cellular phone) owned by the user is paired (e.g., via Bluetooth protocol) with the vehicle computing device.
- the automated assistant can determine that the user is located within the vehicle based on contextual data, which can characterize a context of the user at a particular time. For instance, a calendar entry can indicate that the user will be driving to class when the automated assistant is determining whether the user is in the vehicle.
- the automated assistant can use one or more heuristic processes and/or one or more trained machine learning models to determine whether the user is likely (e.g., to a threshold degree of probability) to be located within the vehicle.
- one or more sensors of the vehicle that are coupled to the vehicle computing device can generate sensor data that indicates the user is located in the vehicle (e.g., via an occupancy sensor).
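- a simple way to combine the signals above into an in-vehicle presence estimate is a weighted heuristic; the weights and threshold below are illustrative assumptions.

```python
# Hedged sketch of a presence heuristic over device, sensor, and calendar signals.
def likely_in_vehicle(phone_paired: bool,
                      occupancy_sensor: bool,
                      calendar_suggests_driving: bool,
                      threshold: float = 0.6) -> bool:
    """Weighted combination of in-vehicle presence signals."""
    score = 0.0
    score += 0.5 if phone_paired else 0.0             # e.g., Bluetooth pairing
    score += 0.4 if occupancy_sensor else 0.0         # seat/occupancy sensor
    score += 0.2 if calendar_suggests_driving else 0.0
    return score >= threshold

print(likely_in_vehicle(True, False, True))   # True  (0.7 >= 0.6)
print(likely_in_vehicle(False, False, True))  # False (0.2 <  0.6)
```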
- the method 300 can proceed from the operation 302 to an operation 304, which can include processing contextual data associated with the user and/or the vehicle.
- the contextual data can include an identifier for the user (e.g., a username, a user account, and/or any other identifier), an identifier for the vehicle (e.g., a type of vehicle, a name for the vehicle, an original equipment manufacturer (OEM) of the vehicle, etc.), and/or a location of the vehicle.
- the contextual data can include device data, application data, and/or any other data that can indicate a device and/or application that: the user recently interacted with, is currently interacting with, and/or is expected to interact with.
- application data can indicate that the user has recently developed a habit of accessing a real estate application each morning, in order to view an application page of “recent listings.”
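- detecting a recurring habit such as the morning “recent listings” access could be sketched as follows; the log format, hour window, and minimum-day criterion are assumptions.

```python
# Illustrative sketch of spotting a recurring morning app-access pattern.
from datetime import datetime

def has_morning_habit(access_log, app_name, hour_range=(6, 10), min_days=3):
    """Return True if the app was opened in the morning on enough distinct days."""
    days = set()
    for entry_app, timestamp in access_log:
        dt = datetime.fromisoformat(timestamp)
        if entry_app == app_name and hour_range[0] <= dt.hour < hour_range[1]:
            days.add(dt.date())
    return len(days) >= min_days

log = [
    ("real_estate_app", "2022-02-07T07:15:00"),
    ("real_estate_app", "2022-02-08T07:40:00"),
    ("real_estate_app", "2022-02-09T08:05:00"),
]
print(has_morning_habit(log, "real_estate_app"))  # True
```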
- the method 300 can proceed from the operation 304 to an operation 306 of determining whether the user is interacting with, or is predicted to interact with, a particular application and/or application feature.
- the method 300 can proceed from the operation 306 to an optional operation 308 of updating training data.
- the training data can be updated to reflect non-interaction with the automated assistant and/or a particular application feature when the user is within the vehicle. In this way, subsequent processing of contextual data using one or more trained machine learning models trained from the updated training data can provide more relevant and/or helpful suggestions to the user.
- the method 300 can proceed from the operation 306 to an operation 310.
- the operation 310 can include generating suggestion data that characterizes an assistant input for controlling the particular application feature.
- the suggestion data can be generated using an application programming interface (API) that allows the automated assistant to submit actionable requests to, and/or receive actionable requests from, the particular application.
- the automated assistant can process HTML code, XML code, document object model (DOM) data, application programming interface (API) data, and/or any other information that can indicate whether certain features of an application are controllable. Based on this determination, the automated assistant can generate one or more suggestions for spoken utterances that, when subsequently submitted by the user, causes the automated assistant to control one or more features of the particular application.
- the automated assistant can determine that accessing the “recent listings” page of the real estate application involves opening the real estate application and selecting a selectable GUI element labeled “Recent Listings.” Based on this determination, the automated assistant can generate one or more executable requests for the real estate application, and textual content corresponding to a spoken utterance that, when provided by the user, causes the automated assistant to submit the executable requests to the real estate application.
- the one or more requests and/or textual content can then be stored as suggestion data, which can be used for rendering suggestions for one or more users.
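- the pairing of executable requests with suggestion text for the “Recent Listings” example could be sketched as follows; the request format and helper names are hypothetical.

```python
# Sketch of turning an observed GUI interaction into stored suggestion data.
from typing import NamedTuple

class SuggestionData(NamedTuple):
    utterance_text: str   # text rendered to the user as a suggestion
    requests: list        # requests the assistant submits when invoked

def build_suggestion_for_gui_action(app_name: str, gui_element_label: str) -> SuggestionData:
    """Map an observed interaction (open app, tap element) to suggestion data."""
    requests = [
        {"action": "open_application", "app": app_name},
        {"action": "select_element", "label": gui_element_label},
    ]
    utterance = f"Assistant, show me {gui_element_label.lower()}"
    return SuggestionData(utterance_text=utterance, requests=requests)

data = build_suggestion_for_gui_action("real_estate_app", "Recent Listings")
print(data.utterance_text)  # "Assistant, show me recent listings"
```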
- when the user subsequently provides the suggested spoken utterance, the automated assistant provides the one or more requests to the real estate application.
- the real estate application can respond by generating the “recent listings” page, and the automated assistant can cause the “recent listings” page to be rendered at a display interface of the vehicle computing device. Alternatively, or additionally, the automated assistant can render an audible output that characterizes content of the “recent listings” page.
- the method 300 can proceed from the operation 310 to an operation 312, which can include causing text of the spoken utterance to be rendered at the display interface of the vehicle computing device, and/or another computing device associated with the user (e.g., a portable computing device associated with the user).
- the automated assistant can cause a suggestion GUI element (that is optionally selectable) to be rendered at the display interface of the vehicle computing device while the user is within the vehicle.
- the suggestion GUI element can be rendered with natural language content characterizing the spoken utterance, such as “Assistant, show me recent listings.”
- the automated assistant can render the request suggestion without a particular identifier for the application to be controlled.
- the suggestion can be generated such that the user is able to correlate the suggestion with its context and infer the application to be controlled.
- the user can initialize the communication of the request from the automated assistant to the real estate application by providing the spoken utterance and/or tapping a touch interface of the vehicle computing device.
- the display interface of the vehicle computing device can be responsive to touch inputs, and/or one or more buttons (e.g., a button on a steering wheel), switches, and/or other interfaces of the vehicle computing device can be responsive to touch inputs.
- the method 300 can proceed from the operation 312 to an operation 314 of determining whether the user provided an assistant input in furtherance of controlling the application feature. For example, the user can provide a spoken utterance, such as, “Assistant, show me recent listings,” in furtherance of causing the automated assistant to control the real estate application.
- the method 300 can proceed from the operation 314 to an operation 316, which can include causing the automated assistant to control the particular application feature. Otherwise, when the user does not provide an assistant input corresponding to the suggestion from the automated assistant, the method 300 can proceed from the operation 314 to the optional operation 308.
- the method 300 can proceed from the operation 316 to an optional operation 318.
- the optional operation 318 can include updating training data based on the user utilizing the automated assistant to control the particular application feature. For instance, training data can be generated based on the user providing the assistant input, and the training data can be utilized to train one or more trained machine learning models.
- the one or more trained machine learning models can be subsequently used to process contextual data when the user, or another user, is determined to be in their vehicle. This additional training of the one or more trained machine learning models can allow the automated assistant to provide more relevant and/or effective suggestions for the user to invoke their automated assistant to control one or more separate applications, while also encouraging safe driving habits.
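- the operations of method 300 could be tied together roughly as follows; every helper below is a stub standing in for the operation named in its comment, and the names and return values are assumptions made only for illustration.

```python
# Condensed, hypothetical walk-through of method 300 using stub helpers.

def user_in_vehicle(signals):                        # operation 302
    return signals.get("phone_paired", False)

def process_context(context):                        # operation 304
    return {"user": context.get("user"), "habit_app": context.get("habit_app")}

def predict_interaction(features):                   # operation 306
    return features.get("habit_app")                 # e.g., the "recent listings" page

def generate_suggestion(app_feature):                # operation 310
    return f"Assistant, show me {app_feature}"

training_log = []                                    # stand-in for stored training data

def update_training_data(features, interacted):      # operations 308 and 318
    training_log.append((features, interacted))

def method_300(signals, context, user_utterance):
    if not user_in_vehicle(signals):
        return None
    features = process_context(context)
    app_feature = predict_interaction(features)
    if app_feature is None:
        update_training_data(features, interacted=False)
        return None
    suggestion = generate_suggestion(app_feature)     # operations 310 and 312
    if user_utterance == suggestion:                  # operation 314
        update_training_data(features, interacted=True)    # operation 318
        return f"controlling feature: {app_feature}"        # operation 316
    update_training_data(features, interacted=False)        # operation 308
    return suggestion

print(method_300({"phone_paired": True},
                 {"user": "owner", "habit_app": "recent listings"},
                 "Assistant, show me recent listings"))
```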
- Computer system 410 typically includes at least one processor 414 which communicates with a number of peripheral devices via bus subsystem 412. These peripheral devices may include a storage subsystem 424, including, for example, a memory 425 and a file storage subsystem 426, user interface output devices 420, user interface input devices 422, and a network interface subsystem 416. The input and output devices allow user interaction with computer system 410.
- Network interface subsystem 416 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems.
- User interface input devices 422 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices.
- use of the term "input device” is intended to include all possible types of devices and ways to input information into computer system 410 or onto a communication network.
- User interface output devices 420 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices.
- the display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image.
- the display subsystem may also provide non-visual display such as via audio output devices.
- use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system 410 to the user or to another machine or computer system.
- Storage subsystem 424 stores programming and data constructs that provide the functionality of some or all of the modules described herein.
- the storage subsystem 424 may include the logic to perform selected aspects of method 300, and/or to implement one or more of system 200, vehicle computing device 106, portable computing device 104, and/or any other application, assistant, device, apparatus, and/or module discussed herein.
- Memory 425 used in the storage subsystem 424 can include a number of memories including a main random access memory (RAM) 430 for storage of instructions and data during program execution and a read only memory (ROM) 432 in which fixed instructions are stored.
- a file storage subsystem 426 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges.
- the modules implementing the functionality of certain implementations may be stored by file storage subsystem 426 in the storage subsystem 424, or in other machines accessible by the processor(s) 414.
- Bus subsystem 412 provides a mechanism for letting the various components and subsystems of computer system 410 communicate with each other as intended. Although bus subsystem 412 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.
- Computer system 410 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computer system 410 depicted in FIG. 4 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computer system 410 are possible having more or fewer components than the computer system depicted in FIG. 4.
- in situations in which the systems described herein collect personal information about users (or as often referred to herein, “participants”), or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user’s social network, social actions or activities, profession, a user’s preferences, or a user’s current geographic location), or to control whether and/or how to receive content from the content server that may be more relevant to the user.
- certain data may be treated in one or more ways before it is stored or used, so that personal identifiable information is removed.
- a user’s identity may be treated so that no personal identifiable information can be determined for the user, or a user’s geographic location may be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of a user cannot be determined.
- the user may have control over how information is collected about the user and/or used.
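- the generalization step described above could be sketched as follows; the record layout and the choice of ZIP-prefix and city-level coarsening are assumptions.

```python
# Illustrative sketch of removing identifiers and coarsening location data.
def deidentify(record):
    """Drop direct identifiers and generalize geographic detail before storage."""
    cleaned = dict(record)
    cleaned.pop("user_id", None)
    cleaned.pop("name", None)
    if "zip_code" in cleaned:
        cleaned["zip_code"] = cleaned["zip_code"][:3] + "XX"            # ZIP-prefix level
    if "precise_location" in cleaned:
        cleaned["location"] = cleaned.pop("precise_location")["city"]   # city level only
    return cleaned

record = {"user_id": "u123", "name": "Alex", "zip_code": "94043",
          "precise_location": {"city": "Mountain View", "lat": 37.42, "lng": -122.08}}
print(deidentify(record))
```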
- a method implemented by one or more processors includes determining that a user is engaging in an interaction with a given mobile application via a mobile computing device that is located in a vehicle with the user.
- the given mobile application is separate from an automated assistant application, and the automated assistant application is accessible via the mobile computing device and a vehicle computing device of the vehicle.
- the method further includes generating, based on the interaction with the given mobile application, suggestion data that characterizes a command phrase that, when submitted by the user and to the automated assistant application, causes the automated assistant application to control a particular operation of a given vehicular application.
- the given vehicular application is a counterpart of the given mobile application.
- the method further includes causing, based on the suggestion data, the command phrase to be visually rendered in a foreground of a display interface of the vehicle computing device.
- generating the suggestion data may be performed at the mobile computing device, and the method may further include providing, in response to the interaction with the given mobile application, the suggestion data to the vehicle computing device.
- generating the suggestion data may be performed at the vehicle computing device, and the method may further include receiving, from the given mobile application, interaction data that characterizes the interaction between the user and the given mobile application.
- the suggestion data may be generated further based on the interaction data.
- the command phrase may be visually rendered as a selectable graphical user interface (GUI) element at the display interface, and the selectable GUI element may be selectable via touch input received at an area of the display interface corresponding to the selectable GUI element.
- generating the suggestion data that characterizes the command phrase may include generating the command phrase based on one or more application operations that can be initialized via direct interaction between the user and a GUI interface of the given mobile application.
- the GUI interface may be being rendered at the mobile computing device, and the one or more application operations include the particular operation.
- generating the suggestion data that characterizes the command phrase may include generating the command phrase based on one or more application operations that the user initialized via the given mobile application of the mobile computing device during one or more prior instances when the user was located in the vehicle.
- the one or more application operations may include the particular operation.
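- the mobile-to-vehicle handoff described in these implementations could be sketched as follows; the counterpart-application mapping, transport, and class names are assumptions.

```python
# Hypothetical sketch: suggestion data generated on the mobile device for a
# counterpart vehicular application, then provided to the vehicle computing device.
COUNTERPART_APPS = {"mobile_media_app": "vehicle_media_app"}

def generate_on_mobile(interaction):
    vehicular_app = COUNTERPART_APPS.get(interaction["app"])
    if vehicular_app is None:
        return None
    return {
        "command_phrase": f"Assistant, {interaction['operation']}",
        "target_app": vehicular_app,
        "operation": interaction["operation"],
    }

class VehicleComputingDevice:
    def __init__(self):
        self.foreground_suggestions = []

    def receive_suggestion(self, suggestion_data):
        # Rendered in the foreground of the vehicle display interface.
        self.foreground_suggestions.append(suggestion_data["command_phrase"])

vehicle = VehicleComputingDevice()
data = generate_on_mobile({"app": "mobile_media_app", "operation": "resume my playlist"})
if data:
    vehicle.receive_suggestion(data)
print(vehicle.foreground_suggestions)
```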
- a method implemented by one or more processors includes generating prediction data that indicates a user is predicted to interact with an application interface of a given application, and via a display interface of a vehicle computing device, to control a feature of the given application.
- the given application is separate from an automated assistant application that is accessible via the vehicle computing device of a vehicle.
- the method further includes generating, based on the prediction data, suggestion data that characterizes a command phrase that, when submitted by the user to the automated assistant application, causes the automated assistant application to control the feature of the given application; causing at least the command phrase to be rendered at the display interface of the vehicle computing device prior to the user interacting with the feature of the application; and in response to causing at least the command phrase to be rendered at the display interface of the vehicle computing device: receiving, from the user, an assistant input that is directed to the automated assistant application and that includes at least the command phrase, and causing, based on receiving the assistant input, the automated assistant application to control the feature of the given application based on the assistant input.
- the command phrase may be rendered as a selectable graphical user interface (GUI) element at the display interface, and the assistant input may be a touch input that is received at an area of the display interface corresponding to the selectable GUI element.
- generating the prediction data may include determining that the vehicle will be driving towards a location when the user is predicted to interact with the application interface and control the feature of the application. The suggestion data may be further based on the location that the vehicle is driving towards.
- the method may further include determining that the application interface of the given application is being rendered at the display interface of the vehicle computing device. Generating the prediction data may be performed in response to determining that the application interface is being rendered at the display interface of the vehicle computing device.
- the method may further include determining that the user is currently located within the vehicle, and determining that, during a prior instance when the user was located within the vehicle, the user accessed the application interface of the given application. Generating the prediction data may be performed based on determining that the user is currently located within the vehicle and that the user previously accessed the application interface of the given application.
- the method may further include processing contextual data using one or more trained machine learning models.
- the contextual data may characterize one or more features of a context of the user, and generating the prediction data may be performed based at least on processing the contextual data.
- the one or more trained machine learning models may be trained using data generated during one or more prior instances in which one or more other users accessed, while in a respective vehicle, the feature of the given application.
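- scoring contextual features to decide whether to generate prediction data could be sketched with a simple logistic model; the features, weights, bias, and threshold below are placeholders rather than trained values.

```python
# Sketch of a logistic score over binary contextual features; placeholder values.
import math

WEIGHTS = {"in_vehicle": 1.2, "morning": 0.8, "opened_app_last_trip": 1.5}
BIAS = -1.5

def predicted_to_interact(features: dict, threshold: float = 0.5) -> bool:
    """Return True if the modeled interaction probability clears the threshold."""
    z = BIAS + sum(WEIGHTS[name] * float(value) for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold

print(predicted_to_interact(
    {"in_vehicle": True, "morning": True, "opened_app_last_trip": True}))   # True
print(predicted_to_interact(
    {"in_vehicle": True, "morning": False, "opened_app_last_trip": False})) # False
```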
- a method implemented by one or more processors includes determining, by a vehicle computing device of a vehicle, that a user is engaging in an interaction with a given application of the vehicle computing device while the user is in the vehicle.
- the given application is separate from an automated assistant application that is accessible via the vehicle computing device.
- the method further includes generating, by the vehicle computing device, and based on the interaction with the given application, suggestion data that characterizes a command phrase that, when submitted by the user and to the automated assistant application, causes the given application to perform a particular operation associated with the interaction between the user and the given application; and causing, by the vehicle computing device, the suggestion data to be visually rendered at a display interface of the vehicle computing device.
- the command phrase is rendered in a foreground of the display interface of the vehicle computing device.
- the command phrase may include natural language content that characterizes an invocation phrase for invoking the automated assistant application and a request to perform the particular operation.
- the method may further include receiving, by the vehicle computing device, a spoken utterance that is directed to the automated assistant application and embodies the request; determining, by the vehicle computing device, that the spoken utterance includes the request rendered at the display interface of the vehicle computing device; and causing, in response to receiving the spoken utterance, the given application to perform the particular operation associated with the interaction between the user and the given application.
- the command phrase may be rendered when the user is in the vehicle during a first excursion; and the spoken utterance may be received when the user is in the vehicle during a second excursion.
- the method may further include determining that the particular operation is capable of being controlled via selectable content that is being rendered at the display interface of the vehicle computing device. Generating the suggestion data that characterizes the command phrase may be based on determining that the particular operation is capable of being controlled via the selectable content that is being rendered at the display interface of the vehicle computing device.
- the application may be a communication application, and the selectable content may include one or more selectable elements for specifying a phone number and/or a contact to call.
- the application may be a media application, and the selectable content may include one or more selectable elements for specifying media to be visually rendered via the display interface of the vehicle computing device and/or audibly rendered via one or more speakers of the vehicle computing device.
- causing the suggestion data to be rendered at the display interface of the vehicle computing device may be in response to determining that the user has completed the interaction with the given application of the vehicle computing device.
- generating the suggestion data that characterizes the command phrase may be further based on a context of the user that is engaging in the interaction with the given application of the vehicle computing device. In some versions of those implementations, generating the suggestion data that characterizes the command phrase may be further based on a display context of the display interface of the vehicle computing device.
- the method may further include, subsequent to the command phrase being rendered at the display interface of the vehicle computing device: determining, by the vehicle computing device, that the user is engaging in a separate interaction with an additional application of the vehicle, and via the display interface of the vehicle computing device, while the user is in the vehicle.
- the additional application may also be separate from the automated assistant application and may also be separate from the given application.
- the method may further include generating, by the vehicle computing device, and based on the separate interaction with the additional application, additional suggestion data that characterizes an additional command phrase that, when submitted by the user and to the automated assistant application, causes the automated assistant application to control an additional operation of the additional application; and causing, by the vehicle computing device, the additional suggestion data to be visually rendered at the display interface of the vehicle computing device.
- causing the additional suggestion data to be rendered at the display interface of the vehicle computing device may be performed in response to determining that the user has completed the separate interaction with the additional application of the vehicle computing device.
- implementations may include a non-transitory computer readable storage medium storing instructions executable by one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described above and/or elsewhere herein.
- implementations may include a system of one or more computers that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above and/or elsewhere herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280091075.8A CN118591791A (en) | 2022-02-09 | 2022-07-01 | Providing contextual automated assistant action suggestions via vehicle computing device |
KR1020247025899A KR20240124414A (en) | 2022-02-09 | 2022-07-01 | Provide contextual automated assistant action suggestion(s) via the vehicle computing device. |
JP2024547571A JP2025508375A (en) | 2022-02-09 | 2022-07-01 | Providing context-based automated assistant action suggestion(s) via a vehicle computing device |
EP22758291.3A EP4248304A1 (en) | 2022-02-09 | 2022-07-01 | Providing contextual automated assistant action suggestion(s) via a vehicle computing device |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263308349P | 2022-02-09 | 2022-02-09 | |
US63/308,349 | 2022-02-09 | ||
US17/676,646 | 2022-02-21 | ||
US17/676,646 US12118994B2 (en) | 2022-02-09 | 2022-02-21 | Providing contextual automated assistant action suggestion(s) via a vehicle computing device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023154080A1 true WO2023154080A1 (en) | 2023-08-17 |
Family
ID=83050047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/035946 WO2023154080A1 (en) | 2022-02-09 | 2022-07-01 | Providing contextual automated assistant action suggestion(s) via a vehicle computing device |
Country Status (5)
Country | Link |
---|---|
US (1) | US20250006194A1 (en) |
EP (1) | EP4248304A1 (en) |
JP (1) | JP2025508375A (en) |
KR (1) | KR20240124414A (en) |
WO (1) | WO2023154080A1 (en) |
- 2022-07-01: KR application KR1020247025899A filed (published as KR20240124414A; active, Pending)
- 2022-07-01: PCT application PCT/US2022/035946 filed (published as WO2023154080A1; Application Filing)
- 2022-07-01: EP application EP22758291.3A filed (published as EP4248304A1; active, Pending)
- 2022-07-01: JP application JP2024547571A filed (published as JP2025508375A; active, Pending)
- 2024-09-11: US application US 18/882,604 filed (published as US20250006194A1; active, Pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017197010A1 (en) | 2016-05-10 | 2017-11-16 | Google Llc | Implementations for voice assistant on devices |
US20210280180A1 (en) | 2018-12-28 | 2021-09-09 | Google Llc | Supplementing voice inputs to an automated assistant according to selected suggestions |
Also Published As
Publication number | Publication date |
---|---|
JP2025508375A (en) | 2025-03-26 |
US20250006194A1 (en) | 2025-01-02 |
KR20240124414A (en) | 2024-08-16 |
EP4248304A1 (en) | 2023-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12073832B2 (en) | Supplementing voice inputs to an automated assistant according to selected suggestions | |
US20240153502A1 (en) | Dynamically adapting assistant responses | |
US12118994B2 (en) | Providing contextual automated assistant action suggestion(s) via a vehicle computing device | |
JP7618812B2 (en) | Performing non-assistant application actions by an automated assistant in response to user input, which may be limited to parameters | |
US12264932B2 (en) | Automated assistant that detects and supplements various vehicle computing device capabilities | |
US20250006194A1 (en) | Providing contextual automated assistant action suggestion(s) via a vehicle computing device | |
CN118591791A (en) | Providing contextual automated assistant action suggestions via vehicle computing device | |
CN118235197A (en) | Selectively generate and/or selectively render continuation content for spoken utterance completion | |
US20240038246A1 (en) | Non-wake word invocation of an automated assistant from certain utterances related to display content | |
US12287827B2 (en) | Automatically suggesting routines based on detected user actions via multiple applications | |
EP4162233A1 (en) | Proactively activating automated assistant driving modes for varying degrees of travel detection confidence | |
WO2025034609A1 (en) | Suggesting automated assistant routines based on detected user actions | |
WO2025128240A1 (en) | Cohort assignment and churn out prediction for assistant interactions |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2022758291; Country of ref document: EP; Effective date: 20221125 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202417053065; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 20247025899; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280091075.8; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024547571; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |