GB2623037A - Early invocation for contextual data processing - Google Patents

Early invocation for contextual data processing

Info

Publication number
GB2623037A
Authority
GB
United Kingdom
Prior art keywords
data item
contextual data
natural language
generate
language processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2400873.2A
Other versions
GB202400873D0 (en)
Inventor
Jacob Ponnu
Kalman Adam
Luo Ruiqi
Zhao Jingqian
Chen Xi
Zhu Yuqiang
Maddipati Krupal
Yan Wenbo
Yang Liu
Xie Meng
Devillaine Adriano
Kumar Kollu Uday
Ramachandra Prathap
Alizerine Dzialo Charlotte
Alnuaimat Mohammad
P Vinodkrishnan Nalledath
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/304,720 (published as US20220415311A1)
Priority claimed from US17/304,714 (published as US11657805B2)
Application filed by Amazon Technologies Inc filed Critical Amazon Technologies Inc
Publication of GB202400873D0
Publication of GB2623037A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

A speech processing system uses contextual data to determine the specific domains, subdomains, and applications appropriate for taking action in response to spoken commands and other utterances. The system can use signals and other contextual data associated with an utterance, such as location signals, content catalog data, data regarding historical usage patterns, data regarding content visually presented on a display screen of a computing device when an utterance was made, other data, or some combination thereof.
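The routing idea above lends itself to a small illustration. The following Python sketch shows how contextual signals of the kinds the abstract lists (on-screen content, usage history, location) might be combined to score candidate domains for an utterance. It is a minimal, hypothetical sketch, not code from the patent: every name (UtteranceContext, score_domains, the domains and weights) is invented for illustration. A second sketch after the claims illustrates the early-invocation mechanism itself.

```python
# Hypothetical sketch of context-based domain scoring; names and
# weights are illustrative only and do not come from the patent.
from dataclasses import dataclass, field


@dataclass
class UtteranceContext:
    """Contextual signals available when an utterance arrives."""
    device_location: str | None = None
    on_screen_content: list[str] = field(default_factory=list)
    recent_domains: list[str] = field(default_factory=list)


def score_domains(intent: str, ctx: UtteranceContext) -> dict[str, float]:
    """Score candidate domains using the intent plus contextual signals."""
    scores = {"music": 0.0, "video": 0.0, "shopping": 0.0}
    if intent == "PlayContent":
        # An ambiguous "play" intent could go to music or video.
        scores["music"] += 1.0
        scores["video"] += 1.0
    if ctx.on_screen_content:
        # Content visible on the device's display is evidence for video.
        scores["video"] += 0.5
    for domain in ctx.recent_domains:
        # Historical usage patterns nudge routing toward familiar domains.
        if domain in scores:
            scores[domain] += 0.25
    return scores


ctx = UtteranceContext(on_screen_content=["movie results"],
                       recent_domains=["video"])
best_domain, _ = max(score_domains("PlayContent", ctx).items(),
                     key=lambda kv: kv[1])
print(best_domain)  # video
```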

Claims (15)

1. A computer-implemented method comprising: as performed by a computing system comprising one or more computer processors configured to execute specific instructions, executing a set of natural language processing actions on natural language input to generate intent data, wherein the natural language input represents a user utterance, and wherein the intent data comprises a semantic representation of the user utterance; determining, at an integration point during the set of natural language processing actions, to generate a contextual data item using an initial data item associated with at least one of the set of natural language processing actions; generating the contextual data item based at least partly on the initial data item concurrently with performance of at least a portion of the set of natural language processing actions; and providing access to the contextual data item, wherein the contextual data item is accessed during performance of at least one of: the set of natural language processing actions, or a set of response determination actions.
2. The computer-implemented method of claim 1, further comprising determining, using a registry, a contextual data action to be performed to generate the contextual data item, wherein the registry specifies the contextual data action and one or more initial data items to be used to generate the contextual data item.
3. The computer-implemented method of claim 2, wherein determining the contextual data action comprises determining one of: evaluation of the one or more initial data items using a model to generate a prediction regarding an aspect of the user utterance; performance of a calculation using the one or more initial data items to precompute a value; or acquisition of a value from a data store using the one or more initial data items.
4. The computer-implemented method of claim 1, wherein generating the contextual data item comprises generating data representing at least one of: a location of a voice-enabled device from which the natural language input is received, an identifier of the voice-enabled device, a content catalog associated with a user profile, historical interactions associated with the user profile, a prediction regarding an aspect of the user utterance, or a comparison of a plurality of interpretations of the user utterance.
5. The computer-implemented method of claim 1, further comprising: determining, at a second integration point during the set of natural language processing actions, to generate a second contextual data item using a second initial data item; and generating the second contextual data item based at least partly on the second initial data item concurrently with performance of at least a second portion of the set of natural language processing actions.
6. The computer-implemented method of claim 1, further comprising: in response to generating the contextual data item, determining to generate a second contextual data item using the contextual data item; and generating the second contextual data item based at least partly on the contextual data item concurrently with performance of at least a second portion of the set of natural language processing actions.
7. The computer-implemented method of claim 1, wherein providing access to the contextual data item comprises storing the contextual data item in a data store accessible during performance of the at least one of the set of natural language processing actions, or the set of response determination actions.
8. The computer-implemented method of claim 1, wherein executing the set of natural language processing actions comprises: generating text data using audio data and an automatic speech recognition ("ASR") process, wherein the text data represents at least a portion of the user utterance; and generating intent data using the text data and a natural language understanding ("NLU") process, wherein the intent data comprises a semantic representation of the user utterance.
9. A system comprising: computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory and configured by the executable instructions to at least: execute a set of natural language processing actions on natural language input to generate intent data, wherein the natural language input represents a user utterance, and wherein the intent data comprises a semantic representation of the user utterance; determine, at an integration point during the set of natural language processing actions, to generate a contextual data item using an initial data item associated with at least one of the set of natural language processing actions; generate the contextual data item based at least partly on the initial data item concurrently with performance of at least a portion of the set of natural language processing actions; and provide access to the contextual data item, wherein the contextual data item is accessed during performance of at least one of: the set of natural language processing actions, or a set of response determination actions.
10. The system of claim 9, wherein the one or more processors are configured by further executable instructions to determine, using a registry, a contextual data action to be performed to generate the contextual data item, wherein the registry specifies the contextual data action and one or more initial data items to be used to generate the contextual data item.
11. The system of claim 9, wherein the one or more processors are configured by further executable instructions to at least: determine, at a second integration point during the set of natural language processing actions, to generate a second contextual data item using a second initial data item; and generate the second contextual data item based at least partly on the second initial data item concurrently with performance of at least a second portion of the set of natural language processing actions.
12. The system of claim 9, wherein the one or more processors are configured by further executable instructions to at least: in response to generating the contextual data item, determine to generate a second contextual data item using the contextual data item; and generate the second contextual data item based at least partly on the contextual data item concurrently with performance of at least a second portion of the set of natural language processing actions.
13. The system of claim 9, wherein the one or more processors configured to generate the contextual data item based at least partly on the initial data item are configured by further executable instructions to: evaluate the initial data item using a model to generate a prediction regarding an aspect of the user utterance; and generate a response to the user utterance based at least partly on the prediction, wherein the prediction relates to a likely status of the user utterance as one of an initial user utterance of a multi-turn dialog or a follow-up user utterance of a multi-turn dialog.
14. The system of claim 9, wherein the one or more processors configured to provide access to the contextual data item are configured by further executable instructions to respond to a request for the contextual data item during performance of the at least one of the set of natural language processing actions, or the set of response determination actions.
15. The system of claim 9, wherein the one or more processors are configured by further executable instructions to: determine an application to which intent data comprising a semantic representation of the user utterance is to be routed for generation of a response to the user utterance; and generate a response to the user utterance using the application and the intent data.
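As a rough sketch of claims 1-3 and 8 above, the following Python snippet models the claimed flow: an ASR step produces text, an integration point consults a registry to decide which contextual data item to generate, and that generation runs concurrently with NLU so the item is ready when response determination begins. All names here (REGISTRY, run_pipeline, the stand-in ASR/NLU coroutines) are hypothetical illustrations, not the patent's implementation.

```python
# Hypothetical asyncio model of early invocation (claims 1-3, 8);
# names and structure are illustrative, not from the patent.
import asyncio

# Claim 2: a registry specifying, per integration point, the contextual
# data action and the initial data items used to generate the item.
REGISTRY = {
    "post_asr": {"action": "predict_follow_up", "inputs": ["text_data"]},
}


async def asr(audio: bytes) -> str:
    """Stand-in ASR step (claim 8): audio data -> text data."""
    await asyncio.sleep(0.01)
    return "play the next episode"


async def nlu(text: str) -> dict:
    """Stand-in NLU step (claim 8): text data -> intent data."""
    await asyncio.sleep(0.02)
    return {"intent": "PlayContent", "slots": {"target": "next episode"}}


async def contextual_action(name: str, inputs: dict) -> dict:
    """Generate a contextual data item from the initial data item(s),
    e.g. a model prediction that the utterance is a follow-up turn in a
    multi-turn dialog (claims 3 and 13)."""
    await asyncio.sleep(0.02)
    return {"name": name, "is_follow_up": True}


async def run_pipeline(audio: bytes) -> tuple[dict, dict]:
    text = await asr(audio)
    # Integration point "post_asr" (claim 1): start generating the
    # contextual data item concurrently with the remaining NLU work.
    entry = REGISTRY["post_asr"]
    context_task = asyncio.create_task(
        contextual_action(entry["action"], {"text_data": text}))
    intent = await nlu(text)
    # The item is ready for response determination (claim 1's "providing
    # access") by the time the intent data is available.
    context_item = await context_task
    return intent, context_item


intent, context_item = asyncio.run(run_pipeline(b"audio-bytes"))
print(intent, context_item)
```

Because the contextual action overlaps the NLU step rather than running after it, the contextual data item adds little or no wall-clock latency, which is the point of invoking it early.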
GB2400873.2A 2021-06-24 2022-06-17 Early invocation for contextual data processing Pending GB2623037A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/304,720 US20220415311A1 (en) 2021-06-24 2021-06-24 Early invocation for contextual data processing
US17/304,714 US11657805B2 (en) 2021-06-24 2021-06-24 Dynamic context-based routing of speech processing
PCT/US2022/034014 WO2022271555A1 (en) 2021-06-24 2022-06-17 Early invocation for contextual data processing

Publications (2)

Publication Number Publication Date
GB202400873D0 (en) 2024-03-06
GB2623037A (en) 2024-04-03

Family

ID=82557940

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2400873.2A Pending GB2623037A (en) 2021-06-24 2022-06-17 Early invocation for contextual data processing

Country Status (3)

Country Link
DE (1) DE112022003216T5 (en)
GB (1) GB2623037A (en)
WO (1) WO2022271555A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515625B1 (en) 2017-08-31 2019-12-24 Amazon Technologies, Inc. Multi-modal natural language processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030112277A1 (en) * 2001-12-14 2003-06-19 Koninklijke Philips Electronics N.V. Input of data using a combination of data input systems
US20190324780A1 (en) * 2018-04-20 2019-10-24 Facebook, Inc. Contextual Auto-Completion for Assistant Systems

Also Published As

Publication number Publication date
WO2022271555A1 (en) 2022-12-29
DE112022003216T5 (en) 2024-04-04
GB202400873D0 (en) 2024-03-06

Similar Documents

Publication Publication Date Title
US11887582B2 (en) Training and testing utterance-based frameworks
CN111033492B (en) Providing command bundle suggestions for automated assistants
CN111859994B (en) Machine translation model acquisition and text translation method, device and storage medium
US8239201B2 (en) System and method for audibly presenting selected text
US20200234695A1 (en) Determining phonetic relationships
KR102625184B1 (en) Speech synthesis training to create unique speech sounds
US20180253420A1 (en) Output sentence generation apparatus, output sentence generation method, and output sentence generation program
US20190205301A1 2017-12-29 2019-07-04 Combo of Language Understanding and Information Retrieval
GB2610709A (en) Synthetic speech processing
US20240055002A1 (en) Detecting near matches to a hotword or phrase
US20230169980A1 (en) Detecting and handling failures in other assistants
JP6712754B2 (en) Discourse function estimating device and computer program therefor
GB2623037A (en) Early invocation for contextual data processing
JP2012108429A (en) Voice selection device, utterance selection device, voice selection system, method for selecting voice, and voice selection program
US20230015112A1 (en) Method and apparatus for processing speech, electronic device and storage medium
US11775510B1 (en) System and method for modeling a search query
US11783828B2 (en) Combining responses from multiple automated assistants
US20230298580A1 (en) Emotionally Intelligent Responses to Information Seeking Questions
US20240013782A1 (en) History-Based ASR Mistake Corrections
US11935533B1 (en) Content-related actions based on context
US11756533B2 (en) Hot-word free pre-emption of automated assistant response presentation
US20230113883A1 (en) Digital Signal Processor-Based Continued Conversation
US20230267949A1 (en) Streaming Vocoder
Faruqui et al. Improved Contextual Grounding by Combining Multiple Speech Transcription Hypotheses
CN114302028A (en) Word extraction method, word extraction device, electronic equipment, storage medium and program product